The Cox regression model is clearly the most used hazards model when analyzing survival data, as it provides a convenient way of summarizing covariate effects in terms of relative risks. The proportional hazards assumption may not always hold, however. A typical violation of the assumption is a time-changing covariate effect, which is often encountered in biomedical applications. A typical example is a treatment effect that varies with time, such as treatment efficacy fading away over time due to, for example, tolerance developed by the patient. Under such circumstances, the Aalen additive hazards model (Aalen, 1980) may provide a useful alternative to the Cox model as it easily incorporates time-varying covariate effects. For Aalen's model, the hazard function for an individual with a p-dimensional covariate vector X takes the form

$\alpha(t|X)=X^{T}\beta(t).$   (3.1)
The time-varying regression function β(t) is a vector of locally integrable functions, and usually X will have 1 as its first component, allowing for an intercept in the model. Model (3.1) is very flexible and can be seen as a first-order Taylor series expansion of a general hazard function $\alpha(t|X)$ around the zero covariate:

$\alpha(t|X)=\alpha(t|0)+X^{T}\,\partial_{x}\alpha(t|x)\big|_{x=X^{*}}$
with X* on the line segment between 0 and X. As we shall see below, it is the cumulative regression function

$B(t)=\int_{0}^{t}\beta(s)\,ds$
that is easy to estimate, and its corresponding estimator converges at the usual $n^{1/2}$-rate. Estimation and asymptotic properties were derived in Aalen (1980), further results for this model were given in Aalen (1989) and Aalen (1993), and Huffer and McKeague (1991) derived efficient estimators.
The full flexibility of model (3.1) may not always be needed, and it is usually also of interest to try to simplify the model by investigating whether or not a specific covariate effect really varies with time. A very useful model in this respect is the semiparametric additive hazards model of McKeague and Sasieni (1994). It assumes that the hazard function has the form

$\alpha(t|X,Z)=X^{T}\beta(t)+Z^{T}\gamma,$   (3.2)
where X and Z are p-dimensional and q-dimensional covariate vectors, respectively. Apart from its use for testing time-varying effects, this model is useful in its own right because it makes a sensible bias-variance trade-off possible: effects that are almost constant can be summarized as such, and effects that are strongly time-varying can be described as such. Note also that model (3.2) covers model (3.1) simply by leaving Z empty. In practice, on the other hand, one may start out with model (3.1) and then test, for a specific covariate, whether its effect varies significantly with time. If this is not the case, one may then simplify to model (3.2) with Z being that specific covariate, and one can proceed in that way with some or all of the remaining covariates; the starting point, though, will now be model (3.2). Below we show how to do such inference in model (3.2). Lin and Ying (1994) considered a special case of (3.2), where only the intercept term is allowed to depend on time; this model is often referred to as the Lin and Ying model. A nice discussion of pros and cons of the additive hazards model can be found in Aalen et al. (2008).
We present estimation and inferential procedures based on model (3.2) since, as mentioned above, model (3.1) is covered by model (3.2) simply by leaving Z empty; the results below therefore also cover model (3.1).
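Before turning to the formal development, a small illustration may help. The following is a hedged, pure-Python sketch (all data and helper names are invented for the example) of the least squares computation that underlies the Aalen estimator presented below; in practice one would use the contributed R package timereg rather than code like this.

```python
# Hedged sketch of the unweighted Aalen estimator for model (3.1): at each
# event time t the increment of the cumulative regression function is
# dB(t) = (X(t)'X(t))^{-1} X(t)' dN(t), where rows of X(t) are zero for
# subjects no longer at risk. Toy data; helper names are invented.

def solve_linear(A, b):
    """Solve the small square system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def aalen_increments(times, status, covs):
    """Return [(event time, dB)] for right-censored data; each covariate
    vector in covs starts with a 1 so the first component is the intercept."""
    p = len(covs[0])
    out = []
    for t, dlt, x_i in sorted(zip(times, status, covs)):
        if not dlt:
            continue  # censoring times contribute no jump
        risk = [x for s, _, x in zip(times, status, covs) if s >= t]
        # X(t)'X(t) over the risk set; X(t)' dN(t) is just the event row x_i
        A = [[sum(x[r] * x[c] for x in risk) for c in range(p)] for r in range(p)]
        out.append((t, solve_linear(A, list(x_i))))
    return out

# Intercept-only data: the increments reduce to Nelson-Aalen jumps 1/Y(t)
times = [2.0, 3.0, 5.0, 7.0, 11.0]
status = [1, 1, 0, 1, 1]
inc = aalen_increments(times, status, [(1.0,)] * 5)
print([round(db[0], 4) for _, db in inc])  # [0.2, 0.25, 0.5, 1.0]
```

The intercept-only check is a useful sanity test: with X consisting only of the constant 1, the least squares increment collapses to one over the number at risk, i.e., the Nelson-Aalen estimator.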
Let T be the survival time of interest with conditional hazard function $\alpha(t|X,Z)$ given the covariate vectors X and Z. In practice T may be right-censored by U, so that we observe $(\tilde{T}=T\wedge U,\Delta=I(T\le U),X,Z)$
. We assume that T and U are conditionally independent given $(X,Z)$. Let $(\tilde{T}_{i},\Delta_{i},X_{i},Z_{i})$ be n iid replicates from this generic model. Under the above independent right-censoring scheme, the ith counting process $N_{i}(t)=I(\tilde{T}_{i}\le t,\Delta_{i}=1)$ has intensity

$\lambda_{i}(t)=Y_{i}(t)\{X_{i}^{T}\beta(t)+Z_{i}^{T}\gamma\},$

where $Y_{i}(t)=I(t\le \tilde{T}_{i})$
is the at-risk indicator. We assume that all counting processes are observed in the time interval [0, τ], where τ < ∞. Each counting process has compensator $\Lambda_{i}(t)=\int_{0}^{t}\lambda_{i}(s)\,ds$ such that $M_{i}(t)=N_{i}(t)-\Lambda_{i}(t)$ is a martingale. Define the n-dimensional counting process $N=(N_{1},\dots,N_{n})^{T}$ and the n-dimensional martingale $M=(M_{1},\dots,M_{n})^{T}$. Let also $X=(Y_{1}X_{1},\dots,Y_{n}X_{n})^{T}$ and $Z=(Y_{1}Z_{1},\dots,Y_{n}Z_{n})^{T}$. The results presented below hold under some regularity conditions; see Martinussen and Scheike (2006). Writing the model in incremental form,

$dN(t)=X(t)\,dB(t)+Z(t)\gamma\,dt+dM(t),$

and since E{dM(t)} = 0, this suggests the following (unweighted) least squares estimator of $(B(t)=\int_{0}^{t}\beta(s)\,ds,\ \gamma)$ proposed by
McKeague and Sasieni (1994):

$\hat{\gamma}=\Big\{\int_{0}^{\tau}Z^{T}(t)H(t)Z(t)\,dt\Big\}^{-1}\int_{0}^{\tau}Z^{T}(t)H(t)\,dN(t),\qquad \hat{B}(t)=\int_{0}^{t}X^{-}(s)\{dN(s)-Z(s)\hat{\gamma}\,ds\},$

where $X^{-}$ denotes the generalized inverse $(X^{T}X)^{-1}X^{T}$ and $H=I-XX^{-}$, assuming here that the required inverses exist. Considering the model with only nonparametric terms, model (3.1), Z is empty, and it is readily seen from the latter display that the estimator of B(t) in that case is given by

$\hat{B}(t)=\int_{0}^{t}X^{-}(s)\,dN(s),$
which is the estimator initially proposed by Aalen (1980). Weighted estimators, and in theory efficient estimators, exist. However, they depend on the unknown parameters and therefore, to calculate them, some kind of smoothing is needed, which may compromise efficiency in practice; see McKeague and Sasieni (1994) for more on the efficient estimators. Here we will only deal with the unweighted estimators, which can be calculated without any use of smoothing. It is quite obvious that these simple and direct estimators have sound properties, as the following arguments show. Considering $\hat{\gamma}$, the counting process integral can be written as

$\int_{0}^{\tau}Z^{T}(t)H(t)\,dN(t)=\Big\{\int_{0}^{\tau}Z^{T}(t)H(t)Z(t)\,dt\Big\}\gamma+\int_{0}^{\tau}Z^{T}(t)H(t)\,dM(t)$
since HX = 0, and therefore $\hat{\gamma}$ is an essentially unbiased estimator of γ since the martingale term has zero mean. Similar arguments concerning $\hat{B}(t)$
are readily given. The limit distributions of the estimators are

$n^{1/2}(\hat{\gamma}-\gamma)\stackrel{\mathcal{D}}{\rightarrow}N(0,\Sigma),\qquad n^{1/2}\{\hat{B}(t)-B(t)\}\stackrel{\mathcal{D}}{\rightarrow}U(t),$

where U(t) is a zero-mean Gaussian process and Σ is the asymptotic variance of $\hat{\gamma}$. These limit distributions may be written as sums of essentially iid terms:

$n^{1/2}(\hat{\gamma}-\gamma)=n^{-1/2}\sum_{i=1}^{n}\epsilon_{i}^{\gamma}+o_{p}(1),\qquad n^{1/2}\{\hat{B}(t)-B(t)\}=n^{-1/2}\sum_{i=1}^{n}\epsilon_{i}^{B}(t)+o_{p}(1),$

where the iid terms $\epsilon_{i}^{\gamma}$ and $\epsilon_{i}^{B}(t)$ are martingale integrals involving c(t) and p(t), which denote the limits in probability of C(t) and P(t), respectively. Also, $x^{T}x$ is used as notation for the limit in probability of $n^{-1}X^{T}X$, and similarly with $z^{T}x$. The limit distributions may be simulated in the same way as Lin et al. (1993) suggested in a Cox model setting:

$n^{1/2}(\hat{\gamma}-\gamma)\sim n^{-1/2}\sum_{i=1}^{n}\hat{\epsilon}_{i}^{\gamma}G_{i},\qquad n^{1/2}\{\hat{B}(t)-B(t)\}\sim n^{-1/2}\sum_{i=1}^{n}\hat{\epsilon}_{i}^{B}(t)G_{i},$
where we let ∼ indicate that two quantities have the same limit distribution. In the latter display, $G_{1},\dots,G_{n}$ are independent standard normals and $\hat{\epsilon}_{i}^{\gamma}$ is obtained from $\epsilon_{i}^{\gamma}$ by replacing deterministic quantities with their empirical counterparts and by replacing $M_{i}(t)$ with $\hat{M}_{i}(t)$, $i=1,\dots,n$, and similarly with $\hat{\epsilon}_{i}^{B}(t)$. The result is that, conditional on the data, $n^{-1/2}\sum_{i=1}^{n}\hat{\epsilon}_{i}^{B}(t)G_{i}$ will have the same limit distribution as $n^{1/2}\{\hat{B}(t)-B(t)\}$.
The hypothesis of time-invariance

$H_{0}:\ \beta_{p}(t)=\gamma_{p}\ \text{for all}\ t\in[0,\tau],$

focusing without loss of generality on the pth regression coefficient, may be reformulated in terms of the cumulative regression function $B_{p}(t)=\int_{0}^{t}\beta_{p}(s)\,ds$ as

$H_{0}:\ B_{p}(t)=\gamma_{p}t.$

Martinussen and Scheike (2006) studied the test process

$V_{n}(t)=n^{1/2}\{\hat{B}_{p}(t)-\hat{B}_{p}(\tau)\,t/\tau\},$
which is easy to compute, and considered test statistics such as

$\sup_{t\in[0,\tau]}|V_{n}(t)|.$
Under $H_{0}$, we have

$V_{n}(t)=n^{1/2}\{\hat{B}_{p}(t)-B_{p}(t)\}-(t/\tau)\,n^{1/2}\{\hat{B}_{p}(\tau)-B_{p}(\tau)\}.$
Clearly, the limit distribution of V _{ n }(t) is not a martingale due to the term ${\widehat{B}}_{p}(\tau )$
, but one may use the above resampling technique of Lin et al. (1993) to approximate its limit distribution. Specifically, the limit distribution of $V_{n}(t)$ may be approximated by

$n^{-1/2}\sum_{i=1}^{n}\big[\{\hat{\epsilon}_{i}^{B}(t)\}_{p}-(t/\tau)\{\hat{\epsilon}_{i}^{B}(\tau)\}_{p}\big]G_{i},$

where $\{v\}_{k}$ denotes the kth element of a given vector v and where we are fixing the data. The above limit distributions and the conditional multiplier approach may also be used to test other hypotheses, such as overall effect of a given covariate. Such estimation and inferential procedures are implemented in the contributed R package timereg; for examples of its use, see Martinussen and Scheike (2006).
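To make the construction concrete, here is a small hedged sketch (invented data, not code from the chapter) evaluating the sup-statistic of the test process for a step-function estimate $\hat{B}_{p}$; in practice the null distribution is obtained by the multiplier resampling just described, as implemented in timereg.

```python
import math

# Hedged sketch: sup_t |V_n(t)| with V_n(t) = n^{1/2}{B_p(t) - (t/tau) B_p(tau)},
# evaluated at the jump times of a step-function estimate of B_p.
# The numbers below are invented for illustration only.

def sup_test_statistic(times, bvals, n):
    """times: ordered jump times up to tau = times[-1]; bvals: B_p at those times."""
    tau, b_tau = times[-1], bvals[-1]
    return max(math.sqrt(n) * abs(b - (t / tau) * b_tau)
               for t, b in zip(times, bvals))

# A perfectly linear B_p (constant effect) gives statistic 0 ...
print(sup_test_statistic([1.0, 2.0, 3.0, 4.0], [0.5, 1.0, 1.5, 2.0], 25))  # 0.0
# ... while a concave B_p (an effect wearing off) does not
print(sup_test_statistic([1.0, 2.0, 3.0, 4.0], [1.0, 1.5, 1.8, 2.0], 25))  # 2.5
```

Large values of the statistic speak against time-invariance; how large is "large" is exactly what the resampled realizations of the process quantify.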
The additive hazards model is very versatile, but it is still important to check whether the model provides an adequate fit to the data at hand. The starting point will then usually be model (3.1), and if this model is found to give an adequate fit, one may proceed to investigate the time-dynamics of the covariate effects. The goodness-of-fit procedures presented below are based on the martingale residuals

$\hat{M}_{i}(t)=N_{i}(t)-\int_{0}^{t}Y_{i}(s)X_{i}^{T}\,d\hat{B}(s),\quad i=1,\dots,n.$
Under the model, the martingale residuals are thus themselves martingales. To validate the model fit, one possibility is to sum these residuals depending on the level of the covariates (Aalen, 1993; Grønnesby and Borgan, 1996). A K-cumulative residual process may then be defined as

$M_{K}(t)=\int_{0}^{t}K^{T}(s)\,d\hat{M}(s),$
where $K(t)=\{K_{1}^{T}(t),\dots,K_{n}^{T}(t)\}^{T}$ is an n × m matrix, possibly depending on time, and $\hat{M}=(\hat{M}_{1},\dots,\hat{M}_{n})^{T}$. A typical choice of K is to let it be a factor with levels defined by the quartiles of the considered covariates. Based on an iid representation of $M_{K}(t)$, one may calculate p-values using, for instance, a supremum test; see Martinussen and Scheike (2006). With a continuous covariate, one may also target a test to investigate whether the specific covariate is included in the model on the right scale; a typical alternative is to use a log-transformed version of the covariate. This construction is similar to what Lin et al. (1993) proposed under the Cox model and is based on

$M_{c}(z,t)=\sum_{i=1}^{n}I(X_{1i}\le z)\hat{M}_{i}(t),$

here assuming that the covariate under investigation is $X_{1}$. This process in z has a decomposition similar to that of $M_{K}$ developed above, and resampling may thus be used to validate the fit of the model (Martinussen and Scheike, 2006). These goodness-of-fit procedures are implemented in the R package timereg. For other goodness-of-fit tests concerning the additive hazards model, see Gandy and Jensen (2005a) and Gandy and Jensen (2005b).
The additive hazards model is applied in some more specialized areas of survival analysis. Below we describe some of these. But first we present some structural properties of the model that underline that it may indeed provide an appealing alternative to the proportional hazards model.
Consider the additive hazards model

$\alpha(t|X,Z)=\beta_{0}(t)+\beta_{X}(t)X+\beta_{Z}(t)Z,$   (3.6)
and let ${B}_{X}(t)={\displaystyle {\int}_{0}^{t}{\beta}_{X}(s)\text{\hspace{0.17em}}ds}$
and similarly with $B_{Z}(t)$. We will explore when the additive hazard structure is preserved when conditioning only on X, as first investigated in Aalen (1993). The model above is formulated with X and Z both being scalar, but the following is easily generalized to the vector situation. The hazard function, when conditioning only on X, is

$\alpha(t|X)=\beta_{0}(t)+\beta_{X}(t)X+\beta_{Z}(t)\psi_{Z|X}\{B_{Z}(t)\},$

where $\psi_{Z|X}(\cdot)$ denotes minus the derivative of the log-Laplace transform of Z given X. It is therefore readily seen that the additive structure is preserved if X and Z are independent, as only the intercept term is changed. This also holds true if (X, Z) has a normal distribution, allowing even for dependence between the two. This is in contrast to the proportional hazards model, where the proportional structure is not preserved even when X and Z are independent (Struthers and Kalbfleisch, 1986).
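The displayed identity is easy to check numerically. Below is a small verification (all parameter values invented) for the special case Z ~ Exp(1) independent of X, where the Laplace transform is $L_{Z}(u)=1/(1+u)$ and hence $\psi_{Z}(u)=1/(1+u)$; the marginal hazard obtained by numerically differentiating $-\log S(t|X)$ matches the additive formula.

```python
import math

# Numeric check (made-up constants) of the marginalization identity: with
# constant coefficient functions and Z ~ Exp(1) independent of X,
#   S(t|X) = exp(-(b0 + bX*X) t) * L_Z(B_Z(t)),  L_Z(u) = 1/(1+u),
# and the marginal hazard should equal b0 + bX*X + bZ * psi_Z(B_Z(t))
# with psi_Z(u) = 1/(1+u), minus the derivative of the log Laplace transform.

b0, bX, bZ = 0.1, 0.2, 0.3
X = 1.0

def S(t):
    return math.exp(-(b0 + bX * X) * t) / (1.0 + bZ * t)

def hazard_formula(t):
    return b0 + bX * X + bZ / (1.0 + bZ * t)

t, h = 2.0, 1e-5
numeric = -(math.log(S(t + h)) - math.log(S(t - h))) / (2 * h)
print(abs(numeric - hazard_formula(t)) < 1e-8)  # True
```

Only the last term depends on t through $B_{Z}(t)$, which is the sense in which "only the intercept term is changed."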
When dealing with data from a nonrandomized study, it may be of interest to calculate the causal hazard function, as it corresponds to the hazard function under a randomized study. To do so, we briefly introduce Pearl's do-operator (Pearl, 2000), $do(X=x)$ (or $\hat{X}=x$)
, which is an intervention setting the variable X to x. The distribution of T given that X is set to x, $P(T|do(X=x))$, therefore differs in general from P(T|X = x), the conditional distribution of T given that we observe X equal to x. Taking again model (3.6) as the starting point, it may be seen, using the so-called "G-computation formula" (Robins, 1986; Pearl, 2000), that

$\alpha\{t|do(X=x)\}=\beta_{0}(t)+\beta_{X}(t)x+\beta_{Z}(t)\psi_{Z}\{B_{Z}(t)\},$   (3.7)

where $\psi_{Z}(\cdot)$ denotes minus the derivative of the log-Laplace transform of Z. It is seen from display (3.7) that the additive hazard structure is again preserved, and therefore the causal effect of X can be estimated from model (3.6) formulated for the observed data. Such a property does not hold true for the proportional hazards model; see Martinussen and Vansteelandt (2012) for more details on these issues. The additive hazards model also forms the basis for investigation of so-called "direct effects" in Fosen et al. (2006), Martinussen (2010), Lange and Hansen (2011), and Martinussen, Vansteelandt, Gerster and Hjelmborg (2011). In a situation as depicted in Figure 3.1, the direct effect of X, formulated via the hazard function on the absolute scale, again takes an additive form; see the cited papers for details.
Figure 3.1 Situation with L and K being observed confounders and U being unobserved.
Clustered failure time data arise in many research areas, for example, in studies where time to disease onset is recorded for members of a sample of families. One approach to analyzing such data is to use a frailty model, and recently Martinussen, Scheike and Zucker (2011) proposed the use of the additive hazards model in such a setting. The Aalen additive gamma frailty hazards model specifies that individual i in the kth cluster has conditional hazard function

$\alpha_{ik}(t|Z_{k},X_{ik})=Z_{k}\,X_{ik}^{T}\beta(t),$
where $Z_{k}$ is the unobserved frailty variable, taken to be gamma distributed with mean 1 and variance θ, $X_{ik}$ is the covariate vector, and β(t) is the vector of time-dependent regression coefficients. The counting process associated with individual i in the kth cluster, with at-risk indicator $Y_{ik}(t)$, then has an intensity with respect to the marginal history that retains an additive structure, but with the design depending on B(t−) and the frailty variance θ. Based on this, an estimator with a recursive structure was developed, and its large sample properties were presented; see Martinussen, Scheike and Zucker (2011) for more details. Another approach was taken by Cai and Zeng (2011), where the frailty effect also enters on the additive scale, using the Lin and Ying version of the additive model and with $Z_{k}$ following a one-parameter distribution with zero mean. Estimators and large sample properties are provided in Cai and Zeng (2011).
As an alternative to the frailty approach, one may instead use a marginal regression model if interest centers on estimating regression effects at the population level and estimation of correlation is of less interest. For use of the fully nonparametric version of the additive hazards model in this context, see Martinussen and Scheike (2006). Alternatively, one may apply the Lin and Ying version of the additive model; see Pipper and Martinussen (2004) and Yin and Cai (2004).
As described in Section 3.1, the Aalen additive hazards model is well suited to investigate the time-dynamics of the effect of a given covariate. Hence one can test the constant-effects model against a model where the structure of the effect is left fully unspecified. An appealing model in between the constant-effects model and the full Aalen model is the change-point model, in which the effect of a covariate is constant up to an unknown point in time and changes thereafter to a new value. This model can be written as

$\alpha(t|X,Z)=X^{T}\beta(t)+Z\{\gamma_{1}+\gamma_{2}I(t>\theta)\},$
where $\alpha(t|X,Z)$ is the hazard rate, X is a p-dimensional covariate vector, Z is a scalar covariate, β(t) is a vector of unknown locally integrable time-dependent regression functions, and $\gamma_{1}$, $\gamma_{2}$ and θ are unknown parameters, the last being the change-point. Testing whether or not there is evidence of a change-point effect corresponds to testing the hypothesis $H_{0}:\gamma_{2}=0$, which is a nontrivial problem as the change-point parameter θ is only present under the alternative. Martinussen and Scheike (2007) derived a test tailored to investigate this hypothesis, and also gave estimators for all the unknown parameters defining the model. They also proposed a test that can be used to compare the change-point model to the full Aalen model with no prespecified form of the covariate effects.
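To convey the idea of estimating θ, here is a hypothetical profile least squares sketch (all data and names invented; Martinussen and Scheike (2007) give the actual estimators and their properties): for each candidate θ, fit the two constant levels to estimated increments of the cumulative effect and keep the θ with the smallest residual sum of squares.

```python
# Hedged sketch: profile search for the change-point theta. "slopes" stands in
# for noisy estimated increments of B_p over unit intervals; a real analysis
# would obtain these from the Aalen fit. All numbers below are invented.

slopes = [0.20, 0.21, 0.19, 0.20, 0.20, -0.10, -0.11, -0.09, -0.10, -0.10]
grid_times = list(range(1, 11))          # increment over (t-1, t]

def profile_changepoint(times, incs, candidates):
    best = None
    for theta in candidates:
        pre = [v for t, v in zip(times, incs) if t <= theta]
        post = [v for t, v in zip(times, incs) if t > theta]
        if not pre or not post:
            continue
        m1, m2 = sum(pre) / len(pre), sum(post) / len(post)
        sse = sum((v - m1) ** 2 for v in pre) + sum((v - m2) ** 2 for v in post)
        if best is None or sse < best[0]:
            best = (sse, theta, m1, m2 - m1)
    return best[1:]   # (theta_hat, gamma1_hat, gamma2_hat)

theta_hat, g1, g2 = profile_changepoint(grid_times, slopes, grid_times)
print(theta_hat, round(g1, 3), round(g2, 3))  # 5 0.2 -0.3
```

The nonstandard nature of the testing problem (θ absent under $H_{0}$) is exactly why such a profile fit alone does not give a valid test of $\gamma_{2}=0$.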
It has become of increasing interest to develop statistical methods to explore a potential relationship between a high-dimensional regressor, such as gene-expression data, and the timing of an event such as death. It has been demonstrated that the additive hazards model may also be useful in this respect. The starting point for these developments has so far been the additive hazards model with all effects constant in time, that is, the Lin and Ying model,

$\alpha(t|Z)=\beta_{0}(t)+Z^{T}\gamma,$
where Z is a p-dimensional covariate vector, in this case with p large. With gene expression data, p may be very large and often much larger than the sample size n; this is referred to as the p ≫ n situation. Based on the Lin and Ying model, Martinussen and Scheike (2009a) derived a partial least squares estimator via the so-called "Krylov sequence," and Ma et al. (2006) derived a principal components analysis in this context. Other popular approaches in a high-dimensional regressor setting are the Lasso and ridge regression, to mention some; see Bovelstad et al. (2009) for a comparison of methods using the Cox model as the base model. One complication with the additive hazards model compared with the proportional hazards model is, however, that there is no simple likelihood to work with for the additive model. Leng and Ma (2007) suggested using a least squares criterion, playing a role similar to that of the partial likelihood for Cox's regression model, for the Lin and Ying model:

$L(\gamma)=\gamma^{T}\Big\{n^{-1}\int_{0}^{\tau}Z^{T}(t)H(t)Z(t)\,dt\Big\}\gamma-2\gamma^{T}\Big\{n^{-1}\int_{0}^{\tau}Z^{T}(t)H(t)\,dN(t)\Big\},$
using the notation introduced in Section 3.1; the criterion is minimized by the unweighted least squares estimator given there. This criterion was also suggested independently by Martinussen and Scheike (2009b), who provided further motivation for it. The criterion opens the route to many popular estimation procedures applied in the high-dimensional regressor setting when the underlying model is the Lin and Ying model. Recently, Gorst-Rasmussen and Scheike (2012) used the same model to propose a sure independence screening method that shows good performance. For estimation in the high-dimensional case, one may use the contributed R package ahaz, which can also carry out other types of estimation and inference within the additive hazards model context.
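Because the criterion is an explicit quadratic form, penalized estimation is straightforward. The following hedged sketch runs coordinate descent on $L(\gamma)+\lambda\|\gamma\|_{1}$ with the two data matrices of the criterion replaced by small invented numbers; the R package ahaz implements the real lasso-penalized estimator.

```python
# Hedged sketch: lasso on the quadratic criterion g'Ag - 2 b'g + lam*||g||_1,
# where A and b stand in for n^{-1} int Z'HZ dt and n^{-1} int Z'H dN.
# Toy 2x2 numbers below are invented; A must be positive definite.

def soft_threshold(x, t):
    return x - t if x > t else x + t if x < -t else 0.0

def lasso_ahaz(A, b, lam, sweeps=200):
    """Coordinate descent: each coordinate minimizes the criterion exactly."""
    p = len(b)
    g = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            cj = sum(A[j][k] * g[k] for k in range(p) if k != j)
            g[j] = soft_threshold(b[j] - cj, lam / 2.0) / A[j][j]
    return g

A = [[2.0, 0.5], [0.5, 1.0]]
b = [1.0, 0.4]
print([round(v, 4) for v in lasso_ahaz(A, b, 0.0)])   # unpenalized: solves A g = b
print(lasso_ahaz(A, b, 10.0))                         # heavy penalty: [0.0, 0.0]
```

With λ = 0 the routine reproduces the unpenalized least squares solution $A^{-1}b$, and a sufficiently large λ shrinks every coefficient to zero, which is the behavior exploited for variable selection when p ≫ n.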
The Cox model and the Aalen model may be combined in various ways to improve model fit or to enhance interpretation of parameters. Scheike and Zhang (2002) considered the model

$\alpha(t|X,Z)=\{X^{T}\beta(t)\}\exp(Z^{T}\gamma),$
Figure 3.2 Gastrointestinal tumour data. Left display: Kaplan-Meier estimates of survival probabilities for the chemotherapy (solid line) and chemotherapy plus radiotherapy (dashed line) groups. Right display: effect of combined therapy; Aalen's least squares estimate of $B_{1}(t)$ with 95% pointwise confidence bands, with the dashed curve being the estimate based on a change-point model.
termed the Cox-Aalen model, where the baseline hazard function of the Cox model is replaced by an additive hazards term depending on X. Martinussen and Scheike (2002) considered a proportional excess risk model that also combines the Cox model and the additive model.
It is also of interest to decide whether the Cox model or the additive hazards model is the more appropriate in a given application. This is a nontrivial problem, as the two classes of models are non-nested. Martinussen et al. (2008) used the Mizon-Richard encompassing test for this particular problem, which corresponds to fitting the Aalen model to the martingale residuals obtained from the Cox regression analysis.
A typical instance where the Cox model provides a poor fit is when crossing survival curves are observed, as in Figure 3.2 (left display), which shows the Kaplan-Meier estimates of the survival probabilities for the chemotherapy and chemotherapy plus radiotherapy groups of gastric cancer patients in a randomized clinical trial (Stablein and Koutrouvelis, 1985); these data were also considered in Zeng and Lin (2007b). The crossing of the survival curves can be explained by the patients receiving the more aggressive treatment (radiotherapy) being at elevated risk of death initially, but benefiting from the treatment in the long run if they are able to tolerate it initially. That the Cox model provides a poor fit to these data is confirmed by the supremum test based on the score process (Lin et al., 1993) with p < 0.001. We fitted Aalen's additive hazards model to the gastrointestinal tumour data:

$\alpha(t|X)=\beta_{0}(t)+\beta_{1}(t)X,$
where X is the indicator of chemotherapy plus radiotherapy treatment. The cumulative regression coefficient corresponding to the combined therapy group is depicted in Figure 3.2 (right display), showing nicely that a time-varying effect is indeed present for these data, with an adverse effect of the combined treatment in the first 300 days or so and, apparently, a beneficial effect thereafter. For these data one may be interested in applying the change-point model

$\alpha(t|X)=\beta_{0}(t)+X\{\gamma_{1}+\gamma_{2}I(t>\theta)\},$
with θ being the change-point parameter. For the gastrointestinal tumour data we obtain the estimates $\hat{\theta}=315$ and $(\hat{\gamma}_{1},\hat{\gamma}_{2})=(0.00175,-0.00255)$
with the estimated cumulative regression function superimposed on the Aalen estimator in Figure 3.2, indicating that this model gives a good fit to these data.

Another useful model in survival analysis is the accelerated failure time (AFT) model, which can be formulated using survival functions. Let T be a survival time and Z a covariate vector that does not depend on time, and let S(t|Z) be the conditional survival function. The AFT model assumes that

$S(t|Z)=S_{0}(te^{Z^{T}\beta}),$   (3.13)
where $S_{0}$ is a baseline survival function. From (3.13) it is seen that covariates act multiplicatively on time, so that their effect is to accelerate (or decelerate) time-to-failure relative to $S_{0}$. An equivalent and popular formulation of the AFT model is the following linear regression for the log-transformed event time, log(T), given Z:

$\log(T)=-Z^{T}\beta+\epsilon,$   (3.14)
where the distribution of $e^{\epsilon}$ is given by $S_{0}$ and ε is assumed to be independent of Z. The AFT model has received considerable attention in the statistical literature over the last two decades or so. Although its semiparametric version (see below) is computationally intensive, it is now considered a genuine alternative to the Cox model. It is appealing due to the direct relationship between the failure time and the covariates, which was also noted by David Cox in Reid (1994). One may extend the model to cover the case where covariates change with time by considering

$\int_{0}^{T}e^{Z^{T}(t)\beta}\,dt=e^{\epsilon};$
see, for example, Zeng and Lin (2007a); but we will only consider the simpler case here. The AFT model can also be formulated in terms of the hazard function for T given Z:

$\alpha(t|Z)=e^{Z^{T}\beta}\lambda_{0}(te^{Z^{T}\beta}),$   (3.15)
where λ_{0}(t) is the hazard associated with the distribution of e ^{ ε }.
If we are willing to specify a specific form of $S_{0}$, the AFT model is fully parametric. For instance, the Weibull model is given by

$S_{0}(t)=\exp(-kt^{\alpha}).$
In such a case, estimation and statistical inference are straightforward using standard likelihood methods for survival data. For the Weibull model we have the hazard function $\lambda_{0}(t)=k\alpha t^{\alpha-1}$ and hence, by (3.15),

$\alpha(t|Z)=k\alpha t^{\alpha-1}e^{\alpha Z^{T}\beta},$
which is also a proportional hazards function with a Weibull baseline hazard, so in this specific example the two models coincide.
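The coincidence is easily verified numerically; the following check (made-up parameter values) confirms that the AFT form of the Weibull hazard equals the proportional hazards form with regression coefficient αβ.

```python
import math

# Numeric check (invented constants): for the Weibull baseline
# lam0(t) = k * alpha * t^{alpha-1}, the AFT hazard e^{z b} lam0(t e^{z b})
# equals the proportional hazards form lam0(t) * e^{alpha z b}.

k, alpha, beta, z = 0.5, 1.7, 0.8, 1.3

def lam0(t):
    return k * alpha * t ** (alpha - 1)

def aft_hazard(t):          # the form (3.15)
    a = math.exp(z * beta)
    return a * lam0(t * a)

def ph_hazard(t):           # Weibull proportional hazards
    return lam0(t) * math.exp(alpha * z * beta)

print(all(abs(aft_hazard(t) - ph_hazard(t)) < 1e-10 for t in (0.5, 1.0, 2.0)))  # True
```

The Weibull family is the only one with this property, which is one reason it is a popular parametric compromise between the two model classes.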
We will now look into the case where S _{0}(t) is left unspecified, or, equivalently, ε in (3.14) is not specified, which is referred to as the “semiparametric AFT model.” There exist two approaches to tackle this situation. One develops estimation based on the hazard specification (3.15), while the other one uses the additive mean model specification (3.14) as a starting point.
Let C be the censoring time for T, and put $\tilde{T}=T\wedge C$ and $\Delta=I(T\le C)$. The intensity of the counting process $N(t)=I(\tilde{T}\le t)\Delta$ is then given by Y(t)λ(t), with $Y(t)=I(t\le\tilde{T})$ the at-risk indicator and λ(t) given in (3.15). Assume that n iid counting processes are observed subject to this generic hazard model, and let $N(t)=\{N_{1}(t),\dots,N_{n}(t)\}$ denote the n-dimensional counting process of all subjects. Define also the time-transformed counting process

$N_{i}^{*}(t,\beta)=N_{i}(te^{-Z_{i}^{T}\beta})$

with associated at-risk process $Y_{i}^{*}(t,\beta)=Y_{i}(te^{-Z_{i}^{T}\beta})$, $i=1,\dots,n$. This time-transformation is made because the intensity of $N_{i}^{*}(t,\beta)$ is then

$Y_{i}^{*}(t,\beta)\lambda_{0}(t),$

which immediately suggests that $\Lambda_{0}(t)=\int_{0}^{t}\lambda_{0}(s)\,ds$ should be estimated by the Breslow-type estimator

$\hat{\Lambda}_{0}(t,\beta)=\int_{0}^{t}\frac{dN_{\cdot}^{*}(s,\beta)}{S_{0}^{*}(s,\beta)},$

where $dN_{\cdot}^{*}(t,\beta)=\sum_{i=1}^{n}dN_{i}^{*}(t,\beta)$
, and $S_{0}^{*}(t,\beta)=\sum_{i=1}^{n}Y_{i}^{*}(t,\beta)$, if β were known. The efficient score function for β is

$U(\beta)=\sum_{i=1}^{n}\int_{0}^{\infty}W(u)Z_{i}\{dN_{i}^{*}(u,\beta)-Y_{i}^{*}(u,\beta)\,d\Lambda_{0}(u)\},$

and inserting $d\hat{\Lambda}_{0}(u,\beta)$ for $d\Lambda_{0}(u)$ gives

$U_{W}(\beta)=\sum_{i=1}^{n}\int_{0}^{\infty}W(u)\{Z_{i}-\bar{Z}^{*}(u,\beta)\}\,dN_{i}^{*}(u,\beta),$   (3.18)

where

$\bar{Z}^{*}(u,\beta)=\frac{\sum_{j=1}^{n}Y_{j}^{*}(u,\beta)Z_{j}}{\sum_{j=1}^{n}Y_{j}^{*}(u,\beta)}$

and

$W(u)=1+u\frac{\lambda_{0}^{\prime}(u)}{\lambda_{0}(u)}$   (3.19)

is the efficient weight function. We cannot use (3.18) directly for estimation purposes, since the weight function W(u) involves the unknown baseline hazard function $\lambda_{0}(u)$ and its derivative $\lambda_{0}^{\prime}(u)$. These can be estimated and inserted into (3.18), but this is not recommendable since it is hard to obtain reliable estimates of especially $\lambda_{0}^{\prime}(t)$. A way around this is to take (3.18) and replace the weight function W(u) with one that can be computed, for example W(u) = 1 or $W(u)=n^{-1}S_{0}^{*}(u,\beta)$, referred to as the "log-rank" and "Gehan" weight functions, respectively. A practical complication is that the score function $U_{W}(\beta)$ is a step function of β, so $U_{W}(\beta)=0$ may not have a solution. The score function may furthermore not be componentwise monotone in β; it is, however, monotone in each component of β if the Gehan weight is chosen (Fygenson and Ritov, 1994). The estimator $\hat{\beta}$ is usually chosen as the minimizer of $\|U_{W}(\beta)\|$. The contributed R package lss calculates the Gehan rank estimator. It has been established, under regularity conditions, that $n^{1/2}(\hat{\beta}-\beta)$ is asymptotically zero-mean normal with covariance matrix $A_{W}^{-1}B_{W}A_{W}^{-1}$, where $A_{W}$ and $B_{W}$ are the limits in probability of

$n^{-1}\sum_{i=1}^{n}\int_{0}^{\infty}W(u)\Big\{1+u\frac{\lambda_{0}^{\prime}(u)}{\lambda_{0}(u)}\Big\}\{Z_{i}-\bar{Z}^{*}(u,\beta)\}^{\otimes 2}Y_{i}^{*}(u,\beta)\lambda_{0}(u)\,du$

and

$n^{-1}\sum_{i=1}^{n}\int_{0}^{\infty}W(u)^{2}\{Z_{i}-\bar{Z}^{*}(u,\beta)\}^{\otimes 2}Y_{i}^{*}(u,\beta)\lambda_{0}(u)\,du,$

respectively; see Tsiatis (1990) and Ying (1993). It is seen that $A_{W}$ and $B_{W}$ coincide when W(u) is taken as the efficient weight function (3.19). The asymptotic covariance matrix depends on $\lambda_{0}^{\prime}$
, which is difficult to estimate. One may, however, apply a resampling technique avoiding estimation of $\lambda_{0}^{\prime}$; see Lin et al. (1998) and Jin et al. (2003).

The score function can also be written on the (log-)transformed time scale:

$U_{W}(\beta)=\sum_{i=1}^{n}\int_{-\infty}^{\infty}W(t)\{Z_{i}-\bar{\tilde{Z}}(t,\beta)\}\,d\tilde{N}_{i}(t-Z_{i}^{T}\beta),$   (3.20)
where $\tilde{N}_{i}(t)=I(\log(\tilde{T}_{i})\le t)\Delta_{i}$, $\tilde{Y}_{i}(t)=I(t\le\log(\tilde{T}_{i}))$, and $\bar{\tilde{Z}}(t,\beta)$ is the corresponding at-risk average of the covariates; on this scale the efficient weight function involves $\lambda_{0\epsilon}(t)$, the hazard function for ε. Another way of motivating (3.20) is via classical nonparametric testing theory; see Tsiatis (1990) and Kalbfleisch and Prentice (2002) for many more details on that approach.
An interesting variant of (3.15), considered by Chen and Jewell (2001), is

$\alpha(t|Z)=e^{Z^{T}\beta_{2}}\lambda_{0}(te^{Z^{T}\beta_{1}}),$   (3.21)
which contains the proportional hazard model (β _{1} = 0) and the accelerated failure time model (β_{1} = β _{ 2 }), and for β _{2} = 0 what is called “the accelerated hazard model” (Chen and Wang, 2000). Chen and Jewell (2001) suggested estimating equations for estimation of β = (β _{1},β _{2}) and showed for the resulting estimator, $\widehat{\beta}$
, that $n^{1/2}(\hat{\beta}-\beta)$ is asymptotically zero-mean normal with a covariance matrix that also involves the unknown baseline hazard (and its derivative). They also suggested a resampling approach for estimating the covariance matrix without having to estimate the baseline hazard function or its derivative. With these tools at hand, one may then investigate whether it is appropriate to simplify (3.21) to either the Cox model or the AFT model.

There exist other ways of estimating the regression parameter β of the semiparametric AFT model, which build more on classical linear regression estimation. Starting with (3.14), let V = log(T), U = log(C) and $\tilde{V}=V\wedge U$, and write model (3.14) as

$V=-Z^{T}\beta+\epsilon,$
assuming that ε is independent of $Z=(Z_{1},\dots,Z_{p})^{T}$ and has zero mean. If V were not right-censored, it would of course be an easy task to estimate the regression parameters using least squares regression. A number of authors (Miller, 1976; Buckley and James, 1979; Koul et al., 1981) have extended the least-squares principle to accommodate censoring; we describe the approach of Buckley and James. The idea is to replace V with a quantity that has the same mean as V and that can be computed from the right-censored sample. With

$V^{*}=\Delta V+(1-\Delta)E(V|V>U,Z)$
and Δ = I(V ≤ U), we have $E(V^{*}|Z)=E(V|Z)$. Still, $V^{*}$ is not observable, but it can be estimated as follows. Since

$E(V|V>U,Z)=-Z^{T}\beta+\frac{\int_{U+Z^{T}\beta}^{\infty}s\,dF(s)}{1-F(U+Z^{T}\beta)}$
with F the distribution of $V+Z^{T}\beta$, one can construct the so-called "synthetic data points":

$\hat{V}_{i}^{*}(\beta)=\Delta_{i}\tilde{V}_{i}+(1-\Delta_{i})\Bigg\{-Z_{i}^{T}\beta+\frac{\int_{\tilde{V}_{i}+Z_{i}^{T}\beta}^{\infty}s\,d\hat{F}(s)}{1-\hat{F}(\tilde{V}_{i}+Z_{i}^{T}\beta)}\Bigg\},$
where $\widehat{F}$
is the Kaplan-Meier estimator based on $(\tilde{V}_{i}+Z_{i}^{T}\beta,\Delta_{i})$, $i=1,\dots,n$. One may then estimate the parameters from the normal equations, leading to the following estimating equation for the regression parameter vector β:

$U(\beta)=\sum_{i=1}^{n}(Z_{i}-\bar{Z})\{\hat{V}_{i}^{*}(\beta)+Z_{i}^{T}\beta\}=0,$   (3.22)

where $\bar{Z}=n^{-1}\sum_{i}Z_{i}$
. Equation (3.22) needs to be solved iteratively, if a solution exists; the resulting estimator is referred to as the "Buckley-James estimator," and its large sample properties were studied by Ritov (1990). Equation (3.22) corresponds to the choice S(v) = v in a more general class of estimating equations, (3.23), that may be derived from a likelihood principle; the efficient choice of S(v) is $S(v)=-f^{\prime}(v)/f(v)$, with $f(v)=F^{\prime}(v)$ the density function. Ritov (1990) also established an asymptotic equivalence between the two classes of estimators given by (3.20) and (3.23).
The estimating function U(β) in (3.22) is neither continuous nor monotone in β, which makes it difficult to compute the estimator in practice. Jin et al. (2006) proposed an iterative solution to (3.22) with a preliminary consistent estimator as starting value; this estimator is also available in the R package lss.
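One Buckley-James step can be sketched in a few lines of pure Python (invented data and helper names; the lss package provides the real iterative estimator). Censored responses are imputed via the Kaplan-Meier estimate of the residual distribution, and the normal equation is then solved for the updated slope.

```python
# Hedged sketch of one Buckley-James iteration for scalar Z under the
# convention V = -Z beta + eps of (3.14). All data below are invented.

def km_jumps(res, delta):
    """Kaplan-Meier jump masses of the residual distribution; mass left
    after the largest observation is placed there (a common convention)."""
    order = sorted(range(len(res)), key=lambda i: res[i])
    S, jumps, n = 1.0, {}, len(res)
    for r, i in enumerate(order):
        if delta[i]:
            jump = S / (n - r)
            jumps[res[i]] = jumps.get(res[i], 0.0) + jump
            S -= jump
    if S > 1e-12:
        top = res[order[-1]]
        jumps[top] = jumps.get(top, 0.0) + S
    return jumps

def bj_update(beta, V, delta, Z):
    """Impute censored responses at the current beta, then solve the normal
    equation sum_i (Z_i - Zbar)(V*_i + Z_i beta_new) = 0 for beta_new."""
    e = [v + z * beta for v, z in zip(V, Z)]   # residuals V + Z beta
    jumps = km_jumps(e, delta)
    Vstar = []
    for v, d, z, ei in zip(V, delta, Z, e):
        if d:
            Vstar.append(v)
        else:  # E(V | V > v, Z) = -z*beta + E(eps | eps > ei) under KM
            den = sum(w for s, w in jumps.items() if s > ei)
            num = sum(s * w for s, w in jumps.items() if s > ei)
            Vstar.append(-z * beta + (num / den if den > 0 else ei))
    zbar = sum(Z) / len(Z)
    num = sum((zi - zbar) * vs for zi, vs in zip(Z, Vstar))
    den = sum((zi - zbar) * zi for zi in Z)
    return -num / den

V = [1.0, 0.4, 0.9, 0.1]
Z = [0.0, 1.0, 0.2, 1.5]
# With no censoring the step reduces to ordinary least squares of V on -Z,
# whatever the starting value:
print(round(bj_update(0.0, V, [1, 1, 1, 1], Z), 4))   # 0.6065
# One step with a censored observation:
print(round(bj_update(0.5, V, [1, 0, 1, 1], Z), 4))
```

The no-censoring reduction to least squares is the key sanity check here; with censoring present, the step depends on the current β, which is precisely why the equation must be iterated and why convergence is delicate.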
In survival analysis, quantile regression (Koenker and Bassett, 1978) offers a significant extension of the accelerated failure time (AFT) model. Let T denote the failure time of interest and Z ≡ (Z _{1},...,Z _{ p })^{T} denote a p × 1 vector of covariates. Define $\tilde{Z}={(1,{Z}^{\u22ba})}^{\u22ba}$
and V = log(T). For the response V, the τth conditional quantile of V given $\tilde{Z}$ is defined as $Q_{V}(\tau|\tilde{Z})\equiv\inf\{t:\Pr(V\le t|\tilde{Z})\ge\tau\}$, where τ ∈ (0,1). We adopt the same quantile definition for other response random variables. With survival data, a common quantile regression modeling strategy is to link the conditional quantile $Q_{V}(\tau|\tilde{Z})$ to $\tilde{Z}$ through a linear model:

$Q_{V}(\tau|\tilde{Z})=\tilde{Z}^{\intercal}\beta_{0}(\tau),\quad\tau\in[\tau_{L},\tau_{U}],$   (3.24)

where $\beta_{0}(\tau)$ is a (p+1) × 1 vector of unknown regression coefficients possibly depending on τ, and $0\le\tau_{L}\le\tau_{U}\le1$. Like the AFT model, the quantile regression model (3.24) offers an easy coefficient interpretation as a covariate effect on event time. Specifically, a non-intercept coefficient in $\beta_{0}(\tau)$ represents the change in the τth quantile of log(T) given a one-unit change in the corresponding covariate.
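A tiny numeric illustration of the quantile definition just given (invented numbers): the empirical τth quantile $\inf\{t:\hat{P}(V\le t)\ge\tau\}$ also minimizes the check-function criterion that underlies quantile regression (Koenker and Bassett, 1978).

```python
# Small illustration (made-up data): the tau-th quantile inf{t: Pr(V<=t)>=tau}
# of an empirical distribution minimizes sum_i rho_tau(V_i - c), where
# rho_tau(u) = u * (tau - I(u < 0)) is the check function.

def check_loss(u, tau):
    return u * (tau - (1 if u < 0 else 0))

def tau_quantile(sample, tau):
    srt = sorted(sample)
    k = 0
    while (k + 1) / len(srt) < tau:
        k += 1
    return srt[k]

sample = [2.0, 0.5, 1.0, 3.5, 2.5]
q = tau_quantile(sample, 0.5)
best = min(sample, key=lambda c: sum(check_loss(v - c, 0.5) for v in sample))
print(q, best)  # 2.0 2.0
```

Replacing the candidate constant c with a linear predictor $\tilde{Z}^{\intercal}\beta(\tau)$ in the same criterion gives (uncensored) quantile regression; the censored-data methods discussed below address what happens when V is only partially observed.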
It is important to note that the AFT model (3.25) is a special case of model (3.24) with β_0(τ) = (Q_ε(τ), b^⊤)^⊤. Here ⊥ stands for independence, and Q_ε(τ) represents the τth quantile of ε. From this view, we see that the AFT model (3.25) requires the effects of covariates on $Q_V(\tau \mid \tilde{Z})$ to be constant for all τ ∈ [τ_L, τ_U] (i.e., location-shift effects). In contrast, the quantile regression model (3.24) flexibly formulates coefficients as functions of τ, thereby permitting covariate effects to vary across quantile levels. As a result, it can accommodate a more general relationship between T and Z than the AFT model (3.25). Note that when τ_L = τ_U, model (3.24) asserts only "local" linearity between the quantile of log(T) and $\tilde{Z}$ at a single quantile level, and therefore imposes weaker restrictions than a "global" quantile regression model corresponding to a case with τ_L < τ_U. A practical advantage of conducting "global" quantile regression is the capability to investigate the dynamic pattern of covariate effects.

In this section, we shall primarily focus on quantile regression with randomly right-censored data. Let C denote time to censoring. Define $\tilde{T} = T \wedge C$
and $\Delta = I(T \le C)$. The observed data under right censorship consist of n i.i.d. replicates of $(\tilde{T}, \Delta, \tilde{Z})$, denoted by $(\tilde{T}_i, \Delta_i, \tilde{Z}_i)$. In addition, we define $\tilde{V} = \log(\tilde{T})$, $\tilde{V}_i = \log(\tilde{T}_i)$, $U = \log(C)$, and $U_i = \log(C_i)$. In Sections 3.3.2, 3.3.3, and 3.3.4, we consider three different random censoring scenarios, with general inference procedures discussed in Section 3.3.5. Extensions to more complex survival settings are briefly described in Section 3.3.6.

Early work on quantile regression with censored data by Powell (1984, 1986) targeted type I censoring, where the censoring time C is fixed and prespecified. Motivated by the fact that $Q_{\tilde{V}}(\tau \mid \tilde{Z}) = \{\tilde{Z}^\top \beta_0(\tau)\} \wedge U$, an estimator of β_0(τ) is defined as the minimizer of r(b, τ) with respect to b, where ρ_τ(x) = x{τ − I(x < 0)}. This estimation method can also be applied to a more general case where C is independent of T given $\tilde{Z}$, and is always known but not necessarily fixed. In the absence of right censoring (i.e., C_i = ∞), r(b, τ) reduces to the check function of Koenker and Bassett (1978), the objective function defining sample regression quantiles with complete (uncensored) data. Note that r(b, τ) is not convex in b and thus may have multiple local minima. Further efforts have been made to improve the numerical performance of this approach, for example, by Fitzenberger (1997), Buchinsky and Hahn (1998), and Chernozhukov and Hong (2001). An implementation of Powell's method is available in the crq function in the contributed R package quantreg (Koenker, 2012).

Under the assumption that T and C are independent and C is independent of $\tilde{Z}$
(i.e., covariate-independent random censoring), a natural estimating equation for β_0(τ), derived from the work of Ying et al. (1995), is given by (3.26), where Ĝ(·) is the Kaplan-Meier estimate of G(·), the survival function of C. The estimating function, the left-hand side (LHS) of (3.26), is not continuous in β(τ), and thus an exact zero-crossing may not exist. The solution to Equation (3.26) may alternatively be defined as a minimizer of the L_2 norm of the estimating function. Such an objective function is discontinuous and may have multiple minima. To solve Equation (3.26), as suggested by Ying et al. (1995), a grid search may be used for cases with low-dimensional $\tilde{Z}$, and the simulated annealing algorithm (Lin and Geyer, 1992) may be used in high-dimensional cases.

For the same right censoring scenario, an alternative estimating equation for β_0(τ) is suggested by the work of Peng and Fine (2009). Specifically, the inverse probability of censoring weighting (IPCW) technique (Robins and Rotnitzky, 1992) can be used to estimate β_0(τ) based on the fact that
This leads to the following estimating equation for β_{0}(τ):
Note that the estimating function in (3.27) is not continuous but is monotone (Fygenson and Ritov, 1994). By the monotonicity of (3.27), the solution to Equation (3.27) can be reformulated as the minimizer of an L_1-type convex function of b, where M is an extremely large positive number selected to bound $b^\top \sum_{l=1}^{n} \left\{ -\frac{\tilde{Z}_l I(\Delta_l = 1)}{\widehat{G}(\tilde{V}_l)} + 2 \tilde{Z}_l \tau \right\}$ from above for all b in the compact parameter space for β_0(τ). This minimization problem can be readily solved by the l1fit function in S-PLUS or the rq() function in the contributed R package quantreg (Koenker, 2012).

Standard random right censoring refers to a censoring mechanism where C is only assumed to be independent of T given $\tilde{Z}$
. It is less restrictive than the censorship considered in Sections 3.3.2 and 3.3.3.

Two major types of approaches have been developed for quantile regression with survival data subject to standard random right censoring: one employs the principle of self-consistency (Efron, 1967), and the other utilizes the martingale structure associated with randomly right-censored data. Hereafter, we shall refer to these as the self-consistent approach and the martingale-based approach, respectively. Here we focus on approaches oriented to the global linear quantile regression model (3.28), which is a special case of model (3.24) with τ_L = 0. Of note, model (3.28) entails that $\tilde{Z}^\top \beta_0(0)$ is the lower boundary of the support of V given $\tilde{Z}$, i.e., $\Pr\{V < \tilde{Z}^\top \beta_0(0)\} = 0$; this is useful boundary information to be employed in the estimation of β_0(τ) with τ > 0. Some efforts, for example by Wang and Wang (2009), have been made to address a local linear quantile regression model (i.e., τ_L = τ_U) by incorporating nonparametric estimation of the distribution function of C given $\tilde{Z}$.

Portnoy (2003) made the first attempt to tackle the estimation of model (3.28) under the standard random right censoring assumption by employing the principle of self-consistency (Efron, 1967). The critical idea behind his algorithm is to split a censored event time, Ṽ_i with Δ_i = 0, between U_i and ∞ with appropriately designed weights, in the same spirit as Efron's self-consistent estimator of the survival function (Efron, 1967). The initial iterative self-consistent algorithm (Portnoy, 2003) was later simplified into a grid-based sequential estimation procedure (Neocleous et al., 2006), which is implemented by the crq function in the contributed R package quantreg (Koenker, 2012). The corresponding asymptotic studies were established by Portnoy and Lin (2010).
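The one-sample version of this self-consistency principle is easy to demonstrate numerically: iterating Efron's self-consistency equation, which redistributes the mass of each censored observation to the right in proportion to the current distribution estimate, reproduces the Kaplan-Meier estimator. A hedged sketch (all function names are illustrative, not from any package):

```python
import numpy as np

def kaplan_meier(times, delta):
    # Product-limit estimate of S(t) = 1 - F(t) at the unique event times.
    et = np.unique(times[delta == 1])
    surv, cur = np.empty(len(et)), 1.0
    for j, u in enumerate(et):
        d = np.sum((times == u) & (delta == 1))   # events at u
        r = np.sum(times >= u)                    # number at risk at u
        cur *= 1.0 - d / r
        surv[j] = cur
    return et, surv

def self_consistent_F(times, delta, n_iter=2000):
    # Fixed-point (EM-type) iteration on the one-sample self-consistency
    # equation: each censored observation's mass is redistributed to the
    # right in proportion to the current estimate of F.
    n = len(times)
    et = np.unique(times[delta == 1])
    F = np.array([np.mean(times <= u) for u in et])   # crude starting value
    cens = times[delta == 0]
    for _ in range(n_iter):
        idx = np.searchsorted(et, cens, side="right") - 1
        F_cens = np.where(idx >= 0, F[np.maximum(idx, 0)], 0.0)  # F at censored times
        Fnew = np.empty_like(F)
        for j, u in enumerate(et):
            contrib = np.sum((times <= u) & (delta == 1))
            ok = (cens <= u) & (F_cens < 1.0)
            contrib += np.sum((F[j] - F_cens[ok]) / (1.0 - F_cens[ok]))
            Fnew[j] = contrib / n
        F = Fnew
    return et, F

rng = np.random.default_rng(1)
t, c = rng.exponential(1.0, 60), rng.exponential(2.0, 60)
times, delta = np.minimum(t, c), (t <= c).astype(int)
et, F = self_consistent_F(times, delta)
_, surv = kaplan_meier(times, delta)
# 1 - F should agree closely with the Kaplan-Meier survival estimate
```

Portnoy's algorithm carries this same mass-splitting idea over to the regression setting, quantile level by quantile level.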
The self-consistent approach can be formulated through stochastic integral equations (Peng, 2012). Define N_i(t) = I(Ṽ_i ≤ t, Δ_i = 1), R_i(t) = I(Ṽ_i ≤ t, Δ_i = 0), and F_V(t) ≡ Pr(V ≤ t). First, consider Efron's (1967) self-consistent estimating equation for F_V(t) in the one-sample case:
Expressing the right-hand side (RHS) of (3.29) as a stochastic integral and then applying stochastic integration by parts, one can rewrite Equation (3.29) as
With t replaced by $\tilde{Z}_i^\top \beta(\tau)$, Equation (3.30) evolves into an estimating equation (3.31) for β_0(τ), where $S_n^{(SC)}(\beta, \tau)$ equals

Define a grid of τ-values, G, as 0 < τ_1 < τ_2 < ⋯ < τ_M = τ_U, and let ‖G‖ denote the size of G, that is, $\max_{0 \le k \le M-1}(\tau_{k+1} - \tau_k)$ with τ_0 = 0. Unless otherwise mentioned, G will be adopted throughout Section 3.3.4. A self-consistent estimator, $\widehat{\beta}_{SC}(\cdot)$, is defined as a cadlag step function that jumps only at the grid points of G and approximates the solution to Equation (3.31). The algorithm for obtaining $\widehat{\beta}_{SC}(\cdot)$ is outlined as follows.

Large sample studies of $\widehat{\beta}_{SC}(\cdot)$ are facilitated by the stochastic integral equation representation (3.31). Specifically, under certain regularity conditions and given $\lim_{n\to\infty} \|G\| = 0$, $\sup_{\tau \in [\nu, \tau_U]} \|\widehat{\beta}_{SC}(\tau) - \beta_0(\tau)\| \to_p 0$, where 0 < ν < τ_U. If $n^{1/2} \lim_{n\to\infty} \|G\| = 0$ is further satisfied, then $n^{1/2}\{\widehat{\beta}_{SC}(\tau) - \beta_0(\tau)\}$ converges weakly to a Gaussian process for τ ∈ [ν, τ_U]. Peng (2012) also investigated several variants of $\widehat{\beta}_{SC}(\cdot)$, showing their asymptotic equivalence to $\widehat{\beta}_{SC}(\cdot)$ as well as their connection with the self-consistent estimator proposed by Portnoy (Portnoy, 2003; Neocleous et al., 2006).

Model (3.28) can also be estimated based on the martingale structure of randomly right-censored data (Peng and Huang, 2008). Define $\Lambda_V(t \mid \tilde{Z}) = -\log\{1 - \Pr(V \le t \mid \tilde{Z})\}$ and $N(t) = I(\tilde{V} \le t, \Delta = 1)$
, and $M(t) = N(t) - \Lambda_V(t \wedge \tilde{V} \mid \tilde{Z})$. Let N_i(t) and M_i(t) be the sample analogs of N(t) and M(t), respectively, i = 1, ..., n. Since M_i(t) is the martingale process associated with the counting process N_i(t), $E\{M_i(t) \mid \tilde{Z}_i\} = 0$ for all t > 0. This implies equality (3.33). By the monotonicity of $\tilde{Z}_i^\top \beta_0(\tau)$ in τ under model (3.28), we also have equality (3.34), where H(x) = −log(1 − x). Coupling the equalities (3.33) and (3.34) suggests the estimating equation,
where
An estimator of ß _{0}(τ), denoted by ${\widehat{\beta}}_{PH}\left(\tau \right)$
, can be obtained through approximating the stochastic solution to Equation (3.35) by a cadlag step function. The sequential algorithm for obtaining ${\widehat{\beta}}_{PH}\left(\tau \right)$ is outlined as follows:The crq function in the contributed R package quantreg (Koenker, 2012) provides an implementation of ${\widehat{\beta}}_{PH}\left(\tau \right)$
based on an algorithm slightly different from the one presented above. More recently, Huang (2010) derived a gridfree estimation procedure for model (3.28) by using the concept of quantile calculus.Peng and Huang (2008) established the uniform consistency and weak convergence of ${\widehat{\beta}}_{PH}(\cdot )$
. Moreover, ${\widehat{\beta}}_{PH}(\cdot )$ was shown to be asymptotically equivalent to the selfconsistent estimator ${\widehat{\beta}}_{SC}(\cdot )$ (Peng, 2012). The numerical results reported in Koenker (2008) and Peng (2012) confirm this theoretical result and show comparable computational performance between these two approaches.The estimators of ß _{0}(τ) discussed in Sections 3.3.2–3.3.4 generally have asymptotic variances that involve unknown density functions. Under random right censoring with known censoring time or covariateindependent censoring, we can adapt Huang (2002)'s technique to avoid density estimation. Specifically, let ß(τ) be general notation for an estimator of ß _{0}(τ), and S _{ n }(ß(τ),τ) denote the estimating function associated with $\widehat{\beta}\left(\tau \right)$
, for example, the LHS of (3.26) and (3.27). Asymptotic theory may show that Sn{ßo(τ),τ} converges to a multivariate normal distribution with variance matrix Σ(τ). Suppose one can obtain a consistent estimator of Σ(τ ), denoted by $\widehat{\sum}\left(\tau \right)$ . The following are the main steps to estimate the asymptotic variance of $\widehat{\beta}\left(\tau \right)$ :Bootstrapping procedures are also frequently used for variance estimation under quantile regression. Resampling methods that follow the idea of Parzen and Ying (1994) were shown to yield valid variance estimates and other inference (Peng and Huang, 2008). Simple bootstrapping procedures based on resampling with replacement also seem to have satisfactory performance (Portnoy, 2003; Peng, 2012).
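As an illustration of the resampling-with-replacement bootstrap, the sketch below applies it to a deliberately simple one-sample (intercept-only) IPCW quantile estimator in the spirit of Section 3.3.3. The helper functions are ours, not from any package, and the true median of the Exp(1) event time is log 2:

```python
import numpy as np

def km_surv_at(times, delta, x):
    # Kaplan-Meier survival estimate evaluated at the points x; calling it
    # with delta flipped estimates G, the survival function of censoring.
    et = np.unique(times[delta == 1])
    surv, cur = np.empty(len(et)), 1.0
    for j, u in enumerate(et):
        cur *= 1.0 - np.sum((times == u) & (delta == 1)) / np.sum(times >= u)
        surv[j] = cur
    idx = np.searchsorted(et, x, side="right") - 1
    out = np.ones(len(x))
    if len(et):
        out[idx >= 0] = surv[idx[idx >= 0]]
    return out

def ipcw_quantile(times, delta, tau):
    # Intercept-only IPCW estimate: smallest observed time t such that
    # n^{-1} sum_i Delta_i I(V_i <= t) / G_hat(V_i) >= tau.
    Ghat = km_surv_at(times, 1 - delta, times)
    w = delta / np.clip(Ghat, 1e-10, None)
    order = np.argsort(times)
    cum = np.cumsum(w[order]) / len(times)
    hit = np.nonzero(cum >= tau)[0]
    return times[order][hit[0]] if len(hit) else times[order][-1]

rng = np.random.default_rng(2)
n = 400
t, c = rng.exponential(1.0, n), rng.exponential(3.0, n)
times, delta = np.minimum(t, c), (t <= c).astype(int)

est = ipcw_quantile(times, delta, 0.5)   # targets the median of T, log(2) here

# simple bootstrap: resample subjects with replacement, refit each time
boot = []
for _ in range(200):
    i = rng.integers(0, n, n)
    boot.append(ipcw_quantile(times[i], delta[i], 0.5))
se = np.std(boot, ddof=1)
```

The same resampling scheme applies unchanged to the regression estimators; only the refitting step becomes more expensive.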
One practical appeal of quantile regression with survival data is its capability to accommodate and explore varying covariate effects. Second-stage inference can be employed to serve this need. Given $\widehat{\beta}(\tau)$ for a range of τ, it is often of interest to investigate: (1) how to summarize the information provided by these estimators to help understand the underlying effect mechanism; and (2) how to determine whether some covariates have constant effects, so that a simpler model may be considered.

A general formulation of problem (1) corresponds to the estimation of some functional of β_0(·), denoted by Ψ(β_0). A natural estimator of Ψ(β_0) is given by $\Psi(\widehat{\beta})$. Such an estimator may be justified by the functional delta method, provided that Ψ is compactly differentiable at β_0 (Andersen et al., 1998).

Addressing question (2) can be formulated as testing the null hypothesis $\tilde{H}_{0,j}: \beta_0^{(j)}(\tau) = \rho_0, \tau \in [\tau_L, \tau_U]$, where the superscript (j) indicates the jth component of a vector and ρ_0 is an unspecified constant, j = 2, ..., p + 1. An example test procedure is presented in Peng and Huang (2008). Of note, accepting $\tilde{H}_{0,j}$ for all j ∈ {2, ..., p + 1} may indicate the adequacy of an AFT model; this naturally renders a procedure for testing the goodness-of-fit of an AFT model.

Model checking is often of practical importance. When the interest lies only in checking the linearity between covariates and the conditional quantile at a single quantile level, one can adapt approaches developed for uncensored data, for example, the work of Zheng (2000), Horowitz and Spokoiny (2002), and He and Zhu (2003). When the focus is to test a global linear relationship between conditional quantiles and covariates, a natural approach is to use a stochastic process that has mean zero under the assumed model. For example, a diagnostic process for model (3.28) (Peng and Huang, 2008) may take the form,
where α(·) is a known bounded function, and
It can be shown that K(τ) converges weakly to a zero-mean Gaussian process, whose distribution can be approximated by that of
Here β^*(τ) denotes the resampling estimator obtained by perturbing the estimating equation (3.35) with $\{\zeta_i\}_{i=1}^{n}$, which are independent variates from a nonnegative known distribution with mean 1 and variance 1. An unusual pattern of K(·) relative to K*(·) would suggest a lack of fit of model (3.28). A formal lack-of-fit test is given by the supremum statistic $\sup_{\tau \in [l, u]} \|K(\tau)\|$, where 0 < l < u < τ_U. The p-value may be approximated by the empirical proportion of $\sup_{\tau \in [l, u]} \|K^*(\tau)\|$ exceeding $\sup_{\tau \in [l, u]} \|K(\tau)\|$.

In practice, survival data may involve more complex censoring mechanisms than random right censoring. Truncation can also be present, for example, in many observational studies. There have been recent methodological developments for quantile regression in more complex survival scenarios. For example, Ji et al. (2012) proposed a modification of Peng and Huang (2008)'s martingale-based approach for survival data subject to known random left censoring and/or random left truncation in addition to random right censoring. Quantile regression with competing risks or semi-competing risks data was addressed by Peng and Fine (2009) and Li and Peng (2011).
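Returning to the supremum-type lack-of-fit test above, its resampling p-value computation is mechanically simple once the paths K*(·) are available. The following schematic sketch uses simulated mean-zero Gaussian paths with a toy covariance as stand-ins for K*(·); in a real analysis each path would come from re-solving the perturbed estimating equations:

```python
import numpy as np

rng = np.random.default_rng(3)
taus = np.linspace(0.1, 0.6, 26)          # the grid over [l, u]

# Stand-ins for the resampled processes K*(tau): mean-zero Gaussian paths
# with a Brownian-motion-like covariance (purely illustrative).
cov = np.minimum.outer(taus, taus)
L = np.linalg.cholesky(cov + 1e-10 * np.eye(len(taus)))
K_star = rng.standard_normal((500, len(taus))) @ L.T

def sup_stat(path):
    # supremum statistic sup_{tau in [l, u]} |K(tau)|
    return np.max(np.abs(path))

K_obs = K_star[0] + 2.0                   # an "observed" path with systematic drift
p_value = np.mean([sup_stat(k) >= sup_stat(K_obs) for k in K_star[1:]])
# a small p_value suggests lack of fit of the assumed model
```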
There are continued research efforts addressing quantile regression in other important survival settings, such as interval-censored data and dependently censored data. As a promising regression tool for survival analysis, quantile regression deserves wider use in real applications, where it can provide useful complementary secondary analyses.
To illustrate quantile regression for survival analysis, we use a dataset from a dialysis study that investigated predictors of mortality risk in a cohort of 191 incident dialysis patients (Kutner et al., 2002). Analysis covariates included patient's age (AGE), the indicator of fish consumption during the first year of dialysis (FISHH), the indicator of baseline HD dialysis modality (BHDPD), the indicator of moderate to severe restless legs symptoms (BLEGS), the indicator of education at or above the college level (HIEDU), and the indicator of being black (BLACK). We first fit the data with the AFT model (3.25), where T stands for time to death. In Table 3.1, we present the estimation results based on the log-rank estimator, Gehan's estimator, and the least-squares estimator. All covariates except BLEGS are consistently shown by all three estimators to have significant effects on survival. For the coefficient of BLEGS, Gehan's estimator and the least-squares estimator yield significant p-values while the log-rank estimator does not.
We next conduct quantile regression based on model (3.28) for the same dataset. Figure 3.3 displays Peng and Huang (2008)'s estimator of β_0(τ) along with 95% pointwise confidence intervals. In Figure 3.3, we observe that the coefficient for BLEGS gradually diminishes with τ, whereas the estimates for the other coefficients appear fairly constant. We apply second-stage inference to formally investigate the constancy of each coefficient. The results confirm our observation from Figure 3.3, suggesting a varying effect of BLEGS and constant effects of the other covariates. An interesting scientific implication is that BLEGS may affect the survival experience of dialysis patients with short survival times but may have little impact on that of long-term survivors. In addition, the evidence for nonconstancy of the BLEGS coefficient may indicate some degree of lack of fit of the AFT model for the dialysis data.
Table 3.1: Estimates (Coef), standard errors (SE), and p-values from AFT model fits to the dialysis data.

                Gehan's estimator        Log-rank estimator       Least-squares estimator
         Coef     SE    p-value      Coef     SE    p-value      Coef     SE    p-value
AGE     −.031   .005     <.001     −.035   .004     <.001     −.033   .006     <.001
FISHH    .402   .139      .004      .485   .128     <.001      .507   .163      .002
BHDPD   −.505   .156      .001     −.473   .136     <.001     −.509   .164      .002
BLEGS   −.340   .166      .040     −.173   .145      .232     −.412   .176      .019
HIEDU   −.352   .133      .008     −.364   .161      .024     −.335   .139      .016
BLACK    .640   .144     <.001      .591   .138     <.001      .643   .153     <.001
We further estimate the average quantile effects, defined as $\int_l^u \beta_0^{(i)}(s)\,ds$, i = 2, ..., 7. The results are given in Table 3.2. We observe that the estimated average effects are similar to the AFT coefficients obtained by Gehan's estimator and the least-squares estimator, but show relatively larger discrepancies with the log-rank estimates. This may indirectly reflect the presence of a nonconstant covariate effect. If quantile regression model (3.28) held without any varying covariate effects, we would expect more consistent results between Gehan's estimator, which emphasizes early survival, and the log-rank estimator, which weights early and late survival information equally.
Table 3.2: Estimated average quantile effects (AveEff) for the dialysis data.

         AveEff     SE    p-value
AGE      −.030    .003     <.001
FISHH     .327    .116      .005
BHDPD    −.489    .162      .003
BLEGS    −.369    .161      .022
HIEDU    −.350    .137      .011
BLACK     .654    .144     <.001
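When $\widehat{\beta}(\cdot)$ is a step or piecewise-smooth function on a grid, an integrated effect of this kind reduces to a simple weighted sum over the grid. A small numeric sketch with an artificial coefficient path (the values are illustrative, not from the dialysis fit; we also normalize by the window length so the result is on the scale of a single coefficient):

```python
import numpy as np

l, u = 0.1, 0.6
taus = np.linspace(l, u, 51)
beta_hat = -0.3 - 0.2 * taus   # artificial coefficient path, not from the data

# trapezoid approximation of the integral of beta_hat over [l, u],
# normalized by (u - l) to put it on the scale of a single coefficient
ave_eff = np.sum((beta_hat[:-1] + beta_hat[1:]) / 2.0 * np.diff(taus)) / (u - l)
# analytic value here: -0.3 - 0.2 * (l + u) / 2 = -0.37
```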
Figure 3.3 Peng and Huang's estimator (solid lines) and 95% pointwise confidence intervals (dotted lines) of regression quantiles in the dialysis example.