Learning Outcomes
By the end of this section, students will be able to:
- Understand the theory of proportional hazards models
- Calculate and interpret the results of a Cox proportional hazards regression model
- Test the assumptions underlying a Cox regression model
Video C2.3a – Cox Model (8 minutes)Â
C2.3a PRACTICAL: Stata
Unadjusted Cox regression
The ‘stcox’ command is used to run a Cox proportional hazards regression model. You must use the ‘stset’ command before using stcox, and due to this you do not need to specify an outcome variable or a time variable again. You only need to specify the explanatory variables that you are examining in your model.
stcox [varlist] [if] [in] [, options]
For example, for frailty, the command would be:
stcox i.frailty
The output looks like this:
Cox regression with Breslow method for ties
No. of subjects = Â Â Â 4,327 Â Â Â Â Â Â Â Â Â Â Â Â Â Number of obs = Â 4,327
No. of failures = Â Â Â 1,526
Time at risk   = 29,476.9825
                                       LR chi2(4)   = 577.22
Log likelihood = -12128.151Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Prob > chi2 Â = 0.0000
 ——————————————————————————
     _t | Haz. ratio  Std. err.    z   P>|z|   [95% conf. interval]
————-+—————————————————————-
   frailty |
     2  |   1.45484  .1522779   3.58  0.000   1.185005   1.786119
     3  |  1.944776   .195232   6.63  0.000   1.597421   2.367664
     4  |  2.884121  .2643057   11.56  0.000   2.409949   3.451588
     5  |  5.740893  .5005441   20.04  0.000   4.839091   6.810752
——————————————————————————
Looking at the output, you can see that the hazard of dying is nearly 6 times greater for participants in frailty level 5 compared with frailty level 1 (HR 5.7, 95% CI:4.8-6.8). This is statistically significant.
- Question C2.3a.i: What is association of vitamin D (using the variable coded above and below the median) with the risk (hazard) of dying?
C2.3a.i. Answer
stcox i.vitd_med
   Failure _d: death
 Analysis time _t: fu_years
    ID variable: whl1_id
Iteration 0: Â log likelihood = -12416.762
Iteration 1: Â log likelihood = -12356.639
Iteration 2: Â log likelihood = -12356.629
Refining estimates:
Iteration 0: Â log likelihood = -12356.629
Cox regression with Breslow method for ties
No. of subjects = Â Â Â 4,327 Â Â Â Â Â Â Â Â Â Â Â Â Â Number of obs = Â 4,327
No. of failures = Â Â Â 1,526
Time at risk   = 29,476.9825
LR chi2(1) Â Â = 120.27
Log likelihood = -12356.629 Â Â Â Â Â Â Â Â Â Â Â Â Â Â Prob > chi2 Â = 0.0000
——————————————————————————
_t | Haz. ratio  Std. err.    z   P>|z|   [95% conf. interval]
————-+—————————————————————-
2.vitd_med | Â Â .566297 Â .0298226 Â -10.80 Â 0.000 Â Â .5107612 Â Â .6278712
——————————————————————————
Participants that have vitamin D concentrations above the median have a 43% lower hazard (or risk) of dying, and this is statistically significant (HR 0.57, 95% CI: 0.51-0.63).
Multivariable Cox Regression
We can adjust for confounders in a Cox regression as we do in any other regression model. For example, if we want to examine the independent association of vitamin D on the hazard of mortality, adjusted for age group, we type:
stcox vitd_med i.age_grp
The output should look like this:
    Failure _d: death
 Analysis time _t: fu_years
    ID variable: whl1_id
Iteration 0: Â log likelihood = -12416.762
Iteration 1: Â log likelihood = -12186.532
Iteration 2: Â log likelihood = -12137.041
Iteration 3: Â log likelihood = -12136.832
Iteration 4: Â log likelihood = -12136.832
Refining estimates:
Iteration 0: Â log likelihood = -12136.832
Cox regression with Breslow method for ties
No. of subjects = Â Â Â 4,327 Â Â Â Â Â Â Â Â Â Â Â Â Â Number of obs = Â 4,327
No. of failures = Â Â Â 1,526
Time at risk   = 29,476.9825
                            LR chi2(4)   = 559.86
Log likelihood = -12136.832 Â Â Â Â Â Â Â Â Â Â Â Â Â Â Prob > chi2 Â = 0.0000
——————————————————————————
     _t | Haz. ratio  Std. err.    z   P>|z|   [95% conf. interval]
————-+—————————————————————-
  vitd_med |  .6728851   .035984   -7.41  0.000    .605928   .7472412
       |
   age_grp |
   71-75  |  1.860847  .1792291   6.45  0.000   1.540729   2.247475
   76-80  |  3.128741  .3046026   11.72  0.000   2.585233   3.786513
   81-95  |  5.800825  .5781615   17.64  0.000   4.771463   7.052256
——————————————————————————
We can see that once vitamin D has been adjusted for age group, the hazard ratio weakens from 0.57 (unadjusted HR, from section C2.3a.i.) to 0.67. There is still a statistically significant association (p<0.001), but it is weaker (i.e. closer to the null value of HR=1) since we controlled for the important confounder of age.
- Question C2.3a.ii: What is the impact of frailty on the hazard of dying, once adjusted for age_grp?
C2.3a.ii. Answer
stcox i.frailty i.age_grp
    Failure _d: death
 Analysis time _t: fu_years
    ID variable: whl1_id
Iteration 0: Â log likelihood = -12416.762
Iteration 1: Â log likelihood = -12076.963
Iteration 2: Â log likelihood = Â -11986.28
Iteration 3: Â log likelihood = -11985.881
Iteration 4: Â log likelihood = -11985.881
Refining estimates:
Iteration 0: Â log likelihood = -11985.881
Cox regression with Breslow method for ties
No. of subjects = Â Â Â 4,327 Â Â Â Â Â Â Â Â Â Â Â Â Â Number of obs = Â 4,327
No. of failures = Â Â Â 1,526
Time at risk   = 29,476.9825
                            LR chi2(7)   = 861.76
Log likelihood = -11985.881 Â Â Â Â Â Â Â Â Â Â Â Â Â Â Prob > chi2 Â = 0.0000
——————————————————————————
     _t | Haz. ratio  Std. err.    z   P>|z|   [95% conf. interval]
————-+—————————————————————-
   frailty |
     2  |  1.321821  .1386221   2.66  0.008    1.07623   1.623453
     3  |  1.612825  .1631257   4.73  0.000    1.3228   1.966437
     4  |  2.226646  .2074741   8.59  0.000   1.854975   2.672787
     5  |  4.125627  .3706378   15.78  0.000   3.459551   4.919944
       |
   age_grp |
   71-75  |  1.758254  .1695919   5.85  0.000   1.455389   2.124144
   76-80  |  2.645494  .2594916   9.92  0.000    2.1828   3.206266
   81-95  |   4.31949  .4368414   14.47  0.000   3.542813   5.266435
Participants in frailty level 5 have a hazard of dying that is over 4 times greater than those in frailty level 1, once adjusted for age group (HR 4.1, 95% CI: 3.5-4.9, p<0.001). The association of frailty level 5 was made weaker by the adjustment for age group (if we compare the association here to that seen above in the unadjusted model).
Note: In these adjusted Cox regression model examples, adjusting for confounders weakened the associations. However, remember that confounding can make associations appear spuriously weaker or stronger. Therefore, sometimes when you adjust for additional variables, your exposure’s association may get stronger.
C2.3a PRACTICAL: R
Unadjusted Cox regression
The coxph() command is used to run a Cox proportional hazards regression model in R. Â The simplified syntax of the coxph() command is the following:
coxph(formula, data)
The formula used in the coxph() command needs a survival object as dependent variable which is created using the Surv() command. The syntax of the Surv() command is the following:
Surv(time, event)
where time is the follow-up time per individual and event is the disease/outcome status per individual.
For example, we can examine the effect of frailty on the risk of death. Frailty is measured using a quantitative scale and it would be better to treat it as a categorical variable:
df$frailty <- as.factor(df$frailty)
model <- coxph(Surv(fu_years, death) ~ frailty, data = df)
summary(model)
The output looks like this:
> df$frailty <- as.factor(df$frailty)
> model <- coxph(formula = Surv(fu_years, death) ~ frailty, data = df)
> summary(model)
Call:
coxph(formula = Surv(fu_years, death) ~ frailty, data = df)
 n= 4327, number of events= 1526Â
      coef exp(coef) se(coef)    z Pr(>|z|)  Â
frailty2 0.37490 Â 1.45484 Â 0.10467 Â 3.582 0.000341 ***
frailty3 0.66515 Â 1.94478 Â 0.10039 Â 6.626 3.45e-11 ***
frailty4 1.05922 Â 2.88412 Â 0.09164 11.558 Â < 2e-16 ***
frailty5 1.74762 Â 5.74090 Â 0.08719 20.044 Â < 2e-16 ***
—
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
     exp(coef) exp(-coef) lower .95 upper .95
frailty2 Â Â 1.455 Â Â 0.6874 Â Â 1.185 Â Â 1.786
frailty3 Â Â 1.945 Â Â 0.5142 Â Â 1.597 Â Â 2.368
frailty4 Â Â 2.884 Â Â 0.3467 Â Â 2.410 Â Â 3.452
frailty5 Â Â 5.741 Â Â 0.1742 Â Â 4.839 Â Â 6.811
Concordance= 0.671 Â (se = 0.007 )
Likelihood ratio test= 577.2 Â on 4 df, Â p=<2e-16
Wald test       = 576.4  on 4 df,  p=<2e-16
Score (logrank) test = 667.9 Â on 4 df, Â p=<2e-16
Looking at the output, you can see that the hazard of dying is nearly 6 times greater for participants in frailty level 5 compared with frailty level 1 (HR 5.7, 95% CI:4.8-6.8). This is statistically significant.
- Question C2.3a.i: What is association of vitamin D (using the variable coded above and below the median) with the risk (hazard) of dying?
Answer
> model <- coxph(formula = Surv(fu_years, death) ~ vitd_med, data = df)
> summary(model)
Call:
coxph(formula = Surv(fu_years, death) ~ vitd_med, data = df)
 n= 4327, number of events= 1526Â
       coef exp(coef) se(coef)   z Pr(>|z|)  Â
vitd_med -0.56864 Â 0.56630 Â 0.05266 -10.8 Â <2e-16 ***
—
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
     exp(coef) exp(-coef) lower .95 upper .95
vitd_med   0.5663    1.766   0.5108   0.6279
Concordance= 0.57 Â (se = 0.006 )
Likelihood ratio test= 120.3 Â on 1 df, Â p=<2e-16
Wald test       = 116.6  on 1 df,  p=<2e-16
Score (logrank) test = 119.8 Â on 1 df, Â p=<2e-16
Participants that have vitamin D concentrations above the median have a 43% lower hazard (or risk) of dying, and this is statistically significant (HR 0.57, 95% CI: 0.51-0.63).
Multivariable Cox Regression
We can adjust for confounders in a Cox regression as we do in any other regression model. For example, if we want to examine the independent association of vitamin D on the hazard of mortality, adjusted for age group, we type:
df$age_grp <- as.factor(df$age_grp)
model <- coxph(formula = Surv(fu_years, death) ~ vitd_med + age_grp, data = df)
summary(model)
Note: Age is recorded as a categorical variable. For this reason, we should define the variable age_grp as a factor (and not as numeric). We do that using the as.factor() command.
The output should look like this:
Call:
coxph(formula = Surv(fu_years, death) ~ vitd_med + age_grp, data = df)
 n= 4327, number of events= 1526Â
       coef exp(coef) se(coef)    z Pr(>|z|)  Â
vitd_med -0.39618 Â 0.67288 Â 0.05348 -7.408 1.28e-13 ***
age_grp2 Â 0.62103 Â 1.86085 Â 0.09632 Â 6.448 1.13e-10 ***
age_grp3 Â 1.14063 Â 3.12874 Â 0.09736 11.716 Â < 2e-16 ***
age_grp4 Â 1.75800 Â 5.80083 Â 0.09967 17.638 Â < 2e-16 ***
—
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
     exp(coef) exp(-coef) lower .95 upper .95
vitd_med   0.6729   1.4861   0.6059   0.7472
age_grp2 Â Â 1.8608 Â Â 0.5374 Â Â 1.5407 Â Â 2.2475
age_grp3 Â Â 3.1287 Â Â 0.3196 Â Â 2.5852 Â Â 3.7865
age_grp4 Â Â 5.8008 Â Â 0.1724 Â Â 4.7715 Â Â 7.0523
Concordance= 0.664 Â (se = 0.007 )
Likelihood ratio test= 559.9 Â on 4 df, Â p=<2e-16
Wald test       = 560.2  on 4 df,  p=<2e-16
Score (logrank) test = 638.8 Â on 4 df, Â p=<2e-16
We can see that once vitamin D has been adjusted for age group, the hazard ratio weakens from 0.57 (unadjusted HR, from section C2.3a.i.) to 0.67. There is still a statistically significant association (p<0.001), but it is weaker (i.e. closer to the null value of HR=1) since we controlled for the important confounder of age.
- Question C2.3a.ii: What is the impact of frailty on the hazard of dying, once adjusted for age_grp?
Answer
> model <- coxph(formula = Surv(fu_years, death) ~ frailty + age_grp, data = df)
> summary(model)
Call:
coxph(formula = Surv(fu_years, death) ~ frailty + age_grp, data = df)
 n= 4327, number of events= 1526Â
coef exp(coef) se(coef) Â Â Â z Pr(>|z|) Â Â
frailty2 0.27901 Â 1.32182 Â 0.10487 Â 2.660 Â 0.0078 **Â
frailty3 0.47799 Â 1.61282 Â 0.10114 Â 4.726 2.29e-06 ***
frailty4 0.80050 Â 2.22665 Â 0.09318 Â 8.591 Â < 2e-16 ***
frailty5 1.41722 Â 4.12563 Â 0.08984 15.775 Â < 2e-16 ***
age_grp2 0.56432 Â 1.75825 Â 0.09645 Â 5.851 4.90e-09 ***
age_grp3 0.97286 Â 2.64549 Â 0.09809 Â 9.918 Â < 2e-16 ***
age_grp4 1.46314 Â 4.31950 Â 0.10113 14.468 Â < 2e-16 ***
—
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 exp(coef) exp(-coef) lower .95 upper .95
frailty2 Â Â 1.322 Â Â 0.7565 Â Â 1.076 Â Â 1.623
frailty3 Â Â 1.613 Â Â 0.6200 Â Â 1.323 Â Â 1.966
frailty4 Â Â 2.227 Â Â 0.4491 Â Â 1.855 Â Â 2.673
frailty5 Â Â 4.126 Â Â 0.2424 Â Â 3.460 Â Â 4.920
age_grp2 Â Â 1.758 Â Â 0.5687 Â Â 1.455 Â Â 2.124
age_grp3 Â Â 2.645 Â Â 0.3780 Â Â 2.183 Â Â 3.206
age_grp4 Â Â 4.319 Â Â 0.2315 Â Â 3.543 Â Â 5.266
Concordance= 0.709 Â (se = 0.006 )
Likelihood ratio test= 861.8 Â on 7 df, Â p=<2e-16
Wald test       = 862.2  on 7 df,  p=<2e-16
Score (logrank) test = 1018 Â on 7 df, Â Â p=<2e-16
Participants in frailty level 5 have a hazard of dying that is over 4 times greater than those in frailty level 1, once adjusted for age group (HR 4.1, 95% CI: 3.5-4.9, p<0.001). The association of frailty level 5 was made weaker by the adjustment for age group (if we compare the association here to that seen above in the unadjusted model).
Note: In these adjusted Cox regression model examples, adjusting for confounders weakened the associations. However, remember that confounding can make associations appear spuriously weaker or stronger. Therefore, sometimes when you adjust for additional variables, your exposure’s association may get stronger.
C2.3a PRACTICAL: SPSS
Unadjusted Cox regression
To conduct the Cox Regression in SPSS
Select
Analyze>.> Survival >> Cox Regression
Place the time variable (fu_years) in the ‘Time’ box, and the death variable in the ‘Status’ box and define the event just as you did with the Kaplan-Meier.
In this example we are going to add frailty as an explanatory variable in the model. Add frailty to the Covariates box, then click on ‘Categorical’ to define the frailty variable as categorical, just as before.
Click on options and tick the box next to CI for exp(B) to show the confidence interval for the hazard ratios in the output. Keep the default value of 95%. Press continue and then OK to run the test.

The output table you are interested in will look like the below.

Looking at the output, you can see that the hazard of dying is nearly 6 times greater for participants in frailty level 5 compared with frailty level 1 (HR 5.7, 95% CI:4.8-6.8). This is statistically significant (P<0.001).
Question C2.3a.i: What is association of vitamin D (using the variable coded above and below the median) with the risk (hazard) of dying?
Answer
The output table will look like the below.

Participants that have vitamin D concentrations above the median have a 44% lower hazard (or risk) of dying, and this is statistically significant (HR 0.56, 95% CI: 0.50-0.62).
Â
Multivariable Cox Regression
We can adjust for confounders in a Cox regression as we do in any other regression model. For example, if we want to examine the independent association of vitamin D on the hazard of mortality, adjusted for age group we can just add age group into the analysis as another categorical covariate.

The output looks like the below.

We can see that once vitamin D has been adjusted for age group, the hazard ratio weakens from 0.56 (unadjusted HR, from section C2.3a.i.) to 0.67. There is still a statistically significant association (p<0.001), but it is weaker (i.e. closer to the null value of HR=1) since we controlled for the important confounder of age.
Question C2.3a.ii: What is the impact of frailty on the hazard of dying, once adjusted for age_grp?
Answer

Participants in frailty level 5 have a hazard of dying that is over 4 times greater than those in frailty level 1, once adjusted for age group (HR 4.1, 95% CI: 3.5-4.9, p<0.001). The association of frailty level 5 was made weaker by the adjustment for age group (if we compare the association here to that seen above in the unadjusted model).
Note: In these adjusted Cox regression model examples, adjusting for confounders weakened the associations. However, remember that confounding can make associations appear spuriously weaker or stronger. Therefore, sometimes when you adjust for additional variables, your exposure’s association may get stronger.
Video C2.3b – Proportional Hazards Assumption (8 minutes)
C2.3b PRACTICAL: Stata
Test the proportional hazards assumption
There are many similarities between Cox regression and linear and logistic regression. We have seen that you can adjust for additional variables in the same way. You can also fit interactions using the same commands. However, the one key difference between a Cox regression and other regression models is the proportional hazards assumption.
The proportional hazards assumption states that hazard ratio (comparing levels of an exposure variable) is constant over time. This assumption needs to be tested. The lecture explained 3 ways to do this: fitting an interaction with time, examining the residuals with a log-log plot and testing the residuals for a correlation with time. In practice, generally just choosing one method to use is sufficient.
Fitting an interaction with time
If the PH assumption is violated, it means that the hazard ratio is not constant over time, i.e. there is an interaction between a covariate and time. Thus, we can fit a Cox model with a time interaction in it and see if this is significant. We can also use a likelihood ratio (LR) test to see if a model with a time-varying covariate fits better than one without. The ‘tvc’ and ‘texp’ options of the stcox are used for this.
For example, to test age the command would be:
stcox i.age_grp frailty, tvc(age_grp)
——————————————————————————
     _t | Haz. ratio  Std. err.    z   P>|z|   [95% conf. interval]
————-+—————————————————————-
main     |
   age_grp |
   71-75  |  1.534684  .1700356   3.87  0.000   1.235122    1.9069
   76-80  |  2.028394  .2977641   4.82  0.000    1.52124   2.704624
   81-95  |  3.009572  .5683724   5.83  0.000   2.078505   4.35771
       |
   frailty |  1.434477  .0290646   17.81  0.000   1.378627   1.492588
————-+—————————————————————-
tvc      |
   age_grp |  1.026909   .012475   2.19  0.029   1.002747   1.051653
——————————————————————————
Note: Variables in tvc equation interacted with _t.
When we look at the p-value for the coefficient in the ‘tvc’ row, we see that it is significant (p=0.03). This suggests there is a significant interaction with time, which indicates the HR for age group is not constant over time.
Inspecting log-log plots
To compute a log-log plot, the command is ‘stphplot’, which can be adjusted for confounders.
A log-log plot for age group is computed using the command below:
stphplot, by(age_grp) adjust(frailty)
When we inspect a log-log plot, we want to see that the curves in the main portion of the plot (i.e. not the tails of the plot, where there are few events) are generally parallel over time. If the curves touch or cross, this indicates there is a violation of the PH assumption.

Looking at the curves for each level of age group above, they appear to be generally parallel with similar slopes over time. This looks acceptable and does not indicate a meaningful violation of the PH assumption. This may contradict the results of the ‘tvc’ command above as statistical tests will often be significant for small deviations if you have a lot of power (e.g. if you have a large dataset). For this reason, we often prefer to visually inspect the data for obvious violations of the PH assumption.
- Question C2.3b:Â Interpret a log-log plot for frailty (adjusted for age group) and assess if it violates the PH assumption.Â
Answer
stphplot, by(frailty) adjust(age_grp)

In this log-log plot, we see the curves for the different levels of frailty (levels 1-4) criss-cross and touch each other at several points over time. This is a clearer violation of the PH assumption.       Â
Fixing violations in the PH assumption
If the PH assumption has been violated for a covariate, the lecture mentioned 2 ways to deal with the situation. One way is to specify the model using the ‘tvc’ option for the offending variable, which models an interaction with time. The other way, which we are doing here, is to fit a stratified Cox model where we stratify by the time-varying covariate.
We do this by using the strata option of the stcox command.
stcox i.age_grp, strata(frailty)
You do not get HRs or p-values for each stratum. Instead, Stata assumes there is a separate baseline hazard for each stratum in order to estimate the proportional effects of the other variables across strata (which provides only one coefficient since the proportional effects are assumed to be constant, which is the no-interaction assumption [ie no interaction between other explanatory variables and the time-varying coefficient]). Since you do not get HR for levels of a variable you stratify on, do not use this method if you have non-proportionality in your main exposure of interest.
If you run the command you above, you will get this output:
Stratified Cox regression with Breslow   method   for ties
Strata variable: frailty
No. of subjects = Â Â Â 4,327 Â Â Â Â Â Â Number of obs = Â 4,327
No. of failures = Â Â Â 1,526
Time at risk   = 29,476.9825
      LR chi2(3)   = 280.76
Log likelihood = -9708.127 Â Â Â Â Â Â Prob > chi2 Â = 0.0000
     Â
_t  Haz. ratio  Std. err.   z   P>z   [95% conf. interval]
      Â
age_grpÂ
71-75 Â Â 1.756097 Â .1693938 Â Â 5.84 Â Â 0.000 Â Â 1.453588 Â Â 2.121562
76-80 Â Â 2.642535 Â .2592631 Â Â 9.90 Â Â 0.000 Â Â 2.180259 Â Â 3.202827
81-95 Â Â Â 4.28578 Â .4339019 Â Â 14.37 Â Â 0.000 Â Â 3.514409 Â Â 5.226457
Notice that you do not get any output regarding the variable ‘frailty’ that you stratified on, although Stata indicates at the top of the output that you have used this variable in the strata. You would still interpret the HR of age_grp as having been adjusted for frailty, though. So you would say that the hazard of dying was 4.3 times greater in the 81-95 age group compared to the 60-70 age group, once it was adjusted for frailty (as stratifying on a variable is another way to adjust for its association).
C2.3b PRACTICAL: R
Test the proportional hazards assumption
There are many similarities between Cox regression and linear and logistic regression. We have seen that you can adjust for additional variables in the same way. You can also fit interactions using the same commands. However, the one key difference between a Cox regression and other regression models is the proportional hazards assumption.
The proportional hazards assumption states that hazard ratio (comparing levels of an exposure variable) is constant over time. This assumption needs to be tested. The lecture explained 3 ways to do this: fitting an interaction with time, examining the residuals with a log-log plot and testing the residuals for a correlation with time. In practice, generally just choosing one method to use is sufficient.
Fitting an interaction with time
If the PH assumption is violated, it means that the hazard ratio is not constant over time, i.e. there is an interaction between a covariate and time. Thus, we can fit a Cox model with a time interaction in it and see if this is significant. We can also use a likelihood ratio (LR) test to see if a model with a time-varying covariate fits better than one without. In R, the interaction of a covariate with time can be fitted by using the time-transform functionality tt() in the coxph() command.
model <- coxph(formula = Surv(fu_years, death) ~ frailty + age_grp + tt(age_grp), data = df)
summary(model)
The output looks like this:
Call:
coxph(formula = Surv(fu_years, death) ~ frailty + age_grp + tt(age_grp),Â
  data = df)
 n= 4327, number of events= 1526Â
        coef exp(coef) se(coef)    z Pr(>|z|)  Â
frailty2 Â Â 0.27705 Â 1.31923 Â 0.10488 Â 2.642 Â 0.00825 **Â
frailty3 Â Â 0.47470 Â 1.60754 Â 0.10115 Â 4.693 2.69e-06 ***
frailty4 Â Â 0.79874 Â 2.22273 Â 0.09317 Â 8.573 Â < 2e-16 ***
frailty5 Â Â 1.41966 Â 4.13570 Â 0.08975 15.817 Â < 2e-16 ***
age_grp2 Â Â 0.53277 Â 1.70365 Â 0.09704 Â 5.490 4.01e-08 ***
age_grp3 Â Â 0.82084 Â 2.27241 Â 0.11049 Â 7.429 1.09e-13 ***
age_grp4 Â Â 0.68145 Â 1.97674 Â 0.28382 Â 2.401 Â 0.01635 * Â
tt(age_grp) 0.04261 Â 1.04353 Â 0.01434 Â 2.971 Â 0.00297 **Â
—
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      exp(coef) exp(-coef) lower .95 upper .95
frailty2 Â Â Â Â 1.319 Â Â 0.7580 Â Â 1.074 Â Â 1.620
frailty3 Â Â Â Â 1.608 Â Â 0.6221 Â Â 1.318 Â Â 1.960
frailty4 Â Â Â Â 2.223 Â Â 0.4499 Â Â 1.852 Â Â 2.668
frailty5 Â Â Â Â 4.136 Â Â 0.2418 Â Â 3.469 Â Â 4.931
age_grp2 Â Â Â Â 1.704 Â Â 0.5870 Â Â 1.409 Â Â 2.061
age_grp3 Â Â Â Â 2.272 Â Â 0.4401 Â Â 1.830 Â Â 2.822
age_grp4 Â Â Â Â 1.977 Â Â 0.5059 Â Â 1.133 Â Â 3.448
tt(age_grp) Â Â 1.044 Â Â 0.9583 Â Â 1.015 Â Â 1.073
Concordance= 0.71 Â (se = 0.007 )
Likelihood ratio test= 870.4 Â on 8 df, Â p=<2e-16
Wald test       = 880.1  on 8 df,  p=<2e-16
Score (logrank) test = 1045 Â on 8 df, Â Â p=<2e-16
When we look at the p-value for the coefficient in the tt(age_grp) row, we see that it is significant (p=0.03). This suggests there is a significant interaction with time, which indicates the HR for age group is not constant over time.
Inspecting log-log plots
To compute a log-log plot, we should compute the survival curves for our variable using the survfit() command and then we can print the log-log plot using the ggsurvplot() command.
A log-log plot for age group is computed using the command below:
model <- coxph(Surv(fu_years, death) ~ frailty + strata(age_grp), data = df)
fit <- survfit(model)
ggsurvplot(fit, data = df, fun = “cloglog”)
When we inspect a log-log plot, we want to see that the curves in the main portion of the plot (i.e. not the tails of the plot, where there are few events) are generally parallel over time. If the curves touch or cross, this indicates there is a violation of the PH assumption.

Looking at the curves for each level of age group above, they appear to be generally parallel with similar slopes over time. This looks acceptable and does not indicate a meaningful violation of the PH assumption. This may contradict the results of the tt() command above as statistical tests will often be significant for small deviations if you have a lot of power (e.g. if you have a large dataset). For this reason, we often prefer to visually inspect the data for obvious violations of the PH assumption.
Testing the residuals for a correlation with time
The PH assumption can be also assessed using a statistical test based on the scaled Schoenfeld residuals. In principle, Schoenfeld residuals are independent of time. If a plot shows that Schoenfeld residuals are distributed in a non-random pattern against time, this is evidence of violation of the PH assumption.
We can test the proportional hazards assumption for each covariate in a Cox model using the cox.zph() command.
In a cox model including age_grp and frailty as covariates, we can examine if any of these variables violate the PH assumption by running the following command:
model <- coxph(Surv(fu_years, death) ~ frailty + age_grp, data = df)
cox.zph(model)
The output looks like this:
    chisq df    p
frailty  9.51  4 0.04945
age_grp 11.47 Â 3 0.00943
GLOBAL Â 26.33 Â 7 0.00044
The cox.zph() command has two outputs. The first one is a chi-squared test and a p-value for testing the violation of PH assumption for each one of the variables included in the cox model. The second one is a global chi-squared test and a p-value for testing if there is any variable in our model that violates the PH assumption.
We observe that both frailty and age_grp variables violate the PH assumption.
- Question C2.3b:Â Interpret a log-log plot for frailty (adjusted for age group) and assess if it violates the PH assumption.Â
Answer
model <- coxph(Surv(fu_years, death) ~ strata(frailty) + age_grp, data = df)
fit <- survfit(model)
ggsurvplot(fit, data = df, fun = “cloglog”)

In this log-log plot, we see the curves for the different levels of frailty (levels 1-4) criss-cross and touch each other at several points over time. This is a clearer violation of the PH assumption.
Fixing violations in the PH assumption
If the PH assumption has been violated for a covariate, we can fit a stratified Cox model where we stratify by the covariate that violates the PH assumption.
To stratify a Cox model by a specific variable, we use the strata() command from the survival R package.
model <- coxph(formula = Surv(fu_years, death) ~ strata(frailty) + age_grp, data = df)
summary(model)
You do not get HRs or p-values for each stratum. Instead, R assumes there is a separate baseline hazard for each stratum in order to estimate the proportional effects of the other variables across strata (which provides only one coefficient since the proportional effects are assumed to be constant, which is the no-interaction assumption [i.e., no interaction between other explanatory variables and the time-varying coefficient]). Since you do not get HR for levels of a variable you stratify on, do not use this method if you have non-proportionality in your main exposure of interest.
If you run the command above, you will get this output:
Call:
coxph(formula = Surv(fu_years, death) ~ strata(frailty) + age_grp,Â
  data = df)
 n= 4327, number of events= 1526Â
      coef exp(coef) se(coef)    z Pr(>|z|)  Â
age_grp2 0.56309 Â 1.75610 Â 0.09646 Â 5.838 Â 5.3e-09 ***
age_grp3 0.97174 Â 2.64253 Â 0.09811 Â 9.904 Â < 2e-16 ***
age_grp4 1.45530 Â 4.28579 Â 0.10124 14.374 Â < 2e-16 ***
—
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Notice that you do not get any output regarding the variable ‘frailty’ that you stratified on, although R indicates in the formula that you have used this variable in the strata. You would still interpret the HR of age_grp as having been adjusted for frailty, though. So you would say that the hazard of dying was 4.3 times greater in the 81-95 age group compared to the 60-70 age group, once it was adjusted for frailty (as stratifying on a variable is another way to adjust for its association).
C2.3b PRACTICAL: SPSS
Test the proportional hazards assumption
There are many similarities between Cox regression and linear and logistic regression. We have seen that you can adjust for additional variables in the same way. You can also fit interactions using the same commands. However, the one key difference between a Cox regression and other regression models is the proportional hazards assumption.
The proportional hazards assumption states that hazard ratio (comparing levels of an exposure variable) is constant over time. This assumption needs to be tested. The lecture explained 3 ways to do this: fitting an interaction with time, examining the residuals with a log-log plot and testing the residuals for a correlation with time. In practice, generally just choosing one method to use is sufficient.
Fitting an interaction with time
If the PH assumption is violated, it means that the hazard ratio is not constant over time, i.e. there is an interaction between a covariate and time. Thus, we can fit a Cox model with a time interaction in it and see if this is significant. We can also use a likelihood ratio (LR) test to see if a model with a time-varying covariate fits better than one without.
Select
Analyze >> Survival >> Cox w/ Time-Dep Cov
This opens a window for you to define your time dependant covariate. The variable Time [T_] will be added to your list of potential variables. You need to use this, multiplied by your variable of interest (in his case age group) to define the time dependant covariate. When you have defined the expression, press ‘Model’.

The normal Cox Regression box will then open, with your new time dependant covariate indicated by T_COV in the left hand menu. Set up your analysis exactly as before, but add T_COV as another covariate.

Your output will look like the below.

When we look at the p-value for the coefficient in the T_COV row, we see that it is significant (p=0.022). This suggests there is a significant interaction with time, which indicates the HR for age group is not constant over time.
Inspecting log-log plots
To compute a log-log plot in SPSS click on ‘Plots’ within the Cox Regression window. Tick the box next to ‘log minus log’ and then move the variable you are interested in into the ‘Separate Lines for:’ box.

Your output will look like the below.

When we inspect a log-log plot, we want to see that the curves in the main portion of the plot (i.e. not the tails of the plot, where there are few events) are generally parallel over time. If the curves touch or cross, this indicates there is a violation of the PH assumption.
Fixing violations in the PH assumption
If the PH assumption has been violated for a covariate, the lecture mentioned two ways to deal with the situation. One way is to specify the model using the T_COV option for the offending variable, which models an interaction with time. The other way, which we are doing here, is to fit a stratified Cox model where we stratify by the time-varying covariate.
We do this by adding the variable (frailty in this example) to the ‘Strata’ box in the Cox Regression window rather than the ‘Covariate’ box.

You do not get HRs or p-values for each stratum. Instead, SPSS assumes there is a separate baseline hazard for each stratum in order to estimate the proportional effects of the other variables across strata (which provides only one coefficient since the proportional effects are assumed to be constant, which is the no-interaction assumption [i.e. no interaction between other explanatory variables and the time-varying coefficient]). Since you do not get HR for levels of a variable you stratify on, do not use this method if you have non-proportionality in your main exposure of interest.

You would still interpret the HR of age_grp as having been adjusted for frailty. So, you would say that the hazard of dying was 4.3 times greater in the 81-95 age group compared to the 60-70 age group, once it was adjusted for frailty (as stratifying on a variable is another way to adjust for its association).
No presentation. Please add downloadable presentation