FoSSA: Fundamentals of Statistical Software & Analysis C2.3 Cox Proportional Hazards Regression

Lesson 47 of 50

In Progress

← Previous Next→

C2.3 Cox Proportional Hazards Regression

Learning Outcomes

By the end of this section, students will be able to:

Understand the theory of proportional hazards models
Calculate and interpret the results of a Cox proportional hazards regression model
Test the assumptions underlying a Cox regression model

Video C2.3a – Cox Model (8 minutes)

C2.3a PRACTICAL: Stata

Unadjusted Cox regression

The ‘stcox’ command is used to run a Cox proportional hazards regression model. You must use the ‘stset’ command before using stcox, and due to this you do not need to specify an outcome variable or a time variable again. You only need to specify the explanatory variables that you are examining in your model.

stcox [varlist] [if] [in] [, options]

For example, for frailty, the command would be:

stcox i.frailty

The output looks like this:

Cox regression with Breslow method for ties

No. of subjects = 4,327 Number of obs = 4,327
No. of failures = 1,526
Time at risk = 29,476.9825

LR chi2(4) = 577.22
Log likelihood = -12128.151 Prob > chi2 = 0.0000

——————————————————————————
_t | Haz. ratio Std. err. z P>|z| [95% conf. interval]
————-+—————————————————————-
frailty |
2 | 1.45484 .1522779 3.58 0.000 1.185005 1.786119
3 | 1.944776 .195232 6.63 0.000 1.597421 2.367664
4 | 2.884121 .2643057 11.56 0.000 2.409949 3.451588
5 | 5.740893 .5005441 20.04 0.000 4.839091 6.810752
——————————————————————————

Looking at the output, you can see that the hazard of dying is nearly 6 times greater for participants in frailty level 5 compared with frailty level 1 (HR 5.7, 95% CI:4.8-6.8). This is statistically significant.

Question C2.3a.i: What is association of vitamin D (using the variable coded above and below the median) with the risk (hazard) of dying?

C2.3a.i. Answer

stcox i.vitd_med

Failure _d: death
Analysis time _t: fu_years
ID variable: whl1_id

Iteration 0: log likelihood = -12416.762
Iteration 1: log likelihood = -12356.639
Iteration 2: log likelihood = -12356.629
Refining estimates:
Iteration 0: log likelihood = -12356.629

Cox regression with Breslow method for ties

No. of subjects = 4,327 Number of obs = 4,327
No. of failures = 1,526
Time at risk = 29,476.9825
LR chi2(1) = 120.27
Log likelihood = -12356.629 Prob > chi2 = 0.0000

——————————————————————————
_t | Haz. ratio Std. err. z P>|z| [95% conf. interval]
————-+—————————————————————-
2.vitd_med | .566297 .0298226 -10.80 0.000 .5107612 .6278712
——————————————————————————

Participants that have vitamin D concentrations above the median have a 43% lower hazard (or risk) of dying, and this is statistically significant (HR 0.57, 95% CI: 0.51-0.63).

Multivariable Cox Regression

We can adjust for confounders in a Cox regression as we do in any other regression model. For example, if we want to examine the independent association of vitamin D on the hazard of mortality, adjusted for age group, we type:

stcox vitd_med i.age_grp

The output should look like this:

Failure _d: death
Analysis time _t: fu_years
ID variable: whl1_id

Iteration 0: log likelihood = -12416.762
Iteration 1: log likelihood = -12186.532
Iteration 2: log likelihood = -12137.041
Iteration 3: log likelihood = -12136.832
Iteration 4: log likelihood = -12136.832
Refining estimates:
Iteration 0: log likelihood = -12136.832

Cox regression with Breslow method for ties

No. of subjects = 4,327 Number of obs = 4,327
No. of failures = 1,526
Time at risk = 29,476.9825
LR chi2(4) = 559.86
Log likelihood = -12136.832 Prob > chi2 = 0.0000

——————————————————————————
_t | Haz. ratio Std. err. z P>|z| [95% conf. interval]
————-+—————————————————————-
vitd_med | .6728851 .035984 -7.41 0.000 .605928 .7472412
|
age_grp |
71-75 | 1.860847 .1792291 6.45 0.000 1.540729 2.247475
76-80 | 3.128741 .3046026 11.72 0.000 2.585233 3.786513
81-95 | 5.800825 .5781615 17.64 0.000 4.771463 7.052256
——————————————————————————

We can see that once vitamin D has been adjusted for age group, the hazard ratio weakens from 0.57 (unadjusted HR, from section C2.3a.i.) to 0.67. There is still a statistically significant association (p<0.001), but it is weaker (i.e. closer to the null value of HR=1) since we controlled for the important confounder of age.

Question C2.3a.ii: What is the impact of frailty on the hazard of dying, once adjusted for age_grp?

C2.3a.ii. Answer

stcox i.frailty i.age_grp

Failure _d: death
Analysis time _t: fu_years
ID variable: whl1_id

Iteration 0: log likelihood = -12416.762
Iteration 1: log likelihood = -12076.963
Iteration 2: log likelihood = -11986.28
Iteration 3: log likelihood = -11985.881
Iteration 4: log likelihood = -11985.881
Refining estimates:
Iteration 0: log likelihood = -11985.881

Cox regression with Breslow method for ties

No. of subjects = 4,327 Number of obs = 4,327
No. of failures = 1,526
Time at risk = 29,476.9825
LR chi2(7) = 861.76
Log likelihood = -11985.881 Prob > chi2 = 0.0000

——————————————————————————
_t | Haz. ratio Std. err. z P>|z| [95% conf. interval]
————-+—————————————————————-
frailty |
2 | 1.321821 .1386221 2.66 0.008 1.07623 1.623453
3 | 1.612825 .1631257 4.73 0.000 1.3228 1.966437
4 | 2.226646 .2074741 8.59 0.000 1.854975 2.672787
5 | 4.125627 .3706378 15.78 0.000 3.459551 4.919944
|
age_grp |
71-75 | 1.758254 .1695919 5.85 0.000 1.455389 2.124144
76-80 | 2.645494 .2594916 9.92 0.000 2.1828 3.206266
81-95 | 4.31949 .4368414 14.47 0.000 3.542813 5.266435

Participants in frailty level 5 have a hazard of dying that is over 4 times greater than those in frailty level 1, once adjusted for age group (HR 4.1, 95% CI: 3.5-4.9, p<0.001). The association of frailty level 5 was made weaker by the adjustment for age group (if we compare the association here to that seen above in the unadjusted model).

Note: In these adjusted Cox regression model examples, adjusting for confounders weakened the associations. However, remember that confounding can make associations appear spuriously weaker or stronger. Therefore, sometimes when you adjust for additional variables, your exposure’s association may get stronger.

C2.3a PRACTICAL: R

Unadjusted Cox regression

The coxph() command is used to run a Cox proportional hazards regression model in R. The simplified syntax of the coxph() command is the following:

coxph(formula, data)

The formula used in the coxph() command needs a survival object as dependent variable which is created using the Surv() command. The syntax of the Surv() command is the following:

Surv(time, event)

where time is the follow-up time per individual and event is the disease/outcome status per individual.

For example, we can examine the effect of frailty on the risk of death. Frailty is measured using a quantitative scale and it would be better to treat it as a categorical variable:

df$frailty <- as.factor(df$frailty)
model <- coxph(Surv(fu_years, death) ~ frailty, data = df)
summary(model)

The output looks like this:

> df$frailty <- as.factor(df$frailty)
> model <- coxph(formula = Surv(fu_years, death) ~ frailty, data = df)
> summary(model)
Call:
coxph(formula = Surv(fu_years, death) ~ frailty, data = df)

n= 4327, number of events= 1526

coef exp(coef) se(coef) z Pr(>|z|)
frailty2 0.37490 1.45484 0.10467 3.582 0.000341 ***
frailty3 0.66515 1.94478 0.10039 6.626 3.45e-11 ***
frailty4 1.05922 2.88412 0.09164 11.558 < 2e-16 ***
frailty5 1.74762 5.74090 0.08719 20.044 < 2e-16 ***
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

exp(coef) exp(-coef) lower .95 upper .95
frailty2 1.455 0.6874 1.185 1.786
frailty3 1.945 0.5142 1.597 2.368
frailty4 2.884 0.3467 2.410 3.452
frailty5 5.741 0.1742 4.839 6.811

Concordance= 0.671 (se = 0.007 )
Likelihood ratio test= 577.2 on 4 df, p=<2e-16
Wald test = 576.4 on 4 df, p=<2e-16
Score (logrank) test = 667.9 on 4 df, p=<2e-16

Question C2.3a.i: What is association of vitamin D (using the variable coded above and below the median) with the risk (hazard) of dying?

Answer

> model <- coxph(formula = Surv(fu_years, death) ~ vitd_med, data = df)
> summary(model)
Call:
coxph(formula = Surv(fu_years, death) ~ vitd_med, data = df)

n= 4327, number of events= 1526

coef exp(coef) se(coef) z Pr(>|z|)
vitd_med -0.56864 0.56630 0.05266 -10.8 <2e-16 ***
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

exp(coef) exp(-coef) lower .95 upper .95
vitd_med 0.5663 1.766 0.5108 0.6279

Concordance= 0.57 (se = 0.006 )
Likelihood ratio test= 120.3 on 1 df, p=<2e-16
Wald test = 116.6 on 1 df, p=<2e-16
Score (logrank) test = 119.8 on 1 df, p=<2e-16

Participants that have vitamin D concentrations above the median have a 43% lower hazard (or risk) of dying, and this is statistically significant (HR 0.57, 95% CI: 0.51-0.63).

Multivariable Cox Regression

df$age_grp <- as.factor(df$age_grp)
model <- coxph(formula = Surv(fu_years, death) ~ vitd_med + age_grp, data = df)
summary(model)

Note: Age is recorded as a categorical variable. For this reason, we should define the variable age_grp as a factor (and not as numeric). We do that using the as.factor() command.

The output should look like this:

Call:
coxph(formula = Surv(fu_years, death) ~ vitd_med + age_grp, data = df)

n= 4327, number of events= 1526

coef exp(coef) se(coef) z Pr(>|z|)
vitd_med -0.39618 0.67288 0.05348 -7.408 1.28e-13 ***
age_grp2 0.62103 1.86085 0.09632 6.448 1.13e-10 ***
age_grp3 1.14063 3.12874 0.09736 11.716 < 2e-16 ***
age_grp4 1.75800 5.80083 0.09967 17.638 < 2e-16 ***
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

exp(coef) exp(-coef) lower .95 upper .95
vitd_med 0.6729 1.4861 0.6059 0.7472
age_grp2 1.8608 0.5374 1.5407 2.2475
age_grp3 3.1287 0.3196 2.5852 3.7865
age_grp4 5.8008 0.1724 4.7715 7.0523

Concordance= 0.664 (se = 0.007 )
Likelihood ratio test= 559.9 on 4 df, p=<2e-16
Wald test = 560.2 on 4 df, p=<2e-16
Score (logrank) test = 638.8 on 4 df, p=<2e-16

Question C2.3a.ii: What is the impact of frailty on the hazard of dying, once adjusted for age_grp?

Answer

> model <- coxph(formula = Surv(fu_years, death) ~ frailty + age_grp, data = df)
> summary(model)
Call:
coxph(formula = Surv(fu_years, death) ~ frailty + age_grp, data = df)

n= 4327, number of events= 1526

coef exp(coef) se(coef) z Pr(>|z|)
frailty2 0.27901 1.32182 0.10487 2.660 0.0078 **
frailty3 0.47799 1.61282 0.10114 4.726 2.29e-06 ***
frailty4 0.80050 2.22665 0.09318 8.591 < 2e-16 ***
frailty5 1.41722 4.12563 0.08984 15.775 < 2e-16 ***
age_grp2 0.56432 1.75825 0.09645 5.851 4.90e-09 ***
age_grp3 0.97286 2.64549 0.09809 9.918 < 2e-16 ***
age_grp4 1.46314 4.31950 0.10113 14.468 < 2e-16 ***
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

exp(coef) exp(-coef) lower .95 upper .95
frailty2 1.322 0.7565 1.076 1.623
frailty3 1.613 0.6200 1.323 1.966
frailty4 2.227 0.4491 1.855 2.673
frailty5 4.126 0.2424 3.460 4.920
age_grp2 1.758 0.5687 1.455 2.124
age_grp3 2.645 0.3780 2.183 3.206
age_grp4 4.319 0.2315 3.543 5.266

Concordance= 0.709 (se = 0.006 )
Likelihood ratio test= 861.8 on 7 df, p=<2e-16
Wald test = 862.2 on 7 df, p=<2e-16
Score (logrank) test = 1018 on 7 df, p=<2e-16

C2.3a PRACTICAL: SPSS

Unadjusted Cox regression

To conduct the Cox Regression in SPSS

Select

Analyze>.> Survival >> Cox Regression

Place the time variable (fu_years) in the ‘Time’ box, and the death variable in the ‘Status’ box and define the event just as you did with the Kaplan-Meier.

In this example we are going to add frailty as an explanatory variable in the model. Add frailty to the Covariates box, then click on ‘Categorical’ to define the frailty variable as categorical, just as before.

Click on options and tick the box next to CI for exp(B) to show the confidence interval for the hazard ratios in the output. Keep the default value of 95%. Press continue and then OK to run the test.

The output table you are interested in will look like the below.

Question C2.3a.i: What is association of vitamin D (using the variable coded above and below the median) with the risk (hazard) of dying?

Answer

The output table will look like the below.

Participants that have vitamin D concentrations above the median have a 44% lower hazard (or risk) of dying, and this is statistically significant (HR 0.56, 95% CI: 0.50-0.62).

Multivariable Cox Regression

The output looks like the below.

We can see that once vitamin D has been adjusted for age group, the hazard ratio weakens from 0.56 (unadjusted HR, from section C2.3a.i.) to 0.67. There is still a statistically significant association (p<0.001), but it is weaker (i.e. closer to the null value of HR=1) since we controlled for the important confounder of age.

Question C2.3a.ii: What is the impact of frailty on the hazard of dying, once adjusted for age_grp?

Answer

Video C2.3b – Proportional Hazards Assumption (8 minutes)

C2.3b PRACTICAL: Stata

Test the proportional hazards assumption

There are many similarities between Cox regression and linear and logistic regression. We have seen that you can adjust for additional variables in the same way. You can also fit interactions using the same commands. However, the one key difference between a Cox regression and other regression models is the proportional hazards assumption.

The proportional hazards assumption states that hazard ratio (comparing levels of an exposure variable) is constant over time. This assumption needs to be tested. The lecture explained 3 ways to do this: fitting an interaction with time, examining the residuals with a log-log plot and testing the residuals for a correlation with time. In practice, generally just choosing one method to use is sufficient.

Fitting an interaction with time

If the PH assumption is violated, it means that the hazard ratio is not constant over time, i.e. there is an interaction between a covariate and time. Thus, we can fit a Cox model with a time interaction in it and see if this is significant. We can also use a likelihood ratio (LR) test to see if a model with a time-varying covariate fits better than one without. The ‘tvc’ and ‘texp’ options of the stcox are used for this.

For example, to test age the command would be:

stcox i.age_grp frailty, tvc(age_grp)

——————————————————————————
_t | Haz. ratio Std. err. z P>|z| [95% conf. interval]
————-+—————————————————————-
main |
age_grp |
71-75 | 1.534684 .1700356 3.87 0.000 1.235122 1.9069
76-80 | 2.028394 .2977641 4.82 0.000 1.52124 2.704624
81-95 | 3.009572 .5683724 5.83 0.000 2.078505 4.35771
|
frailty | 1.434477 .0290646 17.81 0.000 1.378627 1.492588
————-+—————————————————————-
tvc |
age_grp | 1.026909 .012475 2.19 0.029 1.002747 1.051653
——————————————————————————
Note: Variables in tvc equation interacted with _t.

When we look at the p-value for the coefficient in the ‘tvc’ row, we see that it is significant (p=0.03). This suggests there is a significant interaction with time, which indicates the HR for age group is not constant over time.

Inspecting log-log plots

To compute a log-log plot, the command is ‘stphplot’, which can be adjusted for confounders.

A log-log plot for age group is computed using the command below:

stphplot, by(age_grp) adjust(frailty)

When we inspect a log-log plot, we want to see that the curves in the main portion of the plot (i.e. not the tails of the plot, where there are few events) are generally parallel over time. If the curves touch or cross, this indicates there is a violation of the PH assumption.

Looking at the curves for each level of age group above, they appear to be generally parallel with similar slopes over time. This looks acceptable and does not indicate a meaningful violation of the PH assumption. This may contradict the results of the ‘tvc’ command above as statistical tests will often be significant for small deviations if you have a lot of power (e.g. if you have a large dataset). For this reason, we often prefer to visually inspect the data for obvious violations of the PH assumption.

Question C2.3b: Interpret a log-log plot for frailty (adjusted for age group) and assess if it violates the PH assumption.

Answer

stphplot, by(frailty) adjust(age_grp)

In this log-log plot, we see the curves for the different levels of frailty (levels 1-4) criss-cross and touch each other at several points over time. This is a clearer violation of the PH assumption.

Fixing violations in the PH assumption

If the PH assumption has been violated for a covariate, the lecture mentioned 2 ways to deal with the situation. One way is to specify the model using the ‘tvc’ option for the offending variable, which models an interaction with time. The other way, which we are doing here, is to fit a stratified Cox model where we stratify by the time-varying covariate.

We do this by using the strata option of the stcox command.

stcox i.age_grp, strata(frailty)

You do not get HRs or p-values for each stratum. Instead, Stata assumes there is a separate baseline hazard for each stratum in order to estimate the proportional effects of the other variables across strata (which provides only one coefficient since the proportional effects are assumed to be constant, which is the no-interaction assumption [ie no interaction between other explanatory variables and the time-varying coefficient]). Since you do not get HR for levels of a variable you stratify on, do not use this method if you have non-proportionality in your main exposure of interest.

If you run the command you above, you will get this output:

Stratified Cox regression with Breslow method for ties
Strata variable: frailty

No. of subjects = 4,327 Number of obs = 4,327
No. of failures = 1,526
Time at risk = 29,476.9825
LR chi2(3) = 280.76
Log likelihood = -9708.127 Prob > chi2 = 0.0000

_t Haz. ratio Std. err. z P>z [95% conf. interval]

age_grp
71-75 1.756097 .1693938 5.84 0.000 1.453588 2.121562
76-80 2.642535 .2592631 9.90 0.000 2.180259 3.202827
81-95 4.28578 .4339019 14.37 0.000 3.514409 5.226457

Notice that you do not get any output regarding the variable ‘frailty’ that you stratified on, although Stata indicates at the top of the output that you have used this variable in the strata. You would still interpret the HR of age_grp as having been adjusted for frailty, though. So you would say that the hazard of dying was 4.3 times greater in the 81-95 age group compared to the 60-70 age group, once it was adjusted for frailty (as stratifying on a variable is another way to adjust for its association).

C2.3b PRACTICAL: R

Test the proportional hazards assumption

Fitting an interaction with time

If the PH assumption is violated, it means that the hazard ratio is not constant over time, i.e. there is an interaction between a covariate and time. Thus, we can fit a Cox model with a time interaction in it and see if this is significant. We can also use a likelihood ratio (LR) test to see if a model with a time-varying covariate fits better than one without. In R, the interaction of a covariate with time can be fitted by using the time-transform functionality tt() in the coxph() command.

model <- coxph(formula = Surv(fu_years, death) ~ frailty + age_grp + tt(age_grp), data = df)

summary(model)

The output looks like this:

Call:
coxph(formula = Surv(fu_years, death) ~ frailty + age_grp + tt(age_grp),
data = df)

n= 4327, number of events= 1526

coef exp(coef) se(coef) z Pr(>|z|)
frailty2 0.27705 1.31923 0.10488 2.642 0.00825 **
frailty3 0.47470 1.60754 0.10115 4.693 2.69e-06 ***
frailty4 0.79874 2.22273 0.09317 8.573 < 2e-16 ***
frailty5 1.41966 4.13570 0.08975 15.817 < 2e-16 ***
age_grp2 0.53277 1.70365 0.09704 5.490 4.01e-08 ***
age_grp3 0.82084 2.27241 0.11049 7.429 1.09e-13 ***
age_grp4 0.68145 1.97674 0.28382 2.401 0.01635 *
tt(age_grp) 0.04261 1.04353 0.01434 2.971 0.00297 **
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

exp(coef) exp(-coef) lower .95 upper .95
frailty2 1.319 0.7580 1.074 1.620
frailty3 1.608 0.6221 1.318 1.960
frailty4 2.223 0.4499 1.852 2.668
frailty5 4.136 0.2418 3.469 4.931
age_grp2 1.704 0.5870 1.409 2.061
age_grp3 2.272 0.4401 1.830 2.822
age_grp4 1.977 0.5059 1.133 3.448
tt(age_grp) 1.044 0.9583 1.015 1.073

Concordance= 0.71 (se = 0.007 )
Likelihood ratio test= 870.4 on 8 df, p=<2e-16
Wald test = 880.1 on 8 df, p=<2e-16
Score (logrank) test = 1045 on 8 df, p=<2e-16

When we look at the p-value for the coefficient in the tt(age_grp) row, we see that it is significant (p=0.03). This suggests there is a significant interaction with time, which indicates the HR for age group is not constant over time.

Inspecting log-log plots

To compute a log-log plot, we should compute the survival curves for our variable using the survfit() command and then we can print the log-log plot using the ggsurvplot() command.

A log-log plot for age group is computed using the command below:

model <- coxph(Surv(fu_years, death) ~ frailty + strata(age_grp), data = df)

fit <- survfit(model)

ggsurvplot(fit, data = df, fun = “cloglog”)

Looking at the curves for each level of age group above, they appear to be generally parallel with similar slopes over time. This looks acceptable and does not indicate a meaningful violation of the PH assumption. This may contradict the results of the tt() command above as statistical tests will often be significant for small deviations if you have a lot of power (e.g. if you have a large dataset). For this reason, we often prefer to visually inspect the data for obvious violations of the PH assumption.

Testing the residuals for a correlation with time

The PH assumption can be also assessed using a statistical test based on the scaled Schoenfeld residuals. In principle, Schoenfeld residuals are independent of time. If a plot shows that Schoenfeld residuals are distributed in a non-random pattern against time, this is evidence of violation of the PH assumption.

We can test the proportional hazards assumption for each covariate in a Cox model using the cox.zph() command.

In a cox model including age_grp and frailty as covariates, we can examine if any of these variables violate the PH assumption by running the following command:

model <- coxph(Surv(fu_years, death) ~ frailty + age_grp, data = df)
cox.zph(model)

The output looks like this:
chisq df p
frailty 9.51 4 0.04945
age_grp 11.47 3 0.00943
GLOBAL 26.33 7 0.00044

The cox.zph() command has two outputs. The first one is a chi-squared test and a p-value for testing the violation of PH assumption for each one of the variables included in the cox model. The second one is a global chi-squared test and a p-value for testing if there is any variable in our model that violates the PH assumption.

We observe that both frailty and age_grp variables violate the PH assumption.

Question C2.3b: Interpret a log-log plot for frailty (adjusted for age group) and assess if it violates the PH assumption.

Answer

model <- coxph(Surv(fu_years, death) ~ strata(frailty) + age_grp, data = df)

fit <- survfit(model)

ggsurvplot(fit, data = df, fun = “cloglog”)

Fixing violations in the PH assumption

If the PH assumption has been violated for a covariate, we can fit a stratified Cox model where we stratify by the covariate that violates the PH assumption.

To stratify a Cox model by a specific variable, we use the strata() command from the survival R package.

model <- coxph(formula = Surv(fu_years, death) ~ strata(frailty) + age_grp, data = df)
summary(model)

You do not get HRs or p-values for each stratum. Instead, R assumes there is a separate baseline hazard for each stratum in order to estimate the proportional effects of the other variables across strata (which provides only one coefficient since the proportional effects are assumed to be constant, which is the no-interaction assumption [i.e., no interaction between other explanatory variables and the time-varying coefficient]). Since you do not get HR for levels of a variable you stratify on, do not use this method if you have non-proportionality in your main exposure of interest.

If you run the command above, you will get this output:

Call:
coxph(formula = Surv(fu_years, death) ~ strata(frailty) + age_grp,
data = df)

n= 4327, number of events= 1526

coef exp(coef) se(coef) z Pr(>|z|)
age_grp2 0.56309 1.75610 0.09646 5.838 5.3e-09 ***
age_grp3 0.97174 2.64253 0.09811 9.904 < 2e-16 ***
age_grp4 1.45530 4.28579 0.10124 14.374 < 2e-16 ***
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Notice that you do not get any output regarding the variable ‘frailty’ that you stratified on, although R indicates in the formula that you have used this variable in the strata. You would still interpret the HR of age_grp as having been adjusted for frailty, though. So you would say that the hazard of dying was 4.3 times greater in the 81-95 age group compared to the 60-70 age group, once it was adjusted for frailty (as stratifying on a variable is another way to adjust for its association).

C2.3b PRACTICAL: SPSS

Test the proportional hazards assumption

Fitting an interaction with time

Select

Analyze >> Survival >> Cox w/ Time-Dep Cov

This opens a window for you to define your time dependant covariate. The variable Time [T_] will be added to your list of potential variables. You need to use this, multiplied by your variable of interest (in his case age group) to define the time dependant covariate. When you have defined the expression, press ‘Model’.

The normal Cox Regression box will then open, with your new time dependant covariate indicated by T_COV in the left hand menu. Set up your analysis exactly as before, but add T_COV as another covariate.

Your output will look like the below.

When we look at the p-value for the coefficient in the T_COV row, we see that it is significant (p=0.022). This suggests there is a significant interaction with time, which indicates the HR for age group is not constant over time.

Inspecting log-log plots

To compute a log-log plot in SPSS click on ‘Plots’ within the Cox Regression window. Tick the box next to ‘log minus log’ and then move the variable you are interested in into the ‘Separate Lines for:’ box.

Your output will look like the below.

Fixing violations in the PH assumption

If the PH assumption has been violated for a covariate, the lecture mentioned two ways to deal with the situation. One way is to specify the model using the T_COV option for the offending variable, which models an interaction with time. The other way, which we are doing here, is to fit a stratified Cox model where we stratify by the time-varying covariate.

We do this by adding the variable (frailty in this example) to the ‘Strata’ box in the Cox Regression window rather than the ‘Covariate’ box.

You do not get HRs or p-values for each stratum. Instead, SPSS assumes there is a separate baseline hazard for each stratum in order to estimate the proportional effects of the other variables across strata (which provides only one coefficient since the proportional effects are assumed to be constant, which is the no-interaction assumption [i.e. no interaction between other explanatory variables and the time-varying coefficient]). Since you do not get HR for levels of a variable you stratify on, do not use this method if you have non-proportionality in your main exposure of interest.

You would still interpret the HR of age_grp as having been adjusted for frailty. So, you would say that the hazard of dying was 4.3 times greater in the 81-95 age group compared to the 60-70 age group, once it was adjusted for frailty (as stratifying on a variable is another way to adjust for its association).

3 Comments

Newest

Oldest Most Voted

Inline Feedbacks

View all comments

Audu

5 months ago

Good

Manuel Abraham

6 months ago

Hi, dear team, can you help me, please?

When I run this script, I always get an error

model7 <- coxph(
  formula = Surv(fu_years, death) ~ frailty + age_grp + tt(age_grp),
  data = df,
  tt = function(x, t, …) x * log(t + 1)
)

Error in tt(age_grp) : argument “t” is missing, with no default

Sayed Jalal

2 years ago

No presentation. Please add downloadable presentation

FoSSA: Fundamentals of Statistical Software & Analysis

Course Information

Participants 1693

C2.3 Cox Proportional Hazards Regression

Learning Outcomes

Unadjusted Cox regression

Multivariable Cox Regression

Unadjusted Cox regression

The coxph() command is used to run a Cox proportional hazards regression model in R. The simplified syntax of the coxph() command is the following:

Multivariable Cox Regression

Test the proportional hazards assumption

Fitting an interaction with time

Inspecting log-log plots

Fixing violations in the PH assumption

Test the proportional hazards assumption

Fitting an interaction with time

Inspecting log-log plots

Testing the residuals for a correlation with time

Fixing violations in the PH assumption

Live Bootcamps

Training & Consulting

Free Courses

Get In Touch

TERMS OF SERVICE

PRIVACY POLICY

Join Our Mailing List

FoSSA: Fundamentals of Statistical Software & Analysis

Course Information

Participants 1693

C2.3 Cox Proportional Hazards Regression

Learning Outcomes

Unadjusted Cox regression

Multivariable Cox Regression

Unadjusted Cox regression

The coxph() command is used to run a Cox proportional hazards regression model in R. The simplified syntax of the coxph() command is the following:

Multivariable Cox Regression

Test the proportional hazards assumption

Fitting an interaction with time

Inspecting log-log plots

Fixing violations in the PH assumption

Test the proportional hazards assumption

Fitting an interaction with time

Inspecting log-log plots

Testing the residuals for a correlation with time

Fixing violations in the PH assumption

Live Bootcamps

Training & Consulting

Free Courses

Get In Touch

TERMS OF SERVICE

PRIVACY POLICY

Join Our Mailing List

SELECT A COURSE START DATE