## Learning Outcomes

By the end of this section, students will be able to:

- Understand the theory of proportional hazards models
- Calculate and interpret the results of a Cox proportional hazards regression model
- Test the assumptions underlying a Cox regression model

**Video C2.3a – Cox Model (8 minutes)**

## C2.3a PRACTICAL: Stata

**Unadjusted Cox regression**

The 'stcox' command is used to run a Cox proportional hazards regression model. You must use the 'stset' command before using stcox, so you do not need to specify the outcome variable or the time variable again. You only need to specify the explanatory variables that you are examining in your model.

stcox [varlist] [if] [in] [, options]

For example, for frailty, the command would be:

stcox i.frailty

The output looks like this:

Cox regression with Breslow method for ties

No. of subjects =  4,327                          Number of obs =  4,327
No. of failures =  1,526
Time at risk    = 29,476.9825
                                                  LR chi2(4)    = 577.22
Log likelihood  = -12128.151                      Prob > chi2   = 0.0000

------------------------------------------------------------------------------
          _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     frailty |
          2  |    1.45484   .1522779     3.58   0.000     1.185005    1.786119
          3  |   1.944776    .195232     6.63   0.000     1.597421    2.367664
          4  |   2.884121   .2643057    11.56   0.000     2.409949    3.451588
          5  |   5.740893   .5005441    20.04   0.000     4.839091    6.810752
------------------------------------------------------------------------------

Looking at the output, you can see that the hazard of dying is nearly 6 times greater for participants in frailty level 5 compared with frailty level 1 (HR 5.7, 95% CI: 4.8-6.8). This is statistically significant.
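As a quick check on the output above, the LR chi-squared statistic can be reproduced from the log-likelihoods Stata reports: it is twice the difference between the fitted model's log-likelihood and the null (no-covariate) log-likelihood, which appears as "Iteration 0" in the iteration logs shown later for this dataset. A short illustrative Python calculation (not part of the Stata practical):

```python
# Log-likelihoods taken from the Stata output: the null model
# (no covariates) and the model with i.frailty.
ll_null = -12416.762     # log likelihood of the empty model
ll_frailty = -12128.151  # log likelihood with i.frailty

# The likelihood ratio statistic is twice the difference in
# log-likelihoods; with 4 indicator terms for frailty it is compared
# to a chi-squared distribution on 4 degrees of freedom.
lr_chi2 = 2 * (ll_frailty - ll_null)
print(round(lr_chi2, 2))  # 577.22, matching the "LR chi2(4)" line
```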

__Question C2.3a.i:__ **What is the association of vitamin D (using the variable coded above and below the median) with the risk (hazard) of dying?**

**C2.3a.i. Answer**

stcox i.vitd_med

        Failure _d: death
  Analysis time _t: fu_years
       ID variable: whl1_id

Iteration 0:  log likelihood = -12416.762
Iteration 1:  log likelihood = -12356.639
Iteration 2:  log likelihood = -12356.629
Refining estimates:
Iteration 0:  log likelihood = -12356.629

Cox regression with Breslow method for ties

No. of subjects =  4,327                          Number of obs =  4,327
No. of failures =  1,526
Time at risk    = 29,476.9825
                                                  LR chi2(1)    = 120.27
Log likelihood  = -12356.629                      Prob > chi2   = 0.0000

------------------------------------------------------------------------------
          _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
  2.vitd_med |    .566297   .0298226   -10.80   0.000     .5107612    .6278712
------------------------------------------------------------------------------

Participants with vitamin D concentrations above the median have a 43% lower hazard (or risk) of dying, and this is statistically significant (HR 0.57, 95% CI: 0.51-0.63).
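The "43% lower" figure comes directly from the hazard ratio: an HR below 1 is often reported as a percentage reduction in the hazard, (1 − HR) × 100. A small illustrative Python calculation (not part of the Stata practical):

```python
# Hazard ratio for the above-median vitamin D group, from the output above.
hr = 0.566297

# An HR below 1 can be reported as a percentage reduction in the hazard.
pct_lower = (1 - hr) * 100
print(round(pct_lower))  # 43, i.e. a ~43% lower hazard of dying
```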

**Multivariable Cox Regression**

We can adjust for confounders in a Cox regression as we do in any other regression model. For example, if we want to examine the independent association of vitamin D with the hazard of mortality, adjusted for age group, we type:

stcox vitd_med i.age_grp

The output should look like this:

        Failure _d: death
  Analysis time _t: fu_years
       ID variable: whl1_id

Iteration 0:  log likelihood = -12416.762
Iteration 1:  log likelihood = -12186.532
Iteration 2:  log likelihood = -12137.041
Iteration 3:  log likelihood = -12136.832
Iteration 4:  log likelihood = -12136.832
Refining estimates:
Iteration 0:  log likelihood = -12136.832

Cox regression with Breslow method for ties

No. of subjects =  4,327                          Number of obs =  4,327
No. of failures =  1,526
Time at risk    = 29,476.9825
                                                  LR chi2(4)    = 559.86
Log likelihood  = -12136.832                      Prob > chi2   = 0.0000

------------------------------------------------------------------------------
          _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    vitd_med |   .6728851    .035984    -7.41   0.000      .605928    .7472412
             |
     age_grp |
      71-75  |   1.860847   .1792291     6.45   0.000     1.540729    2.247475
      76-80  |   3.128741   .3046026    11.72   0.000     2.585233    3.786513
      81-95  |   5.800825   .5781615    17.64   0.000     4.771463    7.052256
------------------------------------------------------------------------------

We can see that once the association of vitamin D has been adjusted for age group, the hazard ratio weakens from 0.57 (the unadjusted HR from section C2.3a.i) to 0.67. There is still a statistically significant association (p<0.001), but it is weaker (i.e. closer to the null value of HR=1) because we have controlled for age, an important confounder.

__Question C2.3a.ii:__ **What is the impact of frailty on the hazard of dying, once adjusted for age_grp?**

**C2.3a.ii. Answer**

stcox i.frailty i.age_grp

        Failure _d: death
  Analysis time _t: fu_years
       ID variable: whl1_id

Iteration 0:  log likelihood = -12416.762
Iteration 1:  log likelihood = -12076.963
Iteration 2:  log likelihood = -11986.28
Iteration 3:  log likelihood = -11985.881
Iteration 4:  log likelihood = -11985.881
Refining estimates:
Iteration 0:  log likelihood = -11985.881

Cox regression with Breslow method for ties

No. of subjects =  4,327                          Number of obs =  4,327
No. of failures =  1,526
Time at risk    = 29,476.9825
                                                  LR chi2(7)    = 861.76
Log likelihood  = -11985.881                      Prob > chi2   = 0.0000

------------------------------------------------------------------------------
          _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     frailty |
          2  |   1.321821   .1386221     2.66   0.008      1.07623    1.623453
          3  |   1.612825   .1631257     4.73   0.000       1.3228    1.966437
          4  |   2.226646   .2074741     8.59   0.000     1.854975    2.672787
          5  |   4.125627   .3706378    15.78   0.000     3.459551    4.919944
             |
     age_grp |
      71-75  |   1.758254   .1695919     5.85   0.000     1.455389    2.124144
      76-80  |   2.645494   .2594916     9.92   0.000       2.1828    3.206266
      81-95  |    4.31949   .4368414    14.47   0.000     3.542813    5.266435
------------------------------------------------------------------------------

Participants in frailty level 5 have a hazard of dying that is over 4 times greater than those in frailty level 1, once adjusted for age group (HR 4.1, 95% CI: 3.5-4.9, p<0.001). The association for frailty level 5 weakened after adjustment for age group (compare it with the unadjusted model above).

Note: In these adjusted Cox regression model examples, adjusting for confounders weakened the associations. However, remember that confounding can make associations appear spuriously weaker or stronger. Therefore, sometimes when you adjust for additional variables, your exposure's association may get stronger.

## C2.3a PRACTICAL: R

**Unadjusted Cox regression**

The coxph() command is used to run a Cox proportional hazards regression model in R. The simplified syntax of the coxph() command is the following:

coxph(formula, data)

The formula used in the coxph() command needs a survival object as the dependent variable, which is created using the Surv() command. The syntax of the Surv() command is the following:

Surv(time, event)

where time is the follow-up time per individual and event is the disease/outcome status per individual.

For example, we can examine the effect of frailty on the risk of death. Frailty is recorded on a numeric scale, but it is better treated as a categorical variable:

df$frailty <- as.factor(df$frailty)

model <- coxph(Surv(fu_years, death) ~ frailty, data = df)

summary(model)

The output looks like this:

> df$frailty <- as.factor(df$frailty)
> model <- coxph(formula = Surv(fu_years, death) ~ frailty, data = df)
> summary(model)
Call:
coxph(formula = Surv(fu_years, death) ~ frailty, data = df)

  n= 4327, number of events= 1526

            coef exp(coef) se(coef)      z Pr(>|z|)
frailty2 0.37490   1.45484  0.10467  3.582 0.000341 ***
frailty3 0.66515   1.94478  0.10039  6.626 3.45e-11 ***
frailty4 1.05922   2.88412  0.09164 11.558  < 2e-16 ***
frailty5 1.74762   5.74090  0.08719 20.044  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

         exp(coef) exp(-coef) lower .95 upper .95
frailty2     1.455     0.6874     1.185     1.786
frailty3     1.945     0.5142     1.597     2.368
frailty4     2.884     0.3467     2.410     3.452
frailty5     5.741     0.1742     4.839     6.811

Concordance= 0.671  (se = 0.007)
Likelihood ratio test= 577.2  on 4 df,   p=<2e-16
Wald test            = 576.4  on 4 df,   p=<2e-16
Score (logrank) test = 667.9  on 4 df,   p=<2e-16

Looking at the output, you can see that the hazard of dying is nearly 6 times greater for participants in frailty level 5 compared with frailty level 1 (HR 5.7, 95% CI: 4.8-6.8). This is statistically significant.
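The coef and exp(coef) columns are linked by exponentiation: summary() reports the log hazard ratio and its standard error, and the HR with its 95% CI is obtained as exp(coef ± 1.96 × se). An illustrative Python calculation (not part of the R practical) using the frailty level 5 row above:

```python
import math

# Log hazard ratio and standard error for frailty level 5,
# taken from the coxph() output above.
coef, se = 1.74762, 0.08719

# Exponentiate to get the hazard ratio and its 95% confidence interval.
hr = math.exp(coef)
ci_low = math.exp(coef - 1.96 * se)
ci_high = math.exp(coef + 1.96 * se)
print(round(hr, 2), round(ci_low, 2), round(ci_high, 2))  # 5.74 4.84 6.81
```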

__Question C2.3a.i:__ **What is the association of vitamin D (using the variable coded above and below the median) with the risk (hazard) of dying?**

**Answer**

> model <- coxph(formula = Surv(fu_years, death) ~ vitd_med, data = df)
> summary(model)
Call:
coxph(formula = Surv(fu_years, death) ~ vitd_med, data = df)

  n= 4327, number of events= 1526

             coef exp(coef) se(coef)     z Pr(>|z|)
vitd_med -0.56864   0.56630  0.05266 -10.8   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

         exp(coef) exp(-coef) lower .95 upper .95
vitd_med    0.5663      1.766    0.5108    0.6279

Concordance= 0.57  (se = 0.006)
Likelihood ratio test= 120.3  on 1 df,   p=<2e-16
Wald test            = 116.6  on 1 df,   p=<2e-16
Score (logrank) test = 119.8  on 1 df,   p=<2e-16

Participants with vitamin D concentrations above the median have a 43% lower hazard (or risk) of dying, and this is statistically significant (HR 0.57, 95% CI: 0.51-0.63).
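The z column in the output above is the Wald statistic: the coefficient divided by its standard error. Reproducing it for vitd_med in Python (illustrative only, not part of the R practical):

```python
# Log hazard ratio and standard error for vitd_med, from the output above.
coef, se = -0.56864, 0.05266

# Wald z statistic: coefficient divided by its standard error.
z = coef / se
print(round(z, 1))  # -10.8, as reported by summary()
```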

**Multivariable Cox Regression**

We can adjust for confounders in a Cox regression as we do in any other regression model. For example, if we want to examine the independent association of vitamin D with the hazard of mortality, adjusted for age group, we type:

df$age_grp <- as.factor(df$age_grp)

model <- coxph(formula = Surv(fu_years, death) ~ vitd_med + age_grp, data = df)

summary(model)

Note: Age is recorded as a categorical variable. For this reason, we should define the variable age_grp as a factor (and not as numeric). We do that using the as.factor() command.

The output should look like this:

Call:
coxph(formula = Surv(fu_years, death) ~ vitd_med + age_grp, data = df)

  n= 4327, number of events= 1526

             coef exp(coef) se(coef)      z Pr(>|z|)
vitd_med -0.39618   0.67288  0.05348 -7.408 1.28e-13 ***
age_grp2  0.62103   1.86085  0.09632  6.448 1.13e-10 ***
age_grp3  1.14063   3.12874  0.09736 11.716  < 2e-16 ***
age_grp4  1.75800   5.80083  0.09967 17.638  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

         exp(coef) exp(-coef) lower .95 upper .95
vitd_med    0.6729     1.4861    0.6059    0.7472
age_grp2    1.8608     0.5374    1.5407    2.2475
age_grp3    3.1287     0.3196    2.5852    3.7865
age_grp4    5.8008     0.1724    4.7715    7.0523

Concordance= 0.664  (se = 0.007)
Likelihood ratio test= 559.9  on 4 df,   p=<2e-16
Wald test            = 560.2  on 4 df,   p=<2e-16
Score (logrank) test = 638.8  on 4 df,   p=<2e-16

We can see that once the association of vitamin D has been adjusted for age group, the hazard ratio weakens from 0.57 (the unadjusted HR from section C2.3a.i) to 0.67. There is still a statistically significant association (p<0.001), but it is weaker (i.e. closer to the null value of HR=1) because we have controlled for age, an important confounder.

__Question C2.3a.ii:__ **What is the impact of frailty on the hazard of dying, once adjusted for age_grp?**

**Answer**

> model <- coxph(formula = Surv(fu_years, death) ~ frailty + age_grp, data = df)
> summary(model)
Call:
coxph(formula = Surv(fu_years, death) ~ frailty + age_grp, data = df)

  n= 4327, number of events= 1526

            coef exp(coef) se(coef)      z Pr(>|z|)
frailty2 0.27901   1.32182  0.10487  2.660   0.0078 **
frailty3 0.47799   1.61282  0.10114  4.726 2.29e-06 ***
frailty4 0.80050   2.22665  0.09318  8.591  < 2e-16 ***
frailty5 1.41722   4.12563  0.08984 15.775  < 2e-16 ***
age_grp2 0.56432   1.75825  0.09645  5.851 4.90e-09 ***
age_grp3 0.97286   2.64549  0.09809  9.918  < 2e-16 ***
age_grp4 1.46314   4.31950  0.10113 14.468  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

         exp(coef) exp(-coef) lower .95 upper .95
frailty2     1.322     0.7565     1.076     1.623
frailty3     1.613     0.6200     1.323     1.966
frailty4     2.227     0.4491     1.855     2.673
frailty5     4.126     0.2424     3.460     4.920
age_grp2     1.758     0.5687     1.455     2.124
age_grp3     2.645     0.3780     2.183     3.206
age_grp4     4.319     0.2315     3.543     5.266

Concordance= 0.709  (se = 0.006)
Likelihood ratio test= 861.8  on 7 df,   p=<2e-16
Wald test            = 862.2  on 7 df,   p=<2e-16
Score (logrank) test = 1018  on 7 df,   p=<2e-16

Participants in frailty level 5 have a hazard of dying that is over 4 times greater than those in frailty level 1, once adjusted for age group (HR 4.1, 95% CI: 3.5-4.9, p<0.001). The association for frailty level 5 weakened after adjustment for age group (compare it with the unadjusted model above).

Note: In these adjusted Cox regression model examples, adjusting for confounders weakened the associations. However, remember that confounding can make associations appear spuriously weaker or stronger. Therefore, sometimes when you adjust for additional variables, your exposure's association may get stronger.

## C2.3a PRACTICAL: SPSS

**Unadjusted Cox regression**

To conduct the Cox regression in SPSS, select:

Analyze >> Survival >> Cox Regression

Place the time variable (fu_years) in the 'Time' box and the death variable in the 'Status' box, and define the event just as you did with the Kaplan-Meier analysis.

In this example we are going to add frailty as an explanatory variable in the model. Add frailty to the 'Covariates' box, then click on 'Categorical' to define the frailty variable as categorical, just as before.

Click on 'Options' and tick the box next to 'CI for exp(B)' to show the confidence intervals for the hazard ratios in the output. Keep the default value of 95%. Press 'Continue' and then 'OK' to run the analysis.

The output table you are interested in will look like the below.

Looking at the output, you can see that the hazard of dying is nearly 6 times greater for participants in frailty level 5 compared with frailty level 1 (HR 5.7, 95% CI:4.8-6.8). This is statistically significant (P<0.001).

**Question C2.3a.i: What is the association of vitamin D (using the variable coded above and below the median) with the risk (hazard) of dying?**

**Answer**

The output table will look like the below.

Participants with vitamin D concentrations above the median have a 44% lower hazard (or risk) of dying, and this is statistically significant (HR 0.56, 95% CI: 0.50-0.62).


**Multivariable Cox Regression**

We can adjust for confounders in a Cox regression as we do in any other regression model. For example, if we want to examine the independent association of vitamin D with the hazard of mortality, adjusted for age group, we can just add age group into the analysis as another categorical covariate.

The output looks like the below.

We can see that once the association of vitamin D has been adjusted for age group, the hazard ratio weakens from 0.56 (the unadjusted HR from section C2.3a.i) to 0.67. There is still a statistically significant association (p<0.001), but it is weaker (i.e. closer to the null value of HR=1) because we have controlled for age, an important confounder.

**Question C2.3a.ii: What is the impact of frailty on the hazard of dying, once adjusted for age_grp?**

**Answer**

Participants in frailty level 5 have a hazard of dying that is over 4 times greater than those in frailty level 1, once adjusted for age group (HR 4.1, 95% CI: 3.5-4.9, p<0.001). The association for frailty level 5 weakened after adjustment for age group (compare it with the unadjusted model above).

Note: In these adjusted Cox regression model examples, adjusting for confounders weakened the associations. However, remember that confounding can make associations appear spuriously weaker or stronger. Therefore, sometimes when you adjust for additional variables, your exposure's association may get stronger.

**Video C2.3b – Proportional Hazards Assumption (8 minutes)**

## C2.3b PRACTICAL: Stata

**Test the proportional hazards assumption**

There are many similarities between Cox regression and linear and logistic regression. We have seen that you can adjust for additional variables in the same way. You can also fit interactions using the same commands. However, the one key difference between a Cox regression and other regression models is the *proportional hazards assumption.*

The proportional hazards assumption states that the hazard ratio (comparing levels of an exposure variable) is constant over time. This assumption needs to be tested. The lecture explained 3 ways to do this: fitting an interaction with time, inspecting log-log plots, and testing the (Schoenfeld) residuals for a correlation with time. In practice, choosing just one method is generally sufficient.

**Fitting an interaction with time**

If the PH assumption is violated, it means that the hazard ratio is not constant over time, i.e. there is an interaction between a covariate and time. Thus, we can fit a Cox model with a time interaction in it and see if this is significant. We can also use a likelihood ratio (LR) test to see if a model with a time-varying covariate fits better than one without. The 'tvc' and 'texp' options of the stcox command are used for this.

For example, to test age the command would be:

stcox i.age_grp frailty, tvc(age_grp)

------------------------------------------------------------------------------
          _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
main         |
     age_grp |
      71-75  |   1.534684   .1700356     3.87   0.000     1.235122      1.9069
      76-80  |   2.028394   .2977641     4.82   0.000      1.52124    2.704624
      81-95  |   3.009572   .5683724     5.83   0.000     2.078505     4.35771
             |
     frailty |   1.434477   .0290646    17.81   0.000     1.378627    1.492588
-------------+----------------------------------------------------------------
tvc          |
     age_grp |   1.026909    .012475     2.19   0.029     1.002747    1.051653
------------------------------------------------------------------------------
Note: Variables in tvc equation interacted with _t.

When we look at the p-value for the coefficient in the 'tvc' row, we see that it is significant (p=0.03). This suggests there is a significant interaction with time, which indicates the HR for age group is not constant over time.
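With stcox's default texp(_t), the tvc coefficient multiplies analysis time, so the time-varying part of the hazard ratio compounds geometrically with follow-up: the multiplier at time t is exp(b_tvc)^t. A rough Python illustration (not part of the Stata practical; the follow-up times chosen are arbitrary) of how the 1.027 estimate above compounds:

```python
# tvc row of the Stata output above: per-year multiplier applied to the
# age_grp hazard ratio under the default texp(_t) specification.
hr_tvc = 1.026909

# The time-varying multiplier at analysis time t is hr_tvc ** t.
for t in (0, 2, 5, 8):
    print(t, round(hr_tvc ** t, 3))  # grows steadily with follow-up time
```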

**Inspecting log-log plots**

To compute a log-log plot, the 'stphplot' command is used; it can be adjusted for confounders.

A log-log plot for age group is computed using the command below:

stphplot, by(age_grp) adjust(frailty)

When we inspect a log-log plot, we want to see that the curves in the main portion of the plot (i.e. not the tails of the plot, where there are few events) are generally parallel over time. If the curves touch or cross, this indicates there is a violation of the PH assumption.

The curves for each level of age group above appear to be generally parallel, with similar slopes over time. This looks acceptable and does not indicate a meaningful violation of the PH assumption. This may contradict the results of the 'tvc' command above, as statistical tests will often flag small deviations when you have a lot of power (e.g. a large dataset). For this reason, we often prefer to visually inspect the data for obvious violations of the PH assumption.
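The reason parallel curves indicate proportional hazards can be seen algebraically: if the hazards are proportional with ratio HR, then S2(t) = S1(t)^HR, so log(−log S2(t)) differs from log(−log S1(t)) by the constant log(HR) at every time point, i.e. a pure vertical shift. A small Python sketch (with made-up survival probabilities, not data from this practical):

```python
import math

# If hazards are proportional with ratio HR, then S2(t) = S1(t) ** HR,
# so log(-log S2(t)) = log(HR) + log(-log S1(t)): a constant vertical
# shift, which is why the log-log curves should look parallel.
hr = 2.0
s1 = [0.95, 0.85, 0.70, 0.50]   # survival in group 1 (illustrative values)
s2 = [s ** hr for s in s1]      # group 2 under exact proportional hazards

# Vertical distance between the two log-log curves at each time point.
shifts = [math.log(-math.log(b)) - math.log(-math.log(a)) for a, b in zip(s1, s2)]
print([round(d, 6) for d in shifts])  # every shift equals log(HR) = 0.693...
```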

__Question C2.3b:__ **Interpret a log-log plot for frailty (adjusted for age group) and assess if it violates the PH assumption.**

**Answer**

stphplot, by(frailty) adjust(age_grp)

In this log-log plot, we see the curves for the different levels of frailty (levels 1-4) criss-cross and touch each other at several points over time. This is a clearer violation of the PH assumption.

**Fixing violations in the PH assumption**

If the PH assumption has been violated for a covariate, the lecture mentioned 2 ways to deal with the situation. One way is to specify the model using the 'tvc' option for the offending variable, which models an interaction with time. The other way, which we are doing here, is to fit a stratified Cox model where we stratify by the time-varying covariate.

We do this by using the strata option of the stcox command.

stcox i.age_grp, strata(frailty)

You do not get HRs or p-values for each stratum. Instead, Stata assumes a separate baseline hazard for each stratum in order to estimate the proportional effects of the other variables across strata. This provides only one coefficient per variable, because the proportional effects are assumed to be constant across strata (the no-interaction assumption, i.e. no interaction between the other explanatory variables and the stratifying variable). Since you do not get HRs for the levels of a variable you stratify on, do not use this method if there is non-proportionality in your main exposure of interest.

If you run the command above, you will get this output:

Stratified Cox regression with Breslow method for ties
Strata variable: frailty

No. of subjects =  4,327                          Number of obs =  4,327
No. of failures =  1,526
Time at risk    = 29,476.9825
                                                  LR chi2(3)    = 280.76
Log likelihood  = -9708.127                       Prob > chi2   = 0.0000

------------------------------------------------------------------------------
          _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     age_grp |
      71-75  |   1.756097   .1693938     5.84   0.000     1.453588    2.121562
      76-80  |   2.642535   .2592631     9.90   0.000     2.180259    3.202827
      81-95  |    4.28578   .4339019    14.37   0.000     3.514409    5.226457
------------------------------------------------------------------------------

Notice that you do not get any output regarding the variable 'frailty' that you stratified on, although Stata indicates at the top of the output that you have used this variable in the strata. You would still interpret the HR of age_grp as having been adjusted for frailty, though. So you would say that the hazard of dying was 4.3 times greater in the 81-95 age group compared to the 60-70 age group, once it was adjusted for frailty (as stratifying on a variable is another way to adjust for its association).

## C2.3b PRACTICAL: R

**Test the proportional hazards assumption**

There are many similarities between Cox regression and linear and logistic regression. We have seen that you can adjust for additional variables in the same way. You can also fit interactions using the same commands. However, the one key difference between a Cox regression and other regression models is the *proportional hazards assumption.*

The proportional hazards assumption states that the hazard ratio (comparing levels of an exposure variable) is constant over time. This assumption needs to be tested. The lecture explained 3 ways to do this: fitting an interaction with time, inspecting log-log plots, and testing the (Schoenfeld) residuals for a correlation with time. In practice, choosing just one method is generally sufficient.

**Fitting an interaction with time**

If the PH assumption is violated, it means that the hazard ratio is not constant over time, i.e. there is an interaction between a covariate and time. Thus, we can fit a Cox model with a time interaction in it and see if this is significant. We can also use a likelihood ratio (LR) test to see if a model with a time-varying covariate fits better than one without. In R, the interaction of a covariate with time can be fitted by using the time-transform functionality tt() in the coxph() command.

model <- coxph(formula = Surv(fu_years, death) ~ frailty + age_grp + tt(age_grp), data = df)

summary(model)

The output looks like this:

Call:
coxph(formula = Surv(fu_years, death) ~ frailty + age_grp + tt(age_grp),
    data = df)

  n= 4327, number of events= 1526

                coef exp(coef) se(coef)      z Pr(>|z|)
frailty2     0.27705   1.31923  0.10488  2.642  0.00825 **
frailty3     0.47470   1.60754  0.10115  4.693 2.69e-06 ***
frailty4     0.79874   2.22273  0.09317  8.573  < 2e-16 ***
frailty5     1.41966   4.13570  0.08975 15.817  < 2e-16 ***
age_grp2     0.53277   1.70365  0.09704  5.490 4.01e-08 ***
age_grp3     0.82084   2.27241  0.11049  7.429 1.09e-13 ***
age_grp4     0.68145   1.97674  0.28382  2.401  0.01635 *
tt(age_grp)  0.04261   1.04353  0.01434  2.971  0.00297 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

            exp(coef) exp(-coef) lower .95 upper .95
frailty2        1.319     0.7580     1.074     1.620
frailty3        1.608     0.6221     1.318     1.960
frailty4        2.223     0.4499     1.852     2.668
frailty5        4.136     0.2418     3.469     4.931
age_grp2        1.704     0.5870     1.409     2.061
age_grp3        2.272     0.4401     1.830     2.822
age_grp4        1.977     0.5059     1.133     3.448
tt(age_grp)     1.044     0.9583     1.015     1.073

Concordance= 0.71  (se = 0.007)
Likelihood ratio test= 870.4  on 8 df,   p=<2e-16
Wald test            = 880.1  on 8 df,   p=<2e-16
Score (logrank) test = 1045  on 8 df,   p=<2e-16

When we look at the p-value for the coefficient in the tt(age_grp) row, we see that it is significant (p=0.003). This suggests there is a significant interaction with time, which indicates the HR for age group is not constant over time.

**Inspecting log-log plots**

To compute a log-log plot, we should compute the survival curves for our variable using the survfit() command and then we can print the log-log plot using the ggsurvplot() command.

A log-log plot for age group is computed using the command below:

model <- coxph(Surv(fu_years, death) ~ frailty + strata(age_grp), data = df)

fit <- survfit(model)

ggsurvplot(fit, data = df, fun = "cloglog")

When we inspect a log-log plot, we want to see that the curves in the main portion of the plot (i.e. not the tails of the plot, where there are few events) are generally parallel over time. If the curves touch or cross, this indicates there is a violation of the PH assumption.

The curves for each level of age group above appear to be generally parallel, with similar slopes over time. This looks acceptable and does not indicate a meaningful violation of the PH assumption. This may contradict the results of the tt() command above, as statistical tests will often flag small deviations when you have a lot of power (e.g. a large dataset). For this reason, we often prefer to visually inspect the data for obvious violations of the PH assumption.

**Testing the residuals for a correlation with time**

The PH assumption can also be assessed using a statistical test based on the scaled Schoenfeld residuals. In principle, Schoenfeld residuals are independent of time. If a plot shows that the Schoenfeld residuals follow a non-random pattern against time, this is evidence of a violation of the PH assumption.

We can test the proportional hazards assumption for each covariate in a Cox model using the cox.zph() command.

In a Cox model including age_grp and frailty as covariates, we can examine whether any of these variables violate the PH assumption by running the following command:

model <- coxph(Surv(fu_years, death) ~ frailty + age_grp, data = df)

cox.zph(model)

The output looks like this:

        chisq df       p
frailty  9.51  4 0.04945
age_grp 11.47  3 0.00943
GLOBAL  26.33  7 0.00044

The cox.zph() output has two parts. The first is a chi-squared test and p-value for a violation of the PH assumption by each variable included in the Cox model. The second is a global chi-squared test and p-value for whether any variable in the model violates the PH assumption.

We observe that both frailty and age_grp violate the PH assumption (both p<0.05).
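The p-values in this table come from comparing each chi-squared statistic to a chi-squared distribution on the listed degrees of freedom (cox.zph() does this internally). For an even number of degrees of freedom the chi-squared survival function has a closed form, so the frailty row can be reproduced in plain Python as an illustration:

```python
import math

# Closed-form chi-squared survival function for even degrees of freedom 2k:
# P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)**j / j!
def chi2_sf_even(x, df):
    k = df // 2
    return math.exp(-x / 2) * sum((x / 2) ** j / math.factorial(j) for j in range(k))

# Frailty row of the cox.zph() output above: chisq = 9.51 on 4 df.
p_frailty = chi2_sf_even(9.51, 4)
print(round(p_frailty, 4))  # ~0.0495, matching the printed 0.04945 up to rounding
```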

__Question C2.3b:__ **Interpret a log-log plot for frailty (adjusted for age group) and assess if it violates the PH assumption.**

**Answer**

model <- coxph(Surv(fu_years, death) ~ strata(frailty) + age_grp, data = df)

fit <- survfit(model)

ggsurvplot(fit, data = df, fun = "cloglog")

In this log-log plot, we see the curves for the different levels of frailty (levels 1-4) criss-cross and touch each other at several points over time. This is a clearer violation of the PH assumption.

**Fixing violations in the PH assumption**

If the PH assumption has been violated for a covariate, we can fit a stratified Cox model where we stratify by the covariate that violates the PH assumption.

To stratify a Cox model by a specific variable, we use the strata() command from the survival R package.

model <- coxph(formula = Surv(fu_years, death) ~ strata(frailty) + age_grp, data = df)

summary(model)

You do not get HRs or p-values for each stratum. Instead, R assumes there is a separate baseline hazard for each stratum in order to estimate the proportional effects of the other variables across strata (which provides only one coefficient since the proportional effects are assumed to be constant, which is the no-interaction assumption [i.e., no interaction between other explanatory variables and the time-varying coefficient]). Since you do not get HR for levels of a variable you stratify on, do not use this method if you have non-proportionality in your main exposure of interest.

If you run the command above, you will get this output:

Call:
coxph(formula = Surv(fu_years, death) ~ strata(frailty) + age_grp,
    data = df)

  n= 4327, number of events= 1526

            coef exp(coef) se(coef)      z Pr(>|z|)
age_grp2 0.56309   1.75610  0.09646  5.838  5.3e-09 ***
age_grp3 0.97174   2.64253  0.09811  9.904  < 2e-16 ***
age_grp4 1.45530   4.28579  0.10124 14.374  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Notice that you do not get any output regarding the variable 'frailty' that you stratified on, although R indicates in the formula that you have used this variable in the strata. You would still interpret the HR of age_grp as having been adjusted for frailty, though. So you would say that the hazard of dying was 4.3 times greater in the 81-95 age group compared to the 60-70 age group, once it was adjusted for frailty (as stratifying on a variable is another way to adjust for its association).

## C2.3b PRACTICAL: SPSS

**Test the proportional hazards assumption**

There are many similarities between Cox regression and linear and logistic regression. We have seen that you can adjust for additional variables in the same way. You can also fit interactions using the same commands. However, the one key difference between a Cox regression and other regression models is the proportional hazards assumption.

The proportional hazards assumption states that the hazard ratio (comparing levels of an exposure variable) is constant over time. This assumption needs to be tested. The lecture explained 3 ways to do this: fitting an interaction with time, inspecting log-log plots, and testing the (Schoenfeld) residuals for a correlation with time. In practice, choosing just one method is generally sufficient.

**Fitting an interaction with time**

If the PH assumption is violated, it means that the hazard ratio is not constant over time, i.e. there is an interaction between a covariate and time. Thus, we can fit a Cox model with a time interaction in it and see if this is significant. We can also use a likelihood ratio (LR) test to see if a model with a time-varying covariate fits better than one without.

Select

Analyze >> Survival >> Cox w/ Time-Dep Cov

This opens a window for you to define your time-dependent covariate. The variable Time [T_] will be added to your list of potential variables. You need to use this, multiplied by your variable of interest (in this case age group), to define the time-dependent covariate. When you have defined the expression, press 'Model'.

The normal Cox Regression box will then open, with your new time-dependent covariate indicated by T_COV in the left-hand menu. Set up your analysis exactly as before, but add T_COV as another covariate.

Your output will look like the below.

When we look at the p-value for the coefficient in the T_COV row, we see that it is significant (p=0.022). This suggests there is a significant interaction with time, which indicates the HR for age group is not constant over time.

**Inspecting log-log plots**

To compute a log-log plot in SPSS, click on 'Plots' within the Cox Regression window. Tick the box next to 'log minus log' and then move the variable you are interested in into the 'Separate Lines for:' box.

Your output will look like the below.

When we inspect a log-log plot, we want to see that the curves in the main portion of the plot (i.e. not the tails of the plot, where there are few events) are generally parallel over time. If the curves touch or cross, this indicates there is a violation of the PH assumption.

**Fixing violations in the PH assumption**

If the PH assumption has been violated for a covariate, the lecture mentioned two ways to deal with the situation. One way is to specify the model using the T_COV option for the offending variable, which models an interaction with time. The other way, which we are doing here, is to fit a stratified Cox model where we stratify by the time-varying covariate.

We do this by adding the variable (frailty in this example) to the 'Strata' box in the Cox Regression window rather than the 'Covariates' box.

You do not get HRs or p-values for each stratum. Instead, SPSS assumes there is a separate baseline hazard for each stratum in order to estimate the proportional effects of the other variables across strata (which provides only one coefficient since the proportional effects are assumed to be constant, which is the no-interaction assumption [i.e. no interaction between other explanatory variables and the time-varying coefficient]). Since you do not get HR for levels of a variable you stratify on, do not use this method if you have non-proportionality in your main exposure of interest.

You would still interpret the HR of age_grp as having been adjusted for frailty. So, you would say that the hazard of dying was 4.3 times greater in the 81-95 age group compared to the 60-70 age group, once it was adjusted for frailty (as stratifying on a variable is another way to adjust for its association).
