C2.4 Poisson Regression

C2.4 PRACTICAL: R

Poisson regression can be used to model the log(count of events) or the log(rate), since a rate is equivalent to the ‘count of events’ divided by a period of follow-up time. Here, we show a Poisson regression modelling the log(rate of disease) as the outcome.

In R, we can use the glm() command to fit a Poisson regression model as follows:

glm(formula, data, family = poisson(link = "log"))

The command glm() is the same command that was used to fit a logistic regression model in the previous module. The only difference is that in Poisson regression we use the family poisson(link = "log"), whereas in logistic regression we use the family binomial(link = "logit"). Note that we include offset(log(time)) in the formula to specify follow-up time; the follow-up time is logged because the model is fitted on the log scale. If you omit the offset, you will be modelling the log(count) rather than the log(rate).
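To illustrate what the offset does, here is a minimal sketch on simulated data (the data frame `sim`, the true rates, and all object names are assumptions for illustration, not the course dataset). With a single binary exposure, the model with the offset should reproduce the crude rate ratio (deaths per person-year), whereas the model without the offset reproduces the ratio of deaths per person.

```r
# Simulated cohort: hypothetical data, not the course dataset
set.seed(1)
n <- 2000
currsmoker  <- rbinom(n, 1, 0.3)
true_rate   <- 0.05 * ifelse(currsmoker == 1, 1.5, 1)  # deaths per person-year
event_time  <- rexp(n, rate = true_rate)               # time to death
censor_time <- runif(n, 1, 10)                         # administrative end of follow-up
sim <- data.frame(
  currsmoker = currsmoker,
  fu_years   = pmin(event_time, censor_time),          # observed follow-up
  death      = as.integer(event_time <= censor_time)   # 1 = died during follow-up
)

# Model for the log(rate): follow-up enters as an offset on the log scale
fit_rate <- glm(death ~ offset(log(fu_years)) + currsmoker,
                data = sim, family = poisson(link = "log"))
model_rr <- unname(exp(coef(fit_rate)["currsmoker"]))

# Crude rate ratio computed by hand: deaths / person-years in each group
deaths   <- tapply(sim$death, sim$currsmoker, sum)
pyears   <- tapply(sim$fu_years, sim$currsmoker, sum)
crude_rr <- unname((deaths["1"] / pyears["1"]) / (deaths["0"] / pyears["0"]))

# Model without the offset: log(count), ignoring follow-up time
fit_count   <- glm(death ~ currsmoker, data = sim, family = poisson(link = "log"))
count_ratio <- unname(exp(coef(fit_count)["currsmoker"]))
risk_ratio  <- mean(sim$death[sim$currsmoker == 1]) /
               mean(sim$death[sim$currsmoker == 0])

c(model_rr = model_rr, crude_rr = crude_rr,
  count_ratio = count_ratio, risk_ratio = risk_ratio)
```

In this sketch `model_rr` and `crude_rr` agree, as do `count_ratio` and `risk_ratio`, which shows that dropping the offset changes the quantity being estimated.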

To estimate the incidence rate ratio of death for current smokers compared to non-smokers, we run the following command:

> model <- glm(death ~ offset(log(fu_years)) + currsmoker, data = df, family = poisson(link = "log"))
> summary(model)

Call:
glm(formula = death ~ offset(log(fu_years)) + currsmoker, family = poisson(link = "log"),
    data = df)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.0083  -0.9267  -0.8854   0.9764   3.3032

Coefficients:
             Estimate Std. Error  z value Pr(>|z|)
(Intercept)  -2.98556    0.02770 -107.770   <2e-16 ***
currsmoker    0.16891    0.07318    2.308    0.021 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 5138.5  on 4320  degrees of freedom
Residual deviance: 5133.3  on 4319  degrees of freedom
  (6 observations deleted due to missingness)
AIC: 8179.3

Number of Fisher Scoring iterations: 6

The estimate in the row for ‘currsmoker’ is the log rate ratio of death for current smokers compared to non-smokers; exponentiating it gives the rate ratio, exp(0.16891) = 1.18. This output shows that the rate of death is 18% higher in current smokers compared to non-smokers (RR 1.18, 95% CI: 1.03-1.37). This association is statistically significant (p=0.02).
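The arithmetic behind the rate ratio and its confidence interval can be sketched as follows. The glm() coefficients are on the log scale, so the estimate and a Wald interval (estimate ± 1.96 standard errors) are exponentiated. The numbers are copied from the ‘currsmoker’ and ‘(Intercept)’ rows of the output above; the object names are illustrative.

```r
# Estimate and standard error from the currsmoker row of the summary() output
est <- 0.16891
se  <- 0.07318

z_crit <- qnorm(0.975)                        # about 1.96 for a 95% interval
rr     <- exp(est)                            # rate ratio
ci     <- exp(est + c(-1, 1) * z_crit * se)   # Wald 95% CI on the ratio scale

round(c(RR = rr, lower = ci[1], upper = ci[2]), 2)
# RR 1.18, lower 1.03, upper 1.37

# The intercept is the log rate in the reference group (non-smokers)
baseline_rate <- exp(-2.98556)                # deaths per person-year
round(baseline_rate, 4)
# 0.0505
```

If the fitted object is available, exp(coef(model)) and exp(confint.default(model)) should give the same Wald-based quantities directly.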

Assumptions of Poisson regression

Poisson regression requires that within an exposure group such as smokers or non-smokers, the rate of the event of interest (such as death) stays constant over time, a very strong assumption. A cohort study often involves follow-up over many years and it is unrealistic, because of changes in age, to assume that the rate stays unchanged over follow-up.
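One common way to relax this assumption is to split each person's follow-up into time bands, so that the rate may differ between bands while staying constant within each band. The following is a minimal base-R sketch of the splitting step on a hypothetical four-person data frame (`toy`, `split_one_band`, `band_years`, and `band_death` are illustrative names, and the 0-5 and 5-10 year bands are an assumption, not part of the course dataset).

```r
# Hypothetical toy cohort: follow-up in years and a 0/1 death indicator
toy <- data.frame(id         = 1:4,
                  fu_years   = c(3, 7, 10, 6),
                  death      = c(1, 0, 0, 1),
                  currsmoker = c(1, 0, 1, 0))

# Time bands: 0-5 and 5-10 years of follow-up
lower <- c(0, 5)
upper <- c(5, 10)

split_one_band <- function(lo, hi, data) {
  out <- data
  out$band <- paste0(lo, "-", hi)
  # Person-time contributed inside this band
  out$band_years <- pmax(0, pmin(data$fu_years, hi) - lo)
  # A death is counted only in the band in which follow-up ended
  out$band_death <- as.integer(data$death == 1 &
                               data$fu_years > lo & data$fu_years <= hi)
  # Drop people who contribute no time to this band
  out[out$band_years > 0, ]
}

# One row per person per band in which they were followed up
long <- do.call(rbind, Map(split_one_band, lower, upper,
                           MoreArgs = list(data = toy)))
long <- long[order(long$id, long$band), ]
long

# On a real dataset the split rows could then be modelled with, for example:
# glm(band_death ~ offset(log(band_years)) + currsmoker + band,
#     data = long, family = poisson(link = "log"))
```

Splitting preserves the total person-time and the total number of deaths; it only reallocates them to bands. The toy data are too small to fit a model, so the glm() call is left as a comment.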

Question C2.4: Use a Poisson regression to assess whether current smoking is associated with the rate of death, once adjusted for age group and frailty.

Answer

The command and output are:

> model <- glm(death ~ offset(log(fu_years)) + currsmoker + age_grp + frailty, data = df, family = poisson(link = "log"))
> summary(model)

Call:
glm(formula = death ~ offset(log(fu_years)) + currsmoker + age_grp +
    frailty, family = poisson(link = "log"), data = df)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.0348  -0.8082  -0.5892   0.7480   3.2916

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.35993    0.10691 -40.783  < 2e-16 ***
currsmoker   0.22259    0.07330   3.037  0.00239 **
age_grp2     0.54990    0.09650   5.698 1.21e-08 ***
age_grp3     0.94373    0.09820   9.610  < 2e-16 ***
age_grp4     1.38579    0.10113  13.703  < 2e-16 ***
frailty2     0.26967    0.10491   2.571  0.01015 *
frailty3     0.46882    0.10116   4.634 3.58e-06 ***
frailty4     0.76742    0.09329   8.226  < 2e-16 ***
frailty5     1.33908    0.08988  14.899  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 5138.5  on 4320  degrees of freedom
Residual deviance: 4352.0  on 4312  degrees of freedom
  (6 observations deleted due to missingness)
AIC: 7412

Number of Fisher Scoring iterations: 6

The rate of death is 25% higher in current smokers compared to non-smokers, once adjusted for age group and frailty (RR = exp(0.22259) = 1.25; Wald 95% CI from the estimate and standard error above: 1.08-1.44). This association is statistically significant (p=0.002).

C2.4 PRACTICAL: Stata

There are several Stata commands which allow the analysis of data with follow-up time and rates. These are broadly grouped into ‘stand-alone’ commands (e.g. ‘poisson’) and those which are used after the ‘stset’ command which tells Stata that you have ‘survival data’. You can run a Poisson regression using the ‘stset’ command, followed by the ‘streg’ command, but we are not covering that here. For more information, see the recommended readings at the end of this module.

Poisson regression can be used to model the log(count of events) or the log(rate), since a rate is equivalent to the ‘count of events’ divided by a period of follow-up time.

Here, we show a Poisson regression modelling the log(rate of disease) as the outcome. In Stata, we can use the ‘poisson’ command as follows:

poisson death currsmoker, e(fu_years) irr

The command poisson works similarly to other regression commands such as logistic and regress, but note we use the ‘e’ option (short for ‘exposure’) to specify follow-up time (see the help file for more details on this). If you do not use the ‘e’ option to specify the period of follow-up time, then you will be analysing a log(count) rather than a log(rate).

If we run the ‘poisson’ command above, we get the following output:

poisson death currsmoker, e(fu_years) irr

Iteration 0:   log likelihood = -4087.6737
Iteration 1:   log likelihood = -4087.6737

Poisson regression                              Number of obs =  4,321
                                                LR chi2(1)    =   5.12
                                                Prob > chi2   = 0.0237
Log likelihood = -4087.6737                     Pseudo R2     = 0.0006

------------------------------------------------------------------------------
       death |        IRR   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
  currsmoker |   1.184012   .0866404     2.31   0.021     1.025815    1.366605
       _cons |   .0505112   .0013993  -107.77   0.000     .0478417    .0533296
ln(fu_years) |          1  (exposure)
------------------------------------------------------------------------------
Note: _cons estimates baseline incidence rate.

The ‘IRR’ coefficient in the row for ‘currsmoker’ is the rate ratio of death for current smokers compared to non-smokers. This output shows that the rate of death is 18% higher in current smokers compared to non-smokers (RR 1.18, 95% CI: 1.03-1.37). This association is statistically significant (p=0.02).
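For readers comparing this with R output, the IRR and its confidence limits can be converted back to the log scale, on which R's glm() reports its coefficients. The sketch below uses only the numbers in the ‘currsmoker’ row above and assumes the interval is a Wald interval (symmetric on the log scale); the object names are illustrative.

```r
# Values copied from the currsmoker row of the Stata output
irr   <- 1.184012
lower <- 1.025815
upper <- 1.366605

log_irr <- log(irr)                                        # log rate ratio
se_log  <- (log(upper) - log(lower)) / (2 * qnorm(0.975))  # SE on the log scale
z       <- log_irr / se_log                                # Wald z statistic
p_value <- 2 * pnorm(-abs(z))                              # two-sided p-value

round(c(log_irr = log_irr, se_log = se_log, z = z, p = p_value), 3)
```

The recovered log rate ratio and standard error should be close to the 0.16891 and 0.07318 reported by summary() in R, with z ≈ 2.31 and p ≈ 0.021 as in the table above.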

The row ‘ln(fu_years)’ always shows a coefficient of 1, and this is correct (even though it looks odd on the output). Stata enters the log of the follow-up time into the model with its coefficient fixed at 1 rather than estimated, which is what turns a model for the log(count) into a model for the log(rate).
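Written out, the model fitted above has the following form. This is a sketch in standard notation, where the subscript i for individuals and the symbols β0 and β1 are introduced here for illustration.

```latex
\begin{aligned}
\log E[\text{death}_i] &= \beta_0 + \beta_1\,\text{currsmoker}_i + 1 \cdot \log(\text{fu\_years}_i) \\
\Longleftrightarrow \quad
\log\!\left(\frac{E[\text{death}_i]}{\text{fu\_years}_i}\right) &= \beta_0 + \beta_1\,\text{currsmoker}_i
\end{aligned}
```

Moving the log follow-up term to the left-hand side is only possible because its coefficient is exactly 1. With the ‘irr’ option, Stata reports exp(β1) in the ‘currsmoker’ row and exp(β0), the baseline rate, in the ‘_cons’ row.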

Assumptions of Poisson regression

Poisson regression requires that within an exposure group such as smokers or non-smokers, the rate of the event of interest (such as death) stays constant over time, a very strong assumption. A cohort study often involves follow-up over many years and it is unrealistic, because of changes in age, to assume that the rate stays unchanged over follow-up.

One way to deal with this assumption is to split follow-up time using the ‘stsplit’ command in Stata, and model the rate of death in separate time bands of follow-up. The rate of death for smokers vs non-smokers can vary between time bands (such as the first 5 years of follow-up, compared to follow-up over years 6-10), as long as the rate is approximately constant within each time band. This is outside the scope of this introductory module on Poisson regression, so please see the recommended readings at the end of the module.

Question C2.4: Use a Poisson regression to assess whether current smoking is associated with the rate of death, once adjusted for age group and frailty.

Answer

The command and output are:

poisson death currsmoker age_grp frailty, e(fu_years) irr

Iteration 0:   log likelihood = -3706.1288
Iteration 1:   log likelihood = -3706.1281
Iteration 2:   log likelihood = -3706.1281

Poisson regression                              Number of obs =  4,321
                                                LR chi2(3)    = 768.21
                                                Prob > chi2   = 0.0000
Log likelihood = -3706.1281                     Pseudo R2     = 0.0939

------------------------------------------------------------------------------
       death |        IRR   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
  currsmoker |   1.234536   .0903537     2.88   0.004     1.069562    1.424958
     age_grp |   1.550334   .0431322    15.76   0.000      1.46806    1.637218
     frailty |   1.403122   .0282948    16.80   0.000     1.348747    1.459689
       _cons |   .0056696   .0005372   -54.59   0.000     .0047086    .0068266
ln(fu_years) |          1  (exposure)
------------------------------------------------------------------------------
Note: _cons estimates baseline incidence rate.

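The rate of death is 23% higher in current smokers compared to non-smokers, once adjusted for age group and frailty (RR 1.23, 95% CI: 1.07-1.42). This association is statistically significant (p=0.004). Note that this command enters age_grp and frailty as single linear terms (one coefficient each), whereas the R output for the same question has a separate coefficient for each level, so the two adjusted estimates differ slightly; writing i.age_grp and i.frailty in the Stata command would treat them as categorical.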

C2.4 PRACTICAL: SPSS

SPSS does not have Poisson regression as an option within its survival analysis menu. A general Poisson regression can be conducted using the function Analyze >> Generalized Linear Models >> Generalized Linear Models and then ticking ‘Poisson Loglinear’ in the ‘Type of Model’ tab. This function does not allow the time component of the survival analysis to be entered, and will give very different results, so SPSS should not be used for Poisson regression on survival data.

FoSSA: Fundamentals of Statistical Software & Analysis
