Learning Outcomes
By the end of this section, students will be able to:
- Calculate and interpret a logistic regression for a binary exposure variable
- Calculate and interpret a logistic regression for a categorical exposure variable
- Calculate and interpret a logistic regression for a continuous exposure variable
You can download a copy of the slides here: Video C1.3a
Video C1.3a – Logistic Regression (9 minutes)
You can download a copy of the slides here: Video C1.3b
Video C1.3b – Logistic Regression for Continuous & Categorical Variables (8 minutes)
C1.3 PRACTICAL: Stata
Logistic regression with a binary exposure
To run a logistic regression in Stata, we use the ‘logistic’ command (the ‘logit’ command reports the coefficients as log odds, but in practice we are usually interested only in the odds ratio and its 95% CI). The syntax of the command is:
logistic depvar indepvars
‘Depvar’ is your outcome variable. To use this command, you need to confirm that your outcome variable is coded as 0 (negative outcome) and 1 (positive outcome). ‘Indepvars’ is where you list your exposure variable (i.e. independent variable).
Question C1.3a: Dichotomise the variable ‘sbp’ into below 140 mmHg and greater than or equal to 140 mmHg. Call this variable ‘hyperten’ to indicate “hypertensive”. Use logistic regression to examine the association between hypertension and the odds of having prior CVD (‘prior_cvd’). Interpret the output.
Answer
recode sbp (min/139=0) (140/max=1), gen(hyperten)
logistic prior_cvd hyperten
The output looks like this:
Logistic regression                                     Number of obs =  4,318
                                                        LR chi2(1)    =  36.97
                                                        Prob > chi2   = 0.0000
Log likelihood = -2406.9283                             Pseudo R2     = 0.0076
------------------------------------------------------------------------------
   prior_cvd | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hyperten |   1.596948   .1215625     6.15   0.000     1.375612    1.853898
       _cons |    .289861   .0123634   -29.03   0.000     .2666144    .3151345
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
The odds of having prior CVD are 1.60 times as high for someone who was hypertensive as for someone who was not (OR: 1.60, 95% CI: 1.38-1.85). This association is statistically significant (p<0.001), which means that we can reject the null hypothesis that there is no association between hypertension and prior CVD (i.e. that the odds ratio is equal to 1).
Logistic regression with a categorical exposure
We may use the logistic command to obtain odds ratios for a categorical variable such as BMI group. Putting ‘i.’ before the categorical exposure tells Stata to treat it as categorical and, by default, to use the lowest-coded category as the reference category. If you want to choose a different reference category, you can write ‘ib2.’ instead, with the number specifying whichever category you prefer. In this case, since the lowest category of BMI group (“underweight”) is quite small, we want to use the next category, “normal weight”, as the baseline (or reference) group:
logistic prior_cvd ib2.bmi_grp4
The output looks like this:
Logistic regression                                     Number of obs =  4,310
                                                        LR chi2(3)    =   3.64
                                                        Prob > chi2   = 0.3033
Log likelihood = -2420.1961                             Pseudo R2     = 0.0008
------------------------------------------------------------------------------
   prior_cvd | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    bmi_grp4 |
Underweight  |   .8995327   .3111124    -0.31   0.760     .4566888    1.771795
 Overweight  |   1.085572   .0811883     1.10   0.272     .9375593    1.256952
      Obese  |   1.252077   .1593964     1.77   0.077     .9755921    1.606918
             |
       _cons |   .3135531   .0173705   -20.94   0.000     .2812907    .3495158
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
Odds ratios in a logistic regression with a categorical variable are interpreted as the odds of the outcome in the specified level compared with a baseline level. The constant refers to the odds of the outcome in the baseline category. Here, the odds of having prior CVD are 25% higher if a participant has obesity than if they have a normal weight (OR: 1.25, 95% CI: 0.98-1.61). This association is not statistically significant (p=0.08), so we cannot reject the null hypothesis that the OR for obese vs normal weight is 1.
Logistic regression with an ordered categorical exposure
We can use the logistic command to obtain an odds ratio and a linear trend test for an ordered categorical variable such as BMI group. By not putting ‘i.’ before the exposure variable ‘bmi_grp4’, we tell Stata to treat it as a continuous variable rather than as a categorical variable, so the model estimates a single odds ratio per one-category increase.
logistic prior_cvd bmi_grp4
The output looks like this:
Logistic regression                                     Number of obs =  4,310
                                                        LR chi2(1)    =   3.50
                                                        Prob > chi2   = 0.0614
Log likelihood = -2420.2658                             Pseudo R2     = 0.0007
------------------------------------------------------------------------------
   prior_cvd | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    bmi_grp4 |   1.105901   .0594653     1.87   0.061     .9952821    1.228813
       _cons |   .2545908   .0376265    -9.26   0.000     .1905645    .3401288
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
Note that the Wald test here is approximately equivalent to the chi-square test for linear trend (Section C1.2b, ‘chi-square test for trend’). The null hypothesis is that there is no association between BMI group and prior CVD (i.e. that the linear trend OR equals 1), and the alternative hypothesis is that there is a linear increasing or decreasing trend. Here, the p-value (p=0.06) is borderline but does not provide sufficient evidence against the null hypothesis at the 5% level, so we conclude that there is no statistically significant linear trend in the log odds of prior CVD across groups of BMI.
Logistic regression with a continuous exposure
To fit a logistic regression with a continuous exposure, we type the variable as it is (with no prefix) and we interpret the OR in terms of a 1 unit change in the exposure:
logistic prior_cvd hdlc
The output:
Logistic regression                                     Number of obs =  4,302
                                                        LR chi2(1)    =  54.04
                                                        Prob > chi2   = 0.0000
Log likelihood = -2391.5959                             Pseudo R2     = 0.0112
------------------------------------------------------------------------------
   prior_cvd | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        hdlc |   .4875979   .0488451    -7.17   0.000     .4006755    .5933772
       _cons |   .7168523   .0790327    -3.02   0.003     .5775438    .8897631
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
The odds of having prior CVD were 51% lower for each 1-unit increase in HDL-C (OR: 0.49, 95% CI: 0.40-0.59). This association was statistically significant (p<0.001).
Question C1.3b.i: Use logistic regression to obtain odds ratios for the association between BMI group and prior diabetes (‘prior_t2dm’). Try fitting BMI group both as a categorical variable and as a linear trend.
Question C1.3b.ii: Use logistic regression to assess the association between ‘prior_t2dm’ and the continuous exposure variable ‘hdlc’. How would you interpret this output?
C1.3b Answers
Answer C1.3b.i:
Fitting BMI group as a categorical variable in logistic regression gives the following results:
logistic prior_t2dm ib2.bmi_grp4
Logistic regression                                     Number of obs =  4,310
                                                        LR chi2(3)    =   7.81
                                                        Prob > chi2   = 0.0501
Log likelihood = -969.91672                             Pseudo R2     = 0.0040
------------------------------------------------------------------------------
  prior_t2dm | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    bmi_grp4 |
Underweight  |   1.626374   .8656765     0.91   0.361      .572991    4.616287
 Overweight  |   1.229727   .1732047     1.47   0.142     .9330798    1.620686
      Obese  |   1.799443   .3808491     2.78   0.006     1.188454    2.724541
             |
       _cons |   .0534665   .0057527   -27.22   0.000     .0433009    .0660186
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
The odds ratio of prior diabetes for underweight vs normal weight (as BMI group 2 is the reference group) is 1.63 (95% CI: 0.57-4.62). The z-statistic for this odds ratio is 0.91 and p=0.361 so this odds ratio is not significantly different to 1.
The odds ratio for overweight vs normal weight is 1.23 (95% CI: 0.93-1.62). The z-statistic for this odds ratio is 1.47 and p=0.142 so this odds ratio is not significantly different to 1.
The odds ratio for obese vs normal weight is 1.80 (95% CI: 1.19-2.72). The z-statistic for this odds ratio is 2.78 and p=0.006 so this odds ratio is significantly different to 1.
Fitting ‘BMI group’ as a continuous variable in logistic regression gives the following results:
logistic prior_t2dm bmi_grp4
Logistic regression                                     Number of obs =  4,310
                                                        LR chi2(1)    =   5.75
                                                        Prob > chi2   = 0.0165
Log likelihood = -970.94533                             Pseudo R2     = 0.0030
------------------------------------------------------------------------------
  prior_t2dm | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    bmi_grp4 |   1.263954   .1229826     2.41   0.016     1.044503    1.529513
       _cons |   .0337493   .0092462   -12.37   0.000     .0197271    .0577384
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
The odds ratio associated with a one-category increase in BMI group is 1.26 (95% CI: 1.04-1.53). Note that the Wald test here is approximately equivalent to the chi-square test for linear trend. The null hypothesis is that there is no association between prior diabetes and BMI group (i.e. that the linear trend OR equals 1), and the alternative hypothesis is that there is a linear increasing or decreasing trend. Here, p=0.016 indicates that there is a statistically significant increasing linear trend in the log odds of prior diabetes across BMI groups.
Answer C1.3b.ii:
The output is as follows:
logistic prior_t2dm hdlc
Logistic regression                                     Number of obs =  4,302
                                                        LR chi2(1)    =  17.93
                                                        Prob > chi2   = 0.0000
Log likelihood = -964.36248                             Pseudo R2     = 0.0092
------------------------------------------------------------------------------
  prior_t2dm | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        hdlc |   .4634116   .0866931    -4.11   0.000     .3211646    .6686613
       _cons |   .1421395   .0282277    -9.82   0.000     .0963104    .2097762
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
For every 1-unit increase in HDL-C, the odds of having prior diabetes were 54% lower (OR: 0.46, 95% CI: 0.32-0.67). This association is statistically significant (p<0.001), so we can reject the null hypothesis that there is no association of HDL-C with prior diabetes.
C1.3 PRACTICAL: SPSS
Logistic regression with a binary exposure
Firstly, create a new variable that dichotomises ‘sbp’ into below 140 mmHg and greater than or equal to 140 mmHg. Call this variable ‘hyperten’ to indicate “hypertensive”, using No = 0 and Yes = 1.
We are now going to use logistic regression to examine the association between hypertension and the odds of having prior CVD (‘prior_cvd’).
Select Analyze >> Regression >> Binary Logistic.
Place ‘prior_cvd’ in the ‘Dependent’ box and ‘hyperten’ in the ‘Covariates’ box.

Unlike with the ‘Crosstabs’ function for odds ratios and the Chi squared test, the Logistic Regression function allows you to define your reference category. To do this just click on ‘Categorical’ on the right-hand side of the pop up box. Move the ‘hyperten’ variable into the ‘Categorical Covariates’ box and indicate that you want to use the first category as the Reference Category. This way it will produce data based on what is different for the yes hypertension group compared to the no hypertension group (rather than the other way around).

Then click on ‘Options’ and, under ‘Statistics and Plots’, tick the box next to ‘CI for exp(B)’. This adds the confidence interval for the odds ratio to the output. Once you have ticked the box, you can change the confidence level you wish to display. Leave it at the default value of 95%. Click Continue and then OK to run the test.

Question C1.3a: Can you interpret the output from the above analysis?
Answer
SPSS gives a lot of output tables for a logistic regression, but the one you are really interested in is the last one, which looks like the below.

Exp(B) is the odds ratio. The odds of having prior CVD are 1.60 times as high for someone who was hypertensive as for someone who was not (OR: 1.60, 95% CI: 1.38-1.85). This association is statistically significant (p<0.001), which means that we can reject the null hypothesis that there is no association between hypertension and prior CVD (i.e. that the odds ratio is equal to 1).
Logistic regression with a categorical exposure
We can use logistic regression to obtain odds ratios for a categorical variable such as BMI group. We follow the same process, but put the categorical variable into the ‘Covariates’ box, and define the reference category in the ‘Categorical’ tab as before.
A logistic regression output between prior_cvd and bmi_grp4 would look like the below.

Odds ratios in a logistic regression with a categorical variable are interpreted as the odds of the outcome in the specified level compared with a baseline level. The constant refers to the odds of the outcome in the baseline category. Here, the odds of having prior CVD are 39% higher if a participant has obesity than if they are underweight (OR: 1.39, 95% CI: 0.69-2.82). This association is not statistically significant (p=0.36), so we cannot reject the null hypothesis that the OR for obese vs underweight is 1.
SPSS only allows the first or last categories to be defined as the reference category. If you wanted to compare each of the categories to the ‘normal weight’ category, you would first need to recode the variable so that ‘normal’ was associated with either the highest or lowest integer used for coding the groups.
Logistic regression with an ordered categorical exposure
If we do not define the ‘bmi_grp4’ variable in the ‘Categorical’ tab, then we are telling SPSS that our covariate is to be treated as continuous rather than categorical. The output from this analysis looks like the below.

The null hypothesis is that there is no association between BMI group and prior CVD (i.e. that the linear trend OR equals 1), and the alternative hypothesis is that there is a linear increasing or decreasing trend. Here, the p-value (p=0.06) is borderline but does not provide sufficient evidence against the null hypothesis at the 5% level, so we conclude that there is no statistically significant linear trend in the log odds of prior CVD across groups of BMI.
Logistic regression with a continuous exposure
To fit a logistic regression with a continuous exposure, we use the same process and we interpret the OR in terms of a 1 unit change in the exposure. Below is an output table for prior_cvd with hdlc as the covariate.

The odds of having prior CVD were 51% lower for each 1 unit increase in HDL-C (OR: 0.49, 95%CI: 0.40-0.59). This association was statistically significant.
Question C1.3b.i: Use logistic regression to obtain odds ratios for the association between BMI group and prior diabetes (‘prior_t2dm’). Try fitting BMI group both as a categorical variable and as a linear trend.
Question C1.3b.ii: Use logistic regression to assess the association between ‘prior_t2dm’ and the continuous exposure variable ‘hdlc’. How would you interpret this output?
Answer
Answer C1.3b.i:
Fitting BMI group as a categorical variable in logistic regression gives the following results. In this example I have recoded the bmi_grp4 variable to compare all groups to the normal weight group.

The odds ratio of prior diabetes for underweight vs normal weight is 1.63 (95% CI: 0.57-4.62). p=0.361 so this odds ratio is not significantly different to 1.
The odds ratio for overweight vs normal weight is 1.23 (95% CI: 0.93-1.62). p=0.142 so this odds ratio is not significantly different to 1.
The odds ratio for obese vs normal weight is 1.80 (95% CI: 1.19-2.72). p=0.006 so this odds ratio is significantly different to 1.
Fitting ‘BMI group’ as a continuous variable in logistic regression gives the following results. This example uses the original bmi_grp4 variable.

The odds ratio associated with a one-category increase in BMI group is 1.26 (95% CI: 1.04-1.53). Note that the Wald test here is approximately equivalent to the chi-square test for linear trend. The null hypothesis is that there is no association between prior diabetes and BMI group (i.e. that the linear trend OR equals 1), and the alternative hypothesis is that there is a linear increasing or decreasing trend. Here, p=0.016 indicates that there is a statistically significant increasing linear trend in the log odds of prior diabetes across BMI groups.
Answer C1.3b.ii:
Including HDL-C as a continuous variable gives the following results.

For every 1 unit increase in HDL-C, the odds of having prior diabetes were 54% lower (OR: 0.46, 95% CI: 0.32-0.67). This association is statistically significant (p<0.001) so we can reject the null hypothesis that there is no association of HDL-C with prior diabetes.
C1.3 PRACTICAL: R
Logistic regression with a binary exposure
To run a logistic regression in R, we use the ‘glm()’ function. The syntax is:
glm(formula, family, data)
where:
- formula has the format “outcome ~ exposure”,
- family is ‘binomial(link = “logit”)’ for logistic regression, and
- data corresponds to the dataframe including our data.
To use this command, you need to confirm that your outcome variable is coded either as 0/1 or as a factor with two levels (in which case the first level is treated as the negative outcome). You can find more details on the glm() command in this help file.
Using the ‘glm()’ command above, the log OR (Estimate) and its standard error (Std. Error) are printed, and we estimate the OR and the 95% confidence intervals by exponentiating the results with this post-estimation command:
exp(cbind(coef(model), confint(model)))
- Question C1.3a: Dichotomise the variable ‘sbp’ into below 140 mmHg and greater than or equal to 140 mmHg. Call this variable ‘hyperten’ to indicate “hypertensive”. Use logistic regression to examine the association between hypertension and the odds of having prior CVD (‘prior_cvd’). Interpret the output.
Answer
Answer C1.3a:
To dichotomise the “sbp” variable, use the following command:
df[df$sbp < 140, "hyperten"] <- 0   # 0 = not hypertensive (sbp below 140 mmHg)
df[df$sbp >= 140, "hyperten"] <- 1  # 1 = hypertensive (sbp of 140 mmHg or above)
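A single ifelse() call is a common alternative to the two bracket assignments above. The sketch below uses a made-up data frame, df_toy (an assumption for illustration, not the course dataset), and shows that a missing sbp value stays missing in the new variable.

```r
# Toy data frame standing in for the course dataset (values are made up).
df_toy <- data.frame(sbp = c(118, 139, 140, 162, NA))

# ifelse() gives 1 when sbp >= 140, 0 when sbp < 140, and NA when sbp is missing.
df_toy$hyperten <- ifelse(df_toy$sbp >= 140, 1, 0)

# Cross-check the new variable against the cut-off, showing missing values too.
table(df_toy$sbp >= 140, df_toy$hyperten, useNA = "ifany")
print(df_toy)
```

It is worth tabulating the new variable against the original in this way before fitting any model, so that a coding mistake is caught early.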
To run the logistic regression we use the glm() command:
model <- glm(prior_cvd ~ hyperten, family = binomial(link = "logit"), data = df)
summary(model)
The output of summary(model) is the following:
Call:
glm(formula = "prior_cvd ~ hyperten", family = binomial(link = "logit"),
    data = df)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.8753  -0.7157  -0.7157   1.5132   1.7248
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.23143    0.04284 -28.747  < 2e-16 ***
hyperten     0.46966    0.07644   6.144 8.03e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 4805.5  on 4265  degrees of freedom
Residual deviance: 4768.6  on 4264  degrees of freedom
AIC: 4772.6
Number of Fisher Scoring iterations: 4
The lnOR (Estimate) and its standard error (Std. Error) are printed, and we estimate the OR and the 95% confidence intervals by exponentiating:
> exp(cbind(coef(model), confint(model)))
Waiting for profiling to be done...
                         2.5 %    97.5 %
(Intercept) 0.291874 0.2681921 0.3172374
hyperten    1.599446 1.3762597 1.8571969
The odds of having prior CVD are 1.60 times as high for someone who was hypertensive as for someone who was not (OR: 1.60, 95% CI: 1.38-1.86). This association is statistically significant (p<0.001), which means that we can reject the null hypothesis that there is no association between hypertension and prior CVD (i.e. that the odds ratio is equal to 1).
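To see where these numbers come from, the odds ratio and an approximate confidence interval can be rebuilt by hand from the Estimate and Std. Error columns of the summary() output. The sketch below copies those values from the output above; the object names (log_or, se, ci_wald and so on) are illustrative. It computes a Wald interval, whereas confint() on a glm fit uses profile likelihood, so the limits can differ slightly in the third decimal place.

```r
# Log odds ratio and standard error for hyperten, copied from the summary() output.
log_or <- 0.46966
se     <- 0.07644

# Odds ratio: exponentiate the log odds ratio.
or <- exp(log_or)

# Wald 95% confidence interval: exponentiate estimate +/- 1.96 standard errors.
z <- qnorm(0.975)
ci_wald <- exp(log_or + c(-1, 1) * z * se)

# Baseline odds (exponentiated intercept) and the corresponding proportion
# with prior CVD among participants who are not hypertensive.
baseline_odds <- exp(-1.23143)
baseline_prob <- baseline_odds / (1 + baseline_odds)

round(c(OR = or, lower = ci_wald[1], upper = ci_wald[2]), 2)
round(c(odds = baseline_odds, prob = baseline_prob), 3)
```

The same arithmetic applies to any row of a logistic regression table: exponentiating the coefficient gives the odds ratio, and exponentiating the intercept gives the odds in the reference group.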
Logistic regression with a categorical exposure
We may use the glm() command to obtain odds ratios for a categorical variable such as BMI group. By default, R treats the first level of the factor as the reference category; here that is the lowest BMI group, ‘Underweight’. We fit the logistic regression model with the following command:
model2 <- glm(prior_cvd ~ bmi_grp4, family = binomial(link = "logit"), data = df)
summary(model2)
The output looks like this:
Call:
glm(formula = "prior_cvd ~ bmi_grp4", family = binomial(link = "logit"),
    data = df)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.8138  -0.7674  -0.7408   1.5913   1.7285
Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)        -1.23969    0.34238  -3.621 0.000294 ***
bmi_grp4Normal      0.08689    0.34687   0.250 0.802210
bmi_grp4Overweight  0.16795    0.34607   0.485 0.627460
bmi_grp4Obese       0.30471    0.36106   0.844 0.398707
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 4805.5  on 4265  degrees of freedom
Residual deviance: 4802.1  on 4262  degrees of freedom
AIC: 4810.1
Number of Fisher Scoring iterations: 4
Odds ratios in a logistic regression with a categorical variable are interpreted as the odds of the outcome in the specified level compared with a baseline level. The constant refers to the odds of the outcome in the baseline category. We estimate the OR and the 95% confidence intervals by exponentiating:
exp(cbind(coef(model2), confint(model2)))
Waiting for profiling to be done...
                                 2.5 %   97.5 %
(Intercept)        0.2894737 0.1409194 0.546895
bmi_grp4Normal     1.0907740 0.5717535 2.257744
bmi_grp4Overweight 1.1828794 0.6211096 2.445084
bmi_grp4Obese      1.3562290 0.6893090 2.875507
Here, the reference category is ‘Underweight’ (it is the only BMI group without its own row in the output), so the odds of having prior CVD are 36% higher if a participant has obesity than if they are underweight (OR: 1.36, 95% CI: 0.69-2.88). This association is not statistically significant (p=0.40), so we cannot reject the null hypothesis that the OR for obese vs underweight is 1.
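The coefficient names in the output above (bmi_grp4Normal, bmi_grp4Overweight, bmi_grp4Obese) show that ‘Underweight’ is the reference level. If you would rather compare every group with normal weight, one option in base R is relevel(). The sketch below runs on simulated data, so df_sim, bmi_ref_normal and model_ref are illustrative names and the resulting odds ratios are meaningless; only the mechanics carry over to the course dataset.

```r
set.seed(1)  # simulated data only; not the course dataset

# Hypothetical stand-in for df: a 4-level BMI factor and a 0/1 outcome.
bmi_levels <- c("Underweight", "Normal", "Overweight", "Obese")
df_sim <- data.frame(
  bmi_grp4  = factor(sample(bmi_levels, 500, replace = TRUE), levels = bmi_levels),
  prior_cvd = rbinom(500, size = 1, prob = 0.25)
)

# By default the first factor level ("Underweight") is the reference category.
levels(df_sim$bmi_grp4)[1]

# relevel() moves the chosen level to the front so it becomes the reference.
df_sim$bmi_ref_normal <- relevel(df_sim$bmi_grp4, ref = "Normal")

model_ref <- glm(prior_cvd ~ bmi_ref_normal,
                 family = binomial(link = "logit"), data = df_sim)

# Odds ratios relative to "Normal", with Wald 95% confidence intervals.
exp(cbind(OR = coef(model_ref), confint.default(model_ref)))
```

Changing the reference level does not change the fit of the model; it only changes which comparisons the coefficients report.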
Logistic regression with an ordered categorical exposure
We can use the glm() command to obtain an odds ratio and a linear trend test for an ordered categorical variable such as BMI group. To do this, we tell R to treat the exposure as a continuous variable rather than as a categorical variable, using the ‘as.numeric()’ function in the formula.
The output looks like this:
> model3 <- glm(prior_cvd ~ as.numeric(bmi_grp4), family = binomial(link = "logit"), data = df)
> summary(model3)
Call:
glm(formula = "prior_cvd ~ as.numeric(bmi_grp4)", family = binomial(link = "logit"),
    data = df)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.8040  -0.7710  -0.7389   1.6042   1.7359
Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)          -1.35370    0.14824  -9.132   <2e-16 ***
as.numeric(bmi_grp4)  0.09753    0.05391   1.809   0.0704 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 4805.5  on 4265  degrees of freedom
Residual deviance: 4802.3  on 4264  degrees of freedom
AIC: 4806.3
Number of Fisher Scoring iterations: 4
Note that the Wald test here is approximately equivalent to the chi-square test for linear trend (Section C1.2b, ‘chi-square test for trend’). The null hypothesis is that there is no association between BMI group and prior CVD (i.e. that the linear trend OR equals 1), and the alternative hypothesis is that there is a linear increasing or decreasing trend. Here, the p-value (p=0.07) is borderline but does not provide sufficient evidence against the null hypothesis at the 5% level, so we conclude that there is no statistically significant linear trend in the log odds of prior CVD across groups of BMI.
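The trend model only makes sense if the numeric codes follow the natural order of the BMI groups, because as.numeric() on a factor returns each value's position in levels(). The sketch below illustrates this with two made-up factors (bmi and bmi_alpha are illustrative names, not objects from the course dataset).

```r
# Hypothetical factor with the four BMI groups in their natural order.
bmi_levels <- c("Underweight", "Normal", "Overweight", "Obese")
bmi <- factor(c("Normal", "Obese", "Underweight", "Overweight"), levels = bmi_levels)

# as.numeric() returns the position of each value in levels(), here 1 to 4.
as.numeric(bmi)

# If the levels are left in alphabetical order, the codes no longer follow BMI.
bmi_alpha <- factor(c("Normal", "Obese", "Underweight", "Overweight"))
levels(bmi_alpha)
as.numeric(bmi_alpha)
```

Checking levels() before fitting a trend model is a quick way to confirm that a one-unit step really means moving up one BMI group.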
Logistic regression with a continuous exposure
To fit a logistic regression with a continuous exposure, we enter the variable in the formula as it is, and we interpret the OR in terms of a 1-unit change in the exposure:
> model4 <- glm(prior_cvd ~ hdlc, family = binomial(link = "logit"), data = df)
> summary(model4)
Call:
glm(formula = "prior_cvd ~ hdlc", family = binomial(link = "logit"),
    data = df)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.9867  -0.7962  -0.7180   1.3712   2.2951
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.3131     0.1106   -2.83  0.00466 **
hdlc         -0.7316     0.1005   -7.28 3.34e-13 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 4805.5  on 4265  degrees of freedom
Residual deviance: 4749.7  on 4264  degrees of freedom
AIC: 4753.7
Number of Fisher Scoring iterations: 4
We estimate the OR and the 95% confidence intervals by exponentiating:
> exp(cbind(coef(model4), confint(model4)))
Waiting for profiling to be done...
                          2.5 %    97.5 %
(Intercept) 0.7311849 0.5887604 0.9084966
hdlc        0.4811146 0.3945020 0.5850291
The odds of having prior CVD were 52% lower for each 1-unit increase in HDL-C (OR: 0.48, 95% CI: 0.39-0.59). This association was statistically significant (p<0.001).
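Because the model is linear on the log odds scale, the odds ratio for an increase other than 1 unit is obtained by multiplying the coefficient before exponentiating. The sketch below copies the hdlc estimate from the summary() output above; the 0.5-unit increment is an arbitrary choice for illustration, and the object names are hypothetical.

```r
# Log odds ratio for hdlc per 1-unit increase, copied from the summary() output.
beta <- -0.7316

# The odds ratio for a k-unit increase is exp(k * beta).
or_per_unit <- exp(beta)
or_per_half <- exp(0.5 * beta)

# Percentage change in odds: (OR - 1) * 100; negative values mean lower odds.
pct_change <- (c(or_per_unit, or_per_half) - 1) * 100

round(c(per_1_unit = or_per_unit, per_half_unit = or_per_half), 2)
round(pct_change)
```

Note that the odds ratio for half a unit is the square root of the odds ratio for one unit, not half of it.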
- Question C1.3b.i: Use logistic regression to obtain odds ratios for the association between BMI group and prior diabetes (‘prior_t2dm’). Try fitting BMI group both as a categorical variable and as a linear trend.
- Question C1.3b.ii: Use logistic regression to assess the association between ‘prior_t2dm’ and the continuous exposure variable ‘hdlc’. How would you interpret this output?
Answer
Answer C1.3b.i:
We fit the logistic regression model with the following command:
model5 <- glm(prior_t2dm ~ as.character(bmi_grp4), family = binomial(link = "logit"), data = df)
summary(model5)
The output looks like this:
> model5 <- glm(prior_t2dm ~ as.character(bmi_grp4), family = binomial(link = "logit"), data = df)
> summary(model5)
Call:
glm(formula = prior_t2dm ~ as.character(bmi_grp4), family = binomial(link = "logit"),
    data = df)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.4286  -0.3544  -0.3544  -0.3211   2.4457
Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)          -2.9391     0.1088 -27.023  < 2e-16 ***
bmi_grp4Obese         0.5979     0.2122   2.817  0.00485 **
bmi_grp4Overweight    0.2029     0.1424   1.425  0.15426
bmi_grp4Underweight   0.5187     0.5330   0.973  0.33040
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 1914.6  on 4265  degrees of freedom
Residual deviance: 1906.6  on 4262  degrees of freedom
AIC: 1914.6
Note that the reference group in this model is those with a normal weight: as.character() converts BMI group to text, and R then orders the categories alphabetically, so ‘Normal’ comes first. We can estimate the ORs and their corresponding 95% confidence intervals from the model by exponentiating:
> exp(cbind(coef(model5), confint(model5)))
Waiting for profiling to be done...
                                  2.5 %     97.5 %
(Intercept)         0.0529132 0.0424526 0.06505699
bmi_grp4Obese       1.8182592 1.1845276 2.72916539
bmi_grp4Overweight  1.2249272 0.9280857 1.62319656
bmi_grp4Underweight 1.6799001 0.4981225 4.25397420
The odds ratio of prior diabetes for underweight vs normal weight (as BMI group 2 is the reference group) is 1.68 (95% CI: 0.50-4.25). The z-statistic for this odds ratio is 0.97 and p=0.330 so this odds ratio is not significantly different to 1.
The odds ratio for overweight vs normal weight is 1.22 (95% CI: 0.93-1.62). The z-statistic for this odds ratio is 1.43 and p=0.154 so this odds ratio is not significantly different to 1.
Finally, the odds ratio for obese vs normal weight is 1.82 (95% CI: 1.18-2.73). The z-statistic for this odds ratio is 2.82 and p=0.005 so this odds ratio is significantly different to 1.
Fitting ‘BMI group’ as a continuous variable in logistic regression gives the following results:
> model6 <- glm(prior_t2dm ~ as.numeric(bmi_grp4), family = binomial(link = "logit"), data = df)
> summary(model6)
Call:
glm(formula = prior_t2dm ~ as.numeric(bmi_grp4), family = binomial(link = "logit"),
    data = df)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.4048  -0.3613  -0.3613  -0.3223   2.5332
Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)          -3.40303    0.27652 -12.306   <2e-16 ***
as.numeric(bmi_grp4)  0.23561    0.09811   2.402   0.0163 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 1914.6  on 4265  degrees of freedom
Residual deviance: 1908.9  on 4264  degrees of freedom
AIC: 1912.9
Number of Fisher Scoring iterations: 5
> exp(cbind(coef(model6), confint(model6)))
Waiting for profiling to be done...
                                     2.5 %     97.5 %
(Intercept)          0.03327238 0.01923618 0.05689893
as.numeric(bmi_grp4) 1.26568480 1.04369721 1.53348499
The odds ratio associated with a one-category increase in BMI group is 1.27 (95% CI: 1.04-1.53). Note that the Wald test here is approximately equivalent to the chi-square test for linear trend. The null hypothesis is that there is no association between prior diabetes and BMI group (i.e. that the linear trend OR equals 1), and the alternative hypothesis is that there is a linear increasing or decreasing trend. Here, p=0.016 indicates that there is a statistically significant increasing linear trend in the log odds of prior diabetes across BMI groups.
Answer C1.3b.ii:
The output is as follows:
> model7 <- glm(prior_t2dm ~ hdlc, family = binomial(link = "logit"), data = df)
> summary(model7)
Call:
glm(formula = prior_t2dm ~ hdlc, family = binomial(link = "logit"),
    data = df)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.5043  -0.3757  -0.3448  -0.3069   2.7934
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.9596     0.2005  -9.775  < 2e-16 ***
hdlc         -0.7717     0.1887  -4.089 4.34e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 1914.6  on 4265  degrees of freedom
Residual deviance: 1896.9  on 4264  degrees of freedom
AIC: 1900.9
Number of Fisher Scoring iterations: 5
> exp(cbind(coef(model7), confint(model7)))
Waiting for profiling to be done...
                           2.5 %    97.5 %
(Intercept) 0.1409122 0.09493215 0.2083414
hdlc        0.4622490 0.31763986 0.6657489
For every 1-unit increase in HDL-C, the odds of having prior diabetes were 54% lower (OR: 0.46, 95% CI: 0.32-0.67). This association is statistically significant (p<0.001), so we can reject the null hypothesis that there is no association of HDL-C with prior diabetes.
At the 3:10 mark, the reference level is wrongly stated to be “BMI < 25”. From the model output and the table on the previous slide, it is clear that the reference level is actually “BMI < 18.5”. This means that the explanation of the estimated coefficients for the BMI categories is also wrong: the exponentiated coefficients correspond to increased odds relative to the category “BMI < 18.5”, and not to the category “BMI < 25”, as claimed in the video.