Learning Outcomes
By the end of this section, students will be able to:
- Control for additional variables when estimating the association between an exposure and an outcome in a logistic regression model
You can download a copy of the slides here: Video C1.4
Video C1.4 – Controlling for Confounding in Logistic Regression (6 minutes)
C1.4 PRACTICAL: Stata
Controlling for confounders in a logistic regression
Now use the logistic command to obtain the odds ratio for the association of hypertension with prior CVD, adjusted for HDL-C. As in linear regression, we add the additional variables after the exposure on the command line:
logistic prior_cvd hyperten hdlc
Note that the output gives us the odds ratio for ‘hyperten’ adjusted for ‘hdlc’, but it also gives us the odds ratio for ‘hdlc’ adjusted for ‘hyperten’.
You will run and interpret this command in the questions below.
We can control for further variables by adding them to the command line, as we did above with ‘hdlc’. The estimate for each variable in the model is adjusted for all the other variables in the model. For example, if we added BMI group to the logistic regression above, we would interpret the coefficient for hypertension as ‘the association of hypertension with the odds of prior CVD, independent of (or adjusted for) HDL-C and BMI group’. Likewise, the coefficient for BMI group would be interpreted as the association of BMI group with prior CVD, adjusted for hypertension and HDL-C.
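Written as an equation, the model with BMI group added looks roughly like the sketch below, where p is the probability of prior CVD. The name `bmigrp` is a hypothetical variable name; if BMI group has more than two categories, its single term stands in for a set of indicator terms, one per non-reference category.

```latex
\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1\,\mathrm{hyperten} + \beta_2\,\mathrm{hdlc} + \beta_3\,\mathrm{bmigrp}
\qquad
\mathrm{OR}_{\mathrm{hyperten}} = e^{\beta_1}
```

When hypertensive and normotensive participants with the same HDL-C and BMI group are compared, the hdlc and bmigrp terms are identical in both log odds and cancel, leaving a log odds ratio of β₁. This is what ‘adjusted for’ means in this model.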
Question C1.4b: What is the odds ratio for prior CVD in participants who are hypertensive compared with those who are normotensive, once you have adjusted for their levels of HDL-C?
C1.4b Answers
Answer C1.4b:
logistic prior_cvd i.hyperten hdlc
Logistic regression                                     Number of obs =  4,293
                                                        LR chi2(2)    =  86.59
                                                        Prob > chi2   = 0.0000
Log likelihood = -2371.6353                             Pseudo R2     = 0.0179
------------------------------------------------------------------------------
   prior_cvd | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
  1.hyperten |   1.551415   .1191582     5.72   0.000     1.334598    1.803456
        hdlc |   .4996367   .0501632    -6.91   0.000     .4103876    .6082952
       _cons |   .6155069    .070238    -4.25   0.000     .4921515    .7697807
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
Here, the odds of prior CVD are 1.55 times greater for participants who are hypertensive than for those who are normotensive, once adjusted for HDL-C (OR: 1.55, 95% CI: 1.33-1.80).
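Stata's `logistic` reports odds ratios, but the z statistic and the 95% confidence interval are computed on the log-odds (coefficient) scale and then exponentiated. The standard error printed next to the odds ratio is the delta-method value, OR × SE(β). The short Python sketch below works back from the numbers printed for `1.hyperten` in the output above, assuming a normal critical value of 1.959964. It is an illustration of the arithmetic, not Stata's own code.

```python
import math

# Values copied from the Stata output for 1.hyperten
odds_ratio = 1.551415
se_odds_ratio = 0.1191582

# Delta method: SE(OR) = OR * SE(beta), so SE on the log-odds scale is SE(OR) / OR
beta = math.log(odds_ratio)
se_beta = se_odds_ratio / odds_ratio

# Wald z statistic, computed for the coefficient (H0: beta = 0, i.e. OR = 1)
z = beta / se_beta

# 95% CI: build it for beta, then exponentiate the endpoints
z_crit = 1.959964  # 97.5th percentile of the standard normal distribution
ci_low = math.exp(beta - z_crit * se_beta)
ci_high = math.exp(beta + z_crit * se_beta)

print(f"beta = {beta:.4f}, SE(beta) = {se_beta:.4f}, z = {z:.2f}")
print(f"95% CI for the odds ratio: {ci_low:.4f} to {ci_high:.4f}")
```

The recovered z (about 5.72) and interval (about 1.3346 to 1.8035) match the table. This is also why the interval is not symmetric around 1.55: it is symmetric on the log-odds scale.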
C1.4 PRACTICAL: SPSS
Controlling for confounders in a logistic regression
You can control for additional variables by adding them to the ‘Covariates’ box in the Logistic Regression dialog box. Here we are going to use this to obtain the odds ratio for the association of hypertension with prior CVD, adjusted for HDL-C.

The estimate for each variable in the model is adjusted for all the other variables in the model. For example, if we added BMI group to the logistic regression above, we would interpret the coefficient for hypertension as ‘the association of hypertension with the odds of prior CVD, independent of (or adjusted for) HDL-C and BMI group’. Likewise, the coefficient for BMI group would be interpreted as the association of BMI group with prior CVD, adjusted for hypertension and HDL-C.
Question C1.4a: What is the odds ratio for prior CVD in participants who are hypertensive compared with those who are normotensive, once you have adjusted for their levels of HDL-C?
Answer
The output table will look like the one below.

Here, the odds of prior CVD are 1.55 times greater for participants who are hypertensive than for those who are normotensive, once adjusted for HDL-C (95% CI: 1.34-1.80).
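An odds ratio can also be read as a percentage change in the odds, (OR - 1) × 100. The minimal Python sketch below applies this arithmetic to the odds ratio and confidence limits quoted in the answer above. The helper name `pct_change_in_odds` is hypothetical.

```python
def pct_change_in_odds(odds_ratio):
    """Percentage change in the odds implied by an odds ratio (hypothetical helper)."""
    return (odds_ratio - 1.0) * 100.0


# Odds ratio and 95% CI for hypertension, adjusted for HDL-C (from the answer above)
or_hyperten, ci_low, ci_high = 1.55, 1.34, 1.80

change = pct_change_in_odds(or_hyperten)
change_low = pct_change_in_odds(ci_low)
change_high = pct_change_in_odds(ci_high)

print(f"Odds of prior CVD are {change:.0f}% higher "
      f"(95% CI: {change_low:.0f}% to {change_high:.0f}%)")
```

An odds ratio below 1 gives a negative value, which corresponds to a percentage reduction in the odds.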
C1.4 PRACTICAL: R
Controlling for confounders in a logistic regression
Now use the glm() function to obtain the odds ratio for the association of hypertension with prior CVD, adjusted for HDL-C. This may be obtained as follows:
model <- glm(prior_cvd ~ hyperten + hdlc, family = binomial(link = "logit"), data = df)
Note that glm() reports coefficients on the log-odds scale. Once they are exponentiated, the output gives us the odds ratio for ‘hyperten’ adjusted for ‘hdlc’, and also the odds ratio for ‘hdlc’ adjusted for ‘hyperten’.
You will run and interpret this command in the questions below.
We can control for further variables by adding them to the model formula, as we did above with ‘hdlc’. The estimate for each variable in the model is adjusted for all the other variables in the model. For example, if we added BMI group to the logistic regression above, we would interpret the coefficient for hypertension as ‘the association of hypertension with the odds of prior CVD, independent of (or adjusted for) HDL-C and BMI group’. Likewise, the coefficient for BMI group would be interpreted as the association of BMI group with prior CVD, adjusted for hypertension and HDL-C.
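glm() fits the model iteratively, and its output reports this as Fisher scoring. For the logit link, Fisher scoring is the same as Newton-Raphson. The sketch below is a minimal standard-library Python illustration of that idea. It is not R's implementation, and all function and variable names are illustrative assumptions. The data are hypothetical counts, constructed so that a binary confounder is associated with both the exposure and the outcome, while the exposure has no effect within levels of the confounder. Fitting the model with and without the confounder shows how adding a variable changes the exposure's odds ratio.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, max_iter=25, tol=1e-10):
    """Newton-Raphson / Fisher scoring for logistic regression.

    X: rows that already include a leading 1.0 for the intercept; y: 0/1 outcomes.
    Returns (coefficients on the log-odds scale, iterations used).
    """
    k = len(X[0])
    beta = [0.0] * k
    for iteration in range(1, max_iter + 1):
        score = [0.0] * k                      # U = X'(y - p)
        info = [[0.0] * k for _ in range(k)]   # I = X'WX with W = p(1 - p)
        for row, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
            w = p * (1.0 - p)
            for j in range(k):
                score[j] += row[j] * (yi - p)
                for l in range(k):
                    info[j][l] += w * row[j] * row[l]
        step = solve(info, score)              # Newton step: I^-1 U
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            return beta, iteration
    return beta, max_iter


def expand(cells):
    """Turn (exposed, confounder, n_cases, n_total) cells into per-person rows."""
    people, outcome = [], []
    for e, z, cases, total in cells:
        for i in range(total):
            people.append((e, z))
            outcome.append(1 if i < cases else 0)
    return people, outcome


# Hypothetical cells: (exposed, confounder, number of cases, number of people)
cells = [(0, 0, 20, 100), (1, 0, 4, 20), (0, 1, 10, 20), (1, 1, 50, 100)]
people, outcome = expand(cells)

# Crude model (outcome ~ exposed) and adjusted model (outcome ~ exposed + confounder)
crude_beta, _ = fit_logistic([[1.0, e] for e, _ in people], outcome)
adjusted_beta, _ = fit_logistic([[1.0, e, z] for e, z in people], outcome)

crude_or = math.exp(crude_beta[1])
adjusted_or = math.exp(adjusted_beta[1])
print(f"Crude OR for exposure:    {crude_or:.3f}")
print(f"Adjusted OR for exposure: {adjusted_or:.3f}")
```

In these made-up data, the crude odds ratio is about 2.45, entirely because of confounding. Once the confounder is in the model, the exposure's odds ratio is 1.0, which is the within-stratum value the counts were built to have.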
Question C1.4b: What is the odds ratio for prior CVD in participants who are hypertensive compared with those who are normotensive, once you have adjusted for their levels of HDL-C?
Answer
Answer C1.4b:
> model8 <- glm(prior_cvd ~ hyperten + hdlc, family = binomial(link = "logit"), data = df)
> summary(model8)
Call:
glm(formula = prior_cvd ~ hyperten + hdlc, family = binomial(link = "logit"), data = df)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.1004  -0.7844  -0.6978   1.2748   2.3248
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.4722     0.1144  -4.127 3.67e-05 ***
hyperten      0.4432     0.0770   5.756 8.59e-09 ***
hdlc         -0.7038     0.1007  -6.991 2.73e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 4805.5  on 4265  degrees of freedom
Residual deviance: 4717.3  on 4263  degrees of freedom
AIC: 4723.3
Number of Fisher Scoring iterations: 4
> exp(cbind(coef(model8), confint(model8)))
Waiting for profiling to be done...
                          2.5 %    97.5 %
(Intercept) 0.6236630 0.4984294 0.7805407
hyperten    1.5577272 1.3388762 1.8107175
hdlc        0.4946928 0.4055012 0.6017479
Here, the odds of prior CVD are 1.56 times higher in those with hypertension than in those without hypertension, after adjusting for HDL-C (OR: 1.56, 95% CI: 1.34-1.81).
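`exp(cbind(coef(model8), confint(model8)))` exponentiates the log-odds coefficients to give odds ratios. For a glm, `confint()` computes profile-likelihood intervals, which is why R prints "Waiting for profiling to be done...". `confint.default()` instead returns Wald intervals, exp(β ± 1.96 × SE). The minimal Python sketch below computes the Wald version from the rounded `hyperten` values printed by `summary(model8)`, so its last digits are approximate.

```python
import math

# Rounded estimate and standard error for hyperten from summary(model8)
beta = 0.4432
se = 0.0770
z_crit = 1.959964  # 97.5th percentile of the standard normal distribution

odds_ratio = math.exp(beta)                 # log-odds coefficient -> odds ratio
wald_low = math.exp(beta - z_crit * se)     # Wald interval built on the log-odds scale
wald_high = math.exp(beta + z_crit * se)

print(f"OR = {odds_ratio:.3f}, Wald 95% CI: {wald_low:.3f} to {wald_high:.3f}")
```

The Wald limits come out at about 1.34 and 1.81, matching the profile-likelihood interval above to two decimal places. The two methods diverge more in small samples or when estimates are extreme.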