FoSSA: Fundamentals of Statistical Software & Analysis

Learning Outcomes

By the end of this section, students will be able to:

  • Adjust the association between an exposure and an outcome for additional variables in a logistic regression model

You can download a copy of the slides here: Video C1.4

Video C1.4 – Controlling for Confounding in Logistic Regression (6 minutes)

C1.4 PRACTICAL: Stata

Controlling for confounders in a logistic regression

Now use the logistic command to obtain the odds ratio for the association between hypertension and prior CVD, adjusted for HDL-C. As in linear regression, we add the additional variables after the exposure on the command line:

logistic prior_cvd hyperten hdlc

Note that the output gives us the odds ratio for ‘hyperten’ adjusted for ‘hdlc’ but it also gives us the odds ratios for ‘hdlc’ adjusted for ‘hyperten’.

You will run and interpret this command in the questions below.

We can control for further variables by adding them to the command line, as we did above with ‘hdlc’. Each variable we put in the model is adjusted for all the other variables in the model. That is, if we add BMI group to the logistic regression above, we would interpret the odds ratio for hypertension as ‘the association of hypertension with the odds of prior CVD, independent of (or adjusted for) HDL-C and BMI group’. The odds ratio for BMI group would likewise be interpreted as the association of BMI group with prior CVD, once adjusted for hypertension and HDL-C.

Question C1.4b: What are the odds of having prior CVD if a participant is hypertensive, once you have adjusted for their levels of HDL-C?

C1.4b Answers

Answer C1.4b:

logistic prior_cvd i.hyperten hdlc

Logistic regression                                     Number of obs =  4,293
                                                        LR chi2(2)    =  86.59
                                                        Prob > chi2   = 0.0000
Log likelihood = -2371.6353                             Pseudo R2     = 0.0179

------------------------------------------------------------------------------
   prior_cvd | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
  1.hyperten |   1.551415   .1191582     5.72   0.000     1.334598    1.803456
        hdlc |   .4996367   .0501632    -6.91   0.000     .4103876    .6082952
       _cons |   .6155069    .070238    -4.25   0.000     .4921515    .7697807
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

Here, the odds of prior CVD are 1.55 times greater for participants who are hypertensive than for those who are normotensive, once adjusted for HDL-C (OR: 1.55, 95% CI: 1.33-1.80).

C1.4 PRACTICAL: SPSS

Controlling for confounders in a logistic regression

You can control for additional variables by adding them to the ‘Covariates’ box in the logistic regression dialog. Here we are going to use this to obtain the odds ratio for the association between hypertension and prior CVD, adjusted for HDL-C.

Each variable we put in the model is adjusted for all the other variables in the model. That is, if we add BMI group to this logistic regression, we would interpret the odds ratio for hypertension as ‘the association of hypertension with the odds of prior CVD, independent of (or adjusted for) HDL-C and BMI group’. The odds ratio for BMI group would likewise be interpreted as the association of BMI group with prior CVD, once adjusted for hypertension and HDL-C.

Question C1.4a: What are the odds of having prior CVD if a participant is hypertensive, once you have adjusted for their levels of HDL-C?

Answer

The output table will look like the below.

Here, the odds of prior CVD are 1.55 times greater for participants who are hypertensive than for those who are normotensive, once adjusted for HDL-C (95% CI: 1.34-1.80).

C1.4 PRACTICAL: R

Controlling for confounders in a logistic regression

Now use the ‘glm()’ function to obtain the odds ratio for the association between hypertension and prior CVD, adjusted for HDL-C. This may be obtained as follows:

model <- glm(prior_cvd ~ hyperten + hdlc, family = binomial(link = "logit"), data = df)

Note that the model gives us the coefficient (log odds ratio) for ‘hyperten’ adjusted for ‘hdlc’, and also the coefficient for ‘hdlc’ adjusted for ‘hyperten’. Exponentiating the coefficients gives the corresponding odds ratios.

You will run and interpret this command in the questions below.

We can control for additional variables as well by adding additional variables to the command line as we did above with ‘hdlc’. Each variable we put in the model will be controlled for all the other variables in the model. That is, if we add BMI group to this logistic regression above, we would interpret the coefficient for hypertension as ‘the association of hypertension with the odds of prior CVD, independent of (or adjusted for) HDL-C and BMI group’. BMI group would likewise be interpreted as the association of BMI group with prior CVD, once adjusted for hypertension and HDL-C.
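As a minimal sketch of adding a third variable, the example below fits the same kind of model with a BMI group term. It uses simulated data rather than the course dataset, and the names ‘sim’, ‘bmi_grp’ and ‘model_adj’ are hypothetical; substitute the data frame and the BMI group variable from your own dataset.

```r
# Simulated data for illustration only; this is NOT the course dataset.
set.seed(1)
n <- 500
sim <- data.frame(
  hyperten = rbinom(n, 1, 0.4),                  # 1 = hypertensive, 0 = normotensive
  hdlc     = rnorm(n, mean = 1.1, sd = 0.3),     # continuous covariate
  bmi_grp  = factor(sample(c("normal", "overweight", "obese"), n, replace = TRUE),
                    levels = c("normal", "overweight", "obese"))  # hypothetical name
)

# Simulate a binary outcome from a known logistic model
lp <- -0.5 + 0.45 * sim$hyperten - 0.7 * sim$hdlc + 0.2 * (sim$bmi_grp == "obese")
sim$prior_cvd <- rbinom(n, 1, plogis(lp))

# Each additional covariate is added to the right-hand side of the formula with `+`
model_adj <- glm(prior_cvd ~ hyperten + hdlc + bmi_grp,
                 family = binomial(link = "logit"), data = sim)

# Exponentiate the coefficients to obtain the adjusted odds ratios
round(exp(coef(model_adj)), 2)
```

Because ‘bmi_grp’ is a factor, R reports one odds ratio per non-reference category, each comparing that category with the first level (here ‘normal’), adjusted for hypertension and HDL-C.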

Question C1.4b: What are the odds of having prior CVD if a participant is hypertensive, once you have adjusted for their levels of HDL-C?

Answer C1.4b:

> model8 <- glm(prior_cvd ~ hyperten + hdlc, family = binomial(link = "logit"), data = df)
> summary(model8)

Call:
glm(formula = prior_cvd ~ hyperten + hdlc, family = binomial(link = "logit"), data = df)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.1004  -0.7844  -0.6978   1.2748   2.3248  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -0.4722     0.1144  -4.127 3.67e-05 ***
hyperten      0.4432     0.0770   5.756 8.59e-09 ***
hdlc         -0.7038     0.1007  -6.991 2.73e-12 ***

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4805.5  on 4265  degrees of freedom

Residual deviance: 4717.3  on 4263  degrees of freedom
AIC: 4723.3

Number of Fisher Scoring iterations: 4

> exp(cbind(coef(model8), confint(model8)))
Waiting for profiling to be done…
                          2.5 %    97.5 %
(Intercept) 0.6236630 0.4984294 0.7805407
hyperten    1.5577272 1.3388762 1.8107175
hdlc        0.4946928 0.4055012 0.6017479 

Here, the odds of prior CVD are 1.56 times higher in those with hypertension than in those without hypertension, after adjusting for HDL-C (OR: 1.56, 95% CI: 1.34-1.81).
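To see where these numbers come from, the sketch below rebuilds the odds ratio for ‘hyperten’ by hand from the estimate and standard error printed by summary(). It uses a Wald (normal approximation) interval, which is an assumption made here for illustration: confint() on a glm object uses profile likelihood instead, so the two intervals can differ slightly in later decimal places.

```r
# Estimate and standard error for hyperten, copied from the summary() output
beta <- 0.4432
se   <- 0.0770

# The odds ratio is the exponentiated coefficient
or <- exp(beta)

# Approximate 95% Wald interval: build it on the log-odds scale, then exponentiate
z     <- qnorm(0.975)
lower <- exp(beta - z * se)
upper <- exp(beta + z * se)

round(c(OR = or, lower = lower, upper = upper), 2)
```

To two decimal places this gives 1.56 (1.34 to 1.81), which agrees with the profile-likelihood interval reported by confint().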
