FoSSA: Fundamentals of Statistical Software & Analysis

B1.4 Multivariable Linear Regression

Learning Outcomes

By the end of this section, students will be able to:

  • Explore the data with correlations and scatterplots.
  • Use an ANOVA to test for a difference in means across a categorical variable.
  • Conduct univariable and multivariable linear regression.
  • Check the regression diagnostics of a linear model.

You can download a copy of the slides here: B1.4a Multiple Linear Regression

Video B1.4a – Introduction to Multiple Linear Regression (7 minutes)

You can download a copy of the slides here: B1.4b Controlling for Confounding

Video B1.4b – Controlling for Confounding (9 minutes)

B1.4 PRACTICAL: Stata

Multiple linear regression

How do we fit a model with more than one explanatory variable?

Here, we will include each covariate one at a time into the model, starting with BMI, then adding ‘ldlc’ and lastly adding ‘currsmoker’. We like to add variables one at a time so that we can see how each additional covariate affects the variables already in the model; this gives us an idea of how interrelationships between variables may underlie the associations in the model.

To build a multivariable model, the command is the same as before:

    regress outcome covariate1 covariate2 covariate3 [,options]

Therefore, to examine the association of BMI group with SBP, adjusted for LDL-C, we write:

regress sbp ib2.bmi_grp4 ldlc

The output is:

Being overweight compared with normal weight is associated with a 1.91 mmHg higher SBP on average, once adjusted for LDL-C (95% CI: 0.80-3.02, p=0.001). This is a slightly stronger association for the overweight BMI group than before we adjusted for LDL-C.

Note that each covariate in the model is adjusted for all other covariates in the model. So BMI group is adjusted for LDL-C, and LDL-C is adjusted for BMI group. In other words, we can see the association of BMI group independent of LDL-C, and vice versa.

Question B1.4a: What is the association of BMI group with SBP, once adjusted for LDL-C and current smoking?

Answer    

regress sbp ib2.bmi_grp4 ldlc i.currsmoker

On average, being obese is associated with a 3.7 mmHg higher SBP than being normal weight, once adjusted for LDL-C and current smoking (95% CI 1.74-5.63, p<0.001). This association was not affected by additional adjustment for current smoking.

Each covariate in the model can be interpreted as adjusted for the other covariates in the model. So being a current smoker is associated with a 0.56 mmHg lower SBP compared to non-smokers, once adjusted for LDL-C and BMI group. The 95% CI (-2.15 to 1.02) includes the null value (which is 0 in this case, as we are not working with ratios), so this association is not statistically significant.

B1.4 PRACTICAL: SPSS

How do we fit a model with more than one explanatory variable?

Here, we will include each covariate one at a time into the model, starting with BMI, then adding ‘ldlc’ and lastly adding ‘currsmoker’. We like to add variables one at a time so that we can see how each additional covariate affects the variables already in the model; this gives us an idea as to the mechanisms underlying an association.

To build a multivariable model, the process is the same as before.

Select

Analyze >> General Linear Model >> Univariate

Place SBP in the Dependent box and BMI group (bmi_grp4) in the Fixed Factors box, as in the earlier exercise. This time, also add LDL-C (ldlc) to the Covariate box.

Don’t forget to make sure Parameter Estimates is selected in the Options tab.

NB: we only use multivariate if we have more than one dependent variable. As we are only examining the effects on one dependent variable (SBP), this is still a univariate analysis. We will cover when you would use the multivariate option in Module B2.

The output should be as follows.

Being obese compared with normal weight is associated with a 3.7 mmHg higher SBP on average, once adjusted for LDL-C (95% CI: 1.77-5.66, p<0.001). This is a slightly stronger association than before we adjusted for LDL-C.

Note that each covariate in the model is adjusted for all other covariates in the model. So BMI group is adjusted for LDL-C, and LDL-C is adjusted for BMI group. In other words, we can see the association of BMI group independent of LDL-C, and vice versa.

Now expand this to include current smoking into the analysis as another fixed factor.

To avoid SPSS getting carried away and measuring ‘interactions’, which we will cover in the next module, once you have set up the General Linear Model, click on ‘Model’ from the menu on the right-hand side. Select ‘Build terms’ at the top, then ‘Main effects’ from the drop-down menu in the middle. Then move all of your explanatory variables into the Model box on the right-hand side. Click Continue to go back to the main screen.

What is the association of BMI group with SBP, once adjusted for LDL-C and current smoking?

Answer

On average, being obese is associated with a 3.7 mmHg higher SBP than being normal weight, once adjusted for LDL-C and current smoking (95% CI 1.74-5.63, p<0.001). This association was not affected by additional adjustment for current smoking.

Each covariate in the model can be interpreted as adjusted for the other covariates in the model. So being a current smoker is associated with a 0.56 mmHg lower SBP compared to non-smokers, once adjusted for LDL-C and BMI group. The 95% CI (-2.15 to 1.02) includes the null value (which is 0 in this case, as we are not working with ratios), so this association is not statistically significant.

B1.4 PRACTICAL: R

Multiple linear regression

How do we fit a model with more than one explanatory variable?

Here, we will include each covariate one at a time into the model, starting with BMI, then adding ‘ldlc’ and lastly adding ‘currsmoker’. We like to add variables one at a time so that we can see how each additional covariate affects the variables already in the model; this gives us an idea as to the mechanisms underlying an association.

To build a multivariable model, the command is the same as before:

my.fit <- lm(Y ~ X1 + X2 + X3, data = my.data)
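
The "one covariate at a time" approach can be sketched with R's built-in mtcars data, which stands in for the course dataset purely for illustration (the variables mpg, wt, hp and am are not part of the course data). The update() function refits an existing model with an extra term, and comparing the coefficient of the first covariate across the fits shows how each adjustment changes it.

```r
# Illustrative only: mtcars ships with R and stands in for the course data.
fit_a <- lm(mpg ~ wt, data = mtcars)   # one covariate
fit_b <- update(fit_a, . ~ . + hp)     # add a second covariate
fit_c <- update(fit_b, . ~ . + am)     # add a third covariate

# Coefficient for wt in each model, to see how adjustment changes it
wt_coefs <- c(
  unadjusted = unname(coef(fit_a)["wt"]),
  adj_hp     = unname(coef(fit_b)["wt"]),
  adj_hp_am  = unname(coef(fit_c)["wt"])
)
print(round(wt_coefs, 2))
```

The same pattern should carry over to the course models by swapping in the outcome, covariates and data frame used in this practical.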

To examine the association of BMI group with SBP, adjusted for LDL-C, we write:

fit4 <- lm(sbp ~ bmi_fact+ldlc, data=white.data)
summary(fit4)

Call:
lm(formula = sbp ~ bmi_fact + ldlc, data = white.data)

Residuals:
    Min      1Q  Median      3Q     Max 
-43.769 -12.106  -1.685   9.805 100.780 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 133.0774     1.2194 109.131  < 2e-16 ***
bmi_fact1    -3.7999     2.5360  -1.498 0.134107    
bmi_fact3     1.9109     0.5656   3.379 0.000735 ***
bmi_fact4     3.7098     0.9921   3.739 0.000187 ***
ldlc         -1.0539     0.3444  -3.060 0.002227 ** 

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 17.48 on 4271 degrees of freedom
  (51 observations deleted due to missingness)
Multiple R-squared:  0.007247,    Adjusted R-squared:  0.006318 
F-statistic: 7.795 on 4 and 4271 DF,  p-value: 2.958e-06

> confint(fit4, level=0.95)
                  2.5 %      97.5 %
(Intercept) 130.6867084 135.4681420
bmi_fact1    -8.7717473   1.1719708
bmi_fact3     0.8020861   3.0196496
bmi_fact4     1.7648434   5.6548333
ldlc         -1.7291229  -0.3786859

Being overweight (BMI group 3) compared with normal weight (group 2) is associated with a 1.91 mmHg higher SBP on average, once adjusted for LDL-C (95% CI: 0.80-3.02, p=0.001). This is a slightly stronger association for the overweight BMI group than before we adjusted for LDL-C.
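
As a check on where the confidence interval comes from, the limits reported by confint() can be reproduced by hand from the estimate, its standard error and the residual degrees of freedom. The sketch below copies those three numbers from the summary(fit4) output; the object names (est, se, df, t_crit, ci, p_val) are only illustrative.

```r
# Values copied from the summary(fit4) output for bmi_fact3
est <- 1.9109   # estimate
se  <- 0.5656   # standard error
df  <- 4271     # residual degrees of freedom

t_crit <- qt(0.975, df)              # two-sided 95% critical value
ci <- est + c(-1, 1) * t_crit * se   # lower and upper limits
print(round(ci, 2))

# Two-sided p-value from the t statistic
p_val <- 2 * pt(-abs(est / se), df)
print(signif(p_val, 3))
```

With over 4,000 degrees of freedom the critical value is very close to 1.96, so "estimate ± 1.96 × standard error" gives almost the same interval.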

Note that each covariate in the model is adjusted for all other covariates in the model. So BMI group is adjusted for LDL-C, and LDL-C is adjusted for BMI group. In other words, we can see the association of BMI group independent of LDL-C, and vice versa.
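
To see what "adjusted for" does in practice, a small simulation can help. The sketch below uses made-up data, not the course dataset: z is a hypothetical confounder that influences both the exposure x and the outcome y, and the true effect of x on y is set to 2. The seed and all coefficients are arbitrary choices for illustration.

```r
set.seed(1)                      # arbitrary seed, for reproducibility
n <- 5000
z <- rnorm(n)                    # a confounder
x <- 0.8 * z + rnorm(n)          # exposure, partly driven by z
y <- 2 * x + 3 * z + rnorm(n)    # outcome: true effect of x is 2

unadjusted <- unname(coef(lm(y ~ x))["x"])       # mixes the effects of x and z
adjusted   <- unname(coef(lm(y ~ x + z))["x"])   # effect of x holding z fixed
print(round(c(unadjusted = unadjusted, adjusted = adjusted), 2))
```

The unadjusted coefficient should overstate the effect of x because it absorbs part of the effect of z, while the adjusted coefficient should sit close to the true value of 2.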

Question B1.4a: What is the association of BMI group with SBP, once adjusted for LDL-C and current smoking?

Answer    

> fit5 <- lm(sbp ~ bmi_fact+ldlc+currsmoker, data=white.data)
> summary(fit5)

Call:
lm(formula = sbp ~ bmi_fact + ldlc + currsmoker, data = white.data)

Residuals:
    Min      1Q  Median      3Q     Max 
-43.857 -12.133  -1.698   9.869 101.087 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 133.1638     1.2250 108.709  < 2e-16 ***
bmi_fact1    -3.8183     2.5372  -1.505 0.132420    
bmi_fact3     1.8963     0.5664   3.348 0.000820 ***
bmi_fact4     3.6897     0.9928   3.717 0.000205 ***
ldlc         -1.0536     0.3446  -3.057 0.002248 ** 
currsmoker   -0.5632     0.8072  -0.698 0.485360    

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 17.48 on 4265 degrees of freedom
  (56 observations deleted due to missingness)
Multiple R-squared:  0.007334,    Adjusted R-squared:  0.006171 
F-statistic: 6.303 on 5 and 4265 DF,  p-value: 7.781e-06

> confint(fit5, level=0.95)
                  2.5 %      97.5 %
(Intercept) 130.7622530 135.5653401
bmi_fact1    -8.7925679   1.1559657
bmi_fact3     0.7859074   3.0066019
bmi_fact4     1.7433621   5.6360128
ldlc         -1.7291747  -0.3779296
currsmoker   -2.1456862   1.0192539

On average, being obese is associated with a 3.7 mmHg higher SBP than being normal weight, once adjusted for LDL-C and current smoking (95% CI 1.74-5.64, p<0.001). This association was essentially unchanged by additional adjustment for current smoking (3.71 mmHg before, 3.69 mmHg after).

Each covariate in the model can be interpreted as adjusted for the other covariates in the model. So being a current smoker is associated with a 0.56 mmHg lower SBP compared to non-smokers, once adjusted for LDL-C and BMI group. The 95% CI (-2.15 to 1.02) includes the null value (which is 0 in this case, as we are not working with ratios), so this association is not statistically significant.
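
When a model has several covariates, it can be convenient to collect the estimates, confidence limits and p-values in one table and flag which intervals exclude the null value. The sketch below is one possible way to do this; it uses R's built-in mtcars data as a stand-in for the course dataset, and the names fit, tab and excludes_zero are illustrative.

```r
# Illustrative only: mtcars ships with R and stands in for the course data.
fit <- lm(mpg ~ factor(cyl) + wt + am, data = mtcars)

ci  <- confint(fit, level = 0.95)
tab <- data.frame(
  estimate = coef(fit),
  lower    = ci[, 1],
  upper    = ci[, 2],
  p_value  = summary(fit)$coefficients[, "Pr(>|t|)"]
)

# TRUE when the 95% CI excludes the null value of 0
tab$excludes_zero <- tab$lower > 0 | tab$upper < 0

print(round(tab[, 1:4], 3))
print(tab$excludes_zero)
```

For a linear model, a 95% CI that excludes 0 corresponds to p<0.05 for that coefficient, so the flag and the p-value column should always agree.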
