FoSSA: Fundamentals of Statistical Software & Analysis

MODULE B2: MULTIPLE COMPARISONS & REPEATED MEASURES

B2.3 Two-way ANOVA

Learning Outcomes

By the end of this section, students will be able to:

  • Explain when and how to use post hoc testing 
  • Explain the concept of multiple comparisons and be able to correct for it in their analysis
  • Apply extensions to the basic ANOVA test and interpret their results
  • Explain when and how to use repeated measures statistics

You can download a copy of the slides here: B2.3 Two-way ANOVA

B2.3 PRACTICAL: R

We can also use ANOVA to investigate the effects of several categorical explanatory variables at once.

To demonstrate this, we are going to use a two-way ANOVA to examine multiple explanatory variables and interactions between them.

To continue our example, we want to examine whether mean SBP differs significantly between BMI groups and by current smoking status.

For a two-way ANOVA the command is the same as in B2.1, but with a second variable:

white.data$smoked <- factor(white.data$currsmoker)

model2 <- aov(sbp ~ bmi_fact + smoked, data = white.data)

To make this a two-way factorial ANOVA (i.e. with an interaction between the independent variables), we substitute a ‘*’ for the ‘+’ sign:

model3 <- aov(sbp ~ bmi_fact * smoked, data = white.data)

> summary(model3)
                  Df  Sum Sq Mean Sq F value   Pr(>F)
bmi_fact           3    6415  2138.2   6.981 0.000111 ***
smoked             1     190   189.6   0.619 0.431468
bmi_fact:smoked    3    1558   519.2   1.695 0.165772
Residuals       4288 1313282   306.3
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
31 observations deleted due to missingness
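If you want to experiment without the course dataset, the same additive-versus-factorial comparison can be reproduced on simulated data. This is a minimal sketch: the variable names mirror the practical, but the values are randomly generated, so the numbers will not match the output above.

```r
# Simulated stand-in data (NOT the course dataset) to show the two model forms
set.seed(42)
n <- 300
bmi_fact <- factor(sample(c("normal", "overweight", "obese"), n, replace = TRUE))
smoked   <- factor(sample(c("no", "yes"), n, replace = TRUE))
# Build in a BMI effect only; smoking and the interaction contribute nothing
sbp <- 120 + 5 * (bmi_fact == "obese") + rnorm(n, sd = 15)

model_add <- aov(sbp ~ bmi_fact + smoked)   # main effects only
model_int <- aov(sbp ~ bmi_fact * smoked)   # expands to bmi_fact + smoked + bmi_fact:smoked

summary(model_int)  # the interaction appears as its own row, bmi_fact:smoked
```

Note that `bmi_fact * smoked` is shorthand: R expands it to both main effects plus the `bmi_fact:smoked` interaction term, which is why the factorial summary table has one extra row.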

  • Question B2.3a: Run the two-way factorial ANOVA and interpret your output. How does this compare to the previous test on these relationships?
  • Question B2.3b: When we use interaction terms like this, if they do not contribute to the model (i.e. are not significant), we can remove them and see whether this changes the other relationships. Go back and run the test a second time, this time without the interaction between BMI group and current smoking. What do you notice?
Answer

Answer B2.3a: Test one, including the interaction term, shows a significant relationship between BMI group and SBP, but current smoking and the interaction are non-significant. The interaction adds the least to the model (it has the highest P value).

Answer B2.3b:

> model2 <- aov(sbp ~ bmi_fact + smoked, data = white.data)

> summary(model2)
              Df  Sum Sq Mean Sq F value   Pr(>F)
bmi_fact       3    6415  2138.2   6.978 0.000111 ***
smoked         1     190   189.6   0.619 0.431580
Residuals   4291 1314840   306.4
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
31 observations deleted due to missingness

When we remove the interaction term, we see that the BMI grouping still shows a significant relationship on its own, but current smoking does not. If you go back to practical B1.4, you can compare these results to those from the multivariable linear regression, which included these two variables plus LDL-C. You can see that the ANOVA without the interaction is very similar to the linear regression. More on this in section B2.6.
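The similarity to the linear regression is no coincidence: in R, `aov()` is a wrapper around `lm()`, so the two fit exactly the same underlying model and differ only in how the results are summarised. A small sketch on simulated data (not the course dataset) makes this concrete:

```r
# aov() and lm() fit the same model; only the default summary differs
set.seed(1)
d <- data.frame(
  sbp      = rnorm(100, mean = 130, sd = 18),
  bmi_fact = factor(sample(1:4, 100, replace = TRUE)),
  smoked   = factor(sample(0:1, 100, replace = TRUE))
)

fit_aov <- aov(sbp ~ bmi_fact + smoked, data = d)
fit_lm  <- lm(sbp ~ bmi_fact + smoked, data = d)

all.equal(coef(fit_aov), coef(fit_lm))  # identical coefficients
anova(fit_lm)                           # reproduces the aov-style table from the lm fit
```

This is why the additive ANOVA and the multivariable linear regression give such similar answers: they are two views of the same General Linear Model.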

Minimal Adequate Model

As current smoker is also non-significant on its own, you could choose to remove it as well and see what happens to the model. This is a simplified method of model optimisation known as the Minimal Adequate Model: you remove variables in order from least significant to most significant, until you end up with the simplest model you can. This can be a helpful process if you need to mentally work through your model and understand what each part is contributing. You never want to remove a variable that is significant on its own, or part of a significant interaction, but you can see how removing other non-significant variables changes the overall model. You can do this with any type of General Linear Model.
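The step-down process described above can be sketched in R as a sequence of nested models, compared with an F-test at each step. This uses simulated data (not the course dataset) with a built-in BMI effect only, so the interaction and smoking terms should drop out; R's built-in `drop1()` and `step()` functions automate the same idea.

```r
# Minimal Adequate Model sketch: drop the least significant term at each step
set.seed(7)
d <- data.frame(
  bmi_fact = factor(sample(1:4, 300, replace = TRUE)),
  smoked   = factor(sample(0:1, 300, replace = TRUE))
)
d$sbp <- 120 + 4 * as.numeric(d$bmi_fact) + rnorm(300, sd = 15)  # BMI effect only

m_full <- aov(sbp ~ bmi_fact * smoked, data = d)  # full factorial model
m_add  <- aov(sbp ~ bmi_fact + smoked, data = d)  # step 1: drop the interaction
m_min  <- aov(sbp ~ bmi_fact, data = d)           # step 2: drop non-significant smoked

# F-tests on the nested sequence: non-significant rows confirm the
# dropped terms were not contributing to the model
anova(m_min, m_add, m_full)

drop1(m_full, test = "F")  # automated one-step alternative (respects marginality)
```

Note that `drop1()` on the full model only offers to remove the interaction, never a main effect that is part of it, which matches the rule above about not removing variables involved in an interaction.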

B2.3 PRACTICAL: Stata

We can also use ANOVA to investigate the effects of several categorical explanatory variables at once.

To demonstrate this, we are going to use a two-way ANOVA to examine multiple explanatory variables and interactions between them.

For a two-way ANOVA the command is:

anova outcome_var var1 var2

Var1 and var2 are assumed to be categorical (but you can override this).

To continue our example, we want to examine whether mean SBP differs significantly between BMI groups and by current smoking status.

To make this a two-way factorial ANOVA (i.e. with an interaction between the independent variables), we type:

anova sbp bmi_grp4##currsmoker

  • Question B2.3a: Run the two-way factorial ANOVA and interpret your output. How does this compare to the previous test on these relationships?
  • Question B2.3b: When we use interaction terms like this, if they do not contribute to the model (i.e. are not significant), we can remove them and see whether this changes the other relationships. Go back and run the test a second time, this time without the interaction between BMI group and current smoking. What do you notice?
Answer

Answer B2.3a: Test one, including the interaction term, shows a significant relationship between the group of explanatory variables and the dependent variable, but each of the variables individually (and the interaction) is non-significant. The interaction adds the least to the model (it has the highest P value and the smallest F value).

Answer B2.3b:

When we remove the interaction term, we see that the F value for the model overall increases, and the BMI grouping now shows a significant relationship on its own. If you go back to practical B1.4, you can compare these results to those from the multivariable linear regression, which included these two variables plus LDL-C. You can see that the ANOVA without the interaction is very similar to the linear regression. More on this in section B2.6.

Minimal Adequate Model

As current smoker is also non-significant on its own, you could choose to remove it as well and see what happens to the model. This is a simplified method of model optimisation known as the Minimal Adequate Model: you remove variables in order from least significant to most significant, until you end up with the simplest model you can. This can be a helpful process if you need to mentally work through your model and understand what each part is contributing. You never want to remove a variable that is significant on its own, or part of a significant interaction, but you can see how removing other non-significant variables changes the overall model. You can do this with any type of General Linear Model.

B2.3 PRACTICAL: SPSS

We can also use ANOVA to investigate the effects of several categorical explanatory variables at once.

To demonstrate this, we are going to go back to the General Linear Model commands in SPSS, which give us the flexibility to add multiple explanatory variables. When we input only categorical variables into the GLM box, it will automatically run an ANOVA for us.

Select

Analyze >> General Linear Model >> Univariate

As before, put SBP in the Dependent Variable box, and BMI grouping and current smoker in the Fixed Factors box.

Now click on Model on the right-hand side.

This time we want to select ‘Build terms’ at the top of the box; we then move each of the variables from the left-hand side to the right. We also want to specify the interaction term. To do this, hold down the Shift key on your keyboard while selecting both terms in the left-hand box, then press the blue arrow. Leave the drop-down menu in the centre on the default option of ‘Interaction’, then press Continue.

You can also use the buttons on the right-hand side to add post hoc tests for each explanatory variable, save your residuals, display parameter estimates, or any of the other things we did in Module B1.

  • Question B2.3a: Run the test and interpret your output. How does this compare to the previous test on these relationships?
  • Question B2.3b: When we use interaction terms like this, if they do not contribute to the model (i.e. are not significant), we can remove them and see whether this changes the other relationships. Go back and run the test a second time, this time without the interaction between BMI group and current smoking. What do you notice?

Answer

Answer B2.3a: Test one, including the interaction term, shows a significant relationship between the group of explanatory variables and the dependent variable (in the corrected model), but each of the variables individually (and the interaction) is non-significant. The interaction adds the least to the model (it has the highest P value).

Answer B2.3b: When we remove the interaction term, we see that the F value for the model overall increases, and the BMI grouping now shows a significant relationship on its own. If you go back to practical B1.4, you can compare these results to those from the multivariable linear regression, which included these two variables plus LDL-C. You can see that the ANOVA without the interaction is very similar to the linear regression. More on this in section B2.6.

Minimal Adequate Model

As current smoker is also non-significant on its own, you could choose to remove it as well and see what happens to the model. This is a simplified method of model optimisation known as the Minimal Adequate Model: you remove variables in order from least significant to most significant, until you end up with the simplest model you can. This can be a helpful process if you need to mentally work through your model and understand what each part is contributing. You never want to remove a variable that is significant on its own, or part of a significant interaction, but you can see how removing other non-significant variables changes the overall model. You can do this with any type of General Linear Model.
