Learning Outcomes
By the end of this section, students will be able to:
- Explain when and how to use post hoc testing
- Explain the concept of multiple comparisons and be able to correct for it in their analysis
- Apply extensions to the basic ANOVA test and interpret their results
- Explain when and how to use repeated measures statistics
You can download a copy of the slides here: B2.3 Two-way ANOVA
B2.3 PRACTICAL: R
We can also use the ANOVA to further investigate the effect of different categorical explanatory variables.
To demonstrate this, we are going to use a two-way ANOVA to examine multiple explanatory variables and interactions between them.
To continue our example, we want to examine whether there are significant differences in mean SBP between BMI groups and between current smokers and non-smokers.
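For a two-way ANOVA the command is the same as in B2.1, but with a second explanatory variable added to the formula:
# Convert current smoking status to a factor so that aov() treats it as categorical
white.data$smoked <- factor(white.data$currsmoker)
# Two-way ANOVA: main effects of BMI group and smoking, no interaction
model2 <- aov(sbp ~ bmi_fact + smoked, data = white.data)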
To make this a two-way factorial ANOVA (i.e. with an interaction between the explanatory variables), replace the ‘+’ with ‘*’:
model3 <- aov(sbp ~ bmi_fact * smoked, data = white.data)
> summary(model3)
                  Df  Sum Sq Mean Sq F value   Pr(>F)
bmi_fact           3    6415  2138.2   6.981 0.000111 ***
smoked             1     190   189.6   0.619 0.431468
bmi_fact:smoked    3    1558   519.2   1.695 0.165772
Residuals       4288 1313282   306.3
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
31 observations deleted due to missingness
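If you want to check what the ‘*’ shorthand does, the sketch below fits the model both ways on a small simulated data set. The data frame `sim` and its values are invented for illustration and are not the course data; the point is only that `bmi_fact * smoked` expands to the two main effects plus the `bmi_fact:smoked` interaction, giving an identical fit.

```r
# Simulated data for illustration only: not the course data set
set.seed(42)
n <- 200
sim <- data.frame(
  bmi_fact = factor(sample(c("under", "normal", "over", "obese"), n, replace = TRUE)),
  smoked   = factor(sample(c("no", "yes"), n, replace = TRUE)),
  sbp      = rnorm(n, mean = 130, sd = 17)
)

# 'bmi_fact * smoked' is shorthand for both main effects plus their interaction
attr(terms(sbp ~ bmi_fact * smoked), "term.labels")
# returns "bmi_fact" "smoked" "bmi_fact:smoked"

m_star <- aov(sbp ~ bmi_fact * smoked, data = sim)
m_long <- aov(sbp ~ bmi_fact + smoked + bmi_fact:smoked, data = sim)

# Same terms, same fit: the residual sums of squares agree
all.equal(deviance(m_star), deviance(m_long))
# TRUE
summary(m_star)
```

Note that `summary()` on an `aov` fit reports sequential (Type I) sums of squares: each term is tested after the terms listed before it, so when group sizes are unequal, as is usual in observational data, reordering the terms can change the main-effect rows. Stata's `anova` reports partial sums of squares by default and SPSS's GLM uses Type III sums of squares by default, which may be one reason the Stata and SPSS answers for this section describe the BMI main effect differently.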
- Question B2.3a: Run the two-way factorial ANOVA and interpret your output. How does this compare to the previous test on these relationships?
- Question B2.3b: When an interaction term does not contribute to the model (i.e. it is not significant), we can remove it and see whether this changes the other relationships. Run the test a second time without the interaction between BMI group and current smoking. What do you notice?
Answer
Answer B2.3a: The first test, which includes the interaction term, shows a significant relationship between BMI group and SBP, but current smoking and the interaction are non-significant. The interaction adds the least to the model (it has the highest P value).
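To test formally whether the interaction adds anything, you can compare the models with and without it using `anova()`. The sketch below does this on simulated data (`sim` is invented for illustration, not the course data). Because the interaction is the last term entered, the comparison's F statistic matches the interaction row of `summary()` for the larger model. In the course output, for instance, the residual sum of squares rises from 1313282 to 1314840 when the interaction is removed, a difference of 1558, which is exactly the interaction's sum of squares.

```r
# Simulated data for illustration only: not the course data set
set.seed(1)
n <- 300
sim <- data.frame(
  bmi_fact = factor(sample(c("under", "normal", "over", "obese"), n, replace = TRUE)),
  smoked   = factor(sample(c("no", "yes"), n, replace = TRUE)),
  sbp      = rnorm(n, mean = 130, sd = 17)
)

m_main <- aov(sbp ~ bmi_fact + smoked, data = sim)  # main effects only
m_int  <- aov(sbp ~ bmi_fact * smoked, data = sim)  # main effects plus interaction

# Nested-model F-test: does adding the interaction reduce the residual SS significantly?
cmp <- anova(m_main, m_int)
cmp

# The same F statistic appears in the interaction row (row 3) of the sequential table
tab <- summary(m_int)[[1]]
c(comparison = cmp$F[2], interaction_row = tab[3, "F value"])
```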
Answer B2.3b:
> model2 <- aov(sbp ~ bmi_fact + smoked, data = white.data)
> summary(model2)
              Df  Sum Sq Mean Sq F value   Pr(>F)
bmi_fact       3    6415  2138.2   6.978 0.000111 ***
smoked         1     190   189.6   0.619 0.431580
Residuals   4291 1314840   306.4
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
31 observations deleted due to missingness
When we remove the interaction term, BMI group still shows a significant relationship on its own, but current smoking does not. If you go back to practical B1.4, you can compare these results with those from the multivariable linear regression, which included these two variables plus LDL-C. You can see that the ANOVA without the interaction is very similar to the linear regression. There is more on this in section B2.6.
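The link with regression can be seen directly in R: `aov()` fits the same linear model as `lm()`, and `anova()` on the `lm()` fit reproduces the ANOVA table. The sketch below uses simulated data (`sim` is invented for illustration) with the same two explanatory variables in both fits; the B1.4 regression also included LDL-C, which is why the course results there are similar rather than identical.

```r
# Simulated data for illustration only: not the course data set
set.seed(7)
n <- 250
sim <- data.frame(
  bmi_fact = factor(sample(c("under", "normal", "over", "obese"), n, replace = TRUE)),
  smoked   = factor(sample(c("no", "yes"), n, replace = TRUE)),
  sbp      = rnorm(n, mean = 130, sd = 17)
)

fit_aov <- aov(sbp ~ bmi_fact + smoked, data = sim)
fit_lm  <- lm(sbp ~ bmi_fact + smoked, data = sim)

summary(fit_aov)  # ANOVA table from aov()
anova(fit_lm)     # the same sequential ANOVA table, computed from the regression fit
summary(fit_lm)   # regression view: each coefficient is a difference from the reference level

# Identical fitted values confirm that the two calls fit the same model
all.equal(fitted(fit_aov), fitted(fit_lm))
```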
Minimal Adequate Model
As current smoking is also non-significant on its own, you could choose to remove it as well and see what happens to the model. This is a simplified form of model optimisation that aims for a Minimal Adequate Model: you remove non-significant terms one at a time, starting with the least significant and refitting after each removal, until you reach the simplest model that still explains the data adequately. This can be a helpful process when you need to work through your model and understand what each part contributes. Never remove a variable that is significant on its own or part of a significant interaction, but you can see how removing other non-significant variables changes the overall model. You can do this with any type of General Linear Model.
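A minimal sketch of this stepwise simplification in R, assuming simulated data (`sim` is invented for illustration, with a BMI effect built in and no smoking effect). It uses `lm()`, which fits the same model as `aov()`, because `drop1()` and `update()` make it easy to remove one term at a time. `drop1()` respects marginality, so while the interaction is in the model it only offers the interaction for removal. To show the mechanics, the code removes each term unconditionally; in practice you would check each `drop1()` table before deciding.

```r
# Simulated data for illustration only: not the course data set.
# BMI group has a built-in effect on SBP; smoking has none.
set.seed(3)
n <- 300
sim <- data.frame(
  bmi_fact = factor(sample(c("under", "normal", "over", "obese"), n, replace = TRUE)),
  smoked   = factor(sample(c("no", "yes"), n, replace = TRUE))
)
bmi_shift <- c(under = -5, normal = 0, over = 4, obese = 9)
sim$sbp <- 130 + unname(bmi_shift[as.character(sim$bmi_fact)]) + rnorm(n, sd = 17)

# Step 1: full factorial model; drop1() only offers the highest-order term
m1 <- lm(sbp ~ bmi_fact * smoked, data = sim)
drop1(m1, test = "F")

# Step 2: remove the interaction (in practice, only if the table above
# shows it is non-significant) and refit
m2 <- update(m1, . ~ . - bmi_fact:smoked)
drop1(m2, test = "F")

# Step 3: remove smoking (in practice, only if non-significant above) and refit
m3 <- update(m2, . ~ . - smoked)

# Check that the simpler model does not fit significantly worse
anova(m3, m2)
formula(m3)
```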
B2.3 PRACTICAL: Stata
We can also use the ANOVA to further investigate the effect of different categorical explanatory variables.
To demonstrate this, we are going to use a two-way ANOVA to examine multiple explanatory variables and interactions between them.
For a two-way ANOVA the command is:
anova outcome_var var1 var2
By default, anova treats var1 and var2 as categorical; to treat a variable as continuous, prefix it with c. (for example, c.var2).
To continue our example, we want to examine if there are significant differences in the mean SBP between BMI groups that also smoke (or not).
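To make this a two-way factorial ANOVA (i.e. with an interaction between the explanatory variables), join the two variables with ##, which fits both main effects and their interaction (a single # would fit only the interaction term):
anova sbp bmi_grp4##currsmoker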

- Question B2.3a: Run the two-way factorial ANOVA and interpret your output. How does this compare to the previous test on these relationships?
- Question B2.3b: When an interaction term does not contribute to the model (i.e. it is not significant), we can remove it and see whether this changes the other relationships. Run the test a second time without the interaction between BMI group and current smoking. What do you notice?
Answer
Answer B2.3a: The first test, which includes the interaction term, shows a significant relationship between the explanatory variables as a group and the dependent variable, but each variable individually (and the interaction) is non-significant. The interaction adds the least to the model (it has the highest P value and the smallest F value).
Answer B2.3b:

When we remove the interaction term, the F value for the model overall increases, and BMI group now shows a significant relationship on its own. If you go back to practical B1.4, you can compare these results with those from the multivariable linear regression, which included these two variables plus LDL-C. You can see that the ANOVA without the interaction is very similar to the linear regression. There is more on this in section B2.6.
Minimal Adequate Model
As current smoking is also non-significant on its own, you could choose to remove it as well and see what happens to the model. This is a simplified form of model optimisation that aims for a Minimal Adequate Model: you remove non-significant terms one at a time, starting with the least significant and refitting after each removal, until you reach the simplest model that still explains the data adequately. This can be a helpful process when you need to work through your model and understand what each part contributes. Never remove a variable that is significant on its own or part of a significant interaction, but you can see how removing other non-significant variables changes the overall model. You can do this with any type of General Linear Model.
B2.3 PRACTICAL: SPSS
We can also use the ANOVA to further investigate the effect of different categorical explanatory variables.
To demonstrate this, we are going to go back to the General Linear Model commands in SPSS, which gives us the flexibility to add multiple explanatory variables. When we input just categorical variables into the GLM box it will automatically run an ANOVA for us.
Select
Analyze >> General Linear Model >> Univariate
As before, put SBP in the Dependent Variable box, and BMI grouping and current smoker in the Fixed Factor(s) box.
Now click on Model on the right-hand side.
This time select ‘Build terms’ at the top of the box, then move each variable from the left-hand side to the right. We also want to specify the interaction term: hold down the Shift key while selecting both variables in the left-hand box, then click the blue arrow. Leave the drop-down menu in the centre on its default option of ‘Interaction’, then click Continue.
You can also use the buttons on the right-hand side to add post hoc tests for each explanatory variable, save your residuals, display parameter estimates, or do any of the other things we did in Module B1.

Run the test and interpret your output. How does this compare to the previous test on these relationships?
When an interaction term does not contribute to the model (i.e. it is not significant), we can remove it and see whether this changes the other relationships.
Run the test a second time without the interaction between BMI group and current smoking. What do you notice?
Answer

The first test, which includes the interaction term, shows a significant relationship between the explanatory variables as a group and the dependent variable (in the Corrected Model row), but each variable individually (and the interaction) is non-significant. The interaction adds the least to the model (it has the highest P value).

When we remove the interaction term, the F value for the model overall increases, and BMI group now shows a significant relationship on its own. If you go back to practical B1.4, you can compare these results with those from the multivariable linear regression, which included these two variables plus LDL-C. You can see that the ANOVA without the interaction is very similar to the linear regression. There is more on this in section B2.6.
Minimal Adequate Model
As current smoking is also non-significant on its own, you could choose to remove it as well and see what happens to the model. This is a simplified form of model optimisation that aims for a Minimal Adequate Model: you remove non-significant terms one at a time, starting with the least significant and refitting after each removal, until you reach the simplest model that still explains the data adequately. This can be a helpful process when you need to work through your model and understand what each part contributes. Never remove a variable that is significant on its own or part of a significant interaction, but you can see how removing other non-significant variables changes the overall model. You can do this with any type of General Linear Model.