Here we are going to practise using F-tests to help choose the best-fitting model for our data.
You can download a copy of the slides here: B1.5 Testing Regression Coefficients
Video B1.5 – Testing Regression Coefficients (7 minutes)
When estimating the effect of a particular exposure, we have seen that it is important to include potential confounding variables in the regression model, and that failure to do so will lead to a biased estimate of the effect.
To assess whether we have included important confounders in our model, we can run a statistical test to see whether the extra coefficients contribute significantly to the model fit. This is helpful for testing the overall inclusion of a categorical variable in a model (where some levels may have a significant association and others may not), or for testing the addition of several variables at once.
B1.5 PRACTICAL: Stata
We use the post-estimation command ‘test’ in Stata. We run this command after a regression command, specifying exactly which coefficients we want to test:
  regress sbp ib2.bmi_grp4 ldlc i.currsmoker

test 1.bmi_grp4 3.bmi_grp4 4.bmi_grp4

Notice that ‘bmi_grp4’ has value labels, but we need to specify the numeric values in the original variable in order to run the ‘test’ command. We do not specify the reference group (2.bmi_grp4).
Whilst the main regression output shows that not all levels of BMI group are significantly associated with SBP, the F-test shows that, overall, this variable explains a significant amount of variance in the model (F[3,4265]=7.74, p<0.001).
Question B1.5: Run an F-test to assess if both LDL-C and current smoking are jointly contributing to model fit (i.e. test both these coefficients at the same time).
Answer
quietly: regress sbp ib2.bmi_grp4 ldlc i.currsmoker
test ldlc 1.currsmoker

You can see from the output that the null hypothesis is that both variables have a coefficient equal to 0. However, there is evidence to reject this null hypothesis (p<0.01). These variables are significantly contributing to model fit.
If you type ‘help test’ into Stata, you can see that there are many more uses for this flexible command.
B1.5 PRACTICAL: SPSS
*Coming Soon*
B1.5 PRACTICAL: R
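In R, the joint F-test can be run using the ‘linearHypothesis’ function from the ‘car’ package, applied to the fitted model (here ‘fit5’, the regression of SBP on BMI group, LDL-C and current smoking):
install.packages("car")  # only needed once
library(car)
linearHypothesis(fit5, c("bmi_fact1=0", "bmi_fact3=0", "bmi_fact4=0"))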
Linear hypothesis test
Hypothesis:
bmi_fact1 = 0
bmi_fact3 = 0
bmi_fact4 = 0
Model 1: restricted model
Model 2: sbp ~ bmi_fact + ldlc + currsmoker
  Res.Df     RSS Df Sum of Sq      F   Pr(>F)
1   4268 1310900
2   4265 1303805  3      7095 7.7364 3.76e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The output reveals that the F-statistic for this joint hypothesis test (that all coefficients of the BMI variable are equal to 0) is about 7.7 and the corresponding p-value is <0.001.
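As a check on the arithmetic, the F-statistic can be reproduced by hand from the two residual sums of squares in the output above, using the standard formula for comparing nested linear models. The symbols RSS_R (restricted model), RSS_U (unrestricted model), q (number of coefficients tested) and df_U (residual degrees of freedom of the unrestricted model) are notation introduced here for illustration; they are not part of the R output.

```latex
F = \frac{(\mathrm{RSS}_R - \mathrm{RSS}_U)/q}{\mathrm{RSS}_U/\mathrm{df}_U}
  = \frac{(1310900 - 1303805)/3}{1303805/4265}
  = \frac{2365}{305.70}
  \approx 7.74
```

The numerator is the ‘Sum of Sq’ value (7095) divided by the ‘Df’ value (3), and the denominator is the residual variance of the full model.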
Whilst the main regression output shows that not all levels of BMI group are significantly associated with SBP, the F-test shows that, overall, this variable explains a significant amount of variance in the model (F[3,4265]=7.74, p<0.001).
Question B1.5: Run an F-test to assess if both LDL-C and current smoking are jointly contributing to model fit (i.e. test both these coefficients at the same time).
Answer
linearHypothesis(fit5, c("ldlc=0", "currsmoker=0"))
Linear hypothesis test
Hypothesis:
ldlc = 0
currsmoker = 0
Model 1: restricted model
Model 2: sbp ~ bmi_fact + ldlc + currsmoker
  Res.Df     RSS Df Sum of Sq      F   Pr(>F)
1   4267 1306812
2   4265 1303805  2    3007.2 4.9186 0.007351 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
You can see from the output that the null hypothesis is that both variables have a coefficient equal to 0. However, there is evidence to reject this null hypothesis (F[2,4265]=4.92, p<0.01). These variables are significantly contributing to model fit.
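The same kind of joint test can also be viewed as a comparison of two nested models: the full model, and a restricted model with the tested terms removed. The sketch below uses base R only and simulated data, so it runs without the course dataset. The data frame ‘dat’ and all of its values are hypothetical; only the variable names mirror the practical. For a linear model fitted on the same rows, ‘anova’ on the two nested fits should give the same F-statistic as ‘linearHypothesis’.

```r
# Hypothetical simulated data: names mirror the practical, values are made up.
set.seed(1)
n <- 500
dat <- data.frame(
  bmi_fact   = factor(sample(1:4, n, replace = TRUE)),
  ldlc       = rnorm(n, mean = 3.5, sd = 1),
  currsmoker = rbinom(n, 1, 0.2)
)
dat$bmi_fact <- relevel(dat$bmi_fact, ref = "2")  # group 2 as reference level
dat$sbp <- 120 + 2 * dat$ldlc + 3 * dat$currsmoker + rnorm(n, sd = 15)

# Full model, and the restricted model that drops the two tested terms
full       <- lm(sbp ~ bmi_fact + ldlc + currsmoker, data = dat)
restricted <- update(full, . ~ . - ldlc - currsmoker)

# Nested-model F-test (row 2 holds the F-statistic and p-value)
cmp <- anova(restricted, full)
print(cmp)

# The same F-statistic computed by hand from the residual sums of squares
rss_r    <- deviance(restricted)   # RSS of the restricted model
rss_f    <- deviance(full)         # RSS of the full model
q        <- df.residual(restricted) - df.residual(full)  # coefficients tested
f_manual <- ((rss_r - rss_f) / q) / (rss_f / df.residual(full))
p_manual <- pf(f_manual, q, df.residual(full), lower.tail = FALSE)
```

One caveat with real data: if the dropped variables have missing values, the restricted model may be fitted on more rows than the full model, and the comparison is then no longer valid. Fitting both models on the same complete cases avoids this.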
You can download a copy of the slides here: B1.5b Presentation of Regression Results
Video B1.5b – Presentation of Regression Results (6 minutes)