Here we are going to practice using F-tests to help choose the best-fitting model for our data.

You can download a copy of the slides here: B1.5 Testing Regression Coefficients

**Video B1.5 – Testing Regression Coefficients (7 minutes)**

When estimating the effect of a particular exposure, we have seen that it is important to include potential confounding variables in the regression model, and that failure to do so will lead to a biased estimate of the effect.

To assess whether we have included important confounders in our model, we can run a statistical test to see if the extra coefficients contribute significantly to the model fit. This is helpful for testing the overall inclusion of a categorical variable in a model (where some levels may have a significant association and other levels may not), or for testing the addition of multiple variables to the model.

## B1.5 PRACTICAL: Stata

We use the post-estimation command ‘test’ in Stata. We run this command after a regression command, specifying exactly which coefficients we want to test:

    regress sbp ib2.bmi_grp4 ldlc i.currsmoker
    test 1.bmi_grp4 3.bmi_grp4 4.bmi_grp4

Notice that ‘bmi_grp4’ has value labels, but we need to specify the numeric values in the original variable in order to run the ‘test’ command. We do not specify the reference group (2.bmi_grp4).

Whilst the main regression output shows that not all levels of BMI group are significantly associated with SBP, the F-test shows that overall this variable explains a significant amount of variance in the model (F[3,4265]=7.74, p<0.001).

__Question B1.5:__ **Run an F-test to assess if both LDL-C and current smoking are jointly contributing to model fit (i.e. test both these coefficients at the same time).**

**Answer**

    quietly: regress sbp ib2.bmi_grp4 ldlc i.currsmoker
    test ldlc 1.currsmoker

You can see from the output that the null hypothesis is that both variables have a coefficient equal to 0. However, there is evidence to reject this null hypothesis (p<0.01). These variables are significantly contributing to model fit.

If you type ‘help test’ into Stata, you can see there are many more uses for this flexible command.

## B1.5 PRACTICAL: SPSS

***Coming Soon***


## B1.5 PRACTICAL: R

We can run an F-test on a set of coefficients in R using the ‘linearHypothesis’ function from the ‘car’ package:

    install.packages("car")
    library(car)
    linearHypothesis(fit5, c("bmi_fact1=0", "bmi_fact3=0", "bmi_fact4=0"))

    Linear hypothesis test

    Hypothesis:
    bmi_fact1 = 0
    bmi_fact3 = 0
    bmi_fact4 = 0

    Model 1: restricted model
    Model 2: sbp ~ bmi_fact + ldlc + currsmoker

      Res.Df     RSS Df Sum of Sq      F   Pr(>F)
    1   4268 1310900
    2   4265 1303805  3      7095 7.7364 3.76e-05 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The output reveals that the F-statistic for this joint hypothesis test (that all coefficients of the BMI variable are equal to 0) is about 7.7 and the corresponding p-value is <0.001.

Whilst the main regression output shows that not all levels of BMI group are significantly associated with SBP, the F-test shows that overall this variable explains a significant amount of variance in the model (F[3,4265]=7.74, p<0.001).
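As a check on where this number comes from, the F-statistic can be recovered by hand from the two residual sums of squares in the output. The sketch below uses the standard formula for comparing nested linear models; the symbols RSS_0 (restricted model), RSS_1 (full model), q (number of coefficients tested) and df_1 (residual degrees of freedom of the full model) are notation introduced here for illustration, not taken from the course materials.

```latex
F = \frac{(RSS_0 - RSS_1)/q}{RSS_1/df_1}
  = \frac{(1310900 - 1303805)/3}{1303805/4265}
  \approx \frac{2365}{305.70}
  \approx 7.74
```

Under the null hypothesis this statistic follows an F distribution with q = 3 and df_1 = 4265 degrees of freedom, which is why the result is reported as F[3,4265].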

__Question B1.5:__ **Run an F-test to assess if both LDL-C and current smoking are jointly contributing to model fit (i.e. test both these coefficients at the same time).**

**Answer**

    linearHypothesis(fit5, c("ldlc=0", "currsmoker=0"))

    Linear hypothesis test

    Hypothesis:
    ldlc = 0
    currsmoker = 0

    Model 1: restricted model
    Model 2: sbp ~ bmi_fact + ldlc + currsmoker

      Res.Df     RSS Df Sum of Sq      F   Pr(>F)
    1   4267 1306812
    2   4265 1303805  2    3007.2 4.9186 0.007351 **
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

You can see from the output that the null hypothesis is that both variables have a coefficient equal to 0. However, there is evidence to reject this null hypothesis (F[2,4265]=4.92, p<0.01). These variables are significantly contributing to model fit.
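For a linear model, a joint test that several coefficients equal 0 can also be framed as a comparison of two nested models, one with and one without the variables of interest. The sketch below illustrates this with base R's `anova()` on simulated stand-in data. The data frame `sim_dat`, the objects `fit_full`, `fit_restricted` and `joint_test`, and all simulated values are hypothetical and are not the course dataset.

```r
# Simulated stand-in data: every name and value here is hypothetical,
# chosen only to mimic the structure of the model used in this practical.
set.seed(1)
n <- 200
sim_dat <- data.frame(
  bmi_fact   = factor(sample(1:4, n, replace = TRUE)),
  ldlc       = rnorm(n, mean = 3.5, sd = 1),
  currsmoker = rbinom(n, size = 1, prob = 0.2)
)
# Use BMI group 2 as the reference level, as in the practical
sim_dat$bmi_fact <- relevel(sim_dat$bmi_fact, ref = "2")
sim_dat$sbp <- 120 + 2 * sim_dat$ldlc + 3 * sim_dat$currsmoker +
  rnorm(n, mean = 0, sd = 15)

# Full model, and a restricted model that drops the two variables under test
fit_full       <- lm(sbp ~ bmi_fact + ldlc + currsmoker, data = sim_dat)
fit_restricted <- lm(sbp ~ bmi_fact, data = sim_dat)

# F-test comparing the nested models: the second row of the table holds
# the joint test of the ldlc and currsmoker coefficients
joint_test <- anova(fit_restricted, fit_full)
print(joint_test)
```

Both models must be fitted to the same observations for this comparison to be valid, so with real data it is worth checking that missing values in the dropped variables do not change the sample between the two fits.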

You can download a copy of the slides here: B1.5b Presentation of Regression Results

**Video B1.5b – Presentation of Regression Results (6 minutes)**