FoSSA: Fundamentals of Statistical Software & Analysis

B1.5 Model Selection and F-Tests
Here we are going to practice using F-tests to help choose the best-fitting model for our data.

You can download a copy of the slides here: B1.5 Testing Regression Coefficients

Video B1.5 – Testing Regression Coefficients (7 minutes)

When estimating the effect of a particular exposure, we have seen that it is important to include potential confounding variables in the regression model, and that failure to do so will lead to a biased estimate of the effect.

To assess if we have included important confounders in our model, we can run a statistical test to see if the extra coefficients are significantly contributing to the model fit. This is helpful for testing the overall inclusion of a categorical variable to a model (where some levels may have a significant association and other levels may not), or testing the addition of multiple variables to the model.
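The statistic behind this joint test compares the residual sum of squares (RSS) of the model with and without the extra coefficients. As a minimal numerical sketch (not course code; the function name and toy numbers below are our own), the incremental F-statistic can be computed like this:

```python
def incremental_f(rss_restricted, rss_full, n_restrictions, df_full):
    """F statistic for jointly testing extra coefficients.

    rss_restricted: residual sum of squares without the tested terms
    rss_full:       residual sum of squares with them included
    n_restrictions: number of coefficients set to zero (numerator df)
    df_full:        residual degrees of freedom of the full model
    """
    explained_per_restriction = (rss_restricted - rss_full) / n_restrictions
    residual_variance = rss_full / df_full
    return explained_per_restriction / residual_variance

# Toy illustration (made-up numbers): dropping 3 coefficients raises the
# RSS from 1000 to 1060 in a model with 96 residual degrees of freedom.
print(round(incremental_f(1060, 1000, 3, 96), 2))  # → 1.92
```

A large F (relative to the F distribution with those degrees of freedom) means the dropped coefficients were jointly explaining a non-trivial amount of variance.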

B1.5 PRACTICAL: Stata

We use the post-estimation command ‘test’ in Stata. We run this command after a regression command, specifying exactly which coefficients we want to test:

    regress sbp ib2.bmi_grp4 ldlc i.currsmoker
    test 1.bmi_grp4 3.bmi_grp4 4.bmi_grp4

Notice that ‘bmi_grp4’ has value labels, but we need to specify the numeric values in the original variable in order to run the ‘test’ command. We do not specify the reference group (2.bmi_grp4).

Whilst the main regression output shows that not all levels of BMI group are significantly associated with SBP, the F-test shows that, overall, this variable explains a significant amount of variance in the model (F[3,4265]=7.74, p<0.001).

Question B1.5: Run an F-test to assess if both LDL-C and current smoking are jointly contributing to model fit (i.e. test both these coefficients at the same time).

Answer

    quietly: regress sbp ib2.bmi_grp4 ldlc i.currsmoker
    test ldlc 1.currsmoker

You can see from the output that the null hypothesis is that both variables have a coefficient equal to 0. However, there is evidence to reject this null hypothesis (p<0.01). These variables are significantly contributing to model fit.

If you type ‘help test’ into Stata, you can see there are many more uses to this flexible command.

B1.5 PRACTICAL: SPSS

*Coming Soon*


B1.5 PRACTICAL: R


In R, we can run the same kind of joint test of coefficients using the ‘linearHypothesis’ function from the ‘car’ package:

install.packages("car")
library(car)

# fit5 is the multivariable model: fit5 <- lm(sbp ~ bmi_fact + ldlc + currsmoker)
linearHypothesis(fit5, c("bmi_fact1=0", "bmi_fact3=0", "bmi_fact4=0"))

Linear hypothesis test

Hypothesis:
bmi_fact1 = 0
bmi_fact3 = 0
bmi_fact4 = 0

Model 1: restricted model
Model 2: sbp ~ bmi_fact + ldlc + currsmoker

  Res.Df     RSS Df Sum of Sq      F   Pr(>F)    
1   4268 1310900                                 
2   4265 1303805  3      7095 7.7364 3.76e-05 ***

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The output reveals that the F-statistic for this joint hypothesis test (that all coefficients of the BMI variable are equal to 0) is about 7.7 and the corresponding p-value is <0.001.

Whilst the main regression output shows that not all levels of BMI group are significantly associated with SBP, the F-test shows that, overall, this variable explains a significant amount of variance in the model (F[3,4265]=7.74, p<0.001).
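We can sanity-check the reported F statistic by hand from the two residual sums of squares in the output above. A quick check in Python (using the printed, rounded RSS values):

```python
rss_restricted = 1310900  # Model 1: restricted model without the BMI dummies
rss_full = 1303805        # Model 2: full model, residual df = 4265
# 3 coefficients are being jointly tested (numerator df = 3)
f_stat = ((rss_restricted - rss_full) / 3) / (rss_full / 4265)
print(round(f_stat, 2))  # → 7.74, matching F[3,4265]=7.74
```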

Question B1.5: Run an F-test to assess if both LDL-C and current smoking are jointly contributing to model fit (i.e. test both these coefficients at the same time).

Answer

> linearHypothesis(fit5, c("ldlc=0", "currsmoker=0"))

Linear hypothesis test

Hypothesis:
ldlc = 0
currsmoker = 0

Model 1: restricted model
Model 2: sbp ~ bmi_fact + ldlc + currsmoker

  Res.Df     RSS Df Sum of Sq      F   Pr(>F)   
1   4267 1306812                                
2   4265 1303805  2    3007.2 4.9186 0.007351 **

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

You can see from the output that the null hypothesis is that both variables have a coefficient equal to 0. However, there is evidence to reject this null hypothesis (F[2,4265]=4.92, p<0.01). These variables are significantly contributing to model fit.
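The same hand-check works for this second test, again using the printed RSS values from the output (a quick Python sketch):

```python
rss_restricted = 1306812  # Model 1: model without ldlc and currsmoker
rss_full = 1303805        # Model 2: full model, residual df = 4265
# 2 coefficients are being jointly tested (numerator df = 2)
f_stat = ((rss_restricted - rss_full) / 2) / (rss_full / 4265)
print(round(f_stat, 2))  # → 4.92, matching F[2,4265]=4.92
```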

You can download a copy of the slides here: B1.5b Presentation of Regression Results

Video B1.5b – Presentation of Regression Results (6 minutes)
