FoSSA: Fundamentals of Statistical Software & Analysis


Learning Outcomes

By the end of this section, students will be able to:

  • Explain the importance of the parametric assumptions and determine if they have been met
  • Explain the basic principles of rank based non-parametric statistical tests
  • Describe the use of a range of common non-parametric tests
  • Conduct and interpret common non-parametric tests

You can download a copy of the slides here: B3.1 The Parametric Assumptions

B3.1 PRACTICAL: R 

Part 1: Normality Testing

We will first load the FoSSA mouse dataset, which we will be analysing.

We will need to load the dplyr package for data manipulation, and rstatix for the functions required to perform the tests.

We want to run a Shapiro-Wilk Test to examine Normality by each of the strain groups. We must first create a new dataset grouping by Strain, and then use this as the input for our test. To create the dataset, we will run:

> mice_g <- group_by(mice, Strain)

Now, we can test the normality of the three weight variables within each strain group. We specify the dataset, and then the variables we want to test:

> shapiro_test(mice_g, Weight_baseline, Weight_mid, Weight_end)

This will run the Shapiro-Wilk test to examine the distribution of the three weight variables within each strain group.
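The grouped workflow above can be sketched end-to-end. This is a minimal, self-contained version using simulated data in place of the FoSSA mouse dataset (the strain labels and weight values below are invented purely for illustration):

```r
# Minimal sketch: grouped Shapiro-Wilk tests with dplyr + rstatix,
# using simulated data in place of the FoSSA mouse dataset
library(dplyr)
library(rstatix)

set.seed(42)
mice <- data.frame(
  Strain          = rep(c("WT", "Cdkn1a KO", "N-RAS KO"), each = 20),
  Weight_baseline = rnorm(60, mean = 25, sd = 2),
  Weight_mid      = rnorm(60, mean = 27, sd = 2),
  Weight_end      = rnorm(60, mean = 29, sd = 2)
)

# Group by strain so the test runs separately within each strain group
mice_g <- group_by(mice, Strain)

# One Shapiro-Wilk test per variable per strain group (9 rows here);
# p < 0.05 suggests a departure from normality
shapiro_test(mice_g, Weight_baseline, Weight_mid, Weight_end)
```

The output is one row per variable per strain group, with the test statistic and p-value in each row.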

Question B3.1a: From the output in R, what can be concluded about the distribution of these variables?

Answer

The RStudio output will look like this:

Two of the weight variables are significantly different (p<0.05) from a Gaussian (normal) distribution: the Cdkn1a KO mice at the mid-point, and the N-RAS KO mice at the end of the trial. This means we need to use non-parametric tests when analysing these data.

Part 2: Homogeneity of Variances

Levene's test for homogeneity of variance compares variances across the levels of a factor variable. We wish to test by strain group, so we must first convert this variable to a factor:

> mice$Strain_group <- as.factor(mice$Strain_group)

Now we can use the levene_test function to test the homogeneity of the variances of baseline weight among the strain groups. We specify the dataset, then the variables in the form dependent variable ~ grouping variable:

> levene_test(mice, Weight_baseline ~ Strain_group, center = mean)

This will run the Levene’s test to examine the homogeneity of variances of the baseline weight among the strain groups.
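If rstatix is not available, Levene's test centred at the mean can also be written out in base R: it is simply a one-way ANOVA on the absolute deviations of each observation from its own group mean. The data below are simulated for illustration:

```r
# Levene's test (centred at the mean) written out in base R,
# on simulated data standing in for the FoSSA mouse dataset
set.seed(1)
mice <- data.frame(
  Strain_group    = factor(rep(c("WT", "Cdkn1a KO", "N-RAS KO"), each = 20)),
  Weight_baseline = rnorm(60, mean = 25, sd = 2)
)

# Absolute deviation of each weight from its group's mean
z <- abs(mice$Weight_baseline - ave(mice$Weight_baseline, mice$Strain_group))

# One-way ANOVA on the deviations; the F statistic here is Levene's W0
anova(lm(z ~ Strain_group, data = mice))
```

The F-test in this ANOVA table is the same W0 statistic reported by `levene_test` with `center = mean` (and by Stata's `robvar`).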

Question B3.1b: From the output in R, what can you conclude about the variance of the baseline weight?

Answer

The RStudio output will look like this:

This tells us that there is no statistically significant difference (p>0.05) in the variance of baseline weight among the strain groups, so the assumption of homogeneity of variances is met.

B3.1 PRACTICAL: Stata

Part 1: Normality Testing

Run the Shapiro-Wilk test to examine the distribution of the three weight variables within each strain group.

The command in Stata is ‘swilk’:

bysort Strain_group: swilk Weight_baseline Weight_mid Weight_end

Question B3.1a: From the output, what can be concluded about the distribution of these variables?

Answer

Two of the weight variables are significantly different (p<0.05) from a Gaussian (normal) distribution: the Cdkn1a KO mice at the mid-point, and the N-RAS KO mice at the end of the trial. This means we need to use non-parametric tests when analysing these data.

Part 2: Homogeneity of Variances

Run the Levene’s test to examine the homogeneity of variances of the baseline weight among the strain groups.

We will use the following syntax to perform Levene’s Test:

robvar measurement_variable, by(grouping_variable)

The code in this case is:

robvar Weight_baseline, by(Strain_group)

Question B3.1b: From the output, what can you conclude about the variance of the baseline weight?

Answer

If we were considering a parametric test for these data, the row we would be interested in is the top one ‘W0’, which is the test statistic for Levene’s Test centered at the mean.

This tells us that there is no statistically significant difference (p>0.05) in the variance of baseline weight among the strain groups, so the assumption of homogeneity of variances is met.

B3.1 PRACTICAL: SPSS

Part 1: Normality Testing

Open the FoSSA mouse data set in SPSS.

For this we want to run the Shapiro-Wilk test.

Select Analyze >> Descriptive Statistics >> Explore

Move the variables we want to test for normality into the ‘Dependent List’ box. This is all of the continuous variables, in this case the mouse weights at baseline, midpoint, and end of the trial.

Move ‘Strain_group’ into the ‘Factor List’ box. This is important, as otherwise SPSS will treat each weight variable as one large group.

Click on the ‘Plots’ button on the right-hand side and tick the box next to ‘Normality plots with tests’. At this point you can also deselect the other types of plots if you are not interested in these.

Press ‘Continue’ and this will take you back to the main ‘Explore’ box. Here you can also choose what you would like to display, so you can decide here if you would like to look at statistics, or plots, or both.

Open the output window and review the ‘Shapiro-Wilk’ output table. What can be concluded about the distribution of these variables?

Part 2: Homogeneity of Variances

For this we want to run Levene’s test.

The easiest way to do this in SPSS is to run a one-way ANOVA (see B1.2: Differences Between Means) and select the option to also test for homogeneity of variances.

Run this test for the variable ‘Weight_baseline’ with ‘Strain_group’ as the factor.

Select Analyze >> Compare Means and Proportions >> One-Way ANOVA

Click on ‘Options’ and tick the box next to ‘Homogeneity of variance test’.

Open the output window to view your results. What can you conclude about these data based on the results of this test?

Answer

Part 1: Normality Testing

In your output window you will see a table like the below. SPSS automatically conducts both tests, but we are interested in the Shapiro-Wilk half of the table.

Two of the weight variables are significantly different from a Gaussian (normal) distribution: the Cdkn1a KO mice at the mid-point, and the N-RAS KO mice at the end of the trial. This means we need to use non-parametric tests when analysing these data.

Part 2: Homogeneity of Variances

If we were considering a parametric test for these data, the row we would be interested in is the top one ‘Based on Mean’. This row tells us that there is no statistically significant difference in variance about the mean for the three groups, so this assumption is met.
