Learning Outcomes
By the end of this section, students will be able to:
- Explain the importance of the parametric assumptions and determine if they have been met
- Explain the basic principles of rank-based non-parametric statistical tests
- Describe the use of a range of common non-parametric tests
- Conduct and interpret common non-parametric tests
You can download a copy of the slides here: B3.1 The Parametric Assumptions
B3.1 PRACTICAL: R
Part 1: Normality Testing
We will first load the FoSSA mouse dataset, which we will be analysing.
We will need to load the dplyr package for data manipulation, and rstatix for the functions required to perform the tests.
We want to run a Shapiro-Wilk test to examine normality within each strain group. We must first create a new dataset grouped by Strain_group, and then use this as the input for our test. To create the grouped dataset, we will run:
> mice_g <- group_by(mice, Strain_group)
Now we can test the normality of the three weight variables within each strain group. We specify the grouped dataset, and then the variables we want to test:
> shapiro_test(mice_g, Weight_baseline, Weight_mid, Weight_end)
This will run the Shapiro-Wilk test to examine the distribution of the three weight variables within each strain group.
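If rstatix is not available, the same per-group check can be sketched with base R's shapiro.test(). This is a minimal sketch on simulated data: the data frame mice_sim and the group labels "A", "B" and "C" are hypothetical stand-ins for the FoSSA mouse data, and only Weight_baseline is shown. The same pattern applies to Weight_mid and Weight_end.

```r
# Hypothetical example: simulated weights stand in for the FoSSA mouse data.
set.seed(42)
mice_sim <- data.frame(
  Strain_group    = factor(rep(c("A", "B", "C"), each = 20)),  # hypothetical labels
  Weight_baseline = rnorm(60, mean = 25, sd = 2)
)

# Run shapiro.test() separately within each strain group, keeping W and p.
sw_by_group <- do.call(rbind, lapply(split(mice_sim, mice_sim$Strain_group), function(d) {
  res <- shapiro.test(d$Weight_baseline)
  data.frame(Strain_group = d$Strain_group[1],
             W = unname(res$statistic),
             p = res$p.value)
}))

# Flag groups where normality is rejected at the 5% level.
sw_by_group$non_normal <- sw_by_group$p < 0.05
print(sw_by_group)
```

A group flagged TRUE is one where the data give evidence against a normal distribution, which is the same reading applied to the rstatix output.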
Question B3.1a: From the output in R, what can be concluded about the distribution of these variables?
Answer
The RStudio output will look like this:

Two of the weight distributions differ significantly (p < 0.05) from a Gaussian (normal) distribution: the Cdkn1a KO mice at the mid-point, and the N-RAS KO mice at the end of the trial. This means we should use non-parametric tests when analysing these data.
Part 2: Homogeneity of Variances
To test for homogeneity of variances, the grouping variable must be a factor. We want to compare the strain groups, so we first convert Strain_group to a factor:
> mice$Strain_group <- as.factor(mice$Strain_group)
Now we can use the levene_test function to test the homogeneity of the variances of baseline weight among the strain groups. We specify the dataset, then the variables in the form dependent variable ~ grouping variable:
> levene_test(mice, Weight_baseline ~ Strain_group, center = mean)
This runs Levene's test, centred at the mean, to examine the homogeneity of variances of baseline weight among the strain groups.
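To see what levene_test() with center = mean computes, the sketch below reproduces Levene's test in base R: take the absolute deviation of each weight from its own group's mean, then run an ordinary one-way ANOVA F test on those deviations. The data frame mice_sim and its group labels are hypothetical simulated stand-ins for the FoSSA data.

```r
# Hypothetical simulated data standing in for the FoSSA mouse data.
set.seed(1)
mice_sim <- data.frame(
  Strain_group    = factor(rep(c("A", "B", "C"), each = 15)),
  Weight_baseline = rnorm(45, mean = 25, sd = 2)
)

# Step 1: absolute deviation of each weight from its own group's mean.
group_means <- ave(mice_sim$Weight_baseline, mice_sim$Strain_group, FUN = mean)
mice_sim$abs_dev <- abs(mice_sim$Weight_baseline - group_means)

# Step 2: Levene's statistic is the one-way ANOVA F test on those deviations.
levene_fit <- anova(lm(abs_dev ~ Strain_group, data = mice_sim))
levene_F <- levene_fit[["F value"]][1]
levene_p <- levene_fit[["Pr(>F)"]][1]
cat(sprintf("Levene (mean-centred): F = %.3f, p = %.3f\n", levene_F, levene_p))
```

On the same data, the F and p from this sketch should agree with levene_test(..., center = mean). Swapping mean for median inside ave() gives the median-centred (Brown-Forsythe) variant, which is less sensitive to skewed data.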
Question B3.1b: From the output in R, what can you conclude about the variance of the baseline weight?
Answer
The RStudio output will look like this:

This tells us that there is no statistically significant difference (p > 0.05) in the variance of baseline weight among the strain groups, so the assumption of homogeneity of variances is met.
B3.1 PRACTICAL: Stata
Part 1: Normality Testing
Run the Shapiro-Wilk test to examine the distribution of the three weight variables within each strain group.
The command in Stata is ‘swilk’:
bysort Strain_group: swilk Weight_baseline Weight_mid Weight_end
Question B3.1a: From the output, what can be concluded about the distribution of these variables?
Answer

Two of the weight distributions differ significantly (p < 0.05) from a Gaussian (normal) distribution: the Cdkn1a KO mice at the mid-point, and the N-RAS KO mice at the end of the trial. This means we should use non-parametric tests when analysing these data.
Part 2: Homogeneity of Variances
Run Levene’s test to examine the homogeneity of variances of baseline weight among the strain groups.
We will use the following syntax to perform Levene’s Test:
robvar measurement_variable, by(grouping_variable)
The code in this case is:
robvar Weight_baseline, by(Strain_group)
Question B3.1b: From the output, what can you conclude about the variance of the baseline weight?
Answer

If we were considering a parametric test for these data, the row we would be interested in is the top one ‘W0’, which is the test statistic for Levene’s Test centered at the mean.
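The three rows that robvar reports differ only in the centre from which the absolute deviations are measured: W0 uses the group mean, W50 the group median, and W10 a 10% trimmed mean. The base R sketch below (the language of the R practical) computes all three on simulated data. The names y, g and levene_type_F() are hypothetical, and R's mean(v, trim = 0.1) may not match Stata's trimming convention exactly, so the W10 value is only approximate.

```r
# Hypothetical simulated data; the third group is deliberately skewed.
set.seed(7)
y <- c(rnorm(15, 25, 2), rnorm(15, 26, 2), rexp(15, 0.5) + 22)
g <- factor(rep(c("A", "B", "C"), each = 15))

# Levene-type F statistic for any choice of group centre.
levene_type_F <- function(y, g, centre) {
  z <- abs(y - ave(y, g, FUN = centre))  # deviations from each group's centre
  anova(lm(z ~ g))[["F value"]][1]       # one-way ANOVA F on the deviations
}

w_stats <- c(
  W0  = levene_type_F(y, g, mean),                            # centred at the mean
  W50 = levene_type_F(y, g, median),                          # centred at the median
  W10 = levene_type_F(y, g, function(v) mean(v, trim = 0.1))  # 10% trimmed mean (assumed rule)
)
print(round(w_stats, 3))
```

With skewed groups the three statistics can differ noticeably, which is why the choice of row matters; for a mean-based parametric test, W0 is the relevant one.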
This tells us that there is no statistically significant difference (p > 0.05) in the variance of baseline weight among the strain groups, so the assumption of homogeneity of variances is met.
B3.1 PRACTICAL: SPSS
Part 1: Normality Testing
Open the FoSSA mouse data set in SPSS.
For this we want to run the Shapiro-Wilk test.
Select Analyze >> Descriptive Statistics >> Explore
Move the variables we want to test for normality into the ‘Dependent List’ box. These are the continuous variables, in this case the mouse weights at baseline, mid-point, and end of the trial.
Move ‘Strain_group’ into the ‘Factor List’ box. This is important, as otherwise SPSS will test each weight variable across all mice as one large group.

Click on the ‘Plots’ button on the right-hand side and tick the box next to ‘Normality plots with tests’. At this point you can also deselect the other types of plots if you are not interested in these.

Press ‘Continue’ and this will take you back to the main ‘Explore’ box. Here you can also choose what you would like to display, so you can decide here if you would like to look at statistics, or plots, or both.
Open the output window and review the ‘Shapiro-Wilk’ output table. What can be concluded about the distribution of these variables?
Part 2: Homogeneity of Variances
For this we want to run Levene’s test.
The easiest way to do this in SPSS is to run a one-way ANOVA (see B1.2: Differences Between Means) and select the option to also test for homogeneity of variances.
Run this test for the variable ‘Weight_baseline’ with ‘Strain_group’ as the factor.
Select Analyze >> Compare Means and Proportions >> One-Way ANOVA
Click on ‘Options’ and tick the box next to ‘Homogeneity of variance test’.

Open the output window to view your results. What can you conclude about these data based on the results of this test?
Answer
Part 1: Normality Testing
In your output window you will see a table like the one below. SPSS automatically conducts both tests, but we are interested in the Shapiro-Wilk half of the table.

Two of the weight distributions differ significantly from a Gaussian (normal) distribution: the Cdkn1a KO mice at the mid-point, and the N-RAS KO mice at the end of the trial. This means we should use non-parametric tests when analysing these data.
Part 2: Homogeneity of Variances

If we were considering a parametric test for these data, the row we would be interested in is the top one, ‘Based on Mean’. This row tells us that there is no statistically significant difference in variance about the mean for the three groups, so the assumption of homogeneity of variances is met.