## Learning Outcomes

By the end of this section, students will be able to:

- Explain the importance of the parametric assumptions and determine if they have been met
- Explain the basic principles of rank-based non-parametric statistical tests
- Describe the use of a range of common non-parametric tests
- Conduct and interpret common non-parametric tests

You can download a copy of the slides here: B3.1 The Parametric Assumptions

## B3.1 PRACTICAL: R

__Part 1: Normality Testing__

We will first load the FoSSA mouse dataset, which we will be analysing.

We will need to load the dplyr package for data manipulation, and rstatix for the functions required to perform the tests.

We want to run a **Shapiro-Wilk test** to examine Normality within each of the strain groups. We must first create a new dataset grouped by Strain_group, and then use this as the input for our test. To create the dataset, we will run:

> mice_g <- group_by(mice, Strain_group)

Now, we can test the normality of the three weight variables within each strain group. We specify the dataset, and then the variables we want to test:

> shapiro_test(mice_g, Weight_baseline, Weight_mid, Weight_end)

This will run the **Shapiro-Wilk test** to examine the distribution of the three weight variables within each strain group.
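For readers who want to see what the grouped test does under the hood, here is a minimal sketch in base R. It assumes a simulated data frame called `demo` (a hypothetical stand-in, not the FoSSA mouse data) and uses `shapiro.test()` from the built-in stats package on each group in turn.

```r
# Simulated stand-in for the mouse data; the values are illustrative only.
set.seed(1)
demo <- data.frame(
  Strain_group = rep(c("A", "B", "C"), each = 20),
  Weight_baseline = c(rnorm(20, 20, 2), rnorm(20, 22, 2), rnorm(20, 21, 2))
)

# Split the weights by group and run shapiro.test() on each subset.
by_group <- split(demo$Weight_baseline, demo$Strain_group)
sw <- lapply(by_group, shapiro.test)

# Collect the W statistic and p-value for each group into one table.
sw_table <- data.frame(
  Strain_group = names(sw),
  statistic = sapply(sw, function(x) unname(x$statistic)),
  p = sapply(sw, function(x) x$p.value),
  row.names = NULL
)
print(sw_table)
```

Each row of `sw_table` corresponds to one group, which mirrors the one-row-per-group-and-variable layout that the grouped `shapiro_test` call reports.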

*Question B3.1a:* From the output in R, what can be concluded about the distribution of these variables?

**Answer**

The RStudio output will look like this:

Two of the weight variables are statistically significantly different (p<0.05) from a Gaussian or normal distribution (the Cdkn1a KO mice at the mid-point, and the N-RAS KO mice at the end of the trial). This would mean we need to use non-parametric tests when analysing these data.

__Part 2: Homogeneity of Variances__

In order to test for homogeneity of variance, we must be testing with respect to a factor variable. We wish to test this by strain group, so must convert this variable:

> mice$Strain_group <- as.factor(mice$Strain_group)
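As a small illustration of what this conversion does, the sketch below applies `as.factor()` to a character vector. The labels are hypothetical placeholders, not the actual strain names in the dataset.

```r
# Hypothetical labels standing in for the strain groups.
strain <- c("KO_1", "Wild", "KO_2", "Wild", "KO_1")

# Before conversion the variable is plain text.
print(class(strain))

# After conversion R stores it as a categorical variable with a fixed set of levels.
strain_f <- as.factor(strain)
print(class(strain_f))
print(levels(strain_f))
```

Checking `levels()` after converting is a quick way to confirm that the grouping variable contains the groups you expect.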

Now we can use the levene_test function to test the homogeneity of the variances of baseline weight among the strain groups. We specify the dataset, then the variables in the form dependent variable ~ grouping variable:

> levene_test(mice, Weight_baseline ~ Strain_group, center = mean)

This will run **Levene’s test** to examine the homogeneity of variances of the baseline weight among the strain groups.
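Levene’s test centred at the mean is equivalent to a one-way ANOVA on the absolute deviations of each observation from its group mean. The sketch below shows this in base R, assuming a simulated data frame `demo` (hypothetical values, not the FoSSA mouse data).

```r
# Simulated stand-in for the mouse data; the values are illustrative only.
set.seed(2)
demo <- data.frame(
  Strain_group = factor(rep(c("A", "B", "C"), each = 20)),
  Weight_baseline = rnorm(60, mean = 20, sd = 2)
)

# Absolute deviation of each observation from its own group mean.
group_means <- ave(demo$Weight_baseline, demo$Strain_group, FUN = mean)
demo$abs_dev <- abs(demo$Weight_baseline - group_means)

# Mean-centred Levene's test: the one-way ANOVA F test on those deviations.
fit <- anova(lm(abs_dev ~ Strain_group, data = demo))
levene_F <- fit[["F value"]][1]
levene_p <- fit[["Pr(>F)"]][1]
cat("F =", levene_F, " p =", levene_p, "\n")
```

A large p-value here would be read the same way as in the `levene_test` output: no evidence that the spread of baseline weight differs between groups.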

*Question B3.1b:* From the output in R, what can you conclude about the variance of the baseline weight?

**Answer**

The RStudio output will look like this:

This tells us that there is no statistically significant difference (p>0.05) in the variance of baseline weight among the strain groups, so the assumption of homogeneity is met.

## B3.1 PRACTICAL: Stata

__Part 1: Normality Testing__

Run the **Shapiro-Wilk test** to examine the distribution of the three weight variables within each strain group.

The command in Stata is ‘swilk’:

bysort Strain_group: swilk Weight_baseline Weight_mid Weight_end

__Question B3.1a:__ **From the output, what can be concluded about the distribution of these variables?**

**Answer**

Two of the weight variables are statistically significantly different (p<0.05) from a Gaussian or normal distribution (the Cdkn1a KO mice at the mid-point, and the N-RAS KO mice at the end of the trial). This would mean we need to use non-parametric tests when analysing these data.

__Part 2: Homogeneity of Variances__

Run **Levene’s test** to examine the homogeneity of variances of the baseline weight among the strain groups.

We will use the following syntax to perform Levene’s test:

robvar measurement_variable, by(grouping_variable)

The code in this case is:

robvar Weight_baseline, by(Strain_group)

__Question B3.1b:__ **From the output, what can you conclude about the variance of the baseline weight?**

**Answer**

If we were considering a parametric test for these data, the row we would be interested in is the top one, ‘W0’, which is the test statistic for Levene’s test centred at the mean.

This tells us that there is no statistically significant difference (p>0.05) in the variance of baseline weight among the strain groups, so the assumption of homogeneity is met.

## B3.1 PRACTICAL: SPSS

__Part 1: Normality Testing__

Open the FoSSA mouse data set in SPSS.

For this we want to run the **Shapiro-Wilk test**.

Select Analyze >> Descriptive Statistics >> Explore

Move the variables we want to test for normality into the ‘Dependent List’ box. These are all of the continuous variables, in this case the mouse weights at baseline, midpoint, and end of the trial.

Move ‘Strain_group’ into the ‘Factor List’ box. This is important, as otherwise SPSS will treat each weight variable as one large group.

Click on the ‘Plots’ button on the right-hand side and tick the box next to ‘Normality plots with tests’. At this point you can also deselect the other types of plots if you are not interested in these.

Press ‘Continue’ and this will take you back to the main ‘Explore’ box. Here you can also choose whether to display statistics, plots, or both.

Open the output window and review the ‘Shapiro-Wilk’ output table. What can be concluded about the distribution of these variables?

__Part 2: Homogeneity of Variances__

For this we want to run **Levene’s test**.

The easiest way to do this in SPSS is to run a one-way ANOVA (see B1.2: Differences Between Means) and select the option to also test for homogeneity of variances.

Run this test for the variable ‘Weight_baseline’ with ‘Strain_group’ as the factor.

Select Analyze >> Compare Means and Proportions >> One-Way ANOVA

Click on ‘Options’ and tick the box next to ‘Homogeneity of variance test’.

Open the output window to view your results. What can you conclude about these data based on the results of this test?

**Answer**

__Part 1: Normality Testing__

In your output window you will see a table like the one below. SPSS automatically conducts both the Kolmogorov-Smirnov and Shapiro-Wilk tests, but we are interested in the **Shapiro-Wilk** half of the table.

Two of the weight variables are statistically significantly different from a Gaussian or normal distribution (the Cdkn1a KO mice at the mid-point, and the N-RAS KO mice at the end of the trial). This would mean we need to use non-parametric tests when analysing these data.

__Part 2: Homogeneity of Variances__

If we were considering a parametric test for these data, the row we would be interested in is the top one, ‘Based on Mean’. This row tells us that there is no statistically significant difference in variance about the mean for the three groups, so this assumption is met.
