## Learning Outcomes

By the end of this module, students will be able to:

- Explore the data with correlations and scatterplots.
- Use an ANOVA to test for a difference in means across a categorical variable.
- Conduct univariable and multivariable linear regression.
- Check the regression diagnostics of a linear model.

You can download a copy of the slides here: B1.1 Correlation and Scatterplots

**Video B1.1a – Introduction (5 minutes)**

**Video B1.1b – Correlation (10 minutes)**

## B1.1 PRACTICAL: Correlations in Stata

**Scatterplots**

In this section we will use correlations and scatter plots to examine the relationship between two __continuous__ variables.

Let us now assess whether there is a relationship between BMI and SBP. A scatterplot is a quick way to get a first impression of how variables may relate to each other. The command is:

graph twoway scatter var1 var2 [, options]

The first variable you list will be on the y axis, and the second variable listed will be on the x axis.

*Question B1.1a:* Make a scatterplot of variables SBP (y axis) and BMI (x axis). What do you notice?

**B1.1a Answer**

graph twoway scatter sbp bmi

If we look at the plot we can see some vertical lines: BMI was recorded at discrete values, so each vertical line represents individuals with the same BMI. There also does not appear to be an obvious relationship between the two variables, as the dots form one big clump in the middle without any clear direction.

**Correlations**

We can also quantitatively assess whether there is any relationship between BMI and SBP by looking at their correlation. There are two commands you can choose from: ‘pwcorr’ and ‘correlate’. These two commands handle missing data differently. Specifically, when you request a correlation matrix for several variables, ‘pwcorr’ uses ‘pairwise deletion’, whereas ‘correlate’ uses ‘listwise deletion’ (also called ‘complete case analysis’). The ‘pwcorr’ command can also provide a significance test for each correlation, with or without a Bonferroni correction.

pwcorr sbp bmi

*Question B1.1b:* Interpret the correlation coefficient between SBP and BMI.

**B1.1b Answer**

The correlation between SBP and BMI is 0.09. This is a weak, positive correlation: as one variable increases, so does the other, but only to a small extent.

## B1.1 PRACTICAL: Correlations in SPSS

**Scatterplots**

In this section we will use correlations and scatter plots to examine the relationship between two __continuous__ variables.

A scatterplot is a quick way to get a first impression of how variables may relate to each other.

Open the FoSSA Whitehall data in SPSS.

Select

Graphs >> Chart Builder

A warning on ‘Define Variable Properties’ will pop up. You should have properly categorised all of your variables in Module A1, so you can just press ‘OK’ to move on to creating a chart.

You will then see the ‘Chart Builder’ window open. Select Scatter/Dot from the ‘Gallery’ menu on the bottom left, then drag and drop it into the preview window in the centre.

Select the variables you want on each axis from the ‘Variables’ menu on the left-hand side and drag and drop them onto the relevant axis.

Then press ‘OK’ at the bottom and the chart will appear in the Output window.

*Question B1.1a*: Make a scatterplot of variables SBP (y axis) and BMI (x axis). What do you notice?

**Correlations**

We can also quantitatively assess if there is any relationship between variables by looking at their correlation.

Select

Analyze >> Correlate >> Bivariate

Move the two variables you are interested in into the Test Variables box.

If you put more than two variables into the Test Variables box, SPSS will perform the selected test of correlation on all possible pairs.

Make sure ‘Pearson’ is selected at the bottom of the box before you press ‘OK’ to run the test. This will run the standard Pearson’s product moment correlation coefficient.
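For reference, the Pearson product moment correlation coefficient for $n$ paired observations $(x_i, y_i)$ is usually written as below. The symbols $x_i$, $y_i$, $\bar{x}$, $\bar{y}$ and $n$ are notation introduced here for illustration; think of $x$ as BMI and $y$ as SBP.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The coefficient always lies between -1 and 1. Values near 0 indicate little linear association, and the sign gives the direction of the relationship.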

*Question B1.1b:* Run the test and interpret the correlation coefficient between SBP and BMI.

**Answers**

**B1.1a: Scatterplots**

If we look at the plot we can see some vertical lines: BMI was recorded at discrete values, so each vertical line represents individuals with the same BMI. There also does not appear to be an obvious relationship between the two variables, as the dots form one big clump in the middle without any clear direction.

You can double click on the chart in the output window to change axis labels, titles and colours if you wish.

**B1.1b: Correlations**

The correlation between SBP and BMI is 0.085. This is a weak, positive correlation: as one variable increases, so does the other, but only to a small extent.

SPSS automatically reports each correlation both ways, along with the correlation of each variable with itself. If this is confusing, you can remove the duplicates by clicking ‘show only lower triangle’ and then deselecting ‘show diagonal’ when setting up the test. Your output will then show only the lower triangle of the correlation matrix.

## B1.1 PRACTICAL: Correlations in R

**Scatterplots**

In this section we will use correlations and scatter plots to examine the relationship between two continuous variables.

Let us now assess whether there is a relationship between BMI and SBP. A scatterplot is a quick way to get a first impression of how variables may relate to each other. There are several ways to create a scatterplot in R. The basic function is `plot()`, used as `plot(x, y)`, where x and y are the coordinates of the points to plot.

white.data <- Whitehall_fossa

plot(white.data$bmi, white.data$sbp, xlab = "BMI (kg/m^2)", ylab = "Systolic blood pressure (mm Hg)", cex = 0.8)

*Question B1.1a:* Make a scatterplot of variables SBP and BMI. What do you notice?

**B1.1a. Answer**

plot(white.data$bmi, white.data$sbp, xlab = "BMI (kg/m^2)", ylab = "Systolic blood pressure (mm Hg)", cex = 0.8)

The parameters `xlab` and `ylab` set the labels shown on the x axis and y axis. If you do not set these parameters you will obtain the same plot, but R will label the axes with the expressions passed to `plot()` (here `white.data$bmi` and `white.data$sbp`).

If we look at the plot we can see some vertical lines: BMI was recorded at discrete values, so each vertical line represents individuals with the same BMI. There also does not appear to be an obvious relationship between the two variables, as the dots form one big clump in the middle without any clear direction.

**Correlations**

We can also assess whether there is any relationship between BMI and SBP by looking at their correlation. In R, you can use `cor()` to obtain correlations. `cor.test()` can also be used if a test of association is needed.

In the presence of missing values it is important to set the `use` argument, which specifies how missing values are handled; otherwise `cor()` returns `NA`. With `use="complete.obs"`, the correlation is computed after casewise deletion of missing values. With `use="pairwise.complete.obs"`, the correlation between each pair of variables is computed using all complete pairs of observations on those variables.
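The difference between the two settings only shows up when more than two variables are involved. The sketch below illustrates this on a small made-up data frame (`toy` and its values are hypothetical and are not taken from the Whitehall data).

```r
# Hypothetical toy data with one missing value in each column
toy <- data.frame(
  sbp = c(120, 135, NA, 150, 128, 142),
  bmi = c(22.5, 27.1, 24.3, NA, 23.8, 29.0),
  age = c(45, 60, 52, 66, NA, 58)
)

# Default (use = "everything"): a missing value in either vector gives NA
cor(toy$sbp, toy$bmi)

# Casewise deletion: only the rows that are complete on ALL three variables
# contribute to every entry of the correlation matrix
cor(toy, use = "complete.obs")

# Pairwise deletion: each entry uses every row that is complete on that
# particular pair of variables, so different entries may use different rows
cor(toy, use = "pairwise.complete.obs")
```

When `cor()` is given just two vectors, as in this practical, both settings drop the same rows and return the same value.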

*Question B1.1b:* Interpret the correlation coefficient between SBP and BMI.

**B1.1b. Answer**

cor(white.data$bmi, white.data$sbp, use = "complete.obs")

`[1] 0.08545547`

The correlation between SBP and BMI is 0.09 and it is a weak positive correlation.
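If a test of association is also wanted, `cor.test()` returns the estimate together with a p-value and a confidence interval. The sketch below uses simulated data so that it runs on its own; the vectors `x` and `y` are illustrative assumptions, not the module data.

```r
# Simulated data (hypothetical): y depends only weakly on x
set.seed(1)
x <- rnorm(100)
y <- 0.1 * x + rnorm(100)

# Pearson correlation test (the default method); pairs with a missing
# value are dropped before the test is carried out
res <- cor.test(x, y)

res$estimate   # sample correlation coefficient
res$p.value    # p-value for the null hypothesis of zero correlation
res$conf.int   # 95% confidence interval for the correlation
```

On the module data, the equivalent call would be `cor.test(white.data$bmi, white.data$sbp)`.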