Learning Outcomes
By the end of this module, students will be able to:
- Explore the data with correlations and scatterplots.
- Use an ANOVA to test for a difference in means across a categorical variable.
- Conduct univariable and multivariable linear regression.
- Check the regression diagnostics of a linear model.
You can download a copy of the slides here: B1.1 Correlation and Scatterplots
Video B1.1a – Introduction (5 minutes)
Video B1.1b – Correlation (10 minutes)
B1.1 PRACTICAL: Correlations in Stata
Scatterplots
In this section we will use correlations and scatterplots to examine the relationship between two continuous variables.
Let us now assess whether there is a relationship between BMI and SBP. A scatterplot is a quick way to get a first impression of how two variables may relate to each other. The command is:
graph twoway scatter var1 var2 [, options]
The first variable you list will be on the y axis, and the second will be on the x axis.
Question B1.1a: Make a scatterplot of variables SBP (y axis) and BMI (x axis). What do you notice?
B1.1a Answer
graph twoway scatter sbp bmi

If we look at the plot, we can see some vertical lines. These arise because BMI was recorded at discrete values, so each vertical line represents individuals with the same BMI. There also does not appear to be an obvious relationship between the two variables: the dots form one large clump in the middle of the plot, with no clear direction.
Correlations
We can also quantitatively assess whether there is any relationship between BMI and SBP by looking at their correlation. There are two commands to choose from: ‘pwcorr’ and ‘correlate’. These two commands handle missing data differently. Specifically, when you request a correlation matrix for several variables, ‘pwcorr’ uses ‘pairwise deletion’ (each correlation uses all observations available for that pair of variables), whereas ‘correlate’ uses ‘listwise deletion’, also called ‘complete case analysis’ (only observations with no missing values on any of the listed variables are used). The ‘pwcorr’ command can also provide a significance test for each correlation (option ‘sig’), with or without a Bonferroni correction (option ‘bonferroni’).
  pwcorr sbp bmi
Question B1.1b: Interpret the correlation coefficient between SBP and BMI.
B1.1b Answer
 
     
The correlation between SBP and BMI is 0.09. This is a weak, positive correlation: as one variable increases, so does the other (to a small extent).
B1.1 PRACTICAL: Correlations in SPSS
Scatterplots
In this section we will use correlations and scatter plots to examine the relationship between two continuous variables.
A scatterplot is a quick way to get a first impression of how two variables may relate to each other.
Open the FoSSA Whitehall data in SPSS.
Select
Graphs >> Chart Builder
A warning about ‘Define Variable Properties’ will pop up. You should already have categorised all of your variables properly in Module A1, so you can just press ‘OK’ to move on to creating a chart.
You will then see the ‘Chart Builder’ window open. Select Scatter/Dot from the ‘Gallery’ menu on the bottom left, then drag and drop it into the preview window in the centre.
Select the variables you want on each axis from the ‘Variables’ menu on the left hand side and drag and drop them to the relevant axis.

Then press ‘OK’ at the bottom and the chart will appear in the Output window.
Question B1.1a: Make a scatterplot of variables SBP (y axis) and BMI (x axis). What do you notice?
Correlations
We can also quantitatively assess if there is any relationship between variables by looking at their correlation.
Select
Analyze >> Correlate >> Bivariate
Move the two variables you are interested in into the Test Variables box.
If you put more than two variables into the Test Variables box, SPSS will perform the selected test of correlation on all possible pairs of variables.
Make sure ‘Pearson’ is selected at the bottom of the box before you press ‘OK’ to run the test. This will run the standard Pearson’s product moment correlation coefficient.

Question B1.1b: Run the test and interpret the correlation coefficient between SBP and BMI.
Answers
B1.1a: Scatterplots
If we look at the plot, we can see some vertical lines. These arise because BMI was recorded at discrete values, so each vertical line represents individuals with the same BMI. There also does not appear to be an obvious relationship between the two variables: the dots form one large clump in the middle of the plot, with no clear direction.
You can double click on the chart in the output window to change axis labels, titles and colours if you wish.
B1.1b: Correlations

The correlation between SBP and BMI is 0.085. This is a weak, positive correlation: as one variable increases, so does the other (to a small extent).
SPSS automatically reports each correlation both ways round, as well as the correlation of each variable with itself. If this is confusing, you can remove the duplicates by clicking ‘show only lower triangle’ and then deselecting ‘show diagonal’ when setting up the test. Then your output will look like this.

B1.1 PRACTICAL: Correlations in R
Scatterplots
In this section we will use correlations and scatter plots to examine the relationship between two continuous variables.
Let us now assess whether there is a relationship between BMI and SBP. A scatterplot is a quick way to get a first impression of how two variables may relate to each other. There are several ways to create a scatterplot in R. The basic function is ‘plot()’, used as plot(x, y), where x and y are the vectors of x and y coordinates of the points to plot.
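# Copy the imported Whitehall dataset into a working data frame
white.data <- Whitehall_fossa
# Scatterplot with BMI on the x axis and SBP on the y axis
plot(white.data$bmi, white.data$sbp, xlab = "BMI (kg/m2)", ylab = "Systolic blood pressure (mm Hg)", cex = 0.8)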
Question B1.1a: Make a scatterplot of variables SBP and BMI. What do you notice?
B1.1a. Answer
plot(white.data$bmi, white.data$sbp, xlab = "BMI (kg/m2)", ylab = "Systolic blood pressure (mm Hg)", cex = 0.8)

The parameters ‘xlab’ and ‘ylab’ set the labels shown on the x axis and y axis. If you do not set them, you will obtain the same plot, but R will label each axis with the expression passed to ‘plot()’ (for example, white.data$bmi) rather than with a descriptive name.
If we look at the plot, we can see some vertical lines. These arise because BMI was recorded at discrete values, so each vertical line represents individuals with the same BMI. There also does not appear to be an obvious relationship between the two variables: the dots form one large clump in the middle of the plot, with no clear direction.
Correlations
We can also assess whether there is any relationship between BMI and SBP by looking at their correlation. In R, you can use ‘cor()’ to obtain correlations. If a test of association is needed, ‘cor.test()’ can be used as well.
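As a minimal sketch of what ‘cor.test()’ returns, the example below uses simulated data rather than the Whitehall dataset. The variable names x and y and the simulated values are assumptions made purely for illustration.

```r
# Simulated data for illustration only (not the Whitehall data)
set.seed(1)
x <- rnorm(100)
y <- 0.3 * x + rnorm(100)

# Pearson correlation test (Pearson is the default method)
result <- cor.test(x, y)

result$estimate   # sample correlation coefficient
result$p.value    # p-value for the null hypothesis of zero correlation
result$conf.int   # 95% confidence interval for the correlation
```

With the course data, the corresponding call would be cor.test(white.data$bmi, white.data$sbp).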
In the presence of missing values, it is important to set the argument ‘use’ to specify how they are handled. With the default setting (‘use="everything"’), ‘cor()’ returns NA whenever either variable contains a missing value. If ‘use’ is set to ‘use="complete.obs"’, the correlation is computed after casewise deletion of missing values, that is, using only observations with no missing values on any of the variables supplied. If ‘use’ is set to ‘use="pairwise.complete.obs"’, the correlation between each pair of variables is computed using all complete pairs of observations on those variables.
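The difference between these settings can be sketched with a small made-up data frame. The data frame toy and its values are hypothetical and are not taken from the Whitehall data; they are chosen only so that each variable has one missing value.

```r
# Toy data with missing values (hypothetical, for illustration only)
toy <- data.frame(
  sbp = c(120, 135, NA, 150, 128, 142),
  bmi = c(22.5, 27.1, 24.3, NA, 23.8, 29.0),
  age = c(45, 52, 60, 58, NA, 49)
)

# Default (use = "everything"): a missing value in either variable gives NA
cor_default <- cor(toy$sbp, toy$bmi)

# Casewise deletion: only rows complete on ALL three variables are used
# (rows 1, 2 and 6 here)
cor_complete <- cor(toy, use = "complete.obs")

# Pairwise deletion: each pair uses every row complete on those two variables
# (rows 1, 2, 5 and 6 for sbp and bmi)
cor_pairwise <- cor(toy, use = "pairwise.complete.obs")

cor_default
cor_complete["sbp", "bmi"]
cor_pairwise["sbp", "bmi"]
```

When only two variables are passed to ‘cor()’, as in the SBP and BMI example, the two deletion methods use the same observations and give the same result. They differ only when a correlation matrix is requested for three or more variables.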
Question B1.1b: Interpret the correlation coefficient between SBP and BMI.
B1.1b. Answer
cor(white.data$bmi, white.data$sbp, use = "complete.obs")
[1] 0.08545547
The correlation between SBP and BMI is 0.09 and it is a weak positive correlation.