FoSSA: Fundamentals of Statistical Software & Analysis

Learning Outcomes

By the end of this module, students will be able to:

  • Explore the data with correlations and scatterplots.
  • Use an ANOVA to test for a difference in means across a categorical variable.
  • Conduct univariable and multivariable linear regression.
  • Check the regression diagnostics of a linear model.

You can download a copy of the slides here: B1.1 Correlation and Scatterplots

Video B1.1a – Introduction (5 minutes)

Video B1.1b – Correlation (10 minutes)

B1.1 PRACTICAL: Correlations in Stata

Scatterplots

In this section we will use correlations and scatter plots to examine the relationship between two continuous variables.

Let us now assess whether there is a relationship between BMI and SBP. A scatterplot is a quick way to get a first impression of how variables may relate to each other. The command is:

graph twoway scatter var1 var2 [, options]

The first variable you list is plotted on the y axis and the second on the x axis.

Question B1.1a: Make a scatterplot of variables SBP (y axis) and BMI (x axis). What do you notice?

B1.1a Answer

graph twoway scatter sbp bmi

If we look at the plot, we can see some vertical lines: BMI was recorded at discrete values, so each vertical line represents individuals with the same BMI. Also, there does not appear to be an obvious relationship between the two variables, as the dots all appear in a big clump in the middle without any direction to them.

Correlations

We can also quantitatively assess whether there is any relationship between BMI and SBP by looking at their correlation. There are two commands you can choose from, ‘pwcorr’ and ‘correlate’, and they handle missing data differently. When you ask for the correlation matrix of several variables, ‘pwcorr’ uses ‘pairwise deletion’ (each correlation uses every observation with non-missing values for that pair), whereas ‘correlate’ uses ‘listwise deletion’, also called ‘complete case analysis’ (only observations with no missing values on any of the listed variables are used). ‘pwcorr’ can also report a significance test for each correlation (the ‘sig’ option), with or without a Bonferroni correction (the ‘bonferroni’ option).
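To make the difference between pairwise and listwise deletion concrete, here is a minimal sketch in Python rather than Stata, so that each deletion rule is spelled out explicitly. The small dataset and the function names ‘pearson’, ‘pairwise’ and ‘listwise’ are hypothetical; ‘pairwise’ mimics the rule ‘pwcorr’ follows, and ‘listwise’ the rule ‘correlate’ follows when sbp, bmi and age are listed together.

```python
import math

# Hypothetical toy data (NOT the Whitehall data); None marks a missing value.
data = {
    "sbp": [120, 135, 128, None, 142, 118],
    "bmi": [24.0, 27.5, None, 22.1, 30.2, 23.4],
    "age": [50, 61, 55, 48, None, 52],
}


def pearson(x, y):
    """Pearson's product-moment correlation for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise(data, a, b):
    """Pairwise deletion: use every row where a and b are both non-missing."""
    rows = [(x, y) for x, y in zip(data[a], data[b]) if x is not None and y is not None]
    xs, ys = zip(*rows)
    return pearson(xs, ys), len(rows)


def listwise(data, a, b):
    """Listwise deletion: use only rows with no missing value on ANY variable."""
    keep = [i for i in range(len(data[a])) if all(data[v][i] is not None for v in data)]
    xs = [data[a][i] for i in keep]
    ys = [data[b][i] for i in keep]
    return pearson(xs, ys), len(keep)


r_pw, n_pw = pairwise(data, "sbp", "bmi")  # uses 4 rows
r_lw, n_lw = listwise(data, "sbp", "bmi")  # uses 3 rows (the row with missing age is dropped too)
print(f"pairwise: r = {r_pw:.3f}, n = {n_pw}")
print(f"listwise: r = {r_lw:.3f}, n = {n_lw}")
```

Because the row with a missing age is dropped under listwise deletion even though its SBP and BMI are present, the two methods can give different sample sizes, and therefore different correlations, for the same pair of variables.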

pwcorr sbp bmi
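When several correlations are tested at once, the Bonferroni correction controls the chance of at least one false-positive result: each p-value is multiplied by the number of correlations tested, and anything above 1 is set to 1. In Stata you request it by adding the ‘sig’ and ‘bonferroni’ options to ‘pwcorr’. The Python sketch below shows only the arithmetic of this standard adjustment; it is not Stata's own code, and the variable list and p-values are made up.

```python
from itertools import combinations


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}


variables = ["sbp", "bmi", "age"]
pairs = list(combinations(variables, 2))  # k(k-1)/2 = 3 pairs when k = 3

# Hypothetical unadjusted p-values, one per pair (made up for illustration)
raw = dict(zip(pairs, [0.012, 0.030, 0.400]))
adjusted = bonferroni(raw)

for pair in pairs:
    print(pair, raw[pair], "->", round(adjusted[pair], 3))
```

With only two variables there is a single correlation, so the adjustment leaves the p-value unchanged; it only matters when you correlate three or more variables.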

Question B1.1b: Interpret the correlation coefficient between SBP and BMI.  

B1.1b Answer

     

The correlation between SBP and BMI is 0.09. This is a weak, positive correlation: as one variable increases, the other tends to increase too, but only to a small extent.

B1.1 PRACTICAL: Correlations in SPSS

Scatterplots

In this section we will use correlations and scatter plots to examine the relationship between two continuous variables.

A scatterplot is a quick way to get a first impression of how variables may relate to each other.

Open the FoSSA Whitehall data in SPSS.

Select

Graphs >> Chart Builder

A warning about ‘Define Variable Properties’ will pop up. You should already have categorised all of your variables properly in Module A1, so you can just press ‘OK’ to move on to creating the chart.

You will then see the ‘Chart Builder’ window open. Select Scatter/Dot from the ‘Gallery’ menu at the bottom left, then drag and drop it into the preview window in the centre.

Select the variables you want on each axis from the ‘Variables’ menu on the left-hand side and drag and drop them onto the relevant axis.

Then press ‘OK’ at the bottom and the chart will appear in the Output window.

Question B1.1a: Make a scatterplot of variables SBP (y axis) and BMI (x axis). What do you notice?

Correlations

We can also quantitatively assess if there is any relationship between variables by looking at their correlation.

Select

Analyze  >> Correlate >> Bivariate

Move the two variables you are interested in into the Test Variables box.

If you put more than two variables into the Test Variables box, SPSS will compute the selected correlation for every possible pair of variables.

Make sure ‘Pearson’ is selected at the bottom of the box before you press ‘OK’ to run the test. This calculates the standard Pearson product-moment correlation coefficient.
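For reference, the standard definition of Pearson's product-moment correlation coefficient for n pairs of observations (x_i, y_i), where x̄ and ȳ are the sample means, is given below. It always lies between -1 and 1, with values near 0 (such as the 0.085 found here) indicating little linear association.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```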

Question B1.1b: Run the test and interpret the correlation coefficient between SBP and BMI.  

Answers
B1.1a: Scatterplots

[Figure: SPSS scatterplot of SBP (y axis) against BMI (x axis)]

If we look at the plot, we can see some vertical lines: BMI was recorded at discrete values, so each vertical line represents individuals with the same BMI. Also, there does not appear to be an obvious relationship between the two variables, as the dots all appear in a big clump in the middle without any direction to them.

You can double click on the chart in the output window to change axis labels, titles and colours if you wish.

B1.1b: Correlations

The correlation between SBP and BMI is 0.085. This is a weak, positive correlation: as one variable increases, the other tends to increase too, but only to a small extent.

SPSS automatically reports every correlation twice (once above and once below the diagonal), together with the correlation of each variable with itself. If this is confusing, you can remove the duplicates by ticking ‘show only lower triangle’ and then deselecting ‘show diagonal’ when setting up the test, so that each correlation appears only once. Your output will then look like this.
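The idea behind ‘show only lower triangle’ can be sketched in a few lines of Python: a full correlation matrix for k variables has k × k cells, but only the k(k-1)/2 cells below the diagonal are distinct. The function name ‘lower_triangle’ is hypothetical, and the matrix values are illustrative only (0.09 echoes the SBP-BMI result above; the age values are made up). This is not how SPSS itself is implemented.

```python
def lower_triangle(names, matrix, show_diagonal=False):
    """Return (row, column, r) for the cells kept by a 'lower triangle' display.

    matrix is a full symmetric correlation matrix (list of rows) ordered as names.
    """
    entries = []
    for i, row_name in enumerate(names):
        last = i + 1 if show_diagonal else i  # include cell (i, i) only if requested
        for j in range(last):
            entries.append((row_name, names[j], matrix[i][j]))
    return entries


names = ["sbp", "bmi", "age"]
# Illustrative full matrix: each off-diagonal value appears twice, diagonal is 1
full = [
    [1.00, 0.09, 0.30],
    [0.09, 1.00, 0.05],
    [0.30, 0.05, 1.00],
]

for row, col, r in lower_triangle(names, full):
    print(f"{row} vs {col}: {r:.2f}")
```

With three variables the nine cells of the full matrix reduce to three distinct correlations, which is exactly what the lower-triangle display without the diagonal shows.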

B1.1 PRACTICAL: Correlations in R

Scatterplots

In this section we will use correlations and scatter plots to examine the relationship between two continuous variables.

Let us now assess whether there is a relationship between BMI and SBP. A scatterplot is a quick way to get a first impression of how variables may relate to each other. There are several ways to create a scatterplot in R. The basic function is ‘plot()’, used as plot(x, y), where x and y give the coordinates of the points to plot.

white.data <- Whitehall_fossa  # store the Whitehall FoSSA data frame as white.data

plot(white.data$bmi, white.data$sbp, xlab="BMI (kg/m2)", ylab="Systolic blood pressure (mm Hg)", cex=0.8)  # x = BMI, y = SBP

Question B1.1a: Make a scatterplot of variables SBP and BMI. What do you notice?

B1.1a. Answer

plot(white.data$bmi, white.data$sbp, xlab="BMI (kg/m2)", ylab="Systolic blood pressure (mm Hg)", cex=0.8)

The parameters ‘xlab’ and ‘ylab’ set the labels shown on the x axis and y axis, and ‘cex=0.8’ draws the points at 80% of their default size. If you do not set ‘xlab’ and ‘ylab’, R labels the axes with the expressions you passed in (here white.data$bmi and white.data$sbp).

If we look at the plot, we can see some vertical lines: BMI was recorded at discrete values, so each vertical line represents individuals with the same BMI. Also, there does not appear to be an obvious relationship between the two variables, as the dots all appear in a big clump in the middle without any direction to them.

Correlations

We can also assess whether there is any relationship between BMI and SBP by looking at their correlation. In R, you can use ‘cor()’ to obtain correlations, and ‘cor.test()’ if a test of association is needed.

In the presence of missing values it is important to set the argument ‘use’, which specifies how missing values are handled; otherwise ‘cor()’ returns NA. If ‘use’ is set to use="complete.obs", the correlation is computed after casewise deletion of missing values. If ‘use’ is set to use="pairwise.complete.obs", the correlation between each pair of variables is computed using all complete pairs of observations on those variables.

Question B1.1b: Interpret the correlation coefficient between SBP and BMI.  

B1.1b. Answer

cor(white.data$bmi, white.data$sbp, use="complete.obs")

[1] 0.08545547

The correlation between SBP and BMI is 0.09 and it is a weak positive correlation.
