FoSSA: Fundamentals of Statistical Software & Analysis


The Whitehall FoSSA Study is a simulated cohort study modelled on the original Whitehall Study of Civil Servants, which was set up in the 1960s and followed London-based male civil servants to investigate cardiovascular disease and mortality. Participants in the original Whitehall cohort were flagged for mortality at the Office for National Statistics, which provided the date and cause of all deaths occurring until the end of September 2005. The (simulated) Whitehall FoSSA Study was conducted in 1997 to assess risk factors for cardiac and all-cause mortality in a subset of the original cohort that was still being followed. It contains information on 4,327 individuals who were followed up from 1997 until 2005; the variables are summarised in the table below. See Clarke et al. (2007), Arch Intern Med, 167(13) for more details on the real data that inspired this dataset.

| Variable name | Description | Type of measure | Coding |
| --- | --- | --- | --- |
| whl1_id | Participant ID number | Continuous | |
| age_grp | Age group (years) | Categorical | 1=60-70; 2=71-75; 3=76-80; 4=81-95 |
| prior_cvd | Prior CVD | Binary | 0=No; 1=Yes |
| prior_t2dm | Prior Type 2 Diabetes | Binary | 0=No; 1=Yes |
| prior_cancer | Prior Cancer | Binary | 0=No; 1=Yes |
| sbp | Systolic Blood Pressure (mmHg) | Continuous | 86-230 mmHg |
| bmi | Body Mass Index (BMI; kg/m²) | Continuous | 15-44 kg/m² |
| bmi_grp4 | BMI, grouped | Categorical | 1=Underweight; 2=Normal; 3=Overweight; 4=Obese |
| hdlc | HDL cholesterol (mmol/L) | Continuous | 0.5-3.07 mmol/L |
| ldlc | LDL cholesterol (mmol/L) | Continuous | 1.05-6.81 mmol/L |
| chol | Total cholesterol (mmol/L) | Continuous | 2.24-10.77 mmol/L |
| currsmoker | Current smoker | Binary | 0=Not a current smoker; 1=Current smoker |
| frailty | Summary frailty score | Categorical | 1=least frail quintile; 5=most frail quintile |
| vitd | Vitamin D [25(OH)D] (nmol/L) | Continuous | 18.92-419.89 nmol/L |
| cvd_death | Fatal CVD | Binary | 0=No; 1=Yes |
| death | Death | Binary | 0=No; 1=Yes |
| fu_years | Years of follow-up | Continuous | 0.03-8.5 years |

As this is simulated data, some of the associations you may find between variables are not real, nor do they reflect what the current science suggests about risk factors for cardiovascular disease and mortality. Please do not use this dataset for anything beyond the training material in this course.
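The codings in the variable table can be written down as a small lookup, which may help when reading raw values from the dataset. The following is a minimal Python sketch using only the standard library. The names `VALUE_LABELS`, `DOCUMENTED_RANGES`, `label_for` and `in_documented_range` are illustrative and not part of the course material, and the example record is invented. The frailty score is left out because the table labels only its two end points.

```python
# Codings and ranges copied from the variable table on this page.
VALUE_LABELS = {
    "age_grp": {1: "60-70", 2: "71-75", 3: "76-80", 4: "81-95"},
    "bmi_grp4": {1: "Underweight", 2: "Normal", 3: "Overweight", 4: "Obese"},
    "currsmoker": {0: "Not a current smoker", 1: "Current smoker"},
}
# The binary 0=No; 1=Yes variables all share one mapping.
for name in ("prior_cvd", "prior_t2dm", "prior_cancer", "cvd_death", "death"):
    VALUE_LABELS[name] = {0: "No", 1: "Yes"}

# Ranges listed in the table for the continuous variables.
DOCUMENTED_RANGES = {
    "sbp": (86, 230),
    "bmi": (15, 44),
    "hdlc": (0.5, 3.07),
    "ldlc": (1.05, 6.81),
    "chol": (2.24, 10.77),
    "vitd": (18.92, 419.89),
    "fu_years": (0.03, 8.5),
}


def label_for(variable, code):
    """Return the text label for a coded value, or None if it is not in the table."""
    return VALUE_LABELS.get(variable, {}).get(code)


def in_documented_range(variable, value):
    """Return True if value lies within the range listed for a continuous variable."""
    low, high = DOCUMENTED_RANGES[variable]
    return low <= value <= high


# Hypothetical participant record, invented purely for illustration.
record = {"age_grp": 3, "bmi_grp4": 2, "death": 0, "sbp": 142}
print(label_for("age_grp", record["age_grp"]))
print(label_for("death", record["death"]))
print(in_documented_range("sbp", record["sbp"]))
```

A value that falls outside a documented range, or a code that returns `None`, is a prompt to check how the file was imported rather than proof of an error in the data.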

Stata Dataset

Whitehall_fossa.dta

CSV Dataset

Whitehall_fossa.csv


17 Comments
Elizabeth

Interesting data set

Temesgen Habtu

Hello,
I am working with the Whitehall dataset in CSV format and I noticed that some continuous variables (HDLC, LDL, Cholesterol, VitD) are imported as string variables with nominal measure, likely due to non-numeric entries like N/A.
I tried converting them to numeric using syntax: COMPUTE HDLC_num = NUMBER(hdlc, F20.9). But I keep getting Warnings like #1102: “An invalid numeric field has been found. The result has been set to system-missing value”.
I believe the errors are due to the presence of non-numeric values (N/A) or hidden spaces, but I’m not sure how to properly clean or convert these variables so that I can set their measure as scale for analysis.
Could you advise the best approach to handle this conversion?
Thank you!

Temesgen Habtu

It worked after adding the variables and the data manually myself.
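For readers who hit the same import problem, the clean-up idea can be sketched outside SPSS. The Python example below uses only the standard library and assumes, as the comment above suspects, that missing values appear as text such as `N/A` and that some cells carry stray spaces. The set of missing-value markers, the helper name `to_number` and the sample rows are all illustrative assumptions, not taken from the real file.

```python
import csv
import io

# Assumption: these strings mark missing values; adjust to what the CSV actually contains.
MISSING_TOKENS = {"", "N/A", "NA"}


def to_number(text):
    """Convert a CSV cell to float, returning None for missing-value markers."""
    cleaned = text.strip()  # remove hidden leading or trailing spaces
    if cleaned in MISSING_TOKENS:
        return None
    return float(cleaned)


# Hypothetical rows for illustration only; not taken from the real dataset.
sample = io.StringIO(
    "whl1_id,hdlc,ldlc\n"
    "1, 1.20 ,3.40\n"
    "2,N/A,2.95\n"
)

rows = [
    {
        "whl1_id": row["whl1_id"],
        "hdlc": to_number(row["hdlc"]),
        "ldlc": to_number(row["ldlc"]),
    }
    for row in csv.DictReader(sample)
]
print(rows)
```

Cells that are neither numeric nor a listed marker raise a `ValueError` here, which makes unexpected entries visible instead of silently turning them into missing values.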

Sheila

Well designed

Audu

Insightful summary data set

Ibrahim

Well-structured and insightful summary

jackson1

Insightful

well designed

Amos

Insightful

Opolot Aedeke

Am so expectant

Samuel

insightful

Affiong

Hello. I’m thrilled to be here and I hope that at the end of this course, statistics will be demystified. How long does the entire course last and is there a deadline for completion?

Olamide

What app was used in opening the file?

Abigail

I was able to download the stats dataset

In the CSV dataset, I noticed that those with cardiovascular death have a BMI greater than 20, which may be a result of obstructed blood flow due to excess adipose tissue.

But in the CSV dataset, I’m a bit confused as to why smokers also have a BMI greater than 20; I was thinking they would be underweight, since they tend to become dry over time.

dyn

Data is fictitious and for training purposes only

Victor

I am unable to download the file

Goodnews

Uhoh
