FoSSA: Fundamentals of Statistical Software & Analysis

A2.5 Sample size calculations for cross-sectional studies (or surveys)

Learning Outcomes

By the end of this section, students will be able to:

  • Explain the key concept of power and what impacts it
  • Estimate the power of a given study
  • Estimate the sample size needed to test hypotheses in different study designs

You can download a copy of the slides here: A2.5: Sample size calculations for cross-sectional studies (or surveys)

Video A2.5 Sample Size Calculation for Cross-Sectional Studies (5 minutes)

A2.5 PRACTICAL: R

Example of estimating sample size for a hypothesis in a cross-sectional study


You have been asked to help with a power calculation for a cross-sectional study, to estimate the point prevalence of obesity within a population. A study five years ago in this population found that 30% of people were obese, but the government thinks this has increased by 10 percentage points (to a point prevalence of 40%). Estimate the sample size needed for this study, assuming that the previous point prevalence of 30% is your 'null hypothesis'. You want 80% power.

You are calculating a sample size for one proportion here.

The command is now pwr.p.test() from the pwr package, which must be loaded first:

> library(pwr)
> power8 <- pwr.p.test(h = ES.h(p1 = 0.3, p2 = 0.4), power = 0.8, sig.level = 0.05)
> power8

     proportion power calculation for binomial distribution (arcsine transformation)

              h = 0.2101589
              n = 177.7096
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

Rounding 177.71 up, you need 178 participants to have 80% power to detect a change in prevalence from 30% to 40% at the 5% significance level.
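The following is a minimal Python sketch (standard library only) that shows where n = 177.71 comes from under the arcsine approach. Cohen's h is defined as in pwr's ES.h(), h = 2*asin(sqrt(p1)) - 2*asin(sqrt(p2)). The sample size uses the normal approximation n = ((z(1 - alpha/2) + z(power)) / |h|)^2. The function names cohens_h and n_arcsine are hypothetical. The sketch also ignores the tiny chance of rejecting in the opposite tail, which pwr.p.test() includes when it solves for n numerically, so the last decimal places may differ slightly.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def n_arcsine(p0: float, p1: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate n for a two-sided one-sample test of a proportion on the arcsine scale."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    h = abs(cohens_h(p0, p1))
    return ((z(1 - alpha / 2) + z(power)) / h) ** 2


n_up = n_arcsine(0.3, 0.4)    # prevalence rises from 30% to 40%
n_down = n_arcsine(0.3, 0.2)  # prevalence falls from 30% to 20%
print(f"0.3 -> 0.4: n = {n_up:.2f}, round up to {ceil(n_up)}")
print(f"0.3 -> 0.2: n = {n_down:.2f}, round up to {ceil(n_down)}")
```

Running it should print n = 177.71 (rounded up to 178) and n = 145.84 (rounded up to 146). These agree with the pwr.p.test() results for both scenarios in this practical.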

Question A2.5: One researcher has suggested that the proportion of the population who are obese may actually have decreased by 10 percentage points in the last five years (i.e. to 20%). How would this change your estimate of the sample size needed?

Answer

You are calculating a sample size for one proportion here.

> power9 <- pwr.p.test(h = ES.h(p1 = 0.3, p2 = 0.2), power = 0.8, sig.level = 0.05)
> power9

     proportion power calculation for binomial distribution (arcsine transformation)

              h = 0.2319843
              n = 145.8443
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

The estimated sample size has now reduced slightly, to 146.

Based on the outputs above, we can conclude that more data are needed to detect a change in proportion from 0.3 to 0.4 than from 0.3 to 0.2. For a fixed absolute difference (here the absolute difference in proportions is 0.1), larger sample sizes are needed to obtain a given level of power as the proportions approach 0.5. This relationship is symmetrical around 0.5, as shown below:

> power10 <- pwr.p.test(h = ES.h(p1 = 0.1, p2 = 0.2), power = 0.8, sig.level = 0.05)
> power10

     proportion power calculation for binomial distribution (arcsine transformation)

              h = 0.2837941
              n = 97.45404
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

> power11 <- pwr.p.test(h = ES.h(p1 = 0.9, p2 = 0.8), power = 0.8, sig.level = 0.05)
> power11

     proportion power calculation for binomial distribution (arcsine transformation)

              h = 0.2837941
              n = 97.45404
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

Recall that the standard error (se) of the sampling distribution of a sample proportion is $\sqrt{\frac{p(1-p)}{n}}$, where p is the true proportion and n is the sample size.

As p gets closer to 0.5, p(1-p) and therefore the se increase (the se is largest when p = 0.5). More data are therefore needed to detect the same change in proportion of 0.1.
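You can see this quickly by tabulating the se over a range of p at a fixed n. The sketch below is plain Python. The value n = 100 is an arbitrary, hypothetical sample size chosen only for illustration, because the pattern across p does not depend on n.

```python
from math import sqrt


def se_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion, sqrt(p * (1 - p) / n)."""
    return sqrt(p * (1 - p) / n)


n = 100  # hypothetical sample size, for illustration only
ps = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
ses = {p: se_proportion(p, n) for p in ps}
for p, se in ses.items():
    print(f"p = {p:.1f}   se = {se:.4f}")
```

The se peaks at 0.05 when p = 0.5 and falls away symmetrically on either side. For example, p = 0.1 and p = 0.9 give the same se, as do p = 0.2 and p = 0.8.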

A2.5 PRACTICAL: Stata

Example of estimating sample size for a hypothesis in a cross-sectional study

You have been asked to help with a power calculation for a cross-sectional study, to estimate the point prevalence of obesity within a population. A study five years ago in this population found that 30% of people were obese, but the government thinks this has increased by 10 percentage points (to a point prevalence of 40%). Estimate the sample size needed for this study, assuming that the previous point prevalence of 30% is your 'null hypothesis'. You want 80% power.

You are calculating a sample size for one proportion here.

The command is:

power oneproportion 0.3, diff(0.1)

Equivalently, you can give both proportions. Here power(0.8) is optional, because 80% power is Stata's default:

power oneproportion 0.3 0.4, power(0.8)

* Estimated sample size: N = 172
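Stata's answer (172) differs from the 178 in the R practical because power oneproportion uses a score test by default, not the arcsine transformation. The minimal Python sketch below uses the usual normal-approximation formula for a one-sample test of a proportion: n = ((z(1 - alpha/2) * sqrt(p0(1 - p0)) + z(power) * sqrt(pa(1 - pa))) / (pa - p0))^2. This formula reproduces Stata's numbers here. Treat it as an approximation of what Stata computes, not as its exact algorithm. The function name n_score is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_score(p0: float, pa: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate n for a two-sided one-sample test of H0: p = p0 against p = pa.

    The null term uses the variance at p0 and the power term uses the variance
    at pa (normal approximation).
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    numerator = z(1 - alpha / 2) * sqrt(p0 * (1 - p0)) + z(power) * sqrt(pa * (1 - pa))
    return (numerator / (pa - p0)) ** 2


for p0, pa in [(0.3, 0.4), (0.3, 0.2)]:
    n = n_score(p0, pa)
    print(f"{p0} -> {pa}: n = {n:.1f}, round up to {ceil(n)}")
```

The sketch gives roughly 171.7 and 152.5. These round up to Stata's N = 172 and N = 153.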

Question A2.5: One researcher has suggested that the proportion of the population who are obese may actually have decreased by 10 percentage points in the last five years (i.e. to 20%). How would this change your estimate of the sample size needed?

Answer

You are calculating a sample size for one proportion here.

power oneproportion 0.3, diff(-0.1)

Or alternatively:

power oneproportion 0.3 0.2

* Estimated sample size: N = 153

The estimated sample size has now reduced slightly, to 153.

Based on the outputs above, we can conclude that more data are needed to detect a change in proportion from 0.3 to 0.4 than from 0.3 to 0.2. For a fixed absolute difference (here the absolute difference in proportions is 0.1), larger sample sizes are needed to obtain a given level of power as the proportions approach 0.5. This relationship is symmetrical around 0.5, as shown below:

power oneproportion 0.1 0.2

* Estimated sample size: N = 86

power oneproportion 0.9 0.8

* Estimated sample size: N = 86
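The same pattern holds across the whole range of proportions. This Python sketch applies the standard normal-approximation (score) formula for a one-sample test of a proportion to a series of 0.1 changes. Each change moves toward 0.5, either from below or, mirrored, from above. The function name n_score is hypothetical, and the sketch approximates Stata's calculation rather than reproducing its exact algorithm.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_score(p0: float, pa: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Normal-approximation n for a two-sided one-sample test of a proportion."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    numerator = z(1 - alpha / 2) * sqrt(p0 * (1 - p0)) + z(power) * sqrt(pa * (1 - pa))
    return (numerator / (pa - p0)) ** 2


# Changes of 0.1 moving toward 0.5 from below and, mirrored, from above.
from_below = [(0.1, 0.2), (0.2, 0.3), (0.3, 0.4), (0.4, 0.5)]
from_above = [(0.9, 0.8), (0.8, 0.7), (0.7, 0.6), (0.6, 0.5)]
for (a0, a1), (b0, b1) in zip(from_below, from_above):
    print(f"{a0} -> {a1}: N = {ceil(n_score(a0, a1))}   "
          f"{b0} -> {b1}: N = {ceil(n_score(b0, b1))}")
```

Each row gives the same N on both sides of 0.5. N also grows as the pair of proportions approaches 0.5, rising from 86 to 137, then 172, then 191. The first row matches Stata's N = 86 for both 0.1 -> 0.2 and 0.9 -> 0.8.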

Recall that the standard error (se) of the sampling distribution of a sample proportion is $\sqrt{\frac{p(1-p)}{n}}$, where p is the true proportion and n is the sample size. As p gets closer to 0.5, p(1-p) and therefore the se increase (the se is largest when p = 0.5). More data are therefore needed to detect the same change in proportion of 0.1.

A2.5 PRACTICAL: SPSS

Example of estimating sample size for a hypothesis in a cross-sectional study

You have been asked to help with a power calculation for a cross-sectional study, to estimate the point prevalence of obesity within a population. A study five years ago in this population found that 30% of people were obese, but the government thinks this has increased by 10 percentage points (to a point prevalence of 40%). Estimate the sample size needed for this study, assuming that the previous point prevalence of 30% is your 'null hypothesis'. You want 80% power.

Select

Analyze >> Power Analysis >> Proportions >> One Sample Binomial Test

Input the values from the scenario into the Power Analysis window as before, with a target power of 0.8. In this example the Population proportion is the predicted value of 40% (0.4) and the Null value is the previous prevalence of 30% (0.3).

Question A2.5: One researcher has suggested that the proportion of the population who are obese may actually have decreased by 10 percentage points in the last five years (i.e. to 20%). How would this change your estimate of the sample size needed?

Answer

For the first part of the question, when the hypothesis is that we will see a 40% prevalence, the output table will look like this.

The estimated sample size we need to have a power of 80% is 172.

In the second part of the question, when the hypothesis is that we will see a 20% prevalence, the output table will look like this.

The estimated sample size has now reduced slightly, to 153.

Based on the outputs above, we can conclude that more data are needed to detect a change in proportion from 0.3 to 0.4 than from 0.3 to 0.2. For a fixed absolute difference (here the absolute difference in proportions is 0.1), larger sample sizes are needed to obtain a given level of power as the proportions approach 0.5. This relationship is symmetrical around 0.5, as shown below:

Recall that the standard error (se) of the sampling distribution of a sample proportion is $\sqrt{\frac{p(1-p)}{n}}$, where p is the true proportion and n is the sample size. As p gets closer to 0.5, p(1-p) and therefore the se increase (the se is largest when p = 0.5). More data are therefore needed to detect the same change in proportion of 0.1.
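SPSS and Stata give 172 and 153 for these two scenarios, whereas the R practical gives 178 and 146. The gap comes from the method. pwr.p.test() works on the arcsine scale (Cohen's h), while the SPSS and Stata figures match the standard normal-approximation formula for a one-sample test of a proportion. That SPSS uses this normal approximation is an assumption, based only on the matching numbers. The Python sketch below computes both approximations for the four scenarios in this section. The function names n_arcsine and n_score are hypothetical.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf  # standard normal quantile function


def n_arcsine(p0: float, pa: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Arcsine (Cohen's h) approximation, the approach taken by pwr.p.test()."""
    h = 2 * asin(sqrt(pa)) - 2 * asin(sqrt(p0))
    return ((Z(1 - alpha / 2) + Z(power)) / h) ** 2


def n_score(p0: float, pa: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Normal approximation with separate null (p0) and alternative (pa) variances."""
    numerator = Z(1 - alpha / 2) * sqrt(p0 * (1 - p0)) + Z(power) * sqrt(pa * (1 - pa))
    return (numerator / (pa - p0)) ** 2


scenarios = [(0.3, 0.4), (0.3, 0.2), (0.1, 0.2), (0.9, 0.8)]
for p0, pa in scenarios:
    print(f"{p0} -> {pa}: arcsine N = {ceil(n_arcsine(p0, pa))}, "
          f"score N = {ceil(n_score(p0, pa))}")
```

The two approximations agree on the overall pattern. Both need more participants near 0.5, and both are symmetrical around 0.5. They differ in the exact numbers, so it is worth reporting which method and software you used.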

David

Hello, I used this syntax to calculate the sample size: power8<-pwr.p.test(h=ES.h(p1=0.3, p2=0.4), power=0.8, sig.level=0.05)

What value of precision (d) does this assume?

Thank you
