Data Quiz | Groupby Summarize
Question 1 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of followup.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1L-XbE6pAphrQrXnbRBeDC_UMqiarVkI2&export=download",
                             format = "csv",
                             setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1  79765    77         1     5     157      50     132      61    5.32
## 2 616894    28         1     5    164.      77     100      70    6.01
## 3 403048    36         1     5     164      58     124      71    4.69
## 4 564476    38         1     5     168      79     104      70    4.68
## 5 225430    31         1     8     160      46     111      62     4.7
## 6 462177    44         1     2     170      68     131      94    5.36
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will be working with an interesting survey dataset. HIV-positive people were surveyed about socio-demographics and their feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=15Ehiis9JZkNZTwepRp6s4814SdCkn4L1&export=download",
                                   format = "csv",
                                   setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex         religion marital_status    suici…¹ sadness pesim…²
##   <dbl> <chr>       <chr>    <chr>             <chr>   <chr>   <chr>
## 1    52 male        islam    unmarried         some    often   hopele…
## 2    23 transgender islam    divorced or sepa… some    none    none
## 3    40 male        islam    unmarried         often   none    none
## 4    31 male        islam    divorced or sepa… often   unbear… hopele…
## 5    33 male        islam    divorced or sepa… omnipr… none    discou…
## 6    35 male        hindu    unmarried         often   often   hopele…
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write code to get the average cholesterol level over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>%
  ____(mean_cholesterol = ____(cholesterol_mmol_l, na.rm = ____))
2 Using the diabetes dataset, what is the mean cholesterol level (give the result to one decimal place)? Answer:
3 Using the diabetes dataset and a similar method, give the median triglyceride level over all patients (give the result to one decimal place). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
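As a rough illustration of why the na.rm argument matters in Part A, here is a minimal sketch on a small made-up data frame. The name toy_labs, its glucose column, and its values are assumptions for illustration only and are not taken from the quiz datasets.

```r
library(dplyr)

# Hypothetical toy data, not taken from the quiz datasets
toy_labs <- data.frame(glucose = c(4.0, 5.0, NA, 6.0))

toy_summary <- toy_labs %>%
  summarize(
    mean_with_na    = mean(glucose),                 # NA: the missing value propagates
    mean_without_na = mean(glucose, na.rm = TRUE),   # NA is dropped before averaging
    median_glucose  = median(glucose, na.rm = TRUE)  # same idea for the median
  )

toy_summary
```

Without na.rm = TRUE, a single missing value makes the whole summary NA, which is why the quiz code passes this argument explicitly.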
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>%
  ____(gender_1_male_2_female) %>%
  summarize(average_HDL = ____(hdl_c_mmol_l, ____))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>%
  group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker,
           drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker,
           ____) %>%
  summarize(median_LDL = ____)
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(pesimism = as.factor(____),
         sex = ____) %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(____, ____) %>%
  group_by(____) %>%
  summarize(counts = ____(____ < ____))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: a complete record means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>%
  group_by(sex) %>%
  summarize(counts = sum(!is.na(pesimism)))
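The pattern behind Part D can be sketched on made-up data: a condition inside summarize() evaluates to TRUE or FALSE for each row, and sum() counts the TRUE values within each group. The data frame toy_survey, its columns, and its values are assumptions for illustration and are not taken from the quiz datasets.

```r
library(dplyr)

# Hypothetical toy data, not taken from the quiz datasets
toy_survey <- data.frame(
  sex  = c("male", "male", "female", "female", "female"),
  age  = c(25, 41, 29, NA, 33),
  mood = c("none", NA, "some", "often", NA)
)

toy_counts <- toy_survey %>%
  group_by(sex) %>%
  summarize(
    n_under_30    = sum(age < 30, na.rm = TRUE),  # each TRUE counts as 1
    n_mood_filled = sum(!is.na(mood))             # rows where mood is not NA
  )

toy_counts
```

Note that a comparison such as age < 30 returns NA when age is missing, so na.rm = TRUE is used in the first count; !is.na() never returns NA, so the second count does not need it.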
Part E : Count
13 Using the diabetes dataset, investigate how many people have a family history of diabetes, and how many do not.
counts_family_history <- diab_china_dat_raw %>%
  ____(family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate, by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>%
  mutate(suicidal_thoughts_or_wishes = ____(suicidal_thoughts_or_wishes),
         sex = ____,
         marital_status = ____) %>%
  ____(suicidal_thoughts_or_wishes, sex, marital_status, ____)
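For Part E, it may help to see that counting rows per combination of variables can be written in two equivalent ways in dplyr. The sketch below uses a made-up data frame; toy_patients, its columns, and its values are assumptions for illustration and are not taken from the quiz datasets.

```r
library(dplyr)

# Hypothetical toy data, not taken from the quiz datasets
toy_patients <- data.frame(
  smoker = c("yes", "no", "no", "yes", "no"),
  site   = c("a", "a", "b", "b", "b")
)

# count() tallies the rows for each combination of the listed variables
by_count <- toy_patients %>%
  count(smoker, site)

# The same tally written out with group_by() and summarize()
by_summarize <- toy_patients %>%
  group_by(smoker, site) %>%
  summarize(n = n(), .groups = "drop")

by_count
```

Both results contain one row per observed smoker/site combination and a column n holding the number of rows in that combination.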
Question 2 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of followup.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1Tyedn372X6A9r6eyrEy6QOIIvFe4INAO&export=download",
                             format = "csv",
                             setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 149444    43         2    10    156.    49.4     101      54    5.28
## 2 247965    32         2    11    160.    56.6     109      72    4.97
## 3 177179    48         1     5     162      58     122      63    4.55
## 4 117062    34         2     3    164.      53     111      73    4.93
## 5  51264    53         1    13    170.      77     145      83    5.29
## 6 250814    65         1     4    166.    79.5     149      85    4.92
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will be working with an interesting survey dataset. HIV-positive people were surveyed about socio-demographics and their feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1oTVTEyIIOdtaWgtYuaQBzMxTsrkdug0j&export=download",
                                   format = "csv",
                                   setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex    religion marital_status        suicid…¹ sadness pesim…²
##   <dbl> <chr>  <chr>    <chr>                 <chr>    <chr>   <chr>
## 1    42 male   hindu    divorced or separated often    none    none
## 2    43 male   islam    unmarried             none     unbear… discou…
## 3    51 male   islam    unmarried             often    none    none
## 4    42 male   islam    <NA>                  often    unbear… hopele…
## 5    37 male   islam    unmarried             none     some    none
## 6    31 female islam    <NA>                  some     none    none
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write code to get the average cholesterol level over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>%
  ____(mean_cholesterol = ____(cholesterol_mmol_l, na.rm = ____))
2 Using the diabetes dataset, what is the mean cholesterol level (give the result to one decimal place)? Answer:
3 Using the diabetes dataset and a similar method, give the median triglyceride level over all patients (give the result to one decimal place). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>%
  ____(gender_1_male_2_female) %>%
  summarize(average_HDL = ____(hdl_c_mmol_l, ____))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>%
  group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker,
           drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker,
           ____) %>%
  summarize(median_LDL = ____)
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(pesimism = as.factor(____),
         sex = ____) %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(____, ____) %>%
  group_by(____) %>%
  summarize(counts = ____(____ < ____))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: a complete record means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>%
  group_by(sex) %>%
  summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have a family history of diabetes, and how many do not.
counts_family_history <- diab_china_dat_raw %>%
  ____(family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate, by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>%
  mutate(suicidal_thoughts_or_wishes = ____(suicidal_thoughts_or_wishes),
         sex = ____,
         marital_status = ____) %>%
  ____(suicidal_thoughts_or_wishes, sex, marital_status, ____)
Question 3 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of followup.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1k8Nj8IN24yMTDdXjsIZmoeWUv9VZucdI&export=download",
                             format = "csv",
                             setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 171692    36         1     5     166      64     124      73     4.7
## 2 558221    40         2     8     162    57.5      97      77     5.2
## 3 542551    27         1     5    172.      60     100      72    4.89
## 4 271527    33         1     5     174      73     150     109     4.6
## 5 639846    56         1     2     167      72     120      88    5.28
## 6  20108    29         2     2     157      48     124      78    4.74
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will be working with an interesting survey dataset. HIV-positive people were surveyed about socio-demographics and their feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1ZT9K-wywh3mQXu1CRGp93far-HOJvv0m&export=download",
                                   format = "csv",
                                   setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex         religion marital_status    suici…¹ sadness pesim…²
##   <dbl> <chr>       <chr>    <chr>             <chr>   <chr>   <chr>
## 1    32 male        islam    <NA>              none    often   discou…
## 2    40 male        islam    unmarried         some    none    none
## 3    37 male        islam    unmarried         some    none    none
## 4    38 male        islam    unmarried         some    some    hopele…
## 5    35 male        islam    unmarried         omnipr… none    hopele…
## 6    35 transgender islam    divorced or sepa… often   none    none
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write code to get the average cholesterol level over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>%
  ____(mean_cholesterol = ____(cholesterol_mmol_l, na.rm = ____))
2 Using the diabetes dataset, what is the mean cholesterol level (give the result to one decimal place)? Answer:
3 Using the diabetes dataset and a similar method, give the median triglyceride level over all patients (give the result to one decimal place). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>%
  ____(gender_1_male_2_female) %>%
  summarize(average_HDL = ____(hdl_c_mmol_l, ____))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>%
  group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker,
           drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker,
           ____) %>%
  summarize(median_LDL = ____)
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(pesimism = as.factor(____),
         sex = ____) %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(____, ____) %>%
  group_by(____) %>%
  summarize(counts = ____(____ < ____))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: a complete record means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>%
  group_by(sex) %>%
  summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have a family history of diabetes, and how many do not.
counts_family_history <- diab_china_dat_raw %>%
  ____(family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate, by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>%
  mutate(suicidal_thoughts_or_wishes = ____(suicidal_thoughts_or_wishes),
         sex = ____,
         marital_status = ____) %>%
  ____(suicidal_thoughts_or_wishes, sex, marital_status, ____)
Question 4 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of followup.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1oIgNSbrEf6vt6wrQiZ91TrOrDxgaMTIV&export=download",
                             format = "csv",
                             setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 638409    30         1     5     170      69     128      70    4.58
## 2 483357    59         1     9     165    67.1      96      71    5.52
## 3 479861    40         2    13     159    64.8     118      72    4.26
## 4 616419    36         2    10     170    61.3      93      58    5.06
## 5  15816    59         2     4     162      60     125      71     5.4
## 6 485236    43         1     3     172    86.5     133      79    4.61
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will be working with an interesting survey dataset. HIV-positive people were surveyed about socio-demographics and their feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1N_t8lGtxAvfolKAlYVdHGyUOPK3H44XD&export=download",
                                   format = "csv",
                                   setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex   religion marital_status        suicida…¹ sadness pesim…²
##   <dbl> <chr> <chr>    <chr>                 <chr>     <chr>   <chr>
## 1    23 male  islam    divorced or separated omnipres… none    none
## 2    27 male  islam    divorced or separated omnipres… some    none
## 3    60 male  islam    unmarried             often     often   none
## 4    45 male  islam    <NA>                  some      unbear… hopele…
## 5    18 male  hindu    divorced or separated some      none    none
## 6    58 male  islam    unmarried             none      none    none
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write code to get the average cholesterol level over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>%
  ____(mean_cholesterol = ____(cholesterol_mmol_l, na.rm = ____))
2 Using the diabetes dataset, what is the mean cholesterol level (give the result to one decimal place)? Answer:
3 Using the diabetes dataset and a similar method, give the median triglyceride level over all patients (give the result to one decimal place). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>%
  ____(gender_1_male_2_female) %>%
  summarize(average_HDL = ____(hdl_c_mmol_l, ____))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>%
  group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker,
           drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker,
           ____) %>%
  summarize(median_LDL = ____)
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(pesimism = as.factor(____),
         sex = ____) %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(____, ____) %>%
  group_by(____) %>%
  summarize(counts = ____(____ < ____))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: a complete record means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>%
  group_by(sex) %>%
  summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have a family history of diabetes, and how many do not.
counts_family_history <- diab_china_dat_raw %>%
  ____(family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate, by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>%
  mutate(suicidal_thoughts_or_wishes = ____(suicidal_thoughts_or_wishes),
         sex = ____,
         marital_status = ____) %>%
  ____(suicidal_thoughts_or_wishes, sex, marital_status, ____)
Question 5 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of followup.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1SOEpjJbekKavbus8pmVrbZn3JlByqaar&export=download",
                             format = "csv",
                             setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 357527    25         1     5     165      52     112      60     4.7
## 2  61113    42         1    11    178.    66.9     120      62    5.58
## 3 573053    33         1     5     171      56     117      76    5.16
## 4 171806    30         1    13    176.    85.8     139      75    4.19
## 5 235742    79         1     9     173    68.9     131      74    5.07
## 6 528341    44         2     3    152.    54.5     117      81       4
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will be working with an interesting survey dataset. HIV-positive people were surveyed about socio-demographics and their feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1wMI9E56p6zlkjARlvz6n3TViBiTZ17yn&export=download",
                                   format = "csv",
                                   setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex   religion marital_status        suicida…¹ sadness pesim…²
##   <dbl> <chr> <chr>    <chr>                 <chr>     <chr>   <chr>
## 1    44 male  islam    unmarried             often     often   discou…
## 2    40 male  islam    unmarried             some      some    discou…
## 3    26 male  islam    divorced or separated some      some    hopele…
## 4    42 male  islam    unmarried             some      some    hopele…
## 5    18 male  islam    divorced or separated none      none    none
## 6    30 male  islam    divorced or separated omnipres… often   hopele…
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write code to get the average cholesterol level over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>%
  ____(mean_cholesterol = ____(cholesterol_mmol_l, na.rm = ____))
2 Using the diabetes dataset, what is the mean cholesterol level (give the result to one decimal place)? Answer:
3 Using the diabetes dataset and a similar method, give the median triglyceride level over all patients (give the result to one decimal place). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>%
  ____(gender_1_male_2_female) %>%
  summarize(average_HDL = ____(hdl_c_mmol_l, ____))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>%
  group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker,
           drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker,
           ____) %>%
  summarize(median_LDL = ____)
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(pesimism = as.factor(____),
         sex = ____) %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(____, ____) %>%
  group_by(____) %>%
  summarize(counts = ____(____ < ____))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: a complete record means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>%
  group_by(sex) %>%
  summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have a family history of diabetes, and how many do not.
counts_family_history <- diab_china_dat_raw %>%
  ____(family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate, by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>%
  mutate(suicidal_thoughts_or_wishes = ____(suicidal_thoughts_or_wishes),
         sex = ____,
         marital_status = ____) %>%
  ____(suicidal_thoughts_or_wishes, sex, marital_status, ____)
Question 6 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of followup.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1vz5NAPzZWfOIBq2VVXou-giLNLPfGjQS&export=download",
                             format = "csv",
                             setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 199160    36         1     2     168      63     109      69    4.54
## 2  81438    75         1     8     153    64.5     141      87     5.6
## 3  13917    30         2     2     160      52      80      49     3.5
## 4 406797    42         1     2     166      57     105      64     4.8
## 5 275059    34         1     5     179      70     116      64    5.25
## 6 478870    44         1     2     178      71     113      76    4.34
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will be working with an interesting survey dataset. HIV-positive people were surveyed about socio-demographics and their feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1omEiCv3Vyi3xKfpiZVpXqqMlv-CnapeE&export=download",
                                   format = "csv",
                                   setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex         religion marital_status    suici…¹ sadness pesim…²
##   <dbl> <chr>       <chr>    <chr>             <chr>   <chr>   <chr>
## 1    30 male        islam    unmarried         some    some    discou…
## 2    35 male        islam    divorced or sepa… omnipr… unbear… hopele…
## 3    41 male        islam    divorced or sepa… omnipr… unbear… hopele…
## 4    35 female      islam    <NA>              some    none    none
## 5    45 male        islam    unmarried         none    none    none
## 6    28 transgender islam    divorced or sepa… some    none    none
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write code to compute the mean cholesterol level over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>%
  ____(mean_cholesterol = ____(cholesterol_mmol_l, na.rm = ____))
2 Using the diabetes dataset, what is the value of the mean cholesterol level (give the resulting number with one decimal)? Answer:
3 Using the diabetes dataset and a similar method, give the median triglyceride level over all patients (give the resulting number with one decimal). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
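As a reminder of the pattern Part A relies on, here is a minimal sketch on made-up data. The object toy_lipids and its values are hypothetical and are not taken from the quiz dataset; only the column name mirrors it. It shows how a missing value affects a summary unless na.rm = TRUE is set.

```r
library(dplyr)

# Hypothetical toy data: four cholesterol readings, one of them missing.
toy_lipids <- tibble(cholesterol_mmol_l = c(4.0, 5.0, NA, 6.0))

# summarize() collapses the whole table into a single row of statistics.
toy_summary <- toy_lipids %>%
  summarize(
    mean_chol         = mean(cholesterol_mmol_l, na.rm = TRUE),
    median_chol       = median(cholesterol_mmol_l, na.rm = TRUE),
    mean_chol_with_na = mean(cholesterol_mmol_l)  # NA, because one value is missing
  )

toy_summary
```

Without na.rm = TRUE, a single missing value makes the whole summary NA, which is why the quiz code sets that argument.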
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>%
  ____(gender_1_male_2_female) %>%
  summarize(average_HDL = ____(hdl_c_mmol_l, ____))
7 Using the diabetes dataset and the result of the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouped by feelings of sadness.
hiv_depression_india <- hiv_depression_india_raw %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>%
  group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ____) %>%
  summarize(median_LDL = ____)
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(pesimism = as.factor(____), sex = ____) %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(____, ____) %>%
  group_by(____) %>%
  summarize(counts = ____(____ < ____))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>%
  ____(family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients in this cohort are women?
15 Using the HIV dataset, count how many people have suicidal thoughts, by gender and marital status.
counts_suicidal_thoughts <- hiv_depression_india_raw %>%
  mutate(suicidal_thoughts_or_wishes = ____(suicidal_thoughts_or_wishes), sex = ____, marital_status = ____) %>%
  ____(suicidal_thoughts_or_wishes, sex, marital_status, ____)
Question 7 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a sample from a population-based cohort in China: clinical information collected from adults at baseline was used to predict whether they developed diabetes after some years of follow-up.
Click here to view and download the data, or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1Ce4LYRJn9x8EYJE0IcMoXn-Tm7TXx6nW&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 296803    34         2     9     163    55.4     105      53    4.19
## 2 439575    37         1     5     172      63     114      72    4.58
## 3 592679    48         2     4     168    53.5     106      64    4.57
## 4 630506    51         1     7    172.    89.5     138      93     4.8
## 5 310076    33         2     4    156.    52.4     124      85    4.82
## 6 685129    45         2     2     158      65     103      64    5.16
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with survey data: HIV-positive people were surveyed about their socio-demographics and feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data, or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1j9K8gcwbvUg6wr-Nu-kQdM2U0Ne0q5wy&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex   religion marital_status        suicida…¹ sadness pesim…²
##   <dbl> <chr> <chr>    <chr>                 <chr>     <chr>   <chr>
## 1    29 male  islam    unmarried             none      none    none
## 2    33 male  islam    divorced or separated omnipres… none    discou…
## 3    36 male  islam    unmarried             none      some    none
## 4    38 male  islam    unmarried             omnipres… none    none
## 5    49 male  islam    unmarried             omnipres… often   discou…
## 6    25 male  islam    divorced or separated none      none    none
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write code to compute the mean cholesterol level over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>%
  ____(mean_cholesterol = ____(cholesterol_mmol_l, na.rm = ____))
2 Using the diabetes dataset, what is the value of the mean cholesterol level (give the resulting number with one decimal)? Answer:
3 Using the diabetes dataset and a similar method, give the median triglyceride level over all patients (give the resulting number with one decimal). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>%
  ____(gender_1_male_2_female) %>%
  summarize(average_HDL = ____(hdl_c_mmol_l, ____))
7 Using the diabetes dataset and the result of the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouped by feelings of sadness.
hiv_depression_india <- hiv_depression_india_raw %>%
  group_by(____) %>%
  summarize(average_age = ____)
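To illustrate what grouping changes, here is a small sketch on invented data. The object toy_hdl, its columns gender and hdl, and all values are assumptions made for illustration only. With group_by() in front, summarize() returns one row per group instead of one row for the whole table.

```r
library(dplyr)

# Hypothetical toy data: a two-level group code and a measurement with one NA.
toy_hdl <- tibble(
  gender = c(1, 1, 2, 2),
  hdl    = c(1.0, 1.2, 1.4, NA)
)

# One summary row per distinct value of the grouping column.
toy_hdl_by_gender <- toy_hdl %>%
  group_by(gender) %>%
  summarize(average_hdl = mean(hdl, na.rm = TRUE))

toy_hdl_by_gender
```

The result keeps the grouping column, so each average can be read off next to the group it belongs to.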
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>%
  group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ____) %>%
  summarize(median_LDL = ____)
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(pesimism = as.factor(____), sex = ____) %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(____, ____) %>%
  group_by(____) %>%
  summarize(counts = ____(____ < ____))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>%
  ____(family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients in this cohort are women?
15 Using the HIV dataset, count how many people have suicidal thoughts, by gender and marital status.
counts_suicidal_thoughts <- hiv_depression_india_raw %>%
  mutate(suicidal_thoughts_or_wishes = ____(suicidal_thoughts_or_wishes), sex = ____, marital_status = ____) %>%
  ____(suicidal_thoughts_or_wishes, sex, marital_status, ____)
Question 8 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a sample from a population-based cohort in China: clinical information collected from adults at baseline was used to predict whether they developed diabetes after some years of follow-up.
Click here to view and download the data, or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1-5ggDkv1HNlety7YDY1YoyZIwGySEq2w&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 680086    57         1     7     172      66     122      76     4.9
## 2 562561    46         2    13     160    43.3     114      73    4.04
## 3 577010    47         1     5     164      62     132      83    5.19
## 4 622152    61         2     5     147      47     126      90    5.57
## 5 561049    47         1     3     167    73.5     134      97    4.21
## 6  19302    34         1     9     167    64.8     128      74    5.22
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with survey data: HIV-positive people were surveyed about their socio-demographics and feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data, or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1l2Rnjj4DVKOIb4sg8LHDaEw0gXWSvsmc&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex   religion marital_status        suicida…¹ sadness pesim…²
##   <dbl> <chr> <chr>    <chr>                 <chr>     <chr>   <chr>
## 1    50 male  islam    unmarried             often     some    discou…
## 2    20 male  islam    divorced or separated omnipres… some    none
## 3    48 male  islam    unmarried             none      some    none
## 4    36 male  islam    unmarried             some      none    none
## 5    29 male  hindu    unmarried             omnipres… none    none
## 6    43 male  islam    unmarried             some      some    discou…
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write code to compute the mean cholesterol level over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>%
  ____(mean_cholesterol = ____(cholesterol_mmol_l, na.rm = ____))
2 Using the diabetes dataset, what is the value of the mean cholesterol level (give the resulting number with one decimal)? Answer:
3 Using the diabetes dataset and a similar method, give the median triglyceride level over all patients (give the resulting number with one decimal). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>%
  ____(gender_1_male_2_female) %>%
  summarize(average_HDL = ____(hdl_c_mmol_l, ____))
7 Using the diabetes dataset and the result of the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouped by feelings of sadness.
hiv_depression_india <- hiv_depression_india_raw %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>%
  group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ____) %>%
  summarize(median_LDL = ____)
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(pesimism = as.factor(____), sex = ____) %>%
  group_by(____) %>%
  summarize(average_age = ____)
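Nested grouping can be sketched on a made-up table as follows. The object toy_ldl, its columns smoker, drinker and ldl, and all values are hypothetical. Listing two columns in group_by() gives one summary row per combination that actually occurs in the data. The .groups = "drop" argument is an optional choice made for this sketch so that the result comes back ungrouped.

```r
library(dplyr)

# Hypothetical toy data: two categorical codes and one numeric measurement.
toy_ldl <- tibble(
  smoker  = c(1, 1, 1, 3, 3),
  drinker = c(1, 1, 3, 3, 3),
  ldl     = c(2.0, 3.0, 2.5, 2.0, 4.0)
)

# One row per observed (smoker, drinker) combination.
toy_median_ldl <- toy_ldl %>%
  group_by(smoker, drinker) %>%
  summarize(median_ldl = median(ldl, na.rm = TRUE), .groups = "drop")

toy_median_ldl
```

Only three combinations appear in the toy data, so the result has three rows rather than one per possible pairing.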
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(____, ____) %>%
  group_by(____) %>%
  summarize(counts = ____(____ < ____))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>%
  ____(family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients in this cohort are women?
15 Using the HIV dataset, count how many people have suicidal thoughts, by gender and marital status.
counts_suicidal_thoughts <- hiv_depression_india_raw %>%
  mutate(suicidal_thoughts_or_wishes = ____(suicidal_thoughts_or_wishes), sex = ____, marital_status = ____) %>%
  ____(suicidal_thoughts_or_wishes, sex, marital_status, ____)
Question 9 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a sample from a population-based cohort in China: clinical information collected from adults at baseline was used to predict whether they developed diabetes after some years of follow-up.
Click here to view and download the data, or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1O4THXncTPwojZdXPtXtkgR9mUf-tCJaw&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 429674    51         1    13     176      75     122      95     5.5
## 2 257973    39         1     8     167    70.5     128      84     5.1
## 3 664424    28         1     7     177    106.     111      57     4.5
## 4 502875    44         1     2     164      56     105      51    4.88
## 5 214107    40         1     2     177      65     123      88    3.63
## 6  64582    40         1     3    176.    82.5     130      92    4.63
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with survey data: HIV-positive people were surveyed about their socio-demographics and feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data, or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1UTWtVT_pCrT3p0CRduow6HEfh5i1qQUw&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex   religion marital_status        suicida…¹ sadness pesim…²
##   <dbl> <chr> <chr>    <chr>                 <chr>     <chr>   <chr>
## 1    26 male  islam    divorced or separated none      some    none
## 2    35 male  islam    unmarried             omnipres… none    hopele…
## 3    45 male  islam    unmarried             none      none    discou…
## 4    44 male  islam    divorced or separated often     none    none
## 5    30 male  islam    <NA>                  none      none    none
## 6    32 male  hindu    unmarried             some      none    hopele…
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write code to compute the mean cholesterol level over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>%
  ____(mean_cholesterol = ____(cholesterol_mmol_l, na.rm = ____))
2 Using the diabetes dataset, what is the value of the mean cholesterol level (give the resulting number with one decimal)? Answer:
3 Using the diabetes dataset and a similar method, give the median triglyceride level over all patients (give the resulting number with one decimal). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>%
  ____(gender_1_male_2_female) %>%
  summarize(average_HDL = ____(hdl_c_mmol_l, ____))
7 Using the diabetes dataset and the result of the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouped by feelings of sadness.
hiv_depression_india <- hiv_depression_india_raw %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>%
  group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ____) %>%
  summarize(median_LDL = ____)
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(pesimism = as.factor(____), sex = ____) %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(____, ____) %>%
  group_by(____) %>%
  summarize(counts = ____(____ < ____))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
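The reason sum() can count rows that satisfy a condition is that a logical vector is treated as 1 for TRUE and 0 for FALSE. Here is a minimal sketch on invented data; the object toy_survey and its values are hypothetical, and the column names are borrowed from the HIV dataset only for familiarity.

```r
library(dplyr)

# Hypothetical toy data with missing values in both age and pesimism.
toy_survey <- tibble(
  sex      = c("female", "female", "male", "male", "male"),
  age      = c(25, 41, 29, 30, NA),
  pesimism = c("none", NA, "none", "hopeless", NA)
)

toy_counts <- toy_survey %>%
  group_by(sex) %>%
  summarize(
    # age < 30 is NA when age is missing, so drop those before summing
    n_under_30       = sum(age < 30, na.rm = TRUE),
    # is.na() never returns NA, so no na.rm is needed here
    n_pesimism_known = sum(!is.na(pesimism))
  )

toy_counts
```

Note that a respondent aged exactly 30 is not counted by age < 30, and that a missing age would make the sum NA unless na.rm = TRUE is given.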
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>%
  ____(family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients in this cohort are women?
15 Using the HIV dataset, count how many people have suicidal thoughts, by gender and marital status.
counts_suicidal_thoughts <- hiv_depression_india_raw %>%
  mutate(suicidal_thoughts_or_wishes = ____(suicidal_thoughts_or_wishes), sex = ____, marital_status = ____) %>%
  ____(suicidal_thoughts_or_wishes, sex, marital_status, ____)
Question 10 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a sample from a population-based cohort in China: clinical information collected from adults at baseline was used to predict whether they developed diabetes after some years of follow-up.
Click here to view and download the data, or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1euYX2K1HZO6Hcs-nHsyZnzv32hqkwkuo&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 489528    61         1     5     177      73     112      73     4.7
## 2 359151    58         1     5    172.    76.5     126      73    4.83
## 3 683863    30         1     9    172.      58     126      79     5.3
## 4 515576    37         2     9     165    56.1     100      55    3.99
## 5 544759    56         2     4     158    53.6      94      62    4.36
## 6 550681    36         2     7    166.    72.6     126      82     4.7
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with survey data: HIV-positive people were surveyed about their socio-demographics and feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data, or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1K8Zanf2UeUjQSNMkV0vduGEHatGEwN8-&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex    religion marital_status        suicid…¹ sadness pesim…²
##   <dbl> <chr>  <chr>    <chr>                 <chr>    <chr>   <chr>
## 1    24 female hindu    unmarried             often    some    discou…
## 2    36 female islam    <NA>                  some     none    none
## 3    28 male   islam    divorced or separated often    some    discou…
## 4    28 male   islam    unmarried             some     none    none
## 5    42 male   islam    divorced or separated often    some    hopele…
## 6    27 male   hindu    divorced or separated often    often   hopele…
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write code to compute the mean cholesterol level over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>%
  ____(mean_cholesterol = ____(cholesterol_mmol_l, na.rm = ____))
2 Using the diabetes dataset, what is the value of the mean cholesterol level (give the resulting number with one decimal)? Answer:
3 Using the diabetes dataset and a similar method, give the median triglyceride level over all patients (give the resulting number with one decimal). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>%
  ____(gender_1_male_2_female) %>%
  summarize(average_HDL = ____(hdl_c_mmol_l, ____))
7 Using the diabetes dataset and the result of the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouped by feelings of sadness.
hiv_depression_india <- hiv_depression_india_raw %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>%
  group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ____) %>%
  summarize(median_LDL = ____)
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(pesimism = as.factor(____), sex = ____) %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(____, ____) %>%
  group_by(____) %>%
  summarize(counts = ____(____ < ____))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>%
  ____(family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients in this cohort are women?
15 Using the HIV dataset, count how many people have suicidal thoughts, by gender and marital status.
counts_suicidal_thoughts <- hiv_depression_india_raw %>%
  mutate(suicidal_thoughts_or_wishes = ____(suicidal_thoughts_or_wishes), sex = ____, marital_status = ____) %>%
  ____(suicidal_thoughts_or_wishes, sex, marital_status, ____)
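For tallying rows per category, dplyr provides count(), which behaves like group_by() followed by a summarize() that counts rows. A small sketch on made-up data follows; the object toy_history, its columns and its values are assumptions for illustration and do not come from either quiz dataset.

```r
library(dplyr)

# Hypothetical toy data: two coded categorical columns.
toy_history <- tibble(
  family_history = c(0, 0, 1, 0),
  gender         = c(1, 2, 2, 2)
)

# Rows per level of a single column; the tally is stored in a column named n.
toy_history_counts <- toy_history %>%
  count(family_history)

# Rows per combination of two columns.
toy_counts_by_both <- toy_history %>%
  count(family_history, gender)

toy_history_counts
toy_counts_by_both
```

Combinations that never occur in the data (here, family_history 1 with gender 1) do not appear in the output at all.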
Question 11 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a sample from a population-based cohort in China: clinical information collected from adults at baseline was used to predict whether they developed diabetes after some years of follow-up.
Click here to view and download the data, or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1eN8duup_zBsZHCrUllbqOtSByRVewcjb&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 212531    27         1    11    176.      62     142      82     4.4
## 2 639573    53         1     4    164.    75.3     118      98    5.68
## 3  62270    42         1     5     155      57     130      85    5.39
## 4 238447    38         2     2     158    50.9     106      71       5
## 5  94238    43         2     4     155      51     122      73    4.86
## 6 393136    77         1     5    168.    67.7     144      73    6.79
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with survey data: HIV-positive people were surveyed about their socio-demographics and feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data, or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1tOmQSHf-p74L3jy8pvWBz-qD28he-Gvq&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex   religion marital_status        suicida…¹ sadness pesim…²
##   <dbl> <chr> <chr>    <chr>                 <chr>     <chr>   <chr>
## 1    26 male  islam    divorced or separated some      some    discou…
## 2    55 male  islam    unmarried             none      none    discou…
## 3    18 male  islam    divorced or separated none      none    none
## 4    60 male  islam    unmarried             often     often   none
## 5    32 male  islam    unmarried             often     some    discou…
## 6    25 male  islam    divorced or separated often     none    none
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write code to compute the mean cholesterol level over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>%
  ____(mean_cholesterol = ____(cholesterol_mmol_l, na.rm = ____))
2 Using the diabetes dataset, what is the value of the mean cholesterol level (give the resulting number with one decimal)? Answer:
3 Using the diabetes dataset and a similar method, give the median triglyceride level over all patients (give the resulting number with one decimal). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>%
  ____(gender_1_male_2_female) %>%
  summarize(average_HDL = ____(hdl_c_mmol_l, ____))
7 Using the diabetes dataset and the result of the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouped by feelings of sadness.
hiv_depression_india <- hiv_depression_india_raw %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>%
  group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ____) %>%
  summarize(median_LDL = ____)
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(pesimism = as.factor(____), sex = ____) %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>% mutate( , ) %>% group_by( ) %>% summarize(counts = ( < ))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
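The pattern above works because a logical vector is treated as 0/1 when summed, so sum(condition) counts the rows where the condition is TRUE. A minimal sketch on a made-up tibble follows; the data and the column name `score` are hypothetical, only the `!is.na()` idea comes from the hint.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy <- tibble(
  group = c("a", "a", "b", "b", "b"),
  score = c("low", NA, "high", "low", NA)
)

# !is.na(score) is TRUE for complete records; sum() counts the TRUEs per group
toy_counts <- toy %>%
  group_by(group) %>%
  summarize(n_complete = sum(!is.na(score)))

toy_counts
```

Any other logical condition can be counted the same way inside summarize().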
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>% (family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>% mutate(suicidal_thoughts_or_wishes = (suicidal_thoughts_or_wishes), sex = , marital_status = ) %>% (suicidal_thoughts_or_wishes, sex, marital_status, )
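For orientation, dplyr's count() tallies rows for each combination of the variables it is given, and missing values form their own group. The sketch below uses a made-up tibble; the data are hypothetical and are not drawn from the HIV dataset.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy <- tibble(
  sex            = c("male", "male", "female", "female", "female"),
  marital_status = c("unmarried", "unmarried", "married", "unmarried", NA)
)

# One row per sex x marital_status combination, with the tally in column n
toy_counts <- toy %>%
  count(sex, marital_status)

toy_counts
```

The tallies in column n always add up to the number of rows in the input.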
Question 12 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of followup.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1q9BX2W87_A5sMXKSzq7PikTqiD3h1IRE&export=download", format = "csv", setclass = "tibble")

Here are the top 6 rows of diab_china_dat_raw after import:

head(diab_china_dat_raw)

## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 386153    50         1     2     176      67      94      63     5.1
## 2 431793    37         2     3     159    58.5     137      78    5.64
## 3 136663    41         2     5     158      57     100      71    4.55
## 4 638295    32         1     5     173      77     110      70     5.7
## 5 293136    37         2     4     167      53      92      60    5.28
## 6 305723    62         2     2     160      75     161      97    6.27
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with an interesting survey dataset. HIV-positive people were surveyed about their socio-demographics and feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1vLuXcJu9i2HeyqqRP5_YynzuwSxvvFqd&export=download", format = "csv", setclass = "tibble")

Here are the top 6 rows of hiv_depression_india_raw after import:

head(hiv_depression_india_raw)

## # A tibble: 6 × 7
##     age sex   religion marital_status        suicida…¹ sadness pesim…²
##   <dbl> <chr> <chr>    <chr>                 <chr>     <chr>   <chr>
## 1    18 male  islam    divorced or separated none      none    none
## 2    41 male  hindu    <NA>                  some      none    none
## 3    25 male  islam    divorced or separated often     some    discou…
## 4    50 male  islam    unmarried             some      often   none
## 5    25 male  islam    divorced or separated none      none    none
## 6    45 male  islam    unmarried             none      none    none
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write up code to get the average of cholesterol over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>% (mean_cholesterol = (cholesterol_mmol_l, na.rm= ))
2 Using the diabetes dataset, what is the value of the mean cholesterol level (give the resulting number with one decimal)? Answer:
3 Using the diabetes dataset, following a similar method, give the median triglyceride level over all patients (give the resulting number with one decimal). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>% (gender_1_male_2_female) %>% summarize(average_HDL = (hdl_c_mmol_l, ))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>% group_by( ) %>% summarize(average_age = )
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>% group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ) %>% summarize(median_LDL = )
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(pesimism = as.factor( ), sex = ) %>% group_by( ) %>% summarize(average_age = )
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>% mutate( , ) %>% group_by( ) %>% summarize(counts = ( < ))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>% (family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>% mutate(suicidal_thoughts_or_wishes = (suicidal_thoughts_or_wishes), sex = , marital_status = ) %>% (suicidal_thoughts_or_wishes, sex, marital_status, )
Question 13 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of followup.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1uCC4XLzBpQ41S6xsjjCn-NSWCPb2aJY4&export=download", format = "csv", setclass = "tibble")

Here are the top 6 rows of diab_china_dat_raw after import:

head(diab_china_dat_raw)

## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 655434    52         2     5     158      51     127      72    4.69
## 2 214616    26         1    13     170    72.7     112      55    4.18
## 3 374730    38         2     2     154      47     106      65    4.66
## 4 220991    61         2     4     165      77     117      66     4.9
## 5 501913    57         1     2     169      77     112      74    4.09
## 6 568694    53         1     3    176.    63.5     117      81       4
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with an interesting survey dataset. HIV-positive people were surveyed about their socio-demographics and feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1ej8U0C55J2Q8FuDWNpFZ4stYZFMe07i-&export=download", format = "csv", setclass = "tibble")

Here are the top 6 rows of hiv_depression_india_raw after import:

head(hiv_depression_india_raw)

## # A tibble: 6 × 7
##     age sex   religion marital_status        suicida…¹ sadness pesim…²
##   <dbl> <chr> <chr>    <chr>                 <chr>     <chr>   <chr>
## 1    50 male  islam    unmarried             none      none    none
## 2    30 male  islam    unmarried             omnipres… unbear… discou…
## 3    42 male  islam    unmarried             none      none    none
## 4    24 male  islam    divorced or separated some      none    none
## 5    45 male  islam    unmarried             none      some    hopele…
## 6    34 male  islam    unmarried             some      none    none
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write up code to get the average of cholesterol over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>% (mean_cholesterol = (cholesterol_mmol_l, na.rm= ))
2 Using the diabetes dataset, what is the value of the mean cholesterol level (give the resulting number with one decimal)? Answer:
3 Using the diabetes dataset, following a similar method, give the median triglyceride level over all patients (give the resulting number with one decimal). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>% (gender_1_male_2_female) %>% summarize(average_HDL = (hdl_c_mmol_l, ))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>% group_by( ) %>% summarize(average_age = )
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>% group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ) %>% summarize(median_LDL = )
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(pesimism = as.factor( ), sex = ) %>% group_by( ) %>% summarize(average_age = )
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>% mutate( , ) %>% group_by( ) %>% summarize(counts = ( < ))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>% (family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>% mutate(suicidal_thoughts_or_wishes = (suicidal_thoughts_or_wishes), sex = , marital_status = ) %>% (suicidal_thoughts_or_wishes, sex, marital_status, )
Question 14 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of followup.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1jZqDUflUp_DDt_-Uc3Z5K9phKHsn-nX7&export=download", format = "csv", setclass = "tibble")

Here are the top 6 rows of diab_china_dat_raw after import:

head(diab_china_dat_raw)

## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 135801    26         1     2     170      57     101      57    4.46
## 2 541546    33         2    16     162      51     105      63    4.05
## 3 532868    30         2     6    166.      51     115      75     4.9
## 4 303355    60         1     2     165      72     172     119    5.77
## 5 247127    44         2     9    154.    63.8     106      66    5.12
## 6 181008    29         1    14     175    63.7     122      77     4.7
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with an interesting survey dataset. HIV-positive people were surveyed about their socio-demographics and feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=15OZ_LHVNaB8E7fDtmaKqKYkkzKOX_G_d&export=download", format = "csv", setclass = "tibble")

Here are the top 6 rows of hiv_depression_india_raw after import:

head(hiv_depression_india_raw)

## # A tibble: 6 × 7
##     age sex         religion marital_status    suici…¹ sadness pesim…²
##   <dbl> <chr>       <chr>    <chr>             <chr>   <chr>   <chr>
## 1    50 male        islam    divorced or sepa… some    often   hopele…
## 2    20 male        islam    divorced or sepa… omnipr… some    none
## 3    35 transgender islam    divorced or sepa… none    none    none
## 4    26 male        islam    unmarried         some    none    none
## 5    59 male        islam    <NA>              omnipr… none    none
## 6    26 female      islam    divorced or sepa… none    none    none
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write up code to get the average of cholesterol over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>% (mean_cholesterol = (cholesterol_mmol_l, na.rm= ))
2 Using the diabetes dataset, what is the value of the mean cholesterol level (give the resulting number with one decimal)? Answer:
3 Using the diabetes dataset, following a similar method, give the median triglyceride level over all patients (give the resulting number with one decimal). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>% (gender_1_male_2_female) %>% summarize(average_HDL = (hdl_c_mmol_l, ))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>% group_by( ) %>% summarize(average_age = )
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>% group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ) %>% summarize(median_LDL = )
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(pesimism = as.factor( ), sex = ) %>% group_by( ) %>% summarize(average_age = )
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>% mutate( , ) %>% group_by( ) %>% summarize(counts = ( < ))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>% (family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>% mutate(suicidal_thoughts_or_wishes = (suicidal_thoughts_or_wishes), sex = , marital_status = ) %>% (suicidal_thoughts_or_wishes, sex, marital_status, )
Question 15 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of followup.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1LoNazeNc6KU_z7niUV1xHD1IeFVQNoAD&export=download", format = "csv", setclass = "tibble")

Here are the top 6 rows of diab_china_dat_raw after import:

head(diab_china_dat_raw)

## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1  41967    41         1     3    174.    88.5     125      82    5.41
## 2 350470    41         1     3     170    71.5     110      81    4.55
## 3 210033    50         2     2     166      60     127      81    5.46
## 4  54610    30         2     3     167      63     104      66    4.34
## 5 600830    44         2     5     148      40      96      56    4.86
## 6 178989    33         1     4     167      58      97      61    5.05
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with an interesting survey dataset. HIV-positive people were surveyed about their socio-demographics and feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1P_qkIgUvu0Smh0OjzgxRQPnEnmpdV40_&export=download", format = "csv", setclass = "tibble")

Here are the top 6 rows of hiv_depression_india_raw after import:

head(hiv_depression_india_raw)

## # A tibble: 6 × 7
##     age sex    religion marital_status        suicid…¹ sadness pesim…²
##   <dbl> <chr>  <chr>    <chr>                 <chr>    <chr>   <chr>
## 1    27 male   islam    unmarried             often    unbear… discou…
## 2    40 male   islam    unmarried             some     some    discou…
## 3    50 male   islam    unmarried             often    some    discou…
## 4    50 male   islam    divorced or separated omnipre… unbear… hopele…
## 5    45 male   islam    unmarried             none     none    discou…
## 6    25 female islam    unmarried             some     often   hopele…
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write up code to get the average of cholesterol over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>% (mean_cholesterol = (cholesterol_mmol_l, na.rm= ))
2 Using the diabetes dataset, what is the value of the mean cholesterol level (give the resulting number with one decimal)? Answer:
3 Using the diabetes dataset, following a similar method, give the median triglyceride level over all patients (give the resulting number with one decimal). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>% (gender_1_male_2_female) %>% summarize(average_HDL = (hdl_c_mmol_l, ))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>% group_by( ) %>% summarize(average_age = )
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>% group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ) %>% summarize(median_LDL = )
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(pesimism = as.factor( ), sex = ) %>% group_by( ) %>% summarize(average_age = )
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>% mutate( , ) %>% group_by( ) %>% summarize(counts = ( < ))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>% (family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>% mutate(suicidal_thoughts_or_wishes = (suicidal_thoughts_or_wishes), sex = , marital_status = ) %>% (suicidal_thoughts_or_wishes, sex, marital_status, )
Question 16 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of followup.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1agaVYGN4WoI-yMBuRur0SXcwEdTzaH-J&export=download", format = "csv", setclass = "tibble")

Here are the top 6 rows of diab_china_dat_raw after import:

head(diab_china_dat_raw)

## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1  52488    58         1     8     177    79.5     104      63     5.8
## 2 473311    30         2     2     166      62      93      64    4.42
## 3 490517    38         2     8     162      51     101      70       5
## 4 241826    28         2     8     162    64.5     113      71     5.5
## 5 549689    37         1     3    178.      77     130      76    4.21
## 6  51591    34         2     8     156    73.5     107      73     5.3
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with an interesting survey dataset. HIV-positive people were surveyed about their socio-demographics and feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1CC8koEzywaF1e0erysITw6QmcSJ3xR-z&export=download", format = "csv", setclass = "tibble")

Here are the top 6 rows of hiv_depression_india_raw after import:

head(hiv_depression_india_raw)

## # A tibble: 6 × 7
##     age sex         religion marital_status    suici…¹ sadness pesim…²
##   <dbl> <chr>       <chr>    <chr>             <chr>   <chr>   <chr>
## 1    23 male        islam    divorced or sepa… none    unbear… hopele…
## 2    22 male        islam    divorced or sepa… omnipr… none    none
## 3    39 male        islam    unmarried         often   some    discou…
## 4    27 transgender islam    divorced or sepa… some    none    none
## 5    43 male        islam    divorced or sepa… often   none    none
## 6    36 male        islam    divorced or sepa… none    none    none
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write up code to get the average of cholesterol over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>% (mean_cholesterol = (cholesterol_mmol_l, na.rm= ))
2 Using the diabetes dataset, what is the value of the mean cholesterol level (give the resulting number with one decimal)? Answer:
3 Using the diabetes dataset, following a similar method, give the median triglyceride level over all patients (give the resulting number with one decimal). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>% (gender_1_male_2_female) %>% summarize(average_HDL = (hdl_c_mmol_l, ))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>% group_by( ) %>% summarize(average_age = )
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>% group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ) %>% summarize(median_LDL = )
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(pesimism = as.factor( ), sex = ) %>% group_by( ) %>% summarize(average_age = )
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>% mutate( , ) %>% group_by( ) %>% summarize(counts = ( < ))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>% (family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>% mutate(suicidal_thoughts_or_wishes = (suicidal_thoughts_or_wishes), sex = , marital_status = ) %>% (suicidal_thoughts_or_wishes, sex, marital_status, )
Question 17 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of followup.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=19Y27rbMGThcQxr7HjuNgsys4BvI47v9h&export=download", format = "csv", setclass = "tibble")

Here are the top 6 rows of diab_china_dat_raw after import:

head(diab_china_dat_raw)

## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 417836    30         2     2     165      60     105      65    4.23
## 2 165182    47         2     2     158      63     114      71     4.4
## 3 464463    40         2     4    162.    49.1     104      71    4.63
## 4  41744    47         2     9     161      54     113      72     4.6
## 5 212879    34         1    10     167      73     116      69    4.55
## 6 600568    45         2     3    162.      73     115      76     4.5
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with an interesting survey dataset. HIV-positive people were surveyed about their socio-demographics and feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=13rTbXYe-l0CihUa6yrS7t8qPy49FabFT&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex   religion marital_status        suicida…¹ sadness pesim…²
##   <dbl> <chr> <chr>    <chr>                 <chr>     <chr>   <chr>
## 1    45 male  islam    unmarried             some      none    discou…
## 2    31 male  islam    unmarried             often     often   hopele…
## 3    26 male  islam    divorced or separated some      some    discou…
## 4    40 male  islam    unmarried             some      some    hopele…
## 5    30 male  islam    divorced or separated some      some    hopele…
## 6    30 male  islam    unmarried             omnipres… some    none
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write code to compute the average cholesterol level over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>% (mean_cholesterol = (cholesterol_mmol_l, na.rm= ))
2 Using the diabetes dataset, what is the mean cholesterol level (give the resulting number to one decimal place)? Answer:
3 Using the diabetes dataset and a similar method, give the median triglyceride level over all patients (give the resulting number to one decimal place). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>% (gender_1_male_2_female) %>% summarize(average_HDL = (hdl_c_mmol_l, ))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>% group_by( ) %>% summarize(average_age = )
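As a minimal sketch of the group-then-summarize pattern that Part B exercises, the snippet below uses a small made-up tibble rather than the quiz data. The names toy_visits, clinic, and score are hypothetical, and the sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data, not part of the quiz datasets
toy_visits <- tibble(
  clinic = c("a", "a", "b", "b", "b"),
  score  = c(10, NA, 4, 6, 8)
)

# One output row per clinic; na.rm = TRUE drops missing scores before averaging
toy_means <- toy_visits %>%
  group_by(clinic) %>%
  summarize(mean_score = mean(score, na.rm = TRUE))

print(toy_means)
```

Without na.rm = TRUE, the mean for clinic "a" would be NA, because one of its two scores is missing.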
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>% group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ) %>% summarize(median_LDL = )
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(pesimism = as.factor( ), sex = ) %>% group_by( ) %>% summarize(average_age = )
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>% mutate( , ) %>% group_by( ) %>% summarize(counts = ( < ))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
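The code in question 12 works because sum() applied to a logical vector counts its TRUE values. Here is a small sketch of that idea on made-up data. The tibble toy_survey and its columns team and mood are hypothetical, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical toy data, not part of the quiz datasets
toy_survey <- tibble(
  team = c("x", "x", "x", "y", "y"),
  mood = c("ok", NA, "low", NA, "ok")
)

toy_counts <- toy_survey %>%
  group_by(team) %>%
  summarize(
    # !is.na(mood) is TRUE for non-missing values; sum() counts the TRUEs
    n_mood_known = sum(!is.na(mood)),
    # a comparison with NA yields NA, so na.rm = TRUE is needed here
    n_mood_ok = sum(mood == "ok", na.rm = TRUE)
  )

print(toy_counts)
```

The same pattern extends to any condition that evaluates to TRUE or FALSE for each row within a group.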
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>% (family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>% mutate(suicidal_thoughts_or_wishes = (suicidal_thoughts_or_wishes), sex = , marital_status = ) %>% (suicidal_thoughts_or_wishes, sex, marital_status, )
Question 18 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of followup.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1W2epKiDwrQrwJuhN8Cn_h8dKaNZhw1WP&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 362893    29         2     9    162.    52.9     130      89    5.3
## 2  16959    40         1     9    163.    67.3     103      62    4.78
## 3 320782    43         2     5    169     66        97      68    5
## 4 682292    27         1     9    173     66.5      98      60    4.7
## 5 277640    38         1     5    174.    69.1     116      61    5.12
## 6  25665    38         2     4    166     62       114      84    4.63
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with survey data: HIV-positive people were surveyed about their socio-demographics and feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1x0UmQsQtwYj_W5RERLp8zl-elFUyunVe&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex   religion marital_status        suicida…¹ sadness pesim…²
##   <dbl> <chr> <chr>    <chr>                 <chr>     <chr>   <chr>
## 1    64 male  islam    unmarried             some      some    discou…
## 2    50 male  islam    divorced or separated some      often   hopele…
## 3    53 male  islam    unmarried             often     none    none
## 4    30 male  islam    unmarried             omnipres… often   hopele…
## 5    28 male  islam    unmarried             some      some    discou…
## 6    27 male  hindu    divorced or separated often     unbear… hopele…
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write code to compute the average cholesterol level over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>% (mean_cholesterol = (cholesterol_mmol_l, na.rm= ))
2 Using the diabetes dataset, what is the mean cholesterol level (give the resulting number to one decimal place)? Answer:
3 Using the diabetes dataset and a similar method, give the median triglyceride level over all patients (give the resulting number to one decimal place). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>% (gender_1_male_2_female) %>% summarize(average_HDL = (hdl_c_mmol_l, ))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>% group_by( ) %>% summarize(average_age = )
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>% group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ) %>% summarize(median_LDL = )
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(pesimism = as.factor( ), sex = ) %>% group_by( ) %>% summarize(average_age = )
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>% mutate( , ) %>% group_by( ) %>% summarize(counts = ( < ))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>% (family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>% mutate(suicidal_thoughts_or_wishes = (suicidal_thoughts_or_wishes), sex = , marital_status = ) %>% (suicidal_thoughts_or_wishes, sex, marital_status, )
Question 19 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of followup.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1c--yNaWcIWEzsjyn9gXKXTc4nzZOk2p1&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 103267    34         1     5    173     79       129      88    4.61
## 2 226786    34         1     5    168     80       122      71    4.48
## 3  48039    65         1     3    172     72.5     134      89    4.82
## 4 121837    56         1     8    169.    67.7     105      70    6.3
## 5 154438    29         1     7    166.    59.5     129      70    5
## 6 519713    48         2     4    174     60       106      61    4.63
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with survey data: HIV-positive people were surveyed about their socio-demographics and feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1fxAWZ1Mx9wNDRWb8FUWksRT7EBbztEeb&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex   religion marital_status        suicida…¹ sadness pesim…²
##   <dbl> <chr> <chr>    <chr>                 <chr>     <chr>   <chr>
## 1    30 male  islam    unmarried             some      some    discou…
## 2    38 male  islam    unmarried             some      some    hopele…
## 3    30 male  islam    unmarried             none      some    none
## 4    28 male  hindu    divorced or separated often     none    none
## 5    42 male  islam    unmarried             some      some    discou…
## 6    60 male  islam    unmarried             some      none    none
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write code to compute the average cholesterol level over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>% (mean_cholesterol = (cholesterol_mmol_l, na.rm= ))
2 Using the diabetes dataset, what is the mean cholesterol level (give the resulting number to one decimal place)? Answer:
3 Using the diabetes dataset and a similar method, give the median triglyceride level over all patients (give the resulting number to one decimal place). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>% (gender_1_male_2_female) %>% summarize(average_HDL = (hdl_c_mmol_l, ))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>% group_by( ) %>% summarize(average_age = )
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>% group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ) %>% summarize(median_LDL = )
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(pesimism = as.factor( ), sex = ) %>% group_by( ) %>% summarize(average_age = )
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>% mutate( , ) %>% group_by( ) %>% summarize(counts = ( < ))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>% (family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>% mutate(suicidal_thoughts_or_wishes = (suicidal_thoughts_or_wishes), sex = , marital_status = ) %>% (suicidal_thoughts_or_wishes, sex, marital_status, )
Question 20 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of followup.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1-Y8HFCgi2QaiLeO0jheBq6GUoCZIwTgK&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 375566    68         2     4    152     49       121      68    5.09
## 2 536514    34         1     5    172.    74       102      59    4.9
## 3 420329    32         2     2    167     55        99      63    4.74
## 4 227317    42         1     3    160.    55.5      96      65    4.05
## 5 290867    44         2     9    161.    65.7     100      71    4.24
## 6 161198    26         1     9    179.    89.2     128      67    5.29
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with survey data: HIV-positive people were surveyed about their socio-demographics and feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1O6aGO0iO61nY6JKKXxGyyc0N5FYwWsw0&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex   religion marital_status        suicida…¹ sadness pesim…²
##   <dbl> <chr> <chr>    <chr>                 <chr>     <chr>   <chr>
## 1    28 male  islam    divorced or separated omnipres… none    none
## 2    65 male  islam    unmarried             omnipres… often   hopele…
## 3    32 male  islam    <NA>                  none      often   discou…
## 4    48 male  islam    unmarried             some      some    none
## 5    44 male  islam    unmarried             none      none    none
## 6    48 male  islam    unmarried             some      none    none
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write code to compute the average cholesterol level over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>% (mean_cholesterol = (cholesterol_mmol_l, na.rm= ))
2 Using the diabetes dataset, what is the mean cholesterol level (give the resulting number to one decimal place)? Answer:
3 Using the diabetes dataset and a similar method, give the median triglyceride level over all patients (give the resulting number to one decimal place). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>% (gender_1_male_2_female) %>% summarize(average_HDL = (hdl_c_mmol_l, ))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>% group_by( ) %>% summarize(average_age = )
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>% group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ) %>% summarize(median_LDL = )
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(pesimism = as.factor( ), sex = ) %>% group_by( ) %>% summarize(average_age = )
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>% mutate( , ) %>% group_by( ) %>% summarize(counts = ( < ))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>% (family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>% mutate(suicidal_thoughts_or_wishes = (suicidal_thoughts_or_wishes), sex = , marital_status = ) %>% (suicidal_thoughts_or_wishes, sex, marital_status, )
Question 21 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of followup.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1vUJlMYwjEzrcJxBNhTtX1BI60gQbqhrE&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 478129    49         2     5    157     66       124      79    5.08
## 2 463838    57         2     2    159     62       108      67    5.68
## 3 407381    28         2     4    170     52       110      67    4.71
## 4 672492    48         2     4    168.    73       120      67    4.87
## 5 604343    26         2    13    162     51       112      56    4.77
## 6 343647    41         1     3    180     71.5     121      81    3.95
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with survey data: HIV-positive people were surveyed about their socio-demographics and feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1FofiG_odhySq88XiD8F5fbfrY14bS9T_&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex         religion marital_status    suici…¹ sadness pesim…²
##   <dbl> <chr>       <chr>    <chr>             <chr>   <chr>   <chr>
## 1    42 male        islam    unmarried         none    none    none
## 2    40 male        islam    <NA>              some    some    hopele…
## 3    40 male        islam    <NA>              often   often   discou…
## 4    45 male        islam    unmarried         some    none    discou…
## 5    20 transgender islam    divorced or sepa… some    none    none
## 6    30 male        islam    <NA>              none    none    none
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write code to compute the average cholesterol level over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>% (mean_cholesterol = (cholesterol_mmol_l, na.rm= ))
2 Using the diabetes dataset, what is the mean cholesterol level (give the resulting number to one decimal place)? Answer:
3 Using the diabetes dataset and a similar method, give the median triglyceride level over all patients (give the resulting number to one decimal place). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>% (gender_1_male_2_female) %>% summarize(average_HDL = (hdl_c_mmol_l, ))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>% group_by( ) %>% summarize(average_age = )
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>% group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ) %>% summarize(median_LDL = )
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(pesimism = as.factor( ), sex = ) %>% group_by( ) %>% summarize(average_age = )
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>% mutate( , ) %>% group_by( ) %>% summarize(counts = ( < ))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>% (family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>% mutate(suicidal_thoughts_or_wishes = (suicidal_thoughts_or_wishes), sex = , marital_status = ) %>% (suicidal_thoughts_or_wishes, sex, marital_status, )
Question 22 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of followup.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1wRLaNq9TOeHudtxhpJjYiKbCRd9Q_zBT&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 313345    32         2     5    156     50       119      80    4.76
## 2 365834    58         1     5    168     68       118      66    5.74
## 3  79868    44         2     9    156     60.5     113      85    4.16
## 4 553005    33         2     2    161     53       102      65    5.4
## 5 606192    25         2     7    164.    56.4     108      66    4.37
## 6 653575    32         1     9    164.    68.4     128      74    4.95
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with survey data: HIV-positive people were surveyed about their socio-demographics and feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1OHID2XQDInvEZ8CQsdgdWj5OX0x3sWVU&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex   religion marital_status        suicida…¹ sadness pesim…²
##   <dbl> <chr> <chr>    <chr>                 <chr>     <chr>   <chr>
## 1    33 male  islam    divorced or separated omnipres… none    discou…
## 2    38 male  islam    unmarried             none      none    none
## 3    26 male  islam    divorced or separated none      none    none
## 4    35 male  islam    divorced or separated omnipres… none    discou…
## 5    50 male  islam    divorced or separated omnipres… often   hopele…
## 6    31 male  islam    unmarried             often     often   hopele…
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write code to compute the average cholesterol level over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>% (mean_cholesterol = (cholesterol_mmol_l, na.rm= ))
2 Using the diabetes dataset, what is the mean cholesterol level (give the resulting number to one decimal place)? Answer:
3 Using the diabetes dataset and a similar method, give the median triglyceride level over all patients (give the resulting number to one decimal place). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>% (gender_1_male_2_female) %>% summarize(average_HDL = (hdl_c_mmol_l, ))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>% group_by( ) %>% summarize(average_age = )
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>% group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ) %>% summarize(median_LDL = )
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(pesimism = as.factor( ), sex = ) %>% group_by( ) %>% summarize(average_age = )
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(____, ____) %>% group_by(____) %>% summarize(counts = ____(____ < ____))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
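The conditional counts in Part D work because sum() treats TRUE as 1 and FALSE as 0. Here is a sketch on a made-up tibble; toy_survey, its columns, and its values are assumptions for illustration.

```r
library(dplyr)

# Hypothetical toy data, not taken from the quiz datasets
toy_survey <- tibble(
  sex = c("female", "female", "male", "male", "male"),
  age = c(25, 41, 29, 22, NA)
)

conditional_counts <- toy_survey %>%
  group_by(sex) %>%
  summarize(
    n_under_30  = sum(age < 30, na.rm = TRUE),  # rows where the condition is TRUE
    n_age_known = sum(!is.na(age))              # rows with a non-missing age
  )

print(conditional_counts)
```

Note that age < 30 evaluates to NA for a missing age, so na.rm = TRUE is needed in the first count but not in the second, where is.na() never returns NA.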
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>% ____(family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>% mutate(suicidal_thoughts_or_wishes = ____(suicidal_thoughts_or_wishes), sex = ____, marital_status = ____) %>% ____(suicidal_thoughts_or_wishes, sex, marital_status, ____)
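For Part E, the sketch below tallies rows with dplyr's count() on a made-up tibble. The names toy_history, family_history, and sex are hypothetical. It also checks that count() gives the same numbers as group_by() followed by summarize(n = n()).

```r
library(dplyr)

# Hypothetical toy data, not taken from the quiz datasets
toy_history <- tibble(
  family_history = c(1, 0, 0, 1, 0),
  sex            = c("male", "female", "female", "female", "male")
)

# Number of rows for each value of one variable
n_by_history <- toy_history %>% count(family_history)

# Number of rows for each combination of two variables
n_by_history_sex <- toy_history %>% count(family_history, sex)

# Equivalent long form of the first tally
n_long_form <- toy_history %>%
  group_by(family_history) %>%
  summarize(n = n())

print(n_by_history)
print(n_by_history_sex)
```

count() stores the tallies in a column named n by default.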
Question 23 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of follow-up.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1LxLZAHYYIq1xfovXzeXOQdWJMpY-I747&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1  47891    39         1     8     175    81.5     133      90     5.6
## 2 631807    51         2     5     163      57     101      62    4.81
## 3 211689    32         1     3     177      56     108      73    4.24
## 4 491279    36         1    13    174.      68     113      80    4.47
## 5  29925    38         1     5     180    98.5     122      62    4.52
## 6 345623    25         2     5     164      82     121      70    5.37
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with an interesting survey dataset. HIV-positive people were surveyed about their socio-demographics and their feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1iIEDZoJWGDQI80PKiNv-IFbeZC-09yl9&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex    religion marital_status        suicida…¹ sadness pesim…²
##   <dbl> <chr>  <chr>    <chr>                 <chr>     <chr>   <chr>
## 1    49 male   islam    <NA>                  some      some    discou…
## 2    31 male   islam    unmarried             often     often   hopele…
## 3    38 male   islam    unmarried             often     often   hopele…
## 4    43 male   islam    unmarried             none      none    none
## 5    50 male   islam    divorced or separated omnipres… some    discou…
## 6    38 male   hindu    divorced or separated some      some    discou…
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write up code to get the average of cholesterol over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>% ____(mean_cholesterol = ____(cholesterol_mmol_l, na.rm = ____))
2 Using the diabetes dataset, what is the value of the mean cholesterol level (give the resulting number with one decimal)? Answer:
3 Using the diabetes dataset, following a similar method, give the median triglyceride level over all patients (give the resulting number with one decimal). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>% ____(gender_1_male_2_female) %>% summarize(average_HDL = ____(hdl_c_mmol_l, ____))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(____) %>% summarize(average_age = ____)
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>% group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ____) %>% summarize(median_LDL = ____)
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(pesimism = as.factor(____), sex = ____) %>% group_by(____) %>% summarize(average_age = ____)
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(____, ____) %>% group_by(____) %>% summarize(counts = ____(____ < ____))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>% ____(family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>% mutate(suicidal_thoughts_or_wishes = ____(suicidal_thoughts_or_wishes), sex = ____, marital_status = ____) %>% ____(suicidal_thoughts_or_wishes, sex, marital_status, ____)
Question 24 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of follow-up.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=10zOZZB4UWqYzBja9RTj6U-hVnjSJzG9z&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 472307    26         1     5     189      88     128      94    4.41
## 2 407743    38         1     5     171     100     152     110    5.53
## 3 162501    26         2     9    170.    60.5     113      89    5.25
## 4 291002    42         1     5     168      56     103      73    3.95
## 5 645826    35         2    11    154.    47.6     104      56    4.34
## 6  29806    48         1     5     157      71     126      85    5.17
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with an interesting survey dataset. HIV-positive people were surveyed about their socio-demographics and their feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1qeuB0UnIuefxtRMchAHEcNG70hGNFvmn&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex    religion marital_status        suicida…¹ sadness pesim…²
##   <dbl> <chr>  <chr>    <chr>                 <chr>     <chr>   <chr>
## 1    26 male   islam    divorced or separated some      some    hopele…
## 2    47 male   islam    unmarried             often     none    none
## 3    32 male   islam    unmarried             none      some    discou…
## 4    25 male   islam    divorced or separated often     unbear… discou…
## 5    38 male   islam    unmarried             none      some    discou…
## 6    28 male   hindu    divorced or separated often     none    none
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write up code to get the average of cholesterol over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>% ____(mean_cholesterol = ____(cholesterol_mmol_l, na.rm = ____))
2 Using the diabetes dataset, what is the value of the mean cholesterol level (give the resulting number with one decimal)? Answer:
3 Using the diabetes dataset, following a similar method, give the median triglyceride level over all patients (give the resulting number with one decimal). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>% ____(gender_1_male_2_female) %>% summarize(average_HDL = ____(hdl_c_mmol_l, ____))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(____) %>% summarize(average_age = ____)
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>% group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ____) %>% summarize(median_LDL = ____)
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(pesimism = as.factor(____), sex = ____) %>% group_by(____) %>% summarize(average_age = ____)
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(____, ____) %>% group_by(____) %>% summarize(counts = ____(____ < ____))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>% ____(family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>% mutate(suicidal_thoughts_or_wishes = ____(suicidal_thoughts_or_wishes), sex = ____, marital_status = ____) %>% ____(suicidal_thoughts_or_wishes, sex, marital_status, ____)
Question 25 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of follow-up.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1C4d8rMmfIJSsA2C_S8XM_Hg7PcTI10P3&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 571635    36         1     9     171    62.9     132      77    3.82
## 2 209228    58         1     5     175      67     140     100    5.57
## 3 272028    33         2     3     160    57.5     102      66    4.46
## 4 289439    39         1     8     180      80     118      90     5.5
## 5 667292    27         1     7    172.      71     119      76     4.3
## 6 154871    34         2     2     154      53     102      70    4.98
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with an interesting survey dataset. HIV-positive people were surveyed about their socio-demographics and their feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1ibxtfuLwZYEgsBc-Rjd4MaGcl9sZcdM0&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex    religion marital_status        suicida…¹ sadness pesim…²
##   <dbl> <chr>  <chr>    <chr>                 <chr>     <chr>   <chr>
## 1    45 male   islam    unmarried             some      none    none
## 2    55 male   islam    unmarried             omnipres… some    discou…
## 3    28 male   islam    unmarried             some      none    none
## 4    42 male   islam    unmarried             omnipres… unbear… none
## 5    25 male   islam    divorced or separated often     unbear… discou…
## 6    26 male   islam    divorced or separated often     often   discou…
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write up code to get the average of cholesterol over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>% ____(mean_cholesterol = ____(cholesterol_mmol_l, na.rm = ____))
2 Using the diabetes dataset, what is the value of the mean cholesterol level (give the resulting number with one decimal)? Answer:
3 Using the diabetes dataset, following a similar method, give the median triglyceride level over all patients (give the resulting number with one decimal). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>% ____(gender_1_male_2_female) %>% summarize(average_HDL = ____(hdl_c_mmol_l, ____))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(____) %>% summarize(average_age = ____)
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>% group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ____) %>% summarize(median_LDL = ____)
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(pesimism = as.factor(____), sex = ____) %>% group_by(____) %>% summarize(average_age = ____)
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(____, ____) %>% group_by(____) %>% summarize(counts = ____(____ < ____))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>% ____(family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>% mutate(suicidal_thoughts_or_wishes = ____(suicidal_thoughts_or_wishes), sex = ____, marital_status = ____) %>% ____(suicidal_thoughts_or_wishes, sex, marital_status, ____)
Question 26 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of follow-up.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1p8CeHktxipYOVhVNKx8UubvM1QCgI9nv&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 594313    32         1     8     165      72     138      75     5.2
## 2 580460    36         1     3    176.      85     130      63    5.42
## 3 566312    34         1    10    174.    70.3     110      71    4.99
## 4 685254    50         1     4    170.      66     107      61    4.84
## 5 638346    40         2     2     154      61     112      83    4.91
## 6 278580    49         2     2     159      58     154     102       3
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with an interesting survey dataset. HIV-positive people were surveyed about their socio-demographics and their feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1Dwl2TZThkb6uWBavTrrasClfc5HXsJPI&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex    religion marital_status        suicida…¹ sadness pesim…²
##   <dbl> <chr>  <chr>    <chr>                 <chr>     <chr>   <chr>
## 1    28 male   hindu    divorced or separated often     none    none
## 2    27 male   islam    unmarried             some      none    none
## 3    36 male   islam    <NA>                  often     some    discou…
## 4    47 male   islam    unmarried             often     none    none
## 5    40 male   islam    unmarried             none      none    discou…
## 6    35 male   islam    divorced or separated omnipres… none    discou…
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write up code to get the average of cholesterol over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>% ____(mean_cholesterol = ____(cholesterol_mmol_l, na.rm = ____))
2 Using the diabetes dataset, what is the value of the mean cholesterol level (give the resulting number with one decimal)? Answer:
3 Using the diabetes dataset, following a similar method, give the median triglyceride level over all patients (give the resulting number with one decimal). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>% ____(gender_1_male_2_female) %>% summarize(average_HDL = ____(hdl_c_mmol_l, ____))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(____) %>% summarize(average_age = ____)
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>% group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ____) %>% summarize(median_LDL = ____)
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(pesimism = as.factor(____), sex = ____) %>% group_by(____) %>% summarize(average_age = ____)
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(____, ____) %>% group_by(____) %>% summarize(counts = ____(____ < ____))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>% ____(family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>% mutate(suicidal_thoughts_or_wishes = ____(suicidal_thoughts_or_wishes), sex = ____, marital_status = ____) %>% ____(suicidal_thoughts_or_wishes, sex, marital_status, ____)
Question 27 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of follow-up.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=16exai-0GoaALelNHhF5pl69EHXbYtmZC&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1  18076    29         2     2     167      55     108      78    3.39
## 2 282327    27         1     5     169      53     122      72    5.45
## 3 636978    47         2     8    160.    51.8     131      78     5.1
## 4  41309    44         1     2     167      73     110      86     4.9
## 5 180654    35         2     8     162    51.6     102      62     5.3
## 6 157004    36         1     5     157      46      98      63    4.73
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with an interesting survey dataset. HIV-positive people were surveyed about their socio-demographics and their feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1N0g4CxfKvBjvkwWxanxxFsTwcA84FbiK&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex    religion marital_status        suicid…¹ sadness pesim…²
##   <dbl> <chr>  <chr>    <chr>                 <chr>    <chr>   <chr>
## 1    38 male   islam    unmarried             omnipre… none    none
## 2    48 male   islam    unmarried             some     some    none
## 3    25 male   islam    divorced or separated often    unbear… discou…
## 4    35 female islam    unmarried             some     some    discou…
## 5    27 male   islam    unmarried             often    unbear… discou…
## 6    34 male   islam    unmarried             none     some    hopele…
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write up code to get the average of cholesterol over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>% ____(mean_cholesterol = ____(cholesterol_mmol_l, na.rm = ____))
2 Using the diabetes dataset, what is the value of the mean cholesterol level (give the resulting number with one decimal)? Answer:
3 Using the diabetes dataset, following a similar method, give the median triglyceride level over all patients (give the resulting number with one decimal). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>% ____(gender_1_male_2_female) %>% summarize(average_HDL = ____(hdl_c_mmol_l, ____))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)?
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(____) %>% summarize(average_age = ____)
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>% group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker, drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker, ____) %>% summarize(median_LDL = ____)
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(pesimism = as.factor(____), sex = ____) %>% group_by(____) %>% summarize(average_age = ____)
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>% mutate(____, ____) %>% group_by(____) %>% summarize(counts = ____(____ < ____))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>% ____(family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort?
15 Using the HIV dataset, investigate by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>% mutate(suicidal_thoughts_or_wishes = ____(suicidal_thoughts_or_wishes), sex = ____, marital_status = ____) %>% ____(suicidal_thoughts_or_wishes, sex, marital_status, ____)
Question 28 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of follow-up.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1KwVj1Lfop-um57e1_dtQNHw6diDXaL3p&export=download", format = "csv", setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1  19088    33         1     9    175.    66.3     109      63     4.2
## 2 188305    60         2    12    158.    63.2     137      66    5.39
## 3 161109    38         1     3     170      73     101      80    4.28
## 4 396827    53         2     8     168    65.5     112      80     5.5
## 5 625187    37         1     3     181      62     130      73     3.9
## 6 564408    30         1     3     169    73.5     102      77    4.74
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with an interesting survey dataset. HIV-positive people were surveyed about their socio-demographics and their feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1uPyGPbTZVH7x8H5RasgsxJvlQBu8fspy&export=download",
                                   format = "csv",
                                   setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex   religion marital_status        suicida…¹ sadness pesim…²
##   <dbl> <chr> <chr>    <chr>                 <chr>     <chr>   <chr>
## 1    35 male  islam    divorced or separated omnipres… none    discou…
## 2    42 male  islam    unmarried             none      none    none
## 3    48 male  islam    unmarried             some      none    none
## 4    36 male  islam    <NA>                  some      often   discou…
## 5    45 male  islam    <NA>                  some      unbear… hopele…
## 6    60 male  islam    unmarried             often     often   none
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write code to get the average cholesterol level over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>%
  ____(mean_cholesterol = ____(cholesterol_mmol_l, na.rm = ____))
2 Using the diabetes dataset, what is the value of the mean cholesterol level (give the resulting number with one decimal)? Answer:
3 Using the diabetes dataset, following a similar method, give the median triglyceride level over all patients (give the resulting number with one decimal). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>%
  ____(gender_1_male_2_female) %>%
  summarize(average_HDL = ____(hdl_c_mmol_l, ____))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)? Answer:
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>%
  group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker,
           drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker,
           ____) %>%
  summarize(median_LDL = ____)
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(pesimism = as.factor(____),
         sex = ____) %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(____, ____) %>%
  group_by(____) %>%
  summarize(counts = ____(____ < ____))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
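The counting pattern in Part D works because R treats TRUE as 1 and FALSE as 0, so summing a logical condition counts the rows where it holds. Here is a minimal sketch on a made-up data frame; toy_survey and its columns are hypothetical and are not taken from either quiz dataset.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy_survey <- data.frame(
  sex  = c("male", "male", "female", "female", "female"),
  age  = c(25, 41, 29, NA, 33),
  mood = c("none", NA, "some", "often", NA)
)

toy_counts <- toy_survey %>%
  group_by(sex) %>%
  summarize(
    # age < 30 is a logical vector; na.rm = TRUE skips rows with missing age
    n_under_30 = sum(age < 30, na.rm = TRUE),
    # !is.na() is TRUE for recorded values, so the sum counts complete records
    n_mood_recorded = sum(!is.na(mood))
  )

print(toy_counts)
```

Without na.rm = TRUE, a single missing age would turn the whole group's count into NA, which is why the sketch sets it explicitly.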
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>%
  ____(family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort? Answer:
15 Using the HIV dataset, investigate by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>%
  mutate(suicidal_thoughts_or_wishes = ____(suicidal_thoughts_or_wishes),
         sex = ____,
         marital_status = ____) %>%
  ____(suicidal_thoughts_or_wishes, sex, marital_status, ____)
Question 29 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of followup.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1nuUsWnmGM0w95sAOz0Bd6IOONYZ6nuYn&export=download",
                             format = "csv",
                             setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1  51688    39         1     5    182.    88.3     130      74    5.68
## 2  69838    55         2     9     160    65.1     136      75    4.18
## 3 232089    30         2     5     158    46.5     115      72    4.98
## 4 607972    35         2     9    162.    47.2     108      88     3.9
## 5 134388    28         1     7     177    72.5     126      72     5.1
## 6 388216    32         2     2     160      71      99      70    5.13
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with an interesting survey dataset. HIV-positive people were surveyed about socio-demographics and their feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1pAcV6SrKodQS5S6rpM0MqcHGI8gs7tbN&export=download",
                                   format = "csv",
                                   setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex   religion marital_status suicidal_thoug…¹ sadness pesim…²
##   <dbl> <chr> <chr>    <chr>          <chr>            <chr>   <chr>
## 1    35 male  islam    unmarried      none             often   hopele…
## 2    38 male  islam    unmarried      some             some    discou…
## 3    37 male  islam    unmarried      omnipresent      some    discou…
## 4    55 male  islam    unmarried      none             some    discou…
## 5    47 male  islam    unmarried      some             some    discou…
## 6    35 male  islam    unmarried      some             some    discou…
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write code to get the average cholesterol level over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>%
  ____(mean_cholesterol = ____(cholesterol_mmol_l, na.rm = ____))
2 Using the diabetes dataset, what is the value of the mean cholesterol level (give the resulting number with one decimal)? Answer:
3 Using the diabetes dataset, following a similar method, give the median triglyceride level over all patients (give the resulting number with one decimal). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>%
  ____(gender_1_male_2_female) %>%
  summarize(average_HDL = ____(hdl_c_mmol_l, ____))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)? Answer:
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>%
  group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker,
           drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker,
           ____) %>%
  summarize(median_LDL = ____)
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(pesimism = as.factor(____),
         sex = ____) %>%
  group_by(____) %>%
  summarize(average_age = ____)
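Grouping by more than one variable, as in Part C, produces one summary row per combination of values that actually occurs in the data. The sketch below illustrates this on a made-up data frame; toy_cohort and its columns are hypothetical, and the use of .groups = "drop" is only one reasonable choice for returning an ungrouped result.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy_cohort <- data.frame(
  smoker  = c("yes", "yes", "no", "no", "no", "yes"),
  drinker = c("yes", "no", "no", "no", "yes", "yes"),
  ldl     = c(3.1, 2.4, NA, 2.8, 2.0, 2.7)
)

toy_medians <- toy_cohort %>%
  # One group per observed (smoker, drinker) combination
  group_by(smoker, drinker) %>%
  # na.rm = TRUE ignores the missing LDL value;
  # .groups = "drop" returns a plain, ungrouped tibble
  summarize(median_ldl = median(ldl, na.rm = TRUE), .groups = "drop")

print(toy_medians)
```

If .groups is omitted, summarize() keeps the result grouped by the first variable and prints a message saying so, which can surprise later steps in a pipeline.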
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(____, ____) %>%
  group_by(____) %>%
  summarize(counts = ____(____ < ____))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>%
  ____(family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort? Answer:
15 Using the HIV dataset, investigate by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>%
  mutate(suicidal_thoughts_or_wishes = ____(suicidal_thoughts_or_wishes),
         sex = ____,
         marital_status = ____) %>%
  ____(suicidal_thoughts_or_wishes, sex, marital_status, ____)
Question 30 of 30
The overall goal of this quiz is to test your knowledge of group_by and summarize.
First, you will analyze a dataset sample of a population-based cohort in China: clinical information about adults at baseline was used to predict whether they developed diabetes after some years of followup.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
diab_china_dat_raw <- import("https://docs.google.com/uc?id=1JwUJB8HzWfQ4FBsz2ns_LPMHkz86XJ3u&export=download",
                             format = "csv",
                             setclass = "tibble")
Here are the top 6 rows of diab_china_dat_raw after import:
head(diab_china_dat_raw)
## # A tibble: 6 × 24
##       id age_y gender_…¹  site heigh…² weigh…³ sbp_m…⁴ dbp_m…⁵ fpg_m…⁶
##    <dbl> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 316122    35         2     9    161.    50.8     112      69    4.87
## 2 301601    43         2     2     163      57     117      78    5.01
## 3  31825    47         2     9     160      57     118      74    5.65
## 4 259699    33         2     9    164.      54     101      65     5.7
## 5 242202    33         2     5     155      44      80      60    4.52
## 6 195688    27         2     2     166      57     108      69       3
## # … with 15 more variables: cholesterol_mmol_l <dbl>,
## #   triglyceride_mmol_l <dbl>, hdl_c_mmol_l <dbl>, ldl_mmol_l <dbl>,
## #   alt_u_l <dbl>, ast_u_l <dbl>, bun_mmol_l <dbl>, ccr_umol_l <dbl>,
## #   fpg_of_final_visit_mmol_l <dbl>,
## #   diabetes_diagnosed_during_followup_1_yes <dbl>,
## #   censor_of_diabetes_at_followup_1_yes_0_no <dbl>,
## #   year_of_followup <dbl>, …
## # ℹ Use `colnames()` to see all variable names
Second, you will work with an interesting survey dataset. HIV-positive people were surveyed about socio-demographics and their feelings of depression, with an eye to understanding the causes of depression.
Click here to view and download the data. Or import it directly into R with the code below:
if(!require(pacman)) install.packages("pacman")
pacman::p_load(rio)
hiv_depression_india_raw <- import("https://docs.google.com/uc?id=1W0-YQ6QZYEm9zO3zOnBo3ftkENhtDg9J&export=download",
                                   format = "csv",
                                   setclass = "tibble")
Here are the top 6 rows of hiv_depression_india_raw after import:
head(hiv_depression_india_raw)
## # A tibble: 6 × 7
##     age sex   religion marital_status        suicida…¹ sadness pesim…²
##   <dbl> <chr> <chr>    <chr>                 <chr>     <chr>   <chr>
## 1    55 male  islam    unmarried             none      some    discou…
## 2    42 male  islam    unmarried             some      some    discou…
## 3    40 male  islam    unmarried             some      some    discou…
## 4    65 male  islam    unmarried             omnipres… often   hopele…
## 5    35 male  islam    unmarried             none      often   hopele…
## 6    26 male  islam    divorced or separated omnipres… none    none
## # … with abbreviated variable names ¹suicidal_thoughts_or_wishes,
## #   ²pesimism
Part A : Summarize
1 Using the diabetes dataset, write code to get the average cholesterol level over the entire study.
mean_cholesterol_result <- diab_china_dat_raw %>%
  ____(mean_cholesterol = ____(cholesterol_mmol_l, na.rm = ____))
2 Using the diabetes dataset, what is the value of the mean cholesterol level (give the resulting number with one decimal)? Answer:
3 Using the diabetes dataset, following a similar method, give the median triglyceride level over all patients (give the resulting number with one decimal). Answer:
4 Using the diabetes dataset, give the median age of this cohort. Answer:
5 Using the HIV dataset and similar to the previous question, give the median age of this cohort. Answer:
Part B : Group by
6 Using the diabetes dataset, by gender, calculate the average HDL-c level (mmol/L).
average_HDL_by_gender <- diab_china_dat_raw %>%
  ____(gender_1_male_2_female) %>%
  summarize(average_HDL = ____(hdl_c_mmol_l, ____))
7 Using the diabetes dataset and the previous question, what is the average HDL-c level (mmol/L) for men (round to 2 decimals)? Answer:
8 Using the HIV dataset, calculate the mean age of the respondents, grouping by sadness feelings.
hiv_depression_india <- hiv_depression_india_raw %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part C : Nested Group by
9 Using the diabetes dataset, group by the smoking status and the drinking status, to calculate the median LDL level (mmol/L).
diab_china_dat <- diab_china_dat_raw %>%
  group_by(smoking_status_1_current_smoker_2_ever_smoker_3_never_smoker,
           drinking_status_1_current_drinker_2_ever_drinker_3_never_drinker,
           ____) %>%
  summarize(median_LDL = ____)
10 Using the HIV dataset, group by sex, pessimism feelings, and suicidal thoughts, to calculate the mean age of the respondents.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(pesimism = as.factor(____),
         sex = ____) %>%
  group_by(____) %>%
  summarize(average_age = ____)
Part D: Summarize with a condition
11 Using the HIV dataset, group by sadness feelings and sex, then use summarize() with a condition to count the number of respondents under 30.
hiv_depression_india <- hiv_depression_india_raw %>%
  mutate(____, ____) %>%
  group_by(____) %>%
  summarize(counts = ____(____ < ____))
12 Using the HIV dataset, group by sex, then use summarize() with a condition to count the number of respondents who have complete records for pessimism feelings.
Hint: complete records means the pesimism variable is not NA.
hiv_depression_india <- hiv_depression_india_raw %>% group_by(sex) %>% summarize(counts = sum(!is.na(pesimism)))
Part E : Count
13 Using the diabetes dataset, investigate how many people have family history, and how many do not.
counts_family_history <- diab_china_dat_raw %>%
  ____(family_histroy_of_diabetes_1_yes_0_no)
14 Using the diabetes dataset and similar to the previous question, how many patients are women in this cohort? Answer:
15 Using the HIV dataset, investigate by gender and marital status, how many people have suicidal thoughts.
counts_suicidal_thoughts <- hiv_depression_india_raw %>%
  mutate(suicidal_thoughts_or_wishes = ____(suicidal_thoughts_or_wishes),
         sex = ____,
         marital_status = ____) %>%
  ____(suicidal_thoughts_or_wishes, sex, marital_status, ____)
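Counting rows per category, the theme of Part E, can be written either as a grouped summary or with a dedicated counting verb. The sketch below compares the two on a made-up data frame; toy_people and its columns are hypothetical and unrelated to the quiz datasets.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy_people <- data.frame(
  sex     = c("male", "female", "female", "male", "female"),
  married = c("yes", "no", "no", "yes", "yes")
)

# Long form: group, then count the rows in each group with n()
counts_long <- toy_people %>%
  group_by(sex, married) %>%
  summarize(n = n(), .groups = "drop")

# Short form: count() groups, tallies, and ungroups in one step
counts_short <- toy_people %>%
  count(sex, married)

print(counts_short)
```

Both forms list only the combinations that occur in the data, so a combination with zero rows (here, unmarried men) does not appear in either result.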