Loops

Posted by Immaculate on December 16, 2022 at 8:21 am
I am trying to replicate in R some analysis I originally did in Stata, and I need your help. Is there a way to use a for loop to create new variables whose values are adjusted versions of existing variables? I was converting income from Ugandan shillings to USD, but the exchange rate differed at the baseline, midline and endline periods. I used `mutate` and `case_when` for each variable but found the code lengthy, and thought a loop, which is what I used in Stata, would make the work easier. I have attached screenshots of the Stata code and R code for your reference.
Attachments: do-file.png (21 KB), r-code.png (43 KB)
5 Replies

Kene David

Administrator

December 16, 2022 at 3:57 pm

Hello! Here are some possible solutions.


library(tidyverse)  # attaches dplyr, tidyr, tibble and stringr, which are all used below

Define the data


df <- tribble(
  ~respondent, ~eval_period, ~formal_emp_ugx, ~personal_emp_ugx, ~casual_emp_ugx,
  "Aaron",     "Baseline",   500,             600,               700,
  "Bob",       "Endline",    600,             700,               800,
  "Charlie",   "Midline",    700,             800,               900
)

You can use the across function

There are two options. First you can use the across() function.


df %>% 
  mutate(across(.cols = c("formal_emp_ugx", "personal_emp_ugx", "casual_emp_ugx"),
                .fns = ~ case_when(eval_period == "Baseline" ~ .x/3700, 
                                   eval_period == "Endline" ~  .x/3800, 
                                   eval_period == "Midline" ~  .x/3500)))

## # A tibble: 3 × 5
##   respondent eval_period formal_emp_ugx personal_emp_ugx casual_emp_ugx
##   <chr>      <chr>                <dbl>            <dbl>          <dbl>
## 1 Aaron      Baseline             0.135            0.162          0.189
## 2 Bob        Endline              0.158            0.184          0.211
## 3 Charlie    Midline              0.2              0.229          0.257

The tilde (~) starts a small anonymous function that across() applies to each of the selected columns. Inside that function, .x stands for the column currently being processed (formal_emp_ugx, personal_emp_ugx and casual_emp_ugx in turn).

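Note that this overwrites the original columns: they now hold USD values but still have names ending in _ugx. Once this is done, you can rename the variables.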
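To make the ~ and .x shorthand a bit more concrete, here is a minimal sketch showing that the formula form and an ordinary anonymous function do the same thing. The two-row `df`, the single flat rate of 3700 and the names `with_formula` and `with_function` are simplifications I made up for illustration.

```r
library(dplyr)

# Small made-up data set, just for this comparison
df <- tibble::tribble(
  ~respondent, ~eval_period, ~formal_emp_ugx,
  "Aaron",     "Baseline",   500,
  "Bob",       "Endline",    600
)

# Formula shorthand: .x stands for the column currently being processed
with_formula <- df %>%
  mutate(across(formal_emp_ugx, ~ .x / 3700))

# The same operation written as an ordinary anonymous function
with_function <- df %>%
  mutate(across(formal_emp_ugx, function(col) col / 3700))

# Compare the two results
all.equal(with_formula, with_function)
```

Both versions should give the same tibble, so which one you use is mostly a matter of taste.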

Here are some tutorials on how to use the across function:

Official documentation
Article by Rebecca Barter
Video from IDG tech talk

Pivot longer

You could also first pivot the data to a longer format:


df_long <- df %>% 
  pivot_longer(cols = c("formal_emp_ugx", "personal_emp_ugx", "casual_emp_ugx"), 
               values_to = "ugx")

Then it becomes easy to do what you need:


df_long %>% 
  mutate(usd = case_when(eval_period == "Baseline" ~ ugx/3700, 
                         eval_period == "Endline" ~  ugx/3800, 
                         eval_period == "Midline" ~  ugx/3500))

## # A tibble: 9 × 5
##   respondent eval_period name               ugx   usd
##   <chr>      <chr>       <chr>            <dbl> <dbl>
## 1 Aaron      Baseline    formal_emp_ugx     500 0.135
## 2 Aaron      Baseline    personal_emp_ugx   600 0.162
## 3 Aaron      Baseline    casual_emp_ugx     700 0.189
## 4 Bob        Endline     formal_emp_ugx     600 0.158
## 5 Bob        Endline     personal_emp_ugx   700 0.184
## 6 Bob        Endline     casual_emp_ugx     800 0.211
## 7 Charlie    Midline     formal_emp_ugx     700 0.2  
## 8 Charlie    Midline     personal_emp_ugx   800 0.229
## 9 Charlie    Midline     casual_emp_ugx     900 0.257

And you can pivot back at the end:


df %>% 
  pivot_longer(cols = c("formal_emp_ugx", "personal_emp_ugx", "casual_emp_ugx"), 
               values_to = "ugx") %>% 
  mutate(usd = case_when(eval_period == "Baseline" ~ ugx/3700, 
                         eval_period == "Endline" ~  ugx/3800, 
                         eval_period == "Midline" ~  ugx/3500)) %>% 
  pivot_wider(names_from = name, 
              values_from = c(usd, ugx))

## # A tibble: 3 × 8
##   respondent eval_period usd_formal_em…¹ usd_p…² usd_c…³ ugx_f…⁴ ugx_p…⁵ ugx_c…⁶
##   <chr>      <chr>                 <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 Aaron      Baseline              0.135   0.162   0.189     500     600     700
## 2 Bob        Endline               0.158   0.184   0.211     600     700     800
## 3 Charlie    Midline               0.2     0.229   0.257     700     800     900
## # … with abbreviated variable names ¹usd_formal_emp_ugx, ²usd_personal_emp_ugx,
## #   ³usd_casual_emp_ugx, ⁴ugx_formal_emp_ugx, ⁵ugx_personal_emp_ugx,
## #   ⁶ugx_casual_emp_ugx
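Since the question was about a for loop like the one in Stata, here is a rough base R sketch of that approach as well. It is only one way to write it, not a translation of the do-file in the screenshot. The helper names `rate`, `ugx_cols` and `new_col` are made up for illustration, and the rates are the ones used in the case_when() calls above.

```r
library(dplyr)
library(tibble)

df <- tribble(
  ~respondent, ~eval_period, ~formal_emp_ugx, ~personal_emp_ugx, ~casual_emp_ugx,
  "Aaron",     "Baseline",   500,             600,               700,
  "Bob",       "Endline",    600,             700,               800,
  "Charlie",   "Midline",    700,             800,               900
)

# One exchange rate per row, chosen by evaluation period
rate <- case_when(
  df$eval_period == "Baseline" ~ 3700,
  df$eval_period == "Endline"  ~ 3800,
  df$eval_period == "Midline"  ~ 3500
)

ugx_cols <- c("formal_emp_ugx", "personal_emp_ugx", "casual_emp_ugx")

for (col in ugx_cols) {
  # e.g. "formal_emp_ugx" becomes "formal_emp_usd"
  new_col <- sub("_ugx$", "_usd", col)
  # Add the converted values as a new column and keep the original
  df[[new_col]] <- df[[col]] / rate
}

df
```

Unlike the across() call above, this adds three new _usd columns next to the original _ugx ones instead of overwriting them.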

Immaculate

Member
December 16, 2022 at 6:05 pm

Thanks! This has been helpful🙏🏾

Immaculate

Member

December 16, 2022 at 6:38 pm

The first solution replaces the values of the existing columns but doesn’t create new columns/variables. What changes can be made so that new columns are created?

Kene David

Administrator

December 16, 2022 at 6:45 pm

For this you can use the `.names` argument. Like this:

df %>% 
  mutate(across(.cols = c("formal_emp_ugx", "personal_emp_ugx", "casual_emp_ugx"),
                .fns = ~ case_when(eval_period == "Baseline" ~ .x/3700, 
                                   eval_period == "Endline" ~  .x/3800, 
                                   eval_period == "Midline" ~  .x/3500), 
                .names = "{.col}_usd"
                ))

That will leave you with names ending in “_ugx_usd” (for example, formal_emp_ugx_usd), though. To fix that, you can use the `rename_with()` function, which can rename many columns at once:

df %>% 
  mutate(across(.cols = c("formal_emp_ugx", "personal_emp_ugx", "casual_emp_ugx"),
                .fns = ~ case_when(eval_period == "Baseline" ~ .x/3700, 
                                   eval_period == "Endline" ~  .x/3800, 
                                   eval_period == "Midline" ~  .x/3500), 
                .names = "{.col}_usd"
                )) %>% 
  rename_with(.fn =  ~ str_replace_all(.x, "ugx_usd", "usd"))
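As a possible shortcut, the renaming can probably be folded into `.names` itself. As far as I know, `.names` is a glue specification, so R code inside the braces gets evaluated. The sketch below relies on that. The use of `ends_with()` to pick the columns and the name `result` are my own choices, not something from the posts above.

```r
library(dplyr)
library(tibble)

df <- tribble(
  ~respondent, ~eval_period, ~formal_emp_ugx, ~personal_emp_ugx, ~casual_emp_ugx,
  "Aaron",     "Baseline",   500,             600,               700,
  "Bob",       "Endline",    600,             700,               800,
  "Charlie",   "Midline",    700,             800,               900
)

result <- df %>%
  mutate(across(
    .cols = ends_with("_ugx"),
    .fns = ~ case_when(eval_period == "Baseline" ~ .x / 3700,
                       eval_period == "Endline"  ~ .x / 3800,
                       eval_period == "Midline"  ~ .x / 3500),
    # .col is the current column name; sub() swaps the _ugx suffix for _usd
    .names = "{sub('_ugx$', '_usd', .col)}"
  ))

names(result)
```

This keeps the original _ugx columns and adds the _usd ones in a single step, so no separate `rename_with()` call is needed.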

Immaculate

Member
December 19, 2022 at 9:39 am

Thank you very much 🙏🏾
