Tutorial #1 - Bias in the Labor Market

Treatment Effects - The Basics

Francis J. DiTraglia

University of Oxford

Bertrand & Mullainathan (2004)

Today we’ll replicate a famous paper on bias in the labor market: “Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination”
To reduce clutter, I’ll refer to the paper as “BM” throughout.
Our first step is to sign up for a free account on Posit Cloud.

What you’ll see when you load a new project on Posit Cloud. There are three “panes”: the console (left), environment (top right), and file browser (bottom right).

Type commands at the console and hit return to run them. Objects appear in the environment pane. Here we have one object x and it equals 3.
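
A minimal sketch of the commands behind that example, assuming you type them at the console yourself. The name x and the value 3 match the description above; the last line is just an extra illustration.

```r
# Assign the value 3 to an object named x; it now appears in the environment pane
x <- 3

# Typing the name of an object prints its value
x

# Objects can be reused in later commands
x + 1
```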

Commands typed directly into the console are not reproducible, so we’ll store and run our R commands from a script: select File > New > R Script. Alternatively: Ctrl+Shift+N

A new pane has appeared: the editor pane. This is a text editor where we can enter and save R commands. (Yes, there are vim and emacs keybindings!)

A text file with a .R extension containing R commands is called an R script. Make sure to save your work! Your file will appear in the file browser.

Put your cursor on the line you want to run, then press Ctrl+Return (Command+Return on Mac) to run it. Alternatively, click “Run” to run the current line or “Source” to run the whole script.

Getting weird results? Clear your environment by entering rm(list = ls()) or clicking on the broom icon on the environment pane. Then re-run your script from the top.

Search the help files using ? or the help tab in the bottom right pane. Be warned: the help files aren’t always user-friendly 😵

Reading in the data

We’ll use the tidyverse family of R packages.
When starting a new project, you’ll need to install this: install.packages('tidyverse')
I’ve posted a copy of the dataset from BM on my website: https://ditraglia.com/data/lakisha_aer.csv.
After installing tidyverse we can read the data into a tibble called bm using the read_csv() function as follows:

library(tidyverse)
bm <- read_csv('https://ditraglia.com/data/lakisha_aer.csv')

Display the Data

bm contains 4870 rows and 65 columns; each row is a fictitious job applicant.

bm

# A tibble: 4,870 × 65
   id    ad    education ofjobs yearsexp honors volunteer military empholes
   <chr> <chr>     <dbl>  <dbl>    <dbl>  <dbl>     <dbl>    <dbl>    <dbl>
 1 b     1             4      2        6      0         0        0        1
 2 b     1             3      3        6      0         1        1        0
 3 b     1             4      1        6      0         0        0        0
 4 b     1             3      4        6      0         1        0        1
 5 b     1             3      3       22      0         0        0        0
 6 b     1             4      2        6      1         0        0        0
 7 b     1             4      2        5      0         1        0        0
 8 b     1             3      4       21      0         1        0        1
 9 b     1             4      3        3      0         0        0        0
10 b     1             4      2        6      0         1        0        0
# ℹ 4,860 more rows
# ℹ 56 more variables: occupspecific <dbl>, occupbroad <dbl>,
#   workinschool <dbl>, email <dbl>, computerskills <dbl>, specialskills <dbl>,
#   firstname <chr>, sex <chr>, race <chr>, h <dbl>, l <dbl>, call <dbl>,
#   city <chr>, kind <chr>, adid <dbl>, fracblack <dbl>, fracwhite <dbl>,
#   lmedhhinc <dbl>, fracdropout <dbl>, fraccolp <dbl>, linc <dbl>, col <dbl>,
#   expminreq <chr>, schoolreq <chr>, eoe <dbl>, parent_sales <dbl>, …

The Columns We’ll Need

call equals 1 if resume elicits an email or telephone callback for an interview
sex equals f for female, m for male
race equals b for black, w for white
computerskills equals 1 if resume says applicant has computer skills
education is the level of education on resume
yearsexp is the years of experience on resume
ofjobs is the number of previous jobs on resume

Keep only the columns we need

bm <- bm |> 
  select(sex, race, call, computerskills, education, yearsexp, ofjobs)
bm

# A tibble: 4,870 × 7
   sex   race   call computerskills education yearsexp ofjobs
   <chr> <chr> <dbl>          <dbl>     <dbl>    <dbl>  <dbl>
 1 f     w         0              1         4        6      2
 2 f     w         0              1         3        6      3
 3 f     b         0              1         4        6      1
 4 f     b         0              1         3        6      4
 5 f     w         0              1         3       22      3
 6 m     w         0              0         4        6      2
 7 f     w         0              1         4        5      2
 8 f     b         0              1         3       21      4
 9 f     b         0              1         4        3      3
10 m     b         0              0         4        6      2
# ℹ 4,860 more rows

💪 What is this paper about?

Skim the introduction and conclusion of BM so we can discuss the following:

What research question do BM try to answer?
What data and methodology do they use to answer it?
What do the authors consider to be their key findings?

💪 How was the study carried out?

Skim parts A-D of section II in BM so you can answer the following:

How did the experimenters create their bank of resumes for the experiment?
The experimenters classified the resumes into two groups. What were they and how did they make the classification?
How did the experimenters generate identities for their fictitious job applicants?
What is the treatment \(D\)? What is the outcome \(Y\)?

Random Assignment

Random assignment of treatment implies that the characteristics of the treatment and control group will be balanced: the same on average.
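
To see why randomization delivers balance, here is a small simulation sketch in base R. Everything in it is made up for illustration: the sample size, the Poisson distribution for experience, and the object names are my assumptions, not numbers from BM.

```r
# Simulated illustration only: none of these numbers come from BM
set.seed(1234)

n <- 5000

# A pre-determined characteristic, e.g. years of experience (hypothetical distribution)
experience <- rpois(n, lambda = 8)

# Randomly assign exactly half of the units to treatment (1) and half to control (0)
treated <- sample(rep(c(0, 1), each = n / 2))

# Average experience in each group: close, but not identical, because of chance
group_means <- tapply(experience, treated, mean)
group_means

# Difference in means between treatment and control
diff_means <- unname(group_means["1"] - group_means["0"])
diff_means
```

Because treatment is assigned by a coin flip that ignores experience, any gap between the two averages is pure sampling noise, and it tends to shrink as n grows.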

Is sex balanced across race?

More of the fictitious resumes have female names than male names, but within each sex race is approximately balanced, as it should be:

bm |> 
  group_by(sex, race) |> 
  summarize(count = n())

# A tibble: 4 × 3
# Groups:   sex [2]
  sex   race  count
  <chr> <chr> <int>
1 f     b      1886
2 f     w      1860
3 m     b       549
4 m     w       575

Remember: names were randomly assigned to resumes.
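
Another way to read the same table is to ask what share of names is female within each race. The sketch below uses only the four counts printed above; the object names are mine.

```r
# Counts copied from the sex-by-race table above
counts <- matrix(c(1886, 1860,
                    549,  575),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(sex = c("f", "m"), race = c("b", "w")))

# Number of resumes for each race
colSums(counts)

# Share of female names within each race
female_share <- counts["f", ] / colSums(counts)
round(female_share, 3)
```

Each race appears on the same number of resumes, and the female share is about 77% for black-sounding names versus about 76% for white-sounding names.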

💪 Are the other variables balanced?

Why do we care about balance in sex, education, ofjobs, and yearsexp across race?
Check that these are indeed balanced across race.
Are computerskills and education balanced across sex? What’s going on here? Hint: see BM section II C.

Balanced Across Race

# balanced across race 
bm |> 
  group_by(race) |> 
  summarize(avg_educ = mean(education), 
            avg_jobs = mean(ofjobs),
            avg_exp = mean(yearsexp))

# A tibble: 2 × 4
  race  avg_educ avg_jobs avg_exp
  <chr>    <dbl>    <dbl>   <dbl>
1 b         3.62     3.66    7.83
2 w         3.62     3.66    7.86

We care about balance because we want to be sure that the perception of race is responsible for any difference in callback rates, not some other factor.

Not balanced across sex!

bm |> 
  group_by(sex) |> 
  summarize(avg_comp = mean(computerskills), 
            avg_educ = mean(education))

# A tibble: 2 × 3
  sex   avg_comp avg_educ
  <chr>    <dbl>    <dbl>
1 f        0.868     3.58
2 m        0.662     3.73

From the paper:

We use nearly exclusively female names for administrative and clerical jobs to increase callback rates.

💪 Compute Callback Rates

Compute the following and compare to Table 1 of BM:

Callback rate for all resumes in bm.
Callback rates by race.
Callback rates for each combination of race and sex.
What do your results suggest?

“Black” names cause fewer callbacks

# callback rate for all resumes
bm |>  
  summarize(avg_callback = mean(call))

# A tibble: 1 × 1
  avg_callback
         <dbl>
1       0.0805

# callback rates for black versus white 
bm |>  
  group_by(race) |> 
  summarize(avg_callback = mean(call))

# A tibble: 2 × 2
  race  avg_callback
  <chr>        <dbl>
1 b           0.0645
2 w           0.0965
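
To put the two rates side by side, here is a quick sketch that uses the rounded values printed above rather than the full dataset. The object names are mine.

```r
# Callback rates by race, rounded as printed above
rate_b <- 0.0645
rate_w <- 0.0965

# Absolute gap in percentage points
gap_pp <- 100 * (rate_w - rate_b)
gap_pp

# Relative gap: callback rate for white-sounding names divided by the rate for black-sounding names
ratio <- rate_w / rate_b
ratio
```

The gap is about 3.2 percentage points, which amounts to roughly 50% more callbacks for resumes with white-sounding names.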

Confidence Interval / Test

t.test(call ~ race, data = bm)


    Welch Two Sample t-test

data:  call by race
t = -4.1147, df = 4711.6, p-value = 3.943e-05
alternative hypothesis: true difference in means between group b and group w is not equal to 0
95 percent confidence interval:
 -0.04729503 -0.01677067
sample estimates:
mean in group b mean in group w 
     0.06447639      0.09650924
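
To see where these numbers come from, here is a sketch that rebuilds the Welch test by hand from the summary statistics above. It assumes 2435 resumes per race (from the sex-by-race counts) and backs out the number of callbacks from the group means; the object names are mine.

```r
# Rebuild the 0/1 call indicator for each race from the summary numbers above.
# Assumption: 2435 resumes per race, so the group means imply
# 0.06447639 * 2435 = 157 callbacks (b) and 0.09650924 * 2435 = 235 callbacks (w).
n <- 2435
call_b <- rep(c(1, 0), times = c(157, n - 157))
call_w <- rep(c(1, 0), times = c(235, n - 235))

# Difference in means (b minus w, matching the order in the t.test() output)
diff_means <- mean(call_b) - mean(call_w)

# Welch standard error: each group keeps its own sample variance
v_b <- var(call_b) / n
v_w <- var(call_w) / n
se <- sqrt(v_b + v_w)

# Test statistic and Welch-Satterthwaite degrees of freedom
t_stat <- diff_means / se
df <- (v_b + v_w)^2 / (v_b^2 / (n - 1) + v_w^2 / (n - 1))

# 95% confidence interval for the difference in means
ci <- diff_means + c(-1, 1) * qt(0.975, df) * se

round(c(t = t_stat, df = df), 4)
round(ci, 5)
```

The results should agree with the t.test() output above: the test is nothing more than a difference in means divided by its standard error.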

Similar effect for male and female

# callback rates by sex and race
bm |>  
  group_by(sex, race) |> 
  summarize(avg_callback = mean(call))

# A tibble: 4 × 3
# Groups:   sex [2]
  sex   race  avg_callback
  <chr> <chr>        <dbl>
1 f     b           0.0663
2 f     w           0.0989
3 m     b           0.0583
4 m     w           0.0887
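
To compare the racial gap for female and male names directly, here is a short sketch using the four rounded rates from the table above. The object names are mine.

```r
# Callback rates copied from the table above (rounded to four decimals)
rates <- matrix(c(0.0663, 0.0989,
                  0.0583, 0.0887),
                nrow = 2, byrow = TRUE,
                dimnames = list(sex = c("f", "m"), race = c("b", "w")))

# White-minus-black gap within each sex, in percentage points
gap_pp <- 100 * (rates[, "w"] - rates[, "b"])
round(gap_pp, 2)
```

The gap is about 3.3 percentage points for female names and about 3.0 for male names. This comparison alone doesn't tell us whether the two gaps differ by more than sampling noise.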

Discussion

Highly influential and successful example of a randomized controlled trial.
What was actually randomized here?
How should we interpret the causal effect?
Internal versus external validity?
Fryer & Levitt (2004)