Tutorial #3 - Instrumental Variables

Treatment Effects - The Basics

Francis J. DiTraglia

University of Oxford

Part I - “Textbook” IV

The Colonial Origins of Comparative Development

  • Based on Acemoglu, Johnson, and Robinson (2001) (AJR).
  • The data for the paper are available at https://ditraglia.com/data/ajr.dta.
  • Note that ajr.dta is a Stata file; I’ll show you how to open it in R in an upcoming slide.

💪 What is this paper about?

Skim the abstract, introduction and conclusion of AJR so we can discuss the following:

  1. What is the main question that AJR try to answer?
  2. What is AJR’s key theory?
  3. For \(Z\) to be a valid instrument, it must be relevant and exogenous. What do these conditions mean in the context of AJR? Can either be checked?
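As a reference point for question 3, the two conditions can be written out in a linear model. This is a sketch in standard notation rather than anything quoted from AJR: the outcome equation and the error term \(U\) are assumptions introduced here for illustration.

```latex
\[
Y = \alpha + \beta D + U, \qquad
\underbrace{\text{Cov}(Z, D) \neq 0}_{\text{relevance}}, \qquad
\underbrace{\text{Cov}(Z, U) = 0}_{\text{exogeneity}}
\]
```

Relevance involves only the observed variables \(Z\) and \(D\), whereas exogeneity involves the unobserved \(U\).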

Loading the Data

library(haven) # for read_dta(), which opens Stata files
ajr <- read_dta('https://ditraglia.com/data/ajr.dta')

ajr
# A tibble: 62 × 14
   longname  shortnam   mort logmort0  risk loggdp latitude neoeuro  asia africa
   <chr>     <chr>     <dbl>    <dbl> <dbl>  <dbl>    <dbl>   <dbl> <dbl>  <dbl>
 1 Angola    AGO      280        5.63  5.36   7.77   0.137        0     0      1
 2 Argentina ARG       68.9      4.23  6.39   9.13   0.378        0     0      0
 3 Australia AUS        8.55     2.15  9.32   9.90   0.300        1     0      0
 4 Burkina … BFA      280        5.63  4.45   6.85   0.144        0     0      1
 5 Banglade… BGD       71.4      4.27  5.14   6.88   0.267        0     1      0
 6 Bolivia   BOL       71        4.26  5.64   7.93   0.189        0     0      0
 7 Brazil    BRA       71        4.26  7.91   8.73   0.111        0     0      0
 8 Canada    CAN       16.1      2.78  9.73   9.99   0.667        1     0      0
 9 Chile     CHL       68.9      4.23  7.82   9.34   0.333        0     0      0
10 Cote d'I… CIV      668        6.50  7      7.44   0.0889       0     0      1
# ℹ 52 more rows
# ℹ 4 more variables: other <dbl>, rainmin <dbl>, meantemp <dbl>, malaria <dbl>

Variable Descriptions

  • loggdp is our outcome variable \(Y\)
  • risk is our treatment of interest \(D\) (not binary)
  • logmort0 is the instrument \(Z\) (not binary either)

Warning

The larger the value of risk, the more protection a country has against expropriation. In other words, large values of risk indicate better institutions.

💪 Exercise: OLS Regression

  1. Regress loggdp on risk and store the result in an object called ols.
  2. Display the results of the previous part. Can we interpret them causally? Why or why not?
  3. Calculate the slope from a linear regression of \(Y\) on \(Z\).
  4. Calculate the slope from a linear regression of \(D\) on \(Z\).
  5. Use the results of the preceding to construct the IV estimate.
# OLS regression of Y on D
library(broom) # for tidy()
ols <- lm(loggdp ~ risk, ajr)
tidy(ols)
# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    4.73     0.415      11.4  1.15e-16
2 risk           0.505    0.0623      8.11 3.24e-11
# "Reduced Form" regression of Y on Z
rf <- lm(loggdp ~ logmort0, ajr)
tidy(rf)
# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)   10.6      0.382      27.8  5.44e-36
2 logmort0      -0.561    0.0789     -7.10 1.66e- 9
# "First Stage" regression of D on Z
fs <- lm(risk ~ logmort0, ajr)
tidy(fs)
# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    9.39      0.633     14.8  1.00e-21
2 logmort0      -0.620     0.131     -4.74 1.37e- 5
# IV estimate
coef(rf)[2] / coef(fs)[2]
 logmort0 
0.9052929 
# An alternative, equivalent way to calculate the IV 
library(tidyverse)
ajr |> 
  summarize(IV = cov(loggdp, logmort0) / cov(risk, logmort0))
# A tibble: 1 × 1
     IV
  <dbl>
1 0.905
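The two calculations agree because the variance of the instrument cancels when the reduced-form slope is divided by the first-stage slope. A short derivation, written for population quantities (the same algebra holds for the sample analogues):

```latex
\[
\frac{\text{Cov}(Z, Y) / \text{Var}(Z)}{\text{Cov}(Z, D) / \text{Var}(Z)}
= \frac{\text{Cov}(Z, Y)}{\text{Cov}(Z, D)}
\]
```

With the rounded slopes reported above, \(-0.561 / -0.620 \approx 0.905\).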

A Better Way to Run IV

The iv_robust() function from estimatr provides a variety of standard error options and also supports fixed effects:

library(estimatr)
# I use 'classical' standard errors here to match the paper
iv <- iv_robust(loggdp ~ risk | logmort0, data = ajr, se = 'classical')

library(modelsummary)
modelsummary(list(ols, iv), 
             gof_omit = 'Log.Lik|R2 Adj.|AIC|BIC|F', 
             fmt = 2)

                 (1)       (2)
(Intercept)      4.73      2.13
                (0.41)    (1.01)
risk             0.51      0.91
                (0.06)    (0.16)
Num.Obs.         62        62
R2               0.523     0.195
RMSE             0.71      0.92
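With a single instrument and a single treatment, the two-stage least squares point estimate can also be reproduced by hand. The sketch below uses simulated data, not the AJR data; every variable name and parameter value in it is hypothetical and chosen only for illustration. It shows that regressing the outcome on first-stage fitted values gives the same slope as the ratio of covariances.

```r
# Simulated data (hypothetical; NOT the AJR data)
set.seed(1234)
n <- 500
z <- rnorm(n)                      # instrument
u <- rnorm(n)                      # unobserved confounder
d <- 1 + 0.8 * z + u + rnorm(n)    # treatment depends on z and u
y <- 2 + 0.5 * d - u + rnorm(n)    # outcome; the true effect of d is 0.5

# Stage 1: regress the treatment on the instrument, keep the fitted values
d_hat <- fitted(lm(d ~ z))

# Stage 2: regress the outcome on the first-stage fitted values
tsls <- unname(coef(lm(y ~ d_hat))[2])

# Ratio-of-covariances form of the same estimator
wald <- cov(y, z) / cov(d, z)

# The two numbers agree up to floating-point error
c(two_stage = tsls, ratio = wald)
```

The standard errors reported by the second-stage lm() are not valid for IV, because they treat the fitted values as if they were data. That is one reason to prefer a dedicated function over the manual two-step approach.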

Why is the IV Estimate Larger?

\[ \beta_{\text{OLS}} = \beta + \frac{\text{Cov}(D,U)}{\text{Var}(D)}, \quad \beta_{\text{IV}} = \beta + \frac{\text{Cov}(Z, U)}{\text{Cov}(Z, D)} \]
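The decomposition can be checked numerically: the OLS slope equals \(\beta\) plus Cov(D,U)/Var(D), and the IV ratio equals \(\beta\) plus Cov(Z,U)/Cov(Z,D). The sketch below uses simulated data in which the error term is observed only because we generate it ourselves. The negative dependence of the treatment on the error is an assumption made for illustration; it is one mechanism that pushes OLS below IV, not a claim about what drives the AJR results.

```r
# Simulated data (hypothetical; NOT the AJR data)
set.seed(42)
n <- 1000
beta <- 0.5                          # true causal effect, chosen for illustration
z <- rnorm(n)                        # instrument, generated independently of u
u <- rnorm(n)                        # error term, visible only in a simulation
d <- 0.8 * z - 0.6 * u + rnorm(n)    # treatment is correlated with u (endogenous)
y <- 1 + beta * d + u                # outcome equation

# Sample analogues of the two estimands
ols_slope <- cov(d, y) / var(d)
iv_slope  <- cov(z, y) / cov(z, d)

# Sample analogues of the two bias terms
ols_bias <- cov(d, u) / var(d)
iv_bias  <- cov(z, u) / cov(z, d)

# Each slope equals beta plus its bias term, up to floating-point error
c(ols = ols_slope, beta_plus_ols_bias = beta + ols_bias)
c(iv = iv_slope, beta_plus_iv_bias = beta + iv_bias)
```

Because z is generated independently of u, the IV bias term is close to zero in large samples, while the OLS bias term does not vanish.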

Part II - Local Average Treatment Effects

💪 Job Training Partnership Act (JTPA)

  1. Read paragraph 2 of the Introduction and skim Section 4.1 of Abadie, Angrist & Imbens (2002).
  2. Describe the empirical setting. What is the treatment of interest and what is the outcome? What is the instrument?
  3. Is there non-compliance in this example? If so, is it one-sided or two-sided?

Variable Descriptions: jtpa.csv

  • earnings is total earnings over the 30 months following random assignment
  • assignmt equals 1 if you were assigned to the treatment group, zero otherwise
  • training equals 1 if you actually enrolled in training, zero otherwise
  • sex equals 1 for male and 0 for female

💪 Exercise

  1. Load the data from my website: https://ditraglia.com/data/jtpa.csv.
  2. Replicate the “IV estimates from a model without covariates” given in the very last paragraph of Section 4.1 from Abadie, Angrist & Imbens (2002).
  3. How do these compare to the “reduced-form” assignment effects? (Note that we called these ITT effects in the lecture.) Explain.
jtpa <- read_csv('https://ditraglia.com/data/jtpa.csv') 

jtpa |> 
  mutate(male = (sex == 1)) |> 
  rename(d = training, z = assignmt, y = earnings) |> 
  group_by(male) |> 
  summarize(fs = cov(d, z) / var(z), 
            rf = cov(z, y) / var(z), 
            iv = cov(z, y) / cov(d, z))
# A tibble: 2 × 4
  male     fs    rf    iv
  <lgl> <dbl> <dbl> <dbl>
1 FALSE 0.640 1243. 1942.
2 TRUE  0.612 1117. 1825.
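In each row the IV estimate is the reduced-form (ITT) effect divided by the first stage: 1243 / 0.640 ≈ 1942 and 1117 / 0.612 ≈ 1825. Since the first stage is below one, the IV estimate exceeds the ITT effect. With a binary instrument these slopes are also differences in means, which the sketch below illustrates on simulated data. It does not use the JTPA data; all names and parameter values are hypothetical, and the take-up rule is a simplifying assumption rather than a description of the actual programme.

```r
# Simulated data (hypothetical; NOT the JTPA data)
set.seed(2002)
n <- 2000
z <- rbinom(n, 1, 0.5)           # random assignment to the treatment group
complier <- rbinom(n, 1, 0.6)    # would enrol if assigned (assumed 60% share)
d <- z * complier                # in this simulation only assigned units can enrol
y <- 10000 + 2000 * d + rnorm(n, sd = 3000)  # outcome; effect of d set to 2000

# Differences in means between the assigned and non-assigned groups
itt    <- mean(y[z == 1]) - mean(y[z == 0])  # reduced form / ITT effect
takeup <- mean(d[z == 1]) - mean(d[z == 0])  # first stage
wald   <- itt / takeup                       # IV (Wald) estimate

# The same three quantities from the covariance formulas used above
rf <- cov(z, y) / var(z)
fs <- cov(d, z) / var(z)
iv <- cov(z, y) / cov(d, z)

rbind(means = c(fs = takeup, rf = itt, iv = wald),
      covs  = c(fs = fs, rf = rf, iv = iv))
```

The two rows of the result agree up to floating-point error, because the slope from regressing any variable on a binary regressor is the difference in group means.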