Applied Class #5 - Gaussian MTE Model

Introduction

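This practical session is based on Heckman, Tobias, & Vytlacil (2001) and Angrist (2004). Throughout this class, we will work with the following model:

$$
\begin{aligned}
Y &= (1 - D) Y_0 + D Y_1 \\
D &= 1\{\gamma_0 + \gamma_1 Z > V\} \\
Y_0 &= \mu_0 + U_0 \\
Y_1 &= \mu_1 + U_1 \\
\begin{bmatrix} V \\ U_0 \\ U_1 \end{bmatrix} &\sim \text{Normal}(0, \Sigma), \quad
\Sigma = \begin{bmatrix}
1 & \sigma_0 \rho_0 & \sigma_1 \rho_1 \\
\sigma_0 \rho_0 & \sigma_0^2 & \sigma_{01} \\
\sigma_1 \rho_1 & \sigma_{01} & \sigma_1^2
\end{bmatrix} \\
Z &\sim \text{Bernoulli}(q), \quad \text{indep. of } (V, U_0, U_1)
\end{aligned}
$$

In real life, we would observe only $(Y, D, Z)$, but in some of the exercises below we will also work with the unobserved variables $(Y_0, Y_1, V)$ directly.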
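
A minimal sketch of how draws from this model could be generated in R. The function name `simulate_mte` and its default arguments are illustrative assumptions, not part of the class materials, and the normal draws use a base-R Cholesky factor in place of `rmvnorm()` so that the block runs without extra packages.

```r
# Sketch: simulate the Gaussian MTE model using base R only.
# simulate_mte and its defaults are hypothetical names chosen for illustration.
simulate_mte <- function(n, rho0, rho1, gamma0, gamma1,
                         mu0 = 0, mu1 = 0, sigma0 = 1, sigma1 = 1,
                         sigma01 = 0.5, q = 0.5) {
  # Covariance matrix of (V, U0, U1); V has unit variance
  Sigma <- matrix(c(1,             sigma0 * rho0, sigma1 * rho1,
                    sigma0 * rho0, sigma0^2,      sigma01,
                    sigma1 * rho1, sigma01,       sigma1^2),
                  nrow = 3, byrow = TRUE)
  # chol() returns upper-triangular R with t(R) %*% R = Sigma and
  # stops with an error if Sigma is not positive definite
  R <- chol(Sigma)
  errors <- matrix(rnorm(3 * n), nrow = n) %*% R
  V  <- errors[, 1]
  Y0 <- mu0 + errors[, 2]
  Y1 <- mu1 + errors[, 3]
  Z  <- rbinom(n, size = 1, prob = q)
  D  <- as.numeric(gamma0 + gamma1 * Z > V)  # selection equation
  Y  <- (1 - D) * Y0 + D * Y1                # observed outcome
  data.frame(Y0, Y1, V, Z, D, Y)
}

set.seed(1234)
sims <- simulate_mte(1e5, rho0 = 0.5, rho1 = 0.2, gamma0 = 1, gamma1 = 1.5)
head(sims)
```

If the mvtnorm package is installed, the two lines that build `errors` could be replaced by `mvtnorm::rmvnorm(n, sigma = Sigma)`.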

Exercises

  1. Write an R function that uses rmvnorm() from the mvtnorm package to simulate $n$ iid draws of $(Y_0, Y_1, V, Z, D)$ from the model described above, fixing $\mu_0 = \mu_1 = 0$, $\sigma_0 = \sigma_1 = 1$, $\sigma_{01} = 1/2$, and $q = 1/2$. Your function should take five arguments (n, rho0, rho1, gamma0, and gamma1) and return a data frame with named columns Y0, Y1, V, Z, D, and Y. In real life we can only observe $(Y, D, Z)$ but fortunately for us, simulations aren’t real life! In this example there is no need to store $(U_0, U_1)$ since they coincide with $(Y_0, Y_1)$ when $\mu_0 = \mu_1 = 0$.

  2. Use your function from the preceding part to make and store 1,000,000 simulation draws with $\rho_0 = 0.5$, $\rho_1 = 0.2$, $\gamma_0 = 1$, and $\gamma_1 = 1.5$. Use your simulation draws to calculate the LATE, TOT, and TUT at these parameter values.

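  3. In the previous exercise you used simulation to approximate the values of the LATE, TOT, and TUT at particular parameter values. Check your simulation results against the following analytical formulas from the lecture slides, which apply in the case where $q = 1/2$ and $\sigma_0 = \sigma_1 = 1$. Recall that we use the shorthand $\delta = (\rho_1 - \rho_0)$.

     $$
     \begin{aligned}
     \text{LATE} &= -\delta \left[ \frac{\varphi(\gamma_0 + \gamma_1) - \varphi(\gamma_0)}{\Phi(\gamma_0 + \gamma_1) - \Phi(\gamma_0)} \right] \\
     \text{TOT} &= -\delta \left[ \frac{\varphi(\gamma_0) + \varphi(\gamma_0 + \gamma_1)}{\Phi(\gamma_0) + \Phi(\gamma_0 + \gamma_1)} \right] \\
     \text{TUT} &= \delta \left[ \frac{\varphi(\gamma_0) + \varphi(\gamma_0 + \gamma_1)}{\{1 - \Phi(\gamma_0)\} + \{1 - \Phi(\gamma_0 + \gamma_1)\}} \right]
     \end{aligned}
     $$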

  4. Consult Section 2.1 of Angrist (2004). Angrist’s notation is slightly different from ours: what he calls η we call V, what he calls ρ01 we call γ, and what he calls TT we call TOT. Other than that, everything is the same. Figure 1 of this paper plots the TOT and LATE over a range of values for P(D=1|Z=0). Use the formulas from the previous question to reproduce panel (a) of this figure. Add in the TUT effect for good measure if you’re feeling ambitious! You’ll need to read a few short extracts of the paper to determine how to set the parameters γ0 and γ1 when making your plot.

  5. In this problem you will apply the Heckman two-step estimator to the simulated values of $(Y, D, Z)$ from question 2 above to estimate the parameters $\mu_0$, $\mu_1$, $\delta_0 \equiv \sigma_0 \rho_0$, and $\delta_1 \equiv \sigma_1 \rho_1$. In the simulation we know that $\mu_1 = \mu_0 = 0$, $\sigma_0 = \sigma_1 = 1$, $\rho_0 = 0.5$, and $\rho_1 = 0.2$, so you can check the estimator against these values. Follow these steps:

    1. Use the simulated values of $(D, Z)$ to estimate $(\gamma_0, \gamma_1)$. Call your estimates $(\hat{\gamma}_0, \hat{\gamma}_1)$. Check that your estimates are close to the true values that you used to generate the data: $\gamma_0 = 1$ and $\gamma_1 = 1.5$.
    2. Define the shorthand $\hat{\lambda}(z) = \varphi(\hat{\gamma}_0 + \hat{\gamma}_1 z) / [1 - \Phi(\hat{\gamma}_0 + \hat{\gamma}_1 z)]$. Add a column called lambda to the dataframe containing your simulated values of $(Y, D, Z)$ that evaluates the function $\hat{\lambda}(\cdot)$ at the observed values of $Z$.
    3. Define the shorthand $\hat{\kappa}(z) = -\varphi(\hat{\gamma}_0 + \hat{\gamma}_1 z) / \Phi(\hat{\gamma}_0 + \hat{\gamma}_1 z)$. Add a column called kappa to the dataframe containing your simulated values of $(Y, D, Z)$ that evaluates the function $\hat{\kappa}(\cdot)$ at the observed values of $Z$.
    4. For the subset of observations with $D = 0$, run a regression of $Y$ on lambda and a constant. The intercept should be approximately equal to $\mu_0$ and the slope approximately equal to $\delta_0 = \sigma_0 \rho_0$.
    5. For the subset of observations with $D = 1$, run a regression of $Y$ on kappa and a constant. The intercept should be approximately equal to $\mu_1$ and the slope approximately equal to $\delta_1 = \sigma_1 \rho_1$.
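
The five steps above can be sketched as follows. This is one possible implementation under stated assumptions, not a reference solution. It regenerates data from the model with a base-R Cholesky factor rather than `rmvnorm()` so that it is self-contained, and it estimates the first stage by probit, which fits here because V is standard normal, so P(D=1|Z) = Φ(γ0 + γ1 Z). Object names such as `dat`, `reg0`, and `reg1` are hypothetical.

```r
set.seed(5678)
n <- 1e6

# Simulate (Y, D, Z) with mu0 = mu1 = 0, sigma0 = sigma1 = 1, sigma01 = 0.5,
# rho0 = 0.5, rho1 = 0.2, gamma0 = 1, gamma1 = 1.5, q = 0.5
Sigma <- matrix(c(1.0, 0.5, 0.2,
                  0.5, 1.0, 0.5,
                  0.2, 0.5, 1.0), nrow = 3, byrow = TRUE)
errors <- matrix(rnorm(3 * n), nrow = n) %*% chol(Sigma)  # columns: V, U0, U1
Z <- rbinom(n, size = 1, prob = 0.5)
D <- as.numeric(1 + 1.5 * Z > errors[, 1])
Y <- (1 - D) * errors[, 2] + D * errors[, 3]
dat <- data.frame(Y, D, Z)

# Step 1: probit of D on Z to estimate (gamma0, gamma1)
probit <- glm(D ~ Z, family = binomial(link = "probit"), data = dat)
gamma_hat <- unname(coef(probit))

# Steps 2 and 3: selection-correction terms at the observed Z.
# lambda = E[V | V > c] and kappa = E[V | V < c] for standard normal V,
# which is why kappa carries a minus sign.
index <- gamma_hat[1] + gamma_hat[2] * dat$Z
dat$lambda <- dnorm(index) / (1 - pnorm(index))
dat$kappa  <- -dnorm(index) / pnorm(index)

# Step 4: untreated subsample, intercept ~ mu0 and slope ~ delta0
reg0 <- lm(Y ~ lambda, data = dat, subset = (D == 0))
# Step 5: treated subsample, intercept ~ mu1 and slope ~ delta1
reg1 <- lm(Y ~ kappa, data = dat, subset = (D == 1))

gamma_hat
coef(reg0)
coef(reg1)
```

Because Z is binary, lambda and kappa each take only two distinct values, so each second-stage regression is identified entirely by the difference in mean outcomes between the Z = 0 and Z = 1 groups. Very few untreated observations have Z = 1 at these parameter values, so the estimates from the D = 0 regression are noticeably noisier than those from the D = 1 regression.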