Practice 02 - Regression Discontinuity Designs in Development Economics

class: center, middle, inverse, title-slide

.title[
# Practice 02 - Regression Discontinuity Designs in Development Economics
]
.subtitle[
## Theory and Practice
]
.author[
### Bruno Conte
]
.institute[
### Barcelona School of Economics
]
.date[
### 09/Jul/2024
]

---

# RDD in Development Economics: Practical Class 02

Today's goal: put the basic Sharp RD concepts into practice with `rdrobust`

- Recall: Best-practice RD work consists of <g>3 fundamental steps</g>

1. <o>RD Plots:</o> visual inspection of the discontinuity in observed outcomes

2. <g>RD Point Estimation:</g> done with local polynomial approximation (i.e., continuity-based approach)

3. ~~RD Validation~~ (tomorrow!)

**Material** for this class [here](https://www.dropbox.com/scl/fi/7llotj2ke1shw6p5qpgej/00_practice02.zip?rlkey=md2li6ergf7ko74x2tulz7fx2&st=go5xv2sb&dl=1). Installation:

```r
install.packages('rdrobust')
```

```stata
net install rdrobust, from(https://raw.githubusercontent.com/rdpackages/rdrobust/master/stata) replace
```

---

.center[

# RD Plots

]

---

# Practical Class 02: RD Plots

.pull-left[
**RD plots**: uncover "hidden" data patterns

<g>Meyersson (2014):</g> Islamic representation `$\rightarrow$` effects on female education?

- Strategy: RD design on close elections

- Units: municipalities in Turkey

- Score: margin of victory of the Islamic party

- Raw data: <o>slight negative relationship</o>

- RD Plot: discontinuous, <g>positive effect</g>!

]
.pull-right[
<img src="figs/meyersson1.png" style="width: 100%" />
]

---

# Practical Class 02: RD Plots

.pull-left[
**RD plots**: uncover "hidden" data patterns

<g>Meyersson (2014):</g> Islamic representation `$\rightarrow$` effects on female education?

- Strategy: RD design on close elections

- Units: municipalities in Turkey

- Score: margin of victory of the Islamic party

- Raw data: <o>slight negative relationship</o>

- RD Plot: discontinuous, <g>positive effect</g>!

]
.pull-right[
<img src="figs/meyersson2.png" style="width: 100%" />
]

---

# Practical Class 02: RD Plots

.pull-left[
Raw plots, loading `data_meyersson.csv`

```r
dset <- read.csv('data_meyersson.csv')
plot(dset$X, dset$Y)
```
]
.pull-right[
<img src="figs/class02/unnamed-chunk-4-1.png" width="90%" />
]

---

# Practical Class 02: RD Plots

.pull-left[
Raw plots, loading `data_meyersson.csv`

```r
dset <- read.csv('data_meyersson.csv')
plot(
 dset$X, 
 dset$Y,
 xlab = "Score", # label x axis
 ylab = "Outcome", # label y axis
 col = 1, # black color
 pch = 20 # circle dots
 )
abline(v=0) # add vertical line X=0
```
]
.pull-right[
<img src="figs/class02/unnamed-chunk-6-1.png" width="90%" />
]

---

# Practical Class 02: RD Plots

.pull-left[
`rdplot`: part of `rdrobust` library

```r
install.packages('rdrobust')
library(rdrobust)
dset <- read.csv('data_meyersson.csv')
rdplot(dset$Y, dset$X)
```
]
.pull-right[
<img src="figs/class02/unnamed-chunk-8-1.png" width="90%" />
]

---

# Practical Class 02: RD Plots

.pull-left[
`rdplot`: part of `rdrobust` library

```r
install.packages('rdrobust')
library(rdrobust)
dset <- read.csv('data_meyersson.csv')
rdplot(
 dset$Y,
 dset$X,
 nbins = c(20, 20), # number of bins
 binselect = "es", # bin type
 y.lim = c(0,25) # limits y axis
 )
```
]
.pull-right[
<img src="figs/class02/unnamed-chunk-10-1.png" width="90%" />
]

---

<g>Important:</g> RD plots store important information (in `Stata`, it is printed in the console)!

```r
output <- rdplot(dset$Y,dset$X,nbins = c(20, 20),binselect = "es",y.lim = c(0,25))
summary(output)
```

```
## Call: rdplot
## 
## Number of Obs.                 2629
## Kernel                      Uniform
## 
## Number of Obs.                 2314             315
## Eff. Number of Obs.            2314             315
## Order poly. fit (p)               4               4
## BW poly. fit (h)            100.000          99.051
## Number of bins scale              1               1
## 
## Bins Selected                    20              20
## Average Bin Length            5.000           4.953
## Median Bin Length             5.000           4.953
## 
## IMSE-optimal bins                11               7
## Mimicking Variance bins          40              75
## 
## Relative to IMSE-optimal:
## Implied scale                 1.818           2.857
## WIMSE variance weight         0.143           0.041
## WIMSE bias weight             0.857           0.959
```
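
---

# Practical Class 02: RD Plots

A minimal sketch of how the last rows of this summary relate to each other, using only the numbers printed above. The weight formula `1 / (1 + scale^3)` is an assumption made here because it reproduces the printed values; it is not taken from the package source.

```r
# Values read off the summary above (left and right of the cutoff)
bins_selected <- c(left = 20, right = 20)
bins_imse     <- c(left = 11, right = 7)

# Implied scale: selected number of bins relative to the IMSE-optimal number
implied_scale <- bins_selected / bins_imse

# Weights on variance and squared bias implied by that scale
# (assumed formula, chosen because it matches the printed output)
w_variance <- 1 / (1 + implied_scale^3)
w_bias     <- 1 - w_variance

print(round(rbind(implied_scale, w_variance, w_bias), 3))
```

With 20 bins per side instead of 11 and 7, the plot is more wiggly than the IMSE-optimal one: most of the weight goes to keeping bias low.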

---

# Practical Class 02: RD Plots (R versus Stata)

## R command

```r
rdplot(dset$Y,dset$X,nbins = c(20, 20),binselect = "es")
```

## Stata equivalent

```stata
rdplot Y X, nbins(20 20) binselect(es)
```

---

# Practical Class 02: RD Plots

<g>Important:</g> method of bin choice `$\rightarrow \neq$` results (`binselect = "es"` vs `binselect = "qs"`)
.center[
<img src="figs/cattaneo22.png" style="width: 80%" />
]
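
---

# Practical Class 02: RD Plots

A rough sketch of what the two binning rules do, written in base R on simulated data (the score `x` and the choice of 5 bins are illustrative assumptions, and this is not how `rdplot` is implemented internally).

```r
set.seed(1)
# Simulated control-side score on (-100, 0), bunched near the cutoff (illustration only)
x <- -100 * rbeta(500, 1, 4)
n_bins <- 5

# Evenly spaced ("es"): bins of equal length
edges_es <- seq(-100, 0, by = 100 / n_bins)

# Quantile spaced ("qs"): interior edges at sample quantiles, so counts are equal
probs_inner <- (1:(n_bins - 1)) / n_bins
edges_qs <- c(-100, quantile(x, probs_inner, names = FALSE), 0)

count_es <- table(cut(x, edges_es, include.lowest = TRUE))
count_qs <- table(cut(x, edges_qs, include.lowest = TRUE))

print(diff(edges_es))  # equal bin lengths
print(count_es)        # unequal number of observations per bin
print(diff(edges_qs))  # unequal bin lengths
print(count_qs)        # equal number of observations per bin
```

Evenly spaced bins can be very noisy where data are sparse; quantile-spaced bins show where the observations actually are.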

---

# Practical Class 02: RD Plots

.pull-left[
<g>Number of bins:</g> can be optimally retrieved

- Method: minimize the integrated MSE (IMSE)

- How to? Omit `nbins`

```r
rdplot(
 dset$Y,
 dset$X,
 nbins = c(20, 20), # number of bins
 binselect = "es", # bin type
 y.lim = c(0,25) # limits y axis
 )
```
]
.pull-right[
<img src="figs/class02/unnamed-chunk-16-1.png" width="90%" />
]

---

# Practical Class 02: RD Plots

.pull-left[
<g>Number of bins:</g> can be optimally retrieved

- Method: minimize the integrated MSE (IMSE)

- How to? Omit `nbins`

```r
rdplot(
 dset$Y,
 dset$X,
 # nbins = c(20, 20),
 binselect = "es", # bin type
 y.lim = c(0,25) # limits y axis
 )
```
]
.pull-right[
<img src="figs/class02/unnamed-chunk-18-1.png" width="90%" />
]

---

```r
output <- rdplot(dset$Y, dset$X, binselect = "es", y.lim = c(0,25)) # nbins omitted
summary(output)
```

```
## Call: rdplot
## 
## Number of Obs.                 2629
## Kernel                      Uniform
## 
## Number of Obs.                 2314             315
## Eff. Number of Obs.            2314             315
## Order poly. fit (p)               4               4
## BW poly. fit (h)            100.000          99.051
## Number of bins scale              1               1
## 
## Bins Selected                    11               7
## Average Bin Length            9.091          14.150
## Median Bin Length             9.091          14.150
## 
## IMSE-optimal bins                11               7
## Mimicking Variance bins          40              75
## 
## Relative to IMSE-optimal:
## Implied scale                 1.000           1.000
## WIMSE variance weight         0.500           0.500
## WIMSE bias weight             0.500           0.500
```

---

# Practical Class 02: RD Plots

.pull-left[
<g>Polynomial order:</g> standard is `p = 4`

- What if quadratic?

```r
rdplot(
 dset$Y,
 dset$X,
 nbins = c(20, 20), # number of bins
* p = 2, # polynomial order
 binselect = "es", # bin type
 y.lim = c(0,25) # limits y axis
 )
```
]
.pull-right[
<img src="figs/class02/unnamed-chunk-21-1.png" width="90%" />
]

---

# Practical Class 02: RD Plots

.pull-left[
<g>Polynomial order:</g> standard is `p = 4`

- What if linear?

```r
rdplot(
 dset$Y,
 dset$X,
 nbins = c(20, 20), # number of bins
* p = 1, # polynomial order
 binselect = "es", # bin type
 y.lim = c(0,25) # limits y axis
 )
```
]
.pull-right[
<img src="figs/class02/unnamed-chunk-23-1.png" width="90%" />
]

---

# Wrapping Up: RD Plots

<g>RD Plots:</g> crucial ingredient in RD work

- Motivates and illustrates, visually, the experiment at hand

- Flexible and credible implementation with `rdplot`

- More details: Cattaneo et al. (2019), chapter 3

---

.center[

# RD Point Estimation of `$\tau_{SRD}$`

]

---

# Practical Class 02: RD Point Estimation

RD point estimation of `$\tau_{SRD}$` with `rdrobust`

- Local polynomial approximation on both sides of `$c \rightarrow \hat{\tau}_{SRD} = \hat{\mu}_{+} - \hat{\mu}_{-}$`

## R version

```r
rdrobust(dset$Y, dset$X, kernel = "uniform", p = 1, h = 20)
```
## Stata version

```stata
rdrobust Y X, kernel(uniform) p(1) h(20)
```
Important inputs: polynomial order `$p$`, bandwidth `$h$`, and kernel weights `$K()$`
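
---

# Practical Class 02: RD Point Estimation

To see what `$\hat{\mu}_{+} - \hat{\mu}_{-}$` means in practice, here is a minimal base-R sketch on simulated data. The data-generating process, the true jump `tau = 3`, and all variable names are assumptions for illustration. It reproduces only the point estimate logic (uniform kernel, `p = 1`), not the robust inference of `rdrobust`.

```r
set.seed(2)
n   <- 2000
x   <- runif(n, -100, 100)          # simulated score, cutoff c = 0
tau <- 3                            # true jump at the cutoff (simulation only)
y   <- 10 + 0.05 * x + tau * (x >= 0) + rnorm(n)

h <- 20                             # bandwidth
left  <- x <  0 & x >= -h           # control observations inside the window
right <- x >= 0 & x <=  h           # treated observations inside the window

# Uniform kernel and p = 1: plain OLS of y on x, separately on each side
fit_left  <- lm(y ~ x, subset = left)
fit_right <- lm(y ~ x, subset = right)

# With c = 0, each intercept is the fitted value at the cutoff
mu_minus <- unname(coef(fit_left)[1])
mu_plus  <- unname(coef(fit_right)[1])
tau_hat  <- mu_plus - mu_minus
print(tau_hat)
```

If the cutoff were not zero, the score would first be recentered as `x - c` so that the intercepts are still evaluated at the cutoff.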

---

# Practical Class 02: RD Point Estimation

.pull-left[
Polynomial `$p$`: local approach `$\rightarrow p \leq 2$`
- Robustness to `$p>2$`
]
.pull-right[
<img src="figs/cattaneo10.png" style="width: 100%" />
]

---

# Practical Class 02: RD Point Estimation

.pull-left[
Polynomial `$p$`: local approach `$\rightarrow p \leq 2$`
- Robustness to `$p>2$`

Kernel function `$K()$`: weights on regression
- Usually triangular (uniform = no weights)
]
.pull-right[
<img src="figs/cattaneo10.png" style="width: 100%" />
]

---

# Practical Class 02: RD Point Estimation

.pull-left[
Polynomial `$p$`: local approach `$\rightarrow p \leq 2$`
- Robustness to `$p>2$`

Kernel function `$K()$`: weights on regression
- Usually triangular (uniform = no weights)
]
.pull-right[
<img src="figs/cattaneo23.png" style="width: 100%" />
]
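
---

# Practical Class 02: RD Point Estimation

A small sketch of the two kernels as weighting functions, and of how triangular weights would enter a local linear fit through `lm(..., weights = )`. The helper names and the simulated data are hypothetical and only meant to illustrate the idea.

```r
# Triangular kernel: weight 1 at the cutoff, falling linearly to 0 at |x - c| = h
triangular_weights <- function(x, c = 0, h) pmax(0, 1 - abs(x - c) / h)
# Uniform kernel: every observation inside the window gets the same weight
uniform_weights <- function(x, c = 0, h) as.numeric(abs(x - c) <= h)

x_grid <- c(-25, -20, -10, 0, 10, 20, 25)
print(rbind(x          = x_grid,
            triangular = triangular_weights(x_grid, h = 20),
            uniform    = uniform_weights(x_grid, h = 20)))

# Weighted local linear fits on simulated data (true jump of 3, illustration only)
set.seed(3)
x <- runif(2000, -100, 100)
y <- 10 + 0.05 * x + 3 * (x >= 0) + rnorm(2000)
w <- triangular_weights(x, h = 20)

fit_left  <- lm(y ~ x, weights = w, subset = x <  0 & w > 0)
fit_right <- lm(y ~ x, weights = w, subset = x >= 0 & w > 0)
tau_hat   <- unname(coef(fit_right)[1] - coef(fit_left)[1])
print(tau_hat)
```

Observations close to the cutoff count more under the triangular kernel, which is why it is the usual choice for point estimation.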

---

# Practical Class 02: RD Point Estimation

.pull-left[
Polynomial `$p$`: local approach `$\rightarrow p \leq 2$`
- Robustness to `$p>2$`

Kernel function `$K()$`: weights on regression
- Usually triangular (uniform = no weights)

Bandwidth `$h$`: bias-variance trade-off
- Data-driven optimal choice:
`$$h^* = \arg\min_{h} \left[ \text{bias}(\hat{\tau}_{SRD}(h))^2 + \text{variance}(\hat{\tau}_{SRD}(h)) \right]$$`
]
.pull-right[
<img src="figs/cattaneo10.png" style="width: 100%" />
]
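
---

# Practical Class 02: RD Point Estimation

The trade-off can be illustrated with a short simulation. This is only a sketch: the curved data-generating process, the true jump of 3, the bandwidth grid, and the helper `rd_local_linear()` are all assumptions, and the standard errors are plain OLS ones rather than those reported by `rdrobust`.

```r
set.seed(4)
n <- 5000
x <- runif(n, -100, 100)
# Curved conditional mean on the treated side, so a linear fit over a wide window is biased
y <- 10 + 0.05 * x + 0.002 * x^2 * (x >= 0) + 3 * (x >= 0) + rnorm(n, sd = 2)

# Local linear estimate (uniform kernel) and a rough OLS standard error for a given h
rd_local_linear <- function(h) {
  d   <- data.frame(y = y, x = x, treated = as.numeric(x >= 0))[abs(x) <= h, ]
  fit <- lm(y ~ treated * x, data = d)   # separate intercept and slope on each side
  est <- summary(fit)$coefficients["treated", ]
  c(h = h, n_eff = nrow(d), tau_hat = est[["Estimate"]], se = est[["Std. Error"]])
}

results <- t(sapply(c(5, 10, 20, 50, 100), rd_local_linear))
print(round(results, 3))
```

Small `h` uses few observations (noisy but close to the truth on average); large `h` uses many observations (precise, but the linear fit drifts away from the true jump).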

---

# Practical Class 02: RD Point Estimation

<g>Illustration</g> with Meyersson (2014)

- Linear and quadratic polynomials

- Uniform (no weights) and triangular kernels

- Bandwidth `$h = 20$` (only elections with victory margins between -20% and 20%)

```r
reg <- rdrobust(dset$Y, dset$X, kernel = "uniform", p = 1, h = 20)
summary(reg)
```

---

```
## Sharp RD estimates using local polynomial regression.
## 
## Number of Obs.                 2629
## BW type                      Manual
## Kernel                      Uniform
## VCE method                       NN
## 
## Number of Obs.                 2314          315
## Eff. Number of Obs.             608          280
## Order est. (p)                    1            1
## Order bias  (q)                   2            2
## BW est. (h)                  20.000       20.000
## BW bias (b)                  20.000       20.000
## rho (h/b)                     1.000        1.000
## Unique Obs.                    2314          315
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     2.927     1.235     2.371     0.018     [0.507 , 5.347]     
##         Robust         -         -     1.636     0.102    [-0.582 , 6.471]     
## =============================================================================
```

---

```
## Sharp RD estimates using local polynomial regression.
## 
## Number of Obs.                 2629
## BW type                      Manual
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                 2314          315
## Eff. Number of Obs.             608          280
## Order est. (p)                    1            1
## Order bias  (q)                   2            2
## BW est. (h)                  20.000       20.000
## BW bias (b)                  20.000       20.000
## rho (h/b)                     1.000        1.000
## Unique Obs.                    2314          315
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     2.937     1.343     2.187     0.029     [0.305 , 5.569]     
##         Robust         -         -     1.379     0.168    [-1.117 , 6.414]     
## =============================================================================
```

---

```
## Sharp RD estimates using local polynomial regression.
## 
## Number of Obs.                 2629
## BW type                      Manual
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                 2314          315
## Eff. Number of Obs.             608          280
## Order est. (p)                    2            2
## Order bias  (q)                   3            3
## BW est. (h)                  20.000       20.000
## BW bias (b)                  20.000       20.000
## rho (h/b)                     1.000        1.000
## Unique Obs.                    2314          315
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     2.649     1.921     1.379     0.168    [-1.117 , 6.414]     
##         Robust         -         -     0.420     0.674    [-3.969 , 6.135]     
## =============================================================================
```

---

# Practical Class 02: RD Point Estimation

<g>Illustration</g> with Meyersson (2014)

- Linear polynomial

- Triangular kernel

- ~~Bandwidth `$h = 20$` (only elections with victory margins between -20% and 20%)~~

- Data-driven choice of bandwidth `$h$`: omit `h = 20`, set `bwselect`
  - `mserd`: a common `$h$` on both sides that minimizes the MSE
  - `msetwo`: a different `$h$` on each side

```r
reg <- rdrobust(dset$Y, dset$X, kernel = "triangular", p = 1, bwselect = "mserd")
summary(reg)
```

---

```
## Sharp RD estimates using local polynomial regression.
## 
## Number of Obs.                 2629
## BW type                       mserd
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                 2314          315
## Eff. Number of Obs.             529          266
## Order est. (p)                    1            1
## Order bias  (q)                   2            2
## BW est. (h)                  17.240       17.240
## BW bias (b)                  28.576       28.576
## rho (h/b)                     0.603        0.603
## Unique Obs.                    2311          315
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     3.020     1.427     2.116     0.034     [0.223 , 5.816]     
##         Robust         -         -     1.776     0.076    [-0.309 , 6.276]     
## =============================================================================
```

---

```
## Sharp RD estimates using local polynomial regression.
## 
## Number of Obs.                 2629
## BW type                      msetwo
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                 2314          315
## Eff. Number of Obs.             607          267
## Order est. (p)                    1            1
## Order bias  (q)                   2            2
## BW est. (h)                  19.967       17.360
## BW bias (b)                  32.279       29.729
## rho (h/b)                     0.619        0.584
## Unique Obs.                    2311          315
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     2.969     1.391     2.134     0.033     [0.243 , 5.695]     
##         Robust         -         -     1.810     0.070    [-0.245 , 6.152]     
## =============================================================================
```

---

# Wrapping Up: RD Point Estimation

<g>RD Estimation:</g> easily implemented with `rdrobust`

- Powerful toolbox for RD estimation under `$\neq$` parameters

- Parameter choices matter: results can be sensitive to them

- Yet to be seen: <o>**how to validate**</o> an RD design?

- Tomorrow!

---

.center[

# Take-Home/Assignment Exercises

]

---

# Take-Home/Assignment Questions (2/4)

.pull-left[

<g>Part 1:</g> use `data_meyersson.csv`

Replicate the following:

- Standard RD Plot, equally sized bins

]
.pull-right[
<img src="figs/class02/unnamed-chunk-33-1.png" width="90%" />
]

---

# Take-Home/Assignment Questions (2/4)

.pull-left[

<g>Part 1:</g> use `data_meyersson.csv`

Replicate the following:

- Standard RD Plot, equally sized bins

- Then, zoom in to `$h=25$` with a linear polynomial

- And estimate `$\hat{\tau}_{SRD}$` (uniform `$K()$`)

]
.pull-right[
<img src="figs/class02/unnamed-chunk-34-1.png" width="90%" />
]

---

# Take-Home/Assignment Questions (2/4)

.pull-left[

<g>Part 1:</g> use `data_meyersson.csv`

Replicate the following:

- Standard RD Plot, equally sized bins

- Then, zoom in to `$h=25$` with a linear polynomial

- And estimate `$\hat{\tau}_{SRD}$` (uniform `$K()$`)

**Final challenge**: retrieve `$\hat{\mu}_{+} - \hat{\mu}_{-} = \hat{\tau}_{SRD}$`

- With the separate (local) linear regressions (`lm`)

]
.pull-right[
<img src="figs/class02/unnamed-chunk-35-1.png" width="90%" />
]

---

# Take-Home/Assignment Questions (2/4)

.pull-left[

<g>Part 2:</g> Alix-García et al. (2013) "*The Ecological Footprint of Poverty Alleviation: Evidence from Mexico's Oportunidades Program*"

- Did these cash transfers lead to deforestation?

- Program: <o>threshold based</o> on a marginality index

- Eligible municipalities: those with index `$> -1.2$`

]
.pull-right[
<img src="figs/alix1.png" style="width: 100%" />
]

---

# Take-Home/Assignment Questions (2/4)

.pull-left[

With `data_alixgarcia.csv`
 - Outcome: `pctdefor`
 - Score: `indice95`
 
<g>RD Plot:</g>

- Optimal number of bins

]
.pull-right[
<img src="figs/class02/unnamed-chunk-36-1.png" width="90%" />
]

---

# Take-Home/Assignment Questions (2/4)

.pull-left[

With `data_alixgarcia.csv`
 - Outcome: `pctdefor`
 - Score: `indice95`
 
<g>RD Plot:</g>
 - Unrestricted (optimal number of bins)
 - With 20 bins in each side
 
Then, <o>RD Point Estimate</o>
 - Use `$h=1$`
 - Multiply `pctdefor * 1e6`

]
.pull-right[
<img src="figs/class02/unnamed-chunk-37-1.png" width="90%" />
]

---

```
## Sharp RD estimates using local polynomial regression.
## 
## Number of Obs.                58587
## BW type                      Manual
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                 3639        54948
## Eff. Number of Obs.            3571        12279
## Order est. (p)                    1            1
## Order bias  (q)                   2            2
## BW est. (h)                   1.000        1.000
## BW bias (b)                   1.000        1.000
## rho (h/b)                     1.000        1.000
## Unique Obs.                    3639        54948
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional   399.298   151.453     2.636     0.008   [102.455 , 696.141]   
##         Robust         -         -     0.980     0.327  [-197.815 , 593.287]   
## =============================================================================
```

---

# Take-Home/Assignment Questions (2/4)

.pull-left[

<g>Part 3:</g> Lalive (2008) "*How do extended benefits affect unemployment duration? A regression discontinuity approach*"

- Similar to Britto (2022); benefits `$\leftrightarrow$` unemployment duration

- Austria: extension of benefits in specific regions

- Stronger effects for women

]
.pull-right[
<img src="figs/lalive1.png" style="width: 100%" />
]

---

# Take-Home/Assignment Questions (2/4)

.pull-left[

With `data_lalive.csv`

- Outcome: `unemployment_duration`

- What is the score `$X_i$` and cutoff `$c$`?

Produce

1\. RD Plot that resembles the paper's

- Warning message: discrete score!
  
2\. What is `$\hat{\tau}_{SRD}$`, and how would you interpret it?

]
.pull-right[

```
## [1] "Mass points detected in the running variable."
```

<img src="figs/class02/unnamed-chunk-39-1.png" width="90%" />
]

---

.center[
# Thank you and

# see you tomorrow!
]

---

# References

- Alix-Garcia, J., McIntosh, C., Sims, K.R. and Welch, J.R., 2013. The ecological footprint of poverty alleviation: evidence from Mexico's Oportunidades program. *Review of Economics and Statistics*, 95(2), pp.417-435.

- Lalive, R., 2008. How do extended benefits affect unemployment duration? A regression discontinuity approach. *Journal of Econometrics*, 142(2), pp.785-806.

- Meyersson, E., 2014. Islamic Rule and the Empowerment of the Poor and Pious. *Econometrica*, 82(1), pp.229-269.