class: center, middle, inverse, title-slide .title[ # Lecture 01 - Regression Discontinuity Designs
in Development Economics ] .subtitle[ ## Theory and Practice ] .author[ ### Bruno Conte ] .institute[ ### Barcelona School of Economics ] .date[ ### 08/Jul/2024 ] --- <style> g { color: rgb(0,130,155) } o { color: rgb(240,138,33) } </style> # RDD in Development Economics: This Course - Introduce conceptual and practical aspects of <o>**regression discontinuity designs**</o> - What are regression discontinuity designs (RDD)? - How are they used in <span style="color: rgb(0,130,155)">development economics</span> research? - What is the best practice, and which tools do we need to work with them? -- - Main goal: <u>concepts + tools</u> = practice with real-world applications - Concepts: theory behind causal inference and RDD assumptions - Tools: practical applications with `Stata` and/or `R`/`RStudio` --- # RDD in Development Economics: This Course - <u>Main philosophy</u>: a course by a development economist interested in <g>policy evaluation</g> - Rather than a course by an econometrician/statistician! - This course benefitted from many people - Joan Llull, Vincenzo Scrutinio, Diogo Britto, Leandro Magalhães, **Matias Cattaneo et al. (most importantly)**, ... --- # RDD in Development Economics: This Course <u>This course:</u> how to use RDD to empirically answer <o>**research questions of interest**</o>. .pull-left[ **You will learn** - Concepts and theory of RDD - Basic `R`/`Stata` programming - Practitioner-oriented best RDD practices - Introductory data visualization (possibly) ] .pull-right[ **You will not learn** - All state-of-the-art RDD methods - Inference-related RD aspects (e.g., SE) - To write efficient `R`/`Stata` code<sup>*</sup> - To solve every possible RDD/data problem<sup>*</sup> ] .footnote[ [*] This is up to you. ] --- # RDD in Development Economics: This Course ## Main References 1. Cattaneo, M.D., Idrobo, N. and Titiunik, R., 2019. A practical introduction to regression discontinuity designs: Foundations. *Cambridge University Press*. 2. 
Cattaneo, M.D., Idrobo, N. and Titiunik, R., 2023. A practical introduction to regression discontinuity designs: Extensions. arXiv preprint arXiv:2301.08958. ## Useful reading 3\. Lee, D.S. and Lemieux, T., 2010. Regression discontinuity designs in economics. *Journal of economic literature*, 48(2), pp.281-355. 4\. Angrist, J.D. and Pischke, J.S., 2009. Mostly harmless econometrics: An empiricist's companion. Princeton university press. --- # RDD in Development Economics: About us ## Instructor: Bruno Conte - Assistant Professor of Economics, Universitat Pompeu Fabra since Jan 2024 - Previous positions: Università di Bologna, World Bank, IGC/LSE - Research interests: Development and environmental economics - Tools: Causal inference, international trade, and spatial economics - Approach: <o>theory + empirics</o> = answers to policy-relevant research questions -- ## What about you? --- # RDD in Development Economics: Schedule 1. Introduction to treatment effects and RDD **[08/Jul/2024]** - Potential outcomes, treatment effects, and selection bias - Introduction to RDD and survey of applications <br> <br> 2. Sharp RD Design (Basics) **[09/Jul/2024]** 3. Sharp RD Design (Extensions) + Fuzzy RD Design **[10/Jul/2024]** 4. Spatial Regression Discontinuities **[11/Jul/2024]**<br> <br> 5. Additional RDD Methods **[12/Jul/2024]** --- # RDD in Development Economics: Logistics - Lectures: every Mon to Fri from 11:30-13:30 - 1 hour + (optional) 10' break + 45 mins (potentially mixed backgrounds: **be patient**!) 
- Course material: [<u>webpage</u>](https://brunoconteleite.github.io/07-rdd-bse/) and [<u>syllabus</u>](https://www.dropbox.com/scl/fi/pckgr26fquyk85irt8we4/24DE05_RDD_Development_Syllabus.pdf?rlkey=pf9s2uov6dmpg9803960ob1y8&st=sdf04mgu&raw=1) ## Practical sessions: - Every Mon to Fri from 16:15-17:45 (your own computer + `Stata` and/or `RStudio`) ## End-of-course evaluation (ECTS-equivalent): - Final "project" with conceptual and replication exercises (more about it next) -- - Any <o>**questions**</o>? --- # RDD in Development Economics: Logistics Final project/end-of-course evaluation: <o>**not mandatory!!**</o> Who is interested in it? - Students that need ECTS-equivalency (i.e., credits for the course) - Important: it is OK if you requested equivalence but do not need it in practice! - Students that want feedback on skills acquired during the course -- ## Structure At the end of practical classes, I will provide (theoretical and practical) exercises - Final project: compilation of the answers to all exercises in <u>one single document</u>! - Ideal setting: done progressively, handed in on Friday July 12 --- <br><br><br><br><br><br> .center[ # Getting started: why causal inference for # development economics? ] --- # Intro: Why Causal Inference for Development? Economists use **causal inference** to give empirical content to economic relations > *<g>Causal inference</g> consists of establishing a cause-effect relationship between two economic outcomes. It aims at answering "what if" type of questions* It is an important tool for <o>**policy evaluation**</o>, especially in low-income, developing contexts -- - Do micro-credit programs foster development in rural economies? - What are the consequences of cash transfer policies in developing settings? - Can family-targeted policies attenuate gender inequality and/or intimate violence? 
Governments and institutions invest billions every year in such (and other) policies - Causal inference tools can quantify their effectiveness (cost-benefit) --- # Causal Inference: Structural versus Treatment Effects > *The <o>structural approach</o> fits theory-based (economic) models of individual behaviour to data for policy evaluation (through simulations)* - **Main advantage**: versatility - Use structural models as a "mini-lab" for counterfactual simulations (ex-ante and ex-post) - Main limitations: <g>stylized assumptions</g>, complexity, and computational cost -- - <u>Examples</u>: - Income transfers and intimate violence (Ramos, 2016) - School construction and rural-urban market outcomes (Hsiao, 2023) - Rural insurance and agricultural productivity (Pietrobon, 2024) --- # Causal Inference: Structural versus Treatment Effects This course focuses on the <o>**treatment effects**</o> approach > *Treatment effects evaluate the ex-post impact of an existing policy by comparing targeted (treatment) individuals with untargeted (control) ones* - **Main advantage**: transparency (it is a comparison exercise that needs few assumptions) - It revolutionized evidence-based economics (Angrist and Pischke, 2010) - Main limitations: limited counterfactual answers, external validity and policy-invariance -- - <u>Examples</u> (see Michael Kremer, Esther Duflo, Abhijit Banerjee, ...) 
- Minimum wage policies and employment (Card and Krueger, 1993) - School construction and labor market outcomes (Duflo, 2001) - Income transfers and development (Angelucci and De Giorgi, 2009) --- # Treatment Effects and Causal Inference: Crucial Questions Again, treatment effects = comparison of outcome distributions (treated versus control) > *Main challenge for causal interpretation: ensure that the <g>control group is a reasonable counterfactual</g> for the outcomes of the treatment group in the absence of treatment* Causal inference tools overcome this by addressing the following questions: -- 1. What is the causal relation of interest? 2. Which <o>research design</o> (or identification strategy) captures the causal effect of interest? - Randomized control trials, instrumental variables, regression discontinuities, ... 3. What is the ideal inference method? - Linear regressions, GMM, ... --- <br><br><br><br><br><br> .center[ # Basics of Treatment Effects: # Potential Outcomes and Selection Bias ] --- # Treatment Effects, Potential Outcomes, and Selection Bias Consider a population of `\(i\)` individuals potentially treated by a policy. 
Then, define: - `\(Y_{i}(1) \equiv\)` individual `\(i\)`'s outcome if treated, and `\(Y_{i}(0)\)` otherwise - `\(T_i \equiv\)` treatment indicator (i.e., equal to one if `\(i\)` is treated) `\(Y_{i}(1), Y_{i}(0)\)` are (unobserved) **potential outcomes** that relate to observed outcomes `\(Y_i\)` as `$$Y_i = T_i Y_{i}(1) + (1- T_i) Y_{i}(0)$$` Since only one of the two potential outcomes is ever observed for each `\(i\)`, it is impossible to infer *individual treatment effects* `$$\tau_i = Y_{i}(1) - Y_{i}(0)$$` --- # Treatment Effects, Potential Outcomes, and Selection Bias Hence, we focus on inferring different **features of the distribution** of `\(\tau_i\)` - <u>Average treatment effect (ATE)</u>: `$$\tau_{ATE} = \mathbb{E}\left[ Y_{i}(1) - Y_{i}(0) \right]$$` - <u>Average treatment effect on the treated (ATT)</u>: `$$\tau_{ATT} = \mathbb{E}\left[ Y_{i}(1) - Y_{i}(0) | T_i = 1 \right]$$` These (theoretical) causal effects have different (policy) implications - Does the treated group `\((T_i = 1)\)` <g>differ systematically</g> from the control? - Does that affect the policy implications of the `\(\tau_{ATT}\)` effects? --- # Treatment Effects, Potential Outcomes, and Selection Bias In practice, sample comparisons are made with <g>observed outcomes</g> `\(Y_i\)`. 
Define `$$\tau^S = \bar{Y}_T - \bar{Y}_C = \frac{1}{N_1} \sum_{i = 1}^N T_i Y_i - \frac{1}{N_0} \sum_{i = 1}^N (1-T_i) Y_i$$` as the (sample) comparison between treated/control averages (with `\(N_1\)` treated and `\(N_0\)` control individuals) and the *population counterpart* `$$\tau = \mathbb{E}\left[ Y_i | T_i = 1 \right] - \mathbb{E}\left[ Y_i | T_i = 0 \right]$$` -- One can show that `\(\tau\)` can be rewritten as `$$\tau=\underbrace{\mathbb{E}\left[ Y_{i}(1) - Y_{i}(0) | T_i = 1 \right]}_{\tau_{ATT}} + \underbrace{\mathbb{E}\left[ Y_{i}(0) | T_i = 1 \right] - \mathbb{E}\left[ Y_{i}(0) | T_i = 0 \right]}_{\text{selection bias}}$$` --- # Treatment Effects, Potential Outcomes, and Selection Bias > *<g>Selection bias</g> is the systematic difference between treatment and control groups in terms of potential outcomes in the absence of treatment* <u>Motivating examples</u>: - Does access to schools improve labor market outcomes (Duflo, 2001)? - Which type of individuals enroll in university? - Does rural microfinance affect household-level consumption smoothing, asset holdings, or occupational mobility (Kaboski and Townsend, 2012; Banerjee et al., 2015)? - Which villages are covered by (micro) financial institutions? - ... How to identify `\(\tau_{ATT}\)` with `\(\tau\)` (i.e., with a comparison of sample averages)? --- <br><br><br><br><br><br> .center[ # Identification of Treatment Effects: # Basic Assumptions ] --- # Assumptions for Identification: Independence Obviously, in the absence of selection bias, `\(\tau = \tau_{ATT} \text{ (and possibly} = \tau_{ATE} \text{)}\)` - Which are the needed identification assumptions? 
Simplest case: <g>**independence of potential outcomes**</g> `$$Y_{i}(1), Y_{i}(0) \quad \bot \quad T_i \quad \forall i$$` Formally, it is represented by the equality of cumulative distributions `\(F\)` `$$F(Y_{i}(1) | T_i = 1) = F(Y_{i}(1)) \\ F(Y_{i}(0) | T_i = 0) = F(Y_{i}(0))$$` Under this assumption, `\(\tau_{ATE} = \tau_{ATT} = \tau \rightarrow \tau^S\)` is an unbiased estimate of `\(\tau_{ATE}\)` --- # Assumptions for Identification: Independence <o>Randomized Control Trials</o> ensure unconditional independence (see Banerjee, Duflo, Kremer) <u>Illustration</u>: Angelucci and De Giorgi (2009) "*How Do Cash Transfers Affect [..] Consumption?*" .center[<img src="figs/angelucci1.png" style="width: 75%" />] --- # Assumptions for Identification: Conditional Independence Another case is the <g>**conditional independence assumption**</g> `$$Y_{i}(1), Y_{i}(0) \quad \bot \quad T_i | X_i \quad \forall i,$$` where `\(X_i\)` is a vector of covariates. Formally, conditional independence is equivalent to `$$F(Y_{i}(j) | X_i) = F(Y_{i}(j) | T_i = j, X_i) = F(Y_{i} | T_i = j, X_i) \quad \text{for } j = 0,1$$` Here, "controlling for `\(X_i\)`" identifies `\(\tau_{ATE}\)`: `$$\tau_{ATE} = \mathbb{E} \left[ Y_{i}(1) - Y_{i}(0) \right] = \int \left( \mathbb{E} \left[ Y_{i} | T_i = 1, X_i \right] - \mathbb{E} \left[ Y_{i} | T_i = 0, X_i \right] \right) dF(X_i)$$` This setting is also known as matching: contrasting treated and controls for each value of `\(X_i\)` - Usual approach in <o>natural experiments</o> (quasi-random variation in `\(T_i\)`) --- # Assumptions for Identification: Lack of Independence and IV Finally, we might face the situation where our policy `\(T_i\)` is not (conditionally) exogenous `$$Y_{i}(1), Y_{i}(0) \quad \not\perp \quad T_i | X_i \quad \forall i.$$` In this case, we need a `\(D_i\)` variable that does satisfy the <g>independence assumption</g> `$$Y_{i}(1), Y_{i}(0) \quad \perp \quad D_i | X_i \quad \forall i$$` and also satisfies the <g>relevance 
condition</g> `$$D_i \quad \not\perp \quad T_i | X_i \quad \forall i.$$` --- # Assumptions for Identification: Lack of Independence and IV `\(D_i\)` is an <o>instrumental variable</o> (identifies treatment effects for a subset of the population) `\begin{align} Y_i &= \tau T_i + U_i \quad \quad \text{ } \text{ [structural model]} \\ Y_i &= \rho D_i + V_i \quad \quad \text{ [reduced form]} \\ T_i &= \alpha D_i + O_i \quad \quad \text{[first stage]} \\ \end{align}` One can show that `$$\tau_{IV} = \frac{\rho}{\alpha} = \frac{\mathbb{E}\left[Y_i | D_i = 1\right] - \mathbb{E}\left[Y_i | D_i = 0\right]}{\mathbb{E}\left[T_i | D_i = 1\right] - \mathbb{E}\left[T_i | D_i = 0\right]} \equiv \text{ Wald Estimand}$$` - This will be <g>important for fuzzy regression</g> discontinuity designs later on --- <br><br><br><br><br><br> .center[ # Treatment Effects and Linear Regressions ] --- # Treatment Effects and Linear Regressions The potential outcomes notation has a close link with <g>linear regression models</g> `\begin{align} Y_i &= T_i Y_{i}(1) + (1-T_i) Y_{i}(0) \equiv \underbrace{\mathbb{E} \left[ Y_{i}(0) \right]}_{\beta_0} + \underbrace{\left[ Y_{i}(1) - Y_{i}(0) \right]}_{\beta_i} T_i + \underbrace{Y_{i}(0) - \mathbb{E} \left[ Y_{i}(0) \right]}_{U_i} \end{align}` An OLS regression estimates `\(\mathbb{E} \left[ \bar{Y}_T - \bar{Y}_C\right] = \mathbb{E} \left[ Y_{i}(1) - Y_{i}(0) \right] = \tau_{ATE}\)` if `\(T_i \perp U_i\)` -- Failure of this condition is a close analogue to selection bias. To see that, suppose `$$Y_i := \beta_0 + \beta T_i + \gamma C_i + U_i \equiv \beta_0 + \beta T_i + \bar{U}_i$$` One can show that the OLS estimate `\(\hat{\beta}\)` from the regression that omits `\(C_i\)` has an <o>omitted variable bias</o> `$$\hat{\beta} \overset{p}{\to} \beta + \gamma \frac{cov(C_i,T_i)}{var(T_i)} \neq \beta$$` --- <br><br><br><br><br><br> .center[ # Taking stock: why Treatment Effects # for RD Designs in Development Economics? 
] --- # Taking Stock: Treatment Effects and RDD This course is about the <o>potentials and applications of RDD</o> in development economics - How does it relate to treatment effects and all the assumptions above? In developing settings, policies or experiments such that `\(Y_{i}(1), Y_{i}(0) \perp T_i | X_i\)` <g>are rare, costly, and/or ethically unfeasible</g> - It is also hard to find the right `\(D_i\)` instrument (again, more on that later) --- # RDD and Treatment Effects **Regression discontinuities** can be an alternative in these settings - They exploit the fact that policy eligibility is usually <g>threshold-based</g> - This creates a (quasi-) natural experiment around the threshold - Just above/below the threshold, individuals should be similar -- From the 1990s onwards, part of the <o>credibility revolution</o> (Angrist and Pischke, 2010) as a versatile tool for causal inference "Explosion" of RD studies (causally) evaluating policies in economics > *Lee and Lemieux (2010) "<g>Regression discontinuity designs in economics</g>", Journal of Economic Literature* --- # RDD and Treatment Effects Comprehensive survey of the (then, and still) state-of-the-art RDD methods and applications > *Cattaneo, M.D., Idrobo, N. and Titiunik, R., (2019) "A practical introduction to regression discontinuity designs: <o>Foundations</o>", Cambridge University Press* - State-of-the-art techniques and tools [<u>`rd` packages</u>](https://rdpackages.github.io/) > *Cattaneo, M.D., Idrobo, N. and Titiunik, R., (2023) "A practical introduction to regression discontinuity designs: <o>Extensions</o>"* - Extensions on theory and tools (local randomization, fuzzy RD, ...) 
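---

# RDD and Treatment Effects: A Simulated Sketch

The threshold idea can be illustrated with simulated data. The following is a minimal sketch, not part of the course material: the data-generating process, the function name `simulate_sharp_rd`, and all parameter values (`cutoff`, `bandwidth`, `tau`) are assumptions made for illustration, and Python is used only for convenience (the practical sessions use `Stata` and/or `R`). It contrasts the naive treated/control difference in means with the jump between two local linear fits at the cutoff, using a fixed, hand-picked bandwidth as a simplification.

```python
import numpy as np


def simulate_sharp_rd(n=20_000, cutoff=0.0, bandwidth=0.2, tau=1.0, seed=0):
    """Simulate threshold-based treatment and compare two estimators.

    All names and parameter values are hypothetical, chosen for illustration.
    """
    rng = np.random.default_rng(seed)

    # Running variable (e.g., an eligibility score) and threshold-based treatment
    x = rng.uniform(-1.0, 1.0, size=n)
    t = (x >= cutoff).astype(float)

    # Potential outcomes: Y(0) rises with the score, so treated and control
    # groups differ systematically even without treatment (selection bias)
    y0 = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=n)
    y1 = y0 + tau

    # Observed outcome: Y = T * Y(1) + (1 - T) * Y(0)
    y = t * y1 + (1.0 - t) * y0

    # Naive comparison of treated and control sample averages
    naive = y[t == 1].mean() - y[t == 0].mean()

    # Local comparison: separate linear fits on each side of the cutoff,
    # using only observations within the bandwidth
    left = (x < cutoff) & (x >= cutoff - bandwidth)
    right = (x >= cutoff) & (x <= cutoff + bandwidth)
    slope_l, intercept_l = np.polyfit(x[left] - cutoff, y[left], deg=1)
    slope_r, intercept_r = np.polyfit(x[right] - cutoff, y[right], deg=1)

    # Jump between the two fitted lines at the cutoff
    rd_jump = intercept_r - intercept_l
    return naive, rd_jump


naive, rd_jump = simulate_sharp_rd()
print(f"naive difference in means: {naive:.2f}")
print(f"jump at the cutoff:        {rd_jump:.2f}")
```

Under these assumed parameters, the naive difference should land near 3 (the true effect of 1 plus a selection term of about 2, since `\(Y_{i}(0)\)` increases with the score), while the jump at the cutoff should be close to the true effect of 1.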
--- # Lee and Lemieux (2010): RDD Examples .center[<img src="figs/lee1.png" style="width: 75%" />] --- .center[<img src="figs/lee7.png" style="width: 75%" />] .center[<img src="figs/lee2.png" style="width: 75%" />] --- .center[<img src="figs/lee7.png" style="width: 75%" />] .center[<img src="figs/lee3.png" style="width: 75%" />] --- .center[<img src="figs/lee7.png" style="width: 75%" />] .center[<img src="figs/lee4.png" style="width: 75%" />] --- .center[<img src="figs/lee7.png" style="width: 75%" />] .center[<img src="figs/lee5.png" style="width: 75%" />] .center[<img src="figs/lee6.png" style="width: 75%" />] --- <br><br><br><br><br><br> .center[ # References ] --- # References - Angelucci, M. and De Giorgi, G., 2009. Indirect effects of an aid program: how do cash transfers affect ineligibles' consumption?. *American economic review*, 99(1), pp.486-508. - Angrist, J.D. and Pischke, J.S., 2010. The credibility revolution in empirical economics: How better research design is taking the con out of econometrics. *Journal of economic perspectives*, 24(2), pp.3-30. - Banerjee, A., Duflo, E., Glennerster, R. and Kinnan, C., 2015. The miracle of microfinance? Evidence from a randomized evaluation. *American Economic Journal: Applied Economics*, 7(1), pp.22-53. - Card, D. and Krueger, A.B., 1993. Minimum wages and employment: A case study of the fast food industry in New Jersey and Pennsylvania. - Campbell, D.T. and Cook, T.D., 1979. Quasi-experimentation. Chicago, IL: *Rand Mc-Nally*, 1(1), pp.1-384. --- # References - Hirano, K., Imbens, G.W. and Ridder, G., 2003. Efficient estimation of average treatment effects using the estimated propensity score. *Econometrica*, 71(4), pp.1161-1189. - Hsiao, A., 2023. Educational Investment in Spatial Equilibrium: Evidence from Indonesia. Working Paper. - Kaboski, J.P. and Townsend, R.M., 2012. The impact of credit on village economies. *American Economic Journal: Applied Economics*, 4(2), pp.98-133. - Lee, D.S. 
and Lemieux, T., 2010. Regression discontinuity designs in economics. *Journal of economic literature*, 48(2), pp.281-355. - Pitt, M.M. and Khandker, S.R., 1998. The impact of group-based credit programs on poor households in Bangladesh: Does the gender of participants matter?. *Journal of political economy*, 106(5), pp.958-996. - Pietrobon, D., 2024. The dual role of insurance in input use: Mitigating risk versus curtailing incentives. *Journal of Development Economics*, 166, p.103203. --- # References - Ramos, A., 2016. Household decision making with violence: Implications for conditional cash transfer programs. *Working Paper*