Overview

You can download the replication package here (about 1GB).

General structure

The main (root) folder contains the following folders:

  • programs/: contains all Stata/R codes that replicate the results of the paper.

  • data/inputs/: contains the data inputs used by the replication codes.

  • data/outputs/: where all processed data is stored.

  • results/: where all results (figures and tables) are stored.

The root folder also contains the LOCUST_README.html file.

Replicating the results

Replicating the results consists of running a sequence of R and Stata codes stored in programs/. In what follows, we provide an overview of the sequential workflow of the replication and a brief description of each code. Sections 2 to 4 provide further details.

The sequence of codes is the following (a minimal sketch of the R side of this pipeline appears after the list):

  1. replicate.R: Runs the following pre-processing routines that clean the raw data:
    • process_FAO.R: processes the FAO-DLIS locust data.
    • process_DHS_clusters.R: processes the DHS GIS data (clusters).
    • process_OMA.R: processes the OMA market crop price data.
    • process_DHS_BR.R: processes the DHS Birth Records (BR), which contain all children/mother/household data.
  2. replicate.do: Runs the following .do files that replicate the main empirical results of the paper:
    • reg_prepare.do: prepares the pre-processed data for the regressions.
    • reg_do.do: runs the main regressions of the paper.
  3. desc_stats.R: reproduces figures and maps.
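
For orientation, the R side of this pipeline amounts to sourcing the four pre-processing routines in order. The sketch below is illustrative only (the working-directory path is a placeholder):

    # Sketch of step 1: set the root folder, then source the routines in order
    setwd("~/locust_replication")              # placeholder path
    source("programs/process_FAO.R")           # locust/spraying events
    source("programs/process_DHS_clusters.R")  # DHS cluster GIS data
    source("programs/process_OMA.R")           # market crop prices
    source("programs/process_DHS_BR.R")        # birth records
    # steps 2 and 3: replicate.do (in Stata) and desc_stats.R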

Throughout, several inputs are needed (data and code). We describe them next.

Inputs¹

  • programs/load_functions.R: loads several functions used to process data (e.g. distance-calculating algorithms).
  • data/inputs/DHS_BR/: raw birth records of the 2006 (MLBR53FL) and 2012 (MLBR6AFL) waves of the DHS surveys in Mali.
  • data/inputs/DHS_clusters/: raw geocoded data with the coordinates of the DHS clusters of all surveys.
  • data/inputs/DLIS_FAO/*: raw SWARMS/FAO-DLIS data (Cressman, 1997) with the locations of locust-related events worldwide.
  • data/inputs/OMA/*: raw market price data in Mali from OMA.
  • data/inputs/others/access_50k/: geographical data with access (distance) to nearby towns from the GAM project (Uchida and Nelson, 2010).
  • data/inputs/others/countries/countries.R: shapefile of the world in R format.
  • data/inputs/others/grump-v1-settlement-points-rev01-csv/: coordinates of the main towns used as inputs by the GAM project.
  • data/inputs/others/MLI_adm/: shapefiles of level 1, 2, and 3 administrative units in Mali.
  • data/inputs/others/prio/: raw static PRIO-GRID data (Tollefsen et al., 2012).
  • data/inputs/others/spei/: raw SPEI data (Vicente-Serrano et al., 2010).
  • results/figures/figure_A_1.jpg: maps of breeding and invasion areas based on Cressman and Stefanski (2016).
  • results/figures/figure_A_2.jpg: plot of historical locust invasions based on Cressman and Stefanski (2016).

Pre-processing data

The code replicate.R runs all pre-processing codes that clean the raw data. Hence, one only needs to run this code after setting the right working directory in line 12.² In what follows, we provide more details of the tasks executed by replicate.R:
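
As a hedged illustration of that setup (the package names below are placeholders, not the script's actual list), lines 12 and 37-42 follow the standard set-directory-and-install pattern:

    # Illustrative setup: working directory (line 12) and missing CRAN
    # libraries (lines 37-42); the package names here are placeholders
    setwd("~/locust_replication")
    pkgs    <- c("sp", "foreign", "data.table")
    missing <- pkgs[!pkgs %in% rownames(installed.packages())]
    if (length(missing) > 0) install.packages(missing)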

First, it runs process_FAO.R, which cleans the raw locust-related data (locust events and anti-locust spraying events) and aggregates them into the processed data files data/outputs/FAO/swarmdata_full.rdata and data/outputs/FAO/swarm_spraydata_full.rdata.
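
As a rough sketch of that aggregate-and-save step (the input file pattern and the object name swarmdata are assumptions; only the output path is the package's):

    # Sketch: stack the cleaned locust-event files and store them as .rdata
    files <- list.files("data/inputs/DLIS_FAO", pattern = "\\.csv$",
                        full.names = TRUE)
    swarmdata <- do.call(rbind, lapply(files, read.csv))
    save(swarmdata, file = "data/outputs/FAO/swarmdata_full.rdata")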

Then, it runs process_DHS_clusters.R. This code first loads and aggregates all DHS geocoded cluster shapefiles and then matches them to the locust data. This is where the treatment assignment at the cluster level is added. Subsequently, it calculates the shortest Euclidean distance from each DHS cluster to the closest town. It stores the final processed data in data/outputs/DHS_clusters/clusters_withtreat.rdata.
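
The distance step boils down to a nearest-neighbour search. A self-contained toy sketch (coordinates and column names are made up; the actual code uses the helpers loaded by programs/load_functions.R):

    # Toy sketch: shortest Euclidean distance from each cluster to a town
    clusters <- data.frame(x = c(-8.0, -4.2), y = c(12.6, 16.3))
    towns    <- data.frame(x = c(-7.9, -5.0), y = c(12.5, 13.9))
    clusters$dist_town <- sapply(seq_len(nrow(clusters)), function(i) {
      min(sqrt((towns$x - clusters$x[i])^2 + (towns$y - clusters$y[i])^2))
    })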

Subsequently, it runs process_OMA.R. This code first geocodes each market (i.e., adds its coordinates) and adds treatment status (using the locust data) and several covariates (e.g., travel distance to the nearest town). Then, it matches the prices to weather (SPEI) data. This last step is time-consuming and takes about 6-8 hours on a regular computer. It stores the final processed price data in data/outputs/OMA/oma_price_with_data* (where * stands for the .R and .dta formats).
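
For intuition, the match pairs each market-month price with the SPEI value of the nearest grid cell in the same month. A toy sketch with made-up names and values:

    # Toy sketch: match each market-month price to the nearest SPEI cell
    prices <- data.frame(market = "A", x = -8.0, y = 12.6,
                         year = 2004, month = 9, price = 150)
    spei   <- data.frame(x = c(-8.25, -7.25), y = c(12.75, 12.75),
                         year = 2004, month = 9, spei = c(-1.2, -0.8))
    prices$spei <- sapply(seq_len(nrow(prices)), function(i) {
      g <- spei[spei$year == prices$year[i] & spei$month == prices$month[i], ]
      g$spei[which.min(sqrt((g$x - prices$x[i])^2 + (g$y - prices$y[i])^2))]
    })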

Finally, it executes process_DHS_BR.R. It loads each DHS Birth Record and calculates, for each child in the relevant sample, several characteristics such as in-utero treatment and in-utero average crop prices. This time-consuming task takes about 10 hours on a regular computer. It stores the final processed birth record data in data/outputs/DHS_BR/MLBR_withdata.csv.
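
The per-child computation can be illustrated as follows (a toy sketch; the column names and the nine-month in-utero window are assumptions, with dates as DHS century-month codes):

    # Toy sketch: a child is treated in utero if a locust event hit its
    # cluster during the nine months before birth (CMC = century-month code)
    births <- data.frame(child = 1:2, cluster = c(10, 11),
                         birth_cmc = c(1253, 1260))
    events <- data.frame(cluster = 10, event_cmc = 1248)
    births$treated <- sapply(seq_len(nrow(births)), function(i) {
      any(events$cluster == births$cluster[i] &
          events$event_cmc >= births$birth_cmc[i] - 9 &
          events$event_cmc <  births$birth_cmc[i])
    })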

Main results: regressions, descriptive statistics, tables, and figures

The code replicate.do runs all .do files that reproduce the regression results of the paper. Hence, one only needs to run this code after setting the right working directory in line 23.³ One also needs to install the .ado programs that estimate Spatial HAC standard errors.⁴ In what follows, we provide more details of the tasks executed by replicate.do:
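
Installing those programs is just a matter of copying two files into Stata's personal ado directory (see footnote 4). A small R sketch of that copy step, using the macOS location from the footnote (adjust the destination for your system):

    # Sketch: copy the Spatial HAC ado files to Stata's personal ado folder
    dest <- "~/Application Support/Stata/ado/personal"
    dir.create(dest, recursive = TRUE, showWarnings = FALSE)
    file.copy(c("programs/ols_spatial_HAC.ado",
                "programs/reg2hdfespatial-id1id2.ado"), dest)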

First, it executes reg_prepare.do, which performs the final processing and cleaning of the data/outputs/DHS_BR/MLBR_withdata.csv dataset. That includes generating the treatment dummies, calculating average in-utero weather shocks and exposure to anti-locust spraying, and more. It also exports data/outputs/DHS_BR/MLBR_for_descstats.csv, a subset of the birth records used to calculate descriptive statistics.

Then, it runs reg_do.do, which reproduces and stores all regression-related results (i.e., tables and plots) in results/figures/ and results/tables/. It also prints on Stata’s console the results of several Hausman tests described in the paper.
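
The Spatial HAC standard errors themselves come from the Stata .ado programs described in footnote 4. Purely as a hedged illustration of the same idea in R (an alternative implementation, not the paper's code), the fixest package offers Conley standard errors, assuming the data carry latitude/longitude columns it can detect:

    # Illustrative R analogue (NOT the paper's code): fixed effects with
    # Conley spatial standard errors via the fixest package
    library(fixest)
    set.seed(1)
    df <- data.frame(outcome = rnorm(200), treated = rbinom(200, 1, 0.5),
                     fe  = rep(1:20, each = 10),
                     lat = runif(200, 10, 17), lon = runif(200, -12, -4))
    est <- feols(outcome ~ treated | fe, data = df,
                 vcov = conley(100))  # 100 km distance cutoff
    summary(est)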

To conclude the replication, one needs to run desc_stats.R after setting the right working directory in line 7. It reproduces the final elements of the paper, such as tables with descriptive statistics and additional maps.
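
As a flavour of what the descriptive-statistics step computes (a sketch; the input path is the file exported by reg_prepare.do, while the output file name is hypothetical):

    # Sketch: means and SDs of the numeric variables in the desc-stats subset
    br    <- read.csv("data/outputs/DHS_BR/MLBR_for_descstats.csv")
    num   <- br[sapply(br, is.numeric)]
    stats <- t(sapply(num, function(v)
      c(mean = mean(v, na.rm = TRUE), sd = sd(v, na.rm = TRUE))))
    write.csv(stats, "results/tables/desc_stats_sketch.csv")  # hypothetical name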

Notes


  1. Some data inputs (denoted with *) cannot be published online but are available upon request.

  2. Note that lines 37-42 install all the required R libraries from CRAN.

  3. Note that lines 14-20 install the required libraries.

  4. To do so, one needs to add programs/ols_spatial_HAC.ado and programs/reg2hdfespatial-id1id2.ado to Stata’s local ado/ folder. On a Mac, that is usually ~/Application Support/Stata/ado/personal/. The reference for the Spatial HAC standard errors (and their installation) is here.