class: inverse, center, title-slide, middle

<style>
.title-slide .remark-slide-number {
  display: none;
}
</style>

# Lecture 01: Spatial Data
## Theory and Tools (a.k.a. GIS Tools Lab.)
### <img src="figs/unibo_logo_white.png" style="width: 15%" /><br><br>Bruno Conte
### 18/Sep/2023

---

# Spatial data in economics: this course

- Introduce students to conceptual and practical aspects of <span style="color: rgb(0,114,178)">**spatial data**</span><br>

  - What is spatial (geographical) data?
  - How is it used in <span style="color: rgb(0,114,178)">**research in economics**</span>?
  - Which tools (i.e. computer systems/languages) do we need to work with it?

- Main goal: <span style="color: rgb(213,94,0)">concepts + tools</span> = practice with real-world data
  - Concepts: types and formats of spatial data
  - Tools: programming in ``R`` and ``RStudio``

- Course's **main philosophy**: a course by an <span style="color: rgb(0,158,115)">economist working with spatial data</span>
  - Rather than a course by a spatial data specialist!

---

# Spatial data in economics: this course

This course is about how we, (in principle) **economists**, can use spatial data to empirically answer <span style="color: rgb(213,94,0)">research questions of interest to us</span>.

.pull-left[
**You will learn**

- What spatial data is and how it is applied in economic research
- Basic `R` programming
- Most common spatial data operations
- Introductory (spatial) data visualization
]

.pull-right[
**You will not learn**

- All state-of-the-art GIS tools available in `R`
- To write efficient `R` code<sup>*</sup>
- To handle big data<sup>*</sup>
- To solve every possible problem
]

.footnote[
[*] This is up to you.
]

---

# Spatial data in economics: this course

## Good references

1. Donaldson, D. and Storeygard, A., 2016. The view from above: Applications of satellite data in economics. *Journal of Economic Perspectives*, *30(4)*, pp.171-98.

2. Lovelace, R., Nowosad, J. and Muenchow, J., 2019.
Geocomputation with R. Chapman and Hall/CRC.

3. Pebesma, E., 2018. Simple Features for R: Standardized Support for Spatial Vector Data. *The R Journal*, *10(1)*, pp.439-446, https://doi.org/10.32614/RJ-2018-009

4. Wickham, H. and Grolemund, G., 2016. R for data science: import, tidy, transform, visualize, and model data. O'Reilly Media, Inc.

---

# Spatial data in economics: schedule

1. Introduction to (spatial) data and programming in `R` [18.Sep.2023]
   - Introduction to spatial data and examples in economics
   - Basic `R` programming: set up and practice

2. Spatial data basics: vector data <span style="color: rgb(0,114,178)">+ assignment</span> [21.Sep.2023]

3. Basic operations with vector data <span style="color: rgb(0,114,178)">+ assignment</span> [25.Sep.2023]

4. Geometry operations and miscellanea <span style="color: rgb(213,94,0)">+ follow-up</span> [28.Sep.2023]

5. Raster data and operations <span style="color: rgb(0,114,178)">+ assignment</span> [02.Oct.2023]<br><br>

6. <span style="color: rgb(0,158,115)">Take-home exam</span> [03.Nov.2023]

---

# Spatial data in economics: evaluation

1. Class participation (10%)

2. Practical assignments (3 x 10%, in teams)

3. <span style="color: rgb(0,158,115)">Take-home exam</span> (60%, `pdf` by email):
   - Research idea: spatial data + economics = **research question**
   - Replication of tasks: data + tools = empirical motivation
   - Make sure that you register for it (on almaesami)!

- Any <span style="color: rgb(213,94,0)">questions</span>?

---

# Spatial data in economics: logistics

- Classes: every Tue from 17(:15)-20:00
  - 1 hour theory + 15' break + 1 hour practice (hands-on)

- Course material: <u>[webpage](https://brunoconteleite.github.io/02-gis-unibo/)</u>

- Office hours: Tue from 16:00-17:00 (write me an email!)

- Potentially mixed background: **be patient**!

## Hands-on sessions:

- Your own computer + `RStudio`:
  - We will set it up together!

- Any <span style="color: rgb(213,94,0)">questions</span>?
---
class: inverse, center, middle
count: false

# Getting started: what is Spatial Data?

---
background-image: url(https://www.edmaps.com/europe_1789_excelsior_d.jpg)
background-size: cover
class: center, bottom, inverse

# Europe in 1789 (before the French Revolution)

---
background-image: url(https://www.edmaps.com/europe_1812_excelsior_d.jpg)
background-size: cover
class: center, bottom, inverse

# Europe in 1812 (before the French Invasion of Russia)

---
background-image: url(https://www.edmaps.com/europe_1815_excelsior_d.jpg)
background-size: cover
class: center, bottom, inverse

# Europe in 1815 (after the Congress of Vienna)

---
background-image: url(https://hsat.space/wp-content/uploads/2020/08/brazil_planet_-6.35-53.55-768x512-1.jpg)
background-size: cover
class: center, bottom, inverse

# Satellite picture of fires (and deforestation) in the Brazilian Amazon

???

Source: https://hsat.space/satellites-deforestation/

---
background-image: url(https://maaproject.org/wp-content/uploads/2021/01/maaproject.org-maap-132-amazon-deforestation-hotspots-2020-HS2-PFL-Amz-Biog2020-GLAD-Confirm-2-5-10-v3-Eng.jpg)
background-size: contain
class: center, top, inverse

# Deforestation in the Brazilian Amazon

???

Source: https://maaproject.org/2021/amazon-hotspots-2020/

---
background-image: url(https://www.researchgate.net/publication/343842128/figure/fig1/AS:928229292511232@1598318517654/A-comparison-of-the-inter-and-intra-urban-variability-of-slums-Image-a-shows-a-typical.ppm)
background-size: contain
class: center, bottom, inverse

# Urban slums in India

???

Source: https://www.researchgate.net/figure/A-comparison-of-the-inter-and-intra-urban-variability-of-slums-Image-a-shows-a-typical_fig1_343842128

---

# What is Spatial Data?

- Data/information that has a <span style="color: rgb(213,94,0)">geographical attribute</span>
  - **Much more** than coordinates on a standard dataset
  - Polygons, areas, distances, height, overlaying, intersections, ...
--

.pull-left[
- Common aspect: **<span style="color: rgb(0,158,115)">unstructured data</span>** (i.e. unconventional data format)

<img src="https://previews.123rf.com/images/in8finity/in8finity1506/in8finity150600031/41436580-earth-globe-icons-set-.jpg" style="width: 40%" />
]

.pull-right[
- Our goal: manipulate it into the **<span style="color: rgb(0,114,178)">structure</span>** required by research

<img src="https://cdn-icons-png.flaticon.com/512/6925/6925245.png" style="width: 40%" />
<img src="https://cdn-icons-png.flaticon.com/512/5198/5198977.png" style="width: 40%" />
]

---

# What is GIS?

- GIS = <span style="color: rgb(0,114,178)">Geographic Information Systems</span>
  - (old) Systems used to manipulate/process spatial data (**1980s**)
  - 1990s: rise of user-friendly **desktop software** (ArcGIS, QGIS)

- <span style="color: rgb(213,94,0)">Data Science revolution:</span> full integration of GIS tools into data-processing pipelines; i.e. computer routines that process (potentially spatial) data in <span style="color: rgb(0,158,115)">modern languages</span> (e.g. `R`)

- **Examples:**
  - Firm processing purchases across branches
    - Is revenue larger in branches *closer to public transportation?*
  - HR firm allocating seasonal workers across plants
    - Choose workers based on residence (reduce commuting time)?

---

# Why not GIS in user-friendly desktop software?

It requires human interaction (e.g. clicking, moving files) to <span style="color: rgb(0,114,178)">structure spatial data</span>, which makes the work hard to automate and replicate.

.pull-left[
.center[**GIS in 1980s:**
<img src="https://upload.wikimedia.org/wikipedia/commons/4/42/SYMAP_-_LAB-LOG_1980.png" style="width: 80%" />
]
]

.pull-right[
.center[**ArcGIS/QGIS:**]
<img src="https://i.imgur.com/nuSANEH.jpg" style="width: 120%" />
]

---
class: inverse, center, middle
count: false

# How is Spatial Data used in Economics?

---

# Spatial Data in Economics

- **Motivation:** research questions that require structuring spatial data.
  - Spatial data = unstructured
  - GIS tools: manipulating spatial data into the required structure

- <span style="color: rgb(213,94,0)">Applications in economic research:</span>
  - Cholera in London (Snow, 1856)
  - Colonial institutions and development in Peru (Dell, 2010)
  - Railroads and welfare in India (Donaldson, 2018)
  - Climate change and urbanization in Africa (Henderson et al., 2017)

---

# Application 01: John Snow's Cholera Maps in Soho (London)

.pull-left[
.center[
<img src="https://www.researchgate.net/profile/Maaruf-Lawal/publication/333521032/figure/fig1/AS:764738510278658@1559339277494/John-Snow-map-of-cholera-death-in-London-Gilbert-EW-1958-There-are-three-important.png" style="width: 90%" />
]
]

.pull-right[
- **Cholera outbreak** in the mid-19th century
  - Former theory: transmission by air

- John Snow's hypothesis: <span style="color: rgb(0,114,178)">germ-contaminated water</span>
  - Different rates between locations with different water suppliers
  - Higher rates for those <span style="color: rgb(213,94,0)">supplied by the (polluted) River Thames</span>

- Snow's finding: revolution in public sanitation
]

---

# Application 02: Long-term consequences of the Mita (colonial) system in Peru

.pull-left[
.center[
<img src="figs/mita01.png" style="width: 100%" />
]
]

.pull-right[
- Spanish empire required **forced labor** to work in silver mines (Potosí)
  - Workers from the highlands (Mita regions): resistant to the harsh mine conditions

- Mita boundaries: regions that <span style="color: rgb(213,94,0)">provided more/fewer conscripts</span> (discontinuously!)

- **Dell's findings:** long-lasting <span style="color: rgb(0,158,115)">development differences</span>
  - <span style="color: rgb(0,114,178)">Economic channels:</span> land ownership inequality, fewer public services, ...
]

---
count: false

# Application 02: Long-term consequences of the Mita (colonial) system in Peru

.pull-left[
.center[
<img src="figs/mita02.png" style="width: 80%" />
]
]

.pull-right[
- Spanish empire required **forced labor** to work in silver mines (Potosí)
  - Workers from the highlands (Mita regions): resistant to the harsh mine conditions

- Mita boundaries: regions that <span style="color: rgb(213,94,0)">provided more/fewer conscripts</span> (discontinuously!)

- **Dell's findings:** long-lasting <span style="color: rgb(0,158,115)">development differences</span>
  - <span style="color: rgb(0,114,178)">Economic channels:</span> land ownership inequality, fewer public services, ...
]

---

# Application 03: Transportation integration and welfare in India

.pull-left[
.center[
<img src="figs/donaldson.png" style="width: 100%" />
]
]

.pull-right[
- Vast **expansion of the railroad network** in British colonial India

- Standard trade theory: <span style="color: rgb(0,114,178)">welfare gains from market integration</span>
  - Lack of evidence within countries

- **Donaldson's findings:** improved trade conditions increased welfare
  - <span style="color: rgb(0,158,115)">Integrated remote areas</span> (reduced price gaps, more trade flows)
  - <span style="color: rgb(213,94,0)">Welfare gains</span> (real income) from intraregional trade
]

---

# Application 04: Climate change and urbanization in Africa

.pull-left[
.center[
<img src="figs/henderson01.png" style="width: 100%" />
]
]

.pull-right[
- **Increased dryness in Africa** (worse conditions for agriculture)
  - Question: do agents in affected rural regions <span style="color: rgb(0,114,178)">migrate to cities?</span>

- **Henderson et al.'s findings:** depends on <span style="color: rgb(213,94,0)">industry composition of cities</span>
  - Increased size (**nightlights**) in manufacturing, exporting cities
  - Opposite evidence for market towns (service providers to agriculture)
  - Importance of <span style="color: rgb(0,158,115)">structural
change!</span>
]

---
count: false

# Application 04: Climate change and urbanization in Africa

.pull-left[
.center[
<img src="figs/henderson02.png" style="width: 100%" />
]
]

.pull-right[
- **Increased dryness in Africa** (worse conditions for agriculture)
  - Question: do agents in affected rural regions <span style="color: rgb(0,114,178)">migrate to cities?</span>

- **Henderson et al.'s findings:** depends on <span style="color: rgb(213,94,0)">industry composition of cities</span>
  - Increased size (**nightlights**) in manufacturing, exporting cities
  - Opposite evidence for market towns (service providers to agriculture)
  - Importance of <span style="color: rgb(0,158,115)">structural change!</span>
]

---
count: false

# Application 04: Climate change and urbanization in Africa

.pull-left[
.center[
<img src="figs/henderson03.png" style="width: 100%" />
]
]

.pull-right[
- **Increased dryness in Africa** (worse conditions for agriculture)
  - Question: do agents in affected rural regions <span style="color: rgb(0,114,178)">migrate to cities?</span>

- **Henderson et al.'s findings:** depends on <span style="color: rgb(213,94,0)">industry composition of cities</span>
  - Increased size (**nightlights**) in manufacturing, exporting cities
  - Opposite evidence for market towns (service providers to agriculture)
  - Importance of <span style="color: rgb(0,158,115)">structural change!</span>
]

---
class: inverse, center, middle
count: false

# How to work with spatial data in R?

---

# Working with data (including spatial) in R

- What is `R`?
  - Computer language for statistical computing and graphics
  - Open source, <span style="color: rgb(0,114,178)">free access</span>

- Developers' community (CRAN)
  - Development of **libraries** (packages) for <span style="color: rgb(213,94,0)">specific applications</span>

- `RStudio`: integrated development environment (IDE)
  - <span style="color: rgb(0,158,115)">More user-friendly environment</span> to work with `R`

---

# Working with data (including spatial) in R

- Data work in `R`: one of its <span style="color: rgb(0,114,178)">many capabilities</span>
  - Others include producing documents, slides, or webpages

- **Our approach:**
  1. <span style="color: rgb(0,114,178)">Basic programming and data work</span> in `R` `\(\rightarrow\)` **content of today!**
  2. Working with spatial data in `R` (GIS tools)

- Important: <span style="color: rgb(213,94,0)">focus on basics</span>
  - Introduction of basic concepts/tools rather than a presentation of **all possible** functions
  - Every GIS application is a challenge on its own
  - The goal is to introduce the basics so that you can apply (and improve) them in <span style="color: rgb(0,158,115)">your own application!</span>

---
class: inverse, center, middle
count: false

# R Basics

---

# Basics of programming and data work in R

Open `01_class01.R` on your own computer, where we will cover the following topics. The subsequent slides here are for <span style="color: rgb(0,114,178)">reference only</span>.

.pull-left[
- **Concepts covered:**

  1. `R` basics: environment, main elements (vectors, lists, `data.frame`), libraries

  2. Basic <span style="color: rgb(213,94,0)">data wrangling</span> with ``dplyr``
     - Filtering, mutating, merging

  3. <span style="color: rgb(0,114,178)">Data visualization</span> with `ggplot2`
]

.pull-right[
- **Setting up in `R` (or `RStudio`)**:

```r
# Install packages (only first time)
*# install.packages('data.table')
*# install.packages('tidyverse')
# Load them:
library(data.table)
library(dplyr)
```

Note: warning messages are OK!
]

---

# Basics of programming and data work in R (1/3)

- `R` is a versatile working environment
  - Can handle **different** elements (e.g. datasets, images, texts) simultaneously

- Setting the **local** environment: working directory

```r
getwd() # tells you the current wd
```

```
## [1] "/Users/brunoconteleite/Dropbox/Teaching/02-gis-unibo"
```

- Types of elements in the `R` environment:
  - Vectors, `data.frame()`, `list()`, among (many) others
  - To check (or clean) the current environment: `ls()` (or `rm(list = ls())`)

---

# Basics of programming and data work in R (2/3)

- **Data wrangling:** manipulating raw data with `dplyr`
  - Creating new variables, filtering datasets, arranging, merging, reshaping

- <span style="color: rgb(0,114,178)">Pipe syntax:</span> uses the `%>%` operator. Example of merging datasets:

```r
# Left join with data.table (all.x = TRUE keeps every row of a)
df <- merge.data.table(a, b, by = 'Month', all.x = TRUE)
# is equivalent (up to row order) to:
df <- a %>% left_join(b, by = 'Month')
```

- Same reasoning with many other `dplyr` data-wrangling functions; e.g. `mutate()`, `filter()`, `select()`, `summarise()`, `arrange()`

- Check the documentation <u>[here](https://dplyr.tidyverse.org/)</u>

---

# Basics of programming and data work in R (3/3)

- **Data visualization** in `R` with `ggplot()`. Syntax that maps <span style="color: rgb(0,114,178)">data</span> `\(\rightarrow\)` <span style="color: rgb(213,94,0)">geometry</span> `\(\rightarrow\)` <span style="color: rgb(0,158,115)">visuals</span>

.pull-left[
```r
library(ggplot2)
# Generic template with placeholders; try() only
# keeps the script running (ignore this)
try(
p <- ggplot(data = data) +
  geom_GEOM(mapping = aes(MAPPINGS)) +
  THEME()
)
# Example
p <- ggplot(data = airquality) +
  geom_point(mapping = aes(Wind,Temp, color = Month)) +
  theme_bw()
```

- Check the documentation <u>[here](https://ggplot2.tidyverse.org/)</u>
]

.pull-right[
.center[
<img src="00_class01_files/figure-html/unnamed-chunk-5-1.png" width="75%" />
]
]

---

# Hands-on: your turn! (1/2)

.pull-left[
- Distribution (histogram) of CO2 uptake across plants in US/Canada
  - Distinguish plants by origin (Quebec/Mississippi)
  - Extra: play with different `theme()` parameters of `ggplot()`

- Use the `datasets::CO2` data!
]

.pull-right[
<!-- -->
]

---

# Hands-on: your turn! (2/2)

.pull-left[
- Income vs. Murder rates across US states (scatter plot). Use the `state.x77` dataset (a matrix: convert it with `as.data.frame()` first)
  - Distinguish between high/low density states
  - High density = (Population/Area) > median: use `mutate()`

- Extra: additional geom layer with non-linear relationship? Use `geom_smooth()`

- Can you remove outliers (i.e. states with Income higher than 6,000)? Use `filter()`
]

.pull-right[
<!-- -->
]

---

# References

- Dell, M., 2010. The persistent effects of Peru's mining mita. *Econometrica*, *78(6)*, pp.1863-1903.

- Donaldson, D., 2018. Railroads of the Raj: Estimating the impact of transportation infrastructure. *American Economic Review*, *108(4-5)*, pp.899-934.

- Henderson, J.V., Storeygard, A. and Deichmann, U., 2017. Has climate change driven urbanization in Africa? *Journal of Development Economics*, *124*, pp.60-82.

- Snow, J., 1856. On the mode of communication of cholera. *Edinburgh Medical Journal*, *1(7)*, p.668.
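
---

# Appendix: a `dplyr` pipeline, step by step

A minimal sketch of the pipe syntax from the data-wrangling slide, chaining `filter()`, `mutate()`, `summarise()`, and `arrange()`. It assumes `dplyr` is installed and uses the built-in `airquality` data. The object and column names (`monthly`, `TempC`, `mean_ozone`, `mean_tempC`, `n_days`) are made up for this illustration and are not taken from `01_class01.R`.

```r
library(dplyr)

# airquality ships with base R (package 'datasets');
# columns used here: Ozone, Temp (Fahrenheit), Month
monthly <- airquality %>%
  filter(!is.na(Ozone)) %>%                 # keep days with observed ozone
  mutate(TempC = (Temp - 32) * 5 / 9) %>%   # new variable: Fahrenheit to Celsius
  group_by(Month) %>%                       # one group per month
  summarise(mean_ozone = mean(Ozone),       # collapse each group to one row
            mean_tempC = mean(TempC),
            n_days     = n()) %>%
  arrange(desc(mean_ozone))                 # highest average ozone first

print(monthly)
```

Each verb takes a data frame and returns a data frame, so the steps read top to bottom in the order they are applied.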