<!-- class: inverse, center, title-slide, middle -->
class: center, middle

<style>
.title-slide .remark-slide-number {
  display: none;
}
g { color: rgb(0,130,155) }
r { color: rgb(174,77,41) }
y { color: rgb(177,148,40) }
</style>

# Lecture 01: Geospatial Data Sciences
# and Economic Spatial Models

## <img src="figs/bse_primary_logo.png" style="width: 35%" /><br><br>Bruno Conte

## 08/Jan/2025

---

# Geospatial Data and Spatial Models: This Course

- Introduce students to conceptual and practical aspects of <r>spatial data</r>
  - What is spatial (geographical) data?
  - How is it used in <g>research in economics</g> or business applications?
  - Which tools (i.e. computer systems/languages) do we need to work with it?

--

- Main goal: <y>**concepts + tools**</y> = practice with real-world data
  - Concepts: types and formats of spatial data + <g>economic spatial models</g>
  - Tools: programming in ``R`` and ``RStudio``

--

- Course's **main philosophy**: a course by an <r>economist working with spatial data</r>
  - Rather than a course by a spatial data specialist!

---

# Geospatial Data and Spatial Models: This Course

<u>This course:</u> how to use spatial data to empirically answer <r>research questions of our interest</r>.

.pull-left[
**You will learn**

- What spatial data is and where it is applied (e.g. economic research)
- Basic `R` programming
- Most common spatial data operations
- Introductory (spatial) data visualization
]

.pull-right[
**You will not learn**

- All state-of-the-art tools available in `R`
- To write efficient `R` code<sup>*</sup>
- To handle big data<sup>*</sup>
- To solve every possible data problem
]

.footnote[
[*] This is up to you.
]

---

# Geospatial Data and Spatial Models: This Course

## Main Textbook

1. Lovelace, R., Nowosad, J. and Muenchow, J., 2019. **Geocomputation with R**. Chapman and Hall/CRC.

## Other (useful) References

1. Donaldson, D. and Storeygard, A., 2016. The view from above: Applications of satellite data in economics.
*Journal of Economic Perspectives*, *30(4)*, pp.171-98.
2. Pebesma, E., 2018. Simple Features for R: Standardized Support for Spatial Vector Data. *The R Journal*, *10(1)*, pp.439-446, https://doi.org/10.32614/RJ-2018-009
3. Wickham, H. and Grolemund, G., 2016. R for data science: import, tidy, transform, visualize, and model data. O'Reilly Media, Inc.

---

# Geospatial Data and Spatial Models: Schedule

1. Introduction to (spatial) data and programming in `R` **[08/Jan/2025]**
    - Introduction to spatial data and examples in economics
    - Basic `R` programming: set up and practice
2. Week 2-4: Vector spatial data **[14 - 29/Jan/2025]**
3. Week 5-7: Raster spatial data + (basic) interactive tools **[05 - 19/Feb/2025]**
4. Week 8-10: Spatial models and applications with data **[25/Feb - 12/Mar/2025]**<br> <br>
5. <y>Take-home exam</y> **[26/Mar/2025]**

---

# Geospatial Data and Spatial Models: Evaluation

1. Class participation (10%)
2. Practical assignments (3 x 10%, in teams)
3. <g>Take-home exam</g> (60%, `pdf` by email):
    - Research idea: spatial data + problem = **research question**
    - Replication of tasks: data + tools = empirical motivation

- Any <r>questions</r>?

---

# Geospatial Data and Spatial Models: Logistics

- Classes: every Mon from 08:30-10:30 am
  - 1 hour concepts + (optional) 10' break + 40 mins practice (hands-on)
- Course material: [<u>webpage</u>](https://brunoconteleite.github.io/08-geospatial-ds-bse/) and [<u>syllabus</u>](https://www.dropbox.com/scl/fi/eig4vhhsirxx85dqex656/23dm017-geospatial-data-science-and-economic.pdf?rlkey=5grrq1v9icj8pjl031jcrsvg5&raw=1)
- Office hours: Mon 11:00 am - 12:00 pm (write me an email!)
- Potentially mixed background: **be patient**!

## Hands-on sessions:

- Your own computer + `RStudio`:
  - We will set it up together!
- Any <r>questions</r>?

---
class: inverse, center, middle

# Getting started: what is Spatial Data?
---
background-image: url(https://www.edmaps.com/europe_1789_excelsior_d.jpg)
background-size: cover
class: center, bottom, inverse

# Europe in 1789 (before the French Revolution)

---
background-image: url(https://www.edmaps.com/europe_1812_excelsior_d.jpg)
background-size: cover
class: center, bottom, inverse

# Europe in 1812 (before the French Invasion of Russia)

---
background-image: url(https://www.edmaps.com/europe_1815_excelsior_d.jpg)
background-size: cover
class: center, bottom, inverse

# Europe in 1815 (after the Congress of Vienna)

---
background-image: url(https://hsat.space/wp-content/uploads/2020/08/brazil_planet_-6.35-53.55-768x512-1.jpg)
background-size: cover
class: center, bottom, inverse

# Satellite picture of fires (and deforestation) in the Brazilian Amazon

???

Source: https://hsat.space/satellites-deforestation/

---
background-image: url(https://maaproject.org/wp-content/uploads/2021/01/maaproject.org-maap-132-amazon-deforestation-hotspots-2020-HS2-PFL-Amz-Biog2020-GLAD-Confirm-2-5-10-v3-Eng.jpg)
background-size: contain
class: center, top, inverse

# Deforestation in the Brazilian Amazon

???

Source: https://maaproject.org/2021/amazon-hotspots-2020/

---
background-image: url(https://www.researchgate.net/publication/343842128/figure/fig1/AS:928229292511232@1598318517654/A-comparison-of-the-inter-and-intra-urban-variability-of-slums-Image-a-shows-a-typical.ppm)
background-size: contain
class: center, bottom, inverse

# Urban slums in India

???

Source: https://www.researchgate.net/figure/A-comparison-of-the-inter-and-intra-urban-variability-of-slums-Image-a-shows-a-typical_fig1_343842128

---

# What is Spatial Data?

- Data/information that has a <r>geographical attribute</r>
- **Much more** than coordinates on a standard dataset
  - Polygons, areas, distances, height, overlaying, intersections, ...

--

.pull-left[
- Common aspect: **<y>unstructured data</y>** (i.e.
unconventional data format)

<img src="https://previews.123rf.com/images/in8finity/in8finity1506/in8finity150600031/41436580-earth-globe-icons-set-.jpg" style="width: 40%" />
]

.pull-right[
- Our goal: manipulate it into the **<g>structure</g>** required by research

<img src="https://cdn-icons-png.flaticon.com/512/6925/6925245.png" style="width: 40%" /> <img src="https://cdn-icons-png.flaticon.com/512/5198/5198977.png" style="width: 40%" />
]

---

# What is GIS?

- GIS = <g>Geographic Information Systems</g>
  - (old) Systems used to manipulate/process spatial data (**1980's**)
  - 1990's: rise of user-friendly, **desktop software** (ArcGIS, QGIS)
- <r>Data Science revolution:</r> full integration of GIS tools into data-processing pipelines; i.e. computer routines that process (potentially spatial) data in <y>modern languages</y> (e.g. `R`)
- **Examples:**
  - Firm processing purchases across branches
      - Is revenue larger in branches *closer to public transportation?*
  - HR firm allocating seasonal workers across plants
      - Choose workers based on residence (reduce commuting time)?

---

# Why not GIS in desktop, user-friendly software?

Desktop tools require human interaction (e.g. clicking, moving files) to <g>structure spatial data</g>.

.pull-left[
.center[**GIS in 1980's:**
<img src="https://upload.wikimedia.org/wikipedia/commons/4/42/SYMAP_-_LAB-LOG_1980.png" style="width: 80%" />
]
]

.pull-right[
.center[**ArcGIS/QGIS:**]
<img src="https://i.imgur.com/nuSANEH.jpg" style="width: 120%" />
]

---
class: inverse, center, middle

# How is Spatial Data used in Economics?

---

# Spatial Data in Economics

- **Motivation:** research questions that require structuring spatial data.
  - Spatial data = unstructured
  - GIS tools: manipulating spatial data to the required structure
- <r>Applications in economic research:</r>
  - Cholera in London (Snow, 1856)
  - Colonial institutions and development in Peru (Dell, 2010)
  - Railroads and welfare in India (Donaldson, 2018)
  - Climate change and urbanization in Africa (Henderson et al., 2017)
  - Business analytics for asset exposure to climate risks

---

# Application 01: John Snow's Cholera Maps in Soho (London)

.pull-left[
.center[
<img src="https://www.researchgate.net/profile/Maaruf-Lawal/publication/333521032/figure/fig1/AS:764738510278658@1559339277494/John-Snow-map-of-cholera-death-in-London-Gilbert-EW-1958-There-are-three-important.png" style="width: 90%" />
]
]

.pull-right[
- **Cholera outbreak** in the mid-19th century
- Prevailing theory at the time: transmission by air
- John Snow's hypothesis: <g>germ-contaminated water</g>
  - Different rates between locations with different water suppliers
  - Higher rates for those <r>supplied by (polluted) Thames River</r>
- Snow's finding: revolution in public sanitation
]

---

# Application 02: Long-term consequences of the Mita (colonial) system in Peru

.pull-left[
.center[
<img src="figs/mita01.png" style="width: 100%" />
]
]

.pull-right[
- Spanish empire required **forced labor** to work on silver mines (Potosí)
  - Workers from the highlands (Mita regions): resistant to the harsh mine conditions
- Mita boundaries: regions that <r>provided more/less conscripts</r> (discontinuously!)
- **Dell's findings:** long-lasting <y>development differences</y>
  - <g>Economic channels:</g> land ownership inequality, fewer public services, ...
]

---

# Application 02: Long-term consequences of the Mita (colonial) system in Peru

.pull-left[
.center[
<img src="figs/mita02.png" style="width: 80%" />
]
]

.pull-right[
- Spanish empire required **forced labor** to work on silver mines (Potosí)
  - Workers from the highlands (Mita regions): resistant to the harsh mine conditions
- Mita boundaries: regions that <r>provided more/less conscripts</r> (discontinuously!)
- **Dell's findings:** long-lasting <y>development differences</y>
  - <g>Economic channels:</g> land ownership inequality, fewer public services, ...
]

---

# Application 03: Transportation integration and welfare in India

.pull-left[
.center[
<img src="figs/donaldson.png" style="width: 100%" />
]
]

.pull-right[
- Vast **expansion of railroad network** in British colonial India
- Standard trade theory: <g>welfare gains from market integration</g>
  - Lack of evidence within countries
- **Donaldson's findings:** improved trade conditions increased welfare
  - <y>Integrated remote areas</y> (reduced price gaps, more trade flows)
  - <r>Welfare gains</r> (real income) from intraregional trade
]

---

# Application 04: Climate change and urbanization in Africa

.pull-left[
.center[
<img src="figs/henderson01.png" style="width: 100%" />
]
]

.pull-right[
- **Increased dryness in Africa** (worse conditions for agriculture)
- Question: do agents in affected rural regions <g>migrate to cities?</g>
- **Henderson et al.'s findings:** depends on <r>industry composition of cities</r>
  - Increased size (**nightlights**) in manufacturing, exporting cities
  - Opposite evidence for market towns (service providers to agriculture)
- Importance of <y>structural change!</y>
]

---

# Application 04: Climate change and urbanization in Africa

.pull-left[
.center[
<img src="figs/henderson02.png" style="width: 100%" />
]
]

.pull-right[
- **Increased dryness in Africa** (worse conditions for agriculture)
- Question: do agents in affected rural regions <g>migrate to cities?</g>
- **Henderson et al.'s findings:** depends on <r>industry composition of cities</r>
  - Increased size (**nightlights**) in manufacturing, exporting cities
  - Opposite evidence for market towns (service providers to agriculture)
- Importance of <y>structural change!</y>
]

---

# Application 04: Climate change and urbanization in Africa

.pull-left[
.center[
<img src="figs/henderson03.png" style="width: 100%" />
]
]

.pull-right[
- **Increased dryness in Africa** (worse conditions for agriculture)
- Question: do agents in affected rural regions <g>migrate to cities?</g>
- **Henderson et al.'s findings:** depends on <r>industry composition of cities</r>
  - Increased size (**nightlights**) in manufacturing, exporting cities
  - Opposite evidence for market towns (service providers to agriculture)
- Importance of <y>structural change!</y>
]

---

# Application 05: Business analytics for ESG risk

.pull-left[
.center[
<img src="figs/prometeia1.png" style="border: 2px solid gray; width: 90%" />
]
]

.pull-right[
- **ESG =** Environmental, Social, and Governance-related risks
- Question: how to assess <g>asset exposure</g> to ESG-related risks?
- **Business product:** Management tool for <r>climate risks</r> and advanced analytics
  - Balance sheet data (assets) + macro model (transition scenarios)
  - Asset exposure: location + <y>**climate risk data**</y>
- Link <u>[here](https://www.dropbox.com/scl/fi/3nkieje7n4aj8qjm2h9wp/prometeia_report.pdf?rlkey=tr4h4m9ub9yjz10b2nhfzav0i&raw=1)</u>
]

---

# Application 05: Business analytics for ESG risk

.pull-left[
.center[
<img src="figs/prometeia2.png" style="border: 2px solid gray; width: 80%" />
]
]

.pull-right[
- **ESG =** Environmental, Social, and Governance-related risks
- Question: how to assess <g>asset exposure</g> to ESG-related risks?
- **Business product:** Management tool for <r>climate risks</r> and advanced analytics
  - Balance sheet data (assets) + macro model (transition scenarios)
  - Asset exposure: location + <y>**climate risk data**</y>
- Link <u>[here](https://www.dropbox.com/scl/fi/3nkieje7n4aj8qjm2h9wp/prometeia_report.pdf?rlkey=tr4h4m9ub9yjz10b2nhfzav0i&raw=1)</u>
]

---

# Application 05: Business analytics for ESG risk

.pull-left[
.center[
<img src="figs/prometeia3.png" style="border: 2px solid gray; width: 80%" />
]
]

.pull-right[
- **ESG =** Environmental, Social, and Governance-related risks
- Question: how to assess <g>asset exposure</g> to ESG-related risks?
- **Business product:** Management tool for <r>climate risks</r> and advanced analytics
  - Balance sheet data (assets) + macro model (transition scenarios)
  - Asset exposure: location + <y>**climate risk data**</y>
- Link <u>[here](https://www.dropbox.com/scl/fi/3nkieje7n4aj8qjm2h9wp/prometeia_report.pdf?rlkey=tr4h4m9ub9yjz10b2nhfzav0i&raw=1)</u>
]

---
class: inverse, center, middle

# How to work with spatial data in R?

---

# Working with data (including spatial) in R

- What is `R`?
  - Computer language for statistical computing and graphics
  - Open source, <g>free access</g>
- Developers' community (CRAN)
  - Development of **libraries** (packages) for <r>specific applications</r>
- `RStudio`: integrated development environment (IDE)
  - <y>More user-friendly environment</y> to work with `R`

---

# Working with data (including spatial) in R

- Data work in `R`: one of its <g>many capabilities</g>
  - Producing documents, slides, or webpages
- **Our approach:**
  1. <g>Basic programming and data work</g> in `R` `\(\rightarrow\)` **content of today!**
  2. Working with spatial data in `R` (GIS tools)
- Important: <r>focus on basics</r>
  - Introduction of basic concepts/tools rather than a presentation of **all possible** functions
  - Every spatial data application is a challenge on its own
  - The goal is to introduce the basics so that you can apply (and improve) them on <y>your own application!</y>

---
class: inverse, center, middle

# Spatial Data with R: what have previous
# students learned and done?

---

# Spatial Data with R: examples from previous years

- **Final projects** from past years: illustrate the course's <r>potential for practical applications</r>
  - Potentials for electricity interconnection in the USA (EECC)
  - Anti-drought policies and natural conservation in Catalunya (EPP+DSDM)
  - Risk profile of NBA teams and players (DSDM+DSM+ECON)
  - Many others (CRM+ECON, ...)

<br>

--

Note the **<y>multi-program feature of projects</y>** (richness and variety of students and ideas!)

---

# Example 01: Costs and benefits of connecting electricity markets

.pull-left[
.center[
<img src="figs/proj1_1.png" style="width: 70%" />
]
]

.pull-right[
US geography: `\(\neq\)` **green energy potentials**

- Connecting to urban/production centers <r>is costly</r>
- Which are the high-potential regions for renewable investments?

Data: <g>Solar radiation + wind speed = LCOE</g>

- High potential in (and benefits from connecting) Texas' market
- Experiment with `\(\neq\)` **<y>transmission lines</y>**
]

---

# Example 01: Costs and benefits of connecting electricity markets

.pull-left[
.center[
<img src="figs/proj1_2.png" style="width: 90%" />
]
]

.pull-right[
US geography: `\(\neq\)` **green energy potentials**

- Connecting to urban/production centers <r>is costly</r>
- Which are the high-potential regions for renewable investments?

Data: <g>Solar radiation + wind speed = LCOE</g>

- High potential in (and benefits from connecting) Texas' market
- Experiment with `\(\neq\)` **<y>transmission lines</y>**
]

---

# Example 01: Costs and benefits of connecting electricity markets

.center[
<img src="figs/proj1_3.png" style="width: 70%" />
]

---

# Example 02: Effectiveness of anti-drought policies

.pull-left[
**Severe drought in Catalunya** (2004-2008)

- Strict emergency measures for efficient water usage in 2007
- Did that `\(\downarrow\)` water pumping and vegetation depletion?

<g>Spatial data + RD Design</g> = causal effects

- Comparison between CAT (treated) and Aragón (control)
- Neat example of a <y>policy evaluation</y> project!
]

.pull-right[
.center[
<img src="figs/proj2_1.png" style="width: 100%" />
]
]

---

# Example 02: Effectiveness of anti-drought policies

.center[
<img src="figs/proj2_2.png" style="width: 80%" />
]

---

# Example 02: Effectiveness of anti-drought policies

.center[
<img src="figs/proj2_3.png" style="width: 80%" />
]

---

# Example 03: Risk profile of NBA teams and players

.pull-left[
.center[
<img src="figs/proj3_1.png" style="width: 100%" />
]
]

.pull-right[
**NBA scoring:** risk-return trade-off

- 3-point shots: high return/risk
- NBA team shooting profile `\(\equiv\)` <g>risky portfolio</g>

Have NBA teams become <r>greater risk-takers</r> over time?
- Data: players' position, shooting, and efficiency
- `\(\uparrow\)` risk taken (led by <y>individual players</y>)
]

---

# Example 03: Risk profile of NBA teams and players

.center[
<img src="figs/proj3_2.png" style="width: 70%" />
]

---

# Example 03: Risk profile of NBA teams and players

.center[
<img src="figs/proj3_3.png" style="width: 70%" />
]

---

# Example 03: Risk profile of NBA teams and players

.center[
<img src="figs/proj3_4.png" style="width: 45%" /> <img src="figs/proj3_5.png" style="width: 45%" />
]

---

# Example 03: Risk profile of NBA teams and players

.center[
<img src="figs/proj3_6.png" style="width: 100%" />
]

Creativity + spatial data skills = <u>interesting research question!</u>

**<r>Are you up for it?</r>**

---
class: inverse, center, middle

# R Basics

---

# Basics of programming and data work in R

Open `01_class01.R` on your own computer, where we will cover the following topics. The subsequent slides here are for <g>reference only</g>.

.pull-left[
- **Concepts covered:**
  1. `R` basics: environment, main elements (vectors, lists, `data.frame`), libraries
  2. Basic <r>data wrangling</r> with ``dplyr``
      - Filtering, mutating, merging
  3. <g>Data visualization</g> with `ggplot2`
]

.pull-right[
- **Setting up `R` (or in `RStudio`)**:

```r
# Install packages (only first time)
*# install.packages('data.table')
*# install.packages('tidyverse')
# Load them:
library(data.table)
library(dplyr)
```

Note: warning messages are OK!
]

---

# Basics of programming and data work in R (1/3)

- `R` is a versatile working environment
  - Can handle **different** elements (e.g.
datasets, images, texts) simultaneously
- Setting the **local** environment: working directory

```r
getwd() # tells you the current wd
```

```
## [1] "/Users/bruno/Dropbox/Teaching/08-geospatial-ds-bse"
```

- Types of elements in `R` environment:
  - Vectors, `data.frame()`, `list()`, among (many) others
- To check (or clean) current environment: `ls()` (or `rm()`)

---

# Basics of programming and data work in R (2/3)

- **Data wrangling:** manipulating raw data with `dplyr`
  - Creating new variables, filtering datasets, arranging, merging, reshaping
- <g>Pipe syntax:</g> uses the `%>%` operator. Example when merging datasets:

```r
# a, b: two datasets sharing a 'Month' column
# Left join with data.table (all.x = TRUE keeps every row of a):
df <- merge.data.table(a, b, by = 'Month', all.x = TRUE)
# is comparable to (row order may differ):
df <- a %>% left_join(b, by = 'Month')
```

- Same reasoning with many other `dplyr` data-wrangling functions; e.g. `mutate()`, `filter()`, `select()`, `summarise()`, `arrange()`
- Check wiki <u>[here](https://dplyr.tidyverse.org/)</u>

---

# Basics of programming and data work in R (3/3)

- **Data visualization** in `R` with `ggplot()`. Syntax that maps <g>data</g> `\(\rightarrow\)` <r>geometry</r> `\(\rightarrow\)` <y>visuals</y>

.pull-left[
```r
library(ggplot2) # load library
# Standard syntax (template, not runnable as is):
# p <- ggplot(data = DATA) +
#   geom_GEOM(mapping = aes(MAPPINGS)) +
#   THEME()
# Example:
p <- ggplot(data = airquality) +
  geom_point(mapping = aes(Wind, Temp, color = Month)) +
  theme_bw()
```

- Check wiki <u>[here](https://ggplot2.tidyverse.org/)</u> and <u>[here](https://ggplot2.tidyverse.org/reference/)</u>
]

.pull-right[
.center[
<img src="figs/class01/unnamed-chunk-5-1.png" width="75%" />
]
]

---

# Hands-on: your turn! (1/2)

.pull-left[
- Distribution (histogram) of CO2 uptake across plants in US/Canada
  - Distinguish plants by origin (Quebec/Mississippi)
  - Extra: play with different `theme()` parameters of `ggplot()`
- Use the `datasets::CO2` data!
]

.pull-right[
<!-- -->
]

---

# Hands-on: your turn! (2/2)

.pull-left[
- Income vs. Murder rates across US states (scatter plot).
Use the `state.x77` dataset
  - Distinguish between high/low density states
  - High density = (Population/Area) > median: use `mutate()`
  - Extra: additional geom layer with non-linear relationship? Use `geom_smooth()`
  - Can you remove outliers (i.e. states with Income higher than 6,000)? Use `filter()`
]

.pull-right[
<!-- -->
]

---

# References

- Dell, M., 2010. The persistent effects of Peru's mining mita. *Econometrica*, *78(6)*, pp.1863-1903.
- Donaldson, D., 2018. Railroads of the Raj: Estimating the impact of transportation infrastructure. *American Economic Review*, *108(4-5)*, pp.899-934.
- Henderson, J.V., Storeygard, A. and Deichmann, U., 2017. Has climate change driven urbanization in Africa? *Journal of Development Economics*, *124*, pp.60-82.
- Snow, J., 1856. On the mode of communication of cholera. *Edinburgh Medical Journal*, *1(7)*, p.668.
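
---

# Appendix: a `dplyr` pipeline sketch

The pipe syntax from the data-wrangling slide can be sketched on the built-in `airquality` data. This is a minimal illustration under stated assumptions, not the code in `01_class01.R`: the lookup table `month_names` and the column names `TempC`, `mean_ozone`, `mean_tempc`, and `n_days` are hypothetical and introduced only for this example.

```r
library(dplyr)

# Hypothetical lookup table (an assumption, not a course file):
# one label per month present in airquality
month_names <- data.frame(
  Month = 5:9,
  MonthName = c("May", "Jun", "Jul", "Aug", "Sep")
)

monthly <- airquality %>%
  filter(!is.na(Ozone)) %>%                # keep days with an ozone reading
  mutate(TempC = (Temp - 32) * 5 / 9) %>%  # Fahrenheit to Celsius
  group_by(Month) %>%                      # one group per month
  summarise(
    mean_ozone = mean(Ozone),
    mean_tempc = mean(TempC),
    n_days = n()
  ) %>%
  left_join(month_names, by = "Month") %>% # merge in the month labels
  arrange(desc(mean_ozone))                # highest-ozone month first

print(monthly)
```

Each verb takes a data frame and returns a data frame, so the pipeline reads top to bottom in the order the operations are applied.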