R in Geospatial Analysis:

Building Spatially Interpolated Air Pollution Field

Dr. Hyesop Shin

MRC/CSO Social and Public Health Sciences Unit, University of Glasgow

2023-06-09

About Myself

I am a Geographer Interested in…

Interests

  • Transport and Health
  • Air Pollution
  • Population Mobility
  • Citizen Science
  • Children’s Physical Activity

Techniques

  • Geospatial analysis
  • Agent-based modelling (ABM)
  • HPC: Cloud-based computing
  • Crowdsourcing
  • GPS and Mobility


Tools:

Retirement of geospatial packages

Context

Facts about NO2 Exposure: Did you know?

Methods to estimate ‘Population Exposure’

Geostatistical Modelling

  • Uses the statistical properties of the observations (e.g. Kriging)
  • Pros: Mathematically sound concept, quick to implement, easy data aggregation, and good software support
  • Cons: Artefacts, does not fully address small-scale variation, smoothed output (no emission information)

Atmospheric Modelling

  • Uses mathematical assumptions to model the impact of emissions on the atmosphere (e.g. CALPUFF)
  • Pros: Considers meteorological impact
  • Cons: Long execution time, requires substantial computational power, steep learning curve, aggregated measure, multiple software packages

Given that we have temporally rich but spatially sparse pollution data, why don’t we start from a computationally light, reproducible, and mathematically sound model?

Objective

To develop an air pollution package in R that allows anyone to easily generate a pollution map

  • To examine small-scale variations that occur during spatial interpolation (SI) prediction
  • To introduce a new road-scale spatial interpolation method that employs road weighting

Method

Roadmap

From Points to Areas

  • How are we going to create an air pollution field, given a set of points?
  • What is the distribution of air pollution?
  • How does the air quality in one station relate to the other ones?

For example, the air quality one meter ahead of you is more likely to be similar to the air quality where you stand than the air quality 100 meters away.

Spatial Autocorrelation: “Closer things are more predictable and have less variability, while distant things are less predictable and are less related”.

Modelling with Universal Kriging

  • How do we create a systematic map? => build a semivariogram

  • To build a semivariogram, always remember that we are looking at the distance between every pair of samples and the variability between them
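The pairwise idea above can be sketched in a few lines. This is a minimal, language-agnostic illustration written in plain Python, not the R workflow used in the talk: for every pair of samples it records the separation distance, groups pairs into distance bins, and reports half the mean squared difference per bin (the classical empirical estimator). The function name, the bin width, the station coordinates, and the NO2 values are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

def empirical_semivariogram(points, values, bin_width):
    """Return a list of (mean_lag, semivariance, n_pairs), one per distance bin.

    Semivariance in a bin is half the mean squared difference between
    all sample pairs whose separation distance falls in that bin.
    """
    sq_diff_sum = defaultdict(float)  # sum of squared value differences per bin
    dist_sum = defaultdict(float)     # sum of pair distances per bin
    n_pairs = defaultdict(int)        # number of pairs per bin
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):     # every unordered pair exactly once
            h = math.dist(points[i], points[j])
            b = int(h // bin_width)
            sq_diff_sum[b] += (values[i] - values[j]) ** 2
            dist_sum[b] += h
            n_pairs[b] += 1
    return [
        (dist_sum[b] / n_pairs[b], sq_diff_sum[b] / (2 * n_pairs[b]), n_pairs[b])
        for b in sorted(n_pairs)
    ]

# Hypothetical toy data: five stations on a line (coordinates in km)
# with made-up NO2 values. Not real measurements.
stations = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
no2 = [30.0, 32.0, 35.0, 41.0, 50.0]

gamma = empirical_semivariogram(stations, no2, bin_width=1.0)
for lag, semivar, pairs in gamma:
    print(f"lag {lag:.1f} km: semivariance {semivar:.2f} ({pairs} pairs)")
```

In this toy example the semivariance grows with distance, which is the pattern spatial autocorrelation predicts. With real station data the bins with few pairs are noisy, which is one reason a model curve is fitted to the empirical points before kriging.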

Since NO2 varies substantially between nearby stations, it might be difficult to get a perfect empirical semivariogram.

autoKrige() in the automap package might help: it fits a variogram model to the empirical semivariogram automatically, getting us close to a good fit.
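As a reminder of what "fitting" means here, a variogram model is a smooth curve described by a few parameters. One common candidate, shown purely as an illustration, is the exponential model with nugget $c_0$, partial sill $c_1$, and range parameter $a$ (these symbols are introduced here and are not from the slides). Which model is actually selected depends on the data.

```latex
\gamma(h) =
\begin{cases}
0, & h = 0 \\
c_0 + c_1 \left( 1 - e^{-h/a} \right), & h > 0
\end{cases}
```

A large nugget relative to the sill means that even nearby stations differ substantially, which is the situation described above for NO2.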

Daily Kriged Output…but too smoothed


Adding Road Weights


Results

Code Demonstration

Wrap up

Summary

  • When modelling population exposure, accurate prediction of NO2 is key.
  • There are no golden rules for method choice, but at finer temporal intervals small-scale variability matters even more.
  • Population data at finer temporal resolution is helpful (e.g. daytime/nighttime population).
  • Like other programming languages, the R ecosystem changes continuously.
  • The best use case is to add the road weights using the “big brothers”, e.g. tibble, sf, dplyr.

Next Steps


Thank You!

  • Email: hyesop.shin@glasgow.ac.uk
  • Twitter: @hyesop


Any questions or comments?