4.5 Assignment 3: package project briefs

How this assignment works

You will work in pairs. Each pair is assigned one of the five packages described below. Your brief tells you what the package should do, which functions you must implement, and what dataset to use for the demonstration.

Pairs and assignments are announced in the Week 9 lecture. Read your brief carefully before your Week 10 check-in. By Week 10 you should be able to explain the package’s purpose and your planned structure.

What you submit

By the end of Week 12, every pair must have completed all of the following. Note that the poster and the fit interview fall earlier, in Weeks 11 and 10.

A TestPyPI release of the package (and ideally a production PyPI release).
A GitHub repository with the source code, tests, and a working README.
A demo notebook that loads the data, calls the package, and produces the requested visualisation.
An A1 poster presented at the Week 11 showcase on Tuesday 25 May (B301-G10).
A short fit interview conducted in the Week 10 lab (about 3 to 5 minutes per person, see the next section).

Fit interview

Each pair member sits a short individual fit interview during the Week 10 lab. It is deliberately early, when the package skeleton is still taking shape, so that any uneven workload between partners can be caught and corrected before the final week. Expect three short prompts.

Walk us through one function you have personally written so far, and explain what it returns.
Explain one design choice your pair has made. Why this CRS, why this filter, why this default? What did you reject and why?
Answer one short technical question on the spot (for example, “what would happen if the input had a different CRS?”).

The interview is not a test of memory and it is not a final exam. It is a quick check that both partners are actually contributing and understand the code their name is on. Bring your laptop with the package open. The interview is run during the Week 10 lab in pair-on-pair format with a demonstrator listening in.

The five package briefs

Each brief follows the same shape: a short rationale, the functions you must implement, the demonstration dataset, and notes on difficulty.

`pedestrian-exposure`

Answer this question for a walker in Auckland: which route is healthier, and why?

Pedestrians do not experience the same conditions along every part of a journey. A route along a busy arterial accumulates more traffic noise and roadside air pollution than one through a park, even if the total distance is the same. This package turns those qualitative differences into quantitative scores by joining three environmental layers to each edge of a walking route.

What we expect to see: a CBD walking route scored on three environmental factors (noise, air pollution, green space), mapped together so the trade-offs are obvious.
The deliverable: a map of the route with each edge coloured by total exposure, a small bar chart breaking out the three components, and a one-line summary score.
The insight: which streets are quietest? Which are the most polluted? Where does greenery actually offset the others?
Key functions: add_exposure(route_gdf, layer, kind) and summarise_exposure(route_gdf, weights=None).
Demo dataset: a CBD walking route via OSMnx, an OSM-based noise proxy, LAWA air quality, and Sentinel-2 NDVI sampled along each edge.
Difficulty: intermediate to ambitious (three small pipelines, kept consistent).

Functions in brief.

load_route(origin, destination, place) returns a GeoDataFrame of edges in EPSG:2193 (NZTM) from OSMnx.

add_exposure(route_gdf, layer, kind) joins one named exposure layer onto each edge. kind is one of "noise", "air_quality", or "green", and the function adds a noise_exposure, air_exposure, or green_exposure column accordingly. Polygon layers are spatially joined and raster layers are sampled.

summarise_exposure(route_gdf, weights=None) returns a tidy DataFrame plus a route-level weighted summary. Weights default to edge length.
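
The length-weighted summary can be prototyped in plain Python before you wire it into a GeoDataFrame. The sketch below only illustrates the arithmetic. The edge dictionaries, the length_m key, the weighted_summary name, and the 0-to-1 exposure values are all assumptions made for the example, not part of the brief.

```python
from typing import Optional, Sequence

# Column names follow the brief; everything else here is illustrative.
EXPOSURE_COLS = ("noise_exposure", "air_exposure", "green_exposure")


def weighted_summary(
    edges: Sequence[dict],
    weights: Optional[Sequence[float]] = None,
) -> dict:
    """Return the weighted mean of each exposure column across a route.

    Each edge is a dict holding the three exposure values plus a
    ``length_m`` key (a hypothetical name). Weights default to edge length.
    """
    if not edges:
        raise ValueError("route has no edges")
    if weights is None:
        weights = [edge["length_m"] for edge in edges]
    if len(weights) != len(edges):
        raise ValueError("weights must match the number of edges")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {
        col: sum(w * edge[col] for w, edge in zip(weights, edges)) / total
        for col in EXPOSURE_COLS
    }


# Two made-up edges: a short noisy one and a longer, greener one.
route = [
    {"length_m": 100.0, "noise_exposure": 0.8, "air_exposure": 0.6, "green_exposure": 0.1},
    {"length_m": 300.0, "noise_exposure": 0.4, "air_exposure": 0.2, "green_exposure": 0.5},
]
summary = weighted_summary(route)
```

Because the second edge is three times longer, it dominates each mean. How you combine the three components into a single score (and whether green space counts as a negative exposure) is a design choice to document in your README.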

Tip

Build incrementally. Get load_route() and one of the three add_exposure() calls working end to end first. The remaining two follow the same pattern. Leave the weighted summary until all three layers attach correctly.

Tip

Coordinate reference systems are the most common source of bugs here. Reproject everything to EPSG:2193 (NZTM) before any spatial join. Write a tiny pytest that asserts the output CRS.
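
One possible shape for that CRS check is a small guard function plus two tests. This is a sketch under assumptions: assert_crs is a hypothetical helper name, and the tests use a stand-in object that mimics only the .crs.to_epsg() call a GeoDataFrame provides, so the example runs without geopandas installed. In your own suite you would pass the real output of load_route() instead.

```python
from types import SimpleNamespace

TARGET_EPSG = 2193  # NZTM, as required by the brief


def assert_crs(gdf, epsg: int = TARGET_EPSG) -> None:
    """Raise ValueError unless ``gdf`` carries the expected CRS."""
    crs = getattr(gdf, "crs", None)
    if crs is None:
        raise ValueError("layer has no CRS set; assign one before joining")
    found = crs.to_epsg()
    if found != epsg:
        raise ValueError(f"expected EPSG:{epsg}, got EPSG:{found}")


def test_assert_crs_accepts_nztm():
    # Stand-in for a GeoDataFrame already in NZTM.
    fake = SimpleNamespace(crs=SimpleNamespace(to_epsg=lambda: 2193))
    assert_crs(fake)  # should not raise


def test_assert_crs_rejects_wgs84():
    # Stand-in for a layer still in latitude/longitude (EPSG:4326).
    fake = SimpleNamespace(crs=SimpleNamespace(to_epsg=lambda: 4326))
    try:
        assert_crs(fake)
    except ValueError as err:
        assert "4326" in str(err)
    else:
        raise AssertionError("expected a ValueError for EPSG:4326")
```

Calling a guard like this at the top of add_exposure() turns a silent mismatch into an immediate, readable error.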

`equitransport`

Answer this question: does Auckland’s public transport access reach the most deprived neighbourhoods?

Transport planning that looks only at average accessibility can obscure deep inequalities. A city might have good average PT coverage while the most deprived neighbourhoods remain poorly served. This package connects an accessibility metric to the New Zealand Index of Deprivation (NZDep2018) at SA2 level, then computes how that access is distributed across the population.

What we expect to see: a single number (Gini coefficient) plus a five-row quintile table that turns an equity argument into evidence.
The deliverable: a choropleth of Auckland SA2s coloured by NZDep quintile and a parallel map of PT access (stops within a 400 m walk).
The insight: do the most deprived quintiles get fewer stops per capita? By how much? Where are the worst gaps?
Key functions: load_nzdep(sa2_gdf) and equity_summary(gdf).
Demo dataset: Stats NZ SA2 boundaries + Auckland Transport GTFS PT-stop counts + NZDep2018 scores from the University of Otago.
Difficulty: intermediate (meshblock-to-SA2 aggregation is the tricky bit).

Functions in brief.

load_nzdep(sa2_gdf) joins NZDep2018 deprivation scores to an SA2-level GeoDataFrame by meshblock or SA2 code.

compute_access(gdf, metric_col) calculates the chosen accessibility metric per SA2 polygon (for example, PT stops within a 400 m walk).

equity_summary(gdf) returns a quintile breakdown table and a Gini coefficient, along with a choropleth-ready GeoDataFrame for mapping.
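
The two numbers at the heart of equity_summary() can be sketched without any geospatial dependencies. The version below is a minimal illustration, assuming an unweighted Gini coefficient across SA2s and a plain mean per quintile. The function names gini and quintile_table are hypothetical, and whether to weight by population is a choice your pair should make and justify.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    if xs[0] < 0:
        raise ValueError("values must be non-negative")
    total = sum(xs)
    if total <= 0:
        raise ValueError("values must not all be zero")
    # Rank-weighted sum over the sorted values, ranks starting at 1.
    ranked = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * ranked / (n * total) - (n + 1) / n


def quintile_table(rows: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Mean access score per quintile from (quintile, access) pairs."""
    groups = defaultdict(list)
    for quintile, access in rows:
        groups[quintile].append(access)
    return {q: sum(v) / len(v) for q, v in sorted(groups.items())}


equal_access = gini([5, 5, 5, 5])    # every area has the same access
concentrated = gini([0, 0, 0, 1])    # one area holds all the access
table = quintile_table([(1, 4.0), (1, 6.0), (5, 1.0)])
```

With n areas, this form of the coefficient tops out at (n - 1) / n rather than 1, which is worth mentioning when you report the figure for a small number of SA2s.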

Tip

The NZDep2018 data is published by the University of Otago and is freely downloadable as a CSV. Ship it as a small bundled data file inside your package rather than fetching it at runtime.
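
A bundled file can be located with importlib.resources (Python 3.9 or later) instead of a hard-coded path. The helper below is a sketch: bundled_file is a hypothetical name, and the package and file names in the comment are placeholders for whatever you actually ship. It also assumes your build configuration includes the data file in the installed package.

```python
from importlib.resources import files


def bundled_file(package: str, relative_path: str):
    """Locate a data file shipped inside an installed package.

    Raises FileNotFoundError with a readable message if the file
    was not included when the package was built.
    """
    resource = files(package).joinpath(relative_path)
    if not resource.is_file():
        raise FileNotFoundError(
            f"{relative_path!r} is not bundled with package {package!r}"
        )
    return resource


# Hypothetical usage inside load_nzdep():
#     csv_resource = bundled_file("equitransport", "data/nzdep2018.csv")
#     with csv_resource.open() as handle:
#         ...  # parse the CSV here

# Demonstration against a standard-library package so the example runs anywhere.
found = bundled_file("json", "__init__.py")
```

Check that the lookup still succeeds after installing from TestPyPI into a fresh environment, since data files are a common omission from built distributions.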

`treecrown-nz`

Answer this question: which Auckland neighbourhoods are tree-rich, and how shaded is a typical walk?

Urban tree canopy is increasingly recognised as critical infrastructure. It reduces heat stress, improves air quality, and makes walking more comfortable. Auckland Council has invested in LiDAR-derived canopy mapping, but using that data programmatically requires navigating the LINZ Data Service API, clipping geometries, and computing coverage statistics. This package automates that workflow.

What we expect to see: a coverage number (% canopy) for each of three Auckland suburbs, and a route map coloured by overhead canopy fraction.
The deliverable: a side-by-side comparison plot for the three suburbs plus a “shaded walk” map for one chosen route.
The insight: which suburbs have invested in canopy? Which routes stay shaded on a summer afternoon? Where is the canopy gap worst?
Key functions: canopy_coverage(area_gdf) and route_shade(route_gdf, buffer_m).
Demo dataset: the Auckland Council LiDAR-derived canopy layer available via the LINZ Data Service, compared across three Auckland suburbs of your choice.
Difficulty: intermediate to ambitious (the LINZ API has a learning curve).

Functions in brief.

load_canopy(bbox) fetches the Auckland Council canopy layer from LINZ for a given bounding box and returns a GeoDataFrame of canopy polygons.

canopy_coverage(area_gdf) computes the percentage of each input polygon covered by canopy.

route_shade(route_gdf, buffer_m) buffers each edge of a route by a given distance (in metres) and returns the canopy fraction within that buffer, a proxy for pedestrian shade.

Tip

The LINZ API requires an API key. Register for a free account at data.linz.govt.nz and store your key in a .env file that is listed in your .gitignore. Never hard-code the key in your package source or commit it to the repository.
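
Reading the key might look like the sketch below. The variable name LINZ_API_KEY and the function name get_linz_key are assumptions for illustration. Note that Python does not read .env files on its own: the key has to reach the process environment first, for example by exporting it in your shell or by using a loader such as python-dotenv.

```python
import os
from typing import Mapping, Optional

KEY_VAR = "LINZ_API_KEY"  # hypothetical environment variable name


def get_linz_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or fail loudly."""
    env = os.environ if env is None else env
    key = env.get(KEY_VAR, "").strip()
    if not key:
        raise ValueError(
            f"{KEY_VAR} is not set; add it to your .env file "
            "or export it in your shell before calling load_canopy()"
        )
    return key


# Passing a mapping explicitly keeps the function easy to test.
key = get_linz_key({"LINZ_API_KEY": "  example-key  "})
```

Accepting an optional mapping means your tests never need a real key or a real .env file.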

`pavescore`

Answer this question: can a phone photo tell us which Auckland pavements need fixing?

Pavement condition inspections are currently conducted manually by Auckland Council field staff, a slow and expensive process. Computer vision offers a path to scalable, consistent assessment from street-level imagery. This package takes a pavement photograph and returns a structured condition assessment based on image analysis. A working basic version uses edge detection and texture analysis. A more ambitious version wraps a lightweight pre-trained classifier. Either approach is valid if your outputs are consistent and your methodology is clearly documented.

What we expect to see: every input image scored on a 0-to-100 condition scale with binary flags for cracking, surface degradation, and obstruction.
The deliverable: a map of scored photo locations along Dominion Road or Queen Street, coloured good / fair / poor, plus a small table of the worst spots.
The insight: how does pavement quality vary along one street? Which segments are worst? Could a council prioritise repairs from this map?
Key functions: score_image(path_or_url) and to_geodataframe(results, coords).
Demo dataset: Mapillary street-level images sampled along Dominion Road or Queen Street, Auckland. The Mapillary API is free for research use.
Difficulty: ambitious. The most open-ended brief of the five.

Functions in brief.

score_image(path_or_url) returns a PavementResult dataclass containing an overall score (0 to 100, where 100 is perfect), binary defect flags for cracking, surface degradation, and obstruction, and a confidence value.

score_batch(image_list) processes a list of images and returns a summary DataFrame with one row per image.

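to_geodataframe(results, coords) takes a list of PavementResult objects and a matching list of (lat, lon) tuples and returns a GeoDataFrame suitable for mapping. Point geometries are built as x = longitude, y = latitude, so the tuple order must be swapped when the points are created.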
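
A possible starting point for the result type and the spatial conversion is sketched below. The field names, the 0-to-1 confidence range, and the to_features helper are assumptions, not requirements of the brief. The helper emits GeoJSON-style features, which store coordinates as [longitude, latitude], so it shows where the (lat, lon) input order gets swapped.

```python
from dataclasses import asdict, dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class PavementResult:
    score: float               # 0 to 100, where 100 is perfect
    cracking: bool
    surface_degradation: bool
    obstruction: bool
    confidence: float          # assumed here to lie between 0 and 1

    def __post_init__(self):
        if not 0 <= self.score <= 100:
            raise ValueError(f"score must be within 0 to 100, got {self.score}")
        if not 0 <= self.confidence <= 1:
            raise ValueError(f"confidence must be within 0 to 1, got {self.confidence}")


def to_features(
    results: Sequence[PavementResult],
    coords: Sequence[Tuple[float, float]],
) -> List[dict]:
    """Pair results with (lat, lon) tuples as GeoJSON-style point features."""
    if len(results) != len(coords):
        raise ValueError("results and coords must have the same length")
    features = []
    for result, (lat, lon) in zip(results, coords):
        features.append({
            "type": "Feature",
            # GeoJSON order is [longitude, latitude], the reverse of the input.
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": asdict(result),
        })
    return features


result = PavementResult(72.0, True, False, False, 0.9)
features = to_features([result], [(-36.85, 174.76)])
```

A feature list in this shape can be handed to geopandas.GeoDataFrame.from_features(), with the CRS set to EPSG:4326 for latitude/longitude input.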

Tip

Focus on getting score_image() returning a consistent and sensible output before building the batch and spatial methods. A simple texture-based scorer that works reliably beats an ambitious classifier that breaks.

Warning

Image analysis can be computationally slow. Pre-sample 10 to 20 images for the demo notebook rather than fetching hundreds at runtime during the presentation.

`escooter-akl`

Answer this question: how do e-scooters move around the CBD, and where are operators failing to enforce parking rules?

Flamingo operates a shared e-scooter fleet in Auckland and generates a stream of trip records. The raw data is messy. Trips can have missing endpoints, occur during prohibited hours, or end inside no-parking zones. Cities and operators need clean OD flows for parking infrastructure decisions, equity audits, and integration with public transport planning. This package takes a raw Flamingo trip log and turns it into something you can map and reason about.

What we expect to see: a clean origin-destination flow map plus a count of trips that ended inside no-parking zones.
The deliverable: an OD chord diagram or arrow map between SA2 zones plus a marker map of geofence violations with zone labels for reporting.
The insight: which CBD zones are origin-heavy versus destination-heavy? Where are violations concentrated? What story would you tell Auckland Council?
Key functions: od_flows(trips_gdf, zones_gdf) and geofence_violations(trips_gdf, no_park_gdf).
Demo dataset: a sample CSV of Flamingo Auckland CBD trips + Stats NZ SA2 zones + Auckland Council CBD pedestrianised polygons as a no-park proxy.
Difficulty: intermediate (messy data; loud failure beats silent failure).

Functions in brief.

load_trips(path) reads a Flamingo trip CSV with columns for start and end latitude/longitude and a timestamp, and returns a GeoDataFrame of trip records with start and end Point geometries in EPSG:2193 (NZTM).

od_flows(trips_gdf, zones_gdf) performs a spatial join from trip endpoints to zones, then aggregates trip counts into a tidy origin-destination DataFrame indexed by origin zone code and destination zone code.

geofence_violations(trips_gdf, no_park_gdf) returns a GeoDataFrame of trips whose end point lies inside any of the supplied no-parking polygons, with the violated zone name attached for reporting.
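
Once the spatial join has attached a zone code to each trip's start and end point, the aggregation step in od_flows() is ordinary counting. The sketch below assumes the join has already happened and works on plain (origin, destination) pairs. The function names and the zone codes "A" and "B" are placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

ZonePair = Tuple[str, str]


def count_od_pairs(zone_pairs: Iterable[ZonePair]) -> List[dict]:
    """Aggregate (origin, destination) pairs into tidy rows with a trip count."""
    counts = Counter(zone_pairs)
    return [
        {"origin": origin, "destination": destination, "trips": trips}
        for (origin, destination), trips in sorted(counts.items())
    ]


def net_outflow(zone_pairs: Iterable[ZonePair]) -> Dict[str, int]:
    """Departures minus arrivals per zone; positive means origin-heavy."""
    net: Counter = Counter()
    for origin, destination in zone_pairs:
        net[origin] += 1
        net[destination] -= 1
    return dict(net)


pairs = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "A")]
flows = count_od_pairs(pairs)
balance = net_outflow(pairs)
```

The net figure per zone speaks directly to the origin-heavy versus destination-heavy question in the brief. Trips that start and end in the same zone cancel out and contribute zero.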

Tip

Start with load_trips() and a tiny CSV of 20 to 50 rows you make by hand. Once load_trips() returns a clean GeoDataFrame, the other two functions are straightforward spatial joins.
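
The parsing half of load_trips() can be exercised on a hand-made CSV before any geometry is involved. In the sketch below the column names are assumptions (your sample file may use different ones), the two rows are made up, and raising on a bad row is only one option. Dropping such rows with a logged count is another, as long as the behaviour is explicit.

```python
import csv
import io
from typing import List, TextIO

# Assumed column names; adjust to match your sample file.
COORD_COLS = ("start_lat", "start_lon", "end_lat", "end_lon")
TIME_COL = "started_at"


def read_trip_rows(handle: TextIO) -> List[dict]:
    """Parse trip rows, failing loudly on missing columns or endpoints."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in COORD_COLS + (TIME_COL,) if col not in header]
    if missing:
        raise ValueError(f"trip CSV is missing columns: {missing}")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # header is line 1
        try:
            parsed = {col: float(row[col]) for col in COORD_COLS}
        except (TypeError, ValueError):
            raise ValueError(
                f"line {line_no}: missing or non-numeric endpoint"
            ) from None
        parsed[TIME_COL] = row[TIME_COL]
        rows.append(parsed)
    return rows


# Two made-up trips in the assumed layout.
SAMPLE = """start_lat,start_lon,end_lat,end_lon,started_at
-36.8485,174.7633,-36.8520,174.7680,2024-03-01T08:15:00
-36.8500,174.7650,-36.8470,174.7600,2024-03-01T09:02:00
"""
trips = read_trip_rows(io.StringIO(SAMPLE))
```

From here, building Point geometries and reprojecting to EPSG:2193 is a separate step that can be tested on its own.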

Note

Live operator feeds typically require a data-sharing agreement. For the demo, use a sample CSV in the shape of a Flamingo trip log. The point is to show your pipeline works on realistic data, not to ship a live integration.

Working timeline

A suggested rhythm. Adjust as needed.

Week 9. Read your brief, set up the package skeleton in pairs, draft the public API as a list of function signatures, and implement one function end to end. The Week 9 lab walks you through this.

Week 10. Add tests, write the README, publish to TestPyPI, start the demo notebook, and sit the short fit interview with your partner. The Week 10 lab covers pytest patterns, TestPyPI release, poster outlining, and the fit interview itself.

Week 11. Polish the code, finish the demo notebook, and present the A1 poster at the Tuesday 25 May showcase (B301-G10).

Week 12. Publish to production PyPI once the TestPyPI install is fully confirmed.

Common mistakes to avoid

These come up every year. None of them are fatal, but each one costs you time.

Hard-coded paths. Use importlib.resources or pathlib to find files relative to the package. Never write '/Users/me/Desktop/...' inside the package source.

No error handling on missing data. If a file does not exist or an API key is missing, raise a clear FileNotFoundError or ValueError rather than letting a deep KeyError surface later.

Print statements left in functions. Remove print calls from your final code. Use return values for outputs and let the user decide what to display.

Vague docstrings. Every public function needs a one-line summary, a Parameters section, and a Returns section. NumPy style is the convention. Look at akl-ped-counts for examples.

Tests that only check imports. A test that merely confirms that from your_package import f succeeds is not really a test. Write tests that exercise the function with real-shaped inputs and check the output shape and values.

Forgetting the demo notebook. The notebook is what convinces a reader that your package works on real data. Leave time for it. Run it from a clean kernel before submitting.
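
Several of these points (a NumPy-style docstring, a clear error, no print calls, and tests that check values) fit in one short example. The function below is a hypothetical illustration, not one of the required functions, and the tests are written so they run with or without pytest installed.

```python
def canopy_percent(canopy_area_m2: float, total_area_m2: float) -> float:
    """Return canopy cover as a percentage of total area.

    Parameters
    ----------
    canopy_area_m2 : float
        Area covered by canopy, in square metres.
    total_area_m2 : float
        Total area of the polygon, in square metres. Must be positive.

    Returns
    -------
    float
        Canopy cover between 0 and 100.

    Raises
    ------
    ValueError
        If the total area is not positive, or the canopy area is
        negative or larger than the total area.
    """
    if total_area_m2 <= 0:
        raise ValueError("total_area_m2 must be positive")
    if not 0 <= canopy_area_m2 <= total_area_m2:
        raise ValueError("canopy_area_m2 must lie between 0 and total_area_m2")
    return 100.0 * canopy_area_m2 / total_area_m2


def test_canopy_percent_value():
    # Checks an actual value, not just that the import worked.
    assert canopy_percent(250.0, 1000.0) == 25.0


def test_canopy_percent_rejects_zero_area():
    try:
        canopy_percent(10.0, 0.0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected a ValueError for zero total area")
```

In a pytest suite, the second test is usually written with a pytest.raises(ValueError) context manager instead of the try/except block.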

Resources

akl-ped-counts repository, https://github.com/dataandcrowd/akl-ped-counts
Python Packaging Guide, https://packaging.python.org/
uv documentation, https://docs.astral.sh/uv/
pytest documentation, https://docs.pytest.org/
Heart of the City pedestrian data, https://www.hotcity.co.nz/pedestrian-counts
LINZ Data Service, https://data.linz.govt.nz/
Stats NZ open data, https://datafinder.stats.govt.nz/
NZDep2018, https://www.otago.ac.nz/wellington/departments/publichealth/research/hirp/otago020194.html
Mapillary API, https://www.mapillary.com/developer/api-documentation
Auckland Council micromobility information, https://www.aucklandcouncil.govt.nz/
Mobility Data Specification (MDS), https://github.com/openmobilityfoundation/mobility-data-specification