4.3 A real geospatial package: akl-ped-counts

What this chapter is for

So far we have built a tiny package (geohello-yourname) and added a handful of pytest tests. This chapter takes you through a real, working package so you can see what a finished one looks like before you start your own.

The package is akl-ped-counts, which ships hourly Heart of the City pedestrian counts for 21 Auckland CBD sensors from 2019 to 2025. It is published on PyPI and lives at https://github.com/dataandcrowd/akl-ped-counts.

Rather than reinvent it here, we will read the repository together. By the end of this chapter you should be able to do the following.

  1. Open the repo and find your way around.
  2. Recognise the same parts you met in Chapter 4.1 (the src/ layout, pyproject.toml, the public API in __init__.py).
  3. Use the package in a notebook to load real Auckland footfall data.
  4. Use it as a model for your own Assignment 3 package.

How to read along

Open the repo in your browser at https://github.com/dataandcrowd/akl-ped-counts and keep it visible while you read this chapter. The discussion below points at specific files. The aim is to demystify a real package, not to memorise it.

You can also install it now and try it. In a terminal, run the following.

pip install akl-ped-counts

Then, in Python or a notebook:

from akl_ped_counts import load_hourly, load_locations, list_sensors

print(len(list_sensors()))   # 21
counts = load_hourly()
print(counts.shape)          # (61367, 24)
locations = load_locations()
print(locations.head())

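That is the core of the user-facing API: three functions, each called in one short line. The 24 columns in counts are date, hour and year, plus one column for each of the 21 sensors.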

A tour of the repository

The folder layout follows the patterns from Chapter 4.1.

akl-ped-counts/
├── pyproject.toml
├── README.md
├── LICENSE
├── CONTRIBUTING.md
├── examples/
│   ├── march_trajectories.py
│   ├── march_trajectories.png
│   ├── heatmap_hour_dow.png
│   ├── above_below_average.png
│   └── sensor_map.html
└── src/
    └── akl_ped_counts/
        ├── __init__.py
        ├── loader.py
        ├── polars_loader.py
        ├── py.typed
        └── data/
            ├── hourly_counts.csv
            ├── locations.csv
            └── missing_data_report.json

Three things to notice.

  • src/akl_ped_counts/ is the Python package. The folder name uses underscores (Python identifier), while the package name on PyPI uses a hyphen (akl-ped-counts).
  • src/akl_ped_counts/data/ ships the CSV files inside the package because they are small (around 8 MB). For larger datasets you would instead download the data on first use.
  • examples/ holds standalone scripts and image outputs that show what the package can do.

A look at pyproject.toml

The configuration file is short. Open it in the repo. The important fields look like this, with commentary after each part.

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "akl-ped-counts"
version = "0.1.1"
description = "Hourly pedestrian count data from Heart of the City Auckland..."
readme = "README.md"
license = "CC-BY-4.0"
requires-python = ">=3.9"
authors = [
    { name = "Hyesop Shin", email = "shinhyesop@gmail.com" },
]
dependencies = [
    "pandas>=1.5.0",
]

The only runtime dependency is pandas. Everything else is optional.

[project.optional-dependencies]
polars = ["polars>=0.20.0"]
plot = ["matplotlib>=3.5.0", "seaborn>=0.12.0"]
geo = ["folium>=0.14.0"]
all = ["polars>=0.20.0", "matplotlib>=3.5.0", "seaborn>=0.12.0", "folium>=0.14.0"]

A user who just wants the data installs pip install akl-ped-counts. Someone who also wants the plotting scripts installs pip install "akl-ped-counts[plot]". Someone who wants everything installs pip install "akl-ped-counts[all]". This pattern keeps the core install lightweight.
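An extra only helps at runtime if the code that needs it imports it lazily and fails with a clear message when it is missing. The following is a minimal sketch of one common way to do that. require_optional is a hypothetical name, and the real polars_loader.py may handle a missing Polars install differently.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or explain which extra installs it.

    Hypothetical helper: not taken from the akl-ped-counts source.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Point the user at the matching [project.optional-dependencies] group.
        raise ImportError(
            f"{module_name} is not installed. "
            f'Install it with: pip install "akl-ped-counts[{extra}]"'
        ) from err
```

A Polars-backed function could then call require_optional("polars", "polars") inside its body rather than at the top of the module, so that import akl_ped_counts keeps working for users who only installed the core package.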

The wheel target tells Hatchling which folder to package. Because data/ sits inside src/akl_ped_counts/, the CSV files are bundled into the wheel along with the Python code.

[tool.hatch.build.targets.wheel]
packages = ["src/akl_ped_counts"]
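To check that the CSV files really ended up in the wheel after uv build, you can list the wheel's contents, since a wheel is an ordinary zip archive. This is a minimal sketch. packaged_data_files is a hypothetical helper, and it assumes the wheel stores the package at akl_ped_counts/ (Hatchling drops the src/ prefix when packages points at src/akl_ped_counts).

```python
import sys
import zipfile
from pathlib import Path


def packaged_data_files(wheel: Path, package: str = "akl_ped_counts") -> list:
    """Return the sorted names of files under <package>/data/ in a wheel."""
    prefix = f"{package}/data/"
    with zipfile.ZipFile(wheel) as zf:
        return sorted(name for name in zf.namelist() if name.startswith(prefix))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_wheel.py dist/akl_ped_counts-X.Y.Z-py3-none-any.whl
    for name in packaged_data_files(Path(sys.argv[1])):
        print(name)
```

If the list comes back empty, the data files were left out of the build and users will hit a missing-file error on their first call.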

A look at the public API in __init__.py

The package re-exports its main functions so users can write from akl_ped_counts import load_hourly rather than from akl_ped_counts.loader import load_hourly. This is exactly the pattern shown in Chapter 4.1.
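The real __init__.py is in the repo. To see the re-export pattern work end to end, the sketch below writes a toy two-file package into a temporary folder and imports from its root. The package name toy_ped_counts and both file contents are hypothetical stand-ins, not copies of the akl-ped-counts source.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical __init__.py in the style described above: re-export the
# public functions and list them in __all__.
INIT_PY = textwrap.dedent(
    '''
    """Hourly pedestrian counts for Auckland CBD sensors."""
    from .loader import list_sensors, load_hourly, load_locations

    __all__ = ["list_sensors", "load_hourly", "load_locations"]
    '''
)

# A toy loader.py so the re-export has something to import.
LOADER_PY = textwrap.dedent(
    '''
    def list_sensors():
        return ["107 Quay Street", "183 K Road"]


    def load_hourly():
        return None


    def load_locations():
        return None
    '''
)

# Write the toy package into a temporary folder and put it on sys.path.
root = Path(tempfile.mkdtemp())
package_dir = root / "toy_ped_counts"
package_dir.mkdir()
(package_dir / "__init__.py").write_text(INIT_PY)
(package_dir / "loader.py").write_text(LOADER_PY)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Users import from the package root; loader.py stays an internal detail.
from toy_ped_counts import list_sensors

print(list_sensors())
```

Because users only ever import from the package root, you can later split loader.py into several modules without breaking anyone's code, as long as __init__.py keeps re-exporting the same names.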

A look at one function in loader.py

The simplest function is list_sensors. Open src/akl_ped_counts/loader.py in the repo and find it. It is roughly this.

SENSORS: list[str] = [
    "107 Quay Street",
    "188 Quay Street Lower Albert (EW)",
    # ... 19 more entries ...
    "183 K Road",
]


def list_sensors() -> list[str]:
    """Return the canonical list of all 21 sensor location names.

    Returns
    -------
    list of str
        Sensor location names in geographic order (north to south).

    Examples
    --------
    >>> from akl_ped_counts import list_sensors
    >>> sensors = list_sensors()
    >>> len(sensors)
    21
    """
    return SENSORS.copy()

The function does almost nothing. It returns a copy of a constant list. What makes it a good function to ship is the surrounding work.

  • The list itself is curated and corrected (for example, “59 High Stret” was fixed to “59 High Street”).
  • The order is meaningful (north to south).
  • The docstring tells the user what to expect, including the count.
  • The return type is annotated.

This is the level of polish to aim for in your own package.
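The SENSORS.copy() call is easy to overlook. The sketch below uses a shortened two-entry list and two hypothetical function names to show what the copy protects against: a caller who modifies the list they were given.

```python
# Shortened stand-in for the real 21-entry constant.
SENSORS: list = ["107 Quay Street", "183 K Road"]


def list_sensors_unsafe() -> list:
    # Hands out the module-level list itself.
    return SENSORS


def list_sensors_safe() -> list:
    # Hands out a new list on every call, as list_sensors does.
    return SENSORS.copy()


mine = list_sensors_safe()
mine.append("Not a real sensor")
print(len(SENSORS))  # 2: the constant is untouched

theirs = list_sensors_unsafe()
theirs.append("Not a real sensor")
print(len(SENSORS))  # 3: the caller has changed the package's own list
```

With the unsafe version, every later call in the same session would return the corrupted list, which is a hard bug to trace back to its cause.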

A look at the data loading function

load_hourly is the workhorse. It accepts optional filters for years and sensors, returns a clean pandas DataFrame, and uses importlib.resources to find the CSV file inside the installed package.

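from importlib.resources import files

import pandas as pd


def _data_path(filename):
    # Sketch of the private helper: locate a file in the installed
    # package's data/ folder (the repo's own helper may differ).
    return files("akl_ped_counts") / "data" / filename


def load_hourly(years=None, sensors=None, dropna=False):
    """Load hourly pedestrian counts.

    Returns a DataFrame with columns date, hour, year, plus one
    column per sensor location.
    """
    path = _data_path("hourly_counts.csv")
    df = pd.read_csv(path, parse_dates=["date"])

    if years is not None:
        # Keep only rows from the requested years.
        df = df[df["year"].isin(years)]
    if sensors is not None:
        # Keep the time columns plus the requested sensor columns.
        df = df[["date", "hour", "year"] + list(sensors)]
    if dropna:
        # Drop any hour with a missing value in the remaining columns.
        df = df.dropna()

    return df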

(See the actual code in the repo for the full version with type hints and validation.)

Two patterns to copy.

  • The defaults work without arguments. A new user can run load_hourly() and get something useful.
  • The same function handles common filters. Years and sensors are common slices, so they live in the function rather than forcing every user to write the same boilerplate.
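Because load_hourly reads the packaged CSV, it cannot run without the package installed. To see what the three filters do, the sketch below applies the same steps to a small DataFrame. filter_hourly is a hypothetical helper, and the numbers are made up for illustration, not real counts.

```python
import pandas as pd


def filter_hourly(df, years=None, sensors=None, dropna=False):
    # Hypothetical helper: the same three filters as load_hourly,
    # applied to a DataFrame passed in instead of the packaged CSV.
    if years is not None:
        df = df[df["year"].isin(years)]
    if sensors is not None:
        df = df[["date", "hour", "year"] + list(sensors)]
    if dropna:
        df = df.dropna()
    return df


# Made-up three-row table in the same column layout.
toy = pd.DataFrame(
    {
        "date": pd.to_datetime(["2023-12-31", "2024-01-01", "2024-01-01"]),
        "hour": [23, 0, 1],
        "year": [2023, 2024, 2024],
        "107 Quay Street": [120.0, 95.0, None],
        "183 K Road": [40.0, 60.0, 30.0],
    }
)

print(filter_hourly(toy).shape)  # (3, 5): no arguments, everything back
subset = filter_hourly(toy, years=[2024], sensors=["107 Quay Street"], dropna=True)
print(subset.shape)  # (1, 4)
```

Note the order: the sensor filter runs before dropna, so a gap in a sensor you did not ask for does not cost you that hour.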

How the package is built and published

The full instructions are in CONTRIBUTING.md in the repo. The short version below mirrors what you will do in Lab Week 10 and Week 12.

# 1. Build
uv build
# Produces dist/akl_ped_counts-X.Y.Z-py3-none-any.whl and a .tar.gz

# 2. Publish to TestPyPI first
uv publish --publish-url https://test.pypi.org/legacy/

# 3. Verify the install in a clean environment
uv venv /tmp/test-install
source /tmp/test-install/bin/activate
uv pip install \
    --index-url https://test.pypi.org/simple/ \
    --extra-index-url https://pypi.org/simple/ \
    akl-ped-counts

# 4. Once that works, publish to real PyPI
uv publish

The two-step rehearsal (TestPyPI first, then PyPI) is the same pattern you should follow for Assignment 3.
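Once step 3 has installed the package into the fresh environment, a short Python check run inside that environment confirms which version arrived and that the import works. The helper below is a hypothetical sketch built on the standard library's importlib.metadata; it is not part of CONTRIBUTING.md.

```python
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str, import_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if missing.

    dist_name is the PyPI name (hyphens); import_name is the Python
    package name (underscores).
    """
    try:
        found = version(dist_name)
    except PackageNotFoundError:
        return None
    # Importing the package shows the wheel actually contains the code.
    import_module(import_name)
    return found


if __name__ == "__main__":
    print(installed_version("akl-ped-counts", "akl_ped_counts"))
```

If the printed version is not the one you just uploaded, the installer picked up an older release from another index, and the rehearsal has not yet tested your new build.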

What to copy and what to adapt

When you start your own package, the following parts of akl-ped-counts are worth copying directly.

  • The src/ layout. No surprises.
  • The thin __init__.py that re-exports the public functions.
  • The optional dependencies pattern. Keep the core install small.
  • The README structure. Install command, quick start, API reference, examples, data sources, and licence.

Things you will adapt for your own work include the following.

  • The function set. Your package will solve a different problem, so the functions will be different.
  • The data. If your data is large, download it on first use rather than shipping it inside the package.
  • The licence. akl-ped-counts uses CC BY 4.0 because it ships data. A package that ships only code typically uses MIT or BSD-3-Clause.
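The advice to download large data on first use can be sketched with the standard library alone. The function name fetch_data, the package name mypkg and the cache folder ~/.cache/mypkg are all hypothetical; akl-ped-counts itself does not need this because its data ships inside the wheel.

```python
import urllib.request
from pathlib import Path
from typing import Optional


def fetch_data(url: str, filename: str, cache_dir: Optional[Path] = None) -> Path:
    """Return a local copy of filename, downloading it from url only once.

    Hypothetical helper for an imaginary package "mypkg".
    """
    if cache_dir is None:
        # Hypothetical per-user cache folder.
        cache_dir = Path.home() / ".cache" / "mypkg"
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    target = cache_dir / filename
    if not target.exists():
        # Download under a temporary name, then rename, so an interrupted
        # download never leaves a half-written file that looks complete.
        partial = target.with_name(target.name + ".part")
        urllib.request.urlretrieve(url, partial)
        partial.replace(target)
    return target
```

A loader would then call something like pd.read_csv(fetch_data(url, "big.csv")), where url points at wherever you host the file. Later calls find the cached copy and skip the download.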

A reading exercise

Spend 20 minutes with the repo and answer these questions in your own notes.

  1. What does the [project.urls] section in pyproject.toml do?
  2. Why does the data loading function use importlib.resources rather than a hard-coded file path?
  3. polars_loader.py mirrors loader.py but returns Polars DataFrames. Why might you ship both?
  4. The README links a heatmap image (examples/heatmap_hour_dow.png). Where in the repo is the script that produced it, and what would you change to apply the same heatmap to a different sensor?

These are the same questions you will face when designing your own package. Working through them on a real repo is faster than working through them in the abstract.

Summary

You have walked through a real, published Python package. The patterns from Chapter 4.1 (the src/ layout, pyproject.toml, a thin __init__.py) and from Chapter 4.2 (small, focused functions with docstrings) are all there.

For your own Assignment 3, model your scope and structure on akl-ped-counts. Aim for one clear purpose, a small core, and a short README that someone can read in 30 seconds.

Next: the chapter on Publishing to PyPI walks through the production publication step in more detail.

Further reading

  • akl-ped-counts source, https://github.com/dataandcrowd/akl-ped-counts
  • akl-ped-counts on PyPI, https://pypi.org/project/akl-ped-counts/
  • CONTRIBUTING.md (build and publish steps), https://github.com/dataandcrowd/akl-ped-counts/blob/main/CONTRIBUTING.md
  • Hatchling build backend, https://hatch.pypa.io/latest/
  • importlib.resources, https://docs.python.org/3/library/importlib.resources.html