4.1 Package fundamentals: structure, configuration, and building

What this chapter is for

A Python package is a folder of code that you (and others) can install with pip install. This chapter walks you through the smallest possible example, end to end, in about 30 minutes. Once that works, the rest of the chapter is a short reference you can come back to.

By the end you will be able to do the following.

  1. Create a small package called geohello-yourname.
  2. Add one function to it.
  3. Build it on your own machine.
  4. Install it locally and import it from another folder.

Everything else (optional dependencies, advanced configuration, CI/CD) comes later. Today is about the minimum that works.

Quick start in five steps

We will use uv for this. If you have already used uv earlier in the course, this should feel familiar.

Step 1. Initialise

Pick a working folder and run the following.

mkdir geohello-yourname
cd geohello-yourname
uv init --lib --name geohello-yourname .

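You will see a folder that looks roughly like this. uv may also add hidden files such as .python-version and .gitignore, and if tests/ is missing in your version of uv, create it yourself with mkdir tests.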

geohello-yourname/
├── src/
│   └── geohello_yourname/
│       ├── __init__.py
│       └── py.typed
├── tests/
├── pyproject.toml
└── README.md

Step 2. Write one function

Open src/geohello_yourname/__init__.py in your editor and replace its contents with the following.

"""A simple geospatial greeting package."""

__version__ = "0.1.0"


def hello(place="Auckland"):
    """Return a greeting for a place."""
    return f"Hello from {place}!"

That is the entire package. One function, one docstring, one version string.

Step 3. Look at pyproject.toml

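uv init --lib already wrote a working pyproject.toml for you. Open it. It looks roughly like this. Recent uv releases may write uv_build instead of hatchling in the [build-system] table; if yours does, replace that table with the two hatchling lines shown here, because the rest of this chapter assumes hatchling.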

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "geohello-yourname"
version = "0.1.0"
description = "A simple geospatial greeting"
readme = "README.md"
requires-python = ">=3.10"
dependencies = []

You only need to know two things about this file right now. The [project] section describes what your package is. The [build-system] section tells build tools such as uv and pip which backend (here, hatchling) turns your source into a wheel. We will revisit the other fields in later chapters.

Step 4. Build the package

Run this in the project root.

uv build

You should see two new files inside a dist/ folder.

dist/
├── geohello_yourname-0.1.0-py3-none-any.whl
└── geohello_yourname-0.1.0.tar.gz

The .whl file is the wheel, a pre-built file that installs quickly. The .tar.gz file is the source distribution, which contains the source code as a fallback. For a pure Python package like this, the wheel is what most people will install.
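The wheel's filename is not arbitrary. It follows the pattern name-version-pythontag-abitag-platformtag.whl, with an optional build tag after the version. The sketch below splits a filename into those parts; parse_wheel_filename is a hypothetical helper written for this chapter, not part of uv or pip.

```python
def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel filename into its name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of parts in {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build_tag,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


tags = parse_wheel_filename("geohello_yourname-0.1.0-py3-none-any.whl")
print(tags)
```

Here py3 means any Python 3 interpreter, none means no particular ABI is required, and any means any operating system. That combination is why a single wheel is enough for a pure Python package.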

Step 5. Try it locally

Two quick checks, both run from the project root. Each should print Hello from Auckland! as its first line of output.

# Run the function directly with uv
uv run python -c "from geohello_yourname import hello; print(hello())"
# Or open an interactive Python session
uv run python
>>> from geohello_yourname import hello
>>> print(hello())
Hello from Auckland!
>>> print(hello("Tokyo"))
Hello from Tokyo!

That is a complete package. You wrote one function, you built it, and you ran it.

What just happened

The uv build step did three things.

  1. It read pyproject.toml and worked out the package name, version, and build backend.
  2. It packed the project into a source distribution (the .tar.gz file).
  3. It built the wheel from that source distribution, bundling the contents of src/geohello_yourname/.

Anyone with the wheel file can now install it with pip install path/to/geohello_yourname-0.1.0-py3-none-any.whl. In Week 10 we will upload this to TestPyPI so the install step becomes pip install geohello-yourname from anywhere.

Quick reference: the parts of a package

You do not need to memorise this. Use it later as a checklist.

Module vs package

A module is a single Python file.

# metrics.py is a module
def calculate_density(gdf):
    pass

A package is a folder of modules with an __init__.py file.

geohello_yourname/
├── __init__.py
├── data.py
└── metrics.py

The src/ layout

Modern Python packaging puts the package inside an src/ folder.

geohello-yourname/
├── src/
│   └── geohello_yourname/
│       ├── __init__.py
│       └── core.py
├── tests/
├── pyproject.toml
└── README.md

Why bother with src/? It forces tests to use the installed package rather than accidentally importing from the local folder, which catches bugs earlier. You do not need to remember why right now. Just trust that uv init --lib does the right thing.

Naming conventions

There are two names to be aware of.

  • The PyPI name in pyproject.toml, written with hyphens (geohello-yourname).
  • The Python package name, the folder under src/, written with underscores (geohello_yourname).

PyPI name      ->  pip install geohello-yourname
Python name    ->  from geohello_yourname import hello

uv init --lib --name geohello-yourname . sets both of these correctly.
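The mapping between the two names can be sketched in a few lines. Package indexes treat runs of hyphens, underscores, and dots as the same character and ignore case when comparing names. Turning hyphens into underscores for the import name is only the convention this chapter follows, not a rule (scikit-learn, for example, is imported as sklearn). Both function names below are made up for this illustration.

```python
import re


def normalise_distribution_name(name: str) -> str:
    """Normalise a distribution name the way package indexes compare names."""
    # Runs of '-', '_' and '.' are equivalent and collapse to a single hyphen.
    return re.sub(r"[-_.]+", "-", name).lower()


def default_import_name(distribution_name: str) -> str:
    """Apply the chapter's convention: hyphens in the PyPI name become underscores."""
    return normalise_distribution_name(distribution_name).replace("-", "_")


print(normalise_distribution_name("Geohello_Yourname"))
print(default_import_name("geohello-yourname"))
```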

What goes in __init__.py

The smallest version is a docstring and a version string.

"""A simple geospatial greeting package."""

__version__ = "0.1.0"

Once your package has multiple modules, you can re-export the most commonly used functions so users do not have to type long imports.

"""A small Auckland GIS toolkit."""

__version__ = "0.1.0"

from .data import load_sa2
from .metrics import calculate_density

__all__ = ["load_sa2", "calculate_density"]

Then a user can write from your_package import load_sa2 rather than from your_package.data import load_sa2.

Going a little further: organising modules

When your package grows beyond one function, split it into modules by purpose.

src/
└── your_package/
    ├── __init__.py
    ├── data.py        # loading and validation
    ├── metrics.py     # calculations
    └── viz.py         # plotting

Each file should have one clear purpose. A common mistake is to call modules things like utilities.py or helpers.py, which tend to attract every leftover function and become impossible to navigate. Prefer names that describe the topic.

Where does the data live?

Almost every Assignment 3 package needs data. Real Auckland data, not synthetic toys. There are two patterns to choose from, and most packages will use both.

The first is to bundle the data inside the package. This is what akl-ped-counts does with its 8 MB of pedestrian count CSVs. Good for small reference data, sensor lists, schemas, lookup tables, and anything under about 10 MB.

The second is to fetch the data from an API on first use, then cache locally. This is what most A3 briefs will need for SA2 boundaries, NZDep scores, LINZ canopy layers, and similar Stats NZ or LINZ datasets that are too large to ship inside a wheel.

Pattern 1: bundling small data

The convention is a data/ folder inside the package, not at the top level of the repository.

your-package/
├── pyproject.toml
├── README.md
└── src/
    └── your_package/
        ├── __init__.py
        ├── loader.py
        └── data/
            ├── sensors.csv
            └── zones.geojson

This is exactly where akl-ped-counts puts hourly_counts.csv and locations.csv. Same idea, same place.

To find the file at runtime, use importlib.resources rather than a hard-coded path. The path of the data file changes when the package is installed (it ends up inside the user’s site-packages folder), so you cannot rely on "data/sensors.csv" working.

from importlib import resources
import geopandas as gpd
import pandas as pd


def _data_path(filename: str) -> str:
    """Resolve a path to a bundled data file."""
    ref = resources.files("your_package") / "data" / filename
    return str(ref)


def load_sensors():
    return pd.read_csv(_data_path("sensors.csv"))


def load_zones():
    return gpd.read_file(_data_path("zones.geojson"))

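This works whether the package is installed from PyPI, from a wheel, or in editable mode (uv pip install -e .). Converting the resource to a string path assumes the package is unpacked on disk, which is the case for normal pip and uv installs.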
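For plain text files there is a variant that never needs a filesystem path at all. The sketch below reads a bundled CSV through the resource object itself and parses it with the standard-library csv module, so it does not depend on pandas. The helper name read_bundled_csv and the data/ subfolder argument are assumptions that mirror the layout above.

```python
import csv
import io
from importlib import resources


def read_bundled_csv(package: str, filename: str) -> list[dict]:
    """Read a CSV from the data/ folder of an importable package into a list of dicts."""
    resource = resources.files(package) / "data" / filename
    # read_text() belongs to the resource interface, so no path string is required.
    text = resource.read_text(encoding="utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Example call, once your package is installed:
# rows = read_bundled_csv("your_package", "sensors.csv")
```

Every value comes back as a string, so convert numeric columns yourself or hand the text to pandas when you need typed columns.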

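There is one more step that catches many people the first time. Bundled data does not always ship. Whether non-Python files reach the wheel depends on the build backend and its configuration. Hatchling normally includes every file in the package folder that Git does not ignore, while some other backends need to be told about data files, so it is safest to state the file selection explicitly. With hatchling (the backend used in this chapter), add the following to pyproject.toml.

[tool.hatch.build.targets.sdist]
include = [
    "src/your_package/**",
    "README.md",
    "LICENSE",
]

[tool.hatch.build.targets.wheel]
packages = ["src/your_package"]

The ** glob picks up everything under src/your_package/, including the data/ folder, and the packages entry puts the whole package folder into the wheel. If the CSVs still go missing from the wheel (a .gitignore rule such as *.csv is a common cause), pip install looks fine, but load_sensors() raises FileNotFoundError at runtime.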

After running uv build, sanity-check by unzipping the .whl file and confirming that the data/ folder is inside.

unzip -l dist/your_package-0.1.0-py3-none-any.whl
# Look for your_package/data/sensors.csv in the listing.
# Paths inside the wheel start at the package folder, without the src/ prefix.
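If unzip is not available on your machine, the same check can be done from Python, because a wheel is an ordinary zip archive. The sketch below lists the files stored in a wheel, optionally filtered by a path prefix. wheel_members is a hypothetical helper, and the wheel path in the comment is the example filename from above.

```python
import zipfile


def wheel_members(wheel_path: str, prefix: str = "") -> list[str]:
    """List the files stored in a wheel, optionally keeping only paths under prefix."""
    # A wheel is a zip archive, so the standard zipfile module can open it.
    with zipfile.ZipFile(wheel_path) as archive:
        return [name for name in archive.namelist() if name.startswith(prefix)]


# Example call after uv build:
# wheel_members("dist/your_package-0.1.0-py3-none-any.whl", "your_package/data/")
```

An empty list for the data/ prefix means the data files did not make it into the wheel.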

When to bundle, in plain language.

  • Yes. Bundle small reference data: sensor metadata, NZDep2018 scores (a few hundred KB), schema files, static lookup tables.
  • Maybe. Bundle a CSV under 10 MB that does not change between releases.
  • No. Do not bundle anything over 10 MB, especially SA2 boundary GeoPackages or LINZ shapefiles. Fetch those instead.
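The 10 MB rule of thumb is easy to check before you build. The sketch below adds up the size of every file under a folder and compares it with a limit. Both function names are made up for this example, and the default limit of 10 MB is taken from the guidance above rather than from any packaging tool.

```python
from pathlib import Path


def folder_size_mb(folder) -> float:
    """Return the combined size of every file under folder, in megabytes (10**6 bytes)."""
    files = (path for path in Path(folder).rglob("*") if path.is_file())
    return sum(path.stat().st_size for path in files) / 1_000_000


def small_enough_to_bundle(folder, limit_mb: float = 10.0) -> bool:
    """Apply the rule of thumb: bundle only when the data stays within limit_mb."""
    return folder_size_mb(folder) <= limit_mb


# Example call from the project root:
# small_enough_to_bundle("src/your_package/data")
```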

Pattern 2: fetching from the Stats NZ API

Stats NZ runs a Koordinates-based portal at https://datafinder.stats.govt.nz/. It exposes Web Feature Service (WFS) endpoints for vector layers, which means you can download a layer with a single HTTP call from your package.

The trade-off is one API call on first use. The benefit is a tiny package and data that is current at the moment it is downloaded.

You will need an API key. Sign up at the Datafinder, go to your profile, then “API keys”, and create one. The free tier is plenty for any A3 package.

Never hard-code the key in your package source. Use an environment variable instead.

import os


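def _get_api_key():
    """Read the Stats NZ API key from the environment."""
    key = os.getenv("STATS_NZ_API_KEY")
    # Treat an unset variable and an empty string the same way.
    if not key:
        raise EnvironmentError(
            "Set STATS_NZ_API_KEY in your environment "
            "or .env file before calling this function."
        )
    return key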

Users (and you) then run with the key set in the shell.

export STATS_NZ_API_KEY=4f1a...
uv run python -c "from your_package import load_sa2_auckland; load_sa2_auckland()"

The download itself is a single GET request to the WFS endpoint. The example below pulls SA2 boundaries inside a 50 km by 50 km bounding box (in NZTM, EPSG:2193) that covers central Auckland.

import requests
import geopandas as gpd
from io import BytesIO

WFS_URL = (
    "https://datafinder.stats.govt.nz/services;"
    "key={key}/wfs"
)


def load_sa2_auckland(api_key=None):
    """Download Auckland SA2 boundaries from Stats NZ."""
    api_key = api_key or _get_api_key()
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": "layer-111228",       # SA2 2023 generalised
        "outputFormat": "json",
        "srsName": "EPSG:2193",
        "bbox": "1740000,5900000,1790000,5950000,EPSG:2193",
    }
    response = requests.get(
        WFS_URL.format(key=api_key),
        params=params,
        timeout=30,
    )
    response.raise_for_status()
    return gpd.read_file(BytesIO(response.content))
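When a request fails, it helps to see the exact URL being sent. The sketch below builds the same GetFeature URL with the standard library only, without touching the network. build_wfs_url is a hypothetical helper, and the query parameters are copied from the example above rather than checked against the Stats NZ documentation.

```python
from urllib.parse import urlencode

WFS_URL = "https://datafinder.stats.govt.nz/services;key={key}/wfs"


def build_wfs_url(api_key: str, layer_id: str, bbox: tuple) -> str:
    """Build the full GetFeature URL for a layer and bounding box without sending it."""
    min_x, min_y, max_x, max_y = bbox
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": layer_id,
        "outputFormat": "json",
        "srsName": "EPSG:2193",
        "bbox": f"{min_x},{min_y},{max_x},{max_y},EPSG:2193",
    }
    return WFS_URL.format(key=api_key) + "?" + urlencode(params)


# Use a placeholder key when printing: the real key is part of the URL.
url = build_wfs_url("YOUR_KEY", "layer-111228", (1740000, 5900000, 1790000, 5950000))
print(url)
```

Paste the printed URL (with your real key substituted) into a browser to see the server's response directly. Avoid writing the real URL to logs or notebooks, because it contains your key.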

Calling this function once works. Calling it ten times is wasteful. Add a small cache so the API is only hit on first use.

from pathlib import Path
import geopandas as gpd

CACHE_DIR = Path.home() / ".cache" / "your_package"


def load_sa2_cached(api_key=None):
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cache_file = CACHE_DIR / "sa2_auckland.gpkg"

    if cache_file.exists():
        return gpd.read_file(cache_file)

    gdf = load_sa2_auckland(api_key)
    gdf.to_file(cache_file, driver="GPKG")
    return gdf

The first call hits the API and writes a GeoPackage to ~/.cache/your_package/. Every later call reads the local file, so delete that file when you need a fresh download. Your demo notebook stays fast and the user only needs network access once.
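A cache that is only checked for existence never expires. If your data changes over time, one option is to treat the cached file as stale after a fixed number of days. The sketch below checks the file's modification time. cache_is_fresh is a hypothetical helper, and the 30-day default is an arbitrary assumption, not a Stats NZ recommendation.

```python
import time
from pathlib import Path


def cache_is_fresh(cache_file: Path, max_age_days: float = 30.0) -> bool:
    """Return True if cache_file exists and was last written within max_age_days."""
    if not cache_file.exists():
        return False
    age_seconds = time.time() - cache_file.stat().st_mtime
    return age_seconds <= max_age_days * 24 * 60 * 60


# Example check against a cache location like the one used in this chapter:
print(cache_is_fresh(Path.home() / ".cache" / "your_package" / "sa2_auckland.gpkg"))
```

To use it, test cache_is_fresh(cache_file) in place of cache_file.exists() inside the caching function, so an old file triggers a new download.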

The same pattern works for the LINZ Data Service (https://data.linz.govt.nz/) by changing the host and the layer ID.

Choosing the pattern for your A3 brief

Each brief leans towards one pattern, sometimes both.

  • equitransport: bundle NZDep2018 scores (a few hundred KB) and fetch SA2 geometries from Stats NZ.
  • treecrown-nz and escooter-akl: mostly fetch, because LINZ canopy layers and e-scooter trip CSVs are too large to bundle.
  • pedestrian-exposure: fetch routes with OSMnx and bundle a small fixture route for tests.
  • pavescore: fetch images from Mapillary at runtime and bundle a single small fixture image for tests.

The rule of thumb is to pick the lightest option that works. A package that ships 200 MB of shapefiles is a package nobody installs.

Adding dependencies

If your function uses geopandas, you must list it in pyproject.toml so that pip install brings it in too.

[project]
dependencies = [
    "geopandas>=0.14.0",
    "pandas>=2.0.0",
]

You can also separate optional dependencies, which users only install if they want a particular feature. The akl-ped-counts package does this for plotting and mapping.

[project.optional-dependencies]
plot = ["matplotlib>=3.7.0", "seaborn>=0.13.0"]
geo = ["folium>=0.15.0"]
all = ["akl-ped-counts[plot,geo]"]

A user can then choose pip install "your-package[plot]" rather than installing every dependency. The quotes stop shells such as zsh from treating the square brackets as a glob pattern.
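Inside your package, code that relies on an optional dependency should fail with a helpful message when that dependency is missing. The sketch below shows one common way to do that. require_optional is a hypothetical helper, and your-package and the plot extra are placeholders matching the example above.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str, package: str = "your-package") -> ModuleType:
    """Import an optional dependency, or explain which extra provides it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this feature. "
            f'Install it with: pip install "{package}[{extra}]"'
        ) from exc


# Example use inside a plotting function:
# plt = require_optional("matplotlib.pyplot", extra="plot")
```

Call it inside the function that needs the extra, not at the top of the module, so that importing your package still works for users who skipped the extra.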

You do not need any of this for geohello-yourname. We are listing it here so you know it exists when you start your real Assignment 3 package.

Common build issues and fixes

ModuleNotFoundError: No module named 'your_package'. Check that the folder under src/ exists and contains an __init__.py. The folder name must match the import name (with underscores).

Wrong package name. The name field in pyproject.toml uses hyphens. The folder under src/ uses underscores. They look different on purpose.

Missing dependencies at install time. Add the missing package to the dependencies list in pyproject.toml and rebuild.

FileNotFoundError on a bundled CSV after install. Your data/ folder is missing from the wheel. Add the [tool.hatch.build.targets.wheel] and [tool.hatch.build.targets.sdist] blocks shown above, check that .gitignore does not exclude the data files, and rebuild. Confirm by unzipping the .whl and looking for the file.

STATS_NZ_API_KEY is None at runtime. The environment variable is not set in the shell that runs the function. Run export STATS_NZ_API_KEY=... before uv run (or use a .env file plus a loader like python-dotenv).

If something else goes wrong, run uv build again and read the first error message carefully. The most common fix is a small typo in pyproject.toml.

Week 9 lab

The Week 9 lab is essentially what you just did, but with screenshots, time limits, and a short reflection. Bring the geohello-yourname folder you created above and we will polish it together.

In Week 10 you will add tests, write a README, and publish to TestPyPI. In Week 11 you will use the poster showcase to explain the design of your real Assignment 3 package. In Week 12 you will publish that real package to PyPI.

Summary

You have learnt how to do the following.

  1. Create a small package by running uv init --lib --name geohello-yourname . inside the project folder.
  2. Add one function inside src/geohello_yourname/__init__.py.
  3. Build it with uv build.
  4. Install it locally and use it.

You have also met the parts of pyproject.toml and the difference between PyPI and Python package names. That is enough to move on.

Next steps. In the next chapter on Testing and Code Quality, you will learn to test, document, and publish to TestPyPI.

Further reading

  • Python Packaging Guide, https://packaging.python.org/
  • PEP 621 (pyproject.toml), https://peps.python.org/pep-0621/
  • uv documentation, https://docs.astral.sh/uv/
  • src/ layout explanation, https://hynek.me/articles/testing-packaging/
  • Semantic Versioning, https://semver.org/
  • importlib.resources, https://docs.python.org/3/library/importlib.resources.html
  • Hatchling build configuration, https://hatch.pypa.io/latest/config/build/
  • Stats NZ Datafinder, https://datafinder.stats.govt.nz/
  • LINZ Data Service, https://data.linz.govt.nz/

Practice exercises

  1. Rename your package. Change geohello-yourname to a name of your choice. Update pyproject.toml and the folder under src/. Rebuild and check it still works.
  2. Add a second function. Write bye(place="Auckland") that returns "Goodbye from Auckland!". Re-export it from __init__.py and try it.
  3. Add a dependency. Add pandas to dependencies in pyproject.toml. Rebuild. Confirm pandas installs alongside your package.
  4. Bundle a small CSV. Add a data/ folder under src/your_package/ containing a tiny sensors.csv (3 to 5 rows). Add a load_sensors() function that uses importlib.resources to read it. Add the hatchling build targets shown in this chapter, rebuild, and confirm the CSV is inside the wheel.
  5. Fetch from Stats NZ. Sign up at https://datafinder.stats.govt.nz/, generate an API key, and add a load_sa2_auckland() function based on the WFS example. Set STATS_NZ_API_KEY in your shell and confirm the function returns a sensible GeoDataFrame.
  6. Add caching. Wrap your Stats NZ download with the cache pattern shown in this chapter. Run the function twice and confirm the second call is much faster.
  7. Read a real one. Open the akl-ped-counts repo at https://github.com/dataandcrowd/akl-ped-counts and find its pyproject.toml. Compare it to yours. What is the same? What is different? Pay particular attention to the [tool.hatch.build.targets] blocks.