4.1 Package fundamentals: structure, configuration, and building
What this chapter is for
A Python package is a folder of code that you (and others) can install with pip install. This chapter walks you through the smallest possible example, end to end, in about 30 minutes. Once that works, the rest of the chapter is a short reference you can come back to.
By the end you will be able to do the following.
- Create a small package called
geohello-yourname. - Add one function to it.
- Build it on your own machine.
- Install it locally and import it from another folder.
Everything else, optional dependencies, advanced configuration, CI/CD, comes later. Today is about the minimum that works.
Quick start in five steps
We will use uv for this. If you have already used uv earlier in the course, this should feel familiar.
Step 1. Initialise
Pick a working folder and run the following.
mkdir geohello-yourname
uv init --lib --name geohello-yourname .You will see a folder that looks like this.
geohello-yourname/
├── src/
│ └── geohello_yourname/
│ ├── __init__.py
│ └── py.typed
├── tests/
├── pyproject.toml
└── README.md
Step 2. Write one function
Open src/geohello_yourname/__init__.py in your editor and replace it with the following.
"""A simple geospatial greeting package."""
__version__ = "0.1.0"
def hello(place="Auckland"):
"""Return a greeting for a place."""
return f"Hello from {place}!"That is the entire package. One function, one docstring, one version string.
Step 3. Look at pyproject.toml
uv init --lib already wrote a working pyproject.toml for you. Open it. It looks roughly like this.
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[project]
name = "geohello-yourname"
version = "0.1.0"
description = "A simple geospatial greeting"
readme = "README.md"
requires-python = ">=3.10"
dependencies = []You only need to know two things about this file right now. The [project] section describes what your package is. The [build-system] section tells uv how to build it. We will revisit the other fields in later chapters.
Step 4. Build the package
Run this in the project root.
uv buildYou should see two new files inside a dist/ folder.
dist/
├── geohello_yourname-0.1.0-py3-none-any.whl
└── geohello_yourname-0.1.0.tar.gz
The .whl file is the wheel, a pre-built file that installs quickly. The .tar.gz file is the source distribution, which contains the source code as a fallback. For a pure Python package like this, the wheel is what most people will install.
Step 5. Try it locally
Two quick checks. Both should print Hello from Auckland!.
# Run the function directly with uv
uv run python -c "from geohello_yourname import hello; print(hello())"# Or open an interactive Python session
uv run python
>>> from geohello_yourname import hello
>>> print(hello())
Hello from Auckland!
>>> print(hello("Tokyo"))
Hello from Tokyo!That is a complete package. You wrote one function, you built it, and you ran it.
What just happened
The uv build step did three things.
- It read
pyproject.tomland worked out the package name and version. - It bundled the contents of
src/geohello_yourname/into a wheel file. - It produced a source distribution as a backup.
Anyone with the wheel file can now install it with pip install path/to/geohello_yourname-0.1.0-py3-none-any.whl. In Week 10 we will upload this to TestPyPI so the install step becomes pip install geohello-yourname from anywhere.
Quick reference: the parts of a package
You do not need to memorise this. Use it later as a checklist.
Module vs package
A module is a single Python file.
# metrics.py is a module
def calculate_density(gdf):
passA package is a folder of modules with an __init__.py file.
geohello_yourname/
├── __init__.py
├── data.py
└── metrics.py
The src/ layout
Modern Python packaging puts the package inside an src/ folder.
geohello-yourname/
├── src/
│ └── geohello_yourname/
│ ├── __init__.py
│ └── core.py
├── tests/
├── pyproject.toml
└── README.md
Why bother with src/? It forces tests to use the installed package rather than accidentally importing from the local folder, which catches bugs earlier. You do not need to remember why right now. Just trust that uv init --lib does the right thing.
Naming conventions
There are two names to be aware of.
- The PyPI name in
pyproject.toml, written with hyphens (geohello-yourname). - The Python package name, the folder under
src/, written with underscores (geohello_yourname).
PyPI name -> pip install geohello-yourname
Python name -> from geohello_yourname import hello
uv init --lib --name geohello-yourname . sets both of these correctly.
What goes in __init__.py
The smallest version is a docstring and a version string.
"""A simple geospatial greeting package."""
__version__ = "0.1.0"Once your package has multiple modules, you can re-export the most common ones so users do not have to type long imports.
"""A small Auckland GIS toolkit."""
__version__ = "0.1.0"
from .data import load_sa2
from .metrics import calculate_density
__all__ = ["load_sa2", "calculate_density"]Then a user can write from your_package import load_sa2 rather than from your_package.data import load_sa2.
Going a little further: organising modules
When your package grows beyond one function, split it into modules by purpose.
src/
└── your_package/
├── __init__.py
├── data.py # loading and validation
├── metrics.py # calculations
└── viz.py # plotting
Each file should have one clear purpose. A common mistake is to call modules things like utilities.py or helpers.py, which tend to attract every leftover function and become impossible to navigate. Prefer names that describe the topic.
Where does the data live?
Almost every Assignment 3 package needs data. Real Auckland data, not synthetic toys. There are two patterns to choose from, and most packages will use both.
The first is to bundle the data inside the package. This is what akl-ped-counts does with its 8 MB of pedestrian count CSVs. Good for small reference data, sensor lists, schemas, lookup tables, and anything under about 10 MB.
The second is to fetch the data from an API on first use, then cache locally. This is what most A3 briefs will need for SA2 boundaries, NZDep scores, LINZ canopy layers, and similar Stats NZ or LINZ datasets that are too large to ship inside a wheel.
Pattern 1: bundling small data
The convention is a data/ folder inside the package, not at the top level of the repository.
your-package/
├── pyproject.toml
├── README.md
└── src/
└── your_package/
├── __init__.py
├── loader.py
└── data/
├── sensors.csv
└── zones.geojson
This is exactly where akl-ped-counts puts hourly_counts.csv and locations.csv. Same idea, same place.
To find the file at runtime, use importlib.resources rather than a hard-coded path. The path of the data file changes when the package is installed (it ends up inside the user’s site-packages folder), so you cannot rely on "data/sensors.csv" working.
from importlib import resources
import geopandas as gpd
import pandas as pd
def _data_path(filename: str) -> str:
"""Resolve a path to a bundled data file."""
ref = resources.files("your_package") / "data" / filename
return str(ref)
def load_sensors():
return pd.read_csv(_data_path("sensors.csv"))
def load_zones():
return gpd.read_file(_data_path("zones.geojson"))This works whether the package is installed from PyPI, from a wheel, or in editable mode (uv pip install -e .).
There is one more step that catches almost everyone the first time. Bundled data does not ship by default. You must tell the build tool to include it. With hatchling (which uv init --lib configures by default), add the following to pyproject.toml.
[tool.hatch.build.targets.sdist]
include = [
"src/your_package/**",
"README.md",
"LICENSE",
]
[tool.hatch.build.targets.wheel]
packages = ["src/your_package"]The ** glob picks up everything under src/your_package/, including the data/ folder. Without these lines, your CSVs are missing from the wheel. pip install looks fine, but load_sensors() raises FileNotFoundError at runtime.
After running uv build, sanity-check by unzipping the .whl file and confirming that the data/ folder is inside.
unzip -l dist/your_package-0.1.0-py3-none-any.whl
# Look for src/your_package/data/sensors.csv in the listing.When to bundle, in plain language. Yes, bundle small reference data: sensor metadata, NZDep2018 scores (a few hundred KB), schema files, static lookup tables. Maybe, bundle a CSV under 10 MB that does not change between releases. No, do not bundle anything over 10 MB, especially SA2 boundary GeoPackages or LINZ shapefiles. Fetch those instead.
Pattern 2: fetching from the Stats NZ API
Stats NZ runs a Koordinates-based portal at https://datafinder.stats.govt.nz/. It exposes Web Feature Service (WFS) endpoints for vector layers, which means you can download a layer with a single HTTP call from your package.
The trade-off is one API call on first use. The benefit is a tiny package and always-fresh data.
You will need an API key. Sign up at the Datafinder, go to your profile, then “API keys”, and create one. The free tier is plenty for any A3 package.
Never hard-code the key in your package source. Use an environment variable instead.
import os
def _get_api_key():
key = os.getenv("STATS_NZ_API_KEY")
if key is None:
raise EnvironmentError(
"Set STATS_NZ_API_KEY in your environment "
"or .env file before calling this function."
)
return keyUsers (and you) then run with the key set in the shell.
export STATS_NZ_API_KEY=4f1a...
uv run python -c "from your_package import load_sa2_auckland; load_sa2_auckland()"The download itself is a single GET request to the WFS endpoint. The example below pulls Auckland SA2 boundaries for a CBD bounding box.
import requests
import geopandas as gpd
from io import BytesIO
WFS_URL = (
"https://datafinder.stats.govt.nz/services;"
"key={key}/wfs"
)
def load_sa2_auckland(api_key=None):
"""Download Auckland SA2 boundaries from Stats NZ."""
api_key = api_key or _get_api_key()
params = {
"service": "WFS",
"version": "2.0.0",
"request": "GetFeature",
"typeNames": "layer-111228", # SA2 2023 generalised
"outputFormat": "json",
"srsName": "EPSG:2193",
"bbox": "1740000,5900000,1790000,5950000,EPSG:2193",
}
response = requests.get(
WFS_URL.format(key=api_key),
params=params,
timeout=30,
)
response.raise_for_status()
return gpd.read_file(BytesIO(response.content))Calling this function once works. Calling it ten times is wasteful. Add a small cache so the API is only hit on first use.
from pathlib import Path
import geopandas as gpd
CACHE_DIR = Path.home() / ".cache" / "your_package"
def load_sa2_cached(api_key=None):
CACHE_DIR.mkdir(parents=True, exist_ok=True)
cache_file = CACHE_DIR / "sa2_auckland.gpkg"
if cache_file.exists():
return gpd.read_file(cache_file)
gdf = load_sa2_auckland(api_key)
gdf.to_file(cache_file, driver="GPKG")
return gdfThe first call hits the API and writes a GeoPackage to ~/.cache/your_package/. Every later call reads the local file. Your demo notebook stays fast and the user only needs network access once.
The same pattern works for the LINZ Data Service (https://data.linz.govt.nz/) by changing the host and the layer ID.
Choosing the pattern for your A3 brief
Each brief leans towards one pattern, sometimes both.
equitransport benefits from bundling NZDep2018 scores (a few hundred KB) and fetching SA2 geometries from Stats NZ. treecrown-nz and escooter-akl will mostly fetch (LINZ canopy, e-scooter trip CSVs are too large to bundle). pedestrian-exposure will fetch routes from OSMnx and bundle a small fixture route for tests. pavescore will fetch images from Mapillary at runtime and bundle a single small fixture image for tests.
The rule of thumb is to pick the lightest option that works. A package that ships 200 MB of shapefiles is a package nobody installs.
Adding dependencies
If your function uses geopandas, you must list it in pyproject.toml so that pip install brings it in too.
[project]
dependencies = [
"geopandas>=0.14.0",
"pandas>=2.0.0",
]You can also separate optional dependencies, which users only install if they want a particular feature. The akl-ped-counts package does this for plotting and mapping.
[project.optional-dependencies]
plot = ["matplotlib>=3.7.0", "seaborn>=0.13.0"]
geo = ["folium>=0.15.0"]
all = ["akl-ped-counts[plot,geo]"]A user can then choose pip install your-package[plot] rather than installing every dependency.
You do not need any of this for geohello-yourname. We are listing it here so you know it exists when you start your real Assignment 3 package.
Common build issues and fixes
ImportError: No module named your_package. Check that the folder under src/ exists and contains an __init__.py. The folder name must match the import name (with underscores).
Wrong package name. The name field in pyproject.toml uses hyphens. The folder under src/ uses underscores. They look different on purpose.
Missing dependencies at install time. Add the missing package to the dependencies list in pyproject.toml and rebuild.
FileNotFoundError on a bundled CSV after install. Your data/ folder is missing from the wheel. Add the [tool.hatch.build.targets.wheel] and [tool.hatch.build.targets.sdist] blocks shown above and rebuild. Confirm by unzipping the .whl and looking for the file.
STATS_NZ_API_KEY is None at runtime. The environment variable is not set in the shell that runs the function. Run export STATS_NZ_API_KEY=... before uv run (or use a .env file plus a loader like python-dotenv).
If something else goes wrong, run uv build again and read the first error message carefully. The most common fix is a small typo in pyproject.toml.
Week 9 lab
The Week 9 lab is essentially what you just did, but with screenshots, time limits, and a short reflection. Bring the geohello-yourname folder you created above and we will polish it together.
In Week 10 you will add tests, write a README, and publish to TestPyPI. In Week 11 you will use the poster showcase to explain the design of your real Assignment 3 package. In Week 12 you will publish that real package to PyPI.
Summary
You have learnt how to do the following.
- Create a small package with
uv init --lib --name geohello-yourname .. - Add one function inside
src/your_package/__init__.py. - Build it with
uv build. - Install it locally and use it.
You have also met the parts of pyproject.toml and the difference between PyPI and Python package names. That is enough to move on.
Next steps. In the next chapter on Testing and Code Quality, you will learn to test, document, and publish to TestPyPI.
Further reading
- Python Packaging Guide, https://packaging.python.org/
- PEP 621 (
pyproject.toml), https://peps.python.org/pep-0621/ - uv documentation, https://docs.astral.sh/uv/
src/layout explanation, https://hynek.me/articles/testing-packaging/- Semantic Versioning, https://semver.org/
importlib.resources, https://docs.python.org/3/library/importlib.resources.html- Hatchling build configuration, https://hatch.pypa.io/latest/config/build/
- Stats NZ Datafinder, https://datafinder.stats.govt.nz/
- LINZ Data Service, https://data.linz.govt.nz/
Practise exercises
- Rename your package. Change
geohello-yournameto a name of your choice. Updatepyproject.tomland the folder undersrc/. Rebuild and check it still works. - Add a second function. Write
bye(place="Auckland")that returns"Goodbye from Auckland!". Re-export it from__init__.pyand try it. - Add a dependency. Add
pandastodependenciesinpyproject.toml. Rebuild. Confirmpandasinstalls alongside your package. - Bundle a small CSV. Add a
data/folder undersrc/your_package/containing a tinysensors.csv(3 to 5 rows). Add aload_sensors()function that usesimportlib.resourcesto read it. Add the hatchling build targets shown in this chapter, rebuild, and confirm the CSV is inside the wheel. - Fetch from Stats NZ. Sign up at https://datafinder.stats.govt.nz/, generate an API key, and add a
load_sa2_auckland()function based on the WFS example. SetSTATS_NZ_API_KEYin your shell and confirm the function returns a sensible GeoDataFrame. - Add caching. Wrap your Stats NZ download with the cache pattern shown in this chapter. Run the function twice and confirm the second call is much faster.
- Read a real one. Open the
akl-ped-countsrepo at https://github.com/dataandcrowd/akl-ped-counts and find itspyproject.toml. Compare it to yours. What is the same? What is different? Pay particular attention to the[tool.hatch.build.targets]blocks.