4.2 Testing and code quality

What this chapter is for

In Week 9 you built a small package and ran it once. That is not enough. As soon as you change the code, or someone else uses it, you want a quick way to confirm the package still does what you said it did.

The simplest way to do that in Python is pytest. This chapter walks through the very basics. We do not try to cover every feature, just enough to write your first three tests and run them.

By the end you will be able to do the following.

  1. Add pytest to your package.
  2. Write three small tests.
  3. Run them with one command.
  4. Recognise the most common kinds of geospatial test.

The full step-by-step task list, including publishing to TestPyPI, is in Lab Week 10. Refer to that file when you sit down to work through the exercise.

What is pytest?

pytest is a testing tool that finds files named test_*.py (or *_test.py), runs every function inside them whose name starts with test_, and tells you which ones pass or fail. That is the whole idea.

You write a function. You add an assert line that says what should be true. If the assertion fails, the test fails.

def test_two_plus_two():
    assert 2 + 2 == 4

If you save that as tests/test_basics.py and run uv run pytest (once pytest is installed; see Step 1 below), you will see one passing test.

A simple module to test

Before writing tests, you need something worth testing. Open src/geohello_yourname/__init__.py and add the following function next to hello().

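import pandas as pd  # put this at the top of __init__.py if it is not already there

def filter_by_month(df, month: int):
    """Return rows for a given month (1-12)."""
    # Values that cannot be parsed as dates become NaT instead of raising an error
    dates = pd.to_datetime(df["Date"], errors="coerce")
    return df[dates.dt.month == month]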

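The errors="coerce" argument is important. Real-world CSV files often contain non-date values (such as "Daylight Savings" or blank rows). Without it, pd.to_datetime raises an error as soon as it meets a value it cannot parse. With it, those values become NaT (not a time), and because NaT never matches a month number, the filter leaves those rows out.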
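To see the effect of errors="coerce" in isolation, here is a minimal sketch. The rows are made up for illustration and are not taken from the course data, and the sketch assumes pandas is installed.

```python
import pandas as pd

# Illustrative rows only: two valid dates, one stray label, one blank.
raw = pd.DataFrame({
    "Date": ["2024-01-15", "Daylight Savings", None, "2024-03-10"],
    "Count": [100, 0, 0, 150],
})

# Unparseable or missing values become NaT rather than raising an error.
dates = pd.to_datetime(raw["Date"], errors="coerce")
n_unparsed = int(dates.isna().sum())

# NaT has no month, so the comparison is False for those rows and they drop out.
march = raw[dates.dt.month == 3]

print(n_unparsed, len(march))
```

If you remove errors="coerce" from this sketch, the call should fail on the "Daylight Savings" row, which is the behaviour the function is guarding against.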

Check that it works. Create a file called demo_filter.py in your project root.

# demo_filter.py
from geohello_yourname import load_data, filter_by_month

df = load_data()
print("All rows:", len(df))
print("March only:", len(filter_by_month(df, month=3)))

Then run it.

uv run python demo_filter.py

You now have two functions worth testing: hello() and filter_by_month().

What is an assertion?

An assertion states a condition that must be true at that point in the programme. In Python, assert evaluates the expression that follows it and raises AssertionError if the result is false. Ideally, each test has only one reason to fail.

assert 2 + 2 == 4        # passes — condition is True
assert 2 + 2 == 5        # fails  — condition is False

If assert passes, nothing happens. If it fails, pytest reports exactly which line failed and what the values were.
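The sketch below shows the same behaviour in plain Python, without pytest. The message after the comma is optional and is only there to make the failure readable. The variable name outcome is made up for this example.

```python
# A passing assertion is silent: execution simply continues.
assert 2 + 2 == 4

# A failing assertion raises AssertionError. pytest catches this for you;
# here we catch it by hand to see what happened.
try:
    assert 2 + 2 == 5, "2 + 2 should not equal 5"
except AssertionError as err:
    outcome = f"failed: {err}"
else:
    outcome = "passed"

print(outcome)
```

In a real test you would not write the try/except yourself. You write only the assert line and let pytest record the failure.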

Try it now: a three-line test

Let’s add this to your geohello-yourname package from Week 9.

Step 1. Install pytest

uv add --dev pytest

This adds pytest to your development dependencies in pyproject.toml. After running the command, open pyproject.toml and you will see a new section near the bottom.

[dependency-groups]
dev = [
    "pytest>=8.0",
]

The --dev flag keeps pytest out of the main dependencies list under [project]. Users who install your package do not need pytest; only you (the developer) need it for running tests.

Step 2. Write one test

Create the file tests/test_geohello.py with this content.

"""Tests for geohello-yourname."""

from geohello_yourname import hello


def test_hello_default():
    result = hello()
    assert "Auckland" in result

The from geohello_yourname import hello line imports your function. The assert line checks that the result contains the word “Auckland”. That is the entire test.

Step 3. Run it

uv run pytest

You should see something like the following.

============================= test session starts ==============================
collected 1 item

tests/test_geohello.py .                                                 [100%]

============================== 1 passed in 0.05s ===============================

A single dot . next to the file name means one passing test. You can also run uv run pytest -v (short for --verbose), which shows each test name and its result instead of just a dot.

tests/test_geohello.py::test_hello_default PASSED                        [100%]

Use -v when you want to see exactly which test passed or failed.

That is your first test. Everything else in this chapter is variations on this idea.

What happens when a test fails?

If your test had failed, you would see an F instead of a dot, with a clear explanation. For example:

def test_hello_wrong_city():
    result = hello()
    assert "Wellington" in result   # wrong on purpose

pytest would then report the following.

FAILED test_geohello.py::test_hello_wrong_city

    def test_hello_wrong_city():
        result = hello()
>       assert "Wellington" in result
E       AssertionError: assert 'Wellington' in 'Hello from Auckland!'

pytest shows the exact line, the expected value, and the actual value. No guessing.

Two more tests, slightly more interesting

def test_hello_custom_place():
    assert hello("Tokyo") == "Hello from Tokyo!"


def test_hello_returns_string():
    assert isinstance(hello(), str)

Add these to the same file and rerun uv run pytest. You should now see three dots and 3 passed.

Some informal guidance.

  • One assertion per test is a good default. If a test fails, the failure should tell you exactly what is wrong.
  • Test names should describe the behaviour, not the function. test_hello_default is better than test_function_1.
  • Run the tests often. They are cheap.

Why this matters more for geospatial code

Geospatial functions can fail in quiet, hard-to-spot ways. Three short examples.

A spatial join silently drops zones.

result = gpd.sjoin(boundaries, census, predicate='intersects')
# sjoin defaults to how='inner', so boundaries with no matching census row are dropped.
# len(boundaries) was 100, len(result) is 95.
# No error is raised. You only notice when the map looks odd.

Distance is computed in the wrong CRS.

gdf_4326 = boundaries.to_crs('EPSG:4326')
distance = gdf_4326.geometry[0].distance(gdf_4326.geometry[1])
# Returns 0.05 (degrees), not 5000 (metres).
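To make the unit problem concrete without any geospatial libraries, here is a rough sketch using only the standard library. The two points are hypothetical, and the haversine formula treats the Earth as a sphere, so the metre value is an approximation. In a real project you would reproject to a projected CRS with to_crs instead.

```python
import math


def degree_distance(lon1, lat1, lon2, lat2):
    """Straight-line distance in coordinate units (degrees for EPSG:4326)."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


def haversine_m(lon1, lat1, lon2, lat2, radius_m=6_371_000):
    """Great-circle distance in metres on a sphere (an approximation)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(h))


# Two hypothetical points 0.05 degrees of longitude apart, near Auckland's latitude.
p1 = (174.76, -36.85)
p2 = (174.81, -36.85)

in_degrees = degree_distance(*p1, *p2)
in_metres = haversine_m(*p1, *p2)
print(f"{in_degrees:.2f} degrees, roughly {in_metres:.0f} m")
```

The first number is tiny and has no physical meaning as a length. The second is in the thousands. A test that asserts a distance falls in a plausible metre range would catch the mix-up.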

A negative buffer produces an empty geometry.

buffered = point.buffer(-100)
# buffered is empty. Downstream code computes zeros instead of failing.

Tests that you write once and run forever will catch these.

Three small geospatial tests you might write

You do not need to memorise these. They are templates you can copy when your package starts doing real spatial work.

A test that confirms the CRS.

def test_load_sa2_has_correct_crs():
    gdf = load_sa2('tests/fixtures/sample_sa2.gpkg')
    assert gdf.crs == 'EPSG:2193'

A test that confirms a join did not lose rows.

def test_join_keeps_all_zones():
    result = validate_join(boundaries, census, key='SA2_code')
    assert len(result) == len(boundaries)
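The chapter does not show validate_join itself, so the following is only one hypothetical way it could be written. It uses plain pandas DataFrames with made-up values rather than GeoDataFrames, and it assumes each key appears once per table.

```python
import pandas as pd


def validate_join(left, right, key):
    """Left-join right onto left and fail loudly if any left row has no match.

    Hypothetical implementation for illustration only.
    """
    merged = left.merge(
        right, on=key, how="left", indicator=True, validate="one_to_one"
    )
    # The indicator column records whether each row was found in both tables.
    unmatched = merged.loc[merged["_merge"] != "both", key].tolist()
    if unmatched:
        raise ValueError(f"No match in right table for {key}: {unmatched}")
    return merged.drop(columns="_merge")


# Made-up sample tables.
boundaries = pd.DataFrame({"SA2_code": ["A", "B", "C"], "name": ["Alpha", "Beta", "Gamma"]})
census = pd.DataFrame({"SA2_code": ["A", "B", "C"], "population": [1200, 800, 950]})

result = validate_join(boundaries, census, key="SA2_code")
print(len(result) == len(boundaries))
```

The design choice here is to raise an error instead of returning fewer rows, so a lost zone stops the analysis rather than quietly changing the map.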

A test that uses pytest.approx for floating-point area.

import geopandas as gpd
import pytest
from shapely.geometry import Polygon

def test_area_in_km2():
    poly = Polygon([(0, 0), (1000, 0), (1000, 1000), (0, 1000)])
    gdf = gpd.GeoDataFrame({'geometry': [poly]}, crs='EPSG:2193')
    area_km2 = gdf.geometry.area[0] / 1_000_000
    assert area_km2 == pytest.approx(1.0, rel=1e-3)

Floating-point comparisons need pytest.approx because most decimal fractions have no exact binary representation, so a computed result can differ from the value you expect in its last few digits.
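A quick way to see the problem is the classic 0.1 + 0.2 example. The sketch below uses math.isclose from the standard library as a stand-in for pytest.approx, so it runs without pytest. The tolerance value is an arbitrary choice for this example.

```python
import math

total = 0.1 + 0.2

# Exact comparison fails because total is stored as 0.30000000000000004.
exact_match = (total == 0.3)

# Comparison within a small relative tolerance succeeds.
close_match = math.isclose(total, 0.3, rel_tol=1e-9)

print(exact_match, close_match)
```

Inside a test, the equivalent line would be assert total == pytest.approx(0.3).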

Two pytest features you will quickly appreciate

You do not need these for your first test, but they are useful as your package grows. Here is a quick guide to when each pattern is most useful.

Pattern      When to use                                  Example
CRS guard    Your function loads or returns spatial data  Check the output is in EPSG:2193
Fixture      Multiple tests need the same input data      Create a small DataFrame once, reuse in 5 tests
Parametrise  Same logic, different inputs                 Test hello() with Auckland, Tokyo, Wellington

You pick the pattern after you know what your function takes in and gives back.

Fixtures. Reusable test data, defined once and, by default, built fresh for each test that asks for it, so one test cannot spoil the data for the next. Here is a fixture that creates a small DataFrame and uses it to test filter_by_month().

import pytest
import pandas as pd
from geohello_yourname import filter_by_month


@pytest.fixture
def sample_df():
    return pd.DataFrame({
        "Date": ["2024-01-15", "2024-01-20", "2024-03-10"],
        "Count": [100, 200, 150],
    })


def test_filter_returns_only_january(sample_df):
    result = filter_by_month(sample_df, month=1)
    assert len(result) == 2


def test_filter_empty_month(sample_df):
    result = filter_by_month(sample_df, month=7)
    assert len(result) == 0

When pytest sees sample_df as a parameter name, it looks for a fixture with that name and passes its return value into the test. You write the setup once and reuse it across as many tests as you like.

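Parametrise. Run the same test with several inputs. The decorator keeps pytest's own spelling, parametrize, and the test file needs import pytest at the top.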

@pytest.mark.parametrize("place,expected", [
    ("Auckland", "Hello from Auckland!"),
    ("Tokyo", "Hello from Tokyo!"),
    ("Wellington", "Hello from Wellington!"),
])
def test_hello_for_many_places(place, expected):
    assert hello(place) == expected

These are optional. Use them when they help, ignore them otherwise.

Where the lab work lives

The full hands-on exercise for this week, including TestPyPI publication, is in Lab Week 10. The lab walks through testing, documentation, and the publish step in order. Bring your geohello-yourname package from Week 9 to the lab.

Real-world example: how akl-ped-counts is tested

The akl-ped-counts package on GitHub uses pytest in the same way you have just seen. Its tests check things like the following.

  • The CSV files load successfully and return the expected number of rows.
  • list_sensors() returns 21 sensor names.
  • Filtering by year returns rows only from that year.

Open the repo at https://github.com/dataandcrowd/akl-ped-counts and look at the tests/ folder. The tests are short, focused, and easy to read. That is what we are aiming for in your package too.

Documenting with docstrings

Tests confirm that code works. Docstrings explain what it does. NumPy style is the most common in scientific Python.

def calculate_density(gdf, pop_column='population'):
    """Calculate population density per square kilometre.

    Parameters
    ----------
    gdf : GeoDataFrame
        Must contain geometry in a projected CRS and a population column.
    pop_column : str, default 'population'
        Name of the column with population counts.

    Returns
    -------
    GeoDataFrame
        The input with a new 'density' column (people per km²).
    """

The docstring has three short parts: a one-line summary, Parameters, and Returns. You can add Examples, Raises, and Notes later if useful.
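As a sketch of what the optional sections might look like, here is a small hypothetical helper, m2_to_km2, which is not part of the course package. Its Examples section is written in doctest format, so the standard library can run the example and check the output.

```python
import doctest


def m2_to_km2(area_m2):
    """Convert an area from square metres to square kilometres.

    Parameters
    ----------
    area_m2 : float
        Area in square metres. Must not be negative.

    Returns
    -------
    float
        The same area in square kilometres.

    Raises
    ------
    ValueError
        If area_m2 is negative.

    Examples
    --------
    >>> m2_to_km2(2_500_000)
    2.5
    """
    if area_m2 < 0:
        raise ValueError("area_m2 must not be negative")
    return area_m2 / 1_000_000


# Run the Examples section as a check; results.failed should be 0.
parser = doctest.DocTestParser()
test = parser.get_doctest(m2_to_km2.__doc__, {"m2_to_km2": m2_to_km2}, "m2_to_km2", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results.failed, results.attempted)
```

Keeping the example inside the docstring means the documentation and the check cannot drift apart unnoticed.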

A short README

Your package needs a short README that someone arriving from PyPI can read in 30 seconds. The minimum is the install command, a quick start, and one short paragraph describing the package. Look at akl-ped-counts for a longer example you can model yours on.

Publishing to TestPyPI

TestPyPI is a safe practice version of the Python Package Index. It is a separate site with its own accounts, and its contents may be pruned from time to time, so mistakes do not matter. Do this before the real PyPI publish.

Create an account and API token

  1. Register at https://test.pypi.org/account/register/.
  2. Verify your email.
  3. Log in and click your username (top right) → Account settings.
  4. Scroll to API tokens → click “Add API token”.
  5. Token name: uv-publish (or anything you like).
  6. Scope: “Entire account”.
  7. Click “Create token” and copy it immediately — it starts with pypi- and is shown only once.

Build and publish

# 1. Make sure the version in pyproject.toml is unique
# 2. Build
uv build

# 3. Publish — paste your token when prompted for password
uv publish --publish-url https://test.pypi.org/legacy/

When prompted, enter __token__ as the username and paste your pypi-... token as the password. Alternatively, set an environment variable to skip the prompt.

export UV_PUBLISH_TOKEN=pypi-YOUR_TOKEN_HERE
uv publish --publish-url https://test.pypi.org/legacy/

Verify the install

uv venv /tmp/test-install
source /tmp/test-install/bin/activate

uv pip install \
    --index-url https://test.pypi.org/simple/ \
    --extra-index-url https://pypi.org/simple/ \
    geohello-yourname

The --extra-index-url is needed because dependencies like pandas and geopandas are published on the real PyPI, and any copies on TestPyPI may be missing or out of date.

Common publication errors

  • The name is already taken. Choose a unique name (add your initials to the name field in pyproject.toml).
  • File already exists. Increment version in pyproject.toml.
  • Invalid or expired token. Generate a new TestPyPI token and try again.
  • Could not find a version that satisfies the requirement. Add --extra-index-url https://pypi.org/simple/ to the install command.

Summary

You have learnt how to do the following.

  1. Install pytest with uv add --dev pytest.
  2. Write a test by creating a function whose name starts with test_ and adding an assert.
  3. Run all tests with uv run pytest.
  4. Recognise common geospatial test patterns (CRS checks, join validation, area approximations).
  5. Document functions with NumPy-style docstrings.

The full hands-on workflow, including TestPyPI publication, is in Lab Week 10.

Next. In the next chapter on the Geospatial Package Example, we walk through akl-ped-counts as a complete real-world package.

Further reading

  • pytest documentation, https://docs.pytest.org/
  • Setting up testing with pytest and uv, https://pydevtools.com/handbook/tutorial/setting-up-testing-with-pytest-and-uv/
  • NumPy docstring guide, https://numpydoc.readthedocs.io/
  • uv publish guide, https://docs.astral.sh/uv/guides/publish/
  • TestPyPI, https://test.pypi.org/