4.2 Testing and Code Quality
Story problem
Preventing silent GIS errors
- A join drops 5 zones and nobody notices.
- A CRS mismatch shifts geometry and distorts distances.
- Tests catch these before the map reaches the report.
What you will learn
- Test join key validation for Auckland zones.
- Test CRS conversion behaviour.
- Test one indicator function end-to-end.
- Write comprehensive documentation with docstrings.
- Publish to TestPyPI safely.
- Troubleshoot common publication issues.
Why Testing Matters for Geospatial Code
The Silent Failures
Geospatial code fails in insidious ways:
Example 1: The Missing Zones
# Spatial join silently drops 5 SA2 zones
result = gpd.sjoin(boundaries, census, predicate='intersects')
# len(boundaries) = 100, len(result) = 95
# No error raised! 🚨
Example 2: The CRS Disaster
# Calculate distance in wrong CRS
gdf_4326 = boundaries.to_crs('EPSG:4326') # Geographic CRS
distance = gdf_4326.geometry[0].distance(gdf_4326.geometry[1])
# Returns 0.05 (degrees), not 5000 (metres)!
# Map looks right, but all distances are wrong 🚨
Example 3: The Geometry Confusion
# Empty geometry after buffer
point = Point(0, 0)
buffered = point.buffer(-100) # Negative buffer!
# Returns empty geometry, not an error
# Downstream analysis produces zeros 🚨
What Tests Catch
Tests save you from:
- Silent data loss (dropped rows, missing columns)
- Wrong coordinate systems
- Invalid geometries
- Incorrect calculations
- Breaking changes when you update code
Tests give you:
- Confidence your code works
- Documentation of expected behaviour
- Safety net for refactoring
- Reproducible analysis
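Example 2 can be reproduced without any GIS library. The sketch below is a rough illustration, not a replacement for proper reprojection: it compares the naive coordinate difference in degrees with a haversine great-circle distance in metres. The second point (0.05 degrees east of the Sky Tower coordinates used later in this section) and the spherical Earth radius are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; a spherical approximation


def degree_distance(lon1, lat1, lon2, lat2):
    """Naive Euclidean distance on raw lon/lat values (units: degrees)."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Sky Tower (WGS84) and a hypothetical point 0.05 degrees further east
sky_tower = (174.7633, -36.8485)
other = (174.8133, -36.8485)

print(degree_distance(*sky_tower, *other))  # a small number, in degrees
print(haversine_m(*sky_tower, *other))      # several kilometres, in metres
```

A test that asserts a plausible magnitude (thousands of metres, not hundredths of a unit) is usually enough to catch this class of bug.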
Introduction to pytest
Why pytest?
pytest is one of the most widely used Python testing frameworks:
- Simple to write tests
- Powerful assertion introspection
- Excellent plugin ecosystem
- Great for geospatial code
Installation
# Add to your package's dev dependencies
uv add --dev pytest pytest-cov
In pyproject.toml (recent uv releases record --dev packages in a dependency group):
[dependency-groups]
dev = [
    "pytest>=7.0",
    "pytest-cov>=4.0",
]
Your First Test
Create tests/test_geohello.py:
"""Tests for geohello package."""
from geohello import hello


def test_hello_default():
    """Test default greeting returns Auckland."""
    result = hello()
    assert "Auckland" in result
    assert "Hello from" in result


def test_hello_custom_place():
    """Test custom place name works."""
    result = hello("Tokyo")
    assert "Tokyo" in result
    assert result == "Hello from Tokyo!"


def test_hello_empty_string():
    """Test empty string defaults to Auckland."""
    result = hello("")
    assert "Auckland" in result
Running Tests
# Run all tests
uv run pytest
# Run with verbose output
uv run pytest -v
# Run specific test file
uv run pytest tests/test_geohello.py
# Run specific test function
uv run pytest tests/test_geohello.py::test_hello_default
# Run with coverage report
uv run pytest --cov=geohello --cov-report=term-missing
Output:
============================= test session starts ==============================
collected 3 items
tests/test_geohello.py ... [100%]
============================== 3 passed in 0.12s ===============================
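The three tests above pin down the behaviour of hello without showing it. A minimal implementation consistent with them might look like the following. The real geohello source is not shown in this section, so treat the handling of the empty string as an assumption inferred from the tests.

```python
def hello(place: str = "Auckland") -> str:
    """Return a greeting for ``place``, falling back to Auckland."""
    # An empty string is treated like a missing argument,
    # which is what test_hello_empty_string expects
    if not place:
        place = "Auckland"
    return f"Hello from {place}!"


print(hello())         # Hello from Auckland!
print(hello("Tokyo"))  # Hello from Tokyo!
```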
Testing Geospatial Operations
Test Structure
Organise tests by module:
tests/
├── __init__.py
├── test_data.py # Test data loading
├── test_metrics.py # Test calculations
├── test_viz.py # Test visualisations
└── fixtures/ # Test data files
    ├── sample_sa2.gpkg
    └── sample_census.csv
Testing Data Loading
# tests/test_data.py
import pytest
import geopandas as gpd
from shapely.geometry import Polygon

from auckland_gis.data import load_sa2, validate_geometry


def test_load_sa2_returns_geodataframe():
    """Test load_sa2 returns GeoDataFrame."""
    gdf = load_sa2('tests/fixtures/sample_sa2.gpkg')
    assert isinstance(gdf, gpd.GeoDataFrame)


def test_load_sa2_has_correct_crs():
    """Test loaded data has NZTM projection."""
    gdf = load_sa2('tests/fixtures/sample_sa2.gpkg')
    assert gdf.crs == 'EPSG:2193'


def test_load_sa2_has_required_columns():
    """Test required columns present."""
    gdf = load_sa2('tests/fixtures/sample_sa2.gpkg')
    required = ['SA2_code', 'SA2_name', 'geometry']
    for col in required:
        assert col in gdf.columns


def test_load_sa2_raises_on_missing_file():
    """Test appropriate error for missing file."""
    with pytest.raises(FileNotFoundError):
        load_sa2('nonexistent.gpkg')


def test_validate_geometry_catches_invalid():
    """Test geometry validation catches invalid geometries."""
    # Create invalid geometry (a self-intersecting "bow tie")
    invalid = Polygon([(0, 0), (1, 1), (1, 0), (0, 1), (0, 0)])
    gdf = gpd.GeoDataFrame({'geometry': [invalid]}, crs='EPSG:2193')
    is_valid, report = validate_geometry(gdf)
    assert not is_valid
    assert report['invalid_count'] == 1
Testing CRS Operations
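The tests in this subsection exercise two helpers, ensure_nztm and reproject_safe, whose implementations are not shown. The sketch below is one hypothetical way to write them, assuming only that the input exposes a crs attribute and a to_crs method, as a GeoDataFrame does. The FakeLayer class is a stand-in so the sketch runs without geopandas; it is not part of any library.

```python
NZTM = "EPSG:2193"


def reproject_safe(gdf, target_crs):
    """Reproject, refusing to guess when the source CRS is missing."""
    if gdf.crs is None:
        raise ValueError("CRS is not defined; set it before reprojecting")
    return gdf.to_crs(target_crs)


def ensure_nztm(gdf):
    """Return the data in NZTM, leaving it untouched if already there."""
    if gdf.crs == NZTM:
        return gdf
    return reproject_safe(gdf, NZTM)


class FakeLayer:
    """Hypothetical stand-in with the two members the helpers rely on."""

    def __init__(self, crs):
        self.crs = crs

    def to_crs(self, target_crs):
        # A real GeoDataFrame would also transform the coordinates
        return FakeLayer(target_crs)


print(ensure_nztm(FakeLayer("EPSG:4326")).crs)  # EPSG:2193
```

Raising on a missing CRS, instead of assuming one, is the design choice that test_reproject_safe_raises_on_missing_crs checks for.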
# tests/test_crs.py
import pytest
import geopandas as gpd
from shapely.geometry import Point

from auckland_gis.data import ensure_nztm, reproject_safe


def test_ensure_nztm_converts_wgs84():
    """Test conversion from WGS84 to NZTM."""
    # Auckland Sky Tower in WGS84
    gdf = gpd.GeoDataFrame(
        {'name': ['Sky Tower']},
        geometry=[Point(174.7633, -36.8485)],
        crs='EPSG:4326'
    )
    result = ensure_nztm(gdf)
    assert result.crs == 'EPSG:2193'
    # Check coordinates are in expected range (NZTM)
    x, y = result.geometry[0].x, result.geometry[0].y
    assert 1_700_000 < x < 1_800_000  # NZTM easting
    assert 5_900_000 < y < 6_000_000  # NZTM northing


def test_ensure_nztm_preserves_nztm():
    """Test NZTM data unchanged."""
    gdf = gpd.GeoDataFrame(
        {'name': ['Test']},
        geometry=[Point(1750000, 5920000)],
        crs='EPSG:2193'
    )
    result = ensure_nztm(gdf)
    assert result.crs == 'EPSG:2193'
    assert result.geometry[0].x == 1750000
    assert result.geometry[0].y == 5920000


def test_reproject_safe_raises_on_missing_crs():
    """Test error when CRS is None."""
    gdf = gpd.GeoDataFrame(
        {'name': ['Test']},
        geometry=[Point(0, 0)],
        crs=None  # No CRS!
    )
    with pytest.raises(ValueError, match="CRS is not defined"):
        reproject_safe(gdf, 'EPSG:2193')
Testing Spatial Operations
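The expected values in the density tests of this subsection follow from simple arithmetic: area in square metres divided by 1,000,000 gives square kilometres, and density is population over that area. The sketch below checks the numbers by hand with a shoelace-formula area. Here density_per_km2 is a hypothetical helper, not the calculate_density under test, and returning NaN for zero area is an assumed design choice.

```python
import math


def shoelace_area(coords):
    """Planar polygon area from (x, y) vertices, in squared input units."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:] + coords[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def density_per_km2(population, coords):
    """People per square kilometre; NaN when the polygon has no area."""
    area_km2 = shoelace_area(coords) / 1_000_000
    if area_km2 == 0:
        return math.nan
    return population / area_km2


square = [(0, 0), (1000, 0), (1000, 1000), (0, 1000)]  # 1 km by 1 km
sliver = [(0, 0), (1000, 0), (1000, 0)]                # collapsed to a line

print(density_per_km2(2000, square))  # 2000.0
print(density_per_km2(100, sliver))   # nan
```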
# tests/test_spatial.py
import math

import pytest
import geopandas as gpd
from shapely.geometry import Point, Polygon

from auckland_gis.metrics import calculate_density, points_in_polygon


def test_calculate_density():
    """Test density calculation."""
    # Create test polygon (1000m x 1000m = 1 km²)
    poly = Polygon([
        (0, 0), (1000, 0), (1000, 1000), (0, 1000), (0, 0)
    ])
    gdf = gpd.GeoDataFrame(
        {'population': [2000]},
        geometry=[poly],
        crs='EPSG:2193'  # NZTM (metres)
    )
    result = calculate_density(gdf)
    assert 'density' in result.columns
    # 2000 people / 1 km² = 2000
    assert result['density'].iloc[0] == pytest.approx(2000, rel=0.01)


def test_calculate_density_zero_area():
    """Test handling of zero-area geometries."""
    # Degenerate polygon (line)
    line = Polygon([(0, 0), (1000, 0), (1000, 0), (0, 0)])
    gdf = gpd.GeoDataFrame(
        {'population': [100]},
        geometry=[line],
        crs='EPSG:2193'
    )
    result = calculate_density(gdf)
    # Should handle gracefully (NaN or 0).
    # NaN never equals NaN, so use math.isnan rather than a membership test.
    density = result['density'].iloc[0]
    assert density == 0 or math.isnan(density)


def test_points_in_polygon():
    """Test point-in-polygon counting."""
    polygon = Polygon([(0, 0), (100, 0), (100, 100), (0, 100), (0, 0)])
    poly_gdf = gpd.GeoDataFrame(
        {'id': [1]},
        geometry=[polygon],
        crs='EPSG:2193'
    )
    # 3 points inside, 1 outside
    points = [
        Point(50, 50),    # Inside
        Point(25, 75),    # Inside
        Point(75, 25),    # Inside
        Point(200, 200),  # Outside
    ]
    points_gdf = gpd.GeoDataFrame(
        {'id': range(4)},
        geometry=points,
        crs='EPSG:2193'
    )
    result = points_in_polygon(poly_gdf, points_gdf)
    assert result['point_count'].iloc[0] == 3
Testing Spatial Joins
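The tests in this subsection expect validate_join to report unmatched keys and to reject duplicates. The key-checking part of that logic can be sketched with the standard library alone. The function name check_join_keys is hypothetical, and the real validate_join in auckland_gis (which also performs the merge) is not shown here; only the report fields are taken from the tests.

```python
from collections import Counter


def check_join_keys(left_keys, right_keys):
    """Compare join keys before merging and report what would be lost."""
    # Duplicate keys on the attribute side would multiply rows in a merge
    duplicates = sorted(k for k, n in Counter(right_keys).items() if n > 1)
    if duplicates:
        raise ValueError(f"Duplicate keys in right table: {duplicates}")
    # Keys present in the boundaries but absent from the attribute table
    unmatched = sorted(set(left_keys) - set(right_keys))
    return {
        "all_matched": not unmatched,
        "unmatched_count": len(unmatched),
        "unmatched_keys": unmatched,
    }


report = check_join_keys(["001", "002", "003"], ["001", "002"])
print(report)
# {'all_matched': False, 'unmatched_count': 1, 'unmatched_keys': ['003']}
```

Running a check like this before the merge turns the silent row loss from Example 1 into an explicit, testable report.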
# tests/test_joins.py
import pytest
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

from auckland_gis.data import validate_join


def test_validate_join_all_match():
    """Test successful join with all records matching."""
    boundaries = gpd.GeoDataFrame({
        'SA2_code': ['001', '002', '003'],
        'SA2_name': ['CBD', 'Newmarket', 'Ponsonby'],
        'geometry': [Point(0, 0), Point(1, 1), Point(2, 2)]
    }, crs='EPSG:2193')
    census = pd.DataFrame({
        'SA2_code': ['001', '002', '003'],
        'population': [1000, 2000, 1500]
    })
    result, report = validate_join(boundaries, census, key='SA2_code')
    assert report['all_matched']
    assert report['unmatched_count'] == 0
    assert len(result) == 3
    assert 'population' in result.columns


def test_validate_join_missing_zones():
    """Test detection of missing SA2 zones."""
    boundaries = gpd.GeoDataFrame({
        'SA2_code': ['001', '002', '003'],
        'SA2_name': ['CBD', 'Newmarket', 'Ponsonby'],
        'geometry': [Point(0, 0), Point(1, 1), Point(2, 2)]
    }, crs='EPSG:2193')
    # Census missing SA2_code 003
    census = pd.DataFrame({
        'SA2_code': ['001', '002'],
        'population': [1000, 2000]
    })
    result, report = validate_join(boundaries, census, key='SA2_code')
    assert not report['all_matched']
    assert report['unmatched_count'] == 1
    assert '003' in report['unmatched_keys']


def test_validate_join_duplicate_keys():
    """Test detection of duplicate keys."""
    boundaries = gpd.GeoDataFrame({
        'SA2_code': ['001', '002'],
        'geometry': [Point(0, 0), Point(1, 1)]
    }, crs='EPSG:2193')
    # Census has duplicate SA2_code
    census = pd.DataFrame({
        'SA2_code': ['001', '001', '002'],  # Duplicate!
        'population': [1000, 1100, 2000]
    })
    with pytest.raises(ValueError, match="Duplicate keys"):
        validate_join(boundaries, census, key='SA2_code')
pytest Features for Geospatial Testing
1. Parametrize: Test Multiple Cases
@pytest.mark.parametrize("crs_from,crs_to", [
    ('EPSG:4326', 'EPSG:2193'),
    ('EPSG:4326', 'EPSG:3857'),
    ('EPSG:2193', 'EPSG:4326'),
])
def test_reproject_multiple_crs(crs_from, crs_to):
    """Test reprojection between different CRS pairs."""
    # Build the point in WGS84 (lon/lat), then move it into the source CRS
    # so the coordinates are valid for every crs_from value
    gdf = gpd.GeoDataFrame(
        {'id': [1]},
        geometry=[Point(174.76, -36.85)],
        crs='EPSG:4326'
    ).to_crs(crs_from)
    result = gdf.to_crs(crs_to)
    assert result.crs == crs_to
2. Fixtures: Reusable Test Data
@pytest.fixture
def sample_sa2():
    """Fixture providing sample SA2 data."""
    return gpd.GeoDataFrame({
        'SA2_code': ['001', '002', '003'],
        'SA2_name': ['CBD', 'Newmarket', 'Ponsonby'],
        'geometry': [
            Polygon([(0, 0), (10, 0), (10, 10), (0, 10)]),
            Polygon([(10, 0), (20, 0), (20, 10), (10, 10)]),
            Polygon([(0, 10), (10, 10), (10, 20), (0, 20)]),
        ]
    }, crs='EPSG:2193')


def test_with_fixture(sample_sa2):
    """Test using fixture data."""
    assert len(sample_sa2) == 3
    assert sample_sa2.crs == 'EPSG:2193'
3. Approx: Floating Point Comparisons
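Exact equality is fragile for computed floats, and it never holds for NaN. The short demonstration below uses only the standard library; math.isclose stands in here for the tolerance-based comparison that pytest.approx provides inside tests.

```python
import math

# Binary floating point cannot represent 0.1 or 0.2 exactly
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True

# NaN never compares equal, not even to another NaN,
# so membership tests against a fresh NaN fail as well
value = float("nan")
print(value == float("nan"))         # False
print(value in [0, float("nan")])    # False
print(math.isnan(value))             # True
```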
def test_area_calculation():
    """Test area calculation with floating point tolerance."""
    poly = Polygon([(0, 0), (100, 0), (100, 100), (0, 100)])
    gdf = gpd.GeoDataFrame({'geometry': [poly]}, crs='EPSG:2193')
    area_km2 = gdf.geometry.area[0] / 1_000_000
    # Use approx for floating point comparison
    assert area_km2 == pytest.approx(0.01, rel=0.001)  # 0.01 km² ± 0.1%
4. Markers: Skip or Mark Tests
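The skipif example in this subsection relies on a HAS_INTERNET flag that the snippet does not define. One hypothetical way to compute it is a short TCP connection attempt. The host and port (a public DNS server on port 53) and the two-second timeout are arbitrary assumptions; substitute whatever endpoint your tests actually depend on.

```python
import socket


def has_internet(host="8.8.8.8", port=53, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks
        return False


# Evaluated once at import time, then reused by skipif conditions
HAS_INTERNET = has_internet()
print(HAS_INTERNET)
```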
@pytest.mark.slow
def test_large_dataset():
    """Test with large dataset (marked as slow)."""
    # This test takes minutes
    pass

# Run: pytest -m "not slow"  # Skip slow tests
# Register the custom "slow" marker in your pytest configuration
# to avoid an unknown-marker warning.

# HAS_INTERNET is a module-level bool that you define yourself
@pytest.mark.skipif(not HAS_INTERNET, reason="No internet connection")
def test_download_boundaries():
    """Test downloading data (requires internet)."""
    pass
Test Coverage
Measuring Coverage
# Run tests with coverage
uv run pytest --cov=geohello --cov-report=term-missing
# Generate HTML coverage report
uv run pytest --cov=geohello --cov-report=html
# Open htmlcov/index.html
Output:
----------- coverage: platform darwin, python 3.12.0 -----------
Name                       Stmts   Miss  Cover   Missing
---------------------------------------------------------
src/geohello/__init__.py       5      0   100%
src/geohello/core.py          23      2    91%   45-46
---------------------------------------------------------
TOTAL                         28      2    93%
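The Cover column in the report above is statement coverage: the share of statements that the tests executed. The arithmetic can be checked with a small helper; coverage_percent is a hypothetical name used only for this illustration, and the sketch ignores branch coverage.

```python
def coverage_percent(statements, missed):
    """Percentage of statements executed at least once by the tests."""
    return 100 * (statements - missed) / statements


print(round(coverage_percent(23, 2)))  # 91  (src/geohello/core.py)
print(round(coverage_percent(28, 2)))  # 93  (TOTAL)
print(coverage_percent(28, 2) > 80)    # True
```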
Target: >80% Coverage
For the Week 10 assessment, aim for >80% test coverage:
# Ensure all main functions are tested
import geohello


def test_all_public_functions():
    """Check every name in the public API has a matching test in this module."""
    # Simplified check: look for test_<name>... functions defined in this file
    test_names = [name for name in globals() if name.startswith("test_")]
    for func_name in geohello.__all__:
        prefix = f"test_{func_name}"
        assert any(name.startswith(prefix) for name in test_names), prefix
What to Test
✅ Do test:
- All public functions
- Edge cases (empty input, None values)
- Error conditions
- CRS handling
- Data validation
- Spatial operations
❌ Don’t test:
- Third-party libraries (e.g., geopandas)
- Simple getters/setters
- Trivial functions with no logic
Documentation with Docstrings
NumPy Style Docstrings
Use NumPy style for consistency with scientific Python:
def calculate_accessibility(
    origins_gdf,
    destinations_gdf,
    max_distance_m=800,
    crs='EPSG:2193'
):
    """
    Calculate accessibility scores based on nearby destinations.

    Counts the number of destinations within walking distance of each origin.
    Assumes planar coordinates in metres.

    Parameters
    ----------
    origins_gdf : GeoDataFrame
        Points representing origin locations (e.g., homes)
    destinations_gdf : GeoDataFrame
        Points representing destinations (e.g., shops, parks)
    max_distance_m : float, default 800
        Maximum walking distance in metres
    crs : str, default 'EPSG:2193'
        Coordinate reference system to use for distance calculations.
        Must be a projected CRS in metres.

    Returns
    -------
    GeoDataFrame
        Origins with added 'accessibility_score' column containing
        count of nearby destinations

    Raises
    ------
    ValueError
        If CRS is not projected (geographic coordinates not supported)

    Examples
    --------
    >>> homes = gpd.read_file('homes.gpkg')
    >>> shops = gpd.read_file('shops.gpkg')
    >>> result = calculate_accessibility(homes, shops, max_distance_m=400)
    >>> result['accessibility_score'].mean()
    3.2

    See Also
    --------
    calculate_density : Calculate population density
    buffer_analysis : General buffer-based analysis

    Notes
    -----
    Uses Euclidean distance. For network-based accessibility, use
    r5py routing instead.

    This function assumes both GeoDataFrames are in the same CRS.
    Reprojection is not performed automatically.
    """
    # Implementation here
    pass
README Documentation
Update README.md:
# geohello

A simple geospatial greeting package for learning Python packaging.

## Installation

```bash
pip install geohello-yourname
```

## Quick Start

```python
from geohello import hello

# Default greeting
print(hello())
# Output: Hello from Auckland!

# Custom location
print(hello("Tokyo"))
# Output: Hello from Tokyo!
```

## Features

- Simple greeting function
- Customizable location
- Well-tested and documented

## Development

```bash
# Clone repository
git clone https://github.com/yourusername/geohello.git
cd geohello

# Install the package with its dev dependencies
uv sync

# Run tests
uv run pytest

# Run tests with coverage
uv run pytest --cov=geohello
```

## Testing

```bash
pytest                  # Run all tests
pytest -v               # Verbose output
pytest --cov=geohello   # With coverage
```

## License

MIT License - see LICENSE file
Common Publication Errors
Error 1: Package Name Already Exists
HTTPError: 403 Forbidden
The name 'geohello' is already taken
Solution: Choose unique name
[project]
name = "geohello-yourname" # Add your name/initials
Error 2: Version Already Published
HTTPError: 400 Bad Request
File already exists
Solution: Increment version
[project]
version = "0.1.1" # Was 0.1.0
Error 3: Invalid Token
HTTPError: 403 Forbidden
Invalid or expired token
Solution: Generate a new token on TestPyPI and check that its scope covers the project you are uploading
Error 4: Missing Dependencies
ERROR: Could not find a version that satisfies the requirement geopandas
Solution: TestPyPI does not mirror every package on real PyPI, so dependencies such as geopandas may be missing there. For testing, either remove heavy dependencies or let pip fall back to real PyPI with --extra-index-url:
pip install --index-url https://test.pypi.org/simple/ \
    --extra-index-url https://pypi.org/simple/ \
    geohello-yourname
Week 10 Lab: Test, Document, and Publish
Lab Exercise (110 minutes)
Part 1: Add Tests (30 mins)
- Create tests/test_geohello.py
- Write 3-5 tests covering main functions
- Run tests: uv run pytest
- Check coverage: uv run pytest --cov=geohello
- Aim for >80% coverage
Part 2: Add Documentation (20 mins)
- Add NumPy-style docstrings to all functions
- Update README with installation and usage examples
- Add code examples to docstrings
Part 3: Publish to TestPyPI (40 mins)
- Create TestPyPI account
- Generate API token
- Build package: uv build
- Publish: uv publish --index testpypi --token TOKEN
- Troubleshoot any errors
Part 4: Verify Installation (20 mins)
- Create fresh environment
- Install from TestPyPI
- Test import and functionality
- Screenshot for submission
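The import check in Part 4 can also be scripted. The sketch below uses importlib.metadata from the standard library to ask whether a distribution is installed in the current environment. The helper name installed_version is hypothetical, and geohello-yourname is the placeholder distribution name used throughout this section.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Replace the placeholder with the name you published under
version = installed_version("geohello-yourname")
print(version if version else "geohello-yourname is not installed")
```

Run it inside the fresh environment after installing from TestPyPI; a printed version number confirms that the install succeeded.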
Week 10 Deliverable (5% of Grade)
Due: Friday 22 May, 5pm
Submit to Canvas:
- TestPyPI URL: Link to your package
  - Example: https://test.pypi.org/project/geohello-yourname/
- GitHub Repository: Link with:
  - Complete package code in src/ layout
  - Tests in tests/ directory (>80% coverage)
  - Comprehensive README
  - Clear commit history showing development
- Installation Screenshot: Shows:
  - Successful install from TestPyPI
  - Working import
  - Function execution
- Reflection (150-200 words):
  - What challenges did you encounter?
  - How did you resolve them?
  - What will you apply to Assignment 3?
Evaluation Criteria:
- Published to TestPyPI (2%): Visible and downloadable
- Tests passing (1%): ≥3 meaningful tests, >80% coverage
- Documentation complete (1%): Docstrings + README
- Installation verified (1%): Works in fresh environment
Assignment 3 Preview
Week 10’s work is practice for Assignment 3:
Assignment 3 (Week 12, 30% of grade):
- Publish to real PyPI (not TestPyPI)
- More comprehensive functionality
- Higher test coverage
- Professional documentation
- Optional: CI/CD pipeline
Suggested packages:
- Urban accessibility toolkit
- Pedestrian flow analysis
- Micromobility analytics
- Street network utilities
- 15-minute city tools
Start planning now—use Weeks 10-11 to build incrementally!
Summary
You’ve learned:
- Why testing matters: Prevents silent GIS failures
- pytest basics: Writing and running tests
- Geospatial testing: CRS, joins, spatial operations
- Documentation: NumPy docstrings and README
- TestPyPI: Safe practice publication
- Troubleshooting: Common errors and solutions
Next: The geospatial package example section walks through complete real-world package examples, including CI/CD.
Further Reading
- pytest documentation: https://docs.pytest.org/
- pytest-cov: https://pytest-cov.readthedocs.io/
- NumPy docstring guide: https://numpydoc.readthedocs.io/
- TestPyPI: https://test.pypi.org/
- uv publish guide: https://docs.astral.sh/uv/guides/publish/