2.5. Creating Functions
Introduction
Functions are fundamental building blocks in Python programming, and mastering them marks the transition from writing scripts to writing software. A function encapsulates a specific operation into a reusable, named block of code. This matters for four interconnected reasons. First, functions allow you to write code once and reuse it across multiple analyses, eliminating redundancy. Second, they make your analysis more readable by replacing complex operations with descriptive function names (e.g. calculate_walkability_score() is immediately clearer than 15 lines of inline arithmetic). Third, they reduce errors through standardisation, because once a function is tested and verified, every call to it produces consistent results. Fourth, they enable you to share functionality across projects and with collaborators, which is the foundation of the Python packaging workflow you will learn in Part 4.
In geospatial analysis specifically, functions are indispensable because the same operations recur across different datasets and study areas. Data cleaning, coordinate transformations, distance calculations, buffer generation, and map creation are tasks you will perform repeatedly, and wrapping them in well-documented functions saves time and reduces the risk of introducing errors through copy-paste modifications. Sevtsuk’s open-source Madina library for pedestrian trip modelling, for example, is structured entirely as a collection of functions and classes that users call with different parameters for different cities and scenarios.
Why Create Functions?
Consider this scenario: you need to calculate population density for multiple regions:
Without functions (repetitive):
# Calculate density for Auckland
auckland_density = auckland_pop / auckland_area
# Calculate density for Wellington
wellington_density = wellington_pop / wellington_area
# Calculate density for Christchurch
christchurch_density = christchurch_pop / christchurch_areaWith a function (reusable):
def calculate_density(population, area):
"""Calculate population density."""
return population / area
auckland_density = calculate_density(auckland_pop, auckland_area)
wellington_density = calculate_density(wellington_pop, wellington_area)
christchurch_density = calculate_density(christchurch_pop, christchurch_area)Function Basics
Defining Functions
The basic syntax for creating a function:
def function_name(parameter1, parameter2):
"""
Docstring: Brief description of what the function does.
Parameters:
-----------
parameter1 : type
Description of parameter1
parameter2 : type
Description of parameter2
Returns:
--------
type
Description of return value
"""
# Function body
result = parameter1 + parameter2
return resultA function definition has five key components. The def keyword signals to Python that you are defining a function. The function_name should be descriptive and follow Python’s naming convention of lowercase words separated by underscores (e.g. calculate_buffer_area rather than CalcBufArea). The parameters in parentheses define the inputs your function accepts. The docstring (the triple-quoted text immediately after the function signature) documents what the function does, what it expects, and what it returns. Finally, the return statement specifies the output. Both parameters and return statements are optional, but functions without them are less useful in analytical workflows.
Simple Example
def greet(name):
"""Return a greeting message."""
return f"Hello, {name}!"
# Use the function
message = greet("Auckland")
print(message) # Output: Hello, Auckland!Parameters and Arguments
Positional Arguments
def calculate_area(length, width):
"""Calculate area of a rectangle."""
return length * width
area = calculate_area(10, 5) # length=10, width=5Keyword Arguments
# More explicit - order doesn't matter
area = calculate_area(width=5, length=10)Default Parameters
def calculate_buffer(geometry, distance=1000, resolution=16):
"""
Create a buffer around a geometry.
Parameters:
-----------
geometry : shapely geometry
The geometry to buffer
distance : float, default 1000
Buffer distance in map units (metres)
resolution : int, default 16
Number of segments per quarter circle
"""
return geometry.buffer(distance, resolution=resolution)
# Use default distance and resolution
buffered1 = calculate_buffer(point)
# Override distance but use default resolution
buffered2 = calculate_buffer(point, distance=500)
# Override both
buffered3 = calculate_buffer(point, distance=500, resolution=32)Return Values
Single Return Value
def get_area(gdf):
"""Calculate total area of geometries."""
return gdf.geometry.area.sum()Multiple Return Values
def calculate_statistics(gdf, column):
"""
Calculate basic statistics for a column.
Returns: mean, median, std_dev
"""
mean_val = gdf[column].mean()
median_val = gdf[column].median()
std_val = gdf[column].std()
return mean_val, median_val, std_val
# Unpack the returned values
mean, median, std = calculate_statistics(regions, 'population')Returning Different Types
def analyse_region(gdf):
"""Return a dictionary of region statistics."""
return {
'count': len(gdf),
'total_area': gdf.geometry.area.sum(),
'mean_pop': gdf['population'].mean(),
'geometries': gdf.geometry
}
results = analyse_region(nz_regions)
print(f"Total regions: {results['count']}")Geospatial Function Examples
Example 1: Calculate Centroid Coordinates
def get_centroid_coords(geometry):
"""
Extract centroid coordinates from a geometry.
Parameters:
-----------
geometry : shapely geometry
Input geometry
Returns:
--------
tuple
(x, y) coordinates of centroid
"""
centroid = geometry.centroid
return centroid.x, centroid.y
# Usage
for idx, row in gdf.iterrows():
x, y = get_centroid_coords(row.geometry)
print(f"Region {row['name']}: ({x:.2f}, {y:.2f})")Example 2: Reproject GeoDataFrame
def reproject_gdf(gdf, target_crs='EPSG:2193'):
"""
Reproject a GeoDataFrame to a target CRS.
Parameters:
-----------
gdf : GeoDataFrame
Input GeoDataFrame
target_crs : str, default 'EPSG:2193'
Target coordinate reference system
Returns:
--------
GeoDataFrame
Reprojected GeoDataFrame
"""
if gdf.crs is None:
raise ValueError("Input GeoDataFrame has no CRS defined")
if gdf.crs == target_crs:
print(f"Already in {target_crs}")
return gdf
print(f"Reprojecting from {gdf.crs} to {target_crs}")
return gdf.to_crs(target_crs)
# Usage
nz_regions_nztm = reproject_gdf(nz_regions, 'EPSG:2193')Example 3: Calculate Distance Between Points
def calculate_distance(point1, point2, crs='EPSG:2193'):
"""
Calculate distance between two points.
Parameters:
-----------
point1, point2 : tuple
(x, y) coordinates
crs : str, default 'EPSG:2193'
Coordinate system (must be projected for accurate distances)
Returns:
--------
float
Distance in map units (metres for EPSG:2193)
"""
from shapely.geometry import Point
import geopandas as gpd
# Create Point geometries
p1 = Point(point1)
p2 = Point(point2)
# Calculate distance
return p1.distance(p2)
# Usage
auckland = (1756673, 5917750) # NZTM coordinates
wellington = (1748894, 5427483)
distance_km = calculate_distance(auckland, wellington) / 1000
print(f"Distance: {distance_km:.1f} km")Example 4: Create Buffers with Multiple Distances
def create_multi_ring_buffer(geometry, distances, crs='EPSG:2193'):
"""
Create multiple buffer rings around a geometry.
Parameters:
-----------
geometry : shapely geometry
Center geometry
distances : list
List of buffer distances in metres
crs : str
Coordinate reference system
Returns:
--------
GeoDataFrame
GeoDataFrame with buffer rings
"""
import geopandas as gpd
buffers = []
for dist in distances:
buffer_geom = geometry.buffer(dist)
buffers.append({
'distance': dist,
'geometry': buffer_geom
})
return gpd.GeoDataFrame(buffers, crs=crs)
# Usage
city_center = Point(1756673, 5917750) # Auckland in NZTM
buffer_rings = create_multi_ring_buffer(
city_center,
distances=[1000, 5000, 10000, 20000] # 1, 5, 10, 20 km
)
# Visualise
buffer_rings.plot(column='distance', cmap='Reds', alpha=0.5)Example 5: Standardised Choropleth Map
def create_choropleth(gdf, column, title,
scheme='quantiles', k=5,
cmap='YlOrRd', figsize=(10, 8)):
"""
Create a standardised choropleth map.
Parameters:
-----------
gdf : GeoDataFrame
Input geographic data
column : str
Column name to map
title : str
Map title
scheme : str, default 'quantiles'
Classification scheme (quantiles, equal_interval, natural_breaks)
k : int, default 5
Number of classes
cmap : str, default 'YlOrRd'
Colour map
figsize : tuple, default (10, 8)
Figure size
Returns:
--------
matplotlib Figure and Axes objects
"""
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=figsize)
gdf.plot(ax=ax,
column=column,
scheme=scheme,
k=k,
cmap=cmap,
legend=True,
edgecolor='black',
linewidth=0.5,
legend_kwds={'title': column,
'loc': 'lower left'})
ax.set_title(title, fontsize=14, fontweight='bold')
ax.set_axis_off()
return fig, ax
# Usage
fig, ax = create_choropleth(
nz_regions,
column='pop_density',
title='New Zealand Population Density by Region',
scheme='natural_breaks',
k=5
)
plt.show()Error Handling in Functions
Using Try-Except
def safe_divide(numerator, denominator):
"""
Safely divide two numbers with error handling.
Returns None if division is not possible.
"""
try:
result = numerator / denominator
return result
except ZeroDivisionError:
print("Warning: Division by zero")
return None
except TypeError:
print("Warning: Invalid data types for division")
return None
# Usage
print(safe_divide(10, 2)) # 5.0
print(safe_divide(10, 0)) # Warning message, returns NoneInput Validation
def calculate_population_density(gdf, pop_column='population'):
"""
Calculate population density for regions.
Parameters:
-----------
gdf : GeoDataFrame
Must contain a population column and be in a projected CRS
pop_column : str
Name of the population column
Returns:
--------
GeoDataFrame
Original GeoDataFrame with added 'pop_density' column
"""
# Validate inputs
if not isinstance(gdf, gpd.GeoDataFrame):
raise TypeError("Input must be a GeoDataFrame")
if pop_column not in gdf.columns:
raise ValueError(f"Column '{pop_column}' not found in GeoDataFrame")
if gdf.crs is None:
raise ValueError("GeoDataFrame must have a CRS defined")
if not gdf.crs.is_projected:
raise ValueError("GeoDataFrame must be in a projected CRS for accurate area calculation")
# Calculate density
gdf = gdf.copy()
gdf['area_km2'] = gdf.geometry.area / 1_000_000
gdf['pop_density'] = gdf[pop_column] / gdf['area_km2']
return gdf
# Usage with proper error handling
try:
result = calculate_population_density(nz_regions)
except (TypeError, ValueError) as e:
print(f"Error: {e}")Best Practices
1. Write Clear Docstrings
def good_function(param1, param2):
"""
One-line summary of what the function does.
More detailed explanation if needed. Describe the purpose,
any assumptions, and what the function does.
Parameters:
-----------
param1 : type
Description of first parameter
param2 : type
Description of second parameter
Returns:
--------
type
Description of return value
Examples:
---------
>>> result = good_function(value1, value2)
>>> print(result)
Expected output
"""
pass2. Keep Functions Focused
Each function should do one thing well:
# Good: Single, clear purpose
def calculate_area_km2(geometry):
"""Calculate area in square kilometres."""
return geometry.area / 1_000_000
# Less good: Doing too many things
def process_everything(gdf):
"""Calculate area, reproject, filter, plot..."""
# Too many responsibilities
pass3. Use Meaningful Names
# Good: Descriptive names
def calculate_population_density(population, area):
return population / area
# Poor: Unclear names
def calc(p, a):
return p / a4. Avoid Global Variables
# Poor: Relies on global variable
base_year = 2023
def calculate_years_since(year):
return base_year - year
# Good: All inputs are parameters
def calculate_years_between(start_year, end_year):
return end_year - start_yearCreating a Personal GIS Toolkit
Build your own library of reusable functions:
# my_gis_tools.py
import geopandas as gpd
from shapely.geometry import Point
import matplotlib.pyplot as plt
def load_and_reproject(filepath, target_crs='EPSG:2193'):
"""Load and reproject spatial data."""
gdf = gpd.read_file(filepath)
return gdf.to_crs(target_crs)
def quick_map(gdf, column=None, title='Map'):
"""Create a quick choropleth map."""
fig, ax = plt.subplots(figsize=(10, 8))
if column:
gdf.plot(ax=ax, column=column, legend=True)
else:
gdf.plot(ax=ax)
ax.set_title(title)
ax.set_axis_off()
return fig, ax
def measure_distance_km(point1, point2):
"""Calculate distance in kilometres between two points."""
p1 = Point(point1)
p2 = Point(point2)
return p1.distance(p2) / 1000
# More functions...Use your toolkit in other scripts:
# In your analysis script
from my_gis_tools import load_and_reproject, quick_map
# Use your custom functions
data = load_and_reproject('regions.shp')
quick_map(data, column='population', title='Population Map')Summary
This section has covered the principles and practice of creating functions in Python, completing the core programming toolkit you need for geospatial analysis. You have learned how to define functions with parameters, default values, and return statements, and why docstrings are essential for making your code understandable to others (and to your future self). Through geospatial examples, you have seen how to encapsulate common operations like centroid extraction, CRS reprojection, distance calculation, multi-ring buffering, and standardised choropleth map creation into reusable, well-documented functions. Error handling with try-except blocks and input validation ensure that your functions fail gracefully and provide informative messages when given unexpected inputs.
The best practices discussed in this section, including writing clear docstrings, keeping functions focused on a single task, using meaningful names, and avoiding reliance on global variables, are not merely stylistic preferences. They are engineering principles that determine whether your code can be maintained, extended, and shared effectively. Building a personal GIS toolkit of reusable functions, as demonstrated with the my_gis_tools.py example, is the first step towards the Python packaging workflow you will learn in Part 4, where you will publish your tools as installable packages that others can use in their own pedestrian mobility and urban analytics projects.
Practice Exercises
Write a function that takes a GeoDataFrame and returns basic statistics (count, total area, mean population)
Create a function that filters a GeoDataFrame to only include features within a specified distance of a point
Write a function that creates a grid of points within a bounding box
Develop a function that standardises column names in a GeoDataFrame (e.g., converts all to lowercase, replaces spaces with underscores)
Create a function that generates a series of maps showing different classification schemes for the same dataset