3.4. Origin-Destination Analysis
Story Problem: Where Do People Travel?
Understanding Movement Patterns
Transport planning requires answering fundamental questions:
- Where do people start their journeys?
- Where are they going?
- Which origin-destination pairs have the highest demand?
- How does this vary by time of day or day of week?
Origin-Destination (OD) analysis provides the framework for understanding these movement patterns and planning accordingly.
What You Will Learn
- Represent OD pairs and aggregate by geographic zones
- Create and analyse OD matrices
- Calculate flow statistics and identify top OD pairs
- Design effective visualisations for movement data
- Integrate OD analysis into dashboards
Origin-Destination Fundamentals
OD Pairs
An OD pair represents travel from one location (origin) to another (destination).
od_trip = {
    'origin': 'Newmarket',
    'destination': 'Auckland CBD',
    'count': 1500,  # trips per day
    'mode': 'bus'
}
OD Matrix
An OD matrix tabulates trips between all zone pairs:
| Origin \ Destination | CBD | Newmarket | Ponsonby | Mt Eden |
|---|---|---|---|---|
| CBD | 500 | 300 | 200 | 150 |
| Newmarket | 350 | 100 | 80 | 120 |
| Ponsonby | 280 | 90 | 50 | 70 |
| Mt Eden | 200 | 110 | 65 | 80 |
- Rows: Origins
- Columns: Destinations
- Diagonal: Intra-zonal trips (trips within same zone)
- Off-diagonal: Inter-zonal trips
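As a minimal sketch, the example matrix above can be held in a pandas DataFrame and split into intra-zonal and inter-zonal trips. The variable names (`od`, `zones`) are illustrative choices, not part of any library.

```python
import numpy as np
import pandas as pd

# Zone order is shared by rows (origins) and columns (destinations)
zones = ["CBD", "Newmarket", "Ponsonby", "Mt Eden"]

# Values copied from the example OD matrix in the text
od = pd.DataFrame(
    [[500, 300, 200, 150],
     [350, 100, 80, 120],
     [280, 90, 50, 70],
     [200, 110, 65, 80]],
    index=pd.Index(zones, name="origin"),
    columns=pd.Index(zones, name="destination"),
)

values = od.to_numpy()
total = int(values.sum())        # all trips in the matrix
intra = int(np.trace(values))    # diagonal: trips within the same zone
inter = total - intra            # off-diagonal: trips between zones

print(f"Total: {total}, intra-zonal: {intra}, inter-zonal: {inter}")
```

Because rows and columns use the same zone order, the diagonal lines up with trips that start and end in the same zone.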
Data Sources
OD data comes from various sources:
Census Journey-to-Work Data
- Residential location → Workplace location
- Aggregated by census areas (SA2, SA3)
- Updated every 5 years
Travel Surveys
- Detailed trip diaries
- Multiple trip purposes
- Sample-based (needs expansion)
Electronic Ticketing (HOP Card)
- Public transport boarding → alighting
- High temporal resolution
- Only captures PT users
Mobile Phone Data
- GPS/CDR data
- High spatial and temporal resolution
- Privacy considerations
E-Scooter/Bike Share Data
- Start station → End station
- Trip times and distances
- Mode-specific
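Sample-based sources such as travel surveys need expansion before they represent the whole population. A minimal sketch, assuming each surveyed trip carries a hypothetical `weight` column giving the number of real trips it stands for (the records and weights below are invented for illustration):

```python
import pandas as pd

# Hypothetical survey records: one row per surveyed trip.
# 'weight' is an assumed expansion factor (real trips represented by the row).
survey = pd.DataFrame({
    "origin": ["CBD", "CBD", "Newmarket", "Ponsonby"],
    "destination": ["Newmarket", "Newmarket", "CBD", "CBD"],
    "weight": [120.0, 80.0, 150.0, 95.0],
})

# Expanded OD flows: sum the weights for each origin-destination pair
expanded = (
    survey.groupby(["origin", "destination"], as_index=False)["weight"]
    .sum()
    .rename(columns={"weight": "trips"})
)

print(expanded)
```

Summing weights rather than counting rows is what turns a sample into an estimate of total trips; how the weights are derived depends on the survey design.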
Creating OD Matrices
From Trip Records
import pandas as pd
import numpy as np

# Sample trip data
trips = pd.DataFrame({
    'origin': ['CBD', 'Newmarket', 'CBD', 'Ponsonby', 'CBD'],
    'destination': ['Newmarket', 'CBD', 'Ponsonby', 'CBD', 'Mt Eden'],
    'trips': [300, 350, 200, 280, 150]
})

# Create OD matrix. aggfunc='sum' adds up trips when an OD pair appears
# more than once (pivot_table would otherwise average them).
od_matrix = trips.pivot_table(
    index='origin',
    columns='destination',
    values='trips',
    aggfunc='sum',
    fill_value=0
)
print(od_matrix)
Output:
destination  CBD  Mt Eden  Newmarket  Ponsonby
origin
CBD            0      150        300       200
Newmarket    350        0          0         0
Ponsonby     280        0          0         0
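The matrix printed above has three rows but four columns, because Mt Eden never appears as an origin in the sample data. Diagonal-based statistics assume a square matrix, so one option is to reindex rows and columns onto the same zone list. A minimal sketch (the name `od_square` is an illustrative choice):

```python
import pandas as pd

trips = pd.DataFrame({
    "origin": ["CBD", "Newmarket", "CBD", "Ponsonby", "CBD"],
    "destination": ["Newmarket", "CBD", "Ponsonby", "CBD", "Mt Eden"],
    "trips": [300, 350, 200, 280, 150],
})

od_matrix = trips.pivot_table(
    index="origin",
    columns="destination",
    values="trips",
    aggfunc="sum",
    fill_value=0,
)

# Every zone that appears as an origin or a destination
zones = sorted(set(trips["origin"]) | set(trips["destination"]))

# Same labels, same order on both axes; missing rows/columns become 0
od_square = od_matrix.reindex(index=zones, columns=zones, fill_value=0)

print(od_square)
```

After reindexing, the row for Mt Eden exists and contains only zeros, so the diagonal corresponds to intra-zonal trips.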
From Geographic Data
import geopandas as gpd
import pandas as pd

def create_od_from_points(origins_gdf, destinations_gdf, zones_gdf):
    """
    Create OD matrix from point data assigned to zones.

    Assumes origins_gdf and destinations_gdf share the same index
    (one row per trip), so each origin can be paired with its destination.

    Parameters:
    -----------
    origins_gdf : GeoDataFrame
        Origin points (e.g., home locations)
    destinations_gdf : GeoDataFrame
        Destination points (e.g., workplaces)
    zones_gdf : GeoDataFrame
        Zone polygons (e.g., SA2 boundaries)

    Returns:
    --------
    DataFrame
        OD matrix
    """
    zones = zones_gdf[['zone_id', 'geometry']]

    # Spatial join to assign origins to zones
    origins_zones = gpd.sjoin(
        origins_gdf,
        zones,
        how='inner',
        predicate='within'
    )

    # Spatial join to assign destinations to zones
    destinations_zones = gpd.sjoin(
        destinations_gdf,
        zones,
        how='inner',
        predicate='within'
    )

    # Combine into OD pairs, aligned on the shared trip index.
    # Trips with an endpoint outside every zone are dropped.
    od_pairs = pd.DataFrame({
        'origin_zone': origins_zones['zone_id'],
        'dest_zone': destinations_zones['zone_id']
    }).dropna()

    # Count trips per OD pair
    od_counts = od_pairs.groupby(
        ['origin_zone', 'dest_zone']
    ).size().reset_index(name='trips')

    # Convert to matrix
    od_matrix = od_counts.pivot(
        index='origin_zone',
        columns='dest_zone',
        values='trips'
    ).fillna(0).astype(int)

    return od_matrix
Analysing OD Patterns
Summary Statistics
def analyse_od_matrix(od_matrix):
    """
    Calculate summary statistics for OD matrix.

    Returns:
    --------
    dict
        Summary statistics
    """
    # Align rows and columns on the same zone list so the diagonal
    # really holds intra-zonal trips (the input may not be square)
    zones = od_matrix.index.union(od_matrix.columns)
    od_matrix = od_matrix.reindex(index=zones, columns=zones, fill_value=0)

    # Total trips
    total_trips = od_matrix.to_numpy().sum()

    # Intra-zonal trips (diagonal)
    intra_zonal = np.diag(od_matrix.to_numpy()).sum()

    # Inter-zonal trips (off-diagonal)
    inter_zonal = total_trips - intra_zonal

    # Average trip distance (if available)
    # Would require distance matrix

    # Top OD pairs
    od_long = od_matrix.stack().reset_index()
    od_long.columns = ['origin', 'destination', 'trips']
    od_long = od_long[od_long['trips'] > 0]
    top_10 = od_long.nlargest(10, 'trips')

    return {
        'total_trips': total_trips,
        'intra_zonal': intra_zonal,
        'inter_zonal': inter_zonal,
        # Guard against an empty matrix
        'intra_pct': (intra_zonal / total_trips) * 100 if total_trips else 0.0,
        'top_pairs': top_10
    }

# Example usage
stats = analyse_od_matrix(od_matrix)
print(f"Total trips: {stats['total_trips']:,.0f}")
print(f"Intra-zonal: {stats['intra_pct']:.1f}%")
print("\nTop 10 OD pairs:")
print(stats['top_pairs'])
Production and Attraction
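def calculate_production_attraction(od_matrix):
    """
    Calculate trip production and attraction by zone.

    Returns:
    --------
    DataFrame
        Zone-level statistics
    """
    production = od_matrix.sum(axis=1)  # Row sums (origins)
    attraction = od_matrix.sum(axis=0)  # Column sums (destinations)

    # A zone may appear only as an origin or only as a destination,
    # so align both series on the full zone list and fill gaps with 0
    zones = production.index.union(attraction.index)
    production = production.reindex(zones, fill_value=0)
    attraction = attraction.reindex(zones, fill_value=0)

    summary = pd.DataFrame({
        'production': production,
        'attraction': attraction,
        'balance': production - attraction
    })
    return summary

# Example
zone_stats = calculate_production_attraction(od_matrix)
print(zone_stats)
Output:
           production  attraction  balance
CBD               650         630       20
Mt Eden             0         150     -150
Newmarket         350         300       50
Ponsonby          280         200       80
Interpretation (a positive balance means a zone produces more trips than it attracts):
- CBD: Close to balanced, with 20 more trips produced than attracted in this small sample
- Mt Eden: Net attractor (150 trips attracted, none produced)
- Newmarket: Net producer (50 more origins than destinations)
- Ponsonby: Net producer (80 more origins than destinations)
With a full journey-to-work dataset, an employment centre such as the CBD would typically show up as a net attractor, and residential areas as net producers.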
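The summary statistics function in this section leaves average trip distance as a comment because it needs a distance matrix. A minimal sketch of a flow-weighted mean distance, assuming a zone-to-zone distance table is available; the distances below and the helper name `mean_trip_distance` are hypothetical:

```python
import pandas as pd

zones = ["CBD", "Newmarket", "Ponsonby"]

# Flows between zones (rows = origins, columns = destinations)
flows = pd.DataFrame(
    [[0, 300, 200],
     [350, 0, 0],
     [280, 0, 0]],
    index=zones, columns=zones,
)

# Hypothetical zone-to-zone distances in km (illustrative values only)
dist_km = pd.DataFrame(
    [[0.0, 3.0, 2.5],
     [3.0, 0.0, 5.0],
     [2.5, 5.0, 0.0]],
    index=zones, columns=zones,
)

def mean_trip_distance(flows, distances):
    """Flow-weighted mean distance: sum(T_ij * d_ij) / sum(T_ij)."""
    # Align the distance table with the flow matrix labels
    d = distances.reindex(index=flows.index, columns=flows.columns)
    total = flows.to_numpy().sum()
    if total == 0:
        return float("nan")
    return float((flows.to_numpy() * d.to_numpy()).sum() / total)

avg_km = mean_trip_distance(flows, dist_km)
print(f"Average trip distance: {avg_km:.2f} km")
```

Weighting by flow matters: an unweighted mean of the distance table would treat a rarely used pair the same as the busiest corridor.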
Visualising OD Flows
Chord Diagram
Circular representation showing flows between zones:
import matplotlib.pyplot as plt
from matplotlib.patches import FancyArrowPatch
import numpy as np

def plot_od_chord(od_matrix, threshold=50):
    """
    Create a simplified chord-style diagram for OD flows.

    Parameters:
    -----------
    od_matrix : DataFrame
        OD matrix
    threshold : float
        Minimum flow to display
    """
    # Use every zone that appears as an origin or a destination,
    # so non-square matrices do not raise a KeyError
    zones = od_matrix.index.union(od_matrix.columns).tolist()
    od = od_matrix.reindex(index=zones, columns=zones, fill_value=0)
    n_zones = len(zones)

    # Create circular layout
    angles = np.linspace(0, 2 * np.pi, n_zones, endpoint=False)

    fig, ax = plt.subplots(figsize=(10, 10))
    ax.set_xlim(-1.5, 1.5)
    ax.set_ylim(-1.5, 1.5)
    ax.set_aspect('equal')
    ax.axis('off')

    # Label zones around the circle
    radius = 1.0
    for zone, angle in zip(zones, angles):
        x = radius * np.cos(angle)
        y = radius * np.sin(angle)
        ax.text(x * 1.2, y * 1.2, zone,
                ha='center', va='center', fontsize=10)

    # Plot flows as curved arrows
    for i, orig in enumerate(zones):
        for j, dest in enumerate(zones):
            if i == j:  # Skip intra-zonal
                continue
            flow = od.loc[orig, dest]
            if flow < threshold:
                continue
            start = (radius * np.cos(angles[i]), radius * np.sin(angles[i]))
            end = (radius * np.cos(angles[j]), radius * np.sin(angles[j]))

            # Draw curved arrow
            arrow = FancyArrowPatch(
                start,
                end,
                arrowstyle='->,head_width=0.4,head_length=0.8',
                connectionstyle="arc3,rad=.3",
                mutation_scale=10,  # Scale arrowheads so they are visible
                linewidth=flow / 100,  # Scale by flow
                color='steelblue',
                alpha=0.5
            )
            ax.add_patch(arrow)

    ax.set_title('Origin-Destination Flows', fontsize=14)
    return fig, ax
Desire Lines on Map
Geographic representation of flows:
import matplotlib.pyplot as plt
from shapely.geometry import LineString

def plot_desire_lines(od_matrix, zones_gdf, threshold=100):
    """
    Plot desire lines showing OD flows on map.

    Parameters:
    -----------
    od_matrix : DataFrame
        OD matrix
    zones_gdf : GeoDataFrame
        Zone geometries with zone_id matching matrix
    threshold : float
        Minimum flow to display
    """
    fig, ax = plt.subplots(figsize=(12, 12))

    # Plot zone boundaries
    zones_gdf.plot(ax=ax, facecolor='lightgrey',
                   edgecolor='black', linewidth=0.5)

    # Calculate zone centroids without modifying the caller's GeoDataFrame
    # (a projected CRS gives more meaningful centroids than lat/lon)
    centroids = zones_gdf.set_index('zone_id').geometry.centroid

    # Plot desire lines
    for orig in od_matrix.index:
        for dest in od_matrix.columns:
            if orig == dest:  # Skip intra-zonal
                continue
            # Skip zones that have no geometry
            if orig not in centroids.index or dest not in centroids.index:
                continue
            flow = od_matrix.loc[orig, dest]
            if flow < threshold:
                continue

            # Get centroids
            orig_pt = centroids.loc[orig]
            dest_pt = centroids.loc[dest]

            # Create line
            line = LineString([
                (orig_pt.x, orig_pt.y),
                (dest_pt.x, dest_pt.y)
            ])

            # Plot with width proportional to flow
            ax.plot(*line.xy,
                    color='red',
                    linewidth=flow / 50,  # Scale
                    alpha=0.5)

    ax.set_title('Desire Lines (OD Flows)', fontsize=14)
    ax.set_axis_off()
    return fig, ax
Sankey Diagram
Shows aggregate flows between origin and destination regions:
import plotly.graph_objects as go

def create_sankey(od_matrix, min_flow=50):
    """
    Create Sankey diagram for OD flows.
    """
    # Prepare data
    origins = []
    destinations = []
    values = []
    for orig in od_matrix.index:
        for dest in od_matrix.columns:
            flow = od_matrix.loc[orig, dest]
            if flow >= min_flow:
                origins.append(f"{orig} (O)")
                destinations.append(f"{dest} (D)")
                values.append(float(flow))

    # Get unique labels in a stable order (origins first, then destinations)
    labels = list(dict.fromkeys(origins + destinations))

    # Map to indices
    label_to_idx = {label: i for i, label in enumerate(labels)}
    source_idx = [label_to_idx[o] for o in origins]
    target_idx = [label_to_idx[d] for d in destinations]

    # Create Sankey
    fig = go.Figure(data=[go.Sankey(
        node=dict(
            pad=15,
            thickness=20,
            label=labels
        ),
        link=dict(
            source=source_idx,
            target=target_idx,
            value=values
        )
    )])
    fig.update_layout(
        title_text="Origin-Destination Flow Sankey",
        font_size=10
    )
    return fig
Dashboard Integration
OD Analysis Dashboard
from shiny import App, ui, render, reactive
import pandas as pd
import matplotlib.pyplot as plt

# Assumes analyse_od_matrix, calculate_production_attraction, plot_od_chord
# and plot_desire_lines from earlier in this section are in scope, and that
# zones_gdf (zone polygons with a zone_id column) has been loaded for the
# Desire Lines view. The UI calls follow recent Shiny for Python releases
# (ui.sidebar and ui.nav_panel replace ui.panel_sidebar/ui.panel_main and ui.nav).

# Load OD data
od_matrix = pd.read_csv('od_matrix.csv', index_col=0)

app_ui = ui.page_fluid(
    ui.h2("Auckland Origin-Destination Analysis"),
    ui.layout_sidebar(
        ui.sidebar(
            ui.input_slider(
                "min_flow",
                "Minimum Flow to Display",
                min=0,
                max=500,
                value=100,
                step=50
            ),
            ui.input_select(
                "viz_type",
                "Visualisation Type",
                choices=["Matrix", "Chord", "Desire Lines"]
            ),
            ui.hr(),
            ui.h4("Summary"),
            ui.output_text_verbatim("summary_stats")
        ),
        ui.navset_tab(
            ui.nav_panel("Visualisation", ui.output_plot("od_viz")),
            ui.nav_panel("Top Pairs", ui.output_table("top_pairs")),
            ui.nav_panel("Zone Stats", ui.output_table("zone_stats"))
        )
    )
)

def server(input, output, session):

    @reactive.calc
    def filtered_od():
        """Filter OD matrix by minimum flow"""
        threshold = input.min_flow()
        # Keep flows at or above the threshold, set the rest to 0
        return od_matrix.where(od_matrix >= threshold, 0)

    @render.plot
    def od_viz():
        """Create selected visualisation"""
        od = filtered_od()
        viz_type = input.viz_type()
        if viz_type == "Matrix":
            fig, ax = plt.subplots(figsize=(10, 8))
            im = ax.imshow(od, cmap='YlOrRd')
            ax.set_xticks(range(len(od.columns)))
            ax.set_yticks(range(len(od.index)))
            ax.set_xticklabels(od.columns, rotation=45)
            ax.set_yticklabels(od.index)
            plt.colorbar(im, ax=ax, label='Trips')
            ax.set_title('OD Matrix')
        elif viz_type == "Chord":
            fig, ax = plot_od_chord(od, threshold=input.min_flow())
        else:  # Desire Lines
            fig, ax = plot_desire_lines(od, zones_gdf, threshold=input.min_flow())
        return fig

    @render.text
    def summary_stats():
        """Display summary statistics"""
        stats = analyse_od_matrix(od_matrix)
        filtered_total = filtered_od().to_numpy().sum()
        return "\n".join([
            f"Total Trips: {stats['total_trips']:,.0f}",
            f"Intra-zonal: {stats['intra_zonal']:,.0f} ({stats['intra_pct']:.1f}%)",
            f"Inter-zonal: {stats['inter_zonal']:,.0f}",
            f"Filtered (≥{input.min_flow()}): {filtered_total:,.0f}",
        ])

    @render.table
    def top_pairs():
        """Show top OD pairs"""
        stats = analyse_od_matrix(od_matrix)
        return stats['top_pairs']

    @render.table
    def zone_stats():
        """Show zone-level statistics"""
        # The table output omits the index, so move zone names into a column
        summary = calculate_production_attraction(od_matrix)
        return summary.rename_axis('zone').reset_index()

app = App(app_ui, server)
Advanced Topics
Temporal OD Matrices
Analyse how flows change over time:
import pandas as pd

def create_temporal_od(trips_df, time_column, time_bins, value_column='count'):
    """
    Create OD matrices for different time periods.

    Parameters:
    -----------
    trips_df : DataFrame
        Trip data with origin, destination, and time
    time_column : str
        Column name for time
    time_bins : list
        Time period boundaries
    value_column : str
        Column holding the number of trips per record

    Returns:
    --------
    dict
        OD matrices by time period
    """
    # Work on a copy so the caller's DataFrame is not modified
    trips = trips_df.copy()
    trips['time_period'] = pd.cut(
        trips[time_column],
        bins=time_bins,
        labels=list(range(len(time_bins) - 1))
    )

    od_by_time = {}
    # Trips outside every bin get a missing time_period and are skipped
    for period in trips['time_period'].dropna().unique():
        period_data = trips[trips['time_period'] == period]
        od_matrix = period_data.pivot_table(
            index='origin',
            columns='destination',
            values=value_column,
            aggfunc='sum',
            fill_value=0
        )
        od_by_time[period] = od_matrix
    return od_by_time
OD Estimation from Counts
When only traffic counts are available, the OD matrix has to be estimated:
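One ingredient of such estimation is iterative proportional fitting (IPF), which the placeholder that follows mentions by name. As a minimal sketch, assuming target production and attraction totals per zone are already known (rather than raw link counts), a seed matrix can be scaled until its row and column sums match those targets. The helper name `ipf` and the numbers are hypothetical.

```python
import numpy as np

def ipf(seed, row_targets, col_targets, max_iter=100, tol=1e-6):
    """Scale a seed OD matrix so row/column sums match the targets."""
    od = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    if not np.isclose(row_targets.sum(), col_targets.sum()):
        raise ValueError("Row and column targets must have the same total")

    for _ in range(max_iter):
        # Scale each row to its production target (rows summing to 0 stay 0)
        row_sums = od.sum(axis=1)
        row_factors = np.divide(row_targets, row_sums,
                                out=np.zeros_like(row_targets),
                                where=row_sums > 0)
        od *= row_factors[:, None]

        # Scale each column to its attraction target
        col_sums = od.sum(axis=0)
        col_factors = np.divide(col_targets, col_sums,
                                out=np.zeros_like(col_targets),
                                where=col_sums > 0)
        od *= col_factors[None, :]

        # Columns now match exactly; stop once rows are close enough
        if np.abs(od.sum(axis=1) - row_targets).max() < tol:
            break
    return od

# Hypothetical seed pattern and zone totals
seed = [[1.0, 2.0], [3.0, 4.0]]
estimated = ipf(seed, row_targets=[40, 60], col_targets=[50, 50])
print(estimated.round(2))
```

This only balances zone totals. Matching observed link counts as well would need a network assignment step, which the sketch omits. The placeholder given in the text is: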
def estimate_od_from_counts(link_counts, network, seed_od=None):
    """
    Estimate OD matrix from link counts (conceptual placeholder).

    Nothing is implemented here. Real implementations typically start
    from a seed matrix (e.g., from a gravity model) and adjust it with
    iterative proportional fitting or entropy maximisation.
    """
    # This would require:
    # 1. Gravity model parameters
    # 2. Iterative adjustment to match observed counts
    # 3. Network assignment model

    # Placeholder for concept
    raise NotImplementedError("OD estimation from counts is not implemented here")
Summary
You’ve learned:
- OD fundamentals: Pairs, matrices, and data sources
- Creating OD matrices: From trip records and geographic data
- Analysis: Production/attraction, summary statistics
- Visualisation: Chord diagrams, desire lines, Sankey diagrams
- Dashboard integration: Interactive OD exploration
- Advanced topics: Temporal OD and estimation methods
Practice Exercises
- Create an OD matrix from sample trip data
- Calculate production and attraction by zone
- Identify top 10 OD pairs and visualise
- Plot desire lines on a map
- Build a dashboard showing OD patterns by time of day
Next Steps
You’ve now covered all the technical components for Assignment 2. In sec-shiny-assignment, you’ll learn how to integrate everything into a complete dashboard.