3.4. Origin-Destination Analysis

Story Problem: Where Do People Travel?

Understanding Movement Patterns

Transport planning requires answering fundamental questions:

  • Where do people start their journeys?
  • Where are they going?
  • Which origin-destination pairs have the highest demand?
  • How does this vary by time of day or day of week?

Origin-Destination (OD) analysis provides the framework for understanding these movement patterns and planning accordingly.

What You Will Learn

  • Represent OD pairs and aggregate by geographic zones
  • Create and analyse OD matrices
  • Calculate flow statistics and identify top OD pairs
  • Design effective visualisations for movement data
  • Integrate OD analysis into dashboards

Origin-Destination Fundamentals

OD Pairs

An OD pair represents travel from one location (origin) to another (destination).

od_trip = {
    'origin': 'Newmarket',
    'destination': 'Auckland CBD',
    'count': 1500,  # trips per day
    'mode': 'bus'
}

OD Matrix

An OD matrix tabulates trips between all zone pairs:

Origin \ Destination   CBD   Newmarket   Ponsonby   Mt Eden
CBD                    500   300         200        150
Newmarket              350   100         80         120
Ponsonby               280   90          50         70
Mt Eden                200   110         65         80

  • Rows: Origins
  • Columns: Destinations
  • Diagonal: Intra-zonal trips (trips within same zone)
  • Off-diagonal: Inter-zonal trips
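
As a minimal sketch, the example matrix above can be held in a pandas DataFrame with origins as rows and destinations as columns. The variable names here are illustrative only.

```python
import numpy as np
import pandas as pd

zones = ['CBD', 'Newmarket', 'Ponsonby', 'Mt Eden']

# Rows are origins, columns are destinations (values from the table above)
od_example = pd.DataFrame(
    [[500, 300, 200, 150],
     [350, 100, 80, 120],
     [280, 90, 50, 70],
     [200, 110, 65, 80]],
    index=pd.Index(zones, name='origin'),
    columns=pd.Index(zones, name='destination'),
)

total = int(od_example.to_numpy().sum())
intra = int(np.trace(od_example.to_numpy()))  # diagonal: trips within a zone
inter = total - intra                          # off-diagonal: trips between zones

print(f"Total: {total}, intra-zonal: {intra}, inter-zonal: {inter}")
print(od_example.loc['Newmarket', 'CBD'])  # trips from Newmarket to CBD
```

Label-based lookup with `.loc[origin, destination]` keeps the row/column convention explicit.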

Data Sources

OD data comes from various sources:

Census Journey-to-Work Data

  • Residential location → Workplace location
  • Aggregated by census areas (SA2, SA3)
  • Updated every 5 years

Travel Surveys

  • Detailed trip diaries
  • Multiple trip purposes
  • Sample-based (needs expansion)

Electronic Ticketing (HOP Card)

  • Public transport boarding → alighting
  • High temporal resolution
  • Only captures PT users

Mobile Phone Data

  • GPS/CDR data
  • High spatial and temporal resolution
  • Privacy considerations

E-Scooter/Bike Share Data

  • Start station → End station
  • Trip times and distances
  • Mode-specific
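
Sample-based sources such as travel surveys are usually scaled up with expansion weights before being tabulated. The sketch below assumes each surveyed trip carries a hypothetical `weight` column giving the number of real trips it represents; the column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical survey records: one row per sampled trip
survey = pd.DataFrame({
    'origin': ['CBD', 'CBD', 'Newmarket', 'Ponsonby'],
    'destination': ['Newmarket', 'Newmarket', 'CBD', 'CBD'],
    'weight': [120.0, 95.0, 150.0, 80.0],  # assumed expansion factors
})

# Summing weights (rather than counting rows) gives expanded trip estimates
expanded = (
    survey.groupby(['origin', 'destination'])['weight']
    .sum()
    .rename('estimated_trips')
    .reset_index()
)
print(expanded)
```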

Creating OD Matrices

From Trip Records

import pandas as pd
import numpy as np

# Sample trip data
trips = pd.DataFrame({
    'origin': ['CBD', 'Newmarket', 'CBD', 'Ponsonby', 'CBD'],
    'destination': ['Newmarket', 'CBD', 'Ponsonby', 'CBD', 'Mt Eden'],
    'trips': [300, 350, 200, 280, 150]
})

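# Create OD matrix (aggfunc='sum' so that repeated OD pairs add up;
# pivot_table averages values by default)
od_matrix = trips.pivot_table(
    index='origin',
    columns='destination',
    values='trips',
    aggfunc='sum',
    fill_value=0
)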

print(od_matrix)

Output:

destination   CBD  Mt Eden  Newmarket  Ponsonby
origin                                          
CBD             0      150        300       200
Newmarket     350        0          0         0
Ponsonby      280        0          0         0
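
Note that the matrix above is not square: Mt Eden never appears as an origin in the sample, so it has a column but no row. One way to obtain a square matrix, sketched here with illustrative variable names, is to reindex both axes to the full set of zones.

```python
import pandas as pd

trips = pd.DataFrame({
    'origin': ['CBD', 'Newmarket', 'CBD', 'Ponsonby', 'CBD'],
    'destination': ['Newmarket', 'CBD', 'Ponsonby', 'CBD', 'Mt Eden'],
    'trips': [300, 350, 200, 280, 150]
})

od_matrix = trips.pivot_table(
    index='origin', columns='destination',
    values='trips', aggfunc='sum', fill_value=0
)

# Every zone that appears as either an origin or a destination
all_zones = sorted(set(trips['origin']) | set(trips['destination']))

# Add the missing rows/columns, filled with zero trips
od_square = od_matrix.reindex(index=all_zones, columns=all_zones, fill_value=0)
print(od_square)
```

A square matrix with identical row and column labels makes diagonal (intra-zonal) calculations and production/attraction comparisons safer.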

From Geographic Data

import geopandas as gpd
import pandas as pd

def create_od_from_points(origins_gdf, destinations_gdf, zones_gdf):
    """
    Create OD matrix from point data assigned to zones.
    
    Parameters:
    -----------
    origins_gdf : GeoDataFrame
        Origin points (e.g., home locations)
    destinations_gdf : GeoDataFrame
        Destination points (e.g., workplaces)
    zones_gdf : GeoDataFrame
        Zone polygons (e.g., SA2 boundaries)
    
    Returns:
    --------
    DataFrame
        OD matrix
    """
    # Spatial join to assign origins to zones
    origins_zones = gpd.sjoin(
        origins_gdf,
        zones_gdf[['zone_id', 'geometry']],
        predicate='within'
    )
    
    # Spatial join to assign destinations to zones
    destinations_zones = gpd.sjoin(
        destinations_gdf,
        zones_gdf[['zone_id', 'geometry']],
        predicate='within'
    )
    
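    # Combine into OD pairs. This assumes origins_gdf and destinations_gdf
    # share the same index (one row per trip); pandas aligns the two
    # Series on that index. Trips with an end outside every zone are dropped.
    od_pairs = pd.DataFrame({
        'origin_zone': origins_zones['zone_id'],
        'dest_zone': destinations_zones['zone_id']
    }).dropna()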
    
    # Count trips per OD pair
    od_counts = od_pairs.groupby(
        ['origin_zone', 'dest_zone']
    ).size().reset_index(name='trips')
    
    # Convert to matrix
    od_matrix = od_counts.pivot(
        index='origin_zone',
        columns='dest_zone',
        values='trips'
    ).fillna(0)
    
    return od_matrix

Analysing OD Patterns

Summary Statistics

def analyse_od_matrix(od_matrix):
    """
    Calculate summary statistics for OD matrix.
    
    Returns:
    --------
    dict
        Summary statistics
    """
    # Total trips
    total_trips = od_matrix.sum().sum()
    
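    # Intra-zonal trips (diagonal), matched by zone label so that
    # non-square matrices are handled correctly
    shared_zones = od_matrix.index.intersection(od_matrix.columns)
    intra_zonal = sum(od_matrix.at[zone, zone] for zone in shared_zones)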
    
    # Inter-zonal trips (off-diagonal)
    inter_zonal = total_trips - intra_zonal
    
    # Average trip distance (if available)
    # Would require distance matrix
    
    # Top OD pairs
    od_long = od_matrix.stack().reset_index()
    od_long.columns = ['origin', 'destination', 'trips']
    od_long = od_long[od_long['trips'] > 0]
    top_10 = od_long.nlargest(10, 'trips')
    
    return {
        'total_trips': total_trips,
        'intra_zonal': intra_zonal,
        'inter_zonal': inter_zonal,
        'intra_pct': (intra_zonal / total_trips) * 100,
        'top_pairs': top_10
    }

# Example usage
stats = analyse_od_matrix(od_matrix)
print(f"Total trips: {stats['total_trips']:,.0f}")
print(f"Intra-zonal: {stats['intra_pct']:.1f}%")
print("\nTop 10 OD pairs:")
print(stats['top_pairs'])
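
To judge how concentrated demand is, it can help to express each pair as a share of all trips. A minimal sketch, using the sample matrix values from earlier and illustrative column names (`share_pct`, `cumulative_pct`):

```python
import pandas as pd

od_matrix = pd.DataFrame(
    [[0, 150, 300, 200], [350, 0, 0, 0], [280, 0, 0, 0]],
    index=pd.Index(['CBD', 'Newmarket', 'Ponsonby'], name='origin'),
    columns=['CBD', 'Mt Eden', 'Newmarket', 'Ponsonby'],
)

# Long format: one row per OD pair
od_long = od_matrix.reset_index().melt(
    id_vars='origin', var_name='destination', value_name='trips'
)

# Keep observed flows only, largest first
od_long = od_long[od_long['trips'] > 0].sort_values('trips', ascending=False)

# Share of all trips, and the running total of that share
od_long['share_pct'] = od_long['trips'] / od_long['trips'].sum() * 100
od_long['cumulative_pct'] = od_long['share_pct'].cumsum()
print(od_long.round(1))
```

The cumulative column shows how many pairs are needed to account for a given share of total demand.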

Production and Attraction

def calculate_production_attraction(od_matrix):
    """
    Calculate trip production and attraction by zone.
    
    Returns:
    --------
    DataFrame
        Zone-level statistics
    """
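    production = od_matrix.sum(axis=1)  # Row sums (origins)
    attraction = od_matrix.sum(axis=0)  # Column sums (destinations)
    
    # Zones that appear only as an origin or only as a destination have
    # no matching row or column, so treat the missing total as zero
    summary = pd.DataFrame({
        'production': production,
        'attraction': attraction
    }).fillna(0)
    summary['balance'] = summary['production'] - summary['attraction']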
    
    return summary

# Example
zone_stats = calculate_production_attraction(od_matrix)
print(zone_stats)

Example output (illustrative figures; the five-record sample matrix above gives different values):

            production  attraction  balance
CBD                650        1010     -360
Mt Eden            150         150        0
Newmarket          350         300       50
Ponsonby           200         200        0

Interpretation:

  • CBD: Net attractor (360 more trips arrive than depart), typical of an employment centre
  • Mt Eden: Balanced (equal production and attraction)
  • Newmarket: Net producer (50 more trips depart than arrive), typical of a residential area
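
The interpretation step can be made mechanical by comparing each zone's balance with its total number of trip ends. The sketch below reuses the figures from the output above; the function name and the 5% tolerance for calling a zone balanced are arbitrary assumptions.

```python
import pandas as pd

zone_stats = pd.DataFrame(
    {'production': [650, 150, 350, 200],
     'attraction': [1010, 150, 300, 200]},
    index=['CBD', 'Mt Eden', 'Newmarket', 'Ponsonby'],
)
zone_stats['balance'] = zone_stats['production'] - zone_stats['attraction']

def classify_zone(row, tolerance=0.05):
    """Label a zone by its balance relative to total trip ends."""
    trip_ends = row['production'] + row['attraction']
    if trip_ends == 0 or abs(row['balance']) / trip_ends <= tolerance:
        return 'balanced'
    return 'net producer' if row['balance'] > 0 else 'net attractor'

zone_stats['role'] = zone_stats.apply(classify_zone, axis=1)
print(zone_stats)
```

Using a relative tolerance avoids labelling a large zone as unbalanced because of a small absolute difference.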

Visualising OD Flows

Chord Diagram

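Circular representation showing flows between zones. The version below is a simplified chord-style plot: zones are placed around a circle and flows are drawn as curved arrows.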

import matplotlib.pyplot as plt
from matplotlib.patches import FancyArrowPatch
import numpy as np

def plot_od_chord(od_matrix, threshold=50):
    """
    Create chord diagram for OD flows.
    
    Parameters:
    -----------
    od_matrix : DataFrame
        OD matrix
    threshold : float
        Minimum flow to display
    """
    zones = od_matrix.index.tolist()
    n_zones = len(zones)
    
    # Create circular layout
    angles = np.linspace(0, 2 * np.pi, n_zones, endpoint=False)
    
    fig, ax = plt.subplots(figsize=(10, 10))
    ax.set_xlim(-1.5, 1.5)
    ax.set_ylim(-1.5, 1.5)
    ax.set_aspect('equal')
    ax.axis('off')
    
    # Label each zone at its position around the circle
    radius = 1.0
    for i, (zone, angle) in enumerate(zip(zones, angles)):
        x = radius * np.cos(angle)
        y = radius * np.sin(angle)
        ax.text(x * 1.2, y * 1.2, zone, 
                ha='center', va='center', fontsize=10)
    
    # Plot flows as curves
    for i, orig in enumerate(zones):
        for j, dest in enumerate(zones):
            if i != j:  # Skip intra-zonal
                flow = od_matrix.loc[orig, dest]
                if flow >= threshold:
                    x_start = radius * np.cos(angles[i])
                    y_start = radius * np.sin(angles[i])
                    x_end = radius * np.cos(angles[j])
                    y_end = radius * np.sin(angles[j])
                    
                    # Draw curved arrow
                    arrow = FancyArrowPatch(
                        (x_start, y_start),
                        (x_end, y_end),
                        arrowstyle='->,head_width=0.4,head_length=0.8',
                        connectionstyle="arc3,rad=.3",
                        linewidth=flow / 100,  # Scale by flow
                        color='steelblue',
                        alpha=0.5
                    )
                    ax.add_patch(arrow)
    
    ax.set_title('Origin-Destination Flows', fontsize=14)
    return fig, ax

Desire Lines on Map

Geographic representation of flows:

def plot_desire_lines(od_matrix, zones_gdf, threshold=100):
    """
    Plot desire lines showing OD flows on map.
    
    Parameters:
    -----------
    od_matrix : DataFrame
        OD matrix
    zones_gdf : GeoDataFrame
        Zone geometries with zone_id matching matrix
    threshold : float
        Minimum flow to display
    """
    from shapely.geometry import LineString
    
    fig, ax = plt.subplots(figsize=(12, 12))
    
    # Plot zone boundaries
    zones_gdf.plot(ax=ax, facecolor='lightgrey', 
                   edgecolor='black', linewidth=0.5)
    
    # Calculate zone centroids without modifying the caller's GeoDataFrame
    # (use a projected CRS so the centroids are geometrically meaningful)
    centroids = zones_gdf.set_index('zone_id').geometry.centroid
    
    # Plot desire lines
    for orig in od_matrix.index:
        for dest in od_matrix.columns:
            if orig != dest:  # Skip intra-zonal
                flow = od_matrix.loc[orig, dest]
                if flow >= threshold:
                    # Get centroids
                    orig_pt = centroids.loc[orig]
                    dest_pt = centroids.loc[dest]
                    
                    # Create line
                    line = LineString([
                        (orig_pt.x, orig_pt.y),
                        (dest_pt.x, dest_pt.y)
                    ])
                    
                    # Plot with width proportional to flow
                    ax.plot(*line.xy, 
                           color='red',
                           linewidth=flow / 50,  # Scale
                           alpha=0.5)
    
    ax.set_title('Desire Lines (OD Flows)', fontsize=14)
    ax.set_axis_off()
    return fig, ax
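
Both plotting functions above scale line width with a fixed divisor (flow / 100 and flow / 50), which only suits one range of flows. One alternative, sketched here with a hypothetical helper name and arbitrary default widths, rescales flows linearly into a chosen width range.

```python
import numpy as np

def scale_linewidths(flows, min_width=0.5, max_width=8.0):
    """Map flows linearly onto [min_width, max_width] (hypothetical helper)."""
    flows = np.asarray(flows, dtype=float)
    lo, hi = flows.min(), flows.max()
    if hi == lo:
        # All flows are equal: use the midpoint width for every line
        return np.full(flows.shape, (min_width + max_width) / 2)
    return min_width + (flows - lo) / (hi - lo) * (max_width - min_width)

widths = scale_linewidths([150, 200, 280, 300, 350])
print(widths.round(2))
```

The resulting widths could be passed as `linewidth` in place of the fixed divisors, so the thinnest and thickest lines stay legible whatever the magnitude of the data.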

Sankey Diagram

Shows aggregate flows between origin and destination regions:

import plotly.graph_objects as go

def create_sankey(od_matrix, min_flow=50):
    """
    Create Sankey diagram for OD flows.
    """
    # Prepare data
    origins = []
    destinations = []
    values = []
    
    for orig in od_matrix.index:
        for dest in od_matrix.columns:
            flow = od_matrix.loc[orig, dest]
            if flow >= min_flow:
                origins.append(f"{orig} (O)")
                destinations.append(f"{dest} (D)")
                values.append(flow)
    
    # Get unique labels in first-seen order (a plain set would give a
    # different node order from run to run)
    labels = list(dict.fromkeys(origins + destinations))
    
    # Map to indices
    label_to_idx = {label: i for i, label in enumerate(labels)}
    source_idx = [label_to_idx[o] for o in origins]
    target_idx = [label_to_idx[d] for d in destinations]
    
    # Create Sankey
    fig = go.Figure(data=[go.Sankey(
        node=dict(
            pad=15,
            thickness=20,
            label=labels
        ),
        link=dict(
            source=source_idx,
            target=target_idx,
            value=values
        )
    )])
    
    fig.update_layout(
        title_text="Origin-Destination Flow Sankey",
        font_size=10
    )
    
    return fig

Dashboard Integration

OD Analysis Dashboard

from shiny import App, ui, render, reactive
import pandas as pd
import matplotlib.pyplot as plt

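# Load OD data
od_matrix = pd.read_csv('od_matrix.csv', index_col=0)

# The helper functions defined earlier in this section (analyse_od_matrix,
# calculate_production_attraction, plot_od_chord, plot_desire_lines) must be
# in scope. The "Desire Lines" option also needs zones_gdf, a GeoDataFrame
# of zone polygons with a zone_id column matching the matrix labels.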

app_ui = ui.page_fluid(
    ui.h2("Auckland Origin-Destination Analysis"),
    
    # ui.sidebar() and ui.nav_panel() replace the older ui.panel_sidebar(),
    # ui.panel_main() and ui.nav() in recent Shiny for Python releases
    ui.layout_sidebar(
        ui.sidebar(
            ui.input_slider(
                "min_flow",
                "Minimum Flow to Display",
                min=0,
                max=500,
                value=100,
                step=50
            ),
            
            ui.input_select(
                "viz_type",
                "Visualisation Type",
                choices=["Matrix", "Chord", "Desire Lines"]
            ),
            
            ui.hr(),
            ui.h4("Summary"),
            ui.output_text_verbatim("summary_stats")
        ),
        
        ui.navset_tab(
            ui.nav_panel("Visualisation", ui.output_plot("od_viz")),
            ui.nav_panel("Top Pairs", ui.output_table("top_pairs")),
            ui.nav_panel("Zone Stats", ui.output_table("zone_stats"))
        )
    )
)

def server(input, output, session):
    
    @reactive.Calc
    def filtered_od():
        """Filter OD matrix by minimum flow"""
        threshold = input.min_flow()
        return od_matrix[od_matrix >= threshold].fillna(0)
    
    @output
    @render.plot
    def od_viz():
        """Create selected visualisation"""
        od = filtered_od()
        viz_type = input.viz_type()
        
        if viz_type == "Matrix":
            fig, ax = plt.subplots(figsize=(10, 8))
            im = ax.imshow(od, cmap='YlOrRd')
            ax.set_xticks(range(len(od.columns)))
            ax.set_yticks(range(len(od.index)))
            ax.set_xticklabels(od.columns, rotation=45)
            ax.set_yticklabels(od.index)
            plt.colorbar(im, ax=ax, label='Trips')
            ax.set_title('OD Matrix')
            
        elif viz_type == "Chord":
            fig, ax = plot_od_chord(od, threshold=input.min_flow())
            
        else:  # Desire Lines
            fig, ax = plot_desire_lines(od, zones_gdf, threshold=input.min_flow())
        
        return fig
    
    @output
    @render.text
    def summary_stats():
        """Display summary statistics"""
        stats = analyse_od_matrix(od_matrix)
        return f"""
Total Trips: {stats['total_trips']:,.0f}

Intra-zonal: {stats['intra_zonal']:,.0f} ({stats['intra_pct']:.1f}%)
Inter-zonal: {stats['inter_zonal']:,.0f}

Filtered (≥{input.min_flow()}): {filtered_od().sum().sum():,.0f}
        """.strip()
    
    @output
    @render.table
    def top_pairs():
        """Show top OD pairs"""
        stats = analyse_od_matrix(od_matrix)
        return stats['top_pairs']
    
    @output
    @render.table
    def zone_stats():
        """Show zone-level statistics"""
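        # render.table omits the index by default, so move the zone
        # names into an ordinary column before returning
        zone_summary = calculate_production_attraction(od_matrix)
        return zone_summary.rename_axis('zone').reset_index()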

app = App(app_ui, server)

Advanced Topics

Temporal OD Matrices

Analyse how flows change over time:

def create_temporal_od(trips_df, time_column, time_bins):
    """
    Create OD matrices for different time periods.
    
    Parameters:
    -----------
    trips_df : DataFrame
        Trip data with 'origin', 'destination' and 'count' columns,
        plus a time column
    time_column : str
        Column name for time
    time_bins : list
        Time period boundaries
    
    Returns:
    --------
    dict
        OD matrices by time period
    """
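    # Work on a copy so the caller's DataFrame is not modified
    trips_df = trips_df.copy()
    trips_df['time_period'] = pd.cut(
        trips_df[time_column],
        bins=time_bins,
        labels=range(len(time_bins) - 1)
    )
    
    od_by_time = {}
    # Skip NaN periods (times that fall outside time_bins)
    for period in trips_df['time_period'].dropna().unique():
        period_data = trips_df[trips_df['time_period'] == period]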
        od_matrix = period_data.pivot_table(
            index='origin',
            columns='destination',
            values='count',
            aggfunc='sum',
            fill_value=0
        )
        od_by_time[period] = od_matrix
    
    return od_by_time
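
As a usage sketch, assuming trip records carry an integer `hour` column (0 to 23) and a `count` column, named periods can be assigned with `pd.cut` and one matrix built per period. The bin edges and period names below are illustrative, not standard definitions.

```python
import pandas as pd

trips = pd.DataFrame({
    'origin': ['Newmarket', 'Ponsonby', 'CBD', 'CBD', 'Newmarket'],
    'destination': ['CBD', 'CBD', 'Newmarket', 'Ponsonby', 'CBD'],
    'hour': [7, 8, 17, 17, 12],
    'count': [200, 150, 180, 120, 40],
})

# right=False makes each bin [start, end), so hour 7 falls in 'am_peak'
edges = [0, 7, 9, 16, 18, 24]
names = ['early', 'am_peak', 'interpeak', 'pm_peak', 'evening']
trips['period'] = pd.cut(trips['hour'], bins=edges, labels=names, right=False)

# One OD matrix per period that actually occurs in the data
od_by_period = {
    period: group.pivot_table(index='origin', columns='destination',
                              values='count', aggfunc='sum', fill_value=0)
    for period, group in trips.groupby('period', observed=True)
}

for period, matrix in od_by_period.items():
    print(f"\n{period}")
    print(matrix)
```

Comparing the `am_peak` and `pm_peak` matrices shows the direction of the dominant flows reversing between morning and evening in this toy data.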

OD Estimation from Counts

When only traffic counts are available, the OD matrix has to be estimated:

def estimate_od_from_counts(link_counts, network, seed_od=None):
    """
    Estimate OD matrix from link counts (conceptual placeholder).
    
    Not implemented here. Real implementations typically start from a
    gravity model or seed matrix and use iterative proportional fitting
    or entropy maximisation.
    """
    # A full implementation would require:
    # 1. Gravity model parameters
    # 2. Iterative adjustment to match observed counts
    # 3. Network assignment model
    
    # Fail loudly rather than silently returning None
    raise NotImplementedError("OD estimation from counts is outlined only")
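
Iterative proportional fitting, mentioned in the docstring above, can be sketched in a few lines. The version below is a simplified illustration: it balances a seed matrix to target row totals (productions) and column totals (attractions) rather than to link counts, which would additionally need a network assignment step. The function name, seed values, and targets are all invented for the example.

```python
import numpy as np

def ipf_balance(seed, row_targets, col_targets, max_iter=1000, tol=1e-8):
    """Scale a seed OD matrix until its margins match the targets."""
    od = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    if not np.isclose(row_targets.sum(), col_targets.sum()):
        raise ValueError("Row and column targets must have equal totals")

    for _ in range(max_iter):
        # Scale each row to match its target production
        row_sums = od.sum(axis=1)
        od *= (row_targets / np.where(row_sums == 0, 1, row_sums))[:, None]
        # Scale each column to match its target attraction
        col_sums = od.sum(axis=0)
        od *= (col_targets / np.where(col_sums == 0, 1, col_sums))[None, :]
        # Columns now match exactly; stop once the rows match too
        if np.abs(od.sum(axis=1) - row_targets).max() < tol:
            break
    return od

seed = [[10, 20, 30], [20, 10, 20], [30, 20, 10]]
estimated = ipf_balance(seed, row_targets=[700, 500, 300],
                        col_targets=[600, 500, 400])
print(estimated.round(1))
print(estimated.sum(axis=1).round(1), estimated.sum(axis=0).round(1))
```

The seed matrix determines the pattern of flows; the fitting only rescales it, so cells that are zero in the seed stay zero in the estimate.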

Summary

You’ve learned:

  • OD fundamentals: Pairs, matrices, and data sources
  • Creating OD matrices: From trip records and geographic data
  • Analysis: Production/attraction, summary statistics
  • Visualisation: Chord diagrams, desire lines, Sankey diagrams
  • Dashboard integration: Interactive OD exploration
  • Advanced topics: Temporal OD and estimation methods

Practice Exercises

  1. Create an OD matrix from sample trip data
  2. Calculate production and attraction by zone
  3. Identify top 10 OD pairs and visualise
  4. Plot desire lines on a map
  5. Build a dashboard showing OD patterns by time of day

Next Steps

You’ve now covered all the technical components for Assignment 2. In the Shiny assignment section (sec-shiny-assignment), you’ll learn how to integrate everything into a complete dashboard.