Using TransBigData for Python Transportation Data Analysis and Visualization
This article shows how to install the TransBigData Python package and use it to preprocess taxi GPS data, aggregate it on a grid, extract OD trips, and produce both static and interactive visualizations, with a code example for each step.
1. Introduction
The TransBigData library is an open‑source Python package designed for efficient processing, analysis, and visualization of transportation spatio‑temporal big data such as taxi GPS, bike‑share, and bus GPS records. It provides concise, high‑performance functions that simplify complex data‑handling tasks.
2. Installation
Install the library via pip (or conda) with the following command:
<code>pip install -U transbigdata</code>
After installation, import the package in Python:
<code>import transbigdata as tbd</code>
3. Data Pre‑processing
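One cleaning step used in this section removes instantaneous status changes: records whose passenger status differs from both the previous and the next record of the same vehicle, which is more plausibly a recording error than a real trip. A minimal standard-library sketch of that rule follows. The function name drop_status_blips is hypothetical and the rule is a simplified assumption; the library function may apply extra conditions.

```python
def drop_status_blips(records):
    """Drop records whose status differs from both neighbours of the same vehicle.

    records: list of (vehicle, time, status) tuples, sorted by vehicle, then time.
    This is an illustrative rule, not TransBigData's own code.
    """
    kept = []
    for i, (vehicle, time, status) in enumerate(records):
        prev_rec = records[i - 1] if i > 0 else None
        next_rec = records[i + 1] if i + 1 < len(records) else None
        is_blip = (
            prev_rec is not None
            and next_rec is not None
            and prev_rec[0] == vehicle      # previous record is the same vehicle
            and next_rec[0] == vehicle      # next record is the same vehicle
            and prev_rec[2] != status       # status differs from the record before
            and next_rec[2] != status       # and from the record after
        )
        if not is_blip:
            kept.append((vehicle, time, status))
    return kept


# Made-up sample: the third record flips to 1 for a single sample.
records = [
    ("A", "08:00:00", 0),
    ("A", "08:00:15", 0),
    ("A", "08:00:30", 1),
    ("A", "08:00:45", 0),
    ("A", "08:01:00", 0),
    ("A", "08:01:15", 1),
    ("A", "08:01:30", 1),
]
cleaned = drop_status_blips(records)
```

The comparison always uses the original neighbours, so removing one record never changes the verdict on the records next to it.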
Read raw taxi GPS data with pandas, load the study area shapefile with GeoPandas, and then apply built‑in cleaning functions:
<code>import pandas as pd
import geopandas as gpd
data = pd.read_csv('TaxiData-Sample.csv', header=None)
data.columns = ['VehicleNum','time','lon','lat','OpenStatus','Speed']
sz = gpd.read_file(r'sz/sz.shp')
# Remove points outside the study area
data = tbd.clean_outofshape(data, sz, col=['lon','lat'], accuracy=500)
# Remove instantaneous status changes
data = tbd.clean_taxi_status(data, col=['VehicleNum','time','OpenStatus'])</code>
4. Grid Generation
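Gridding boils down to two steps: convert the cell size from meters to degrees at the study area's latitude, then integer-divide each point's offset from the grid origin by that cell size. The sketch below illustrates the idea with the standard library only. It is not TransBigData's implementation: the helper names cell_size_deg and point_to_cell are hypothetical, the Earth radius is a rounded mean value, and the grid origin is assumed to sit at the lower-left corner of the bounding box, which may differ from the convention the library uses.

```python
import math

EARTH_RADIUS_M = 6_371_000  # rounded mean Earth radius (assumption)


def cell_size_deg(bounds, accuracy_m):
    """Convert a cell size in meters to (delta_lon, delta_lat) in degrees."""
    lon1, lat1, lon2, lat2 = bounds
    mid_lat = math.radians((lat1 + lat2) / 2)
    meters_per_deg = 2 * math.pi * EARTH_RADIUS_M / 360
    delta_lat = accuracy_m / meters_per_deg
    # A degree of longitude spans fewer meters away from the equator.
    delta_lon = accuracy_m / (meters_per_deg * math.cos(mid_lat))
    return delta_lon, delta_lat


def point_to_cell(lon, lat, bounds, accuracy_m):
    """Return the (column, row) index of the cell containing the point."""
    lon0 = min(bounds[0], bounds[2])
    lat0 = min(bounds[1], bounds[3])
    delta_lon, delta_lat = cell_size_deg(bounds, accuracy_m)
    return (
        math.floor((lon - lon0) / delta_lon),
        math.floor((lat - lat0) / delta_lat),
    )


bounds = [113.75, 22.4, 114.62, 22.86]
delta_lon, delta_lat = cell_size_deg(bounds, 1000)
col, row = point_to_cell(114.05, 22.54, bounds, 1000)
```

Because a degree of longitude shrinks toward the poles, delta_lon comes out larger than delta_lat for the same cell size in meters.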
Define a rectangular grid for the study region and obtain grid parameters:
<code>bounds = [113.75, 22.4, 114.62, 22.86]
params = tbd.area_to_params(bounds, accuracy=1000)
</code>
Map each GPS point to a grid cell and aggregate counts:
<code># Assign grid indices
data['LONCOL'], data['LATCOL'] = tbd.GPS_to_grid(data['lon'], data['lat'], params)
# Count GPS records per grid cell (the VehicleNum column is only used for counting)
grid_agg = data.groupby(['LONCOL','LATCOL'])['VehicleNum'].count().reset_index()
grid_agg['geometry'] = tbd.grid_to_polygon([grid_agg['LONCOL'], grid_agg['LATCOL']], params)
grid_gdf = gpd.GeoDataFrame(grid_agg)
</code>
Visualize the gridded counts with Matplotlib, drawing the study-area boundary underneath and adding a scale bar and compass:
<code>import matplotlib.pyplot as plt
fig = plt.figure(1, (8,8), dpi=300)
ax = plt.subplot(111)
sz.plot(ax=ax, edgecolor='k', facecolor=(0,0,0,0.1), linewidths=0.5)
grid_gdf.plot(column='VehicleNum', cmap='autumn_r', ax=ax)
tbd.plotscale(ax, bounds=bounds, textsize=10, compasssize=1, accuracy=2000, rect=[0.06,0.03], zorder=10)
plt.axis('off')
plt.show()
</code>
5. OD Extraction and Aggregation
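An OD record pairs the point where a taxi's passenger status switches from empty to occupied (the origin) with the next point where it switches back (the destination). The sketch below shows that pairing on plain tuples using only the standard library. It illustrates the idea rather than the library's code: the function name extract_od is hypothetical, OpenStatus is assumed to be 0 for empty and 1 for occupied, and the output keys are illustrative.

```python
from itertools import groupby
from operator import itemgetter


def extract_od(records):
    """Pair pickup and drop-off points per vehicle.

    records: iterable of (vehicle, time, lon, lat, status) tuples,
    where status is assumed to be 0 (empty) or 1 (occupied).
    """
    trips = []
    ordered = sorted(records, key=itemgetter(0, 1))  # by vehicle, then time
    for vehicle, points in groupby(ordered, key=itemgetter(0)):
        prev_status = None
        origin = None
        for _, time, lon, lat, status in points:
            if prev_status == 0 and status == 1:
                # Empty -> occupied: a passenger was picked up here.
                origin = (time, lon, lat)
            elif prev_status == 1 and status == 0 and origin is not None:
                # Occupied -> empty: the trip that started at `origin` ends here.
                trips.append({
                    "vehicle": vehicle,
                    "stime": origin[0], "slon": origin[1], "slat": origin[2],
                    "etime": time, "elon": lon, "elat": lat,
                })
                origin = None
            prev_status = status
    return trips


# Made-up sample records for two vehicles.
records = [
    ("A", "08:00:00", 114.00, 22.50, 0),
    ("A", "08:01:00", 114.01, 22.51, 1),
    ("A", "08:05:00", 114.03, 22.53, 1),
    ("A", "08:09:00", 114.05, 22.55, 0),
    ("B", "08:00:00", 114.10, 22.60, 1),
    ("B", "08:02:00", 114.11, 22.61, 0),
]
trips = extract_od(records)
```

Trips already in progress at a vehicle's first record are skipped, since their origin is unknown; that is why vehicle B contributes no trip in this sample.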
Extract origin‑destination (OD) points directly from the GPS data:
<code># Column names must match the DataFrame, which uses 'lon' and 'lat'
oddata = tbd.taxigps_to_od(data, col=['VehicleNum','time','lon','lat','OpenStatus'])
</code>
Aggregate OD counts on a 2 km × 2 km grid:
<code>params_od = tbd.area_to_params(bounds, accuracy=2000)
od_gdf = tbd.odagg_grid(oddata, params_od)
</code>
Alternatively, aggregate OD counts to administrative polygons:
<code># Without explicit grid parameters (direct spatial join)
od_gdf = tbd.odagg_shape(oddata, sz, round_accuracy=6)
# With grid parameters for faster processing
od_gdf = tbd.odagg_shape(oddata, sz, params=params_od)
</code>
6. Interactive Visualization
Leverage the built‑in Kepler.gl integration for interactive maps. Visualize point density:
<code>tbd.visualization_data(data, col=['lon','lat'], accuracy=1000, height=500)
</code>
Visualize OD flows as arcs:
<code>tbd.visualization_od(oddata, accuracy=2000, height=500)
</code>
Animate individual vehicle trajectories:
<code>tbd.visualization_trip(data, col=['lon','lat','VehicleNum','time'], height=500)
</code>
All visualizations are rendered as interactive web maps that can be explored directly in a Jupyter notebook.