Visualizing US Census Data with Datashader: A Step‑by‑Step Python Guide
This tutorial demonstrates how to load a large US census Parquet dataset, convert WebMercator coordinates, filter by geographic regions, and generate high‑resolution visualizations of population density and racial distribution using Python's Datashader library with various colormaps and export options.
This article reproduces an official Datashader example that visualizes US census data stored in Parquet format, beginning with the required Python packages:
```python
import datashader as ds
import datashader.transfer_functions as tf
import pandas as pd
import colorcet
from IPython.display import Image
import warnings

warnings.filterwarnings("ignore")
```
The dataset is read with pandas.read_parquet using the fastparquet engine for better speed and lower memory consumption:
```python
%%time
df = pd.read_parquet('/home/mw/input/census2270/census2010.parq/census2010.parq',
                     engine='fastparquet')
# CPU times: user 2.99 s, sys: 1.4 s, total: 4.39 s
# Wall time: 4.42 s
```
Because the original coordinates are in Web Mercator metres, a helper function converts northings to latitude, while eastings are rescaled to longitude directly. This step is memory-intensive and can take several minutes on machines with less than 16 GB of RAM.
```python
import math

def webMercator2LngLat(y):
    lat = y / 20037508.34 * 180
    lat = 180 / math.pi * (2 * math.atan(math.exp(lat * math.pi / 180)) - math.pi / 2)
    return lat
```
```python
%%time
df.easting = df.easting / 20037508.34 * 180
df.northing = df.northing.apply(webMercator2LngLat)
df.columns = ['lon', 'lat', 'race']
# CPU times: user 4min 1s, sys: 13.8 s, total: 4min 15s
# Wall time: 4min 15s
```
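The per-row `.apply` above dominates the four-minute runtime. The same inverse-Mercator formula vectorizes naturally with NumPy; this is a sketch, not part of the original script, using the same 20037508.34 m constant:

```python
import numpy as np

def web_mercator_to_lat(northing):
    """Inverse Web Mercator, vectorized: northing in metres -> latitude in degrees."""
    scaled = northing / 20037508.34 * 180
    return 180 / np.pi * (2 * np.arctan(np.exp(scaled * np.pi / 180)) - np.pi / 2)

# Matches the scalar webMercator2LngLat element-for-element, in one array pass.
ys = np.array([0.0, 5_000_000.0, 20037508.34])
lats = web_mercator_to_lat(ys)  # equator, mid-latitude, Mercator's ~85.05° limit
```

Applied to the real frame (`web_mercator_to_lat(df.northing.to_numpy())`), this replaces the slow element-wise `.apply` with a handful of whole-array operations.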
The `race` column encodes ethnicity (w = white, b = black, a = Asian, h = Latino, o = other). The script defines geographic bounding boxes for the United States and several major cities, sets the output image size, and creates a reusable filter function:
```python
USA = ((-124.72, -66.95), (23.55, 50.06))
LakeMichigan = ((-91.68, -83.97), (40.75, 44.08))
# ... other regions omitted for brevity ...

width, height = 1000, 600

def area_filter(longitude, latitude):
    return df[df.lon.between(*longitude) & df.lat.between(*latitude)]
```
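As a quick sanity check of the bounding-box logic, here is the same filter on a synthetic three-point frame (self-contained; `toy` and `box_filter` are illustration names, not part of the original script):

```python
import pandas as pd

# Synthetic stand-in for the census frame: three points, one outside the USA box.
toy = pd.DataFrame({
    'lon': [-100.0, -80.0, -60.0],
    'lat': [40.0, 30.0, 10.0],
    'race': ['w', 'b', 'a'],
})

def box_filter(frame, longitude, latitude):
    # Same logic as area_filter, but taking the frame explicitly.
    return frame[frame.lon.between(*longitude) & frame.lat.between(*latitude)]

USA = ((-124.72, -66.95), (23.55, 50.06))
inside = box_filter(toy, *USA)  # (-60, 10) falls outside the USA box
```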
For the first example, the script aggregates points within the whole US into a 1000 × 600 canvas and renders the density using different colormaps and scaling methods (linear, log, equal‑histogram). Images are exported to the export folder and displayed inline in the notebook.
```python
from functools import partial
from datashader.utils import export_image

# `export` is not defined in this excerpt; following the official census example,
# it can be built from datashader's export_image helper:
export = partial(export_image, export_path="export", background="black")

dd = area_filter(*USA)
cvs = ds.Canvas(plot_width=width, plot_height=height)
agg = cvs.points(dd, x='lon', y='lat').fillna(0)

export(tf.shade(agg, cmap=colorcet.fire), "USA_fire")
export(tf.shade(agg, cmap=colorcet.fire, how='linear'), "USA_fire_linear")
export(tf.shade(agg, cmap=colorcet.fire, how='log'), "USA_fire_log")
export(tf.shade(agg, cmap=colorcet.fire, how='eq_hist'), "USA_fire_eq_hist")
```
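`eq_hist` (Datashader's default) spreads the colormap over equally populated count bins rather than equal value ranges, which is why it reveals far more structure in heavily skewed population counts than linear scaling. A minimal NumPy sketch of the idea (not Datashader's exact implementation):

```python
import numpy as np

def eq_hist(agg):
    """Rank-based rescaling: map each nonzero count to its CDF position in (0, 1]."""
    flat = agg.ravel()
    mask = flat > 0
    # Rank every nonzero count (ties broken arbitrarily), then normalize.
    ranks = flat[mask].argsort().argsort()
    out = np.zeros(flat.shape, dtype=float)
    out[mask] = (ranks + 1) / mask.sum()
    return out.reshape(agg.shape)

# Skewed counts typical of census data: a few huge cells, many tiny ones.
counts = np.array([[0, 1, 2], [3, 10, 10000]])
scaled = eq_hist(counts)
# Linear scaling would push everything except 10000 to near-black;
# eq_hist gives each populated cell a distinct, evenly spaced shade.
```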
To improve visual contrast, the script demonstrates discarding the darkest 20 % of the colormap using colormap_select and also shows alternative palettes from Datashader and Matplotlib:
```python
from datashader.colors import colormap_select, viridis
from matplotlib.cm import magma

# Distinct filename here; the original reused "USA_fire_eq_hist" and
# silently overwrote the export from the previous step.
export(tf.shade(agg, colormap_select(colorcet.fire, 0.2)), "USA_fire_trimmed")
export(tf.shade(agg, colormap_select(viridis, 0.2)), "census_viridis_eq_hist")
export(tf.shade(agg, magma), "USA_mpl")
```
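The trimming idea itself is simple: a colorcet palette such as `colorcet.fire` is just a list of 256 hex colors ordered dark to light, so discarding the darkest 20% amounts to slicing off the first fifth of the list. A conceptual sketch of what the call above achieves (`trim_low` is an illustration, not the library function):

```python
def trim_low(cmap, low=0.2):
    """Drop the darkest `low` fraction from a dark-to-light color list."""
    return cmap[int(low * len(cmap)):]

# A stand-in 10-step ramp; colorcet.fire would be a 256-entry hex list.
ramp = [f"step{i}" for i in range(10)]
trimmed = trim_low(ramp)  # drops 'step0' and 'step1'
```

Starting the ramp at a brighter color keeps sparsely populated pixels visible against the black background instead of letting them vanish into it.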
For racial distribution, a color key maps each race code to a distinct hue. A helper function creates images for any defined region:
```python
color_key = {'w': 'aqua', 'b': 'lime', 'a': 'red', 'h': 'fuchsia', 'o': 'yellow'}

def create_image(area):
    dd = area_filter(*area)
    cvs = ds.Canvas(plot_width=width, plot_height=height)
    agg = cvs.points(dd, 'lon', 'lat', ds.count_cat('race'))
    return tf.shade(agg, color_key=color_key, how='eq_hist')

export(create_image(USA), "USA-race")
export(create_image(LakeMichigan), "LakeMichigan")
export(create_image(Chicago), "Chicago")
# ... other cities omitted for brevity ...
```
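With `ds.count_cat('race')`, each canvas pixel holds a per-category count vector rather than a single total, and `tf.shade` mixes the five key colors in proportion to those counts. Conceptually the aggregation is a 2-D bin plus a categorical crosstab; a pandas sketch of the idea (synthetic points, not Datashader's implementation):

```python
import numpy as np
import pandas as pd

# Toy points binned onto a 2x2 "canvas" split at lon/lat = 0.5.
pts = pd.DataFrame({
    'lon': [0.1, 0.2, 0.8, 0.9, 0.85],
    'lat': [0.1, 0.9, 0.1, 0.9, 0.9],
    'race': ['w', 'b', 'w', 'a', 'a'],
})
pts['col'] = np.digitize(pts.lon, [0.5])  # 0 = left half,  1 = right half
pts['row'] = np.digitize(pts.lat, [0.5])  # 0 = bottom half, 1 = top half

# One count per (pixel, category) -- the same shape of result as count_cat.
agg = pts.groupby(['row', 'col', 'race']).size().unstack(fill_value=0)
```

Here the top-right pixel ends up with two 'a' points, so it would shade toward red, while single-race pixels take their category's pure key color.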
The resulting images show population density and ethnic composition across the United States and selected metropolitan areas, illustrating how Datashader can handle millions of points efficiently while providing flexible color mapping and export capabilities.