Visualizing US Census Data with Datashader: A Step‑by‑Step Python Guide
This tutorial demonstrates how to load a large US census Parquet dataset, convert WebMercator coordinates, filter by geographic regions, and generate high‑resolution visualizations of population density and racial distribution using Python's Datashader library with various colormaps and export options.
This article reproduces an official Datashader example that visualizes US census data stored in Parquet format, beginning with the required Python packages:
```python
import datashader as ds
import datashader.transfer_functions as tf
import pandas as pd
import colorcet
from IPython.display import Image
import warnings

warnings.filterwarnings("ignore")
```
The dataset is read with pandas.read_parquet using the fastparquet engine for better speed and lower memory consumption:
```python
%%time
df = pd.read_parquet('/home/mw/input/census2270/census2010.parq/census2010.parq',
                     engine='fastparquet')
# CPU times: user 2.99 s, sys: 1.4 s, total: 4.39 s
# Wall time: 4.42 s
```
Because the original coordinates are in Web Mercator metres, a helper function converts northings to latitude, while eastings are rescaled to longitude directly. This step is memory-intensive and can take several minutes on machines with less than 16 GB of RAM.
```python
import math

def webMercator2LngLat(y):
    lat = y / 20037508.34 * 180
    lat = 180 / math.pi * (2 * math.atan(math.exp(lat * math.pi / 180)) - math.pi / 2)
    return lat
```
```python
%%time
df.easting = df.easting / 20037508.34 * 180
df.northing = df.northing.apply(webMercator2LngLat)
df.columns = ['lon', 'lat', 'race']
# CPU times: user 4min 1s, sys: 13.8 s, total: 4min 15s
# Wall time: 4min 15s
```
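The per-row `.apply` above dominates the four-minute runtime. The same inverse-Mercator formula vectorizes naturally with NumPy; this is a sketch, not part of the original script, using the same 20037508.34 m constant:

```python
import numpy as np

def web_mercator_to_lat(northing):
    """Inverse Web Mercator, vectorized: northing in metres -> latitude in degrees."""
    scaled = northing / 20037508.34 * 180
    return 180 / np.pi * (2 * np.arctan(np.exp(scaled * np.pi / 180)) - np.pi / 2)

# Matches the scalar webMercator2LngLat element-for-element, in one array pass.
ys = np.array([0.0, 5_000_000.0, 20037508.34])
lats = web_mercator_to_lat(ys)  # equator, mid-latitude, Mercator's ~85.05° limit
```

Applied to the real frame (`web_mercator_to_lat(df.northing.to_numpy())`), this replaces the slow element-wise `.apply` with a handful of whole-array operations.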
The `race` column encodes ethnicity (w = white, b = black, a = Asian, h = Latino, o = other). The script defines geographic bounding boxes for the United States and several major cities, sets the output image size, and creates a reusable filter function:
```python
USA = ((-124.72, -66.95), (23.55, 50.06))
LakeMichigan = ((-91.68, -83.97), (40.75, 44.08))
# ... other regions omitted for brevity ...

width, height = 1000, 600

def area_filter(longitude, latitude):
    return df[df.lon.between(*longitude) & df.lat.between(*latitude)]
```
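As a quick sanity check of the bounding-box logic, here is the same filter on a synthetic three-point frame (self-contained; `toy` and `box_filter` are illustration names, not part of the original script):

```python
import pandas as pd

# Synthetic stand-in for the census frame: three points, one outside the USA box.
toy = pd.DataFrame({
    'lon': [-100.0, -80.0, -60.0],
    'lat': [40.0, 30.0, 10.0],
    'race': ['w', 'b', 'a'],
})

def box_filter(frame, longitude, latitude):
    # Same logic as area_filter, but taking the frame explicitly.
    return frame[frame.lon.between(*longitude) & frame.lat.between(*latitude)]

USA = ((-124.72, -66.95), (23.55, 50.06))
inside = box_filter(toy, *USA)  # (-60, 10) falls outside the USA box
```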
For the first example, the script aggregates points within the whole US into a 1000 × 600 canvas and renders the density using different colormaps and scaling methods (linear, log, equal‑histogram). Images are exported to the export folder and displayed inline in the notebook.
```python
from functools import partial
from datashader.utils import export_image

# `export` is not defined in this excerpt; following the official census example,
# it can be built from datashader's export_image helper:
export = partial(export_image, export_path="export", background="black")

dd = area_filter(*USA)
cvs = ds.Canvas(plot_width=width, plot_height=height)
agg = cvs.points(dd, x='lon', y='lat').fillna(0)

export(tf.shade(agg, cmap=colorcet.fire), "USA_fire")
export(tf.shade(agg, cmap=colorcet.fire, how='linear'), "USA_fire_linear")
export(tf.shade(agg, cmap=colorcet.fire, how='log'), "USA_fire_log")
export(tf.shade(agg, cmap=colorcet.fire, how='eq_hist'), "USA_fire_eq_hist")
```
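`eq_hist` (Datashader's default) spreads the colormap over equally populated count bins rather than equal value ranges, which is why it reveals far more structure in heavily skewed population counts than linear scaling. A minimal NumPy sketch of the idea (not Datashader's exact implementation):

```python
import numpy as np

def eq_hist(agg):
    """Rank-based rescaling: map each nonzero count to its CDF position in (0, 1]."""
    flat = agg.ravel()
    mask = flat > 0
    # Rank every nonzero count (ties broken arbitrarily), then normalize.
    ranks = flat[mask].argsort().argsort()
    out = np.zeros(flat.shape, dtype=float)
    out[mask] = (ranks + 1) / mask.sum()
    return out.reshape(agg.shape)

# Skewed counts typical of census data: a few huge cells, many tiny ones.
counts = np.array([[0, 1, 2], [3, 10, 10000]])
scaled = eq_hist(counts)
# Linear scaling would push everything except 10000 to near-black;
# eq_hist gives each populated cell a distinct, evenly spaced shade.
```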
To improve visual contrast, the script demonstrates discarding the darkest 20 % of the colormap using colormap_select and also shows alternative palettes from Datashader and Matplotlib:
```python
from datashader.colors import colormap_select, viridis
from matplotlib.cm import magma

# Distinct filename here; the original reused "USA_fire_eq_hist" and
# silently overwrote the export from the previous step.
export(tf.shade(agg, colormap_select(colorcet.fire, 0.2)), "USA_fire_trimmed")
export(tf.shade(agg, colormap_select(viridis, 0.2)), "census_viridis_eq_hist")
export(tf.shade(agg, magma), "USA_mpl")
```
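The trimming idea itself is simple: a colorcet palette such as `colorcet.fire` is just a list of 256 hex colors ordered dark to light, so discarding the darkest 20% amounts to slicing off the first fifth of the list. A conceptual sketch of what the call above achieves (`trim_low` is an illustration, not the library function):

```python
def trim_low(cmap, low=0.2):
    """Drop the darkest `low` fraction from a dark-to-light color list."""
    return cmap[int(low * len(cmap)):]

# A stand-in 10-step ramp; colorcet.fire would be a 256-entry hex list.
ramp = [f"step{i}" for i in range(10)]
trimmed = trim_low(ramp)  # drops 'step0' and 'step1'
```

Starting the ramp at a brighter color keeps sparsely populated pixels visible against the black background instead of letting them vanish into it.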
For racial distribution, a color key maps each race code to a distinct hue. A helper function creates images for any defined region:
```python
color_key = {'w': 'aqua', 'b': 'lime', 'a': 'red', 'h': 'fuchsia', 'o': 'yellow'}

def create_image(area):
    dd = area_filter(*area)
    cvs = ds.Canvas(plot_width=width, plot_height=height)
    agg = cvs.points(dd, 'lon', 'lat', ds.count_cat('race'))
    return tf.shade(agg, color_key=color_key, how='eq_hist')

export(create_image(USA), "USA-race")
export(create_image(LakeMichigan), "LakeMichigan")
export(create_image(Chicago), "Chicago")
# ... other cities omitted for brevity ...
```
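With `ds.count_cat('race')`, each canvas pixel holds a per-category count vector rather than a single total, and `tf.shade` mixes the five key colors in proportion to those counts. Conceptually the aggregation is a 2-D bin plus a categorical crosstab; a pandas sketch of the idea (synthetic points, not Datashader's implementation):

```python
import numpy as np
import pandas as pd

# Toy points binned onto a 2x2 "canvas" split at lon/lat = 0.5.
pts = pd.DataFrame({
    'lon': [0.1, 0.2, 0.8, 0.9, 0.85],
    'lat': [0.1, 0.9, 0.1, 0.9, 0.9],
    'race': ['w', 'b', 'w', 'a', 'a'],
})
pts['col'] = np.digitize(pts.lon, [0.5])  # 0 = left half,  1 = right half
pts['row'] = np.digitize(pts.lat, [0.5])  # 0 = bottom half, 1 = top half

# One count per (pixel, category) -- the same shape of result as count_cat.
agg = pts.groupby(['row', 'col', 'race']).size().unstack(fill_value=0)
```

Here the top-right pixel ends up with two 'a' points, so it would shade toward red, while single-race pixels take their category's pure key color.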
The resulting images show population density and ethnic composition across the United States and selected metropolitan areas, illustrating how Datashader can handle millions of points efficiently while providing flexible color mapping and export capabilities.