Fundamentals 7 min read

Understanding GeoHash: Principles and Application for Proximity Queries

This article explains the GeoHash algorithm, describing how latitude‑longitude coordinates are converted into binary strings and Base32 codes, how these codes enable efficient proximity searches such as finding nearby ride‑hailing passengers, and discusses the limitations of the approach.

Selected Java Interview Questions
Selected Java Interview Questions
Selected Java Interview Questions
Understanding GeoHash: Principles and Application for Proximity Queries

Background

Have you ever wondered how a ride‑hailing service knows which drivers are near a passenger and pushes the request to them? The basic idea is to obtain each passenger's longitude and latitude (lng, lat), which form a 2‑D coordinate.

After acquiring the coordinates, a straightforward but inefficient method is to store each (lng, lat) pair in an array and, when searching for users within a certain radius (e.g., 100 meters), iterate over the entire array, compute the distance for each entry, and filter those inside the range.

While this method works in principle, the data volume in real‑world scenarios is massive; traversing the whole dataset for every query is computationally and storage‑wise infeasible. The article explores how to optimise this process.

GeoHash Basic Principle Introduction

First, the article introduces the basic principle of the GeoHash algorithm.

Each coordinate has a longitude and latitude. GeoHash converts these values into a binary representation by repeatedly bisecting the range (a binary search). The resulting binary strings are then compressed using Base32 encoding.

For example, given the coordinate (116.3111126, 40.085003), the longitude and latitude are each bisected to produce binary strings. The longitude step is illustrated in the following image:

The latitude step is shown below:

The process repeatedly bisects the range; if the original value is greater than the midpoint, the corresponding bit is 1, otherwise 0. Recursing yields a binary string for latitude, e.g., 101110010000001 (15 bits).

Combining the 30‑bit strings from longitude ( 110100101011010 ) and latitude ( 101110010000001 ) by interleaving them (longitude first, then latitude) results in a single binary string such as 111001110100100110001010001001 .

Next, the binary string is split into 5‑bit groups and each group indexes into a predefined 32‑character array (e.g., ['a','b','1','2','3','4','5','6','7','A',...]) to obtain the Base32 representation.

Note: The number of bits used for each coordinate can be adjusted; higher precision yields longer strings and greater computational cost, so a trade‑off is required.

How GeoHash Is Applied to the Problem

Because GeoHash encodes coordinates into a string, points that are close together share longer common prefixes. By matching prefixes, one can quickly locate passengers within a desired radius, as illustrated in the following diagram:

This prefix‑matching approach makes it easy to retrieve users in the target area.

Remaining Issues

However, a single Base32 cell covers an area rather than an exact point, leading to edge cases. For example, two points A and B may have the same prefix length, suggesting they are closer, while in reality point A is farther from the query point than points E, C, or D.

This edge‑case issue requires additional handling, which will be addressed in a future update.

Source

Source: juejin.cn/post/7270916734138908672

Join the backend‑focused technical community for high‑quality discussions, job referrals, and industry insights.

Advertisements are prohibited; do not trust private messages that may be scams.

backendalgorithmgeohashspatial indexingproximity search
Selected Java Interview Questions
Written by

Selected Java Interview Questions

A professional Java tech channel sharing common knowledge to help developers fill gaps. Follow us!

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.