Databases 4 min read

Counting Routes with Identical City Lists Using PostgreSQL Arrays

This article explains how to transform a non‑visible 0x03‑separated city string stored in a PostgreSQL table into a sorted array so that routes with the same set of cities can be grouped and counted efficiently.

Qunar Tech Salon
Qunar Tech Salon
Qunar Tech Salon
Counting Routes with Identical City Lists Using PostgreSQL Arrays

In the Qunar data warehouse there is a table rtr that stores extracted travel routes, with a column cities containing a list of city names separated by the non‑printable character 0x03. The requirement is to count how many routes share exactly the same set of cities.

The analysis notes that PostgreSQL provides string_to_array() to split a delimited string into an array, that arrays can be compared with the equality operator, and that array equality requires the same element order. Two main difficulties are identified: the extracted city lists are unordered (e.g., "北京,上海,广州" vs "上海,北京,广州") and the delimiter 0x03 is invisible.

The proposed solution consists of three steps: replace the 0x03 delimiter with a standard one such as a comma, convert the comma‑separated string into an array and sort it to ensure a consistent order, and finally compare the resulting ordered arrays. The sorting is achieved by unnesting the array into rows, ordering those rows, and aggregating them back with array_agg() .

The initial query implementing this approach is:

select rid, array_agg(city) as cities
from (
  select rid, unnest(string_to_array(replace(cities, E'\x03', ','), ',')) city
  from rtr
  order by rid, city
  limit 10
) a
group by rid;

This produces a sorted cities array that can be grouped (e.g., group by cities ) to count identical routes and, after ordering, to examine differences between groups.

An improvement removes the unnecessary replace() call by feeding the raw 0x03 delimiter directly to string_to_array() :

select rid, array_agg(city) as cities
from (
  select rid, unnest(string_to_array(cities, E'\x03')) city
  from rtr
  order by rid, city
  limit 10
) a
group by rid;

The author, He Weiping, is a search and database researcher at Qunar, former Yahoo China search engineer, and translator of the first Chinese PostgreSQL manual and the third edition of "Programming Perl".

SQLPostgreSQLArraystring processing
Qunar Tech Salon
Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.