
Using Redis Data Structures for Efficient Large‑Scale Statistics: Cardinality, Sorting, and Aggregation

The article explains how to choose appropriate Redis data structures—such as Bitmap, HyperLogLog, Set, List, Hash, and Sorted Set—to efficiently handle massive statistical scenarios like UV counting, ranking, and set‑based aggregation, while providing concrete command examples and performance considerations.

Sohu Tech Products

In mobile‑app business scenarios we often need to associate a key with a large collection of data and perform statistical sorting on that collection. Typical use cases include checking a user’s login status, counting 7‑day consecutive sign‑ins for hundreds of millions of users, daily new‑user and retention statistics, UV (Unique Visitor) counting, latest comment lists, and music‑play ranking.

Because the number of users and visits can reach millions or even billions, we must select collection types that can efficiently handle such scale.

Four statistical types are introduced: binary state statistics, aggregate statistics, sorted statistics, and cardinality statistics.

Cardinality Statistics

Cardinality statistics count the number of distinct elements in a collection, commonly used for UV calculation.

The naive approach uses a Set, which stores an element only if it is not already present. However, under massive traffic a plain Set consumes excessive memory, and UV counts rarely need exact precision.

Redis provides the HyperLogLog data structure, an approximate distinct-count algorithm with a standard error of 0.81%. Each key uses at most about 12 KB, no matter how many elements are added.
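The 0.81% and 12 KB figures can be reproduced with a little arithmetic. The sketch below assumes the dense layout of 16384 registers at 6 bits each, and the usual HyperLogLog error estimate of 1.04 divided by the square root of the register count. The constant names are illustrative only.

```python
import math

# Assumption: dense HyperLogLog layout with 2^14 registers of 6 bits each.
REGISTERS = 16384
BITS_PER_REGISTER = 6

# Standard error of the HyperLogLog estimator: 1.04 / sqrt(m).
std_error = 1.04 / math.sqrt(REGISTERS)

# Size of the register array alone (a small per-key header is not counted).
dense_bytes = REGISTERS * BITS_PER_REGISTER // 8

print(f"standard error: {std_error:.2%}")
print(f"register array: {dense_bytes} bytes ({dense_bytes / 1024:.0f} KB)")
```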

Typical commands:

PFADD mypage:uv userID1 userID2 userID3
PFCOUNT mypage:uv

Multiple HyperLogLog structures can be merged with PFMERGE to obtain a combined cardinality.

PFMERGE mergedKey hll1 hll2
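To make the semantics of PFADD, PFCOUNT, and PFMERGE concrete, here is a minimal in-memory HyperLogLog sketch. It is an illustration of the general algorithm, not Redis's actual implementation: the class name TinyHLL, the SHA-1 hash, and the simplified small-range correction are all assumptions made for this example. The point to notice is that merging takes the per-register maximum, so the merged count approximates the size of the union rather than the sum of the two counts.

```python
import hashlib
import math


class TinyHLL:
    """Illustrative HyperLogLog (hypothetical helper, not Redis code)."""

    def __init__(self, p=14):
        self.p = p                      # 2^14 = 16384 registers
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        """Rough analogue of PFADD for a single element."""
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash
        idx = h >> (64 - self.p)                       # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        """Rough analogue of PFCOUNT."""
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)

    def merge(self, other):
        """Rough analogue of PFMERGE: keep the larger value per register."""
        merged = TinyHLL(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged


# Two days of visitors with 2000 users in common (IDs 4000..5999).
day1, day2 = TinyHLL(), TinyHLL()
for uid in range(0, 6000):
    day1.add(uid)
for uid in range(4000, 10000):
    day2.add(uid)

merged = day1.merge(day2)
print(day1.count(), day2.count(), merged.count())
```

The merged estimate should land near 10000 (the true number of distinct IDs), not near 12000.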

Website UV via Set

With a Set, add the user ID on every visit. The Set ignores duplicates, so each user is stored only once:

SADD RedisWhyFast:uv 89757

The UV is obtained with SCARD:

SCARD RedisWhyFast:uv
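The deduplicating behavior can be modeled in a few lines of plain Python. This is an in-memory stand-in, not a Redis client: the helper names sadd and scard and the sample user IDs are made up for the illustration.

```python
def sadd(store, key, member):
    """Mimic SADD for one member: return 1 if it was new, 0 if already present."""
    members = store.setdefault(key, set())
    if member in members:
        return 0
    members.add(member)
    return 1


def scard(store, key):
    """Mimic SCARD: number of members in the set (0 if the key is missing)."""
    return len(store.get(key, set()))


store = {}
visits = [89757, 89757, 10086, 89757, 10010]   # hypothetical page views

added = [sadd(store, "RedisWhyFast:uv", uid) for uid in visits]
uv = scard(store, "RedisWhyFast:uv")

print(added)  # which visits introduced a new user
print(uv)     # unique visitors
```

Five page views from three distinct users yield a UV of 3, because repeat visits do not grow the set.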

Website UV via Hash

Alternatively, store the user ID as a hash field and set its value to 1 on each visit.

HSET redisCluster:uv userId:89757 1

UV is then the hash length:

HLEN redisCluster:uv

HyperLogLog as the Preferred Solution

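When the number of unique visitors reaches tens of millions, a Set or Hash grows linearly with the visitor count and consumes prohibitive memory, while a HyperLogLog key stays at roughly 12 KB. The trade-off is that the count is approximate and the individual members cannot be listed.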
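A back-of-the-envelope comparison shows the scale of the difference. The numbers below are assumptions for illustration: 10 million unique visitors and 8-byte user IDs. The Set figure counts only the raw ID bytes, so it is a lower bound; real per-entry overhead makes it larger.

```python
UNIQUE_VISITORS = 10_000_000   # assumption: "tens of millions" scale
BYTES_PER_ID = 8               # assumption: 64-bit integer user IDs
HLL_BYTES = 12 * 1024          # roughly 12 KB per HyperLogLog key

# Lower bound for a Set: payload only, ignoring per-entry overhead.
set_payload_bytes = UNIQUE_VISITORS * BYTES_PER_ID
ratio = set_payload_bytes / HLL_BYTES

print(f"Set payload (lower bound): {set_payload_bytes / 1_000_000:.0f} MB")
print(f"HyperLogLog:               {HLL_BYTES / 1024:.0f} KB")
print(f"ratio: at least {ratio:.0f}x")
```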

Sorted Statistics

Redis offers four collection types: List, Set, Hash, and Sorted Set. List and Sorted Set preserve order.

List: ordered by insertion order, suitable for message queues, latest-item lists, simple leaderboards.

Sorted Set: ordered by a numeric score, ideal for leaderboards based on play count, likes, etc.

Latest Comment List (List)

Use LPUSH to insert new comments at the head and LRANGE to fetch a range.

LPUSH commentList 1 2 3 4 5 6
LRANGE commentList 0 4

Lists are unsuitable for high‑frequency updates with pagination because inserted elements shift existing indices, causing duplicate or missing items on subsequent pages.
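The pagination problem is easy to reproduce with an in-memory model. The sketch below uses a Python deque as a stand-in for the Redis List; lpush and lrange are hypothetical helpers that mimic the head insertion and inclusive index range of the real commands (negative indices are not handled).

```python
from collections import deque


def lpush(items, *values):
    """Mimic LPUSH: each value is inserted at the head, in argument order."""
    for value in values:
        items.appendleft(value)


def lrange(items, start, stop):
    """Mimic LRANGE with non-negative, inclusive indices."""
    return list(items)[start:stop + 1]


comments = deque()
lpush(comments, *range(1, 11))        # comments 1..10; comment 10 is newest

page1 = lrange(comments, 0, 2)        # first page, 3 items per page

lpush(comments, 11)                   # a new comment arrives between requests

page2 = lrange(comments, 3, 5)        # second page, fetched by index
duplicates = set(page1) & set(page2)

print(page1, page2, duplicates)
```

Because the new comment pushed every existing item one position down, the last item of page one reappears as the first item of page two.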

Leaderboard (Sorted Set)

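Store songs in a Sorted Set where the score is the play count. Increment a score with ZINCRBY and retrieve the top N with ZREVRANGE, which returns members from the highest score to the lowest. ZRANGEBYSCORE instead selects members whose scores fall within a given range, in ascending order. In Redis 6.2 and later, ZREVRANGE is deprecated in favor of ZRANGE with the REV option.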

ZADD musicTop 100000000 青花瓷 8999999 花田错
ZINCRBY musicTop 1 青花瓷
ZREVRANGE musicTop 0 9 WITHSCORES
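The three commands above can be traced with a small in-memory model. This is a plain dictionary standing in for the Sorted Set, not a Redis client; zincrby and zrevrange_withscores are hypothetical helpers, scores are kept as integers for readability, and negative indices are not handled.

```python
def zincrby(board, member, increment):
    """Mimic ZINCRBY: add to the member's score, creating it at 0 if absent."""
    board[member] = board.get(member, 0) + increment
    return board[member]


def zrevrange_withscores(board, start, stop):
    """Mimic ZREVRANGE ... WITHSCORES: highest score first, inclusive indices."""
    # Ties are broken by member name, also in descending order.
    ranked = sorted(board.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    return ranked[start:stop + 1]


# Equivalent of: ZADD musicTop 100000000 青花瓷 8999999 花田错
music_top = {"青花瓷": 100000000, "花田错": 8999999}

new_score = zincrby(music_top, "青花瓷", 1)
top10 = zrevrange_withscores(music_top, 0, 9)

print(new_score)
print(top10)
```

Asking for ranks 0 through 9 when only two members exist simply returns both, which matches how an out-of-range stop index behaves for range queries.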

Aggregate Statistics

Aggregate statistics involve set operations such as intersection, difference, and union.

Intersection – Common Friends

SINTERSTORE commonFriends user:alice user:bob

Difference – Daily New Users

SDIFFSTORE newUsers user:20210602 user:20210601

Union – Total New Users Over Two Days

SUNIONSTORE totalNew user:20210602 user:20210601

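Set operations take time proportional to the sizes of the sets involved, and Redis executes commands on a single thread, so a large aggregation can block other requests. It is therefore recommended to offload aggregation to a dedicated Redis cluster, or to read the data and perform the computation on the client side, so that the primary service is not blocked.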
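As a rough sketch of the client-side option, the same three aggregations map directly onto Python's built-in set operators. The member sets below are hard-coded stand-ins for data that would first be read from Redis; the friend names and user IDs are invented for the example.

```python
# Stand-ins for members read from user:alice and user:bob.
alice_friends = {"bob", "carol", "dave"}
bob_friends = {"alice", "carol", "dave", "erin"}

# Stand-ins for members read from user:20210601 and user:20210602.
users_0601 = {1, 2, 3}
users_0602 = {2, 3, 4, 5}

# Intersection (SINTERSTORE): friends both users share.
common_friends = alice_friends & bob_friends

# Difference (SDIFFSTORE): in the first set but not the second.
# Order matters: this is 0602 minus 0601, not the reverse.
new_users = users_0602 - users_0601

# Union (SUNIONSTORE): everyone seen on either day.
total_users = users_0602 | users_0601

print(sorted(common_friends), sorted(new_users), sorted(total_users))
```

Note that the difference only identifies users absent on the previous day. Treating them as genuinely new assumes the earlier set contains every previously seen user.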

Written by

Sohu Tech Products

A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
