
Compressing User Tags and Models with Protostuff and Gzip

By serializing user feature data with Protostuff (a Java library built on Protobuf) and then applying JDK Gzip compression before storing it in Redis, the author shrank typical 70 KB per‑user payloads to under 10 KB. This makes it practical to keep billions of records, keeps the data readable from other languages, and lets the schema grow without breaking existing data.

vivo Internet Technology

While working on algorithm engineering recently, the author found that user‑related features (offline features, real‑time exposures, clicks, etc.) are large: storing a single user in Redis consumes 50‑70 KB or more.

To reduce memory usage, the author explored serialization and compression tools. Having previously used Protobuf for game servers, they chose Protobuf‑based serialization (through Protostuff) and the JDK's built‑in Gzip for compression, leading to the approach described in this article.

1. What is Protobuf?

Protobuf (Protocol Buffers) is Google’s language‑agnostic binary data exchange format. Implementations exist for Java, C#, C++, Go, and Python, and community ports cover JavaScript, Lua, and others. Each supported language has a compiler and a runtime library.

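Because the format is binary, encoding and decoding are much faster than with XML, and basic data types are encoded compactly (small integers, for example, take fewer bytes than large ones). It is suitable for inter‑service communication, data exchange between heterogeneous environments, configuration files, and data storage.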
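The compact encoding of basic types mentioned above is, for integers, the base‑128 "varint" scheme used by the Protobuf wire format. The following is a minimal sketch of that scheme, not Protobuf's own implementation; the class and method names are hypothetical and it only handles 32‑bit values treated as unsigned.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch of Protobuf-style base-128 varint encoding.
// VarintSketch is a hypothetical name, not part of any library.
final class VarintSketch {

    // Encodes the int (treated as unsigned 32-bit) as a varint:
    // 7 payload bits per byte, least-significant group first,
    // with the high bit set on every byte except the last.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Decodes a single varint from the start of the array.
    static int decode(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // last byte of this varint
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(encode(1).length);   // 1
        System.out.println(encode(300).length); // 2 (bytes 0xAC 0x02)
    }
}
```

Note that this saving applies to integer types only. A double is always written as 8 fixed bytes, which is why a payload made mostly of double features still benefits from a general‑purpose compressor such as Gzip.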

2. What is Protostuff?

Protostuff is a Java runtime serialization library built on top of Protobuf. It removes the need to write .proto files by hand: the library generates a schema from an existing Java class. Other languages can still read the data, provided a corresponding .proto definition is created for them.

3. Code Implementation

The author shows how user feature data is serialized with Protostuff, compressed with Gzip, and stored in Redis. (The original article includes several screenshots of the code.)
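Since the screenshots are not reproduced here, the following is a rough sketch of the Gzip half of the pipeline using only JDK classes. It is not the author's code. To keep it runnable without extra dependencies, the Protostuff step is replaced by a plain ByteBuffer encoding of a double array (the `toBytes` helper is a stand‑in), the Redis write is omitted, and all names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: compress serialized bytes before storing them,
// decompress after reading them back.
final class GzipSketch {

    static byte[] compress(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the Gzip trailer.
        try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                bos.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Stand-in for the Protostuff serialization step (assumption):
    // writes each feature as 8 raw bytes.
    static byte[] toBytes(double[] features) {
        ByteBuffer buf = ByteBuffer.allocate(features.length * Double.BYTES);
        for (double f : features) {
            buf.putDouble(f);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        double[] features = new double[7892];
        for (int i = 0; i < features.length; i++) {
            features[i] = (i % 10) / 10.0; // synthetic, repetitive values
        }
        byte[] serialized = toBytes(features);
        byte[] compressed = compress(serialized);
        System.out.println("serialized bytes: " + serialized.length);
        System.out.println("compressed bytes: " + compressed.length);
        System.out.println("restored bytes:   " + decompress(compressed).length);
    }
}
```

The compression ratio depends heavily on how repetitive the feature values are, so the sizes printed for this synthetic data say nothing about real user features.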

4. Test Data Output

Original data size: 71343 bytes
After Protostuff serialization: 65280 bytes
After Gzip compression: 7403 bytes

Number of feature values: 7892 double values
Traditional serialization size: 110677 bytes
After Protostuff serialization: 71028 bytes
After Gzip compression: 796 bytes
After Gzip decompression: 71028 bytes
Feature count after deserialization: 7892 double values
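One observation on these figures: 71028 bytes for 7892 doubles is exactly 9 bytes per value. That is consistent with each value being written as a 1‑byte field tag plus an 8‑byte fixed‑width double (a non‑packed repeated field with a field number below 16), although the article does not state how the features were modeled, so this is an inference. The arithmetic can be checked with a small sketch (names are hypothetical):

```java
// Hypothetical arithmetic check of the reported sizes.
final class SizeCheck {

    // Bytes needed for `count` doubles when each element carries
    // a tag of `tagBytes` bytes plus an 8-byte fixed-width payload.
    static long repeatedDoubleBytes(long count, int tagBytes) {
        return count * (tagBytes + Double.BYTES);
    }

    // Fraction of the original size that remains after compression.
    static double ratio(long compressed, long original) {
        return (double) compressed / original;
    }

    public static void main(String[] args) {
        System.out.println(repeatedDoubleBytes(7892, 1)); // 71028
        System.out.println(ratio(7403, 71343));           // first data set
    }
}
```

Under that reading, Protostuff adds very little overhead on top of the raw doubles but cannot shrink them, so almost all of the reduction in the first data set (to roughly a tenth of the original size) comes from the Gzip step.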

5. Summary

Using Protostuff, the data structure stored in Redis can keep growing as new fields are added without compatibility issues for data already written, and the format has multi‑language support: other languages can read the data by defining the same .proto schema. Gzip further compresses the payload, dramatically reducing memory consumption, so that a single Redis cluster can handle billions of user records.
