
Protobuf Encoding Principles and Optimization Techniques

The article explains how Protocol Buffers (proto3) encode basic and composite types using varint, zigzag, fixed-size and IEEE‑754 formats, describes tag and length field structures, and presents optimization strategies such as selecting size‑efficient types, flattening nested messages, and delta‑encoding to significantly reduce serialized byte‑stream size.

Tencent Cloud Developer

Apr 17, 2025

This article introduces the encoding principles of Protocol Buffers (proto3 syntax) and discusses several optimization techniques for reducing the serialized byte‑stream size.

1. Serialization Basics

Serialization converts in-memory data (a struct or class instance) into a byte stream for transmission; deserialization converts the byte stream back into in-memory data. Protobuf supports basic types (integers, floats, strings) and composite types (structs, arrays, maps).

2. Basic Types Encoding

For proto3, the following encodings are used:

Variable‑length integer types (int32, int64, uint32, uint64, bool, enum) use varint encoding. A negative int32 or int64 is sign‑extended to 64 bits and therefore always occupies 10 bytes.

Sint types (sint32, sint64) first apply zigzag then varint.

Fixed‑size types (fixed32, fixed64, sfixed32, sfixed64) store the raw 4‑ or 8‑byte value in little‑endian order.

Floating‑point types (float, double) use the 4‑ or 8‑byte IEEE‑754 representation.

String and bytes fields store their raw bytes (UTF‑8 text for string), preceded by a varint length.

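Varint length formula: a non‑negative value x occupies y = max(1, ceil(log2(x+1)/7)) bytes, because each byte carries 7 payload bits and even zero takes one byte. Zigzag maps signed integers to unsigned ones (0 → 0, -1 → 1, 1 → 2, -2 → 3, ...) so that negative values of small magnitude also get short varints.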
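The varint and zigzag rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the protobuf library's implementation; the function names `encode_varint`, `zigzag32`, and `varint_len` are made up for this example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low 7 bits first."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)         # last byte: continuation bit clear
            return bytes(out)


def zigzag32(n: int) -> int:
    """Map a signed 32-bit integer to unsigned: 0->0, -1->1, 1->2, -2->3, ..."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def varint_len(x: int) -> int:
    """Byte count of a varint: ceil(bits / 7), with a one-byte minimum for x = 0."""
    return max(1, (x.bit_length() + 6) // 7)


print(encode_varint(300).hex())             # 300 needs two bytes
print(encode_varint(zigzag32(-1)).hex())    # -1 becomes 1 after zigzag, one byte
```

`varint_len` uses the integer bit length instead of a floating-point logarithm, which gives the same result as the formula without rounding concerns for large values.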

3. Tag and Length Fields

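Each field is stored as typeid, length, data. typeid is a varint that packs the field number (tagNum) and a 3‑bit tagType as (tagNum << 3) | tagType; length is a varint that is present only for length‑delimited fields (tagType = 2). The article provides tables showing tagType values for different protobuf types.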

 typeid   length   data
+--------+--------+--------+
|xxxxxxxx|xxxxxxxx|xxxxxxxx|
+--------+--------+--------+
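As a rough sketch of how the typeid and length prefix are built, the snippet below packs a field number and tagType into a varint and frames a length-delimited payload. The tagType constants follow the standard protobuf wire types (0 = varint, 1 = 64-bit, 2 = length-delimited, 5 = 32-bit); the helper names are assumptions made for this example.

```python
WIRE_VARINT, WIRE_I64, WIRE_LEN, WIRE_I32 = 0, 1, 2, 5


def encode_varint(value: int) -> bytes:
    """Base-128 varint for a non-negative integer."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """typeid = (tagNum << 3) | tagType, stored as a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_len_field(field_number: int, payload: bytes) -> bytes:
    """typeid + length + data for a length-delimited field."""
    return encode_tag(field_number, WIRE_LEN) + encode_varint(len(payload)) + payload


print(encode_tag(15, WIRE_LEN).hex())    # field numbers 1..15 fit in one byte
print(encode_tag(16, WIRE_LEN).hex())    # field number 16 already needs two bytes
print(encode_len_field(2, b"hi").hex())
```

Because only four bits of the first byte are left for the field number, fields numbered 16 and above pay an extra typeid byte on every occurrence.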

4. Example Message

enum C { C1 = 0; C2 = 1; }
message B { int32 X = 1; sint32 Y = 2; C Z = 3; }
message A { repeated float F1 = 1; map<string, B> F2 = 20; }

The serialized byte stream is shown and the layout of tagNum and tagType for field 20 (tagType = 2) is illustrated: typeid = (20 << 3) | 2 = 162, which as a varint takes two bytes, 0xA2 0x01.
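To make the layout concrete, here is a hand-rolled encoder for messages A and B. It is a sketch, not generated protobuf code: the sample values (X = 1, Y = -1, Z = C2, F1 = [1.0], map key "k") are invented for illustration, and the encoder always writes every field, whereas a real proto3 encoder omits fields that hold their default value.

```python
import struct


def encode_varint(value: int) -> bytes:
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def tag(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)


def len_field(field_number: int, payload: bytes) -> bytes:
    return tag(field_number, 2) + encode_varint(len(payload)) + payload


def encode_b(x: int, y: int, z: int) -> bytes:
    zigzag_y = ((y << 1) ^ (y >> 31)) & 0xFFFFFFFF       # sint32: zigzag first
    return (tag(1, 0) + encode_varint(x & (2**64 - 1))   # int32: sign-extended varint
            + tag(2, 0) + encode_varint(zigzag_y)
            + tag(3, 0) + encode_varint(z))              # enum: varint of its number


def encode_a(f1: list, f2: dict) -> bytes:
    out = b""
    if f1:
        # proto3 packs repeated scalars: one typeid, one length, raw little-endian floats
        out += len_field(1, b"".join(struct.pack("<f", v) for v in f1))
    for key, b_bytes in f2.items():
        # each map entry is a nested message with key = field 1 and value = field 2
        entry = len_field(1, key.encode("utf-8")) + len_field(2, b_bytes)
        out += len_field(20, entry)
    return out


b_bytes = encode_b(1, -1, 1)
a_bytes = encode_a([1.0], {"k": b_bytes})
print(b_bytes.hex(" "))
print(a_bytes.hex(" "))
```

In the output for A, the bytes `a2 01` are the two-byte typeid of field 20, followed by the entry length and the nested key and value fields.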

5. Optimization Techniques

5.1 Type Optimization – Choose the most size‑efficient protobuf type based on the value range. A table maps numeric ranges to recommended types (e.g., sint32 for [-2^14, 2^14‑1], fixed32 for larger ranges).
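One way to reason about the choice is to compute the payload size each candidate type would need for a given value. The sketch below compares int32, sint32, and sfixed32 for a single value, excluding the typeid byte; the function names are hypothetical, and the comparison is an illustration rather than the article's own table.

```python
def varint_size(u: int) -> int:
    """Bytes needed for a non-negative integer as a varint."""
    return max(1, (u.bit_length() + 6) // 7)


def size_int32(v: int) -> int:
    # negative values are sign-extended to 64 bits before varint encoding
    return varint_size(v & 0xFFFFFFFFFFFFFFFF)


def size_sint32(v: int) -> int:
    # zigzag first, then varint
    return varint_size(((v << 1) ^ (v >> 31)) & 0xFFFFFFFF)


def size_sfixed32(v: int) -> int:
    return 4  # always four raw bytes


def cheapest(v: int) -> str:
    """Name of the smallest encoding for v; ties go to the first listed type."""
    sizes = {
        "int32": size_int32(v),
        "sint32": size_sint32(v),
        "sfixed32": size_sfixed32(v),
    }
    return min(sizes, key=sizes.get)


for v in (100, -1, 2**30):
    print(v, size_int32(v), size_sint32(v), size_sfixed32(v), cheapest(v))
```

For small non-negative values int32 is smallest, for small negative values sint32 wins by a wide margin, and once values approach the full 32-bit range the fixed four-byte form is shorter than a five-byte varint.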

5.2 Structure Optimization – When a message holds many small, tightly packed elements, the typeid and length bytes repeated for every nested element can be eliminated by flattening the structure. The article rewrites a nested message C (containing repeated A and B) into a flat message with separate repeated fields, halving the byte‑stream length.

message C { repeated int32 xs = 1; repeated int32 ys = 2; int32 z = 3; }
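The effect of flattening can be estimated by encoding the same data both ways. The nested layout used below (a repeated sub-message holding one x and one y, both as field numbers 1 and 2) is a hypothetical stand-in, since the article's original nested definition is not reproduced here; the flat layout follows the xs/ys fields of message C, leaving z out. Values are assumed to be non-negative.

```python
def encode_varint(value: int) -> bytes:
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def varint_field(field_number: int, value: int) -> bytes:
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def len_field(field_number: int, payload: bytes) -> bytes:
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


def encode_nested(points: list) -> bytes:
    """Hypothetical nested form: repeated sub-message { x = 1; y = 2; } in field 1."""
    out = b""
    for x, y in points:
        out += len_field(1, varint_field(1, x) + varint_field(2, y))
    return out


def encode_flat(points: list) -> bytes:
    """Flat form: packed repeated xs = 1 and ys = 2, one typeid and length each."""
    xs = b"".join(encode_varint(x) for x, _ in points)
    ys = b"".join(encode_varint(y) for _, y in points)
    return len_field(1, xs) + len_field(2, ys)


points = [(i % 100 + 1, i % 50 + 1) for i in range(100)]
print(len(encode_nested(points)), len(encode_flat(points)))
```

With 100 pairs of one-byte values, the nested form spends four of every six bytes on typeid and length overhead, while the flat form pays that overhead only once per field. The exact ratio depends on the value sizes and on how many fields each element has.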

5.3 Data Optimization – For fields with small variance (e.g., timestamps), store a base value once and encode only the differences (deltas). This can further compress the stream, especially when deltas fit into a few bits.

message A { int64 base = 1; repeated int64 timestamps = 2; }

After encoding, each timestamp is stored as a small delta from base, allowing bit‑level packing.
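A minimal sketch of the delta idea follows, assuming the base is chosen as the smallest timestamp so that every delta is non-negative (the article only says that a base value is stored once). Sizes count varint payload bytes only, and the sample timestamps are invented for the example.

```python
def varint_size(u: int) -> int:
    """Bytes needed for a non-negative integer as a varint."""
    return max(1, (u.bit_length() + 6) // 7)


def packed_size(values: list) -> int:
    """Payload bytes of a packed repeated varint field, excluding typeid and length."""
    return sum(varint_size(v) for v in values)


def delta_encode(timestamps: list) -> tuple:
    """Return (base, deltas) with base = min(timestamps), so all deltas are >= 0."""
    base = min(timestamps)
    return base, [t - base for t in timestamps]


def delta_decode(base: int, deltas: list) -> list:
    return [base + d for d in deltas]


timestamps = [1_700_000_000 + i for i in range(100)]   # sample second-resolution values
base, deltas = delta_encode(timestamps)

assert delta_decode(base, deltas) == timestamps         # the transform is lossless
print(packed_size(timestamps))                          # raw values
print(varint_size(base) + packed_size(deltas))          # base plus deltas
```

The decoder must apply `delta_decode` after parsing, so both sides need to agree that the repeated field carries offsets from base rather than absolute values.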

6. Future Work

The article notes that protobuf stores both structural and data information, which can be redundant for tightly‑packed data. It suggests researching algorithms that combine data characteristics with serialization to further eliminate redundancy.

