
Understanding Protobuf Encoding: Varints, ZigZag, Embedded Messages and Best Practices

This article explains Google Protocol Buffers' binary serialization, covering its advantages and drawbacks, the encoding mechanisms of tags, varints, ZigZag, embedded and repeated fields, and provides practical best‑practice guidelines for designing robust .proto schemas.

Tencent Cloud Developer

Protobuf is a Google‑developed binary serialization framework that offers high efficiency, cross‑language support, clear schema definition and backward compatibility.

Advantages include a compact binary format, fast (de)serialization, language‑agnostic definitions and safe evolution of messages; disadvantages are a format that is not human‑readable, no native date/time scalar types (the well‑known Timestamp and Duration messages fill that gap) and the need for an extra compilation step.

Encoding principles

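1. Overview – a message is a series of key‑value pairs: each field is written as a tag followed by the encoded value. The tag is the varint encoding of (field number << 3) | wire type, so the low three bits identify the wire type and the remaining bits carry the field number.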
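As a minimal sketch of how a tag is built and taken apart, the following packs a field number and a wire type into one value and splits it again. The helper names makeTag and splitTag are illustrative and not part of any protobuf library.

```go
package main

import "fmt"

// Wire type numbers as defined by the protobuf wire format.
const (
	wireVarint = 0 // int32, int64, uint32, uint64, sint32, sint64, bool, enum
	wireLen    = 2 // string, bytes, embedded messages, packed repeated fields
)

// makeTag packs a field number and a wire type into a tag value.
// On the wire, the tag itself is then written as a varint.
// (Hypothetical helper, for illustration only.)
func makeTag(fieldNumber, wireType uint64) uint64 {
	return fieldNumber<<3 | wireType
}

// splitTag is the inverse of makeTag.
func splitTag(tag uint64) (fieldNumber, wireType uint64) {
	return tag >> 3, tag & 0x7
}

func main() {
	// Field 1 with a length-delimited value, e.g. "string name = 1".
	fmt.Printf("%08b\n", makeTag(1, wireLen)) // 00001010

	// 0x10 is the tag of field 2 with a varint value, e.g. "int32 age = 2".
	field, wt := splitTag(0x10)
	fmt.Println(field, wt) // 2 0
}
```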

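2. Varint – integers are encoded in a variable‑length format that drops leading zero bits: each byte carries seven bits of the value, least significant group first, and its high bit signals whether another byte follows. Values up to 127 fit in a single byte, so such a field occupies two bytes in total (one‑byte tag + value); larger values use additional bytes.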

message Student {
  string name = 1;
  int32 age = 2;
}

Example Go code printing the binary of a Student with name “t”:

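func main() {
    // Student is the type generated from the message above; proto is the Go protobuf package.
    student := Student{Name: "t"}
    marshal, err := proto.Marshal(&student)
    if err != nil {
        panic(err)
    }
    // tag 00001010 (field 1, wire type 2), length 1, then the byte for 't'
    fmt.Printf("%08b\n", marshal) // [00001010 00000001 01110100]
}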
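To make the Varint rule concrete, here is a small hand‑written encoder. It is a sketch of the base‑128 scheme, not the code the protobuf runtime uses, and appendVarint is a name chosen for this example.

```go
package main

import "fmt"

// appendVarint appends v to buf in base-128 varint form: seven payload bits
// per byte, least significant group first, with the high bit set on every
// byte except the last. (Illustrative helper, not a library function.)
func appendVarint(buf []byte, v uint64) []byte {
	for v >= 0x80 {
		buf = append(buf, byte(v)|0x80) // low 7 bits plus continuation flag
		v >>= 7
	}
	return append(buf, byte(v)) // final byte, high bit clear
}

func main() {
	fmt.Printf("%08b\n", appendVarint(nil, 1))   // [00000001]
	fmt.Printf("%08b\n", appendVarint(nil, 127)) // [01111111]
	fmt.Printf("%08b\n", appendVarint(nil, 300)) // [10101100 00000010]
}
```

The value 300 needs nine bits, so it spills into a second byte: the first byte holds the low seven bits with the continuation flag set, the second holds the remaining two.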

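3. ZigZag – the sint32 and sint64 types map signed integers to unsigned values before applying Varint, interleaving negatives and positives (0 → 0, -1 → 1, 1 → 2, -2 → 3) so that numbers of small magnitude stay short. Without it, a negative int32 or int64 is sign‑extended and always takes ten bytes.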

(n << 1) ^ (n >> 31) // 32‑bit
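The 32‑bit formula can be tried out directly in Go, where >> on a signed integer is an arithmetic shift. The function names zigzag32 and unzigzag32 below are made up for this sketch.

```go
package main

import "fmt"

// zigzag32 maps a signed 32-bit integer to an unsigned one so that values
// of small magnitude, positive or negative, get small encodings.
func zigzag32(n int32) uint32 {
	// n>>31 is an arithmetic shift: 0 for n >= 0, all ones for n < 0.
	return uint32(n<<1) ^ uint32(n>>31)
}

// unzigzag32 reverses zigzag32.
func unzigzag32(u uint32) int32 {
	return int32(u>>1) ^ -int32(u&1)
}

func main() {
	for _, n := range []int32{0, -1, 1, -2, 2} {
		z := zigzag32(n)
		fmt.Println(n, "->", z, "->", unzigzag32(z))
	}
}
```

The 64‑bit variant follows the same pattern with a shift of 63 instead of 31.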

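4. Embedded messages & repeated fields – an embedded message is written as a length‑delimited field: tag, byte size of the nested payload, then the payload itself. Repeated scalar numeric fields are packed by default in proto3, so all elements share a single tag and length prefix; repeated strings, bytes and messages are written as one tag‑value pair per element.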

message Lecture {
  int32 price = 1;
}
message Student {
  repeated int32 scores = 1;
  Lecture lecture = 2;
}
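The layout of this Student can be reproduced by hand. The sketch below assumes proto3 semantics (packed scores) and uses only the standard library; encodeStudent, appendTag and appendLen are hypothetical helpers, and unlike a real serializer they do not skip empty fields.

```go
package main

import (
	"encoding/binary" // binary.AppendUvarint requires Go 1.19 or later
	"fmt"
)

// appendTag writes the tag (field number << 3 | wire type) as a varint.
func appendTag(buf []byte, field, wireType uint64) []byte {
	return binary.AppendUvarint(buf, field<<3|wireType)
}

// appendLen writes a length-delimited field: tag, payload size, payload.
func appendLen(buf []byte, field uint64, payload []byte) []byte {
	buf = appendTag(buf, field, 2) // wire type 2 = length-delimited
	buf = binary.AppendUvarint(buf, uint64(len(payload)))
	return append(buf, payload...)
}

// encodeStudent hand-encodes Student{scores, lecture{price}}.
func encodeStudent(scores []int32, price int32) []byte {
	// Packed repeated field: all elements back to back as varints.
	var packed []byte
	for _, s := range scores {
		packed = binary.AppendUvarint(packed, uint64(s))
	}
	// Embedded Lecture message: price is field 1, wire type 0 (varint).
	lecture := appendTag(nil, 1, 0)
	lecture = binary.AppendUvarint(lecture, uint64(price))

	var out []byte
	out = appendLen(out, 1, packed)  // repeated int32 scores = 1
	out = appendLen(out, 2, lecture) // Lecture lecture = 2
	return out
}

func main() {
	fmt.Printf("% x\n", encodeStudent([]int32{1, 2, 3}, 100))
	// 0a 03 01 02 03 12 02 08 64
}
```

Reading the output left to right: 0a is the tag of scores, 03 its payload length, 01 02 03 the three scores; 12 is the tag of lecture, 02 its payload length, and 08 64 is the nested price field with value 100.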

Best practices

Use field numbers 1‑15 for frequently used fields to keep tags one byte.

When you remove a field, mark its number (and name) with the reserved keyword so that it cannot be reused later.

Never change a field’s number or type after release.

Avoid the required label; proto3 does not support it, and in proto2 a required field can never be safely removed.

Prefer small integer values, and use the sint32 / sint64 types for fields that often hold negative numbers; a negative int32 or int64 always takes ten bytes.
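The size claims behind these guidelines can be checked with a few lines of arithmetic. This is a rough sketch that only counts bytes; varintLen, tagLen, int32Len and sint32Len are names invented for the example.

```go
package main

import "fmt"

// varintLen returns how many bytes v occupies as a varint.
func varintLen(v uint64) int {
	n := 1
	for v >= 0x80 {
		v >>= 7
		n++
	}
	return n
}

// tagLen returns the encoded size of a field's tag.
// The three wire type bits never change the length, so they are left at zero.
func tagLen(fieldNumber uint64) int {
	return varintLen(fieldNumber << 3)
}

// int32Len is the payload size of an int32 field: negative values are
// sign-extended to 64 bits before varint encoding.
func int32Len(n int32) int {
	return varintLen(uint64(int64(n)))
}

// sint32Len is the payload size of a sint32 field, which ZigZag-encodes first.
func sint32Len(n int32) int {
	return varintLen(uint64(uint32(n<<1) ^ uint32(n>>31)))
}

func main() {
	fmt.Println(tagLen(15), tagLen(16))        // 1 2
	fmt.Println(int32Len(-1), sint32Len(-1))   // 10 1
	fmt.Println(int32Len(100), sint32Len(100)) // 1 2
}
```

The last line shows the trade‑off: ZigZag doubles the magnitude, so a sint32 field that only ever holds non‑negative values can cost a byte more than int32.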
