Understanding Protobuf Encoding: Varints, ZigZag, Embedded Messages and Best Practices
This article explains Google Protocol Buffers' binary serialization, covering its advantages and drawbacks, the encoding mechanisms of tags, varints, ZigZag, embedded and repeated fields, and provides practical best‑practice guidelines for designing robust .proto schemas.
Protobuf is a Google‑developed binary serialization framework that offers high efficiency, cross‑language support, clear schema definition and backward compatibility.
Advantages include a compact binary format, fast (de)serialization, language‑agnostic definitions and safe evolution of messages; disadvantages are a non‑human‑readable format, the lack of native date/time scalar types (the well‑known types google.protobuf.Timestamp and google.protobuf.Duration have to be imported instead) and the need for an extra compilation step.
Encoding principles
1. Overview – a message is a series of key‑value pairs where each field is represented by a tag (field number + wire type) followed by the encoded value.
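As a rough illustration of how a tag combines the two parts, the sketch below packs a field number and a wire type into one value and splits it again. The helper names makeTag and splitTag are made up for this example and are not part of any protobuf library.

```go
package main

import "fmt"

// Wire types used by the protobuf encoding (the deprecated group types are omitted).
const (
	wireVarint  = 0 // int32, int64, uint32, uint64, sint32, sint64, bool, enum
	wireFixed64 = 1 // fixed64, sfixed64, double
	wireBytes   = 2 // string, bytes, embedded messages, packed repeated fields
	wireFixed32 = 5 // fixed32, sfixed32, float
)

// makeTag packs a field number and a wire type into a tag value.
// The low three bits hold the wire type; the remaining bits hold the field number.
func makeTag(fieldNumber, wireType uint32) uint32 {
	return fieldNumber<<3 | wireType
}

// splitTag reverses makeTag.
func splitTag(tag uint32) (fieldNumber, wireType uint32) {
	return tag >> 3, tag & 0x7
}

func main() {
	tag := makeTag(1, wireBytes) // a string stored in field 1
	fmt.Printf("%08b\n", tag)    // 00001010
	fmt.Println(splitTag(tag))   // 1 2
}
```

The tag itself is written to the wire as a varint, which is why small field numbers produce one-byte tags.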
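2. Varint – integers are encoded in a variable‑length format: each byte carries seven bits of the value, least‑significant group first, and its high bit signals whether another byte follows, so leading zero bits are never stored. A value ≤127 fits in one byte, which means such a field occupies two bytes in total (a one‑byte tag plus a one‑byte value) when its field number is 15 or lower; larger values use additional bytes.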
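A minimal sketch of that rule in Go, written by hand so the seven-bit grouping is visible. encodeVarint is an illustrative helper, not a library function; the standard library's encoding/binary package offers equivalent Uvarint routines.

```go
package main

import "fmt"

// encodeVarint emits seven bits per byte, least significant group first,
// and sets the high bit on every byte except the last one.
func encodeVarint(v uint64) []byte {
	var out []byte
	for v >= 0x80 {
		out = append(out, byte(v)|0x80) // low 7 bits plus continuation bit
		v >>= 7
	}
	return append(out, byte(v)) // final byte, continuation bit clear
}

func main() {
	for _, v := range []uint64{1, 127, 128, 300} {
		fmt.Printf("%d -> %08b\n", v, encodeVarint(v))
	}
}
```

Running it shows that 1 and 127 fit in a single byte, while 128 and 300 each need two.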
message Student {
  string name = 1;
  int32 age = 2;
}
Example Go code printing the binary encoding of a Student whose name is “t”:
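// Assumes Student is the Go type generated by protoc from the message above,
// and that fmt, log and the protobuf proto package are imported.
func main() {
	student := Student{Name: "t"}
	data, err := proto.Marshal(&student)
	if err != nil {
		log.Fatal(err)
	}
	// 00001010 = tag (field 1, wire type 2), 00000001 = length 1, 01110100 = 't'
	fmt.Printf("%08b\n", data) // [00001010 00000001 01110100]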
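}
3. ZigZag – the sint32 and sint64 types map signed integers to unsigned values before applying Varint, so that negative numbers of small magnitude stay small (e.g., 0 → 0, -1 → 1, 1 → 2, -2 → 3). Without this mapping, a negative int32 or int64 is sign‑extended to 64 bits and always takes ten bytes.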
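A minimal Go sketch of this mapping for 32-bit values, using the usual shift-and-XOR formulation. The function names zigzag32 and unzigzag32 are chosen for this example only.

```go
package main

import "fmt"

// zigzag32 interleaves negative and positive values so that numbers close to
// zero stay small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
func zigzag32(n int32) uint32 {
	// n >> 31 is an arithmetic shift: all ones when n is negative, zero otherwise.
	return uint32((n << 1) ^ (n >> 31))
}

// unzigzag32 reverses zigzag32.
func unzigzag32(u uint32) int32 {
	return int32(u>>1) ^ -int32(u&1)
}

func main() {
	for _, n := range []int32{0, -1, 1, -2, 2147483647, -2147483648} {
		z := zigzag32(n)
		fmt.Println(n, "->", z, "->", unzigzag32(z))
	}
}
```

The result of zigzag32 is what then gets written as a varint, so -1 costs one byte instead of ten.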
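(n << 1) ^ (n >> 31) // 32‑bit; the 64‑bit form uses n >> 63
4. Embedded messages & repeated fields – both use wire type 2 (length‑delimited): the tag is followed by a varint byte length and then the payload. For an embedded message, the payload is the nested message's own encoding. For a repeated field of a scalar numeric type, proto3 packs all elements into a single length‑delimited payload by default; repeated strings, bytes and messages instead emit one tag‑length‑value record per element.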
message Lecture {
  int32 price = 1;
}
message Student {
  repeated int32 scores = 1;
  Lecture lecture = 2;
}
Best practices
Use field numbers 1‑15 for frequently used fields to keep tags one byte.
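When you remove a field, mark its number (and name) with the reserved keyword so that it cannot be reused later.
Never change a field’s number after release, and change its type only between wire‑compatible types (for example int32, uint32, int64, uint64 and bool).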
Avoid the required label; it is discouraged in proto2 and was removed from proto3.
Prefer small integer values, and use the sint32 / sint64 types for fields that often hold negative numbers, because a negative int32 or int64 always takes ten bytes.
Tencent Cloud Developer