
Deep Dive into Apache Pulsar Producer: Architecture, Parameters, and Performance Tuning

This article provides a comprehensive analysis of Apache Pulsar's Producer component, detailing its message-sending workflow, key design principles, configuration parameters, and practical performance tuning techniques to improve throughput, reduce latency, and lower resource consumption in large‑scale cloud‑native messaging systems.

vivo Internet Technology

Author: vivo Internet Big Data Team - Quan Limin. This is the first article in the series "vivo Pulsar Trillion‑Level Message Processing Practice".

The article focuses on the Producer in the Pulsar client module: it dissects the data‑sending path step by step, shares real‑world parameter‑tuning cases, and shows how the Producer affects the stability and performance of a messaging middleware system.

1. Brief Introduction to Pulsar

Pulsar is a next‑generation cloud‑native messaging middleware incubated by the Apache Software Foundation. It offers storage‑compute separation, peer‑to‑peer nodes, independent scaling, real‑time load balancing, and fast node recovery, supporting multiple languages and deployment environments.

Pulsar consists of four core modules: broker, BookKeeper, client (Producer and Consumer), and ZooKeeper for metadata and coordination. It is widely used in cloud computing, big data, and IoT for real‑time message transmission.

2. Pulsar Producer Analysis

The Producer’s data‑sending flow is illustrated using a typical scenario: compression enabled, batch sending to a partitioned topic. The flow comprises twelve steps:

① Create Producer – a PartitionedProducerImpl object manages a ProducerImpl per partition.

② Build Message – encapsulate topic name, payload, schema, metadata, etc.

③ Determine Target Partition – routing strategy selects the partition.

④ Interceptor – custom interceptor can modify messages before sending.

⑤ Message Back‑pressure Control – semaphore and memory checks limit the ingest rate.

⑥ Batch Container Management – messages are cached in a batch buffer before being sent.

⑦ Message Serialization – each message is serialized before network transmission.

⑧ Compression – batch or single‑message compression reduces network I/O.

⑨ Build Send Object – the batch or single message is wrapped into an OpSendMsg, the smallest unit for broker processing.

⑩ Pending Queue – OpSendMsg objects are placed in pendingMessages until ACKed.

⑪ Message Transmission – Netty asynchronously sends messages to the broker.

⑫ Response Handling – broker ACKs are processed; failures trigger retries.
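
Steps ⑤ and ⑩ can be sketched as follows. This is a simplified illustration under stated assumptions, not Pulsar's implementation: the class `BackPressureSketch` and its methods are hypothetical, and only the two limits named above (a message‑count semaphore and a memory budget) are modeled.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Semaphore;

// Illustrative only: hypothetical names, not Pulsar's ProducerImpl.
class BackPressureSketch {
    private final Semaphore permits;       // caps the number of un-ACKed messages
    private final long memoryLimitBytes;   // caps the bytes held by un-ACKed messages
    private long usedBytes;
    private final Queue<byte[]> pending = new ArrayDeque<>(); // stands in for pendingMessages

    BackPressureSketch(int maxPendingMessages, long memoryLimitBytes) {
        this.permits = new Semaphore(maxPendingMessages);
        this.memoryLimitBytes = memoryLimitBytes;
    }

    // Returns true if the message was admitted, false if a limit was hit.
    synchronized boolean trySend(byte[] payload) {
        if (!permits.tryAcquire()) {
            return false; // too many messages are still waiting for an ACK
        }
        if (usedBytes + payload.length > memoryLimitBytes) {
            permits.release(); // give the permit back before rejecting
            return false;      // memory budget exhausted
        }
        usedBytes += payload.length;
        pending.add(payload);
        return true;
    }

    // Simulates the broker acknowledging the oldest pending message.
    synchronized void onAck() {
        byte[] acked = pending.poll();
        if (acked != null) {
            usedBytes -= acked.length;
            permits.release();
        }
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

In this sketch a rejected send simply returns false; a real client could instead block the caller or fail the send, depending on how it is configured.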

The original article illustrates these steps with key code snippets, for example creating one ProducerImpl per partition and the routing logic for the SinglePartition and RoundRobin strategies.
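
The difference between the two routing strategies can be sketched in a few lines. This is a minimal illustration, not the Pulsar source: the `Router` interface and both classes are hypothetical, and the sketch covers only messages without a key.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: hypothetical names, not the Pulsar client's router classes.
class RoutingSketch {
    interface Router {
        int choosePartition(int numPartitions);
    }

    // SinglePartition: one partition is chosen up front and reused for every message.
    static final class SinglePartition implements Router {
        private final int fixed;

        SinglePartition(int fixed) {
            this.fixed = fixed;
        }

        public int choosePartition(int numPartitions) {
            return fixed % numPartitions;
        }
    }

    // RoundRobin: rotate across partitions so the load spreads evenly.
    static final class RoundRobin implements Router {
        private final AtomicInteger counter = new AtomicInteger();

        public int choosePartition(int numPartitions) {
            // floorMod keeps the index non-negative even after the counter overflows
            return Math.floorMod(counter.getAndIncrement(), numPartitions);
        }
    }

    public static void main(String[] args) {
        Router rr = new RoundRobin();
        for (int i = 0; i < 5; i++) {
            System.out.println("message " + i + " -> partition " + rr.choosePartition(3));
        }
    }
}
```

The sketch advances the partition on every message. With batching enabled, switching partitions less often lets each batch fill up before the producer moves on, which is the trade‑off the batchingPartitionSwitchFrequencyByPublishDelay parameter in section 3 relates to.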

3. Parameter Tuning Practice

The tuning aims to lower the usage barrier of dozens of Pulsar client parameters and boost single‑machine throughput. Four critical aspects are highlighted: batch sending, compression, RoundRobin routing, and back‑pressure control (maxPendingMessages, memoryLimit).
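
Why batching and compression work well together can be shown with a small experiment. The sketch below is an assumption‑laden illustration: it uses the JDK's `Deflater` as a stand‑in for whichever compression codec the client is configured with, and the JSON payload is a made‑up example of repetitive event data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Illustrative only: JDK Deflater stands in for the client's compression codec.
class BatchCompressionSketch {
    // Size in bytes of the input after DEFLATE compression.
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // Hypothetical payload, chosen to resemble repetitive event messages.
    static byte[] message(int seq) {
        String json = "{\"user\":\"u42\",\"event\":\"click\",\"seq\":" + seq + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Total payload bytes when every message is compressed on its own.
    static int individualSize(int count) {
        int total = 0;
        for (int i = 0; i < count; i++) {
            total += compressedSize(message(i));
        }
        return total;
    }

    // Payload bytes when the messages are concatenated and compressed once.
    static int batchedSize(int count) {
        ByteArrayOutputStream batch = new ByteArrayOutputStream();
        for (int i = 0; i < count; i++) {
            byte[] m = message(i);
            batch.write(m, 0, m.length);
        }
        return compressedSize(batch.toByteArray());
    }

    public static void main(String[] args) {
        System.out.println("individual: " + individualSize(100) + " bytes");
        System.out.println("batched:    " + batchedSize(100) + " bytes");
    }
}
```

On repetitive payloads like this one, the batched size comes out smaller than the individual total, because the compressor can reuse patterns across messages and pays its fixed per‑stream overhead only once. The exact ratio depends on the codec and on the data.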

A table of default versus tuned parameter values is provided (image omitted). Formulas are given to compute sensible defaults based on message size and partition count, e.g.,

<code>maxPendingMessages = 1000‑2000
maxPendingMessagesAcrossPartitions = maxPendingMessages * partitionNum
memoryLimit = maxPendingMessages * partitionNum * messageByte
batchingMaxMessages = maxPendingMessages / 2
batchingMaxBytes = Math.min(memoryLimit * 1024 * 1024 / partitionNum / 2, 1048576)
batchingMaxPublishDelayMicros = 1ms‑100ms
batchingPartitionSwitchFrequencyByPublishDelay = 1</code>
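
As a worked example, the formulas above can be evaluated for one concrete workload. The class and method names below are hypothetical. The sketch also makes one assumption about units that the formulas leave implicit: memoryLimit is first computed in bytes, then rounded up to whole megabytes before it is used in the batchingMaxBytes formula, which multiplies it by 1024 * 1024.

```java
// Illustrative only: hypothetical helper that evaluates the formulas above.
class ProducerParamSketch {
    private static final long MB = 1024L * 1024L;

    static int maxPendingAcrossPartitions(int maxPendingMessages, int partitionNum) {
        return maxPendingMessages * partitionNum;
    }

    // memoryLimit in bytes: maxPendingMessages * partitionNum * messageByte
    static long memoryLimitBytes(int maxPendingMessages, int partitionNum, int messageByte) {
        return (long) maxPendingMessages * partitionNum * messageByte;
    }

    // Assumption: memoryLimit is configured in whole megabytes, rounded up.
    static long toMegabytesRoundedUp(long bytes) {
        return (bytes + MB - 1) / MB;
    }

    static int batchingMaxMessages(int maxPendingMessages) {
        return maxPendingMessages / 2;
    }

    // Half of one partition's share of the memory limit, capped at 1 MiB.
    static long batchingMaxBytes(long memoryLimitMb, int partitionNum) {
        return Math.min(memoryLimitMb * 1024 * 1024 / partitionNum / 2, 1048576);
    }

    public static void main(String[] args) {
        int maxPendingMessages = 1000; // lower end of the 1000 to 2000 range
        int partitionNum = 10;         // example value
        int messageByte = 1024;        // example value: 1 KiB messages

        long limitBytes = memoryLimitBytes(maxPendingMessages, partitionNum, messageByte);
        long limitMb = toMegabytesRoundedUp(limitBytes);

        System.out.println("maxPendingMessagesAcrossPartitions = "
                + maxPendingAcrossPartitions(maxPendingMessages, partitionNum));
        System.out.println("memoryLimit (MB) = " + limitMb);
        System.out.println("batchingMaxMessages = " + batchingMaxMessages(maxPendingMessages));
        System.out.println("batchingMaxBytes = " + batchingMaxBytes(limitMb, partitionNum));
    }
}
```

For these inputs the memory limit works out to 10,240,000 bytes (rounded up to 10 MB), batchingMaxMessages to 500, and batchingMaxBytes to 524,288, which stays under the 1 MiB cap.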

Performance results show that, at the same write rate, tuning cuts network traffic by ~50%, CPU load by ~90%, and overall system cost by more than half.

4. Conclusion

Understanding Producer internals and core parameters is the most effective way to write high‑performance data‑sending programs. Simple client‑side optimizations can yield massive gains, as demonstrated by the case study where server‑side processing capacity more than doubled and costs were dramatically reduced.
