
How to Build a Zero‑Copy, Low‑Latency Network Protocol in Rust

This article explains how to design and implement a high‑performance custom network protocol in Rust, covering zero‑copy parsing, memory‑mapped packet pools, lock‑free event loops, and an efficient binary packet format to achieve ultra‑low latency and massive concurrency.

Architecture Development Notes

When millisecond‑level latency is critical and existing TCP/UDP protocols fall short, a custom network protocol becomes essential. This article explores designing and implementing a high‑performance protocol capable of handling millions of connections with ultra‑low latency.

Building a Zero‑Copy Protocol Parser

First, create an efficient parser to minimize memory copies:

<code>use std::sync::Arc;

pub struct ProtocolParser<'a> {
    // Zero‑copy buffer view
    buffer: &'a [u8],
    // Current parse position
    position: usize,
    // Memory‑mapped packet pool
    packet_pool: Arc<PacketPool>,
}

impl<'a> ProtocolParser<'a> {
    pub fn parse_packet(&mut self) -> Result<Packet, ParseError> {
        // Ensure header can be read
        if self.remaining() < HEADER_SIZE {
            return Err(ParseError::Incomplete);
        }

        // Parse header without copying.
        // SAFETY: remaining() >= HEADER_SIZE was checked above, and
        // read_unaligned tolerates the buffer's arbitrary alignment.
        let header = unsafe {
            std::ptr::read_unaligned(self.current_ptr() as *const PacketHeader)
        };

        // Validate header
        if !header.is_valid() {
            return Err(ParseError::InvalidHeader);
        }

        // Ensure full packet data is available
        let total_size = header.size as usize;
        if self.remaining() < total_size {
            return Err(ParseError::Incomplete);
        }

        // Acquire packet from pool
        let mut packet = self.packet_pool.acquire(total_size)?;

        // Copy payload if needed
        if header.flags.contains(Flags::NEEDS_COPY) {
            packet.write_payload(&self.buffer[self.position + HEADER_SIZE..self.position + total_size])?;
        } else {
            // Use zero‑copy reference
            packet.set_payload_ref(&self.buffer[self.position + HEADER_SIZE..self.position + total_size]);
        }

        // Update parse position
        self.position += total_size;

        Ok(packet)
    }

    #[inline]
    fn current_ptr(&self) -> *const u8 {
        unsafe { self.buffer.as_ptr().add(self.position) }
    }

    #[inline]
    fn remaining(&self) -> usize {
        self.buffer.len() - self.position
    }
}
</code>
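The mechanics above can be reduced to a small, std-only sketch: read fixed-width fields directly out of the input slice and return a sub-slice of the payload instead of an owned copy. The 6-byte header layout here (a little-endian `u16` kind plus `u32` length) is illustrative only, not the article's `PacketHeader`:

```rust
const HDR: usize = 6;

/// Returns (kind, payload, bytes_consumed), or None if the frame is incomplete.
fn parse_frame(buf: &[u8]) -> Option<(u16, &[u8], usize)> {
    if buf.len() < HDR {
        return None; // header not fully buffered yet
    }
    let kind = u16::from_le_bytes([buf[0], buf[1]]);
    let len = u32::from_le_bytes([buf[2], buf[3], buf[4], buf[5]]) as usize;
    let total = HDR + len;
    if buf.len() < total {
        return None; // payload not fully buffered yet
    }
    // Zero-copy: the payload is a borrowed view into the caller's buffer.
    Some((kind, &buf[HDR..total], total))
}

fn main() {
    let mut wire = vec![0x07, 0x00, 0x05, 0x00, 0x00, 0x00];
    wire.extend_from_slice(b"hello");
    let (kind, payload, used) = parse_frame(&wire).unwrap();
    assert_eq!((kind, payload, used), (7, &b"hello"[..], 11));
    assert!(parse_frame(&wire[..4]).is_none()); // incomplete header
}
```

The `Incomplete` result maps to `None` here: the caller keeps the bytes buffered and retries after the next read, which is exactly the loop the event loop below runs.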

Memory‑Mapped Packet Pool

To manage memory efficiently, use a memory‑mapped packet pool:

<code>use std::io;
use std::sync::Mutex;

use memmap2::MmapMut;

pub struct PacketPool {
    // Pre‑allocated buffers
    buffers: Vec<MmapMut>,
    // List of free packet slots (buffer_idx, offset)
    free_slots: Mutex<Vec<(usize, usize)>>,
}

impl PacketPool {
    pub fn new(buffer_size: usize, buffer_count: usize) -> io::Result<Self> {
        let mut buffers = Vec::with_capacity(buffer_count);
        let mut free_slots = Vec::new();

        for i in 0..buffer_count {
            // Create an anonymous memory-mapped buffer
            let buffer = MmapMut::map_anon(buffer_size)?;

            // Add slots to free list
            let slots_per_buffer = buffer_size / MAX_PACKET_SIZE;
            for slot in 0..slots_per_buffer {
                free_slots.push((i, slot * MAX_PACKET_SIZE));
            }

            buffers.push(buffer);
        }

        Ok(Self {
            buffers,
            free_slots: Mutex::new(free_slots),
        })
    }

    pub fn acquire(&self, size: usize) -> Result<Packet, PoolError> {
        if size > MAX_PACKET_SIZE {
            return Err(PoolError::PacketTooLarge);
        }

        let (buffer_idx, offset) = {
            let mut slots = self.free_slots.lock().unwrap();
            slots.pop().ok_or(PoolError::NoFreeSlots)?
        };

        // SAFETY: each slot is handed out to at most one Packet at a time,
        // so this mutable view into the shared mapping does not alias.
        let slot = unsafe {
            let base = self.buffers[buffer_idx].as_ptr() as *mut u8;
            std::slice::from_raw_parts_mut(base.add(offset), size)
        };

        Ok(Packet::new(slot, buffer_idx, offset))
    }
}
</code>
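The pool's free-list mechanics can be exercised without mmap at all. This sketch backs the slots with one heap-allocated arena instead of memmap2's `MmapMut` (the acquire/release discipline is identical), and the `SLOT_SIZE` constant is an assumption standing in for `MAX_PACKET_SIZE`:

```rust
use std::sync::Mutex;

const SLOT_SIZE: usize = 256;

struct SlotPool {
    storage: Vec<u8>,        // one contiguous arena, carved into fixed slots
    free: Mutex<Vec<usize>>, // byte offsets of free slots
}

impl SlotPool {
    fn new(slot_count: usize) -> Self {
        let free = (0..slot_count).map(|i| i * SLOT_SIZE).collect();
        Self {
            storage: vec![0u8; slot_count * SLOT_SIZE],
            free: Mutex::new(free),
        }
    }

    /// Pop a free slot offset; None means the pool is exhausted
    /// and the caller should apply backpressure.
    fn acquire(&self) -> Option<usize> {
        self.free.lock().unwrap().pop()
    }

    /// Return a slot once its packet is fully processed.
    fn release(&self, offset: usize) {
        self.free.lock().unwrap().push(offset);
    }
}

fn main() {
    let pool = SlotPool::new(2);
    let a = pool.acquire().unwrap();
    let b = pool.acquire().unwrap();
    assert_ne!(a, b);
    assert!(pool.acquire().is_none()); // exhausted
    pool.release(a);
    assert_eq!(pool.acquire(), Some(a)); // slots are reusable
    assert_eq!(pool.storage.len(), 2 * SLOT_SIZE);
}
```

Pre-allocating every slot up front is the point: after startup, the hot path never touches the allocator, only a `Vec::pop` under a short critical section.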

Implementing a Lock‑Free Event Loop

Design a lock‑free event loop to handle network events efficiently:

<code>pub struct NetworkEventLoop {
    // Multi-producer event queue (crossbeam channel senders are cloneable)
    event_queue: crossbeam::channel::Sender<NetworkEvent>,
    // Epoll/IOCP event handler
    event_handler: EventHandler,
    // Connection manager
    connections: Arc<ConnectionManager>,
    // Shared packet pool used while parsing reads
    packet_pool: Arc<PacketPool>,
}

impl NetworkEventLoop {
    pub fn run(&self) -> Result<(), Error> {
        let mut events = Vec::with_capacity(1024);

        loop {
            // Wait for network events
            let count = self.event_handler.wait(&mut events, None)?;

            // Batch process events
            for i in 0..count {
                let event = &events[i];
                match event.kind() {
                    EventKind::Read => self.handle_read(event.connection_id())?,
                    EventKind::Write => self.handle_write(event.connection_id())?,
                    EventKind::Close => self.handle_close(event.connection_id())?,
                }
            }

            // Clear events
            events.clear();
        }
    }

    fn handle_read(&self, conn_id: ConnectionId) -> Result<(), Error> {
        let conn = self.connections.get(conn_id)?;

        // Borrow the connection's receive buffer without copying
        let read_buffer = conn.read_buffer()?;
        let mut parser = ProtocolParser::new(read_buffer, self.packet_pool.clone());

        loop {
            match parser.parse_packet() {
                Ok(packet) => {
                    // Process complete packet
                    self.event_queue.send(NetworkEvent::Packet {
                        connection: conn_id,
                        packet,
                    })?;
                }
                Err(ParseError::Incomplete) => {
                    // Need more data
                    break;
                }
                Err(e) => return Err(e.into()),
            }
        }

        Ok(())
    }
}
</code>
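The hand-off from `handle_read` to downstream workers can be sketched with the standard library's `mpsc` channel in place of the crossbeam channel the code above assumes; the shape of the producer/consumer split is the same. The `NetworkEvent` variants here are a simplified stand-in:

```rust
use std::sync::mpsc;
use std::thread;

enum NetworkEvent {
    Packet { connection: u64, len: usize },
    Closed { connection: u64 },
}

fn main() {
    let (tx, rx) = mpsc::channel::<NetworkEvent>();

    // Producer: stands in for handle_read() publishing parsed packets.
    let producer = thread::spawn(move || {
        for i in 0..3 {
            tx.send(NetworkEvent::Packet { connection: 1, len: 64 * (i + 1) })
                .unwrap();
        }
        tx.send(NetworkEvent::Closed { connection: 1 }).unwrap();
        // tx is dropped here, which ends the consumer's iteration below.
    });

    // Consumer: drains the queue until every sender has hung up.
    let (mut packets, mut bytes) = (0, 0);
    for event in rx {
        match event {
            NetworkEvent::Packet { len, .. } => {
                packets += 1;
                bytes += len;
            }
            NetworkEvent::Closed { .. } => {}
        }
    }
    producer.join().unwrap();
    assert_eq!(packets, 3);
    assert_eq!(bytes, 64 + 128 + 192);
}
```

Decoupling parsing from processing this way keeps the I/O thread on its hot path: it never blocks on application logic, only on the kernel's event wait.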

Lock‑Free Connection Manager

A lock‑free manager efficiently allocates and tracks connections:

<code>use std::sync::atomic::{AtomicPtr, Ordering};

pub struct ConnectionManager {
    // Atomic pointer array of connection slots
    connections: Box<[AtomicPtr<Connection>]>,
    // Slot allocator
    slot_allocator: SlotAllocator,
}

impl ConnectionManager {
    pub fn add_connection(&self, socket: Socket) -> Result<ConnectionId, Error> {
        // Allocate slot
        let slot = self.slot_allocator.allocate()?;

        // Create connection
        let conn = Box::new(Connection::new(socket));
        let conn_ptr = Box::into_raw(conn);

        // Store in slot
        self.connections[slot].store(conn_ptr, Ordering::Release);

        Ok(ConnectionId(slot))
    }

    pub fn get(&self, id: ConnectionId) -> Result<&Connection, Error> {
        let ptr = self.connections[id.0].load(Ordering::Acquire);
        if ptr.is_null() {
            return Err(Error::InvalidConnection);
        }
        // SAFETY: slots are only reclaimed after all readers are done
        // (e.g. via epoch-based reclamation), so the pointer remains valid
        // for the lifetime of the returned reference.
        Ok(unsafe { &*ptr })
    }
}
</code>
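The publish/observe protocol behind the slot table fits in a few lines of std-only Rust: the writer publishes with `Release`, readers load with `Acquire`, so a reader sees either null or a fully initialized `Connection`. The `Connection` struct here is a minimal stand-in, and safe reclamation (a removed connection that readers may still hold) is deliberately omitted; production code pairs this with epoch-based reclamation or hazard pointers:

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct Connection {
    fd: i32,
}

fn main() {
    // Fixed-size table of atomic slots, all initially empty.
    let slots: Vec<AtomicPtr<Connection>> =
        (0..4).map(|_| AtomicPtr::new(ptr::null_mut())).collect();

    // Writer: publish a connection into slot 2 with Release ordering,
    // so its initialization happens-before any Acquire load that sees it.
    let conn = Box::into_raw(Box::new(Connection { fd: 42 }));
    slots[2].store(conn, Ordering::Release);

    // Reader: observes either null (empty) or the initialized Connection.
    let p = slots[2].load(Ordering::Acquire);
    assert!(!p.is_null());
    assert_eq!(unsafe { (*p).fd }, 42);
    assert!(slots[0].load(Ordering::Acquire).is_null());

    // Teardown: swap the pointer out so exactly one owner frees it.
    let taken = slots[2].swap(ptr::null_mut(), Ordering::AcqRel);
    drop(unsafe { Box::from_raw(taken) });
}
```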

Custom Protocol Implementation

Finally, implement an efficient binary protocol:

<code>use std::mem::size_of;

#[repr(C, packed)]
#[derive(Clone, Copy)]
pub struct PacketHeader {
    magic: u32,      // Protocol magic number
    version: u16,    // Protocol version
    flags: Flags,    // Packet flags
    size: u32,       // Total packet size
    seq: u64,        // Sequence number
    checksum: u32,   // Header checksum
}

impl PacketHeader {
    pub fn is_valid(&self) -> bool {
        // Verify magic number
        if self.magic != PROTOCOL_MAGIC {
            return false;
        }
        // Verify version
        if self.version != PROTOCOL_VERSION {
            return false;
        }
        // Verify size
        if self.size < size_of::<PacketHeader>() as u32 || self.size > MAX_PACKET_SIZE {
            return false;
        }
        // Verify checksum
        let computed = self.compute_checksum();
        computed == self.checksum
    }

    #[inline]
    fn compute_checksum(&self) -> u32 {
        // CRC32 over the header with its checksum field zeroed
        let mut header = *self;
        header.checksum = 0;
        // SAFETY: PacketHeader is #[repr(C, packed)] plain old data,
        // so viewing it as raw bytes is well-defined.
        let bytes = unsafe {
            std::slice::from_raw_parts(&header as *const _ as *const u8, size_of::<PacketHeader>())
        };
        crc32fast::hash(bytes)
    }
}
</code>
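The checksum-over-a-zeroed-field round trip can be verified end to end with a std-only sketch. FNV-1a stands in for crc32fast's CRC32 so the example compiles without external crates, the fields are a simplified subset of `PacketHeader`, and the `MAGIC` value is illustrative:

```rust
const MAGIC: u32 = 0x5A43_5031; // illustrative magic number

#[derive(Clone, Copy)]
struct Header {
    magic: u32,
    size: u32,
    seq: u64,
    checksum: u32,
}

// FNV-1a: a stand-in hash; the real code uses CRC32.
fn fnv1a(bytes: &[u8]) -> u32 {
    bytes
        .iter()
        .fold(0x811c_9dc5u32, |h, &b| (h ^ b as u32).wrapping_mul(0x0100_0193))
}

impl Header {
    fn compute_checksum(&self) -> u32 {
        // Hash the header with its checksum field zeroed, as above.
        let mut h = *self;
        h.checksum = 0;
        let mut bytes = Vec::new();
        bytes.extend_from_slice(&h.magic.to_le_bytes());
        bytes.extend_from_slice(&h.size.to_le_bytes());
        bytes.extend_from_slice(&h.seq.to_le_bytes());
        bytes.extend_from_slice(&h.checksum.to_le_bytes());
        fnv1a(&bytes)
    }

    fn seal(mut self) -> Self {
        self.checksum = self.compute_checksum();
        self
    }

    fn is_valid(&self) -> bool {
        self.magic == MAGIC && self.compute_checksum() == self.checksum
    }
}

fn main() {
    let h = Header { magic: MAGIC, size: 128, seq: 7, checksum: 0 }.seal();
    assert!(h.is_valid());

    // A change to any covered field breaks validation.
    let mut corrupted = h;
    corrupted.seq = 8;
    assert!(!corrupted.is_valid());
}
```

Serializing the fields explicitly (rather than reinterpreting the struct's memory) also sidesteps packed-struct and endianness pitfalls when the header crosses machine boundaries.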

By applying these techniques, you can design and implement a high‑performance network protocol that meets ultra‑low latency and high‑concurrency requirements. The flexibility and efficiency of a custom protocol make it well‑suited for modern network applications.
