
Performance Testing of Java and Go High‑Performance Message Queues Using LinkedBlockingQueue

This article presents a detailed performance evaluation of Java and Go high‑throughput message queues, focusing on LinkedBlockingQueue, exploring test scenarios based on message size and thread count, analyzing producer and consumer results, providing benchmark data, and sharing Groovy test cases for reproducibility.

FunTester

Conclusion

Overall, java.util.concurrent.LinkedBlockingQueue can sustain around 500k QPS, which meets current load-testing needs, but its performance becomes unstable once the queue grows long. Three practical recommendations: keep message bodies small, expect only limited benefit from adding threads, and avoid letting the queue back up.

Introduction

After publishing articles on the Disruptor and a ten-million-level log replay engine, I prepared performance tests for several high-performance message queues in Java and Go, starting with the benchmark scenarios and application cases below.

The test scenario design varies two factors: message body size (simulated with GET requests of different sizes) and the number of producer/consumer threads (goroutines in Go).

Note: In subsequent Go articles, "thread" always refers to a goroutine.

Object Overview

The first object under test is java.util.concurrent.LinkedBlockingQueue, an optionally-bounded blocking queue based on linked nodes that orders elements FIFO. Its documentation notes that linked queues typically have higher throughput than array-based queues but less predictable performance in most concurrent applications.
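Before the benchmarks, a minimal, self-contained Java sketch of the FIFO and blocking semantics just described (the queue and element values here are illustrative only):

```java
import java.util.concurrent.LinkedBlockingQueue;

public class FifoDemo {
    public static void main(String[] args) throws InterruptedException {
        // Unbounded by default; pass a capacity to the constructor to bound it.
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.put("first");   // put() blocks only when a bounded queue is full
        queue.put("second");
        // take() blocks until an element is available and preserves FIFO order
        System.out.println(queue.take()); // first
        System.out.println(queue.take()); // second
    }
}
```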

Among the JDK-provided queue implementations, LinkedBlockingQueue performs best, with ArrayBlockingQueue as the runner-up. Reported figures suggest LinkedBlockingQueue is roughly 2-3 times faster than ArrayBlockingQueue.

Test Results

Performance is measured solely by the number of messages processed per millisecond.
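The metric is simply total messages divided by elapsed wall-clock milliseconds. A hedged Java sketch of how such a rate can be computed (the message count and payload here are arbitrary placeholders, not the article's test parameters):

```java
import java.util.concurrent.LinkedBlockingQueue;

public class RateDemo {
    public static void main(String[] args) throws InterruptedException {
        int total = 1_000_000;
        LinkedBlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        long start = System.currentTimeMillis();
        for (int i = 0; i < total; i++) {
            queue.put(i); // enqueue-only run: measures pure producer throughput
        }
        long elapsed = Math.max(1, System.currentTimeMillis() - start);
        // Rate (/ms) = messages processed / elapsed milliseconds
        System.out.println("Rate per ms: " + (total / elapsed));
    }
}
```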

Data Description

Three types of org.apache.http.client.methods.HttpGet requests are used, differing in header count and URL length to simulate small, medium, and large message bodies.

Small object example:

def get = new HttpGet()

Medium object example:

def get = new HttpGet(url)
get.addHeader("token", token)
get.addHeader(HttpClientConstant.USER_AGENT)
get.addHeader(HttpClientConstant.CONNECTION)

Large object example:

def get = new HttpGet(url + token)
get.addHeader("token", token)
get.addHeader("token1", token)
get.addHeader("token5", token)
get.addHeader("token4", token)
get.addHeader("token3", token)
get.addHeader("token2", token)
get.addHeader(HttpClientConstant.USER_AGENT)
get.addHeader(HttpClientConstant.CONNECTION)

Producer

| Object Size | Queue Length (M) | Threads | Rate (/ms) |
| --- | --- | --- | --- |
| Small | 1 | 1 | 838 |
| Small | 1 | 5 | 837 |
| Small | 1 | 10 | 823 |
| Small | 5 | 1 | 483 |
| Small | 10 | 1 | 450 |
| Medium | 1 | 1 | 301 |
| Medium | 1 | 5 | 322 |
| Medium | 1 | 10 | 320 |
| Medium | 1 | 20 | 271 |
| Medium | 5 | 1 | Failure |
| Medium | 10 | 1 | Failure |
| Medium | 0.5 | 1 | 351 |
| Medium | 0.5 | 5 | 375 |
| Large | 1 | 1 | 214 |
| Large | 1 | 5 | 240 |
| Large | 1 | 10 | 241 |
| Large | 0.5 | 1 | 209 |
| Large | 0.5 | 5 | 250 |
| Large | 0.5 | 10 | 246 |
| Large | 0.2 | 1 | 217 |
| Large | 0.2 | 5 | 309 |
| Large | 0.2 | 10 | 321 |
| Large | 0.2 | 20 | 243 |

The two Medium tests marked Failure stalled at around 3 million operations: the waits grew too long and the process effectively hung.

Conclusions for org.apache.http.client.methods.HttpRequestBase messages:

Keep the queue length around 100k.

Use 5‑10 producer threads.

Make the message body as small as possible.

Consumer

| Object Size | Queue Length (M) | Threads | Rate (/ms) |
| --- | --- | --- | --- |
| Small | 1 | 1 | 1893 |
| Small | 1 | 5 | 1706 |
| Small | 1 | 10 | 1594 |
| Small | 1 | 20 | 1672 |
| Small | 2 | 1 | 2544 |
| Small | 2 | 5 | 2024 |
| Small | 5 | 1 | 3419 |
| Medium | 1 | 1 | 1897 |
| Medium | 1 | 5 | 1485 |
| Medium | 1 | 10 | 1345 |
| Medium | 1 | 20 | 1430 |
| Medium | 2 | 1 | 2971 |
| Medium | 2 | 5 | 1576 |
| Large | 1 | 1 | 1980 |
| Large | 1 | 5 | 1623 |
| Large | 1 | 10 | 1689 |
| Large | 0.5 | 1 | 1136 |
| Large | 0.5 | 5 | 1096 |
| Large | 0.5 | 10 | 1072 |

Conclusions for org.apache.http.client.methods.HttpRequestBase messages:

A longer queue tends to improve consumer throughput.

Fewer consumer threads are better.

Keep the message body as small as possible.

The main difference from the producer side is that consumers contend less on the lock, so a larger backlog of messages yields higher consumption speeds.

Producer & Consumer Combined

Thread count refers to the number of producers or consumers; the total thread count is twice this number.
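This combined setup can be sketched in plain Java before the Groovy cases are shown. All parameters below (total, prefill, threadNum) are illustrative assumptions, not the article's exact values; the barrier/latch pattern mirrors the description above:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BothDemo {
    public static void main(String[] args) throws Exception {
        int total = 200_000;   // operations shared by all threads via one counter
        int prefill = 10_000;  // initial queue length before the timed run
        int threadNum = 2;     // producers; an equal number of consumers also runs

        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < prefill; i++) queue.put("msg");

        AtomicInteger index = new AtomicInteger(1);
        CountDownLatch latch = new CountDownLatch(threadNum * 2);
        CyclicBarrier barrier = new CyclicBarrier(threadNum * 2 + 1);

        Runnable producer = () -> loop(barrier, latch, index, total, () -> queue.offer("msg"));
        Runnable consumer = () -> loop(barrier, latch, index, total, () -> {
            try { queue.poll(100, TimeUnit.MILLISECONDS); } catch (InterruptedException ignored) {}
        });

        for (int i = 0; i < threadNum; i++) {
            new Thread(producer).start();
            new Thread(consumer).start();
        }
        long start = System.currentTimeMillis();
        barrier.await();   // release every thread at once
        latch.await();     // wait until all threads finish
        long elapsed = Math.max(1, System.currentTimeMillis() - start);
        // Divide by 2: each logical message is counted once produced, once consumed
        System.out.println("Rate per ms: " + (total / elapsed / 2));
    }

    private static void loop(CyclicBarrier barrier, CountDownLatch latch,
                             AtomicInteger index, int total, Runnable op) {
        try {
            barrier.await();
            while (index.getAndIncrement() <= total) op.run();
        } catch (Exception ignored) {
        } finally {
            latch.countDown();
        }
    }
}
```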

| Object Size | Runs (M) | Threads | Queue Length (M) | Rate (/ms) |
| --- | --- | --- | --- | --- |
| Small | 1 | 1 | 0.1 | 1326 |
| Small | 1 | 1 | 0.2 | 1050 |
| Small | 1 | 1 | 0.5 | 1054 |
| Small | 1 | 5 | 0.1 | 1091 |
| Small | 1 | 10 | 0.1 | 1128 |
| Small | 2 | 1 | 0.1 | 1798 |
| Small | 2 | 1 | 0.2 | 1122 |
| Small | 2 | 5 | 0.2 | 946 |
| Small | 5 | 5 | 0.1 | 1079 |
| Small | 5 | 10 | 0.1 | 1179 |
| Medium | 1 | 1 | 0.1 | 632 |
| Medium | 1 | 1 | 0.2 | 664 |
| Medium | 1 | 5 | 0.2 | 718 |
| Medium | 1 | 10 | 0.2 | 683 |
| Medium | 2 | 1 | 0.2 | 675 |
| Medium | 2 | 5 | 0.2 | 735 |
| Medium | 2 | 10 | 0.2 | 788 |
| Medium | 2 | 15 | 0.2 | 828 |
| Large | 1 | 1 | 0.1 | 505 |
| Large | 1 | 1 | 0.2 | 558 |
| Large | 1 | 5 | 0.2 | 609 |
| Large | 1 | 10 | 0.2 | 496 |
| Large | 2 | 1 | 0.2 | 523 |
| Large | 2 | 5 | 0.2 | 759 |
| Large | 2 | 10 | 0.2 | 668 |

Test Cases (Groovy)

The test cases are written in Groovy, using a custom asynchronous keyword fun and closures to simplify multithreaded code. Below are three representative scenarios.
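The article does not show how `fun` is implemented; a plausible (purely assumed) Java equivalent is a helper that runs a task on a new thread:

```java
public class Fun {
    // Hypothetical stand-in for FunTester's `fun` keyword: run a task asynchronously.
    // The real implementation is not shown in the article; this is only a sketch.
    public static Thread fun(Runnable task) {
        Thread t = new Thread(task);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = fun(() -> System.out.println("hello from fun"));
        t.join(); // wait for the async task to finish
    }
}
```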

Producer Scenario

package com.funtest.groovytest

import com.funtester.config.HttpClientConstant
import com.funtester.frame.SourceCode
import com.funtester.utils.CountUtil
import com.funtester.utils.Time
import org.apache.http.client.methods.HttpGet
import org.apache.http.client.methods.HttpRequestBase
import java.util.concurrent.CountDownLatch
import java.util.concurrent.CyclicBarrier
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.atomic.AtomicInteger

class QueueT extends SourceCode {
    static AtomicInteger index = new AtomicInteger(0)
    static int total = 100_0000
    static int size = 10
    static int threadNum = 1
    static int piece = total / size
    static def url = "http://localhost:12345/funtester"
    static def token = "FunTesterFunTesterFunTesterFunTesterFunTesterFunTesterFunTester"
    static def ts = [] // per-batch costs recorded during the run

    public static void main(String[] args) {
        LinkedBlockingQueue<HttpRequestBase> linkedQ = new LinkedBlockingQueue<>()
        def start = Time.getTimeStamp()
        def latch = new CountDownLatch(threadNum)
        def barrier = new CyclicBarrier(threadNum + 1)
        def funtester = {
            fun {
                barrier.await()
                while (true) {
                    if (index.getAndIncrement() % piece == 0) {
                        def l = Time.getTimeStamp() - start
                        ts << l
                        output("${formatLong(index.get())} add cost ${formatLong(l)}")
                        start = Time.getTimeStamp()
                    }
                    if (index.get() > total) break
                    def get = new HttpGet(url)
                    get.addHeader("token", token)
                    get.addHeader(HttpClientConstant.USER_AGENT)
                    get.addHeader(HttpClientConstant.CONNECTION)
                    linkedQ.put(get)
                }
                latch.countDown()
            }
        }
        threadNum.times { funtester() }
        def st = Time.getTimeStamp()
        barrier.await()
        latch.await()
        def et = Time.getTimeStamp()
        outRGB("Rate per ms ${total / (et - st)}")
        outRGB(CountUtil.index(ts).toString())
    }
}

Consumer Scenario

package com.funtest.groovytest

import com.funtester.config.HttpClientConstant
import com.funtester.frame.SourceCode
import com.funtester.utils.CountUtil
import com.funtester.utils.Time
import org.apache.http.client.methods.HttpGet
import org.apache.http.client.methods.HttpRequestBase
import java.util.concurrent.CountDownLatch
import java.util.concurrent.CyclicBarrier
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

class QueueTconsume extends SourceCode {
    static AtomicInteger index = new AtomicInteger(1)
    static int total = 100_0000
    static int size = 10
    static int threadNum = 5
    static int piece = total / size
    static def url = "http://localhost:12345/funtester"
    static def token = "FunTesterFunTesterFunTesterFunTesterFunTesterFunTesterFunTester"
    static def ts = [] // per-batch costs recorded during the run

    public static void main(String[] args) {
        LinkedBlockingQueue<HttpRequestBase> linkedQ = new LinkedBlockingQueue<>()
        def pwait = new CountDownLatch(10)
        def produces = {
            fun {
                while (true) {
                    if (linkedQ.size() > total) break
                    def get = new HttpGet(url)
                    get.addHeader("token", token)
                    get.addHeader(HttpClientConstant.USER_AGENT)
                    get.addHeader(HttpClientConstant.CONNECTION)
                    linkedQ.add(get)
                }
                pwait.countDown()
            }
        }
        10.times { produces() }
        pwait.await()
        outRGB("Data prepared! ${linkedQ.size()}")
        def start = Time.getTimeStamp()
        def barrier = new CyclicBarrier(threadNum + 1)
        def latch = new CountDownLatch(threadNum)
        def funtester = {
            fun {
                barrier.await()
                while (true) {
                    if (index.getAndIncrement() % piece == 0) {
                        def l = Time.getTimeStamp() - start
                        ts << l
                        output("${formatLong(index.get())} consume cost ${formatLong(l)}")
                        start = Time.getTimeStamp()
                    }
                    def poll = linkedQ.poll(100, TimeUnit.MILLISECONDS)
                    if (poll == null) break
                }
                latch.countDown()
            }
        }
        threadNum.times { funtester() }
        def st = Time.getTimeStamp()
        barrier.await()
        latch.await()
        def et = Time.getTimeStamp()
        outRGB("Rate per ms ${total / (et - st)}")
        outRGB(CountUtil.index(ts).toString())
    }
}

Producer & Consumer Combined Scenario

This test pre‑fills the queue to a specified initial length before running producer and consumer threads.

package com.funtest.groovytest

import com.funtester.config.HttpClientConstant
import com.funtester.frame.SourceCode
import com.funtester.utils.Time
import org.apache.http.client.methods.HttpGet
import org.apache.http.client.methods.HttpRequestBase
import java.util.concurrent.CountDownLatch
import java.util.concurrent.CyclicBarrier
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

class QueueBoth extends SourceCode {
    static AtomicInteger index = new AtomicInteger(1)
    static int total = 500_0000
    static int length = 50_0000
    static int threadNum = 5
    static def url = "http://localhost:12345/funtester"
    static def token = "FunTesterFunTesterFunTesterFunTesterFunTesterFunTesterFunTester"

    public static void main(String[] args) {
        LinkedBlockingQueue<HttpRequestBase> linkedQ = new LinkedBlockingQueue<>()
        def latch = new CountDownLatch(threadNum * 2)
        def barrier = new CyclicBarrier(threadNum * 2 + 1)
        def ts = []
        def funtester = { f ->
            fun {
                barrier.await()
                while (true) {
                    if (index.getAndIncrement() > total) break
                    f()
                }
                latch.countDown()
            }
        }
        def produces = {
            def get = new HttpGet(url)
            get.addHeader("token", token)
            get.addHeader(HttpClientConstant.USER_AGENT)
            get.addHeader(HttpClientConstant.CONNECTION)
            linkedQ.put(get)
        }
        length.times { produces() }
        threadNum.times {
            funtester(produces)
            funtester { linkedQ.poll(100, TimeUnit.MILLISECONDS) }
        }
        def st = Time.getTimeStamp()
        barrier.await()
        latch.await()
        def et = Time.getTimeStamp()
        outRGB("Rate per ms ${total / (et - st) / 2}")
    }
}

Additional Observations

The performance of LinkedBlockingQueue is highly unstable; logs show that maximum batch latency can be ten to twenty times the minimum once the queue length reaches one million. Reducing the queue length to 500k mitigates this instability, so keeping the queue as short as possible is advisable.
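This instability can be observed with a simple batch-cost measurement like the one the test cases log. The sketch below is an illustrative reconstruction (batch size and counts are assumptions), reporting how far apart the fastest and slowest batches are:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

public class BatchCostDemo {
    public static void main(String[] args) throws InterruptedException {
        int total = 1_000_000;
        int batch = total / 10; // record the cost of every 100k inserts
        LinkedBlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        List<Long> costs = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 1; i <= total; i++) {
            queue.put(i);
            if (i % batch == 0) {
                costs.add(System.nanoTime() - start);
                start = System.nanoTime();
            }
        }
        long min = costs.stream().mapToLong(Long::longValue).min().getAsLong();
        long max = costs.stream().mapToLong(Long::longValue).max().getAsLong();
        // A large max/min ratio means batch costs diverge as the queue grows
        System.out.println("max/min batch cost ratio: " + (double) max / min);
    }
}
```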

Benchmark Summary

Using the FunTester framework, the following rates (operations per millisecond) were observed for different object sizes and thread counts:

| Test Object | Threads | Count (M) | Rate (/ms) |
| --- | --- | --- | --- |
| Small | 1 | 1 | 5681 |
| Small | 5 | 1 | 8010 |
| Small | 5 | 5 | 15105 |
| Medium | 1 | 1 | 1287 |
| Medium | 5 | 1 | 2329 |
| Medium | 5 | 5 | 4176 |
| Large | 1 | 1 | 807 |
| Large | 5 | 1 | 2084 |
| Large | 5 | 5 | 3185 |

The test cases used Groovy code similar to the snippets above, with configurable thread numbers, total request counts, and message sizes.

Have Fun ~ Tester !

