Performance Testing of Java and Go High‑Performance Message Queues Using LinkedBlockingQueue
This article presents a detailed performance evaluation of high-throughput message queues in Java and Go, starting with LinkedBlockingQueue. It covers test scenarios based on message size and thread count, analyzes producer and consumer results, reports benchmark data, and shares the Groovy test cases for reproducibility.
Conclusion
Overall, java.util.concurrent.LinkedBlockingQueue can sustain around 500k QPS, which meets current load-testing needs, but its performance becomes unstable once the queue grows long. Three practical recommendations: keep message bodies small, expect limited gains from adding more threads, and avoid letting the queue back up.
Introduction
After publishing articles on Disruptor and a ten‑million‑level log replay engine, I prepared performance tests for several high‑performance message queues in Java and Go, selecting benchmark scenarios and application cases.
The test scenario design varies two factors: message body size (simulated with GET requests of different sizes) and the number of producer/consumer threads (goroutines in Go).
Note: In subsequent Go articles, "thread" always refers to a goroutine.
Object Overview
The first object under test is java.util.concurrent.LinkedBlockingQueue, an optionally-bounded blocking queue based on linked nodes that follows FIFO ordering. Its Javadoc notes that linked queues typically have higher throughput than array-based queues, but less predictable performance in most concurrent applications.
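The FIFO contract can be shown in a few lines. The sketch below is plain Java (the class name FifoDemo is ours, not from the article): items go in with put and come back in insertion order with take.

```java
import java.util.concurrent.LinkedBlockingQueue;

public class FifoDemo {

    // Put items in, take them back: LinkedBlockingQueue preserves insertion order.
    static String roundTrip(String... items) {
        // No capacity argument, so the queue is unbounded and put() never blocks.
        LinkedBlockingQueue<String> q = new LinkedBlockingQueue<>();
        StringBuilder out = new StringBuilder();
        try {
            for (String s : items) q.put(s);
            while (!q.isEmpty()) out.append(q.take()); // take() blocks only on an empty queue
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("a", "b", "c")); // prints "abc"
    }
}
```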
Among the queue implementations shipped with the JDK, LinkedBlockingQueue performs best, with ArrayBlockingQueue as the runner-up. Reported figures suggest LinkedBlockingQueue is roughly 2-3 times faster than ArrayBlockingQueue.
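That ratio is easy to sanity-check, though not to prove, with a rough loop like the one below. This is our own single-threaded sketch, not the article's benchmark; a rigorous comparison would use JMH with concurrent producers and consumers.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueCompare {

    // Crude offer/poll loop; returns operations per millisecond.
    static long opsPerMs(BlockingQueue<Integer> q, int total) {
        long start = System.nanoTime();
        for (int i = 0; i < total; i++) {
            q.offer(i); // non-blocking insert
            q.poll();   // non-blocking removal
        }
        long elapsedMs = Math.max(1, (System.nanoTime() - start) / 1_000_000);
        return total / elapsedMs;
    }

    public static void main(String[] args) {
        int total = 1_000_000;
        // Capacity 1024 for the array queue is an arbitrary choice for this sketch.
        System.out.println("LinkedBlockingQueue: " + opsPerMs(new LinkedBlockingQueue<>(), total) + "/ms");
        System.out.println("ArrayBlockingQueue : " + opsPerMs(new ArrayBlockingQueue<>(1024), total) + "/ms");
    }
}
```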
Test Results
Performance is measured solely by the number of messages processed per millisecond.
Data Description
Three types of org.apache.http.client.methods.HttpGet requests are used, differing in header and URL length to simulate small, medium, and large message bodies.
Small object example:

```groovy
def get = new HttpGet()
```

Medium object example:

```groovy
def get = new HttpGet(url)
get.addHeader("token", token)
get.addHeader(HttpClientConstant.USER_AGENT)
get.addHeader(HttpClientConstant.CONNECTION)
```

Large object example:

```groovy
def get = new HttpGet(url + token)
get.addHeader("token", token)
get.addHeader("token1", token)
get.addHeader("token5", token)
get.addHeader("token4", token)
get.addHeader("token3", token)
get.addHeader("token2", token)
get.addHeader(HttpClientConstant.USER_AGENT)
get.addHeader(HttpClientConstant.CONNECTION)
```

Producer
| Object Size | Queue Length (M) | Threads | Rate (/ms) |
|---|---|---|---|
| Small | 1 | 1 | 838 |
| Small | 1 | 5 | 837 |
| Small | 1 | 10 | 823 |
| Small | 5 | 1 | 483 |
| Small | 10 | 1 | 450 |
| Medium | 1 | 1 | 301 |
| Medium | 1 | 5 | 322 |
| Medium | 1 | 10 | 320 |
| Medium | 1 | 20 | 271 |
| Medium | 5 | 1 | Failure |
| Medium | 10 | 1 | Failure |
| Medium | 0.5 | 1 | 351 |
| Medium | 0.5 | 5 | 375 |
| Large | 1 | 1 | 214 |
| Large | 1 | 5 | 240 |
| Large | 1 | 10 | 241 |
| Large | 0.5 | 1 | 209 |
| Large | 0.5 | 5 | 250 |
| Large | 0.5 | 10 | 246 |
| Large | 0.2 | 1 | 217 |
| Large | 0.2 | 5 | 309 |
| Large | 0.2 | 10 | 321 |
| Large | 0.2 | 20 | 243 |
The two Medium failures occurred because wait times grew too long: the run stalled at around 3 million operations.
Conclusions for org.apache.http.client.methods.HttpRequestBase messages:

- Keep the queue length around 100,000.
- Use 5-10 producer threads.
- Keep the message body as small as possible.
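One way to enforce the backlog limit is to give the queue a capacity at construction time and have producers use offer with a timeout, backing off instead of growing the queue indefinitely. A minimal sketch in Java (the capacity and names here are ours, chosen to match the ~100,000 recommendation):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedProducer {

    // Capacity bound keeps the backlog near the recommended level.
    static final LinkedBlockingQueue<String> QUEUE = new LinkedBlockingQueue<>(100_000);

    // offer() with a timeout returns false when the queue stays full,
    // so the caller can drop, retry, or apply back-pressure upstream.
    static boolean tryProduce(String msg, long timeoutMs) {
        try {
            return QUEUE.offer(msg, timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryProduce("hello", 10)); // true while the queue has room
    }
}
```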
Consumer
| Object Size | Queue Length (M) | Threads | Rate (/ms) |
|---|---|---|---|
| Small | 1 | 1 | 1893 |
| Small | 1 | 5 | 1706 |
| Small | 1 | 10 | 1594 |
| Small | 1 | 20 | 1672 |
| Small | 2 | 1 | 2544 |
| Small | 2 | 5 | 2024 |
| Small | 5 | 1 | 3419 |
| Medium | 1 | 1 | 1897 |
| Medium | 1 | 5 | 1485 |
| Medium | 1 | 10 | 1345 |
| Medium | 1 | 20 | 1430 |
| Medium | 2 | 1 | 2971 |
| Medium | 2 | 5 | 1576 |
| Large | 1 | 1 | 1980 |
| Large | 1 | 5 | 1623 |
| Large | 1 | 10 | 1689 |
| Large | 0.5 | 1 | 1136 |
| Large | 0.5 | 5 | 1096 |
| Large | 0.5 | 10 | 1072 |
Conclusions for org.apache.http.client.methods.HttpRequestBase messages:

- A longer queue backlog tends to raise consumer throughput.
- Fewer consumer threads perform better.
- Keep the message body as small as possible.

The main difference from the producer side: with less lock contention and a large backlog of messages available, consumers run faster.
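LinkedBlockingQueue internally uses separate put and take locks, which is part of why consumers see less contention than producers. A consumer can cut lock traffic further by draining elements in batches with drainTo rather than polling one at a time; the sketch below (class and method names are ours) shows the pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

public class BatchConsumer {

    // drainTo moves up to maxBatch elements under a single take-lock
    // acquisition, instead of locking once per element with poll().
    static List<Integer> drainBatch(LinkedBlockingQueue<Integer> q, int maxBatch) {
        List<Integer> batch = new ArrayList<>(maxBatch);
        q.drainTo(batch, maxBatch);
        return batch;
    }

    public static void main(String[] args) {
        LinkedBlockingQueue<Integer> q = new LinkedBlockingQueue<>();
        for (int i = 0; i < 10; i++) q.offer(i);
        System.out.println(drainBatch(q, 4)); // prints [0, 1, 2, 3]
    }
}
```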
Producer & Consumer Combined
Thread count refers to the number of producers or consumers; the total thread count is twice this number.
| Object Size | Runs (M) | Threads | Queue Length (M) | Rate (/ms) |
|---|---|---|---|---|
| Small | 1 | 1 | 0.1 | 1326 |
| Small | 1 | 1 | 0.2 | 1050 |
| Small | 1 | 1 | 0.5 | 1054 |
| Small | 1 | 5 | 0.1 | 1091 |
| Small | 1 | 10 | 0.1 | 1128 |
| Small | 2 | 1 | 0.1 | 1798 |
| Small | 2 | 1 | 0.2 | 1122 |
| Small | 2 | 5 | 0.2 | 946 |
| Small | 5 | 5 | 0.1 | 1079 |
| Small | 5 | 10 | 0.1 | 1179 |
| Medium | 1 | 1 | 0.1 | 632 |
| Medium | 1 | 1 | 0.2 | 664 |
| Medium | 1 | 5 | 0.2 | 718 |
| Medium | 1 | 10 | 0.2 | 683 |
| Medium | 2 | 1 | 0.2 | 675 |
| Medium | 2 | 5 | 0.2 | 735 |
| Medium | 2 | 10 | 0.2 | 788 |
| Medium | 2 | 15 | 0.2 | 828 |
| Large | 1 | 1 | 0.1 | 505 |
| Large | 1 | 1 | 0.2 | 558 |
| Large | 1 | 5 | 0.2 | 609 |
| Large | 1 | 10 | 0.2 | 496 |
| Large | 2 | 1 | 0.2 | 523 |
| Large | 2 | 5 | 0.2 | 759 |
| Large | 2 | 10 | 0.2 | 668 |
Test Cases (Groovy)
The test cases are written in Groovy, using a custom asynchronous helper fun and closures to simplify the multithreaded code. Below are three representative scenarios.
Producer Scenario
```groovy
package com.funtest.groovytest

import com.funtester.config.HttpClientConstant
import com.funtester.frame.SourceCode
import com.funtester.utils.CountUtil
import com.funtester.utils.Time
import org.apache.http.client.methods.HttpGet
import org.apache.http.client.methods.HttpRequestBase

import java.util.concurrent.CountDownLatch
import java.util.concurrent.CyclicBarrier
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.atomic.AtomicInteger

class QueueT extends SourceCode {

    static AtomicInteger index = new AtomicInteger(0)
    static int total = 100_0000
    static int size = 10
    static int threadNum = 1
    static int piece = total / size
    static def url = "http://localhost:12345/funtester"
    static def token = "FunTesterFunTesterFunTesterFunTesterFunTesterFunTesterFunTester"

    public static void main(String[] args) {
        LinkedBlockingQueue<HttpRequestBase> linkedQ = new LinkedBlockingQueue<>()
        def ts = [] // per-slice costs, collected for the summary statistics
        def start = Time.getTimeStamp()
        def latch = new CountDownLatch(threadNum)
        def barrier = new CyclicBarrier(threadNum + 1)
        def funtester = {
            fun {
                barrier.await()
                while (true) {
                    if (index.getAndIncrement() % piece == 0) {
                        def l = Time.getTimeStamp() - start
                        ts << l
                        output("${formatLong(index.get())} add cost ${formatLong(l)}")
                        start = Time.getTimeStamp()
                    }
                    if (index.get() > total) break
                    def get = new HttpGet(url)
                    get.addHeader("token", token)
                    get.addHeader(HttpClientConstant.USER_AGENT)
                    get.addHeader(HttpClientConstant.CONNECTION)
                    linkedQ.put(get)
                }
                latch.countDown()
            }
        }
        threadNum.times { funtester() }
        def st = Time.getTimeStamp()
        barrier.await()
        latch.await()
        def et = Time.getTimeStamp()
        outRGB("Rate per ms ${total / (et - st)}")
        outRGB(CountUtil.index(ts).toString())
    }
}
```

Consumer Scenario
```groovy
package com.funtest.groovytest

import com.funtester.config.HttpClientConstant
import com.funtester.frame.SourceCode
import com.funtester.utils.CountUtil
import com.funtester.utils.Time
import org.apache.http.client.methods.HttpGet
import org.apache.http.client.methods.HttpRequestBase

import java.util.concurrent.CountDownLatch
import java.util.concurrent.CyclicBarrier
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

class QueueTconsume extends SourceCode {

    static AtomicInteger index = new AtomicInteger(1)
    static int total = 100_0000
    static int size = 10
    static int threadNum = 5
    static int piece = total / size
    static def url = "http://localhost:12345/funtester"
    static def token = "FunTesterFunTesterFunTesterFunTesterFunTesterFunTesterFunTester"

    public static void main(String[] args) {
        LinkedBlockingQueue<HttpRequestBase> linkedQ = new LinkedBlockingQueue<>()
        def ts = [] // per-slice costs, collected for the summary statistics
        def pwait = new CountDownLatch(10)
        def produces = {
            fun {
                while (true) {
                    if (linkedQ.size() > total) break
                    def get = new HttpGet(url)
                    get.addHeader("token", token)
                    get.addHeader(HttpClientConstant.USER_AGENT)
                    get.addHeader(HttpClientConstant.CONNECTION)
                    linkedQ.add(get)
                }
                pwait.countDown()
            }
        }
        10.times { produces() }
        pwait.await()
        outRGB("Data prepared! ${linkedQ.size()}")
        def start = Time.getTimeStamp()
        def barrier = new CyclicBarrier(threadNum + 1)
        def latch = new CountDownLatch(threadNum)
        def funtester = {
            fun {
                barrier.await()
                while (true) {
                    if (index.getAndIncrement() % piece == 0) {
                        def l = Time.getTimeStamp() - start
                        ts << l
                        output("${formatLong(index.get())} consume cost ${formatLong(l)}")
                        start = Time.getTimeStamp()
                    }
                    def poll = linkedQ.poll(100, TimeUnit.MILLISECONDS)
                    if (poll == null) break
                }
                latch.countDown()
            }
        }
        threadNum.times { funtester() }
        def st = Time.getTimeStamp()
        barrier.await()
        latch.await()
        def et = Time.getTimeStamp()
        outRGB("Rate per ms ${total / (et - st)}")
        outRGB(CountUtil.index(ts).toString())
    }
}
```

Producer & Consumer Combined Scenario
This test pre‑fills the queue to a specified initial length before running producer and consumer threads.
```groovy
package com.funtest.groovytest

import com.funtester.config.HttpClientConstant
import com.funtester.frame.SourceCode
import com.funtester.utils.Time
import org.apache.http.client.methods.HttpGet
import org.apache.http.client.methods.HttpRequestBase

import java.util.concurrent.CountDownLatch
import java.util.concurrent.CyclicBarrier
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

class QueueBoth extends SourceCode {

    static AtomicInteger index = new AtomicInteger(1)
    static int total = 500_0000
    static int length = 50_0000
    static int threadNum = 5
    static def url = "http://localhost:12345/funtester"
    static def token = "FunTesterFunTesterFunTesterFunTesterFunTesterFunTesterFunTester"

    public static void main(String[] args) {
        LinkedBlockingQueue<HttpRequestBase> linkedQ = new LinkedBlockingQueue<>()
        def latch = new CountDownLatch(threadNum * 2)
        def barrier = new CyclicBarrier(threadNum * 2 + 1)
        def funtester = { f ->
            fun {
                barrier.await()
                while (true) {
                    if (index.getAndIncrement() > total) break
                    f()
                }
                latch.countDown()
            }
        }
        def produces = {
            def get = new HttpGet(url)
            get.addHeader("token", token)
            get.addHeader(HttpClientConstant.USER_AGENT)
            get.addHeader(HttpClientConstant.CONNECTION)
            linkedQ.put(get)
        }
        length.times { produces() }
        threadNum.times {
            funtester(produces)
            funtester { linkedQ.poll(100, TimeUnit.MILLISECONDS) }
        }
        def st = Time.getTimeStamp()
        barrier.await()
        latch.await()
        def et = Time.getTimeStamp()
        outRGB("Rate per ms ${total / (et - st) / 2}")
    }
}
```

Additional Observations
The performance of LinkedBlockingQueue is highly unstable: logs show that maximum latency can be ten to twenty times the minimum once the queue length reaches one million. Reducing the queue length to 500,000 mitigates this instability, so keep the queue as short as possible.
Benchmark Summary
Using the FunTester framework, the following rates (operations per millisecond) were observed for different object sizes and thread counts:
| Test Object | Threads | Count (M) | Rate (/ms) |
|---|---|---|---|
| Small | 1 | 1 | 5681 |
| Small | 5 | 1 | 8010 |
| Small | 5 | 5 | 15105 |
| Medium | 1 | 1 | 1287 |
| Medium | 5 | 1 | 2329 |
| Medium | 5 | 5 | 4176 |
| Large | 1 | 1 | 807 |
| Large | 5 | 1 | 2084 |
| Large | 5 | 5 | 3185 |
The test cases used Groovy code similar to the snippets above, with configurable thread numbers, total request counts, and message sizes.
Have Fun ~ Tester !