Understanding Java 8 Stream API, Parallel Streams, and ForkJoinPool
This article explains Java 8 Stream API fundamentals, its composition, pipelining and internal iteration, details parallel stream execution using ForkJoinPool, discusses performance considerations, and provides practical code examples for creating and managing streams in backend development.
Java 8 introduced the Stream API, a high‑level abstraction that allows declarative processing of data similar to SQL queries, improving productivity by enabling concise, efficient code.
Stream Composition and Characteristics
A Stream is a sequence of elements from a data source that supports aggregate operations. Elements are not stored like a collection; they are computed on demand. Sources can be collections, arrays, I/O channels, generators, etc.
Aggregate operations resemble SQL statements (e.g., filter, map, reduce, find, match, sorted). Two core features differentiate streams from traditional collection iteration:
Pipelining: each intermediate operation returns a stream, enabling fluent chaining and optimizations such as lazy evaluation and short-circuiting.
Internal iteration: streams use the Visitor pattern to iterate internally, avoiding explicit external loops with Iterator or for-each.
Streams also support parallel execution, which splits a task into subtasks processed by multiple threads.
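Lazy evaluation and short-circuiting can be observed directly. The following is a minimal sketch, not taken from the original article: the class name LazyPipelineDemo and the trace list are illustrative assumptions. It records which source elements the pipeline actually visits before findFirst() stops it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class LazyPipelineDemo {
    // Records which source elements the pipeline actually pulled.
    // Only safe because the stream below is sequential.
    static final List<String> trace = new ArrayList<>();

    static Optional<Integer> firstEvenSquare(List<Integer> numbers) {
        trace.clear();
        return numbers.stream()
                .peek(n -> trace.add("visit " + n)) // runs only for elements that are pulled
                .filter(n -> n % 2 == 0)
                .map(n -> n * n)
                .findFirst();                       // short-circuits after the first match
    }

    public static void main(String[] args) {
        Optional<Integer> result = firstEvenSquare(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println(result.get()); // 4
        System.out.println(trace);        // [visit 1, visit 2]
    }
}
```

Nothing runs until the terminal operation is invoked, and elements 3, 4 and 5 are never visited because the pipeline stops at the first even number.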
BaseStream Interface
The parent of Stream is BaseStream, defining the core contract:
public interface BaseStream<T, S extends BaseStream<T, S>> extends AutoCloseable {
    Iterator<T> iterator();
    Spliterator<T> spliterator();
    boolean isParallel();
    S sequential();
    S parallel();
    S unordered();
    S onClose(Runnable closeHandler);
    void close();
}
Here T is the element type and S is the concrete stream type (e.g., Stream, IntStream, LongStream).
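A few of these BaseStream methods can be exercised in isolation. The sketch below is illustrative only (BaseStreamDemo and closeHandlerRuns are made-up names): it checks isParallel() before and after parallel(), and shows that close() runs a handler registered with onClose() when the stream is used in try-with-resources.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

class BaseStreamDemo {
    static boolean closeHandlerRuns() {
        AtomicBoolean closed = new AtomicBoolean(false);
        // Streams are AutoCloseable, so try-with-resources calls close(),
        // which runs the handler registered through onClose().
        try (Stream<String> s = Stream.of("a", "b", "c").onClose(() -> closed.set(true))) {
            s.forEach(x -> { });
        }
        return closed.get();
    }

    public static void main(String[] args) {
        Stream<Integer> s = Arrays.asList(1, 2, 3).stream();
        System.out.println(s.isParallel());            // false
        System.out.println(s.parallel().isParallel()); // true
        System.out.println(closeHandlerRuns());        // true
    }
}
```

Most collection-backed streams never need closing; the handler matters mainly for streams that wrap I/O resources.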
Stream Interface
The Stream declaration extends BaseStream:
public interface Stream<T> extends BaseStream<T, Stream<T>> { }
Common intermediate operations include:
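Stream<T> filter(Predicate<? super T> predicate);
<R> Stream<R> map(Function<? super T, ? extends R> mapper);
<R> Stream<R> flatMap(Function<? super T, ? extends Stream<? extends R>> mapper);
Stream<T> sorted();
Stream<T> peek(Consumer<? super T> action);
Stream<T> limit(long maxSize);
Stream<T> skip(long n);
...
Parallel vs. Sequential Streams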
BaseStream provides parallel() and sequential() methods; the last invocation determines the stream mode. Parallel streams rely on the Fork/Join framework introduced in Java 7.
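The "last call wins" rule can be checked with isParallel(). This is a small sketch under assumed names (StreamModeDemo and its methods are illustrative): the mode is a property of the whole pipeline, not of the individual stages between the calls.

```java
import java.util.stream.IntStream;

class StreamModeDemo {
    // The final mode call is parallel(), so the whole pipeline is parallel.
    static boolean endsWithParallel() {
        return IntStream.range(0, 10)
                .parallel()
                .filter(i -> i % 2 == 0)
                .sequential()
                .map(i -> i * 2)
                .parallel()
                .isParallel();
    }

    // The final mode call is sequential(), so the whole pipeline is sequential.
    static boolean endsWithSequential() {
        return IntStream.range(0, 10)
                .parallel()
                .filter(i -> i % 2 == 0)
                .sequential()
                .isParallel();
    }

    // The result is the same in either mode; only the execution strategy differs.
    static int doubledEvenSum() {
        return IntStream.range(0, 10)
                .parallel()
                .filter(i -> i % 2 == 0)
                .sequential()
                .map(i -> i * 2)
                .parallel()
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(endsWithParallel());   // true
        System.out.println(endsWithSequential()); // false
        System.out.println(doubledEvenSum());     // 40
    }
}
```

The same rule applies to the longer chain: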
// The last mode call is parallel(), so the entire pipeline runs in parallel.
stream.parallel()
    .filter(...)
    .sequential()
    .map(...)
    .parallel()
    .sum();
ForkJoinPool Behind Parallel Streams
ForkJoinPool implements work‑stealing: each worker has a private double‑ended queue; idle workers steal tasks from the tail of other queues, reducing contention. This design allows a small number of threads to execute millions of subtasks.
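The divide-and-conquer style that work-stealing is built for can be sketched with a RecursiveTask. This example is an illustration, not code from the article: ForkJoinSumDemo, SumTask and the threshold of 10,000 elements are assumptions chosen for the demo.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ForkJoinSumDemo {
    static class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10_000; // assumed cut-off; tune per workload
        private final long[] data;
        private final int from;
        private final int to;

        SumTask(long[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += data[i];
                }
                return sum; // small enough: compute directly
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                     // queue the left half; an idle worker may steal it
            long rightSum = right.compute(); // work on the right half in the current thread
            return left.join() + rightSum;
        }
    }

    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    // Convenience helper: sums the values 1..n through the task above.
    static long sumOneTo(int n) {
        long[] data = new long[n];
        for (int i = 0; i < n; i++) {
            data[i] = i + 1;
        }
        return parallelSum(data);
    }

    public static void main(String[] args) {
        System.out.println(sumOneTo(1_000_000)); // 500000500000
    }
}
```

Forking one half and computing the other in place keeps the current thread busy instead of parking it while both halves wait in a queue.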
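Parallel streams use the common ForkJoinPool. Its default parallelism is one less than the number of available processors (with a minimum of 1), because the thread that invokes the stream also takes part in the work. The value can be changed with the system property -Djava.util.concurrent.ForkJoinPool.common.parallelism=N, which must be set before the common pool is initialized.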
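The configured parallelism can be read at runtime with ForkJoinPool.getCommonPoolParallelism(). The sketch below also shows a widely used workaround for isolating a parallel stream from the common pool: starting it from a task submitted to a dedicated ForkJoinPool. That the stream then runs in the dedicated pool is observed implementation behavior rather than something the Stream API documents, so treat it as an assumption. CommonPoolDemo and sumInDedicatedPool are illustrative names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

class CommonPoolDemo {
    // Assumption: a parallel stream started from inside a ForkJoinPool task
    // executes in that pool. This is implementation behavior, not a
    // documented guarantee of the Stream API.
    static int sumInDedicatedPool(int parallelism, int n) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            Callable<Integer> task = () -> IntStream.rangeClosed(1, n).parallel().sum();
            return pool.submit(task).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("processors:  " + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool: " + ForkJoinPool.getCommonPoolParallelism());
        System.out.println(sumInDedicatedPool(4, 100)); // 5050
    }
}
```

A dedicated pool keeps one slow pipeline from competing with every other parallel stream in the JVM, at the cost of extra threads.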
Blocking Operations in Parallel Streams
If a parallel stream performs blocking I/O, its worker threads sit blocked instead of doing useful work. Because the common pool is small, a few blocked calls can tie up every worker:
public static String query(String question) {
    List<String> engines = new ArrayList<>();
    engines.add("http://www.google.com/?q=");
    engines.add("http://duckduckgo.com/?q=");
    engines.add("http://www.bing.com/search?q=");
    Optional<String> result = engines.stream().parallel().map(base -> {
        String url = base + question;
        return WS.url(url).get(); // blocking HTTP call
    }).findAny();
    return result.get();
}
Such blocking can starve the common pool, making other parallel tasks unpredictable.
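One common way to keep blocking calls out of the common pool is to run them on a dedicated executor through CompletableFuture. The following is a sketch under stated assumptions: fetch is a hypothetical stand-in that simulates a blocking HTTP call with a short sleep, and BlockingCallsDemo and fetchAll are illustrative names, not part of the article's code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class BlockingCallsDemo {
    // Hypothetical stand-in for a blocking HTTP call; it only sleeps.
    static String fetch(String url) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "response from " + url;
    }

    static List<String> fetchAll(List<String> urls) {
        // Dedicated I/O pool: blocked threads here do not affect the common ForkJoinPool.
        ExecutorService io = Executors.newFixedThreadPool(Math.max(1, urls.size()));
        try {
            // First start every request, then wait; joining inside the first
            // pipeline would serialize the calls.
            List<CompletableFuture<String>> futures = urls.stream()
                    .map(u -> CompletableFuture.supplyAsync(() -> fetch(u), io))
                    .collect(Collectors.toList());
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            io.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(Arrays.asList("http://a.example", "http://b.example")));
    }
}
```

The executor is sized to the number of URLs only to keep the demo short; a real service would normally share one bounded I/O pool.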
Performance Considerations
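Data size (N): parallelism pays off only when the dataset is large enough to amortize the cost of splitting the work and merging the results.
Source structure: random-access structures (e.g., ArrayList, arrays) split efficiently; linked structures (e.g., LinkedList) split poorly.
Boxing overhead: primitive streams (IntStream, LongStream, DoubleStream) avoid boxing and are usually faster.
CPU cores: more cores provide more worker threads, and the achievable speed-up is bounded by the cores actually available.
Per-element work (Q): the more work done per element, the better the speed-up. In the N × Q model, N is the number of elements and Q is the cost per element; the larger the product, the more likely parallel execution helps.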
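The source-structure and boxing points can be illustrated by computing the same sum two ways. This is a rough sketch with illustrative names (PrimitiveStreamDemo, boxedSum, primitiveSum), not a benchmark: Stream.iterate produces boxed Long values from a source that splits poorly, while LongStream.rangeClosed is a primitive range that splits evenly. Timings vary by machine, so none are claimed here.

```java
import java.util.stream.LongStream;
import java.util.stream.Stream;

class PrimitiveStreamDemo {
    // Boxed values and a sequentially generated source: hard to split, extra allocation.
    static long boxedSum(long n) {
        return Stream.iterate(1L, i -> i + 1)
                .limit(n)
                .parallel()
                .reduce(0L, Long::sum);
    }

    // Primitive range: no boxing, and the range can be halved cheaply.
    static long primitiveSum(long n) {
        return LongStream.rangeClosed(1, n)
                .parallel()
                .sum();
    }

    public static void main(String[] args) {
        long n = 1_000_000;

        long t0 = System.nanoTime();
        long a = boxedSum(n);
        long t1 = System.nanoTime();
        long b = primitiveSum(n);
        long t2 = System.nanoTime();

        System.out.println("boxed:     " + a + " in " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("primitive: " + b + " in " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

A single timed run like this includes JIT warm-up, so a harness such as JMH is the better tool for real measurements.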
Encountered Order
A stream either has a defined encounter order (ORDERED) or it does not. sorted() imposes an encounter order, while unordered() drops the ordering constraint, which can improve parallel performance. Certain terminal operations (findFirst(), forEachOrdered()) are order-sensitive and may limit parallel gains.
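The difference between order-sensitive and order-insensitive terminal operations shows up on a parallel stream over a List. The sketch below uses illustrative names (EncounterOrderDemo and its methods); the output of the unordered traversal is deliberately not shown because it can differ from run to run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class EncounterOrderDemo {
    // forEachOrdered respects the encounter order even on a parallel stream.
    static List<Integer> orderedTraversal(List<Integer> src) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        src.parallelStream().forEachOrdered(out::add);
        return out;
    }

    // forEach makes no ordering promise on a parallel stream.
    static List<Integer> unorderedTraversal(List<Integer> src) {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        src.parallelStream().forEach(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> src = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(orderedTraversal(src));                 // [1, 2, 3, 4, 5, 6, 7, 8]
        System.out.println(unorderedTraversal(src));               // same elements, order may vary
        System.out.println(src.parallelStream().findFirst().get()); // 1
    }
}
```

When any element will do, findAny() and forEach() give the runtime more freedom than findFirst() and forEachOrdered().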
Conclusion
When dealing with recursive divide‑and‑conquer algorithms, consider ForkJoinPool; tune the split threshold; adjust the common pool size if needed; avoid side‑effects in lambdas; keep lambdas stateless; and be mindful of ordering constraints for optimal parallel stream performance.
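The advice on side-effects can be made concrete. In this minimal sketch (SideEffectDemo and squares are illustrative names), the unsafe pattern appears only as a comment, and the safe version lets the collector assemble the result instead of mutating shared state from the lambda.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SideEffectDemo {
    // Unsafe pattern, shown only as a comment: several worker threads mutate
    // a non-thread-safe list, which can lose elements or throw.
    //
    //   List<Integer> out = new ArrayList<>();
    //   IntStream.range(0, n).parallel().boxed().forEach(out::add);

    // Safe pattern: a stateless lambda plus a collector that merges
    // per-thread partial results in encounter order.
    static List<Integer> squares(int n) {
        return IntStream.range(0, n)
                .parallel()
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(squares(5)); // [0, 1, 4, 9, 16]
    }
}
```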
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn