Understanding the Java Stream Pipeline: Implementation Principles and Execution Process
This article explains the internal workings of Java's Stream API pipeline, covering how intermediate and terminal operations are recorded, combined, and executed via stages and Sink interfaces, illustrating lazy evaluation, stateful vs stateless operations, and the mechanisms behind parallel and sequential processing.
The article introduces the Java Stream API and asks how its powerful pipeline is implemented, including questions about iteration, automatic parallelism, and thread count.
It starts by reviewing how a container executes a lambda using ArrayList.forEach() and shows the source code of that method.
public void forEach(Consumer<? super E> action) {
    ...
    for (int i = 0; i < size && modCount == expectedModCount; i++) {
        action.accept(elementData[i]); // callback
    }
    ...
}

The article points out that ArrayList.forEach() simply loops over the backing array and invokes the supplied callback for each element, which is the same callback principle behind Stream's use of lambda expressions.
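The callback principle is easy to observe from user code. A minimal demo (our own illustration; the class name ForEachDemo is not from the article):

```java
import java.util.Arrays;
import java.util.List;

// Minimal demo (ours, not from the article): the container drives the
// iteration internally and hands each element to the lambda we supply.
public class ForEachDemo {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ann", "Bob", "Carl");
        // forEach() owns the loop; our lambda is the callback it invokes.
        names.forEach(name -> System.out.println("hello " + name));
    }
}
```

This inversion of control (the library loops, the user supplies behavior) is exactly what the Stream pipeline generalizes.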
It then explains that Stream operations are divided into intermediate (stateless or stateful) and terminal (short‑circuit or non‑short‑circuit) operations, providing a classification table:
Stream operation classification:

  Intermediate operations
    Stateless:  unordered(), filter(), map(), mapToInt(), mapToLong(), mapToDouble(),
                flatMap(), flatMapToInt(), flatMapToLong(), flatMapToDouble(), peek()
    Stateful:   distinct(), sorted(), limit(), skip()

  Terminal operations
    Non-short-circuit:  forEach(), forEachOrdered(), toArray(), reduce(), collect(),
                        max(), min(), count()
    Short-circuit:      anyMatch(), allMatch(), noneMatch(), findFirst(), findAny()
Intermediate operations are merely markers; only a terminal operation triggers actual computation. Stateless intermediates do not depend on previously seen elements, while stateful ones (e.g., sorted()) must see all elements before producing a result.
Two code examples illustrate lazy evaluation. The first uses peek(), limit(), and forEach() to show how each element passes through the pipeline only when the terminal operation runs:

IntStream.range(1, 10)
         .peek(x -> System.out.print("\nA" + x))
         .limit(3)
         .peek(x -> System.out.print("B" + x))
         .forEach(x -> System.out.print("C" + x));

The output (A1B1C1, A2B2C2, A3B3C3, one element per line) demonstrates that the pipeline processes elements one at a time: the terminal operation pulls each element through every earlier stage before the next element is drawn from the source.
A second example with skip() shows how the pipeline can “continue” past certain elements, again emphasizing the lazy nature of intermediate operations.
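The skip() example is not reproduced in full here; the following is a plausible reconstruction in the same style as the first example (the specific numbers are our assumption):

```java
import java.util.stream.IntStream;

// Reconstruction (our assumption, mirroring the first example): every
// element still enters the head of the pipeline, but skip(6) swallows
// the first six, so the B and C labels appear only for elements 7..9.
public class SkipDemo {
    public static void main(String[] args) {
        IntStream.range(1, 10)
                 .peek(x -> System.out.print("\nA" + x))
                 .skip(6)
                 .peek(x -> System.out.print("B" + x))
                 .forEach(x -> System.out.print("C" + x));
        // Prints A1 through A6 alone, then A7B7C7, A8B8C8, A9B9C9.
    }
}
```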
A Straightforward Implementation
The article first describes a naïve approach in which each intermediate operation executes immediately and stores its results in an intermediate list, which leads to multiple iterations over the data and high memory overhead. A hand-written loop, by contrast, does all the work in a single pass:

int longest = 0;
for (String str : strings) {
    if (str.startsWith("A")) {
        int len = str.length();
        longest = Math.max(len, longest);
    }
}

While efficient, this single-pass form requires knowing the user's full intent up front, which a library cannot. The Stream implementation must therefore record the operations as they are declared and combine them into a single pass at execution time.
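For contrast, the naïve eager strategy the article criticizes could look like the following sketch (our own illustration, not JDK code): each "operation" runs to completion and materializes a full intermediate list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Our own sketch of the naive strategy: every intermediate step runs
// eagerly, so the data is traversed once per operation and each step
// allocates a new intermediate container.
public class EagerPipelineDemo {
    public static void main(String[] args) {
        List<String> strings = Arrays.asList("Apple", "Banana", "Avocado");

        // "filter" pass: one full iteration, one intermediate list
        List<String> filtered = new ArrayList<>();
        for (String s : strings) {
            if (s.startsWith("A")) filtered.add(s);
        }

        // "map" pass: another full iteration, another intermediate list
        List<Integer> lengths = new ArrayList<>();
        for (String s : filtered) {
            lengths.add(s.length());
        }

        // "max" pass: a third iteration
        int longest = 0;
        for (int len : lengths) {
            longest = Math.max(len, longest);
        }
        System.out.println(longest);
    }
}
```

Three traversals and two throwaway lists for work the single loop above does in one pass; this is the overhead the pipeline design avoids.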
Stream Pipeline Solution
To achieve this, the library records each user operation as a Stage (a triple of data source, operation, and lambda) and links stages in a doubly‑linked list, forming the pipeline.
Each stage wraps its operation into a Sink object. The Sink interface defines four methods:

  void begin(long size): called before traversal begins, so the sink can prepare (e.g., allocate storage).
  void end(): signals that all elements have been sent.
  boolean cancellationRequested(): lets a downstream sink request short-circuiting.
  void accept(T t): processes a single element and forwards the result downstream.
Stages chain their sinks so that each accept() call forwards the processed element to the next stage, without the previous stage needing to know the downstream details.
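A stripped-down model of this chaining can make the forwarding concrete. This is our own simplified sketch, not the JDK's Sink (which also carries begin(), end(), and cancellationRequested()):

```java
import java.util.function.Function;
import java.util.function.Predicate;

// Simplified model (ours, not JDK code): each stage only holds a
// reference to the next sink; accept() does this stage's work and
// forwards the result downstream, mirroring opWrapSink() chaining.
public class SinkChainDemo {
    interface MiniSink<T> { void accept(T t); }

    static <T> MiniSink<T> filterSink(Predicate<T> p, MiniSink<T> down) {
        return t -> { if (p.test(t)) down.accept(t); };
    }

    static <T, R> MiniSink<T> mapSink(Function<T, R> f, MiniSink<R> down) {
        return t -> down.accept(f.apply(t));
    }

    public static void main(String[] args) {
        // Build the chain back to front, the way opWrapSink() does:
        MiniSink<Integer> terminal = x -> System.out.println("got " + x);
        MiniSink<String> mapped = mapSink(String::length, terminal);
        MiniSink<String> chain = filterSink(s -> s.startsWith("A"), mapped);

        for (String s : new String[] {"Apple", "Banana", "Avocado"}) {
            chain.accept(s); // one element flows through all stages
        }
        // Prints: got 5, then got 7
    }
}
```

Note that each element travels through the whole chain before the next is fed in, which is exactly the depth-first behavior the peek()/limit() example demonstrated.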
Example of map() implementation:

public final <R> Stream<R> map(Function<? super P_OUT, ? extends R> mapper) {
    return new StatelessOp<P_OUT, R>(this, StreamShape.REFERENCE,
                                     StreamOpFlag.NOT_SORTED | StreamOpFlag.NOT_DISTINCT) {
        @Override
        Sink<P_OUT> opWrapSink(int flags, Sink<R> downstream) {
            return new Sink.ChainedReference<P_OUT, R>(downstream) {
                @Override
                public void accept(P_OUT u) {
                    R r = mapper.apply(u);   // apply the user's lambda
                    downstream.accept(r);    // forward to the next stage
                }
            };
        }
    };
}

For a stateful operation like sorted(), the sink buffers elements in a list during accept(), sorts them in end(), and only then forwards the sorted elements downstream.
class RefSortingSink<T> extends AbstractRefSortingSink<T> {
    private ArrayList<T> list;

    @Override
    public void begin(long size) {
        list = new ArrayList<>((int) size);
    }

    @Override
    public void accept(T t) {
        list.add(t); // buffer every element until end()
    }

    @Override
    public void end() {
        list.sort(comparator); // sort once all elements have arrived
        downstream.begin(list.size());
        if (!cancellationWasRequested) {
            list.forEach(downstream::accept);
        } else {
            for (T t : list) {
                if (downstream.cancellationRequested()) break;
                downstream.accept(t);
            }
        }
        downstream.end();
        list = null;
    }
}

The pipeline is finally executed when a terminal operation creates the last sink and invokes copyInto(), which calls begin(), iterates over the source via a Spliterator, and then calls end():
final <P_IN> void copyInto(Sink<P_IN> wrappedSink, Spliterator<P_IN> spliterator) {
    if (!StreamOpFlag.SHORT_CIRCUIT.isKnown(getStreamAndOpFlags())) {
        wrappedSink.begin(spliterator.getExactSizeIfKnown());
        spliterator.forEachRemaining(wrappedSink);
        wrappedSink.end();
    }
    ...
}

Result handling varies by terminal operation: boolean or Optional results are stored directly in the sink; reduction operations accumulate results in user-provided containers; and operations returning arrays first collect elements into an internal Node tree before converting it to an array.
Conclusion
The article concludes that understanding the Stream pipeline’s internal design—stages, sinks, lazy evaluation, and result propagation—helps developers write correct and efficient Stream code while dispelling performance concerns.
JDK version used for the examples:
$ java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) Server VM (build 25.101-b13, mixed mode)

About the author: Wukong Talks Architecture explains distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independent developer of a PMP practice-quiz mini-program.