Understanding Java String.substring Memory Leak in JDK 6 and Its Fix in JDK 7
This article explains how the original JDK 6 implementation of String.substring could keep large character arrays reachable and lead to an OutOfMemoryError that looks like a memory leak. It describes the fields involved, reproduces the issue, and shows how JDK 7’s copying implementation resolves it.
Basic Introduction
The substring method in Java has two overloads: one that takes only a start index and another that takes both start and end indices. For example, "unhappy".substring(2) returns "happy", and "smiles".substring(1, 5) returns "mile".
Preparation
The issue appears only on Java 6, so run the examples on a 1.6 JDK. If a different version is your default, switch it first (e.g., select the 1.6 JDK on macOS or use alternatives --config java on Linux).
Problem Reproduction
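The official Java bug report provides a test case that, when run on Java 6, throws java.lang.OutOfMemoryError: Java heap space. The code creates many TestGC objects, each holding a large string, and keeps only a short substring of each. On Java 6 those substrings keep every large backing array reachable until the heap is exhausted.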
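A reproduction along these lines can be sketched as follows. This is not the verbatim test case from the bug report: the string size, the default iteration count, and the collectSubstrings helper are assumptions chosen for illustration, and the code sticks to Java 6 syntax so it compiles on the affected JDK.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a reproduction, kept to Java 6 syntax (no diamond operator, no lambdas).
class TestGC {
    // Assumed size: 100,000 chars, roughly 200 KB per backing array on JDK 6.
    private final String largeString = new String(new char[100000]);

    // On JDK 6 the returned 2-char string shares largeString's backing array.
    String getString() {
        return largeString.substring(0, 2);
    }

    // Keeps only the short substrings; each TestGC instance itself becomes garbage.
    static List<String> collectSubstrings(int iterations) {
        List<String> kept = new ArrayList<String>();
        for (int i = 0; i < iterations; i++) {
            kept.add(new TestGC().getString());
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed default; pass a larger count if the heap is big enough to survive it.
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 10000;
        List<String> kept = collectSubstrings(iterations);
        System.out.println("kept " + kept.size() + " substrings");
    }
}
```

On a JDK that copies on substring the loop should simply finish, since each large array becomes garbage as soon as its TestGC instance is dropped. On Java 6 the list of tiny strings pins every large array.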
Modifying the getString method so that it no longer retains the large backing array prevents the OOM, because largeString can then be released by the garbage collector.
Deep Dive into Java 6 Implementation
In JDK 6, String stores three fields:
value – the character array containing the actual characters
offset – the start index of the string within value
count – the length of the string
The substring implementation simply creates a new String that shares the original value array, adjusting offset and count without copying characters.
Consequently, if a large string (e.g., 1 GB) is substring‑ed to a tiny string and the original reference is cleared, the large character array remains reachable through the tiny string, preventing its memory from being reclaimed.
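The three fields and the sharing behavior can be modeled with a small stand-in class. SharedString and its retainedChars method are hypothetical names invented for this sketch; it imitates the layout described above and is not the actual java.lang.String source.

```java
// Hypothetical model of the JDK 6 layout; not the real java.lang.String source.
final class SharedString {
    final char[] value;  // backing array, possibly shared with other instances
    final int offset;    // index of the first character inside value
    final int count;     // number of characters belonging to this string

    SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    static SharedString of(String s) {
        char[] chars = s.toCharArray();
        return new SharedString(chars, 0, chars.length);
    }

    // JDK 6 style: reuse the same array, only offset and count change.
    SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedString(value, offset + beginIndex, endIndex - beginIndex);
    }

    // Number of chars this instance keeps reachable, regardless of its logical length.
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

In this model, SharedString.of("smiles").substring(1, 5) prints as "mile" but still reports six retained characters, because it holds the original array. Scale the original up to hundreds of megabytes and the same effect produces the retention described above.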
Shared Character Array
Sharing the backing array improves performance by avoiding unnecessary copying, but in the extreme case described it can cause memory‑pressure that looks like a leak.
How to Solve
To avoid retaining the large array, explicitly create a new string that copies only the needed characters, e.g.:
new String(original.substring(start, end))
In JDK 6, the String(String) constructor copies only the relevant portion when the source array is larger than the string length, so the result no longer references the large array.
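As a minimal sketch, the workaround can be wrapped in a helper. The class name SubstringCopy and the method compactSubstring are hypothetical; the only library calls involved are String.substring and the String(String) constructor.

```java
class SubstringCopy {
    // Hypothetical helper: on JDK 6 the extra constructor call yields a string
    // backed by its own right-sized array instead of the original's large one.
    static String compactSubstring(String original, int start, int end) {
        return new String(original.substring(start, end));
    }

    public static void main(String[] args) {
        String original = "smiles";
        String small = compactSubstring(original, 1, 5);
        System.out.println(small); // mile
    }
}
```

On JDKs where substring already copies, the wrapper is harmless but redundant, so it is only worth keeping in code that must still run on Java 6.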
Java 7 Implementation
JDK 7 removed the shared-array optimization (the change arrived in update 7u6). Every substring call, except when the result is the entire original string, creates a fresh character array containing only the substring’s characters, eliminating the hidden memory retention.
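The copying behavior can be sketched on plain character arrays. CopyingSubstring is a hypothetical name used only for this illustration; it mirrors the idea described above rather than reproducing the JDK source.

```java
import java.util.Arrays;

// Hypothetical model of copy-on-substring, operating on raw char arrays.
final class CopyingSubstring {
    // Returns the characters in [beginIndex, endIndex) as a new array,
    // or the source array itself when the range covers the whole string.
    static char[] substring(char[] value, int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        if (beginIndex == 0 && endIndex == value.length) {
            return value; // whole-string case: nothing to copy
        }
        return Arrays.copyOfRange(value, beginIndex, endIndex);
    }

    public static void main(String[] args) {
        char[] source = "smiles".toCharArray();
        char[] sub = substring(source, 1, 5);
        System.out.println(new String(sub)); // mile
        System.out.println(sub.length);      // 4
    }
}
```

Because the result holds only its own four characters, dropping the reference to the source array lets the collector reclaim it, at the cost of one copy per call.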
Is It Really a Memory Leak?
While the JDK 6 behavior can cause large amounts of memory to stay allocated, it is not a classic leak because the memory is still reachable and will be reclaimed once both the original and the substring are garbage‑collected.
Which Version Is Better?
Some developers prefer JDK 6’s approach for its speed, arguing that the issue can be mitigated with careful coding. Others favor JDK 7’s safer semantics despite a modest performance cost.
Value of the Issue
Understanding this implementation detail is valuable for designing efficient string handling and avoiding subtle memory problems, even though newer JDKs have fixed the issue.
Affected Methods
Methods such as trim and subSequence internally rely on substring, so changes to its implementation affect them as well.
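A short illustration of the two methods named above; the class name AffectedMethods is made up for the example. The results are ordinary substrings of their sources, so on JDK 6 they would share the source array in the same way a direct substring call does.

```java
class AffectedMethods {
    public static void main(String[] args) {
        // trim returns the part of the string between leading and trailing whitespace.
        String trimmed = "  smiles  ".trim();
        // subSequence(begin, end) is documented to behave like substring(begin, end).
        CharSequence sequence = "smiles".subSequence(1, 5);

        System.out.println(trimmed);  // smiles
        System.out.println(sequence); // mile
    }
}
```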
Reference Resources
The substring() Method in JDK 6 and JDK 7 – discusses the bug and related string concatenation concerns.
How SubString method works in Java – Memory Leak Fixed in JDK 1.7 – clarifies that the new string does not prevent the old one’s array from being collected.
JDK‑4513622 : (str) keeping a substring of a field prevents GC for object – notes a test issue regarding the string constant pool.
Note
When reproducing the bug, avoid using string literals like "ab" directly, because they are interned and shared; using new String("ab") forces a distinct object.
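The difference between an interned literal and a freshly constructed string can be checked with reference comparison. InternDemo and sameObject are hypothetical names for this sketch.

```java
class InternDemo {
    // Reference comparison on purpose: true only when both names point to one object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal1 = "ab";
        String literal2 = "ab";
        String fresh = new String("ab");

        System.out.println(sameObject(literal1, literal2)); // true: literals are interned
        System.out.println(sameObject(literal1, fresh));    // false: new String makes a distinct object
        System.out.println(literal1.equals(fresh));         // true: same characters
    }
}
```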
Strong Recommendation
For readers interested in deeper Java insights, consider reading Joshua Bloch’s “Java Puzzlers”.