
Resolving a NullPointerException Caused by In‑Place Modification of Apollo Configuration

The article recounts a production incident in which a new backend feature triggered a NullPointerException by directly mutating an Apollo configuration object. It describes the debugging steps taken (log analysis, environment replication, a service restart, and code review) and explains the final fix: returning a copy of the configuration to prevent shared-state corruption.

转转QA

Happy Launch

Recently, a new feature was added to the management backend that exposes several operational configuration items for controlling front-end switches. The change touched many places but was straightforward; after testing, it was deployed and initially appeared to work without issues.

Lost Lunch Break

Problem Appears

Shortly after lunch, an online alarm fired. The production logs showed a NullPointerException at com.xxxx.xxxx.test.utils.CommonConfigUtil.getOriginAppIdsConf(CommonConfigUtil.java:842), in code that belonged to the newly added feature.

The failing line calls getAgencyByAppId, which reads an Apollo-backed object named agencyItemConfMap.
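A plain map read cannot throw on its own: `Map.get` returns null for an absent key, and the exception only appears when the caller dereferences that null. The sketch below illustrates this mechanism. It is hypothetical: the real CommonConfigUtil was not published, so the `String` value type, the map contents, and the helper `agencyNameLength` (standing in for getOriginAppIdsConf) are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the real CommonConfigUtil is not shown in the article,
// so the field type and method bodies below are assumptions.
class CommonConfigUtilSketch {
    // Stand-in for the Apollo-backed agencyItemConfMap (appId -> agency name).
    static final Map<String, String> agencyItemConfMap = new HashMap<>();

    // Map.get returns null for an absent key; it does not throw.
    static String getAgencyByAppId(String appId) {
        return agencyItemConfMap.get(appId);
    }

    // Plays the role of getOriginAppIdsConf: it dereferences the lookup result
    // without a null check, so a missing appId becomes a NullPointerException.
    static int agencyNameLength(String appId) {
        return getAgencyByAppId(appId).length();
    }

    public static void main(String[] args) {
        agencyItemConfMap.put("app-1", "agency-A");
        System.out.println(agencyNameLength("app-1")); // prints 8
        try {
            agencyNameLength("app-2"); // not present in the map
        } catch (NullPointerException e) {
            System.out.println("NullPointerException for app-2");
        }
    }
}
```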

Although the faulty code had been located, it was puzzling that a simple Apollo read could produce a null pointer, especially since the configuration had already been published to production.

Attempted Solutions

First, we debugged with the same configuration in the sandbox environment, but the issue could not be reproduced. Since the error stemmed from reading Apollo configuration, we suspected that the Apollo publish had not taken effect, leaving agencyItemConfMap empty.

We manually republished the configuration and monitored the alerts, but they persisted.

Next, we suspected a service-side problem: perhaps the service had not picked up the configuration after deployment. We restarted the server, and the problem disappeared, which gave us some confidence.

However, the error resurfaced after about 30 seconds, so the issue was intermittent. We then suspected that it was limited to a specific machine, but further log inspection showed that all machines produced the same error.

Deep Analysis

Since neither the server nor Apollo appeared to be at fault, the bug had to be in the code. Reviewing the implementation revealed that the Apollo object was being modified directly (with remove) after it was read, which corrupted the shared configuration data.

The problematic snippet (shown in the attached images) removed expired or invalid appIds from the original Apollo map itself, so subsequent reads of those entries ran into a null reference.
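Because the original snippet was only available as an image, the following is a hypothetical reconstruction of the anti-pattern, not the article's code. It assumes the Apollo data is a `Map<String, String>` held in one long-lived field; the names `sharedConf` and `validAgencies` are illustrative. One feature filters "its" data, and a second reader immediately loses the removed entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical reconstruction of the anti-pattern; the article's snippet was an
// image, so the names and value types here are illustrative assumptions.
class InPlaceRemovalDemo {
    // Long-lived map shared by every feature, standing in for the Apollo object.
    static final Map<String, String> sharedConf = new HashMap<>();

    // BUG: drops expired/invalid appIds from the shared map itself, then returns
    // that same object. Every later reader loses those entries as well.
    static Map<String, String> validAgencies(Set<String> invalidAppIds) {
        sharedConf.keySet().removeIf(invalidAppIds::contains);
        return sharedConf;
    }

    public static void main(String[] args) {
        sharedConf.put("app-1", "agency-A");
        sharedConf.put("app-2", "agency-B");
        validAgencies(Set.of("app-2"));               // feature A filters its view
        System.out.println(sharedConf.get("app-2"));  // feature B reads: prints null
    }
}
```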

The method getAgencyItemConfList showed the same problem: after performing the removals, it returned the original Apollo object rather than a copy.

Two approaches were contrasted:

Problematic: All features operate on the same Apollo object, so any mutation affects every subsequent read.

Correct: Each feature works on a fresh copy of the Apollo data; mutations are applied to the copy, leaving the original untouched.
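The two approaches can be contrasted in a few lines. This is an illustrative sketch, assuming the shared Apollo data is a simple list of appIds; `SharedVsCopy`, `sharedView`, and `freshCopy` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative contrast of the two approaches; the list contents and method
// names are hypothetical, not taken from the article's code.
class SharedVsCopy {
    // Stand-in for data loaded from Apollo and shared by all features.
    static final List<String> apolloAppIds = new ArrayList<>(List.of("app-1", "app-2", "app-3"));

    // Problematic: hands out the shared object, so a caller's mutation leaks.
    static List<String> sharedView() {
        return apolloAppIds;
    }

    // Correct: each caller receives its own copy; mutations stay local.
    static List<String> freshCopy() {
        return new ArrayList<>(apolloAppIds);
    }

    public static void main(String[] args) {
        List<String> mine = freshCopy();
        mine.remove("app-2");                 // local change only
        System.out.println(apolloAppIds);     // prints [app-1, app-2, app-3]
        sharedView().remove("app-2");         // mutates the shared list
        System.out.println(apolloAppIds);     // prints [app-1, app-3]
    }
}
```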

Backtracking

Git history showed that the earlier code retrieved all app configurations by iterating over the Apollo object and collecting them into a new object, which avoided the issue.
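The older pattern can be sketched as follows, assuming a `Map<String, String>` shape for the configuration (the real types were not shown). Each entry is copied into a freshly allocated map, so whatever the caller does to the result never reaches the shared object.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the older pattern described in the Git history: iterate over the
// shared config and collect entries into a new map. Types are assumptions.
class IterateAndBuild {
    static Map<String, String> allAppConfigs(Map<String, String> apolloConf) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> entry : apolloConf.entrySet()) {
            result.put(entry.getKey(), entry.getValue()); // copy each entry
        }
        return result; // a new map: callers may add or remove freely
    }

    public static void main(String[] args) {
        Map<String, String> apolloConf = new HashMap<>(Map.of("app-1", "agency-A"));
        Map<String, String> mine = allAppConfigs(apolloConf);
        mine.remove("app-1");
        System.out.println(apolloConf.containsKey("app-1")); // prints true
    }
}
```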

Problem Fix

The fix was to copy the Apollo object that had been read into a new instance and return that copy, so that subsequent operations never modify the original data. The corrected code is shown below.
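A minimal sketch of the copy-then-filter idea, assuming the Apollo-backed value is a list of config items. `AgencyItemConf`, its `valid` flag, and the field `apolloAgencyItemConfList` are hypothetical; only the method name getAgencyItemConfList comes from the article.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the fix, assuming the Apollo-backed value is a list of
// config items. AgencyItemConf and its fields are hypothetical.
class AgencyConfigFix {
    static final class AgencyItemConf {
        final String appId;
        final boolean valid;

        AgencyItemConf(String appId, boolean valid) {
            this.appId = appId;
            this.valid = valid;
        }
    }

    // Stand-in for the object the Apollo client keeps up to date.
    static final List<AgencyItemConf> apolloAgencyItemConfList = new ArrayList<>();

    // Fixed version: copy first, then filter the copy. The shared list is never
    // structurally modified, so other readers still see every item.
    static List<AgencyItemConf> getAgencyItemConfList() {
        List<AgencyItemConf> copy = new ArrayList<>(apolloAgencyItemConfList);
        copy.removeIf(conf -> !conf.valid);
        return copy;
    }

    public static void main(String[] args) {
        apolloAgencyItemConfList.add(new AgencyItemConf("app-1", true));
        apolloAgencyItemConfList.add(new AgencyItemConf("app-2", false));
        System.out.println(getAgencyItemConfList().size());  // prints 1
        System.out.println(apolloAgencyItemConfList.size()); // prints 2
    }
}
```

Note that `new ArrayList<>(...)` is a shallow copy: removing items from the copy is safe, but mutating a field of an item would still change the shared data. If the items are mutable, they need to be copied too, or made immutable.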

After the fix was deployed, we monitored the system for a while and saw no further errors, and the team could finally relax.

Painful Reflection

Why did the issue not appear in test or sandbox environments? Two main reasons:

The remove scenario only triggers under specific conditions that were not exercised during testing.

Only a single user worked in the test and sandbox environments, so no concurrent reads happened after a removal.
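Both gaps can be covered by a check that first runs the removal path and then performs a second, independent read of the shared configuration, as another user's request would. The sketch below is hypothetical: it tests a copy-based `filtered` method against a plain `Map` standing in for the Apollo data, and returns false if the shared data was corrupted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical regression check for the scenario testing missed: run the code
// path that drops invalid appIds, then read the shared config a second time.
class SecondReadRegressionCheck {
    static final Map<String, String> sharedConf = new HashMap<>();

    // Code under test (fixed variant): filters a copy, never the shared map.
    static Map<String, String> filtered(Set<String> invalidAppIds) {
        Map<String, String> copy = new HashMap<>(sharedConf);
        copy.keySet().removeAll(invalidAppIds);
        return copy;
    }

    // Returns true if the shared config survives the removal path intact.
    static boolean sharedConfigSurvivesFiltering() {
        sharedConf.clear();
        sharedConf.put("app-1", "agency-A");
        sharedConf.put("expired-app", "agency-X");
        filtered(Set.of("expired-app"));              // request 1: triggers removal
        return sharedConf.get("expired-app") != null; // request 2: reads shared data
    }

    public static void main(String[] args) {
        System.out.println(sharedConfigSurvivesFiltering() ? "PASS" : "FAIL");
    }
}
```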

To prevent similar problems:

Developers should always return a new object when they need to modify configuration data.

Testers with code-review skills should review changes thoroughly; teams without this skill should increase test coverage of edge cases.

Consider adding custom Sonar rules to detect risky in‑place modifications during static analysis.
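One further safeguard, not from the article, complements these measures: hand readers an immutable snapshot, so an accidental remove fails immediately with UnsupportedOperationException instead of silently corrupting shared state. This is a sketch under assumptions: `ReadOnlyConfigHolder` and its `refresh` hook (which a config-change listener would call) are hypothetical, and `Map.copyOf` requires Java 10 or later.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder: readers share an immutable snapshot, which is replaced
// wholesale on refresh and never mutated in place.
class ReadOnlyConfigHolder {
    // volatile so readers on other threads see the newest snapshot reference.
    private static volatile Map<String, String> snapshot = Map.of();

    // Hypothetical hook a config-change listener would call with new values.
    static void refresh(Map<String, String> latest) {
        snapshot = Map.copyOf(latest); // immutable; rejects null keys/values
    }

    // remove/put on the returned map throw UnsupportedOperationException.
    static Map<String, String> current() {
        return snapshot;
    }

    // Callers that need to filter must take an explicit mutable copy.
    static Map<String, String> mutableCopy() {
        return new HashMap<>(snapshot);
    }

    public static void main(String[] args) {
        refresh(Map.of("app-1", "agency-A"));
        try {
            current().remove("app-1");
        } catch (UnsupportedOperationException e) {
            System.out.println("in-place removal rejected");
        }
        Map<String, String> mine = mutableCopy();
        mine.remove("app-1");
        System.out.println(current().containsKey("app-1")); // prints true
    }
}
```

Swapping the whole reference on refresh, rather than clearing and refilling one map, also means a reader never observes a half-updated configuration.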

About the author: Cheng Jie, Financial Testing Group, Shenzhen Business, primarily responsible for financial business testing and building the financial mock platform.