MongoDB Index Optimization for High‑Write CMDB Workloads and Slow‑Log Reduction
This article analyzes the performance bottlenecks of a MongoDB‑based CMDB under high write concurrency, explains how indexes are structured and stored, demonstrates how to identify and eliminate full‑collection scans and redundant multikey indexes, and shows the resulting reduction in slow‑log volume and CPU usage.
Background
The CMDB stores machine data in MongoDB 4.0 and receives roughly 1.5k write QPS. Because each machine update may fan out into many sub‑resource updates, the database stays under high load, generating up to 140,000 slow‑log entries per hour and causing queue back‑pressure. This article focuses on database‑side optimizations: throttling write speed and improving write performance.
MongoDB Index Overview
Only the index types relevant to the optimization are introduced: Single Field Index, Compound Index, Multikey Index, Geospatial Index, Text Index, and Hashed Index. Index attributes such as Unique, Partial, Sparse, and TTL are also mentioned. The CMDB collection galaxyx mainly uses single‑field (unique), compound, and multikey indexes.
Index Storage
MongoDB indexes are stored in a B‑Tree structure. Each node contains the indexed key value and the document’s storage location (a DiskLoc under MMAPv1, a RecordId under WiredTiger). Unlike MySQL’s B+Tree, where data is reachable only through leaf nodes, a B‑Tree can hold data pointers in any node, so a lookup may need only a single disk I/O.
Example collection users with sample documents:
1 {"name": "a", "age": 30, "childrens": [{"name": "a_a", "age": 3}, {"name": "a_b", "age": 1}]}
2 {"name": "b", "age": 30, "childrens": [{"name": "b_a", "age": 2}]}
3 {"name": "c", "age": 32, "childrens": [{"name": "c_a", "age": 4}, {"name": "c_b", "age": 1}]}
4 {"name": "e", "age": 33, "childrens": [{"name": "e_a", "age": 5}, {"name": "e_b", "age": 2}]}
5 {"name": "f", "age": 32, "childrens": [{"name": "f_a", "age": 4}, {"name": "f_b", "age": 1}]}
6 {"name": "d", "age": 40, "childrens": [{"name": "d_a", "age": 10}]}
7 {"name": "g", "age": 42, "childrens": [{"name": "g_a", "age": 12}, {"name": "g_b", "age": 8}]}
After persistence, each document has a storage position, similar to a locker number in a supermarket.
Creating a single‑field index on age :
db.users.createIndex({'age': 1})
The resulting index entries map each age value to a document position:
age position document
30 position3 {"name":"a",...}
30 position5 {"name":"b",...}
32 position1 {"name":"c",...}
32 position2 {"name":"f",...}
33 position4 {"name":"e",...}
40 position6 {"name":"d",...}
42 position7 {"name":"g",...}
Creating a compound index on name and age yields entries such as:
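As a sketch of how such an index answers an equality query, the following Python models the index as a list of (key, position) pairs kept sorted by key (illustrative only, not MongoDB internals; the sample values follow the users collection above):

```python
# Illustrative sketch: an index as a sorted list of (key, position) pairs.
# "position" plays the role of the document's storage location, i.e. the
# supermarket-locker number from the analogy above.
import bisect

# (age, position) entries, kept sorted by age.
index_entries = [
    (30, "position3"), (30, "position5"),
    (32, "position1"), (32, "position2"),
    (33, "position4"), (40, "position6"), (42, "position7"),
]

def lookup(age):
    """Return the storage positions of all documents with the given age."""
    keys = [k for k, _ in index_entries]
    lo = bisect.bisect_left(keys, age)   # binary-search to the first match
    hi = bisect.bisect_right(keys, age)  # ...and one past the last match
    return [pos for _, pos in index_entries[lo:hi]]

print(lookup(32))  # -> ['position1', 'position2']
```

Instead of scanning every document, the lookup jumps straight to the matching keys and returns only the storage positions to fetch; this is the work an IXSCAN plan does that a COLLSCAN plan cannot avoid.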
name,age position
a,30 position3
b,30 position5
c,32 position1
d,40 position6
e,33 position4
f,32 position2
g,42 position7
Creating a compound index on age and the array field childrens.name turns it into a multikey index, producing one key‑to‑position entry per array element. When an array contains many elements, the index grows proportionally (N index entries per document, i.e., N:1), which can degrade update performance because each update may need to rewrite many index entries.
Index Type                  Key‑to‑Document Ratio
Single‑field or Compound    1:1
Multikey                    N:1
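The N:1 fan‑out can be sketched in a few lines of Python (illustrative only; the document shape follows the users collection above, and the key format is a simplification of real multikey index entries):

```python
# Illustrative sketch: a multikey index on {'age': 1, 'childrens.name': 1}
# fans one document out into one index entry per array element -- the N:1
# ratio in the table above.
def multikey_entries(doc, position):
    """Generate ((age, child_name), position) index entries for one document."""
    return [((doc["age"], child["name"]), position)
            for child in doc["childrens"]]

doc = {"name": "a", "age": 30,
       "childrens": [{"name": "a_a", "age": 3}, {"name": "a_b", "age": 1}]}

entries = multikey_entries(doc, "position3")
print(len(entries))  # -> 2 : two index entries for a single document
```

An update that touches the array must maintain every one of those entries, which is why large arrays under many multikey indexes make writes expensive.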
Slow‑Log Pain Points
Under high write concurrency, a single hour can generate more than 140,000 slow‑log entries, severely slowing data consumption. Profiling is enabled with db.setProfilingLevel(1), which records operations that take longer than 100 ms (the default slowms threshold).
Typical slow‑log fields include the operation type, command details, execution plan (e.g., IXSCAN vs. COLLSCAN), lock information, and execution time (e.g., 13148ms).
Investigation Steps
Start with the longest‑running logs.
After fixing issues, run the service again to see if the problem persists.
Ignore system‑related slow logs and focus on business‑related ones.
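The triage steps above can be sketched as a small filter over parsed profiler entries (field names such as ns, millis, and planSummary are modeled on mongod’s profiler output; the sample data below is hypothetical):

```python
# Illustrative sketch of slow-log triage: drop system namespaces, keep only
# entries above a duration threshold, and look at the worst offenders first.
def triage(entries, threshold_ms=10_000):
    """Filter business slow-log entries over threshold_ms, longest first."""
    system_prefixes = ("admin.", "config.", "local.")
    business = [e for e in entries
                if not e["ns"].startswith(system_prefixes)
                and e["millis"] > threshold_ms]
    return sorted(business, key=lambda e: e["millis"], reverse=True)

logs = [
    {"ns": "cmdb.cr_resource", "millis": 13148, "planSummary": "COLLSCAN"},
    {"ns": "admin.$cmd",       "millis": 15000, "planSummary": "IXSCAN"},
    {"ns": "cmdb.galaxyx",     "millis": 800,   "planSummary": "IXSCAN"},
]
for entry in triage(logs):
    print(entry["ns"], entry["millis"], entry["planSummary"])
# Only cmdb.cr_resource survives: admin.* is system, galaxyx is under 10 s.
```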
Longest‑Running Logs
Filtering logs with execution time > 10 s reveals many full‑collection scans (planSummary: COLLSCAN). Adding the appropriate indexes with background builds (to avoid blocking the collection) eliminates these scans.
After removing full scans, the number of > 10 s logs drops, and CPU usage decreases slightly.
Further filtering with a 1 s threshold shows three main categories of remaining slow logs:
Transaction‑related logs.
System session logs.
Logs related to updating machine resources.
The third category is the most actionable. An example query updates the docker0 network interface and is expected to use the compound index {'uuid': 1, 'interface.name': 1}, but the planner actually chooses the unique {'uuid': 1} index, forcing extra fetch‑and‑filter work.
Because the unique index is smaller (1:1 key‑to‑document ratio) than the compound multikey index (which can be N:1), MongoDB prefers it when the query filters on a unique field.
When the unique index is removed, the same query falls back to the compound index, confirming that as long as the unique uuid index exists, the compound indexes prefixed by uuid never win and are effectively redundant.
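Which index actually won can be read off the explain() output. The helper below walks a trimmed, hypothetical queryPlanner document (real explain() results contain many more fields) to find the IXSCAN stage:

```python
# Illustrative sketch: extract the winning index name from an explain() result.
def winning_index(explain_doc):
    """Return the indexName of the IXSCAN in the winning plan, or None."""
    stage = explain_doc["queryPlanner"]["winningPlan"]
    while stage:
        if stage.get("stage") == "IXSCAN":
            return stage["indexName"]
        stage = stage.get("inputStage")  # descend into nested stages
    return None  # COLLSCAN or another non-index plan

# Trimmed, hypothetical explain() output for the update discussed above.
explain_doc = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "uuid_1"},
        }
    }
}
print(winning_index(explain_doc))  # -> uuid_1 : the unique index won
```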
In the cr_resource collection, 28 indexes existed, 17 of which were redundant compound multikey indexes. Removing these indexes reduced CPU usage from a peak of 70 % to about 10 % and dramatically lowered lock contention.
FAQ
Why doesn’t the query hit the compound index?
MongoDB’s planner prefers the cheapest plan that satisfies the query predicates. If a unique index on uuid already pins the result to a single document, it beats a larger compound index that includes additional fields.
Why do update operations cause high load?
Updates that modify indexed fields trigger index maintenance. With many multikey indexes, a single update may rewrite a large number of index entries, increasing CPU and lock usage. WiredTiger limits concurrent read and write operations to 128 tickets each, so high write concurrency can exhaust the tickets and queue operations, especially when many transactions update the same collection.
Conclusion
The optimization process highlighted two key actions:
Identify and index queries that cause full‑collection scans.
Remove redundant indexes, especially multikey ones.
General recommendations:
Understand business query patterns and create indexes selectively; excessive indexes degrade performance.
If lookups always filter on a unique field, additional compound indexes prefixed by that field are usually redundant.
Follow the left‑most prefix rule when building compound indexes (e.g., {name:1, age:1, tel:1} covers queries on {name}, {name, age}, and {name, age, tel}).
Be cautious with multikey indexes; large arrays can inflate index size and update cost.
When an operation is slow, first check whether it is index‑related.
Use explain() to inspect execution plans.
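The left‑most prefix rule from the recommendations above can be captured in a small predicate (a deliberate simplification: it ignores sort orders, range predicates, and index intersection):

```python
# Illustrative sketch of the left-most prefix rule: a query's equality fields
# can use a compound index only if they cover a left-most prefix of the
# index key pattern. Field order within the query itself does not matter.
def can_use_index(index_fields, query_fields):
    """True if query_fields cover a left-most prefix of index_fields."""
    prefix = index_fields[:len(query_fields)]
    return set(prefix) == set(query_fields)

index = ["name", "age", "tel"]                    # {name:1, age:1, tel:1}
print(can_use_index(index, ["name"]))             # True
print(can_use_index(index, ["age", "name"]))      # True: covers the prefix
print(can_use_index(index, ["age"]))              # False: skips "name"
```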
References
WiredTiger Storage Engine – https://docs.mongodb.com/manual/core/wiredtiger/
Write Operation Performance – https://docs.mongodb.com/manual/core/write-performance/#document-growth-and-the-mmapv1-storage-engine
Effective MongoDB Indexing – https://dzone.com/articles/effective-mongodb-indexing-part-1
MongoDB serverStatus.globalLock Deep Dive – https://yq.aliyun.com/articles/201983
MongoDB Indexing Best Practices – https://www.compose.com/articles/mongodb-indexing-best-practices/
NetEase Game Operations Platform
The NetEase Game Automated Operations Platform delivers stable services for thousands of NetEase titles, focusing on efficient ops workflows, intelligent monitoring, and virtualization.