
Analyzing and Optimizing InfluxDB Performance: Indexing, Memory Usage, and Configuration

This article examines InfluxDB's in‑memory and TSI indexing, investigates queue buildup caused by asynchronous batch writes and high I/O, analyzes excessive SHR memory usage, and proposes configuration, retention policy, NUMA, and query optimizations to improve overall database performance.

360 Smart Cloud
InfluxDB currently supports the in-memory inmem index and the file-based tsi1 index. After the database was switched to tsi1, monitoring revealed occasional queue buildup during periods when the writer employed an asynchronous batch-write strategy, leading to high disk I/O and elevated iowait (wa) values.
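On InfluxDB 1.x, the switch from inmem to tsi1 is typically performed offline by rebuilding the index files with influx_inspect; a hedged sketch (the data and WAL paths are common package defaults, not taken from the article):

```shell
# Stop influxd first, then rebuild TSI index files from the existing TSM data.
# -datadir / -waldir are typical package-install defaults; adjust to your layout.
influx_inspect buildtsi -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal
# Afterwards set index-version = "tsi1" under [data] in influxdb.conf and restart.
```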

System monitoring with top showed the influxd process consuming 53 GB of resident memory (RES) and 25 GB of shared memory (SHR). The article notes that SHR counts memory that may be shared with other processes, chiefly file-backed mappings, and investigates why InfluxDB exhibits such high SHR usage by analyzing the /proc/9571/smaps output.
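The per-mapping figures in smaps can be rolled up with a short awk one-liner; a minimal sketch (PID 9571 is the influxd PID from the article; summing Shared_Clean and Shared_Dirty gives an approximation of what top reports as SHR):

```shell
# Approximate a process's shared memory by summing the Shared_Clean and
# Shared_Dirty fields across all of its mappings (smaps values are in kB).
pid=${1:-self}   # pass the influxd PID (9571 in the article); defaults to this shell
awk '/^Shared_(Clean|Dirty):/ {sum += $2}
     END {printf "%d kB shared\n", sum}' "/proc/$pid/smaps"
```

Grepping the same file for the largest individual mappings is what surfaced the TSM and series files discussed next.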

Analysis of the smaps data identified large TSM data files (≈18 GB) and series files (≈7 GB) contributing to the memory footprint. Adjusting the retention policy to a 3‑day duration with 2‑hour shards reduced the SHR peak by 10‑15 GB, though it increased the frequency of internal compactions.
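The retention change described above would look roughly like this in InfluxQL (the database and policy names are placeholders, not from the article):

```sql
-- 3-day retention with 2-hour shard groups on a hypothetical database "metrics"
ALTER RETENTION POLICY "autogen" ON "metrics"
  DURATION 3d SHARD DURATION 2h DEFAULT
```

Shorter shards mean each shard's files are smaller and expire sooner, which lowers the mapped-file footprint at the cost of more frequent compactions.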

The article also discusses hardware architecture, contrasting SMP/UMA with NUMA, and presents lscpu and numactl outputs that reveal two NUMA nodes with significant numa_miss and other_node values. Because Go does not yet provide a mature NUMA‑aware scheduler, the author recommends launching InfluxDB with interleaved memory allocation:

numactl --interleave=all /usr/bin/influxd -config /usr/bin/influxdb.conf
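Whether interleaving actually helps can be verified by watching the per-node allocation counters, which expose the numa_miss and other_node values mentioned above:

```shell
# Per-node allocation counters; rising numa_miss / other_node on one node
# indicates remote-memory allocations that --interleave=all should spread out.
numastat
# The same counters are also exposed per node in /sys:
cat /sys/devices/system/node/node0/numastat
```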

For read‑side optimization, the author advises defining appropriate tags and fields to limit series proliferation and restricting query time ranges to avoid loading unnecessary shards. An example shows how adding a recent‑2‑hour filter dramatically reduces memory and CPU consumption.
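A time-bounded query in the spirit of that example might look as follows (the measurement and field names are placeholders, not from the article):

```sql
-- Unbounded: InfluxDB must open every shard covered by the retention policy
SELECT mean("value") FROM "cpu_usage" GROUP BY time(1m)

-- Bounded to the last 2 hours: only the most recent shard(s) are touched
SELECT mean("value") FROM "cpu_usage"
  WHERE time > now() - 2h
  GROUP BY time(1m)
```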

Key configuration changes include setting wal-fsync-delay to 50 ms, using index-version = "tsi1", and adjusting cache sizes, snapshot intervals, and compaction parameters (e.g., compact-full-write-cold-duration = "80h", max-concurrent-compactions = 8, compact-throughput = "16m").
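Collected into the [data] section of influxdb.conf, the values stated in the article would look roughly like this (only the explicitly reported settings are shown; the cache and snapshot keys were also tuned but their exact values are not given):

```toml
[data]
  index-version = "tsi1"                    # file-based TSI index
  wal-fsync-delay = "50ms"                  # batch WAL fsyncs to smooth write I/O
  compact-full-write-cold-duration = "80h"  # delay full compaction of cold shards
  max-concurrent-compactions = 8            # cap parallel compaction workers
  compact-throughput = "16m"                # throttle compaction write rate
  # cache-max-memory-size and cache-snapshot-* were adjusted as well;
  # exact values are not reported in the article.
```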

In summary, the article consolidates the observed InfluxDB issues and proposes retention-policy tuning, configuration adjustments, NUMA-aware startup, and query refinements as ongoing steps to mitigate memory pressure and I/O bottlenecks and to improve overall stability.

Tags: Performance Optimization · Database · Memory · InfluxDB · TSI
Written by

360 Smart Cloud

Official service account of 360 Smart Cloud, dedicated to building a high-quality, secure, highly available, convenient, and stable one‑stop cloud service platform.
