
Optimizing a High‑Concurrency Data Processing Interface: Reducing Response Time from 30 s to 0.8 s

This article presents a real‑world case study of a high‑concurrency transaction‑handling API that originally required 30 seconds to process over two million records, and details the step‑by‑step SQL, Java, and caching optimizations that brought the response time down to under one second.

Code Ape Tech Column

Problem Diagnosis

The interface initially exhibited a latency of about 30 seconds for a query that scanned more than 2 million rows. After ruling out network and hardware issues, profiling revealed that the bottleneck lay in the SQL execution.

The most time‑consuming part of the code was a MyBatis call:

List<Map<String, Object>> list = transhandleFlowMapper.selectDataTransHandleFlowAdd(selectSql);

The corresponding SQL fetched the programhandleidlist column from the anti_transhandle table for a one‑day window:

SELECT programhandleidlist FROM anti_transhandle WHERE create_time BETWEEN '2024-01-08 00:00:00.0' AND '2024-01-09 00:00:00.0';

Running the SQL alone took only about 800 ms, indicating that the heavy cost was introduced in the Java layer where each row was materialized into a Map object.

SQL‑Level Analysis

Using EXPLAIN ANALYZE confirmed that the query plan was not the issue; the delay stemmed from the massive object creation in the application.

Code‑Level Analysis

For each of the 2 million+ rows, a Map was created, which dramatically slowed down processing.
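To see why this hurts, here is a minimal sketch (class and method names such as `RowShapes` are ours, purely for illustration) contrasting the two result shapes. Allocating one `HashMap` per row means millions of short-lived objects, hashing work, and boxed values, whereas binding the single selected column straight to a `String` avoids all of that:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RowShapes {
    // Simulates MyBatis' List<Map<String, Object>> result shape:
    // one HashMap allocated per row, even when only one column is needed.
    public static List<Map<String, Object>> asMaps(int rows) {
        List<Map<String, Object>> out = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            Map<String, Object> row = new HashMap<>(4);
            row.put("programhandleidlist", "id-" + i);
            out.add(row);
        }
        return out;
    }

    // The leaner shape: map the single selected column directly to a String.
    public static List<String> asStrings(int rows) {
        List<String> out = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            out.add("id-" + i);
        }
        return out;
    }
}
```

At two million rows the first shape creates two million map objects plus their internal entry arrays; the second creates only the strings themselves.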

Optimization Measures

1. SQL Optimization

The idea was to collapse the 2 million rows into a single result using PostgreSQL array functions. The following SQL aggregates the array elements:

SELECT array_agg(elem) AS concatenated_array
FROM (
    SELECT unnest(programhandleidlist) AS elem
    FROM anti_transhandle
    WHERE create_time BETWEEN '2024-01-08 00:00:00.0' AND '2024-01-09 00:00:00.0'
) sub;

This returns one array instead of millions of rows, reducing the database time to ~0.8 s. Further aggregation with string_agg can produce a comma‑separated string if needed.
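For intuition, the flattening that `unnest` + `array_agg` performs can be mirrored with Java streams. This is only an illustrative sketch (names like `ArrayAggSketch` are hypothetical, not the article's code), where each inner list stands in for one row's programhandleidlist array:

```java
import java.util.List;
import java.util.stream.Collectors;

public class ArrayAggSketch {
    // Mirrors: SELECT array_agg(elem) FROM (SELECT unnest(programhandleidlist) ...)
    public static List<String> concatenate(List<List<String>> rows) {
        return rows.stream()
                .flatMap(List::stream)          // unnest: one element per array entry
                .collect(Collectors.toList());  // array_agg: collapse into one list
    }
}
```

The point of the SQL version is that this flattening happens inside the database, so the JDBC layer transfers and materializes a single row.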

2. Push Business Logic into SQL

To count the occurrences of each ID in programhandleidlist, the query was rewritten to unnest the array and group by the element:

SELECT elem, COUNT(*) AS count
FROM (
    SELECT unnest(programhandleidlist) AS elem
    FROM anti_transhandle
    WHERE create_time BETWEEN '2024-01-08 00:00:00.0' AND '2024-01-09 00:00:00.0'
) sub
GROUP BY elem;

This query runs in about 0.7 s, which is close to the practical floor for this data volume on a row‑oriented relational database.
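The equivalent of this `GROUP BY` count, expressed in Java streams for comparison (a hypothetical sketch, not the article's code), shows what the database is now doing on the application's behalf:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class HitCountSketch {
    // Mirrors: unnest(programhandleidlist) ... GROUP BY elem, COUNT(*)
    public static Map<String, Long> countIds(List<List<String>> rows) {
        return rows.stream()
                .flatMap(List::stream)  // unnest
                .collect(Collectors.groupingBy(id -> id, Collectors.counting()));
    }
}
```

Pushing this aggregation into SQL means only the distinct IDs and their counts cross the wire, instead of millions of raw rows.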

3. Introduce Caching

A local cache (Caffeine) was added to avoid repeated database hits for historical dates. The cache is configured with a maximum size of 500 entries and a 60‑minute expiration.

<dependency>
  <groupId>com.github.ben-manes.caffeine</groupId>
  <artifactId>caffeine</artifactId>
  <version>3.1.8</version>
</dependency>

import com.github.benmanes.caffeine.cache.Caffeine;
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.caffeine.CaffeineCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.util.concurrent.TimeUnit;

@Configuration
@EnableCaching
public class CacheConfig {
    @Bean
    public CacheManager cacheManager() {
        CaffeineCacheManager cacheManager = new CaffeineCacheManager();
        cacheManager.setCaffeine(Caffeine.newBuilder()
            .maximumSize(500)
            .expireAfterWrite(60, TimeUnit.MINUTES));
        return cacheManager;
    }
}

The service method was updated to retrieve the previous day’s hit rate from the cache, falling back to a database query on a cache miss.

@Autowired
private CacheManager cacheManager;

private static final String YESTERDAY_HIT_RATE_CACHE = "hitRateCache";

@Override
public RuleHitRateResponse ruleHitRate(LocalDate currentDate) {
    double hitRate = cacheManager.getCache(YESTERDAY_HIT_RATE_CACHE)
        .get(currentDate.minusDays(1), () -> {
            Map<String, Object> hitRateList =
                dataTunnelClient.selectTransHandleFlowByTime(currentDate.minusDays(1));
            double computed = 0.0; // compute the hit rate from hitRateList
            return computed;
        });
    // ... other logic ...
}
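Stripped of Spring and Caffeine, the pattern above is plain cache‑aside. A minimal JDK‑only sketch (hypothetical names, and without the size bound and 60‑minute expiry that Caffeine provides) looks like this:

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class HitRateCacheSketch {
    private final Map<LocalDate, Double> cache = new ConcurrentHashMap<>();
    private final Function<LocalDate, Double> loader;

    public HitRateCacheSketch(Function<LocalDate, Double> loader) {
        this.loader = loader; // stands in for the database query
    }

    // Cache-aside: return the cached value, or load and store it on a miss.
    public double hitRateFor(LocalDate date) {
        return cache.computeIfAbsent(date, loader);
    }
}
```

Caffeine adds what this sketch lacks for production use: bounded size, time‑based eviction, and statistics, which is why the article reaches for it rather than a bare map.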

Conclusion

After applying the SQL aggregation, moving part of the logic into the database, and adding a Caffeine cache, the interface response time dropped from 30 seconds to about 0.8 seconds. The case also highlights the limitations of relational databases for massive analytical workloads and suggests column‑oriented stores (e.g., ClickHouse, Hive) for future scalability.

Relational databases (MySQL, Oracle) excel at transactional workloads with strong consistency.

Column‑oriented databases (HBase, ClickHouse) are better suited for large‑scale analytics and fast read‑heavy queries.

Tags: backend, Performance Tuning, MyBatis, PostgreSQL, SQL optimization, Caffeine Cache
Written by

Code Ape Tech Column

Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
