
How We Cut Publishing Latency by 600ms: A Real‑World Backend Optimization Case Study

Through profiling with flame graphs, log analysis, and targeted refactoring—including async task handling, rule‑engine tuning, data‑load reduction, and cache redesign—we reduced the 95th‑percentile publishing latency on Baixing.com from around 3 seconds to under 1 second, achieving near‑instant “second‑post” performance.

Baixing.com Technical Team

Background

When users click the publish button on Baixing.com, the request passes through a risk‑control system that performs extensive risk and quality analysis before returning a publishing status. This heavy analysis lengthened response times, prompting the technical team to launch a "second‑post" optimization project at the end of July 2017.

Current State and Goal

Current State

The 95th‑percentile latency for publish & update operations was about 3 seconds.

Goal

Reduce the 95th‑percentile latency to under 1 second.

Problem Identification

Using flame‑graph profiling and historical slow‑query logs, the team pinpointed two major hot‑spots: the cloud‑association analysis module (data loading) and the keyword‑matching module (matching algorithm).

In a flame graph, the Y‑axis shows stack depth (who called whom); the width of each frame on the X‑axis shows the share of samples it appeared in, so width is proportional to time spent, but position along the axis does not represent the passage of time.

Flame‑Graph Tool

The flame graph revealed the call stacks that consumed the most time, highlighting the cloud‑association analysis and keyword modules as primary targets.

Log Data

Analysis of slow‑query logs exposed inefficient data structures and inadequate caching, while timing logs around key algorithms quantified the performance gap.

Optimization Plan

Asynchronously process risk‑control sub‑services that can be queued.

Optimize usage of the rule‑engine module on the business side.

Improve the cloud‑association analysis module, which has the highest potential but also the highest difficulty.

Optimize the keyword‑matching module.

Acceptance Environment

Before development, real‑time timing points were added to all publishing modules and key risk‑control components to enable precise measurement of improvements.
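A timing point of this kind can be as simple as a wrapper that records the wall‑clock time of each module. The sketch below is a Python illustration (the production system is PHP); the module name "keyword_matching" and the in‑memory `timings` dict are invented for the example.

```python
import time
from contextlib import contextmanager

timings = {}  # module name -> list of elapsed milliseconds

@contextmanager
def timed(name):
    """Record the wall-clock time of a code block under a module name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        timings.setdefault(name, []).append(elapsed_ms)

with timed("keyword_matching"):
    time.sleep(0.01)  # stand-in for the real risk-control work (~10 ms)
```

In production these measurements would go to a metrics store rather than a dict, so percentile latencies can be tracked per module before and after each change.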

Development and Incremental Improvements

6.1 Extracting Asynchronous Tasks

Two risk‑control sub‑services were identified as queueable and moved to asynchronous processing. Their 95th‑percentile latency dropped to under 20 ms, but the overall publishing curve showed little change, as the gain was lost in measurement noise.
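The pattern is the usual one: the request path enqueues the job and responds immediately, while a worker processes it off the critical path. A minimal Python sketch (names invented; the real system presumably uses a proper message queue, not an in‑process thread):

```python
import queue
import threading

jobs = queue.Queue()
results = []

def worker():
    """Background worker: drains the queue until it sees the stop sentinel."""
    while True:
        listing_id = jobs.get()
        if listing_id is None:  # sentinel to stop the worker
            break
        results.append(("analyzed", listing_id))  # stand-in for slow analysis
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

def publish(listing_id):
    """The request path only enqueues; it no longer waits for the analysis."""
    jobs.put(listing_id)
    return "published"  # respond to the user right away

status = publish(42)
jobs.join()    # for the demo only; in production the worker runs independently
jobs.put(None)
t.join()
```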

6.2 Optimizing Rule‑Engine Usage

Removed obsolete rules and moved some checks to post‑publish, gaining roughly 300 ms of latency reduction.
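One way to read this change is as partitioning the rule set: only rules that must gate the response run synchronously, while the rest run after publishing. The rule names and the partition below are invented for illustration:

```python
# Hypothetical partition of the rule set: only SYNC_RULES block the response.
SYNC_RULES = ["banned_category", "contact_in_title"]
DEFERRED_RULES = ["duplicate_content", "image_quality"]  # moved post-publish

def check(rule, listing):
    return True  # stand-in: every rule passes in this sketch

def publish(listing):
    # Only the synchronous rules run before we answer the user.
    if not all(check(r, listing) for r in SYNC_RULES):
        return "rejected"
    # DEFERRED_RULES would be queued here and applied after publishing,
    # e.g. by taking the listing down retroactively if one fails.
    return "published"
```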

6.3 Cloud‑Association Analysis

6.3.1 Reducing Redundant Data Loads

The module loaded excessive data via Data::load. Refactoring the data‑loading logic cut about 600 ms from the overall latency.

6.3.2 Parallelizing Search Requests

Attempted to parallelize HTTP calls to the search service using curl_multi. Although parallelism sped up the search phase in tests, the single‑threaded multiplexing model and increased CPU usage limited real‑world gains.
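The idea is to overlap the round trips so total wall‑clock time approaches one request instead of the sum. The Python sketch below illustrates this with a thread pool and a simulated 50 ms search call (the article's actual mechanism is PHP's curl_multi, which multiplexes handles on a single thread rather than using threads):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(query):
    """Stand-in for one HTTP round trip to the search service."""
    time.sleep(0.05)
    return f"results-for-{query}"

queries = ["phones", "bikes", "sofas"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(queries)) as pool:
    results = list(pool.map(fetch, queries))
# Elapsed time is close to one round trip (~50 ms), not three.
elapsed = time.perf_counter() - start
```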

6.3.3 Additional Attempts

Implemented cloud pre‑warming and cloud weighting: pre‑warming yielded 50–100 ms improvements, while weighting had a negligible effect on speed.

6.4 Keyword Matching Optimization

6.4.1 Matching Algorithm

Replaced the original mb_strpos() approach with a Trie‑tree algorithm, achieving O(m) matching time, where m is the text length.
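A minimal Trie for multi‑keyword matching looks like the Python sketch below: all keywords are loaded into one tree, so a single pass over the text replaces one mb_strpos() scan per keyword. Note this simplified version restarts the walk at each text position; the O(m) bound cited above requires failure links as in Aho–Corasick, which are omitted here for brevity.

```python
def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = word  # end-of-word marker (assumes "$" never appears in input)
    return root

def find_keywords(trie, text):
    """Return the set of trie keywords occurring anywhere in text."""
    hits = set()
    for i in range(len(text)):
        node = trie
        for ch in text[i:]:
            if ch not in node:
                break
            node = node[ch]
            if "$" in node:
                hits.add(node["$"])
    return hits

trie = build_trie(["sale", "scam", "cash"])
found = find_keywords(trie, "big sale, pay cash")  # -> {"sale", "cash"}
```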

6.4.2 Cache Structure

Serialized the Trie‑tree into Redis, but deserialization via json_decode() became a bottleneck (≈52 ms). Switching to serialize()/unserialize() was slower; using Memcache for direct object storage reduced latency, with Redis as a fallback.
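The trade‑off being measured is decode cost on every request: a cached structure is only as fast as the round trip from bytes back to an object. The Python sketch below shows the two round‑trip styles on a tiny trie (json as the analog of json_decode(), pickle as the closest analog of storing the object form directly); the 52 ms figure above came from the real production trie, which is far larger than this toy.

```python
import json
import pickle

trie = {"s": {"a": {"l": {"e": {"$": "sale"}}}}}

# Text serialization: what the Redis + json_decode() setup paid per request.
blob_json = json.dumps(trie)
decoded_json = json.loads(blob_json)

# Native object serialization: the analog of storing the object directly.
blob_pickle = pickle.dumps(trie)
decoded_pickle = pickle.loads(blob_pickle)
```

Which wins depends on the structure's size and shape, which is why the team benchmarked both and kept Redis only as a fallback.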

After these changes, the keyword module contributed an additional 300‑400 ms improvement.

Project Results

Timeline

July 31 – August 16, 2017 (13 workdays). One engineer worked full‑time, with support from the business side.

Performance Gains

Across all platforms, the 95th‑percentile latency dropped from ~3 seconds to under 1 second, with individual gains of 600 ms (cloud‑association), 300 ms (rule‑engine), and 300‑400 ms (keyword module).

Further Improvement Thoughts

Turn the keyword module into a long‑running service to keep the Trie‑tree resident in memory.

Explore more efficient Trie implementations to reduce size and boost speed.

Modularize the risk‑control system by media type (text, image, audio, video) to enable finer‑grained concurrency optimizations.

Conclusion

Thorough analysis with flame graphs and real‑time metrics is essential before tackling performance problems.

Establish a solid acceptance environment to attribute latency changes to specific code changes.

For deeper dives into search performance, PHP concurrency, Trie algorithms, and serialization overhead, refer to the sections on cloud‑association and keyword optimizations.

Tags: Backend, Performance Optimization, Caching, PHP, Flame Graphs