Refactoring a Decade‑Old Search Query Module: Identifying and Fixing Code Smells
The team inherited a ten‑year‑old search query module riddled with duplicate code, massive functions, bloated classes, and other smells. By extracting utilities, splitting functions, separating responsibilities, tightening interfaces, parallelizing tokenization, and enforcing strict compiler warnings, they cut code size by 80%, shortened startup, lowered memory use, reduced main‑flow latency by about 26%, and improved stability and observability.
The refactored query understanding codebase also supports deployment both on the team's self‑built cloud and in on‑premises environments.
Background
After an organizational restructure, the team inherited three low‑level modules of the search pipeline, including the Query Optimizer (QO) responsible for tokenization, term weighting, proximity, and intent recognition. The original code suffered from low iteration efficiency (adding a simple operator required three person‑days), poor stability (frequent P99 spikes), slow startup (18 minutes), excessive memory usage (114 GB per process), lack of monitoring and tracing tools, and an outdated GCC 4.8 compiler.
To address these problems, the team launched a comprehensive refactor.
Code Smells and Their Motives
1. Duplicate Code: Two functions for GBK/UTF‑8 conversion differed only in parameter order. The team extracted the common logic into a shared utility.
2. Long Functions: Functions exceeding 1,000 lines (much of it commented‑out code) were split into smaller, testable units.
3. Bloated Classes: A request‑handling class bundled HTTP service instances, caches, and dozens of strategy implementations, making it hard to understand. Its responsibilities were separated into dedicated classes.
4. Long Parameter Lists: Methods with dozens of parameters were refactored to accept configuration objects or structs, reducing the risk of passing arguments in the wrong position.
5. Confusing Temporary Fields: Variables such as is_second had obscure meanings; they were renamed or eliminated.
6. Overly Broad Parameter Scope: Large structs were passed around even when a callee needed only a few fields; each interface was narrowed to the data it actually uses.
7. Unnecessary Serial Execution: Two independent tokenization steps (with and without punctuation) were parallelized using a DAG scheduler, cutting main‑flow latency from 13.19 ms to 9.71 ms (≈26% improvement).
8. Ignored Compilation Warnings: Missing return statements and unsafe sprintf usage caused crashes after the upgrade to GCC 8.3.1. The team enabled -Wall -Werror so that warnings are treated as errors.
9. Magic Numbers: Hard‑coded constants (e.g., 43000) were replaced with named, documented constants.
10. Long If‑Statements: Long chains of repeated conditional checks were replaced with lookup tables to improve readability and performance.
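Items 4, 9, and 10 can be sketched together in C++. This is a minimal illustration, not the team's code: RewriteOptions, kMaxQueryBytes, Intent, ClassifyIntent, and NormalizeQuery are hypothetical names, and because the article does not say what 43000 represented, the meaning given to the constant here is only a placeholder.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>

// Item 9: a named, documented constant instead of a bare literal.
// Placeholder meaning (assumption): an upper bound on query size in bytes.
constexpr std::size_t kMaxQueryBytes = 43000;

// Item 4: one options struct replaces a long positional parameter list,
// so call sites set fields by name and cannot swap two arguments by accident.
struct RewriteOptions {
    bool strip_punctuation = false;
    bool lowercase = true;
    std::size_t max_bytes = kMaxQueryBytes;
};

enum class Intent { kUnknown, kNavigation, kShopping, kNews };

// Item 10: a lookup table replaces a chain of `if (keyword == ...)` checks.
Intent ClassifyIntent(const std::string& keyword) {
    static const std::unordered_map<std::string, Intent> kIntentByKeyword = {
        {"login", Intent::kNavigation},
        {"buy", Intent::kShopping},
        {"price", Intent::kShopping},
        {"headline", Intent::kNews},
    };
    const auto it = kIntentByKeyword.find(keyword);
    return it == kIntentByKeyword.end() ? Intent::kUnknown : it->second;
}

// Applies the options byte by byte; output never exceeds options.max_bytes.
std::string NormalizeQuery(const std::string& query,
                           const RewriteOptions& options) {
    std::string out;
    for (unsigned char c : query) {
        if (out.size() >= options.max_bytes) break;
        if (options.strip_punctuation && std::ispunct(c)) continue;
        out.push_back(static_cast<char>(options.lowercase ? std::tolower(c) : c));
    }
    return out;
}
```

A caller would default-construct RewriteOptions, override only the fields it cares about (for example options.strip_punctuation = true), and pass the struct by const reference. Adding a new option later then leaves existing call sites unchanged.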
Preventive Measures
• Use static analysis tools (e.g., CodeCC) to detect duplicate code and other smells.
• Enforce coding standards: avoid magic numbers, keep functions short, limit parameter counts, and prefer composition over large monolithic classes.
• Write comprehensive unit tests and monitor test coverage; high coverage discourages code duplication.
• Adopt the “least knowledge” principle: expose only the data required by an interface.
• Enable strict compiler warnings to catch potential bugs early.
• Leverage DAG‑based task scheduling for parallelizable workloads.
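As a rough illustration of the strict‑warnings measure, the sketch below shows the two patterns the article names (a missing return and unbounded sprintf) in a form that should compile cleanly under g++ -Wall -Werror. TermWeightBucket and FormatTermKey are hypothetical helpers invented for this example, not functions from the original module.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Every control path returns a value. Dropping the final `return 0;` would
// trigger -Wreturn-type (enabled by -Wall), and falling off the end of a
// non-void function is undefined behavior in C++.
int TermWeightBucket(double weight) {
    if (weight >= 0.8) return 2;
    if (weight >= 0.3) return 1;
    return 0;
}

// snprintf takes the buffer size and never writes past it, unlike sprintf.
// On success its return value is the length the full output would have had,
// so a value >= sizeof(buffer) means the result was truncated.
std::string FormatTermKey(const std::string& term, int position) {
    char buffer[32];
    const int written = std::snprintf(buffer, sizeof(buffer), "%s#%d",
                                      term.c_str(), position);
    if (written < 0) return std::string();  // encoding error
    const std::size_t needed = static_cast<std::size_t>(written);
    const std::size_t length =
        needed < sizeof(buffer) ? needed : sizeof(buffer) - 1;
    return std::string(buffer, length);
}
```

Checking the snprintf return value, rather than discarding it, is what lets the caller tell a truncated key from a complete one.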
Results After Refactor
The refactored module is 80% smaller, starts faster (down from the original 18 minutes), uses less memory, is more stable (fewer P99 spikes), and is easier to observe. Parallelizing the previously serial tokenization steps cut main‑flow latency from 13.19 ms to 9.71 ms, a reduction of about 26%.
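The idea behind the tokenization speedup can be sketched as follows. The article says the team used a DAG scheduler; this example swaps in std::async from the C++ standard library as a simpler stand‑in, and Tokenize, TokenizationResult, and TokenizeBothWays are hypothetical names. The whitespace tokenizer is only a placeholder for the real one. Depending on the toolchain, building it may require thread support (for example -pthread with GCC).

```cpp
#include <cctype>
#include <future>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Tokens = std::vector<std::string>;

// Placeholder tokenizer (assumption): splits on whitespace and, when
// keep_punctuation is false, drops ASCII punctuation first.
Tokens Tokenize(const std::string& query, bool keep_punctuation) {
    std::string cleaned;
    for (unsigned char c : query) {
        if (!keep_punctuation && std::ispunct(c)) continue;
        cleaned.push_back(static_cast<char>(c));
    }
    std::istringstream stream(cleaned);
    Tokens tokens;
    for (std::string token; stream >> token;) tokens.push_back(token);
    return tokens;
}

struct TokenizationResult {
    Tokens with_punctuation;
    Tokens without_punctuation;
};

// The two passes only read `query` and share no mutable state, so one can
// run on a worker thread while the other runs on the calling thread.
TokenizationResult TokenizeBothWays(const std::string& query) {
    std::future<Tokens> with_future = std::async(
        std::launch::async, [&query] { return Tokenize(query, true); });
    Tokens without = Tokenize(query, false);
    // get() blocks until the worker finishes, so `query` outlives the task.
    return TokenizationResult{with_future.get(), std::move(without)};
}
```

With two independent steps, the wall‑clock cost approaches that of the slower step rather than the sum of both, which is consistent with a partial rather than a 50% reduction in overall latency.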
Conclusion
This article has covered the code smells the team encountered, the motivations behind them, and concrete strategies for fixing and preventing them. The takeaway is that disciplined refactoring leads to more maintainable, efficient, and reliable backend systems.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.