Implementation and Optimization of Local Private Domain Search in Cloud Music
The Cloud Music team integrated a lightweight on‑device full‑text engine using SQLite FTS5 with a simple tokenizer, replaced JavaScript matching with SQLite’s bm25(), parallelized queries, and cut search latency by 75%, boosting CTR 13% and average playback by 17 seconds while preserving user privacy.
Cloud Music accumulates user private domain data such as liked songs, collected playlists, and followed artists. To meet the growing demand for searching this data locally, a lightweight on‑device full‑text search engine was integrated.
The article first reviews the basic workflow of a search engine (crawl, analyze, index, search, rank) and the principles of a full‑text search engine.
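To make the index and search steps concrete, here is a minimal sketch of an inverted index in Python. It is an illustration only, not the engine the team used: the whitespace tokenizer, the function names, and the sample titles are all assumptions made for the example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real tokenizer."""
    return text.lower().split()


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample data.
docs = {
    1: "Blue Moon Rising",
    2: "Moon River",
    3: "Blue Train",
}
index = build_index(docs)
```

Calling `search(index, "blue moon")` intersects the posting sets for both tokens, so only documents containing both words are returned. Ranking is deliberately left out of this sketch.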
After evaluating existing solutions, the team narrowed the choice to NSearch (a proprietary high‑performance engine) and SQLite FTS. Weighing package size and cross‑platform integration, they chose SQLite 3 FTS5 together with the third‑party “simple” tokenizer.
SQLite FTS5 provides a virtual table that stores inverted indexes and supports multi‑column text search, Boolean operators, and language‑specific tokenizers. The “simple” tokenizer, built on cppjieba, offers Chinese word, phrase, pinyin and pinyin‑initial matching.
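The multi-column and Boolean query features mentioned above can be tried with Python's built-in `sqlite3` module, assuming the bundled SQLite was compiled with FTS5 (common, but not guaranteed). This sketch uses the default `unicode61` tokenizer because the “simple” tokenizer is a separately loaded extension; the `demo_fts` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Default unicode61 tokenizer; the article's 'simple' tokenizer is a separate extension.
conn.execute("CREATE VIRTUAL TABLE demo_fts USING fts5(name, artist_name)")
conn.executemany(
    "INSERT INTO demo_fts (rowid, name, artist_name) VALUES (?, ?, ?)",
    [
        (1, "Blue Moon", "Ella Stone"),
        (2, "Moon River", "Blue Quartet"),
        (3, "Night Train", "Ella Stone"),
    ],
)


def match(expr):
    """Return the rowids matching an FTS5 query expression, in rowid order."""
    rows = conn.execute(
        "SELECT rowid FROM demo_fts WHERE demo_fts MATCH ? ORDER BY rowid",
        (expr,),
    )
    return [r[0] for r in rows]


any_column = match("blue")            # matches in any indexed column
name_only = match("name : blue")      # column filter: only the name column
both_terms = match("moon AND river")  # Boolean AND
excluded = match("ella NOT moon")     # Boolean NOT
prefix = match("tra*")                # prefix query
```

Note that FTS5 Boolean operators must be written in upper case, and that `unicode61` does no stemming or Chinese/pinyin handling, which is the gap the “simple” tokenizer fills in the article's setup.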
-- create song info table
CREATE TABLE IF NOT EXISTS songindex (
    song_id INTEGER PRIMARY KEY,
    name TEXT,
    alias TEXT,
    artist_name TEXT,
    other TEXT
);

-- create FTS5 virtual table backed by songindex (external content)
CREATE VIRTUAL TABLE IF NOT EXISTS songindexfts5 USING fts5 (
    song_id UNINDEXED,
    name,
    alias,
    artist_name,
    other,
    content='songindex',
    content_rowid='song_id',
    tokenize='simple'
);

-- trigger to keep the FTS table in sync on insert
CREATE TRIGGER IF NOT EXISTS triggerinsert AFTER INSERT ON songindex
BEGIN
    INSERT INTO songindexfts5 (rowid, name, alias, artist_name, other)
    VALUES (new.song_id, new.name, new.alias, new.artist_name, new.other);
END;

-- basic full-text query
SELECT * FROM songindexfts5 WHERE songindexfts5 MATCH 'keyword';

In the client, the search flow is as follows: the user types a query; SQLite returns the raw matches; the JS ranking script computes a text‑matching score for each result and applies a second round of weighting based on user behavior (likes, play count, etc.); finally, the sorted list is rendered.
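The second round of weighting can be sketched as follows. The article does not give the team's actual formula, so everything here is an assumption for illustration: the `Candidate` fields, the `like_bonus` and `play_weight` values, and the log damping of play count. The sketch is written in Python rather than the client's JS.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    song_id: int
    text_score: float  # hypothetical text-matching score, higher is better
    liked: bool
    play_count: int


def final_score(c, like_bonus=0.5, play_weight=0.1):
    """Combine text relevance with behavior signals (illustrative weights)."""
    behavior = (like_bonus if c.liked else 0.0) + play_weight * math.log1p(c.play_count)
    return c.text_score + behavior


def rerank(candidates):
    """Sort candidates by combined score, best first."""
    return sorted(candidates, key=final_score, reverse=True)


# Hypothetical candidates returned by the text-matching stage.
candidates = [
    Candidate(song_id=1, text_score=1.0, liked=False, play_count=0),
    Candidate(song_id=2, text_score=0.8, liked=True, play_count=20),
    Candidate(song_id=3, text_score=0.9, liked=False, play_count=3),
]
order = [c.song_id for c in rerank(candidates)]
```

In this example the liked, frequently played song overtakes the one with the best pure text score, which is the intended effect of a behavior-based second pass. Damping play count with a logarithm is one way to stop heavily played items from dominating.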
Performance bottlenecks were identified in three stages: assembling resource data (~3000 ms for 10 k items), JS‑Native data transfer (~1600 ms), and text‑matching calculation in JS (~230 ms). Optimizations included parallel SQLite queries with delayed deserialization, replacing the JS matching step with SQLite’s built‑in bm25() function, and reducing JNI/JSC thread switches.
-- bm25() returns lower (more negative) values for better matches,
-- so ascending order puts the best match first
SELECT *, bm25(songindexfts5) AS BM
FROM songindexfts5
WHERE songindexfts5 MATCH 'keyword'
ORDER BY bm25(songindexfts5);

After these changes, total search latency dropped by about 75%: 4k items can now be processed within 1 s, up from only 1k previously. Click‑through rate increased by 13% and average playback time grew by 17 seconds.
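FTS5's `bm25()` also accepts optional per-column weights, which is one way to make a title match count for more than a match in a secondary field. The article does not say whether the team used weights, so the sketch below is an assumption-laden illustration: the `demo_fts` table, its rows, and the weight values are hypothetical, and it relies on Python's `sqlite3` having FTS5 available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE demo_fts USING fts5(name, other)")
conn.executemany(
    "INSERT INTO demo_fts (rowid, name, other) VALUES (?, ?, ?)",
    [
        (1, "Night Train", "a song about the moon"),
        (2, "Moon River", "a classic ballad"),
        (3, "Blue Train", "instrumental jazz piece"),
        (4, "Quiet Storm", "late night radio"),
        (5, "Golden Hour", "acoustic session"),
    ],
)


def ranked(keyword, name_weight=10.0, other_weight=1.0):
    """Return (rowid, score) pairs, best match first.

    The weights are passed to bm25() in column order: name, then other.
    """
    rows = conn.execute(
        "SELECT rowid, bm25(demo_fts, ?, ?) AS score FROM demo_fts "
        "WHERE demo_fts MATCH ? ORDER BY score",
        (name_weight, other_weight, keyword),
    )
    return rows.fetchall()


title_first = [rowid for rowid, _ in ranked("moon")]
other_first = [rowid for rowid, _ in ranked("moon", 1.0, 10.0)]
```

With the `name` column weighted more heavily, the row whose title contains the keyword ranks ahead of the row that only mentions it in `other`; swapping the weights reverses the order.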
The solution demonstrates that a lightweight SQLite FTS5 + simple tokenizer stack can satisfy privacy‑preserving, offline, and low‑latency search requirements on mobile devices. Ongoing work will focus on custom tokenizer improvements and further reductions in query time.
NetEase Cloud Music Tech Team