
Sonar-TS: A New Text-to-SQL Paradigm for Time‑Series Databases

The paper defines the NLQ4TSDB problem of letting non‑expert users query massive time‑series data with natural language, builds the large‑scale NLQTSBench benchmark, proposes the neural‑symbolic Sonar‑TS framework that searches then verifies, and shows it outperforms existing baselines while highlighting remaining challenges.

DataFunSummit

Jun 10, 2026


Problem definition – Existing Text‑to‑SQL approaches assume relational databases and cannot express the morphological patterns (e.g., V‑shaped spikes, plateau‑like steps) that are crucial in time‑series analysis. The authors formalize this gap as NLQ4TSDB (Natural Language Querying for Time Series Databases), which they present as a previously unstudied problem.

Benchmark construction – To evaluate NLQ4TSDB, the authors create NLQTSBench, the first large‑scale benchmark covering 1,153 queries across four difficulty levels (L1 basic operations, L2 shape recognition, L3 semantic reasoning, L4 insight synthesis). Each query requires locating answers within roughly 12,000 data points, forcing systems to retrieve evidence rather than scan the entire series.

Sonar‑TS framework – Inspired by active sonar, Sonar‑TS follows a “search‑then‑verify” pipeline:

Offline semantic index: each time series is discretized into SAX symbols and summarized with statistical features at year, month, and day scales, forming a feature table searchable by SQL.

Online candidate retrieval: a planner decomposes the natural‑language query into sub‑steps; a generator produces two code snippets – one SQL query over the feature table to narrow candidates, and a Python script to precisely verify candidates against raw data.

Cold‑start guidance: domain heuristics (e.g., shape‑matching strategies) are injected during the query stage without fine‑tuning.

Result post‑processing: outputs are rendered as natural language explanations and visualizations for user verification.
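The offline index and the SQL search step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `features` table schema, the fixed‑length non‑overlapping windows (standing in for the year, month, and day scales), the four‑letter alphabet, and the helper names `sax_word` and `build_index` are all hypothetical. SAX itself is the standard recipe: z‑normalize a window, average it into segments, and map each segment mean to a letter using breakpoints that split the standard normal into equally likely regions.

```python
import sqlite3
from statistics import NormalDist, mean, pstdev


def sax_word(values, n_segments=4, alphabet="abcd"):
    """Encode one window as a SAX word: z-normalize, PAA, then symbolize.

    Assumes len(values) >= n_segments so that no segment is empty.
    """
    mu, sigma = mean(values), pstdev(values)
    # A constant window has zero spread; map it to zeros instead of dividing by 0.
    z = [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]
    # Piecewise Aggregate Approximation: average each equal-width segment.
    n = len(z)
    edges = [i * n // n_segments for i in range(n_segments + 1)]
    paa = [mean(z[edges[i]:edges[i + 1]]) for i in range(n_segments)]
    # Breakpoints cut the standard normal into equally likely regions.
    k = len(alphabet)
    cuts = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    # A segment's symbol is the number of breakpoints lying below its mean.
    return "".join(alphabet[sum(p > c for c in cuts)] for p in paa)


def build_index(series, window=8):
    """Build an in-memory feature table, one row per non-overlapping window."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE features ("
        "start_idx INTEGER, end_idx INTEGER, sax TEXT, "
        "avg_value REAL, std_value REAL)"
    )
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        conn.execute(
            "INSERT INTO features VALUES (?, ?, ?, ?, ?)",
            (start, start + window, sax_word(chunk), mean(chunk), pstdev(chunk)),
        )
    return conn


# Toy series: flat, then a V-shaped dip, then flat again.
series = [10] * 8 + [10, 8, 5, 2, 2, 5, 8, 10] + [10] * 8
index = build_index(series)

# Search step: a high-low-high symbol pattern narrows the candidates via SQL.
candidates = index.execute(
    "SELECT start_idx, end_idx FROM features WHERE sax LIKE 'd%a%d'"
).fetchall()
print(candidates)  # prints [(8, 16)]
```

Because each window is z‑normalized before symbolization, the SAX word captures shape only; the level and spread are kept in separate columns so a query can filter on both. The returned index ranges are candidates, which a verification step would then check against the raw values.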

Experimental results – On NLQTSBench, Sonar‑TS achieves an average score of 0.61, significantly higher than representative time‑series models and Text‑to‑SQL baselines, yet still far from solving the task completely. The authors observe that (1) all existing methods struggle with the dual demands of scale and shape, and (2) even Sonar‑TS performs poorly on tasks heavily reliant on shape understanding, indicating ample room for improvement.

Case study – For the query “identify the longest stable plateau period,” a Text‑to‑SQL baseline generates rigid value filters, and pure time‑series models miss the global “longest” constraint. Sonar‑TS first filters candidates using SAX symbols, then precisely aligns their boundaries with Python and returns the correct answer, demonstrating the necessity of both the search and the verification steps.
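A rough sketch of how search and verification might divide the work on this query is shown below. It is an assumption‑laden toy rather than the paper's code: the coarse search flags windows by standard deviation as a stand‑in for SAX symbols, and the function names, window size, and tolerances (`flat_windows`, `refine`, `longest_plateau`, `max_std`, `tol`) are hypothetical.

```python
from statistics import mean, pstdev


def flat_windows(series, window=4, max_std=0.5):
    """Search step: return (start, end) of coarse windows with small spread."""
    return [
        (s, s + window)
        for s in range(0, len(series) - window + 1, window)
        if pstdev(series[s:s + window]) <= max_std
    ]


def refine(series, start, end, tol=0.5):
    """Verify step: grow a seed window point by point on the raw data
    while neighbouring values stay within tol of the seed's mean level."""
    level = mean(series[start:end])
    while start > 0 and abs(series[start - 1] - level) <= tol:
        start -= 1
    while end < len(series) and abs(series[end] - level) <= tol:
        end += 1
    return start, end


def longest_plateau(series, window=4, max_std=0.5, tol=0.5):
    """Combine both steps and apply the global 'longest' constraint."""
    seeds = flat_windows(series, window, max_std)
    # Several seeds may grow into the same plateau, so deduplicate with a set.
    refined = {refine(series, s, e, tol) for s, e in seeds}
    # Sorting first makes ties resolve to the earliest plateau.
    return max(sorted(refined), key=lambda r: r[1] - r[0], default=None)


# Toy series: a ramp, a plateau near 5, two outliers, a shorter plateau at 2.
raw = [1, 2, 3, 5, 5, 5, 5.1, 5, 4.9, 5, 9, 8, 2, 2, 2, 2, 2, 7]

print(flat_windows(raw))     # prints [(4, 8), (12, 16)]
print(longest_plateau(raw))  # prints (3, 10)
```

In this toy run the coarse windows only approximate the plateau: the seed `(4, 8)` misses index 3 and indices 8 to 9 because the neighbouring windows straddle the plateau's edges. Refinement on the raw values recovers the half‑open range `(3, 10)`, and the final `max` compares all refined candidates, which is the global constraint that a local detector would miss.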

Demo and resources – An online demo (https://huggingface.co/spaces/mrtan/Sonar-TS-Demo) lets users pose natural‑language questions and see answers highlighted on waveforms. The full code, benchmark data, and demo are open‑source at https://github.com/Atlamtiz/Sonar-TS. The underlying paper (ICML 2026) is available at https://arxiv.org/abs/2602.17001.


