Synthesizing Agentic Factual SFT/Mid‑train Data: Query Filtering, Trajectory Generation, and Tool Usage
The article outlines a practical pipeline for creating agentic factual SFT and mid‑train datasets, covering how to define training goals, filter and classify queries, label processing tags, format trajectory samples, differentiate SFT from mid‑train data, and avoid common pitfalls when generating evidence‑driven AI training data.
