STARDOM: Semantic-Aware Deep Hierarchical Forecasting Model for Search Traffic Prediction
STARDOM is an end‑to‑end deep hierarchical forecasting model for large‑scale search traffic prediction. Within an encoder‑decoder architecture, it jointly learns hierarchical constraints, query semantics (via pretrained BERT), and a calibration matrix, and is trained with a distilled reconciliation loss and a hierarchical sampling strategy, outperforming state‑of‑the‑art baselines.
1. Background
Search guaranteed-impression advertising in e‑commerce requires accurate traffic forecasting for each query. Traditional methods treat each query independently, ignoring hierarchical constraints among queries, brands, categories, and the semantic intent of queries. To address this, we propose the Semantic‑Aware Deep Hierarchical Forecasting Model (STARDOM), the first end‑to‑end approach that jointly learns hierarchical constraints, semantic information, and time‑series prediction.
2. Motivation and Problem Definition
Independent query forecasts suffer from high noise and poor aggregation. Aggregated series such as brand or category have stronger regularity and obey sum constraints with their child queries. Moreover, queries with similar semantics exhibit similar temporal patterns. We therefore incorporate multi‑granularity hierarchical relationships and query semantics into the model, using a calibration‑matrix learning module and a distilled reconciliation loss, together with a hierarchical sampling strategy that scales to millions of series.
3. Hierarchical Forecasting Principle
Traditional bottom‑up or top‑down hierarchical methods produce forecasts at a single level and propagate them up or down the hierarchy. Reconciliation‑based methods instead first produce base forecasts for all nodes and then adjust them with a calibration matrix that enforces the sum constraints. However, existing reconciliation approaches are two‑stage rather than end‑to‑end, and they do not scale to high‑dimensional data.
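To make the two‑stage reconciliation idea concrete, here is a minimal NumPy sketch of the standard formulation: base forecasts for every node are mapped through a projection matrix P and a summing matrix S so the adjusted forecasts satisfy the sum constraints exactly. The hierarchy, numbers, and the bottom‑up choice of P are illustrative, not from the paper.

```python
import numpy as np

# Toy hierarchy: one parent with three child series. The summing matrix S maps
# bottom-level series to every node (the parent row sums the children).
S = np.array([
    [1, 1, 1],   # parent = sum of children
    [1, 0, 0],   # child 1
    [0, 1, 0],   # child 2
    [0, 0, 1],   # child 3
], dtype=float)

# Base forecasts for every node, produced independently (they need not cohere:
# here the parent forecast 100 disagrees with the child sum 105).
y_hat = np.array([100.0, 40.0, 35.0, 30.0])  # parent, child1, child2, child3

# A projection matrix P extracts reconciled bottom-level forecasts from the
# base forecasts; the simplest choice (bottom-up) just keeps the child rows.
P = np.hstack([np.zeros((3, 1)), np.eye(3)])

# Reconciled forecasts: y_tilde = S @ P @ y_hat. Sum constraints now hold.
y_tilde = S @ P @ y_hat
assert np.isclose(y_tilde[0], y_tilde[1:].sum())
```

Richer choices of P (e.g. trace minimization, as in MinT) weight the base forecasts by their error covariance; the two‑stage structure, and its scaling problem, stay the same.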
4. Calibration Matrix via Empirical Risk Minimization
We learn the calibration matrix directly within an encoder‑decoder deep forecasting model by minimizing an empirical risk that includes a reconciliation term. The matrix is applied to encoder representations rather than decoder outputs, enabling stable end‑to‑end training.
5. STARDOM Model
The overall architecture follows a classic encoder‑decoder design. The encoder contains a dual encoder (forecast and context embeddings) and a Reconciliation Matrix Learning (RML) module that produces calibrated node representations. The decoder incorporates a Distilled Reconciliation Loss to jointly optimize forecasting accuracy and hierarchical consistency. Semantic information extracted by pretrained BERT is fed both as features and into the RML module. A hierarchical sampling strategy selects a subset of nodes and a virtual parent node for each training step, reducing the number of calibration parameters.
5.1 Multi‑Granularity Data Construction
We build a dataset with eight node types (query, brand, first‑level category, leaf category, brand×category, etc.) and align their features for joint training.
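As a sketch of how such multi‑granularity nodes can be built, the snippet below aggregates bottom‑level query traffic into brand, category, and brand×category nodes so that every parent series is, by construction, the sum of its children. The field names and the three example records are assumptions for illustration; the paper's dataset uses eight node types.

```python
from collections import defaultdict

# Illustrative records: (query, brand, leaf_category, daily_pv).
records = [
    ("nike running shoes", "nike",   "running-shoes", 120),
    ("nike air max",       "nike",   "sneakers",      300),
    ("adidas sneakers",    "adidas", "sneakers",      200),
]

# Roll bottom-level (query) traffic up into coarser node types.
nodes = defaultdict(int)
for query, brand, leaf_cat, pv in records:
    nodes[("query", query)] += pv
    nodes[("brand", brand)] += pv
    nodes[("leaf_category", leaf_cat)] += pv
    nodes[("brand_x_category", (brand, leaf_cat))] += pv

# The sum constraint holds by construction at every granularity.
assert nodes[("brand", "nike")] == 120 + 300
```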
5.2 Calibration Matrix Learning Module
Each node’s time series is encoded into forecast and context embeddings; pairwise calibration coefficients are computed and used to adjust the encoder outputs. Multi‑head learning aggregates several calibration matrices.
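The following sketch shows one plausible shape for this module: pairwise calibration coefficients computed attention‑style from the forecast and context embeddings, averaged over heads, and applied to the encoder outputs. The softmax scoring, head aggregation by mean, and all tensor sizes are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, d, n_heads = 5, 8, 2   # toy sizes

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Per-node forecast and context embeddings from the dual encoder, plus the
# encoder outputs to be calibrated (all random stand-ins here).
forecast_emb = rng.normal(size=(n_heads, n_nodes, d))
context_emb  = rng.normal(size=(n_heads, n_nodes, d))
enc_out      = rng.normal(size=(n_nodes, d))

# Pairwise calibration coefficients per head: C[h, i, j] weights how much
# node j's representation should inform node i (attention-style assumption).
C = softmax(forecast_emb @ context_emb.transpose(0, 2, 1) / np.sqrt(d), axis=-1)

# Aggregate the heads into a single calibration matrix and adjust the
# encoder outputs with it, rather than post-processing decoder forecasts.
C_mean = C.mean(axis=0)
calibrated = C_mean @ enc_out
assert calibrated.shape == (n_nodes, d)
```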
5.3 Distilled Reconciliation Loss
The loss decomposes into forecast loss for each node and a reconciliation loss that aligns aggregated child forecasts with parent forecasts, using knowledge‑distillation to mitigate noisy parent labels.
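A minimal numeric sketch of this decomposition is below: a per‑node forecast loss, plus a reconciliation term computed against a distilled parent target that blends the (possibly noisy) parent label with the aggregated child forecasts. The squared‑error losses, the blending weight alpha, and the trade‑off weight lam are illustrative assumptions, not the paper's choices.

```python
import numpy as np

# Toy forecasts for one parent and its three children (values illustrative).
child_pred  = np.array([40.0, 35.0, 30.0])
parent_pred = 100.0
child_true  = np.array([42.0, 33.0, 31.0])
parent_true = 110.0           # parent labels can be noisy in practice

# Forecast loss: per-node squared error on children and parent.
forecast_loss = np.mean((child_pred - child_true) ** 2) \
              + (parent_pred - parent_true) ** 2

# Distilled target: blend the noisy parent label with the aggregated child
# forecasts, in the spirit of knowledge distillation (alpha is assumed).
alpha = 0.5
soft_parent = alpha * parent_true + (1 - alpha) * child_pred.sum()

# Reconciliation loss: pull both the parent forecast and the child sum
# toward the distilled target, softly enforcing hierarchical consistency.
recon_loss = (parent_pred - soft_parent) ** 2 \
           + (child_pred.sum() - soft_parent) ** 2

lam = 0.1                     # trade-off weight, also an assumption
total_loss = forecast_loss + lam * recon_loss
```

With these numbers the child sum is 105, the distilled target is 107.5, and the reconciliation term penalizes both the parent forecast (100) and the child sum for deviating from it, instead of trusting the raw parent label 110 outright.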
5.4 Encoder‑Decoder Structure
The encoder combines LSTM and multi‑head self‑attention with convolutional distillation; the decoder mirrors this structure and attends to calibrated encoder representations.
5.5 Semantic Information Module
Query semantics are obtained via pretrained BERT and injected as features and into the calibration matrix learning, improving performance over raw ID features.
5.6 Hierarchical Sampling
A random parent node is drawn, its children are sampled proportionally to historical page views (PV), and a virtual parent is created over the sampled children to preserve the sum constraint while drastically reducing the number of calibration parameters.
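The sampling step can be sketched as follows: draw k children of a parent with probability proportional to historical PV, then form a virtual parent as the sum of only the sampled children's series, so the sampled subtree remains exactly coherent. The sizes, the Poisson toy data, and sampling without replacement are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative setup: one parent (e.g. a brand) with many child queries, each
# with a historical-PV weight and a short daily traffic series.
n_children, horizon, k = 100, 7, 8
pv = rng.integers(1, 1000, size=n_children).astype(float)
series = rng.poisson(lam=pv[:, None] / 7, size=(n_children, horizon)).astype(float)

# Sample k children with probability proportional to historical PV, so
# high-traffic queries are seen more often during training.
probs = pv / pv.sum()
idx = rng.choice(n_children, size=k, replace=False, p=probs)
sampled = series[idx]

# Virtual parent: the sum of the sampled children's series. The sampled
# subtree satisfies its sum constraint exactly, and the calibration matrix
# in this step only covers k + 1 nodes instead of the full hierarchy.
virtual_parent = sampled.sum(axis=0)
assert virtual_parent.shape == (horizon,)
```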
6. Experiments and Analysis
STARDOM is compared with state‑of‑the‑art baselines (LSTM, Transformer, DCRNN, GMAN, MinT, HIRED) on the public FGSF dataset. It achieves significant improvements on bottom‑level, aggregated, and overall metrics. Ablation studies show the importance of the reconciliation loss, RML module, stability regularization, and semantic features.
7. Conclusion and Future Work
We present an end‑to‑end deep hierarchical forecasting model that leverages multi‑granularity data, hierarchical constraints, and semantic information for large‑scale search traffic prediction. Future directions include integrating temporal‑graph structures to further capture spatial‑temporal relationships.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.