
STARDOM: Semantic-Aware Deep Hierarchical Forecasting Model for Search Traffic Prediction

STARDOM is an end‑to‑end deep hierarchical forecasting model that jointly learns hierarchical constraints, query semantics via pretrained BERT, and a calibration matrix within an encoder‑decoder architecture, using a distilled reconciliation loss and hierarchical sampling to accurately predict large‑scale search traffic and outperform state‑of‑the‑art baselines.

Alimama Tech

1. Background

Search guaranteed-impression advertising in e‑commerce requires accurate traffic forecasting for each query. Traditional methods treat each query independently, ignoring hierarchical constraints among queries, brands, categories, and the semantic intent of queries. To address this, we propose the Semantic‑Aware Deep Hierarchical Forecasting Model (STARDOM), the first end‑to‑end approach that jointly learns hierarchical constraints, semantic information, and time‑series prediction.

2. Motivation and Problem Definition

Independent query forecasts suffer from high noise and poor aggregation. Aggregated series such as brand or category have stronger regularity and obey sum constraints with their child queries. Moreover, queries with similar semantics exhibit similar temporal patterns. We therefore incorporate multi‑granularity hierarchical relationships and query semantics into the model, using a calibration‑matrix learning module and a distilled reconciliation loss, together with a hierarchical sampling strategy that scales to millions of series.

3. Hierarchical Forecasting Principle

Traditional bottom‑up or top‑down methods forecast only a single level of the hierarchy and propagate those values to the remaining levels. Reconciliation‑based methods instead produce base forecasts for all nodes and then adjust them with a calibration matrix that enforces the sum constraints. However, existing reconciliation approaches are two‑stage rather than end‑to‑end, and they do not scale to high‑dimensional data.
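To make the reconciliation idea concrete, here is a minimal numpy sketch using the classic OLS calibration matrix P = (SᵀS)⁻¹Sᵀ on a toy two‑child hierarchy. This is the generic reconciliation recipe from the hierarchical‑forecasting literature, not STARDOM's learned matrix; the numbers are invented for illustration.

```python
import numpy as np

# Toy hierarchy: one parent (total) over two child series A and B.
# The summing matrix S maps bottom-level values to all nodes [total, A, B].
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# Incoherent base forecasts for [total, A, B]: 10 != 4.5 + 6.0.
y_hat = np.array([10.0, 4.5, 6.0])

# OLS reconciliation: calibration matrix P = (S^T S)^{-1} S^T,
# reconciled forecasts y_tilde = S @ P @ y_hat.
P = np.linalg.inv(S.T @ S) @ S.T
y_tilde = S @ P @ y_hat

# After calibration the parent equals the sum of its children.
assert np.isclose(y_tilde[0], y_tilde[1] + y_tilde[2])
```

In this two‑stage setup P is fixed by a closed form after the base forecasts exist; STARDOM's point is to learn the calibration jointly with the forecaster instead.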

4. Calibration Matrix via Empirical Risk Minimization

We learn the calibration matrix directly within an encoder‑decoder deep forecasting model by minimizing an empirical risk that includes a reconciliation term. The matrix is applied to encoder representations rather than decoder outputs, enabling stable end‑to‑end training.

5. STARDOM Model

The overall architecture follows a classic encoder‑decoder design. The encoder contains a dual encoder (forecast and context embeddings) and a Reconciliation Matrix Learning (RML) module that produces calibrated node representations. The decoder incorporates a Distilled Reconciliation Loss to jointly optimize forecasting accuracy and hierarchical consistency. Semantic information extracted by pretrained BERT is fed both as features and into the RML module. A hierarchical sampling strategy selects a subset of nodes and a virtual parent node for each training step, reducing the number of calibration parameters.

5.1 Multi‑Granularity Data Construction

We build a dataset with eight node types (query, brand, first‑level category, leaf category, brand×category, etc.) and align their features for joint training.

5.2 Calibration Matrix Learning Module

Each node’s time series is encoded into forecast and context embeddings; pairwise calibration coefficients are computed and used to adjust the encoder outputs. Multi‑head learning aggregates several calibration matrices.
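A single‑head sketch of how pairwise calibration coefficients could be computed and applied, under an attention‑style reading of the module. The exact parametrization is not given in the post, so the dot‑product scoring and softmax normalization here are assumptions.

```python
import numpy as np

def calibrate(forecast_emb, context_emb, enc_out):
    """Single-head sketch of the calibration-matrix learning module.

    Pairwise coefficients between nodes are scored by the dot product of
    one node's forecast embedding with another node's context embedding
    (an assumed parametrization); a row-wise softmax normalizes them, and
    the resulting matrix re-mixes the encoder outputs across nodes.
    """
    scores = forecast_emb @ context_emb.T          # (n_nodes, n_nodes)
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    coeff = np.exp(scores)
    coeff /= coeff.sum(axis=1, keepdims=True)      # each row sums to 1
    return coeff @ enc_out                         # calibrated representations

# Hypothetical shapes: 5 nodes, embedding dim 4, encoder dim 8.
rng = np.random.default_rng(1)
f = rng.normal(size=(5, 4))
c = rng.normal(size=(5, 4))
e = rng.normal(size=(5, 8))
calibrated = calibrate(f, c, e)
```

Multi‑head learning, as the post describes, would run several such heads with separate embeddings and aggregate the resulting calibration matrices.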

5.3 Distilled Reconciliation Loss

The loss decomposes into forecast loss for each node and a reconciliation loss that aligns aggregated child forecasts with parent forecasts, using knowledge‑distillation to mitigate noisy parent labels.
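A minimal sketch of such a loss, assuming squared error for the forecast terms and a distilled parent target that mixes the noisy parent label with the model's own parent prediction. The weights `alpha` and `lam` are hypothetical hyper‑parameters, not values from the paper.

```python
import numpy as np

def distilled_reconciliation_loss(child_preds, parent_pred, child_labels,
                                  parent_label, alpha=0.5, lam=1.0):
    """Simplified reading of the distilled reconciliation loss.

    - Forecast term: squared error of every node against its own label.
    - Reconciliation term: aggregated child predictions should match a
      distilled parent target that blends the (noisy) parent label with
      the model's parent prediction, as in knowledge distillation.
    """
    forecast = np.mean((child_preds - child_labels) ** 2) \
             + (parent_pred - parent_label) ** 2
    target = alpha * parent_label + (1 - alpha) * parent_pred
    reconciliation = (child_preds.sum() - target) ** 2
    return forecast + lam * reconciliation
```

When forecasts are exact and coherent the loss vanishes; any mismatch between the children's sum and the distilled parent target adds a penalty, which is what pushes the model toward hierarchical consistency.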

5.4 Encoder‑Decoder Structure

The encoder combines LSTM and multi‑head self‑attention with convolutional distillation; the decoder mirrors this structure and attends to calibrated encoder representations.

5.5 Semantic Information Module

Query semantics are obtained via pretrained BERT and injected as features and into the calibration matrix learning, improving performance over raw ID features.

5.6 Hierarchical Sampling

A random parent node and its children are sampled proportionally to historical PV, then a virtual parent is created to preserve sum constraints while drastically reducing calibration parameters.
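The sampling step can be sketched as follows, assuming PV‑proportional sampling without replacement and a fixed number `k` of sampled children; the virtual node simply absorbs the remaining PV mass so the parent's sum constraint still holds.

```python
import numpy as np

def sample_subtree(child_pv, k, rng):
    """Hierarchical-sampling sketch (details beyond the post are assumed).

    Draws k children proportionally to historical page views (PV) and
    returns their indices plus the PV mass of a 'virtual' child standing
    in for all unsampled siblings, so that sampled children + virtual
    node still sum to the parent.
    """
    probs = child_pv / child_pv.sum()
    picked = rng.choice(len(child_pv), size=k, replace=False, p=probs)
    virtual_pv = child_pv.sum() - child_pv[picked].sum()
    return picked, virtual_pv

# Hypothetical parent with four children and their historical PV.
rng = np.random.default_rng(0)
child_pv = np.array([10.0, 20.0, 30.0, 40.0])
picked, virtual_pv = sample_subtree(child_pv, 2, rng)
```

Because the calibration matrix only needs entries for the sampled children plus one virtual node, its size per step is fixed regardless of how many millions of queries the full hierarchy contains.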

6. Experiments and Analysis

STARDOM is compared with state‑of‑the‑art baselines (LSTM, Transformer, DCRNN, GMAN, MinT, HIRED) on the public FGSF dataset. It achieves significant improvements on bottom‑level, aggregated, and overall metrics. Ablation studies show the importance of the reconciliation loss, RML module, stability regularization, and semantic features.

7. Conclusion and Future Work

We present an end‑to‑end deep hierarchical forecasting model that leverages multi‑granularity data, hierarchical constraints, and semantic information for large‑scale search traffic prediction. Future directions include integrating temporal‑graph structures to further capture spatial‑temporal relationships.

time series forecasting · deep learning · search advertising · hierarchical modeling · semantic awareness
Written by

Alimama Tech

Official Alimama tech channel, showcasing all of Alimama's technical innovations.
