
Applying BERT for News Timeliness Classification at NetEase

The article describes how NetEase adapts a pre‑trained BERT model to classify news articles into ultra‑short, short, or long timeliness categories by combining rule‑based strong and weak time cues, key‑sentence extraction, domain‑embedding fusion, and multi‑layer semantic aggregation, yielding accurate and interpretable timeliness predictions for its platform.

NetEase Media Technology Team

This article presents the application of the BERT pre‑trained language model to the problem of judging the timeliness of news articles in NetEase’s news platform.

Business background: With the rapid growth of online news, the volume of articles has become massive. Different types of news have varying sensitivity to timeliness (ultra‑short, short, long). Setting appropriate expiration times improves user experience and reduces the load on recommendation systems.

Task definition: The timeliness judgment is formulated as a three‑class text classification problem (ultra‑short, short, long). Three kinds of textual cues are identified: (1) strong time‑template sentences that can directly determine the class, (2) weak time‑keyword sentences that provide clues but need model assistance, and (3) articles without explicit time cues, where domain categories help inference.
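As an illustration of these three cue types, here is a minimal sketch of a cue detector. The pattern and keyword lists are hypothetical stand‑ins: the real system uses per‑domain time templates that the article does not enumerate.

```python
import re

# Hypothetical cue lists for illustration only; the production system
# maintains per-domain time templates and weak time keywords.
STRONG_PATTERNS = [r"\blive now\b", r"\bbreaking\b", r"\bexpires today\b"]
WEAK_KEYWORDS = ["today", "tomorrow", "this week", "upcoming"]

def cue_type(article: str) -> str:
    """Classify which kind of time cue an article contains."""
    text = article.lower()
    if any(re.search(pat, text) for pat in STRONG_PATTERNS):
        return "strong"   # (1) rule alone can decide the timeliness class
    if any(kw in text for kw in WEAK_KEYWORDS):
        return "weak"     # (2) extract key sentences and let the model decide
    return "none"         # (3) fall back to domain-category features

print(cue_type("Breaking: heavy storm expected tomorrow"))  # strong
```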

Engineering strategies: Four strategies are proposed: (1) direct classification using strong rules, (2) extract key sentences (weak rules) and feed them to the model, (3) rewrite key sentences before feeding to the model, and (4) fuse article text with domain category features and feed the combined representation to the model.

Implementation details: The strong‑rule extractor uses predefined time templates per domain. The weak‑rule extractor pulls key sentences based on domain and weak time keywords, optionally rewriting them (e.g., converting specific dates to “today”/“tomorrow”). The fusion module embeds each domain, applies a self‑attention layer to capture inter‑domain similarity, and concatenates the resulting domain vector with the text semantic vector.
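The weak‑rule path (key‑sentence extraction plus date rewriting) can be sketched as follows, assuming simple ISO‑formatted dates; the function names and keyword list are illustrative, not NetEase's actual implementation.

```python
import re
from datetime import date

WEAK_KEYWORDS = ["deadline", "match", "concert"]  # hypothetical per-domain keywords

def extract_key_sentences(article, keywords):
    """Pull sentences that contain any weak time keyword."""
    sentences = re.split(r"(?<=[.!?])\s+", article)
    return [s for s in sentences if any(k in s.lower() for k in keywords)]

def rewrite_dates(sentence, today=None):
    """Rewrite explicit ISO dates relative to 'today', e.g. the day after -> 'tomorrow'."""
    today = today or date.today()
    def repl(m):
        delta = (date(*map(int, m.groups())) - today).days
        return {0: "today", 1: "tomorrow"}.get(delta, m.group(0))
    return re.sub(r"(\d{4})-(\d{2})-(\d{2})", repl, sentence)

article = "Rome fell long ago. The deadline is 2024-05-01."
keys = extract_key_sentences(article, WEAK_KEYWORDS)
print([rewrite_dates(s, today=date(2024, 5, 1)) for s in keys])  # ['The deadline is today.']
```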

Basic model overview: BERT employs a Transformer encoder composed of multiple stacked blocks. Each block contains a multi‑head attention sub‑layer, a residual connection, layer‑norm, and a feed‑forward network. BERT is pre‑trained on two self‑supervised tasks: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP).
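The encoder block described above can be sketched numerically. This is a didactic NumPy version with random weights, not BERT's actual implementation (which also has biases, dropout, GELU activations, and learned positional embeddings).

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize over the hidden dimension (gain/bias omitted for brevity)
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def encoder_block(x, p, n_heads=4):
    """One encoder block: multi-head attention -> add & norm -> FFN -> add & norm."""
    T, d = x.shape
    hd = d // n_heads
    q, k, v = x @ p["Wq"], x @ p["Wk"], x @ p["Wv"]
    split = lambda m: m.reshape(T, n_heads, hd).transpose(1, 0, 2)  # (heads, T, hd)
    qh, kh, vh = split(q), split(k), split(v)
    scores = qh @ kh.transpose(0, 2, 1) / np.sqrt(hd)     # scaled dot-product
    att = softmax(scores) @ vh                            # (heads, T, hd)
    att = att.transpose(1, 0, 2).reshape(T, d) @ p["Wo"]  # merge heads
    x = layer_norm(x + att)                               # residual + layer norm
    ffn = np.maximum(0.0, x @ p["W1"]) @ p["W2"]          # feed-forward (ReLU here)
    return layer_norm(x + ffn)                            # residual + layer norm

rng = np.random.default_rng(0)
d = 16  # toy hidden size; BERT-base uses 768
p = {k: rng.normal(0, 0.1, (d, d)) for k in ("Wq", "Wk", "Wv", "Wo")}
p["W1"], p["W2"] = rng.normal(0, 0.1, (d, 4 * d)), rng.normal(0, 0.1, (4 * d, d))
out = encoder_block(rng.normal(size=(5, d)), p)  # 5 tokens in, 5 tokens out
print(out.shape)  # (5, 16)
```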

Model modifications:

(a) Key‑sentence + BERT: extracted key sentences are concatenated with [SEP] tokens (e.g., key_sentence1[SEP]key_sentence2[SEP]…) and the model is fine‑tuned on the timeliness labels with cross‑entropy loss.

(b) Fusion of category features: each domain receives a learnable embedding vector. After self‑attention, the domain vector is concatenated with the text semantic vector, passed through a linear layer with tanh activation, and multiplied by a projection matrix to obtain short‑ and long‑effect scores, followed by softmax normalization.

(c) Multi‑granularity semantic fusion: for each of BERT’s 12 layers, token embeddings are averaged to produce a layer‑wise sentence vector. A linear layer learns an importance weight for each layer; softmax normalizes these weights, and a weighted sum yields a fused semantic vector that captures both low‑level (syntactic) and high‑level (semantic) information.
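Fusion schemes (b) and (c) can be sketched as below. All dimensions and weight matrices are illustrative placeholders (in the real model they are learned during fine‑tuning), and the number of output classes is assumed to be three, per the task definition.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def fuse_layers(hidden_states, layer_logits):
    """Scheme (c): average tokens per layer, then softmax-weighted sum over layers."""
    layer_vecs = hidden_states.mean(axis=1)  # (n_layers, hidden): one vector per layer
    alphas = softmax(layer_logits)           # learned layer-importance weights
    return alphas @ layer_vecs               # fused semantic vector, (hidden,)

def fuse_category(text_vec, domain_vec, W, P):
    """Scheme (b): concat text + domain vectors, tanh linear layer, project, softmax."""
    h = np.tanh(np.concatenate([text_vec, domain_vec]) @ W)
    return softmax(h @ P)                    # normalized timeliness scores

rng = np.random.default_rng(1)
H = rng.normal(size=(12, 20, 768))  # 12 layers x 20 tokens x hidden size 768
w = rng.normal(size=12)             # in training, produced by a linear layer
fused = fuse_layers(H, w)

dom = rng.normal(size=64)                 # domain embedding (size assumed)
W = rng.normal(0, 0.05, (768 + 64, 256))  # fusion layer weights (placeholder)
P = rng.normal(0, 0.05, (256, 3))         # projection to three timeliness classes
probs = fuse_category(fused, dom, W, P)
print(fused.shape, probs.shape)  # (768,) (3,)
```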

Final system: The final architecture combines the rule‑based modules, the key‑sentence + BERT module, and the two fusion schemes (category‑feature fusion and multi‑granularity semantic fusion) to deliver a robust and interpretable news‑timeliness predictor.

Conclusion: By integrating rule‑based cues, domain embeddings, and multi‑layer semantic information, BERT can be effectively adapted for industrial news‑timeliness tasks, offering both high accuracy and flexibility for iterative rule adjustments.

Tags: Artificial Intelligence · Model Fusion · NLP · BERT · Text Classification · News Timeliness