Absolute Semantic Recognition Competition: Feature Design, Modeling Strategy, and Core Algorithm Insights
Abstract
Based on the final presentation PPT, this article presents the complete solution to the absolute-semantic-recognition competition: the problem background, dataset, evaluation metric, feature design concepts, core modeling ideas, and algorithmic principles — covering Attention, Capsule, Bi-GRU, and BERT — together with an analysis of results and lessons learned.
Competition Background
Semantic recognition is a crucial component of NLP with broad applications and high market value. This task distinguishes absolute expressions (superlatives such as "best" or "number one") from ordinary contextual words in advertising copy, with the goal of reducing advertising violations.
Dataset
The training set contains 80,000 advertising sentences: roughly 35,000 labeled as violating (label = 1) and 45,000 as non-violating (label = 0).
Goal
Predict whether an advertising sentence violates regulations.
Evaluation Metric
The F-score with β = 1 (F1), i.e. the harmonic mean of precision and recall.
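As a quick sketch, the metric is F_β = (1 + β²)·P·R / (β²·P + R), which for β = 1 reduces to the harmonic mean of precision P and recall R. The confusion counts below are invented for illustration:

```python
def f_score(tp, fp, fn, beta=1.0):
    """F_beta from confusion counts; beta=1 weights precision and recall equally."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Example: 30 true positives, 10 false positives, 5 false negatives
# precision = 0.75, recall = 6/7, so F1 = 0.8
print(round(f_score(30, 10, 5), 4))  # → 0.8
```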
Algorithm Core Design
Feature Engineering
Traditional TF-IDF extracts contextual phrase features around absolute expressions, while word-level and character-level embeddings (Word2Vec, GloVe) capture contextual word features; BERT encodes contextual word vectors.
Model Architecture
TF-IDF features are fed to LR, LightGBM, and XGBoost for absolute-expression detection.
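A minimal, dependency-free sketch of the TF-IDF side; the toy tokenized sentences and the smoothing choice (+1 on document frequency) are illustrative, and the actual solution presumably used a library implementation such as scikit-learn's TfidfVectorizer before handing the vectors to LR / LightGBM / XGBoost:

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute TF-IDF vectors for a list of tokenized documents.

    tf = count / doc length; idf = log(N / (1 + df)) + 1 (smoothed),
    so terms appearing in many documents are down-weighted."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))   # document frequency
    idf = {t: math.log(n / (1 + df[t])) + 1 for t in df}
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return vecs

# Toy tokenized ad sentences (invented)
docs = [["best", "cream", "ever"], ["best", "price"], ["soft", "shirt"]]
vecs = tfidf(docs)
```

Because "best" occurs in two of the three documents, its weight in the first vector is lower than that of the rarer term "cream".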
Word embeddings feed a TextNN network, where Bi-GRU and Capsule layers enrich the sequential and spatial representations.
Attention Mechanism
Attention is introduced into the Bi-GRU and Bi-LSTM layers to capture both global and local dependencies, mitigating RNNs' weakness at modeling long-range sequences.
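A minimal numpy sketch of attention pooling over recurrent hidden states, assuming a simple dot-product scoring function with a learned query vector `w` (the scoring variant and dimensions are illustrative; the hidden states would come from the Bi-GRU/Bi-LSTM):

```python
import numpy as np

def attention_pool(H, w):
    """Dot-product attention over a sequence of hidden states.

    H: (T, d) hidden states for T time steps; w: (d,) learned query vector.
    Returns a (d,) context vector emphasizing the most relevant steps."""
    scores = H @ w                                   # (T,) alignment scores
    scores -= scores.max()                           # numerical stability
    alphas = np.exp(scores) / np.exp(scores).sum()   # softmax attention weights
    return alphas @ H                                # weighted sum of states

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))   # 5 time steps, 8-dim hidden states
w = rng.normal(size=8)
context = attention_pool(H, w)   # shape (8,)
```

With a zero query every step gets equal weight and the context vector degenerates to the plain mean of the hidden states, which is what attention improves upon.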
Capsule TextNN
Capsule networks represent features as vectors rather than scalars, capturing spatial structure: low-level capsules receive the RNN outputs, and dynamic routing aggregates them into high-level semantic capsules that feed the downstream dense layers.
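A numpy sketch of routing-by-agreement in the style of Sabour et al. (the capsule counts and dimensions are illustrative; in the text model the prediction vectors `u_hat` would be linear transforms of the RNN-fed low-level capsules):

```python
import numpy as np

def squash(v, axis=-1, eps=1e-8):
    """Capsule nonlinearity: preserves direction, maps the norm into [0, 1)."""
    n2 = (v ** 2).sum(axis=axis, keepdims=True)
    return (n2 / (1 + n2)) * v / np.sqrt(n2 + eps)

def dynamic_routing(u_hat, iters=3):
    """Dynamic routing between capsule layers.

    u_hat: (n_low, n_high, d) prediction vectors from low-level capsules.
    Returns (n_high, d) high-level capsule outputs."""
    n_low, n_high, _ = u_hat.shape
    b = np.zeros((n_low, n_high))                   # routing logits
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coeffs
        s = (c[..., None] * u_hat).sum(axis=0)      # weighted sum per capsule
        v = squash(s)                               # (n_high, d) outputs
        b += (u_hat * v[None]).sum(axis=-1)         # agreement update
    return v

rng = np.random.default_rng(1)
u_hat = rng.normal(size=(6, 3, 4))   # 6 low-level caps -> 3 high-level, dim 4
v = dynamic_routing(u_hat)           # shape (3, 4)
```

The squash nonlinearity guarantees each output capsule's norm stays below 1, so the norm can be read as an "existence probability" of the semantic concept.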
Optimizer Improvement
The Adam optimizer is modified so that L2 weight decay is decoupled from the gradient-based update and applied directly to the weights after learning-rate scaling (as in AdamW); a learning-rate warm-up is used in the first epoch, and the rate is reduced in the second.
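A numpy sketch of one decoupled-weight-decay step in the spirit of Loshchilov & Hutter, plus an illustrative warm-up schedule (all hyperparameters and the post-warm-up factor of 0.1 are assumptions, not the competition's actual values):

```python
import numpy as np

def adamw_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """One AdamW-style step: weight decay is applied directly to the weights,
    decoupled from the adaptive gradient update."""
    m = b1 * m + (1 - b1) * g                 # first-moment estimate
    v = b2 * v + (1 - b2) * g * g             # second-moment estimate
    m_hat = m / (1 - b1 ** t)                 # bias correction
    v_hat = v / (1 - b2 ** t)
    p = p - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * p)  # decay decoupled
    return p, m, v

def warmup_then_decay(step, warmup_steps, base_lr):
    """Linear warm-up, then a reduced constant rate (schedule is illustrative)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr * 0.1
```

Note the decay term `wd * p` is added outside the `m_hat / sqrt(v_hat)` normalization; folding it into the gradient (plain Adam with L2) would let the adaptive denominator shrink the regularization, which is exactly what the decoupling avoids.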
BERT Application
BERT-Base Chinese (12 layers, 768 hidden units, 110M parameters) provides deep bidirectional context; a max-pooling layer over the Transformer outputs extracts the most salient features for classification.
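A numpy sketch of the pooling step, assuming the final-layer token vectors and an attention mask as inputs (the random tensors stand in for real BERT outputs, which would come from a library such as Hugging Face Transformers):

```python
import numpy as np

def max_pool_tokens(token_vecs, attention_mask):
    """Max-pool BERT token vectors over the sequence axis.

    token_vecs: (T, 768) final-layer outputs; attention_mask: (T,) with 1 for
    real tokens and 0 for padding. Padding positions are excluded from the max,
    leaving the strongest activation per dimension as the sentence feature."""
    masked = np.where(attention_mask[:, None] == 1, token_vecs, -np.inf)
    return masked.max(axis=0)   # (768,) sentence vector for the classifier

rng = np.random.default_rng(2)
tokens = rng.normal(size=(6, 768))     # stand-in for BERT token outputs
mask = np.array([1, 1, 1, 1, 0, 0])    # last two positions are padding
sent_vec = max_pool_tokens(tokens, mask)
```

Masking before pooling matters: without it, padding vectors could leak spurious maxima into the sentence representation.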
Result Analysis
Visualizing the model weights shows that absolute expressions are detected reliably, while the influence of contextual words varies with their position in the sentence.
Competition Experience Summary
The final solution used a simple voting ensemble, without stacking or blending; data augmentation was not applied due to time constraints.
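Hard voting can be sketched in a few lines; the three models' label lists below are hypothetical:

```python
from collections import Counter

def majority_vote(predictions):
    """Hard-voting ensemble: per example, take the most common predicted label.

    predictions: list of per-model label lists, all the same length."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*predictions)]

# Three hypothetical models' labels on five sentences
model_preds = [
    [1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 1, 1, 1],
]
print(majority_vote(model_preds))   # → [1, 0, 1, 1, 0]
```

With an odd number of models, binary voting never ties, which is one reason simple ensembles of three diverse models (e.g. TF-IDF+LR, TextNN, BERT) are a common competition default.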
Interesting Findings
Stage A’s second‑place score came from a BERT+LR hybrid model.
With data augmentation, TF‑IDF+LR alone could achieve top‑3 performance.
Final Remarks
Participating in algorithm competitions helps one stay competitive and keep learning; sharing experience across structured data, NLP, and computer vision fosters collective progress.
References
Devlin et al., BERT: Pre‑training of Deep Bidirectional Transformers for Language Understanding, 2018.
Loshchilov & Hutter, Fixing Weight Decay Regularization in Adam, 2017.
Raffel & Ellis, Feed‑forward networks with attention can solve some long‑term memory problems, 2015.
Zhao et al., Investigating capsule networks with dynamic routing for text classification, 2018.
He et al., SECaps: A sequence enhanced capsule model for charge prediction, 2019.
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.