
Consumer Behavior Cause Extraction with BERT Fine‑tuning and a Novel Sequence‑Labeling Framework (ICDM 2020 Winning Solution)

At ICDM 2020, the JD Digits Silicon Valley team achieved top results in the Knowledge Graph Contest. Their solution fine‑tunes BERT and introduces a novel sequence‑labeling framework that jointly extracts consumer behavior types and their underlying reasons, with CRF decoding and model ensembling contributing to its superior performance.

JD Tech Talk

Background

Extracting the reasons behind consumer actions is a critical problem in many business scenarios, such as content advertising and social listening. Advertisers increasingly embed product functions into content to subtly encourage consumers to associate the brand with various actions, which makes accurate cause extraction essential for building effective systems.

Challenge

The Consumer Behavior Cause Extraction (CECE) task requires extracting both the behavior type and its underlying reason from text. Traditional approaches treat these as separate extraction problems, ignoring the dependency between them, which limits performance.

Proposed Solution Overview

The team introduced a new perspective by framing the problem as a joint sequence‑labeling task rather than separate extraction steps. Experiments showed that this framework outperforms baseline methods even when using a pre‑trained BERT encoder.

1. Data Preparation

To ensure high data quality, identifiers were removed from the text. For example, the raw entry "68771,Love doing makeup on all ages" was transformed to "Love doing makeup on all ages".
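The identifier-stripping step above can be sketched with a small regular expression. This assumes raw entries follow the pattern shown in the single example (a numeric ID, a comma, then the text); the function name is illustrative:

```python
import re

def strip_identifier(raw: str) -> str:
    """Remove a leading numeric identifier (e.g. '68771,') from a raw entry."""
    return re.sub(r"^\d+,", "", raw)

cleaned = strip_identifier("68771,Love doing makeup on all ages")
# cleaned == "Love doing makeup on all ages"
```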

2. Model Architecture

BERT Encoder

The input format is [CLS] Brand/Product [SEP] Text [SEP] . A pre‑trained BERT model encodes the sequence, producing token‑level representations z_j that are fed into the tagging decoder.
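A schematic of how such a sentence-pair input is assembled, including the segment (token-type) ids that tell BERT where the brand segment ends and the text segment begins. In practice a BERT tokenizer library would do this; the function below is a hand-rolled sketch with illustrative names:

```python
def build_bert_input(brand_tokens, text_tokens):
    """Assemble the pair input [CLS] Brand/Product [SEP] Text [SEP]
    together with the segment ids BERT uses to distinguish the two parts."""
    tokens = ["[CLS]"] + brand_tokens + ["[SEP]"] + text_tokens + ["[SEP]"]
    # Segment 0 covers [CLS] + brand + first [SEP]; segment 1 covers text + final [SEP].
    segment_ids = [0] * (len(brand_tokens) + 2) + [1] * (len(text_tokens) + 1)
    return tokens, segment_ids

tokens, segments = build_bert_input(["Nike"], ["Love", "running", "daily"])
# tokens   == ['[CLS]', 'Nike', '[SEP]', 'Love', 'running', 'daily', '[SEP]']
# segments == [0, 0, 0, 1, 1, 1, 1]
```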

Sequence Tagging Decoder

The decoder jointly predicts multiple behavior types and their reasons. Labels are constructed for each token, e.g., B_consumer interest, I_consumer interest, etc. Rather than a softmax layer, which would decode each token's label independently, a Conditional Random Field (CRF) models the dependencies between adjacent labels to produce the optimal tag sequence.

The CRF probability model is defined as: p(y|z;W,b) = \frac{\exp(\sum_i (W_{y_i} \cdot z_i + b_{y_i,y_{i-1}}))}{\sum_{y'} \exp(\sum_i (W_{y'_i} \cdot z_i + b_{y'_i,y'_{i-1}}))}

Training maximizes the conditional log‑likelihood, and decoding uses the Viterbi algorithm to find the best label sequence.
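A minimal NumPy sketch of Viterbi decoding over the CRF scores. `emissions[i, y]` plays the role of W_y · z_i + b_y from the formula above, and `transitions[a, b]` is the score of moving from tag a to tag b; in the team's model these scores come from the BERT encoder and the learned CRF parameters:

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions:   (seq_len, num_tags) per-token tag scores
    transitions: (num_tags, num_tags) score of tag a -> tag b
    """
    seq_len, num_tags = emissions.shape
    score = emissions[0].copy()                  # best score ending in each tag
    backptr = np.zeros((seq_len, num_tags), dtype=int)
    for i in range(1, seq_len):
        # candidate[a, b] = best path through tag a at i-1, then tag b at i
        candidate = score[:, None] + transitions + emissions[i][None, :]
        backptr[i] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)
    # Trace the best path back from the final position.
    best = [int(score.argmax())]
    for i in range(seq_len - 1, 0, -1):
        best.append(int(backptr[i, best[-1]]))
    return best[::-1]
```

With zero transition scores the decoder simply follows the per-token maxima; non-zero transitions let the CRF forbid invalid moves such as I_ following O.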

3. Model Ensemble

During the ensemble stage, a simple yet effective two‑step method was applied, yielding a 1.30% performance boost. Predictions from cross‑validation folds were aggregated, and a threshold was used to finalize the output.
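The article does not detail the two steps, so the following is only a plausible sketch of fold aggregation with a threshold: average per-token tag probabilities across cross-validation folds, then keep a tag only where the averaged probability clears the threshold (falling back to the outside tag, shown here as -1):

```python
import numpy as np

def ensemble_folds(fold_probs, threshold=0.5):
    """Two-step ensemble sketch: (1) average per-token tag probabilities
    across CV folds, (2) keep the argmax tag only where the averaged
    probability clears the threshold; otherwise emit -1 ('O').

    fold_probs: list of (seq_len, num_tags) arrays, one per fold.
    """
    avg = np.mean(fold_probs, axis=0)            # step 1: aggregate folds
    tags = avg.argmax(axis=1)
    confident = avg.max(axis=1) >= threshold     # step 2: threshold filter
    return np.where(confident, tags, -1)
```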

4. Results

The proposed framework achieved the first place on the leaderboard of the first stage of the ICDM 2020 competition, demonstrating the effectiveness of joint sequence labeling with BERT and CRF.

Other Winning Solutions

One Japanese team used a GAN to generate additional training samples, labeling them as "Fake" and feeding the GAN output together with BERT embeddings into a Bi‑LSTM layer, extending the original five labels (Attention, Intention, Need, Purchase, Use) with a fake label.

Conclusion

The team plans to focus on recommendation systems and algorithms, and has launched a public WeChat account, "炼丹笔记", to share academic and industrial recommendation research, practical solutions, and in‑depth competition techniques.

Knowledge Graph, BERT, CRF, sequence labeling, consumer behavior extraction, ICDM 2020
Written by

JD Tech Talk

Official JD Tech public account delivering best practices and technology innovation.
