Bilibili Personal Attack Content Governance: Background, Goals, Methods, and Effectiveness

Bilibili combats personal-attack and trolling comments by combining sector-specific keyword databases, user-group analysis, advanced word matching (including pinyin and homophone detection), and multiple NLP and graph models. Between June and December 2023, this cut personal-attack reports in the entertainment, film, and gaming sectors by about 32% and trolling reports by roughly 25%.

Bilibili Tech

Feb 18, 2024

Governance Background and Objectives

Bilibili, as a comprehensive video community, faces challenges from personal attacks, insults, and defamatory comments that can damage the community atmosphere. The goal is to reduce exposure of such negative content, promote positive interaction, and create a friendly environment.

Current Situation of Personal Attack Content

Analysis of June 2022 reports shows that the main sources of negative comments are "trolling" and personal attacks. The set of short abusive terms is small, but each term has many variants (homophones, initial-letter abbreviations, inserted special characters or emojis), and these variants make detection difficult.

Sector‑Specific Issues

Eight sectors (life, entertainment, film, knowledge, technology, sports, games, music) were examined. Entertainment, film, and game sectors have the highest volume and concentration of personal‑attack reports, making them priority targets.

Specialized Governance Process

The process focuses on personal attacks (while also considering trolling). It includes:

Word‑matching identification using techniques such as pinyin recognition, numeric homophone detection, character similarity, and transformation mapping.

Model‑based recognition employing various NLP models (FastText, DPCNN, TextRCNN, Attention, BERT, tiny_bert) and graph‑based methods (GCN/GAT) to capture nuanced abusive language.

Character-similarity detection, for example, uses the "phonetic-shape code" concept: each character is converted into a machine-readable numeric string, so that characters that sound alike or look alike can be recognized as variants of one another.
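The word-matching step can be illustrated with a minimal Python sketch. It is not Bilibili's implementation: the pinyin table, the blocklist, and the function names are all hypothetical, and the table covers only the handful of characters used in the example. The sketch strips special characters and emojis, folds full-width digits, maps characters and digits to toneless pinyin syllables, and then matches at the syllable level, so homophone and numeric variants of a listed term are caught. It handles sound similarity only; shape similarity would need a separate shape code per character.

```python
import unicodedata

# Hypothetical toy pinyin table (toneless). A real system would use a full
# dictionary; only the characters needed for this example are listed.
PINYIN = {
    "笨": "ben", "本": "ben",
    "蛋": "dan", "旦": "dan",
    "二": "er", "货": "huo",
    "2": "er",  # numeric homophone: the digit 2 is read "er"
}

# Hypothetical blocklist of mild insults, for illustration only.
BLOCKLIST = ["笨蛋", "二货"]


def normalize(text):
    """Fold full-width forms and drop punctuation, spaces, and emojis."""
    text = unicodedata.normalize("NFKC", text).lower()
    return "".join(ch for ch in text if ch.isalnum())


def sound_key(text):
    """Map each character to its pinyin syllable; unknown characters pass through."""
    syllables = [PINYIN.get(ch, ch) for ch in text]
    # Pad with spaces so that matching respects syllable boundaries.
    return " " + " ".join(syllables) + " "


BLOCK_KEYS = {sound_key(term): term for term in BLOCKLIST}


def find_hits(comment):
    """Return the blocklist terms whose sound key occurs in the comment."""
    key = sound_key(normalize(comment))
    return sorted(term for k, term in BLOCK_KEYS.items() if k in key)
```

Under these assumptions, `find_hits("本*蛋😀")` reports the listed term 笨蛋 even though neither the inserted symbol nor the substituted character appears in the blocklist. Matching on sound alone will also flag innocent homophones, which is one reason the word-matching results are combined with model-based recognition.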

Key Strategies

The strategy is built on three dimensions:

Keyword dimension: A knowledge base linking entities to abusive keywords is maintained per sector, allowing precise matching of comments to risky entities.

User-group dimension: Users are categorized by the target of their attacks (e.g., non-real entities like celebrities or games vs. platform users and UPs, i.e., uploaders). High-risk user groups are identified through interaction patterns and relationships.

Content & UP dimension: High‑risk videos or UPs are flagged, and multi‑model fusion (linear regression, logistic regression, tree models) is used to improve recall of abusive content.
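As a rough illustration of how these dimensions might be combined, the Python sketch below pairs a per-sector entity-to-keyword lookup with a logistic-regression-style fusion of several risk signals. Everything in it is an assumption made for the example: the knowledge-base entries, the signal names, the weights, and the bias are hypothetical, and in practice the weights would be learned from labeled data rather than set by hand.

```python
import math

# Hypothetical knowledge base: sector -> entity -> risky keywords.
KNOWLEDGE_BASE = {
    "game": {"examplegame": {"trash", "garbage"}},
    "film": {"examplefilm": {"awful"}},
}


def keyword_matches(sector, comment):
    """Return (entity, keyword) pairs where a comment names a known entity
    of the sector together with a keyword linked to that entity."""
    text = comment.lower()
    hits = []
    for entity, keywords in KNOWLEDGE_BASE.get(sector, {}).items():
        if entity in text:
            hits.extend((entity, kw) for kw in sorted(keywords) if kw in text)
    return hits


# Hypothetical fusion weights for signals scaled to [0, 1].
WEIGHTS = {"keyword": 2.0, "model": 3.0, "user_group": 1.0, "video": 0.5}
BIAS = -3.0


def fused_risk(signals):
    """Logistic fusion: sigmoid of a weighted sum of the available signals.
    Missing signals count as 0."""
    z = BIAS + sum(w * signals.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


# Usage: a keyword hit plus a high model score yields a higher fused risk
# than a low model score alone.
comment = "ExampleGame is trash"
signals = {
    "keyword": 1.0 if keyword_matches("game", comment) else 0.0,
    "model": 0.9,  # assumed output of an upstream text classifier
}
risk = fused_risk(signals)
```

A fused score of this kind lets a weak signal from one dimension be reinforced by another, which is one plausible way to raise recall without lowering any single model's threshold.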

Governance Outcomes

After implementation, the proportion of personal‑attack reports in the entertainment, film, and game sectors dropped by 31.97% (from June to December 2023), and trolling reports decreased by 24.77%.

Summary and Outlook

While the reduction trend is evident, further improvements are needed, such as optimizing content ranking, automating negative‑keyword extraction, refining reporting workflows, and accelerating short‑cycle model training and deployment.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
