FinBERT 1.0: An Open‑Source Chinese Financial Domain Pre‑trained BERT Model and Its Evaluation
FinBERT 1.0 is an open‑source Chinese BERT model pre‑trained on large‑scale financial corpora. Without additional task‑specific tuning, it improves F1 by 2‑5 percentage points across multiple downstream fintech tasks, demonstrating the value of domain‑specific pre‑training for natural language processing.
FinBERT 1.0 is the first open‑source Chinese BERT model pre‑trained on large‑scale financial corpora, released by Entropy AI Lab to promote NLP in fintech.
Background: existing open‑source Chinese BERT models are trained on general‑domain corpora; FinBERT targets the financial domain and achieves a 2‑5 percentage‑point F1 improvement on downstream tasks without modifying the model architecture or hyperparameters.
Model architecture: based on Google’s BERT‑Base (12‑layer Transformer); only the Base version is released.
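The BERT‑Base dimensions (12 Transformer layers, hidden size 768, 12 attention heads) fix the parameter budget. A rough pure‑Python tally, assuming Google's standard Chinese BERT WordPiece vocabulary of 21,128 tokens (an assumption; the released FinBERT vocabulary may differ), sketches why the Base model lands at roughly 100M parameters:

```python
def bert_base_param_count(vocab_size=21128, hidden=768, layers=12,
                          intermediate=3072, max_pos=512, type_vocab=2):
    """Approximate parameter count of a BERT-Base encoder.
    vocab_size=21128 assumes Google's Chinese BERT vocabulary (assumption)."""
    ln = 2 * hidden  # LayerNorm gamma + beta
    embeddings = (vocab_size + max_pos + type_vocab) * hidden + ln
    attention = 4 * (hidden * hidden + hidden) + ln      # Q, K, V, output proj
    ffn = (hidden * intermediate + intermediate          # two dense layers
           + intermediate * hidden + hidden + ln)
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + ffn) + pooler

print(bert_base_param_count())  # roughly 102M with the Chinese vocabulary
```

Note that most of the per‑layer budget sits in the attention projections and the 768→3072→768 feed‑forward block, which is why only releasing a Base (rather than Large) checkpoint keeps the model practical for fine‑tuning.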
Training data: ~4 million documents covering financial news (≈1 M), research reports & announcements (≈2 M), and financial encyclopedia entries (≈1 M), filtered to ≈3 billion tokens.
Pre‑training tasks: token‑level Financial Whole Word Masking (FWWM) and Next Sentence Prediction, plus two supervised tasks—research‑report industry classification and financial‑entity recognition—using a two‑stage training schedule (maximum sentence length 128, then 512).
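The key idea of whole word masking is that all subword pieces of a word are masked together, so the model must predict the entire word rather than recover one piece from its neighbors. The sketch below illustrates this with WordPiece `##` continuations; it is not FinBERT's actual implementation, which additionally relies on Chinese word segmentation and a financial lexicon:

```python
import random

def whole_word_mask(tokens, mask_ratio=0.15, seed=0):
    """Mask whole words: a token starting with '##' continues the previous
    word, so all of a word's pieces share one mask decision.
    (Illustrative sketch only; FinBERT's FWWM also uses Chinese word
    segmentation and a financial vocabulary to define word boundaries.)"""
    # Group subword indices into whole-word spans.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])
    rng = random.Random(seed)
    n_to_mask = max(1, round(len(words) * mask_ratio))
    masked = set()
    for span in rng.sample(words, n_to_mask):
        masked.update(span)
    return ["[MASK]" if i in masked else t for i, t in enumerate(tokens)]

print(whole_word_mask(["net", "##flix", "posts", "quarter", "##ly", "earnings"]))
```

Whichever word is chosen, both of its pieces are replaced, which is the property that distinguishes whole word masking from BERT's original per‑token masking.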
Acceleration: TensorFlow XLA and Automatic Mixed Precision (AMP) reduce training time and memory, achieving ~3× speedup.
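The article does not show the exact configuration; in TensorFlow 1.x these two optimizations were commonly switched on via environment variables before importing TensorFlow, without any model‑code changes. A hedged config sketch (not FinBERT's actual training script):

```python
import os

# Enable XLA JIT auto-clustering: fuses eligible ops into compiled kernels.
os.environ["TF_XLA_FLAGS"] = "--tf_xla_auto_jit=2"

# Enable Automatic Mixed Precision: float16 compute where numerically safe,
# float32 master weights, with automatic loss scaling.
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION"] = "1"

# Both variables must be set before `import tensorflow`; the graph/estimator
# code is then built as usual.
```

On Volta‑class or newer GPUs, AMP halves activation memory and exploits Tensor Cores, which together with XLA kernel fusion is consistent with the ~3× speedup reported.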
Experiments: four downstream benchmarks (financial short‑text type classification, industry classification, sentiment classification, and named‑entity recognition) compare FinBERT with Google Chinese BERT, BERT‑wwm and RoBERTa‑wwm‑ext; FinBERT consistently outperforms baselines by 2‑5 percentage points in F1.
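The reported 2‑5 point gaps are measured in F1, which for multi‑class benchmarks like these is typically macro‑averaged over classes. A minimal refresher, with illustrative labels rather than the paper's data:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class precision/recall/F1, then the mean."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Illustrative sentiment labels (hypothetical, not the benchmark data).
gold = ["pos", "neg", "neu", "pos", "neg"]
pred = ["pos", "neg", "pos", "pos", "neu"]
print(round(macro_f1(gold, pred), 3))  # -> 0.489
```

Because macro F1 weights all classes equally, a 2‑5 point gain implies improvement even on the rarer classes, not just the dominant ones.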
Conclusion: FinBERT demonstrates the benefit of domain‑specific pre‑training for Chinese financial NLP; future work includes larger corpora and FinBERT 2.0/3.0 releases.
References and author information are provided, along with the GitHub repository https://github.com/valuesimplex/FinBERT.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.