Tag: masking strategies


DataFunTalk
Jun 30, 2022 · Artificial Intelligence

OBERT: A Billion‑Parameter Pretrained Language Model for Large‑Scale NLP Applications

The OPPO XiaoBu team introduced OBERT, a series of 100M‑, 300M‑, and 1B‑parameter pretrained language models that leverage TB‑scale corpora, multi‑granular masking, retrieval‑augmented training, and distributed acceleration to achieve state‑of‑the‑art results on the CLUE and KgCLUE benchmarks while enabling efficient industrial deployment.
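The multi‑granular masking mentioned above can be illustrated with a small sketch. This is not OBERT's actual implementation, only an illustrative example of the general idea: rather than masking isolated tokens, the masking procedure hides contiguous spans of varying lengths (single tokens, word‑like pairs, phrase‑like spans), forcing the model to predict longer units from context. The function name, ratio, and span limit below are all illustrative assumptions.

```python
import random

MASK = "[MASK]"

def multi_granular_mask(tokens, mask_ratio=0.15, max_span=3, seed=0):
    """Illustrative multi-granular masking (not OBERT's actual code):
    repeatedly pick a random span of 1..max_span contiguous tokens and
    mask it, until roughly mask_ratio of the sequence is covered."""
    rng = random.Random(seed)
    masked = list(tokens)
    budget = max(1, int(len(tokens) * mask_ratio))  # target number of masked tokens
    covered = 0
    while covered < budget:
        span = rng.randint(1, max_span)     # granularity: token / word / phrase-like span
        start = rng.randrange(0, len(tokens))
        for i in range(start, min(start + span, len(tokens))):
            if masked[i] != MASK:           # don't double-count already-masked positions
                masked[i] = MASK
                covered += 1
    return masked

tokens = "the model predicts missing tokens from surrounding context".split()
print(multi_granular_mask(tokens))
```

Span‑level masking of this kind is what makes the objective harder than vanilla token masking: the model cannot recover a masked word from its immediate masked neighbors and must rely on longer‑range context.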

Fine‑tuning · NLP · Distributed training
12 min read