Tag: Multimodal Pretraining

2 articles collected around this technical thread.

AntTech
Jul 31, 2023 · Artificial Intelligence

LayoutMask: Enhancing Text-Layout Interaction in Multi-modal Pre-training for Document Understanding

LayoutMask introduces a novel multi-modal pre‑training model that replaces the global 1D position with a local 1D position and adds Whole Word Masking, Layout‑Aware Masking, and Masked Position Modeling, achieving state‑of‑the‑art results on various visually‑rich document understanding tasks.
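The summary above mentions Whole Word Masking, where all subword pieces of a word are masked together instead of individually. A minimal sketch of that idea, assuming WordPiece-style `##` continuation markers (the function name, masking rate, and `[MASK]` token are illustrative assumptions, not LayoutMask's actual pipeline):

```python
import random

def whole_word_masking(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Group subword tokens (those starting with '##') into words, then
    mask entire words at once so no word is only partially masked."""
    rng = random.Random(seed)
    # Build word groups: each group is a list of token indices.
    groups = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and groups:
            groups[-1].append(i)   # continuation of the previous word
        else:
            groups.append([i])     # start of a new word
    masked = list(tokens)
    for group in groups:
        if rng.random() < mask_prob:
            for i in group:        # mask the whole word, not one piece
                masked[i] = mask_token
    return masked
```

The key difference from token-level masking is that the random draw happens once per word group, so the model can never recover a masked subword from its unmasked siblings.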

AI · Document Understanding · Multimodal Pretraining
15 min read
DataFunTalk
Mar 20, 2020 · Artificial Intelligence

UNITER: Unified Image‑Text Representation Learning for Vision‑Language Tasks

This article introduces UNITER, a unified image‑text representation learning framework pretrained on four large multimodal datasets. It describes the three pretraining tasks (MLM, ITM, MRM), details the model architecture and training optimizations, and evaluates performance across six vision‑language downstream tasks, where UNITER achieves state‑of‑the‑art results.
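Of the pretraining tasks listed above, ITM (image‑text matching) trains the model to classify whether an image and a caption belong together. A hedged sketch of how such training pairs might be constructed by swapping in mismatched images as negatives (the function name and sampling scheme are assumptions for illustration, not UNITER's actual data pipeline):

```python
import random

def build_itm_pairs(image_ids, captions, neg_ratio=0.5, seed=0):
    """Create (image, caption, label) training pairs: label 1 for an
    aligned pair, label 0 when the caption is paired with a randomly
    sampled mismatched image."""
    rng = random.Random(seed)
    pairs = []
    for img, cap in zip(image_ids, captions):
        if rng.random() < neg_ratio and len(image_ids) > 1:
            # Negative: pair this caption with some other image.
            neg = rng.choice([i for i in image_ids if i != img])
            pairs.append((neg, cap, 0))
        else:
            # Positive: keep the original aligned pair.
            pairs.append((img, cap, 1))
    return pairs
```

A binary classifier head over the joint representation is then trained on these labels; the ratio of negatives is a tunable choice here, not a value taken from the article.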

AI · ITM · MLM
11 min read