AntTech
Jun 15, 2022 · Artificial Intelligence
XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding
XYLayoutLM is a layout‑aware multimodal network for visually‑rich document understanding. It augments XY‑Cut to generate robust reading orders and uses a Dilated Conditional Position Encoding to handle variable‑length inputs, achieving state‑of‑the‑art performance on the XFUN and FUNSD datasets.
Document Understanding · Multimodal · Vision Transformer