Tag: flow data


Architect
Nov 1, 2021 · Fundamentals

Document Rendering and Structured Data Extraction in Baidu Wenku: From Layout Data to Flow Data and Chart Metadata

This article explains Baidu Wenku's document conversion pipeline. It covers how various office formats are converted into PDF layout data and then into adaptive flow data for mobile devices, and it describes the techniques used to extract structured content and chart metadata from PDF and OOXML documents.

Baidu Wenku · Data Extraction · OOXML