Tagged articles
6 articles
Machine Heart
May 17, 2026 · Artificial Intelligence

Is Multimodal RAG the Cure for Enterprise Knowledge‑Base Bottlenecks? The ‘Where to Retrieve’ Challenge

The article analyzes how multimodal Retrieval‑Augmented Generation expands retrieval objects beyond text chunks, why the "where to retrieve" problem is as critical as "what to retrieve" in enterprise knowledge bases, and how Google Gemini's File Search and recent industry research illustrate the shift toward verifiable, multimodal evidence.

AI Retrieval · Enterprise Knowledge Base · Gemini API
7 min read
Architecture Digest
May 12, 2026 · Artificial Intelligence

Tencent Open‑Sources WeKnora: An AI‑Powered Document Understanding Framework

WeKnora, Tencent's newly open‑sourced framework built on the IMA kernel, combines LLMs and RAG to parse unstructured PDFs, Word files, and scans, with a reported speed improvement of over 300% and 89% top‑10 retrieval precision. It offers modular deployment, secure private‑cloud options, and integration with vector databases and the WeChat ecosystem.

Knowledge Base · LLM · Open Source
8 min read
HyperAI Super Neural
Sep 26, 2025 · Artificial Intelligence

Redefining Next‑Gen OCR: IBM’s Open‑Source Granite‑Docling‑258M for Unified Structure and Content Understanding

IBM’s newly released open‑source model Granite‑Docling‑258M tackles the long‑standing challenge of converting diverse digital documents into machine‑readable, structured data. It preserves layout, tables, and formulas and supports multiple languages, while remaining lightweight at 258M parameters and outperforming its predecessor, SmolDocling‑256M‑Preview.

Docling · IBM · OCR
5 min read
DataFunTalk
Jun 29, 2024 · Artificial Intelligence

Document Intelligence in the Financial Sector: Technologies, Challenges, and Future Directions

This presentation reviews the technical scope of document intelligence, its specific applications and challenges in finance, recent advances in document analysis, recognition, and understanding, and outlines future research directions for large‑model and multimodal solutions in processing complex financial documents.

Large Models · deep learning · document AI
28 min read
AntTech
Nov 15, 2023 · Artificial Intelligence

Reading Order Matters: Information Extraction from Visually‑rich Documents by Token Path Prediction

The paper identifies incorrect reading order as a critical obstacle to information extraction from visually‑rich documents, proposes a Token Path Prediction model with a grid‑label formulation, introduces the re‑annotated FUNSD‑r and CORD‑r datasets, and demonstrates state‑of‑the‑art performance on NER, entity linking, and reading‑order prediction tasks.

Layout Analysis · NER · document AI
17 min read
Laiye Technology Team
May 18, 2022 · Artificial Intelligence

Overview of Document Intelligence Models: StrucText, LayoutLMv3, and GraphDoc

This article reviews three representative document intelligence models—StrucText, LayoutLMv3, and GraphDoc—detailing their input features, feature fusion strategies, self‑supervised tasks, and underlying architectures, and explains how they learn embeddings for segments, words, or regions to enable classification and key‑value extraction.

Layout Analysis · Multimodal · document AI
15 min read