
Multimodal Large Model Platform: History, Architecture, and Practice by Nine Chapters Cloud Extreme DataCanvas

This article presents Nine Chapters Cloud Extreme DataCanvas's insights and practices on multimodal large model platforms, covering their historical development, platform components such as AI Foundation Software and Prompt Manager, practical implementations like memory-augmented models and ETL pipelines, and future prospects for enterprise knowledge bases and agents.

DataFunTalk

The presentation, delivered by Miao Xu, Chief AI Scientist at Nine Chapters Cloud Extreme, shares the company’s thinking and practice on multimodal large‑model platforms.

It begins with a brief history of multimodal large models, tracing back to the 1956 Dartmouth AI workshop, the symbolic AI era, the AI winters of the 1980s‑90s, and the resurgence driven by large language models. Key milestones such as the 2020 Vision Transformer, OpenAI’s CLIP, and the rapid emergence of multimodal models in 2023 (e.g., PaLM‑E, Whisper, ImageBind, SAM, Kosmos‑2) are highlighted.

The talk explains what multimodal large models can achieve, including video understanding and summarisation, program classification, view‑count statistics, text‑to‑image generation, and embodied agents that combine visual perception with logical reasoning.

Nine Chapters Cloud Extreme’s DataCanvas platform is then described. The AI Foundation Software (AIFS) provides compute resources, high‑performance storage, and training tools (data annotation, sandbox experiments). The Model Tool LMOPS supports the full model lifecycle—data preparation, development, evaluation, and inference. Large Model Builder (LMB) offers one‑click distributed optimisations (data, tensor, pipeline parallelism) with visual control. Large Model Serving (LMS) implements quantisation, knowledge distillation, and pruning to reduce inference cost. Prompt Manager enables prompt design, version control, and template management for both technical and non‑technical users.

Practical experiences are shared next. A memory‑augmented multimodal model improves inference efficiency without sacrificing model capacity. The platform’s DingoDB multimodal vector database combines ETL capabilities for unstructured data, offering compiled operators, parallel processing, and caching optimisations. Model construction follows three stages: (1) align a fixed language model with modality encoders; (2) optionally add a multimodal retrieval module; (3) optionally fine‑tune the language model for specific tasks. A knowledge‑base case demonstrates how the memory architecture enables efficient multimodal knowledge storage and retrieval, with a memory‑attention mechanism that boosts recall by about 10%.
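At its core, attention over an external memory bank scores each stored entry against a query, normalises the scores, and returns a weighted blend. The toy sketch below illustrates that general idea in plain Python; it is our own simplified reconstruction, and the talk's actual mechanism and its roughly 10% recall gain are not reproduced here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_attention(query, memory):
    """Blend memory entries into one vector, weighted by dot-product similarity to the query."""
    scores = [sum(q * m for q, m in zip(query, entry)) for entry in memory]
    weights = softmax(scores)
    dim = len(query)
    return [sum(w * entry[i] for w, entry in zip(weights, memory)) for i in range(dim)]


# Three memory slots; the query aligns most strongly with the second one,
# so the blended output leans toward that entry.
memory = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
result = memory_attention([0.0, 2.0], memory)
```

In a real multimodal knowledge base the memory entries would be embeddings produced by the modality encoders and retrieved from the vector database, rather than hand-written lists.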

Looking ahead, the speaker argues that 85% of enterprise data is unstructured, and multimodal large models can unlock this data, potentially delivering ten‑fold value growth. Knowledge bases will become the foundation for various agents (R&D, customer service, sales, legal, HR, operations). A sales‑agent example shows how multimodal knowledge (text, images, tables) can be integrated to support decision‑making and continuously improve performance.

The session concludes with a thank‑you to the audience.

Tags: multimodal AI · prompt engineering · knowledge base · large models · AI platform · model serving
Written by DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
