Artificial Intelligence · 13 min read

Enterprise Large Model Deployment: Data Governance, Fine‑Tuning Strategies, and Cost Economics

The article examines how enterprises can adopt domain-specific large models despite talent and cost challenges. It covers self-supervised pre-training, instruction fine-tuning, data governance for unstructured data, dataset balance, model-type selection, and integrated product solutions for efficient, high-performance AI deployments.


This interview with Ba Hai-feng, President of DeepExi, highlights that large-model adoption is currently focused on domain-specific models, which require cost-efficiency on the demand side and mature training techniques on the supply side.

He explains that traditional machine-learning teams are costly because they require diverse roles (data engineers, BI engineers, analysts, data scientists, algorithm engineers), putting such teams out of reach for many enterprises. Large models lower this barrier: a single user equipped with AI copilots can perform tasks that previously needed a whole team.

Training large models typically follows a pipeline of self-supervised pre-training, supervised fine-tuning, and RLHF alignment; this pipeline turns a generic base model such as Llama 2 13B into a specialized chatbot using only a fraction of the original training data.
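As a concrete illustration of the supervised fine-tuning stage, here is a minimal sketch using a Hugging Face causal LM and a plain PyTorch loop. The checkpoint name, prompt template, and toy dataset are illustrative assumptions, not details from the interview:

```python
# Minimal supervised fine-tuning (SFT) sketch, assuming a Hugging Face
# causal-LM checkpoint. Running this for real requires GPU capacity and
# access to the (gated) Llama 2 weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-13b-hf"  # illustrative base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Toy instruction pairs; a real run uses thousands of curated examples.
pairs = [
    {"instruction": "Summarize our refund policy for enterprise clients.",
     "response": "Enterprise clients may request refunds within 30 days..."},
]

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for example in pairs:
    # Concatenate prompt and response; the model learns to continue the prompt.
    text = (f"### Instruction:\n{example['instruction']}\n"
            f"### Response:\n{example['response']}")
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    # Causal-LM objective: the model shifts the labels internally by one step.
    out = model(input_ids=batch["input_ids"], labels=batch["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```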

Fine-tuning reduces hallucinations and improves consistency while requiring only 0.1-1% of the original data volume. Effective data governance is crucial: unlike traditional structured-data governance, enterprise AI must now manage high-cost unstructured data for model training.
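To put the 0.1-1% figure in perspective, here is a back-of-envelope calculation assuming Llama 2's reported pre-training corpus of roughly 2 trillion tokens (a baseline the article itself does not state):

```python
# Scale of fine-tuning data implied by the article's 0.1-1% figure,
# assuming Llama 2's reported ~2 trillion pre-training tokens.
pretrain_tokens = 2e12          # Llama 2 pre-training corpus (reported)
low, high = 0.001, 0.01         # 0.1% to 1% of the original volume
print(f"fine-tuning corpus: {pretrain_tokens*low:.0e} to {pretrain_tokens*high:.0e} tokens")
# -> roughly 2e+09 to 2e+10 tokens: still large in absolute terms,
#    but orders of magnitude smaller than the pre-training corpus.
```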

Dataset quality must balance flexibility, diversity, and accuracy. A practical mix of 30 % domain data and 70 % generic data yields models that retain adaptability while achieving high accuracy, lowering overall data acquisition costs.
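A minimal sketch of one way to assemble that 30/70 mix, assuming two lists of already-cleaned training examples; the ratio comes from the article, but the sampling scheme is an illustrative choice:

```python
# Assemble a 30% domain / 70% generic fine-tuning mix by random sampling.
import random

def build_mix(domain, generic, total, domain_ratio=0.3, seed=0):
    rng = random.Random(seed)
    n_domain = int(total * domain_ratio)
    n_generic = total - n_domain
    # sample() requires each pool to be at least as large as its quota
    mix = rng.sample(domain, n_domain) + rng.sample(generic, n_generic)
    rng.shuffle(mix)  # interleave so every batch sees both distributions
    return mix

# Usage: mixed = build_mix(domain_examples, generic_examples, total=10_000)
```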

Training data falls into two types: representation-heavy tasks (e.g., rephrasing an explanation of Java threads) and knowledge-intensive Q&A tasks; the latter often require full-parameter fine-tuning with substantial GPU memory (e.g., 80 GB GPUs for Llama 2 13B).
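As a rough illustration of why full-parameter fine-tuning is memory-hungry, the sketch below estimates the footprint under a common mixed-precision Adam setup. The byte counts are a standard rule of thumb, not figures from the article, and the real footprint also depends on activations, sequence length, and sharding:

```python
# Rough memory estimate for full-parameter fine-tuning with mixed-precision
# Adam: bf16 weights and gradients plus fp32 master weights and two moments.
def full_ft_memory_gb(n_params):
    weights = 2 * n_params   # bf16 weights
    grads   = 2 * n_params   # bf16 gradients
    adam    = 12 * n_params  # fp32 master copy + first/second moments
    return (weights + grads + adam) / 1e9

print(f"Llama 2 13B: ~{full_ft_memory_gb(13e9):.0f} GB before activations")
# -> ~208 GB, which is why even 80 GB GPUs must be combined with sharding
#    (e.g., ZeRO/FSDP) or parameter-efficient methods for 13B full fine-tuning.
```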

DeepExi’s product ecosystem addresses efficiency, performance, and experience: a Fast5000E training‑inference appliance simplifies hardware deployment, while the FastAGI platform provides agents (Data Agent, Doc Agent, Plugin Agent) for rapid tool‑chain construction and customized solutions such as data‑analysis copilots.
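The article names these agents but not their programming interface. As a loose illustration of the pattern, here is a hypothetical sketch of a copilot-style agent that routes a question to a data tool; every class and function name is an assumption, not FastAGI's actual API:

```python
# Hypothetical agent pattern: an LLM picks a tool, runs it, and grounds its
# answer in the tool's output. All names here are illustrative placeholders.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    run: Callable[[str], str]  # takes the user question, returns tool output

class DataAgent:
    def __init__(self, tools: list[Tool], llm: Callable[[str], str]):
        self.tools = {t.name: t for t in tools}
        self.llm = llm

    def answer(self, question: str) -> str:
        # Ask the model which tool applies, then feed the tool's result back.
        choice = self.llm(f"Pick one tool from {list(self.tools)} for: {question}").strip()
        tool = self.tools.get(choice, next(iter(self.tools.values())))
        result = tool.run(question)
        return self.llm(f"Question: {question}\nTool output: {result}\nAnswer:")
```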

The overall cost economics emphasize integrating data governance with agile practices, leveraging both small and large models to extract high‑quality data from unstructured sources, and delivering enterprise‑grade AI capabilities without prohibitive resource investment.
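One way to picture that small-model/large-model division of labor: a cheap scorer filters unstructured passages, and only the survivors go to an expensive large model to be turned into Q&A training pairs. Both model calls below are hypothetical placeholders:

```python
# Two-stage data extraction: small model filters cheaply, large model
# generates expensively on the filtered subset only.
def filter_passages(passages, small_model_score, threshold=0.8):
    # Small model: fast quality/relevance scoring over every passage.
    return [p for p in passages if small_model_score(p) >= threshold]

def generate_qa(passages, large_model):
    # Large model: costly generation, run only on passages that passed the filter.
    return [large_model(f"Write one Q&A pair grounded in:\n{p}") for p in passages]

# Usage: qa_pairs = generate_qa(filter_passages(docs, scorer), llm)
```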

Tags: model fine-tuning · AI deployment · large models · data governance · enterprise AI · cost economics
Written by DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
