
How Hisense Built an AI‑Ready Multimodal Data Platform: Storage, Governance, and Development

This article describes how Hisense built an AI‑ready multimodal data platform. It covers the challenges of integrating diverse business systems, the shift from a Hadoop‑based architecture to a cloud‑native data lake, the JuData governance and development platform, and seven practical scenarios that demonstrate unified ingestion, metadata management, rule‑based quality control, intelligent asset retrieval, and AI‑driven DataOps capabilities.

DataFunSummit

Hisense operates multiple B2C and B2B product lines—TVs, air conditioners, refrigerators, as well as commercial display, energy, and semiconductor businesses—each backed by distinct core IT systems such as PLM, HRM, ERP, and CMS. Integrating these siloed systems into a unified data lake is essential for turning data into a strategic asset for business decisions.

Key challenges include complex organizational structures, fragmented multimodal data sources, inconsistent data quality across product lines, and the lack of a unified metadata framework. For example, color codes for TV models differ from those used for refrigerators, making cross‑domain analytics difficult.
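The color code mismatch can be illustrated with a minimal sketch of a shared code mapping. Everything here is an assumption for illustration: the product line names, the local codes, and the helper names `to_standard_color` and `count_by_standard_color` are hypothetical and do not come from Hisense's systems.

```python
from typing import Optional

# Hypothetical per-product-line color codes mapped to one shared standard.
# None of these codes come from Hisense; they only illustrate the idea.
COLOR_CODE_MAP = {
    ("tv", "BK01"): "BLACK",
    ("tv", "SL02"): "SILVER",
    ("refrigerator", "K"): "BLACK",
    ("refrigerator", "S"): "SILVER",
}


def to_standard_color(product_line: str, local_code: str) -> Optional[str]:
    """Return the shared color name, or None when no mapping is defined."""
    key = (product_line.strip().lower(), local_code.strip().upper())
    return COLOR_CODE_MAP.get(key)


def count_by_standard_color(records):
    """Count records per shared color; unmapped codes go under 'UNMAPPED'."""
    counts = {}
    for product_line, local_code in records:
        color = to_standard_color(product_line, local_code) or "UNMAPPED"
        counts[color] = counts.get(color, 0) + 1
    return counts
```

Once both product lines resolve to the same standard value, a cross‑domain query such as "units sold by color" can group on a single column. Surfacing unmapped codes explicitly, rather than dropping them, keeps gaps in the standard visible.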

AI data demands are driven by two main needs: model training (which now relies on >80% unstructured data such as images, video, and text) and intelligent agent applications (customer service bots, production line agents, etc.). This creates new requirements for multimodal data storage, high‑performance processing, semantic governance, and unified metadata.

The data development and governance workflow is demand‑centric and follows the chain: requirement analysis → data source discovery → standard definition → model design → model development → quality inspection → data service → asset consolidation. Because every AI use case starts from a clear data need, the workflow keeps delivering business value.
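One way to picture the chain is as an ordered set of stages that a use case moves through one step at a time. The sketch below is only an illustration under that assumption: the `Stage` enum mirrors the stage names in the text, while the `UseCase` class and its `advance` method are hypothetical and not part of JuData.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Stages of the workflow, in the order the text lists them."""
    REQUIREMENT_ANALYSIS = 1
    DATA_SOURCE_DISCOVERY = 2
    STANDARD_DEFINITION = 3
    MODEL_DESIGN = 4
    MODEL_DEVELOPMENT = 5
    QUALITY_INSPECTION = 6
    DATA_SERVICE = 7
    ASSET_CONSOLIDATION = 8


class UseCase:
    """Hypothetical tracker for one data requirement moving through the chain."""

    def __init__(self, name: str):
        self.name = name
        # Every use case starts from the requirement, never from the data.
        self.stage = Stage.REQUIREMENT_ANALYSIS

    def advance(self) -> Stage:
        """Move to the next stage; raise if the chain is already complete."""
        if self.stage is Stage.ASSET_CONSOLIDATION:
            raise ValueError(f"{self.name} has already completed the workflow")
        self.stage = Stage(self.stage + 1)
        return self.stage
```

Modeling the stages as an ordered enum makes it impossible to skip a step, for example to publish a data service before quality inspection.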

The platform architecture evolved from a legacy Hadoop data warehouse to a cloud‑native data lake built on distributed storage orchestrated by Kubernetes (K8s). The legacy stack was limited to structured data and suffered from version constraints, NameNode bottlenecks, and redundant compute engines such as Presto, Druid, Kylin, and Doris. The new stack uses open‑source components such as Spark, Hive, Flink, Paimon, and Doris, and offers simple cluster provisioning, elastic scaling, and compute‑storage separation for petabyte‑scale multimodal data.

The JuData platform provides a one‑stop solution for multimodal data ingestion, processing, governance, and AI consumption. It supports batch and real‑time ingestion via APIs, standardizes metadata extraction, enforces quality rules (completeness, accuracy, timeliness), and offers preprocessing pipelines (feature extraction, augmentation, cross‑modal fusion). Data assets are stored in hot‑cold layers with multi‑dimensional indexing, enabling efficient retrieval.
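The three quality dimensions named above (completeness, accuracy, timeliness) can be sketched as simple per‑record checks. This is a minimal illustration, not JuData's rule engine: the record schema, the allowed modality values, the 24‑hour freshness window, and the `check_record` function are all assumptions.

```python
from datetime import datetime, timedelta

REQUIRED_FIELDS = ("asset_id", "modality", "ingested_at")  # assumed schema
ALLOWED_MODALITIES = {"text", "image", "video", "table"}   # assumed values
MAX_AGE = timedelta(hours=24)                               # assumed freshness window


def check_record(record: dict, now: datetime) -> list:
    """Return the names of the quality rules that the record violates."""
    failures = []
    # Completeness: every required field is present and non-empty.
    if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
        failures.append("completeness")
    # Accuracy: the modality must be one of the known values.
    if record.get("modality") not in ALLOWED_MODALITIES:
        failures.append("accuracy")
    # Timeliness: the record must have been ingested within the freshness window.
    ingested_at = record.get("ingested_at")
    if not isinstance(ingested_at, datetime) or now - ingested_at > MAX_AGE:
        failures.append("timeliness")
    return failures
```

Passing `now` as an argument instead of reading the clock inside the function keeps the check deterministic, which makes the rules easy to test and to replay against historical data.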

Practical scenarios:

Scenario 1 – "One‑stop storage": Multi‑source multimodal data are ingested, cleaned, quality‑checked, and stored in layered directories, achieving unified lake storage for all data types.

Scenario 2 – "Full‑process standardization": SQL or script development, visual job orchestration, real‑time monitoring, and operational dashboards provide end‑to‑end control of development, scheduling, monitoring, and maintenance.

Scenario 3 – "Unified catalog and association hub": Full metadata capture from ERP, PLM, IoT systems, visualized metadata views, and mapping between technical and business entities create a unified data catalog.

Scenario 4 – "Rule definition, real‑time monitoring, and issue closure": Configurable quality rule templates, continuous validation, automated alerts, and one‑click remediation ensure data reliability.

Scenario 5 – "AI‑powered asset retrieval": Cross‑modal semantic search (text, image) with relevance ranking, intelligent recommendation, and natural language interaction enables precise data discovery.

Scenario 6 – "AI‑driven recommendation (TikTok‑style)": Visual page content is transformed into structured business information, supporting automated report generation and downstream decision making.

Scenario 7 – "Intelligent DataOps and diagnostics": AI assistants provide natural language interfaces for development, scheduling, monitoring, and automated root‑cause analysis, boosting DataOps efficiency.
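The relevance ranking in Scenario 5 can be sketched as a similarity search over embeddings. The source does not say how JuData implements retrieval, so this is an assumed approach: text and image assets are taken to share one embedding space, cosine similarity is used as the relevance score, and the asset names and vectors are toy values.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_assets(query_vec, assets, top_k=3):
    """Return (asset_id, score) pairs sorted by descending similarity."""
    scored = [(asset_id, cosine_similarity(query_vec, vec))
              for asset_id, vec in assets.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings; a real system would get these from a shared text/image encoder.
ASSETS = {
    "tv_manual.pdf": [0.9, 0.1, 0.0],
    "fridge_photo.jpg": [0.1, 0.9, 0.1],
    "tv_panel_photo.jpg": [0.8, 0.3, 0.1],
}
```

With a query vector close to the TV‑related assets, for example `rank_assets([1.0, 0.0, 0.0], ASSETS, top_k=2)`, the document and the photo about TVs rank ahead of the refrigerator photo regardless of modality. At production scale the linear scan would normally be replaced by an approximate nearest‑neighbor index.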

The future outlook emphasizes intelligent DataOps (SQL generation, automated testing, smart labeling) and Data Agent applications such as intelligent data extraction, analysis, metric generation, and decision support.

Overall, the case study demonstrates how a demand‑driven, cloud‑native multimodal data platform can overcome traditional data silos, provide robust governance, and enable AI‑ready services across a large enterprise.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Cloud Native, data governance, data lake, AI Platform, DataOps, multimodal data, JuData
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
