
How to Solve Data Governance + AI Agent Pitfalls: Agent Roles, NL2SQL Datasets, and Rule Templates Explained

The article analyzes why data‑governance projects still fail when combined with AI, presents a four‑layer NL2SQL architecture, details agent responsibilities, metadata‑governance methods, and the anomaly‑diagnosis and permission‑control flows, outlines the dataset‑building stages and evaluation metrics, and provides a step‑by‑step rollout roadmap.

AI agent · Anomaly Detection · Dataset Construction
0 likes · 21 min read
DataFunTalk
May 28, 2026 · Big Data

How Xiaohongshu Evolved Its Data Architecture for the Big AI Data Era

Xiaohongshu transformed its data platform from a simple ClickHouse‑based ad‑hoc analysis to a Lambda‑style architecture and finally to a lakehouse with generic incremental compute, cutting architecture complexity, resource costs, and development costs by one‑third while answering queries over trillions of rows within seconds.

Big Data · ClickHouse · Data Architecture
0 likes · 21 min read
DataFunTalk
May 26, 2026 · Big Data

How MaxCompute Evolves into an AI‑Ready Data Platform: Architecture, Core Capabilities, and Real‑World Cases

The article details MaxCompute's transformation into a cloud‑native, AI‑centric data warehouse, covering multi‑modal storage, model management, heterogeneous CPU/GPU scheduling, SQL AI functions, the MaxFrame Python framework, and several production case studies that demonstrate performance gains of up to 50% and elastic resource scaling to 160 000 cores.

Data+AI · Distributed computing · Large‑model preprocessing
0 likes · 13 min read
Big Data Tech Team
May 25, 2026 · Big Data

AI Large Models Meet Data Warehouses: 3 Core Use Cases, 5 Common Pitfalls, and Best Practices

The article analyzes how AI large models can transform data‑warehouse development through three practical scenarios—automated modeling, intelligent data cleaning, and ops optimization—while exposing five frequent implementation traps and offering concrete best‑practice recommendations to achieve cost reduction, efficiency gains, and quality improvement.

AI large models · Automated modeling · Best Practices
0 likes · 10 min read
DataFunSummit
May 25, 2026 · Big Data

How Hisense Built an AI‑Ready Multimodal Data Platform: Storage, Governance, and Development

This article details Hisense's journey to create an AI‑ready multimodal data platform, covering the challenges of integrating diverse business systems, the shift from a Hadoop‑based architecture to a cloud‑native data lake, the JuData governance and development platform, and six practical scenarios that demonstrate unified ingestion, metadata management, rule‑based quality control, intelligent asset retrieval, and future AI‑driven DataOps capabilities.

AI Platform · Cloud Native · DataOps
0 likes · 23 min read
DataFunTalk
May 25, 2026 · Big Data

MaxCompute’s AI‑Ready Evolution: Architecture, Features, and Real‑World Use Cases

This article examines how Alibaba Cloud’s MaxCompute platform has been transformed for AI workloads, detailing its multi‑layer architecture, multimodal data storage, SQL AI functions, the Python‑based MaxFrame framework, and real‑world deployments in large‑model preprocessing, autonomous driving, and multimodal image labeling.

AI · Big Data · Distributed computing
0 likes · 12 min read
Big Data Tech Team
May 24, 2026 · Big Data

Data Warehouse Interview Pitfall Guide 2.0: Avoid Common SQL, Modeling, and ETL Mistakes

This guide compiles the most frequent interview pitfalls for data warehouse roles, covering SQL join and aggregation errors, window function misuse, subquery versus CTE performance myths, dimensional modeling mistakes, SCD implementation traps, layered design issues, data quality handling, ETL traps, Hive and Spark performance questions, real‑time warehousing considerations, and effective interview strategies.

Big Data · ETL · Hive
0 likes · 3 min read
DataFunSummit
May 22, 2026 · Big Data

How OPPO Accelerates Multimodal Data & AI Fusion with Gravitino and Curvine

OPPO tackles explosive multimodal data growth by unifying metadata with Gravitino and boosting I/O performance using the open‑source Curvine cache, delivering a four‑layer data‑lake architecture that resolves data islands, metadata chaos, and bandwidth bottlenecks while achieving near‑commercial query speeds.

Curvine · Distributed Cache · Gravitino
0 likes · 11 min read
DataFunTalk
May 22, 2026 · Big Data

How Xiaohongshu Cut Data Architecture Complexity and Cost by One‑Third in the Big AI Data Era

The article details Xiaohongshu's evolution from a simple ClickHouse‑based analytics layer to a Lambda‑enabled 2.0 stack and finally a Lakehouse‑based 3.0 architecture, showing how each iteration reduced infrastructure complexity, resource consumption and development effort by roughly one‑third while supporting trillions of daily events and AI‑driven use cases.

Big Data · ClickHouse · Data Architecture
0 likes · 21 min read
DataFunSummit
May 21, 2026 · Big Data

Alibaba Cloud’s Agent-Ready Big Data AI Infrastructure: Boosting Data Development from Hours to Minutes

With a projected 85% of enterprises expected to deploy internal agents within two years, Alibaba Cloud proposes an Agent-Ready big‑data AI infrastructure that combines a unified data lake, real‑time processing, high‑dimensional vector retrieval, elastic model serving, and comprehensive security governance. It has already cut data‑development cycles from hours to 5‑10 minutes in internal model‑training and Taobao flash‑sale scenarios.

AI · Agent-Ready · Big Data
0 likes · 15 min read
DataFunSummit
May 20, 2026 · Big Data

How Kuaishou’s Real‑Time Data Lake Boosts AI and BI Architecture

The article explains how Kuaishou partnered with Apache Hudi to overhaul its ODS‑based data lake, addressing latency, storage cost, and complexity for AI and BI workloads, detailing the evolution from MySQL‑to‑Hive to MySQL‑to‑Hudi 1.0 and 2.0, the resulting performance gains, cost savings, and future roadmap.

AI · BI · Big Data
0 likes · 20 min read
AntTech
May 20, 2026 · Big Data

SIGMOD 2026: Shared Computation for Query Subgraph Matching & Fast MPC Shortest Paths

This article reviews two SIGMOD 2026 papers—MASC, which redefines multi‑query subgraph matching by maximizing shared computation to achieve up to two orders of magnitude speedup, and PrivHop, which combines 2‑hop labeling with secure multi‑party computation to enable privacy‑preserving shortest‑path queries on million‑node graphs with roughly a million‑fold reduction in runtime and communication.

MPC · graph algorithms · privacy-preserving
0 likes · 5 min read
Big Data Tech Team
May 19, 2026 · Big Data

Enterprise Data Warehouse Development Playbook: Standard Engineering Edition

This playbook provides enterprise‑level data warehouse engineers, ETL developers, data modelers, and data‑team managers with a complete, logical, and actionable set of standards, processes, and best‑practice guidelines covering architecture, development principles, role responsibilities, end‑to‑end workflow, metadata, security, performance metrics, and team collaboration.

ETL · Performance · data modeling
0 likes · 18 min read
dbaplus Community
May 14, 2026 · Big Data

Building a ‘One‑Sentence Bank’: Big Data and AI Fusion for Small Banks

The article outlines the evolution of big data in banking, compares management models for heterogeneous data, describes the shift from data engineering to knowledge engineering, introduces LLMOps for high‑quality knowledge bases, and details how integrating AI and data can enable a “one‑sentence bank” that answers queries and executes tasks.

Artificial Intelligence · Big Data · Knowledge Engineering
0 likes · 22 min read
DataFunSummit
May 14, 2026 · Big Data

How Gravitino, Daft, and Lance Enable Secure, AI‑Driven Multimodal Lakehouse

The article examines the challenges of multimodal data in modern lakehouses and presents a three‑tool stack—Gravitino, Daft, and Lance—that provides unified metadata, distributed multimodal compute, and high‑performance storage, while detailing security governance, integration paths, and future directions.

Daft · Gravitino · Lakehouse
0 likes · 11 min read
vivo Internet Technology
May 13, 2026 · Big Data

How Vivo Upgraded a Million‑Node YARN Cluster: Architecture, Scheduler Switch, and Performance Optimizations

This article details Vivo's end‑to‑end upgrade of a YARN 2.6.0 cluster to a modern version for a million‑node, hundred‑thousand‑tasks‑per‑day platform, covering architectural evolution, scheduler migration, compatibility fixes, performance tuning, and service‑continuity strategies.

Big Data · Capacity Scheduler · Cluster Upgrade
0 likes · 28 min read
DeWu Technology
May 13, 2026 · Big Data

How BP Claw Solves AI Coding Input Challenges in FlinkSpec’s Real‑Time Data Warehouse

The article explains how BP Claw tackles unstable AI coding results by automatically converting low‑quality PRD documents into structured, high‑quality requirements, applying token‑saving strategies, strict hallucination guards, and multi‑skill orchestration, which together boost FlinkSpec’s real‑time data‑warehouse delivery efficiency by up to 30%.

AI coding · BP Claw · Big Data
0 likes · 17 min read
Architect's Guide
May 13, 2026 · Big Data

Next‑Gen Visual Drag‑Drop Data Flow Platform: Features, Architecture, and Performance

The article introduces a visual drag‑and‑drop data flow platform that unifies stream and batch processing, offers version control, automatic fault tolerance, configurable data permissions, comprehensive monitoring, data alignment, and query templates, and presents single‑instance performance benchmarks of over 30k and 60k ops/s.

Data Alignment · Data Flow · Drag-and-Drop
0 likes · 7 min read
DataFunTalk
May 11, 2026 · Big Data

How Xiaohongshu Re‑engineered Its Data Architecture for the Big AI Data Era

Xiaohongshu transformed its data platform from a simple ClickHouse‑based ad‑hoc analysis to a Lambda‑style architecture and finally to a lakehouse built on Iceberg, StarRocks, Flink and Spark, cutting architecture complexity, resource and development costs by two‑thirds while supporting trillions of daily events with sub‑second query latency.

Big Data · ClickHouse · Flink
0 likes · 22 min read