AI Large-Model Wave and Transformation Guide

May 29, 2026 · Big Data

How to Solve Data Governance + AI Agent Pitfalls: Agent Roles, NL2SQL Datasets, and Rule Templates Explained

The article analyzes why data‑governance projects still fail when combined with AI, presents a four‑layer NL2SQL architecture, details agent responsibilities, metadata‑governance methods, anomaly‑diagnosis and permission‑control flows, outlines dataset‑building stages, evaluation metrics, and provides a step‑by‑step rollout roadmap.

AI agent · Anomaly Detection · Dataset Construction

0 likes · 21 min read

DataFunTalk

May 28, 2026 · Big Data

How Xiaohongshu Evolved Its Data Architecture for the Big AI Data Era

Xiaohongshu transformed its data platform from simple ClickHouse‑based ad‑hoc analysis to a Lambda‑style architecture and finally to a lakehouse with generic incremental compute, cutting architecture complexity, resource costs, and development costs by one‑third while answering queries over trillions of rows within seconds.

Big Data · ClickHouse · Data Architecture

0 likes · 21 min read

DataFunTalk

May 26, 2026 · Big Data

How MaxCompute Evolves into an AI‑Ready Data Platform: Architecture, Core Capabilities, and Real‑World Cases

The article details MaxCompute's transformation into a cloud‑native, AI‑centric data warehouse, covering multi‑modal storage, model management, heterogeneous CPU/GPU scheduling, SQL AI functions, the MaxFrame Python framework, and several production case studies that demonstrate performance gains of up to 50% and elastic resource scaling to 160,000 cores.

Data+AI · Distributed computing · Large‑model preprocessing

0 likes · 13 min read

Big Data Tech Team

May 25, 2026 · Big Data

AI Large Models Meet Data Warehouses: 3 Core Use Cases, 5 Common Pitfalls, and Best Practices

The article analyzes how AI large models can transform data‑warehouse development through three practical scenarios—automated modeling, intelligent data cleaning, and ops optimization—while exposing five frequent implementation traps and offering concrete best‑practice recommendations to achieve cost reduction, efficiency gains, and quality improvement.

AI large models · Automated modeling · Best Practices

0 likes · 10 min read

DataFunSummit

May 25, 2026 · Big Data

How Hisense Built an AI‑Ready Multimodal Data Platform: Storage, Governance, and Development

This article details Hisense's journey to create an AI‑ready multimodal data platform, covering the challenges of integrating diverse business systems, the shift from a Hadoop‑based architecture to a cloud‑native data lake, the JuData governance and development platform, and six practical scenarios that demonstrate unified ingestion, metadata management, rule‑based quality control, intelligent asset retrieval, and future AI‑driven DataOps capabilities.

AI Platform · Cloud Native · DataOps

0 likes · 23 min read

DataFunTalk

May 25, 2026 · Big Data

MaxCompute’s AI‑Ready Evolution: Architecture, Features, and Real‑World Use Cases

This article examines how Alibaba Cloud’s MaxCompute platform has been transformed for AI workloads, detailing its multi‑layer architecture, multimodal data storage, SQL AI functions, the Python‑based MaxFrame framework, and real‑world deployments in large‑model preprocessing, autonomous driving, and multimodal image labeling.

AI · Big Data · Distributed computing

0 likes · 12 min read

Big Data Tech Team

May 24, 2026 · Big Data

Data Warehouse Interview Pitfall Guide 2.0: Avoid Common SQL, Modeling, and ETL Mistakes

This guide compiles the most frequent interview pitfalls for data warehouse roles, covering SQL join and aggregation errors, window function misuse, subquery versus CTE performance myths, dimensional modeling mistakes, SCD implementation traps, layered design issues, data quality handling, ETL traps, Hive and Spark performance questions, real‑time warehousing considerations, and effective interview strategies.

Big Data · ETL · Hive

0 likes · 3 min read

DataFunSummit

May 22, 2026 · Big Data

How OPPO Accelerates Multimodal Data & AI Fusion with Gravitino and Curvine

OPPO tackles explosive multimodal data growth by unifying metadata with Gravitino and boosting I/O performance using the open‑source Curvine cache, delivering a four‑layer data‑lake architecture that resolves data islands, metadata chaos, and bandwidth bottlenecks while achieving near‑commercial query speeds.

Curvine · Distributed Cache · Gravitino

0 likes · 11 min read

DataFunTalk

May 22, 2026 · Big Data

How Xiaohongshu Cut Data Architecture Complexity and Cost by One‑Third in the Big AI Data Era

The article details Xiaohongshu's evolution from a simple ClickHouse‑based analytics layer to a Lambda‑enabled 2.0 stack and finally a Lakehouse‑based 3.0 architecture, showing how each iteration reduced infrastructure complexity, resource consumption and development effort by roughly one‑third while supporting trillions of daily events and AI‑driven use cases.

Big Data · ClickHouse · Data Architecture

0 likes · 21 min read

DataFunTalk

May 21, 2026 · Big Data

How Bitmap‑Based High‑Table Architecture Powers Million‑Scale User Profiling and Real‑Time Crowd Selection

The article explains how a bitmap‑driven high‑table design (SelectDB) overcomes wide‑table storage bloat and latency to enable millisecond‑level crowd selection for tens of millions of users with hundreds of tag dimensions, while supporting dynamic tag expansion.

bitmap · crowd selection · real-time analytics

0 likes · 2 min read

DataFunSummit

May 21, 2026 · Big Data

Alibaba Cloud’s Agent-Ready Big Data AI Infrastructure: Boosting Data Development from Hours to Minutes

With a projected 85% of enterprises expected to deploy internal agents within two years, Alibaba Cloud proposes an Agent-Ready big‑data AI infrastructure, comprising a unified data lake, real‑time processing, high‑dimensional vector retrieval, elastic model serving, and comprehensive security governance. It has already cut data‑development cycles from hours to 5‑10 minutes in internal model‑training and Taobao flash‑sale scenarios.

AI · Agent-Ready · Big Data

0 likes · 15 min read

DataFunSummit

May 20, 2026 · Big Data

How Kuaishou’s Real‑Time Data Lake Boosts AI and BI Architecture

The article explains how Kuaishou adopted Apache Hudi to overhaul its ODS‑based data lake, addressing latency, storage cost, and complexity for AI and BI workloads. It details the evolution from MySQL‑to‑Hive to MySQL‑to‑Hudi 1.0 and 2.0, the resulting performance gains and cost savings, and the future roadmap.

AI · BI · Big Data

0 likes · 20 min read

AntTech

May 20, 2026 · Big Data

SIGMOD 2026: Shared Computation for Query Subgraph Matching & Fast MPC Shortest Paths

This article reviews two SIGMOD 2026 papers—MASC, which redefines multi‑query subgraph matching by maximizing shared computation to achieve up to two orders of magnitude speedup, and PrivHop, which combines 2‑hop labeling with secure multi‑party computation to enable privacy‑preserving shortest‑path queries on million‑node graphs with roughly a million‑fold reduction in runtime and communication.

MPC · graph algorithms · privacy-preserving

0 likes · 5 min read

Big Data Tech Team

May 19, 2026 · Big Data

Enterprise Data Warehouse Development Playbook: Standard Engineering Edition

This playbook provides enterprise‑level data warehouse engineers, ETL developers, data modelers, and data‑team managers with a complete, logical, and actionable set of standards, processes, and best‑practice guidelines covering architecture, development principles, role responsibilities, end‑to‑end workflow, metadata, security, performance metrics, and team collaboration.

ETL · Performance · data modeling

0 likes · 18 min read

dbaplus Community

May 14, 2026 · Big Data

Building a ‘One‑Sentence Bank’: Big Data and AI Fusion for Small Banks

The article outlines the evolution of big data in banking, compares management models for heterogeneous data, describes the shift from data engineering to knowledge engineering, introduces LLMOps for high‑quality knowledge bases, and details how integrating AI and data can enable a “one‑sentence bank” that answers queries and executes tasks.

Artificial Intelligence · Big Data · Knowledge Engineering

0 likes · 22 min read

DataFunSummit

May 14, 2026 · Big Data

How Gravitino, Daft, and Lance Enable Secure, AI‑Driven Multimodal Lakehouse

The article examines the challenges of multimodal data in modern lakehouses and presents a three‑tool stack—Gravitino, Daft, and Lance—that provides unified metadata, distributed multimodal compute, and high‑performance storage, while detailing security governance, integration paths, and future directions.

Daft · Gravitino · Lakehouse

0 likes · 11 min read

vivo Internet Technology

May 13, 2026 · Big Data

How Vivo Upgraded a Million‑Node YARN Cluster: Architecture, Scheduler Switch, and Performance Optimizations

This article details Vivo's end‑to‑end upgrade of a YARN 2.6.0 cluster to a modern version on a platform with a million nodes running a hundred thousand tasks per day, covering architectural evolution, scheduler migration, compatibility fixes, performance tuning, and service‑continuity strategies.

Big Data · Capacity Scheduler · Cluster Upgrade

0 likes · 28 min read

DeWu Technology

May 13, 2026 · Big Data

How BP Claw Solves AI Coding Input Challenges in FlinkSpec’s Real‑Time Data Warehouse

The article explains how BP Claw tackles unstable AI coding results by automatically converting low‑quality PRD documents into structured, high‑quality requirements, applying token‑saving strategies, strict hallucination guards, and multi‑skill orchestration, which together boost FlinkSpec’s real‑time data‑warehouse delivery efficiency by up to 30%.

AI coding · BP Claw · Big Data

0 likes · 17 min read

Architect's Guide

May 13, 2026 · Big Data

Next‑Gen Visual Drag‑Drop Data Flow Platform: Features, Architecture, and Performance

The article introduces a visual drag‑and‑drop data flow platform that unifies stream and batch processing, offers version control, automatic fault tolerance, configurable data permissions, comprehensive monitoring, data alignment, and query templates, and presents single‑instance performance benchmarks of over 30k and 60k ops/s.

Data Alignment · Data Flow · Drag-and-Drop

0 likes · 7 min read

DataFunTalk

May 11, 2026 · Big Data

How Xiaohongshu Re‑engineered Its Data Architecture for the Big AI Data Era

Xiaohongshu transformed its data platform from a simple ClickHouse‑based ad‑hoc analysis to a Lambda‑style architecture and finally to a lakehouse built on Iceberg, StarRocks, Flink and Spark, cutting architecture complexity, resource and development costs by two‑thirds while supporting trillions of daily events with sub‑second query latency.

Big Data · ClickHouse · Flink

0 likes · 22 min read
