Tagged articles

182 articles

May 21, 2026 · Big Data

Alibaba Cloud’s Agent-Ready Big Data AI Infrastructure: Boosting Data Development from Hours to Minutes

With 85% of enterprises projected to deploy internal agents within two years, Alibaba Cloud proposes an Agent-Ready big‑data AI infrastructure (a unified data lake, real‑time processing, high‑dimensional vector retrieval, elastic model serving, and comprehensive security governance) that has already cut data‑development cycles from hours to 5‑10 minutes in internal model‑training and Taobao flash‑sale scenarios.

AI · Agent-Ready · Big Data

15 min read

Lao Guo's Learning Space

Apr 29, 2026 · Big Data

Designing a Full-Stack Credit Data System: From Ingestion to Real-Time Decision

The article dissects a credit data system architecture, detailing six logical layers—from multi-source data collection and feature engineering (including graph features and feature stores) to model training, real‑time stream processing, decision engine integration, and privacy‑preserving computation—while explaining the trade‑offs, tools, and performance targets needed for accurate, low‑latency risk assessment.

Credit Scoring · Feature Store · Flink

16 min read

JD Retail Technology

Jan 5, 2026 · Big Data

How JD’s Data Lake Uses Hudi LSM‑Tree to Power Near‑Real‑Time Data Assets

The article details JD’s data lake architecture, its 500 PB scale, self‑developed Hudi extensions—including LSM‑Tree‑based MoR tables, custom indexing, IO optimizations, Flink stream scheduling, and NativeIO SDK—along with benchmarks, community contributions, and future roadmap for real‑time big‑data processing.

Big Data · Hudi · LSM‑Tree

19 min read

Alibaba Cloud Observability

Dec 29, 2025 · Cloud Native

How to Seamlessly Import Massive S3 Logs into Alibaba Cloud SLS with Real‑Time Analysis

This article explains how to centralize and analyze massive multi‑cloud log data stored in object storage by moving AWS S3 logs into Alibaba Cloud Log Service (SLS) using dual‑mode file discovery, SQS event‑driven import, elastic scaling, and pre‑ingestion processing to achieve low latency, high reliability, and cost efficiency.

AWS S3 · Real-time Processing · alibaba-sls

12 min read

DataFunSummit

Dec 10, 2025 · Big Data

How Apache Hudi Powers the Next‑Gen AI‑Native Lakehouse: Insights from the Asia Meetup

The article recaps the Apache Hudi Asia Meetup hosted by JD, covering community updates, JD's data‑lake challenges, the upcoming Hudi 1.1 release, JD's architectural redesign, Kuaishou's real‑time lake adoption, and Huawei Cloud's deep optimizations, all aimed at building an AI‑native, real‑time lakehouse.

AI-native · Apache Hudi · Flink

13 min read

Instant Consumer Technology Team

Oct 29, 2025 · Big Data

Revolutionizing Feature Engineering with Distributed Tech & Configurable Services

Facing PB‑scale user behavior data and millions of feature dimensions, the platform transformed its search, advertising, and recommendation pipelines by adopting a distributed, configurable‑service architecture that delivers high‑throughput streaming, elastic storage, rapid feature iteration, and robust fault‑tolerance for AI‑driven personalization.

Big Data · Data Architecture · Distributed Systems

17 min read

Huolala Tech

Oct 22, 2025 · Backend Development

Scaling Real‑Time Reconciliation with Dynamic Kafka Consumer Clusters

To ensure fund safety and robust operations, the team built a Kafka‑based real‑time reconciliation platform. After hitting scaling bottlenecks with a static consumer model, they implemented a dynamic consumer cluster with partition‑level, weighted load balancing that supports automatic scaling and high‑throughput processing.

Backend Architecture · Distributed Systems · Dynamic Scaling

15 min read

JD Retail Technology

Aug 8, 2025 · Big Data

How JD.com Transformed Its Traffic Data Pipeline from Lambda to a Lakehouse Architecture

This article examines JD.com's migration of its massive traffic data processing from a dual Lambda architecture to an integrated lakehouse solution, detailing the challenges, innovative optimizations with Flink and Hudi, performance gains, cost reductions, and future directions for real‑time data handling.

Big Data · Flink · Hudi

10 min read

58 Tech

Aug 7, 2025 · Big Data

Transform Real‑Time Data Warehousing with Paimon: From Flink ROW_NUMBER to Streaming Lakehouse

This article details how a real‑time data warehouse built on Flink, Kafka, HBase and MySQL was redesigned using Paimon to eliminate costly deduplication, handle out‑of‑order events, enable streaming reads, simplify aggregation, replace multiple lookup sources, and achieve faster, more reliable batch repairs, resulting in major resource and operational gains.

Data Warehouse · Flink · Lakehouse

24 min read

Alibaba Cloud Big Data AI Platform

Aug 5, 2025 · Big Data

How Alibaba Built a World‑Class Big Data Platform Over a Decade

Over ten years, Alibaba’s data engineers transformed a modest Hadoop‑based system into a globally‑scalable, high‑performance big data platform—ODPS/MaxCompute—supporting massive offline and real‑time workloads, pioneering innovations like the 5K cluster expansion, Blink streaming, and the unified ‘Moon’ migration.

Alibaba · Big Data · Data Platform

25 min read

Linux Cloud Computing Practice

May 29, 2025 · Big Data

Why Learn Kafka? Core Benefits, Use Cases, and Key Interview Topics

This article explains why Kafka is essential for modern data engineering, highlighting its widespread adoption, high throughput, scalability, durability, integration with streaming ecosystems, and common real‑time use cases, while also providing a concise list of interview topics for aspiring engineers.

Real-time Processing · Streaming · data pipelines

6 min read

Full-Stack Internet Architecture

May 27, 2025 · Big Data

Understanding Event Streaming in Kafka: Core Concepts, Architecture, and Use Cases

This article explains Kafka's event streaming concept, detailing events and streams, core components such as producers, topics, partitions, consumers, persistence, and typical real‑time data pipeline, event‑driven architecture, stream processing, and log aggregation use cases, highlighting its role as a foundational big‑data infrastructure.

Event Streaming · Kafka · Real-time Processing

7 min read

Full-Stack Internet Architecture

May 20, 2025 · Big Data

Why Learn Kafka? Core Benefits, Use Cases, and a Summary

This article explains why Kafka is widely adopted by top companies, outlines its high throughput, scalability, and durability, and describes key real‑time data pipeline, stream processing, and big‑data integration scenarios, concluding that mastering Kafka is essential for modern backend and data engineering roles.

Kafka · Real-time Processing · data engineering

4 min read

php Courses

Apr 7, 2025 · Backend Development

Implementing Sliding Window Algorithms in PHP for Real-Time Data Processing

This article introduces the sliding window technique, demonstrates efficient PHP implementations for computing averages and handling real-time streams, provides optimization strategies, and outlines practical applications such as financial analysis, network monitoring, and recommendation systems, highlighting performance considerations for backend development.

PHP · Real-time Processing · Sliding Window

6 min read

Alibaba Cloud Observability

Apr 1, 2025 · Cloud Native

Shift Data Processing Left with SPL: Low‑Code, High‑Performance Cloud‑Native Solutions

This article explains how SPL rule consumption moves data cleaning and preprocessing to the server side, enabling low‑code, high‑performance, cloud‑native real‑time processing that reduces client complexity, latency, and bandwidth costs while integrating with services like Flink and DataWorks.

Log Service · Low‑code · Real-time Processing

10 min read

Alibaba Cloud Native

Mar 25, 2025 · Cloud Native

Shift Data Cleaning Server‑Side with SPL: Boost Real‑Time Log Processing

Alibaba Cloud Log Service’s new SPL‑based rule consumption lets users move complex data‑cleaning logic from client code to the server, offering low‑code configuration, high performance, precise filtering, and significant reductions in latency, bandwidth, and compute resources across typical scenarios such as Python SDK processing and Flink integration.

Data cleaning · Log Service · Low‑code

11 min read

AntData

Mar 20, 2025 · Big Data

Design and Optimization of Real‑time Data Lake Tables with Paimon and Flink for Advertising Diagnostics

This article presents a comprehensive exploration of using Apache Paimon and Flink to design lake tables that support minute‑level latency, low cost, and unified batch‑stream processing for advertising data, covering schema design, partitioning strategies, performance trade‑offs, cost analysis, and operational best practices.

Big Data · Flink · Paimon

34 min read

Zhuanzhuan Tech

Mar 13, 2025 · Backend Development

Design and Implementation of a Real-Time Product Tagging Platform for a Second‑Hand E‑Commerce System

This article presents a comprehensive technical case study of a three‑layer product‑tagging platform that addresses the challenges of fine‑grained operations, ensures real‑time tag updates, guarantees data consistency, and eliminates read bottlenecks through traffic separation, event‑driven processing, deduplication MQ, and multi‑level caching.

Backend Architecture · Caching · Data Consistency

13 min read

Huolala Safety Emergency Response Center

Jan 9, 2025 · Information Security

Detecting API Anomalous Traffic with Big Data and Machine Learning

This article outlines a comprehensive approach to API anomaly detection, covering background, objectives, a two‑layer framework with offline and real‑time feature pipelines, threshold profiling, detection methods, strategy types, and operational practices to mitigate data leakage and abuse.

Anomaly Detection · Big Data · Real-time Processing

10 min read

Alibaba Cloud Big Data AI Platform

Oct 25, 2024 · Big Data

How Real-Time Flink Powers Automotive Big Data: Architecture & Case Studies

This article, based on Alibaba Cloud expert Li Lubing’s presentation, examines the rapid growth of China’s new energy vehicle market, outlines typical automotive big‑data architectures, compares Lambda and real‑time lakehouse solutions built with Flink and Apache Paimon, and showcases real‑world customer deployments.

Automotive · Big Data · Cloud Computing

18 min read

JD Retail Technology

Oct 11, 2024 · Big Data

JD Retail Data Lake Architecture: Challenges, Optimizations, and Future Plans

This article presents JD Retail's data lake architecture overhaul, detailing the shortcomings of the Lambda model, the migration to Flink‑Hudi‑Spark pipelines, performance gains, storage savings, unified APIs, and upcoming improvements for resilience and automation.

Big Data · Flink · Hudi

11 min read

DataFunSummit

Aug 30, 2024 · Big Data

Kuaishou's Data Lake Journey with Apache Hudi: Architecture Evolution, Use Cases, and Lessons Learned

The article details Kuaishou's adoption of a data lake powered by Apache Hudi, covering the challenges of growing data warehouses, the migration from Hive to Hudi, concrete business case studies, promotion strategies, and key takeaways for large‑scale data engineering.

Apache Hudi · Big Data · Data Warehouse

12 min read

Mike Chen's Internet Architecture

Jul 15, 2024 · Big Data

Master Distributed Computing: Hadoop, Spark, and Flink Explained

This article introduces the fundamentals of distributed computing, compares major frameworks such as Hadoop, Spark, and Flink, and outlines their key components, performance characteristics, and typical application scenarios including big‑data analytics, cloud services, real‑time streaming, and scientific computing.

Big Data · Distributed computing · Flink

7 min read

21CTO

Jul 15, 2024 · Big Data

Twitter’s Kappa Architecture: Scaling Real-Time Processing of Billions of Events

Twitter migrated from a Lambda-based dual‑pipeline system to a Kappa architecture that relies on a single real‑time stream using Kafka, Google Pub/Sub, Dataflow, and BigTable, dramatically reducing latency, increasing throughput, and improving data accuracy for processing billions of daily events.

Big Data · Cloud Computing · DataFlow

9 min read

DataFunSummit

Jul 1, 2024 · Big Data

Optimizing JD Retail Data Architecture: From Lambda to Real‑time Unified Processing with Flink, Hudi, and StarRocks

This article details JD Retail's transition from a complex Lambda architecture to a unified real‑time data pipeline using Flink, Hudi, and StarRocks, addressing data completeness versus latency, reducing maintenance costs, improving storage efficiency, and delivering faster, more consistent analytics for business users.

Data Warehouse · Flink · Hudi

13 min read

DataFunTalk

May 13, 2024 · Big Data

Data Integration Maturity Model: From ETL to EtLT

The article examines the evolution of data integration architectures—from traditional ETL through ELT to the emerging EtLT model—highlighting their advantages, disadvantages, industry trends, maturity stages, and practical guidance for enterprises and professionals navigating modern big‑data pipelines.

Big Data · DataOps · ELT

31 min read

iQIYI Technical Product Team

Apr 26, 2024 · Big Data

iQIYI Real-time Lakehouse: Stream‑Batch Unified Architecture

iQIYI replaced its costly Lambda architecture with a unified Iceberg‑based lakehouse that combines Flink streaming and batch processing, cutting data latency from hours to minutes, supporting thousands of tables via a multi‑table sink, guaranteeing completeness, and saving millions of RMB in operational costs.

Flink · Iceberg · Real-time Processing

18 min read

DataFunSummit

Mar 17, 2024 · Big Data

OPPO Smart Data Lakehouse: Architecture, Real‑time Lakehouse, and Technical Practices

This article presents OPPO's smart data lakehouse solution, describing its massive EB‑scale architecture, the integration of batch and streaming engines, the Glacier service for table management, schema‑adaptive ingestion, performance optimizations, and future technical road‑maps for unified data processing.

Big Data · Data Lakehouse · Flink

15 min read

Xiaohongshu Tech REDtech

Mar 4, 2024 · Big Data

Integrating Data Lake Technologies with Data Warehouse Architecture at Xiaohongshu: Practices and Performance Optimizations

Xiaohongshu’s data‑warehouse team integrated Apache Iceberg‑based data‑lake techniques into its existing warehouse, replacing the legacy Hive/Spark stack with global sorting, Z‑order, and upsert‑enabled tables, which cut query latency by up to 90 %, boosted data freshness by 50 %, slashed storage costs by 83 % and saved tens of thousands of GB‑hours of compute daily.

Apache Iceberg · Data Warehouse · Flink

19 min read

DataFunTalk

Feb 27, 2024 · Big Data

Best Practices of Cloud‑Native OLAP Architecture and Logistics Warning at Jushuitan

This article presents Jushuitan's cloud‑native OLAP architecture, detailing its evolution, current big‑data stack—including DataWorks, MaxCompute, Flink, Hologres, and Aerospike—along with logistics warning workflows, rule‑matching mechanisms, real‑time processing challenges, and future scalability plans.

Big Data · Cloud Native · Data Warehouse

20 min read

DataFunTalk

Feb 25, 2024 · Big Data

Implementation Practice of Bilibili's Tag System: Evolution, Architecture, and Future Plans

This article details Bilibili's tag system from its 2021 inception through successive redesigns, describing the three‑layer architecture, data flow pipelines using Hive, Iceberg, Spark and ClickHouse, crowd selection DSL, online services with Redis, performance optimizations, and upcoming governance and quality initiatives.

Big Data · ClickHouse · Real-time Processing

12 min read

DataFunSummit

Jan 25, 2024 · Big Data

Best Practices of Jushuitan Cloud‑Native OLAP Architecture and Logistics Warning

This article presents Jushuitan's cloud‑native OLAP architecture, covering business background, data‑warehouse evolution, real‑time processing with Flink, Hologres, and Aerospike, and detailed logistics‑warning use cases, followed by technical challenges, future outlook, and a Q&A on implementation details.

Big Data · Data Warehouse · Flink

20 min read

vivo Internet Technology

Jan 24, 2024 · Big Data

Evolution of Vivo's Trillions-Scale Data Architecture: Dual-Active Real-Time and Offline Computing

Vivo’s trillion‑scale data platform evolved into a dual‑active real‑time and offline architecture that leverages multi‑datacenter clusters, Kafka/Pulsar caching, a unified sorting layer, HBase‑backed dimension tables, and micro‑batch Spark jobs to deliver low‑cost, high‑performance processing, 99.9% availability, and 99.9995% data‑integrity.

Data Architecture · HBase · Offline Computing

16 min read

Bilibili Tech

Dec 15, 2023 · Artificial Intelligence

Bilibili's AI-Powered Video Frame Interpolation: Techniques, Challenges, and Deployment

Bilibili’s AI‑driven frame‑interpolation pipeline upgrades low‑frame-rate videos to smooth high‑frame-rate 1080p playback by optimizing optical‑flow models for large motion, texture and text artifacts, pruning for speed, and deploying via the BVT SDK across on‑demand and live streams.

AI · Multimedia · Real-time Processing

14 min read

Zhuanzhuan Tech

Dec 14, 2023 · Big Data

Design and Implementation of a Data Service Platform for New Media Business

This article details the background, challenges, design principles, and implementation of a unified data service platform—including data modeling, multi-source governance, real-time processing, and a Doris-based storage solution—to support large‑scale video data for a new media operation.

Apache Doris · Data Platform · Real-time Processing

7 min read

Big Data Technology & Architecture

Nov 13, 2023 · Big Data

Rapid Detection and Resolution of Kafka Data Errors: Ensuring Timeliness, Quality, and Stability

The article examines a real‑world Kafka record error that surfaced after 8 am, outlines how to quickly locate and correct the issue by 10 am while minimizing impact, and presents comprehensive strategies for timeliness, data quality, and stability in real‑time data pipelines.

Flink · Kafka · Monitoring

9 min read

DataFunSummit

Oct 18, 2023 · Big Data

Kuaishou Data Lake Construction with Apache Hudi: Architecture, Challenges, and Solutions

This article explains why Kuaishou built a data lake, outlines the shortcomings of its previous Lambda architecture, describes the adoption of Apache Hudi for unified batch‑stream processing, and details the five major technical challenges and the corresponding solutions implemented to improve performance, consistency, and operational reliability.

Apache Hudi · Big Data · Data Architecture

17 min read

DataFunSummit

Oct 1, 2023 · Big Data

Iceberg Data Lake: Core Features, Xiaomi Use Cases, and Future Plans

This presentation introduces Iceberg's core capabilities, details Xiaomi's practical applications—including log ingestion, near‑real‑time warehousing, offline challenges, column‑level encryption, and Hive migration—and outlines future development directions such as materialized views and cloud migration, providing a comprehensive view of modern data‑lake engineering.

Big Data · Flink · Iceberg

22 min read

Big Data Technology & Architecture

Sep 18, 2023 · Big Data

Unified Real‑Time and Batch Data Warehouse Architecture with Hudi Lakehouse

The article explains the mainstream Lambda data‑warehouse architecture, its benefits and challenges, then introduces Hudi as a lake‑house solution that unifies real‑time and offline storage, describes the multi‑layer service design, and showcases three practical scenarios—stream processing, real‑time multidimensional analysis, and stream‑batch data reuse—demonstrating how the integrated architecture improves latency, cost, and operational complexity.

Batch Processing · Data Warehouse · Hudi

13 min read

21CTO

Sep 8, 2023 · Big Data

Why Real-Time Data Processing Is the Next Frontier for Data Engineers

Real-time data processing transforms traditional batch pipelines by delivering fresh, low‑latency data to millions of concurrent users, leveraging event‑driven architectures, streaming engines, and real‑time databases, with use cases ranging from fraud detection to personalized e‑commerce and operational dashboards, and includes reference architectures and tool recommendations.

Big Data · Real-time Processing · Streaming

16 min read

Qunar Tech Salon

Aug 25, 2023 · Big Data

Customer Data Platform (CDP) at Qunar Travel: Architecture, Construction Practices, and Business Value

This article presents a comprehensive case study of Qunar Travel's Customer Data Platform (CDP), detailing its business background, operational pain points, architectural design, tag production and quality processes, real‑time labeling, crowd selection techniques, deployment safeguards, measurable business impact, and future development directions.

CDP · Customer Data · Qunar Travel

20 min read

Huolala Tech

Aug 3, 2023 · Big Data

Building a Scalable Ad Attribution Platform: Architecture & Real‑Time Data Flow

This article explains how to design and implement a scalable ad attribution platform, covering data collection, real‑time processing with Kafka, storage in HBase, deduplication strategies, attribution models, and configurable media integration to maximize ROI for marketers.

Ad Attribution · HBase · Kafka

25 min read

dbaplus Community

Aug 2, 2023 · Backend Development

How WeChat Built a Scalable Security Data Warehouse for Billions of Requests

This article explains the evolution of WeChat's security data warehouse—from its business background and the need for unified feature storage to the architectural designs, multi‑IDC synchronization, operation system, and data‑quality safeguards that enable reliable, high‑performance security policy development for over a trillion daily feature reads and writes.

Feature Management · Real-time Processing · data quality

12 min read

Ctrip Technology

Jul 20, 2023 · Big Data

Building an Offline‑Online Data Warehouse at Ctrip: Architecture, Goals, and Practices

This article presents Ctrip's practical experience of constructing an offline‑online data warehouse, detailing business pain points, objectives, system architecture, component design, data quality measures, and future directions to achieve scalable, real‑time data processing and management.

Ctrip · Data Warehouse · Flink

9 min read

Top Architect

Jul 14, 2023 · Big Data

Lambda Architecture: Real-Time Big Data Processing and Practical Use Cases

This article introduces the Lambda Architecture for billion‑scale real‑time data analysis, explains its three layers—Batch, Speed, and Serving—covers its flexibility, fault tolerance, and scalability, and demonstrates concrete applications such as Twitter hashtag analysis and a smart‑parking recommendation system.

Batch Layer · Big Data · Lambda architecture

11 min read

ByteFE

Jul 12, 2023 · Artificial Intelligence

Image Processing and WebAssembly: From Basic Filters to OpenCV Applications

This article explores image processing techniques from basic filters to advanced OpenCV applications, demonstrating how WebAssembly enables high-performance image processing in web browsers.

AssemblyScript · Filters · Machine Learning

16 min read

DataFunTalk

Jul 10, 2023 · Big Data

Practical Experience of In‑Lake Warehouse Implementation Based on Lakehouse Architecture

This article presents a comprehensive overview of Lakehouse‑based in‑lake warehousing, covering common data‑lake misconceptions, the evolution from databases to data warehouses and lakes, the advantages of Lakehouse over traditional architectures, a reference multi‑layer architecture, typical use cases, challenges, future plans, and a brief Q&A.

Big Data Architecture · Data Warehouse · Hudi

20 min read

MaGe Linux Operations

Jun 20, 2023 · Big Data

What Is Kafka? A Beginner’s Guide to Distributed Streaming and Messaging

Kafka is an open‑source, distributed streaming platform that uses a publish/subscribe message queue architecture to provide high‑throughput, fault‑tolerant real‑time data processing, featuring topics, partitions, replicas, consumer groups, and multiple APIs for producers, consumers, streams, connectors, and administration.

Big Data · Distributed Streaming · Kafka

20 min read

DataFunTalk

May 15, 2023 · Big Data

Kuaishou Data Lake Construction with Apache Hudi: Architecture, Challenges, and Solutions

This article explains why Kuaishou built a data lake, describes its Hudi‑based architecture, outlines five major challenges encountered during implementation, and presents the solutions and future development plans, illustrating performance improvements and practical use cases across various business scenarios.

Apache Hudi · Big Data · Flink

19 min read

DataFunSummit

Apr 25, 2023 · Big Data

Building a Real-Time Data Lake with Hudi: Architecture, Challenges, and Practices

This article presents Huawei's end‑to‑end solution for constructing a real‑time data lake on Hudi, covering requirement analysis, technology selection, architectural design, ingestion and processing challenges, practical optimizations, and future improvement directions.

ETL/ELT · Flink · Hudi

14 min read

ITPUB

Apr 8, 2023 · Big Data

How Bilibili Cut Data Pipeline Costs by 20% with Flink Real‑Time Incremental Computing

Facing daily terabyte‑scale data ingestion and costly duplicate reads in its ODS‑to‑DWD pipeline, Bilibili introduced a Flink‑based real‑time incremental computation and multi‑level partition shuffling, dramatically reducing read amplification, cutting resource usage by ~20%, improving latency to minutes, and enhancing scalability.

Big Data · Flink · Real-time Processing

19 min read

dbaplus Community

Apr 8, 2023 · Big Data

How Zhihu Built a Scalable DMP: Architecture, Data Pipelines, and Real‑Time Targeting

This article details Zhihu's Data Management Platform (DMP), covering the business problems it solves, the end‑to‑end workflow, feature taxonomy, system architecture, data pipelines for batch and streaming, audience targeting processes, performance challenges, and future technical directions.

Big Data · DMP · Data Platform

8 min read

Tencent Cloud Developer

Mar 8, 2023 · Artificial Intelligence

Building a Scalable Recommendation System for WeChat Games: Architecture and Implementation

The article describes WeChat Games’ scalable recommendation system, detailing its four‑component architecture—offline ML platform, unified management, online DAG‑based engine, and peripheral services—along with a hybrid algorithm library, feature engineering, real‑time monitoring, and solutions that boost engagement across diverse game recommendation scenarios.

Data Management · Real-time Processing · Scalability

28 min read

Baidu Geek Talk

Mar 6, 2023 · Big Data

Accelerating Data Production and Consumption in Baidu's Performance Platform

Baidu's Performance Platform speeds data production and consumption by adopting a unified stream‑batch architecture with TM and Spark, leveraging the Turing warehouse, introducing tiered service grading, robust governance and compliance measures, and offering self‑service analytics, cutting latency from minutes or days to milliseconds while handling billions of daily records and boosting SLA adherence, data accuracy, and user satisfaction.

Big Data · Real-time Processing · data engineering

0 likes · 12 min read

Alibaba Cloud Developer

Feb 28, 2023 · Artificial Intelligence

How a Dual‑Way Sign Language Digital Human Transforms Communication for the Deaf

This article describes the severe shortage of sign‑language teachers worldwide, presents user demographics, outlines the challenges of bidirectional sign‑language translation, and details the cloud‑native AI architecture, data pipeline, and real‑time recognition and synthesis techniques behind the virtual digital human "Sign Language Translator".

AI · Real-time Processing · digital human

0 likes · 17 min read

StarRing Big Data Open Lab

Feb 17, 2023 · Big Data

Inside Xinghuan Tech’s Next‑Gen Big Data 3.0 Architecture: Unified, Cloud‑Native, Real‑Time

This article details Xinghuan Technology’s evolution from 2013 to the present, describing its self‑developed Big Data 3.0 stack—including a unified data platform, SQL‑centric development, cloud‑native resource scheduling, distributed storage managed by Raft, DAG‑based compute engines, and real‑time stream processing—while highlighting key milestones and design principles that differentiate it from traditional Hadoop‑based solutions.

Data Platform · Real-time Processing · SQL Optimizer

0 likes · 19 min read

dbaplus Community

Feb 15, 2023 · Big Data

How Bilibili Scaled User Behavior Analytics with ClickHouse, Flink, and Iceberg

This article details Bilibili's Polaris (北极星) user behavior analysis platform, tracing its evolution from early Spark‑Jar models to Flink‑ClickHouse pipelines and Iceberg‑based full aggregation, and explains the technical solutions for event, retention, funnel, and path analysis, data ingestion, cluster rebalancing, and performance optimizations that enable massive real‑time analytics on billions of daily events.

ClickHouse · Flink · Iceberg

0 likes · 32 min read

DataFunSummit

Jan 10, 2023 · Big Data

Exploring Iceberg in Huawei Terminal Cloud: Architecture, Features, and Future Plans

This article presents a comprehensive overview of Iceberg's adoption in Huawei Terminal Cloud, covering its architectural overview, key features such as Git‑style data management, real‑time processing, acceleration layers, and future development directions, along with a Q&A session addressing performance and implementation details.

Big Data · Flink · Git-style Data Management

0 likes · 15 min read

DataFunTalk

Jan 6, 2023 · Big Data

ZhongAn's Hundred‑Billion‑Scale Data Integration Service: Architecture, Business Support, and Evolution

This article presents the architecture and practical experience of ZhongAn's hundred‑billion‑scale data integration service, covering common integration technologies, business support scenarios for offline and real‑time data, technical challenges, evolution from single‑machine to service‑oriented designs, and future directions using Flink and DataX.

Data Platform · DataX · ETL

0 likes · 31 min read

Architect

Dec 31, 2022 · Big Data

Elasticsearch and Logstash Tutorial: Installation, Configuration, and Flight Data Import

This tutorial explains how to install and configure Elasticsearch and Kibana, demonstrates CRUD operations, bulk data import, and shows how to use Logstash to ingest, transform, and index flight JSON data, covering both batch and near‑real‑time processing techniques.

Elasticsearch · JSON · Java

0 likes · 31 min read

DataFunTalk

Dec 10, 2022 · Big Data

Key Development Trends of Data Warehouses: Standardization, Real‑time Processing, Modularity, and Holistic Evaluation

Based on expert interviews, the article outlines the current development traits of data warehouses—standardization through data governance, real‑time processing, modular architecture, and holistic evaluation—while linking these trends to emerging concepts such as data middle platforms, data lakes, and DataOps.

Real-time Processing · modular architecture

0 likes · 13 min read

21CTO

Nov 9, 2022 · Operations

How Ctrip Handles Billions of Logs Daily: Real‑Time Monitoring, Clog, CAT & TSDB

This article details Ctrip’s large‑scale log monitoring architecture, covering a system overview, the Clog log system, the CAT tracing platform, and the internal TSDB solution, and explains how billions of logs are processed in real time with low latency, high reliability, and efficient querying.

Big Data · Distributed Systems · Log Monitoring

0 likes · 12 min read

ByteDance Data Platform

Oct 28, 2022 · Big Data

How ByteDance’s BitSail is Revolutionizing Data Integration at Scale

BitSail, ByteDance’s open‑source data integration engine built on Flink, has evolved through three major versions to support batch, streaming and CDC modes, handling over 200,000 daily tasks across 20+ data sources, and aims to meet real‑time, cloud‑native integration demands.

Cloud Native · Flink · Open Source

0 likes · 14 min read

DaTaobao Tech

Oct 17, 2022 · Artificial Intelligence

AI Live Stream: Causal Representation Learning and Real-time Color Enhancement

In this AI Live Stream, two Taobao Technology engineers present how causal representation learning enables unbiased data augmentation and factor‑controllable generation to boost fine‑grained image classification, while also unveiling a real‑time color‑enhancement technique that merges cascaded lookup tables with dynamic neural networks, illustrating modern AI trends and practical deployment strategies.

AI Algorithms · Fine-Grained Classification · Real-time Processing

0 likes · 4 min read

Sohu Tech Products

Sep 7, 2022 · Big Data

Introducing the Fire Framework: Annotation‑Driven Development for Spark and Flink

The Fire framework, open‑sourced by ZTO Express, provides a unified annotation‑based programming model for real‑time Spark and Flink jobs, dramatically reducing boilerplate, simplifying configuration, and enabling rapid development of large‑scale data processing tasks with concise Scala code examples.

Fire Framework · Flink · Real-time Processing

0 likes · 12 min read

Big Data Technology & Architecture

Aug 23, 2022 · Big Data

Using Flink Broadcast State for Dynamic Configuration Updates and Real‑Time Data Enrichment

This article explains how Flink's Broadcast State feature can be used to dynamically update processing rules and enrich streaming events with user information from MySQL, showing configuration, code examples, key considerations, and runtime results that demonstrate real‑time adaptability without restarting the job.

Broadcast State · Dynamic Configuration · Flink

0 likes · 15 min read

NetEase LeiHuo UX Big Data Technology

Aug 3, 2022 · Big Data

Understanding Spark Streaming Checkpoint Mechanism for Real‑Time Feature Computation

The article explains how Spark Streaming's checkpoint mechanism works, detailing the four-step process—from setting the checkpoint directory to writing RDD data and finalizing the checkpoint—highlighting its role in ensuring fault‑tolerant, fast recovery for real‑time recommendation feature pipelines.

Big Data · Checkpoint · Real-time Processing

0 likes · 7 min read

Big Data Technology & Architecture

Jul 27, 2022 · Big Data

Step-by-Step Guide to Installing and Using Flink with Iceberg for Real-Time Data Lake

This article provides a comprehensive tutorial on setting up Flink 1.11 with Iceberg 0.11.1, creating Hive catalogs, building databases and tables, inserting data, and exploring Iceberg components, file structures, partitioned tables, execution plans, and programmatic access via Scala.

Big Data · Flink · Hadoop

0 likes · 10 min read

Laravel Tech Community

Jul 19, 2022 · Backend Development

The Evolution and Architecture of China’s 12306 Railway Ticketing System

This article examines the historical development, distributed architecture, and high‑concurrency challenges of China’s 12306 railway ticketing platform, tracing its origins from early Unix‑based systems to modern multi‑layered backend solutions that support hundreds of millions of users during peak travel periods.

Backend Architecture · Distributed Systems · Railway

0 likes · 8 min read

DataFunSummit

Jul 15, 2022 · Big Data

Apache DolphinScheduler Practice at Xinwang Bank

Xinwang Bank leverages Apache DolphinScheduler to handle over 9,000 daily task instances across real‑time, near‑real‑time, and offline batch scenarios, detailing background, application scenarios, optimizations, workflow improvements, import/export enhancements, alert system upgrades, and future plans to expand data‑ops capabilities.

Apache DolphinScheduler · Big Data · DataOps

0 likes · 13 min read

dbaplus Community

Jul 13, 2022 · Big Data

Unpacking the Core Technologies Behind Modern Big Data Platforms

From data ingestion to real‑time analytics, this guide breaks down the essential layers of a typical big‑data platform—covering collection methods, HDFS storage, Hive/Spark analysis, data sharing mechanisms, application use‑cases, streaming with Spark Streaming, and the need for robust scheduling and monitoring.

Big Data · Data Warehouse · HDFS

0 likes · 9 min read

High Availability Architecture

Jun 29, 2022 · Big Data

Interview with Shopee Data Engineer Deng Lin on Lakehouse Architecture and Big Data Trends

During a pre‑GIAC interview, Shopee data engineer Deng Lin discusses the evolution of data lakes and warehouses, lakehouse integration, big‑data technology choices, real‑time processing with Flink and Kafka, and offers career advice for newcomers to the big‑data field.

Big Data · Flink · Kafka

0 likes · 10 min read

DataFunTalk

Jun 23, 2022 · Big Data

Real‑Time Low‑Latency Log Monitoring and Storage at Ctrip: Architecture, Clog System, CAT Tracing, and TSDB

This article details Ctrip's large‑scale, real‑time log monitoring solution, covering the overall monitoring architecture, the Clog log system, the CAT tracing platform, and the TSDB metric store, and explains design choices such as write‑heavy indexing, segment‑based storage, and migration to ClickHouse for high‑cardinality data.

Distributed Systems · Log Monitoring · Real-time Processing

0 likes · 11 min read

DataFunSummit

Jun 6, 2022 · Artificial Intelligence

Event Graphs in Intelligent Customer Service: Concepts, Applications, and System Architecture

This article introduces event graphs as a knowledge‑centric representation of dynamic events, explains their construction and real‑time processing in Meituan's intelligent customer service, and demonstrates applications such as event timeline extraction, hotspot detection, event prediction, multi‑turn dialogue guidance, and business decision support.

AI · Event Schema · Intelligent Customer Service

0 likes · 16 min read

Architecture Digest

May 23, 2022 · Big Data

Overview of Core Technologies in a Big Data Platform Architecture

This article explains the main layers of a typical big data platform—data collection, storage and analysis, sharing, and application—detailing common tools such as Flume, DataX, Hive, Spark, SparkSQL, Impala, and Spark Streaming, and discusses task scheduling and monitoring in the ecosystem.

Data Platform · DataX · Hadoop

0 likes · 10 min read

58 Tech

Apr 26, 2022 · Information Security

Design and Architecture of a Full‑Chain Data Warehouse for Information Security

The article presents a comprehensive design of an end‑to‑end data warehouse for information‑security governance, detailing background motivations, multi‑layer data architecture, dimension modeling, bus‑matrix mapping, real‑time (lambda/kappa) processing, data‑dictionary integration, and future directions toward unified streaming‑batch solutions.

Data Warehouse · Information Security · Real-time Processing

0 likes · 16 min read

Xianyu Technology

Apr 13, 2022 · Big Data

Real-time Multi-system Data Aggregation for Fan Tag System

The Xianyu fan‑tag system must display full‑history purchase counts with real‑time updates while serving low‑latency, high‑throughput queries. It does so by exporting multi‑system data daily to a LevelDB‑based KV store, converting schemas, and applying real‑time compensation from transaction and follow‑change messages; offline and live data are then merged to produce sorted fan lists at about 10k QPS.

KV storage · Real-time Processing · data aggregation

0 likes · 6 min read

Kuaishou Big Data

Feb 25, 2022 · Big Data

How Kuaishou Scales Data Sync: Architecture, Challenges, and Future Plans

This article details the design, evolution, and optimization of Kuaishou's data synchronization platform, covering business overview, architecture, key technologies, performance tuning, data source protection, incremental data lake integration, and future roadmap for a unified data fabric.

Big Data · Real-time Processing · architecture

0 likes · 15 min read

dbaplus Community

Feb 15, 2022 · Big Data

Mastering Data Warehouse Architecture: Concepts, Modeling Techniques, and Real‑Time Strategies

This comprehensive guide explains data warehouse fundamentals, architecture layers, modeling methods such as dimensional and entity modeling, metadata management, and the transition from offline to real‑time processing with Lambda and Kappa architectures, providing practical steps, best practices, and key terminology for building robust analytical platforms.

Big Data · Data Warehouse · ETL

0 likes · 63 min read

Kuaishou Tech

Jan 27, 2022 · Artificial Intelligence

Kuaishou’s Self‑Developed Green‑Screen Matting Algorithm and Its Deployment in Kuaiying, Live Companion, and Cloud Editing

This article explains the principles, challenges, and implementation details of Kuaishou’s proprietary green‑screen matting algorithm, covering fine‑detail handling, color‑spill reduction, green‑reflection removal, and its real‑time deployment across mobile video‑editing and live‑streaming products.

Kuaishou · Real-time Processing · computer vision

0 likes · 13 min read

DataFunSummit

Dec 13, 2021 · Big Data

Tencent Game Big Data Analysis Engine: Architecture, Practices, and Future Directions

This article presents the design, implementation, and operational experience of Tencent's game big‑data analysis platform, covering its background, the offline, online, and real‑time multi‑dimensional analysis engines, practical use cases, performance optimizations, and future roadmap.

Game Analytics · Real-time Processing · Tencent

0 likes · 14 min read

DataFunSummit

Dec 6, 2021 · Big Data

Design and Performance Optimization of a Real‑Time Billion‑Scale Data Processing Pipeline

This article reviews the background, architecture, and a series of performance‑optimizing techniques—including consumption, batch, storage, and execution‑engine tweaks—applied to a real‑time pipeline that processes hundreds of billions of records daily, and presents the resulting resource savings and latency improvements.

Kafka · Performance Optimization · Real-time Processing

0 likes · 9 min read

Baidu Geek Talk

Nov 24, 2021 · Big Data

Building Big Data Infrastructure at Baidu Aifanfan: Architecture Practices and Lessons Learned

At Baidu Aifanfan, the data team built a unified real‑time and offline big‑data platform—leveraging Watt, Bigpipe, Fengge, AFS and Palo within Lambda/Kappa patterns and a fast‑slow parallel rollout—that cut OLAP query latency from 18 minutes to under 15 seconds, enabled self‑service analytics, and standardized metrics across 15 agile teams.

Apache Doris · Big Data Architecture · Data Warehouse

0 likes · 23 min read

21CTO

Nov 8, 2021 · Big Data

How Baidu iFanFan Built a Real-Time Big Data Platform: Challenges & Lessons

Facing rapid business iteration, Baidu’s iFanFan data team designed a unified real‑time and offline big‑data platform, tackling business, technical, and organizational challenges through Lambda/Kappa architectures, data integration, storage, computation, governance, and scalable analytics to deliver timely, accurate, and valuable data products.

Big Data · Data Architecture · Data Warehouse

0 likes · 33 min read

Java High-Performance Architecture

Oct 12, 2021 · Big Data

Unpacking the Core Technologies Behind Modern Big Data Platforms

This article breaks down a typical big data platform architecture into its four layers—data collection, storage and analysis, sharing, and real‑time computation—detailing the essential tools such as Flume, HDFS, Hive, Spark, DataX, and task scheduling systems that enable scalable, low‑latency data processing and delivery.

Big Data · Data Architecture · DataX

0 likes · 8 min read

IT Architects Alliance

Sep 5, 2021 · Big Data

Big Data Platform Architecture: Core Layers, Technologies, and Practices

This article outlines a typical big data platform architecture, detailing its core layers—data acquisition, storage and analysis, sharing, application, real‑time computation, and task scheduling—while introducing key technologies such as Flume, HDFS, Hive, Spark, DataX, and monitoring considerations.

Big Data · Data Platform · Hadoop

0 likes · 9 min read

Architects' Tech Alliance

Sep 2, 2021 · Big Data

Core Technologies and Architecture of a Big Data Platform

The article outlines a typical big data platform architecture, detailing its core layers—data collection, storage and analysis, sharing, application, real-time computation, and task scheduling—while describing key technologies such as Flume, DataX, HDFS, Hive, Spark, Spark Streaming, and Redis.

Data Architecture · Hadoop · Real-time Processing

0 likes · 9 min read

NetEase Smart Enterprise Tech+

Aug 23, 2021 · Artificial Intelligence

How a Lightweight Neural Network Cuts Transient Noise in Real‑Time Audio

NetEase Cloud Communication’s Audio Lab presents a low‑complexity neural‑network denoising algorithm that effectively suppresses both stationary and transient noises while preserving speech quality, detailing its mathematical model, feature design, loss function, GRU‑based architecture, real‑time performance, and comparative evaluation against state‑of‑the‑art methods.

Neural Network · Real-time Processing · audio denoising

0 likes · 13 min read

JD Retail Technology

Aug 12, 2021 · Big Data

Design and Implementation of JD Mini‑Program Custom Data Analysis Service

This article presents the technical solution and key processes of JD's mini‑program custom data analysis service, covering business background, ClickHouse‑based storage design, real‑time processing pipelines, dynamic rule parsing, table architecture, monitoring mechanisms, and future outlook for large‑scale data analytics.

ClickHouse · Custom Data Analysis · Data Architecture

0 likes · 13 min read

Volcano Engine Developer Services

Aug 3, 2021 · Big Data

Inside ByteDance’s Traffic Platform: Powering Trillions of Real‑Time Events

This article, compiled from a Volcano Engine meetup, explains how ByteDance’s unified traffic platform designs, governs, and processes massive event‑tracking data in real time, covering embedding content solutions, link architecture, dynamic processing engines, and data‑governance practices that support trillions of daily events.

Big Data · Real-time Processing · data engineering

0 likes · 16 min read

ITPUB

Jul 7, 2021 · Big Data

How NetEase Cloud Music Scaled Its Data Warehouse for Billion‑User Traffic

This article details NetEase Cloud Music's journey of redesigning its data warehouse and governance processes to support over a billion monthly active users, covering pain points, standardization, shared services, self‑service tools, and the resulting improvements in data quality, latency, and operational efficiency.

Analytics · Data Platform · Real-time Processing

0 likes · 19 min read

Youzan Coder

Jun 30, 2021 · Big Data

Online Monitoring Practices for Offline and Real-Time Data at Youzan

Youzan's Data Report Center monitors offline batch and real‑time data pipelines using accuracy and timeliness rules, cross‑table checks, upstream‑downstream comparisons, and scheduled alerts to detect anomalies early; since 2021 it has generated over 25 alerts, and the team plans a unified data‑quality dashboard.

Big Data · Flink · Hive

0 likes · 12 min read

Yuewen Technology

Jun 25, 2021 · Big Data

Building Yuedu Group’s Overseas Big Data Platform: Architecture, Offline & Real‑Time Processing

This article details how Yuedu Group designed and implemented an overseas big data platform, covering overall system architecture, offline data‑warehouse construction with dimensional modeling, real‑time streaming using Oceanus and ClickHouse, and future plans for cost reduction and data quality assurance.

Big Data · Cloud Computing · Real-time Processing

0 likes · 12 min read

DataFunTalk

Jun 21, 2021 · Big Data

Flink + Iceberg 0.11 Practices in Qunar Data Platform

This article shares Qunar's experience using Flink together with Apache Iceberg 0.11 to address real‑time data warehouse challenges, covering background pain points, Iceberg architecture, solutions for Kafka data loss and Hive latency, and optimization practices such as small‑file handling, sorting, and checkpoint management.

Big Data · Flink · Hive

0 likes · 13 min read

Qunar Tech Salon

Jun 21, 2021 · Big Data

Using Apache Iceberg 0.11 with Flink for Real‑time Data Lake: Architecture, Pain Points, and Solutions

This article examines the challenges of using Kafka, Flink, and Hive for real‑time data warehousing, introduces Apache Iceberg 0.11 as a solution, details its architecture, query planning, Flink integration, code examples, optimization techniques, and summarizes the benefits for large‑scale data processing.

Big Data · Flink · Iceberg

0 likes · 12 min read

IT Architects Alliance

Jun 5, 2021 · Big Data

How to Build a Real‑Time Recommendation System with Flink, HBase, and Docker

This article walks through a complete real‑time recommendation system built on Apache Flink, detailing its v2.0 architecture, modules for user behavior, interest, and product profiling, the recommendation algorithms (hot‑list, collaborative filtering, item similarity), and step‑by‑step Docker deployment of MySQL, Redis, HBase, and Kafka.

Docker · Flink · HBase

0 likes · 11 min read

Big Data Technology Architecture

May 31, 2021 · Big Data

Practical Experience of Using Flink + Iceberg 0.11 on Qunar Data Platform

This article presents Qunar's practical experience with Flink and Iceberg 0.11, covering background challenges such as Kafka data loss and Hive metadata pressure, explaining Iceberg architecture, query planning, and detailed solutions including real‑time ingestion, small‑file handling, sorting, and code examples for seamless migration.

Flink · Iceberg · Real-time Processing

0 likes · 12 min read

IT Architects Alliance

May 22, 2021 · Big Data

Flink-Based Real‑Time Recommendation System: Architecture, Logic, and Docker Deployment Guide

This article presents a comprehensive walkthrough of a Flink‑powered recommendation system, detailing its v2.0 architecture, module functions, recommendation algorithms (hotness, product similarity, collaborative filtering), front‑end and back‑end UI, and step‑by‑step Docker deployment of MySQL, Redis, HBase, and Kafka services.

Big Data · Docker · Flink

0 likes · 11 min read

Tencent Cloud Developer

May 21, 2021 · Big Data

Tencent Cloud Oceanus: Flink SQL Optimization and Extension Practices

Tencent Cloud Oceanus, a computing service powering internal apps like WeChat and external partners such as Bilibili, scales to over 30,000 cores handling 5 PB daily and 500,000 jobs. It tackles Flink SQL’s syntax, function, and operational limits with table‑valued functions, incremental and enhanced tumble windows, and caching‑based retraction optimization that cuts downstream data volume by up to 30× and improves join performance by about 20%.

Big Data · Flink SQL · Oceanus

0 likes · 19 min read

Architecture Digest

May 7, 2021 · Big Data

Comprehensive Overview of Data Middle Platform Architecture and Practices

This article provides a detailed introduction to data middle platform concepts, covering data aggregation, ingestion tools, offline and real‑time development, data governance, service layers, monitoring, and deployment patterns, illustrating how enterprises build unified data ecosystems across various industries.

Big Data · Data Platform · Data Warehouse

0 likes · 25 min read
