Tag

Incremental Processing


DataFunSummit
Sep 26, 2024 · Big Data

Apache Hudi Incremental Processing and Change Data Capture (CDC): Overview, Incremental Query, and CDC

This article explains Apache Hudi's incremental processing capabilities, covering an overview of the medallion architecture, detailed configuration for incremental queries, the introduction of Change Data Capture (CDC) with required table properties, and a review of how these features enable richer data insights in modern data lake environments.

Apache Hudi · Change Data Capture · Incremental Processing
9 min read
DataFunTalk
Jun 24, 2023 · Big Data

Design and Architecture of MaxCompute Lakehouse Near‑Real‑Time Incremental Processing

This article explains the evolution of Alibaba Cloud's MaxCompute platform into a lakehouse architecture that supports near‑real‑time incremental processing, detailing its development history, core design of transactional tables, five‑module technical stack, data ingestion methods, optimization services, transaction management, query capabilities, ecosystem integration, practical applications, future roadmap, and common user questions.

Incremental Processing · Lakehouse · MaxCompute
24 min read
Shopee Tech Team
Sep 2, 2022 · Big Data

Shopee Data System Challenges and Apache Hudi Practices

Shopee tackled its data‑system bottlenecks by customizing Apache Hudi to provide unified stream‑batch integration, efficient state‑detail snapshots, and low‑latency wide‑table generation, using CDC‑based bootstrapping, COW/MOR tables, savepoints and partial updates, which cut latency to ten minutes, lowered resource use, and yielded several community‑backed enhancements.

Apache Hudi · Big Data · Incremental Processing
18 min read
Big Data Technology Architecture
Aug 23, 2022 · Big Data

Comparative Analysis of Apache Hudi, Delta Lake, and Apache Iceberg for Lakehouse Architectures

This article examines the technical differences and feature sets of Apache Hudi, Delta Lake, and Apache Iceberg, highlighting incremental pipelines, concurrency control, merge‑on‑read storage, partition evolution, multi‑mode indexing, and real‑world use cases to help practitioners choose the most suitable lakehouse solution for their workloads.

Apache Hudi · Apache Iceberg · Delta Lake
18 min read
DataFunTalk
May 14, 2021 · Big Data

Real‑time Billion‑Scale Data Transmission and AI Pipeline Architecture at Bilibili

This article presents a technical deep‑dive into Bilibili’s evolution from offline to real‑time data processing, describing the challenges of timeliness, ETL, AI feature engineering, and the design of a Flink‑on‑YARN incremental pipeline that supports trillion‑scale message throughput and AI‑driven real‑time applications.

AI · Flink · Incremental Processing
27 min read
Big Data Technology Architecture
May 21, 2020 · Big Data

Near Real-Time Ingestion, Analysis, Incremental Pipelines, and Data Distribution with Apache Hudi

The article explains how Apache Hudi enables near‑real‑time data ingestion from various sources, supports low‑latency analytics, provides incremental processing pipelines, and simplifies data distribution on Hadoop, improving efficiency and reducing operational complexity.

Apache Hudi · Hadoop · Incremental Processing
6 min read
Architect
May 12, 2020 · Big Data

An Overview of Apache Hudi: Architecture, Concepts, and Query Types

Apache Hudi is an open‑source data‑lake framework that leverages Spark and Hadoop‑compatible storage to provide efficient ingestion, incremental processing, and multiple query modes such as snapshot, incremental, and read‑optimized for large analytical datasets.

Apache Hudi · Incremental Processing · Query Types
11 min read
Big Data Technology Architecture
May 10, 2020 · Big Data

Understanding Apache Hudi: Incremental Processing and Low‑Latency Data Management on Hadoop

This article explains how Apache Hudi provides an incremental processing framework that enables efficient, low‑latency data ingestion, storage, and query capabilities on Hadoop, detailing its architecture, storage layout, compaction, write and read paths, and support for real‑time and batch analytics.

Data ingestion · Hadoop · Hudi
15 min read
Big Data Technology Architecture
Mar 22, 2020 · Big Data

Hudi Overview: Design, Architecture, and Use Cases from Uber

This article presents a comprehensive overview of Apache Hudi, covering its background, design motivations, architecture, view types, performance trade‑offs, compaction mechanisms, concurrency guarantees, and real‑world usage at Uber for managing petabyte‑scale data lakes.

Hudi · Incremental Processing · Upsert
11 min read
Big Data Technology Architecture
Mar 16, 2020 · Big Data

Understanding Apache Hudi: Concepts, Architecture, Usage, and Best Practices

This article introduces Apache Hudi, explains its architecture and storage models, describes how it enables upserts and incremental queries on Hadoop, provides step‑by‑step guidance for integrating Hudi with Apache Spark, and outlines best practices and comparisons with Apache Kudu.

Apache Hudi · Hadoop · Incremental Processing
10 min read