Tag

Real-time Ingestion

DataFunSummit
May 17, 2024 · Big Data

Comprehensive Hudi Real-Time Data Lake Ingestion Solutions

This article presents a complete guide to Hudi-based real-time data lake ingestion, covering overall data integration architecture, batch and streaming ingestion strategies, advanced table design, and practical recommendations for handling challenges such as deduplication, latency, partitioning, and performance optimization.

Batch Processing · Data Lake · Hudi
12 min read
DataFunTalk
Oct 28, 2023 · Big Data

Data Lake Architecture, Ingestion Options, Real-time Optimization, and Query Practices

This article presents a comprehensive overview of a unified data lake architecture, evaluates three ingestion solutions, details real‑time ingestion optimizations for Flink‑Hudi pipelines, and describes how Kyuubi enables unified query access across multiple engines, offering practical guidance for large‑scale data processing.

Data Lake · Flink · Hudi
14 min read
DataFunTalk
Mar 29, 2023 · Big Data

Evolution of ByteHouse Real‑Time Ingestion: From Internal Demands to a Cloud‑Native Architecture

This article details the motivation, architectural evolution, and technical implementation of ByteHouse's real‑time ingestion pipeline, covering internal business requirements, distributed‑system challenges, the custom HaKafka engine, memory‑table optimizations, and the transition to a cloud‑native design that delivers high availability, low latency, and exactly‑once semantics.

ByteHouse · Cloud Native · Distributed Architecture
13 min read
DataFunTalk
Nov 3, 2020 · Big Data

Xiaomi Growth Analytics System: Architecture Evolution and Doris Optimization

The article details the evolution of Xiaomi's growth analytics platform from a Lambda architecture built on SparkSQL, Kudu, and HDFS to a streamlined MPP solution based on Apache Doris, covering performance gains, real‑time data ingestion, query tuning, and operational improvements for large‑scale analytics.

Apache Doris · Growth Analytics · OLAP
20 min read
Big Data Technology Architecture
May 21, 2020 · Big Data

Near Real-Time Ingestion, Analysis, Incremental Pipelines, and Data Distribution with Apache Hudi

The article explains how Apache Hudi enables near‑real‑time data ingestion from various sources, supports low‑latency analytics, provides incremental processing pipelines, and simplifies data distribution on Hadoop, improving efficiency and reducing operational complexity.

Apache Hudi · Hadoop · Incremental Processing
6 min read