Tag: data streaming

Articles collected under this tag.

Sanyou's Java Diary
Dec 2, 2024 · Big Data

Understanding Kafka: Core Architecture, Storage, and Reliability Explained

This article provides a comprehensive overview of Kafka, covering its overall structure, key components such as brokers, producers, consumers, topics, partitions, replicas, leader‑follower mechanics, logical and physical storage models, producer and consumer workflows, configuration parameters, partition assignment strategies, rebalancing, log retention and compaction, indexing, zero‑copy transmission, and the reliability concepts that ensure data durability.

Kafka · Message Queue · Reliability
0 likes · 18 min read
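The keyed partitioning covered in this overview can be sketched in a few lines of Java. Note this is a simplified illustration, not Kafka's actual code: the default partitioner hashes the serialized key with murmur2 (String.hashCode stands in here) and uses sticky round-robin assignment for null keys.

```java
public class PartitionSketch {
    // Simplified keyed partitioning: hash the key, mask off the sign bit,
    // take it modulo the partition count.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always lands on the same partition,
        // which is what preserves per-key ordering in Kafka.
        System.out.println(partitionFor("order-42", 6));
        System.out.println(partitionFor("order-42", 6));
    }
}
```

Because assignment depends only on the key and the partition count, adding partitions later remaps keys, which is why Kafka guarantees ordering only within a partition.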
Code Mala Tang
Jul 5, 2024 · Frontend Development

Master TransformStream: Real-World Uses, Code Samples, and Common Pitfalls

TransformStream, a core component of the Streams API, lets developers process and convert data chunks on the fly. The article walks through examples ranging from simple text uppercase conversion to complex scenarios like compression, video transcoding, and real-time IoT filtering, and covers common pitfalls such as error handling and backpressure.

JavaScript · Node.js · TransformStream
0 likes · 13 min read
Java Architect Essentials
Jun 26, 2024 · Databases

Why Organizations Should Consider Using Apache Kafka Instead of Relational Databases

This article explains why organizations may replace traditional relational databases with Apache Kafka as a system of record, highlighting Kafka's cost-effective, scalable, immutable log, its support for event replay, its flexibility across diverse use cases, and its suitability for highly regulated, data-intensive environments.

Database · Immutable Log · Kafka
0 likes · 10 min read
Big Data Technology Architecture
Nov 28, 2023 · Big Data

Real-time Data Ingestion from MySQL to Apache Doris Using Flink CDC and Doris Flink Connector

This article demonstrates, with step‑by‑step examples, how to capture MySQL changes via Flink CDC and stream them in real time into Apache Doris using the Doris Flink Connector, covering CDC concepts, connector features, environment setup, SQL client usage, and data verification.

Apache Doris · CDC · Connector
0 likes · 13 min read
Sanyou's Java Diary
Sep 21, 2023 · Big Data

Understanding Kafka: Core Concepts, Architecture, and Reliability Explained

This article provides a comprehensive overview of Kafka, covering its overall architecture, key components such as brokers, producers, consumers, topics, partitions, replicas, and ZooKeeper, as well as logical and physical storage mechanisms, producer and consumer workflows, configuration parameters, partition assignment strategies, rebalancing, and the replication model that ensures data reliability.

Kafka · Message Queues · Reliability
0 likes · 18 min read
Architects Research Society
Jan 31, 2023 · Big Data

Understanding Kafka Schema Registry Compatibility Types and Schema Evolution

This article explains how Kafka's Schema Registry manages schema evolution and compatibility types—backward, forward, transitive, and full—using Avro schemas, demonstrates the impact of field additions or deletions on producers and consumers, and shows how to change a topic's compatibility setting via REST API.

Avro · Compatibility · Kafka
0 likes · 13 min read
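Backward compatibility, as described above, means a consumer on the new schema can still read records written with the old one, typically because any added field carries a default value. A toy Java sketch of that idea follows; it is not the Schema Registry's actual checker, and the field names are made up:

```java
import java.util.HashMap;
import java.util.Map;

public class BackwardCompatDemo {
    // New reader schema adds an optional "email" field with a default,
    // a backward-compatible change: old records simply lack the field.
    static final Map<String, Object> DEFAULTS = Map.of("email", "unknown");

    static Map<String, Object> read(Map<String, Object> oldRecord) {
        Map<String, Object> r = new HashMap<>(oldRecord);
        // Fill any field missing from the old record with its schema default.
        DEFAULTS.forEach(r::putIfAbsent);
        return r;
    }

    public static void main(String[] args) {
        Map<String, Object> written = Map.of("name", "alice"); // produced with the old schema
        System.out.println(read(written)); // reader sees name plus the defaulted email
    }
}
```

Removing a field without a default would break this property, which is exactly the kind of change a BACKWARD-mode registry rejects.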
DataFunTalk
Jan 20, 2023 · Big Data

Introduction to Flink CDC: Incremental Snapshot Algorithm and Framework

This article introduces Flink CDC, explains its incremental snapshot algorithm and the 2.0 framework design, compares it with traditional CDC pipelines, discusses the core API and dialect concept, and outlines community growth and future plans, providing a comprehensive technical overview for data engineers.

Apache Flink · Big Data · Change Data Capture
0 likes · 13 min read
IT Architects Alliance
Oct 9, 2022 · Backend Development

Event‑Driven Messaging Patterns at Wix: Consumption, Projection, End‑to‑End Streaming, In‑Memory KV Stores, Scheduling, Transactions, and Aggregation

The article describes how Wix engineers built a robust, Kafka‑based event‑driven messaging infrastructure for over 1,400 microservices, detailing patterns such as consumption and projection, end‑to‑end streaming with websockets, in‑memory KV stores, schedule‑and‑forget jobs, exactly‑once transactions, and event aggregation to achieve scalability, resilience, and low‑latency data access.

Kafka · data streaming · distributed systems
0 likes · 16 min read
DataFunTalk
Jul 31, 2022 · Big Data

Design, Evolution, and Optimization of NetEase's Log Collection and Transmission Service (Datastream‑NG)

This article presents a comprehensive overview of NetEase's log collection and transmission platform, detailing its evolution from 2011 to the current Datastream‑NG architecture, the system's design goals, core component optimizations, operational monitoring, and future plans for intelligent scaling and diagnostics.

Big Data · Log Collection · cloud native
0 likes · 23 min read
Tencent Tech
Jun 23, 2022 · Big Data

Why Apache InLong’s Graduation Marks a New Era for Big Data Integration

Apache InLong, originally contributed by Tencent, has graduated to an Apache top‑level project, offering a one‑stop framework for petabyte‑scale data ingestion, processing, and reliable streaming, and is now widely adopted across advertising, payment, social, gaming, and AI industries.

Apache · InLong · Open-source
0 likes · 5 min read
DataFunTalk
May 4, 2021 · Big Data

Design and Implementation of a Real-Time Data Transmission Platform Based on Apache Flink at AutoHome

This article presents the background, requirements, architectural design, component interaction, and implementation details of AutoHome's real‑time data transmission platform built on Apache Flink, highlighting its high availability, exactly‑once semantics, scalability, DDL handling, and integration with existing streaming services.

Apache Flink · Big Data · Flink
0 likes · 18 min read
Architecture Digest
Mar 25, 2021 · Big Data

Uber's Multi-Region Kafka Architecture and Disaster Recovery

This article explains how Uber built a multi‑region Kafka infrastructure with disaster‑recovery capabilities, detailing its replication topology, active/active and active/passive consumption modes, offset‑management service, and the challenges of ensuring reliable, low‑latency data streaming across regions.

Kafka · Offset Management · data streaming
0 likes · 9 min read
Architecture Digest
Jan 19, 2020 · Big Data

Why Kafka Is So Fast: Sequential Writes, Memory‑Mapped Files, and Zero‑Copy

This article explains how Kafka achieves high throughput by using sequential disk writes, memory‑mapped files, batch compression, and zero‑copy sendfile for reads, while also covering data retention policies and the role of offsets in consumer processing.

Big Data · Kafka · Memory-Mapped Files
0 likes · 10 min read
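The zero-copy read path mentioned above works by handing the transfer to the kernel. In Java, Kafka does this through FileChannel.transferTo, which maps to sendfile(2) on Linux, so bytes move from the page cache to the destination without a round trip through user-space buffers. A minimal standalone sketch (file names are illustrative):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopyDemo {
    // Copy src to dst via transferTo; the JVM backs this with sendfile(2)
    // on Linux, so the data never enters user space.
    static void zeroCopy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
            long pos = 0, size = in.size();
            // transferTo may move fewer bytes than requested, so loop.
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("segment", ".log");
        Files.write(src, "hello kafka".getBytes());
        Path dst = Files.createTempFile("copy", ".log");
        zeroCopy(src, dst);
        System.out.println(new String(Files.readAllBytes(dst))); // prints: hello kafka
    }
}
```

In Kafka's case the destination is a socket channel rather than a file, but the mechanism is the same: the broker serves log segments to consumers without copying them into JVM heap buffers.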
Efficient Ops
Mar 20, 2017 · Big Data

How eBay Built a Scalable Kafka‑Based Real‑Time Data Transmission Platform

This article details eBay's year‑long development of an enterprise‑grade, Kafka‑driven data transmission platform, covering its architecture, core services, monitoring and automation strategies, as well as performance tuning techniques that enable high throughput, low latency, and reliable cross‑data‑center replication.

Kafka · Real-time Processing · data streaming
0 likes · 22 min read