Tagged articles

34 articles

Page 1 of 1

Nov 6, 2025 · Cloud Native

How Tencent Music Cut Kafka Costs by 50% with Cloud‑Native AutoMQ

Tencent Music migrated its massive Kafka streaming infrastructure to the cloud‑native AutoMQ platform, slashing operational costs by over half, achieving second‑level partition migration, and dramatically improving scaling efficiency while maintaining high‑throughput, low‑latency data processing for its music services.

AutoMQData StreamingKafka

0 likes · 16 min read

How Tencent Music Cut Kafka Costs by 50% with Cloud‑Native AutoMQ

High Availability Architecture

Oct 30, 2025 · Operations

How Tencent Music Cut Kafka Costs by 50% with Cloud‑Native AutoMQ

Tencent Music replaced its traditional Kafka clusters with the cloud‑native AutoMQ platform, slashing infrastructure costs by over half, achieving second‑level partition migration, and dramatically simplifying operations while maintaining high‑throughput, low‑latency data streams for its massive music services.

AutoMQCloud NativeData Streaming

0 likes · 17 min read

Sanyou's Java Diary

Dec 2, 2024 · Big Data

Understanding Kafka: Core Architecture, Storage, and Reliability Explained

This article provides a comprehensive overview of Kafka, covering its overall structure, key components such as brokers, producers, consumers, topics, partitions, replicas, leader‑follower mechanics, logical and physical storage models, producer and consumer workflows, configuration parameters, partition assignment strategies, rebalancing, log retention and compaction, indexing, zero‑copy transmission, and the reliability concepts that ensure data durability.

Data StreamingDistributed SystemsKafka

0 likes · 18 min read

Understanding Kafka: Core Architecture, Storage, and Reliability Explained

Big Data Technology & Architecture

Sep 7, 2024 · Big Data

Answering Interview Questions on Binlog Loss and Recovery Using Flink CDC

The article explains how to prepare for interview "if" questions about Binlog loss by describing Flink CDC's binlog extraction principles, possible recovery mechanisms, and practical strategies such as resetting offsets, extending log retention, and building offline‑online reconciliation pipelines.

Big DataData StreamingFlink CDC

0 likes · 4 min read

Answering Interview Questions on Binlog Loss and Recovery Using Flink CDC

Code Mala Tang

Jul 5, 2024 · Frontend Development

Master TransformStream: Real-World Uses, Code Samples, and Common Pitfalls

TransformStream, a core component of the Streams API, enables developers to process and convert data chunks on the fly, offering examples ranging from simple text uppercase conversion to complex scenarios like compression, video transcoding, real-time IoT filtering, and handling common pitfalls such as errors and backpressure.

Data StreamingNode.jsTransformStream

0 likes · 13 min read

Master TransformStream: Real-World Uses, Code Samples, and Common Pitfalls

Java Architect Essentials

Jun 26, 2024 · Databases

Why Organizations Should Consider Using Apache Kafka Instead of Relational Databases

This article explains why organizations may replace traditional relational databases with Apache Kafka as a system of record, highlighting Kafka's economic, scalable, immutable log capabilities, event replay, flexibility for diverse use cases, and its suitability for highly regulated, data‑intensive environments.

Data StreamingDatabaseEvent-Driven Architecture

0 likes · 10 min read

Why Organizations Should Consider Using Apache Kafka Instead of Relational Databases

Big Data Technology Architecture

Nov 28, 2023 · Big Data

Real-time Data Ingestion from MySQL to Apache Doris Using Flink CDC and Doris Flink Connector

This article demonstrates, with step‑by‑step examples, how to capture MySQL changes via Flink CDC and stream them in real time into Apache Doris using the Doris Flink Connector, covering CDC concepts, connector features, environment setup, SQL client usage, and data verification.

Apache DorisCDCConnector

0 likes · 13 min read

Real-time Data Ingestion from MySQL to Apache Doris Using Flink CDC and Doris Flink Connector

Volcano Engine Developer Services

Nov 16, 2023 · Big Data

Why Replace Logstash with Flink? Boost Log Processing Performance and Reliability

This article examines the shortcomings of Logstash in log collection—data loss, poor performance, high troubleshooting cost, and lack of dynamic scaling—and demonstrates how migrating to Flink can provide at‑least‑once semantics, flexible error handling, high‑throughput low‑latency processing, automatic resource scaling, and advanced analytics within the ELK ecosystem.

Data StreamingELKFlink

0 likes · 9 min read

Why Replace Logstash with Flink? Boost Log Processing Performance and Reliability

21CTO

Oct 4, 2023 · Artificial Intelligence

How LangStream Merges Data Streams with Generative AI for Real‑Time LLM Apps

LangStream, the new open‑source framework from DataStax, combines event‑driven data streaming with generative AI, offering seamless integration with vector databases like Astra DB, Milvus, and Pinecone, and providing a Kubernetes‑based runtime that enables real‑time LLM applications without extensive coding.

Data StreamingKubernetesLLM

0 likes · 7 min read

How LangStream Merges Data Streams with Generative AI for Real‑Time LLM Apps

Sanyou's Java Diary

Sep 21, 2023 · Big Data

Understanding Kafka: Core Concepts, Architecture, and Reliability Explained

This article provides a comprehensive overview of Kafka, covering its overall architecture, key components such as brokers, producers, consumers, topics, partitions, replicas, and ZooKeeper, as well as logical and physical storage mechanisms, producer and consumer workflows, configuration parameters, partition assignment strategies, rebalancing, and the replication model that ensures data reliability.

Data StreamingDistributed SystemsKafka

0 likes · 18 min read

Understanding Kafka: Core Concepts, Architecture, and Reliability Explained

Baidu Geek Talk

Sep 18, 2023 · Big Data

How Real‑Time Interception and Bitmap UV Calculation Boost Mobile App Quality

This article explains how a performance middle‑platform for mobile apps uses real‑time change interception, unique color IDs, bitmap‑based UV counting, exception de‑obfuscation, and a multi‑stage data pipeline to detect and isolate problems early, reduce user impact, and improve overall app reliability.

CachingData Streamingbitmap UV

0 likes · 21 min read

How Real‑Time Interception and Bitmap UV Calculation Boost Mobile App Quality

Architects Research Society

Jan 31, 2023 · Big Data

Understanding Kafka Schema Registry Compatibility Types and Schema Evolution

This article explains how Kafka's Schema Registry manages schema evolution and compatibility types—backward, forward, transitive, and full—using Avro schemas, demonstrates the impact of field additions or deletions on producers and consumers, and shows how to change a topic's compatibility setting via REST API.

AvroData StreamingKafka

0 likes · 13 min read

Understanding Kafka Schema Registry Compatibility Types and Schema Evolution

DataFunTalk

Jan 20, 2023 · Big Data

Introduction to Flink CDC: Incremental Snapshot Algorithm and Framework

This article introduces Flink CDC, explains its incremental snapshot algorithm and the 2.0 framework design, compares it with traditional CDC pipelines, discusses the core API and dialect concept, and outlines community growth and future plans, providing a comprehensive technical overview for data engineers.

Apache FlinkBig DataChange Data Capture

0 likes · 13 min read

Introduction to Flink CDC: Incremental Snapshot Algorithm and Framework

21CTO

Nov 1, 2022 · Backend Development

Inside Netflix’s Scalable Backend: Microservices, CDN, and Data Pipelines

This article dissects Netflix’s massive backend system—covering its dual‑cloud deployment, Open Connect CDN, micro‑service architecture, API gateway, container platform, caching layers, data stores, and real‑time streaming pipelines—to reveal how the streaming giant achieves extreme scalability, reliability, and performance.

Cloud NativeData StreamingMicroservices

0 likes · 16 min read

Inside Netflix’s Scalable Backend: Microservices, CDN, and Data Pipelines

IT Architects Alliance

Oct 9, 2022 · Backend Development

Event‑Driven Messaging Patterns at Wix: Consumption, Projection, End‑to‑End Streaming, In‑Memory KV Stores, Scheduling, Transactions, and Aggregation

The article describes how Wix engineers built a robust, Kafka‑based event‑driven messaging infrastructure for over 1,400 microservices, detailing patterns such as consumption and projection, end‑to‑end streaming with websockets, in‑memory KV stores, schedule‑and‑forget jobs, exactly‑once transactions, and event aggregation to achieve scalability, resilience, and low‑latency data access.

Data StreamingDistributed SystemsEvent-Driven Architecture

0 likes · 16 min read

Event‑Driven Messaging Patterns at Wix: Consumption, Projection, End‑to‑End Streaming, In‑Memory KV Stores, Scheduling, Transactions, and Aggregation

ITFLY8 Architecture Home

Sep 29, 2022 · Backend Development

Scaling Event‑Driven Messaging at Wix with Kafka: Key Patterns

This article explains how Wix uses Kafka‑based event‑driven messaging to decouple microservices, improve scalability, and achieve exactly‑once processing through patterns such as consume‑and‑project, end‑to‑end event streams, in‑memory KV stores, scheduled jobs, transactional events, and event aggregation.

Data StreamingDistributed SystemsEvent-Driven Architecture

0 likes · 16 min read

Scaling Event‑Driven Messaging at Wix with Kafka: Key Patterns

SQB Blog

Sep 22, 2022 · Big Data

How We Built a Low‑Latency Advertising Billing System with Kafka Streams

This article describes the design, implementation, and performance of ShouQianBa's advertising billing system, detailing the migration from Apache Druid to Kafka Streams, the architecture for real‑time event processing, data aggregation, persistence, fault tolerance, and the achieved low‑latency, high‑throughput metrics.

AdvertisingData StreamingReal-time Billing

0 likes · 15 min read

How We Built a Low‑Latency Advertising Billing System with Kafka Streams

Ops Development Stories

Aug 19, 2022 · Big Data

Master Kafka: From Core Concepts to Advanced Operations and Performance Tuning

This comprehensive guide explains Kafka’s origins, core architecture, data structures, write and read workflows, operational commands for topic and consumer‑group management, and practical performance‑tuning tips such as disk layout, JVM settings, flush policies, and log retention, providing a complete reference for engineers working with distributed streaming platforms.

Data StreamingDistributed MessagingKafka

0 likes · 32 min read

Master Kafka: From Core Concepts to Advanced Operations and Performance Tuning

DataFunTalk

Jul 31, 2022 · Big Data

Design, Evolution, and Optimization of NetEase's Log Collection and Transmission Service (Datastream‑NG)

This article presents a comprehensive overview of NetEase's log collection and transmission platform, detailing its evolution from 2011 to the current Datastream‑NG architecture, the system's design goals, core component optimizations, operational monitoring, and future plans for intelligent scaling and diagnostics.

Big DataCloud NativeData Streaming

0 likes · 23 min read

Design, Evolution, and Optimization of NetEase's Log Collection and Transmission Service (Datastream‑NG)

Alibaba Cloud Native

Jul 17, 2022 · Cloud Native

Build Real-Time CDC Pipelines on Alibaba Cloud EventBridge with DTS

This article explains Change Data Capture (CDC) concepts, compares open‑source CDC tools, and shows how to leverage Alibaba Cloud EventBridge and DTS to build real‑time CDC pipelines, covering setup steps, event‑bus vs event‑stream choices, best‑practice scenarios such as CQRS, microservice decoupling, database backup, and SQL auditing.

CDCCloud NativeDTS

0 likes · 12 min read

Build Real-Time CDC Pipelines on Alibaba Cloud EventBridge with DTS

Tencent Tech

Jun 23, 2022 · Big Data

Why Apache InLong’s Graduation Marks a New Era for Big Data Integration

Apache InLong, originally contributed by Tencent, has graduated to an Apache top‑level project, offering a one‑stop framework for petabyte‑scale data ingestion, processing, and reliable streaming, and is now widely adopted across advertising, payment, social, gaming, and AI industries.

ApacheBig Data IntegrationData Streaming

0 likes · 5 min read

Why Apache InLong’s Graduation Marks a New Era for Big Data Integration

Big Data Technology & Architecture

Mar 15, 2022 · Big Data

Using Flink CDC to Capture MySQL Changes and Sync Them to ClickHouse

This article introduces Change Data Capture (CDC), compares query‑based and log‑based CDC, explains Debezium and ClickHouse, and provides step‑by‑step Flink CDC and Flink SQL CDC examples—including full Java code—to stream MySQL binlog changes into ClickHouse for real‑time analytics.

Big DataCDCClickHouse

0 likes · 17 min read

Using Flink CDC to Capture MySQL Changes and Sync Them to ClickHouse

Big Data Technology & Architecture

Feb 16, 2022 · Big Data

Using Flink CDC to Capture MySQL Changes and Sync Them to ClickHouse

This article introduces Change Data Capture (CDC), compares query‑based and log‑based approaches, explains Debezium and ClickHouse, and provides detailed Flink CDC and Flink SQL CDC examples—including Java source code, custom deserialization schema, ClickHouse sink implementation, and required Maven dependencies—to synchronize MySQL data into ClickHouse in real time.

Big DataCDCClickHouse

0 likes · 17 min read

Big Data Technology & Architecture

Dec 22, 2021 · Big Data

Using Flink CDC to Capture MySQL Changes and Sink Them into ClickHouse

This article explains Change Data Capture (CDC), compares query‑based and log‑based approaches, introduces Debezium and ClickHouse, and provides step‑by‑step Flink CDC and Flink SQL CDC examples—including Java source, deserialization, sink code and required Maven dependencies—to stream MySQL binlog changes into ClickHouse for real‑time analytics.

Big DataCDCClickHouse

0 likes · 14 min read

Using Flink CDC to Capture MySQL Changes and Sink Them into ClickHouse

MaGe Linux Operations

Jun 3, 2021 · Big Data

Why Kafka Handles Billions of Messages: Architecture, Use Cases, and Fast Performance

This article introduces Kafka, LinkedIn’s high‑throughput distributed messaging system, explains its core concepts such as brokers, topics, partitions, offsets, producers, consumers, and consumer groups, outlines common use cases like asynchronous decoupling and data‑stream processing, and details its fast performance mechanisms, fault‑tolerance, installation, and configuration steps.

Big DataData StreamingInstallation

0 likes · 11 min read

Why Kafka Handles Billions of Messages: Architecture, Use Cases, and Fast Performance

DataFunTalk

May 4, 2021 · Big Data

Design and Implementation of a Real-Time Data Transmission Platform Based on Apache Flink at AutoHome

This article presents the background, requirements, architectural design, component interaction, and implementation details of AutoHome's real‑time data transmission platform built on Apache Flink, highlighting its high availability, exactly‑once semantics, scalability, DDL handling, and integration with existing streaming services.

Apache FlinkBig DataData Streaming

0 likes · 18 min read

Design and Implementation of a Real-Time Data Transmission Platform Based on Apache Flink at AutoHome

Architecture Digest

Mar 25, 2021 · Big Data

Uber's Multi-Region Kafka Architecture and Disaster Recovery

This article explains how Uber built a multi‑region Kafka infrastructure with disaster‑recovery capabilities, detailing its replication topology, active/active and active/passive consumption modes, offset‑management service, and the challenges of ensuring reliable, low‑latency data streaming across regions.

Data StreamingKafkaOffset Management

0 likes · 9 min read

Uber's Multi-Region Kafka Architecture and Disaster Recovery

Architecture Digest

Jan 19, 2020 · Big Data

Why Kafka Is So Fast: Sequential Writes, Memory‑Mapped Files, and Zero‑Copy

This article explains how Kafka achieves high throughput by using sequential disk writes, memory‑mapped files, batch compression, and zero‑copy sendfile for reads, while also covering data retention policies and the role of offsets in consumer processing.

Data StreamingHigh ThroughputMemory Mapped Files

0 likes · 10 min read

Why Kafka Is So Fast: Sequential Writes, Memory‑Mapped Files, and Zero‑Copy

Mafengwo Technology

Jan 2, 2020 · Big Data

How We Scaled Kafka for Real‑Time Big Data at Mafengwo: Lessons and Practices

This article details Mafengwo's practical experience using Kafka within its big‑data platform, covering application scenarios, evolution through version upgrades, resource isolation, security and monitoring enhancements, and future plans for data duplication handling and consumer throttling.

Big DataData StreamingKafka

0 likes · 16 min read

How We Scaled Kafka for Real‑Time Big Data at Mafengwo: Lessons and Practices

dbaplus Community

Oct 15, 2019 · Big Data

How to Build Real‑Time Data Pipelines for E‑Commerce Promotions

This article examines the surge in real‑time data demands for e‑commerce promotions, outlines how to collect, compute, and deliver streaming data, compares batch and stream processing, lists typical use cases, and discusses the challenges of building scalable, low‑latency pipelines.

Data StreamingMonitoringreal-time

0 likes · 11 min read

How to Build Real‑Time Data Pipelines for E‑Commerce Promotions

21CTO

Jul 20, 2017 · Backend Development

How Ctrip Built a Real-Time User Data Collection System with Netty and Kafka

This article details Ctrip's design and implementation of a high‑throughput, low‑latency user data collection platform that leverages Java NIO, Netty, and a custom Kafka‑based messaging layer, covering architecture, encryption, compression, disaster‑recovery, performance testing, and downstream analytics products.

AvroBackend ArchitectureData Streaming

0 likes · 17 min read

How Ctrip Built a Real-Time User Data Collection System with Netty and Kafka

StarRing Big Data Open Lab

Mar 21, 2017 · Big Data

How Real-Time Data Streaming Is Transforming Industries Today

This article explains how real‑time data streaming turns massive, continuously growing datasets into actionable insights across finance, energy, and e‑commerce, showcasing early adopters like ConocoPhillips and DHL while urging businesses to rethink models for the next wave of data management.

Big DataData Streamingindustry use cases

0 likes · 7 min read

How Real-Time Data Streaming Is Transforming Industries Today

Efficient Ops

Mar 20, 2017 · Big Data

How eBay Built a Scalable Kafka‑Based Real‑Time Data Transmission Platform

This article details eBay's year‑long development of an enterprise‑grade, Kafka‑driven data transmission platform, covering its architecture, core services, monitoring and automation strategies, as well as performance tuning techniques that enable high throughput, low latency, and reliable cross‑data‑center replication.

Data StreamingKafkaReal-time Processing

0 likes · 22 min read

How eBay Built a Scalable Kafka‑Based Real‑Time Data Transmission Platform

Java High-Performance Architecture

Feb 25, 2016 · Backend Development

Understanding Kafka: Architecture, Use Cases, and Core Components

This article explains Kafka's high‑throughput distributed messaging architecture, its typical scenarios such as log and operational data collection, the roles of producers, brokers, topics, consumers, Zookeeper, and provides a practical example of detecting abnormal user transactions.

Data StreamingKafkaLog Processing

0 likes · 3 min read

Understanding Kafka: Architecture, Use Cases, and Core Components