Topic: batch processing
Collection size: 120 articles
Top Architect
Jan 17, 2021 · Big Data

Migrating LinkedIn’s Who Viewed Your Profile System from Lambda Architecture to a Lambda‑less Architecture

This article describes how LinkedIn’s Who Viewed Your Profile feature was originally built on a Lambda architecture, the operational challenges it caused, and the step‑by‑step migration to a streamlined, Samza‑driven, Lambda‑less design that improves performance, reduces maintenance overhead, and retains essential batch capabilities.

Batch Processing · LinkedIn · Pinot
0 likes · 11 min read
Architect's Tech Stack
Apr 29, 2024 · Databases

Performance Evaluation of Inserting Billion-Scale Data into MySQL Using MyBatis, JDBC, and Batch Processing

This article presents a comprehensive performance test of inserting massive amounts of randomly generated person records into MySQL, comparing three strategies—MyBatis lightweight insertion, direct JDBC handling, and JDBC batch processing—both with and without transactions, and concludes that combining batch processing with transactions yields the fastest insertion speed for large‑scale data loads.

Batch Processing · JDBC · Large Data Insertion
0 likes · 13 min read
Architect's Tech Stack
Sep 11, 2023 · Databases

Performance Evaluation of Large-Scale Data Insertion into MySQL Using MyBatis, JDBC, and Batch Processing

This article presents a systematic performance test of inserting massive data into MySQL, comparing three strategies—MyBatis lightweight insertion, direct JDBC (with and without transactions), and JDBC batch processing—showing how transaction handling and batch execution dramatically affect insertion speed.

Batch Processing · JDBC · Java
0 likes · 15 min read
Architect's Tech Stack
Oct 9, 2022 · Backend Development

Spring Batch Overview: Architecture, Core Concepts, and Practical Usage

This article provides a comprehensive introduction to Spring Batch, covering its purpose for large‑scale data processing, overall architecture, key concepts such as Job, Step, ItemReader/Writer/Processor, chunk processing, skip policies, and practical configuration examples with Java code.

Batch Processing · Java · Spring Batch
0 likes · 17 min read
Architect's Tech Stack
Jul 31, 2022 · Backend Development

Comprehensive Introduction to Spring Batch: Architecture, Core Concepts, and Best Practices

This article provides a detailed overview of Spring Batch, covering its purpose, architecture, core concepts such as Job, Step, and ItemReader/Writer/Processor, execution flow, chunk processing, skip and failure handling, and practical tips for building robust Java batch applications.

Batch Processing · Chunk Processing · Java
0 likes · 19 min read
DataFunSummit
May 17, 2024 · Big Data

Comprehensive Hudi Real-Time Data Lake Ingestion Solutions

This article presents a complete guide to Hudi-based real-time data lake ingestion, covering overall data integration architecture, batch and streaming ingestion strategies, advanced table design, and practical recommendations for handling challenges such as deduplication, latency, partitioning, and performance optimization.

Batch Processing · Big Data · Hudi
0 likes · 12 min read
DataFunSummit
Apr 7, 2024 · Big Data

Li Auto’s Flink on Kubernetes Data Integration Practice

This article presents Li Auto’s end‑to‑end data integration journey, detailing the evolution of its data platform, the challenges of heterogeneous sources, and how a unified Flink‑on‑K8s solution with cloud‑native architecture, operator management, monitoring, and checkpointing addresses batch‑stream convergence and future scalability.

Batch Processing · Big Data · Cloud Native
0 likes · 12 min read
DataFunSummit
Apr 28, 2023 · Big Data

Building a Unified Streaming‑Batch Storage Architecture at Xiaohongshu

This article presents Xiaohongshu's design and implementation of a unified streaming‑batch storage system that integrates Lambda architecture, Kafka, Flink, Iceberg, and modern OLAP engines to solve real‑time data warehouse pain points and enable consistent, exactly‑once analytics across streaming and batch workloads.

Batch Processing · Big Data · Flink
0 likes · 16 min read
DataFunSummit
Jan 8, 2023 · Big Data

Streaming‑Batch Integrated Real‑time Multi‑dimensional Analysis

This article presents a comprehensive overview of evolving big‑data architectures—from classic offline warehouses to Lambda and Kappa models—and details a streaming‑batch integrated solution that addresses latency, data freshness, and multi‑table join challenges to achieve minute‑level real‑time multi‑dimensional analytics.

Batch Processing · Big Data · Kappa architecture
0 likes · 18 min read
DataFunSummit
Nov 23, 2022 · Big Data

Lakehouse Analysis Service (LAS): Architecture, Challenges, and Service Design

The article introduces the Lakehouse Analysis Service (LAS), explains its layered architecture that unifies data lake and warehouse capabilities, discusses challenges with Apache Hudi metadata and consistency, and details the design of the unified MetaServer, Table Management Service, concurrency control, async compaction, event bus, and future roadmap.

Apache Hudi · Batch Processing · Cloud Native
0 likes · 18 min read
DataFunSummit
Nov 17, 2020 · Big Data

Sohu Intelligent Media Data Warehouse Architecture and Technical Practices

This article presents Sohu Intelligent Media's data warehouse construction practice, covering fundamental concepts, batch and real‑time processing, OLAP theory, multidimensional modeling, workflow management, data quality, metadata lineage, and security, with a focus on Apache Doris and a Lambda‑style architecture.

Apache Doris · Batch Processing · OLAP
0 likes · 18 min read
DataFunTalk
Dec 22, 2023 · Big Data

Practical Implementation of Flink on Kubernetes for Data Integration at Li Auto

This article details Li Auto's end‑to‑end data integration practice using Flink on Kubernetes, covering the evolution of their integration platform, architectural design, cloud‑native deployment, operational challenges, and future roadmap, while highlighting unified batch‑stream processing and resource elasticity.

Batch Processing · Big Data · Cloud Native
0 likes · 12 min read
DataFunTalk
Dec 18, 2023 · Big Data

Unified Data Architecture: Balancing Freshness, Cost, and Performance with Incremental Computing

The article explains why unified data architecture is essential to avoid duplication and inefficiency, discusses differing performance trade‑offs among batch, streaming, and interactive analytics, introduces an incremental computation model that unifies these modes, and invites readers to a Dec 19, 2023 technical sharing event.

Batch Processing · Big Data · Incremental Computing
0 likes · 3 min read
DataFunTalk
Mar 12, 2023 · Big Data

Apache Kyuubi 1.6.0 Feature Overview and Enhancements

The article provides a comprehensive walkthrough of Apache Kyuubi 1.6.0, detailing server‑side enhancements such as batch (JAR) task submission, metadata store and unified API/authentication, client‑side improvements to the built‑in JDBC driver and Beeline, as well as engine plugins for Spark, Flink, Trino and Hive, and concludes with the community’s roadmap and statistics.

Apache Kyuubi · Batch Processing · Big Data
0 likes · 12 min read
DataFunTalk
Feb 21, 2023 · Databases

Building a Stream‑Batch Integrated Data Architecture with Apache Doris at SelectDB

This article details how SelectDB’s data technology architect designed and implemented a new stream‑batch unified data platform using Apache Doris, covering the shortcomings of the early CDH‑based architecture, the selection process, data modeling, ingestion pipelines, performance testing, operational optimizations, and future plans.

Apache Doris · Batch Processing · Big Data
0 likes · 17 min read
DataFunTalk
Feb 2, 2023 · Big Data

SeaTunnel: Design Goals, Current Status, Architecture, and Future Roadmap

This article provides a comprehensive overview of Apache SeaTunnel, covering its design objectives, current capabilities such as multi‑engine support and extensive connector ecosystem, detailed architecture including engine‑independent APIs and execution flows, and outlines the upcoming roadmap to expand connectors, launch a visual web UI, and introduce a dedicated SeaTunnel Engine.

Apache · Batch Processing · Big Data
0 likes · 12 min read
DataFunTalk
Aug 25, 2022 · Big Data

Applying OpenMLDB for Efficient AI Toolchain and Data‑Driven Architecture at Akulaku

This article presents Akulaku’s practical experience with OpenMLDB, describing the company’s data‑driven requirements, the design of a unified stream‑batch architecture, implementation details across offline, online and RocksDB modes, and future recommendations for high‑performance, scenario‑agnostic big‑data processing.

AI · Batch Processing · Big Data
0 likes · 17 min read
DataFunTalk
May 23, 2022 · Big Data

Real-Time Data Lake Practices at ByteDance: Architecture, Challenges, and Solutions

ByteDance shares its real‑time data lake implementation, covering the evolving definition of data lakes, six core capabilities, challenges such as data management, weak concurrent updates, performance, and log ingestion, and detailed solutions including Hudi Metastore Server, bucket indexing, multi‑stage use cases, and future roadmap.

Batch Processing · Big Data · Hudi
0 likes · 32 min read
DataFunTalk
Apr 23, 2021 · Big Data

Building and Evolving Zhihu’s Flink‑Based Data Integration Platform

This article details Zhihu’s transition from a Sqoop‑driven data integration system to a Flink‑centric platform, covering business scenarios, historical architecture, design goals, technology choices, performance optimizations, and future plans for unified streaming‑batch processing across diverse storage systems.

Batch Processing · Big Data · Flink
0 likes · 14 min read
DataFunTalk
Mar 28, 2021 · Big Data

Flink Stream‑Batch Integration: Layered Architecture, Unified SDK, DAG Scheduler, Shuffle, and Fault‑Tolerance

This article explains how Apache Flink has evolved into a unified stream‑batch engine by introducing a three‑layer architecture, a unified DataStream SDK, a pipeline‑region‑based DAG scheduler, a common shuffle framework, and enhanced fault‑tolerance mechanisms to address efficiency, consistency, and resource‑utilisation challenges in real‑time big‑data processing.

Apache Flink · Batch Processing · DAG Scheduler
0 likes · 25 min read