Tagged articles
6 articles
AI Waka
May 22, 2026 · Artificial Intelligence

How Skills Can Cut Costs and Speed Up High‑Quality LLM Data Pipelines

The article explains how the open‑source DataFlow‑Skills framework lets LLM agents plan, validate, and execute data cleaning and synthesis pipelines with strict field contracts and specialized operators, dramatically reducing costly failures and accelerating high‑quality training data production.

AI data engineering · DataFlow · LLM data pipelines
15 min read
21CTO
Jan 24, 2025 · Fundamentals

Why Traditional Code Fails on Multicore CPUs and How Dataflow Languages Help

The article argues that, after decades of programming shaped by the von Neumann model, modern multicore processors expose the limits of sequential code. It illustrates this with simple examples in Python and Go, and proposes data‑flow programming, exemplified by the experimental Nevalang language, as a more natural, parallel‑friendly paradigm.

DataFlow · Nevalang · Programming Paradigms
5 min read
21CTO
Jul 15, 2024 · Big Data

Twitter’s Kappa Architecture: Scaling Real-Time Processing of Billions of Events

Twitter migrated from a Lambda-based dual‑pipeline system to a Kappa architecture that relies on a single real‑time stream built on Kafka, Google Pub/Sub, Dataflow, and Bigtable, dramatically reducing latency, increasing throughput, and improving data accuracy when processing billions of daily events.

Big Data · Cloud Computing · DataFlow
9 min read
Programmer DD
Dec 9, 2020 · Big Data

Master Apache Beam: Build a Portable Word Count Pipeline in Minutes

This tutorial introduces Apache Beam’s unified programming model for batch and streaming, explains its core concepts and terminology, compares it with other runners, and walks through a complete Java word‑count example—including dependencies, pipeline construction, transforms, and execution with DirectRunner.

Apache Beam · DataFlow · Distributed Processing
8 min read
Big Data Technology & Architecture
Sep 16, 2019 · Big Data

Comprehensive Flink Interview Guide: Architecture, APIs, Operators, and Advanced Topics

This guide gives a detailed overview of Apache Flink: its core streaming engine, APIs (DataSet, DataStream, Table), architectural components, and a comparison with Spark Streaming. It also covers partitioning, parallelism, restart strategies, state backends, time semantics, watermarks, SQL processing, fault‑tolerance mechanisms, memory management, serialization, the RPC framework, back‑pressure handling, and operator chaining, and closes with practical tips for interview preparation.

Apache Flink · Big Data · DataFlow
22 min read