Tag

data pipeline

1 view collected around this technical thread.

DaTaobao Tech
Apr 9, 2025 · Operations

Proactive Alerting System for Taobao Special Edition: Design, Scope, and Solutions

The article outlines the design and implementation of a proactive alerting system for Taobao Special Edition. It covers five alert categories (slot expiration, rights issues, configuration platforms, experiment audience expiration, and public‑opinion problems), describes the data‑driven rule engines and flexible integration behind them, reports that 24‑hour inventory alerts are already working, and notes plans for minute‑level rapid‑consumption warnings.

Alert System · data pipeline · e-commerce
0 likes · 7 min read
JD Tech Talk
Mar 12, 2025 · Big Data

Ensuring Stability of the Double‑11 Supply Chain Dashboard: Full‑Chain Process, Risk Points, and Technical Safeguard Strategies

This article details how the supply‑chain big‑screen dashboard for Double‑11 maintains high stability by mapping the full data‑flow, identifying risk points across ingestion, processing, storage and service layers, and applying comprehensive technical safeguards such as high‑availability design, fault‑tolerance, monitoring, and coordinated operational procedures.

Dashboard · big data · data pipeline
0 likes · 11 min read
Bilibili Tech
Mar 4, 2025 · Artificial Intelligence

Engineering Practices and Optimizations for Text‑to‑Video Generation Models (OpenSora, CogVideoX) by the Bilibili TTV Team

The Bilibili TTV team optimized training of the OpenSora and CogVideoX text‑to‑video models. The work included redesigning data storage with Alluxio, parallelizing VAE encoding, applying dynamic sequence parallelism and DeepSpeed‑Ulysses attention, adapting GPU code for NPU execution, and using profiling‑driven kernel fusion, FlashAttention, and expandable memory to increase training efficiency and frame throughput. The article also outlines future plans for pipeline parallelism and ZeRO‑3 scaling.

Diffusion Transformer · FlashAttention · NPU
0 likes · 26 min read
JD Tech Talk
Feb 5, 2025 · Databases

Optimizing Query Performance and Data Architecture for JD BIP Procurement System

This article details how JD’s BIP procurement system tackled massive data volume and complex query performance issues by reducing slow SQL, partitioning “in‑stock” orders, migrating large datasets to Elasticsearch, implementing dynamic query routing, and establishing robust monitoring, resulting in a 92% data reduction and smoother operations.

big data · data pipeline · databases
0 likes · 11 min read
Airbnb Technology Team
Jan 24, 2025 · Artificial Intelligence

Chronon — An Open-Source Framework for Production-Level Feature Engineering in Machine Learning

Chronon is an open‑source framework that centralizes feature definitions to guarantee training‑inference consistency, eliminates complex ETL pipelines, and supports real‑time and batch processing across diverse data sources, cutting feature‑development cycles from months to under a week, as demonstrated by Airbnb’s 40,000‑feature deployment.

Chronon · Hive · Spark
0 likes · 10 min read
Test Development Learning Exchange
Dec 1, 2024 · Big Data

How to Install Apache Airflow and Build a Simple Data Processing Pipeline

This tutorial guides you through installing Apache Airflow, initializing its database, starting the web server and scheduler, creating a Python DAG that reads, cleans, groups, and saves CSV data, configuring the DAG directory, and monitoring the pipeline via the Airflow web UI.

Apache Airflow · DAG · ETL
0 likes · 6 min read
DaTaobao Tech
Nov 15, 2024 · Big Data

Engineering Practices for a Billion‑Scale Image Asset Platform

The article recounts how the author built a billion‑scale AI image‑asset library by replacing a week‑long import with a clustered‑table, sharded pipeline, MD5‑based unique keys, a custom DataWorks task scheduler, and multi‑engine query layers, sharing practical engineering practices learned through successive iterations.

Hashing · Image Processing · big data
0 likes · 14 min read
Bilibili Tech
Nov 12, 2024 · Big Data

Scalable Tag System Architecture and Optimization

The rebuilt tag system introduces a three‑layer architecture, standard pipelines, Iceberg‑backed storage and custom ClickHouse sharding, a DSL for crowd selection, and a stateless online service, achieving 99.9% success, sub‑5 ms latency, and supporting thousands of tags across dozens of business scenarios while planning real‑time processing and automated lifecycle management.

ClickHouse · Iceberg · Spark
0 likes · 23 min read
Sohu Tech Products
Nov 6, 2024 · Operations

Design and Implementation of a Business Operation Log Management System Using Canal and Elasticsearch

The article presents a decoupled business operation log management architecture that uses Alibaba’s Canal to capture MySQL binlog changes, streams them through Kafka, and stores structured before‑and‑after records in Elasticsearch with nested mappings, enabling multi‑table correlation via transaction IDs, visual querying, and reliable rollback without modifying application code.

Canal · Kafka · MySQL binlog
0 likes · 12 min read
Ctrip Technology
Sep 23, 2024 · Frontend Development

Intelligent Alert Attribution System for Ctrip Hotel Frontend: Design, Implementation, and Outcomes

This article details the design and deployment of an intelligent alert attribution system for Ctrip Hotel's front‑end, describing the background challenges, the unified data pool, weighted alert rules, three attribution algorithms, achieved improvements in accuracy and troubleshooting speed, and future enhancement plans.

Alert · Attribution · data pipeline
0 likes · 18 min read
DevOps Operations Practice
Sep 1, 2024 · Operations

Understanding Logstash: Core Syntax, Filters, and Advanced Configuration

This article introduces Logstash’s core configuration syntax, explains key filter plugins such as grok, mutate, date, ruby, and aggregate, demonstrates conditional processing and multi‑event handling, and provides practical code examples to help readers efficiently parse, transform, and route log data.

ELK · Filters · Logstash
0 likes · 6 min read
Ctrip Technology
Aug 22, 2024 · Backend Development

Evolution of Ctrip Vacation Product Log System: From Single‑Table DB to ES + HBase Platform

This article details the three‑stage evolution of Ctrip's vacation product log system—from a simple single‑table DB approach, through a platform‑based ES + HBase solution, to a scalable V3.0 architecture that improves storage, search, and business empowerment while handling billions of log entries.

HBase · backend · big data
0 likes · 16 min read
Top Architect
Aug 10, 2024 · Big Data

Design and Implementation of a Scalable Real-Time Log Monitoring Platform at Baidu

This article introduces Baidu's log platform that handles billions of daily events, explains UBC logging concepts and monitoring requirements, and details a low‑cost, high‑accuracy architecture using real‑time streaming, dimension mapping, watermarking, and time‑window aggregation to achieve reliable, scalable event monitoring.

Real-time Streaming · UBC · Watermark
0 likes · 14 min read
JD Tech Talk
Jul 3, 2024 · Big Data

Real-time Monitoring Dashboard for Logistics Supply Chain: Architecture, Data Processing, and Stability Practices

This article describes the design and implementation of a high‑availability, real‑time logistics supply‑chain dashboard using Flink and ClickHouse, covering data processing pipelines, metric consistency, stability mechanisms, extensible configurations, and monitoring techniques to guide similar large‑screen projects.

ClickHouse · Real-time Dashboard · big data
0 likes · 9 min read
JD Tech
Jul 2, 2024 · Big Data

Real‑Time Monitoring Dashboard for Logistics Supply Chain: Architecture, Data Modeling, and Stability Design

This article presents the design and implementation of a high‑availability, real‑time logistics supply‑chain monitoring dashboard, covering its data processing pipeline with Flink, storage choices between Elasticsearch and ClickHouse, multi‑layer architecture, metric consistency, stability mechanisms, extensibility configurations, and monitoring practices.

ClickHouse · Dashboard · big data
0 likes · 11 min read
iQIYI Technical Product Team
Jun 28, 2024 · Artificial Intelligence

Feature Center Overview in iQIYI's Opal Machine Learning Platform

The Feature Center in iQIYI’s Opal platform centralizes feature creation, storage, and real‑time access through a drag‑and‑drop DAG workflow and DSL‑driven transformations, handling massive QPS and low‑latency demands while enabling fast business iteration, cross‑team reuse, and monitoring for advertising, recommendation, and risk‑control applications.

Big Data · Feature Engineering · Opal
0 likes · 13 min read
Zhuanzhuan Tech
May 23, 2024 · Backend Development

Design and Implementation of a Channel Reconciliation System for Zhuanzhuan Payments

This article details the background, architecture, data preparation methods, massive‑data handling strategies, verification processes, and error‑handling mechanisms of Zhuanzhuan's channel reconciliation system, highlighting design choices such as binlog ingestion, task‑driven bill downloads, sharding with Hive archiving, and MQ‑based reconciliation to ensure financial data consistency and safety.

Hive · MQ · backend
0 likes · 11 min read
DataFunTalk
May 21, 2024 · Big Data

Applying Alluxio to Autonomous Driving Model Training: Deployment, Performance, and Operational Insights

This article details how Alluxio was adopted to replace NAS in autonomous driving model training, describing the data closed‑loop workflow, the challenges of the previous system, Alluxio's architectural benefits, deployment strategies across single and multiple data centers, functional and performance testing, operational tuning, and the resulting cost and efficiency gains.

Alluxio · Distributed Storage · Performance Optimization
0 likes · 15 min read
NetEase Cloud Music Tech Team
Apr 11, 2024 · Backend Development

Design and Implementation of an Online Configurable Data Consumption Service for NetEase Cloud Music Frontend Performance Monitoring (Corona)

The article details NetEase Cloud Music’s end‑to‑end, online‑configurable data‑consumption service and schema‑driven visualization platform that transform raw client logs into ClickHouse records, automatically generate tables and dashboards, and provide observability, dramatically reducing manual effort while supporting over twenty performance metrics for frontend monitoring.

ClickHouse · Online Configuration · Performance Monitoring
0 likes · 17 min read
DataFunSummit
Mar 22, 2024 · Artificial Intelligence

Risk Control Model Construction for Online Small Loans: Pre‑loan, In‑loan, Post‑loan and Monitoring

This article presents a comprehensive overview of risk control model building for online small‑loan scenarios, covering pre‑loan, in‑loan and post‑loan stages, the associated data pipelines, model deployment strategies, optimization attempts, and monitoring frameworks to ensure accuracy, stability and effectiveness.

credit scoring · data pipeline · loan management
0 likes · 16 min read