Tag

big data architecture

Bilibili Tech
Apr 8, 2025 · Big Data

Building a Real-Time Data Warehouse for Bilibili's Game Business

To meet Bilibili’s rapidly expanding game business, the team built a unified real-time data warehouse using Hologres and Flink that replaces the traditional Lambda stack, delivering high-throughput writes, low-latency processing, seamless offline-online integration, global deployment, and real-time support for operations, advertising, and risk analytics.

Data architecture case study · Game business data · Hologre
17 min read
DataFunSummit
Jul 12, 2024 · Big Data

Data Lake Development Trends, Architecture, Integration, Lakehouse Core Capabilities, and Open Design

This article examines the current evolution of data lakes, detailing their overall architecture, batch and real‑time integration methods, Lakehouse core functionalities such as enhanced DML, schema evolution, ACID support, and open‑design principles that enable multi‑cloud deployment and seamless interaction with diverse compute engines.

Data Lake · Lakehouse · Open Data Formats
12 min read
DataFunTalk
Jun 10, 2024 · Big Data

Data Lake Development Trends, Architecture, Integration, and Lakehouse Core Capabilities

This article reviews the latest developments in data lakes, including trend analysis, overall architecture, data integration methods, Lakehouse core capabilities, open design principles, stream‑batch unified processing, real‑time OLAP, and lake‑internal warehousing, highlighting how these advances reduce complexity and cost while improving data sharing and performance.

Data Lake · Lakehouse · Real-time OLAP
14 min read
Beijing SF i-TECH City Technology Team
May 30, 2024 · Big Data

Data Lineage System Design and Implementation for Big Data Platforms

This article presents a comprehensive data lineage system (Data-Lineage) for big data platforms. It addresses the challenges of heterogeneous data sources, multiple execution engines, and complex dependencies through a hook-based architecture and a modular design.

Data quality · SQL parsing · big data architecture
12 min read
DataFunTalk
Jul 10, 2023 · Big Data

Practical Experience of In‑Lake Warehouse Implementation Based on Lakehouse Architecture

This article presents a comprehensive overview of Lakehouse‑based in‑lake warehousing, covering common data‑lake misconceptions, the evolution from databases to data warehouses and lakes, the advantages of Lakehouse over traditional architectures, a reference multi‑layer architecture, typical use cases, challenges, future plans, and a brief Q&A.

Data Lake · Hudi · Iceberg
20 min read
ByteDance Data Platform
Nov 16, 2022 · Big Data

How ByteDance’s Data Lake Powers Near‑Real‑Time E‑Commerce Analytics

This article explains ByteDance’s data lake technology, its Apache Hudi‑based features, near‑real‑time architecture, and practical e‑commerce use cases such as marketing promotion, traffic diagnosis, logistics monitoring, risk governance, and operational monitoring, while outlining future challenges and plans.

Apache Hudi · Data Lake · big data architecture
15 min read
NetEase Cloud Music Tech Team
Oct 26, 2022 · Big Data

Arctic: NetEase's Streaming Lakehouse Service and Hive-Based Stream-Batch Integration Practice

Arctic, NetEase’s streaming lakehouse built on Apache Iceberg, unifies streaming and batch workloads with millisecond‑level latency, Hive compatibility, and built‑in message‑queue support, delivering CDC, upserts and OLAP without a Lambda architecture, as demonstrated by real‑time processing of 2 PB of Hive data for Cloud Music.

Apache Iceberg · Arctic · Data Lake
15 min read
Bilibili Tech
Jul 15, 2022 · Big Data

Lakehouse Architecture Practice at Bilibili: Query Acceleration and Index Enhancement

Bilibili’s lakehouse architecture merges Iceberg‑based data lake flexibility with data‑warehouse efficiency, using Kafka‑Flink real‑time ingestion, Spark offline loads, Trino queries, Alluxio caching, Z‑Order/Hilbert sorting, and enhanced BloomFilter and bitmap indexes to boost query speed up to tenfold while drastically cutting file reads.

Data Lake · Iceberg · Z-Order sorting
17 min read
Shopee Tech Team
Apr 28, 2022 · Big Data

Building Real-Time Data Warehouse with Flink + Hudi at Shopee

Shopee replaced its hourly Hive pipeline with a hybrid Flink‑Hudi real‑time data warehouse that groups Kafka topics, applies lightweight stream ETL, uses partial‑update MOR tables for multi‑stream joins and COW tables for versioned batches, cutting latency from about 90 minutes to 2–30 minutes and halving resource usage.

Apache Flink · Apache Hudi · Data Lakehouse
20 min read
Tencent Cloud Developer
Feb 28, 2022 · Big Data

GooseFS: Distributed Caching System for Storage-Compute Separation Architecture

GooseFS, Tencent Cloud’s distributed caching system for storage‑compute separation, links compute frameworks to underlying storage (COS, CHDFS, COSN) and boosts big‑data and AI workloads by 2‑10× through transparent acceleration, robust master‑worker architecture, Raft‑based HA, tiered caching, and metadata optimizations, delivering up to 50% cost savings and 29% faster compute jobs.

GooseFS · Metadata Optimization · Raft consensus
18 min read
Baidu Geek Talk
Nov 24, 2021 · Big Data

Building Big Data Infrastructure at Baidu Aifanfan: Architecture Practices and Lessons Learned

At Baidu Aifanfan, the data team built a unified real‑time and offline big‑data platform—leveraging Watt, Bigpipe, Fengge, AFS and Palo within Lambda/Kappa patterns and a fast‑slow parallel rollout—that cut OLAP query latency from 18 minutes to under 15 seconds, enabled self‑service analytics, and standardized metrics across 15 agile teams.

Apache Doris · Kappa architecture · Kimball methodology
23 min read
Didi Tech
Aug 26, 2020 · Big Data

Real-time Data Warehouse Construction at Didi: Architecture, Practices, and Lessons

To support Didi’s fast‑growing car‑pool service, a real‑time data warehouse was built using a streamlined layered architecture—ODS, DWD, DIM, DWM, and APP—leveraging Flink‑based StreamSQL, Kafka, Druid and ClickHouse to deliver minute‑level analytics, dashboards, monitoring, and cross‑business interfaces while planning unified meta‑store integration.

StreamSQL · big data architecture · data platform
20 min read
Xueersi Online School Tech Team
Sep 6, 2019 · Big Data

Real-Time Data Architecture, Evolution, and Applications at an Online School

The article details the six‑layer big‑data architecture of an online school, chronicles its migration from Storm to Spark Streaming and finally to Flink, and showcases concrete real‑time applications such as gateway monitoring, user‑profile tagging, renewal reporting, and advertising analysis, while outlining future development directions.

Real-time Streaming · Spark Streaming · analytics
14 min read
Didi Tech
Mar 28, 2019 · Big Data

Scaling Hive Metadata Storage with Federation Architecture

Didi solved Hive’s MySQL metadata bottleneck by building a federation architecture—using waggle_dance to route requests to multiple MySQL instances based on database names—enabling horizontal scaling, read/write support, and seamless compatibility with existing Hive clients while improving stability and performance.

Hive Federation · Metadata Scalability · MySQL Optimization
7 min read
Liulishuo Tech Team
Jun 17, 2016 · Big Data

Building a Scalable Big Data Platform on AWS: Architecture and Execution Service Design

This article details the architectural design and implementation of a scalable big data platform built on AWS services, highlighting the transition from HDFS to S3 for storage, the use of EMR for elastic compute, and a custom Execution Service integrated with Consul and Airflow for automated cluster management and task scheduling.

AWS EMR · Airflow · Data Engineering
11 min read