Tag

big data storage


DataFunSummit
Jan 1, 2023 · Big Data

Shopee Data Infra Presentation: Storage Status, Acceleration, Serviceization, and Future Plans

The Shopee Data Infra talk covers the current storage architecture, Presto acceleration with Alluxio caching, and storage-as-a-service offerings built on Alluxio FUSE and S3 APIs. It also outlines planned work on Spark/Hive integration and CSI/FUSE optimizations.

Alluxio · Cache Manager · Data Infrastructure
16 min read

DataFunTalk
Jul 4, 2022 · Big Data

Apache Ozone: Architecture, Advantages, and New Features Overcoming HDFS Limitations

This article explains the shortcomings of HDFS at large scale, describes the Federation and Scaling approaches, and details how Apache Ozone redesigns metadata storage, introduces container abstraction, object semantics, and new features such as optimized OM, streaming writes, erasure coding, and RocksDB consolidation to improve scalability and performance.

Apache Ozone · Erasure Coding · HDFS
11 min read

DataFunTalk
Aug 11, 2021 · Big Data

OPPO CBFS: Architecture and Key Technologies of a Scalable Data Lake Storage System

This article introduces OPPO's self‑developed data lake storage system CBFS, covering the fundamentals of data lake storage, the multi‑layer CBFS architecture, its core technologies such as metadata management and erasure coding, and future directions for large‑scale, low‑cost data analytics.

CBFS · Cloud Native · Data Lake
14 min read

DataFunTalk
Jul 8, 2021 · Big Data

Design and Evolution of ByteDance's Multi‑Datacenter HDFS Architecture

This article explains how ByteDance extended the Apache HDFS architecture with a multi‑datacenter design, introducing components such as DanceNN, NNProxy, and BookKeeper to achieve scalable storage, cross‑datacenter data placement, and rack‑level disaster recovery for petabyte‑scale workloads.

ByteDance · HDFS · big data storage
13 min read

Tencent Tech
Mar 31, 2020 · Big Data

How Tencent Cloud Keeps Big Data Disks Reliable: Inside Their Health Assurance Plan

This article examines the challenges of hard-disk reliability in large-scale big-data services and explains how Tencent Cloud reduces failure rates through hardware and software optimizations, custom collaborations, pre-deployment health checks, scoring systems, and usage-pattern improvements.

big data storage · cloud infrastructure · disk health scoring
11 min read