Tag

distributed file system


ByteDance Cloud Native
Mar 13, 2025 · Backend Development

Inside DeepSeek 3FS: Architecture of a High‑Performance Parallel File System

This article dissects DeepSeek's 3FS parallel file system, detailing its four‑component architecture, high‑throughput RDMA networking, metadata handling with FoundationDB, client access methods, chain replication (CRAQ), custom FFRecord format, and recovery mechanisms, offering a deep technical perspective for storage engineers.

RDMA · chain replication · distributed file system
0 likes · 22 min read
AntData
Mar 4, 2025 · Big Data

Design and Analysis of 3FS: An AI‑Optimized Distributed File System

The article provides a comprehensive English overview of 3FS, an AI‑focused distributed file system that leverages FoundationDB for metadata, CRAQ for chunk replication, and a hybrid FUSE/native client architecture, detailing its design, components, fault handling, and performance considerations for large‑scale training workloads.

AI Storage · CRAQ replication · Cloud Native
0 likes · 25 min read
IT Services Circle
Feb 9, 2025 · Big Data

Understanding HDFS: Architecture, Data Blocks, Fault Tolerance, and High Availability

This article explains how HDFS, the Hadoop Distributed File System, splits large files into blocks, replicates them for fault tolerance, organizes the cluster into NameNode and DataNode components, and provides high‑availability and scalability mechanisms such as standby NameNode and federation, enabling reliable big‑data storage and access.

Big Data · DataNode · HDFS
0 likes · 11 min read
Rare Earth Juejin Tech Community
Dec 26, 2024 · Big Data

Understanding Hadoop HDFS and MapReduce: Principles, Architecture, and Sample Code

This article explains the origins of big‑data technologies, details the architecture and read/write mechanisms of Hadoop's HDFS, describes the MapReduce programming model, and provides complete Java code examples for a simple distributed file‑processing job using Maven dependencies.

Big Data · HDFS · Hadoop
0 likes · 15 min read
JD Retail Technology
Oct 29, 2024 · Big Data

JD Unified Storage Practice: Cross‑Region and Tiered Storage on HDFS

This article details JD's large‑scale HDFS unified storage implementation, covering cross‑region storage challenges, topology design, asynchronous block replication, flow‑control mechanisms, tiered storage strategies, automatic hot‑cold data migration, and the resulting performance and cost improvements for big‑data workloads.

Big Data · Cross-Region Storage · HDFS
0 likes · 20 min read
DataFunSummit
Oct 4, 2024 · Big Data

JD Retail HDFS Unified Storage: Cross‑Region and Tiered Storage Practices

This article presents JD Retail's large‑scale HDFS deployment, detailing its unified storage architecture, cross‑region data replication challenges and solutions, tiered storage strategies for hot, warm and cold data, and the operational modules that together improve performance, reliability and cost efficiency in a big‑data environment.

Big Data · Cross-Region Storage · HDFS
0 likes · 21 min read
360 Smart Cloud
May 15, 2024 · Cloud Native

Polefs: A Cloud‑Native Distributed Cache File System for AI Training Workloads

The article outlines the challenges of massive AI training data, defines storage performance requirements, and presents Polefs—a cloud‑native distributed cache file system with unified storage, metadata acceleration, and read/write caching designed to improve GPU utilization and reduce data redundancy.

AI · Cloud Native · GPU utilization
0 likes · 14 min read
DataFunTalk
Jan 27, 2024 · Big Data

JuiceFS: A Cloud‑Native Distributed File System for Data Lake and Lakehouse

This article presents JuiceFS, a cloud‑native distributed file system that bridges the gaps between HDFS and object storage, explaining Data Lake and Lakehouse concepts, comparing storage options, detailing JuiceFS's architecture and performance benefits, and showcasing real‑world user case studies.

Big Data · JuiceFS · Lakehouse
0 likes · 23 min read
Didi Tech
Sep 19, 2023 · Cloud Native

OrangeFS: A Cloud‑Native Multi‑Protocol Distributed Data Lake Storage System

OrangeFS is Didi’s cloud‑native, multi‑protocol distributed data‑lake storage system that unifies POSIX, S3 and HDFS access on a single logical hierarchy, integrates with Kubernetes via a CSI plugin, supports on‑premise and public‑cloud backends, provides multi‑tenant isolation, and dramatically improves elasticity, utilization and latency for petabyte‑scale workloads such as ride‑hailing logs, machine‑learning training, finance and analytics.

CSI · FUSE · Kubernetes
0 likes · 17 min read
DataFunTalk
Sep 15, 2023 · Cloud Computing

Design and Architecture of Baidu CFS Large‑Scale Distributed File System and Metadata Service

The talk from DataFun Summit 2023 explains how Baidu's CFS storage builds a trillion‑file‑scale distributed file system by revisiting file system fundamentals, POSIX limitations, historical storage architectures, and introducing a lock‑free metadata service with single‑shard primitives, data‑layout optimizations, and a simplified client‑centric architecture that achieves high scalability and performance.

Big Data · CFS · POSIX
0 likes · 31 min read
ByteDance SYS Tech
Aug 1, 2023 · Cloud Native

How ByteFUSE Revolutionizes High‑Performance Cloud‑Native Storage with FUSE and RDMA

ByteFUSE, a user‑space FUSE‑based solution for ByteNAS, delivers low‑latency, high‑throughput, POSIX‑compatible storage across AI training, database backup, and search services by replacing NFS with a cloud‑native architecture that leverages CSI, RDMA, and kernel‑module hot‑upgrade techniques.

Cloud Native · FUSE · Kubernetes
0 likes · 19 min read
Baidu Geek Talk
May 29, 2023 · Backend Development

CFS: Scaling Metadata Service for Distributed File System via Pruned Scope of Critical Sections - Baidu's Implementation Journey

Baidu’s CFS metadata service scales to billions of files by shrinking critical sections through a lock‑free Namespace 2.0 design that confines conflicts to single shards, uses field‑level atomic primitives, and integrates the proxy into the client, delivering up to 76× throughput gains and significant latency reductions in production.

Baidu CFS · EuroSys 2023 · POSIX compatibility
0 likes · 40 min read
DataFunSummit
Jan 20, 2023 · Cloud Native

Design and Architecture of JuiceFS: A Cloud‑Native Distributed File System

This article reviews the evolution of file storage, outlines the challenges of the cloud era, and details JuiceFS's design philosophy, architecture, key capabilities, and real‑world use cases such as Kubernetes, AI, big‑data analytics, and NAS migration to the cloud.

AI · Big Data · Cloud Native
0 likes · 22 min read
Ctrip Technology
Aug 4, 2022 · Cloud Native

Case Study of Using JuiceFS for Cold Data Storage at Ctrip: Architecture, Performance Evaluation, and Optimization

This article presents Ctrip's experience migrating over 2 PB of cold data to JuiceFS, detailing the system's architecture, metadata engine selection, extensive performance testing, fault‑tolerance analysis, and operational optimizations that reduced storage and maintenance costs while supporting future petabyte‑scale workloads.

Cloud Native · Cold Data Storage · JuiceFS
0 likes · 15 min read
IT Architects Alliance
Jun 3, 2022 · Backend Development

Open‑Source Distributed File System Based on Spring Boot and Vue CLI – Features and Technical Overview

This article introduces an open‑source distributed file system built with Spring Boot and Vue CLI, detailing its MIT licensing, UI layout, file operations, multiple upload methods, online preview and editing capabilities, storage options, and the underlying backend and frontend technologies.

Spring Boot · Vue.js · distributed file system
0 likes · 9 min read
DataFunTalk
May 17, 2022 · Big Data

Exploring JuiceFS in Data Lake Storage Architecture

This presentation provides a comprehensive overview of JuiceFS, an open‑source cloud‑native distributed file system, detailing its role in modern data lake and lakehouse architectures, comparing it with HDFS and object storage, and highlighting its performance, integration, and community ecosystem.

Big Data · JuiceFS · Lakehouse
0 likes · 19 min read
vivo Internet Technology
Apr 20, 2022 · Backend Development

FastDFS Overview: Principles, Architecture, Upload/Download Process, Synchronization, and Storage Management

FastDFS is a lightweight, open‑source distributed file system written in C that uses a three‑component architecture—client, tracker server for load‑balancing and discovery, and storage servers with push‑based binlog replication—to handle high‑concurrency upload/download of small to medium files, support group‑wide synchronization, optional trunk storage, Nginx anti‑leech integration, and extensible deduplication via FastDHT.

Nginx Module · Synchronization · Upload Download
0 likes · 15 min read
Bilibili Tech
Mar 30, 2022 · Big Data

HDFS Architecture, Optimizations, and Future Plans at Bilibili

Bilibili’s HDFS now runs a three‑tier architecture—access, metadata, and data layers—enhanced with a custom MergeFS router, observer NameNode, dynamic load balancing, fast‑failover pipelines, and storage‑aware policies, while future work targets transparent erasure coding, tiered data routing, lock refinements, and a Hadoop 3.x migration.

Big Data · HDFS · Metadata Scaling
0 likes · 22 min read
Architecture Digest
Dec 28, 2021 · Big Data

HDFS Overview: Architecture, Features, Data Management and Storage Policies

This article provides a comprehensive overview of HDFS, covering basic file system concepts, HDFS architecture, high availability, federation, replica placement, storage policies, colocation, data integrity, and key design considerations for large‑scale distributed storage.

Big Data · Colocation · HDFS
0 likes · 23 min read