
Design and Architecture of JuiceFS: A Cloud‑Native Distributed File System

This article reviews the evolution of file storage, outlines the challenges of the cloud era, and details JuiceFS's design philosophy, architecture, key capabilities, and real‑world use cases such as Kubernetes, AI, big‑data analytics, and NAS migration to the cloud.

DataFunSummit

JuiceFS is a cloud‑native distributed file system that has developed rapidly since its open‑source release in early 2021. The presentation first revisits the history of file storage, from LAN‑era hardware appliances through the internet boom and mobile internet to the cloud‑native era, highlighting the limitations of each generation.

It then discusses the pain points of the cloud era, such as the trade‑offs of traditional NAS, the scalability constraints of early software‑defined storage, and the reduced feature set and eventual consistency of object storage services like S3.

JuiceFS is designed as a cloud‑optimized file system that combines POSIX compatibility, HDFS API support, and S3‑compatible access, while offering strong consistency, high availability, and efficient handling of the massive numbers of small files that AI and big‑data workloads rely on.

Key architectural choices include a plugin‑based engine with separate metadata and data engines, support for multiple metadata back‑ends (Redis, MySQL, PostgreSQL, TiKV, SQLite, BadgerDB, etc.), and leveraging existing object storage for data persistence. The client exposes four access methods: FUSE (POSIX), Java SDK (HDFS), CSI driver (Kubernetes), and an S3 gateway.
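The separation of metadata and data engines can be illustrated with a small sketch. This is a toy model under stated assumptions, not JuiceFS's actual code: the names `MetaEngine`, `InMemoryMeta`, `InMemoryObjectStore`, `open_meta`, and `ToyFileSystem`, the `memory://` URL scheme, and the 4-byte block size are all hypothetical and chosen only to keep the example readable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol
from urllib.parse import urlparse

BLOCK_SIZE = 4  # bytes; deliberately tiny (an assumption for this demo)


class MetaEngine(Protocol):
    """What the toy file system needs from any metadata back-end."""

    def set_blocks(self, path: str, keys: List[str]) -> None: ...
    def get_blocks(self, path: str) -> List[str]: ...


@dataclass
class InMemoryMeta:
    """Hypothetical stand-in for a real back-end such as Redis or MySQL."""

    files: Dict[str, List[str]] = field(default_factory=dict)

    def set_blocks(self, path: str, keys: List[str]) -> None:
        self.files[path] = list(keys)

    def get_blocks(self, path: str) -> List[str]:
        return self.files[path]


@dataclass
class InMemoryObjectStore:
    """Hypothetical stand-in for an object storage bucket."""

    objects: Dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]


# Plugin registry: URL scheme -> factory. New back-ends are added here
# without touching the file system logic.
META_ENGINES: Dict[str, Callable[[], MetaEngine]] = {"memory": InMemoryMeta}


def open_meta(url: str) -> MetaEngine:
    """Pick a metadata engine from the scheme of a connection URL."""
    scheme = urlparse(url).scheme
    if scheme not in META_ENGINES:
        raise ValueError(f"no metadata engine registered for {scheme!r}")
    return META_ENGINES[scheme]()


class ToyFileSystem:
    """File contents go to the object store; the block list goes to metadata."""

    def __init__(self, meta: MetaEngine, store: InMemoryObjectStore) -> None:
        self.meta = meta
        self.store = store

    def write(self, path: str, data: bytes) -> None:
        keys = []
        for offset in range(0, len(data), BLOCK_SIZE):
            key = f"{path}#{offset // BLOCK_SIZE}"
            self.store.put(key, data[offset:offset + BLOCK_SIZE])
            keys.append(key)
        self.meta.set_blocks(path, keys)

    def read(self, path: str) -> bytes:
        return b"".join(self.store.get(k) for k in self.meta.get_blocks(path))
```

In this sketch the file system never depends on a concrete back-end: swapping the metadata store means registering another factory in `META_ENGINES`, which loosely mirrors the plugin idea described above.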

JuiceFS also provides observability through detailed access logs and tooling to diagnose performance issues, aiming to improve both developer experience and operational maintenance.
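As a rough illustration of how an access log can help diagnose performance issues, the sketch below aggregates per-operation latency. The line format is an assumption made for this example (`... [uid:..,gid:..,pid:..] <op> (<args>): <result> <seconds>`, with the latency in angle brackets at the end); the helper name `summarize` is hypothetical, and the real log layout should be checked against the JuiceFS version in use.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed line shape (not guaranteed to match any specific JuiceFS release):
#   <date> <time> [uid:0,gid:0,pid:1] <op> (<args>): <result> <seconds>
LINE_RE = re.compile(
    r"\]\s+(?P<op>\w+)\s+\(.*\):\s+.*<(?P<secs>[0-9.]+)>\s*$"
)


def summarize(lines: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Return {operation: (call count, mean latency in seconds)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        entry = totals[match.group("op")]
        entry[0] += 1
        entry[1] += float(match.group("secs"))
    return {op: (count, total / count) for op, (count, total) in totals.items()}
```

Sorting the result by mean latency or call count gives a quick first view of which operations dominate a slow workload.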

Typical use cases highlighted are Kubernetes persistent volumes, AI pipelines with transparent caching, big‑data processing with billions of files, and migration of traditional NAS workloads to the cloud across various industries.

Tags: cloud-native, Big Data, AI, kubernetes, storage, distributed file system, JuiceFS
Written by DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
