
NebulasFs: A Distributed High‑Availability Small‑File Storage System Developed by 360 Infrastructure Team

NebulasFs is an in‑house distributed, highly available, persistent storage system built to store billions of small files efficiently. It offers simple RESTful APIs, automatic request routing, multi‑tenant isolation, customizable replication, automated scaling and rebalancing, and fault‑tolerant replica recovery for large‑scale unstructured data workloads.

360 Quality & Efficiency

Jun 11, 2019


Background: Rapid business growth at the company generated massive amounts of unstructured data such as images, documents, audio, and video, especially with the rise of mobile internet, AI, and IoT devices. Storing these tiny files cost‑effectively became a challenge, prompting the development of an in‑house small‑file storage service.

NebulasFs Overview: Inspired by Facebook's Haystack paper, NebulasFs is a distributed, high‑availability, highly reliable, and persistent small‑file storage system capable of handling hundreds of billions of files.
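The core idea usually associated with Haystack is to pack many small files into one large append‑only volume and keep a compact in‑memory index of where each file sits, so a read costs one lookup and one seek. The sketch below illustrates that idea only; the `Volume` class, the header layout, and the integer keys are assumptions made for illustration and are not NebulasFs's actual on‑disk format.

```python
import io
import struct

# Hypothetical record header: 8-byte key + 4-byte payload length, big-endian.
HEADER = struct.Struct(">QI")


class Volume:
    """Append-only container that packs many small files into one byte stream."""

    def __init__(self, stream=None):
        # An in-memory stream stands in for a large on-disk volume file.
        self.stream = stream if stream is not None else io.BytesIO()
        self.index = {}  # key -> (payload offset, payload size)

    def put(self, key, payload):
        # Always append at the end; existing records are never rewritten.
        self.stream.seek(0, io.SEEK_END)
        start = self.stream.tell()
        self.stream.write(HEADER.pack(key, len(payload)))
        self.stream.write(payload)
        self.index[key] = (start + HEADER.size, len(payload))

    def get(self, key):
        offset, size = self.index[key]  # raises KeyError for unknown keys
        self.stream.seek(offset)
        return self.stream.read(size)


volume = Volume()
volume.put(1, b"tiny image bytes")
volume.put(2, b"short document")
print(volume.get(2))      # b'short document'
print(len(volume.index))  # 2
```

Because the index holds only an offset and a size per file, it stays small enough to keep in memory even when a volume contains a very large number of files.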

Architecture Design: The system has two main roles, Master and Datanode. The Master handles metadata, cluster administration, and task scheduling; it relies on an external consistency service such as etcd and runs as one primary with multiple replicas. Datanodes serve user requests, and each contains a Volume storage module and a Proxy module that routes requests.
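One way to picture the Proxy module is as a lookup followed by either a local read or a forward to the Datanode that owns the volume. The following is a minimal sketch under stated assumptions: the `"<volume_id>,<key>"` file ID layout, the `volume_map` table (assumed to be supplied by the Master), and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field


def parse_file_id(file_id):
    # Hypothetical ID layout "<volume_id>,<key>", for example "7,42".
    volume_id, key = file_id.split(",")
    return int(volume_id), int(key)


@dataclass
class Datanode:
    name: str
    # volume_id -> {file_key: payload}; stands in for the Volume storage module.
    volumes: dict = field(default_factory=dict)
    # volume_id -> name of a Datanode holding it; assumed to come from the Master.
    volume_map: dict = field(default_factory=dict)
    # name -> Datanode; stands in for network connections to other nodes.
    peers: dict = field(default_factory=dict)

    def handle_get(self, file_id):
        """Proxy step: serve locally, or forward to the owning Datanode."""
        volume_id, key = parse_file_id(file_id)
        if volume_id in self.volumes:
            return self.name, self.volumes[volume_id][key]
        owner = self.peers[self.volume_map[volume_id]]
        return owner.handle_get(file_id)


a = Datanode("dn-a", volumes={1: {10: b"a"}})
b = Datanode("dn-b", volumes={7: {42: b"hello"}})
for node in (a, b):
    node.volume_map = {1: "dn-a", 7: "dn-b"}
    node.peers = {"dn-a": a, "dn-b": b}

# The client asks dn-a, which does not hold volume 7 and forwards to dn-b.
print(a.handle_get("7,42"))  # ('dn-b', b'hello')
```

The client never consults the volume map itself, which is the point of routing on the Datanode side: any node can accept any request.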

Features:

Simple, lightweight, and generic HTTP RESTful APIs for PUT/GET operations, eliminating the need for custom clients.

Full request proxy and automatic routing on Datanodes, allowing a single request to reach the correct storage node without client‑side metadata handling.

Multi‑tenant resource pools that physically isolate resources to prevent cross‑tenant impact, supporting various pool scopes from cross‑data‑center to within a single rack.

Customizable multi‑replica storage schemes with five defined fault‑domain levels (data center, rack row, rack, server, disk) to meet different availability requirements.

Automated capacity expansion and load‑based scaling: when new Datanodes are added to absorb more traffic, data is rebalanced onto them automatically.

Automated replica repair and replenishment driven by the Master: a lost replica is first recreated as a read‑only copy and only then becomes fully writable.
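The replica repair feature in the list above can be sketched as a small state transition: a replacement copy is created read‑only, filled from a surviving replica, and promoted to writable once it matches. This is an illustrative sketch, not the Master's actual logic; the `State` values, the `Replica` type, and `repair_volume` are all hypothetical names, and copying a dictionary stands in for transferring volume data between nodes.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    READ_ONLY = "read_only"
    WRITABLE = "writable"
    LOST = "lost"


@dataclass
class Replica:
    node: str
    state: State = State.WRITABLE
    data: dict = field(default_factory=dict)


def repair_volume(replicas, target, spare_nodes):
    """Replace lost replicas until `target` healthy copies exist."""
    healthy = [r for r in replicas if r.state is not State.LOST]
    if not healthy:
        raise RuntimeError("no surviving replica to copy from")
    source = healthy[0]
    spares = list(spare_nodes)
    while len(healthy) < target and spares:
        # Step 1: the new copy starts read-only while it is being filled.
        new = Replica(node=spares.pop(0), state=State.READ_ONLY)
        new.data = dict(source.data)
        # Step 2: promote only after the copy matches the source.
        if new.data == source.data:
            new.state = State.WRITABLE
        healthy.append(new)
    return healthy


replicas = [
    Replica("dn-a", data={42: b"x"}),
    Replica("dn-b", State.LOST),
    Replica("dn-c", data={42: b"x"}),
]
repaired = repair_volume(replicas, target=3, spare_nodes=["dn-d"])
print([(r.node, r.state.value) for r in repaired])
# [('dn-a', 'writable'), ('dn-c', 'writable'), ('dn-d', 'writable')]
```

Keeping the new copy read‑only during the fill means it can never accept a write that the other replicas have not seen.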

Fault‑Domain and Resource Pool Relationship: Diagrams in the original article show how servers, racks, rows, and data centers map to logical fault domains and physical resource pools, keeping the management path separate from the user request path.
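A fault‑domain level can be read as a rule that no two replicas of a file may share the same domain at that level. The sketch below shows one simple way to apply such a rule within a resource pool, assuming each disk is described by a five‑part location path matching the five levels named in the feature list. The greedy selection, the `LEVELS` identifiers, and the function names are assumptions for illustration; the article does not describe the real placement algorithm.

```python
LEVELS = ("datacenter", "row", "rack", "server", "disk")


def domain_of(location, level):
    """Prefix of the location path that identifies the fault domain at `level`."""
    return location[: LEVELS.index(level) + 1]


def place_replicas(disks, level, count):
    """Greedy pick of `count` disks, no two sharing a fault domain at `level`."""
    if count <= 0:
        return []
    chosen, used = [], set()
    for location in disks:
        domain = domain_of(location, level)
        if domain not in used:
            chosen.append(location)
            used.add(domain)
        if len(chosen) == count:
            return chosen
    raise ValueError(f"fewer than {count} distinct {level} domains available")


# A hypothetical resource pool confined to one data center.
pool = [
    ("dc1", "row1", "rack1", "srv1", "d1"),
    ("dc1", "row1", "rack1", "srv2", "d1"),
    ("dc1", "row1", "rack2", "srv3", "d1"),
    ("dc1", "row2", "rack3", "srv4", "d1"),
]

# Three replicas that must land in three different racks.
print([loc[3] for loc in place_replicas(pool, "rack", 3)])
# ['srv1', 'srv3', 'srv4']
```

Asking this pool for two replicas at the `datacenter` level raises `ValueError`, which reflects that a pool scoped to a single data center cannot satisfy a cross‑data‑center replication scheme.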

Conclusion: NebulasFs has been in production internally for nearly a year, also serving as the backend storage for an object storage service compatible with the AWS S3 protocol, and will continue to evolve to support growing business demands.


