Understanding HDFS: Architecture, Data Blocks, Fault Tolerance, and High Availability
This article explains how HDFS, the Hadoop Distributed File System, splits large files into blocks, replicates them for fault tolerance, organizes the cluster into NameNode and DataNode components, and provides high‑availability and scalability mechanisms such as standby NameNode and federation, enabling reliable big‑data storage and access.
What is HDFS?
HDFS (Hadoop Distributed File System) is a widely used distributed file system in the big‑data ecosystem that acts as an intermediate layer between applications and multiple server file systems, abstracting away the underlying servers and providing reliable read/write operations.
Data Blocks
Large files are divided into fixed‑size blocks (default 128 MB). Each block can be stored on a different server, improving disk‑space utilization and enabling parallel storage.
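The mapping from a file to its blocks can be sketched in a few lines. This is an illustrative helper, not HDFS code; only the 128 MB default block size comes from the article.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the HDFS default


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list:
    """Return (offset, length) pairs describing each block of a file."""
    blocks = []
    offset = 0
    while offset < file_size:
        # The final block may be shorter than block_size and only
        # occupies as much disk as it actually needs.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


# A 300 MB file yields two full 128 MB blocks plus one 44 MB block.
print(split_into_blocks(300 * 1024 * 1024))
```

Because each (offset, length) range is independent, the blocks can be stored on, and later read from, different servers in parallel.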
Fault Tolerance
Each block is replicated across several DataNodes; if a server fails, other replicas ensure the data remains accessible, dramatically increasing the system’s fault‑tolerance.
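The replica mechanism can be illustrated with a small sketch. The replication factor of 3 matches the HDFS default; the function names and node labels are invented for illustration.

```python
import random

REPLICATION = 3  # HDFS default replication factor


def place_replicas(datanodes: list, replication: int = REPLICATION) -> list:
    """Pick distinct DataNodes to hold copies of one block."""
    return random.sample(datanodes, replication)


def read_block(replicas: list, alive: set) -> str:
    """Return the first live replica; any single server failure is tolerated."""
    for node in replicas:
        if node in alive:
            return node
    raise IOError("all replicas lost")


nodes = ["dn1", "dn2", "dn3", "dn4"]
replicas = place_replicas(nodes)
# Even if the server holding the first replica is down,
# the block remains readable from another replica:
alive = set(nodes) - {replicas[0]}
print(read_block(replicas, alive))
```

With three replicas, the block stays available through any two simultaneous server failures, which is why replication so dramatically increases fault tolerance.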
HDFS Architecture
The cluster is split into two roles:
NameNode (Master) : manages metadata, directory tree (NameSpace), and the Block Manager that tracks which DataNode holds each block.
DataNode (Slave) : stores the actual block data on local disks and periodically reports status to the NameNode.
Clients interact with the NameNode to obtain block locations, then read/write directly to the appropriate DataNodes.
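The division of labour above can be sketched with simplified stand-in classes (all names here are illustrative, not the real HDFS API):

```python
class NameNode:
    """Tracks the namespace and block locations -- never the data itself."""

    def __init__(self):
        self.block_map = {}  # filename -> [(block_id, datanode), ...]

    def add_file(self, name, blocks):
        self.block_map[name] = blocks

    def get_block_locations(self, name):
        return self.block_map[name]


class DataNode:
    """Stores raw block bytes on local disk (a dict stands in here)."""

    def __init__(self):
        self.blocks = {}  # block_id -> bytes


# The client asks the NameNode where the blocks live, then fetches the
# bytes from DataNodes directly; file data never passes through the NameNode.
nn, dn = NameNode(), DataNode()
dn.blocks["blk_1"] = b"hello"
nn.add_file("/logs/a.txt", [("blk_1", dn)])
data = b"".join(node.blocks[bid] for bid, node in nn.get_block_locations("/logs/a.txt"))
print(data)
```

Keeping bulk data transfer off the NameNode is what lets a single metadata server coordinate a cluster of many DataNodes.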
NameNode Internals
Metadata (NameSpace) and block information are kept in memory for high performance and periodically persisted to disk as fsimage (a snapshot) and editlog (incremental changes) to guarantee durability.
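The snapshot-plus-log idea can be shown concretely: on restart, replaying the editlog over the fsimage recovers the in-memory namespace. The data structures below are simplified stand-ins for illustration.

```python
# Last persisted snapshot of the namespace (fsimage)
fsimage = {"/a": "file", "/b": "file"}
# Incremental changes recorded since that snapshot (editlog)
editlog = [("create", "/c"), ("delete", "/a")]


def recover(fsimage: dict, editlog: list) -> dict:
    """Rebuild the in-memory namespace from snapshot + log."""
    namespace = dict(fsimage)
    for op, path in editlog:
        if op == "create":
            namespace[path] = "file"
        elif op == "delete":
            namespace.pop(path, None)
    return namespace


print(recover(fsimage, editlog))  # {'/b': 'file', '/c': 'file'}
```

Appending to a log is much cheaper than rewriting the full snapshot on every change, which is why HDFS pairs an occasional fsimage with a continuous editlog.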
High Availability
To avoid a single point of failure, a standby NameNode keeps its namespace in sync with the active NameNode by consuming the same fsimage and editlog; if the active node crashes, the standby takes over with minimal interruption.
Scalability – Federation
When a single NameNode becomes a bottleneck, the namespace can be partitioned into multiple independent NameSpaces, each managed by its own NameNode, forming an HDFS Federation that distributes load and improves scalability.
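Federation amounts to routing each path to the NameNode that owns its portion of the namespace. A minimal sketch, with invented mount points and NameNode names, keyed by path prefix:

```python
# Which NameNode owns which slice of the namespace (illustrative values)
mount_table = {
    "/user": "namenode-1",
    "/logs": "namenode-2",
    "/tmp": "namenode-3",
}


def route(path: str) -> str:
    """Pick the NameNode responsible for a path by longest-prefix match."""
    for prefix in sorted(mount_table, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return mount_table[prefix]
    raise KeyError("no namespace volume mounted for " + path)


print(route("/logs/2024/app.log"))  # namenode-2
```

Because each NameNode manages only its own slice of the directory tree, metadata load is spread across several servers instead of concentrating on one.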
Writing a Large File
Client requests the active NameNode to create a file; the NameNode records metadata in the EditLog and updates the NameSpace.
The client obtains a list of DataNodes for the first block; it writes the block to a primary DataNode, which forwards it along a pipeline to the remaining replicas.
After each block is stored, the client notifies the NameNode, which updates timestamps and logs the operation.
The process repeats for subsequent blocks until the file is complete.
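The steps above can be sketched as a pipeline write, where the client sends each block once and replication happens along a chain of DataNodes. Dicts stand in for DataNode storage; the function name is ours.

```python
def write_file(data: bytes, block_size: int, pipelines: list) -> int:
    """Split data into blocks; push each block down its replication pipeline."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for idx, (block, pipeline) in enumerate(zip(blocks, pipelines)):
        primary, *downstream = pipeline
        primary[idx] = block      # the client sends each block only once...
        for store in downstream:  # ...and the primary forwards it down the chain
            store[idx] = block
    return len(blocks)


# Three DataNodes, replication factor 3, a file of two 4-byte blocks:
dn1, dn2, dn3 = {}, {}, {}
n = write_file(b"abcdefgh", 4, [[dn1, dn2, dn3], [dn2, dn3, dn1]])
print(n, dn1, dn2, dn3)
```

Forwarding along a chain means the client's upload bandwidth is spent once per block, not once per replica.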
Reading a Large File
Client asks the NameNode for the file’s metadata; the NameNode returns the block list and their DataNode locations.
For each block, the client connects to a chosen DataNode, requests the block, and verifies integrity via checksum.
Blocks are streamed back, reassembled, and the full file is reconstructed.
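The read path above can be sketched as fetch-verify-reassemble. SHA-256 here is purely illustrative (HDFS actually uses CRC-based block checksums), and all names are invented.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Illustrative checksum; HDFS uses CRC-based checks, not SHA-256."""
    return hashlib.sha256(block).hexdigest()


def read_file(blocks_with_sums: list) -> bytes:
    """Reassemble a file from (block, expected_checksum) pairs."""
    out = bytearray()
    for block, expected in blocks_with_sums:
        if checksum(block) != expected:
            # A corrupted replica fails the check; the client would
            # then retry the same block from another replica.
            raise IOError("block corrupted; try another replica")
        out += block
    return bytes(out)


parts = [b"hello ", b"world"]
stored = [(b, checksum(b)) for b in parts]  # checksums recorded at write time
print(read_file(stored))  # b'hello world'
```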
Summary
HDFS abstracts multiple servers as a single file system, splitting files into blocks and replicating them for reliability.
The cluster consists of a NameNode (metadata manager) and DataNodes (block storage).
Metadata resides in memory for speed and is persisted via fsimage and editlog for durability.
High availability is achieved with a standby NameNode; scalability is addressed through federation of multiple NameNodes.
Clients write and read large files by interacting with the NameNode for metadata and directly with DataNodes for block transfer.