
Overview and Architecture of Hadoop Distributed File System (HDFS)

This article provides a comprehensive overview of Hadoop Distributed File System (HDFS), detailing its design goals, architecture components such as NameNode, DataNode and SecondaryNameNode, data block handling, replication strategies, communication protocols, and the read, write, and delete processes.


HDFS (Hadoop Distributed File System) is designed to store massive data sets across many commodity machines while providing high reliability and high throughput. Because it runs on commodity hardware, where node failures are the norm rather than the exception, fault detection and automatic recovery are built into its design.

Design Premises and Goals

Store extremely large files: support GB-scale files, high bandwidth, and scale to hundreds or thousands of nodes; a single instance can manage tens of millions of files.

Streaming data access: optimized for batch processing, emphasizing high throughput rather than low-latency interactive queries.

Fault tolerance: robust redundancy mechanisms.

Simple consistency model: write-once-read-many; files rarely change after being written.

Move computation to data: APIs allow computation to run on the node closest to the data.

Portability: runs across heterogeneous hardware and software platforms.

Unsuitable Scenarios

Large numbers of small files – metadata stored in NameNode memory would be exhausted.

Low‑latency data access – HDFS is built for high throughput, not fast response.

Multiple users writing or arbitrarily modifying files.

HDFS Architecture

HDFS consists of three main components: NameNode, SecondaryNameNode, and DataNode. It follows a master/slave model where the NameNode and SecondaryNameNode run on master nodes and DataNodes run on slaves.

Data Blocks

Files are split into fixed-size blocks (64 MB by default in early Hadoop releases; 128 MB from Hadoop 2.x onward). Large blocks reduce the ratio of seek time to transfer time, but overly large blocks can limit parallelism for MapReduce jobs, since a map task typically processes one block. Blocks are stored on DataNodes and replicated (default replication factor = 3) for fault tolerance. The block abstraction:

Supports arbitrarily large files without being limited by a single node's disk capacity.

Simplifies the storage subsystem: metadata stays in the NameNode, block data stays on DataNodes.

Facilitates backup, high availability, and load balancing.
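The block arithmetic is simple to sketch. A minimal illustration in Python (the 64 MB constant mirrors the classic default; `split_into_blocks` is a hypothetical helper, not part of any Hadoop API):

```python
# Hypothetical sketch: how a file maps onto fixed-size blocks.
BLOCK_SIZE = 64 * 1024 * 1024  # classic 64 MB default

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (block_index, offset, length) tuples covering the file."""
    blocks, offset, index = [], 0, 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((index, offset, length))
        offset += length
        index += 1
    return blocks

# A 200 MB file occupies four blocks; the last holds only 8 MB, and
# unlike an ordinary filesystem block it consumes only 8 MB of disk.
blocks = split_into_blocks(200 * 1024 * 1024)
```

Note that a final partial block does not waste a full block of storage; only the metadata cost is per-block.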

NameNode

The NameNode manages the namespace and metadata (file‑to‑block mapping, directory hierarchy, permissions, etc.). It does not store the actual block data; that is the responsibility of DataNodes. Metadata is kept in memory and persisted to two files:

fsimage – a snapshot of the namespace.

edits – a log of recent changes.

During startup, the NameNode loads fsimage into memory and replays edits . The metadata footprint is roughly 200 bytes per file or block object, which is why HDFS favors a small number of large files over many small ones.
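Using that per-object figure, the small-files problem is easy to quantify. A back-of-envelope sketch (the 200-byte constant comes from the estimate above; the workloads compared are assumed for illustration):

```python
BYTES_PER_OBJECT = 200  # rough metadata cost per file or block object

def namenode_heap_estimate(num_files, blocks_per_file):
    """Approximate NameNode heap used: one object per file plus its blocks."""
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT

# 100 TB stored as 100 million 1 MB files (one block each): ~40 GB of heap.
small_files = namenode_heap_estimate(100_000_000, 1)

# The same 100 TB as 1,600 files of 64 GB (1,024 blocks each): ~0.33 GB.
large_files = namenode_heap_estimate(1_600, 1_024)
```

The same volume of data costs two orders of magnitude more NameNode memory when chopped into small files, which is the concrete reason behind the "unsuitable scenarios" caveat above.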

SecondaryNameNode

It is not a standby NameNode. Its role is to periodically merge fsimage and edits into a new checkpoint, reducing NameNode restart time. The merge steps are:

1. Instruct the NameNode to log new operations to edits.new .
2. Fetch the current fsimage and edits .
3. Merge them into a new fsimage .
4. Send the new fsimage back to the NameNode, which replaces the old files.
5. Update the fstime checkpoint file.
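The cycle can be illustrated with a toy model, where fsimage is a snapshot dictionary and edits is a list of logged operations (the real on-disk formats are binary; these names are stand-ins):

```python
def checkpoint(fsimage, edits):
    """Replay the edit log onto a copy of the snapshot (the merge step).
    Returns the new fsimage and an empty edit log."""
    new_image = dict(fsimage)
    for kind, path, value in edits:
        if kind == "create":
            new_image[path] = value
        elif kind == "delete":
            new_image.pop(path, None)
    return new_image, []  # the NameNode keeps logging to a fresh edits.new

fsimage = {"/logs/a": "blk_1"}
edits = [("create", "/logs/b", "blk_2"), ("delete", "/logs/a", None)]
fsimage, edits = checkpoint(fsimage, edits)
```

The point of the design is visible here: replaying a short edits list onto a recent snapshot is far cheaper at restart than replaying months of operations from scratch.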

DataNode

DataNodes store block replicas and serve read/write requests from clients. They send periodic heartbeats and block reports to the NameNode. When a client reads a file, the NameNode returns the locations of block replicas; the client then contacts the nearest DataNode directly.

Data Replication and Placement

With a default replication factor of 3, HDFS places the first replica on the writer's own node (or a random node if the client is outside the cluster), the second replica on a node in a different rack, and the third on a different node in that same remote rack. This balances write traffic across racks against fault tolerance. During reads, the closest replica is chosen.
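A sketch of that default policy, assuming a flat rack map (`choose_targets` is a hypothetical helper, not Hadoop's pluggable BlockPlacementPolicy API):

```python
import random

def choose_targets(rack_map, writer_node):
    """Pick three replica targets: the writer's own node, then two
    different nodes that share a rack other than the writer's."""
    local_rack = rack_map[writer_node]
    remote_racks = sorted(set(rack_map.values()) - {local_rack})
    remote_rack = random.choice(remote_racks)
    remote_nodes = [n for n, r in rack_map.items() if r == remote_rack]
    second, third = random.sample(remote_nodes, 2)  # two distinct nodes
    return [writer_node, second, third]

rack_map = {"dn1": "rackA", "dn2": "rackA", "dn3": "rackB", "dn4": "rackB"}
targets = choose_targets(rack_map, "dn1")
```

Two replicas share the remote rack so that a whole-rack failure never loses the block, while only one cross-rack transfer is needed on the write path.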

Safe Mode

When the NameNode starts, it enters safe mode, waiting for a configurable percentage of blocks to achieve the minimum replication before exiting. While in safe mode the filesystem is read‑only.
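The exit condition can be sketched as a simple fraction check; the 0.999 threshold mirrors the commonly cited default for `dfs.namenode.safemode.threshold-pct` (the property name and default vary across Hadoop versions, so treat the figure as an assumption):

```python
def can_leave_safe_mode(replica_counts, min_replication=1, threshold=0.999):
    """True once the fraction of blocks with at least min_replication
    live replicas reaches the configured threshold."""
    if not replica_counts:
        return True
    satisfied = sum(1 for c in replica_counts if c >= min_replication)
    return satisfied / len(replica_counts) >= threshold

# 999 of 1,000 blocks reported: exactly at the threshold, so we may exit.
ready = can_leave_safe_mode([1] * 999 + [0])
```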

Communication Protocols

All communication uses TCP/IP. Clients use the ClientProtocol RPC to talk to NameNode; DataNodes use the DataNodeProtocol . The NameNode never initiates connections; it only responds to requests.

Reliability Guarantees

DataNode failures are detected via missed heartbeats; blocks that fall below their replication factor are re-replicated onto other nodes.

Corrupted blocks are detected by checksum mismatches during reads; the client falls back to a healthy replica and reports the corruption so the NameNode can schedule re-replication.

HDFS File Read Process

A client obtains a FileSystem instance and calls open() . The client contacts the NameNode to get block locations, then streams data directly from the nearest DataNode. The process can be summarized as:

1. Client sends read request.
2. NameNode returns block location list.
3. Client reads data directly from DataNodes.
4. Client closes the stream.

If a DataNode fails during a read, the client retries the next-closest replica and remembers the failed node so later block reads skip it.
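The four steps plus failover can be sketched as a toy client, with plain dictionaries standing in for the NameNode and DataNode RPCs (none of these names are real Hadoop APIs):

```python
def read_file(path, namenode, datanodes):
    """Fetch each block from the first reachable replica, remembering
    failed DataNodes so later blocks skip them."""
    data, dead = b"", set()
    for block_id, locations in namenode[path]:       # step 2: location list
        for node in locations:                       # nearest replica first
            if node in dead:
                continue
            if block_id in datanodes.get(node, {}):  # step 3: direct read
                data += datanodes[node][block_id]
                break
            dead.add(node)                           # mark the failed node
        else:
            raise IOError(f"all replicas of {block_id} unreachable")
    return data

namenode = {"/f": [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2", "dn3"])]}
datanodes = {"dn2": {"blk_1": b"AB", "blk_2": b"CD"}, "dn3": {}}
content = read_file("/f", namenode, datanodes)       # dn1 fails; dn2 serves
```

Note that block data never passes through the NameNode: it only hands out locations, which is what keeps it from becoming a bandwidth bottleneck.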

HDFS File Write Process

The client creates a DistributedFileSystem instance and calls create() . After permission checks, the NameNode logs the creation in edits and returns an FSDataOutputStream to the client. The write pipeline works as follows:

1. Client writes data to a local temporary file until block size is reached.
2. Client requests DataNode locations from NameNode.
3. NameNode returns a list of DataNodes for the new block.
4. Client streams the block to the first DataNode, which forwards it to the second, then third (pipeline).
5. Each DataNode acknowledges receipt; the first DataNode acknowledges the client.
6. On successful write, the client notifies NameNode.
7. If any DataNode fails, the pipeline is re‑established with remaining nodes.
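Steps 4, 5, and 7 can be sketched with a toy pipeline, where lists stand in for DataNode storage (real HDFS streams small packets with checksums, not whole blocks):

```python
def pipeline_write(packet, pipeline, datanodes, alive):
    """Forward one packet down the DataNode chain, dropping failed nodes
    (step 7), then return the acks in the order they flow back upstream."""
    live = [node for node in pipeline if node in alive]  # rebuilt pipeline
    for node in live:                # step 4: each node stores and forwards
        datanodes[node].append(packet)
    return list(reversed(live))      # step 5: acks travel back to the client

datanodes = {"dn1": [], "dn2": [], "dn3": []}
alive = {"dn1", "dn3"}                               # dn2 failed mid-write
acks = pipeline_write(b"pkt-0", ["dn1", "dn2", "dn3"], datanodes, alive)
```

The under-replicated block left behind by the failed node is later detected via block reports, and the NameNode schedules an extra replica elsewhere.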

HDFS File Delete Process

Deletion is a three‑step operation:

1. NameNode moves the file to /trash (fast metadata rename).
2. After the configured retention period, the file is removed from the namespace.
3. Block replicas are marked for garbage collection, freeing space.
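The three steps map onto a small sketch (the six-hour retention is an assumed `fs.trash.interval`-style setting, not a fixed default):

```python
RETENTION = 6 * 60 * 60  # assumed trash retention period, in seconds

def delete(namespace, trash, path, now):
    """Step 1: a fast metadata move of the file's blocks into trash."""
    trash[path] = (namespace.pop(path), now)

def expunge(trash, freed_blocks, now, retention=RETENTION):
    """Steps 2 and 3: past retention, drop the entry and free its blocks."""
    for path, (blocks, deleted_at) in list(trash.items()):
        if now - deleted_at >= retention:
            freed_blocks.extend(blocks)  # replicas now eligible for GC
            del trash[path]

namespace = {"/f": ["blk_1", "blk_2"]}
trash, freed = {}, []
delete(namespace, trash, "/f", now=0)
expunge(trash, freed, now=RETENTION)
```

Because step 1 is only a rename, deletion returns immediately; disk space is reclaimed lazily, and a file can be restored from trash at any point before expunging.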

The article concludes with references to the original CSDN source.

Written by Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.