How to Plan, Configure, and Launch a Hadoop 3.3.5 Cluster on Three Nodes
This guide walks through planning a three‑node Hadoop 3.3.5 cluster, explains the default and custom configuration files, details the core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml settings, and shows how to distribute the configuration, start HDFS and YARN, and run basic file‑system tests.
1. Cluster Planning
Deploy Hadoop version 3.3.5 on three hardware nodes. Hadoop configuration files are divided into default files and custom files.
Key Knowledge Points
HDFS is the open‑source implementation of Google File System (GFS); MapReduce implements Google’s MapReduce; HBase is the open‑source version of BigTable.
Hadoop 2.0+ consists of four major components: HDFS, MapReduce, YARN (Yet Another Resource Negotiator), and Hadoop Common.
The two most prominent features of Hadoop are its distributed architecture and fault‑tolerance mechanisms.
Hadoop follows a master‑slave structure for both computation and storage.
YARN scheduling: the ResourceManager receives a job request from a client and assigns it to a worker node; the NodeManager on that node launches and supervises the job's processes, while the ResourceManager manages cluster‑wide resource allocation.
HDFS storage: the NameNode stores only metadata; DataNodes store the actual data blocks. Clients first query the NameNode for block locations, then retrieve data from the appropriate DataNode.
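The NameNode/DataNode split described above can be observed with `hdfs fsck`, which asks the NameNode for a file's block metadata and the DataNodes that hold each replica. The path below is hypothetical, and the sketch only prints the command; on a live cluster, run it without the echo.

```shell
# Hypothetical file path from the tests later in this guide.
FILE=/test/kk.txt
# fsck queries NameNode metadata only: block IDs and their DataNode locations.
CMD="hdfs fsck $FILE -files -blocks -locations"
echo "$CMD"
```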
2. Default Configuration Files
After extracting the Hadoop package, the default configuration files (core-default.xml, hdfs-default.xml, yarn-default.xml, and mapred-default.xml) are bundled inside the JAR files under hadoop-3.3.5/share/hadoop.
3. Custom Configuration Files
The four main custom XML files are core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml, stored in $HADOOP_HOME/etc/hadoop. Modify them according to project requirements.
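Before editing, it can help to back up the four site files. The directory below assumes this guide's install path; the loop is a dry run that only prints each copy — drop the echo to perform the backups.

```shell
# Conf dir assumed from this guide's layout; override via HADOOP_CONF_DIR.
CONF_DIR="${HADOOP_CONF_DIR:-/opt/module/hadoop-3.3.5/etc/hadoop}"
BACKED_UP=0
for f in core-site hdfs-site yarn-site mapred-site; do
  # Dry run: prints the cp command instead of executing it.
  echo cp "$CONF_DIR/$f.xml" "$CONF_DIR/$f.xml.bak"
  BACKED_UP=$((BACKED_UP + 1))
done
```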
4. Cluster Configuration
4.1 Core Configuration (core-site.xml)
<code>[antares@hadoop102 ~]$ cd $HADOOP_HOME/etc/hadoop
[antares@hadoop102 hadoop]$ vim core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Specify NameNode address -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop102:8020</value>
</property>
<!-- Specify Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.3.5/data</value>
</property>
<!-- Set static user for HDFS web UI -->
<property>
<name>hadoop.http.staticuser.user</name>
<value>antares</value>
</property>
</configuration>
</code>
4.2 HDFS Configuration (hdfs-site.xml)
<code>[antares@hadoop102 hadoop]$ vim hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop102:9870</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop104:9868</value>
</property>
</configuration>
</code>
4.3 YARN Configuration (yarn-site.xml)
<code>[antares@hadoop102 hadoop]$ vim yarn-site.xml
<configuration>
<!-- Enable MapReduce shuffle -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- ResourceManager hostname -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop103</value>
</property>
<!-- Inherit environment variables -->
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
</code>
4.4 MapReduce Configuration (mapred-site.xml)
<code>[antares@hadoop102 hadoop]$ vim mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
</code>
4.5 Distribute Configuration Files
Copy the edited XML files to all nodes (e.g., using scp) and ensure the original files are removed or renamed to avoid conflicts.
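A minimal distribution loop, assuming the hostnames, user, and paths used in this guide. It is a dry run that prints each scp command; remove the echo to actually copy the files.

```shell
# Hosts and conf dir are assumptions matching this guide's cluster layout.
CONF_DIR=/opt/module/hadoop-3.3.5/etc/hadoop
COPIES=0
for h in hadoop103 hadoop104; do
  # Dry run: prints the scp command instead of executing it.
  echo scp "$CONF_DIR"/*.xml "antares@$h:$CONF_DIR/"
  COPIES=$((COPIES + 1))
done
```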
5. Configure Workers
<code>[antares@hadoop102 hadoop]$ vim /opt/module/hadoop-3.3.5/etc/hadoop/workers
hadoop102
hadoop103
hadoop104
</code>
6. Start the Cluster
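The start-dfs.sh and start-yarn.sh scripts log in to every host listed in the workers file over SSH, so passwordless SSH must already be configured. A quick pre-flight check, sketched as a dry run (drop the echo to probe each node for real):

```shell
# Hostnames assumed from the workers file above.
CHECKED=0
for h in hadoop102 hadoop103 hadoop104; do
  # Dry run: prints the ssh probe instead of executing it.
  echo ssh "$h" hostname
  CHECKED=$((CHECKED + 1))
done
```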
6.1 First‑time NameNode Formatting
On the NameNode (hadoop102), format the NameNode. Note that formatting generates a new cluster ID; if you need to re‑format later, first stop all NameNode and DataNode processes, delete the data and logs directories on every machine, then format again.
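The pre‑re‑format cleanup can be sketched as the loop below. It is destructive (it wipes all HDFS data and logs), so this dry run only prints the commands; remove the echo once you are certain. Hosts and the install path follow this guide's layout.

```shell
# WARNING: the printed commands delete all HDFS block data and logs.
HADOOP_DIR=/opt/module/hadoop-3.3.5
WIPED=0
for h in hadoop102 hadoop103 hadoop104; do
  # Dry run: prints the remote cleanup instead of executing it.
  echo ssh "$h" rm -rf "$HADOOP_DIR/data" "$HADOOP_DIR/logs"
  WIPED=$((WIPED + 1))
done
```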
<code>[antares@hadoop102 hadoop-3.3.5]$ pwd
/opt/module/hadoop-3.3.5
[antares@hadoop102 hadoop-3.3.5]$ hdfs namenode -format
</code>
6.2 Start HDFS
<code>[antares@hadoop102 hadoop-3.3.5]$ sbin/start-dfs.sh
</code>
6.3 Start YARN
Run on the ResourceManager node (hadoop103):
<code>[antares@hadoop103 hadoop-3.3.5]$ sbin/start-yarn.sh
</code>
6.4 Verify via Web UI
HDFS NameNode UI: http://hadoop102:9870
YARN ResourceManager UI: http://hadoop103:8088
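Besides the web UIs, you can verify the daemons with `jps` on each node. The layout below is what the configs above imply (NameNode on hadoop102, ResourceManager on hadoop103, SecondaryNameNode on hadoop104) — an assumption to compare your `jps` output against, not captured output:

```shell
# Expected daemon layout derived from the site files in section 4.
LAYOUT=$(cat <<'EOF'
hadoop102: NameNode DataNode NodeManager
hadoop103: ResourceManager DataNode NodeManager
hadoop104: SecondaryNameNode DataNode NodeManager
EOF
)
echo "$LAYOUT"
```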
7. Basic Cluster Tests
Upload Small File
<code>[antares@hadoop102 ~]$ hadoop fs -mkdir /test
[antares@hadoop102 ~]$ hadoop fs -put $HADOOP_HOME/testinput/kk.txt /test
</code>
If the file is missing, locate it under $HADOOP_HOME and retry.
Upload Large File
<code>[antares@hadoop102 ~]$ hadoop fs -put /opt/software/jdk-8u391-linux-x64.tar.gz /test
</code>
The large file is replicated three times by default.
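The replication factor (3 by default, via dfs.replication) can be read back with `hadoop fs -stat %r`. The sketch below only prints the command; run it without the echo on the live cluster.

```shell
# %r in the -stat format string prints a file's replication factor.
CMD="hadoop fs -stat %r /test/jdk-8u391-linux-x64.tar.gz"
echo "$CMD"
```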
Check Storage Path
<code>[antares@hadoop102 subdir0]$ pwd
/opt/module/hadoop-3.3.5/data/dfs/data/current/BP-1445008223-192.168.193.161-1706011370209/current/finalized/subdir0/subdir0
</code>
Inspect the block files (e.g., blk_1073741826) to see how Hadoop splits large files.
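Any file larger than the 128 MB default block size (dfs.blocksize) is split, so the JDK archive above spans multiple block files. Concatenating them in order reproduces the original archive; the second block name below is illustrative, so check your own directory listing. The dry run only prints the command.

```shell
# blk_1073741827 is a hypothetical second block ID for illustration.
CMD="cat blk_1073741826 blk_1073741827 > jdk.tar.gz"
echo "$CMD"
```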
After confirming replication on nodes hadoop103 and hadoop104, the cluster setup is complete.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles focused on operations transformation, aiming to accompany you throughout your operations career.