How to Plan, Configure, and Launch a Hadoop 3.3.5 Cluster on Three Nodes
This guide walks through planning a three‑node Hadoop 3.3.5 cluster, explains the default and custom configuration files, details the core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml settings, and shows how to distribute the configuration, start HDFS and YARN, and run basic file‑system tests.
1. Cluster Planning
Deploy Hadoop version 3.3.5 on three hardware nodes. Hadoop configuration files are divided into default files and custom files.
Key Knowledge Points
HDFS is the open‑source implementation of Google File System (GFS); MapReduce implements Google’s MapReduce; HBase is the open‑source version of BigTable.
Hadoop 2.0+ consists of four major components: HDFS, MapReduce, YARN (Yet Another Resource Negotiator), and Hadoop Common.
The two most prominent features of Hadoop are its distributed architecture and fault‑tolerance mechanisms.
Hadoop follows a master‑slave structure for both computation and storage.
YARN scheduling: the ResourceManager receives a job request from a client and assigns it to a worker node; the NodeManager on that node launches and supervises the job's processes, while the ResourceManager manages cluster‑wide resource allocation.
HDFS storage: the NameNode stores only metadata; DataNodes store the actual data blocks. Clients first query the NameNode for block locations, then retrieve data from the appropriate DataNode.
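The NameNode/DataNode split described above can be observed with `hdfs fsck`, which asks the NameNode for a file's block metadata and the DataNodes that hold each replica. The path below is hypothetical, and the sketch only prints the command; on a live cluster, run it without the echo.

```shell
# Hypothetical file path from the tests later in this guide.
FILE=/test/kk.txt
# fsck queries NameNode metadata only: block IDs and their DataNode locations.
CMD="hdfs fsck $FILE -files -blocks -locations"
echo "$CMD"
```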
2. Default Configuration Files
After extracting the Hadoop package, the default configuration files (core-default.xml, hdfs-default.xml, yarn-default.xml, and mapred-default.xml) are bundled inside the JAR files under hadoop-3.3.5/share/hadoop.
3. Custom Configuration Files
The four main custom XML files are core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml, stored in $HADOOP_HOME/etc/hadoop. Modify them according to project requirements.
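Before editing, it can help to back up the four site files. The directory below assumes this guide's install path; the loop is a dry run that only prints each copy — drop the echo to perform the backups.

```shell
# Conf dir assumed from this guide's layout; override via HADOOP_CONF_DIR.
CONF_DIR="${HADOOP_CONF_DIR:-/opt/module/hadoop-3.3.5/etc/hadoop}"
BACKED_UP=0
for f in core-site hdfs-site yarn-site mapred-site; do
  # Dry run: prints the cp command instead of executing it.
  echo cp "$CONF_DIR/$f.xml" "$CONF_DIR/$f.xml.bak"
  BACKED_UP=$((BACKED_UP + 1))
done
```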
4. Cluster Configuration
4.1 Core Configuration (core-site.xml)
<code>[antares@hadoop102 ~]$ cd $HADOOP_HOME/etc/hadoop
[antares@hadoop102 hadoop]$ vim core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Specify NameNode address -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop102:8020</value>
</property>
<!-- Specify Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.3.5/data</value>
</property>
<!-- Set static user for HDFS web UI -->
<property>
<name>hadoop.http.staticuser.user</name>
<value>antares</value>
</property>
</configuration>
</code>
4.2 HDFS Configuration (hdfs-site.xml)
<code>[antares@hadoop102 hadoop]$ vim hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop102:9870</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop104:9868</value>
</property>
</configuration>
</code>
4.3 YARN Configuration (yarn-site.xml)
<code>[antares@hadoop102 hadoop]$ vim yarn-site.xml
<configuration>
<!-- Enable MapReduce shuffle -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- ResourceManager hostname -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop103</value>
</property>
<!-- Inherit environment variables -->
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
</code>
4.4 MapReduce Configuration (mapred-site.xml)
<code>[antares@hadoop102 hadoop]$ vim mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
</code>
4.5 Distribute Configuration Files
Copy the edited XML files to all nodes (e.g., using scp) and ensure the original files are removed or renamed to avoid conflicts.
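A minimal distribution loop, assuming the hostnames, user, and paths used in this guide. It is a dry run that prints each scp command; remove the echo to actually copy the files.

```shell
# Hosts and conf dir are assumptions matching this guide's cluster layout.
CONF_DIR=/opt/module/hadoop-3.3.5/etc/hadoop
COPIES=0
for h in hadoop103 hadoop104; do
  # Dry run: prints the scp command instead of executing it.
  echo scp "$CONF_DIR"/*.xml "antares@$h:$CONF_DIR/"
  COPIES=$((COPIES + 1))
done
```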
5. Configure Workers
<code>[antares@hadoop102 hadoop]$ vim /opt/module/hadoop-3.3.5/etc/hadoop/workers
hadoop102
hadoop103
hadoop104
</code>
6. Start the Cluster
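The start-dfs.sh and start-yarn.sh scripts log in to every host listed in the workers file over SSH, so passwordless SSH must already be configured. A quick pre-flight check, sketched as a dry run (drop the echo to probe each node for real):

```shell
# Hostnames assumed from the workers file above.
CHECKED=0
for h in hadoop102 hadoop103 hadoop104; do
  # Dry run: prints the ssh probe instead of executing it.
  echo ssh "$h" hostname
  CHECKED=$((CHECKED + 1))
done
```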
6.1 First‑time NameNode Formatting
On the NameNode (hadoop102), format the NameNode. Note that formatting generates a new cluster ID; if you need to re‑format later, first stop all NameNode and DataNode processes, delete the data and logs directories on every machine, then format again.
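The pre‑re‑format cleanup can be sketched as the loop below. It is destructive (it wipes all HDFS data and logs), so this dry run only prints the commands; remove the echo once you are certain. Hosts and the install path follow this guide's layout.

```shell
# WARNING: the printed commands delete all HDFS block data and logs.
HADOOP_DIR=/opt/module/hadoop-3.3.5
WIPED=0
for h in hadoop102 hadoop103 hadoop104; do
  # Dry run: prints the remote cleanup instead of executing it.
  echo ssh "$h" rm -rf "$HADOOP_DIR/data" "$HADOOP_DIR/logs"
  WIPED=$((WIPED + 1))
done
```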
<code>[antares@hadoop102 hadoop-3.3.5]$ pwd
/opt/module/hadoop-3.3.5
[antares@hadoop102 hadoop-3.3.5]$ hdfs namenode -format
</code>
6.2 Start HDFS
<code>[antares@hadoop102 hadoop-3.3.5]$ sbin/start-dfs.sh
</code>
6.3 Start YARN
Run on the ResourceManager node (hadoop103):
<code>[antares@hadoop103 hadoop-3.3.5]$ sbin/start-yarn.sh
</code>
6.4 Verify via Web UI
HDFS NameNode UI: http://hadoop102:9870
YARN ResourceManager UI: http://hadoop103:8088
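Besides the web UIs, you can verify the daemons with `jps` on each node. The layout below is what the configs above imply (NameNode on hadoop102, ResourceManager on hadoop103, SecondaryNameNode on hadoop104) — an assumption to compare your `jps` output against, not captured output:

```shell
# Expected daemon layout derived from the site files in section 4.
LAYOUT=$(cat <<'EOF'
hadoop102: NameNode DataNode NodeManager
hadoop103: ResourceManager DataNode NodeManager
hadoop104: SecondaryNameNode DataNode NodeManager
EOF
)
echo "$LAYOUT"
```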
7. Basic Cluster Tests
Upload Small File
<code>[antares@hadoop102 ~]$ hadoop fs -mkdir /test
[antares@hadoop102 ~]$ hadoop fs -put $HADOOP_HOME/testinput/kk.txt /test
</code>
If the file is missing, locate it under $HADOOP_HOME and retry.
Upload Large File
<code>[antares@hadoop102 ~]$ hadoop fs -put /opt/software/jdk-8u391-linux-x64.tar.gz /test
</code>
The large file is replicated three times by default.
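The replication factor (3 by default, via dfs.replication) can be read back with `hadoop fs -stat %r`. The sketch below only prints the command; run it without the echo on the live cluster.

```shell
# %r in the -stat format string prints a file's replication factor.
CMD="hadoop fs -stat %r /test/jdk-8u391-linux-x64.tar.gz"
echo "$CMD"
```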
Check Storage Path
<code>[antares@hadoop102 subdir0]$ pwd
/opt/module/hadoop-3.3.5/data/dfs/data/current/BP-1445008223-192.168.193.161-1706011370209/current/finalized/subdir0/subdir0
</code>
Inspect the block files (e.g., blk_1073741826) to see how Hadoop splits large files.
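Any file larger than the 128 MB default block size (dfs.blocksize) is split, so the JDK archive above spans multiple block files. Concatenating them in order reproduces the original archive; the second block name below is illustrative, so check your own directory listing. The dry run only prints the command.

```shell
# blk_1073741827 is a hypothetical second block ID for illustration.
CMD="cat blk_1073741826 blk_1073741827 > jdk.tar.gz"
echo "$CMD"
```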
After confirming replication on nodes hadoop103 and hadoop104, the cluster setup is complete.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles focused on operations transformation, aiming to accompany you throughout your operations career.