Step-by-Step Guide to Building a Hadoop Big Data Cluster on ARM Architecture
This comprehensive tutorial details the process of deploying a complete Hadoop-based big data ecosystem on ARM architecture, covering the installation and configuration of essential components including Java, Zookeeper, Hadoop, MySQL, Hive, and Spark with practical code examples.
The guide addresses the growing demand for open-source, power-efficient computing solutions. It begins by comparing the x86 and ARM architectures, highlighting ARM's advantages in power consumption and open-source flexibility, then outlines three primary deployment strategies before focusing on open-source component integration.
The tutorial starts with essential cluster prerequisites: configuring passwordless SSH access and NTP time synchronization across a minimum of three nodes. It then details the step-by-step installation of Java 8, followed by ZooKeeper for distributed coordination. Configuring ZooKeeper means editing zoo.cfg and generating a unique myid file for each node:
cd /opt/zookeeper/conf
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
dataDir=/opt/zookeeper/data
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888

On each node, the matching id is then written into the myid file under dataDir (echo 1 > /opt/zookeeper/data/myid on node1, 2 on node2, 3 on node3), after which each server can be started and checked with zkServer.sh start and zkServer.sh status.

Next, the core Hadoop ecosystem is deployed. The guide walks through extracting the Hadoop packages, configuring environment variables, and modifying the critical configuration files. The core-site.xml configuration defines global cluster parameters:
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://node1:8020</value></property>
  <property><name>fs.trash.interval</name><value>1</value></property>
  <property><name>io.compression.codecs</name><value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.Lz4Codec</value></property>
  <property><name>hadoop.security.authentication</name><value>simple</value></property>
  <property><name>hadoop.security.authorization</name><value>false</value></property>
  <property><name>hadoop.rpc.protection</name><value>authentication</value></property>
  <property><name>hadoop.security.auth_to_local</name><value>DEFAULT</value></property>
  <property><name>hadoop.proxyuser.oozie.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.oozie.groups</name><value>*</value></property>
  <property><name>hadoop.proxyuser.flume.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.flume.groups</name><value>*</value></property>
  <property><name>hadoop.proxyuser.HTTP.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.HTTP.groups</name><value>*</value></property>
  <property><name>hadoop.proxyuser.hive.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.hive.groups</name><value>*</value></property>
  <property><name>hadoop.proxyuser.hue.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.hue.groups</name><value>*</value></property>
  <property><name>hadoop.proxyuser.httpfs.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.httpfs.groups</name><value>*</value></property>
  <property><name>hadoop.proxyuser.hdfs.groups</name><value>*</value></property>
  <property><name>hadoop.proxyuser.hdfs.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.yarn.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.yarn.groups</name><value>*</value></property>
  <property><name>hadoop.security.group.mapping</name><value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value></property>
  <property><name>hadoop.security.instrumentation.requires.admin</name><value>false</value></property>
  <property><name>net.topology.script.file.name</name><value>/etc/hadoop/conf.cloudera.yarn/topology.py</value></property>
  <property><name>io.file.buffer.size</name><value>65536</value></property>
  <property><name>hadoop.ssl.enabled</name><value>false</value></property>
  <property><name>hadoop.ssl.require.client.cert</name><value>false</value><final>true</final></property>
  <property><name>hadoop.ssl.keystores.factory.class</name><value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value><final>true</final></property>
  <property><name>hadoop.ssl.server.conf</name><value>ssl-server.xml</value><final>true</final></property>
  <property><name>hadoop.ssl.client.conf</name><value>ssl-client.xml</value><final>true</final></property>
</configuration>
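Once Hadoop is installed, individual values from this file can be spot-checked from the command line with the stock hdfs getconf tool (this assumes $HADOOP_HOME/bin is on the PATH; the commands simply echo back whatever the configuration files contain):

```shell
# Print the effective value of selected core-site.xml keys.
hdfs getconf -confKey fs.defaultFS         # should print hdfs://node1:8020
hdfs getconf -confKey io.file.buffer.size  # should print 65536
```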
The hdfs-site.xml, yarn-site.xml, and mapred-site.xml files are configured in the same way to define storage paths, replication factors, ResourceManager addresses, and job-scheduling parameters. Once the files are distributed to the worker nodes, the HDFS NameNode is formatted and the cluster is launched:
cd /opt/hadoop/bin
./hadoop namenode -format
cd /opt/hadoop/sbin
./start-all.sh

The tutorial continues with MySQL installation to serve as the Hive metastore backend, followed by Hive deployment: extracting the Hive packages, editing hive-site.xml for database connectivity, initializing the metastore schema, and verifying table creation. Finally, Spark is integrated as a high-performance in-memory computing engine, with configuration steps for spark-env.sh and spark-defaults.conf to enable YARN integration and event logging. The article concludes by confirming a successful Spark shell startup, marking the completion of a fully functional ARM-based big data cluster.
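A minimal sketch of the Spark-on-YARN configuration described above, following the same /opt install convention as the rest of the article (the exact paths, the node1 hostname, and the /spark-logs event-log directory are assumptions, not values from the original):

```shell
# Sketch: wire Spark to YARN and enable event logging (paths assumed).
cd /opt/spark/conf
cp spark-env.sh.template spark-env.sh
cp spark-defaults.conf.template spark-defaults.conf

# spark-env.sh: HADOOP_CONF_DIR lets Spark find HDFS and the YARN ResourceManager.
echo 'export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop' >> spark-env.sh

# spark-defaults.conf: submit to YARN by default and keep event logs
# so finished applications stay visible to the history server.
cat >> spark-defaults.conf <<'EOF'
spark.master            yarn
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs://node1:8020/spark-logs
EOF

# The event-log directory must exist in HDFS before the first job runs.
hdfs dfs -mkdir -p /spark-logs
```

With this in place, spark-shell --master yarn should start against the running cluster.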
ZCY Technology
ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and creating a cloud service ecosystem for government and enterprise procurement. We look forward to your joining us.