
Step-by-Step Guide to Building a Hadoop Big Data Cluster on ARM Architecture

This comprehensive tutorial details the process of deploying a complete Hadoop-based big data ecosystem on ARM architecture, covering the installation and configuration of essential components including Java, Zookeeper, Hadoop, MySQL, Hive, and Spark with practical code examples.


This article provides a comprehensive guide to deploying a Hadoop-based big data cluster on ARM architecture, addressing the growing demand for open-source, power-efficient computing solutions. It begins by comparing X86 and ARM architectures, highlighting ARM's advantages in power consumption and open-source flexibility, and outlines three primary deployment strategies before focusing on open-source component integration.

The tutorial starts with essential cluster prerequisites, including configuring passwordless SSH access and NTP time synchronization across a minimum of three nodes. It then details the step-by-step installation of Java 8, followed by Zookeeper for distributed coordination. The Zookeeper configuration involves setting up the configuration file and generating unique myid files for each node:

cd /opt/zookeeper/conf
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
dataDir=/opt/zookeeper/data
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
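Each node then needs a myid file in the dataDir whose number matches its server.N entry in zoo.cfg. A minimal sketch, assuming the dataDir configured above:

```shell
# Write this node's ZooKeeper id; the number must match the server.N
# entry for this host in zoo.cfg. Run on node1 (use 2 on node2, 3 on node3).
mkdir -p /opt/zookeeper/data
echo 1 > /opt/zookeeper/data/myid
cat /opt/zookeeper/data/myid   # should print 1
```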

Next, the core Hadoop ecosystem is deployed. The guide walks through extracting Hadoop packages, configuring environment variables, and modifying critical configuration files. The core-site.xml configuration defines global cluster parameters:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:8020</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>1</value>
  </property>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.Lz4Codec</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>simple</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>false</value>
  </property>
  <property>
    <name>hadoop.rpc.protection</name>
    <value>authentication</value>
  </property>
  <property>
    <name>hadoop.security.auth_to_local</name>
    <value>DEFAULT</value>
  </property>
  <property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.flume.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.flume.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.HTTP.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.HTTP.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hive.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hive.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hue.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hue.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.httpfs.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.httpfs.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hdfs.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hdfs.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.yarn.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.yarn.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.security.group.mapping</name>
    <value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
  </property>
  <property>
    <name>hadoop.security.instrumentation.requires.admin</name>
    <value>false</value>
  </property>
  <property>
    <name>net.topology.script.file.name</name>
    <value>/etc/hadoop/conf.cloudera.yarn/topology.py</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>65536</value>
  </property>
  <property>
    <name>hadoop.ssl.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>hadoop.ssl.require.client.cert</name>
    <value>false</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.ssl.keystores.factory.class</name>
    <value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.ssl.server.conf</name>
    <value>ssl-server.xml</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.ssl.client.conf</name>
    <value>ssl-client.xml</value>
    <final>true</final>
  </property>
</configuration>

The hdfs-site.xml, yarn-site.xml, and mapred-site.xml files are similarly configured to define storage paths, replication factors, resource manager addresses, and job scheduling parameters. After distributing files to worker nodes, the cluster is formatted and launched:

cd /opt/hadoop/bin
./hadoop namenode -format

cd /opt/hadoop/sbin
./start-all.sh
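As a hedged illustration of the hdfs-site.xml mentioned above, a minimal version might look like the following; the storage paths under /opt/hadoop/data and the replication factor of 3 are assumptions for a three-node cluster, not values taken from the original article:

```xml
<configuration>
  <!-- Local directory where the NameNode stores its metadata (assumed path) -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/hadoop/data/namenode</value>
  </property>
  <!-- Local directory where each DataNode stores block data (assumed path) -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/hadoop/data/datanode</value>
  </property>
  <!-- Number of block replicas; 3 suits a minimum three-node cluster -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```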

The tutorial continues with MySQL installation to serve as the Hive metastore, followed by Hive deployment. It covers extracting Hive packages, modifying hive-site.xml for database connectivity, initializing the schema, and verifying table creation. Finally, Spark is integrated as a high-performance in-memory computing engine, with configuration steps for spark-env.sh and spark-defaults.conf to enable YARN integration and event logging. The article concludes by confirming the successful startup of the Spark shell, marking the completion of a fully functional ARM-based big data cluster.
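For the Hive-to-MySQL connection, the relevant hive-site.xml properties typically look like the sketch below; the hostname node1, database name metastore, and credentials are assumptions for illustration, not values from the original article:

```xml
<configuration>
  <!-- JDBC URL of the MySQL database backing the Hive metastore (assumed host/db) -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://node1:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <!-- Credentials are placeholders; use your own MySQL account -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive_password</value>
  </property>
</configuration>
```

With these in place, the schema can be initialized with Hive's schematool before starting the metastore service.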
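The YARN integration and event logging described for Spark can be sketched as a minimal spark-defaults.conf; the hdfs://node1:8020/spark-logs event-log directory is an assumption consistent with the fs.defaultFS configured earlier, not a path from the original article:

```properties
# Run Spark applications on the YARN resource manager
spark.master                     yarn
# Record application events so the history server can replay them
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://node1:8020/spark-logs
spark.history.fs.logDirectory    hdfs://node1:8020/spark-logs
```

The event-log directory must exist in HDFS before the first application starts, e.g. via hdfs dfs -mkdir.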

distributed systems · Big Data · Zookeeper · Hive · Spark · Hadoop · Cluster Deployment · ARM architecture
Written by

政采云技术

ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and creating a cloud service ecosystem for government and enterprise procurement. We look forward to your joining us.
