Big Data 5 min read

Step-by-Step Guide to Installing and Configuring Apache Flume on a Cluster

This guide walks through downloading Apache Flume, setting up a master‑slave cluster, and configuring NetCat, Exec, and Avro sources with corresponding sinks and memory channels, including verification commands to ensure the agents run correctly.


1. Software download

wget http://mirror.bit.edu.cn/apache/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz

(If this mirror no longer carries the 1.6.0 release, all historical Apache releases remain available under archive.apache.org/dist/flume/.)

tar zxvf apache-flume-1.6.0-bin.tar.gz
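A quick way to confirm that the extraction succeeded and the Java environment is usable is to ask the new install for its version (run from the directory you extracted into):

```shell
cd apache-flume-1.6.0-bin
# Prints the Flume version banner if the install and JAVA_HOME are sane
bin/flume-ng version
```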

2. Cluster environment

Master: 172.16.11.97
Slave1: 172.16.11.98
Slave2: 172.16.11.99
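The HDFS sink path and the verification commands below refer to nodes by hostname, so each node needs those names resolvable. A minimal /etc/hosts sketch for every node (lowercase hostnames are an assumption; use whatever names your cluster actually resolves):

```
172.16.11.97  master
172.16.11.98  slave1
172.16.11.99  slave2
```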

3. NetCat source configuration (conf/flume-netcat.conf)

vim conf/flume-netcat.conf

# Name the components on this agent
agent.sources = r1
agent.sinks = k1
agent.channels = c1

# Source configuration
agent.sources.r1.type = netcat
agent.sources.r1.bind = 127.0.0.1
agent.sources.r1.port = 44444

# Sink configuration
agent.sinks.k1.type = logger

# Channel configuration
agent.channels.c1.type = memory
agent.channels.c1.capacity = 1000
agent.channels.c1.transactionCapacity = 100

# Bind source and sink to the channel
agent.sources.r1.channels = c1
agent.sinks.k1.channel = c1

Verification:

bin/flume-ng agent --conf conf --conf-file conf/flume-netcat.conf --name agent -Dflume.root.logger=INFO,console

(The value passed to --name must match the property prefix used in the config file, here agent.)

telnet 127.0.0.1 44444

(Run this on the master host itself: the source binds to 127.0.0.1, so connections from other machines will be refused. Lines typed into the telnet session should appear as INFO events in the agent's console.)
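If telnet is not installed, the same check can be scripted with nc (netcat), assuming it is available on the host:

```shell
# Send one test line to the NetCat source; it should show up
# as an event in the running agent's console log
echo "hello flume" | nc 127.0.0.1 44444
```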

4. Exec source configuration (conf/flume-exec.conf)

vim conf/flume-exec.conf

# Name the components on this agent
agent.sources = r1
agent.sinks = k1
agent.channels = c1

# Source configuration
agent.sources.r1.type = exec
agent.sources.r1.command = tail -f /data/hadoop/flume/test.txt

# Sink configuration
agent.sinks.k1.type = logger

# Channel configuration
agent.channels.c1.type = memory
agent.channels.c1.capacity = 1000
agent.channels.c1.transactionCapacity = 100

# Bind source and sink to the channel
agent.sources.r1.channels = c1
agent.sinks.k1.channel = c1

Verification:

bin/flume-ng agent --conf conf --conf-file conf/flume-exec.conf --name agent -Dflume.root.logger=INFO,console

while true; do echo "$(date)" >> /data/hadoop/flume/test.txt; sleep 1; done
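The loop above runs forever and must be stopped with Ctrl-C. For a quick, bounded smoke test you can write a fixed number of timestamped lines instead (the file path here is illustrative; point it at the file your exec source tails):

```shell
# Append five timestamped lines to the tailed file, then report the line count
OUT=${OUT:-/tmp/flume_exec_test.txt}
: > "$OUT"                      # truncate any previous run
for i in 1 2 3 4 5; do
  echo "$(date) line $i" >> "$OUT"
done
grep -c 'line' "$OUT"
```

Each appended line should be picked up by tail -f and echoed by the logger sink within a second or two.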

5. Avro source configuration (conf/flume-avro.conf)

vim conf/flume-avro.conf

# Define a memory channel
agent.channels.c1.type = memory

# Define Avro source
agent.sources.r1.type = avro
agent.sources.r1.bind = 127.0.0.1
agent.sources.r1.port = 44444
agent.sources.r1.channels = c1

# Define HDFS sink
agent.sinks.k1.type = hdfs
agent.sinks.k1.channel = c1
agent.sinks.k1.hdfs.path = hdfs://master:9000/flume_data_pool
agent.sinks.k1.hdfs.filePrefix = events-
agent.sinks.k1.hdfs.fileType = DataStream
agent.sinks.k1.hdfs.writeFormat = Text
agent.sinks.k1.hdfs.rollSize = 0
agent.sinks.k1.hdfs.rollCount = 600000
agent.sinks.k1.hdfs.rollInterval = 600

# Bind components
agent.sources = r1
agent.sinks = k1
agent.channels = c1

Verification:

bin/flume-ng agent --conf conf --conf-file conf/flume-avro.conf --name agent -Dflume.root.logger=DEBUG,console

A plain telnet session cannot exercise this source: the Avro source expects Avro-framed RPC, not raw text. Use the avro-client bundled with Flume instead (run on the master host, since the source binds to 127.0.0.1):

bin/flume-ng avro-client --conf conf -H 127.0.0.1 -p 44444 -F /data/hadoop/flume/test.txt
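Once events are flowing, you can confirm that files actually land under the configured sink path (this assumes the hdfs client is on the PATH of the host you run it from):

```shell
# List rolled files under the HDFS sink path from the config above
hdfs dfs -ls /flume_data_pool
```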

Tags: Big Data, Configuration, Cluster Setup, Data Ingestion, Apache Flume
Written by
Practical DevOps Architecture

Hands‑on DevOps operations using Docker, K8s, Jenkins, and Ansible—empowering ops professionals to grow together through sharing, discussion, knowledge consolidation, and continuous improvement.
