Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store. In this post, we will walk through installing Flume.

The use of Apache Flume is not restricted to log data aggregation. Since data sources are customizable, Flume can be used to transport massive quantities of event data, including but not limited to network traffic data, social-media-generated data, email messages, and almost any other kind of data source.

Flume Installation Pre-requisites

a)  Flume installation requires the Hadoop configuration to be available on the same node, i.e. the node should either be part of a Hadoop cluster or have the Hadoop client configuration.

b)  Java must also be installed on the node. You can refer to the Java installation steps mentioned in the blog here. Both prerequisites can be verified with the quick checks below.
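Before proceeding, you can confirm both prerequisites from a shell. This assumes the hadoop and java binaries are already on the PATH:

hadoop version
java -version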

Flume Installation Steps

Follow the steps below to install and configure Flume on a Linux node.

1. Download the latest version of Flume from here. The commands below assume version 1.5.0.
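If you prefer the command line, the release can also be fetched directly; this sketch assumes the Apache archive mirror hosts the 1.5.0 binary bundle:

wget https://archive.apache.org/dist/flume/1.5.0/apache-flume-1.5.0-bin.tar.gz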
2. Extract the Apache Flume bundle. The later steps assume it lives under /opt:

sudo tar -xzvf apache-flume-1.5.0-bin.tar.gz -C /opt

3. Add Flume to the PATH in the user's bash profile file.

nano ~/.bash_profile
export FLUME_HOME="/opt/apache-flume-1.5.0-bin"
export PATH=$PATH:$FLUME_HOME/bin

Use the source command to update the environment variables in the current shell.

source ~/.bash_profile
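To confirm that the PATH change took effect, print the Flume version; flume-ng should now resolve from any directory:

flume-ng version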

4. Navigate to the Flume home directory:

cd /opt/apache-flume-1.5.0-bin

5. Copy the sample flume environment template to “flume-env.sh” to hold custom environment configuration:

cp conf/flume-env.sh.template conf/flume-env.sh

The flume-ng executable looks for a file named flume-env.sh in the Flume conf directory.

6. Open flume-env.sh and configure the Java variables:

sudo nano conf/flume-env.sh

Add the following lines to the end of the file:

JAVA_HOME=/usr/lib/jvm/jdk1.8.0
JAVA_OPTS="-Xms100m -Xmx200m -Dcom.sun.management.jmxremote"
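The JDK path above is only an example; yours may differ. On most Linux systems you can discover the actual JDK home as follows (assuming java is on the PATH):

readlink -f $(which java) | sed 's:/bin/java::'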

7. Create a new configuration file “flume.conf” in the conf directory and add the following configuration.

sudo nano conf/flume.conf
# Define a memory channel on the agent.
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 100

# Define a source on the agent and connect it to the memoryChannel.
agent.sources.tail-source.type = exec
agent.sources.tail-source.command = tail -F /opt/hadoop-2.6.0/logs/hadoop-hadoop-datanode-node1.log
agent.sources.tail-source.channels = memoryChannel

# Define a sink that outputs to logger.
agent.sinks.log-sink.channel = memoryChannel
agent.sinks.log-sink.type = logger

# Define a sink that writes to HDFS.
agent.sinks.hdfs-sink.channel = memoryChannel
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = hdfs://node1:8020/flumedata/
agent.sinks.hdfs-sink.hdfs.fileType = DataStream

# Activate channel, source and sinks
agent.channels = memoryChannel
agent.sources = tail-source
agent.sinks = log-sink hdfs-sink

Where:

“agent.channels.memoryChannel.capacity” is the maximum number of events stored in the channel.

“agent.sources.tail-source.command” is the command that produces the data; here it tails the log file serving as the source. The source keeps monitoring the log for changes and passes new data through the channel to the sinks.

“agent.sinks.hdfs-sink.hdfs.path” is the HDFS output path, specified with the NameNode hostname/IP and port.
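The tail command above points at a Hadoop DataNode log purely as an example; the exec source can tail any file. If that exact log does not exist on your node, you can point tail-source.command at a hypothetical test file and feed it from a second terminal:

# Append a test event every second to /tmp/flume-test.log (hypothetical path).
while true; do echo "test event $(date)" >> /tmp/flume-test.log; sleep 1; done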

8. Start the flume-ng agent:

flume-ng agent -n agent -c conf -f conf/flume.conf -Dflume.root.logger=DEBUG,console

The -c conf option points flume-ng at the conf directory so flume-env.sh is picked up. The agent will start reading the logs from the source location and put them into HDFS at the configured path.

9. Verify the imported data in a new terminal:

Run the command below to list the data in HDFS:

hadoop fs -ls /flumedata/
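To inspect the contents of an ingested file, you can cat one of the generated files. This assumes the HDFS sink's default file prefix, FlumeData:

hadoop fs -cat /flumedata/FlumeData.*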