Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store. In this post, we will walk through installing Flume.

The use of Apache Flume is not restricted to log data aggregation. Since data sources are customizable, Flume can be used to transport massive quantities of event data, including but not limited to network traffic data, social-media-generated data, email messages, and almost any other kind of data source.

Flume Installation Pre-requisites

a)  Flume installation requires the Hadoop configuration to be available on the same node, i.e. the node should either be part of a Hadoop cluster or have the Hadoop client configuration.

b)  Java must also be installed on the node. You can refer to the Java installation steps mentioned in the blog here. Both prerequisites can be verified with the quick checks below.
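Before proceeding, you can confirm both prerequisites from a shell. This assumes the hadoop and java binaries are already on the PATH:

hadoop version
java -version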

Flume Installation Steps

Follow the steps below to install and configure Flume on a Linux node.

1. Download the latest version of Flume from here. The commands below assume version 1.5.0.
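If you prefer the command line, the release can also be fetched directly; this sketch assumes the Apache archive mirror hosts the 1.5.0 binary bundle:

wget https://archive.apache.org/dist/flume/1.5.0/apache-flume-1.5.0-bin.tar.gz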
2. Extract the Apache Flume bundle. The later steps assume it lives under /opt:

sudo tar -xzvf apache-flume-1.5.0-bin.tar.gz -C /opt

3. Add Flume to the PATH in the user's bash profile file.

nano ~/.bash_profile
export FLUME_HOME="/opt/apache-flume-1.5.0-bin"
export PATH=$PATH:$FLUME_HOME/bin

Use the source command to update the environment variables in the current shell.

source ~/.bash_profile
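To confirm that the PATH change took effect, print the Flume version; flume-ng should now resolve from any directory:

flume-ng version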

4. Navigate to the Flume home directory:

cd /opt/apache-flume-1.5.0-bin

5. Copy the sample flume environment template to “flume-env.sh” to hold custom environment configuration:

cp conf/flume-env.sh.template conf/flume-env.sh

The flume-ng executable looks for a file named flume-env.sh in the Flume conf directory.

6. Open flume-env.sh and configure the Java variables:

sudo nano conf/flume-env.sh

Add the following lines to the end of the file:

JAVA_HOME=/usr/lib/jvm/jdk1.8.0
JAVA_OPTS="-Xms100m -Xmx200m -Dcom.sun.management.jmxremote"
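The JDK path above is only an example; yours may differ. On most Linux systems you can discover the actual JDK home as follows (assuming java is on the PATH):

readlink -f $(which java) | sed 's:/bin/java::'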

7. Create a new configuration file “flume.conf” in the conf directory and add the following configuration.

sudo nano conf/flume.conf
# Define a memory channel on the agent.
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 100

# Define a source on the agent and connect it to the memoryChannel.
agent.sources.tail-source.type = exec
agent.sources.tail-source.command = tail -F /opt/hadoop-2.6.0/logs/hadoop-hadoop-datanode-node1.log
agent.sources.tail-source.channels = memoryChannel

# Define a sink that outputs to logger.
agent.sinks.log-sink.channel = memoryChannel
agent.sinks.log-sink.type = logger

# Define a sink that writes to HDFS.
agent.sinks.hdfs-sink.channel = memoryChannel
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = hdfs://node1:8020/flumedata/
agent.sinks.hdfs-sink.hdfs.fileType = DataStream

# Activate channel, source and sinks
agent.channels = memoryChannel
agent.sources = tail-source
agent.sinks = log-sink hdfs-sink

Where:

“agent.channels.memoryChannel.capacity” is the maximum number of events stored in the channel.

“agent.sources.tail-source.command” is the command that produces the data; here it tails the log file serving as the source. The source keeps monitoring the log for changes and passes new data through the channel to the sinks.

“agent.sinks.hdfs-sink.hdfs.path” is the HDFS output path, specified with the NameNode hostname/IP and port.
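The tail command above points at a Hadoop DataNode log purely as an example; the exec source can tail any file. If that exact log does not exist on your node, you can point tail-source.command at a hypothetical test file and feed it from a second terminal:

# Append a test event every second to /tmp/flume-test.log (hypothetical path).
while true; do echo "test event $(date)" >> /tmp/flume-test.log; sleep 1; done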

8. Start the flume-ng agent:

flume-ng agent -n agent -c conf -f conf/flume.conf -Dflume.root.logger=DEBUG,console

The -c conf option points flume-ng at the conf directory so flume-env.sh is picked up. The agent will start reading the logs from the source location and put them into HDFS at the configured path.

9. Verify the imported data in a new terminal:

Run the command below to list the data in HDFS:

hadoop fs -ls /flumedata/
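To inspect the contents of an ingested file, you can cat one of the generated files. This assumes the HDFS sink's default file prefix, FlumeData:

hadoop fs -cat /flumedata/FlumeData.*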