Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store. In this post, we discuss Flume installation.
The use of Apache Flume is not only restricted to log data aggregation. Since data sources are customizable, Flume can be used to transport massive quantities of event data including but not limited to network traffic data, social-media-generated data, email messages and pretty much any data source possible.
Flume Installation Pre-requisites
a) Flume installation requires Hadoop configurations available on the same node, i.e. the node should either be part of a Hadoop cluster or have the Hadoop client configuration.
b) Java must also be installed on the node. You can refer to the Java installation steps mentioned in the blog here.
Flume Installation Steps
Follow the steps below to install and configure Flume on a Linux node.
1. Download the latest version of Flume from here.
2. Extract the Apache Flume bundle file.
tar -xzvf apache-flume-1.5.0-bin.tar.gz
3. Add Flume to the PATH in the user's bash profile file.
sudo nano ~/.bash_profile
export FLUME_HOME="/opt/apache-flume-1.5.0-bin"
export PATH=$PATH:$FLUME_HOME/bin
Use the source command to load the updated environment variables into the current shell:
source ~/.bash_profile
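As a quick sanity check after reloading the profile, you can confirm the variables are visible in the current shell. A minimal sketch, assuming the install path from step 2:

```shell
# Normally these two exports live in ~/.bash_profile (step 3); they are
# repeated here only so the check is self-contained. The install path
# /opt/apache-flume-1.5.0-bin is an assumption - adjust to your layout.
export FLUME_HOME="/opt/apache-flume-1.5.0-bin"
export PATH="$PATH:$FLUME_HOME/bin"

# FLUME_HOME's bin directory should now be on PATH.
echo "$PATH" | grep -q "$FLUME_HOME/bin" && echo "PATH contains Flume bin"
```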
4. Navigate to the Flume home directory.
cd $FLUME_HOME
5. Copy the sample template environment file to "flume-env.sh" to hold custom environment settings.
cp conf/flume-env.sh.template conf/flume-env.sh
The flume-ng executable looks for a file named flume-env.sh in the Flume conf directory.
6. Open flume-env.sh and configure Java variables.
sudo nano conf/flume-env.sh
Add the below lines to the end of the file (adjust JAVA_HOME to your JDK path):
JAVA_HOME=/usr/lib/jvm/jdk1.8.0
JAVA_OPTS="-Xms100m -Xmx200m -Dcom.sun.management.jmxremote"
7. Create a new configuration file “flume.conf” in conf directory and add following configuration.
sudo nano conf/flume.conf
# Define a memory channel on the agent.
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 100
# Define a source on agent and connect to channel memoryChannel.
agent.sources.tail-source.type = exec
agent.sources.tail-source.command = tail -F /opt/hadoop-2.6.0/logs/hadoop-hadoop-datanode-node1.log
agent.sources.tail-source.channels = memoryChannel
# Define a sink that outputs to logger.
agent.sinks.log-sink.channel = memoryChannel
agent.sinks.log-sink.type = logger
# Define a sink that outputs to HDFS.
agent.sinks.hdfs-sink.channel = memoryChannel
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = hdfs://node1:8020/flumedata/
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
# Activate channel, source and sinks.
agent.channels = memoryChannel
agent.sources = tail-source
agent.sinks = log-sink hdfs-sink
"agent.channels.memoryChannel.capacity" sets the maximum number of events the channel can hold.
"agent.sources.tail-source.command" is the command whose output feeds the source. Since tail -F keeps following the log file, new lines are picked up as they are written and flow through the channel to the sinks.
"agent.sinks.hdfs-sink.hdfs.path" is the HDFS output path, specified with the NameNode hostname/IP and port.
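By default the HDFS sink rolls to a new file every 30 seconds, at 1024 bytes, or after 10 events, whichever comes first, which tends to produce many small files. If you want fewer, larger files, the roll behaviour can be tuned in the same flume.conf; the values below are illustrative, not part of the original configuration:

```
# Optional HDFS sink tuning (illustrative values): roll a new file
# every 10 minutes or at 128 MB, whichever comes first, and disable
# event-count-based rolling.
agent.sinks.hdfs-sink.hdfs.rollInterval = 600
agent.sinks.hdfs-sink.hdfs.rollSize = 134217728
agent.sinks.hdfs-sink.hdfs.rollCount = 0
```

Setting a property to 0 disables that particular roll trigger.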
8. Start the flume-ng agent. The -c option points at the conf directory so that flume-env.sh is picked up.
flume-ng agent -n agent -c conf -f conf/flume.conf -Dflume.root.logger=DEBUG,console
It will start reading the logs from the source location and writing them into HDFS at the configured path.
9. Verify the imported data in new terminal screen:
Run the below command to list out the data in HDFS:
hadoop fs -ls /flumedata/