Apache Pig is a data analytics framework in the Hadoop ecosystem. It is a high-level platform for creating programs that run on Apache Hadoop, and those programs are written in a textual language called Pig Latin. Internally, Pig executes its work as Hadoop MapReduce jobs.

Pig’s infrastructure layer consists of a compiler that produces sequences of MapReduce programs.

Pig provides a command-line interface for querying data, known as the Grunt shell.

Pig can run in either of the following modes:

  1. Local Mode: Pig runs on a single machine and uses the local file system for storage. Local mode is used mainly for debugging and testing Pig Latin scripts. Pass local as the value of the -x option to start Pig in this mode; the source data is then read from directories on your local machine.
  2. MapReduce Mode: This is the default execution mode for Pig. It requires a running Hadoop cluster, with data stored in HDFS. Pig Latin scripts are executed on the cluster as MapReduce jobs.
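The choice between the two modes can be sketched as a small shell helper. This is only an illustration: the function name choose_pig_mode is hypothetical, and treating "a hadoop launcher is on PATH" as a sign that a cluster is usable is an assumption, not something Pig checks itself.

```shell
# Sketch: pick a value for Pig's -x option.
# Assumption: finding the Hadoop launcher on PATH means MapReduce mode is viable.
choose_pig_mode() {
    # $1: name of the launcher to probe (defaults to "hadoop")
    launcher="${1:-hadoop}"
    if command -v "$launcher" >/dev/null 2>&1; then
        echo "mapreduce"
    else
        echo "local"
    fi
}

# Only prints the command that would be used; it does not start Pig.
echo "Would start Pig with: pig -x $(choose_pig_mode)"
```

A real cluster check would need more than this (for example, confirming that HDFS is reachable), so treat the result as a default rather than a guarantee.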

Follow the steps below to configure Pig in your environment:

  • Download a Pig release from the Apache archive, or fetch it from the terminal with wget (version 0.15.0 is used here):
wget http://archive.apache.org/dist/pig/pig-0.15.0/pig-0.15.0.tar.gz
  • Extract the tar.gz file in the directory where you placed it.
    For example, if the Pig package is in the /opt folder, change to that directory and extract it:
cd /opt
tar -xzvf pig-0.15.0.tar.gz
  • Specify the environment variables in the ~/.bash_profile file.

[NOTE: You can also define the environment variables in ~/.bashrc or /etc/environment.]

nano ~/.bash_profile

Add the following lines to the file:

export PIG_HOME=/opt/pig-0.15.0
export PATH=$PATH:$PIG_HOME/bin
  • Run the following command to apply the environment variables to the current session:
source ~/.bash_profile
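If you would rather script the profile edit than open an editor, a minimal sketch follows. The helper names append_once and configure_pig_env are hypothetical, and the sketch assumes that each export line should appear in the file exactly once.

```shell
# Sketch: add the Pig variables to a profile file without duplicating them.
append_once() {
    # $1: file, $2: exact line to add if it is not already present
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

configure_pig_env() {
    # $1: profile file, $2: Pig install directory
    append_once "$1" "export PIG_HOME=$2"
    # Single quotes keep $PATH and $PIG_HOME literal in the file.
    append_once "$1" 'export PATH=$PATH:$PIG_HOME/bin'
}

# Example call, commented out so that pasting this block changes nothing:
# configure_pig_env "$HOME/.bash_profile" /opt/pig-0.15.0
```

Running the example call twice leaves the file unchanged the second time, which makes the step safe to repeat.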
  • Start the Hadoop Job History Server:
mr-jobhistory-daemon.sh start historyserver
  • To verify that Pig is installed correctly, run the Pig help command:
pig -h
  • Start Pig in local mode:
pig -x local

This opens the Grunt shell in local mode, where you can run Pig commands.
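To try local mode without typing into Grunt, Pig can also be given a script file. The sketch below is a possible smoke test, not part of the original steps: the sample data, the file names scores.csv and smoke.pig, and the relation name scores are all made up for illustration.

```shell
# Sketch: a tiny local-mode smoke test in a throwaway directory.
WORKDIR="$(mktemp -d)"

# Hypothetical sample data: name,score
printf 'alice,90\nbob,75\n' > "$WORKDIR/scores.csv"

# Minimal Pig Latin script: load the CSV and print every row.
cat > "$WORKDIR/smoke.pig" <<EOF
scores = LOAD '$WORKDIR/scores.csv' USING PigStorage(',') AS (name:chararray, score:int);
DUMP scores;
EOF

# Run the script only if the pig launcher is actually on PATH.
if command -v pig >/dev/null 2>&1; then
    pig -x local "$WORKDIR/smoke.pig"
else
    echo "pig not found on PATH; skipping run"
fi
```

If the installation is working, the DUMP statement should print the two sample rows among Pig's log output.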

  • Start Pig in MapReduce mode: To start Pig with Hadoop, set the value of the -x option to mapreduce. Since this is the default mode, running pig with no options also starts Grunt in MapReduce mode.
pig -x mapreduce

OR

pig

This opens the Grunt shell in MapReduce mode, where you can run Pig commands.
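If Grunt does not start, a quick look at the install directory often narrows things down. The following is a rough sketch under the assumption that the launcher lives at bin/pig inside PIG_HOME, as the PATH setting in the steps above implies; the function name check_pig_home is hypothetical.

```shell
# Sketch: check that a Pig install directory looks usable.
check_pig_home() {
    # $1: candidate PIG_HOME directory
    if [ -z "$1" ]; then
        echo "PIG_HOME is not set"
        return 1
    fi
    if [ ! -x "$1/bin/pig" ]; then
        echo "no executable found at $1/bin/pig"
        return 1
    fi
    echo "ok: $1/bin/pig"
}

# Check the current environment and print a hint if something is off.
check_pig_home "${PIG_HOME:-}" || echo "Fix the environment before starting Grunt."
```

This only confirms that the launcher exists and is executable; it says nothing about whether Hadoop itself is configured.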

