Thursday, 26 June 2014

Pig 0.11.1 Installation on Ubuntu

In this post I outline a basic Pig 0.11.1 installation.

This was tested on Ubuntu 12.04 with Java 1.7, with Hadoop 1.0.3 running on the same machine as Pig.

1. Download Pig

You can download Pig from http://www.apache.org/dyn/closer.cgi/pig, which suggests a mirror close to you. For my location, the commands are:

> cd /home/myuser/hadoop/
> wget http://apache.bilkent.edu.tr/pig/pig-0.11.1/pig-0.11.1.tar.gz

After downloading Pig, extract the archive:

> tar -xzvf pig-0.11.1.tar.gz

This extracts the files to /home/myuser/hadoop/pig-0.11.1.
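The download and extraction steps can be wrapped into a small function that is safe to re-run. This is only a sketch: install_pig is a hypothetical helper name, and it assumes wget and tar are installed and that the mirror keeps the <mirror>/pig-<version>/pig-<version>.tar.gz layout used in the URL above.

```shell
# Hypothetical helper: download (if missing) and extract a Pig release.
# The body runs in a subshell, so the cd does not change the caller's directory.
install_pig() (
  dest=${1:?usage: install_pig DIR [VERSION] [MIRROR]}
  version=${2:-0.11.1}
  mirror=${3:-http://apache.bilkent.edu.tr/pig}   # mirror from this post; pick your own
  tarball="pig-$version.tar.gz"

  cd "$dest" || exit 1
  # Skip the download when the tarball is already present
  [ -f "$tarball" ] || wget "$mirror/pig-$version/$tarball" || exit 1
  # Extract only once; this creates $dest/pig-$version
  [ -d "pig-$version" ] || tar -xzf "$tarball" || exit 1
)
```

With the paths used here, the call would be:

> install_pig /home/myuser/hadoop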

2. Set Environment Variables

If they are not already set, define the following environment variables. The recommended way is to put them in a shell script under /etc/profile.d, which Ubuntu sources for every login shell. Create /etc/profile.d/env_variables.sh containing:

export JAVA_HOME=/path/to/java/home
export HADOOP_HOME=/path/to/hadoop/home
export PIG_HOME=/path/to/pig/home

To run the pig command from any directory, add Pig's bin directory to the PATH variable by appending the following line to env_variables.sh. If java and hadoop are not on the PATH either, add their bin directories as well.

export PATH=$PATH:$PIG_HOME/bin 
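Putting the pieces of this section together, a complete env_variables.sh might look like the sketch below. JAVA_HOME and HADOOP_HOME are still placeholders to replace with your own paths; PIG_HOME points at the directory extracted in step 1. The loop is an addition of mine so that sourcing the file twice does not append duplicate PATH entries; it uses only POSIX sh syntax, since Ubuntu's /etc/profile may be read by dash.

```shell
# /etc/profile.d/env_variables.sh (sketch)
export JAVA_HOME=/path/to/java/home          # placeholder: your JDK directory
export HADOOP_HOME=/path/to/hadoop/home      # placeholder: your Hadoop 1.0.3 directory
export PIG_HOME=/home/myuser/hadoop/pig-0.11.1

# Append each bin directory to PATH unless it is already there,
# so re-sourcing this file does not duplicate entries.
for dir in "$JAVA_HOME/bin" "$HADOOP_HOME/bin" "$PIG_HOME/bin"; do
  case ":$PATH:" in
    *":$dir:"*) ;;                 # already on PATH, skip
    *) PATH="$PATH:$dir" ;;
  esac
done
export PATH
unset dir
```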

3. Hadoop Cluster Information

If you have set the $HADOOP_HOME environment variable, Pig finds the NameNode and JobTracker addresses in Hadoop's configuration files (core-site.xml and mapred-site.xml).
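To check which addresses Pig will pick up, you can read them straight from those files. In Hadoop 1.x the NameNode address is the fs.default.name property in core-site.xml and the JobTracker address is mapred.job.tracker in mapred-site.xml. The function below is a hypothetical helper, not part of Hadoop or Pig; it is a rough text-based sketch that assumes GNU sed (for \n in the replacement) and that each <property> holds one <name> and one <value>.

```shell
# Hypothetical helper: print the <value> of a property from a Hadoop *-site.xml file.
# Usage: hadoop_prop FILE PROPERTY_NAME
hadoop_prop() {
  # Join the file into one line, then split it so each <property> is on its own line
  tr -d '\n' < "$1" \
    | sed 's#</property>#\n#g' \
    | grep "<name>[[:space:]]*$2[[:space:]]*</name>" \
    | sed 's#.*<value>[[:space:]]*\([^<]*\)</value>.*#\1#'
}
```

Assuming Hadoop 1.x keeps its configuration in $HADOOP_HOME/conf:

> hadoop_prop "$HADOOP_HOME/conf/core-site.xml" fs.default.name
> hadoop_prop "$HADOOP_HOME/conf/mapred-site.xml" mapred.job.tracker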

If Pig cannot find your cluster, add Hadoop's configuration directory to Pig's classpath.
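Pig's bin/pig launcher reads the PIG_CLASSPATH environment variable, so one way to do this is to export it, for example from env_variables.sh. The sketch assumes the Hadoop 1.x layout, where the *-site.xml files live in $HADOOP_HOME/conf; adjust the path if your configuration lives elsewhere.

```shell
# Put Hadoop's configuration directory first on Pig's classpath,
# keeping any entries PIG_CLASSPATH already had after it.
export PIG_CLASSPATH="$HADOOP_HOME/conf${PIG_CLASSPATH:+:$PIG_CLASSPATH}"
```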

4. Map-Reduce Mode

Pig supports two execution modes, local and map-reduce; here we use map-reduce mode, which is the default.
You can run an existing Pig script with:

> pig myscript.pig

To see your script with its parameters substituted, without running it (a dry run), use:
> pig -r myscript.pig
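Parameters are written as $name inside the script and given values on the command line with -param. The snippet below writes a small example script; the file name and the input parameter are hypothetical, and the quoted 'EOF' stops the shell from expanding $input itself so Pig sees it.

```shell
# Write a small parameterised Pig script (hypothetical file and parameter names).
cat > myscript.pig <<'EOF'
logs = LOAD '$input' USING PigStorage() AS (id:int, log:chararray);
DUMP logs;
EOF
```

A dry run then shows the script with $input replaced (as far as I know, Pig writes the result to myscript.pig.substituted), and dropping -r runs it on the cluster:

> pig -r -param input=/data/logs myscript.pig
> pig -param input=/data/logs myscript.pig

To try the same script in local mode against a file on the local file system, add -x local (the path is only an example):

> pig -x local -param input=/home/myuser/logs.txt myscript.pig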

You can also enter the Grunt shell and run Pig statements interactively:
> pig
2014-06-26 23:40:31,977 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2014-06-26 23:40:31,978 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/myuser/pig_1403815231975.log
2014-06-26 23:40:31,992 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/myuser/.pigbootup not found
2014-06-26 23:40:32,124 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:10000
2014-06-26 23:40:32,948 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:10001

grunt> logs = LOAD '/data/logs' using PigStorage() as (id:int, log:chararray);
grunt> DUMP logs;
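The LOAD statement above expects data at /data/logs in HDFS. If you have nothing there yet, a tiny made-up data set is enough to try it. PigStorage() splits fields on tab by default, so each line holds an id and a log message separated by a tab.

```shell
# Create a small, made-up sample file matching the (id:int, log:chararray) schema.
printf '1\tservice started\n2\tuser logged in\n3\tservice stopped\n' > logs.txt
```

Then copy it into HDFS before running the Grunt statements:

> hadoop fs -mkdir /data
> hadoop fs -put logs.txt /data/logs

DUMP logs; should then print one tuple per line, such as (1,service started).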

