These steps were tested on Ubuntu 12.04 with Java 1.7 installed. Hadoop 1.0.3 runs on the same machine as Pig.
1. Download Pig
You can download Pig from http://www.apache.org/dyn/closer.cgi/pig. From my location, the download link is:
http://apache.bilkent.edu.tr/pig/pig-0.11.1/pig-0.11.1.tar.gz
> cd /home/myuser/hadoop/
> wget http://apache.bilkent.edu.tr/pig/pig-0.11.1/pig-0.11.1.tar.gz
After downloading Pig, extract it:
> tar -xzvf pig-0.11.1.tar.gz
This will extract the files to /home/myuser/hadoop/pig-0.11.1
2. Set Environment Variables
If they are not already set, set the following environment variables. The recommended way is to put them in a shell script under /etc/profile.d. Create env_variables.sh there and write:
export JAVA_HOME=/path/to/java/home
export HADOOP_HOME=/path/to/hadoop/home
export PIG_HOME=/path/to/pig/home
To run the pig command from anywhere, add it to the PATH variable by appending the following to env_variables.sh. If java and hadoop are not on the PATH either, add their bin directories as well.
export PATH=$PATH:$PIG_HOME/bin
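The steps above can be sketched as a small shell script that writes env_variables.sh and then loads it into the current shell to check the result. The JAVA_HOME and HADOOP_HOME values are assumptions for a typical Ubuntu 12.04 setup, PIG_HOME is the directory from step 1, and PROFILE_DIR is a hypothetical variable that defaults to a scratch directory so the sketch runs without root. Set it to /etc/profile.d to install the file for real.

```shell
#!/usr/bin/env bash
# Sketch: write env_variables.sh and check that it puts Pig on the PATH.
# PROFILE_DIR is hypothetical; it defaults to a scratch directory so this
# runs without root. Use PROFILE_DIR=/etc/profile.d (as root) to install.
set -euo pipefail

PROFILE_DIR="${PROFILE_DIR:-$(mktemp -d)}"
ENV_FILE="$PROFILE_DIR/env_variables.sh"

# The quoted 'EOF' keeps $PATH, $JAVA_HOME, ... literal in the file, so they
# are expanded when a login shell reads it, not now.
cat > "$ENV_FILE" <<'EOF'
# Assumed install locations; adjust them to your machine.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/home/myuser/hadoop/hadoop-1.0.3
export PIG_HOME=/home/myuser/hadoop/pig-0.11.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$PIG_HOME/bin
EOF

# Load the file into this shell, the way a login shell reads /etc/profile.d.
. "$ENV_FILE"
echo "PIG_HOME=$PIG_HOME"
echo "PATH=$PATH"
```

Scripts in /etc/profile.d are read by login shells, so after installing the file you need to log out and back in (or source it manually) before `pig` resolves from any directory.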
3. Hadoop Cluster Information
If you have set the $HADOOP_HOME environment variable, Pig will find the namenode and jobtracker addresses in Hadoop's configuration files (core-site.xml and mapred-site.xml). If it cannot find your cluster, add Hadoop's configuration directory to Pig's classpath:
export PIG_CLASSPATH=$HADOOP_HOME/conf/
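Because a wrong PIG_CLASSPATH fails silently until Pig tries to connect, a guard like the following may help. It is a sketch that only exports PIG_CLASSPATH when Hadoop's conf directory actually contains the two files Pig reads the cluster addresses from. The function name set_pig_classpath and the default Hadoop path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: export PIG_CLASSPATH only if Hadoop's conf directory holds the
# files with the namenode and jobtracker addresses.
set -euo pipefail

# Hypothetical helper: takes a Hadoop home directory, exports
# PIG_CLASSPATH=<hadoop_home>/conf/ on success, returns 1 otherwise.
set_pig_classpath() {
  local conf="$1/conf"
  local f
  for f in core-site.xml mapred-site.xml; do
    if [ ! -f "$conf/$f" ]; then
      echo "missing $conf/$f" >&2
      return 1
    fi
  done
  export PIG_CLASSPATH="$conf/"
}

# Fall back to an assumed install path if HADOOP_HOME is not set.
set_pig_classpath "${HADOOP_HOME:-/home/myuser/hadoop/hadoop-1.0.3}" \
  || echo "PIG_CLASSPATH not set; check HADOOP_HOME" >&2
```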
4. Map-Reduce Mode
Pig supports local mode and map-reduce mode. Here we use map-reduce mode. You can run an existing Pig script with:
> pig myscript.pig
To see your script with its parameters substituted, without running it (a dry run), use:
> pig -r myscript.pig
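A dry run is only interesting when the script has parameters. The sketch below writes a hypothetical myscript.pig that uses `$input` and `$output` placeholders and, if pig is on the PATH, passes values for them with `-param`. As far as I know, `-r` writes the substituted script next to the original as myscript.pig.substituted without executing it. The parameter names, the FILTER condition and the output path are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: a parameterized Pig script and the commands that dry-run or run it.
# Parameter names, paths and the filter are hypothetical examples.
set -euo pipefail

workdir="$(mktemp -d)"
cd "$workdir"

# $input and $output are placeholders that Pig substitutes before running;
# the quoted 'EOF' keeps them literal here.
cat > myscript.pig <<'EOF'
logs = LOAD '$input' USING PigStorage() AS (id:int, log:chararray);
errors = FILTER logs BY log MATCHES '.*ERROR.*';
STORE errors INTO '$output';
EOF

if command -v pig >/dev/null 2>&1; then
  # Dry run: write the substituted script, do not execute it.
  pig -param input=/data/logs -param output=/data/errors -r myscript.pig
  cat myscript.pig.substituted
  # Same parameters, actually executed on the cluster:
  # pig -param input=/data/logs -param output=/data/errors myscript.pig
else
  echo "pig not on PATH; script written to $workdir/myscript.pig"
fi
```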
You can also enter the Grunt shell and run your Pig statements there:
> pig
2014-06-26 23:40:31,977 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2014-06-26 23:40:31,978 [main] INFO org.apache.pig.Main - Logging error messages to: /home/myuser/pig_1403815231975.log
2014-06-26 23:40:31,992 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/myuser/.pigbootup not found
2014-06-26 23:40:32,124 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:10000
2014-06-26 23:40:32,948 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:10001
grunt> logs = LOAD '/data/logs' using PigStorage() as (id:int, log:chararray);
grunt> DUMP logs;
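The LOAD statement in the Grunt session expects tab-separated lines, because PigStorage() with no argument splits fields on tabs. The sketch below builds a small sample file in that shape and, if pig is installed, runs the same LOAD and DUMP in local mode (`-x local`), which reads from the local file system instead of HDFS. The file names and sample lines are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: sample input for the (id:int, log:chararray) schema used above,
# plus the same LOAD/DUMP run in local mode. File contents are hypothetical.
set -euo pipefail

workdir="$(mktemp -d)"

# One record per line: id <TAB> log message (PigStorage's default delimiter).
printf '%s\t%s\n' 1 'service started' 2 'ERROR disk full' 3 'service stopped' \
  > "$workdir/logs.tsv"

# Unquoted EOF so $workdir expands to the real path inside the Pig script.
cat > "$workdir/dump_logs.pig" <<EOF
logs = LOAD '$workdir/logs.tsv' USING PigStorage() AS (id:int, log:chararray);
DUMP logs;
EOF

if command -v pig >/dev/null 2>&1; then
  # Local mode: no namenode or jobtracker needed.
  pig -x local "$workdir/dump_logs.pig"
fi
```

To reproduce the map-reduce session itself, the same file would first have to be copied into HDFS, for example with `hadoop fs -put logs.tsv /data/logs`.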