Sunday, May 11, 2014

Installation of Hive

Hive is a powerful tool for querying data stored on HDFS. Its SQL syntax is very similar to MySQL's, so you can start running queries in a very short time. Hive takes your SQL and turns it into MapReduce jobs.

This installation was done on a CentOS machine. The machine where Hive is installed has the Hadoop binaries and is used as a Hadoop client.

1. Download Hive
You can get Hive from http://www.apache.org/dyn/closer.cgi/hive/. I will continue with hive-0.12.0; the mirror link for my country is http://ftp.itu.edu.tr/Mirror/Apache/hive/hive-0.12.0/hive-0.12.0.tar.gz

> cd /your/desired/path
> wget  http://ftp.itu.edu.tr/Mirror/Apache/hive/hive-0.12.0/hive-0.12.0.tar.gz

After downloading hive, extract it.

> tar -xzvf hive-0.12.0.tar.gz

2. Set up environment variables
If they are not already set, set the following environment variables.
The recommended way is to put them in a shell script under /etc/profile.d. Create env_variables.sh and write:

export JAVA_HOME=/path/to/java/home
export HADOOP_HOME=/path/to/hadoop/home
export HIVE_HOME=/path/to/hive/home

To run the hive command from anywhere, we must add it to the PATH variable. Append the following to env_variables.sh. If java and hadoop are not on the PATH either, add them as well.

export PATH=$PATH:$HIVE_HOME/bin
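
A quick sanity check can save time before moving on. The following is a minimal sketch and not part of the original setup; the function name check_hive_env is made up for illustration. It assumes the script from this step was saved as /etc/profile.d/env_variables.sh, which new login shells pick up automatically.

```shell
#!/bin/sh
# Hypothetical helper: report which variables from this step are missing.
check_hive_env() {
    missing=0
    for name in JAVA_HOME HADOOP_HOME HIVE_HOME; do
        # Indirect lookup: read the variable whose name is held in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            missing=1
        else
            echo "$name=$value"
        fi
    done
    return "$missing"
}

# In an already-open shell, load the variables first:
#   . /etc/profile.d/env_variables.sh
check_hive_env || echo "fix env_variables.sh before continuing"

# Confirm that the hive launcher is reachable through PATH.
command -v hive >/dev/null 2>&1 || echo "hive is not on PATH"
```

The function returns a non-zero status when any variable is missing, so it can also be used as a guard at the top of other scripts.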

3. If you have installed Hive on a Hadoop node or on a Hadoop client machine, it will find the NameNode and JobTracker addresses from Hadoop's configuration files (core-site.xml and mapred-site.xml).

4. If you run Hive with its defaults, it stores the metadata for Hive tables in a local Derby database.
In this case, where you run Hive becomes important, because it creates this database under the directory you run Hive from. This has 2 drawbacks:

  • Only one connection is allowed. Others cannot run Hive jobs under that directory.
  • If you run Hive from another location, you cannot see the previous table definitions.
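
With the default configuration, the embedded Derby database appears as a metastore_db directory (next to a derby.log file) in whatever directory Hive was started from. Below is a small sketch for locating such leftover directories; the function name is hypothetical and the search root is whatever you pass in.

```shell
#!/bin/sh
# Hypothetical helper: list embedded Derby metastores below a directory.
# Each hit holds its own, unrelated set of table definitions.
find_local_metastores() {
    find "$1" -type d -name metastore_db 2>/dev/null
}

# Example call (may take a while on a large home directory):
#   find_local_metastores "$HOME"
```

If this prints more than one path, Hive has been started from several directories and each one has its own private view of the tables.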

To overcome this, we must create the metastore database on a database server. I will use MySQL.

5. Create a schema named hive_metastore and set its character set to "latin1 - default collation".

sql> CREATE SCHEMA hive_metastore DEFAULT CHARACTER SET latin1;
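
Hive also needs a MySQL account with rights on this schema. The steps here do not show how that account was created, so the following is only a sketch under assumptions: the user name and password are the placeholders myusername and mypassword, the host pattern '%' (any host) is an assumption that should be narrowed for real use, and the function name is hypothetical. The function only prints the SQL; nothing is changed until the output is piped to the mysql client.

```shell
#!/bin/sh
# Hypothetical helper: print SQL that creates the metastore account.
# Arguments: user, password, schema (schema defaults to hive_metastore).
metastore_grant_sql() {
    user=$1
    password=$2
    schema=${3:-hive_metastore}
    cat <<EOF
CREATE USER '$user'@'%' IDENTIFIED BY '$password';
GRANT ALL PRIVILEGES ON $schema.* TO '$user'@'%';
EOF
}

# Print the statements for review.
metastore_grant_sql myusername mypassword

# To apply them as an administrative MySQL user (prompts for a password):
#   metastore_grant_sql myusername mypassword | mysql -u root -p
```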

6. Create a new configuration file named hive-site.xml under "hive-0.12.0/conf" and enter your database connection properties.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://server_address/hive_metastore?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>myusername</value>
  <description>username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mypassword</value>
  <description>password to use against metastore database</description>
</property>
</configuration>
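
Once the file is saved, it is easy to confirm which connection string Hive will read. This is a minimal sketch, assuming the <name> and <value> elements sit on consecutive lines as in the listing above; metastore_url is a hypothetical helper name, and this is a line-based check, not a real XML parser.

```shell
#!/bin/sh
# Hypothetical helper: print the JDBC URL found in a hive-site.xml file.
metastore_url() {
    # On the line after the ConnectionURL <name>, keep only the <value> text.
    sed -n '/javax\.jdo\.option\.ConnectionURL/{n;s:.*<value>\(.*\)</value>.*:\1:p;}' "$1"
}

conf_file="${HIVE_HOME:-/path/to/hive/home}/conf/hive-site.xml"
if [ -f "$conf_file" ]; then
    metastore_url "$conf_file"
else
    echo "hive-site.xml not found at $conf_file"
fi
```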
7. For Hive to connect to the MySQL server, you must place the MySQL JDBC driver under "hive-0.12.0/lib". You can download it from http://dev.mysql.com/downloads/connector/j/

8. Now you can type the hive command and run:
hive> SHOW TABLES;
This will print the existing tables.
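
To confirm that the metastore is now shared rather than tied to a working directory, the same query can be run from two different directories; both runs should list the same tables. The sketch below uses the hive -e option, which executes a quoted query string and exits. The function name and the two directories are illustrative choices.

```shell
#!/bin/sh
# Hypothetical helper: run SHOW TABLES with the given working directory.
show_tables_from() {
    # The subshell keeps the cd from affecting the calling shell.
    ( cd "$1" && hive -e 'SHOW TABLES;' )
}

if command -v hive >/dev/null 2>&1; then
    show_tables_from /tmp
    show_tables_from "$HOME"
else
    echo "hive is not on PATH; revisit the environment variables step"
fi
```

If the two listings differ, or a new metastore_db directory appears in either location, Hive is most likely still using the embedded Derby database and hive-site.xml is not being picked up.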




