In this post, Hadoop 1.0.3 is installed on Ubuntu 12.04 LTS.
Prerequisites
1. Java
To install Hadoop, you must have Java 1.5+ (Java 5 or above). I will continue with Java 1.7.
> sudo apt-get install openjdk-7-jdk
If you have different Java versions installed on your machine, you can select Java 1.7 by typing:
> sudo update-alternatives --config java
You can then select your desired Java version.
You can check the current Java version by typing:
> java -version
java version "1.7.0_65"
OpenJDK Runtime Environment (IcedTea 2.5.1) (7u65-2.5.1-4ubuntu1~0.12.04.2)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
2. SSH
Hadoop uses SSH to connect to and manage its nodes. This also applies to a single-node setup. The OpenSSH client is included in Ubuntu by default; however, you should also have the OpenSSH server installed.
> dpkg --get-selections | grep -v deinstall | grep openssh
If the list does not contain the client or the server, you can install the missing one:
> sudo apt-get install openssh-client
> sudo apt-get install openssh-server
Now, you should connect to localhost:
> ssh localhost
This will ask for your password. Hadoop needs to establish connections without entering a password.
To enable this:
Generate public and private keys
> ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Authorize the key by adding it to the list of authorized keys:
> cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
You should now be able to connect without a password:
> ssh localhost
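If ssh still prompts for a password at this point, the permissions on the key files are a common culprit; tightening them usually helps (this is an extra troubleshooting step, not part of the original setup):
> chmod 700 ~/.ssh
> chmod 600 ~/.ssh/authorized_keys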
3. Disable IPv6
To disable IPv6, open /etc/sysctl.conf and add the following lines to the end of the file:
> vi /etc/sysctl.conf
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
You should restart your machine for the changes to take effect.
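To verify that IPv6 is disabled after the restart, you can read the kernel flag directly; a value of 1 means it is disabled:
> cat /proc/sys/net/ipv6/conf/all/disable_ipv6
1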
4. Dedicated User for Hadoop
Although it is not necessary, you can create a dedicated user for Hadoop. This will help you separate Hadoop management from other applications. Create a user named hadoopuser and assign it to a group named hadoopgroup. You can get more detail about creating users and groups in this post.
> sudo groupadd hadoopgroup
> sudo useradd hadoopuser -m
> sudo usermod -aG hadoopgroup hadoopuser
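Note that useradd does not set a password, so if you want to log in as this user (or switch to it for the remaining steps), you can set one and then switch with su; this is optional and not part of the original steps:
> sudo passwd hadoopuser
> su - hadoopuser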
Installation
1. Get Hadoop
You can get your desired Hadoop version from the Apache download mirrors. I will download Hadoop 1.0.3:
> cd /home/hadoopuser
> wget http://archive.apache.org/dist/hadoop/core/hadoop-1.0.3/hadoop-1.0.3.tar.gz
Extract the Hadoop package under the home directory:
> cd /home/hadoopuser
> sudo tar -xzvf hadoop-1.0.3.tar.gz
> sudo chown -R hadoopuser:hadoopgroup /home/hadoopuser/hadoop-1.0.3
2. Set your environment variables
I will set the HADOOP_HOME environment variable. You can get more detail about setting environment variables in this post. I will make HADOOP_HOME accessible system-wide, not per user. To do this, create a file named system_env.sh under the /etc/profile.d folder.
> sudo vi /etc/profile.d/system_env.sh
export HADOOP_HOME=/home/hadoopuser/hadoop-1.0.3
export PATH=$PATH:$HADOOP_HOME/bin
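Scripts under /etc/profile.d are picked up at login, so after logging out and back in (or sourcing the file in the current shell), you can verify the variables along these lines:
> source /etc/profile.d/system_env.sh
> echo $HADOOP_HOME
/home/hadoopuser/hadoop-1.0.3
> hadoop version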
3. Configuration
You can configure the following configuration files as stated below as a starting point.
$HADOOP_HOME/conf/hadoop-env.sh
Set the JAVA_HOME variable in this file:
> vi /home/hadoopuser/hadoop-1.0.3/conf/hadoop-env.sh
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/jre
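The path above assumes the 64-bit OpenJDK 7 package; if you are not sure where your Java installation lives, you can resolve it from the java binary on your path:
> readlink -f $(which java)
# e.g. /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java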
$HADOOP_HOME/conf/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:10000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoopuser/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
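Since hadoop.tmp.dir points to /home/hadoopuser/tmp, you will probably want to create that directory and make sure hadoopuser owns it; something along these lines should do:
> mkdir -p /home/hadoopuser/tmp
> sudo chown hadoopuser:hadoopgroup /home/hadoopuser/tmp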
$HADOOP_HOME/conf/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:10001</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
  </property>
</configuration>
$HADOOP_HOME/conf/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
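For a single-node setup you may also want to add dfs.replication with a value of 1 (the default is 3), so HDFS does not report under-replicated blocks; this is an optional tweak, not part of the configuration used in this post:
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>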
4. Format HDFS
To start using your Hadoop cluster, you should format HDFS. This is done when the cluster is first set up. If you format an existing HDFS, the data stored on it will be removed.
> hadoop namenode -format
5. Start the Cluster
Hadoop provides several control scripts that enable you to start/stop the Hadoop daemons. To start the Hadoop cluster, run:
> /home/hadoopuser/hadoop-1.0.3/bin/start-all.sh
This will start all 5 daemons: NameNode, SecondaryNameNode, JobTracker, TaskTracker and DataNode. You can check whether these daemons are running by typing:
> jps
or
> ps aux | grep hadoop
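When you want to shut the cluster down, the matching stop script in the same bin directory stops all the daemons:
> /home/hadoopuser/hadoop-1.0.3/bin/stop-all.sh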