Monday, June 23, 2014

Hadoop Does Not Stop - Missing Pid Files

In our cluster, we found that when the pid files of the Hadoop daemons go missing, the stop scripts can no longer stop the daemons. If you then kill the daemons forcefully (kill -9), Hadoop can end up in an erroneous state. For example, if you kill the datanode daemon this way, blocks can go missing.

The setup used here is Hadoop 1.0.3 running as a pseudo-distributed single-node cluster on Ubuntu 12.04.

1. Pid Files

By default, Hadoop stores the process id of each daemon in a file under the /tmp directory.
The files are named as follows:

  • hadoop-myuser-namenode.pid
  • hadoop-myuser-datanode.pid
  • hadoop-myuser-jobtracker.pid
  • hadoop-myuser-tasktracker.pid
  • hadoop-myuser-secondarynamenode.pid

Each file stores a process id as plain text.
If these files are deleted, the Hadoop stop scripts cannot find the daemons and therefore cannot stop them.
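
To see why a missing file leaves the daemon running, here is a minimal sketch of the kind of check a pid-file based stop script performs. It is a simplified illustration, not the actual Hadoop script; the function name stop_daemon and its arguments are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: stop a daemon via its pid file.
# usage: stop_daemon <pid_dir> <user> <daemon>
stop_daemon() {
  pid_file="$1/hadoop-$2-$3.pid"
  if [ -f "$pid_file" ]; then
    pid=$(cat "$pid_file")
    # kill -0 sends no signal; it only tests that the process can be signalled.
    if kill -0 "$pid" 2>/dev/null; then
      echo "stopping $3"
      kill "$pid"
    else
      echo "no $3 to stop"
    fi
  else
    # Without the pid file there is nothing to look up, so the
    # daemon keeps running even though the script reports nothing to stop.
    echo "no $3 to stop"
  fi
}
```

With this logic, a call such as stop_daemon /tmp myuser namenode falls into the last branch whenever /tmp/hadoop-myuser-namenode.pid is missing, no matter whether the namenode is actually running.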

2. Create Pid Files

2.1. First, find the process ids of the running Hadoop daemons with the jps command or with ps aux | grep hadoop. If jps is not on the PATH, call it by its full path.
> jps

25564 JobTracker
24896 NameNode
25168 DataNode
25878 TaskTracker
12726 Jps
25448 SecondaryNameNode

2.2. Find the directory where the pid files should be stored. It is /tmp by default, but the path can be changed through HADOOP_PID_DIR in hadoop-env.sh (under $HADOOP_HOME/conf in a default Hadoop 1.x layout):

# The directory where pid files are stored. /tmp by default.
# export HADOOP_PID_DIR=/var/hadoop/pids
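
The default rule can be written as a one-line shell fallback. This is a small sketch; the function name resolve_pid_dir is hypothetical, and it assumes HADOOP_PID_DIR is visible in your current shell (for example after sourcing hadoop-env.sh), which is not necessarily the case in a fresh terminal.

```shell
#!/bin/sh
# Hypothetical helper: print the pid directory, falling back to /tmp
# when HADOOP_PID_DIR is unset or empty.
resolve_pid_dir() {
  echo "${HADOOP_PID_DIR:-/tmp}"
}
```

If the variable is not set in your shell, check hadoop-env.sh directly before relying on the /tmp fallback.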

2.3. Go to the pid directory and create the missing files. In each file, write the process id of the corresponding daemon and save.
> vi hadoop-myuser-namenode.pid
> vi hadoop-myuser-datanode.pid
> vi hadoop-myuser-jobtracker.pid
> vi hadoop-myuser-tasktracker.pid
> vi hadoop-myuser-secondarynamenode.pid
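
If you prefer not to type the ids by hand, the same step can be scripted. The following is a minimal sketch, assuming the jps output format shown in 2.1 (process id, a space, then the class name) and the file naming from section 1; the function name write_pid_files is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read "PID Name" lines (jps output) on stdin and
# write one pid file per known Hadoop daemon.
# usage: jps | write_pid_files <pid_dir> <user>
write_pid_files() {
  while read -r pid name; do
    case "$name" in
      NameNode|DataNode|JobTracker|TaskTracker|SecondaryNameNode)
        # Pid file names use the lower-case daemon name.
        daemon=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
        echo "$pid" > "$1/hadoop-$2-$daemon.pid"
        ;;
    esac
  done
}
```

For the setup in this post it would be called as jps | write_pid_files /tmp myuser. Lines for other Java processes, such as Jps itself, are ignored. Note that it overwrites any pid file that already exists for those daemons.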


2.4. Make sure the user running the Hadoop daemons can read and write these files.
I personally change the owner with chown (given that the file permissions are already 664):
> chown -R myuser:myuser /tmp/hadoop*.pid
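
Before running the stop scripts, it can help to confirm that every pid file really points at a running process. This is a hedged sketch rather than part of Hadoop; the function name check_pid_files is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether each pid file names a running process.
# usage: check_pid_files <pid_dir>
# Returns non-zero if at least one pid file is stale.
check_pid_files() {
  status=0
  for f in "$1"/hadoop-*.pid; do
    [ -f "$f" ] || continue
    pid=$(cat "$f")
    # kill -0 sends no signal. It also fails when you lack permission
    # to signal the process, so run this as the Hadoop user.
    if kill -0 "$pid" 2>/dev/null; then
      echo "ok: $f -> $pid"
    else
      echo "stale: $f -> $pid"
      status=1
    fi
  done
  return $status
}
```

A "stale" line usually means a typo in the file or a daemon that has already exited.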

2.5. Then you can stop your daemons with the usual stop script:
> $HADOOP_HOME/bin/stop-all.sh

stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode

You can verify that no daemon is left running with jps or ps aux | grep hadoop:
> jps

14237 Jps





