Monday, June 23, 2014

Hadoop - Map and Reduce Slots

We have seen how the number of map and reduce tasks is calculated here. I will continue the topic with the configuration of the available map and reduce slots in the cluster. Hadoop runs map and reduce tasks in these slots, and the number of slots on each node is fixed by configuration properties.

Hadoop 1.0.3 is used on an Ubuntu 12.04 machine.

1. Map Slots

The number of map slots is defined in mapred-site.xml under the $HADOOP_HOME/conf directory on tasktracker nodes.
The default value is 2, as defined in mapred-default.xml:

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
    <description>The maximum number of map tasks that will be run
    simultaneously by a task tracker.
    </description>
  </property>


To change its value, open mapred-site.xml and set:

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>7</value>
  </property>


Configure all your tasktracker nodes this way. After setting these properties, restart the tasktracker daemons so that they pick up the change. You can see the new values on the jobtracker web interface at http://<jobtracker-server>:50030
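
The restart step can be sketched as a small shell function. This is only a sketch: it assumes a tarball install of Hadoop 1.x where hadoop-daemon.sh lives under $HADOOP_HOME/bin, and both the function name restart_tasktracker and the DRY_RUN variable are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Restart the tasktracker on the local node so it re-reads mapred-site.xml.
# Assumption: HADOOP_HOME points at a Hadoop 1.x install that ships
# bin/hadoop-daemon.sh. DRY_RUN is a hypothetical switch: when it is set
# to a non-empty value, the commands are printed instead of executed.
restart_tasktracker() {
    daemon="${HADOOP_HOME:?HADOOP_HOME is not set}/bin/hadoop-daemon.sh"
    ${DRY_RUN:+echo} "$daemon" stop tasktracker
    ${DRY_RUN:+echo} "$daemon" start tasktracker
}

# Example (run on each tasktracker node):
#   DRY_RUN=1 restart_tasktracker   # only show what would be run
#   restart_tasktracker             # actually restart the daemon
```

Running it with DRY_RUN=1 first is a cheap way to check that HADOOP_HOME resolves to the expected path before any daemon is stopped.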

2. Reduce Slots

As with map slots, the number of reduce slots is defined in mapred-site.xml under the $HADOOP_HOME/conf directory on tasktracker nodes.
The default value is 2, as defined in mapred-default.xml:

  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
    <description>The maximum number of reduce tasks that will be run
    simultaneously by a task tracker.
    </description>
  </property>


To change its value, open mapred-site.xml and set:

  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>7</value>
  </property>


You should configure all your tasktracker nodes this way.
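
One possible way to apply the same file to every tasktracker node is to copy it from a single machine. The sketch below assumes passwordless SSH, the same configuration path on every node, and a slaves-style file with one hostname per line (such as $HADOOP_HOME/conf/slaves). The function name push_mapred_site and the DRY_RUN variable are hypothetical.

```shell
#!/bin/sh
# Copy a local mapred-site.xml to the same path on every listed host.
# $1: file with one hostname per line (blank lines and # comments are skipped)
# $2: absolute path of mapred-site.xml, assumed identical on all nodes
# DRY_RUN is a hypothetical switch: when non-empty, scp commands are only printed.
push_mapred_site() {
    slaves_file="$1"
    conf_file="$2"
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$slaves_file" |
    while read -r host; do
        # stdin is redirected so the copy cannot consume the host list
        ${DRY_RUN:+echo} scp "$conf_file" "$host:$conf_file" < /dev/null
    done
}

# Example:
#   DRY_RUN=1 push_mapred_site "$HADOOP_HOME/conf/slaves" "$HADOOP_HOME/conf/mapred-site.xml"
```

The tasktracker on each node still has to be restarted afterwards for the new slot counts to take effect.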

3. How to decide the number of map/reduce slots?

To decide the number of map and reduce slots, look at the number of processors on the machine.
As a general practice, you can allow 2 tasks per CPU core. If your tasktracker machine has 8 cores, you can have 16 task slots in total, counting both map and reduce tasks.
You can find the number of CPU cores with lscpu:
> lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 58
Stepping:              9
CPU MHz:               1600.000
BogoMIPS:              6784.81
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7

The CPU(s) line reports 8 logical CPUs (4 cores with 2 threads each, as the Core(s) per socket and Thread(s) per core lines show). I will set both mapred.tasktracker.reduce.tasks.maximum and mapred.tasktracker.map.tasks.maximum to 8 - 1 = 7, since the tasktracker and datanode daemons will consume the resources of one slot.
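
The arithmetic above can be scripted so that each node derives its own value. This is a minimal sketch of the "CPU(s) minus one" rule used in this post, not a Hadoop feature. The function names cpu_count and slots_per_type are hypothetical, and the parsing assumes lscpu prints a line that starts with "CPU(s):" as in the output shown earlier.

```shell
#!/bin/sh
# Read lscpu output on stdin and print the value of the "CPU(s):" line.
# Lines such as "On-line CPU(s) list:" do not match, because the whole
# text before the first colon must equal "CPU(s)".
cpu_count() {
    awk -F: '$1 == "CPU(s)" { gsub(/[[:space:]]/, "", $2); print $2; exit }'
}

# Slots per task type under the rule used here: CPU count minus one
# (reserved for the tasktracker and datanode daemons), never below 1.
slots_per_type() {
    cpus="$1"
    if [ "$cpus" -gt 1 ]; then
        echo $((cpus - 1))
    else
        echo 1
    fi
}

printf 'CPU(s):                8\n' | cpu_count   # prints 8
slots_per_type 8                                  # prints 7
```

On a real node the two can be combined as slots_per_type "$(lscpu | cpu_count)", and the result used as the value of both mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum.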

