Hadoop 1.0.3 is used on an Ubuntu 12.04 machine.
1. Map Slots
The number of map slots is defined in mapred-site.xml under the $HADOOP_HOME/conf directory on tasktracker nodes.
This value is 2 by default as defined in mapred-default.xml:
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
  <description>The maximum number of map tasks that will be run simultaneously by a task tracker.</description>
</property>
To change its value, open mapred-site.xml and set:
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>7</value>
</property>
You should configure all your tasktracker nodes this way. After setting these properties, restart the tasktracker daemons. You can then see the new values on the jobtracker web interface at http://<jobtracker-server>:50030
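With several tasktracker nodes, editing mapred-site.xml by hand gets repetitive. Below is a minimal Python sketch, not part of Hadoop, that sets a property in a mapred-site.xml-style file using the standard-library ElementTree module. It assumes the usual Hadoop layout (a <configuration> root holding <property> elements with <name> and <value> children); the helper names set_property and update_site_file are hypothetical. You still have to distribute the file to every tasktracker and restart the daemons yourself.

```python
import xml.etree.ElementTree as ET


def set_property(configuration, name, value):
    """Set <name>=<value> under a <configuration> element, adding the property if absent."""
    for prop in configuration.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:  # property exists but has no <value> child yet
                value_el = ET.SubElement(prop, "value")
            value_el.text = str(value)
            return
    # No matching property: append a new one at the end of <configuration>.
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)


def update_site_file(path, name, value):
    """Rewrite a mapred-site.xml file in place. Note: ElementTree drops XML comments."""
    tree = ET.parse(path)
    set_property(tree.getroot(), name, value)
    tree.write(path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    conf = ET.Element("configuration")
    set_property(conf, "mapred.tasktracker.map.tasks.maximum", 7)
    print(ET.tostring(conf, encoding="unicode"))
```

Because ElementTree discards comments when it parses a file, keep a backup of the original mapred-site.xml before calling update_site_file on it.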
2. Reduce Slots
Like map slots, the number of reduce slots is defined in mapred-site.xml under the $HADOOP_HOME/conf directory on tasktracker nodes.
This value is 2 by default as defined in mapred-default.xml:
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
  <description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.</description>
</property>
To change its value, open mapred-site.xml and set:
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>7</value>
</property>
You should configure all your tasktracker nodes this way.
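Before restarting, it can help to check which slot counts a tasktracker will end up with: a slot property missing from mapred-site.xml falls back to the mapred-default.xml value of 2. The following is a minimal Python sketch with hypothetical names (DEFAULT_SLOTS, effective_slots); it only looks at the two slot properties in one site file and ignores Hadoop's <final> flag and any other configuration resources.

```python
import xml.etree.ElementTree as ET

# Defaults as defined in mapred-default.xml for Hadoop 1.x (both are 2).
DEFAULT_SLOTS = {
    "mapred.tasktracker.map.tasks.maximum": 2,
    "mapred.tasktracker.reduce.tasks.maximum": 2,
}


def effective_slots(site_xml):
    """Return slot maximums from mapred-site.xml text, falling back to the defaults."""
    slots = dict(DEFAULT_SLOTS)
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = prop.findtext("value")
        if name in slots and value is not None:
            slots[name] = int(value.strip())  # site value overrides the default
    return slots


SITE = """<configuration>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>7</value>
  </property>
</configuration>"""

print(effective_slots(SITE))
```

Here only the reduce maximum is set, so the map maximum stays at the default of 2.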
3. How to decide the number of map/reduce slots?
To decide the number of map and reduce slots, look at the number of processors on the machine.
As a general practice, you can allow 2 tasks per CPU core. If your tasktracker machine has 8 cores, you can have 16 task slots in total, counting both map and reduce tasks.
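The rule of thumb is simple arithmetic: the slot budget is the core count times the tasks allowed per core. A tiny Python sketch, where slot_budget is a hypothetical helper name and 2 tasks per core is the post's default:

```python
def slot_budget(cores, tasks_per_core=2):
    """Total task slots (map + reduce) under the 'tasks per core' rule of thumb."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cores * tasks_per_core


print(slot_budget(8))  # 16
```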
You can find the number of CPU cores with lscpu:
> lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 58
Stepping:              9
CPU MHz:               1600.000
BogoMIPS:              6784.81
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7
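To pull these numbers out programmatically, you can split each lscpu line at its first colon. Below is a minimal Python sketch with hypothetical helper names; it assumes lscpu's plain "Key: value" output format, and read_lscpu only works on Linux where lscpu is installed. Note that CPU(s) counts logical CPUs, while physical cores are Core(s) per socket times Socket(s).

```python
import subprocess


def parse_lscpu(text):
    """Turn lscpu's 'Key: value' lines into a dict of stripped strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            info[key.strip()] = value.strip()
    return info


def cpu_counts(info):
    """Return (logical CPUs, physical cores) from parsed lscpu fields."""
    logical = int(info["CPU(s)"])
    cores_per_socket = int(info.get("Core(s) per socket", logical))
    sockets = int(info.get("Socket(s)", 1))
    return logical, cores_per_socket * sockets


def read_lscpu():
    """Run lscpu on the local machine (Linux only) and parse its output."""
    out = subprocess.run(["lscpu"], capture_output=True, text=True, check=True).stdout
    return parse_lscpu(out)


# A few lines copied from the lscpu output in this post.
SAMPLE = """CPU(s):                8
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node0 CPU(s):     0-7"""

print(cpu_counts(parse_lscpu(SAMPLE)))  # (8, 4)
```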
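The CPU(s) line reports 8 CPUs. (Strictly, these are logical CPUs: the Core(s) per socket and Thread(s) per core lines show 4 physical cores with 2 threads each.) I will set both mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum to 8 - 1 = 7, since the tasktracker and datanode daemons running on the same machine also consume resources. That gives 7 + 7 = 14 slots in total, within the budget of 16 from the 2-tasks-per-core rule.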
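The same calculation as a small Python sketch, assuming the post's heuristic of subtracting one slot of each type for the daemons; plan_slots is a hypothetical name, and cpus means the CPU(s) value from lscpu, treated as cores as in the post.

```python
def plan_slots(cpus, reserved=1, tasks_per_cpu=2):
    """Per-type slot maximums using the 'CPUs minus a reserve' heuristic."""
    per_type = cpus - reserved
    if per_type < 1:
        raise ValueError("no slots left after reserving CPUs for the daemons")
    total = 2 * per_type           # map slots + reduce slots
    budget = cpus * tasks_per_cpu  # upper bound from the tasks-per-core rule
    if total > budget:
        raise ValueError(f"{total} slots exceed the budget of {budget}")
    return {
        "mapred.tasktracker.map.tasks.maximum": per_type,
        "mapred.tasktracker.reduce.tasks.maximum": per_type,
    }


plan = plan_slots(8)
print(plan, sum(plan.values()))  # 7 map + 7 reduce = 14 slots
```

The budget check matters if you lower tasks_per_cpu: with 1 task per CPU, 14 slots would exceed a budget of 8 and the function refuses to return a plan.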