Optimizing the Performance
and Scalability of MapReduce
for Multicore
Haibo Chen
Parallel Processing Institute
Fudan University
http://ppi.fudan.edu.cn/haibo_chen
Multicore is now commercially prevalent
• Eight-core and twelve-core chips are common
• Chips with hundreds of cores will appear in the
near future
[Figure: Multicore, core counts growing 1X, 4X, 8X, 64X]
Multicore: Challenges
How can we fully harness the increasingly abundant
cores?
– Data-parallel applications fit multicore systems
well
• each core processes data in its private cache
• cores share data through main memory
• Issue #1: ease of use
– Average programmers should be able to use it
• Issue #2: scalability
– It should scale easily to many cores/nodes
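A minimal sketch of the data-parallel pattern on this slide, written in Python for illustration only (the function name parallel_sum and its num_workers parameter are hypothetical, not from the talk). Each thread reduces its own slice into a private local variable, which is the analogue of working out of a core's private cache, then publishes one partial result to a shared list in main memory, and the main thread combines the partials. Note that CPython's global interpreter lock keeps these pure-Python loops from running truly in parallel, so the sketch shows the partition-and-combine structure rather than real speedup.

```python
import threading

def parallel_sum(data, num_workers=4):
    """Data-parallel sum (hypothetical helper): partition, reduce privately, combine."""
    # Shared result slots in main memory; each worker writes only its own
    # index, so no lock is needed.
    partials = [0] * num_workers
    chunk = (len(data) + num_workers - 1) // num_workers  # ceiling division

    def worker(i):
        # Private accumulator: the worker touches only its own slice and a
        # local variable until it publishes a single result.
        local = 0
        for x in data[i * chunk:(i + 1) * chunk]:
            local += x
        partials[i] = local  # one write to shared memory per worker

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Combine step: merge the per-worker partial results.
    return sum(partials)

if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))  # prints 5050
```

Keeping the accumulator local and writing to shared memory only once per worker is the design choice that matters here: it limits cross-core traffic to the final combine step.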
Data-parallel applications have emerged and grown
rapidly over the past 10 years
• Google processed about 24 petabytes of data per
day in 2008
• The movie “Avatar” required over 1 petabyte of local
storage for 3D rendering *
• …
Data-Parallel Application
* Source: http://www.information-management.com/newsletters/ (article on Avatar data processing)
Data-parallel Programming Model
MapReduce: a simple programming model for
data-parallel applications from