Google Architecture
Saturday, November 22, 2008 at 10:01AM
Todd Hoff in BigTable, C, Cluster File System, Example, Geo-distributed Clusters, Java,
Linux, Map Reduce, Python
Update 2: Sorting 1 PB with MapReduce. PB is not peanut-butter-and-
jelly misspelled. It's 1 petabyte, or 1,000 terabytes, or 1,000,000 gigabytes.
It took six hours and two minutes to sort 1 PB (10 trillion 100-byte
records) on 4,000 computers, and the results were replicated three times
across 48,000 disks.
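The arithmetic behind those numbers is easy to sanity check. A quick back-of-envelope sketch in Python (the variable names and the derived figures are mine, not Google's):

    records = 10 * 10**12             # 10 trillion records
    record_size = 100                 # bytes per record
    total_bytes = records * record_size
    print(total_bytes)                # 1e15 bytes = 1 PB (decimal)

    machines = 4000
    disks = 48000
    print(disks / machines)           # 12 disks per machine, on average

    seconds = 6 * 3600 + 2 * 60       # six hours and two minutes
    print(total_bytes / seconds / 1e9)  # roughly 46 GB/s aggregate sort throughput

That works out to roughly 11-12 MB/s of sorted, replicated output per machine, sustained for the entire run.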
Update: Greg Linden points to a new Google article, MapReduce:
Simplified Data Processing on Large Clusters. Some interesting stats:
100k MapReduce jobs are executed each day; more than 20 petabytes of
data are processed per day; more than 10k MapReduce programs have
been implemented; machines are dual-processor with gigabit Ethernet and
4-8 GB of memory.
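Those figures also imply an average job size. A tiny sketch of the arithmetic (mine, not a number reported in the paper):

    jobs_per_day = 100000
    bytes_per_day = 20 * 10**15       # 20 petabytes, decimal
    print(bytes_per_day / jobs_per_day / 10**9)  # ~200 GB processed per job on average

The real distribution of job sizes is almost certainly skewed, so treat 200 GB per job as nothing more than a rough average.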
Google is the King of scalability. Everyone knows Google for their large,
sophisticated, and fast search, but they don't just shine in search. Their
platform approach to building scalable applications allows them to roll out
internet-scale applications at an alarmingly high, competition-crushing
rate. Their goal is always to build a higher-performing, higher-scaling
infrastructure to support their products. How do they do that?
Information Sources
1. Video: Building Large Systems at Google
2. Google Lab: The Google File System
3. Google Lab: MapReduce: Simplified Data Processing on Large Clusters