

Last night Paul and I went to a talk at Google’s Fremont (Seattle) campus on the future of cloud computing. The talk covered Google’s core technology assets (GFS, MapReduce and BigTable) and the open source re-implementations of those same technologies, Hadoop and HBase. Aaron Kimball, UW computer science grad student and founder of Spinnaker Labs, gave a great presentation that was well attended by 100 or so folks who are interested in Internet-scale computing.
It was a good, high-level overview, which stimulated some good questions and discussion. The more interesting things I noted last night include:
*) The optimal profile for a server in a compute farm doesn’t require a lot of top-speed CPUs, nor should it be packed with RAM. The bottleneck is getting the data off the disk. A 2.4 GHz 1U server with 4 CPU cores, 4-8 GB of RAM and 2 SATA disk drives (as fast and big as possible) is best.
*) There was some discussion about the performance delta between Hadoop and Google’s technologies. A great point was made that in the long-term grand scheme of things, the delta is irrelevant. MapReduce represents a paradigm shift as significant as the shift to client/server programming 2 decades ago. This approach is likely to be the norm for batch processing of very large data sets for 20 or 30 years to come.
*) The time and overhead of starting up a MapReduce job means that it really is inappropriate for processing datasets under 20 GB.
*) Cycles and bytes, not hardware, are the new commodity.
*) As more technology companies like Amazon and Google provide temporary, on-demand access to large compute farms, distributed computing is being democratized.
*) Hadoop has stabilized significantly over the last year. HBase needs another 6 months to reach the same level of maturity and stability.
*) There was some minor disagreement about whether virtualization is a foundational prerequisite for cloud computing. In an Amazon Web Services model where different external customers are commissioning and decommissioning servers often, virtualization is mandatory. In the case of internally consumed Internet-scale compute farms like the ones Yahoo and Microsoft use exclusively for their own needs, virtualization isn’t a prerequisite.
*) Aaron described how SmugMug is building their business on Amazon Web Services. He made an interesting observation that SmugMug has effectively become a value-added reseller of S3.
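For anyone who hasn’t seen the MapReduce model mentioned above, here is a minimal single-machine sketch in Python. This is my own illustration, not code from Aaron’s slides and not how Hadoop or Google implement it; the function names (map_words, shuffle, reduce_counts, word_count) are hypothetical, and a real framework would spread the map and reduce steps across many machines.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce step: collapse one word's list of values into a total."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in sequence (hypothetical driver)."""
    pairs = [pair for line in lines for pair in map_words(line)]
    grouped = shuffle(pairs)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())
```

The point of the shape is that each map call looks at only one line and each reduce call looks at only one key, which is what lets a framework run them in parallel on separate machines.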
Thanks to Aaron for conducting this talk and to Google for hosting it. As you would expect for Google, there was a great spread of appetizers, beer and wine for the event. It was personally surprising and rewarding to see Aaron highlight blist in one of his slides. Aaron was making a case for the web replacing the desktop and pointed to Gmail, Facebook, Google Apps, meebo, Flickr and blist as examples. Hey, I can’t complain about keeping company with this group of innovators.

Not only was the food great, but I had a conversation afterwards with the owner of the catering company about John von Neumann!!!
Life is funny. At an event on cloud computing, even the caterer was conversant in computers!!!
Is it a Google thing or a Seattle thing or just a coincidence? What a well-educated, well-traveled, well-read group it was.
Left by Michael R. Wolf on May 1st, 2008