When you go to a fairly academic conference, it’s frowned upon to award a best in show. Yesterday I attended the Hadoop Summit expecting to hear about all the cool things Yahoo and Powerset were doing with Hadoop. By far, however, the runaway winner for “best use of Hadoop” in my book goes to Facebook. Joydeep Sen Sarma and Ashish Thusoo gave a talk on a project called Hive, which helps Facebook’s analysts and engineers grok their clickstream and logfile data. Good geeks are, well, geeks; I know many of them. What really impressed me about these two gentlemen and the Hive project was just how business-driven it is.
Joydeep started his talk by saying, “We asked our current BI [business intelligence] users what tools they could and couldn’t use, and they told us they know how to use SQL.” So often technologists forget about their audience. Hive was developed iteratively by a two- or three-person team (I think Jeff Hammerbacher was also involved) to let business analysts ask ad hoc questions of terabytes’ worth of logfile data by abstracting MapReduce behind a SQL-like dialect. Think of it as a data warehouse sitting on top of thousands of servers’ logfiles. Beneath the surface, Hive runs on Hadoop and translates SQL-like statements into MapReduce jobs. It’s a great use of technology, and my highest compliments go to the Facebook team for their work in this area.
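To make the “SQL in, MapReduce out” idea a little more concrete, here is a minimal Python sketch of the map, shuffle and reduce steps a simple GROUP BY count could be broken into. It is only an illustration under stated assumptions: the `clicks` table, the tab-separated `user_id`/`page` log format and the function names are all hypothetical, and this is not how Hive’s own query planner is written.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical query an analyst might write in a Hive-style dialect:
#   SELECT page, COUNT(1) FROM clicks GROUP BY page;
# The functions below show, in plain Python, the map/shuffle/reduce
# shape such a GROUP BY can be compiled into. Illustration only,
# not Hive's actual planner; the log format is an assumption.

def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Emit (page, 1) for each log line of the form 'user_id<TAB>page'."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip malformed log lines
        _user_id, page = fields
        yield page, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Sum the counts for each page: the COUNT(1) aggregate."""
    return {page: sum(counts) for page, counts in groups.items()}

def run_query(lines: Iterable[str]) -> dict[str, int]:
    """Chain the three phases, standing in for one compiled MapReduce job."""
    return reduce_phase(shuffle(map_phase(lines)))

if __name__ == "__main__":
    log = [
        "u1\t/home",
        "u2\t/profile",
        "u1\t/home",
        "u3\t/home",
        "garbage line",
    ]
    print(run_query(log))  # {'/home': 3, '/profile': 1}
```

The point of the sketch is the division of labor: the analyst writes one declarative line, and the system works out that it means “emit a key per row, group by key, then aggregate,” which is exactly the kind of plumbing Hive hides from people who already know SQL.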
I’d also like to commend IBM Research for their work on JAQL, which is essentially a query language for JSON data. Conceptually I love JAQL and think it could be extremely useful. My concern is that it comes from IBM Research, and I wonder how open its open source license will be once it gets through IBM legal.
The Hadoop Summit was a great day-long event attended by about 400 folks interested in Internet-scale computing. It was a pleasant surprise to learn that people beyond the ones I expected are doing really interesting and innovative work.