MySpace figures out how to do massive data analysis on commodity systems
By Galen Gruman June 1, 2009 11:01 AM ET
InfoWorld - 2009 InfoWorld CTO 25 Awards
Aber Whitcomb, CTO, MySpace
It's sometimes hard to fathom the scale of the Web. Yet as CTO of News Corp.'s MySpace.com social site, Aber Whitcomb has to not only fathom it but build for it. In 2008, his BI team built one of the largest data warehouses in the world, capturing between 7 billion and 10 billion events generated each day by MySpace's 130 million users. Whitcomb's team did so using commodity hardware, giving MySpace supercomputer-like analytic capabilities for a fraction of the cost.
Running a Web business on commodity hardware is not a new idea -- both Amazon.com and Google do so, for example. Neither is using MapReduce, the technology Google introduced in the early 2000s to break apart data sets for parallelized computing. (Google made MapReduce available to others in mid-2008.) But MySpace's implementation of Aster Data Systems' nCluster as its data warehouse extends MapReduce to handle rich in-database analytics on massive data volumes.
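For readers unfamiliar with the MapReduce pattern, the idea can be sketched in a few lines of Python. This is only an illustration of the general technique, run in a single process. It is not Aster Data's nCluster interface or Google's implementation, and the event records and function names are hypothetical. The data set is split into chunks, each chunk is mapped to key/value pairs independently (the step that could be farmed out to many commodity machines), and the pairs are then grouped by key and reduced to totals.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical event records: (user_id, event_type). Not real MySpace data.
EVENTS = [
    ("u1", "page_view"),
    ("u2", "login"),
    ("u1", "page_view"),
    ("u3", "message_sent"),
    ("u2", "page_view"),
]


def split_into_chunks(records, chunk_size):
    """Break the data set into pieces that could be processed in parallel."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: emit an (event_type, 1) pair for every event in one chunk."""
    return [(event_type, 1) for _user_id, event_type in chunk]


def shuffle(pairs):
    """Group the emitted values by key so each key can be reduced on its own."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def count_events(records, chunk_size=2):
    """Run the map, shuffle, and reduce steps in sequence in one process."""
    chunks = split_into_chunks(records, chunk_size)
    mapped = chain.from_iterable(map_chunk(chunk) for chunk in chunks)
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    print(count_events(EVENTS))
```

Because each chunk is mapped without reference to any other, the chunk size does not change the result; in a real cluster that independence is what lets the work spread across many inexpensive machines.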
What MySpace gained, says Whitcomb, is a full understanding of what is happening online, immediately reflecting what people are doing on an hourly basis, both to give marketing efforts an edge and to identify customer issues before they spiral out of control.