HPCC vs Hadoop at a glance
Yesterday I noticed a tweet by Andrei Savu. This prompted me to read the related GigaOM article and then check out the HPCC Systems website.

If you're too lazy to read the article or visit that website: HPCC (High Performance Computing Cluster) is a massively parallel processing computing platform that solves Big Data problems. The platform is now Open Source!

HPCC Systems compares itself to Hadoop, which I think is completely justified in terms of functionality. Its product originated as a homegrown solution of LexisNexis Risk Solutions, allowing its customers (banks, insurance companies, law enforcement and federal government) to quickly analyze billions of records, and as such it has been in use for a decade or so. It is now open sourced, and I already heard an announcement that Pentaho is its major Business Intelligence Partner.

Based on the limited information, I made a quick analysis, which I emailed to the HPCC Systems CTO, Armando Escalante. My friend Jos van Dongen said it was a good analysis and told me I should post it. Now, I don't really have time to make a nice blog post out of it, but I figured it can't hurt to just repeat what I said in my emails. So here goes:

Just going by the documentation, I see two real unique selling points in HPCC Systems as compared to Hadoop:
(There may be many more benefits, but these are just the ones I could clearly distill from the press release and the website.)

Especially for Business Intelligence, Roxie may be a big thing. If real-time Big Data queries could be integrated with Business Intelligence OLAP and reporting tools, then this is certainly a big thing. I can't disclose the details, but I have trustworthy information that integration with Pentaho's Analysis Engine, the Mondrian ROLAP engine, is underway and will be available as an Enterprise feature.

A few things that look different but which may not matter too much when looking at HPCC and Hadoop from a distance:
The only areas I can think of where HPCC looks like it has a disadvantage as compared to Hadoop are adoption rate and licensing. I hope these will prove not to be significant hurdles for HPCC, but I think that these might be bigger problems than they seem.

Especially the AGPL licensing seems problematic to me. The AGPL is not well regarded by anyone I know, at least not in the open source world. The general idea seems to be that, even more than plain GPL3, it restricts how the software may be used. If the goal of open sourcing HPCC is to gain mindshare and a developer community (something that Hadoop has done and is doing extremely well), then a more permissive license is really the way to go.

If you look at products like MySQL but also Pentaho - they are both very strongly corporately led products. They have a good number of users, but few contributions from outside the company, and this is probably due to a combination of GPL licensing and the additional requirement to hand over the copyright of any contributions to the company. Hence these products don't really benefit from an open source development model (or at least not as much as they could). For these companies, open source may help initially to gain a lot of users, but the majority of those are users that just want a free ride: conversion rates to enterprise edition customers are quite low. It might be enough to make a decent buck, but eventually you'll hit a cap on how far you can grow. I'm not saying this is bad - you only need to grow as much as you have to, but it is something to be aware of.

Contrast this with Hadoop. It has a permissive Apache 2.0 license, and this results in many individuals but also companies contributing to the project. And there are still companies like Cloudera that manage to make a good living off of the services around their distribution of Hadoop. You don't lose the ability to develop add-ons either with this model - Apache 2.0 allows all that. The difference with GPL (and AGPL), of course, is that Apache 2.0 also allows this to other users and companies. So the trick to stay on top in this model is to simply offer the best product (as opposed to being the sole holder of the copyright to the code).

Anyway - that is it for now - I hope this is helpful.

More from Roland Bouman's Blog ...