
HadoopDB discussion with Daniel Abadi

Posted by Tony Bain, 23rd July 2009, 11:17 AM (GMT +11)


I spoke to Daniel Abadi this morning about the HadoopDB announcement that came out a couple of days ago. This has no doubt been a busy time for Daniel and his team at Yale, as HadoopDB has attracted a lot of interest, and I expect that interest to keep building.

Some notes from our discussion:
  • HadoopDB is primarily focused on high scalability and on the availability required at that scale. Daniel questions whether current MPP databases can truly scale past 100 nodes, whereas Hadoop has real-world deployments on 3,000+ nodes.
  • Like many MPP analytical database platforms, HadoopDB uses shared-nothing relational databases as its processing units; in HadoopDB's case these are Postgres instances. Unlike other MPP databases, however, HadoopDB uses Hadoop as its distribution mechanism.
  • I am ad-libbing here, but my understanding is that Daniel does not dispute the DeWitt and Stonebraker paper (which he co-authored) that found Map/Reduce underperforms current MPP DBMSs. HadoopDB, however, is aimed at massive scale: hundreds or thousands of nodes. The largest MPP database deployment we currently know of runs on 96 nodes.
  • Early benchmarks show HadoopDB outperforming Hadoop but running slower than current MPP databases under normal conditions. However, when a node failure was simulated mid-query, HadoopDB significantly outperformed current MPP databases.
  • The more nodes a system scales to, the higher the probability that a node fails mid-query. Very large Hadoop deployments may experience at least one node failure per query (job).
  • HadoopDB is usable today, but it should not be considered an “out of the box” solution. It is the outcome of a database research initiative, not a commercial venture, and anyone planning to use HadoopDB will need the appropriate systems and development skills to deploy it effectively.
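To make the architecture note more concrete, here is a minimal sketch of the general pattern it describes: SQL work is pushed down into a separate relational database on each shared-nothing node (the "map" side), and a coordinator merges the partial results (the "reduce" side). This is a hypothetical illustration, not HadoopDB's code. In-memory SQLite databases stand in for the per-node Postgres instances, a plain loop stands in for Hadoop's job scheduling, and the `sales` table and all function names are invented for the example.

```python
import sqlite3

# Hypothetical stand-in: each "node" is an in-memory SQLite database holding
# one partition of the data. HadoopDB itself runs Postgres on each node and
# uses Hadoop to schedule work; neither is modelled here.


def make_node(rows):
    """Create one shared-nothing node holding its own partition of `sales`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def map_phase(node):
    """'Map': push a partial aggregation down into the node's local database."""
    return node.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()


def reduce_phase(partials):
    """'Reduce': merge the per-node partial sums into final per-region totals."""
    totals = {}
    for partial in partials:
        for region, subtotal in partial:
            totals[region] = totals.get(region, 0) + subtotal
    return totals


def run_query(nodes):
    # In a real cluster the map tasks would run in parallel on separate machines;
    # here they simply run one after another.
    return reduce_phase(map_phase(node) for node in nodes)


if __name__ == "__main__":
    nodes = [
        make_node([("east", 10), ("west", 5)]),
        make_node([("east", 7), ("north", 3)]),
        make_node([("west", 1)]),
    ]
    print(run_query(nodes))
```

With these three sample partitions, `run_query` returns totals of 17 for east, 6 for west and 3 for north. The point of the pattern is that filtering and grouping happen inside each node's database, so only small partial aggregates travel to the merge step.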
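The point about node failures can be put into rough numbers. Assuming each node fails independently with some probability p during a query, the chance that at least one of n nodes fails is 1 - (1 - p)^n. The sketch below also compares two deliberately simplified recovery models: restarting the whole query on any failure (assumed here to approximate a database that aborts and re-runs the query) versus re-running only the failed node's task (Hadoop-style). The per-node probability of 0.001, the function names and both recovery models are illustrative assumptions, not figures or methods from the HadoopDB benchmark.

```python
def p_any_failure(nodes: int, p_node: float) -> float:
    """Probability that at least one node fails during a query, assuming each
    node fails independently with probability p_node (an assumption)."""
    return 1.0 - (1.0 - p_node) ** nodes


def expected_time_restart_whole_query(nodes: int, p_node: float, t: float) -> float:
    """Toy model: any failure aborts the whole query, which is re-run from scratch.
    Assumes failures are detected at the end of an attempt, so every attempt
    costs t and the number of attempts is geometric with success prob 1 - P."""
    p_fail = p_any_failure(nodes, p_node)
    return t / (1.0 - p_fail)


def expected_time_retry_failed_tasks(nodes: int, p_node: float, t: float) -> float:
    """Toy model: only the failed tasks are re-run elsewhere, in parallel,
    costing one extra task duration t; assumes the retries themselves succeed.
    E = t * (1 - P) + 2t * P = t * (1 + P)."""
    return t * (1.0 + p_any_failure(nodes, p_node))


if __name__ == "__main__":
    P_NODE = 0.001  # hypothetical per-node, per-query failure probability
    for n in (96, 3000):  # largest known MPP deployment vs. a large Hadoop cluster
        print(
            f"{n:>5} nodes: P(any failure)={p_any_failure(n, P_NODE):.3f}, "
            f"restart={expected_time_restart_whole_query(n, P_NODE, 1.0):.2f}t, "
            f"task retry={expected_time_retry_failed_tasks(n, P_NODE, 1.0):.2f}t"
        )
```

Under these assumed numbers, the chance of some node failing is under 10% at 96 nodes but about 95% at 3,000 nodes. There, the whole-query restart model's expected time grows to roughly 20 times the failure-free time, while the task-retry model stays below 2 times. That is consistent with the mid-query failure result reported above.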
HadoopDB is an innovative approach to the scalability challenges that continue to push the architecture of the modern database forward.

© The Business Intelligence Group
