
Pentaho and Hadoop: Big Data + Big ETL + Big BI = Big Deal

Posted by News Bot on 19th May 2010, 07:20 PM
 

Earlier today Pentaho announced support for Hadoop – read about it here.

There are many reasons we are doing this:

  • Hadoop lacks graphical design tools – Pentaho provides pluggable design tools.
  • Hadoop is Java – Pentaho’s technologies are Java.
  • Hadoop needs embedded ETL – Pentaho Data Integration is easy to embed.
  • Pentaho’s open source model enables us to provide technology with great price/performance.
  • Hadoop lacks visualization tools – Pentaho has those.
  • Pentaho provides a full suite of ETL, Reporting, Dashboards, Slice ‘n’ Dice Analysis, and Predictive Analytics/Machine Learning.

The thing is, when you take all of these in combination, Pentaho is the only technology that satisfies every one of these points.

You can see a few of the upcoming integration points in the demo video. They are only a small sample of the many integration points we are going to deliver.

Most recently I’ve been working on integrating the Pentaho suite with the Hive database. This enables desktop and web-based reporting, integration with the Pentaho BI platform components, and integration with Pentaho Data Integration. Between these use cases, hundreds of different components and transformation steps can be combined in thousands of different ways with Hive data. I had to make some modifications to the Hive JDBC driver and we’ll be working with the Hive community to get these changes contributed. These changes are the minimal changes required to get some of the Pentaho technologies working with Hive. Currently the changes are in a local branch of the Hive codebase. More specifically they are a ‘SHort-term Rapid-Iteration Minimal Patch’ fork – a SHRIMP Fork.

Technically, I think the most interesting Hive-related feature so far is the ability to call an ETL process within a SQL statement (as a Hive UDF). This enables all kinds of complex processing and data manipulation within a Hive SQL statement.
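To make the idea concrete, here is a minimal sketch in plain Java of what "an ETL process behind a UDF" could look like in shape. It deliberately uses no Hive or Pentaho classes, so it is not the actual integration: the class name `EtlUdfSketch`, the `STEPS` list, and the three string steps are all hypothetical stand-ins for a real transformation. The only thing it borrows from the UDF model is the convention of an `evaluate` method that takes one column value and returns one value.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch only: not Hive's UDF API and not Pentaho's implementation.
public class EtlUdfSketch {

    // Stand-in for an ETL transformation: an ordered list of row-level steps.
    static final List<Function<String, String>> STEPS = List.of(
        s -> s.trim(),              // step 1: strip surrounding whitespace
        s -> s.toUpperCase(),       // step 2: normalize case
        s -> s.replace(' ', '_')    // step 3: replace spaces with underscores
    );

    // Mirrors the shape of a UDF "evaluate" method: one value in, one value out.
    // In a query this might be invoked as something like
    //   SELECT run_etl(name) FROM some_table   (function name is an assumption)
    public static String evaluate(String input) {
        if (input == null) {
            return null;            // let SQL NULL pass through untouched
        }
        String value = input;
        for (Function<String, String> step : STEPS) {
            value = step.apply(value);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("  big data "));
    }
}
```

The point of the shape is that the SQL engine only ever sees a single function call per row, while the chain of steps behind it can be as long as the transformation needs.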

There are many more Hadoop-related ETL and BI features and tools to come from Pentaho. It’s gonna be a big summer.




More from James Dixon’s Blog ...
All times are GMT +11.

© The Business Intelligence Group
