
Realtime Data Pipelines

A discussion in the Innovations in Data Management forum, part of the CORTEX Blogs category.


Posted by Tony Bain (Senior Member), 1st August 2011, 01:52 PM (GMT +11)


In life there are really two major types of data analytics. Firstly, we don’t know what we want to know, so we need analytics to tell us what is interesting. This is broadly called discovery. Secondly, we already know what we want to know; we just need analytics to deliver that information, often repeatedly and as quickly as possible. This goes by names ranging from reporting and dashboarding through to more general data transformation.

Typically we use the same techniques to achieve both. We shove lots of data into a repository of some form (SQL, MPP SQL, NoSQL, HDFS, etc.), then run queries, jobs or processes across that data to retrieve the information we care about.

This makes sense for data discovery. If we don’t know what we want to know, having lots of data in a big pile that we can slice and dice in interesting ways is good. But when we already know what we want to know, repeatedly running batch processing across mounds of data to produce “updated” results, when that data is often changing constantly, can be highly inefficient: each run re-reads the whole pile even if only a small fraction of it has changed since the last run.
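To make the inefficiency concrete, here is a minimal Python sketch of the batch pattern, assuming a hypothetical in-memory list `store` standing in for the repository and a hypothetical `batch_totals` function standing in for the query or job. Every "refresh" recomputes per-key totals by rescanning the entire store, so the work per refresh grows with the total history rather than with the amount of new data.

```python
from collections import defaultdict


def batch_totals(store):
    """Recompute per-key totals by scanning every record in the store.

    Returns the totals plus the number of records scanned, so the cost
    of each refresh is visible.
    """
    totals = defaultdict(int)
    scanned = 0
    for key, amount in store:
        totals[key] += amount
        scanned += 1
    return dict(totals), scanned


# Hypothetical repository: it only ever grows, and every refresh rescans it.
store = []
work_per_refresh = []
for new_data in ([("a", 1), ("b", 2)], [("a", 3)], [("b", 4), ("a", 5)]):
    store.extend(new_data)                 # new data lands in the repository
    totals, scanned = batch_totals(store)  # the dashboard refresh re-reads everything
    work_per_refresh.append(scanned)

print(totals)            # {'a': 9, 'b': 6}
print(work_per_refresh)  # [2, 3, 5]
```

Only one record arrived before the second refresh, yet that refresh scanned three; over a long-lived repository the gap between "new data" and "data scanned" keeps widening.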

Enter Realtime Data Pipelines. Data is fed in at one end, results are computed in real time as the data flows down the pipeline, and those results come out the other end whenever a relevant change we care about occurs. Data pipelines (also described as workflows or streams) are becoming much more relevant for processing massive amounts of data with realtime results. Moving suitable forms of analytics out of large repositories and into the actual data flow from producer to consumer will, I believe, be a fundamental step forward in big data management.
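As an illustration of the pipeline idea, here is a minimal Python sketch built from chained generators. It is not any particular product's API: the `(key, amount)` event shape, the stage names `running_totals` and `changes_over`, and the rule that a "relevant change" means a key's total first crossing a threshold are all assumptions made for the example. Each stage keeps a small amount of state and does constant work per event, and results leave the pipeline only when something relevant happens.

```python
from typing import Iterable, Iterator, Tuple

Event = Tuple[str, int]  # hypothetical event shape: (key, amount)


def running_totals(events: Iterable[Event]) -> Iterator[Tuple[str, int]]:
    """Stage 1: update per-key state incrementally, one event at a time."""
    totals = {}
    for key, amount in events:
        totals[key] = totals.get(key, 0) + amount
        yield key, totals[key]


def changes_over(threshold: int,
                 updates: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Stage 2: emit only when a key's total first reaches the threshold.

    The threshold rule is an assumed example of "a relevant change".
    """
    alerted = set()
    for key, total in updates:
        if total >= threshold and key not in alerted:
            alerted.add(key)
            yield key, total


# In a real deployment the source would be an unbounded feed (e.g. a
# message queue); a finite list stands in for it here.
source = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]
pipeline = changes_over(5, running_totals(iter(source)))
print(list(pipeline))  # [('b', 6), ('a', 9)]
```

Unlike rerunning a query over a repository, nothing here ever revisits old events: the current totals live in the first stage, and the consumer at the end only hears about the two threshold crossings.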

There are some emerging technologies looking to address this; more details to follow.
