
Reading from MongoDB



Posted 3rd March 2011, 02:23 AM by News Bot

Hi Folks,

Now that we’re blogging again I thought I might as well continue to do so.

Today we’re reading data from MongoDB with Pentaho Data Integration. We haven’t had a lot of requests for MongoDB support so there is no step to read from it yet. However, it is surprisingly simple to do with the “User Defined Java Class” step.

For the following sample to work you need to be on a recent 4.2.0-M1 build. Get it from here.

Then download mongo-2.4.jar and put it in the libext/ folder of your PDI/Kettle distribution.

Then you can read from a collection with the following “User Defined Java Class” code:

import java.math.*;
import java.util.*;
import java.util.Map.Entry;
import com.mongodb.Mongo;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
import com.mongodb.DBCursor;

private Mongo m;
private DB db;
private DBCollection coll;

private int outputRowSize = 0;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
  // Open a cursor over every document in the collection.
  DBCursor cur = coll.find();

  if (first) {
    first=false;
    outputRowSize = data.outputRowMeta.size();
  }

  while(cur.hasNext() && !isStopped()) {
    // Each document becomes one output row holding its JSON text.
    String json = cur.next().toString();
    Object[] row = createOutputRow(new Object[0], outputRowSize);
    int index=0;
    row[index++] = json;

    // putRow will send the row on to the default output hop.
    //
    putRow(data.outputRowMeta, row);
  }

  setOutputDone();

  return false;
}

public boolean init(StepMetaInterface stepMetaInterface, StepDataInterface stepDataInterface)
{
  try {
    // Connection settings: adjust host, port, database and collection here.
    m = new Mongo("127.0.0.1", 27017);
    db = m.getDB( "test" );
    coll = db.getCollection("testCollection");
    return parent.initImpl(stepMetaInterface, stepDataInterface);
  } catch(Exception e) {
    logError("Error connecting to MongoDB: ", e);
    return false;
  }
}

You can simply paste this code into a new UDJC step dialog. Change the host, port, database and collection names in the init() method to suit your needs. This code reads all the data from a collection in a Mongo database. The output of this step is a set of rows, each containing one JSON string. So make sure to specify one JSON String field as output of your step. These JSON structures can be parsed with the new “JSON Input” step and then you can do whatever you want with it.
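
To try the row-building logic without a PDI install or a running MongoDB, here is a minimal stand-alone sketch of the same "one document in, one single-field row out" pattern. It is only an illustration, not the step's actual code: the class name CursorRowsSketch and the helper toRows are made up for this example, a plain Iterator of strings stands in for the DBCursor, and the returned list stands in for the rows that would be passed to putRow().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-alone sketch of the "one document -> one single-field row" pattern.
// Nothing here is Kettle or MongoDB API: the Iterator stands in for DBCursor,
// and the returned list stands in for the rows handed to putRow().
class CursorRowsSketch {

    // Hypothetical helper: drains the iterator, producing one row per document.
    static List<Object[]> toRows(Iterator<String> cursor, int outputRowSize) {
        if (outputRowSize < 1) {
            // Mirrors the advice to define one String output field on the step.
            throw new IllegalArgumentException("need at least one output field");
        }
        List<Object[]> rows = new ArrayList<>();
        while (cursor.hasNext()) {
            Object[] row = new Object[outputRowSize]; // sized like the step's output row
            row[0] = cursor.next();                   // field 0 carries the JSON text
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two fake "documents" already rendered as JSON text.
        List<String> docs = Arrays.asList("{\"name\":\"a\"}", "{\"name\":\"b\"}");
        for (Object[] row : toRows(docs.iterator(), 1)) {
            System.out.println(row[0]);
        }
    }
}
```

The guard on outputRowSize is an addition for this sketch; it simply makes the "specify one JSON String field" requirement fail loudly instead of with an array index error.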

Please let us know what you think of this and whether or not you would like to see support for writing to MongoDB and/or dedicated steps for it. I’m sorry to say I have no idea of the popularity of these new NoSQL databases.

Until next time,

Matt

P.S. To install and run MongoDB on your Ubuntu 10.10 machine, do this:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
sudo apt-get update
sudo apt-get install mongodb

More from Matt Casters on Data Integration (Pentaho) Blog...


All times are GMT +11. The time now is 05:54 PM.

© The Business Intelligence Group
