
Data Cleaner 2

8th March 2011, 02:47 AM

Dear Kettle friends,

Some time ago, while visiting the nice folks from Human Inference in Arnhem, I ran into Kasper Sørensen, the lead developer of DataCleaner.

DataCleaner is an open source data quality tool released (like Kettle) under the LGPL license. It is essentially to blame for the lack of a profiling tool inside Kettle: having DataCleaner available to our users was enough to push building our own data profiling tool far down the priority list.

In the past, Kasper worked on DataCleaner mostly in his spare time. Now that Human Inference has taken over the project, I expected more frequent updates, and that is indeed what we got. Not only did version 2 come out recently, we also got version 2.0.1 a few weeks back and version 2.0.2 today. All this points to a fast-paced project.

DataCleaner has been mentioned a few times in books about Pentaho software, for example in Pentaho Solutions and in Pentaho Kettle Solutions (chapter 6, Data Profiling). The goal was to give readers who need to do a bit of data profiling before starting their data integration work a way to get that job done.

So what’s happening with DataCleaner besides Kasper going all-out now that he works full time on the product? What purpose does it serve?

Let’s start with my favorite feature: “Quick Analysis”. You point it at a database table (or CSV file) and let it fly. Here’s the sort of thing it comes back with:

[Screenshot: Quick Analysis results]



In essence, it gives you most of what you need to know about the quality of your data before you get into the data integration work. It offers a really nice, rich user interface: in the previous screenshot, for example, you can click on the green arrows to display sample rows that have that particular data characteristic.
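To make "quality of your data" a bit more concrete, here is a minimal Python sketch of the kind of per-column statistics a profiler reports: row count, null/blank count, number of distinct values, and value lengths. This is not DataCleaner's code, API, or output format; it assumes a CSV with a header row, and the function name `profile_csv` and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: one blank email, and "Alice"/"alice" differing only by case.
SAMPLE = "id,name,email\n1,Alice,alice@example.com\n2,Bob,\n3,alice,alice@example.com\n"


def profile_csv(text):
    """Compute simple per-column profiling statistics for CSV text with a header row.

    Illustrative sketch only; this is not DataCleaner's implementation.
    """
    stats = {}
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            if column is None:
                continue  # surplus fields without a header name are ignored in this sketch
            s = stats.setdefault(column, {"rows": 0, "null_or_blank": 0, "values": Counter()})
            s["rows"] += 1
            if value is None or value.strip() == "":
                s["null_or_blank"] += 1  # missing field (None) or empty/whitespace-only string
            else:
                s["values"][value] += 1  # count occurrences of each distinct value
    report = {}
    for column, s in stats.items():
        lengths = [len(v) for v in s["values"]]  # lengths of the distinct non-blank values
        report[column] = {
            "rows": s["rows"],
            "null_or_blank": s["null_or_blank"],
            "distinct": len(s["values"]),
            "min_length": min(lengths) if lengths else None,
            "max_length": max(lengths) if lengths else None,
        }
    return report


if __name__ == "__main__":
    for column, metrics in profile_csv(SAMPLE).items():
        print(column, metrics)
```

Note that "Alice" and "alice" count as two distinct values here; whether a profiler should treat them as the same is exactly the kind of question such a report is meant to raise.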

Because not all profiling jobs are as easy as this one, DataCleaner 2.0 adds more “data integration”-like features. These allow you, for example, to filter rows based on a wide palette of data quality (DQ) oriented criteria such as dictionaries, JavaScript, rules and much more. The next screenshot shows a filter being used to limit the number of analyzed rows:

[Screenshot: a filter limiting the number of analyzed rows]



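As a rough illustration of what a dictionary-based filter does, the Python sketch below splits rows into those whose value in a given column appears in a reference dictionary and those whose value does not. It is not DataCleaner's filter component or API; the function `dictionary_filter`, its `case_sensitive` parameter, the country list and the rows are all hypothetical.

```python
def dictionary_filter(rows, column, dictionary, case_sensitive=False):
    """Split rows into (valid, invalid) by whether row[column] is in the dictionary.

    Hypothetical sketch of a dictionary filter; not DataCleaner's implementation.
    Values are stripped of surrounding whitespace before the lookup.
    """
    def key(value):
        # Normalize case unless an exact-case match is requested.
        return value if case_sensitive else value.casefold()

    allowed = {key(entry.strip()) for entry in dictionary}
    valid, invalid = [], []
    for row in rows:
        value = row.get(column)  # None when the column is missing from the row
        if value is not None and key(value.strip()) in allowed:
            valid.append(row)
        else:
            invalid.append(row)
    return valid, invalid


# Hypothetical reference dictionary and input rows.
COUNTRIES = ["Netherlands", "Denmark", "Belgium"]
ROWS = [
    {"id": "1", "country": "netherlands"},
    {"id": "2", "country": "Atlantis"},
    {"id": "3", "country": " Denmark "},
    {"id": "4"},
]

if __name__ == "__main__":
    valid, invalid = dictionary_filter(ROWS, "country", COUNTRIES)
    print("valid:", [r["id"] for r in valid])
    print("invalid:", [r["id"] for r in invalid])
```

Keeping the rejected rows instead of silently dropping them mirrors the profiling use case: the rows that fail the check are usually the ones you want to look at.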
Don’t expect Kettle-like drag-and-drop data integration, though. DataCleaner is specifically targeted at online data quality work and, more specifically, data profiling. That is what the tool claims to be good at, and it is good at it.

There’s obviously a lot more to tell about DataCleaner, but I hope this little blog post at least gets you interested enough to give it a go yourself.

Since DataCleaner and Kettle are license-compatible, I’ll be looking at creating a plugin to integrate DataCleaner into Spoon … once I find a bit of time to do so, or if someone volunteers to jump right in. Kasper wasn’t quite convinced it would be easy to do, but not all things in life have to be easy.

You can download DataCleaner from the project’s website, so grab it now and make sure to let the developers know what you think of it.

Until next time,

Matt



Originally published on the Matt Casters on Data Integration (Pentaho) blog.