Data Cleaner 2
Dear Kettle friends,

Some time ago, while I was visiting the nice folks from Human Inference in Arnhem, I ran into Kasper Sørensen, the lead developer of DataCleaner. In the past, Kasper worked on DataCleaner pretty much in his spare time. Now that Human Inference has taken over the project, I expected more frequent updates, and that is indeed what we got. Not only did version 2 come out recently, we also got version 2.0.1 a few weeks back and version 2.0.2 today. All this indicates a fast-paced project.

DataCleaner has been mentioned a few times in books about Pentaho software. For example, it was referenced in Pentaho Solutions as well as in Pentaho Kettle Solutions (chapter 6 - Data profiling). This was done so that readers who need to do a bit of data profiling before they start their data integration work would have a tool to get that job done.

So what's happening with DataCleaner, besides Kasper going all-out now that he works on the product full-time? What purpose does it serve?

Let's start with my favorite feature: the "Quick Analysis" option. You point it at a database table (or a CSV file) and let it fly. Here's the sort of thing it comes back with:

[Screenshot: Quick Analysis results]

In essence, it gives you most of what you need to know about the quality of your data before you get into the data integration work. It offers a really nice, rich user interface. In the previous screenshot, for example, you can click on the green arrows to display sample rows that have that particular data characteristic.

Because not all profiling jobs are as easy as this one, version 2.0 of DataCleaner adds more "data integration"-like features. These allow you, for example, to filter rows based on a wide palette of data quality (DQ) oriented criteria such as dictionaries, JavaScript, rules and much more. The next screenshot shows a filter being used to limit the number of analyzed rows:

[Screenshot: a filter limiting the number of analyzed rows]

Don't expect Kettle-like drag-and-drop data integration, though. DataCleaner is targeted specifically at online data quality work and, more specifically still, at data profiling. Then again, that's what the tool claims to be good at, and it is good at it.

There's obviously a lot more to tell about DataCleaner, but I hope this little blog post at least makes you interested enough to give it a go yourself.

Since DataCleaner and Kettle are license-compatible, I'll be looking into creating a plugin to integrate DataCleaner into Spoon… once I find a bit of time to do so, or sooner if someone volunteers to jump right in. Kasper wasn't quite convinced it would be easy to do, but not all things in life have to be easy.

You can download DataCleaner over here, so download it now and make sure to let them know what you think of it.

Until next time,

Matt

More from Matt Casters on Data Integration (Pentaho) Blog...