Fuzzy Thinking

Posted by James Beresford, 11th November 2009, 11:22 AM

I’ve covered the Fuzzy Lookup and Fuzzy Grouping transformations in SSIS and noticed in my research that these capabilities aren’t discussed very coherently on the web. So below I thought I’d collect some of the better articles for your late-night reading. There isn’t all that much out there, unfortunately.

So, how does it all work?

Here are a few articles covering theory, mostly from Microsoft:
It is probably worth reiterating that, because of the way the algorithms and their Q-Grams work, the longer the strings being analysed for fuzzy matches, the better the chances of a good match. When I first started using the algorithms I was doing some client matching and matched first and last names separately. Once I had a deeper understanding of the components, I started matching on the full name, and the quality and reliability of matches improved significantly.
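
To see why longer strings help, here is a minimal sketch in Python. It is not the SSIS algorithm, which is considerably more elaborate; it assumes a plain Jaccard overlap of padded 3-grams as a stand-in, and the function names `qgrams` and `qgram_similarity` are made up for illustration. The point it shows is that a single-character difference disturbs a fixed handful of q-grams, which is a large share of a short string and a small share of a long one.

```python
def qgrams(text, q=3):
    """Return the set of overlapping q-character substrings of text."""
    # Pad both ends so the first and last characters contribute as many
    # q-grams as the characters in the middle of the string.
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def qgram_similarity(a, b, q=3):
    """Jaccard overlap of the two q-gram sets, between 0.0 and 1.0."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    if not grams_a and not grams_b:
        return 1.0  # two empty inputs are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Matching the first name on its own: one missing letter hurts a lot.
print(round(qgram_similarity("Jon", "John"), 3))              # 0.375

# Matching the full name: the same missing letter hurts much less.
print(round(qgram_similarity("Jon Smith", "John Smith"), 3))  # 0.643
```

The same one-letter difference scores far higher on the full name than on the first name alone, which lines up with the improvement described above.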

Ok, so how do I make it work?

Now, some articles covering practical implementation of the tasks:
The best thing you can do is get some sample data and play with the components to understand what it is they do. The results are impressive – if not bulletproof – and can make a great contribution to de-duplicating client data, etc.



And what does the BI Monkey have to say about it?

Fuzzy Matching is a powerful and easy-to-use tool that is great for approximate grouping of data for analysis where a margin of error is tolerable. It is also a great helper in data cleansing exercises. Having too much faith in the results where exact matches are required will cause you to fall over at some point, so be careful. If you are engaged in such an exercise and want some experienced support, please get in touch.
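
One way to avoid putting too much faith in the results is to sort fuzzy matches into bands rather than accepting them all. The sketch below is in Python and assumes each candidate match carries a similarity score between 0 and 1. The function name `triage` and both cut-off values are hypothetical; sensible thresholds depend entirely on your data and should be tuned against a sample you have checked by hand.

```python
def triage(similarity, accept_at=0.9, review_at=0.7):
    """Sort a fuzzy match into a band based on its similarity score.

    accept_at and review_at are illustrative defaults, not recommendations.
    """
    if similarity >= accept_at:
        return "accept"   # close enough to treat as a match
    if similarity >= review_at:
        return "review"   # plausible, but needs a human to check
    return "reject"       # too weak to trust


# Hypothetical candidate matches: (source name, matched name, similarity).
candidates = [
    ("John Smith", "John Smith", 1.0),
    ("Jon Smith", "John Smith", 0.8),
    ("J. Smythe", "John Smith", 0.4),
]

for source, matched, score in candidates:
    print(source, "->", matched, ":", triage(score))
```

Where exact matches are required, the "review" band is where a person needs to look before anything is merged.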

And in other news, fresh from Jamie Thompson: Fuzzy Lookup and Regex are going to become available in SQL Server 2008 R2.

If you have come across any articles that you think really contribute something to the understanding of fuzzy matching technologies in SQL Server / SSIS, please let me know or post a link in the comments so I can improve this article.


