
BI Monkey BI Monkey is the ‘nom de plume’ of James Beresford, a Certified Microsoft BI Professional and MBA living and consulting in Sydney

11th November 2009, 11:22 AM
Fuzzy Thinking

I’ve covered the Fuzzy Lookup and Fuzzy Grouping transformations in SSIS, and noticed in my research that these capabilities aren’t discussed very coherently on the web. So below I thought I’d collect some of the better articles for your late night reading. There isn’t all that much out there, unfortunately.

So, how does it all work?

Here are a few articles covering theory, mostly from Microsoft:
It is probably worth reiterating that, because of the way the algorithms and their Q-Grams work, the longer the strings being analysed for fuzzy matches, the better the chances of a good match. When I first started using the algorithms I was doing some client matching and matched first and last names separately. Once I had a deeper understanding of the components, I started matching on the full name instead, and the quality and reliability of matches improved significantly.
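To make the point about string length concrete, here is a minimal sketch in Python. It is not the SSIS implementation: it assumes a simple Jaccard overlap of padded 3-grams, and the function names and sample names are made up for illustration. It only shows why the same one-character typo hurts a short string far more than a long one.

```python
def qgrams(text, q=3):
    """Return the set of q-grams of a lowercased string, padded with '#'."""
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def qgram_similarity(a, b, q=3):
    """Jaccard overlap of the two q-gram sets, between 0.0 and 1.0.

    Assumption: a simplified stand-in for a fuzzy match score,
    not the scoring that Fuzzy Lookup itself uses.
    """
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Same typo ("Jon" vs "Jan"), matched on first name only and on full name.
first_name_only = qgram_similarity("Jon", "Jan")
full_name = qgram_similarity("Jon Anderson", "Jan Anderson")

print(f"first name only: {first_name_only:.2f}")
print(f"full name:       {full_name:.2f}")
```

In this sketch the typo destroys most of the q-grams in the three-letter first name, while in the full name most q-grams survive, so the full-name score comes out clearly higher.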

OK, so how do I make it work?

Now, some articles covering practical implementation of the tasks:
The best thing you can do is get some sample data and play with the components to understand what it is they do. The results are impressive – if not bulletproof – and can make a great contribution to de-duplicating client data, etc.



And what does the BI Monkey have to say about it?

Fuzzy Matching is a powerful and easy-to-use tool which is great for approximate grouping of data for analysis where a margin of error is tolerable. It is also a great helper in data cleansing exercises. Putting too much faith in the results where exact matches are required will trip you up at some point, so be careful. If you are engaged in such an exercise and want some experienced support, please get in touch.

And in other news, fresh from Jamie Thompson: Fuzzy Lookup and Regex are going to become available in SQL Server 2008 R2.

If you have come across any articles that you think really contribute something to the understanding of fuzzy matching technologies in SQL Server / SSIS, please let me know or post a link in the comments so I can improve this article.


