Go Back   CORTEX Forums > Local Happenings > CORTEX Blogs > BI Monkey
Register Blogs FAQ Members List Calendar Search Today's Posts Mark Forums Read

Replacing multiple characters in a Text String in SSIS

This is a discussion on Replacing multiple characters in a Text String in SSIS within the BI Monkey forums, part of the CORTEX Blogs category; Today as part of some ongoing painful text file imports, I had to clean up some phone numbers that had been entered into a free text field. Consequently the data ...


Reply
 
LinkBack Thread Tools Search this Thread Display Modes
Old 30th June 2009, 05:09 PM   #1
Guru
 
Join Date: Jun 2009
Posts: 135
James Beresford is on a distinguished road
Thumbs up Replacing multiple characters in a Text String in SSIS

Today as part of some ongoing painful text file imports, I had to clean up some phone numbers that had been entered into a free text field. Consequently the data contained this kind of garbage data:

(03) 8765 4321

0387654321

03-87654321 - Bob’s mums number

(03)87654321 - or use 04 87654321 during day

Which didn’t lend itself to being fixed in a Derived Column, as every possible replace character would have to be hardcoded, like so:

REPLACE(REPLACE([PhoneNumberText],”(”,”"),”(”,”") - and that’s just to clear the brackets. Doing this for evey possible stray character was out of the question.

What I wanted was just to get rid of any character which wasn’t a numeric. My mind immediately leapt to Regular Expressions, but couldn’t see how to do these easily in T-SQL against the source or at all in a standard SSIS component.

Fortunately Brian Knight has a solution - using Regular Expressions within a script task. So, quoting him almost verbatim:

In the Imports area, add the following line of code:
Imports System.Text.RegularExpressions

Next, replace the Input0_ProcessInputRow subroutine with the following code:
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

If Row.ProductNumber_IsNull = False Or Len(Row.ProductNumber.ToString) 0 Then

Dim pattern As String = String.Empty

Dim r As Regex = Nothing

pattern = “[^0-9]”
r = New Regex(pattern, RegexOptions.Compiled)

Row.ProductNumber = Regex.Replace(Row.ProductNumber, pattern, “”)

End If

End Sub

Which worked beautifully and saved me writing some truly awful code. However it does lead me to gripe about an absent feature of SSIS that bugs me. The limited number of functions and inability to create custom functions to use in the Derived Column transformation means it is easy to fall back on to the Script Component to do your dirty work. This annoys me because a) i’m not a VB programmer and b) it reduces the transparency of what is going on in the data flow. Fingers crossed this will improve in 2011 - i’ve added feedback on Connect to request this - please vote it up if you consider it important.



More...
James Beresford is offline  
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiTweet this Post!
Reply With Quote
Reply

Bookmarks

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is On
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Debugging SSIS ScriptTasks and ScriptComponents with Information Messages Darren Gosbell Random Procrastination 0 23rd June 2009 07:30 PM
Teragram unveils Sentiment Analysis Manager at Text Analytics Summit* Latest News Headlines SAS Forum 0 3rd June 2009 12:54 AM
How to split Cognos report (PDF) into multiple pdf's bujji IBM and Cognos Forum 0 28th February 2009 01:00 AM


All times are GMT +11. The time now is 11:01 PM.

© The Business Intelligence Group

Search Engine Optimization by vBSEO