Go Back   CORTEX Forums > Local Happenings > CORTEX Blogs > BI Monkey
Register Blogs FAQ Members List Calendar Search Today's Posts Mark Forums Read

BI Monkey BI Monkey is the ‘nom de plume’ of James Beresford, a Certified Microsoft BI Professional and MBA living and consulting in Sydney

Reply
 
LinkBack Thread Tools Display Modes
Old 10th September 2009, 03:54 PM   #1 (permalink)
Senior Member
 
Join Date: Jun 2009
Posts: 76
James Beresford is on a distinguished road
Thumbs up Flat Files with too many Delimiters

It is not uncommon for flat file sources to have delimiters where they shouldn’t be – extra commas in address fields is a particularly common problem, for example. These don’t necessarily cause the import to fail – especially if the data is not being imported is not being done so in a strongly typed manner.

To understand what happens, you need to understand how SSIS Flat File sources handle delimited files. Unlike Access or Excel, it doesn’t carve up the rows based on it finding a delimiter and creating a column. In an SSIS source the number of columns is predefined. So imagine I had a Flat File Connection manager configured as below:

Fig 1: Flat File Connection Manager


How it reads in a row in a file is as follows:
  1. Scan until reaches 1st comma – Put found data in “RowID”
  2. Continue scan until reaches 2nd comma – Put found data in “MonkeyFood”
  3. Continue scan until reaches 3rd comma – Put found data in “Motivation Factor”
  4. Continue scan until reaches Carriage Return / Line Feed – Put found data in “Cost”
So the outcome is that if you happen to have more than 3 commas in your row, anything after the 3rd comma but before the Carriage Return / Line Feed ends up in the 4th column. So if you look at my example below, where the 2nd row of the text file has gone a little comma crazy and has 6 commas in it:
RowID,Monkey Food,Motivation Factor,Cost
1,Peanuts,1,10
2,C,a,s,hews,2,15
3,Tea,3,24
4,Coffee,7,81

The end result of pumping it through the connection manager as defined in Fig 1 is this:

Fig 2: Output from corrupted file


Note how everything after the 3rd comma has ended up in the final column, because after reaching the 3rd column it has scanned across for the Carriage return / Line Feed to define the final column ignoring any further commas.

There is a sample package here demonstrating the above example.



Get More from the original blog...
James Beresford is offline   Reply With Quote
Reply

Bookmarks

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is On
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Formatting SSIS Configuration files James Beresford BI Monkey 0 11th August 2009 11:58 AM
Flat File Sources and the Decimal Data Type James Beresford BI Monkey 0 27th July 2009 07:22 PM
Flat File Source Error: The column delimiter for column [ColumnName] was not found James Beresford BI Monkey 0 30th June 2009 11:21 AM
Flat File Import tips James Beresford BI Monkey 0 23rd June 2009 08:34 PM
IBM Files ITC Complaint against ASUSTeK Latest News Headlines 2007 News 0 7th December 2007 05:16 AM


All times are GMT +11. The time now is 06:14 AM.

© The Business Intelligence Group

Search Engine Optimization by vBSEO 3.3.0