I was importing some flat files recently and by virtue of forgetting my own best practices managed to create headaches for myself. So here’s a couple of
BI Monkey aspirins to ensure your imports go well:
Tip 1 : Always import the columns of your file into varchars big enough to take the contents of the column
Flat files have the habit of containing bad data - strings instead of numerics being the worst culprit, with oversize fields and unexpected delimiters being the close runners up in the trouble stakes. So if you want to avoid errors importing the file, pull everything you can into suitably big varchars. That way you avoid data conversion and truncation errors. It’s much easier to deal with these issues once the fields are in the database, as you can fix a single broken row far more easily in a database than in a massive text file.
Tip 2: Always TRIM the contents of the columns you are pulling in
Unless you desperately need those trailing spaces, TRIM them in a Derived Column Transform as part of the import, particularly if you are pulling in from a Fixed With file. This prevents database bloat - if you have a 100 character wide column which is just a ‘Y’ with 99 trailing spaces, the database will still store 100 characters - 99 bytes of junk - over a million rows thats nearly 100MB. If you trim it you pass 1 byte to the database, and if it goes into a varchar, you will only store one byte of data. Otherwise it’s very easy to have Gigabyte sized import tables which actually contain a few hundred
MB of useful data.
Please feel free to add any other hints and tips in the comments.
More...