PhilDev


What is the best way to use SSIS to deal with a CSV file that contains records with differing numbers of fields?

I could just load it as a single column delimited by <CRLF>, but then I would have to write the code to parse each line, effectively tokenising the columns myself; if that is the case, then why use SSIS at all?





Re: SSIS - how to deal with a csv file that contains multiple records with a differing number of fields

Phil Brammer


If there is a record type indicator, you can read in the whole file and push it through a conditional split to separate flat file destinations/SQL Server tables/etc., based on that record type. Then you can use those destinations as sources in another data flow.

Or, as you say, you'll have to write code (substrings perhaps) to split up the rows. Not much SSIS can do with different record layouts in the same file.
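
Just to illustrate the first suggestion outside of SSIS (inside the data flow the conditional split does the equivalent), a rough standalone C# sketch of splitting a file into one file per record type might look like the following. The file names, the comma delimiter and the record type being the first field are only assumptions for illustration:

using System.Collections.Generic;
using System.IO;

class PreSplitByRecordType
{
    static void Main()
    {
        // One StreamWriter per record type encountered in the file.
        var writers = new Dictionary<string, StreamWriter>();

        foreach (string line in File.ReadLines("input.csv"))    // hypothetical input file
        {
            string recordType = line.Split(',')[0];             // first field = record type indicator
            StreamWriter writer;
            if (!writers.TryGetValue(recordType, out writer))
            {
                // e.g. records_A.csv, records_B.csv - each one can then be a flat file
                // source with its own column layout in another data flow.
                writer = new StreamWriter("records_" + recordType + ".csv");
                writers[recordType] = writer;
            }
            writer.WriteLine(line);
        }

        foreach (StreamWriter writer in writers.Values)
            writer.Dispose();
    }
}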







Re: SSIS - how to deal with a csv file that contains multiple records with a differing number of fields

Jamie Thomson

Phil Brammer wrote:

If there is a record type indicator, you can read in the whole file and push it through a conditional split to separate flat file destinations/SQL Server tables/etc., based on that record type. Then you can use those destinations as sources in another data flow.

Or, as you say, you'll have to write code (substrings perhaps) to split up the rows. Not much SSIS can do with different record layouts in the same file.

I completely disagree with that last statement, Phil. You've supplied a method of doing this using SSIS right here. :)







Re: SSIS - how to deal with a csv file that contains multiple records with a differing number of fields

Jamie Thomson

PhilDev wrote:

I could just load it as a single column delimited by <CRLF>, but then I would have to write the code to parse each line, effectively tokenising the columns myself; if that is the case, then why use SSIS at all?

Because once you have done the parsing, you can do whatever you want with the data. It is in memory, parsed, and you can load it wherever you want. I suppose the other alternative is to parse the file in a custom application. I would much rather do it in SSIS, but then, I am biased. :)

-Jamie






Re: SSIS - how to deal with a csv file that contains multiple records with a differing number of fields

Phil Brammer

Jamie Thomson wrote:

I completely disagree with that last statement, Phil. You've supplied a method of doing this using SSIS right here.

What I mean is that you need to design the packages accordingly. SSIS doesn't handle multiple record layouts on its own, so you need to do some work, of which I listed a few options. ;)






Re: SSIS - how to deal with a csv file that contains multiple records with a differing number of fields

mr_superlove

You could do a poor man's delimited ragged-right, coupled with tokenization, if your complaint about the method you mentioned in your first post is the number of fields you would be required to tokenize.

In your flat-file connector, define the minimum number of fields that a record could have and set the final column's delimiter to <CR><LF>, creating a "basket" field. Feed that into a Script Component configured as a transformation. On the outputs of the Script Component, create the parsed fields and tokenize only the "basket" field into each output column. That could eliminate some work if you only have a small number of "ragged" fields.
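
A minimal sketch of what the Script Component's row-processing method might look like in C# (so SSIS 2008 or later; in 2005 it would be VB.NET). The column names (Basket, Field1 to Field3) and the Input0Buffer class are only assumptions about how the input and output columns would be set up in the designer:

// Synchronous Script Component transform. The flat file source supplies the fixed
// leading columns plus a "Basket" column holding the rest of the line; Field1..Field3
// are extra output columns added to the synchronous output in the designer.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Only the ragged tail needs tokenizing; the fixed columns were already parsed.
    string basket = Row.Basket_IsNull ? string.Empty : Row.Basket;
    string[] tokens = basket.Split(',');

    if (tokens.Length > 0) Row.Field1 = tokens[0]; else Row.Field1_IsNull = true;
    if (tokens.Length > 1) Row.Field2 = tokens[1]; else Row.Field2_IsNull = true;
    if (tokens.Length > 2) Row.Field3 = tokens[2]; else Row.Field3_IsNull = true;
}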





Re: SSIS - how to deal with a csv file that contains multiple records with a differing number of fields

PhilDev

This was kind of what I expected... but I would have thought that if you defined more columns than any record actually has (say 15), you could check each column for null and work out how much of the record had been populated. However, SSIS then seems to ignore the line delimiter <CRLF>: fields from the next row/record are pulled in until the 15 columns are filled, so items from the second record end up included in one line, which I think is wrong.

The first column in each row is a record indicator, which I can use to parse the rest of the line.

I could use the suggestion you made, as the minimum number of columns in each row is 5, and parse the rows that contain more, but it is not as pure a solution as parsing all the tokens individually. Alternatively I may just take the first column, which is the record indicator, roll all the remaining fields into the second column, and then parse those fields into their respective tokens as you suggested.
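
For what it's worth, a sketch of that last variant (record indicator in the first column, everything else rolled into the second) could look something like this inside the Script Component; once again every column name and record type value is made up, and it assumes each record type always carries its full set of fields:

// "Record indicator + everything else" layout - all names here are assumptions.
// The flat file source supplies two columns: RecordIndicator and Rest (the remainder of the line).
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    string[] tokens = (Row.Rest_IsNull ? string.Empty : Row.Rest).Split(',');

    switch (Row.RecordIndicator)
    {
        case "A":                        // hypothetical short record type
            Row.CustomerName = tokens[0];
            Row.CustomerRef = tokens[1];
            break;
        case "B":                        // hypothetical wider record type
            Row.OrderRef = tokens[0];
            Row.OrderDate = tokens[1];
            Row.OrderValue = tokens[2];
            break;
    }
}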

thanks