sadie519590


Hi,

I have a flat file source. In the flat file, there are some bad, i.e. incomplete rows.

I set up the flat file error output to send the bad rows to a file destination on both "error" and "truncation".

However, for some reason it's really buggy, and doesn't work correctly.

For example, if my data looks like this:

cat, whiskers, 10 -- good row

dog, spot, 15 -- good row

blah -- bad row

blah blah -- bad row

horse, flicka, 20 -- good row

It correctly sends "blah" & "blah blah" to the error output, but it ALSO sends "horse, flicka, 20" to the error output as well.

This is NOT what I want.

I have this particular scenario 5 times in the data file. And each and every time, it sends the first two bad rows PLUS the following GOOD row to the error output, so basically, I lose 5 good rows in the process.

Aside from writing a script to remove the bad rows, is there an explanation for this strange behavior

Thanks much




Re: Help. Serious problem with error redirection

Phil Brammer


I know there are work arounds (read in the file first with the whole row as one field, use a conditional split to filter bad records off, etc...), but you might want to post a Connect bug. http://connect.microsoft.com/sqlserver/feedback

I agree, on one hand, that the remaining records should continue as normal. On the other, I can appreciate erroring out the rest of the file because of "trust" issues. That is, can you "trust" the rest of the file is right

Technically speaking, the file is bad.







Re: Help. Serious problem with error redirection

jwelch

sadie wrote:

cat, whiskers, 10 -- good row

dog, spot, 15 -- good row

blah -- bad row

blah blah -- bad row

horse, flicka, 20 -- good row

It correctly sends "blah" & "blah blah" to the error output, but it ALSO sends "horse, flicka, 20" to the error output as well.

I have this particular scenario 5 times in the data file. And each and every time, it sends the first two bad rows PLUS the following GOOD row to the error output, so basically, I lose 5 good rows in the process.

If you have:

1 good row

2 good row

3 bad row

4 good row

5 good row

6 bad row

7 good row

8 good row

Are you getting 1, 2, 5, and 8 in the good output, and 3, 4, 6, and 7 in the error output







Re: Help. Serious problem with error redirection

Andy Leonard

Hi Sadie,

If the above truly matches your file format (and assuming defaults for csv files), I wonder if the file source or connection manager is confused by the "short" row, schema-wise.

There are no commas in the rows containing "blah" but there are line feeds. I'm not sure how the file connections manage this scenario. I'm curious if the file adapters interpret the line feeds as part of the text in the row, or whether this throws them in interpreting both their row and the next. This is all hypothetical mind you, but it could explain why you're losing an extra good row each time.

Andy





Re: Help. Serious problem with error redirection

Phil Brammer

Or does it simply throw away the remaining rows




Re: Help. Serious problem with error redirection

Andy Leonard

Hi Phil,

I'm kind of surprised the flat file connection manager doesn't throw away remaining rows after a schema-level error - pleasantly surprised!

Andy





Re: Help. Serious problem with error redirection

Bob Bojanic - MSFT

It has been written many times but let try to repeat again how the flat file parser works. It parses column by column without giving any importance to row delimiters (other than as delimiters of another column which happens to be the last in the row). The flat file parser does not know how to find incomplete rows (other than maybe the last one after all prior rows that follow a misplaced delimiter get wrapped).

I am not going into reasons for this behavior and yes we are aware that there are a lot of SSIS customers who are not happy with this.

Lets look what this does to Sadie's example:

sadie wrote:

cat, whiskers, 10 -- good row

dog, spot, 15 -- good row

blah -- bad row

blah blah -- bad row

horse, flicka, 20 -- good row

After the first two rows (good ones) we start parsing the third one and we look for comma and do not care about new lines until we get our comma. So the last three rows get parsed as one:

blah{CR}{LF}blah blah{CR}{LF}horse, flicka, 20

This row looks OK to the parser but it will probably cause some truncation and be passed to the error output.

Thanks.






Re: Help. Serious problem with error redirection

Andy Leonard

Hi Bob,

Thanks for this explanation!

This is the first time I've read this. It makes sense and explains behavior I've seen in the field.

If I run into this a lot, I usually stage the data first by some means. Depending on how dirty the data, the file size, etc. I've used standalone applications, Excel automation, and SQL Server staging tables.

Thanks,

Andy





Re: Help. Serious problem with error redirection

Phil Brammer

Bob Bojanic - MSFT wrote:

It has been written many times but let try to repeat again how the flat file parser works. It parses column by column without giving any importance to row delimiters (other than as delimiters of another column which happens to be the last in the row). The flat file parser does not know how to find incomplete rows (other than maybe the last one after all prior rows that follow a misplaced delimiter get wrapped).

I am not going into reasons for this behavior and yes we are aware that there are a lot of SSIS customers who are not happy with this.

Lets look what this does to Sadie's example:

sadie wrote:

cat, whiskers, 10 -- good row

dog, spot, 15 -- good row

blah -- bad row

blah blah -- bad row

horse, flicka, 20 -- good row

After the first two rows (good ones) we start parsing the third one and we look for comma and do not care about new lines until we get our comma. So the last three rows get parsed as one:

blah{CR}{LF}blah blah{CR}{LF}horse, flicka, 20

This row looks OK to the parser but it will probably cause some truncation and be passed to the error output.

Thanks.

Yeah, sorry. This is simply nonsense and the wrong way to go about parsing a file. I'll have to go see if there is a Connect bug submission, because frankly this is a bug. There are reasons for having a "ROW DELIMITER." Those delimiters are there to define the end of a row and should be looked at FIRST before looking for complete column counts. If SSIS runs into a row delimiter and hasn't reached it's column count, then that row should be deemed bad. After which, the counts should get reset for the next row.






Re: Help. Serious problem with error redirection

sadie

Hi,

I would have posted sooner, but I have been away for the weekend.

My data actually looks like this:

cat, whiskers, 10 -- good row

dog, spot, 15 -- good row

blah, blah -- bad row

blah -- bad row

horse, flicka, 20 -- good row

That is, there is a comma between "blah, blah", and there is a single blah (no comma) after that.

Sorry for the confusion, although the conclusion will probably be the same.





Re: Help. Serious problem with error redirection

sadie

Hi Phil,

Do you have an example of this workaround I'm not sure how I would do this on my own, but it's looking like I'm going to have to.

My questions are:

1.) How to read the file as one field

2. How to write the conditional split

Or alternatively, how to read the file as one line, write to a raw file, then parse out the bad rows

Thanks much