This is very helpful. Thanks!
I am completely new to SQL Server Integration Services. I went through your link and understand how you filter out the different records. Could you point out how you actually update the existing records?
Thanks!
Hi Phil,
Great article, I loved it.
This is what I had been looking for for a long time.
One doubt is in my mind: in my current assignment I have to transform about 40K rows from a database.
Will this method work for me, or is there a better way?
regards
Sudhir Kesharwani
Phil,
As we discussed updating/inserting techniques a while ago (see http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1211321&SiteID=1), you told me SSIS can perform this operation much faster than using plain SQL... and I put the emphasis on the "much faster".
I have created a performance test package comparing the 2 methods:
1) stored proc with UPDATE and INSERT statements
2) Data flow using look up/conditional split
I have a data flow task that creates a test data set with a variable number of rows (I used Jamie's script component source example), with IDs (used as the PK) always ranging from 1 to nbrRows. Each row consists of 5 ints, 4 strings (10, 20, 40 and 80 characters) and 1 datetime, with values chosen randomly.
So what I do is call the package with nbrRows = 1, 10, 100... The first time, 1 new row is added. The second time, id=1 already exists, so 1 row is updated and the other 9 are added. And so on... I then perform a second round of tests with the same numbers, the only difference being that the destination table is now full (so only updates are performed).
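For reference, the plain-SQL side of a test like this (method 1) is typically a stored procedure along these lines. This is a hedged sketch only: the table, column, and procedure names are placeholders, not the actual test schema from the package.

```sql
-- Hypothetical upsert procedure: update the rows whose keys already exist
-- in the destination, then insert the rest. All names are illustrative.
CREATE PROCEDURE dbo.UpsertTestRows
AS
BEGIN
    SET NOCOUNT ON;

    -- Update rows whose IDs already exist in the destination
    UPDATE d
    SET    d.Int1 = s.Int1,
           d.Str1 = s.Str1,
           d.Dt1  = s.Dt1
    FROM   dbo.DestTable AS d
           INNER JOIN dbo.SourceTable AS s ON s.ID = d.ID;

    -- Insert rows whose IDs are not yet present
    INSERT INTO dbo.DestTable (ID, Int1, Str1, Dt1)
    SELECT s.ID, s.Int1, s.Str1, s.Dt1
    FROM   dbo.SourceTable AS s
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.DestTable AS d WHERE d.ID = s.ID);
END;
```

Both statements here are set-based, which is the key property the timing comparison with the lookup/conditional split data flow is measuring.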
What I've found out so far is that SSIS is faster only when inserting a certain number of rows (approx. 1,000 to 100,000). SSIS performed on avg. 30% better with 50,000 rows, while at 1 million rows SQL performed better by 5%.
When the data was updated (second round), SQL was always faster.
Of course, these figures are based on my tests, which may not reflect actual usage. I run SQL and SSIS in parallel, the tests are done in batch, etc. I thought about using DBCC DROPCLEANBUFFERS, but it doesn't seem to change much.
Maybe my way of comparing is totally flawed... Have you ever tried comparing the two methods in terms of performance? How did you conclude that the SSIS way is much faster?
Leonce
Leonce,
First, when using the fast load option on the destination, there are two parameters which need to be adjusted based on the rows you are working with: Max Insert Commit Size and Rows Per Batch. What were these set to?
Also note, that when possible, using a SQL Server Destination over the OLE DB Destination would be preferred.
SSIS should be no different from using BULK INSERT (it seems to be just a glorified wrapper around it). Performing updates via an OLE DB Command will always be slower than a batch update using SQL, simply because the OLE DB Command is executed once for every row in the data flow.
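To illustrate the difference: an OLE DB Command fires one UPDATE per row, whereas the staging-table approach lets SQL Server apply all the changes in a single set-based statement. A minimal sketch, assuming a staging table loaded by the data flow (the table and column names here are assumptions, not from the original package):

```sql
-- One set-based statement updates every matched row at once,
-- instead of an OLE DB Command issuing an UPDATE per data-flow row.
-- dbo.Staging, dbo.Target and the columns are illustrative names.
UPDATE t
SET    t.Col1 = s.Col1,
       t.Col2 = s.Col2
FROM   dbo.Target  AS t
       INNER JOIN dbo.Staging AS s ON s.ID = t.ID;
```

In a package this statement would typically run from an Execute SQL task after the data flow has filled the staging table.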
Phil
I left the default values for Commit Size and Rows Per Batch ('' and 0). You suggest I should adjust these values depending on the number of rows transferred? What would typical values be for a table with 100,000 rows? Does it depend on the row size?
I use the OLE DB Destination; I'll try the SQL Server Destination.
I forgot to mention that I use your method for updates (dump into a table, then update). Do you think SSIS can be faster than plain SQL for updates? I would tend to think it's impossible.
Leonce
fleo wrote:
I left the default values for Commit Size and Rows Per Batch ('' and 0). You suggest I should adjust these values depending on the number of rows transferred? What would typical values be for a table with 100,000 rows? Does it depend on the row size?
MICS should be set to the size of the batches you want committed. So if you want to commit in 10,000-row batches, set MICS to 10,000. Rows Per Batch should be set to 100,000 in this case.
fleo wrote:
I forgot to mention that I use your method for updates (dump into a table, then update). Do you think SSIS can be faster than plain SQL for updates? I would tend to think it's impossible.
True. Using an Execute SQL task will likely be slightly slower than straight SQL, but in the end they do the same thing, so they should be REAL close.
Let us say that a package uses the merge join option to insert new rows, and it uses an OLE DB Destination with the fast load, table lock, and batch size (1000) options. The table lock option on the destination forces us to put a NOLOCK hint in the OLE DB Source query (on the destination table); otherwise, they run into locking contention (one is requesting an exclusive table lock while the other is requesting a shared table lock). This works most of the time, but occasionally I see that SQL Server does not honour the hint (one scenario: when the target table does not have a clustered index on the join key column) and runs into locking contention. Have you seen this? Do you rely on the NOLOCK hint too? Thanks for your help in advance.
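For clarity, the source-side query in the pattern described above looks roughly like this; the table and column names are placeholders:

```sql
-- OLE DB Source query reading the destination table with a NOLOCK hint,
-- so the data flow's read does not block the destination's TABLOCK insert.
-- Caveat: NOLOCK means READ UNCOMMITTED, so dirty or inconsistent rows are
-- possible, and as the post observes the hint is not an absolute guarantee
-- against lock contention in every scenario.
SELECT ID, Col1, Col2
FROM   dbo.TargetTable WITH (NOLOCK);
```

The alternative to relying on the hint is to avoid reading and bulk-loading the same table in one data flow, e.g. by landing the inserts in a staging table first.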