chip_levis


I've successfully created SSIS packages where I compare two tables in different databases on different servers. However, this is good enough to compare hundreds of thousands of records quickly. The process becomes a huge performance problem when trying to compare table differences when I'm looking at tables that each contain tens of millions of records.

One database is on a SQL 2005 box and the other DB is SQL 7.0 so the lookup component fails for this type of SQL Server. I've been implementing merge joins and conditional components to do my standard table comparisons.

Is there another way to implement this process or maybe partition it somehow to take pieces of the table at a time and compare them I'm open to ideas.




Re: Comparing Large Tables

Phil Brammer


One way is to simply copy the data over to a staging table. From there, you can write your joins in a SQL statement however you need -- the idea being to let the database do the hard processing for you, rather than SSIS. 'Course if you move it over to SQL 2k, you can use the lookup as well. Wink

"Tens of millions of records" isn't much in my opinion, but it certainly does depend on the record size.






Re: Comparing Large Tables

chip_levis

Well, when you're doing a lookup for tens of millions of records against another table with tens of millions records I would think it becomes pretty considerable since you're doing a row-by-row comparison.

Do you think it would be faster to stage the tables and try comparing them through T-SQL or just use SSIS's lookup component






Re: Comparing Large Tables

Phil Brammer

chip_levis wrote:

Well, when you're doing a lookup for tens of millions of records against another table with tens of millions records I would think it becomes pretty considerable since you're doing a row-by-row comparison.



That's true. The initial cartesian product is quite large!

chip_levis wrote:
Do you think it would be faster to stage the tables and try comparing them through T-SQL or just use SSIS's lookup component



If you can keep the result set small in the lookup component, I would expect that would perform better because of the way it caches its records. That is to say you must include ONLY the columns necessary to perform your work and nothing else. Select just those columns in the lookup SQL upfront. If you're going to select all of the columns, then you could run into memory/paging problems, which may end up in slower performance compared to T-SQL.







Re: Comparing Large Tables

sqlsundar

Hi Chip_Levis

I am new to SSIS and almost in the same situation you have. I need to create a SSIS Package to compare data in two tables in different databases on different servers. Can I know how you have accomplished this in SSIS package

Thanks,

Sundar.





Re: Comparing Large Tables

Phil Brammer

Sundar,
You may be able to use the Lookup component.

Chip's issues arise with the fact that one of the servers is SQL Server 7. Do you have the same situation





Re: Comparing Large Tables

sqlsundar

Hi Phil-

Thanks for the reply. Both of my tables are in the same SQL 2005 Server. Only different databases.

Regards,

Sundar.





Re: Comparing Large Tables

Phil Brammer

sqlsundar wrote:

Hi Phil-

Thanks for the reply. Both of my tables are in the same SQL 2005 Server. Only different databases.

Regards,

Sundar.




Use the lookup component in the data flow then. Actually, better yet, you'd be better off to write the join in a SQL statement. Even though they are on different databases, you can still join to them in one SQL statement. This would be the preferred method when possible. Simply use fully qualified table names, which is the following format: database.owner.tablename

Chip's issues are nothing related to yours. If you have further questions with using the lookup component, I encourage you to search this forum for plenty of conversations around that topic. Should you have any more remaining questions, you can start your own thread.





Re: Comparing Large Tables

sqlsundar

Hi Phil-

I would also like to know, any component available in data flow for using SQL scripts.

Thanks,

Sundar.





Re: Comparing Large Tables

sqlsundar

I am sorry. I didn't see your last post. I will check for the Lookup components.

Thanks,

Sundar.





Re: Comparing Large Tables

Phil Brammer

sqlsundar wrote:

Hi Phil-

I would also like to know, any component available in data flow for using SQL scripts.

Thanks,

Sundar.



You'd write the SQL in the OLE DB Source component to join the two tables together.