Richard Hough

Hello. I am using a DataReader to query a large database (millions of rows) and perform some upkeep on each row. Our DBA complains my app keeps a connection open and locks the database for a long time, and wants me to save the data internally, close the connection, then issue UPDATE commands.

I am reluctant to do this, as the query could potentially return megabytes of data and I only want to access each row in sequence. What is the SQL-Server-friendly .Net way to step through a large query and potentially perform an update on each row

Re: .NET Framework Data Access and Storage Disconnected DataReader?

Jeff Wharton

Without knowing what is is that you are doing that requires you to update millions of rows by processing them one-by-one in sequence, it is difficult to provide an answer. perhaps you could provide a little more info on what it is you are actually trying to achieve.

But, as a starter, you can either:

  • Break your updates down into batches,
  • Structure your sql query to return rows to update based on better defined 'where' clauses

Re: .NET Framework Data Access and Storage Disconnected DataReader?

Bill Lin - MSFT

Are you sure you need absolutely every row returned back from the server to process Can you change your query so that it only returns interesting rows which you might want to update. If the query is too complicate, you might consider using storedproc (or even SQL/CLR) to help you filter the rows.

Re: .NET Framework Data Access and Storage Disconnected DataReader?

Richard Hough

Just to be clear; while the database is millions of rows we typically update only a few rows at a time. We have constructed queries that prunes out rows which definitely will not need updating.

The application is part of a printer workflow. The database contains steps that need human intervention; loading casettes or rollers, changing the bindery, loading/unloading pallets, QA, etc.

Suppose we get a load of hologram stickers. We run the app and it puts all job orders that need the stickers next in line, updates their processing status, and emails everyone who needs to know this. It's a little more complicated because we do things like batch up orders that require a special configuration and stat orders can bump otherwise unassociated orders. But basically that's it.

Typically a query will only return a dozen or so records and we need to update all of them. But occasionally, we could get something that throws off a thousand steps, or potentially could but it turns out we only need to change the status of a few stages.

A potential optimization is that the updates are stateless; it doesn't matter which order we do them in, and no data is carried over from one update to the next.

Re: .NET Framework Data Access and Storage Disconnected DataReader?

Richard Hough

OK, we have implemented the following logic and it works well:

SqlDataAdapter dataAdapter = new SqlDataAdapter(command);
DataSet dataSet = new DataSet();
dataAdapter.Fill(dataSet, "Pending");
return dataSet.Tables[0];

Where "command" is like:

SELECT TOP 1000 whatever...

If there are more than 1000 records we need to consider, they will get handled the next time the command runs.