the_chad


I am running SQL Server 2005 (64-bit Standard Edition, SP2) on a 2.8 GHz dual-core Xeon box with 2 GB RAM, a 146 GB RAID 10 data partition, and a 36 GB mirrored OS partition.

I have 13 publications over 5 databases. Some of these publications are transactional, some are merge with updating subscribers.

I have 35 pull subscribers that run Windows XP SP2 and SQL Express SP2, and connect through wireless broadband, wired broadband, or a dial-up connection. I am not using web synchronization.

I am experiencing a high failure rate during synchronization, with a variety of error messages such as "The merge process could not replicate one or more INSERT statements to the 'Subscriber'." Another common error is "The replication agent has not logged a progress message in 10 minutes. This might indicate an unresponsive agent or high system activity. Verify that records are being replicated to the destination and that connections to the Subscriber, Publisher, and Distributor are still active."

Some of these errors seem to be the result of a hiccup in the connection. These hiccups cause synchronization to fail where a normal file transfer would succeed.

I know this is a vague question, but is there any way to make this process more reliable?

Thanks in advance for your suggestions.




Re: Reliability during synchronization

the_chad


Does anyone have any ideas?




Re: Reliability during synchronization

Rob Schripsema

Hey!

Don't have an answer for you, but if it's any comfort, I'm having almost the identical problem. Only one database, merge replication to about a dozen subscribers; some work fine always, some work fine sometimes, some fail consistently (especially after they've failed once).

I'm opening a MS support incident to try to resolve the issue. If I learn anything, I'll let you know....






Re: Reliability during synchronization

Li Zhang

Try increasing the -QueryTimeout parameter of the merge agent and see if that makes a difference.
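For anyone starting the merge agent from a script rather than a job step, here is a minimal sketch of raising the timeout. It is in Python and only builds the argument list for replmerg.exe. The server, database, and publication names in the usage line are hypothetical placeholders, and the 600-second and 60-second values are illustrative assumptions, not recommendations from this thread. Security and Distributor options are left out for brevity.

```python
def merge_agent_args(publisher, publisher_db, publication,
                     subscriber, subscriber_db,
                     query_timeout_s=600, login_timeout_s=60):
    """Build a replmerg.exe argument list for a pull subscription.

    Every name passed in is a placeholder chosen by the caller; the
    default timeouts are illustrative assumptions, not tuned values.
    """
    if query_timeout_s <= 0 or login_timeout_s <= 0:
        raise ValueError("timeouts must be positive numbers of seconds")
    return [
        "replmerg.exe",
        "-Publisher", publisher,
        "-PublisherDB", publisher_db,
        "-Publication", publication,
        "-Subscriber", subscriber,
        "-SubscriberDB", subscriber_db,
        "-SubscriptionType", "1",               # 1 = pull subscription
        "-QueryTimeout", str(query_timeout_s),  # seconds before a query times out
        "-LoginTimeout", str(login_timeout_s),  # seconds before a login times out
    ]


# Hypothetical names, for illustration only.
args = merge_agent_args("PUB01", "MajorDB1", "MajorDB1_Merge",
                        "LAPTOP07", "MajorDB1")
```

The resulting list can be handed to whatever launches the agent; keeping the timeouts as function arguments makes it easy to try different values per connection type.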



Re: Reliability during synchronization

the_chad

I did increase the QueryTimeout and adjusted other parameters such as batchcommitsize, uploadgenerations, and downloadgenerations, but none of these made a difference. As a matter of fact, the last three slowed replication down quite a bit, and I still got the same error: "The replication agent has not logged a progress message in n minutes. This might indicate an unresponsive agent or high system activity..." By the way, n was raised from 10 to 15.

Any ideas?





Re: Reliability during synchronization

Greg Y

You're right that this is a very vague question. We definitely need a lot more information: what is the average duration of a synchronization, how many changes are replicated per sync, and how are your publications and articles set up (for example, are you using dynamic filtering)? A simple file transfer is different from a synchronization, and anything over wireless or dial-up is considered unreliable. The good news is that the merge agent has improved retry logic, which hopefully alleviates some of the problems.

Long sync times: you'll have to do some profiling to see what's taking so long. Is the publisher the bottleneck? Is the merge agent sending too much data back and forth over the slow link? Are there deadlocks? Maybe there are some publication or article properties that can be enabled or disabled to improve things.

The simplest test would be to run your scenario on a LAN. If your problems go away, then you know it's a network issue. If they persist, then it's a merge replication issue. You'll have to do your homework here. You can also try the merge agent slow-link profile, which might help some.
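One way to make the LAN-versus-remote comparison concrete is to tally sync outcomes by connection type. The sketch below is in Python and assumes you have already collected (link type, succeeded) pairs from your own sync history; the sample records are made up for illustration and are not measurements from this thread.

```python
from collections import defaultdict


def failure_rate_by_link(attempts):
    """Return {link_type: fraction of attempts that failed}.

    attempts is an iterable of (link_type, succeeded) pairs, where
    succeeded is True for a sync that completed without error.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for link_type, succeeded in attempts:
        totals[link_type] += 1
        if not succeeded:
            failures[link_type] += 1
    return {link: failures[link] / totals[link] for link in totals}


# Hypothetical sample data, for illustration only.
sample = [
    ("lan", True), ("lan", True),
    ("dialup", False), ("dialup", True),
]
rates = failure_rate_by_link(sample)
```

If the failure rate on the LAN is near zero while the remote links fail often, that points at the connection rather than the publication design.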

Sorry there's no simple answer here, but if you can pinpoint a bottleneck or problem, then we can suggest a better solution.





Re: Reliability during synchronization

the_chad

Thanks, Greg! Here is some detailed information:

  • MajorDB1_Merge
    1 Filtered table based on SUSER_SNAME()
    8 joined tables, based on a simple join (e.g., FilterTbl.CustID = JoinTbl.CustID)
    Includes 1 updateable table and 7 download only
  • MajorDB1_Trans
    Includes stored procedures, views, functions
  • MinorDB1_Merge
    4 Filtered tables based on SUSER_SNAME()
    Filtered Table A:
    1 simple joined table
    Filtered Table B:
    10 simple joined tables
    Filtered Table C:
    4 simple joined tables
    Filtered Table D:
    6 simple joined tables
  • MinorDB1_Trans
    stored procs, views, functions
  • MajorDB2_Merge
    2 Filtered tables based on SUSER_SNAME()
    Filtered Table A:
    7 simple joined tables, including 2 updateable tables
    1 joined table (InvoiceHdr) with an additional date join statement
    This table has 3 child tables
    1 joined table with an additional date join statement
    This table has 4 child tables, 1 of which has 1 child table
    Filtered Table B:
    1 joined table (InvoiceHdr) based on different criteria than above
  • MajorDB2_Trans1
    64 relatively small tables for codes and such
    3 filtered tables: 1 based on a literal "in" list (e.g., ('a', 'b', 'c')), 2 based on an "in" list over a subquery
  • MajorDB2_Trans2
    stored procs, views, functions
  • MajorDB3_Merge
    4 Filtered tables based on SUSER_SNAME()
    Filtered Table A:
    1 simple joined table, download only
    Filtered Table B (all updateable):
    7 simple joined tables
    3 joined tables with additional date join
    1 simple joined table with a child table
    Filtered Table C (all updateable):
    3 simple joined tables
    Filtered Table D (all updateable):
    1 simple joined table with 8 child tables that also exist under Filtered Table B
  • MajorDB3_Trans1
    13 relatively small tables for codes
  • MajorDB3_Trans2
    stored procs, views, functions
  • MinorDB2_Merge
    7 Filtered tables based on SUSER_SNAME(), 1 of which has a child table
    6 updateable tables, 2 download only
  • MinorDB2_Trans1
    23 tables for codes
  • MinorDB2_Trans2
    stored procs, views, functions
    This publication contains most of the stored procedures (200 or so) for the application

Error Summary
The error "The replication agent has not logged a progress message in n minutes..." occurs most often in the transactional publications for stored procedures. In some of these cases, there will be an action message "A total of 2 transaction(s) with 2 command(s) were delivered" followed by "A DDL change has been replicated", which seems to indicate that a transaction success message is received after the synchronization success message, thereby making the distributor think the synchronization is still in progress. This error also occurs when there seem to be no transactions available for synchronization, and does occasionally occur for merge publications.

I also have lock problems and performance issues when updating the filtered table(s) in the major databases. Could this be because there are too many tables joined off of the filtered table? If so, how might I split them up?







Re: Reliability during synchronization

Greg Y

Hard to say without doing any debugging. Have you had a chance to run any Profiler traces or gather perfmon data?



Re: Reliability during synchronization

the_chad

I have not run Profiler on the server, but synchronization almost never fails when connected directly to the network, even via wireless. Query and login timeouts have been increased, and batch and generation sizes have been decreased, but the problems persist.



Re: Reliability during synchronization

Greg Y

If you can sync successfully when connected directly to the network, then it's most likely not a publication problem. You may want to double-check your connection to see what's causing it to hiccup.



Re: Reliability during synchronization

the_chad

But is there something else I can do on the distributor that will help prevent these "timeout" errors?



Re: Reliability during synchronization

Mahesh Dudgikar - MSFT

Is each subscriber subscribing to all the publications you listed?

How much data are you sending across?

I would run some tests on that to find the sweet spot, since you have a slow network.

Also use the slow network profile as Greg suggested.

Also, are your subscribers synchronizing concurrently?






Re: Reliability during synchronization

the_chad

Every Subscriber exists in each publication.

What do you mean by "how much"? The largest current snapshot is more than 18 MB, while the smallest is 18 KB.

I have changed the parameters as indicated by the slow network profile.

All the subscribers have pull subscriptions.





Re: Reliability during synchronization

Mahesh Dudgikar - MSFT

Since the network is unreliable for you, you should keep running the agents in a loop, in continuous mode, or on a schedule to maximize the merge agent's chances of success.
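The "run it in a loop" idea can be sketched as a small retry wrapper. This is a Python illustration under stated assumptions, not how the merge agent itself retries: `run_once` is a hypothetical callable that performs one synchronization and returns True on success, and the attempt count and delays are arbitrary example values.

```python
import time


def sync_with_retries(run_once, max_attempts=5, base_delay_s=30.0,
                      sleep=time.sleep):
    """Call run_once() until it succeeds or attempts run out.

    run_once is a hypothetical callable returning True on success.
    The wait doubles after each failure (base_delay_s, 2x, 4x, ...).
    Returns the number of attempts used; raises RuntimeError if
    every attempt failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        if run_once():
            return attempt
        if attempt < max_attempts:
            # Back off so a flaky link has time to recover.
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise RuntimeError(f"synchronization failed after {max_attempts} attempts")
```

In practice `run_once` might launch the merge agent as a subprocess and report whether it exited cleanly, assuming the agent signals success with a zero exit code. The `sleep` argument exists so the delays can be replaced in tests.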