boulder_bum

I implemented a publisher/subscriber demo, but ran into a problem. In particular, when a client registers as a subscriber, but then shuts down/crashes without unsubscribing, the service locks up.

For the notification, I use the code below:

[OperationContract]
public void Notify(NotificationMessage message)
{
    lock (_lock)
    {
        // Iterate backwards so RemoveAt(i) does not shift unvisited items.
        for (int i = _subscribers.Count - 1; i >= 0; i--)
        {
            try
            {
                _subscribers[i].ReceiveMessage(message);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
                _subscribers.RemoveAt(i);
            }
        }
    }
}

This works great until there is a subscriber in the system that no longer exists. At that point everything hangs while trying to connect to the orphaned subscriber and a deadlock occurs.

Is there an easy way to do some housekeeping automatically, so that orphaned subscribers are removed proactively? The examples I've seen use this simple try/catch block, but that isn't working out too well.



Re: Windows Communication Foundation (Indigo) Publisher/Subscriber Problems: Orphaned Subscribers

Brian McNamara - MSFT

Is it really a deadlock, or just a hang for one minute until the call times out? If the latter, you might be able to adequately resolve the issue by setting the SendTimeout on the binding to a smaller value.

You could also hook the .Faulted and .Closed events on each subscribed channel to proactively remove them from the collection if the channel goes bad. (Some bindings will provide 'quick' feedback when a client disconnects and proactively fire these events on the channel; other bindings might do this only reactively. Which binding are you using?)
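As a rough illustration of the event-hooking idea, here is a minimal sketch. It assumes the stored callback channel can be cast to ICommunicationObject (which is how the Faulted and Closed events are exposed); the SubscriberRegistry class and its method names are hypothetical and not part of the code in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.ServiceModel;

// Hypothetical helper (not from the original post): keeps a list of
// callback channels and drops a channel when it faults or closes.
public class SubscriberRegistry<TCallback> where TCallback : class
{
    private readonly object _lock = new object();
    private readonly List<TCallback> _subscribers = new List<TCallback>();

    public void Add(TCallback subscriber)
    {
        // Assumption: the callback channel also implements
        // ICommunicationObject, which exposes Faulted and Closed.
        ICommunicationObject channel = subscriber as ICommunicationObject;
        if (channel != null)
        {
            channel.Faulted += (sender, e) => Remove(subscriber);
            channel.Closed += (sender, e) => Remove(subscriber);
        }

        lock (_lock)
        {
            if (!_subscribers.Contains(subscriber))
                _subscribers.Add(subscriber);
        }
    }

    public void Remove(TCallback subscriber)
    {
        lock (_lock)
        {
            _subscribers.Remove(subscriber);
        }
    }

    // Returns a copy so callers can invoke callbacks without holding the lock.
    public TCallback[] Snapshot()
    {
        lock (_lock)
        {
            return _subscribers.ToArray();
        }
    }
}
```

In a Subscribe operation this might be used as registry.Add(OperationContext.Current.GetCallbackChannel<INotificationCallbackContract>()), with the notification loop iterating over registry.Snapshot(). Whether the events fire promptly when a client disappears depends on the binding, so the try/catch around each callback is probably still worth keeping.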






Re: Windows Communication Foundation (Indigo) Publisher/Subscriber Problems: Orphaned Subscribers

boulder_bum

Well, the weird thing is that after a minute timeout, the service stops working. I'm not quite sure why. Here's the entire code/configuration for the service. Another factor that might be coming into play is that I have two clients on the same client machine using the same ports (one client sends messages and disregards notifications, while the other listens for notifications).

Any ideas where I'm messing this up?

[ServiceBehavior(
    InstanceContextMode = InstanceContextMode.Single,
    ConcurrencyMode = ConcurrencyMode.Reentrant)]
[ServiceContract(
    CallbackContract = typeof(INotificationCallbackContract),
    SessionMode = SessionMode.Required)]
public class Notifier
{
    private static readonly object _lock = new object();
    private static List<INotificationCallbackContract> _subscribers = new List<INotificationCallbackContract>();

    [OperationContract]
    public void Notify(NotificationMessage message)
    {
        lock (_lock)
        {
            // Iterate backwards so RemoveAt(i) does not shift unvisited items.
            for (int i = _subscribers.Count - 1; i >= 0; i--)
            {
                try
                {
                    _subscribers[i].ReceiveMessage(message);
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message);
                    _subscribers.RemoveAt(i);
                }
            }
        }
    }

    [OperationContract(IsOneWay = true)]
    public void Subscribe(MessageType messageType, MessageInfo[] criteria)
    {
        INotificationCallbackContract subscriber =
            OperationContext.Current.GetCallbackChannel<INotificationCallbackContract>();

        lock (_lock)
        {
            // Move an existing subscriber to the end rather than adding it twice.
            if (_subscribers.Contains(subscriber))
                _subscribers.Remove(subscriber);

            _subscribers.Add(subscriber);
        }
    }
}

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <system.serviceModel>
    <bindings>
      <wsDualHttpBinding>
        <binding name="shorterTimeout" sendTimeout="00:00:10"/>
      </wsDualHttpBinding>
    </bindings>

    <services>
      <service
          name="NotificationService.Notifier"
          behaviorConfiguration="NotificationServiceBehavior">
        <endpoint address="http://localhost:7777/NotificationService"
                  binding="wsDualHttpBinding"
                  bindingConfiguration="shorterTimeout"
                  contract="NotificationService.Notifier" />
      </service>
    </services>
    <behaviors>
      <serviceBehaviors>
        <behavior name="NotificationServiceBehavior">
          <serviceMetadata httpGetEnabled="True"/>
          <serviceDebug includeExceptionDetailInFaults="True" />
        </behavior>
      </serviceBehaviors>
    </behaviors>
  </system.serviceModel>
</configuration>





Re: Windows Communication Foundation (Indigo) Publisher/Subscriber Problems: Orphaned Subscribers

boulder_bum

By the way, to describe what is happening for me, I have a Windows client which creates two connections to the service (one to observe and one to notify). The service is hosted in a console app, and remains open while I build and rebuild the Windows app to debug.

The first time I start the Windows app, everything works as it should until I shut it down. The second time, I seem to receive three notifications (don't know why only three), then I get an exception in the notification-sending component:

An unhandled exception of type 'System.TimeoutException' occurred in mscorlib.dll

Additional information: This request operation sent to http://localhost:7777/NotificationService did not receive a reply within the configured timeout (00:00:05).  The time allotted to this operation may have been a portion of a longer timeout.  This may be because the service is still processing the operation or because the service was unable to send a reply message.  Please consider increasing the operation timeout (by casting the channel/proxy to IContextChannel and setting the OperationTimeout property) and ensure that the service is able to connect to the client.

If I try to start the Windows app up again, the notification service times out every time I try to use the service.

It is only after I close, then re-start the console app host that the Windows app can reconnect again. Oddly enough, the console host never throws an exception and never shuts down. I can also see the service help page by navigating to the URL configured for my service, so at least part of it appears to be running.

Any ideas?





Re: Windows Communication Foundation (Indigo) Publisher/Subscriber Problems: Orphaned Subscribers

Brian McNamara - MSFT

No ideas yet. Can you share the client code? Do the clients call Close() before going away? Have you considered hooking the Faulted, Closed, and Aborted events on the stored subscriber channels and removing them from the collection when you receive the event?




Re: Windows Communication Foundation (Indigo) Publisher/Subscriber Problems: Orphaned Subscribers

Brian McNamara - MSFT

Any luck solving this? If not, can you share the client code and answer the other questions from my previous message?




Re: Windows Communication Foundation (Indigo) Publisher/Subscriber Problems: Orphaned Subscribers

boulder_bum

Sorry for the delay. I was on an anniversary weekend.

I actually just left the company where I was working on this particular prototype, but I was able to find out something else interesting.

In my setup, I recently arranged things so there are three actors running in separate processes:

  1. The service host (console app).
  2. The notification sender (console app set to send messages every 5 seconds).
  3. The notification observer (Windows app that writes notification messages to the UI).

Previously, I had the notification sender/observer running in the same process, so it hid the true nature of my bug.

When I rearranged everything, I found that the service itself could run indefinitely without crashing. Likewise, the notification sender could be stopped and restarted at will and the messages would get sent to the observer like you'd expect.

The problem was when you started the observer, then closed it. The notification sender would then time out trying to connect to the service (even when configured to use different ports). This would make sense if the sender timeout was less than the service's timeout when trying to connect to an observer, but I think there was more to it than that. I'll see if I can replicate the situation here in the next few days to get more detail.

Again, sorry for the delay. I really do appreciate the help.





Re: Windows Communication Foundation (Indigo) Publisher/Subscriber Problems: Orphaned Subscribers

Brian McNamara - MSFT

I'll be interested to hear the 'more to it than that' bit... in any case, an option to consider is to mark the Notify operation as IsOneWay=true, so that delays in the notification callbacks don't cause the Notify method itself to time out.
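For reference, a minimal sketch of what a one-way Notify could look like on the contract. The INotifier interface and the Text member on NotificationMessage are placeholders assumed for illustration (the thread puts the contract directly on the Notifier class and does not show the message type); only the IsOneWay setting is the point here.

```csharp
using System.Runtime.Serialization;
using System.ServiceModel;

// Placeholder message type; the real fields are not shown in this thread.
[DataContract]
public class NotificationMessage
{
    [DataMember]
    public string Text { get; set; }
}

// Callback contract implemented by subscribers.
public interface INotificationCallbackContract
{
    [OperationContract]
    void ReceiveMessage(NotificationMessage message);
}

// Hypothetical interface form of the Notifier contract.
[ServiceContract(
    CallbackContract = typeof(INotificationCallbackContract),
    SessionMode = SessionMode.Required)]
public interface INotifier
{
    // One-way: the sender does not wait for a reply message, so slow
    // or dead subscribers should not cause the sender's call to time out
    // while the service works through its callbacks.
    [OperationContract(IsOneWay = true)]
    void Notify(NotificationMessage message);
}
```

A one-way operation must return void and cannot return faults to the caller, so the sender will not learn about failures inside Notify. The service would still spend time on each unreachable subscriber; this only keeps that delay from surfacing as a timeout on the sender's side.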