R Raghu

We have been testing a simple scenario with different WCF bindings to compare their performance. The client calls a web service with the number of customer objects to return in an array. The service simply creates as many objects as the client requested and sends them back. The customer object is a simple class with two public properties, first name and last name. Each of these names is set to a distinct value in each customer object before it is returned to the client.

We set long send and receive timeouts. We also made sure that there were no problems related to message size, maximum depth, and so on. We used the System.Diagnostics.Stopwatch class to measure the time from when the client initiates the request to when the client receives the response. We know that this type of time measurement may not be very accurate, but we are simply looking at the overall performance of the bindings.
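A minimal sketch of this kind of round-trip measurement follows. The helper name RoundTripTimer is hypothetical, and the call being timed is passed in as a delegate, so nothing here depends on the actual service proxy.

```csharp
using System;
using System.Diagnostics;

public static class RoundTripTimer
{
    // Runs the supplied call once and returns the elapsed wall-clock time in seconds.
    public static double MeasureSeconds(Action call)
    {
        Stopwatch watch = Stopwatch.StartNew();
        call();
        watch.Stop();
        return watch.Elapsed.TotalSeconds;
    }
}
```

With a generated client proxy (assumed here to be a variable named proxy), the call would be wrapped as RoundTripTimer.MeasureSeconds(() => proxy.HelloCustomer(500000)). A single measurement like this includes connection setup and JIT cost, so repeating the call and discarding the first run is usually more representative.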

When we tested the netTcpBinding with streaming enabled, the round trip measurements started increasing when compared to netTcpBinding with buffering (default). These timings are simply off by a huge margin. For example, when the client requested 500,000 customer objects with netTcpBinding (with buffering), the total time was around 4 seconds. The same scenario with streaming took 24.5 seconds (i.e. 6 times slower). These timings were obtained when the client and service are deployed on the same machine. So we tested further by moving client and service to different machines. The problem persisted. The streaming timings were slower by at least 3 times.

TCP streaming is so bad that it lost even to basicHttpBinding (buffered), taking at least 2 to 3 times as long. In terms of performance, the following order prevailed (fastest first):

netTcpBinding (Buffered)

basicHttpBinding (Streamed)

basicHttpBinding (Buffered)

netTcpBinding (Streamed)

I expected streaming to be a little less performant than buffering, but not at this level. Is there an explanation for this behavior?

Here is how we are creating the netTcpBindings:

NetTcpBinding netTcpBinding = new NetTcpBinding();
netTcpBinding.Security.Mode = SecurityMode.None;
netTcpBinding.MaxReceivedMessageSize = int.MaxValue;
netTcpBinding.ReaderQuotas.MaxArrayLength = int.MaxValue;
netTcpBinding.SendTimeout = new TimeSpan(1, 0, 0);
netTcpBinding.ReceiveTimeout = new TimeSpan(1, 0, 0);

netTcpBinding.TransferMode = TransferMode.Streamed; //for Streaming

We tested on a machine with the following specifications:

OS: Windows XP SP2

Memory: 1GB

When the service was moved to a different machine, the service machine had 2 GB of memory (same OS).

Any suggestions are appreciated.




Re: Windows Communication Foundation (Indigo) TCP Streaming - Performance Issue (RTM)

BenK

These times are huge. You're not using RM, are you?

I wouldn't expect streaming to be faster in a single-client test, but I would expect it to use less memory and hence be faster when moving lots of data. However, if you just fetch the data as a result set (array) and then stream it, it would provide no benefit.

From the docs: http://msdn2.microsoft.com/en-us/library/ms789010.aspx

To stream data, the OperationContract for the service must satisfy two requirements:

  1. The parameter that holds the data to be streamed must be the only parameter in the method. For example, if the input message is the one to be streamed, the operation must have exactly one input parameter. Similarly, if the output message is to be streamed, the operation must have either exactly one output parameter or a return value.

  2. At least one of the types of the parameter and return value must be either Stream, Message, or IXmlSerializable.

I don't see how an array satisfies this unless it's going through some XML parsing (certainly not what was intended), which may be why you are getting those results. Hence you really want to retrieve your data in pages and then add this to a stream; this results in lower memory usage. You also probably just want to have a streamed response.
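For illustration, a contract that satisfies both documented requirements for a streamed response might look like the sketch below. The interface name ICustomerStreamService, the operation name GetCustomers and the helper StreamedBindingFactory are hypothetical; only the attributes, the Stream return type and the TransferMode.StreamedResponse setting are standard WCF.

```csharp
using System.IO;
using System.ServiceModel;

// Hypothetical contract: the return value is the only output and is a Stream,
// so the response message can be streamed. The int input stays buffered.
[ServiceContract(Namespace = "http://xxxxx.com")]
public interface ICustomerStreamService
{
    [OperationContract]
    Stream GetCustomers(int number);
}

public static class StreamedBindingFactory
{
    // Builds a netTcpBinding that streams only the response, as suggested above.
    public static NetTcpBinding Create()
    {
        NetTcpBinding binding = new NetTcpBinding();
        binding.Security.Mode = SecurityMode.None;
        binding.MaxReceivedMessageSize = int.MaxValue;
        binding.TransferMode = TransferMode.StreamedResponse;
        return binding;
    }
}
```

The same binding settings would have to be used on both the client and the service side.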

Can you post your operation contract?

Were you using multiple threads to test your code?

Regards,

Ben





Re: Windows Communication Foundation (Indigo) TCP Streaming - Performance Issue (RTM)

R Raghu

I understand that there are some rules to follow when enabling streaming. However, the effects of streaming can be seen even without following those rules. Here is my service contract anyway:

[ServiceContract(Namespace="http://xxxxx.com")]
public interface MyInterface
{
[OperationContract]
CustomerData[] HelloCustomer(int number);
}

Here are some numbers to prove my point (both client and service running on same machine):

Test# Trial# ClientMode ServerMode Protocol FI #Objects RoundTripTime(s) Client:PeakMemSize(MB) Client:PageFaults(K) Client:VMSize(MB) Client:NumThreads Server:PeakMemSize(MB) Server:PageFaults(K) Server:VMSize(MB) Server:NumThreads
1 1 B B Http N 250,000 2.37 79 21 80 4 102 29 116 5
2 1 B B Http N 500,000 4.74 130 37 143 3 187 55 221 5
3 1 S S Http N 250,000 3.15 45 12 47 6 42 17 49 6
4 1 S S Http N 500,000 6.60 75 23 78 6 72 28 82 6
5 1 B B Tcp N 250,000 1.96 57 16 62 3 61 22 84 6
6 1 B B Tcp N 500,000 3.72 94 27 108 3 110 39 147 6
7 1 S S Tcp N 250,000 12.05 49 14 49 4 43 17 49 6
8 1 S S Tcp N 500,000 23.80 80 25 83 4 76 31 80 6
1 2 B B Http N 250,000 2.21 76 21 79 3 95 31 116 5
2 2 B B Http N 500,000 4.70 130 37 143 4 187 55 221 5
3 2 S S Http N 250,000 3.13 45 12 47 6 42 17 49 6
4 2 S S Http N 500,000 6.58 75 23 78 6 72 28 82 6
5 2 B B Tcp N 250,000 1.84 57 16 62 3 61 22 84 6
6 2 B B Tcp N 500,000 3.80 94 27 109 3 110 39 147 6
7 2 S S Tcp N 250,000 11.94 48 15 48 5 43 17 49 6
8 2 S S Tcp N 500,000 23.76 80 25 83 5 76 31 80 6

B for buffer and S for streaming in the above table. Please note that buffer or streaming is enabled at both ends. There is no mixing of buffer with streaming on either side.

With the HTTP protocol (basicHttpBinding), the round trip increased from 4.74 (buffered) to 6.60 (streamed) seconds (in trial 1 with 500,000 objects). That is an increase of 39%. However, with TCP (netTcpBinding), the timing increased from 3.72 to 23.8 seconds (an increase of about 540%, i.e. roughly 6.4 times as long). A similar effect can be seen in trial 2 as well.
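The slowdown figures can be rechecked from the round-trip times in the table with a few lines of arithmetic. The class and method names below are hypothetical helpers.

```csharp
using System;

public static class Slowdown
{
    // Relative increase from 'before' to 'after', in percent.
    public static double PercentIncrease(double before, double after)
    {
        return (after - before) / before * 100.0;
    }

    // How many times as long 'after' is compared to 'before'.
    public static double Ratio(double before, double after)
    {
        return after / before;
    }
}
```

For trial 1 with 500,000 objects, PercentIncrease(4.74, 6.60) is about 39 for HTTP, while PercentIncrease(3.72, 23.8) is about 540 and Ratio(3.72, 23.8) is about 6.4 for TCP.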

This clearly tells me that something is wrong with TCP streaming in WCF, unless I am doing something very, very wrong.






Re: Windows Communication Foundation (Indigo) TCP Streaming - Performance Issue (RTM)

R Raghu

Forgot to mention that I am not doing any multi-threading myself in the code. I am sure WCF is doing it especially in the case of streaming.






Re: Windows Communication Foundation (Indigo) TCP Streaming - Performance Issue (RTM)

Clemens Vasters - MSFT

You can't ignore the rules for a feature when you are measuring the feature. Streaming only works if the requirements outlined in http://msdn2.microsoft.com/en-us/library/ms789010.aspx are satisfied, as BenK points out.

There is no real point in using streaming in your scenario since you don't have a stream in hand. Streaming is specifically built for scenarios where you have a lot (several megabytes or more) of data and need to pump that across the wire and where buffering all that data is either not an option or where the data is not fully available at the time the communication starts.

Streaming is not built for how you are using it. You are forcing a buffer over a transport mode that's built for streams.

I am not convinced of how realistic your test scenario is as a representation of how the product is used, since I assume your goal is to give guidance to others. Transferring a flat, buffered array of 500,000 objects that I gather (from your memory footprint numbers) to be just a few bytes each is not very common, and much less commonly is such a transfer done entirely on the same machine. There doubtlessly are scenarios where a very large number of objects are indeed transferred between parties, but these would use the documented streaming functionality that BenK and I were pointing to in the docs, not a buffered contract with a streaming transport.

Best Regards
Clemens






Re: Windows Communication Foundation (Indigo) TCP Streaming - Performance Issue (RTM)

R Raghu

Thanks for the reply.

I had to stop at 500,000 because HTTP buffered mode simply errored after this number. However HTTP streaming mode worked with higher numbers but I wanted to compare the numbers to those from buffered mode. We were trying to use as much of 1 GB memory as possible because we have a requirement of pumping large number of objects in web service response. (I just upgraded my machine to 2 GB memory today). As I said in my first post, we also tested this scenario cross machine. Same behavior persists.

Even if I am forcing the buffer over a transport that is built for streams, should the performance be this bad? This is a severe punishment for misconfiguration. If guidelines need to be followed, it would have been helpful to report an error at runtime when the operation contract failed to match the streaming rules (rather than making it work erroneously).

One cannot help but wonder how the HTTP transport (which is built on top of TCP) can perform better than TCP itself, even in this misconfigured situation!

Anyway, we will have to test this scenario again by changing the contract to suit streaming mode. Reading the XML from a file is out of the question, as we need to create objects based on data from the database. Also, if you have any internal performance studies for WCF, we would love to review them.

Thanks.






Re: Windows Communication Foundation (Indigo) TCP Streaming - Performance Issue (RTM)

R Raghu

I have changed the buffered contract to a streaming contract as shown below:

[ServiceContract(Namespace = "http://xxx.com")]
[XmlSerializerFormat]
public interface MyInterface2
{
[OperationContract]
CustomerData2[] HelloCustomer(int number);
}


public class CustomerData2 : IXmlSerializable
{
....
}

When we tested this cross-machine and locally, we found that TCP streaming was not the problem but HTTP streaming was. It appears that HTTP streaming is heavily optimized for the local scenario compared to the remote scenario, whereas TCP streaming behaves as if it does not care whether the scenario is local or remote. Here are some numbers:

Test# Trial# ContractType Locality ClientMode ServerMode Protocol FI #Objects RoundTripTime(s) Client:PeakVirMem(MB) Client:PeakPagedMem(MB) Client:PeakWorkingSet(MB) Client:NumThreads Server:PeakVirMem(MB) Server:PeakPagedMem(MB) Server:PeakWorkingSet(MB) Server:NumThreads
1 1 S Local B B Http N 250,000 3.12 216 114 95 7 294 182 146 8
1 1 S Remote B B Http N 250,000 7.12 216 114 95 7 306 182 146 7
2 1 S Local B B Tcp N 250,000 2.72 226 116 85 6 297 184 124 9
2 1 S Remote B B Tcp N 250,000 5.49 212 113 82 6 310 186 124 10
3 1 S Local S S Http N 250,000 4.19 142 52 52 8 139 52 50 9
3 1 S Remote S S Http N 250,000 25.45 143 49 47 9 151 52 51 8
4 1 S Local S S Tcp N 250,000 16.45 138 51 49 8 140 52 51 9
4 1 S Remote S S Tcp N 250,000 15.40 138 50 49 8 153 53 52 9
5 1 S Local B B Http N 500,000 ERR
5 1 S Remote B B Http N 500,000 ERR
6 1 S Local B B Tcp N 500,000 5.22 308 212 150 6 474 351 228 9
6 1 S Remote B B Tcp N 500,000 10.93 308 211 150 6 484 350 229 9
7 1 S Local S S Http N 500,000 8.31 175 79 77 9 172 86 84 9
7 1 S Remote S S Http N 500,000 49.33 175 79 77 9 183 86 85 8
8 1 S Local S S Tcp N 500,000 31.75 171 84 81 8 173 86 85 8
8 1 S Remote S S Tcp N 500,000 30.49 171 81 79 7 185 87 86 9
9 1 S Local S S Http N 1,000,000 16.20 228 141 136 9 224 138 134 9
9 1 S Remote S S Http N 1,000,000 104.00 227 137 132 7 236 138 134 8
10 1 S Local S S Tcp N 1,000,000 62.65 238 143 138 9 225 138 135 9
10 1 S Remote S S Tcp N 1,000,000 60.40 223 143 138 9 235 139 136 10

There is always something new to learn...!






Re: Windows Communication Foundation (Indigo) TCP Streaming - Performance Issue (RTM)

R Raghu

Continuing our work, we used the Fast Infoset library from Noemax to reduce the size of the XML on the wire. Here are the results from that study as well:

Test# Trial# ContractType Locality ClientMode ServerMode Protocol FI #Objects RoundTripTime(s) Client:PeakVirMem(MB) Client:PeakPagedMem(MB) Client:PeakWorkingSet(MB) Client:NumThreads Server:PeakVirMem(MB) Server:PeakPagedMem(MB) Server:PeakWorkingSet(MB) Server:NumThreads
1 1 S Local B B Http Y 250,000 3.52 263 138 125 9 255 132 105 12
1 1 S Remote B B Http Y 250,000 4.49 263 138 124 9 252 141 117 10
2 1 S Local B B Tcp Y 250,000 3.58 259 137 124 8 257 133 106 13
2 1 S Remote B B Tcp Y 250,000 4.78 259 137 124 8 149 22 23 7
3 1 S Local S S Http Y 250,000 3.53 223 93 86 11 197 98 97 12
3 1 S Remote S S Http Y 250,000 10.14 223 101 95 11 207 97 96 12
4 1 S Local S S Tcp Y 250,000 4.14 219 93 86 10 197 98 98 12
4 1 S Remote S S Tcp Y 250,000 3.33 219 100 94 10 209 99 99 12
5 1 S Local B B Http Y 500,000 6.69 400 264 237 9 318 207 195 11
5 1 S Remote B B Http Y 500,000 8.72 400 264 237 9
6 1 S Local B B Tcp Y 500,000 6.73 407 269 234 8 321 209 196 13
6 1 S Remote B B Tcp Y 500,000 8.65 407 268 234 8 331 209 196 12
7 1 S Local S S Http Y 500,000 7.25 313 171 156 11 263 162 158 12
7 1 S Remote S S Http Y 500,000 23.80 329 185 171 11 289 164 161 11
8 1 S Local S S Tcp Y 500,000 8.27 309 165 152 10 280 162 160 12
8 1 S Remote S S Tcp Y 500,000 6.67 325 188 175 10 291 163 162 11
9 1 S Local B B Http Y 1,000,000 14.24 586 422 369 9 604 465 360 11
9 1 S Remote B B Http Y 1,000,000 17.44 586 422 369 9 598 452 364 11
10 1 S Local B B Tcp Y 1,000,000 14.10 582 421 368 8 607 467 361 11
10 1 S Remote B B Tcp Y 1,000,000 17.51 582 421 368 8 600 453 365 11
11 1 S Local S S Http Y 1,000,000 14.05 381 238 225 11 395 293 285 12
11 1 S Remote S S Http Y 1,000,000 43.05 394 255 229 11 405 292 286 11
12 1 S Local S S Tcp Y 1,000,000 16.09 374 243 222 10 412 293 287 12
12 1 S Remote S S Tcp Y 1,000,000 13.20 383 241 226 11 424 295 288 12

As I indicated previously, HTTP streaming was the culprit in remote testing. However, the TCP streaming numbers improved.

Comparing these numbers to those from the previous post (i.e. without the Fast Infoset library), it is clear that the TCP binding by itself is not giving the best performance. We were hoping not to have to use an external library, as we expected the TCP binding (by itself) to give us the best performance. Obviously this is not the case. It is interesting to note that memory size increased from non-FI mode to FI mode.

We are disappointed to learn that Tcp binding does not give us the best possible results.

Thanks.






Re: Windows Communication Foundation (Indigo) TCP Streaming - Performance Issue (RTM)

Clemens Vasters - MSFT

Sorry, but you obviously don't read the docs.

[ServiceContract(Namespace = "http://xxx.com")]
[XmlSerializerFormat]
public interface MyInterface2
{
[OperationContract]
CustomerData2[] HelloCustomer(int number);
}


public class CustomerData2 : IXmlSerializable
{
....
}

This is not streaming compatible, because the returned type is an array. It is irrelevant whether the array members are IXmlSerializable.

Nothing in WCF is optimized for a local scenario except, of course, the NetNamedPipeBinding.






Re: Windows Communication Foundation (Indigo) TCP Streaming - Performance Issue (RTM)

R Raghu

I am simply trying to make this streaming work. We are at liberty to modify the types any way we want.

So, if I lump the array elements into another type (say, a holder object for the array elements) and have that type implement IXmlSerializable, should it work? Please take a look at the following modified contract:

[ServiceContract(Namespace = "http://xxx.com")]
[XmlSerializerFormat]
public interface MyInterface2
{
[OperationContract]
CustomerData2List HelloCustomer(int number);
}


public class CustomerData2List : IXmlSerializable
{
public CustomerData2[] Customers;

}

Please let me know if this should work in theory. If you think it should work, then I will proceed with more testing. If not, please give me guidance on how I should modify this contract so that it complies with the documentation.

Thanks.






Re: Windows Communication Foundation (Indigo) TCP Streaming - Performance Issue (RTM)

Clemens Vasters - MSFT

Yes, that's one way to make it compatible. However, I am under the strong impression that you don't fully understand what the streaming mode is for. I don't think that jamming 500,000 or even a million objects into an array and wrapping it into an object that overrides the serialization behavior is anywhere near what a developer would do in their apps. If you throw a million, 500,000 or even 250,000 business objects across the wire in one go, you likely have a design issue at hand. Any reasonably designed application that I know of would employ paging for these amounts of structured data.

In addition to that, your test case is flawed in that your 500,000 "customer data objects" arrays can't have any serious data in them, because you'd be promptly out of memory in the buffered case if they had. 500,000 * 1 KB (which is at the low end of realistic for anything named "customer", not counting technical overhead) results in 500 MB of net payload. The resulting message infoset plus the buffer for the encoded XML wire data would cause any 32-bit box to give up. And at that data volume, you are measuring a lot of things, just not what you want to measure here. So my guess is that you are shipping empty objects around and measuring a good deal of the various side effects of creating and destroying lots of objects. I wouldn't know any WCF application use-case for which it'd make sense to throw hundreds of thousands of empty objects across wires.

But I digress. The streaming mode is -- and I am repeating myself here -- for incrementally transmitting large data streams and/or for transmitting data that is not readily available when the communication is initiated. Typical operation signatures are:

void Transfer( {Stream|Message} stm ) for client-to-service transfers
{Stream|Message} Transfer(T0 arg0, T1 arg1) for service-to-client transfers
{Stream|Message} Exchange({Stream|Message} stm) for bidirectional transfers

I'd call IXmlSerializable an edge case for streaming, which is enabled because it's compatible (because the objects write into and read from XmlWriter/XmlReader), but the type of data that is usually transferred via streaming is typically not IXmlSerializable enabled; if it is IXmlSerializable enabled, it likely does more than merely replacing the XmlSerializer work. An example for that would be a class that represents a parts catalog for an electronics or car part reseller. These catalogs can easily be several hundred megabytes large and thus you can't keep them in memory at one time. Therefore, IXmlSerializable would act right over the database connection in this case and write the data into the XmlWriter as it is being pulled from the DB.
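A rough sketch of that idea follows: an IXmlSerializable holder that writes each record into the XmlWriter as it is pulled from a lazy source, instead of first collecting everything into an array. The names CustomerFeed and SampleSource and the element names are hypothetical, the lazy iterator merely stands in for a database reader, and the read side is deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical holder: writes customers one at a time from a lazy source.
public class CustomerFeed : IXmlSerializable
{
    private readonly IEnumerable<string[]> rows; // each row: { firstName, lastName }

    public CustomerFeed() { rows = new List<string[]>(); } // parameterless ctor for the serializer
    public CustomerFeed(IEnumerable<string[]> source) { rows = source; }

    public XmlSchema GetSchema() { return null; }

    public void WriteXml(XmlWriter writer)
    {
        // Only the row currently being written needs to be in memory.
        foreach (string[] row in rows)
        {
            writer.WriteStartElement("Customer");
            writer.WriteElementString("FirstName", row[0]);
            writer.WriteElementString("LastName", row[1]);
            writer.WriteEndElement();
        }
    }

    public void ReadXml(XmlReader reader)
    {
        // The receiving side is omitted in this sketch.
        throw new NotSupportedException("Read side not shown in this sketch.");
    }
}

public static class SampleSource
{
    // Stand-in for a database reader: yields rows lazily, one at a time.
    public static IEnumerable<string[]> Rows(int count)
    {
        for (int i = 0; i < count; i++)
            yield return new[] { "First" + i, "Last" + i };
    }
}
```

A service operation could then return new CustomerFeed(SampleSource.Rows(number)). Whether the rows actually reach the wire incrementally still depends on the binding's transfer mode and on the contract meeting the streaming rules quoted earlier in the thread.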

The stream signature is optimal for data streams which are already available in stream form. This includes files, live-encoded media, and forwarded network streams. Typically, this data is not XML text-data but rather good-old binary information, such as an MPEG2 video stream.

The bottom line here is that your "benchmark" is trying to make an Apples-to-Apples comparison between two transfer modes for which the realistic use-cases are so vastly different that one of your Apples will always very suspiciously look like an Orange. It would be incredibly hard for anyone to come up with a single service contract that's a fair comparative use-case for buffered and streamed transfers; I actually can't think of one, but can surely say that yours isn't one of them. And since you are using the product against proper practice, your numbers unfortunately don't mean much for anyone, at all. And I am not even starting to mention the various tuning knobs on the bindings that you'd likely have to adjust in any heavy load scenario for optimum performance.

I am also very interested in how you employ the Fast Infoset library specifically; I looked at the vendor's website and unless they're not advertising it, I haven't seen a FI encoder from them.






Re: Windows Communication Foundation (Indigo) TCP Streaming - Performance Issue (RTM)

R Raghu

Thanks for the lesson on streaming:-) In fact, I do understand what streaming is all about. That is beside the point.

Let me state our use case so we can attack the problem from there (let's not get into whether this is valid or not, because it is valid for us):

1) Lots of customer-like objects need to be sent across from the server to the client.

2) These objects will be in-memory until they are transported across the wire.

3) Use a web service solution that gives the maximum performance along with less memory footprint with 1 and 2.

So (2) establishes that we will need lots of memory. I recognize that at some point I need to worry about running out of physical memory. The trick is to come up with a number that we can work with given the amount of physical memory. Right now, we are using 2 GB (granted, not all of this memory is available to any one process on the machine). If we did not have to worry about performance, I could persist the objects to a file and read from it when streaming is enabled. Since I need to come up with the fastest time, I did not want to do that.


The next logical thing for us to do is to minimize the memory required for serialization of the XML that goes on the wire. If we use the default WCF buffered mode, then we may be doubling the required memory (x MB for objects + y MB for serialized XML), unless I am mistaken. By using XmlWriter on the service side, we are hoping to reduce the amount of memory required for serialization and at the same time increase performance, because the objects are in memory.

Now that I have stated my use case and given the reasons for my approach, I would appreciate any guidance on how I can get this job done. If there are several ways of improving the performance for this use case, I am willing to try all of them.

BTW: I tested the scenario that I outlined in my previous post. The numbers improved some but not a whole lot.

Here is the reality. The WCF client to WCF service timings are off by quite a bit when compared to those from a WCF client to a Java service (with Project Tango on GlassFish). We will be re-testing just to make sure, as this was done some time ago. If you know about the StAX specification in Java, you would know where I am coming from.

The reason you don't see the FI library for WCF on the vendor's site is that we are beta testing their WCF component. I don't think they have released it yet.

Thanks.






Re: Windows Communication Foundation (Indigo) TCP Streaming - Performance Issue (RTM)

Clemens Vasters - MSFT

Ok. Now we're on the same page. IXmlSerializable is an option for you in this scenario, another is for you to build a Stream-derived class that you layer over your data, or an XmlReader that you layer over your data and put into a custom Message-derived class.
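As a rough illustration of the Stream-derived option, the sketch below layers a read-only stream over a lazy sequence of records and encodes one record at a time on demand. The class name RecordStream and the newline-delimited UTF-8 text format are assumptions made for the example; a real implementation would emit whatever wire format the client expects.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical read-only stream that encodes one record at a time on demand.
public class RecordStream : Stream
{
    private readonly IEnumerator<string> records;
    private byte[] current = new byte[0];
    private int offset;

    public RecordStream(IEnumerable<string> source) { records = source.GetEnumerator(); }

    public override int Read(byte[] buffer, int index, int count)
    {
        // Pull the next record only when the current one has been fully consumed.
        while (offset == current.Length)
        {
            if (!records.MoveNext()) return 0; // end of data
            current = Encoding.UTF8.GetBytes(records.Current + "\n");
            offset = 0;
        }
        int n = Math.Min(count, current.Length - offset);
        Buffer.BlockCopy(current, offset, buffer, index, n);
        offset += n;
        return n;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long o, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] b, int o, int c) { throw new NotSupportedException(); }

    protected override void Dispose(bool disposing)
    {
        if (disposing) records.Dispose();
        base.Dispose(disposing);
    }
}
```

An operation with a Stream return type could hand back such an object, so that at most one encoded record is held by the stream at any time while the transport reads from it.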

I am interested in looking at the numbers with you and figure out where the differences come from, but I doubt that the forum is a good place for this mostly because it's a bit difficult to share larger amounts of code here. We should take this offline and come back here with the findings afterwards. If you are willing to share your test code with me, please send me an email to clemensv at microsoft dot com. Thank you!






Re: Windows Communication Foundation (Indigo) TCP Streaming - Performance Issue (RTM)

R Raghu

I am glad to hear that we are on the same page. I will be contacting you soon.

Thanks.






Re: Windows Communication Foundation (Indigo) TCP Streaming - Performance Issue (RTM)

BenK

Let me know what is happening, but at a guess it looks to me as though forcing streaming makes it serialize via the XmlSerializer while still using buffered mode.

Over TCP it is using IXmlSerializable to generate an XML set. Over HTTP it can dump this directly to the wire, but TCP will leave it in memory until sent. This will give you worse figures than either streaming or buffered.

E.g.:

Buffered: uses the DataContract formatter, which is much more efficient and will use less memory.

I'm not sure whether the XmlSerializer will pass it one element at a time. I think from experience it will not; most code will just read to the end. If you had a custom XmlSerializer this might work (but I still doubt it). I'm curious to find out.

HTTP forcing IXmlSerializable: build XML from data, send all data. (Note that it is probably doing it buffered, but you have the additional serialization overhead compared to the formatter.)

TCP forcing IXmlSerializable: build XML from data, read to end (note that the entire XML is in memory), pack the XML as binary XML. To do this it probably needs the whole XML message (else how does it substitute the tags with the long value?). When the binary XML message is created, it will release the in-memory XML message and send.

Hence it is likely that it is still using buffered mode, just using more memory to do so (and much slower).

Regards,

Ben