Christer Kaivo-oja

Hi,

We have a .net 1.1 application that is displaying very strange behaviour with respects to garbage collection. The application is designed to pull data from a SQL database and generate a report out of this. The database is relatively large (roughly 30G of data) and the application needs to go through fairly large portions of this data to generate the output. We use a cache with weak links to reduce the actual SQL queries that are being run, which is a huge increase in performance under normal circumstances.

Now, the issue we are having is that the application starts to garbage collect excessively (on the order of 10 gen-0 GCs, and around 2 gen-1 GCs per second) randomly during execution. Normally, the application can end up using around 1GB of memory, but once this garbage collection issue appears, the application stops at around 100MB.

When this GC issue crops up, our cache with weak references is dropped and the application starts hammering the SQL server with queries and ends up slowing down to a crawl.

I have included a dump of "!eeheap -gc" from WinDbg when the server is in this state at the bottom of this post. As you can see, there are 8 cores on this server, and each core has about 1MB of data in the heap (all in gen-2 due to this excessive GC).

There is about 3GB of free physical RAM on the server, and the .net process is, as I mentioned above, using about 100MB in total.

Doing a "!dumpheap -stat" shows that a lot of the data in the heap is strings (on the order of 100,000 of them) which to me doesn't seem too excessive.

My question is, why does the application enter this state of frenzied GC calls when there should be plenty of memory left for the application At times it runs fine, and grows up to the expected 1GB of memory usage while at other times it goes straight into this GC-frenzy mode.

Any insight into what may cause this would be appreciated. If you have suggestions on what to read to figure out what's going on and hopefully fix the issue, I'd be happy for a link to it or an ISBN for a book describing it.

Thank you in advance!



0:045> !eeheap -gc
Number of GC Heaps: 8
------------------------------
Heap 0 (0x000d2468)
generation 0 starts at 0x0286ed40
generation 1 starts at 0x02841914
generation 2 starts at 0x027b0030
ephemeral segment allocation context: none
segment begin allocated size reserved
0x027b0000 0x027b0030 0x0286ed4c 0x000bed1c(781,596) 0x01c2f000
Large object heap starts at 0x127b0030
segment begin allocated size reserved
0x127b0000 0x127b0030 0x128051a0 0x00055170(348,528) 0x007aa000
Heap Size 0x255524(2,446,628)
------------------------------
Heap 1 (0x000d2b88)
generation 0 starts at 0x04852d2c
generation 1 starts at 0x048220fc
generation 2 starts at 0x047b0030
ephemeral segment allocation context: none
segment begin allocated size reserved
0x047b0000 0x047b0030 0x04852d38 0x000a2d08(666,888) 0x01c1f000
Large object heap starts at 0x12fb0030
segment begin allocated size reserved
0x12fb0000 0x12fb0030 0x12fcd9e0 0x0001d9b0(121,264) 0x007df000
Heap Size 0xcca7c(838,268)
------------------------------
Heap 2 (0x000d3698)
generation 0 starts at 0x06839064
generation 1 starts at 0x06834014
generation 2 starts at 0x067b0030
ephemeral segment allocation context: none
segment begin allocated size reserved
0x067b0000 0x067b0030 0x06839070 0x00089040(561,216) 0x01c9f000
Large object heap starts at 0x137b0030
segment begin allocated size reserved
0x137b0000 0x137b0030 0x137b0030 0x00000000(0) 0x007a4000
Heap Size 0x89040(561,216)
------------------------------
Heap 3 (0x000d4180)
generation 0 starts at 0x088949a0
generation 1 starts at 0x0887662c
generation 2 starts at 0x087b0030
ephemeral segment allocation context: none
segment begin allocated size reserved
0x087b0000 0x087b0030 0x088949ac 0x000e497c(936,316) 0x01c5f000
Large object heap starts at 0x13fb0030
segment begin allocated size reserved
0x13fb0000 0x13fb0030 0x13fb0030 0x00000000(0) 0x007e2000
Heap Size 0xe497c(936,316)
------------------------------
Heap 4 (0x000d4c68)
generation 0 starts at 0x0a8759e4
generation 1 starts at 0x0a87097c
generation 2 starts at 0x0a7b0030
ephemeral segment allocation context: none
segment begin allocated size reserved
0x0a7b0000 0x0a7b0030 0x0a8759f0 0x000c59c0(809,408) 0x01bef000
Large object heap starts at 0x147b0030
segment begin allocated size reserved
0x147b0000 0x147b0030 0x147b0030 0x00000000(0) 0x007c2000
Heap Size 0xd3d84(867,716)
------------------------------
Heap 5 (0x000d5750)
generation 0 starts at 0x0c8675b0
generation 1 starts at 0x0c826948
generation 2 starts at 0x0c7b0030
ephemeral segment allocation context: none
segment begin allocated size reserved
0x0c7b0000 0x0c7b0030 0x0c8675bc 0x000b758c(750,988) 0x01caf000
Large object heap starts at 0x14fb0030
segment begin allocated size reserved
0x14fb0000 0x14fb0030 0x14fb0030 0x00000000(0) 0x007c2000
Heap Size 0x20cfe8(2,150,376)
------------------------------
Heap 6 (0x000d6028)
generation 0 starts at 0x0e8aee0c
generation 1 starts at 0x0e8a1ffc
generation 2 starts at 0x0e7b0030
ephemeral segment allocation context: none
segment begin allocated size reserved
0x0e7b0000 0x0e7b0030 0x0e8aee18 0x000fede8(1,043,944) 0x01bff000
Large object heap starts at 0x157b0030
segment begin allocated size reserved
0x157b0000 0x157b0030 0x157b0030 0x00000000(0) 0x007c2000
Heap Size 0x106de8(1,076,712)
------------------------------
Heap 7 (0x000d6b10)
generation 0 starts at 0x1087d144
generation 1 starts at 0x10852cf8
generation 2 starts at 0x107b0030
ephemeral segment allocation context: none
segment begin allocated size reserved
0x107b0000 0x107b0030 0x1087d150 0x000cd120(839,968) 0x01baf000
Large object heap starts at 0x15fb0030
segment begin allocated size reserved
0x15fb0000 0x15fb0030 0x15fb0030 0x00000000(0) 0x007df000
Heap Size 0xcd120(839,968)
------------------------------
Reserved segments:
------------------------------
GC Heap Size 0x9445d0(9,717,200)



Re: Common Language Runtime Excessive garbage collection in .net 1.1

Feng Chen - MSFT

Hi Christer,

To troubleshoot this issue, we really need the source code to reproduce the problem, so that we can investigate the issue in house. It is not necessary that you send out the complete source of your project. We just need a simplest sample to reproduce the problem. You can remove any confidential information or business logic from it.

Thanks!






Re: Common Language Runtime Excessive garbage collection in .net 1.1

Feng Chen - MSFT

Hi Christer,

We are changing the issue type to ¡°Comment¡± because you have not followed up with the necessary information. If you have more time to look at the issue and provide more information, please feel free to change the issue type back to ¡°Question¡± by editing your initial post and changing the radio button at the top of the post editor window. If the issue is resolved, we will appreciate it if you can share the solution so that the answer can be found and used by other community members having similar questions.

Thank you!






Re: Common Language Runtime Excessive garbage collection in .net 1.1

Christer Kaivo-oja

Hello,

The problem here is that the issue happens only with this particular large application that accesses a lot of data. Providing the code would be useless without the large database to run it on I'm afraid. Due to the nature of the code and database, I am unable to provide them in their complete form, and I do not know exactly what part of the code causes this problem, so it is incredibly difficult for me to extract a minimal example that produces it.

The issue itself is very curious. Let me provide some more details on what happens:

The application runs under IIS, and has been tested on Windows 2003 Server (standard edition) (with the w3wp.exe process running it), Windows 2003 Server Enterprise and Windows XP Pro (aspnet_wp). I have tried all three versions of the GC (server, concurrent true/false).

The problem seems to happen more frequently if there are processes running that use up a lot of RAM (such as MSSQL with lots of stuff in its cache), but this is simply a "gut feeling" and not based on actual measurements. Running this particular app twice in a row under the same worker process will practically always make it go into this state.

Again, physical RAM is not the issue here since this happens when the machine has more than 3GB of free physical RAM. The worker process peaks at somewhere aroud 100-200MB, so it's not an OS process size limit either.


The most perplexing fact here is that the problem doesn't appear all the time. Sometimes, I am able to run the application without the GC going wild but other times it goes into this state.

And that's another thing.. Once the GC starts going wild like this, the worker process stays in this mode until it's restarted.

Structurally, the application does something like this (in ad-hoc pseudocode):

foreach client {
read client data
cache commonly used data (using a System.Web.Caching.Cache object)
process data
produce result
}

The data in this case consists of large amounts of small objects (on the order of 20-50k objects in total per client, counted in C# object instances). All data is read from a MSSQL server.

When the GC "behaves", the number of GCs in generations 0, 1 and 2 are roughly linear and increase at the same rate (1:1:1 pretty much). I do realize that this is not what .net applications should "normally" have but given how many objects are created and kept around for each client, this doesn't seem that wrong to me. GCs will occurr, and objects will promote to gen2 because of this. However, once the application switches to the next client in the list, the previous client's data can and will be collected (by this time, it's likely to be in gen-2).

When the GC misbehaves, the GC generation ratio is more like 100:10:1, which is more in line with what's considered normal around the net. However, when the GC is in this mode, it will collect the entire cache and all in-memory session objects all the time. This, of course, makes the application query the database hundreds of times per second for the commonly used data that should be in the cache.

My theory is that the CLR has some form of state internally that goes wrong in this case, and once this state is reached, the CLR will never come out of it without a restart. Maybe it sees the large amounts of objects being promoted to gen1 and 2, and increases the heap sizes for these too much, which causes gen-0 to be reduced in size to accomodate With a smaller gen-0 heap, there would be more GCs which would worsen the problem since the GC will not be able to collect the objects for the client that is being processed.

Another possibility would be that the CLR has some form of "low memory" flag internally that is erronously enabled even though there is plenty of memory to spare. I personally find this possibility less likely though.

I hope that this information will help you getting an idea of what is going on here. The strange part in my opinion is that it's random and that this issue happens randomly (let's say 75% of the time the app is run). It also "feels" like the nonconcurrent GC is less likely to produce this problem than the concurrent and the server version. The app itself has been tested on multiple machines, with 8, 4 and 2 CPU kernels and varying amounts of physical RAM (from 1 to 5 GB).




Re: Common Language Runtime Excessive garbage collection in .net 1.1

Muhammad Adel

I think it could be a memory fragmentation problem, which means having memory that is empty but not usable.

You say that you are creating a lot of objects for many clients and keeping them around. This can easily cause a memory framgentation problem if these objects are always in use (such as response buffers in case of asp.net) as the garbage collector cannot move objects which are in use by clients, specially when used for a long time due to slow connection.
A similar problem with asp.net response buffer is desribed here.

If you have any tool that can test memory fragmentation then you can use it to see if this assumption is true.

I have three ideas:

1-Why are you using a weak reference for the cache why don't you use an ordinary reference to prevent the garbage collector from deleting the cache

2-Can you find a time where you gurantee there are no request on your applications or there are no objects in use by clients if so, you can call GC.Collect() to free the memory and defragment it in this idle time, so it will not cause delayed response for the users. Doing this defragmentation periodically there will be no reason for the garbage collector to trigger insanly.

3-Can you configure your application to use .NET framework 2.0 to make sure it is not a bug that was already fixed in .NET framework 1.1






Re: Common Language Runtime Excessive garbage collection in .net 1.1

Christer Kaivo-oja

Thank you for your reply!

The nature of the app is that we have one instance of it running at any one time. The objects that are created for a client in the loop are mostly discardable after the client has been processed, but we do have a number of basic objects that are stored on the cache to prevent excessive SQL queries for data that is common over the entire system.

1. The cache object itself is not a weak reference. The garbage collector simply chooses to collect _all_ objects that are stored in the cache object (ie, the weak references) when it goes into this mode.

2. I have tried modifying my work loop to include an explicit GC in between processing each client. However, this doesn't affect anything at all.

3. The problem with .NET 2.0 is that this application runs on a production server and does critical work on a daily basis. There are also third-party modules that are used, which are untested in .NET 2.0. Taking the time to test everything under .NET 2.0 is simply too expensive at this time.


However, memory fragmentation is quite possible since the application allocates a large number of objects, in what could appear as random order. A number of these objects are thrown onto the cache, and I guess they could well be scattered around the heap more or less randomly.

I'm a bit confused as to what happens with the cached data when a GC is run. According to everything I've read, the CLR will simply collect all weak references in the generation it's processing at the moment, but that would essentially make our cache pointless at all times. This is clearly not the case though (since the cached data stays in memory just fine when the app "works properly"). So the Cache object must be doing something else as well internally. It has support for priorities as well, which is something that the things I've read about the GC doesn't talk about at all.

What tools would I use to investigate memory fragmentation I have WinDBG installed, but can't say I'm very used to working with it. Pointers to where I can start reading would be appreciated.