Tergiver

To start off, I would like to make a request to the C# language folks: Can we please have a where clause in the foreach statement
Examples:
foreach (Ship ship in Objects where ship is Ship) ...
foreach (Ship ship in Ships where ship.IsVisible) ...
foreach (Ship ship in DockedShips where ship.QueueIndex < queueThreshold) ...

I've heard about Linq, but have not yet had the chance to check it out. I would be interested to know where it would benchmark in the code here (below).

The code below creates a Sector object which contains a list of GameObjects. Ship and Station derive from GameObject. It addes 50,000,000 objects to the list, randomly created as a Ship or a Station. It then runs four tests:

T1
Enumerates the Ship objects by enumerating the Object list directly, testing for Ship type, and casting.

T2
Enumerates the Ship objects through the ShipEnumerable class.

T3
Enumerates the Ship objects through the use of a yield iterator.

T4
Enumerates the Ship objects through the TypedEnumerable class which is a generic version of the ShipEnumerable class.

The tests are each timed and the results displayed. These are the results I get:

T1 baseline
T2 baseline * 1.52
T3 baseline * 1.70
T4 baseline * 2.14

I would like to have my Sector class provide enumerators for various types of objects as I have done here in the code below (there are more then just Ships and Stations). I need to keep them in a single container as an object can implement many interfaces (Ship, Dockable, Commodity, Mountable, etc.). That is to say, there are no separate lists that make sense unless I maintain a whole bunch of lists, in which case an object would appear in multiple lists.

Is there another way to return an enumerator that fetches only a specific type of object without suffering the added processing cost That is, can I provide an enumerator without the test and cast, but get the performance results of the test and cast

The source code should clear up any confusion about what my question is:

Source Code


using System;
using System.Collections;
using System.Collections.Generic;
using System.Windows.Forms;
static class Program
{
  [STAThread]
  static void Main()
  {
    Random rand = new Random();
    Sector sector = new Sector();
    for (int n = 0; n < 50000000; n++)
    {
      if (rand.NextDouble() > 0.5)
        sector.Objects.Add(new Ship());
      else
        sector.Objects.Add(new Station());
    }
    
    QueryPerfCounter qpf = new QueryPerfCounter();
    // T1
    qpf.Start();
    foreach (GameObject obj in sector.Objects)
    {
      if (obj is Ship)
        Test(((Ship)obj).Engine);
    }
    qpf.Stop();
    double t1 = qpf.DurationMilliSeconds;
    double baseline = t1;
    // T2
    qpf.Start();
    foreach (Ship ship in sector.ShipsCustom)
    {
      Test(ship.Engine);
    }
    qpf.Stop();
    double t2 = qpf.DurationMilliSeconds;
    string t2per = (t2 / baseline).ToString("F4");
    // T3
    qpf.Start();
    foreach (Ship ship in sector.ShipsYield)
    {
      Test(ship.Engine);
    }
    qpf.Stop();
    double t3 = qpf.DurationMilliSeconds;
    string t3per = (t3 / baseline).ToString("F4");
    
    // T4
    qpf.Start();
    foreach (Ship ship in sector.ShipsGenericCustom)
    {
      Test(ship.Engine);
    }
    qpf.Stop();
    double t4 = qpf.DurationMilliSeconds;
    string t4per = (t4 / baseline).ToString("F4");
    Console.WriteLine(string.Format("T1 = {0}ms, T2 = {1}ms ({2}), T3 = {3}ms ({4}), T4 = {5}ms ({6})", t1, t2, t2per, t3, t3per, t4, t4per));
  }
  static void Test(string s)
  {
  }
}
public class Sector
{
  List<GameObject>
  objects = new List<GameObject>();
  public List<GameObject> Objects { get { return objects; } }
  public IEnumerable<Ship> ShipsYield
  {
    get
    {
      foreach (GameObject obj in objects)
      if (obj is Ship)
      yield return (Ship)obj;
    }
  }
  public IEnumerable<Ship> ShipsCustom { get { return new ShipEnumerable(objects); } }
  public IEnumerable<Ship> ShipsGenericCustom { get { return new TypedEnumerable<Ship>(objects); } }
}
public class ShipEnumerable : IEnumerable<Ship>, IEnumerator<Ship>
{
  IEnumerator list;
  public ShipEnumerable(IEnumerable list) { this.list = list.GetEnumerator(); }
  public IEnumerator<Ship> GetEnumerator() { return this; }
  IEnumerator IEnumerable.GetEnumerator() { return this; }
  public void Dispose() { }
  public bool MoveNext()
  {
    bool retval = list.MoveNext();
    while (retval && !(list.Current is Ship))
    retval = list.MoveNext();
    return retval;
  }
  public void Reset() { list.Reset(); }
  public Ship Current { get { return (Ship)list.Current; } }
  object IEnumerator.Current { get { return Current; } }
}
public class TypedEnumerable<T> : IEnumerable<T>, IEnumerator<T>
{
  IEnumerator list;
  public TypedEnumerable(IEnumerable list) { this.list = list.GetEnumerator(); }
  public IEnumerator<T> GetEnumerator() { return this; }
  IEnumerator IEnumerable.GetEnumerator() { return this; }
  public void Dispose() { }
  public bool MoveNext()
  {
    bool retval = list.MoveNext();
    while (retval && !(list.Current is T))
    retval = list.MoveNext();
    return retval;
  }
  public void Reset() { list.Reset(); }
  public T Current { get { return (T)list.Current; } }
  object IEnumerator.Current { get { return Current; } }
}
public class GameObject
{
}
public class Station : GameObject
{
  public string Dock { get { return "dock"; } }
}

public class Ship : GameObject
{
  public string Engine { get { return "engine"; } }
}

QueryPerfCounter is a wrapper for the API


using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
public class QueryPerfCounter
{
  [DllImport("KERNEL32")]
  private static extern bool QueryPerformanceCounter(out long lpPerformanceCount);
  [DllImport("Kernel32.dll")]
  private static extern bool QueryPerformanceFrequency(out long lpFrequency);
  private long start;
  private long stop;
  private long frequency;
  Decimal multiplier = new Decimal(1.0e9);
  public QueryPerfCounter()
  {
    if (QueryPerformanceFrequency(out frequency) == false)
    {
      // Frequency not supported
      throw new Win32Exception();
    }
  }
  public void Start()
  {
    QueryPerformanceCounter(out start);
  }
  public void Stop()
  {
    QueryPerformanceCounter(out stop);
  }
  public double AvgDuration(int iterations)
  {
    return ((((double)(stop - start) * (double)multiplier) / (double)frequency) / iterations);
  }
  public double DurationNanoSeconds
  {
    get { return (((double)(stop - start) * (double)multiplier) / (double)frequency); }
  }
  public double DurationMicroSeconds
  {
    get { return (((double)(stop - start) * (double)multiplier) / (double)frequency) / 1000; }
  }
  public double DurationMilliSeconds
  {
    get { return (((double)(stop - start) * (double)multiplier) / (double)frequency) / 1000000; }
  }
  public double DurationSeconds
  {
    get { return (((double)(stop - start) * (double)multiplier) / (double)frequency) / 1000000000; }
  }
}


Re: Visual C# Language Enumerating objects by type

TaylorMichaelL

If you are maintaining multiple objects in the same collection and you need to filter them out then whether you write the code or the compiler does is irrelevant. You still have a conversion to handle. You can't get around the performance issues in this situation. The most efficient way to test for a type is this:

MyObject obj = someObject as MyObject

This does the check and returns the object or null. It is faster and generally recommended over doing an is followed by a cast. If you're using FxCop then it'll generate a warning about this case. Still it has some overhead as the runtime has to look at the type. It is minimal since internally type handles are used so a single comparison should be enough.

Ideally you should break your objects out into separate collections if: performance of enumeration is critical, you can't afford the overhead of a type check and the objects are dissimilar. For example I'd probably keep my static objects separate from my dynamic objects simply because I would rarely care about static objects except when rendering.

Irrelevant you can create a single generic enumerator to enumerate the objects in your collection and return only those that match the appropriate type. This will give you the fastest performance possible for this particular block of code. You could then, ideally, create a simple wrapper class to hide the fact that you're dealing with generics.

public class MyEnumerator<T> where T : MyBaseClass, IEnumerable<T>
{
public MyEnumerator ( MyCollection items ) { }

public bool MoveNext ( )
{
//Check for end of collection

while (not end of collection)
{

current = items[index++] as T;

if (current != null)

return true;

};

return false;
}
}

You'll have to fiddle with the code a little to get it to compile. Alternatively you can use an iterator an expose the enumerator(s) directly off your collection.

public class MyCollection : ICollection, ...
{
public IEnumerable<MySpecialObject> GetSpecialObjects ( )
{

foreach(BaseObject obj in Items)
{

MySpecialObject spc = obj as MySpecialObject;

if (spc != null)

yield return spc;

};

}
}

You could probably make the above method generic to make it a little easier to use.

As for the initial request of modifying foreach that won't happen anytime soon. Foreach relies on IEnumerable to work. That interface is immutable. Therefore foreach would have to be extended to support another interface and languages would have to be extended to use the new syntax. Ultimately LINQ would be a better approach. Note though that you still won't gain anything performance wise. There is still going to be a runtime type check. You could also look into extension methods in v3 but that won't allow you to modify the syntax of the foreach. It also wouldn't solve the performance issue.

Short of separate collections there is no workaround to this issue. Again, though, performance should be high. If this is the performance bottleneck of your app then you're going to have to redesign your solution to optimize how the collection is managed. The general space vs. speed tradeoff comes into play here. You could cache the list of objects returned by an enumerator to optimize performance at the cost of space. This only helps if your collection remains relatively stable after enumeration. If the collection changes you either need to trap the change and update the cached copy or simply drop the cache. There are tradeoffs to both approaches.

Michael Taylor - 10/17/07

http://p3net.mvps.org





Re: Visual C# Language Enumerating objects by type

Tergiver

The type test isn't the issue. I'm not looking to do better then the 'baseline' test (T1). I wanted to provide a simple interface to the various types of objects as the other three tests demonstrate. What I found out was that all three of those implementations are significantly more expensive then using the form of T1. So the question was, is there another form whose performance is close to that of T1.

This inquiry is just a matter of curiosity. If I cannot find a form of enumeration that doesn't add significantly to the cost of the T1 form, I will simply use the T1 form.

I will test the two forms you provided and I am also going to download VC# 2008 Express to check out a Linq example.

As for using is vs. as:

I had a T0 test in there which used the as form but removed it as it's performance was identical to T1. I like the is form because it's more intuitive. If however, I needed to make more then one cast following the is, I would use the as form.

As far as a where clause in foreach, there is nothing to extend. It's just some syntactic sugar:

foreach (Ship ship in Ships where ship.IsVisible) { ... }

would be syntactic sugar for:

foreach (Ship ship in Ships) if (ship.IsVisible) { ... }

foreach (Ship ship in Ships where ship is Ship) { ... }

would be syntactic sugar for:

e=Ships.GetEnumerator();

while (e.MoveNext())

{ ship = e.Current as Ship;

if (ship != null)

{ ... }

}





Re: Visual C# Language Enumerating objects by type

Nimrand

The only added cost with T2, T3, and T4 that can see is the cost of making an additional method call. One thing you might be able to do is to seal your ShipEnumerator and TypeEnumerator classes. This allows the CLR to perform more optimizations because it can optimize virtual calls into non-virtual calls. Don't know if it will work in this case, but worth a try.





Re: Visual C# Language Enumerating objects by type

TaylorMichaelL

For the where clause it isn't as simply as you suggest. In your specific example of the equivalent code your if loop is superfluous. It does nothing when you say:

foreach(Ship ship in Ships)

If the object returned by the enumerator is not of type Ship or a derived type then an exception will occur. It implicitly does a typecast before doing the assignment. Your if statement doesn't accomplish anything. To make the code equivalent you'd have to do this:

foreach(object obj in Ships)
{
Ship ship = obj as Ship;

if (ship != null)

{

...

};
}

So to get the equivalent functionality with a where-type claused you'd have to do this:

foreach(object obj in Ships where obj is Ship)
{

..
}

But now obj is Object rather than Ship. I think LINQ will provide a better alternative for this case.

However you can get a similar behavior today. You can do something like this:

foreach(Ship ship in Ships.Where(delegate { return x is Ship; }))

{

...
};

It's messier but it should work (after fixing compilation errors). This is effectively how LINQs is implemented. Ultimately the delegate is really a predicate which takes an object and returns true or false. If it returns true then it is returned from the enumerator otherwise it is skipped. Of course calling a method for each object will have a detrimental impact on performance as well.

Moving back to your timing. I think you are drawing a few incorrect assumptions from your data. Your data does the enumeration each time but it doesn't take into account the overhead of some operations. In general a generic type should perform the same as a non-generic type. However there is overhead the first time a generic type is used. You need to eliminate this overhead. In your tests I would have expected the non-generic and generic enumerators to perform the same AFTER the generic type is created at least once. Did you run your test through multiple iterations within the same run If not then the initial conversion from the generic type to a concrete type is going to throw off your numbers.

I'm not surprised that the yield keyword takes more time. It has to save off the stack frame for later use so it will be slower than a normal iteration. I think you can optimize your concrete/generic enumerator though. Rather than having your enumerator call the underlying list's enumerator just iterate manually. This eliminates a function call or two which will increase performance slightly. All you really need to track inside your enumerator is the current index within the list. Each invocation increments the index, does the check and loops until you run out of items or find a match.

As a final note be aware that the Stopwatch class was added to v2 for performance gathering. It uses QPF under the hood. You can use this in lieu of your custom implementation.

Michael Taylor - 10/17/07

http://p3net.mvps.org





Re: Visual C# Language Enumerating objects by type

Tergiver

Using the form that you suggested in your first reply (the first one) does produce results that are close to the times of T1. I think you may be right in suggesting that my timing code might not be reflecting what's really going on. Why would the generic form of the same class produce such different results over 50,000,000 iterations I did indeed try creating an instance of the generic prior to the timing loop and that did not affect the results.

Thank you for the Stopwatch info, that's handy to know.

As for the foreach/where statement, I believe you are thinking about it from too high a level. The type check you are refering to is produced as MSIL by the compiler. The compiler can move/alter the type check at will. The foreach statement itself is syntactic sugar. Take it from someone who has written a complier, a compiler which processed a language that included a foreach statement with a where clause using a GetEnumerator-like format, I see no issue adding the clause to the language.

Of course it's a moot point. No one "in charge" is going to hear my plea and if they did, they would at this point probably say something along the lines of, "You don't need 'where', you can use Linq." I would still like to see it added to the language as it's a handy shortcut for all those times when I enumerate a collection with a simple evaluation filter.





Re: Visual C# Language Enumerating objects by type

TaylorMichaelL

Glad to hear from a fellow compiler writer!! I use to write C++ compilers for real-time control for a living. I miss those days...

In .NET the first time you use a generic type the CLR generates a concrete type to back it. Subsequent uses of the same type for the same generic class result in the use of the same class. There is probably some overhead for the lookup involved in this but otherwise it should perform the same as a normal class. However a slight optimization is used. Each distinct value type gets its own concrete type (as needed). So if you create Class<int>, Class<bool> and Class<float> instances then you have 3 concrete types floating around. This prevents a performance hit when using boxing. For reference types (since no boxing is needed) a single concrete type is created. So if you create Class<string>, Class<Form> and Class<Control> then only a single concrete type is used for all references.

So, you should see a performance hit the very first time you use the generic type but thereafter you shouldn't really notice it. So, in theory, the timing should be the same for a regular strongly-typed class as a generic type after it has been instantiated at least once. I've never actually benchmarked it so I can't say for certainty that it is true.

Why you still see such a difference in the foreach may be related to something else. Maybe the generated code is using the IEnumerator interface rather than the strongly typed version. You'd have to look at the generated IL code to figure that out.

Michael Taylor - 10/17/07

http://p3net.mvps.org





Re: Visual C# Language Enumerating objects by type

Tergiver

I just tried a Linq version by adding the following property to the Sector class and compiling it with VC# 2008 Express Beta 2:

Code Block

public IEnumerable<Ship> ShipsLinq
{
get
{
return from obj in objects
where obj is Ship
select (Ship)obj;
}
}

It fared the same as the generic TypedEnumerator (baseline * 2). However, if my benchmarking is itself suspect..





Re: Visual C# Language Enumerating objects by type

Tergiver

I hope you are not responible for the Intel 80C196 C/C++ compiler (I have forgotten the company that produced it), circa 1995. That was the buggiest compiler I have ever encountered!





Re: Visual C# Language Enumerating objects by type

TaylorMichaelL

No, no. The compilers I was responsible for were used in a propretiary product and not as a general use compiler.

One thing I should clarify is that a generic type -> concrete type conversion might happen during JIT of the containing function rather than when the generic type is instantiated. I don't know that MS has ever officially given the timing rules for the JIT/creation process beyond saying that it will occur before the line in question is executed. We logically assume that it happens when we execute the line but this might not necessarily be true. The CLR might even optimize the code based upon heuristics. For example the CLR might decide to generate the concrete type when it compiles the containing method if the type will be needed within the method. It might, however, decide to wait until it hits the line in question if the type might not be needed (contained within an if statement, for example). This allows the CLR some flexibility without breaking existing code. This could be influenced by whether the type is contained in an assembly that hasn't been loaded yet. There are many "experts" floating around that like to state how it "really" works but few can provide any proof beyond sampling which itself can be faulty. Only the CLR team can say for certain.

In this simple test (using VS2008) I compared the timing of instantiation for a generic type. What I expected to have happen was that the first timing interval should take longer than the second, and it did. This would seem to indicate that the conversion (in this case) took place when the line was executed and follows along with the general CLR philosophy of delay loading types only when needed.

Code Block

static void CallFoo ()
{
Stopwatch sw = new Stopwatch();
sw.Start();
for (int nIdx = 0; nIdx < 10000; ++nIdx)
{
SomeGenericType instance = new SomeGenericType();
};
sw.Stop();
long time1 = sw.ElapsedTicks;

sw.Reset();
sw.Start();
for (int nIdx = 0; nIdx < 10000; ++nIdx)
{
SomeGenericType instance2 = new SomeGenericType();
};
sw.Stop();
long time2 = sw.ElapsedTicks;
}

Calling this particular function twice should result in the first timing being high while the remaining three being low but that isn't what happened. Instead the first and last timings were higher. This brings up another potential issue for you, the GC. I have 2GB of process space to work with so it is odd that the GC would kick in other than perhaps for stack space. Inserting a call to Collect between the method invocations brings the fourth timing back down to what I expected.

Another test I tried is seeing if I could detect if a generic type is converted to a concrete type during JIT of the containing method or when the line is executed.

Code Block

static void TestGeneric (bool ret)
{
for (int nIdx = 0; nIdx < 100000; ++nIdx)
{
};

if (ret)
return;
}

static void TestGeneric2 (bool ret)
{
for (int nIdx = 0; nIdx < 100000; ++nIdx)
{
};

if (ret)
return;

{

SomeGenericType<int> instance = new SomeGenericType<int>();
};
}

These two functions are identical except for the generic type in the second one. Because the if statement might or might not be true the compiler can't optimize out the call (although it could optimize out the instance creation). My timing reveals that the second call is slower than the first even though ret is always true. This would seem to indicate that the generic type -> concrete type conversion occurs when the method is executed rather than when the instantiation occurs. If the compiler were to move the declaration above the for loop (like C use to do) then that would invalidate the results but that is not happening here. While the stack allocation will occur I don't it is as expensive as the timing would indicate.

A slight modification to the test where I create an instance of the generic type before calling either function reveals that the second function still takes longer but not as much.

So, I said all that to clarify that using generics has some overhead even if the generic type isn't necessarily used. You might or might not be able to use generics and get better performance by pre-instantiating them. I remember a discussion back when v2 was first released where someone said they heard generics were faster than concrete types. Ultimately it was clarified that nobody ever said they would execute faster but rather they would speed up coding and development. They might speed up execution if you were previously using value types with object collections but that is about it. If you are really concerned about performance then using generics is probably not a good idea. Sticking with custom classes is the better (albeit slower) approach for you. Good luck.

Michael Taylor - 10/18/07

http://p3net.mvps.org





Re: Visual C# Language Enumerating objects by type

Tergiver

It turns out I am going to roll my own collection and enumerator class anyhow as I will often have need to remove the currently enumerated object from the collection, which you can't do with one of the built-in collections/enumerators.

That's not to say that this was a waste of time however, I learned a great deal and I place a great deal of value on learning new things.

Upon further reflextion about the foreach/where clause I do see the sticky point you were alluding to. I wouldn't call it "not simple" though.

This is the form of the current foreach:

Code Block

// foreach (Ship ship in Objects) { ... }
e = GetEnumerator()
while (e.MoveNext())
{
Ship ship = e.Current
...
}

In this case, an exception occurs if an item in the collection cannot be cast to Ship.

Alternately the foreach statement could have this form:

Code Block

// foreach (Ship ship in Objects) { ... }
e = GetEnumerator()
while (e.MoveNext())
{
Ship ship = e.Current as Ship
if (ship != null)
{
...
}
}

If this were the case, we could filter out objects by type using the current syntax. But it is not the case, and we cannot break the existing functionality of the foreach statement, so to get the second form, we need new syntax. That's where the where clause comes in. We introduce a new grammer rule:

foreach '(' type identifier in expression where expression ')' block

This gives us a 3rd form:

Code Block
// foreach (Ship ship in Objects where ship.IsVisible) { ... }
e = GetEnumerator()
while (e.MoveNext())
{
Ship ship = e.Current
if (ship.IsVisible)
{
...
}
}

The sticky bit is what you brought up. This form can't get us to the 2nd form.

The solution is simple: Add a new grammer rule:

foreach '(' type identifier1 in expression where identifier1 is type ')' block

This gets us the 2nd form.





Re: Visual C# Language Enumerating objects by type

Tergiver

Blast it! Before you can throw a monkey wrench into my plans, I'll do it myself.

Although the above is a solution, it's not a very good one. It doesn't allow for complex expressions in the where clause like "where ship is Ship and ship.IsVisible".

We would need a new foreach keyword to get the 2nd form.

foreachtype '(' type identifier in expression where expression ')' block

Code Block
// foreachtype (Ship ship in Objects where expression) { ... }
e = GetEnumerator()
while (e.MoveNext())
{
Ship ship = e.Current as Ship
if (ship != null && expression)
{
...
}
}

That gets us the 2nd form. The where clause is still useful in both forms, the compiler would have to use AND in the conditional for the foreachtype statement.





Re: Visual C# Language Enumerating objects by type

Tergiver

Ok, so you were right all along. This is me wiping the egg off my face.





Re: Visual C# Language Enumerating objects by type

TaylorMichaelL

But I didn't get a chance to toss the egg yet :}

At any rate I think you can achieve your goal today using the same technique that LINQs uses. Adding a predicate to the enumerator allows for filtering on the fly. If the predicate is tightly coupled with the enumerator you'll get good performance. Alternatively you could create a decorator to wrap an existing enumerator with a predicate. In this case performance will be no better than before but it provides more reusability. Ultimately, though, your best performance will be with a hand optimized enumerator.

Michael Taylor - 10/18/07

http://p3net.mvps.org





Re: Visual C# Language Enumerating objects by type

Tergiver

Ok, I'm going to let this go, I promise. Just one last post.

I wondered why I originally thought this was so easy as I have, as I said earlier, done this in a scripting language I wrote many years ago.

So dug out that old project and discovered that I solved the problem by cheating.

Objects in this script language wear their types on their sleeves so to speak. Just as they do in C#. So the script code looks like this:

foreach (Ship ship in Objects where ship.CType == "Ship")

The language doesn't do type checking on user-defined objects. It's similar in some ways to Lua's tables. Any object member that doesn't exist has the value of null.