AndrewEames

Hi,

I am not a game developer (at least not yet) but I have been experimenting with Xna for a WinForms application I have been building. I've built a very nice example that uses Xna for rendering an image with an overlay plane created with GDI+ (System.Drawing). The performance is really good when I use PresentationInterval.Immediate but when I use PresentationInterval.Default, the call to GraphicsDevice.Present() can take quite a while, presumably because it is waiting for the vertical blank.

Is there a way to render during the vertical blank without my call to GraphicsDevice.Present blocking for up to a frame time?

Thanks

Andrew Eames



Re: XNA Framework non-blocking GraphicsDevice.Present

Shawn Hargreaves - MSFT

I don't think there's any easy way to do this.

It's possible you might be able to call Present from a background worker thread (and make sure you don't issue any other D3D calls from the main thread until this completes) but I haven't tried that.
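
A minimal sketch of that hand-off, untested against XNA itself. It is written in Python with stand-in functions: `present`, `do_cpu_work` and `render_next_frame` are hypothetical names, and the sleep only imitates a Present call that blocks until the vertical blank. What it illustrates is the ordering: the main thread may do non-graphics work while the worker is inside Present, but it joins the worker before issuing any further device calls.

```python
import threading
import time

events = []  # records ordering so the hand-off can be inspected


def present():
    """Stand-in for a blocking GraphicsDevice.Present call (hypothetical)."""
    events.append("present start")
    time.sleep(0.016)  # pretend to wait roughly one 60 Hz frame for vblank
    events.append("present end")


def do_cpu_work():
    """Stand-in for work that issues no graphics calls (hypothetical)."""
    events.append("cpu work")


def render_next_frame():
    """Stand-in for the next frame's device calls (hypothetical)."""
    events.append("next frame device calls")


def run_frame():
    worker = threading.Thread(target=present)
    worker.start()       # Present blocks on the worker, not on this thread
    do_cpu_work()        # the main thread stays free for non-graphics work
    worker.join()        # wait for Present before touching the device again
    render_next_frame()


run_frame()
print(events)
```

The join is the important part: it is what keeps the main thread from issuing other device calls until Present completes.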






AndrewEames

Thanks for the quick response - I just tried this and it seems to work. Do you happen to know if GraphicsDevice.Present does a busy wait or a blocking wait? If it does a busy wait, there may not be much benefit to doing this in a background thread.

I haven't done any DirectX development before and I guess I'm a bit surprised that this functionality isn't built in. If I'm going to all this effort to get my pixels onto the screen quickly, why would I want to wait around for the vertical blank? My CPU could be doing something more interesting. In the kind of applications I am building, 15 ms or so is a colossal amount of time.

Andrew






Shawn Hargreaves - MSFT

I suspect the details of the wait implementation are up to the driver.

Generally the DirectX idea is that you should just call Present, and the driver should do the right thing from there. I'm actually surprised that you are seeing this block: more commonly I'd expect the driver to just queue up the request, return immediately, and perform the operation in the background whenever it can.

I wonder if it's the windowed nature of your app that is preventing this? It seems like people most commonly use immediate mode for windowed games, or the retrace synced ones for fullscreen. I imagine the driver has quite a different code path for fullscreen buffer flipping compared to windowed blit-to-window presenting, so maybe it is less efficient about doing the wait-for-retrace in the windowed case.






AndrewEames

I just ran a quick test and it appears to busy wait in GraphicsDevice.Present.
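
The thread does not show the test that was run. One common way to tell the two kinds of wait apart is to compare the wall-clock time spent inside a call with the CPU time the process used during it. The Python sketch below shows that idea only; `blocking_wait` and `busy_wait` are hypothetical stand-ins for the call being measured, not anything from XNA.

```python
import time


def measure(wait):
    """Return (wall_seconds, cpu_seconds) spent inside wait()."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    wait()
    return time.perf_counter() - wall0, time.process_time() - cpu0


def blocking_wait(duration=0.1):
    time.sleep(duration)  # thread is descheduled; uses almost no CPU


def busy_wait(duration=0.1):
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        pass  # spins, burning CPU until the deadline passes


for name, wait in (("blocking", blocking_wait), ("busy", busy_wait)):
    wall, cpu = measure(wait)
    print(f"{name}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

If the CPU time is close to the wall time, the call is most likely spinning. If it is near zero, the thread was probably asleep for the duration.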

Am I really trying to do something that uncommon? Do people really use immediate mode in windowed games?

My XNA-based renderer is still somewhat faster than my GDI-based one, but I was really hoping to get rid of the video tearing too. It looks like the CPU cost is just too high (unless I'm doing something silly, which is quite likely since this is my first XNA app).

Andrew






AndrewEames

I just integrated my Xna based display code into my application and I was extremely disappointed with the performance I got. While my micro benchmarks seemed ok, the application as a whole performed pretty badly.

The application is essentially displaying video at 15 fps with an overlay plane - and I essentially converted it from using GDI BitBlt to using 2 Xna Texture2D objects (1 for the image and 1 for the overlay) and GraphicsDevice.Present.

Running a profiler on the application reveals that GraphicsDevice.Present and Texture2D.SetData are the culprits. Are there any obvious gotchas I should be looking for? I had assumed I would get much better performance than GDI but maybe this is not the case.

Andrew






Shawn Hargreaves - MSFT

I would expect Present to take a while, because this is the main method that kicks off the GPU and actually does the rendering.

Texture2D.SetData shouldn't be showing up that high in a well written app, though. This is a slow method, but for good performance with D3D you shouldn't be doing this inside your rendering loop. Ideally that would all happen up front while loading the app.






AndrewEames

I have an overlay image which is different for every single frame I display (computed on the fly), so I need to call Texture2D.SetData twice for every frame - once for the image and once for the overlay.

Andrew






androidi

AndrewEames wrote:

I have an overlay image which is different for every single frame I display (computed on the fly) so I need to call Texture2D.SetData twice for every frame - once for the image and once for the overlay

Andrew

That sounds almost like a video then. Unless you have some requirements that are not mentioned here, that type of thing could be done with the VMR, but mixing XNA with all that DirectShow stuff is quite likely painful. Practically, I believe it would involve creating two source filters that hold your image data and then mixing them in the video mixing renderer. If you need to bring the image into an XNA app, you'd probably need a custom allocator-presenter. I have no idea what kind of interop taxes this kind of operation would involve. There should be samples for all of these on the SourceForge site for directshow.net.






Shawn Hargreaves - MSFT

AndrewEames wrote:

I have an overlay image which is different for every single frame I display (computed on the fly) so I need to call Texture2D.SetData twice for every frame - once for the image and once for the overlay



This is starting to smell of "you're doing it the wrong way" to me. It's rarely a good idea to call SetData this often as the core part of your rendering strategy.

What exactly are you trying to achieve here? If you give some more details of the app, someone might be able to suggest a better way of rendering this.






AndrewEames

The app is essentially displaying video with overlay graphics in a WinForms window.

The images may be displayed at up to about 30fps - every frame is different. The images may either be scaled by approx 4x or not scaled at all depending on the video source - the size of the destination window is approx 500x400.

The overlay plane consists of graphics - lines (including dotted and dashed), rectangles, circles, text, semi-transparent fills. The graphics may be different for each frame - graphics are 32 bit.

I'm currently implementing this using WinForms with some custom double-buffering and some custom image scaling for performance reasons. The current performance is actually pretty good but I was looking into DirectX / Xna as maybe a better future direction to go for even better performance and to eliminate video tearing altogether.

As mentioned in a previous post, I implemented an Xna version of this app (the Xna part is essentially just composing 2 images - the video plus the overlay plane) and it's functionally ok but the performance is way worse than my current implementation. It may be that I just don't know what I am doing (after all, it's my first Xna app) or maybe Xna isn't an appropriate solution for this kind of application.

Any insight anyone has into the best technology for this kind of application is much appreciated. I haven't looked at DirectShow at all yet. It may also be that my current implementation is actually the best way of doing things too.

Thanks

Andrew






Shawn Hargreaves - MSFT

To make this sort of thing fast in DirectX, the main advice is to avoid setting dynamic data into textures. For your overlay graphics, you could look at drawing these as textured polygons using D3D calls, which would avoid the need to SetData at all. Generally you want to get to a place where each frame, you are just setting renderstates, setting textures and shaders, and calling the various Draw* methods.

That obviously doesn't work for video. To make that fast, you can look at setting the Dynamic usage flag on your textures, which means SetData will go directly into video memory, bypassing the usual D3D resource management. Because the GPU runs asynchronously to the CPU, you will get stalls if you try to SetData on a dynamic pool texture that the GPU is still trying to render from. To avoid that, make 3 or 4 of these textures and use them as a ring buffer: each frame, SetData into the texture that was last used a number of frames ago and then render with it, so you can be sure the GPU really has finished with it.

SetData is still going to be your slowest thing, so it is worth avoiding using this for your overlays, but using a number of dynamic pool textures in a cycle should get it fast enough to do the job.
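
The ring buffer idea above can be sketched independently of the graphics API. The Python below is a simulation only: `FakeTexture` and `TextureRing` are hypothetical names, not XNA types, and "drawing" just records the frame number. It shows the rotation: with a ring of N textures, each upload targets a texture that was last drawn N frames earlier, which is what gives the GPU time to finish with it.

```python
class FakeTexture:
    """Hypothetical stand-in for a dynamic texture; not an XNA type."""

    def __init__(self, index):
        self.index = index
        self.data = None
        self.last_drawn_frame = None  # frame in which the GPU last read it

    def set_data(self, pixels):
        self.data = pixels


class TextureRing:
    """Cycles through `count` textures so each upload targets the oldest one."""

    def __init__(self, count=4):
        self.textures = [FakeTexture(i) for i in range(count)]
        self.frame = 0

    def upload_and_draw(self, pixels):
        tex = self.textures[self.frame % len(self.textures)]
        # Frames elapsed since this texture was last drawn (None if unused).
        if tex.last_drawn_frame is None:
            age = None
        else:
            age = self.frame - tex.last_drawn_frame
        tex.set_data(pixels)               # upload into the oldest texture
        tex.last_drawn_frame = self.frame  # "draw" with it this frame
        self.frame += 1
        return tex.index, age


ring = TextureRing(count=4)
for n in range(6):
    index, age = ring.upload_and_draw(f"frame {n} pixels")
    print(n, index, age)
```

In a real renderer the ring size is a tuning choice. The 3 or 4 suggested above is a guess at how far ahead of the GPU the CPU may run, so it is worth measuring rather than assuming.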