The ZMan

None of the DX framework code in that tutorial is relevant to the picking example.

The code does exactly what Etienne explained, but I will try explaining it a different way.

When you draw 3D objects you set a world, view and projection matrix. These three matrices determine how an object (which has its own coordinate system, usually with 0,0,0 at its center) gets drawn on the screen.

An object starts in object space (0,0,0 is usually at its center).

The GPU multiplies it by the world matrix to place it in the world: world space.

The GPU multiplies this by the view matrix to position the world so that it looks like it is being viewed from a camera: view or camera space.

The GPU multiplies by the projection matrix to 'flatten' the 3D world into 2D space (plus depth) so that it can be drawn on the screen: screen or clip space.

Note that screen space runs from -1 to 1 in both x and y. These are not PIXEL coordinates.
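To make the chain concrete, here is a minimal sketch in plain Python that pushes one point through the three matrices by hand. It is not the tutorial's code. I am assuming the Direct3D row-vector convention (vector times matrix), a left-handed perspective matrix, and a made-up scene: an object sitting at x=1 in the world and a camera at z=-5 looking down +z. The helper names are mine.

```python
import math

def translation(tx, ty, tz):
    """Row-vector translation matrix (the offsets sit in the last row)."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [tx, ty, tz, 1.0]]

def perspective_lh(fov_y, aspect, z_near, z_far):
    """Left-handed perspective projection, depth mapped to 0..1 (assumed convention)."""
    y_scale = 1.0 / math.tan(fov_y / 2.0)
    x_scale = y_scale / aspect
    q = z_far / (z_far - z_near)
    return [[x_scale, 0.0, 0.0, 0.0],
            [0.0, y_scale, 0.0, 0.0],
            [0.0, 0.0, q, 1.0],
            [0.0, 0.0, -z_near * q, 0.0]]

def mul_vec(v, m):
    """Multiply a 4-component row vector by a 4x4 matrix."""
    return [sum(v[k] * m[k][j] for k in range(4)) for j in range(4)]

# Hypothetical scene: object placed at x=1, camera at z=-5 looking down +z.
world = translation(1.0, 0.0, 0.0)
view = translation(0.0, 0.0, 5.0)   # moving the camera back = moving the world forward
proj = perspective_lh(math.pi / 2.0, 1.0, 1.0, 100.0)

p_object = [0.0, 0.0, 0.0, 1.0]      # the object's own origin
p_world = mul_vec(p_object, world)   # object space -> world space
p_view = mul_vec(p_world, view)      # world space  -> view (camera) space
p_clip = mul_vec(p_view, proj)       # view space   -> clip space
# Dividing by w gives the -1..1 screen-space coordinates (plus depth).
p_screen = [c / p_clip[3] for c in p_clip[:3]]
```

With these numbers the object's center lands a little to the right of the middle of the screen (x is about 0.2, y is 0), which is what you would expect for something one unit to the right and five units in front of the camera.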

So to determine whether you have clicked on an object, you start with PIXEL coordinates and convert them to screen space. Remember that this is a 3D world, so your 2D click represents a 3D ray going directly into the screen. So construct a ray that goes from the near plane to the far plane. Yes, this means you could be clicking on more than one thing along that ray.
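The pixel conversion is just a rescale, roughly like the sketch below. The function names are mine, and I am assuming pixel (0,0) is the top-left corner of the window and that depth runs from 0 at the near plane to 1 at the far plane, as it does in Direct3D.

```python
def pixel_to_screen(px, py, width, height):
    """Map pixel coordinates to the -1..1 screen-space range."""
    x = 2.0 * px / width - 1.0
    y = 1.0 - 2.0 * py / height   # pixel y grows downward, screen-space y grows upward
    return x, y

def click_ray_endpoints(px, py, width, height):
    """Return the click as two screen-space points: one on the near plane, one on the far plane."""
    x, y = pixel_to_screen(px, py, width, height)
    near_point = (x, y, 0.0)   # depth 0 = near plane (assumed 0..1 depth range)
    far_point = (x, y, 1.0)    # depth 1 = far plane
    return near_point, far_point
```

For an 800x600 window, clicking pixel (400, 300) gives screen-space (0, 0), the middle of the screen, and the ray runs from (0, 0, 0) to (0, 0, 1).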

'Divide' the ray by the projection matrix and then the view matrix (that is, multiply by their inverses), and then, for each object in your scene, do the same with that object's world matrix. Now the ray is in object space and you can compare it to the actual object.
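Here is a rough sketch of that step in plain Python, again not the tutorial's code. Assumptions: row-vector matrices as in Direct3D, a left-handed perspective matrix with 0..1 depth, and a bounding sphere centered on the object's origin standing in for "the actual object" (a real mesh would need a per-triangle test). Inverting world*view*proj in one go is the same as applying the inverse projection, inverse view and inverse world one after another. All names and the scene are hypothetical.

```python
import math

def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [tx, ty, tz, 1.0]]

def perspective_lh(fov_y, aspect, z_near, z_far):
    y_scale = 1.0 / math.tan(fov_y / 2.0)
    q = z_far / (z_far - z_near)
    return [[y_scale / aspect, 0.0, 0.0, 0.0], [0.0, y_scale, 0.0, 0.0],
            [0.0, 0.0, q, 1.0], [0.0, 0.0, -z_near * q, 0.0]]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert(m):
    """Invert a 4x4 matrix with Gauss-Jordan elimination and partial pivoting."""
    aug = [list(m[i]) + [1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for col in range(4):
        pivot = max(range(col, 4), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(4):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[4:] for row in aug]

def transform_point(p, m):
    """Transform a 3D point by a 4x4 matrix (row vector) and divide by w."""
    v = [p[0], p[1], p[2], 1.0]
    out = [sum(v[k] * m[k][j] for k in range(4)) for j in range(4)]
    return [out[0] / out[3], out[1] / out[3], out[2] / out[3]]

def ray_hits_sphere(origin, target, radius):
    """Distance along the ray to a sphere centered at 0,0,0, or None on a miss."""
    direction = [t - o for o, t in zip(origin, target)]
    length = math.sqrt(sum(d * d for d in direction))
    direction = [d / length for d in direction]
    b = sum(o * d for o, d in zip(origin, direction))
    c = sum(o * o for o in origin) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    t = -b - math.sqrt(disc)
    if t < 0.0:
        t = -b + math.sqrt(disc)   # ray starts inside the sphere
    return t if t >= 0.0 else None

def pick_sphere(screen_x, screen_y, world, view, proj, radius):
    """Bring the click ray into this object's space and test its bounding sphere."""
    to_object = invert(mat_mul(mat_mul(world, view), proj))
    near_obj = transform_point((screen_x, screen_y, 0.0), to_object)
    far_obj = transform_point((screen_x, screen_y, 1.0), to_object)
    return ray_hits_sphere(near_obj, far_obj, radius)

# Hypothetical scene: unit sphere at x=1 in the world, camera at z=-5 looking down +z.
world = translation(1.0, 0.0, 0.0)
view = translation(0.0, 0.0, 5.0)
proj = perspective_lh(math.pi / 2.0, 1.0, 1.0, 100.0)

hit = pick_sphere(0.2, 0.0, world, view, proj, 1.0)    # click over the sphere's center
miss = pick_sphere(-0.5, 0.0, world, view, proj, 1.0)  # click well to the left of it
```

One caveat: the distance comes back in object-space units, measured from the near plane. That only compares fairly between objects if their world matrices contain no scaling.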

If more than one collision is found, sort by distance from the camera, since you usually want the front one.
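The sorting part is short. A sketch, assuming you have already collected a (name, distance) pair per object, with None as the distance for anything the ray missed. The object names and distances are made up.

```python
def sort_hits(results):
    """Drop the misses and order the remaining hits nearest-first."""
    hits = [(name, dist) for name, dist in results if dist is not None]
    return sorted(hits, key=lambda pair: pair[1])

# Hypothetical results from testing the click ray against three objects.
results = [("tree", 42.0), ("rock", None), ("crate", 7.5)]
ordered = sort_hits(results)
picked = ordered[0][0] if ordered else None   # the front-most object, if any
```

Here the crate is picked because it is closest to the camera. The tree is still in the list if you ever want everything under the cursor.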

You should get a good book on game math or graphics to explain all of these concepts. It's only going to get harder from here, and if you can't follow a simple example like that you will have problems.