The simple answer to your problem is to use a one-byte-per-pixel format like D3DFMT_L8 for your texture. Unfortunately, doing a palette lookup like this in a pixel shader may not produce the results you're expecting. You'll need to disable filtering (i.e. use point sampling), otherwise your texture's index values will get interpolated before being sent to the pixel shader, and a blend of two indexes just points at some unrelated palette entry. If you want filtering you'll need to do it yourself in the shader, by looking up the colours of the neighbouring texels and blending those.
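To make the filtering problem concrete, here's a rough CPU-side sketch in Python (not shader code). The four-entry palette and the function names are made up for illustration. It just compares blending two indexes before the lookup with blending the two looked-up colours.

```python
# Hypothetical 4-entry palette of (R, G, B) floats; not from the original post.
PALETTE = [
    (1.0, 0.0, 0.0),  # index 0: red
    (0.0, 1.0, 0.0),  # index 1: green
    (0.0, 0.0, 1.0),  # index 2: blue
    (1.0, 1.0, 0.0),  # index 3: yellow
]


def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def filter_then_lookup(i0, i1, t):
    """Roughly what hardware filtering does: blend the indexes, then look up."""
    blended_index = round(lerp(i0, i1, t))
    return PALETTE[blended_index]


def lookup_then_filter(i0, i1, t):
    """Manual filtering: look up both colours first, then blend the colours."""
    c0, c1 = PALETTE[i0], PALETTE[i1]
    return tuple(lerp(a, b, t) for a, b in zip(c0, c1))


# Halfway between a red texel (index 0) and a blue texel (index 2).
wrong = filter_then_lookup(0, 2, 0.5)  # lands on index 1, which is green
right = lookup_then_filter(0, 2, 0.5)  # an actual red/blue blend
```

In a real shader, the second approach means taking several point samples of the index texture, doing a palette lookup for each, and then blending the results with weights you work out yourself.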
As for the complexity of colour-indexed textures, video card manufacturers actually dropped support for them because they were too expensive, in terms of space on the chip, to implement. You should consider whether it's possible for you to use one of the DXTn compressed texture formats (e.g. D3DFMT_DXT1) instead.
You said before you were trying to get four indexes out of a 32-bit ARGB colour; now you're saying you want to get two indexes out of a 16-bit colour. If your texture data is in fact really just 8-bit indexes, one per byte, then Format.L8 is the solution to your problem. You get one 8-bit unsigned value per 8-bit "colour", which you can then look up in a 1D texture in your pixel shader.
Also, converting your texture data to an uncompressed format wouldn't be slow using managed code, especially if you use the texture loading methods to do it for you.
You would use the tex2D() intrinsic to sample your 2D Format.L8 texture containing the colour indexes and tex1D() to look up the index in the 1D "palette" texture. Both intrinsics return a 4-element float vector representing the colour sampled. In the case of sampling from the 2D Format.L8 texture, the unsigned value in the range of 0 to 255 will be scaled to a floating-point value in the range of 0.0 to 1.0 and replicated in the red, green and blue elements of the returned colour, with alpha set to 1.0. Any one of the colour channels (e.g. red) gives you the index to pass to tex1D().
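Here's a small Python sketch of the arithmetic behind that two-step lookup, emulated on the CPU. It assumes a 256-entry palette texture sampled with point filtering. The function names are invented for the example, and the remapping to texel centres is a precaution I've added rather than something Direct3D requires.

```python
PALETTE_SIZE = 256  # assumed width of the 1D palette texture


def l8_to_float(value):
    """Emulate sampling an L8 texel: 0..255 is scaled to 0.0..1.0."""
    return value / 255.0


def palette_coord(luminance):
    """Turn the sampled value into a 1D texture coordinate.

    Using the sampled value directly puts index 255 at u = 1.0, right on
    the edge of the texture. Remapping to the centre of each texel is an
    added precaution to keep the lookup away from texel boundaries.
    """
    return (luminance * 255.0 + 0.5) / PALETTE_SIZE


def point_sample_index(u):
    """Emulate point sampling: pick the texel that contains coordinate u."""
    return min(int(u * PALETTE_SIZE), PALETTE_SIZE - 1)


def lookup(index_byte):
    """Full round trip from a stored index byte to the palette entry hit."""
    return point_sample_index(palette_coord(l8_to_float(index_byte)))


# Every stored index should land on its own palette entry.
round_trip = [lookup(v) for v in range(256)]
```

In the shader the same thing would be a tex2D() fetch, a multiply-add on one channel, and a tex1D() fetch with the result.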
Your vertex data would actually have a value from 0 to 255 for each component, assuming you're using the standard 32-bit RGBA diffuse colour values. When passed to your vertex shader the diffuse colour would be converted into a 4 element float vector with a value of 0.0 to 1.0 for each component. Your vertex shader would then presumably output the colour unmodified and after being interpolated it would be passed to your pixel shader. You would normally apply the diffuse colour by multiplying (modulating) it with the colour sampled from the texture.
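As a rough illustration of the conversion and the modulate step, here it is emulated in Python. It assumes the diffuse colour is packed the way D3DCOLOR is (0xAARRGGBB); the helper names are made up for the example.

```python
def unpack_diffuse(argb):
    """Convert a packed 32-bit diffuse colour (assumed 0xAARRGGBB) to
    four floats in 0.0..1.0, ordered (r, g, b, a) as a shader sees them."""
    a = (argb >> 24) & 0xFF
    r = (argb >> 16) & 0xFF
    g = (argb >> 8) & 0xFF
    b = argb & 0xFF
    return (r / 255.0, g / 255.0, b / 255.0, a / 255.0)


def modulate(diffuse, texel):
    """Component-wise multiply, like `diffuse * texel` in a pixel shader."""
    return tuple(d * t for d, t in zip(diffuse, texel))


diffuse = unpack_diffuse(0xFFFF0000)     # opaque red vertex colour
texel = (0.5, 0.5, 0.5, 1.0)             # mid-grey colour from the palette
shaded = modulate(diffuse, texel)        # grey tinted red
```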
BTW, D3DFMT_L8 is the unmanaged equivalent of Format.L8.