Plan for gfx directx


This article contains the to-do list and discussion for the gfx_directx backend. It also presents ideas for hardware-accelerated graphics operations.


Input is largely gathered from window messages. Currently, the engine asks for snapshots of the current state of each input device, so it is the task of the backend to translate the messages into a block of current state and events. The engine may also attempt to move the mouse cursor within the window. The backend should respect those requests unless they are overridden.



The WM_KEYDOWN message fires every time a key press occurs. It continues firing if that key is held down. It contains information about the actual key being pressed, including scan code, whether it's an extended key, and the virtual key it represents.

The continuous firing for the same key being held down should not generate new events for the engine, since the engine manages that anyway. The backend ignores continuous firing.
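The fields mentioned above are packed into the lParam of WM_KEYDOWN and WM_KEYUP. The following is a minimal sketch of decoding them, assuming the documented Win32 layout (bits 16-23 hold the scan code, bit 24 the extended-key flag, bit 30 the previous key state). The struct and function names are hypothetical and are not taken from the gfx_directx source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type; not part of the gfx_directx source.
struct KeyMessageInfo {
    std::uint8_t scanCode;  // hardware scan code (lParam bits 16-23)
    bool extended;          // extended-key flag (lParam bit 24)
    bool wasDown;           // previous key state (lParam bit 30)
};

// Decodes the lParam of a WM_KEYDOWN or WM_KEYUP message.
inline KeyMessageInfo decodeKeyLParam(std::uint32_t lParam) {
    KeyMessageInfo info;
    info.scanCode = static_cast<std::uint8_t>((lParam >> 16) & 0xFFu);
    info.extended = (lParam & 0x01000000u) != 0;
    info.wasDown  = (lParam & 0x40000000u) != 0;
    return info;
}

// A WM_KEYDOWN whose previous-state bit is set is an auto-repeat,
// so it can be dropped before it reaches the engine.
inline bool isAutoRepeat(std::uint32_t keyDownLParam) {
    return decodeKeyLParam(keyDownLParam).wasDown;
}
```

With a check like this, only the first WM_KEYDOWN of a held key would generate a key press event.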

The WM_KEYUP message fires every time a key release occurs. It contains information about the actual key being released, including scan code, whether it's an extended key, and the virtual key it represents. There is a bug with the SHIFT keys. It can be recreated as follows:

  • Hold down both SHIFT keys
  • Release one SHIFT key

The SHIFT key that was released first does not have a WM_KEYUP message sent to the window, because the other SHIFT key is still held down. When the other SHIFT key is released, the window receives the WM_KEYUP message for that key, but never for the one that was released first.

The workaround is to call GetAsyncKeyState(VK_LSHIFT) and GetAsyncKeyState(VK_RSHIFT) right before copying all the OHR scancode values to the engine's keyboard state buffer. The VK_SHIFT virtual key is manually toggled, as discussed in L/R Keys.

L/R Keys

The Ctrl, Alt, Win, Shift, and NumPad keys all have duplicates. Currently, they are all being checked for left vs. right state. This is done by checking the 0x1000000 bit of the message's lParam (the extended key bit), or the scan code for the Shift keys.

Using this feature, NumLock no longer needs to be enabled to differentiate between the NumPad keys and their duplicate counterparts. The NumPad 5 key is a duplicate of the CLEAR key (VK_CLEAR).
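The left/right check described above can be sketched as follows. This is only an illustration: the virtual-key values are copied from the Win32 headers so the example compiles without windows.h, 0x36 is the right SHIFT scan code on standard PC keyboards, and the function name resolveLeftRight is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Win32 virtual-key codes, repeated here so the sketch needs no <windows.h>.
constexpr int kVkShift = 0x10, kVkControl = 0x11, kVkMenu = 0x12;
constexpr int kVkLShift = 0xA0, kVkRShift = 0xA1;
constexpr int kVkLControl = 0xA2, kVkRControl = 0xA3;
constexpr int kVkLMenu = 0xA4, kVkRMenu = 0xA5;

// Scan code of the right SHIFT key; the left SHIFT key reports 0x2A.
constexpr std::uint8_t kScanRightShift = 0x36;

// Maps a generic virtual key plus the message's lParam to its left or
// right variant. Keys without a left/right pair are returned unchanged.
inline int resolveLeftRight(int virtualKey, std::uint32_t lParam) {
    const bool extended = (lParam & 0x01000000u) != 0;  // bit 24
    const std::uint8_t scan =
        static_cast<std::uint8_t>((lParam >> 16) & 0xFFu);
    switch (virtualKey) {
    case kVkShift:   return scan == kScanRightShift ? kVkRShift : kVkLShift;
    case kVkControl: return extended ? kVkRControl : kVkLControl;
    case kVkMenu:    return extended ? kVkRMenu : kVkLMenu;
    default:         return virtualKey;
    }
}
```

Shift is the odd one out because neither SHIFT key sets the extended bit, so only the scan code tells them apart.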

gfx_sdl fails to notice when both keys of a Ctrl, Alt, Shift, or Enter pair are pressed. With Enter, it updates its state after the other Enter is released. The other keys do not update their state. gfx_directx should not reproduce this behavior, but for compatibility it will fire the generic Ctrl, Alt, and Shift press only once while either of the L/R keys is held down, and the release only once both are released.

The VK_CONTROL(Ctrl), VK_MENU(Alt), and VK_SHIFT(Shift) keys are the generic virtual key codes. They are tested accordingly:

...after evaluating a key press for an L/R vk code, its generic being VK_*...

if(m_virtualKeys[ VK_* ] == 0x0)
{
   m_virtualKeys[ VK_* ] = 0x80;
   KB_CREATE_KEYPRESS(m_ohrScans[ c_vk2fb[ VK_* ] ]);
}

And then for the key release:

...after evaluating a key release for an L/R vk code, its generic being VK_*...

if(m_virtualKeys[ VK_L* ] == 0x0 && m_virtualKeys[ VK_R* ] == 0x0)
{
   m_virtualKeys[ VK_* ] = 0x0;
   KB_CREATE_KEYRELEASE(m_ohrScans[ c_vk2fb[ VK_* ] ]);
}
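To make the intended behavior concrete, here is a small self-contained model of one modifier pair. It is a sketch only: ModifierPair and KeyEvent are hypothetical names, and the event list stands in for the KB_CREATE_KEYPRESS and KB_CREATE_KEYRELEASE macros used by the backend.

```cpp
#include <cassert>
#include <vector>

// Hypothetical event record standing in for the backend's macros.
enum class KeyEvent { GenericPress, GenericRelease };

struct ModifierPair {
    bool leftDown = false;
    bool rightDown = false;
    bool genericDown = false;
    std::vector<KeyEvent> events;  // events that would go to the engine

    // Call when the left or right key of the pair is pressed.
    void onPress(bool isRight) {
        (isRight ? rightDown : leftDown) = true;
        if (!genericDown) {  // fire only for the first key of the pair
            genericDown = true;
            events.push_back(KeyEvent::GenericPress);
        }
    }

    // Call when the left or right key of the pair is released.
    void onRelease(bool isRight) {
        (isRight ? rightDown : leftDown) = false;
        if (genericDown && !leftDown && !rightDown) {  // both keys are up
            genericDown = false;
            events.push_back(KeyEvent::GenericRelease);
        }
    }
};
```

Pressing both keys and releasing them one at a time yields exactly one generic press and one generic release, which is the compatibility behavior described above.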

Toggle Keys

Toggle keys are reported as actual keyboard state, obtained with GetKeyState() right before copying the OHR scancodes to the engine's keyboard state buffer.

Leaving the Window

Keyboard state, aside from toggle keys, should be cleared when the window becomes inactive instead of maintaining its last state. This has been implemented.

System Keys

Some key combinations spawn system messages, such as Alt-Tab, Alt-F4, Alt-Space, etc. An option is available to allow those system key presses to be ignored. By default, though, the option should allow system messages.



The window receives many messages related to the mouse: WM_NCMOUSEMOVE, WM_MOUSEMOVE, WM_NC*BUTTONDOWN, WM_*BUTTONDOWN, WM_NC*BUTTONUP, WM_*BUTTONUP, WM_MOUSEWHEEL, and WM_NCHITTEST. (The '*' wildcard represents L, R, or M, for Left, Right, and Middle mouse buttons.) The NC in the message name refers to the non-client area of the window, such as the window title, close button, borders, etc.

Clipping to the Window

The mouse will clip to the window only when a mouse button is being held down or io_clipcursor() is called from the engine. The mouse should be free-able (which it currently is), as discussed in Toggling Dead/Live Input. Haven't stumbled across any bugs here.

Engine Controlling Cursor

The engine makes calls to io_setmouse(), which tries to set the position of the cursor on the client area of the OHR window. The engine does not need to know anything else about the screen or client dimensions; it simply requests that the cursor move. The bug here was fixed.

Leaving the Window

Input state should be cleared when the window loses focus. This includes dropping the clipped cursor area. No bugs here.

Showing and Hiding the Cursor

The ShowCursor() function is used to show or hide the cursor. Bug fixed here.

Toggling Dead/Live Input

The backend should allow the user to override the engine's request for the mouse cursor by disabling input to the engine. This is currently done by using the ScrollLock toggle key. This toggle has no effect in fullscreen mode. Setting the input state to 'dead' allows the cursor to be free of the window (in case io_clipcursor() was called by the engine) and prevents any mouse movement being updated.
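The rules in this section and in Clipping to the Window can be modeled as a few boolean checks. The sketch below is hypothetical: InputGate is not a type from the backend, and it assumes that ScrollLock being on is the state that means dead input, which the text does not state explicitly.

```cpp
#include <cassert>

// Hypothetical model of the dead/live input switch.
struct InputGate {
    bool fullscreen = false;       // the toggle is ignored in fullscreen mode
    bool scrollLockOn = false;     // assumption: "on" means dead input
    bool engineWantsClip = false;  // set when the engine called io_clipcursor()

    bool isDead() const { return !fullscreen && scrollLockOn; }

    // Clip only while input is live and either the engine asked for it
    // or a mouse button is being held down.
    bool shouldClipCursor(bool buttonHeld) const {
        return !isDead() && (engineWantsClip || buttonHeld);
    }

    // Mouse movement is reported to the engine only while input is live.
    bool shouldReportMouse() const { return !isDead(); }
};
```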

It has been discussed to use a different key / key combination for disabling/enabling input. ScrollLock is a good choice because it's a toggle key, and it's unlikely to be used by anybody in game implementations. Other than that, no bugs.


Joystick input is supported by DirectInput8. Any number of joystick inputs may be attached to the computer. XInput is not currently supported. The discussion here is for configuration menus available in the Options dialog. There are now discussions for planned improvements for joysticks.

Configuration Menus

There are currently no configuration menus. A menu should provide the ability to map buttons and axes to OHR engine input (which I think is 4 buttons and 2 scalars for the x and y axes). Because this is an extra configuration menu that the backend may not support, and because its contents would change dynamically for each input device (that is, in listing the available buttons and axes), it should exist as a separate menu that can be opened from the Options dialog.

Assigning Joysticks to input slots

Whatever order DirectInput finds the joysticks in, that is the order they are in for input slots. This should be configurable under a separate menu opened from the Options dialog.

Mapping Button presses to Keyboard/Mouse input

As an extension to the Configuration Menus for buttons and axes, it'd be convenient to add the option to perform macro commands for either mouse or keyboard input. But this may be a little too ambitious. (Though it sounds nice, who cares? The mouse and keyboard aren't even set up to manage anything like that.) This item will probably be scrapped before I think about it too much more.


This section discusses the graphical operations performed by gfx_directx.


The backend's current graphical role is simply to push the precomputed image to the window's client area. In gfx_directx this is done by locking a surface and writing the image to it, then copying it onto the backbuffer surface with IDirect3DDevice9::StretchRect(). But there are many scenarios where the driver could malfunction, the device could be lost, or the window's surface area could change. The backend is responsible for recovering from and adapting to these changes while remaining transparent to the engine.

Lost/Reset devices

A lost device needs to be reset using IDirect3DDevice9::Reset(). After a device is reset, all the unmanaged resources also need to be recreated, including the offscreen surface that the precomputed image is copied to. The device needs to be reset any time the resolution changes, the system is locked, a driver fails and recovers, or the system goes to sleep and then wakes.
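One common way to structure the recovery is to poll IDirect3DDevice9::TestCooperativeLevel() once per frame and branch on its result. The sketch below models only that decision. The enums and the function are hypothetical stand-ins, so the example compiles without d3d9.h, and it is not taken from the gfx_directx source.

```cpp
#include <cassert>

// Mirrors the three TestCooperativeLevel() results that matter here:
// D3D_OK, D3DERR_DEVICELOST and D3DERR_DEVICENOTRESET.
enum class CoopLevel { Ok, DeviceLost, DeviceNotReset };

enum class FrameAction {
    Render,          // device is usable, present normally
    SkipFrame,       // still lost and not yet resettable, retry later
    ReleaseAndReset  // release unmanaged resources, Reset(), recreate them
};

// Decides what the presentation code should do for the current frame.
inline FrameAction chooseFrameAction(CoopLevel level) {
    switch (level) {
    case CoopLevel::Ok:             return FrameAction::Render;
    case CoopLevel::DeviceLost:     return FrameAction::SkipFrame;
    case CoopLevel::DeviceNotReset: return FrameAction::ReleaseAndReset;
    }
    return FrameAction::SkipFrame;  // defensive default
}
```

Keeping this decision in one place lets the backend skip frames quietly while the device is unavailable, so the engine never sees the failure.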

There may be situations in which the device and resources are not appropriately recovering, including the following topic.

Disconnected monitors

Hot-swapping monitors is a situation in which the device will completely fail and become an invalid memory location. DirectX 9 does not provide a mechanism for recovering from a hot-swapped device. This would also be an issue for laptops when the lid is closed, which effectively stops the display adapter. The IDirect3D9 object can continue functioning, but there needs to be a safe mechanism for recovering from hot-swapped monitors.

Mike Caron sent an email addressing this point a while ago, but I've lost track of which email that was, and have yet to dig through the 2500 messages to solve this issue.

Aspect ratio preservation

The aspect ratio of the precomputed image is intended to be preserved, no matter the dimensions of the window. There is an option to disable this in the Options dialog. There is a bug currently exhibited by disabling ARP (aspect ratio preservation) and re-enabling it. The client area will show residual garbage in the black borders that are supposed to be cleared every presentation with IDirect3DDevice9::Clear(). I do not know why this is failing.
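For reference, the destination rectangle for an aspect-preserving stretch can be computed as in the sketch below. Rect and fitPreservingAspect are hypothetical names; the backend's own calculation may differ. Everything outside the returned rectangle is the border area that has to be cleared.

```cpp
#include <cassert>

struct Rect { int left, top, right, bottom; };

// Largest rectangle with the source aspect ratio, centered in the client area.
inline Rect fitPreservingAspect(int srcW, int srcH, int clientW, int clientH) {
    assert(srcW > 0 && srcH > 0);
    int w = clientW;
    int h = clientW * srcH / srcW;  // height if the full width is used
    if (h > clientH) {              // too tall: use the full height instead
        h = clientH;
        w = clientH * srcW / srcH;
    }
    const int x = (clientW - w) / 2;
    const int y = (clientH - h) / 2;
    return Rect{ x, y, x + w, y + h };
}
```

For a 320x200 image in an 800x600 client area this gives (0, 50)-(800, 550), leaving 50-pixel bars at the top and bottom.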

Window resizing

The window resize causes client area calculations to be re-evaluated (like mouse position and clipping rects), and changes the backbuffer dimensions appropriately. By dragging a corner, one can jump to multiples of the native resolution. By grabbing the other borders, non-native resolutions can be attained. Pressing Alt-Enter resizes the window to full screen or back to windowed mode. Haven't experienced any issues here.

Hardware support

The device will be created with either hardware or software vertex processing. More information as to what models are supported will become available, but earlier iterations of the backend have successfully functioned on DirectX 8.1 hardware.

Testing and Debug Messages

Initialization debug messages have just been implemented, but further debug messages are desired. This would require the engine to provide the backend with access to a function to write debug information to a file. Otherwise, the backend could just generate its own debug output file.

Hardware Accelerated Rendering

This is not currently exposed to the engine, though it could provide greater efficiency and more options in rendering. New interfaces would have to be created to expose these features.

Multiple Windows

Each window contains a swap chain. All the resources on the same device can be shared on all the swap chains created on that same device. This is not a presentation of multiple-monitor support (DirectX 9 is too hard to add support for that!), but at least multiple windows.

A window can be described as:

  • Title
  • Menu bar
  • System buttons (minimize, restore, close, system menu)
  • Non-client area (entire window area)
  • Client area (inside window area)
  • Message processor

The engine will create windows with certain properties. Even if only the title, system buttons, non-client area, and client area were being handled, certain messages would still have to be processed by the engine. These include:

  • Window activation
  • Window resizing
  • Window closing
  • Window creation
  • Window destruction

The engine can also control more aspects about the windows, such as:

  • Switching between active windows
  • Changing the title
  • Changing the size and position of the window

The client area of a window will usually be used for rendering specialized surfaces called swap chains. A swap chain is uniquely differentiated from regular surfaces. A swap chain contains a collection of surfaces that rotate in presentation. Usually, a front buffer and a single back buffer are all that's required for smooth presentation.

Swap chains must also be tied to a window. They are render targets, but cannot be used as input resources. For example, they cannot be used as input in another window's swap chain.

So when the engine is going to create a window, it will need to provide appropriate callback procedures for each event it processes. Window references created can be treated as the swap chain when dealing with the rendering algorithms, discussed in the interfaces section.


A structure describing a surface would need to contain at least the following information:

  • Identifier
  • Dimensions
    • width
    • height
  • Usage, can be:
    • immutable
    • default
    • dynamic
  • Format, can be:
    • 32bit a8r8g8b8
    • 16bit a1r5g5b5
    • 8bit palette (for back-compat)
    • 128bit floating point a32r32g32b32
    • 64bit floating point a16r16g16b16
    • 32bit floating point r32
    • 16bit floating point r16

Regarding dimensions, the engine should not be aware of the actual width, height, and pitch of the surface. For example, on certain graphics cards, the surface must be a power of 2. On some cards, the surface must be square. The backend will abstract the actual dimensions from the engine, leaving only the "requested" width and height as information.
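As an illustration of that abstraction, the backend could derive the actual allocation size from the requested size as sketched below. The two boolean flags are hypothetical; a real backend would presumably derive them from the device capabilities (Direct3D 9 reports such restrictions through D3DPTEXTURECAPS_POW2 and D3DPTEXTURECAPS_SQUAREONLY in D3DCAPS9).

```cpp
#include <cassert>

struct SurfaceSize { int width, height; };

// Smallest power of two that is greater than or equal to value.
inline int nextPowerOfTwo(int value) {
    assert(value > 0);
    int result = 1;
    while (result < value) result *= 2;
    return result;
}

// Computes the size actually allocated for a requested surface size.
inline SurfaceSize actualSurfaceSize(SurfaceSize requested,
                                     bool needPow2, bool needSquare) {
    SurfaceSize actual = requested;
    if (needPow2) {
        actual.width  = nextPowerOfTwo(actual.width);
        actual.height = nextPowerOfTwo(actual.height);
    }
    if (needSquare) {  // grow the shorter side to match the longer one
        const int side =
            actual.width > actual.height ? actual.width : actual.height;
        actual.width = actual.height = side;
    }
    return actual;
}
```

The engine would keep working with the requested 320x200, while the backend stores the padded size internally, for example 512x256, or 512x512 on square-only hardware.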

For usage, an immutable surface is created once and never changed. A default surface sits in video memory and can be updated from system memory. A dynamic surface is used as a render target.

Regarding formats, most hardware natively supports the 32bit a8r8g8b8 type. The 8bit palette type is for any surface that will use the engine's palette index. Because of the engine's current design, all surfaces are technically paletted. There is a potential problem with handling walkabout/etc. graphics that use secondary palettes mapping into the primary palette. For those cases (which include any object drawn with secondary palettes), the surface should be the default usage type. This would account for the user rapidly changing the secondary palette either in plotscript or in editing. Any surface that directly uses one of the master palettes can be immutable if there's no need to edit it. The floating point surface types are less likely to be supported on older hardware. Floating point surfaces are useful for special effects, such as bloom and HDR rendering.


Any surface can contain an alpha value which can be used to blend a source and destination pixel color. There are many ways to blend a source alpha surface with a destination alpha surface (too many ways to discuss right this moment.) The most popular blending equation is:

(srcColor x srcAlpha) + (destColor x (1-srcAlpha))

For this to look right, transparent surfaces must be drawn in back-to-front order. The engine is responsible for drawing order, so it must take this ordering into account.
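The blending equation above, applied to 8-bit channels of a8r8g8b8 pixels, might look like the following sketch. The helper names are hypothetical, alpha is scaled to 0..255 instead of 0..1, and writing the result as fully opaque is an assumption of this example.

```cpp
#include <cassert>
#include <cstdint>

// Extracts one 8-bit channel from an a8r8g8b8 pixel.
inline unsigned channel(std::uint32_t pixel, int shift) {
    return (pixel >> shift) & 0xFFu;
}

// (src * alpha) + (dest * (1 - alpha)), with alpha in 0..255, rounded.
inline unsigned blendChannel(unsigned src, unsigned dest, unsigned alpha) {
    assert(src <= 255u && dest <= 255u && alpha <= 255u);
    return (src * alpha + dest * (255u - alpha) + 127u) / 255u;
}

// Blends a source pixel over a destination pixel using the source alpha.
// The result is written fully opaque (an assumption of this sketch).
inline std::uint32_t blendPixel(std::uint32_t src, std::uint32_t dest) {
    const unsigned a = channel(src, 24);
    const unsigned r = blendChannel(channel(src, 16), channel(dest, 16), a);
    const unsigned g = blendChannel(channel(src, 8),  channel(dest, 8),  a);
    const unsigned b = blendChannel(channel(src, 0),  channel(dest, 0),  a);
    return 0xFF000000u | (r << 16) | (g << 8) | b;
}
```

Because the result depends on what is already in the destination, blending the same surfaces in a different order produces a different image, which is why the drawing order matters.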

Sketched interfaces

The following possible interfaces are discussed:

gfx_WindowCreate( szTitle, width, height, callbacks, pWindowOut )
gfx_WindowDestroy( pWindowIn )
gfx_WindowSetTitle( szTitle, pWindowDest )
gfx_WindowGetTitle( pWindowIn, szName, numCharacters )
gfx_WindowSetSize( width, height, pWindowDest )
gfx_WindowSetPosition( x, y, pWindowDest )
gfx_WindowGetRect( pWindowIn, pRect )
gfx_WindowGetClientRect( pWindowIn, pRect )
gfx_WindowCenter( pWindowDest )
gfx_WindowSetActive( pWindowIn )
gfx_WindowGetActive( pWindowOut )
gfx_WindowSetFullscreenMode( pWindow )
gfx_WindowSetWindowedMode( pWindowActivate )
gfx_WindowGetSwapChain( pWindowIn, pSurfaceOut )

gfx_RenderPresent( pWindow )
gfx_RenderSetTarget( pSurface )
gfx_RenderGetTarget( pSurfaceOut )

gfx_SurfaceCreate( width, height, usage, format, pInitialData, pSurfaceOut )
gfx_SurfaceCreateFromFile( szFileName, usage, format, pSurfaceOut )
gfx_SurfaceDestroy( pSurfaceIn )
gfx_SurfaceUpdate( pInputData, pSurfaceDest )
gfx_SurfaceFill( color, pSurfaceDest )
gfx_SurfaceStretch( pRectSrc, pSurfaceSrc, pRectDest, pSurfaceDest )
gfx_SurfaceDraw( pSurfaceSrc, pCornersVec3, argbModifier, pRectSrc )
gfx_SurfaceGetCopy( pSurface, pRectSrc, pOutputData )
gfx_SurfaceSaveToFile( szFileName, format, pSurfaceIn, pRectSrc )


The engine will perform all the transformations on the corners of the objects (vertices). The transformed vertices must be in client coordinates, which defines the upper-left corner of the client as (0, 0), and the lower-right corner as (width-1, height-1).

Two transforms are needed for older games restricted to 320x200 resolution:

  • From local to (0,0)-(319,199) coordinate system
  • From (0,0)-(319,199) coordinate system to client size (0, 0)-(width, height)

Only one transform is needed for future games:

  • From local to client size (0,0)-(width, height)

If represented in a matrix, the first transform is calculated as (which applies to both local-to-old coordinates and local-to-client coordinates):

| cos(theta)*xScale       sin(theta)        0     |
|     -sin(theta)      cos(theta)*yScale    0     |
|    xPosition            yPosition         1     |

where theta is the angle of rotation, xScale and yScale are the scaling factors in the x and y direction, xPosition and yPosition are the position of the object either from (0,0)-(319,199) or (0,0)-(width-1, height-1).

The second transform specific to the old games is calculated as (no integer division):

| width / 320       0            0     |
|     0         height / 200     0     |
|     0             0            1     |

They can be multiplied together, and the product then used as the transformation matrix for the vertices. If A is the first transform, and B is the second, the resulting matrix C is:

C = A * B

Then each of the corners of the object can be transformed as (in C++):

for(int i = 0; i < 4; i++)
  cornerC[i] = cornerL[i] * C;

where cornerC holds the client coordinates, and cornerL holds the local coordinates. Because the translation sits in the third row of the matrix, each corner is treated as a row vector, and the vector-matrix operation would be calculated as:

output.x = local.x * C._11 + local.y * C._21 + local.w * C._31
output.y = local.x * C._12 + local.y * C._22 + local.w * C._32
output.w = local.x * C._13 + local.y * C._23 + local.w * C._33

Each corner of the objects that will be transformed must contain 3 values:

  • X-coordinate
  • Y-coordinate
  • 1

The third value should always be 1, specifying a position vector. If it were 0, the vector would be interpreted as a normal vector, which in this case wouldn't be helpful.
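The effect of the third component can be checked with a small stand-alone example. Vec3, Mat3, and transform are hypothetical names used only here. The multiplication treats the corner as a row vector, so the translation in the third row of the matrix is scaled by w.

```cpp
#include <cassert>

struct Vec3 { float x, y, w; };
struct Mat3 { float m[3][3]; };  // row-major; translation is in the third row

// Row vector times matrix, matching cornerC[i] = cornerL[i] * C.
inline Vec3 transform(const Vec3& v, const Mat3& c) {
    return Vec3{
        v.x * c.m[0][0] + v.y * c.m[1][0] + v.w * c.m[2][0],
        v.x * c.m[0][1] + v.y * c.m[1][1] + v.w * c.m[2][1],
        v.x * c.m[0][2] + v.y * c.m[1][2] + v.w * c.m[2][2]
    };
}
```

With a pure translation by (10, 20), the vector (3, 4, 1) becomes (13, 24, 1), while (3, 4, 0) stays (3, 4, 0) because the translation row is multiplied by zero.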

For the transforms to appear correctly, the local corners must be centered on the origin. It must be clear the upper-left corner should not be used as the origin, or rotations will be a mess. This can easily be calculated as:

cornerVec3[0].x = -objectWidth/2;
cornerVec3[0].y = -objectHeight/2;
cornerVec3[0].w = 1;
cornerVec3[1].x = objectWidth/2;
cornerVec3[1].y = -objectHeight/2;
cornerVec3[1].w = 1;
cornerVec3[2].x = -objectWidth/2;
cornerVec3[2].y = objectHeight/2;
cornerVec3[2].w = 1;
cornerVec3[3].x = objectWidth/2;
cornerVec3[3].y = objectHeight/2;
cornerVec3[3].w = 1;

All of this functionality could be delegated to separate procedures, such as (in C++):

void matrixLocalTransform( float3x3* pMatrixOut, float angle, const float2& scale, const float2& position );
void matrixOldClientTransform( float3x3* pMatrixOut, float clientWidth, float clientHeight );
void matrixMultiply( float3x3* pMatrixOut, const float3x3& A, const float3x3& B );
void vec3Transform( float3* pVec3ArrayOut, int destSize, const float3* pVec3ArrayIn, int srcSize, const float3x3& transformMatrix );
void vec3GenerateCorners( float3* pVecArrayOut, int destSize, const RECT& surfaceRect );

where float3 is a vector struct of 3 floats, float2 is a vector struct of 2 floats, float3x3 is a matrix struct of 9 floats, and RECT is a rectangle struct of 4 integers.

Here's a possible implementation:

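#include <cmath>    // cos, sin
#include <cstring>  // memset

struct float2 {
   float x, y;
};
struct float3 {
   float x, y, w;
};
struct float3x3 {
   float _11, _12, _13,
         _21, _22, _23,
         _31, _32, _33;
};
struct RECT { // omit this definition if <windows.h> is included
   long left, top, right, bottom;
};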

void matrixLocalTransform( float3x3* pMatrixOut, float angle, const float2& scale, const float2& position )
{
   if( pMatrixOut == NULL )
      return;
   memset( pMatrixOut, 0, sizeof(float3x3) ); // elements not set below stay 0

   pMatrixOut->_11 = cos(angle) * scale.x;
   pMatrixOut->_12 = sin(angle);
   pMatrixOut->_21 = -sin(angle);
   pMatrixOut->_22 = cos(angle) * scale.y;
   pMatrixOut->_31 = position.x;
   pMatrixOut->_32 = position.y;
   pMatrixOut->_33 = 1.0f;
}

void matrixOldClientTransform( float3x3* pMatrixOut, float clientWidth, float clientHeight )
{
   if( pMatrixOut == NULL )
      return;
   memset( pMatrixOut, 0, sizeof(float3x3) ); // elements not set below stay 0

   pMatrixOut->_11 = clientWidth / 320.0f;
   pMatrixOut->_22 = clientHeight / 200.0f;
   pMatrixOut->_33 = 1.0f;
}

void matrixMultiply( float3x3* pMatrixOut, const float3x3& A, const float3x3& B )
{
   // pMatrixOut must not refer to A or B, because it is zeroed first
   if( pMatrixOut == NULL )
      return;
   memset( pMatrixOut, 0, sizeof(float3x3) );

   pMatrixOut->_11 = A._11 * B._11 + A._12 * B._21 + A._13 * B._31;
   pMatrixOut->_12 = A._11 * B._12 + A._12 * B._22 + A._13 * B._32;
   pMatrixOut->_13 = A._11 * B._13 + A._12 * B._23 + A._13 * B._33;

   pMatrixOut->_21 = A._21 * B._11 + A._22 * B._21 + A._23 * B._31;
   pMatrixOut->_22 = A._21 * B._12 + A._22 * B._22 + A._23 * B._32;
   pMatrixOut->_23 = A._21 * B._13 + A._22 * B._23 + A._23 * B._33;

   pMatrixOut->_31 = A._31 * B._11 + A._32 * B._21 + A._33 * B._31;
   pMatrixOut->_32 = A._31 * B._12 + A._32 * B._22 + A._33 * B._32;
   pMatrixOut->_33 = A._31 * B._13 + A._32 * B._23 + A._33 * B._33;
}

void vec3Transform( float3* pVec3ArrayOut, int destSize, const float3* pVec3ArrayIn, int srcSize, const float3x3& transformMatrix )
{
   // the output array must not overlap the input array, because it is zeroed first
   if( pVec3ArrayOut == NULL || pVec3ArrayIn == NULL )
      return;
   memset( pVec3ArrayOut, 0, sizeof(float3) * destSize );

   // transform only as many vectors as both arrays can hold
   const int maxCount = (srcSize < destSize) ? srcSize : destSize;
   for(int i = 0; i < maxCount; i++)
   {
      pVec3ArrayOut[i].x = pVec3ArrayIn[i].x * transformMatrix._11 + pVec3ArrayIn[i].y * transformMatrix._21 + pVec3ArrayIn[i].w * transformMatrix._31;
      pVec3ArrayOut[i].y = pVec3ArrayIn[i].x * transformMatrix._12 + pVec3ArrayIn[i].y * transformMatrix._22 + pVec3ArrayIn[i].w * transformMatrix._32;
      pVec3ArrayOut[i].w = pVec3ArrayIn[i].x * transformMatrix._13 + pVec3ArrayIn[i].y * transformMatrix._23 + pVec3ArrayIn[i].w * transformMatrix._33;
   }
}

void vec3GenerateCorners( float3* pVecArrayOut, int destSize, const RECT& surfaceRect )
{
   if( pVecArrayOut == NULL || destSize < 4 )
      return;
   memset( pVecArrayOut, 0, sizeof(float3) * destSize );

   int width = surfaceRect.right - surfaceRect.left;
   int height = surfaceRect.bottom - surfaceRect.top;

   // upper-left corner
   pVecArrayOut[0].x = (float)-width / 2.0f;
   pVecArrayOut[0].y = (float)-height / 2.0f;
   pVecArrayOut[0].w = 1.0f;

   // upper-right corner
   pVecArrayOut[1].x = (float)width / 2.0f;
   pVecArrayOut[1].y = (float)-height / 2.0f;
   pVecArrayOut[1].w = 1.0f;

   // lower-left corner
   pVecArrayOut[2].x = (float)-width / 2.0f;
   pVecArrayOut[2].y = (float)height / 2.0f;
   pVecArrayOut[2].w = 1.0f;

   // lower-right corner
   pVecArrayOut[3].x = (float)width / 2.0f;
   pVecArrayOut[3].y = (float)height / 2.0f;
   pVecArrayOut[3].w = 1.0f;
}

Configuration File