From fcd758868b494411247ef94e0ea7cbac3e3ed79f Mon Sep 17 00:00:00 2001 From: Ian Romanick Date: Mon, 9 Oct 2006 18:26:03 +0000 Subject: Added dispatch.html. This new file documents Mesa's implementation of GL function dispatch. It's not quite complete, but I think there's enough there to make it useful. --- docs/contents.html | 1 + docs/dispatch.html | 274 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 275 insertions(+) create mode 100644 docs/dispatch.html (limited to 'docs') diff --git a/docs/contents.html b/docs/contents.html index 7a91aa62f0..9f09df9737 100644 --- a/docs/contents.html +++ b/docs/contents.html @@ -73,6 +73,7 @@ a:visited {
  • fbdev/DRI Environment
  • glFBDev Driver
  • Custom Development +
  • GL Dispatch Links diff --git a/docs/dispatch.html b/docs/dispatch.html new file mode 100644 index 0000000000..b9ea8822e6 --- /dev/null +++ b/docs/dispatch.html @@ -0,0 +1,274 @@ + + +GL Dispatch in Mesa + + + + +

    GL Dispatch in Mesa

    + +

    Several factors combine to make efficient dispatch of OpenGL functions +fairly complicated. This document attempts to explain some of the issues +and introduce the reader to Mesa's implementation. Readers already familiar +with the issues around GL dispatch can safely skip ahead to the overview of Mesa's implementation.

    + +

    1. Complexity of GL Dispatch

    + +

    Every GL application has at least one object called a GL context. +This object, which is an implicit parameter to ever GL function, stores all +of the GL related state for the application. Every texture, every buffer +object, every enable, and much, much more is stored in the context. Since +an application can have more than one context, the context to be used is +selected by a window-system dependent function such as +glXMakeContextCurrent.

    + +

    In environments that implement OpenGL with X-Windows using GLX, every GL +function, including the pointers returned by glXGetProcAddress, are +context independent. This means that no matter what context is +currently active, the same glVertex3fv function is used.

    + +

    This creates the first bit of dispatch complexity. An application can +have two GL contexts. One context is a direct rendering context where +function calls are routed directly to a driver loaded within the +application's address space. The other context is an indirect rendering +context where function calls are converted to GLX protocol and sent to a +server. The same glVertex3fv has to do the right thing depending +on which context is current.

    + +

    Highly optimized drivers or GLX protocol implementations may want to +change the behavior of GL functions depending on current state. For +example, glFogCoordf may operate differently depending on whether +or not fog is enabled.

    + +

    In multi-threaded environments, it is possible for each thread to have a +differnt GL context current. This means that poor old glVertex3fv +has to know which GL context is current in the thread where it is being +called.

    + + +

    2. Overview of Mesa's Implementation

    + +

    Mesa uses two per-thread pointers. The first pointer stores the address +of the context current in the thread, and the second pointer stores the +address of the dispatch table associated with that context. The +dispatch table stores pointers to functions that actually implement +specific GL functions. Each time a new context is made current in a thread, +these pointers a updated.

    + +

    The implementation of functions such as glVertex3fv becomes +conceptually simple:

    + +
      +
    • Fetch the current dispatch table pointer.
    • +
    • Fetch the pointer to the real glVertex3fv function from the +table.
    • +
    • Call the real function.
    • +
    + +

    This can be implemented in just a few lines of C code. The file +src/mesa/glapi/glapitemp.h contains code very similar to this.

    + +
    + + +
    +void glVertex3f(GLfloat x, GLfloat y, GLfloat z)
    +{
    +    const struct _glapi_table * const dispatch = GET_DISPATCH();
    +    
    +    (*dispatch->Vertex3f)(x, y, z);
    +}
    Sample dispatch function
    +
    + +

    The problem with this simple implementation is the large amount of +overhead that it adds to every GL function call.

    + +

    In a multithreaded environment, a niave implementation of +GET_DISPATCH involves a call to pthread_getspecific or a +similar function. Mesa provides a wrapper function called +_glapi_get_dispatch that is used by default.

    + +

    3. Optimizations

    + +

    A number of optimizations have been made over the years to diminish the +performance hit imposed by GL dispatch. This section describes these +optimizations. The benefits of each optimization and the situations where +each can or cannot be used are listed.

    + +

    3.1. Dual dispatch table pointers

    + +

    The vast majority of OpenGL applications use the API in a single threaded +manner. That is, the application has only one thread that makes calls into +the GL. In these cases, not only do the calls to +pthread_getspecific hurt performance, but they are completely +unnecessary! It is possible to detect this common case and avoid these +calls.

    + +

    Each time a new dispatch table is set, Mesa examines and records the ID +of the executing thread. If the same thread ID is always seen, Mesa knows +that the application is, from OpenGL's point of view, single threaded.

    + +

    As long as an application is single threaded, Mesa stores a pointer to +the dispatch table in a global variable called _glapi_Dispatch. +The pointer is also stored in a per-thread location via +pthread_setspecific. When Mesa detects that an application has +become multithreaded, NULL is stored in _glapi_Dispatch.

    + +

    Using this simple mechanism the dispatch functions can detect the +multithreaded case by comparing _glapi_Dispatch to NULL. +The resulting implementation of GET_DISPATCH is slightly more +complex, but it avoids the expensive pthread_getspecific call in +the common case.

    + +
    + + +
    +#define GET_DISPATCH() \
    +    (_glapi_Dispatch != NULL) \
    +        ? _glapi_Dispatch : pthread_getspecific(&_glapi_Dispatch_key)
    +
    Improved GET_DISPATCH Implementation
    +
    + +

    3.2. ELF TLS

    + +

    Starting with the 2.4.20 Linux kernel, each thread is allocated an area +of per-thread, global storage. Variables can be put in this area using some +extensions to GCC. By storing the dispatch table pointer in this area, the +expensive call to pthread_getspecific and the test of +_glapi_Dispatch can be avoided.

    + +

    The dispatch table pointer is stored in a new variable called +_glapi_tls_Dispatch. A new variable name is used so that a single +libGL can implement both interfaces. This allows the libGL to operate with +direct rendering drivers that use either interface. Once the pointer is +properly declared, GET_DISPACH becomes a simple variable +reference.

    + +
    + + +
    +extern __thread struct _glapi_table *_glapi_tls_Dispatch
    +    __attribute__((tls_model("initial-exec")));
    +
    +#define GET_DISPATCH() _glapi_tls_Dispatch
    +
    TLS GET_DISPATCH Implementation
    +
    + +

    Use of this path is controlled by the preprocessor define +GLX_USE_TLS. Any platform capable of using TLS should use this as +the default dispatch method.

    + +

    3.3. Assembly Language Dispatch Stubs

    + +

    Many platforms has difficulty properly optimizing the tail-call in the +dispatch stubs. Platforms like x86 that pass parameters on the stack seem +to have even more difficulty optimizing these routines. All of the dispatch +routines are very short, and it is trivial to create optimal assembly +language versions. The amount of optimization provided by using assembly +stubs varies from platform to platform and application to application. +However, by using the assembly stubs, many platforms can use an additional +space optimization (see below).

    + +

    The biggest hurdle to creating assembly stubs is handling the various +ways that the dispatch table pointer can be accessed. There are four +different methods that can be used:

    + +
      +
    1. Using _glapi_Dispatch directly in builds for non-multithreaded +environments.
    2. +
    3. Using _glapi_Dispatch and _glapi_get_dispatch in +multithreaded environments.
    4. +
    5. Using _glapi_Dispatch and pthread_getspecific in +multithreaded environments.
    6. +
    7. Using _glapi_tls_Dispatch directly in TLS enabled +multithreaded environments.
    8. +
    + +

    People wishing to implement assembly stubs for new platforms should focus +on #4 if the new platform supports TLS. Otherwise, implement #2 followed by +#3. Environments that do not support multithreading are uncommon and not +terribly relevant.

    + +

    Selection of the dispatch table pointer access method is controlled by a +few preprocessor defines.

    + + + +

    Two different techniques are used to handle the various different cases. +On x86 and SPARC, a macro called GL_STUB is used. In the preamble +of the assembly source file different implementations of the macro are +selected based on the defined preprocessor variables. The assmebly code +then consists of a series of invocations of the macros such as: + +

    + + +
    +GL_STUB(Color3fv, _gloffset_Color3fv)
    +
    SPARC Assembly Implementation of glColor3fv
    +
    + +

    The benefit of this technique is that changes to the calling pattern +(i.e., addition of a new dispatch table pointer access method) require fewer +changed lines in the assembly code.

    + +

    However, this technique can only be used on platforms where the function +implementation does not change based on the parameters passed to the +function. For example, since x86 passes all parameters on the stack, no +additional code is needed to save and restore function parameters around a +call to pthread_getspecific. Since x86-64 passes parameters in +registers, varying amounts of code needs to be inserted around the call to +pthread_getspecific to save and restore the GL function's +parameters.

    + +

    The other technique, used by platforms like x86-64 that cannot use the +first technique, is to insert #ifdef within the assembly +implementation of each function. This makes the assembly file considerably +larger (e.g., 29,332 lines for glapi_x86-64.S versus 1,155 lines for +glapi_x86.S) and causes simple changes to the function +implementation to generate many lines of diffs. Since the assmebly files +are typically generated by scripts (see below), this +isn't a significant problem.

    + +

    Once a new assembly file is created, it must be inserted in the build +system. There are two steps to this. The file must first be added to +src/mesa/sources. That gets the file built and linked. The second +step is to add the correct #ifdef magic to +src/mesa/main/dispatch.c to prevent the C version of the dispatch +functions from being built.

    + + +

    3.4. Fixed-Length Dispatch Stubs

    + +

    To implement glXGetProcAddress, Mesa stores a table that +associates function names with pointers to those functions. This table is +stored in src/mesa/glapi/glprocs.h. For different reasons on +different platforms, storing all of those pointers is inefficient. On most +platforms, including all known platforms that support TLS, we can avoid this +added overhead.

    + +

    If the assembly stubs are all the same size, the pointer need not be +stored for every function. The location of the function can instead be +calculated by multiplying the size of the dispatch stub by the offset of the +function in the table. This value is then added to the address of the first +dispatch stub.

    + +

    This path is activated by adding the correct #ifdef magic to +src/mesa/glapi/glapi.c just before glprocs.h is +included.

    + +
    +

    4. Automatic Generation of Dispatch Stubs

    + + + -- cgit v1.2.3