wiki:FeaturesMultiThreading

Multi-Threading

Multi-threading (MT) is a big topic in software development, especially since processor manufacturers have hit the clock speed wall and are focusing on more parallel processing units per chip to gain more performance. MT has been used for many applications for a long time, but scenegraphs pose some special challenges due to their size and nature. Different scenegraph system have tried to address this in different ways.

Partial Solutions

The easiest is to ignore it. Many systems chose to do that to avoid the complexity, and it is an acceptable solution for simple applications that are not limited by scenegraph processing or drawing.

The second option is to develop special cases and copy the data. Performer was the pioneering system for mutli-threading in scenegraphs, and it used the method of transforming the scenegraph into a flat display list that just contained the commands to send to the graphics hardware in a separate thread. For culling and intersection Performer also introduced a more general, multi-buffered solution (see below).

Given that the main problem in multi-threading is mixing reading and writing, another solution is to disallow writing for certain operations that need to be handled in parallel. A typical operation that needs limited writing is the rendering of the scene. Some systems, most prominently OpenSceneGraph, use this approach. It allows a limited multi-threading where multiple graphics cards can be fed in parallel. This can be very beneficial for an application that is limited by the drawing speed (e.g. a very small number of high-polygon count nodes), but it doesn't help at all for scenes where culling is the limiting factor, where only one graphics card is available, or for asynchronous processing.

An alternative to avoid reading while writing is to use locking (aka mutual exclusion) on a Node-by-Node basis (as demonstrated by OpenRM. The problem is that it doesn't ensure coordination between nodes, so for processes running asynchronously partial results will be visible (e.g. only the first few nodes of a simulation have moved). Just for rendering that might be annoying but acceptable, but for other operations like simulations it can make the results useless.

But to really allow multiple threads to access and work with the scenegraph, each one of them needs to have a separate copy of the graph. In the worst case it has to be possible for one thread to completely destroy and rebuild the graph from scratch without interfering with any of the other threads. Of course the brute-force implementation that actually makes a copy for each thread is infeasible, as state-of-the-art application tax the memory resources as is, for just a single copy of the scene.

Another aspect of multi-threading is application development. If developing an MT application is significantly more complicated than developing a single-threaded one developers will be compelled not to do it. Adding threading to an application should be as invisible as possible, and synchronisation as well as data transfer between threads should be simple.

In addition to all of that, it also has to be efficient. There should be minimal overhead compared to a single-threaded system, so complicated and expensive synchronisation mechanisms like locks and mutexes need to be avoided as far as possible. Also the actual data access needs to be efficient, no matter in which thread the application is or how data is transfered between threads.

OpenSG's approach

So how do you make a system that pretends to have completely independent multiple buffers of the complete scenegraph, allows concurrent asynchronous access of them and hides the complexity of the implementation from the user and from the

processor? The ideas employed on OpenSG are explained in the EGPGV 2002 paper, a summary is given below.

The core idea is sharing what makes sense, to reduce the memeory overhead, and replicating the rest. For a scenegraph there is a wide split between the data that is kept in vectors (or vector-like structures), like vertex positions, normals, colors, textures coordinates etc. and image data like textures, and the data that is only scalar like the color of a material or the transformation of a Transform node. For the most common scenes, the relationship (in size) between these data types is around 10:1, i.e. there is 10 times as much vector than scalar data. As a consequence OpenSG shares the vector data between threads and replicates the scalar data. This reduces the memory overhead of multi-buffering to a fraction of the scene size.

But doesn't the sharing introduce the possibility for threads to influence each others data? To prevent that OpenSG uses a copy on write strategy. Whenever a thread starts changing a shared vector, a private copy is created, and that one is changed. This way changed data is private to a thread (or rather aspect, see below), and all other threads are not influenced. A positive side-effect of this approach is that the thread that actually does the change pays the price for making the copy, all other threads just keep on working without any need for locking or synchronisation.

Even with the sharing optimization there is some overhead in terms of memory. To further reduce this overhead OpenSG separates the threads from the buffers, so that a buffer can be used by an arbitrary number of threads. As long as the application can guarantee that the threads don't interfere, there is zero memory overhead for a new thread. If the application cannot guarantee that, it can always add a new buffer (called aspect in OpenSG parlance) and use it safely.

So now the data is nicely separated. But at some point the threads will need to get synchronized, so that they all work on a consistent data set. This cannot be done automatically, as the system cannot know when the application has all the data in a consistent state. So the application needs to do it explicitly, and it has to be possible to do this asynchronously, as the refresh rates on VR applications vary widely. haptic rendering needs to run very fast, in the KHz range. Graphical rendering is usually between 5 and 50 Hz, while physical simulation can go down to a few Hz. All of these tasks need consistent data, and need to be updated regularly and efficiently about changes to the scene.

To keep the data transfer at synchronization time to a minimum, OpenSG keeps track of all changes that have been made to the scenegraph in a given thread, and only transfers the changed data from one aspect to the other. This change list avoids the need to traverse the whole graph on synchronization, as in most cases only small parts of the graph are changed in any given thread. This is also the core mechanism behind the clustering support in OpenSG.

One problem with this apporach is that the system needs to know when the application is about to start changing things. It needs to know before changes are done, so that the private copy can be made. In OpenSG 1 the application has to do this manually using special calls, in OpenSG 2 that is handled behind the scenes. The maintenance of the change list is also handled here.

The final aspect (no pun intended) of this multi-buffer approach is how to give the user access to it.

In OpenSG 1 we use private pointer types that are used to hide the fact that there are multiple copies of the data. No matter in which thread the code runs, dereferencing the private pointer automatically returns the appropriate pointer for this thread. The computational cost of this dereferencing is small (1 multiplication and 1 addition). However, it does add complication to the code as a separate, parallel class hierarchy of pointer types needs to be maintained to allow type casting and other, pointer-style operations. Therefore OpenSG 2 is currently experimenting with an alternative approach that uses normal C pointers and some template helper classes to make the multi-thread safety more explicit.

Conclusion

Multi-threading is important, but difficult to solve in general. There are a number of partial solutions to allow asynchronous, multi-thread applications, but to really solve it a virtual multi-buffered approach as the one implemented in OpenSG seems to be the most general and most promising solution at a surprisingly small cost in terms of memory and processing power.

Last modified 7 years ago Last modified on 01/17/10 01:11:44