
Clustering

With the advent of cheap COTS PCs with powerful graphics cards, using clusters of them for graphics has become a viable and affordable approach. Typical applications include driving small or large tiled displays, or combining the power of the cluster to render very large scenes. These ideas have been around for a while, but implementing a cluster-aware application is normally much more work than implementing a standalone one.

Problem

The main problem is that anything the application does on one machine needs to be reflected on all other machines, so that each of them can produce its own share of the consolidated image, either as a part of the image or as a part of the scene. This means that all running instances need to be synchronized.

There are a number of different ways of doing that, with different tradeoffs between code complexity and impact, network bandwidth and performance. A more detailed description can be found in this EGPGV 2002 paper. The following is a quick summary.

Solutions

The lowest-bandwidth solution, which also has a high impact on application code, is employed by VRJuggler. It is based on the idea of distributing the user's input over the network and running the same application with slightly different viewing parameters on each node. In addition to the inputs, time and random number generators need to be synchronized to keep the nodes consistent. This uses very little bandwidth, but the application needs to be written against the input abstraction layer, and it has to be completely deterministic. Because each node runs the same application, every node needs enough power to run the full application, and any external resources that are accessed, such as file servers, have to sustain parallel accesses, which can lead to performance bottlenecks. But for applications that use VRJuggler anyway it is a good, efficient solution.
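
The determinism requirement can be illustrated with a minimal C++ sketch. It does not use VRJuggler's API; InputEvent, AppState and simulate are hypothetical names invented for this example. The point is only that the result depends on nothing except the shared seed, the shared event list and the frame count.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical input event, as it might be broadcast to every node.
struct InputEvent {
    std::uint32_t frame;   // frame in which the event takes effect
    std::int32_t  dx, dy;  // e.g. a relative pointer motion
};

// Hypothetical application state; every node holds its own copy.
struct AppState {
    std::int64_t  x = 0, y = 0;
    std::uint64_t noise = 0;   // stands in for anything driven by random numbers
};

// Runs the application logic for `frames` frames. The result depends only on
// the arguments, never on the local wall clock or on local random state.
AppState simulate(std::uint32_t seed,
                  const std::vector<InputEvent>& events,
                  std::uint32_t frames)
{
    std::mt19937 rng(seed);            // same seed on every node
    AppState s;
    for (std::uint32_t f = 0; f < frames; ++f) {
        for (const InputEvent& e : events) {
            if (e.frame == f) { s.x += e.dx; s.y += e.dy; }
        }
        s.noise += rng();              // consumed once per frame on every node
    }
    return s;
}

// True if two nodes ended up in exactly the same state.
bool sameState(const AppState& a, const AppState& b)
{
    return a.x == b.x && a.y == b.y && a.noise == b.noise;
}
```

Two nodes that call simulate with the same arguments end up in the same state. Any hidden input, such as reading the local clock inside the loop, would break that guarantee.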

The other end of the spectrum is embodied by Chromium. Chromium is an OpenGL abstraction layer. It is realized as a replacement for the system's dynamic OpenGL library. Instead of forwarding the application's OpenGL commands to the local graphics system, it sends them over the network to receiver machines that actually execute them. The big advantage of Chromium is that it can work with unmodified OpenGL applications, even if no source code is available at all. The disadvantage is the amount of bandwidth consumed. Unless the application is very explicitly optimized to reduce the protocol overhead (e.g. by heavily using display lists), the bandwidth penalty can become substantial.
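
A back-of-the-envelope C++ sketch shows why display lists matter for a command-streaming approach. The byte sizes are assumptions chosen for illustration and do not describe Chromium's actual wire format. The function names are hypothetical as well.

```cpp
#include <cstdint>

// Assumed sizes for illustration only (not Chromium's real protocol).
constexpr std::uint64_t kBytesPerVertexCmd = 4 + 3 * 4;  // opcode + three 4-byte floats
constexpr std::uint64_t kBytesPerCallList  = 4 + 4;      // opcode + list id

// Bytes sent over `frames` frames when every vertex is re-sent each frame.
constexpr std::uint64_t immediateBytes(std::uint64_t vertices, std::uint64_t frames)
{
    return vertices * kBytesPerVertexCmd * frames;
}

// Bytes sent when the geometry is transmitted once as a display list and
// each frame only sends a single "call list" command.
constexpr std::uint64_t displayListBytes(std::uint64_t vertices, std::uint64_t frames)
{
    return vertices * kBytesPerVertexCmd + frames * kBytesPerCallList;
}
```

Under these assumed sizes, one million static vertices drawn for 60 frames cost 960,000,000 bytes when streamed every frame, but only 16,000,480 bytes when sent once as a display list.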

A middle ground can be realized with any system by explicitly packaging information about what changed in the scenegraph into a network-compatible format, sending it over the wire and reproducing the results on the other nodes. Because every possible event needs to be coded individually, many systems limit themselves to a small number of supported changes. The absolute minimum is transmitting the viewer position and orientation to be able to move through the scene (cf. OpenSceneGraph, which uses this approach in a clustering example). But to support real applications, the effort required to capture and encapsulate all possible changes to the scenegraph can become very significant.
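
As a minimal sketch of the "absolute minimum" case, the following C++ code packs a viewer pose into a fixed byte layout that does not depend on the sender's byte order. ViewerPose, packPose and unpackPose are hypothetical names, not taken from OpenSceneGraph or any other library, and the code assumes 32-bit IEEE 754 floats.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical message: viewer position plus orientation quaternion.
struct ViewerPose {
    float position[3];
    float orientation[4];   // quaternion x, y, z, w
};

constexpr std::size_t kPoseFloats = 7;
using PoseBytes = std::array<std::uint8_t, kPoseFloats * 4>;

// Writes one float as 4 big-endian bytes.
inline void putFloat(std::uint8_t* out, float v)
{
    static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    out[0] = static_cast<std::uint8_t>(bits >> 24);
    out[1] = static_cast<std::uint8_t>(bits >> 16);
    out[2] = static_cast<std::uint8_t>(bits >> 8);
    out[3] = static_cast<std::uint8_t>(bits);
}

// Reads one float back from 4 big-endian bytes.
inline float getFloat(const std::uint8_t* in)
{
    const std::uint32_t bits = (std::uint32_t(in[0]) << 24) | (std::uint32_t(in[1]) << 16)
                             | (std::uint32_t(in[2]) << 8)  |  std::uint32_t(in[3]);
    float v;
    std::memcpy(&v, &bits, sizeof v);
    return v;
}

// Sender side: pose -> 28 bytes ready to be written to a socket.
PoseBytes packPose(const ViewerPose& p)
{
    PoseBytes buf{};
    for (std::size_t i = 0; i < 3; ++i) putFloat(&buf[4 * i], p.position[i]);
    for (std::size_t i = 0; i < 4; ++i) putFloat(&buf[4 * (3 + i)], p.orientation[i]);
    return buf;
}

// Receiver side: 28 bytes -> pose.
ViewerPose unpackPose(const PoseBytes& buf)
{
    ViewerPose p;
    for (std::size_t i = 0; i < 3; ++i) p.position[i] = getFloat(&buf[4 * i]);
    for (std::size_t i = 0; i < 4; ++i) p.orientation[i] = getFloat(&buf[4 * (3 + i)]);
    return p;
}
```

Every additional kind of change (a moved object, a new material, a changed light) would need its own message type and its own pack and unpack code, which is where the effort grows.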

OpenSG

OpenSG can automate the process completely and transparently for the application. Because it keeps a list of all changes to the system during a frame anyway (see FeaturesMultiThreading), the most difficult part is already done. To make this work in a cluster environment, the main problem is to remove the dependency on pointers and replace them with numerical IDs. The actual data of a field must also be packed in a network-transparent format. These ID/data records are sent over the network and applied on reception, so that the node at the other end has a perfect copy of the scenegraph. Thanks to the reflection capabilities of OpenSG this works not only for nodes in the scenegraph, but for any class in the system, including user-defined classes and data that is not connected to the scenegraph at all.
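
The idea of ID/data records can be sketched in a few lines of C++. This is a toy model under stated assumptions, not OpenSG's implementation: ChangeRecord, ChangeList, Scene and applyChanges are hypothetical names, and the "scene" is just a map from object ID and field ID to packed bytes.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// One recorded change: which object, which field, and the new value already
// packed into bytes. A numeric ID replaces the pointer to the object.
struct ChangeRecord {
    std::uint32_t objectId;
    std::uint32_t fieldId;
    std::vector<std::uint8_t> data;
};

// Toy scene: object ID -> field ID -> packed field value.
using Scene = std::map<std::uint32_t,
                       std::map<std::uint32_t, std::vector<std::uint8_t>>>;

// Sender side: applies a change locally and remembers it for this frame.
class ChangeList {
public:
    void set(Scene& scene, std::uint32_t obj, std::uint32_t field,
             const std::vector<std::uint8_t>& value)
    {
        scene[obj][field] = value;
        records_.push_back(ChangeRecord{obj, field, value});
    }
    const std::vector<ChangeRecord>& records() const { return records_; }
    void clear() { records_.clear(); }   // to be called once the frame was sent
private:
    std::vector<ChangeRecord> records_;
};

// Receiver side: replays the records, in order, on the local copy.
void applyChanges(Scene& replica, const std::vector<ChangeRecord>& records)
{
    for (const ChangeRecord& r : records) {
        replica[r.objectId][r.fieldId] = r.data;
    }
}
```

After the receiver has replayed all records of a frame, its copy compares equal to the sender's scene. Only fields that were actually set appear in the record list, so untouched data is never transmitted.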

All of this can be hidden from the application. An application developer only needs to open a special ClusterWindow, and from then on all changes to the data are automatically distributed to the cluster nodes. The code change usually affects just a few lines of code!

To simplify application development, OpenSG includes a number of variants of the aforementioned ClusterWindow. The easiest is the MultiDisplayWindow, which is able to drive a tiled wall directly by splitting up a large virtual viewport into smaller pieces that are automatically assigned to individual cluster nodes. More complicated operations are just as easy and only require opening the right kind of Window. These include image transfer, splitting up the scene into pieces that are rendered on individual cluster nodes and then composited for the final image, and redistributing some pieces of the images between cluster nodes to achieve a balanced load, thereby maximizing throughput.
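
The viewport splitting for a tiled wall can be sketched as follows in C++. This is not the MultiDisplayWindow code; Tile and splitViewport are hypothetical names, and the policy of giving leftover pixels to the last column and row is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pixel rectangle inside the large virtual viewport.
struct Tile {
    int x, y, width, height;
};

// Splits a fullWidth x fullHeight viewport into cols x rows tiles, one per
// node, in row-major order. Remainder pixels go to the last column and row
// so that the tiles cover the viewport exactly.
std::vector<Tile> splitViewport(int fullWidth, int fullHeight, int cols, int rows)
{
    assert(cols > 0 && rows > 0 && fullWidth >= cols && fullHeight >= rows);
    const int w = fullWidth / cols;
    const int h = fullHeight / rows;
    std::vector<Tile> tiles;
    tiles.reserve(static_cast<std::size_t>(cols) * static_cast<std::size_t>(rows));
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            Tile t;
            t.x = c * w;
            t.y = r * h;
            t.width  = (c == cols - 1) ? fullWidth  - t.x : w;
            t.height = (r == rows - 1) ? fullHeight - t.y : h;
            tiles.push_back(t);
        }
    }
    return tiles;
}
```

For a 2048 x 768 virtual viewport on a wall of two columns and one row, this yields two tiles of 1024 x 768 pixels, the second starting at x = 1024. Each node would then render only its own rectangle of the shared view.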

Because it is very easy to do these things in OpenSG, it has been used as a basis to develop new algorithms for cluster-supported rendering. These include the sort-first load balancing solution mentioned above as well as new methods to reduce the bandwidth needed for image composition in sort-last rendering.
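
For orientation, the basic composition step in sort-last rendering can be sketched as a per-pixel depth comparison. The C++ code below shows only this plain baseline, not the bandwidth-reducing methods referred to above. PartialImage and depthComposite are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Partial image rendered by one node: color and depth for every pixel.
struct PartialImage {
    std::vector<std::uint32_t> color;  // packed RGBA
    std::vector<float> depth;          // smaller value = closer to the viewer
};

// Merges src into dst, keeping for each pixel the fragment that is closer.
void depthComposite(PartialImage& dst, const PartialImage& src)
{
    assert(dst.color.size() == dst.depth.size());
    assert(src.color.size() == src.depth.size());
    assert(dst.color.size() == src.color.size());
    for (std::size_t i = 0; i < dst.depth.size(); ++i) {
        if (src.depth[i] < dst.depth[i]) {
            dst.depth[i] = src.depth[i];
            dst.color[i] = src.color[i];
        }
    }
}
```

In this naive form every node ships a full color and depth buffer for each frame, which is exactly the bandwidth cost that more refined composition schemes try to reduce.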

Conclusion

OpenSG's mechanisms make it possible to support a wide variety of graphics clusters with minimal effort on the part of the application developer, and at a very competitive cost in network bandwidth, as only the data that has actually changed is transferred. This includes sort-first, tiled-window setups as well as sort-last methods to manage larger scenes at higher frame rates.

Last modified on 01/17/10 01:11:44