Debian GNU/Linux Cluster Setup

WORK IN PROGRESS

  1. Foreword
  2. Prerequisites
  3. Setting up ssh
  4. Build and Install OpenSG
    1. Install packages needed
    2. Configure and make
    3. Let ld find your libs
    4. Copying the libs
  5. Configuring the network
    1. Add IP address and hostname to /etc/hosts
  6. Setting up your application
  7. Troubleshooting
  8. Special Needs
    1. Multi-Head Clients
    2. Autostart
  9. Tutorial Discussion

Foreword

This is a small tutorial on setting up a (dedicated) OpenSG cluster using Debian GNU/Linux and its derivatives (mainly (K)Ubuntu). Most things should be applicable to other Linux distributions, too. It is mainly based on my personal experience and may therefore sometimes be too complicated or just plain wrong. You have been warned. If you find any errors, please correct them. If you have questions or suggestions regarding this tutorial, don't hesitate to add them in the Tutorial Discussion section at the end of the page. Thanks, Dominik Rau.

Prerequisites

  • Although it's getting better, ATI support for GNU/Linux still sucks. I have used Nvidia cards on Linux since 1999 and never had any problems, so if you're using ATI, you're on your own and may have to change some things here. Sorry for that.
  • As we are building a render cluster, I assume that SECURITY IS NO CONCERN! If you're storing your customers' credit card data or any other sensitive information on your computers, GO AWAY NOW!
  • In this tutorial, I assume that you are working on a computer called node1 (this is where your client runs and where you compile OpenSG and maybe your application) and that all your render servers, called node2, node3, ..., nodeN, are in the same network.
  • I assume that you are working as a regular user called user and that the same user account exists on your client and on all servers.
  • I assume that you are using static IP addresses, not DHCP, and that all your nodes are in the /etc/hosts file.
  • I also assume that you have a working basic installation of Debian / (K)Ubuntu.

Setting up ssh

First, install openssh-server on all your machines.

apt-get install openssh-server

ssh supports different authentication methods. Most people use the interactive one, where you have to type your password each time you connect to another machine, but this is quite inconvenient for our purpose. A smarter way is to use a public key. To set up automatic authentication using a key pair, type the following (just press Enter when you're asked for a filename or a passphrase):

user@node1:/home/user$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/user/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/user/.ssh/id_rsa.
Your public key has been saved in /home/user/.ssh/id_rsa.pub.
The key fingerprint is:
42:20:09:7e:e4:57:29:7e:9a:77:bc:34:fd:03:0c:88 user@node1

If you look into ~/.ssh now, there are two new files, id_rsa and id_rsa.pub. The latter is the one we need. For all your nodes do the following:

user@node1:~$ scp .ssh/id_rsa.pub nodeX:/tmp
The authenticity of host 'nodeX (192.168.1.X)' can't be established.
RSA key fingerprint is 16:fa:88:d2:f7:84:00:e5:57:f8:2a:50:76:8a:b3:b0.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'nodeX' (RSA) to the list of known hosts.
user@nodeX's password:
id_rsa.pub                                                                  100%  395     0.4KB/s   00:00

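Now log in to each of your machines (typing a password for the last time) and append the key to your authorized_keys file. Don't worry if the file doesn't exist yet, the >> redirection creates it; if the ~/.ssh directory itself is missing, create it first with mkdir ~/.ssh. It is very important that nobody but user can write to this file, otherwise sshd (with its default StrictModes setting) will not accept it. That's why we call chmod 600, which restricts all access to the owner: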

user@nodeX:~$ cat /tmp/id_rsa.pub >> .ssh/authorized_keys
user@nodeX:~$ chmod 600 .ssh/authorized_keys
user@nodeX:~$ rm /tmp/id_rsa.pub
user@nodeX:~$ logout

If you log in again, you should no longer be asked for a password, as your key is now authorized on that node.
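To verify the key setup on all nodes at once, something like the following could be used. It is only a sketch: the function names sshProbeCommand and checkNodes are made up for this page, and the host names are the node2 ... nodeN placeholders used throughout this tutorial. It relies on the ssh option BatchMode=yes, which makes ssh fail instead of asking for a password.

```cpp
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>

// Builds a non-interactive ssh probe. BatchMode=yes makes ssh fail
// instead of prompting for a password, so a key problem shows up as a
// non-zero exit status rather than a hanging prompt.
std::string sshProbeCommand(const std::string& host)
{
  return "ssh -o BatchMode=yes -o ConnectTimeout=5 " + host + " true";
}

// Runs the probe on every node and returns how many of them failed.
int checkNodes(const std::vector<std::string>& hosts)
{
  int failures = 0;
  for (std::size_t i = 0; i < hosts.size(); ++i) {
    const int status = std::system(sshProbeCommand(hosts[i]).c_str());
    if (status != 0) {
      std::cerr << "key login to " << hosts[i] << " failed\n";
      ++failures;
    }
  }
  return failures;
}
```

Calling checkNodes with a vector holding "node2", "node3" and so on from node1 should return 0 once every node accepts the key.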

Build and Install OpenSG

Install packages needed

apt-get install flex bison libjpeg62-dev libpng12-dev libfreetype6-dev libtiff4-dev freeglut3-dev nvidia-glx-dev ...

Configure and make

Install to /usr instead of /usr/local

Let ld find your libs

In the previous step, we installed the OpenSG libraries to /usr/lib/opt. By default, the dynamic linker only searches /lib and /usr/lib for the libraries a program needs. So that the OpenSG libraries can be found, you have to edit (or, if it does not exist, create) /etc/ld.so.conf. Just add the line

/usr/lib/opt 

and run

ldconfig

afterwards. You must be root to do this. Check man ldconfig and man ld.so for more details.
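If you want to check on each node whether the directory is already listed, a small helper along these lines may do. It is a sketch under a few assumptions: the names trim, listsLibDir and fileListsLibDir are invented here, it expects one directory per line, and it does not follow include directives inside ld.so.conf.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Strips leading and trailing spaces, tabs and carriage returns.
std::string trim(const std::string& s)
{
  const std::string ws = " \t\r";
  const std::string::size_type first = s.find_first_not_of(ws);
  if (first == std::string::npos) return "";
  const std::string::size_type last = s.find_last_not_of(ws);
  return s.substr(first, last - first + 1);
}

// Returns true if dir appears on a line of its own in the given
// ld.so.conf-style stream. Comment lines (#) are skipped; "include"
// directives are NOT followed in this sketch.
bool listsLibDir(std::istream& conf, const std::string& dir)
{
  std::string line;
  while (std::getline(conf, line)) {
    const std::string entry = trim(line);
    if (entry.empty() || entry[0] == '#') continue;
    if (entry == dir) return true;
  }
  return false;
}

// Convenience wrapper for a file on disk; false if it cannot be opened.
bool fileListsLibDir(const std::string& path, const std::string& dir)
{
  std::ifstream in(path.c_str());
  if (!in) return false;
  return listsLibDir(in, dir);
}
```

For example, fileListsLibDir("/etc/ld.so.conf", "/usr/lib/opt") tells you whether the line still has to be added. Remember that ldconfig has to be run again after every change to the file.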

Copying the libs

Copy everything via scp to the other nodes.

Alternative: use NFS

Configuring the network

Setting up multicast

Add IP address and hostname to /etc/hosts

This step is quite important and caused me a lot of headaches. When OpenSG starts a connection, it

  • Broadcasts a request to all hosts in the network, checking if there is a server.
  • If there is a server, the server takes its hostname (hostX), resolves it and returns the IP address.
  • If you don't have your own DNS server running, this name resolution is done via the /etc/hosts file.

By default, your /etc/hosts looks like this:

127.0.0.1 localhost.localdomain localhost hostX

The result is that all the servers return 127.0.0.1 as their network address, which is, of course, useless, because the client would then try to connect to itself. Therefore, change your /etc/hosts files so that only localhost resolves to 127.0.0.1 and the hostname resolves to the network IP address:

127.0.0.1 localhost.localdomain localhost

192.168.0.X hostX.yourDomain hostX
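To check a node before starting the cluster, the lookup described above can be reproduced in a few lines. This is only a sketch: it assumes a POSIX system and uses the standard gethostname() and getaddrinfo() calls, not necessarily what OpenSG does internally, and the helper names isLoopback and resolveOwnHostname are made up for this page.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <string>

// True for any dotted-quad address in 127.0.0.0/8.
bool isLoopback(const std::string& ip)
{
  return ip.compare(0, 4, "127.") == 0;
}

// Resolves this machine's own hostname to its first IPv4 address.
// Returns an empty string if the lookup fails.
std::string resolveOwnHostname()
{
  char host[256];
  if (gethostname(host, sizeof(host)) != 0) return "";
  host[sizeof(host) - 1] = '\0';  // gethostname may not terminate on truncation

  addrinfo hints = {};
  hints.ai_family = AF_INET;       // IPv4 only, as in the examples above
  hints.ai_socktype = SOCK_STREAM;
  addrinfo* res = nullptr;
  if (getaddrinfo(host, nullptr, &hints, &res) != 0 || res == nullptr) return "";

  char buf[INET_ADDRSTRLEN] = "";
  const sockaddr_in* addr = reinterpret_cast<const sockaddr_in*>(res->ai_addr);
  const char* ok = inet_ntop(AF_INET, &addr->sin_addr, buf, sizeof(buf));
  freeaddrinfo(res);
  return ok ? std::string(buf) : std::string();
}
```

If isLoopback(resolveOwnHostname()) is true on a render server, its /etc/hosts still maps the hostname to a loopback address and needs the change shown above.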

Setting up your application

If you have a cluster, it is very inconvenient to start your render servers manually on every node, so we need to automate it. Basically there are two ways:

  1. Automatically start the servers at system boot.
  2. Let the client start the servers.

Although the first variant might look a little simpler at first glance, it has some drawbacks:

  • You have to configure all your servers locally on the nodes.
  • You have to write a script that restarts the servers if you kill the client.

If you have some experience with Linux, it should be no problem for you to set this up correctly if you really want to. In this tutorial, we use the second approach. This is some code I have been using for years now.

First, we need a data structure that holds all the information we need for a server:

#!cpp

#include <string>
using std::string;
#include <vector>
using std::vector;

struct ServerProps{
  string name;            //The name of the server
  string ip;              //hostname or IP-address (ssh doesn't care)
  string displayNum;      //X display without the leading colon, in most cases 0.0
  string thisServerArgs;  //Additional arguments for this server
};

string defaultArgs;       //Arguments for all servers.
string appName;           //The name of your server application (like ./12ClusterServer or /path/to/renderServer)
vector<ServerProps> renderServers;

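#!cpp
#include <cstdlib>   //system()
#include <sstream>
using std::stringstream;

//Starts every configured render server in the background via ssh.
void startServers()
{
  vector<ServerProps>::iterator it;

  for(it=renderServers.begin();it!=renderServers.end();++it){
    stringstream c;
    //nohup keeps ssh running if the client's terminal closes
    c<<"nohup ssh "<<it->ip<<" "<<appName<<" "<<defaultArgs<<" "<<it->thisServerArgs;
    //The colon is added here, so displayNum must be e.g. 0.0, not :0.0.
    //The trailing & lets system() return immediately.
    c<<" "<<it->name<<" -display :"<<it->displayNum<<" &";
    system(c.str().c_str());
  }
}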
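When something goes wrong, it helps to look at the exact command line before it is executed. The following standalone variant is a sketch of that idea: it repeats the ServerProps fields so that it compiles on its own, and buildStartCommand and makeServerList are hypothetical helpers, not part of OpenSG. Using the host name as the server name is just an assumption for the example.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Same fields as the ServerProps struct above, repeated so that this
// sketch compiles on its own.
struct ServerProps {
  std::string name;
  std::string ip;
  std::string displayNum;      // without the leading colon, e.g. "0.0"
  std::string thisServerArgs;
};

// Assembles the same command line as startServers(), but only returns
// it, so it can be printed or logged before being passed to system().
std::string buildStartCommand(const ServerProps& s,
                              const std::string& appName,
                              const std::string& defaultArgs)
{
  return "nohup ssh " + s.ip + " " + appName + " " + defaultArgs + " " +
         s.thisServerArgs + " " + s.name + " -display :" + s.displayNum + " &";
}

// Hypothetical helper: registers node2..nodeN, all on display 0.0.
// The host name doubles as the server name here (an assumption).
std::vector<ServerProps> makeServerList(int lastNode)
{
  std::vector<ServerProps> servers;
  for (int i = 2; i <= lastNode; ++i) {
    const std::string host = "node" + std::to_string(i);
    ServerProps s;
    s.name = host;
    s.ip = host;
    s.displayNum = "0.0";
    servers.push_back(s);
  }
  return servers;
}
```

Printing buildStartCommand(...) for each entry of makeServerList(N) and pasting one of the lines into a shell on node1 is a quick way to see whether a failure comes from ssh, from the display setting or from the render server itself.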

Troubleshooting

Special Needs

Multi-Head Clients

Autostart

Tutorial Discussion

Last modified on 01/17/10 01:11:44