I updated this on Jul 21 using the comments below.
Torque is a batch job queuing system that is normally used on clusters, but I find it handy on my multi-core workstation as well. It lets multiple users schedule the jobs they need to run. The scheduler makes sure that not too many jobs run simultaneously, which could otherwise cause high system load or memory problems.
I previously posted how to install Torque on Ubuntu Hardy from the Torque source package. However, Torque is now in the Lucid repositories, and here are the steps I had to take to get it working on my workstation.
For this setup I kept the server host name 'torqueserver', which is the default in the package. You can do the same or use a fully qualified domain name. In that case, you will have to adapt the steps somewhat.
My workstation has 8 cores, and I only want to give 6 of them to the queue. Please adapt your numbers accordingly.
0) open root terminal
1) add torqueserver as an alias to /etc/hosts.
Code:
gedit /etc/hosts
change 127.0.1.1 myHostName to 127.0.1.1 myHostName torqueserver
*) see post by drlemon. Alternatively, put a resolvable host name (check with 'host $HOSTNAME') in the file /var/lib/torque/server_name, and use that host name wherever torqueserver is used below.
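If you would rather not edit the file by hand, the same change can be scripted. This is only a sketch: add_hosts_alias is a helper name I made up, and it assumes GNU sed (for -i) and a hosts file that already has a 127.0.1.1 line, as on a default Ubuntu install.

```shell
# add_hosts_alias FILE ALIAS  (hypothetical helper, not part of torque)
# Appends ALIAS to the 127.0.1.1 line of FILE unless it is already there.
add_hosts_alias() {
    hosts_file="$1"
    alias_name="$2"
    # Do nothing if the 127.0.1.1 line already mentions the alias.
    if grep "^127\.0\.1\.1" "$hosts_file" | grep -qw "$alias_name"; then
        return 0
    fi
    # Append the alias to the end of the 127.0.1.1 line (GNU sed in-place edit).
    sed -i "/^127\.0\.1\.1/ s/\$/ $alias_name/" "$hosts_file"
}

# Usage (as root):
#   add_hosts_alias /etc/hosts torqueserver
```

Running it twice is harmless, because the alias is only added when it is missing.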
2) install torque from repositories
Code:
apt-get install torque*
3) stop torque
4) check that torque is not running (if it still is, kill the remaining processes)
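Steps 3 and 4 list no commands, so here is a rough sketch of what they could look like. It assumes the packages install init scripts named torque-mom, torque-scheduler and torque-server; only torque-server is named in this post, the other two names are guesses. stop_torque and list_torque_procs are helper names made up for this example.

```shell
# Stop the three daemons through their init scripts, skipping any that are missing.
# torque-server is named in step 13; torque-mom and torque-scheduler are assumed names.
stop_torque() {
    for svc in torque-mom torque-scheduler torque-server; do
        if [ -x "/etc/init.d/$svc" ]; then
            "/etc/init.d/$svc" stop
        fi
    done
}

# Filter a `ps -e` style listing on stdin down to the torque daemons.
# Succeeds only if at least one of pbs_server, pbs_sched or pbs_mom is listed.
list_torque_procs() {
    grep -E 'pbs_(server|sched|mom)'
}

# Usage (as root):
#   stop_torque
#   ps -e | list_torque_procs || echo "no torque daemons left"
#   killall pbs_server pbs_sched pbs_mom   # only if something is still listed
```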
5) create missing directory
Code:
mkdir /var/lib/torque/server_priv/arrays
6) add torqueserver as serverhost
Code:
echo "SERVERHOST torqueserver" >> /var/lib/torque/torque.cfg
7) set up nodes:
Code:
echo "torqueserver np=8" >> /var/lib/torque/server_priv/nodes
echo "pbs_server = 127.0.1.1" >> /var/lib/torque/mom_priv/config
8) set up database
Code:
pbs_server -t create
9) create queue and set server settings in database
Code:
qmgr torqueserver
create queue batch
set queue batch queue_type = Execution
set queue batch max_running = 6
set queue batch resources_max.ncpus = 8
set queue batch resources_max.nodes = 1
set queue batch resources_default.ncpus = 1
set queue batch resources_default.neednodes = 1:ppn=1
set queue batch resources_default.walltime = 24:00:00
set queue batch max_user_run = 6
set queue batch enabled = True
set queue batch started = True
set server default_queue = batch
set server scheduling = True
exit
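The same settings can also be applied without the interactive prompt, because qmgr accepts a single directive through its -c option. The sketch below wraps that in a function; configure_batch_queue and the QMGR override (useful for a dry run with QMGR=echo) are names invented for this example, and the directives are simply the ones from step 9.

```shell
# configure_batch_queue [SERVER]  (hypothetical helper)
# Feeds each directive from step 9 to `qmgr -c`, stopping at the first failure.
# QMGR is an invented override for dry runs; it defaults to the real qmgr.
configure_batch_queue() {
    server="${1:-torqueserver}"
    while IFS= read -r directive; do
        # Skip blank lines.
        [ -n "$directive" ] || continue
        # </dev/null keeps the command from eating the rest of the list.
        "${QMGR:-qmgr}" -c "$directive" "$server" </dev/null || return 1
    done <<'EOF'
create queue batch
set queue batch queue_type = Execution
set queue batch max_running = 6
set queue batch resources_max.ncpus = 8
set queue batch resources_max.nodes = 1
set queue batch resources_default.ncpus = 1
set queue batch resources_default.neednodes = 1:ppn=1
set queue batch resources_default.walltime = 24:00:00
set queue batch max_user_run = 6
set queue batch enabled = True
set queue batch started = True
set server default_queue = batch
set server scheduling = True
EOF
}

# Usage (as root, with pbs_server running):
#   configure_batch_queue torqueserver
```

Keeping the directives in one list makes it easy to adapt the numbers to your own core count before running it.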
10) restart server and scheduler and node server
Code:
qterm
pbs_server
pbs_sched #this will give some warning about missing files
pbs_mom
11) check that the nodes are up
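No command is given for this step. pbsnodes -a lists every node together with its state, and a node that is up and idle shows "state = free". A small sketch that counts such nodes follows; count_free_nodes is a made-up helper name.

```shell
# Count the nodes reported as free in `pbsnodes -a` output read from stdin.
count_free_nodes() {
    grep -c 'state = free'
}

# Usage:
#   pbsnodes -a                      # full listing, look for "state = free"
#   pbsnodes -a | count_free_nodes   # should print 1 for this single-node setup
```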
12) exit the root terminal and, as a normal user, test the queue
Code:
exit
qstat -q
echo "sleep 30" | qsub
qstat
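For anything longer than a one-liner you will probably want a job script instead of piping a command into qsub. The following is only an illustrative sketch: testjob.sh and the job name are arbitrary, and the resource request just mirrors the queue defaults from step 9 with a shorter walltime.

```shell
# Write a small job script; lines starting with #PBS are read by qsub as options.
cat > testjob.sh <<'EOF'
#!/bin/sh
#PBS -N testjob
#PBS -q batch
#PBS -l nodes=1:ppn=1,walltime=00:05:00
# Torque sets PBS_O_WORKDIR to the directory qsub was called from.
cd "${PBS_O_WORKDIR:-.}"
echo "running on $(hostname)"
sleep 30
EOF

# Submit and watch it (as a normal user):
#   qsub testjob.sh
#   qstat
```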
13) see drlemon: do a gedit /etc/init.d/torque* and, in all three files, change the pidfile= line so that it points to /var/lib instead of /var/spool. Additionally, remove the -t create from the server options in the torque-server file.
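These two edits can be scripted with sed as well. Treat this as a sketch only: fix_torque_init is an invented helper name, I am assuming GNU sed and that the lines really contain the literal text "pidfile=" and "-t create", so check each file afterwards.

```shell
# fix_torque_init FILE  (hypothetical helper)
fix_torque_init() {
    init_file="$1"
    # Point the pidfile= line at /var/lib instead of /var/spool.
    sed -i '/pidfile=/ s#/var/spool#/var/lib#' "$init_file"
    # Drop the "-t create" option if present (expected only in torque-server).
    sed -i 's/ *-t create//' "$init_file"
}

# Usage (as root):
#   for f in /etc/init.d/torque*; do fix_torque_init "$f"; done
```

Lines that mention /var/spool but not pidfile= are left alone, and files without "-t create" are not changed by the second command.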
This works for me, but a demanding computing environment will probably require more configuration. Check out the Torque website for more queue configurations, user management, etc.