Howto: Install Torque/PBS (job scheduler/manager) for a workstation
Disclaimer:
* These are quick'n'dirty notes rather than a real Howto, so feel free to dislike the presentation.
* This was done on Dapper. It should mostly work on Edgy, but as I haven't played with Edgy much yet, I don't know whether Upstart (the new init system) is still configured the same way (update-rc.d).
* I tend to assume that people interested in job scheduling are CLI-friendly and know when to be root... As I may forget to be that precise, please bear with me and feel free to comment where it bothers you.
Background:
I work on several clusters on a daily basis, some of which I am in charge of. To configure those I don't use Ubuntu or any Debian-based distro; I mostly use Rocks cluster http://www.rocksclusters.org for its quick installation process.
Recently I got my hands on a 4-core machine (2 x dual-core) that I wanted to share with other people for small calculations (so no cluster here). I installed Ubuntu (my distro of choice for the desktop, as the machine is also used for data visualisation) and didn't find any job management system in the repositories (apart from cron (too basic) and drqueue (dedicated to 3D rendering)). I then searched for an Ubuntu howto and didn't find any, hence this post. Finally, I searched for Linux job management tools meant for a single workstation (master only) and didn't find any... If anybody has heard of one, I would be happy to know about it.
So I went back to the beast I knew and set up Torque/PBS, with the uneasy feeling that I was hammering in a nail with a sledgehammer.
Howto:
Installing Torque PBS on a workstation
(This mostly follows the quickstart guide: http://www.clusterresources.com/wiki...ickstart_guide)
* Get the latest torque tarball from http://www.clusterresources.com/downloads/torque/
* Compile and install it somewhere it won't bother you (I used /opt/local; make install must be run as root):
Code:
tar -xzvf torque.tar.gz
cd torque
./configure --prefix=/opt/local
make
make install
* Still in the source directory and as root, run the setup script, giving it the user who will administer Torque:
Code:./torque.setup ADMIN_USER
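The build and setup steps above can be chained in one shell function. This is a minimal sketch, not part of the original howto: it assumes it is run as root, that torque-X.Y.Z.tar.gz unpacks into a directory named torque-X.Y.Z, and it borrows the PATH/LD_LIBRARY_PATH exports from a later reply so that torque.setup can find pbs_server and libtorque. The names run, install_torque and DRY_RUN are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: chain the build and setup steps of the howto.
# Assumptions: run as root; torque-X.Y.Z.tar.gz unpacks into torque-X.Y.Z.
# With DRY_RUN=1 the commands are only printed, not executed.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# The body runs in a subshell so that cd and PATH changes do not leak out.
install_torque() (
    tarball=$1                  # e.g. torque-2.4.7.tar.gz
    prefix=${2:-/opt/local}     # installation prefix used in the howto
    admin=${3:-root}            # user passed to torque.setup
    srcdir=$(basename "$tarball" .tar.gz)

    run tar -xzf "$tarball" || exit 1
    run cd "$srcdir" || exit 1
    run ./configure --prefix="$prefix" || exit 1
    run make || exit 1
    run make install || exit 1

    # torque.setup calls pbs_server and qmgr, which must be found by root,
    # and those programs need libtorque from $prefix/lib.
    PATH=$PATH:$prefix/bin:$prefix/sbin
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$prefix/lib
    export PATH LD_LIBRARY_PATH
    run ./torque.setup "$admin"
)
```

For example, DRY_RUN=1 install_torque torque-2.4.7.tar.gz /opt/local root only prints the six commands; drop DRY_RUN to actually run them.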
* Quick'n'dirty configuration for a 4-CPU workstation
(The Torque executables must be in your PATH. If you used the same installation directory as I did, add the following line to your ~/.bashrc:
export PATH=$PATH:/opt/local/bin:/opt/local/sbin
)
(By default, TORQUECFG is /var/spool/torque.)
Code:cd /var/spool/torque
Declare the workstation as the only compute node in the nodes file:
Code:vi server_priv/nodes
Code:myworkstation np=4
(myworkstation: the hostname of your machine; 4: the number of CPUs)
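Instead of editing the nodes file by hand, it can be generated. This is a sketch under assumptions not in the original: the node name defaults to the output of hostname and the CPU count to getconf _NPROCESSORS_ONLN; write_nodes_file is a hypothetical name, and it overwrites server_priv/nodes.

```shell
#!/bin/sh
# Hypothetical helper: write a one-node server_priv/nodes file.
# Arguments (all optional): TORQUECFG directory, node name, CPU count.
write_nodes_file() {
    cfg=${1:-/var/spool/torque}
    node=${2:-$(hostname)}
    np=${3:-$(getconf _NPROCESSORS_ONLN)}
    # One line per compute node: "<hostname> np=<number of CPUs>"
    printf '%s np=%s\n' "$node" "$np" > "$cfg/server_priv/nodes"
}
```

As root, write_nodes_file /var/spool/torque produces the equivalent of the myworkstation np=4 line on this machine.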
Tell the MOM (the job execution daemon) where the server runs:
Code:vi mom_priv/config
Code:$pbsserver 127.0.0.1
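The same MOM configuration can be written non-interactively. This is a hedged sketch: write_mom_config is a hypothetical name, it overwrites any existing mom_priv/config, and it uses the $pbsserver directive that also appears in the mom_priv/config files posted later in this thread.

```shell
#!/bin/sh
# Hypothetical helper: point pbs_mom at the host running pbs_server.
# Note: this overwrites any existing mom_priv/config.
write_mom_config() {
    cfg=${1:-/var/spool/torque}   # TORQUECFG
    server=${2:-127.0.0.1}        # host running pbs_server
    # Single quotes keep "$pbsserver" literal: it is a MOM directive,
    # not a shell variable.
    printf '$pbsserver %s\n' "$server" > "$cfg/mom_priv/config"
}
```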
Start the MOM:
Code:pbs_mom
Restart the server so that it reads the new nodes file:
Code:
qterm
pbs_server
Start the scheduler:
Code:pbs_sched
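To check that the three daemons actually came up, a small Linux-specific sketch can scan /proc/&lt;pid&gt;/comm, which holds the (first 15 characters of the) process name. is_running and torque_status are hypothetical names, and this assumes a kernel that provides /proc/&lt;pid&gt;/comm.

```shell
#!/bin/sh
# Hypothetical helper: is a process with exactly this name running?
is_running() {
    for f in /proc/[0-9]*/comm; do
        [ -r "$f" ] || continue          # process vanished or no /proc
        [ "$(cat "$f" 2>/dev/null)" = "$1" ] && return 0
    done
    return 1
}

# Report the state of the three Torque daemons used in this howto.
torque_status() {
    for d in pbs_server pbs_mom pbs_sched; do
        if is_running "$d"; then
            echo "$d: running"
        else
            echo "$d: not running"
        fi
    done
}
```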
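Check the server and queue settings:
Code:
qmgr -c "list server"
qmgr -c "list queue batch"
Let every user see everybody's jobs, and limit the batch queue to 4 CPUs:
Code:
qmgr -c "set server query_other_jobs = True"
qmgr -c "set queue batch resources_max.ncpus = 4"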
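The qmgr settings can also be kept in one place and fed to qmgr on standard input. This sketch is an assumption-laden convenience, not the author's method: torque_qmgr_settings is a hypothetical name, and the scheduling line is taken from a later reply in this thread where jobs stayed queued until scheduling was set to True.

```shell
#!/bin/sh
# Hypothetical helper: print the qmgr commands used in this howto.
# Usage (as a Torque manager): torque_qmgr_settings 4 | qmgr
torque_qmgr_settings() {
    ncpus=${1:-4}
    cat <<EOF
set server query_other_jobs = True
set server scheduling = True
set queue batch resources_max.ncpus = $ncpus
EOF
}
```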
Finally, you may want the three daemons to be started at boot time.
For that, create these three files (based on /etc/init.d/skeleton) and make them executable (chmod +x):
/etc/init.d/pbs_server
/etc/init.d/pbs_mom
/etc/init.d/pbs_sched
###/etc/init.d/pbs_mom###
Code:
#! /bin/sh
### BEGIN INIT INFO
# Provides:          pbs_mom
# Required-Start:    $local_fs $remote_fs
# Required-Stop:     $local_fs $remote_fs
# Default-Start:     2 3 4 5
# Default-Stop:      S 0 1 6
# Short-Description: PBS MOM client daemon
# Description:       Starts the Torque/PBS MOM (job execution daemon).
### END INIT INFO
#
# Author: Miquel van Smoorenburg <miquels@cistron.nl>.
#         Ian Murdock <imurdock@gnu.ai.mit.edu>.
#
# Please remove the "Author" lines above and replace them
# with your own name if you copy and modify this script.
#
# Version: @(#)skeleton  2.85-23  28-Jul-2004  miquels@cistron.nl
#

set -e

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/local/bin:/opt/local/sbin
DESC="PBS MOM Client Daemon"
NAME=pbs_mom
DAEMON=/opt/local/sbin/$NAME
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME

# Gracefully exit if the package has been removed.
test -x $DAEMON || exit 0

# Read config file if it is present.
#if [ -r /etc/default/$NAME ]
#then
#    . /etc/default/$NAME
#fi

#
# Function that starts the daemon/service.
#
d_start() {
    start-stop-daemon --start --quiet --pidfile $PIDFILE \
        --exec $DAEMON \
        || echo -n " already running"
}

#
# Function that stops the daemon/service.
#
d_stop() {
    start-stop-daemon --stop --quiet --pidfile $PIDFILE \
        --name $NAME \
        || echo -n " not running"
}

#
# Function that sends a SIGHUP to the daemon/service.
#
d_reload() {
    start-stop-daemon --stop --quiet --pidfile $PIDFILE \
        --name $NAME --signal 1
}

case "$1" in
  start)
    echo -n "Starting $DESC: $NAME"
    d_start
    echo "."
    ;;
  stop)
    echo -n "Stopping $DESC: $NAME"
    d_stop
    echo "."
    ;;
  #reload)
    #
    #    If the daemon can reload its configuration without
    #    restarting (for example, when it is sent a SIGHUP),
    #    then implement that here.
    #
    #    If the daemon responds to changes in its config file
    #    directly anyway, make this an "exit 0".
    #
    # echo -n "Reloading $DESC configuration..."
    # d_reload
    # echo "done."
  #;;
  restart|force-reload)
    #
    #    If the "reload" option is implemented, move the "force-reload"
    #    option to the "reload" entry above. If not, "force-reload" is
    #    just the same as "restart".
    #
    echo -n "Restarting $DESC: $NAME"
    d_stop
    # One second might not be time enough for a daemon to stop,
    # if this happens, d_start will fail (and dpkg will break if
    # the package is being upgraded). Change the timeout if needed
    # be, or change d_stop to have start-stop-daemon use --retry.
    # Notice that using --retry slows down the shutdown process somewhat.
    sleep 1
    d_start
    echo "."
    ;;
  *)
    echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
    exit 3
    ;;
esac

exit 0
###/etc/init.d/pbs_sched###
Code:
#! /bin/sh
### BEGIN INIT INFO
# Provides:          pbs_sched
# Required-Start:    $local_fs $remote_fs
# Required-Stop:     $local_fs $remote_fs
# Default-Start:     2 3 4 5
# Default-Stop:      S 0 1 6
# Short-Description: PBS scheduler daemon
# Description:       Starts the Torque/PBS job scheduler (pbs_sched).
### END INIT INFO
#
# Author: Miquel van Smoorenburg <miquels@cistron.nl>.
#         Ian Murdock <imurdock@gnu.ai.mit.edu>.
#
# Please remove the "Author" lines above and replace them
# with your own name if you copy and modify this script.
#
# Version: @(#)skeleton  2.85-23  28-Jul-2004  miquels@cistron.nl
#

set -e

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/local/bin:/opt/local/sbin
DESC="PBS Scheduler Daemon"
NAME=pbs_sched
DAEMON=/opt/local/sbin/$NAME
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME

# Gracefully exit if the package has been removed.
test -x $DAEMON || exit 0

# Read config file if it is present.
#if [ -r /etc/default/$NAME ]
#then
#    . /etc/default/$NAME
#fi

#
# Function that starts the daemon/service.
#
d_start() {
    start-stop-daemon --start --quiet --pidfile $PIDFILE \
        --exec $DAEMON \
        || echo -n " already running"
}

#
# Function that stops the daemon/service.
#
d_stop() {
    start-stop-daemon --stop --quiet --pidfile $PIDFILE \
        --name $NAME \
        || echo -n " not running"
}

#
# Function that sends a SIGHUP to the daemon/service.
#
d_reload() {
    start-stop-daemon --stop --quiet --pidfile $PIDFILE \
        --name $NAME --signal 1
}

case "$1" in
  start)
    echo -n "Starting $DESC: $NAME"
    d_start
    echo "."
    ;;
  stop)
    echo -n "Stopping $DESC: $NAME"
    d_stop
    echo "."
    ;;
  #reload)
    #
    #    If the daemon can reload its configuration without
    #    restarting (for example, when it is sent a SIGHUP),
    #    then implement that here.
    #
    #    If the daemon responds to changes in its config file
    #    directly anyway, make this an "exit 0".
    #
    # echo -n "Reloading $DESC configuration..."
    # d_reload
    # echo "done."
  #;;
  restart|force-reload)
    #
    #    If the "reload" option is implemented, move the "force-reload"
    #    option to the "reload" entry above. If not, "force-reload" is
    #    just the same as "restart".
    #
    echo -n "Restarting $DESC: $NAME"
    d_stop
    # One second might not be time enough for a daemon to stop,
    # if this happens, d_start will fail (and dpkg will break if
    # the package is being upgraded). Change the timeout if needed
    # be, or change d_stop to have start-stop-daemon use --retry.
    # Notice that using --retry slows down the shutdown process somewhat.
    sleep 1
    d_start
    echo "."
    ;;
  *)
    echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
    exit 3
    ;;
esac

exit 0
###/etc/init.d/pbs_server###
Code:
#! /bin/sh
### BEGIN INIT INFO
# Provides:          pbs_server
# Required-Start:    $local_fs $remote_fs
# Required-Stop:     $local_fs $remote_fs
# Default-Start:     2 3 4 5
# Default-Stop:      S 0 1 6
# Short-Description: PBS server
# Description:       Starts the Torque/PBS batch server (pbs_server).
### END INIT INFO
#
# Author: Miquel van Smoorenburg <miquels@cistron.nl>.
#         Ian Murdock <imurdock@gnu.ai.mit.edu>.
#
# Please remove the "Author" lines above and replace them
# with your own name if you copy and modify this script.
#
# Version: @(#)skeleton  2.85-23  28-Jul-2004  miquels@cistron.nl
#

set -e

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/local/bin:/opt/local/sbin
DESC="PBS Server"
NAME=pbs_server
DAEMON=/opt/local/sbin/$NAME
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME

# Gracefully exit if the package has been removed.
test -x $DAEMON || exit 0

# Read config file if it is present.
#if [ -r /etc/default/$NAME ]
#then
#    . /etc/default/$NAME
#fi

#
# Function that starts the daemon/service.
#
d_start() {
    start-stop-daemon --start --quiet --pidfile $PIDFILE \
        --exec $DAEMON \
        || echo -n " already running"
}

#
# Function that stops the daemon/service.
#
d_stop() {
    start-stop-daemon --stop --quiet --pidfile $PIDFILE \
        --name $NAME \
        || echo -n " not running"
}

#
# Function that sends a SIGHUP to the daemon/service.
#
d_reload() {
    start-stop-daemon --stop --quiet --pidfile $PIDFILE \
        --name $NAME --signal 1
}

case "$1" in
  start)
    echo -n "Starting $DESC: $NAME"
    d_start
    echo "."
    ;;
  stop)
    echo -n "Stopping $DESC: $NAME"
    d_stop
    echo "."
    ;;
  #reload)
    #
    #    If the daemon can reload its configuration without
    #    restarting (for example, when it is sent a SIGHUP),
    #    then implement that here.
    #
    #    If the daemon responds to changes in its config file
    #    directly anyway, make this an "exit 0".
    #
    # echo -n "Reloading $DESC configuration..."
    # d_reload
    # echo "done."
  #;;
  restart|force-reload)
    #
    #    If the "reload" option is implemented, move the "force-reload"
    #    option to the "reload" entry above. If not, "force-reload" is
    #    just the same as "restart".
    #
    echo -n "Restarting $DESC: $NAME"
    d_stop
    # One second might not be time enough for a daemon to stop,
    # if this happens, d_start will fail (and dpkg will break if
    # the package is being upgraded). Change the timeout if needed
    # be, or change d_stop to have start-stop-daemon use --retry.
    # Notice that using --retry slows down the shutdown process somewhat.
    sleep 1
    d_start
    echo "."
    ;;
  *)
    echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
    exit 3
    ;;
esac

exit 0
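Since the three init scripts differ only in the NAME and DESC variables (and, if you customise them, the Provides and Short-Description header lines), the last two can be derived from the first with sed. This is a sketch, not the author's procedure; derive_initscript is a hypothetical name, and a description containing | or &amp; would need escaping for sed.

```shell
#!/bin/sh
# Hypothetical helper: copy an init script, substituting name and description.
# Usage (as root):
#   derive_initscript /etc/init.d/pbs_mom pbs_sched "PBS Scheduler Daemon" /etc/init.d/pbs_sched
derive_initscript() {
    template=$1   # existing script, e.g. /etc/init.d/pbs_mom
    name=$2       # new daemon name, e.g. pbs_sched
    desc=$3       # new description, e.g. "PBS Scheduler Daemon"
    out=$4        # file to create
    sed -e "s|^NAME=.*|NAME=$name|" \
        -e "s|^DESC=.*|DESC=\"$desc\"|" \
        -e "s|^# Provides:.*|# Provides:          $name|" \
        -e "s|^# Short-Description:.*|# Short-Description: $desc|" \
        "$template" > "$out" && chmod 755 "$out"
}
```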
Now register the scripts with the runlevels (server first, then MOM, then scheduler):
Code:
update-rc.d pbs_server defaults 95
update-rc.d pbs_mom defaults 96
update-rc.d pbs_sched defaults 97
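To verify the registration, a sketch can look for the start links in /etc/rc2.d (Ubuntu's default runlevel). It assumes the Dapper-era update-rc.d behaviour of using the given sequence numbers as-is, so the links are named S95pbs_server, S96pbs_mom and S97pbs_sched; check_rc_links is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: report whether the runlevel-2 start links exist.
# Argument: root of the rc directories (default /etc). Returns 1 if any is missing.
check_rc_links() {
    etc=${1:-/etc}
    status=0
    for link in S95pbs_server S96pbs_mom S97pbs_sched; do
        if [ -L "$etc/rc2.d/$link" ]; then
            echo "ok: $etc/rc2.d/$link"
        else
            echo "missing: $etc/rc2.d/$link"
            status=1
        fi
    done
    return $status
}
```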
A sample qsub job script using LAM/MPI would be:
Code:
#!/bin/bash
#PBS -l ncpus=4
echo $PBS_JOBID
echo "Start time :"
date
lamboot
mpirun -np 4 your_mpi_command
echo "End Time :"
date
lamclean
lamhalt
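A job script like the one above can be generated for any CPU count and command. This is a hedged sketch: make_lam_job is a hypothetical name, and the #PBS -N job name and the cd to $PBS_O_WORKDIR (the directory qsub was called from) are additions to the original script.

```shell
#!/bin/sh
# Hypothetical helper: print a LAM/MPI job script.
# Usage: make_lam_job 4 ./my_mpi_program myjob > job.sh && qsub job.sh
make_lam_job() {
    ncpus=$1           # CPUs requested and MPI processes started
    cmd=$2             # MPI program (and arguments) to run
    name=${3:-lamjob}  # job name shown by qstat
    # Escaped \$ expressions are expanded when the job runs, not now.
    cat <<EOF
#!/bin/bash
#PBS -N $name
#PBS -l ncpus=$ncpus
cd "\$PBS_O_WORKDIR"
echo "Job \$PBS_JOBID started: \$(date)"
lamboot
mpirun -np $ncpus $cmd
lamclean
lamhalt
echo "Job \$PBS_JOBID ended: \$(date)"
EOF
}
```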
Last edited by avelldiroll; February 26th, 2007 at 11:22 AM. Reason: typos
February 26th, 2007 #2
Great that you explain how to configure autostart of the services! I had been looking all over for this, to no avail.
However, when I ran the setup script after installing, I got a lot of complaints about libtorque.so.0 not being found.
I had to copy the libs manually:
Code:
cp ./src/lib/Libpbs/.libs/libtorque.so.0 /usr/lib/libtorque.so.0
cp ./src/lib/Libpbs/.libs/libtorque.so.0.0.0 /usr/lib/libtorque.so.0.0.0
September 16th, 2009 #3
When I execute
Code:torque.setup ADMIN_NAME
I get:
Code:
./torque.setup: 31: pbs_server: not found
./torque.setup: 33: qmgr: not found
ERROR: cannot set TORQUE admins
./torque.setup: 39: qterm: not found
April 13th, 2010 #4
You must run torque.setup as root, and before that you have to add the Torque binaries and libraries to root's paths (root doesn't use the PATH variable of normal users), so that the script can find the pbs_server executable:
Code:
sudo su
export PATH="$PATH:/opt/torque-2.4.7/bin:/opt/torque-2.4.7/sbin"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/opt/torque-2.4.7/lib"
./torque.setup root localhost
May 18th, 2010 #5
This was exactly what I was looking for! Thanks a lot to the author for this post, and also to the commenter above for clearing the air about running ./torque.setup as root.
June 17th, 2010 #6
Here's a helpful item: http://www.clusterresources.com/pipe...ly/007737.html
I'm trying to install torque on an Ubuntu 8.04 Linux distribution, and would rather install from source than use the available deb package.
I first downloaded torque-2.3.1.tar.gz with wget and extracted it.
Then I ran the following sequence, per the quickstart guide:
./configure --disable-gcc-warnings
sudo make
sudo make install
All went well up to this point. Then I tried to run:
sudo ./torque.setup myuser
and got this:
initializing TORQUE (admin: myuser at localhost)
pbs_server: error while loading shared libraries: libtorque.so.2: cannot open shared object file: No such file or directory
qmgr: error while loading shared libraries: libtorque.so.2: cannot open shared object file: No such file or directory
ERROR: cannot set TORQUE admins
qterm: error while loading shared libraries: libtorque.so.2: cannot open shared object file: No such file or directory
============
answer:
By default Ubuntu doesn't include /usr/local/lib in the list of directories searched for dynamically linked libraries. So the best approach is to add that path to the end of /etc/ld.so.conf and run "ldconfig" to update the linker cache. Then try your "torque.setup" again.
============
My /etc/ld.so.conf file wound up being:
include /etc/ld.so.conf.d/*.conf
/usr/local/lib
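As an alternative to copying libtorque into /usr/lib or editing /etc/ld.so.conf itself, a library directory can go into a drop-in file under /etc/ld.so.conf.d, which the include line above already pulls in. This is a sketch: add_lib_dir is a hypothetical name and the file name torque.conf is an arbitrary choice; /opt/local/lib matches the --prefix used in the howto.

```shell
#!/bin/sh
# Hypothetical helper: register a library directory with the dynamic linker.
# Usage (as root): add_lib_dir /opt/local/lib && ldconfig
add_lib_dir() {
    libdir=$1
    confdir=${2:-/etc/ld.so.conf.d}
    conf=$confdir/torque.conf
    # Append the directory only once, so the function is safe to re-run.
    grep -qxF "$libdir" "$conf" 2>/dev/null || echo "$libdir" >> "$conf"
}
```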
June 25th, 2010 #7
Hello,
I installed Torque on my 'mini cluster' consisting of two machines: sambstor.che.wisc.edu with 2 processors and gordon.che.wisc.edu with one processor. Gordon runs both the MOM and the server, and sambstor runs only a MOM. I can submit jobs to the queue, but they always remain in state Q and never run. Has anyone had similar problems? Is my configuration wrong?
I always start PBS/Torque with:
Code:
qterm
pbs_server
pbs_sched
Code:
root@gordon:/var/spool/torque/server_priv# pbsnodes
gordon.che.wisc.edu
     state = free
     np = 1
     ntype = cluster
     status = opsys=linux,uname=Linux gordon 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 16 14:04:26 UTC 2009 i686,sessions=1382 1386 1387 1430 1448 27586 28676 28776 28858,nsessions=9,nusers=3,idletime=95,totmem=2719692kb,availmem=2619952kb,physmem=1024876kb,ncpus=1,loadave=0.02,netload=3201318378,state=free,jobs=,varattr=,rectime=1277484721

sambstor.che.wisc.edu
     state = free
     np = 2
     ntype = cluster
     status = opsys=linux,uname=Linux sambstor.che.wisc.edu 2.6.24-27-generic #1 SMP Fri Mar 12 01:10:31 UTC 2010 i686,sessions=5097 13098 28536 5611 14652 14734 14739 14743 14768 14779 14803 14806 15085 15071 16022 16109,nsessions=16,nusers=4,idletime=67974,totmem=4061992kb,availmem=3720692kb,physmem=1033780kb,ncpus=2,loadave=5.07,netload=2823186465,state=free,jobs=,varattr=,rectime=1277484764
Code:
root@gordon:/var/spool/torque/server_priv# cat nodes
gordon.che.wisc.edu np=1
sambstor.che.wisc.edu np=2
torque/server_priv# qmgr
Max open servers: 4
Qmgr: p s
#
# Create queues and set their attributes.
#
#
# Create and define queue standard
#
create queue standard
#
# Create and define queue medium
#
create queue medium
set queue medium queue_type = Execution
set queue medium Priority = 1000
set queue medium max_running = 10
set queue medium resources_max.walltime = 50:00:00
set queue medium resources_default.walltime = 00:30:00
set queue medium enabled = True
set queue medium started = True
#
# Set server attributes.
#
set server acl_hosts = gordon
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server submit_hosts = gordon.che.wisc.edu
set server next_job_number = 24
root@gordon:/var/spool/torque# cat mom_priv/
config  jobs/  mom.lock
root@gordon:/var/spool/torque# cat mom_priv/config
$pbsserver gordon.che.wisc.edu
$clienthost gordon.che.wisc.edu
$logevent 255
$cputmult 1.0
$wallmult 1.0
$max_load 1.0
$ideal_load 1.0
$restricted *.che.wisc.edu
root@gordon:/var/spool/torque# cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 gordon
.... more stuff ....
menduser@sambstor:~$ sudo cat /var/spool/torque/mom_priv/config
[sudo] password for menduser:
$pbsserver gordon.che.wisc.edu
$logevent 255
Last edited by moritzhoefert; June 25th, 2010 at 06:04 PM.
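When jobs stay in state Q, one first check is what pbsnodes reports for each node. This sketch condenses pbsnodes output into one line per node; it assumes the layout shown above (node name on an unindented line, followed by indented "attribute = value" lines), and summarize_pbsnodes is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: summarise pbsnodes output read on stdin.
# Usage: pbsnodes | summarize_pbsnodes
summarize_pbsnodes() {
    awk '
        /^[^[:space:]]/ { node = $1 }       # unindented line: node name
        $1 == "state"   { state[node] = $3 }
        $1 == "np"      { np[node] = $3 }
        END { for (n in state) printf "%s %s np=%s\n", n, state[n], np[n] }
    '
}
```

The order of the output lines is not guaranteed, so pipe through sort if it matters.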
November 21st, 2011 #8
Did anybody have this problem and figure out how to solve the issue of jobs stuck in the queue? I have installed Torque 2.5.5 with mostly default settings, and I am using pbs_sched. The pbs_server seems to run OK, sees the nodes and marks them as free. After submitting a job, it stays in the queue and does not run. The server_logs file says the job has been submitted by the user, but the sched_logs say nothing about any job! I guess somehow there is no communication between the scheduler and the server, and I cannot figure out how to solve this! The qstat -f, pbsserverdb and qmgr -c "p s" commands don't show anything unusual.
Thanks!
November 21st, 2011 #9
OK, with Torque 2.5.5, the problem of jobs stuck in the queue while everything else seemed to work turned out, in my case, to be that pbs_server was set to Idle. Check with qmgr -c "list server". Once the server's scheduling attribute is set to True, the server starts interacting with the PBS scheduler and the queued jobs start running. I don't know whether I inadvertently set the server to Idle or whether that is the default after installation.
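The check described in this reply can be scripted. This is a sketch assuming that qmgr -c "list server" prints an indented "scheduling = True" line when scheduling is on; scheduling_enabled is a hypothetical name, while set server scheduling = True is the qmgr command for switching it on.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the `qmgr -c "list server"` output
# read on stdin reports scheduling = True.
# Usage (as a Torque manager):
#   qmgr -c "list server" | scheduling_enabled || qmgr -c "set server scheduling = True"
scheduling_enabled() {
    grep -Eq '^[[:space:]]*scheduling = True[[:space:]]*$'
}
```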