I updated this on July 21 to incorporate the comments below.
Torque is a batch job queuing system that is used on clusters, but I find it handy on my multi-core workstation as well. It lets multiple users schedule the jobs they need to run. The scheduler makes sure that not too many jobs run simultaneously, which could cause high system load or memory issues.
I previously posted how to install torque on ubuntu hardy from the torque source package. However, torque is now in the repositories of lucid and here are the steps that I had to take to get it to work on my workstation.
For this setup I kept the server host name 'torqueserver', which is the default in the package. You can do the same or use a fully qualified domain name. In that case, you will have to adapt the steps somewhat.
My workstation has 8 cores, and I only want to give 6 of them to the queue. Please adapt your numbers accordingly.
0) open root terminal
Code:sudo -s
1) add torqueserver to /etc/hosts
Code:gedit /etc/hosts
change "127.0.1.1 myHostName" to "127.0.1.1 myHostName torqueserver"
2) install torque from repositories
Code:apt-get install torque*
Code:qterm
Code:ps aux | grep pbs
Code:mkdir /var/lib/torque/server_priv/arrays
Code:echo "SERVERHOST localhost" >> /var/lib/torque/torque.cfg
Code:
echo "torqueserver np=8" >> /var/lib/torque/server_priv/nodes
echo "pbs_server = 127.0.1.1" >> /var/lib/torque/mom_priv/config
Code:pbs_server -t create
Code:
qmgr torqueserver
create queue batch
set queue batch queue_type = Execution
set queue batch max_running = 6
set queue batch resources_max.ncpus = 8
set queue batch resources_max.nodes = 1
set queue batch resources_default.ncpus = 1
set queue batch resources_default.neednodes = 1:ppn=1
set queue batch resources_default.walltime = 24:00:00
set queue batch max_user_run = 6
set queue batch enabled = True
set queue batch started = True
set server default_queue = batch
set server scheduling = True
exit
Code:
qterm
pbs_server
pbs_sched #this will give some warning about missing files
pbs_mom
Code:pbsnodes -a
Code:
exit
qstat -q
echo "sleep 30" | qsub
qstat
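The one-line `echo "sleep 30" | qsub` test can also be written as a small job script. The sketch below is only an illustration: the file path, the job name and the one-minute walltime are assumptions, not part of the setup above. The `#PBS` lines use the usual qsub options `-N` (job name), `-q` (queue), `-l` (resource list) and `-j oe` (merge stderr into stdout).

```shell
#!/bin/sh
# Write a small test job to a file. Path and job name are made up for illustration.
job_file="${TMPDIR:-/tmp}/torque_test_job.sh"

cat > "$job_file" <<'EOF'
#!/bin/sh
#PBS -N torque_test
#PBS -q batch
#PBS -l walltime=00:01:00
#PBS -j oe
# PBS_O_WORKDIR is set by Torque to the directory qsub was called from;
# fall back to the current directory when the script is run by hand.
cd "${PBS_O_WORKDIR:-.}" || exit 1
echo "job started on $(hostname)"
sleep 30
echo "job finished"
EOF

echo "wrote $job_file"
# Submit as a normal user (not root) once the daemons are running:
#   qsub "$job_file"
#   qstat
```

With `-j oe` the job's output and error streams should end up in one file, named after the job, in the directory you submitted from.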
This works for me but probably requires more configuration in a demanding computing environment. Check out the torque website for more queue configurations, user management, etc.
Last edited by jouke.postma; July 21st, 2011 at 04:47 PM. Reason: incorporating comment below
-
July 8th, 2010 #2
Re: How to Torque on ubuntu 10.04 on a single multicore machine
Thank you, jouke.postma ! In case someone else needs to repeat it, here is my experience with your instructions.
Messing with my /etc/hosts file somehow made me lose my internet connection, but I was able to set up torque with the existing hostname. One extra issue I found is that the file /var/lib/torque/server_name may need to be edited if you end up changing the name of the host.
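Since a bad /etc/hosts edit can break name resolution, it may help to make the change with a small script that keeps a backup and touches only one line. This is a rough sketch, not something from the original instructions: `add_alias` is a made-up helper name, it relies on GNU sed's `-i` option, and the demo runs on a scratch file rather than the real /etc/hosts.

```shell
#!/bin/sh
# add_alias FILE ADDRESS ALIAS   (hypothetical helper, for illustration only)
# Append ALIAS to the line of FILE that starts with ADDRESS, unless some
# line already contains ALIAS as a whole word. A backup is kept as FILE.bak.
add_alias() {
    file="$1"; addr="$2"; alias_name="$3"
    if grep -qw -- "$alias_name" "$file"; then
        echo "$alias_name already present in $file"
        return 0
    fi
    cp -p -- "$file" "$file.bak" || return 1
    # Escape the dots so the address is matched literally.
    addr_re=$(printf '%s' "$addr" | sed 's/\./\\./g')
    # Replace any trailing whitespace on the matching line with " ALIAS".
    sed -i "/^${addr_re}[[:space:]]/ s/[[:space:]]*\$/ ${alias_name}/" "$file"
}

# Try it on a scratch copy first, not on the real /etc/hosts.
demo=$(mktemp)
printf '127.0.0.1 localhost\n127.0.1.1 myHostName\n' > "$demo"
add_alias "$demo" 127.0.1.1 torqueserver
cat "$demo"
```

On the real file this would be run as root, for example `add_alias /etc/hosts 127.0.1.1 torqueserver`. If anything goes wrong, /etc/hosts.bak holds the previous contents.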
The contents of my working mom_priv/config file are slightly different (per torque documentation):
Code:
drlemon@lynx-desktop:~$ cat /var/lib/torque/mom_priv/config
$pbsserver lynx-desktop
$logevent 255
-
October 21st, 2010 #3
Thank you for this very helpful guide to getting started!
However, following the queue setup exactly in the 'qmgr' stage, I could only ever run a single job at a time on my multi-core machine. A simpler setup allowed multiple jobs to run at once:
Code:
$ qmgr
Max open servers: 4
Qmgr: list queue batch
Queue batch
    queue_type = Execution
    total_jobs = 0
    state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0
    max_running = 6
    resources_max.ncpus = 8
    resources_default.ncpus = 1
    mtime = Thu Oct 21 18:35:37 2010
    resources_assigned.ncpus = 0
    resources_assigned.nodect = 0
    enabled = True
    started = True
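The listing above boils down to only a handful of qmgr settings. As a sketch, the function below prints the directives that should produce such a queue. `simple_queue_commands` is a made-up helper name, and the two `set server` lines are carried over from the first post rather than taken from this listing. qmgr reads directives from standard input, so the output can be piped into it.

```shell
#!/bin/sh
# simple_queue_commands MAX_JOBS MAX_CPUS   (hypothetical helper)
# Print qmgr directives for a minimal execution queue called "batch".
simple_queue_commands() {
    max_jobs="$1"
    max_cpus="$2"
    cat <<EOF
create queue batch
set queue batch queue_type = Execution
set queue batch max_running = ${max_jobs}
set queue batch resources_max.ncpus = ${max_cpus}
set queue batch resources_default.ncpus = 1
set queue batch enabled = True
set queue batch started = True
set server default_queue = batch
set server scheduling = True
EOF
}

# Print the directives for 6 concurrent jobs on an 8-core machine.
simple_queue_commands 6 8

# To apply them (as root, with pbs_server running):
#   simple_queue_commands 6 8 | qmgr
```

If a queue named batch already exists, the `create` line will presumably be rejected, so on an existing setup it may be easier to delete the queue first or to drop that line.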
-
June 17th, 2011 #4
Dear all
I have a brand new installation of torque 2.4.8, and I am currently testing torque and maui for my main cluster. My boss did not let me test on the actual cluster, so I used a simple linux box, installed talk server and talk client on both, and installed the latest snapshot version of maui. maui and torque can talk perfectly; I can see that maui identifies the resources when I do showq:
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
0 Active Jobs 0 of 2 Processors Active (0.00%)
0 of 1 Nodes Active (0.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
0 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
But I do not get the output and error files for any job that I submit, and when I do a tracejob I get the following:
06/16/2011 19:13:45 S enqueuing into batch, state 1 hop 1
06/16/2011 19:13:45 S Job Queued at request of milton@milton-desktop, owner = milton@milton-desktop, job name =
ExampleJob, queue = batch
06/16/2011 19:13:45 S Job Modified at request of Scheduler@milton-desktop
06/16/2011 19:13:45 S Email 'b' to milton@cs.wits.ac.za failed: Child process '/usr/sbin/sendmail -f
milton@cs.wits.ac.za milton@cs.wits.ac.za ' returned 127 (errno 10:No child processes)
06/16/2011 19:13:45 L Job Run
06/16/2011 19:13:45 S Job Run at request of Scheduler@milton-desktop
06/16/2011 19:13:45 S Reject reply code=15001(Unknown Job Id), aux=0, type=JobObituary, from pbs_mom@milton-desktop
06/16/2011 19:13:45 M job was terminated
06/16/2011 19:13:45 M obit sent to server
06/16/2011 19:13:45 A queue=batch
06/16/2011 19:13:45 M scan_for_terminated: job 52.milton-desktop task 1 terminated, sid=29831
06/16/2011 19:13:45 M server rejected job obit - 15001
06/16/2011 19:13:45 A user=milton group=milton jobname=ExampleJob queue=batch ctime=1308244425 qtime=1308244425
etime=1308244425 start=1308244425 owner=milton@milton-desktop exec_host=torqueserver/0
Resource_List.ncpus=1 Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1
Resource_List.walltime=00:01:00
06/16/2011 19:14:22 A 06/16/2011 19:14:22 S dequeuing from batch, state EXITING
06/16/2011 19:14:22 S Email 'a' to milton@cs.wits.ac.za failed: Child process '/usr/sbin/sendmail -f
milton@cs.wits.ac.za milton@cs.wits.ac.za ' returned 127 (errno 10:No child processes)
I googled everything about the error and could not find a solution. This happens to every job I submit. I would really appreciate it if any of you could help me with that.
Thank you very much
Milton Lauxande
-
February 15th, 2012 #5
Nice post, but I can't get it to work under Oneiric. Is anything missing in your post here? It seems that the config in /var/lib/torque/ is not picked up by torque:
pbsnodes -a
pbsnodes: Server has no node list MSG=node list is empty - check 'server_priv/nodes' file
after going through your howto.
Rainer
-
March 4th, 2012 #6
I can't get it to work on 11.10 either. qterm returns the error "qterm: Unauthorized request". In addition, the package is installed to /var/spool/torque instead of /var/lib, but it's not like I got far enough along for that to be an issue.
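Because the directory differs between releases, the paths from the howto may need adjusting before the nodes file is picked up. Here is a small sketch that looks for whichever of the two locations mentioned in this thread exists. `find_torque_home` is a made-up helper name, and the check for a `server_priv` subdirectory is an assumption about how to recognise the right place.

```shell
#!/bin/sh
# find_torque_home DIR...   (hypothetical helper)
# Print the first DIR that contains a server_priv subdirectory.
find_torque_home() {
    for dir in "$@"; do
        if [ -d "$dir/server_priv" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

if torque_home=$(find_torque_home /var/spool/torque /var/lib/torque); then
    echo "Torque home: $torque_home"
    # The nodes file may only be readable by root.
    ls -l "$torque_home/server_priv/nodes" 2>/dev/null \
        || echo "no readable nodes file in $torque_home/server_priv"
else
    echo "no Torque directory found in the two candidate locations"
fi
```

Whatever directory it reports is the one to substitute for /var/lib/torque in the howto's `echo ... >>` lines.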
Last edited by GenericPlayer; March 4th, 2012 at 08:04 PM.
-
March 27th, 2012 #7
I added a reply to this post that might be useful in sorting out the "Unauthorized request" issue:
http://ubuntuforums.org/showthread.p...4#post11797874
In short, make sure /etc/torque/server_name is the same as /etc/hostname.
-
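That comparison can be scripted. The sketch below only checks that the first line of the two files is literally identical. `names_match` is a made-up helper name, and the paths are the ones named in this post (the first post used /var/lib/torque instead).

```shell
#!/bin/sh
# names_match FILE_A FILE_B   (hypothetical helper)
# Return 0 if the first line of both files is identical, 1 if it differs,
# and 2 if either file cannot be read.
names_match() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    a=$(head -n 1 "$1")
    b=$(head -n 1 "$2")
    [ "$a" = "$b" ]
}

status=0
names_match /etc/torque/server_name /etc/hostname || status=$?
case $status in
    0) echo "server_name matches hostname" ;;
    1) echo "server_name and hostname differ" ;;
    *) echo "could not read one of the files" ;;
esac
```

A setup that uses an alias such as torqueserver in /etc/hosts will be reported as different here even if the alias resolves correctly, so treat the result as a hint, not a verdict.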
May 10th, 2012 #8
I'm not sure if that's completely true;
The [resources_default.ncpus=1] setting gives the default resources for any job submitted to the queue.
The [max_running = 6] setting gives the number of jobs allowed to run simultaneously in the queue.
If you want a single user to be able to run multiple simultaneous jobs, you need the previous setting and also need [max_user_run = #] set.
This should let you submit and simultaneously run multiple jobs.
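To see which of these limits are actually set on a queue, you can filter the `list queue` output. This is a sketch only: `queue_attr` is a made-up helper that reads the listing on standard input, and the sample text is a shortened version of the listing posted earlier in this thread.

```shell
#!/bin/sh
# queue_attr NAME   (hypothetical helper)
# Read "list queue" output on standard input and print the value of NAME.
# Exit status is 1 if the attribute does not appear in the listing.
queue_attr() {
    awk -v name="$1" '
        { line = $0; sub(/^[ \t]+/, "", line) }
        index(line, name " = ") == 1 {
            # Skip the name plus the three characters of " = ".
            print substr(line, length(name) + 4)
            found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

sample='Queue batch
    queue_type = Execution
    max_running = 6
    resources_default.ncpus = 1'

printf '%s\n' "$sample" | queue_attr max_running

# On a live system (assuming qmgr is on your PATH and you may query the server):
#   qmgr -c "list queue batch" | queue_attr max_user_run
```

For the sample text this prints 6. An attribute that is not set prints nothing and gives exit status 1, which is how a missing max_user_run would show up.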
-
May 13th, 2012 #9
Your tutorial is great - really helped get things going.
Can I configure an up-and-running installation of torque/pbs to send email notifications, or do I have to uninstall it all (via aptitude) and install it from their tarball?