Warning
You need to add the batch_system.repo file to the master server and compute nodes! Do NOT install the older torque packages from CentOS!
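For reference, a yum repository definition generally looks like the sketch below; the repository id, name, and especially the baseurl are placeholders and must match the batch_system.repo file provided for this course.
# /etc/yum.repos.d/batch_system.repo  (sketch; baseurl is a placeholder)
[batch_system]
name=Torque batch system packages
baseurl=http://REPLACE-WITH-COURSE-REPO-URL/
enabled=1
gpgcheck=0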
Our Torque server will be master. It will not run any calculations itself and will also serve as a login node for users. This is a common simple setup, which we do not recommend in real life. More elaborate configurations use dedicated machines for running the Torque server and dedicated login machines to submit jobs.
[root@master ~]# yum install torque-server
This will install two services:
trqauthd
manages authentication of all Torque client applications with the Torque server
pbs_server
Torque Server, also known as PBS Server or Resource Manager
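As an optional sanity check, you can confirm that systemd unit files for both daemons are now present on the system:
[root@master ~]# systemctl list-unit-files | grep -E 'pbs_server|trqauthd'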
The RPM install will do some initialization for us, but we want to override this and start all daemons in the systemd way.
First, kill any running pbs_server processes:
# stop any running instances of pbs_server
[root@master ~]# systemctl stop pbs_server
[root@master ~]# qterm
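If you want to be sure nothing is left over, pgrep should return no output once all pbs_server processes are gone:
# no output means no pbs_server process is still running
[root@master ~]# pgrep -a pbs_server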
Set the PBS server host name by writing it to /var/spool/torque/server_name:
[root@master ~]# echo master.hpc > /var/spool/torque/server_name
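You can verify the file content before continuing:
[root@master ~]# cat /var/spool/torque/server_name
master.hpc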
Initialize the PBS server database by running the torque.setup utility. The first parameter is the username of the management user and the second is the Torque server name.
# initialize the pbs_server database
[root@master ~]# /usr/share/doc/torque-server-6.1.3/torque.setup root master.hpc
# stop pbs_server after this command completes
[root@master ~]# qterm
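If the setup succeeded, the server database should now exist on disk (the path below assumes the default /var/spool/torque spool directory):
[root@master ~]# ls -l /var/spool/torque/server_priv/serverdb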
Note
If there is an error running the torque.setup command, change your hostname using the command hostname master and try again.
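Keep in mind that hostname master only changes the hostname for the current session; on a systemd-based system such as CentOS you can make the change persistent with hostnamectl:
# persistent alternative to the temporary hostname command
[root@master ~]# hostnamectl set-hostname master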
Finally, restart the trqauthd service and start the pbs_server service via systemctl.
[root@master ~]# systemctl enable pbs_server
[root@master ~]# systemctl enable trqauthd
[root@master ~]# systemctl restart trqauthd
[root@master ~]# systemctl start pbs_server
# verify both services are running
[root@master ~]# systemctl status trqauthd
[root@master ~]# systemctl status pbs_server
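You can additionally check that pbs_server is listening on the network; in a stock Torque configuration it listens on TCP port 15001:
[root@master ~]# ss -tlnp | grep pbs_server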
If you successfully installed and configured the PBS server, you will be able to issue the following command to see the current configuration:
[root@master ~]# qmgr -c 'p s'
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.walltime = 01:00:00
set queue batch resources_default.nodes = 1
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = master.hpc
set server managers = root@master.hpc
set server operators = root@master.hpc
set server default_queue = batch
set server log_events = 2047
set server mail_from = adm
set server node_check_rate = 150
set server tcp_timeout = 300
set server job_stat_rate = 300
set server poll_jobs = True
set server down_on_error = True
set server mom_job_sync = True
set server keep_completed = 300
set server next_job_number = 0
set server moab_array_compatible = True
set server nppcu = 1
set server timeout_for_job_delete = 120
set server timeout_for_job_requeue = 120
set server note_append_on_error = True
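Any of these attributes can be adjusted later through the same qmgr interface; purely as an illustration, this is how you would raise the default walltime of the batch queue to two hours:
[root@master ~]# qmgr -c 'set queue batch resources_default.walltime = 02:00:00'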
Running qstat -q will also show that there is a single batch queue.
[root@master x86_64]# qstat -q
server: master.hpc
Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
batch              --      --       --      --    0   0 --   E R
                                                ----- -----
                                                    0     0
Only a limited list of hosts will be granted the right to submit new jobs. This is controlled by the submit_hosts server variable. Since we only have a single master server that should also act as the login server for users, let's add master.hpc as a submit host.
[root@master ~]# qmgr -c 'set server submit_hosts = master.hpc'
# assuming there are multiple login nodes, this is how you would append to a list variable
# qmgr -c 'set server submit_hosts += login1.hpc'
# qmgr -c 'set server submit_hosts += login2.hpc'
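You can confirm the setting by printing the server configuration again and filtering for submit_hosts:
[root@master ~]# qmgr -c 'p s' | grep submit_hosts
set server submit_hosts = master.hpc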
https://www.hpc.temple.edu/mhpc/2021/hpc-technology/exercise8/server.html