Warning

You need to add the batch_system.repo file to the master server and compute nodes! Do NOT install the older torque packages from CentOS!
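
The exact contents of batch_system.repo are provided with the course material. As a rough sketch only, a local yum repository definition generally looks like the following; the baseurl shown here is a placeholder, not the real repository location:

# /etc/yum.repos.d/batch_system.repo (placeholder values)
[batch_system]
name=Batch system packages (Torque)
baseurl=http://repo.example.hpc/batch_system/el7/x86_64/
enabled=1
gpgcheck=0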

Our Torque server will be master. It will not run calculations, but it will also serve as a login node for users. This is a common, simple setup, which we do not recommend in real life. More elaborate configurations use dedicated machines for running the Torque server and dedicated login machines to submit jobs.

[root@master ~]# yum install torque-server

This will install two services:

trqauthd

manages authentication of all Torque client applications with the Torque server

pbs_server

Torque Server, also known as PBS Server or Resource Manager

The RPM installation performs some initialization for us, but we want to override this and start all daemons the systemd way.

First, stop any running pbs_server processes:

# stop any running instances of pbs_server
[root@master ~]# systemctl stop pbs_server
[root@master ~]# qterm

Set the PBS server host name by writing it to /var/spool/torque/server_name:

[root@master ~]# echo master.hpc > /var/spool/torque/server_name

Initialize the PBS server database by running the torque.setup utility. The first parameter is the username of the management user and the second is the Torque server name.

# initialize the pbs_server database
[root@master ~]# /usr/share/doc/torque-server-6.1.3/torque.setup root master.hpc

# stop pbs_server after this command completes
[root@master ~]# qterm
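
The path above contains the packaged version (6.1.3); if a different torque-server version is installed, the directory name changes accordingly. One way to locate the script regardless of version is to query the installed package:

# locate torque.setup inside the installed torque-server package
[root@master ~]# rpm -ql torque-server | grep torque.setup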

Note

If there is an error running the torque.setup command, change your hostname using the command hostname master and try again.
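
A minimal sketch of that recovery, assuming the short hostname master is what torque.setup expects:

[root@master ~]# hostname master
[root@master ~]# /usr/share/doc/torque-server-6.1.3/torque.setup root master.hpc
[root@master ~]# qterm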

Finally, enable both services, restart trqauthd, and start the pbs_server service via systemctl.

[root@master ~]# systemctl enable pbs_server
[root@master ~]# systemctl enable trqauthd

[root@master ~]# systemctl restart trqauthd
[root@master ~]# systemctl start pbs_server

# verify both services are running
[root@master ~]# systemctl status trqauthd
[root@master ~]# systemctl status pbs_server
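
As an additional sanity check, you can verify that pbs_server is listening on its default TCP port (15001, assuming it was not changed):

# pbs_server listens on TCP port 15001 by default
[root@master ~]# ss -tlnp | grep 15001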

If you successfully installed and configured the PBS server, you will be able to issue the following command to see the current configuration:

[root@master ~]# qmgr -c 'p s'
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.walltime = 01:00:00
set queue batch resources_default.nodes = 1
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = master.hpc
set server managers = root@master.hpc
set server operators = root@master.hpc
set server default_queue = batch
set server log_events = 2047
set server mail_from = adm
set server node_check_rate = 150
set server tcp_timeout = 300
set server job_stat_rate = 300
set server poll_jobs = True
set server down_on_error = True
set server mom_job_sync = True
set server keep_completed = 300
set server next_job_number = 0
set server moab_array_compatible = True
set server nppcu = 1
set server timeout_for_job_delete = 120
set server timeout_for_job_requeue = 120
set server note_append_on_error = True
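
The dump above uses the same syntax that qmgr accepts as input, so individual settings can be changed with set commands. As a hypothetical example (not part of this exercise), raising the default walltime of the batch queue would look like this:

# example only: change the default walltime of the batch queue to 24 hours
[root@master ~]# qmgr -c 'set queue batch resources_default.walltime = 24:00:00'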

Running qstat -q will also show that there is a single batch queue.

[root@master x86_64]# qstat -q

server: master.hpc

Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
batch              --      --       --      --    0   0 --   E R
                                               ----- -----
                                                   0     0

Only a limited list of hosts is granted the right to submit new jobs. This is controlled by the submit_hosts server attribute. Since we only have a single master server, which also acts as the login server for users, let's add master.hpc as a submit host.

[root@master ~]# qmgr -c 'set server submit_hosts = master.hpc'

# assuming there are multiple login nodes, this is how you would append to a list attribute
# qmgr -c 'set server submit_hosts += login1.hpc'
# qmgr -c 'set server submit_hosts += login2.hpc'
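
To double-check that the attribute was stored, print the server configuration again and look for submit_hosts:

# verify the submit_hosts setting
[root@master ~]# qmgr -c 'p s' | grep submit_hosts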

https://www.hpc.temple.edu/mhpc/2021/hpc-technology/exercise8/server.html 
