High Performance Computing Center of CSU .edu.cn

Batch Job Management with Torque/OpenPBS

The batch system on titan uses OpenPBS, a free customizable batch system. Jobs are submitted by users with qsub from titan.physics.umass.edu, and are scheduled to run using a fair-share algorithm which prioritizes jobs based on all users' recent cpu utilization. This means, for example, that a user who wants to run a few quick jobs will not have to wait for another user who has hundreds of 10-hour long jobs already in the system.

Submitting jobs with qsub

A job, at a minimum, consists of an executable shell script. The script can run other executables or do just about anything you can do from within an interactive shell. The PBS system runs your script on a batch node as you. By default, Torque/OpenPBS writes all files with permissions that allow only you to read them. To let everyone else read the files your batch jobs create, add the following directive to your shell script as the second line, immediately after the first line containing #!/bin/zsh (or whatever yours is) that invokes the shell:

#PBS -W umask=022
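
For example, a minimal job script using this directive might start like this (the filename and the echo line are only illustrative):

#!/bin/zsh
#PBS -W umask=022

# rest of the job; anything you could run interactively works here
echo "job ran on $(hostname)"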

The simplest invocation needed to send a single job myjob.zsh to the batch system is:

> qsub myjob.zsh

All jobs should specify how much CPU time they need; otherwise they run by default in the express queue, which has a CPU time limit of just a few hours. To specify job resource requirements (e.g. time, memory, etc.), use the '-l' option of qsub.

To send a job myjob.csh requesting 8 hours of CPU time, use:

> qsub -l cput=08:00:00 myjob.csh
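
Other resources can be requested the same way. As a sketch, a job asking for both 8 hours of CPU time and 2 GB of memory might look like the following (the exact resource names and limits available depend on the server configuration):

> qsub -l cput=08:00:00 -l mem=2gb myjob.csh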

List Job Identifiers

Print a list of job identifiers of all jobs in the system by user bbrau:

> qselect -u bbrau
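
qselect can filter on other attributes as well; for example, to list only bbrau's currently running jobs (the -s option takes a job state letter such as R for running or Q for queued):

> qselect -u bbrau -s R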

Query the system with qstat

List all jobs:

> qstat

and which nodes they're running on:

> qstat -n

Full details on all jobs:

> qstat -f

or of just one job, using its job identifier:

> qstat -f 1908.titan.physics.umass.edu

Learn about the batch system with qmgr

Print the server configuration:

> qmgr -c 'print server'

Find out about node titan12:

> qmgr -c 'print node titan12'

Node Status with qnodes

List them all:

> qnodes

or just the ones that aren't up:

> qnodes -l

Delete jobs with qdel

> qdel 1908.titan.physics.umass.edu

or use qselect to pick out all of your jobs:

> qdel `qselect -u bbrau`
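
An equivalent form, if you prefer xargs, is:

> qselect -u bbrau | xargs qdel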

All the gory details are in The OpenPBS Administrator's Guide: sciencegrid.org/public/pbs/pbs.v2.3_admin.pdf

And of course on titan, you can read the man pages for most of the commands:

> man qstat



Job Scheduling with Torque

Introduction

As part of URC’s efforts to provide users with a more user-friendly and efficient environment, we are in the process of transitioning our job scheduler from Condor to Torque/Maui. Torque is an Open Source scheduler based on the old PBS scheduler code.

The following is a set of directions to assist a user in learning to use Torque to submit jobs to the URC cluster(s). It is tailored specifically to the URC environment and is by no means comprehensive. Details not found here can be found online at:

sterresources.com/resources/documentation.php/

as well as in the man pages for the various commands.

Note:
Some of the sample scripts displayed in the text are not complete so that the reader can focus specifically on the item being discussed. Full, working examples of scripts and commands are provided in the Examples section at the end of this document.

Configuration

Before submitting jobs, it is important to understand how the compute clusters are laid out in terms of Torque scheduling.

Like the Condor configuration it is replacing, Torque at URC will accept jobs submitted from three hosts:

submit.urc.uncc.edu (General URC users)
mees.urc.uncc.edu (MEES users)
mees10.urc.uncc.edu (MEES users – Dr Uddin)

As with Condor, users will use SSH to connect to one of the hosts above and from there issue the various Torque-specific commands outlined below. Submission of jobs to Torque will also be supported by the URC portal.

All jobs submitted from the submit hosts are funneled to the URC Torque server (m03), which runs the Torque and Maui server processes.

Compute nodes in the cluster(s) are logically grouped and accessed via Torque “queues.” Users and groups of users are granted the rights to submit jobs to specific queues and hence run on specific nodes. The currently defined queues are:

batch - Default queue. (Disabled)
urc - queue for the general URC users.
mees - queue for the MEES users.
mees10 - queue for MEES users (Dr. Uddin)
wrf - queue for the WRF project group

To determine if a specific user has been granted rights to submit to a particular queue, use the Torque command qmgr:

$ qmgr -c "list queue QNAME acl_users" m03

where QNAME is one of the queues defined above. Note that all users have access to submit to the “urc” queue. The access control list is NOT enabled for that queue.
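
For example, to check which users may submit to the mees queue (substitute any queue name from the list above):

$ qmgr -c "list queue mees acl_users" m03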

Submitting a Job

Scheduling a job in Torque is similar to the method used in URC’s previous scheduler (Condor). It requires creating a file that describes the job (in this case a shell script); that file is then given as an argument to the Torque command “qsub”, which executes the job.

First of all, here is a sample shell script (myjob.sh) describing a simple job to be submitted:

#! /bin/bash

# ==== Main ======

/bin/date

This script simply runs the ‘date’ command. To submit it to the scheduler for execution, we use the Torque qsub command:

$ qsub myjob.sh

This will cause the script (and hence the date command) to be executed on the default queue.

The simple example above can be expanded to demonstrate additional options:

$ qsub -N "MyJob" -q "urc" my_script.sh

In this example, the “-N” switch gives the job a name, while the “-q” switch routes the job to the “urc” queue.

Many of the command line options to qsub can also be specified in the shell script itself using Torque (PBS) directives. Using the previous example, our script (my_script.sh) could look like the following:

#!/bin/sh

# ===== PBS OPTIONS =====

### Set the job name

#PBS -N "MyJob"

### Run in the queue named "urc"

#PBS -q "urc"

# ==== Main ======

/bin/date

Running the command is now simply:

$ qsub my_script.sh

For the entire list of options, see the qsub man page, i.e.:

$ man qsub

Standard Output and Standard Error
In Torque, any output that would normally print to stdout or stderr is collected into two files. By default these files are placed in the working directory from which you submitted the job and are named:

scriptname.oJOBID for stdout
scriptname.eJOBID for stderr

In our previous example (if we did not specify a job name with -N), that would translate to:

my_script.sh.oNNN
my_script.sh.eNNN

Where NNN is the job ID number returned by qsub. If the job is named with -N (as above) and assigned job id 801, the files would be:

MyJob.o801
MyJob.e801

Note:
Torque accomplishes this by buffering the output on the execution host until the job completes and then copies it back to the working directory. An unfortunate circumstance of this is that if your job does write information to stdout or stderr, you would be unable to view it until the job completes.

The qsub command does have an option (-k) to “keep” the files from being buffered and spooled in this manner. Unfortunately, it will only allow the files to be created in the user’s home directory ($HOME) rather than in the working directory. This is not very convenient in most cases.

To avoid this type of problem, URC suggests that (rather than using the -k option) you explicitly redirect stdout and stderr to files within your shell script (my_script.sh). There are many ways to do this in a shell script. Some common ways are:

#! /bin/bash

exec 1>$PBS_O_WORKDIR/out 2>$PBS_O_WORKDIR/err

which will direct stdout and stderr to the current working directory of the qsub command.

Another method is to enclose the body of the script in curly braces {} and redirect it:

#! /bin/bash

{

/bin/date

} > $PBS_O_WORKDIR/out 2>$PBS_O_WORKDIR/err

See the examples below for a more detailed example.

Monitoring a Job

Monitoring a Torque job is done primarily using the Torque command “qstat.” For instance, to see a list of available queues:

$ qstat -q

To see the status of a specific queue:

$ qstat "queuename"

To see the full status of a specific job:

$ qstat -f jobid

where jobid is the unique identifier for the job returned by the qsub command.
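
Because qsub prints the job identifier when it accepts a job, that identifier can be captured in a shell variable and reused for monitoring; a small sketch (the script and queue names are illustrative):

$ JOBID=$(qsub -q urc my_script.sh)
$ qstat -f $JOBID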

Deleting a Job

To delete a Torque job after it has been submitted, use the qdel command:

$ qdel jobid

where jobid is the unique identifier for the job returned by the qsub command.

Monitoring Compute Nodes

To see the status of the nodes associated with a specific queue, use the Torque command pbsnodes(1) (also referred to as qnodes):

$ pbsnodes :queue_name

where queue_name is the name of the queue prefixed by a colon (:). For example:

$ pbsnodes :urc

would display information about all of the nodes associated with the “urc” queue. The output includes (for each node) the number of cores available (np= ). If there are jobs running on the node, each one is listed in the (jobs= ) field. This shows how many of the available cores are actually in use.
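
Per the pbsnodes synopsis shown later in this document, the -l option can be combined with the same property syntax to list only the nodes in that group that are down, offline, or unknown, for example:

$ pbsnodes -l :urc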

Parallel (MPI) Jobs

Parallel jobs are submitted to Torque in the manner described above except that you must first ask Torque to reserve the number of processors (cores) you are requesting in your job. This is accomplished using the -l switch to the qsub command:

For example:

$ qsub -q urc -l nodes=16 my_script.sh

would submit my script requesting 16 processors (cores) from the “urc” queue. The script (my_script.sh) would look something like the following:

#! /bin/bash

mpirun -hostfile $PBS_NODEFILE my_mpi_program

If you need to specify a specific number of processors (cores) per compute host, you can append a colon (:) to the number of nodes, followed by the number of processors per host (ppn). For example, to request 16 total processors (cores) with only 4 per compute host, the syntax would be:

$ qsub -q urc -l nodes=4:ppn=4 my_script.sh

In this example, mpirun is using the environment variable $PBS_NODEFILE as the path to the hosts file that contains the list of nodes the MPI job can run on. This variable is automatically set by Torque, and the file it points to is automatically populated as a result of the "-l nodes=16" argument given to the qsub command.

Note that the syntax of the mpirun command varies across MPI implementations. The $PBS_NODEFILE is set by Torque and so can be used by any implementation of mpirun, provided that the proper syntax is used. The example above (--hostfile) is for OpenMPI.
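
For comparison, a sketch of what the equivalent lines might look like inside the job script with an MPICH-style mpirun (the program name is illustrative; check the flags against your MPI installation's documentation):

# Determine how many processors Torque allocated by counting lines in the node file
NPROCS=$(wc -l < $PBS_NODEFILE)

# MPICH-style launch: -machinefile gives the host list, -np the process count
mpirun -machinefile $PBS_NODEFILE -np $NPROCS my_mpi_program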

As described previously, options to qsub can be specified directly in the script file. For the example above, my_script.sh would look similar to the following:

#! /bin/bash

# ===== PBS OPTIONS =====

### Set the job name

#PBS -N MyJob

### Run in the queue named "urc"

#PBS -q urc

### Specify the number of cpus for your job.

#PBS -l nodes=4:ppn=4

mpirun -hostfile $PBS_NODEFILE my_mpi_program

Examples of Torque Submit Scripts

NOTE: Additional sample scripts can be found online in /apps/torque/examples.

[1] Simple Job

#! /bin/bash

# ===== PBS OPTIONS =====

### Set the job name

#PBS -N MyJob

### Run in the queue named "urc"

#PBS -q urc

# ===== END PBS OPTIONS =====

# Redirect standard out and standard error.

exec 1>$PBS_O_WORKDIR/$PBS_JOBID.out 2>$PBS_O_WORKDIR/$PBS_JOBID.err

# Main Program

/bin/date

[2] Parallel Job – 16 Processors (Using OpenMPI)

#! /bin/bash

# ===== PBS OPTIONS =====

### Set the job name

#PBS -N MyJob

### Run in the queue named "urc"

#PBS -q urc

### Specify the number of cpus for your job.

#PBS -l nodes=16

# ===== END PBS OPTIONS =====

# Redirect standard out and standard error.

exec 1>$PBS_O_WORKDIR/$PBS_JOBID.out 2>$PBS_O_WORKDIR/$PBS_JOBID.err

# =========== Main Program ===========

# Setup to use OpenMPI

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/apps/sys/openmpi-1.2.4/rhel5_u1-64/gnu/lib

MPI_RUN=/apps/sys/openmpi-1.2.4/rhel5_u1-64/gnu/bin/mpirun

# Run the program "simplempi" with an argument of "30"

$MPI_RUN --hostfile $PBS_NODEFILE /users/joe/simplempi 30



Five TORQUE Resource Manager Commands

1. pbsnodes

pbs node manipulation

Synopsis

pbsnodes [-{a|x}] [-q] [-s server] [node|:property]

pbsnodes -l [-q] [-s server] [state] [nodename|:property ...]

pbsnodes [-{c|d|o|r}] [-q] [-s server] [-n -l] [-N "note"] [node|:property]

Description

The pbsnodes command is used to mark nodes down, free or offline. It can also be used to list nodes and their state. Node information is obtained by sending a request to the PBS job server. Sets of nodes can be operated on at once by specifying a node property prefixed by a colon. See Node States for more information on node states.

Nodes do not exist in a single state, but actually have a set of states. For example, a node can be simultaneously "busy" and "offline". The "free" state is the absence of all other states and so is never combined with other states.

In order to execute pbsnodes with other than the -a or -l options, the user must have PBS Manager or Operator privilege.

Options

-a

All attributes of a node or all nodes are listed. This is the default if no flag is given.

-x

Same as -a, but the output has an XML-like format.

-c

Clear OFFLINE from listed nodes.

-d

Print MOM diagnosis on the listed nodes. Not yet implemented. Use momctl instead.

-o

Add the OFFLINE state. This is different from being marked DOWN. OFFLINE prevents new jobs from running on the specified nodes. This gives the administrator a tool to hold a node out of service without changing anything else. The OFFLINE state will never be set or cleared automatically by pbs_server; it is purely for the manager or operator.

-p

Purge the node record from pbs_server. Not yet implemented.

-r

Reset the listed nodes by clearing OFFLINE and adding DOWN state. pbs_server will ping the node and, if they communicate correctly, free the node.

-l

List node names and their state. If no state is specified, only nodes in the DOWN, OFFLINE, or UNKNOWN states are listed. Specifying a state string acts as an output filter. Valid state strings are "active", "all", "busy", "down", "free", "offline", "unknown", and "up".

Using all displays all nodes and their attributes.

Using active displays all nodes which are job-exclusive, job-sharing, or busy.

Using up displays all nodes in an "up state". Up states include job-exclusive, job-sharing, reserve, free, busy and time-shared.

All other strings display the nodes which are currently in the state indicated by the string.

-N

Specify a "note" attribute. This allows an administrator to add an arbitrary annotation to the listed nodes. To clear a note, use -N "" or -N n.