The batch system
The batch system distributes computing jobs across the cluster. It can be used in two ways: via scripts and interactively. A typical script could look as follows:
#PBS -o /home/u/username/output.dat
#PBS -l walltime=2:00:00,nodes=1
#PBS -M username@uni-muenster.de
#PBS -m ae
#PBS -q default
#PBS -N Jobname
#PBS -j oe
cd $PBS_O_WORKDIR
./a.out
The lines stand for:
- Name of the standard output file
- Estimated walltime of the computation (here: 2 hours) and number of nodes (here: 1 core on an arbitrary node)
- Email address of the user
- Email notification when the job aborts (a) or ends (e)
- Name of the queue
- Name of the job
- Put standard output and standard error messages in a single file
- Change to the directory from which the script was submitted
- Call a program
Put these commands in a file and submit it via "qsub filename" to enqueue your job in the batch system.
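The submission step can be sketched as follows. This is a minimal example, assuming a shortened version of the script above is saved under the hypothetical name job.sh. qsub prints the identifier of the new job on standard output, so it can be captured for later use with the monitoring tools. The check for qsub is only there so that the snippet does nothing harmful on a machine without the batch system.

```shell
#!/bin/sh
# Hypothetical file name; any name works.
jobfile=job.sh

# Write a shortened version of the script above to the file.
# The quoted 'EOF' keeps $PBS_O_WORKDIR from being expanded here.
cat > "$jobfile" <<'EOF'
#PBS -l walltime=2:00:00,nodes=1
#PBS -q default
#PBS -N Jobname
#PBS -j oe
cd $PBS_O_WORKDIR
./a.out
EOF

# Submit only where qsub is available (i.e. on the cluster).
if command -v qsub >/dev/null 2>&1; then
    jobid=$(qsub "$jobfile")   # qsub prints the job identifier
    echo "Submitted as $jobid"
else
    echo "qsub not found; $jobfile was written but not submitted"
fi
```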
Job monitoring
For a graphical overview of all running jobs, the utility pbstop can be used.
Another possibility is the command line tool "qstat". The following options are useful:
- qstat -a: Shows all queued and running jobs
- qstat -u username: Shows only the jobs of the specified user
- qstat -n JOBID: Shows the nodes that are running the specified job (JOBID is the job identifier printed by qsub)
- qstat -f JOBID: Shows the full information of the specified job
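The options above can be wrapped in two small shell functions, for example in a login script. This is only a convenience sketch: the function names my_jobs and job_details are hypothetical and not part of the batch system, and the functions simply pass their argument on to qstat.

```shell
# Show only the jobs of one user (defaults to the current user).
my_jobs() {
    qstat -u "${1:-$USER}"
}

# Show the nodes and then the full information of one job,
# given the job identifier that qsub printed at submission.
job_details() {
    qstat -n "$1"
    qstat -f "$1"
}
```

After defining them, `my_jobs` lists your own jobs and `job_details` followed by a job identifier shows where that job runs and all of its attributes.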
Choosing the compute nodes
Switch in submit script | Compute nodes |
Useful statements | |
-l nodes=1:smp:ppn=6 | 6 cores on zivsmp001 |
-l nodes=1:hpc:ppn=8 | 8 cores of a single node (e.g. node016). Only this method ensures that you get a node to yourself, so that your job does not interfere with other jobs. |
-l nodes=node016:ppn=8 | Explicit selection of node016 with all of its cores |
Not recommended statements | |
-l nodes=10 | 10 arbitrary cores of the HPC nodes or ZIVSMP |
-l nodes=2:smp:ppn=4 | The job will not start, since there is only one node with the property "smp" (ZIVSMP) |
The maximum walltime that can be used is 160 hours.
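To avoid submitting a job that asks for more than this limit, the walltime string can be generated and checked before submission. The following is a minimal sketch; the function name walltime_for_hours and the variable MAX_WALLTIME_HOURS are hypothetical helpers, and the function only handles whole hours.

```shell
# Maximum walltime in hours, as stated above.
MAX_WALLTIME_HOURS=160

# Print a walltime string (H:MM:SS) for a whole number of hours,
# or fail with a message if the request exceeds the maximum.
walltime_for_hours() {
    hours=$1
    if [ "$hours" -gt "$MAX_WALLTIME_HOURS" ]; then
        echo "walltime of $hours h exceeds the maximum of $MAX_WALLTIME_HOURS h" >&2
        return 1
    fi
    printf '%d:00:00\n' "$hours"
}
```

The result can be passed to qsub on the command line, for example `qsub -l walltime=$(walltime_for_hours 48),nodes=1:hpc:ppn=8 job.sh` for a hypothetical script job.sh.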