NWZPHI, the cluster of the IVV 4

NWZPHI is a cluster equipped with 98 Xeon Phi cards. These are PCIe-based accelerators, similar to GPUs, that can be programmed with regular programming languages.

Update: New Centos 7 Installation

Hard- and Software overview

2 development and debugging servers (24 CPU cores at 2.4 GHz, 64 GB RAM, 1 Xeon Phi 5110P)
12 accelerator nodes (24 CPU cores at 2.4 GHz, 128 GB RAM, 8 Xeon Phi 5110P)
1 SMP node (32 CPU cores, 1.5 TB RAM)
88 TB storage (with FhGFS) for home and scratch
FDR InfiniBand as interconnect
The operating system is Red Hat Enterprise Linux 6

NWZPHI for the impatient reader

The name of the login server is NWZPHI. Access is granted to all users who are members of the group u0clstr and of at least one of the groups starting with p0, q0 or r0. In addition, every user with access to PALMA may use NWZPHI. You can register for u0clstr at ZIV.MeinZIV (go to “Username (account) and group memberships” / „Nutzerkennung und Gruppenmitgliedschaften“).

The batch system and the module system work very similarly to those on PALMA.

Differences to PALMA

If you are familiar with PALMA, starting jobs on NWZPHI is quite easy. The differences are:

In the submit file, you do not need the switch "-A"
One node has 24 CPU cores
The node names and properties are different
The operating system version is different, so you have to recompile your code
To use the Xeon Phi accelerators, more work is necessary (see below)

Starting jobs on NWZPHI

Choose your software environment and (optionally) compile your code
Submit your job via the batch system

Environment Modules

Environment variables (like PATH, LD_LIBRARY_PATH) for compilers and libraries can be set by modules:

Command (short and long form)	Meaning
module av[ailable]	Lists all available modules
module li[st]	Lists all modules in the current environment
module show modulename	Lists all changes caused by a module
module add module1 module2 ...	Adds modules to the current environment
module rm module1 module2 ...	Removes modules from the current environment
module purge	Removes all modules from the current environment

To use the same modules at every login, put the commands in your $HOME/.bashrc. Recommended modules are

module add compiler/intel/14
module add mpi/intel/4.1.0.024
module add tools/mic
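The recommended modules above could be loaded from $HOME/.bashrc roughly as follows. This is a minimal sketch, not the site's official setup: the variable name RECOMMENDED_MODULES is hypothetical, and the guard simply skips loading when the module command is not available in the current shell.

```shell
# Sketch for $HOME/.bashrc (RECOMMENDED_MODULES is a hypothetical helper variable).
RECOMMENDED_MODULES="compiler/intel/14 mpi/intel/4.1.0.024 tools/mic"

# Only call "module" where it is defined, so other shells do not print errors.
if command -v module >/dev/null 2>&1; then
    for m in $RECOMMENDED_MODULES; do
        module add "$m"
    done
fi
```

After the next login, module list should show the three modules if they were loaded successfully.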

Example: Compile a program that uses the FFTW:

module add compiler/intel/14
module add mpi/intel/4.1.0.024
module add lib/fftw/intel/3.3.3

${MPIICC} -I ${FFTW_INCLUDE_DIR} -o program program.c -g ${FLAGS_FAST} -L${FFTW_LIB_DIR} -lfftw3_mpi -lfftw3 -lm

Batch system

The batch system Torque and the scheduler Moab are used to submit jobs. Starting jobs manually is not allowed. Batch jobs should only be submitted from the server mn02.

Creating submit-files

Example of a submit-file for an MPI job:

#PBS -o output.dat
#PBS -l walltime=01:00:00,nodes=2:ppn=24
#PBS -M username@uni-muenster.de
#PBS -m ae
#PBS -q default
#PBS -N job_name
#PBS -j oe
cd $PBS_O_WORKDIR
mpdboot  -n 2 -f $PBS_NODEFILE  -v
mpirun -machinefile $PBS_NODEFILE -np 48 ./executable

This starts an MPI job with 48 processes (2 nodes with 24 processes each).

Further Information:

username: Replace with your own username
job_directory: Replace with the path where the executable can be found
executable: Enter the name of the executable
walltime: The time needed for a whole run. At the moment, the maximum is 48 hours
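The MPI submit-file above hard-codes the number of nodes (2) and processes (48). As a sketch, both values can be derived from $PBS_NODEFILE, in which Torque lists each node name once per requested core. The fallback that builds a demo file is only there so the snippet also runs outside a job; the node names in it are illustrative, and the commands are printed rather than executed.

```shell
# Outside a batch job PBS_NODEFILE is unset: build a demo file so the sketch runs anywhere.
if [ -z "${PBS_NODEFILE:-}" ]; then
    PBS_NODEFILE=$(mktemp)
    # Illustrative content for nodes=2:ppn=24: each node name appears 24 times.
    for node in sl270-01 sl270-02; do
        for i in $(seq 24); do echo "$node"; done
    done > "$PBS_NODEFILE"
fi

NP=$(wc -l < "$PBS_NODEFILE")               # one line per MPI process
NNODES=$(sort -u "$PBS_NODEFILE" | wc -l)   # number of distinct nodes
NP=$((NP)); NNODES=$((NNODES))              # strip padding that some wc versions add

# Print the commands from the submit-file with the computed values filled in.
echo "mpdboot -n ${NNODES} -f ${PBS_NODEFILE} -v"
echo "mpirun -machinefile ${PBS_NODEFILE} -np ${NP} ./executable"
```

With this approach, changing nodes or ppn in the #PBS -l line does not require editing the mpdboot and mpirun lines.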

When no MPI is needed, the submit-file can be simpler.

Example for a job using OpenMP:

#PBS -o output.dat
#PBS -l walltime=01:00:00,nodes=1:ppn=24
#PBS -M username@uni-muenster.de
#PBS -m ae
#PBS -q default
#PBS -N job_name
#PBS -j oe
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=12
./executable
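The OpenMP example sets a fixed thread count. As a sketch, the count can instead follow the ppn value requested in the submit-file. This assumes that the installed Torque version exports PBS_NUM_PPN inside the job, which should be verified on NWZPHI; outside a job the snippet falls back to a single thread.

```shell
# PBS_NUM_PPN is assumed to be exported by Torque inside a job; fall back to 1 elsewhere.
export OMP_NUM_THREADS="${PBS_NUM_PPN:-1}"
echo "Running with ${OMP_NUM_THREADS} OpenMP threads"
```

Whether one thread per reserved core is the best choice depends on the program, so the value is worth benchmarking.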

Choosing the nodes

The cluster consists of the following nodes:

Name	Hardware	Queue	Annotations	Max Walltime
sl250-01, sl250-02	24 cores, 64 GB RAM, 1 Xeon Phi accelerator	debug	Debugging nodes with a short maximum walltime, so waiting times are shorter	4 hours
sl270-01-12	24 cores, 128 GB RAM, 8 Xeon Phi accelerators	default	Production nodes	48 hours
dl560-01	32 cores, 1.5 TB RAM	bigsmp	For large OpenMP computations with very high memory demands	48 hours
sl230-01-03	24 cores, 64 GB RAM	p0doltsi	Reserved for the Doltsinis group	48 hours

To choose the node type that you want to use, you have to use the correct queue. So if you want to run a large computation which needs more than 128 GB of RAM for a single process, the dl560 is right for you. In this case, you have to use the bigsmp queue:

#PBS -q bigsmp
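The rule of thumb above (more than 128 GB per process means bigsmp) can be written down as a small helper. This is only an illustrative sketch: the function name choose_queue is hypothetical, it ignores the debug and group queues, and it prints the qsub command instead of running it. The -q option of qsub selects the queue on the command line and should take precedence over a #PBS -q line in the submit-file.

```shell
# Hypothetical helper: pick a queue from the memory needed by one process (in GB).
choose_queue() {
    if [ "$1" -gt 128 ]; then
        echo bigsmp      # SMP node with 1.5 TB RAM
    else
        echo default     # production nodes with 128 GB RAM
    fi
}

# Print the submit command for a process that needs 200 GB.
echo "qsub -q $(choose_queue 200) submit.cmd"
```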

Submitting jobs / Managing the queue

A job is submitted by entering the command

 qsub submit.cmd

Here, submit.cmd is the name of the submit-file.

Further commands:

qstat: Shows the current queue
qstat -a: As above, but with the number of requested cores
qstat -n: Shows in detail which nodes are used
qdel job_number: Deletes jobs from the queue
showbf: Shows the number of free cores
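The commands above can be combined by keeping the job identifier that qsub prints on standard output. The following is a minimal sketch that assumes a submit-file named submit.cmd in the current directory; it does nothing useful when qsub is not installed.

```shell
# Submit the job and remember the identifier that qsub prints.
if command -v qsub >/dev/null 2>&1; then
    JOBID=$(qsub submit.cmd)
    echo "Submitted job ${JOBID}"
    qstat -n "${JOBID}"       # show the state of this job and the nodes it uses
    # qdel "${JOBID}"         # uncomment to delete the job from the queue
else
    echo "qsub not found: run this on the cluster" >&2
fi
```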

Monitoring jobs

There are different tools for monitoring:

qstat -a: Shows the queues with running and waiting jobs
pbstop: Similar to qstat but with a text-based graphical output
Ganglia: Shows detailed information of every node including memory and CPU usage

Storage

There is an 88 TB partition for /home and /scratch using the BeeGFS filesystem (formerly known as FhGFS). Organize your data as on PALMA: put your programs in /home and your data in /scratch. Due to the amount of data, there is currently no backup.

Using Xeon Phi Accelerators

to be done...

Most of the computational power of NWZPHI stems from the accelerators. To give some numbers: the 24 CPU cores of one node deliver a few hundred GFLOPS, while a single Xeon Phi accelerator has a peak performance of about 1 TFLOPS.

You can reserve the Xeon Phi cards via the batch system. This is done as follows:

#PBS -l nodes=1:ppn=24:mics=8

Since the operating system cannot see which cards are already in use, it is a good idea to reserve complete nodes with all of their cards and to distribute the jobs yourself.

When you submit a job, the variable PBS_MICFILE is set inside your job. Like PBS_NODEFILE for the nodes, the file named by PBS_MICFILE lists the Xeon Phis reserved for you.

To use the Infiniband interfaces, append a "-ib" to the hostnames:

awk '{print $0"-ib"}' "$PBS_MICFILE" > "micfile-ib.$PBS_JOBID"
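Distributing work across the reserved cards yourself could start from a simple round-robin plan over the entries of $PBS_MICFILE. The sketch below only computes and prints such a plan; it does not start anything on a card. The task names (input_a, input_b, input_c) and the card hostnames in the demo fallback are hypothetical, and the fallback exists only so that the snippet runs outside a job.

```shell
# Outside a batch job PBS_MICFILE is unset: use a demo file with hypothetical card names.
if [ -z "${PBS_MICFILE:-}" ]; then
    PBS_MICFILE=$(mktemp)
    printf '%s\n' sl270-01-mic0 sl270-01-mic1 > "$PBS_MICFILE"
fi

NMICS=$(awk 'END {print NR}' "$PBS_MICFILE")    # number of reserved cards
[ "$NMICS" -gt 0 ] || { echo "no Xeon Phi cards reserved" >&2; exit 1; }

# Assign hypothetical tasks to the cards in round-robin order,
# using the InfiniBand hostnames (suffix "-ib").
PLAN=""
i=0
for task in input_a input_b input_c; do
    line=$(( i % NMICS + 1 ))                   # 1-based line number in the MIC file
    mic=$(sed -n "${line}p" "$PBS_MICFILE")
    PLAN="${PLAN}${task}:${mic}-ib "
    i=$(( i + 1 ))
done
echo "$PLAN"
```

With the two demo cards, the third task wraps around to the first card again.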

Some useful links

Known Issues

For temporary problems please read the login messages.

Support

In case of questions, please ask Holger Angenent or Martin Leweling via hpc@uni-muenster.de.

-- HolgerAngenent - 2014-07-23

Topic revision: r8 - 2016-06-07 - HolgerAngenent

Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.