Barcelona Supercomputing Center

Nord III User's Guide

Table of Contents

  1. Introduction
  2. System Overview
  3. Connecting to Nord III
    1. Password Management
  4. File Systems
    1. GPFS Filesystem
    2. Active Archive - HSM (Tape Layer)
    3. Local Hard Drive
    4. Root Filesystem
    5. Quotas
  5. Data management
    1. Transferring files
    2. Active Archive Management
    3. Repository management (GIT/SVN)
  6. Running Jobs
    1. LSF Commands
    2. Interactive Sessions
    3. Job directives
    4. MPI particulars
    5. Jobscript examples
    6. Queues
  7. Software Environment
    1. C Compilers
    2. FORTRAN Compilers
    3. Modules Environment
    4. BSC Commands
    5. TotalView
    6. Tracing jobs with BSC Tools
  8. Getting help
  9. Frequently Asked Questions (FAQ)
  10. Appendices
    1. SSH
    2. Transferring files on Windows
    3. Using X11
    4. Requesting and installing an X.509 user certificate

Introduction ↩

This user’s guide for the Nord III cluster is intended to provide the minimum amount of information needed by a new user of this system. As such, it assumes that the user is familiar with many of the standard features of supercomputing environments, as well as with the Unix operating system.

Here you can find most of the information you need to use our computing resources and the technical documentation about the machine. Please read this document carefully and, if any doubt arises, do not hesitate to contact us (see Getting help).

System Overview ↩

Nord III is a supercomputer based on Intel SandyBridge processors, iDataPlex Compute Racks, a Linux Operating System and an Infiniband interconnection.

The current peak performance is 251.6 Teraflops. The system comprises 12,096 Intel SandyBridge-EP E5-2670 cores at 2.6 GHz (756 compute nodes), with at least 24.2 TB of main memory. See below a summary of the system:

The operating system is SUSE Linux Enterprise Server 11 SP3.

As we said before, there are 3 different types of nodes, each one with a different amount of memory available.

More information about the request of the different node types can be found in the “Job Directives” section.

Connecting to Nord III ↩

The first thing you should know is your username and password. Once you have a login and its associated password you can get into the cluster through one of the following login nodes:

You must use Secure Shell (ssh) tools to log in to or transfer files to the cluster. We do not accept incoming connections from protocols such as telnet, ftp, rlogin, rcp, or rsh. Once you have logged into the cluster, you cannot make outgoing connections for security reasons.

To get more information about the supported secure shell version and how to get ssh for your system (including Windows systems) see the Appendices. Once connected to the machine, you will be presented with a UNIX shell prompt and you will normally be in your home ($HOME) directory. If you are new to UNIX, you will need to learn the basics before doing anything useful.

Password Management ↩

In order to change your password, you have to log in to a different machine (dt01.bsc.es). This connection must be established from your local machine.

    % ssh -l username dt01.bsc.es

    username@dtransfer1:~> passwd
    Changing password for username.
    Old Password: 
    New Password: 
    Reenter New Password: 
    Password changed.

Note that the password change takes about 10 minutes to become effective.

File Systems ↩

IMPORTANT: It is your responsibility as a user of our facilities to back up all your critical data. We only guarantee a daily backup of user data under /gpfs/home. Any other backup is performed only exceptionally, upon request of the interested user.

Each user has several areas of disk space for storing files. These areas may have size or time limits, so please read this section carefully to learn the usage policy of each filesystem. There are 3 different types of storage available inside a node:

GPFS Filesystem ↩

The IBM General Parallel File System (GPFS) is a high-performance shared-disk file system providing fast, reliable data access from all nodes of the cluster to a global filesystem. GPFS allows parallel applications simultaneous access to a set of files (even a single file) from any node that has the GPFS file system mounted while providing a high level of control over all file system operations. In addition, GPFS can read or write large blocks of data in a single I/O operation, thereby minimizing overhead.

An incremental backup will be performed daily only for /gpfs/home.

These are the GPFS filesystems available in the machine from all nodes:

Active Archive - HSM (Tape Layer) ↩

Active Archive (AA) is a mid-long term storage filesystem that provides 15 PB of total space. You can access AA from the Data Transfer Machine (dt01.bsc.es and dt02.bsc.es) under /gpfs/archive/hpc/your_group.

NOTE: There is no backup of this filesystem. The user is responsible for adequately managing the data stored in it.

Hierarchical Storage Management (HSM) is a data storage technique that automatically moves data between high-cost and low-cost storage media. At BSC, the filesystem using HSM is the one mounted at /gpfs/archive/hpc, and the two types of storage are GPFS (high-cost, low latency) and Tapes (low-cost, high latency).

HSM System Overview

Hardware

Software

Functioning policy and expected behaviour

In general, the migration process is transparent to the user; you will only notice it when you need to access or modify a file that has been migrated. Any access to a migrated file will be delayed until its content is retrieved from tape.

Only files with a size between 1 GB and 12 TB will be moved (migrated) from the GPFS disk to tape when no data access or modification has occurred for a period of 30 days.

    /gpfs/archive/hpc

From the user’s point of view, deleting a migrated file is transparent and behaves the same as deleting a resident one. On the other hand, it is not possible to modify a migrated file directly; in that case, you will have to wait for the system to retrieve the file and put it back on disk.

If there is not enough space to recover a given file from tape, the retrieval will fail and everything will remain in the same state as before; that is, you will continue to see the file on tape (in the “migrated” state).

You can use the hsmFileState script to check whether a file is resident on disk or has been migrated to tape.

Examples of use cases

    $ hsmFileState file_1MB.dat
    resident -rw-rw-r-- 1 user group 1048576 mar 12 13:45 file_1MB.dat

    $ hsmFileState file_10GB.dat
    migrated -rw-rw-r-- 1 user group 10737418240 feb 12 11:37 file_10GB.dat

Local Hard Drive ↩

Every node has a local hard drive that can be used as local scratch space to store temporary files during the execution of your jobs. This space is mounted on the /scratch/tmp directory and referenced by the $TMPDIR environment variable. The /scratch filesystem provides about 500 GB of space. Data stored on these local hard drives at the compute nodes is not accessible from the login nodes. Local hard drive data is not removed automatically, so each job has to remove its data before finishing.
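
As a minimal sketch (the binary and file names are hypothetical), a job can work inside $TMPDIR and clean up before it ends:

    #!/bin/bash
    #BSUB -n 1
    #BSUB -oo output_%J.out
    #BSUB -eo output_%J.err
    #BSUB -W 00:30

    # Write temporary files to the node-local scratch space
    cd $TMPDIR
    $HOME/my_app.exe > partial_results.dat

    # Copy back anything worth keeping to GPFS
    cp partial_results.dat $HOME/results/

    # Remove the temporary data before the job finishes
    rm -f partial_results.dat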

Root Filesystem ↩

The root filesystem, where the operating system is stored, does not reside in the node: it is an NFS filesystem mounted from one of the servers.

As this is a remote filesystem, only operating system data should reside in it. The use of /tmp for temporary user data is NOT permitted. The local hard drive can be used for this purpose instead, as described in the Local Hard Drive section.

Quotas ↩

Quotas define the amount of storage available to a user or to a group’s users. You can picture a quota as a small disk readily available to you. A default value is applied to all users and groups and cannot be exceeded.

You can inspect your quota anytime you want using the following command from inside each filesystem:

     % bsc_quota

The command provides a readable output for the quota. Check BSC Commands for more information.

If you need more disk space in this or any other GPFS filesystem, the person responsible for your project has to submit a request for the extra space, specifying the amount requested and the reasons why it is needed. For more information or requests you can Contact Us.

Data management ↩

Transferring files ↩

There are several ways to copy files from/to the Cluster:

Direct copy to the login nodes.

As said before, no connections are allowed from inside the cluster to the outside world, so all scp and sftp commands have to be executed from your local machine and never from the cluster. Usage examples are given in the next section.

On a Windows system, most secure shell clients come with a tool for making secure copies or secure FTP transfers. Several tools meet the requirements; please refer to the Appendices, where you will find the most common ones and examples of use.

Data Transfer Machine

We provide special machines for file transfer (required for large amounts of data). These machines are dedicated to Data Transfer and are accessible through ssh with the same account credentials as the cluster. They are:

These machines share the GPFS filesystem with all other BSC HPC machines. Besides scp and sftp, they allow some other useful transfer protocols:

    localsystem$ scp localfile username@dt01.bsc.es:
    username's password:

    localsystem$ scp username@dt01.bsc.es:remotefile localdir
    username's password:

    localsystem$ rsync -avzP localfile_or_localdir username@dt01.bsc.es:
    username's password:

    localsystem$ rsync -avzP username@dt01.bsc.es:remotefile_or_remotedir localdir
    username's password:

    localsystem$ sftp username@dt01.bsc.es
    username's password:
    sftp> get remotefile

    localsystem$ sftp username@dt01.bsc.es
    username's password:
    sftp> put localfile

    bbcp -V -z <USER>@dt01.bsc.es:<FILE> <DEST>
    bbcp -V <ORIG> <USER>@dt01.bsc.es:<DEST>

    globus-url-copy -help
    globus-url-copy -tcp-bs 16M -bs 16M -v -vb your_file sshftp://your_user@dt01.bsc.es/~/

Setting up sshfs - Option 1: Linux

        sshfs -o workaround=rename <yourHPCUser>@dt01.bsc.es: <localDirectory>
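
Once mounted, the remote GPFS directory behaves like a local folder. When you are done, you can unmount it (fusermount is the usual FUSE helper on Linux; the exact command may vary by distribution):

        fusermount -u <localDirectory>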

Setting up sshfs - Option 2: Windows

In order to set up sshfs in a Windows system, we suggest two options:

  1. sshfs-win

    • Follow the installation steps from their official repository.

    • Open File Explorer and right-click over the “This PC” icon in the left panel, then select “Map Network Drive”.

    Menu selection
    • In the new window that pops up, fill the “Folder” field with this route:
        \\sshfs\<your-username>@dt01.bsc.es
    
    Example
    • After clicking “Finish”, it will ask you for your credentials and then you will see your remote folder as a part of your filesystem.
    Done!
  2. win-sshfs

    • Install Dokan 1.0.5 (this is the version that works best for us)

    • Install the latest version of win-sshfs. Even though the installer seems to do nothing, if you reboot your computer a shortcut to the application will show up.

    • The configuration fields are:

    % Drive name: whatever you want
    % Host: dt01.bsc.es
    % Port: 22
    % Username: <your-username>
    % Password: <your-password>
    % Directory: directory you want to mount
    % Drive letter: preferred
    % Mount at login: preferred
    % Mount folder: only necessary if you want to mount it over a directory, otherwise, empty
    % Proxy: none
    % KeepAlive: preferred
    
    Example
    • After clicking “Mount” you should be able to access your remote directory as a part of your filesystem.
    Done!

Data Transfer on the PRACE Network

PRACE users can use the 10 Gbps PRACE Network for moving large data among PRACE sites. To get access to this service you must contact “support@bsc.es” requesting its use and providing the local IP of the machine from which it will be used.

The selected data transfer tool is Globus/GridFTP, which is available on dt02.bsc.es.

In order to use it, a PRACE user must get access to dt02.bsc.es:

    % ssh -l pr1eXXXX dt02.bsc.es

Load the PRACE environment with ‘module’ tool:

    % module load prace globus

Create a proxy certificate using ‘grid-proxy-init’:

    % grid-proxy-init 
    Your identity: /DC=es/DC=irisgrid/O=bsc-cns/CN=john.foo
    Enter GRID pass phrase for this identity:
    Creating proxy ........................................... Done
    Your proxy is valid until: Wed Aug  7 00:37:26 2013
    pr1eXXXX@dtransfer2:~>

The command ‘globus-url-copy’ is now available for transferring large data.

    globus-url-copy [-p <parallelism>] [-tcp-bs <size>] <sourceURL> <destURL>

Where -p sets the number of parallel data streams used in the transfer and -tcp-bs sets the TCP buffer size.

All the available PRACE GridFTP endpoints can be retrieved with the ‘prace_service’ script:

    % prace_service -i -f bsc
    gftp.prace.bsc.es:2811
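
As an illustrative sketch (the paths and the number of parallel streams are hypothetical), a transfer towards the BSC endpoint obtained above could look like this:

    globus-url-copy -p 4 -tcp-bs 16M \
        file:///gpfs/projects/group01/data.tar \
        gsiftp://gftp.prace.bsc.es:2811/gpfs/projects/group01/data.tar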

More information is available on the PRACE website.

Active Archive Management ↩

To move or copy data from/to AA you have to use our special commands, available on dt01.bsc.es and dt02.bsc.es, or on any other machine by loading the “transfer” module:

These commands submit a job to a special queue class that performs the selected operation. Their syntax is the same as the corresponding shell command without the ‘dt’ prefix (cp, mv, rsync, tar).

    dtq

dtq shows all the transfer jobs that belong to you; it works like squeue in SLURM.

    dtcancel <job_id>

dtcancel cancels the transfer job whose job id is given as a parameter; it works like scancel in SLURM.

    % dttar -cvf  /gpfs/archive/hpc/group01/outputs.tar ~/OUTPUTS 
    # Example: Copying data from /gpfs to /gpfs/archive/hpc    
    % dtcp -r  ~/OUTPUTS /gpfs/archive/hpc/group01/
    # Example: Copying data from /gpfs/archive/hpc to /gpfs
    % dtcp -r  /gpfs/archive/hpc/group01/OUTPUTS ~/
    # Example: Copying data from /gpfs to /gpfs/archive/hpc    
    % dtrsync -avP  ~/OUTPUTS /gpfs/archive/hpc/group01/
    # Example: Copying data from /gpfs/archive/hpc to /gpfs
    % dtrsync -avP  /gpfs/archive/hpc/group01/OUTPUTS ~/
    # Example: Copying data from group01 to group02
    % dtsgrsync group02 /gpfs/projects/group01/OUTPUTS /gpfs/projects/group02/
    # Example: Moving data from /gpfs to /gpfs/archive/hpc    
    % dtmv ~/OUTPUTS /gpfs/archive/hpc/group01/
    # Example: Moving data from /gpfs/archive/hpc to /gpfs
    % dtmv /gpfs/archive/hpc/group01/OUTPUTS ~/

Additionally, these commands accept the following options:

--blocking: Block any process from reading the file at the final destination until the transfer has completed.

--time: Set a new maximum transfer time (default is 18 h).

It is important to note that these kinds of jobs can be submitted both from the ‘login’ nodes (for automatic file management within a production job) and from the ‘dt01.bsc.es’ machine. AA is only mounted on the Data Transfer Machines; therefore, if you wish to navigate the AA directory tree you have to log in to dt01.bsc.es.

Repository management (GIT/SVN) ↩

There is no outgoing internet connection from the cluster, which prevents the use of external repositories directly from our machines. To circumvent that, you can use the “sshfs” command on your local machine, as explained in the previous Setting up sshfs (Linux) and Setting up sshfs (Windows) sections.

Doing that, you can mount a desired directory from our GPFS filesystem on your local machine. That way, you can operate on your GPFS files as if they were stored on your local computer. That includes the use of git, so you can clone, push or pull any desired repository inside that mount point and the changes will transfer over to GPFS.
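
As an illustrative sketch (user name, mount point and repository URL are hypothetical), the workflow on a Linux machine could be:

    # On your LOCAL machine: mount your GPFS home directory over sshfs
    mkdir -p ~/gpfs_mount
    sshfs -o workaround=rename <yourHPCUser>@dt01.bsc.es: ~/gpfs_mount

    # Work with git inside the mount point; changes land directly on GPFS
    cd ~/gpfs_mount
    git clone https://github.com/example/project.git

    # Unmount when finished
    fusermount -u ~/gpfs_mount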

Running Jobs ↩

LSF is the utility used at Nord III for batch processing support, so all jobs must be run through it. This document provides information for getting started with job execution at the Cluster.

LSF Commands ↩

These are the basic commands to submit, control and check your jobs:

 bsub < job_script

submits a “job script” passed through standard input (STDIN) to the queue system. The Job directives section explains the available options for writing a jobscript.

 bjobs [-w][-X][-l job_id]

shows all the submitted jobs.

 bkill <job_id>

removes the job from the queue system, cancelling the execution of its processes if they were still running.

 bsc_jobs

shows all the pending or running jobs from your group. Check BSC Commands for more information.

Interactive Sessions ↩

Allocation of an interactive session in the interactive partition has to be done by executing:

    bsub -q interactive -W 01:00 -n 1 -Is /bin/bash

  • Please note that the maximum run limit for an interactive session is 4 hours (-W 04:00).

Job directives ↩

A job must contain a series of directives to inform the batch system about the characteristics of the job. We encourage you to read the bsub command’s manual from any of Nord’s terminals:

    % man bsub

Here we provide a short summary of most common directives:

    #BSUB -J job_name

Specify the name (description) of the job.

    #BSUB -q debug

Specify the queue to which the job will be submitted. The debug queue is only intended for small tests, so there is a limit of 1 job per user, using up to 64 CPUs (4 nodes) and one hour of wall clock time. The queue might be reassigned by LSF internal policy, as with the sequential queue.
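
For example, a small test that fits within those limits could be submitted with a jobscript like this (a sketch; the binary name is hypothetical):

    #!/bin/bash
    #BSUB -q debug
    #BSUB -n 16
    #BSUB -W 00:30
    #BSUB -J debug_test
    #BSUB -oo test_%J.out
    #BSUB -eo test_%J.err

    module load openmpi
    mpirun ./test.exe

Submit it with “bsub < jobscript” as usual.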

    #BSUB -W HH:MM

Specify how much time the job will be allowed to run. This is a mandatory field. NOTE: you cannot specify seconds in LSF. Set it to a value greater than the real execution time of your application and smaller than the time limit granted to the user. Note that your job will be killed once this period has elapsed.

    #BSUB -cwd pathname

The working directory of your job (i.e. where the job will run). If not specified, it is the current working directory at the time the job was submitted.

    #BSUB -e/-eo file

The name of the file to collect the standard error (stderr) output of the job. You can use %J for the job id. The -e option APPENDS to the file; -eo REPLACES it.

    #BSUB -o/-oo file

The name of the file to collect the standard output (stdout) of the job. The -o option APPENDS to the file; -oo REPLACES it.

    #BSUB -n number

The number of tasks for the job. In MPI executions this corresponds to the number of MPI processes; for sequential executions it is the number of cores.

    #BSUB -x

Use the nodes exclusively. This is the default behaviour except for sequential executions.

    #BSUB -M number

Specify the minimum amount of memory needed by each task, in MB. This option is used to better schedule the jobs and select compute nodes adequate to fulfill the job’s needs. By default, 1800 MB are reserved per task. Exclusive jobs will still get all available memory.

By default, any kind of node may be chosen. You can request the minimum amount of memory needed with “-M” (memory per task, in MB) and “-n” (number of tasks). If the memory required on a node (memory per task multiplied by the tasks placed on that node) exceeds 32 GB, nodes with 64 GB or 128 GB of memory will be used; if it exceeds 64 GB, only 128 GB nodes will be used.
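
For example (a sketch using the default 16 tasks per node; the task count is hypothetical):

    #BSUB -n 64
    #BSUB -M 3000
    # 16 tasks/node x 3000 MB/task = 48 GB per node > 32 GB (LowMem),
    # so only MedMem (64 GB) or HighMem (128 GB) nodes can host this job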

Here is a list of the common parameters needed to request each type of node:

You may check this FAQ for a more in-depth explanation. Note: For non-LowMem requests you must specify a ptile of 16.

    #BSUB -U reservation_ID

The reservation where your jobs will be allocated (assuming that your account has access to that reservation). On some occasions, node reservations are granted for executions where only a set of accounts can run jobs. This is useful for courses.

    #BSUB -R "span[ptile=number]"

The number of processes assigned to a node. Note that full nodes will be allocated except for sequential executions. Examples:

    # if you want to use 4 processes per node and 4 threads:

    #BSUB -R "span[ptile=4]"
    export OMP_NUM_THREADS=4
    # if your program has high memory 
    # consumption you can reduce the number 
    # of processes per node

    #BSUB -R "span[ptile=14]"

MPI particulars ↩

There are different MPI implementations available. Some of them have special requirements for their use.

OpenMPI

OpenMPI is an Open Source MPI implementation and is loaded by default on our machines. You can use it like this:

    % module load openmpi
    % mpicc / mpif90 your_files

Once compiled, your jobscript must invoke mpirun:

    #!/bin/bash
    #BSUB -n 128
    #BSUB -oo output_%J.out
    #BSUB -eo output_%J.err
    #BSUB -J openmpi_example
    #BSUB -W 00:05

    module load openmpi
    mpirun binary.exe

IntelMPI

IntelMPI is an MPI implementation developed by Intel. You can use it like this:

    % module load intel impi
    % mpicc / mpif90 your_files

Once compiled, your jobscript must invoke mpirun:

    #!/bin/bash
    #BSUB -n 128
    #BSUB -oo output_%J.out
    #BSUB -eo output_%J.err
    #BSUB -J impi_example
    #BSUB -W 00:05

    module load impi
    mpirun binary.exe

IBM POE

IBM POE is an alternative MPI library that is binary compatible with IntelMPI. Use POE for executions with a large number of cores. First, compile your MPI code using IBM POE:

    % module load poe
    % mpicc / mpif90 your_files

Once compiled, your jobscript must mention it explicitly:

    #!/bin/bash
    #BSUB -n 128
    #BSUB -oo output_%J.out
    #BSUB -eo output_%J.err
    #BSUB -J poe_example
    #BSUB -W 00:05
    #BSUB -a poe

    module load poe
    poe binary.exe

Jobscript examples ↩

Purely sequential

    #!/bin/bash 

    #BSUB -n 1
    #BSUB -oo output_%J.out
    #BSUB -eo output_%J.err
    #BSUB -J sequential
    #BSUB -W 00:05

    ./serial.exe

Sequential with OpenMP threads

    #!/bin/bash

    #BSUB -n 16 
    #BSUB -R "span[ptile=16]"
    #BSUB -oo output_%J.out
    #BSUB -eo output_%J.err
    #BSUB -J sequential_OpenMP
    #BSUB -x
    #BSUB -W 00:05

    export OMP_NUM_THREADS=16
    ./serial.exe

Parallel using MPI

    #!/bin/bash 

    #BSUB -n 128
    #BSUB -o output_%J.out
    #BSUB -e output_%J.err
    # In order to launch 128 processes with 
    # 16 processes per node:
    #BSUB -R "span[ptile=16]"
    #BSUB -J WRF.128-4
    #BSUB -W 02:00

    # You can choose the parallel environment through modules
    module load intel openmpi

    mpirun ./wrf.exe

Parallel using MPI and OpenMP threads

    #!/bin/bash 

    # The total number of processes:
    # 128 MPI processes x 2 OpenMP threads/process = 256 cores
    #BSUB -n 128             # processes
    #BSUB -oo output_%J.out
    #BSUB -eo output_%J.err

    ######################################################
    # This will allocate 8 processes per node so we have # 
    # 8 cores per node for the threads                   #
    ######################################################
    #BSUB -R "span[ptile=8]"

    # exclusive mode (enabled by default)
    #BSUB -x

    #####################################################
    # Then (128 MPI tasks) / (8 tasks/node) = 16 nodes  #
    # reserved                                          #
    # 16 nodes * 16 cores/node = 256 cores allocated    #
    # (Matches the amount of cores asked for above)     #
    #####################################################

    #BSUB -J WRF.128-4
    #BSUB -W 02:00

    # Clean your environment modules
    module purge

    # You can choose the parallel environment through
    # modules
    module load intel openmpi

    # 8 MPI processes per node and 16 cpus available 
    # (2 threads per MPI process):
    export OMP_NUM_THREADS=2

    mpirun ./wrf.exe

Parallel using MPI and extra memory

The following example shows how to ask for 3000 MB/task instead of the default 1800 MB/task. As no ‘-R’ options are included it will still put 16 processes on each node, so the Common (2 GB/core) nodes won’t be able to execute this job. This effectively guarantees that either MedMem (4 GB/core) or HighMem (8 GB/core) nodes will be used, or a mixture of both.

    #!/bin/bash

    # Total number of tasks
    #BSUB -n 128
    #BSUB -oo output_%J.out
    #BSUB -eo output_%J.err

    # Requesting 3000 MB per task
    # As no ptile is specified, only Medium and High memory nodes
    # are eligible for this execution (max 128 nodes, 2048 cores)
    #BSUB -M 3000

    #BSUB -W 01:00

    module purge
    module load intel openmpi

    mpirun ./my_mpi.exe

The following example requests the same memory per task as the preceding one, but it can be executed on any node. This is done by reducing the number of tasks per node so that the total memory requested by the tasks on a single node stays below the LowMem threshold. This is necessary for jobs that need more nodes or CPUs than the MedMem and HighMem nodes can provide.

    #!/bin/bash

    # Total number of tasks
    #BSUB -n 128
    #BSUB -oo output_%J.out
    #BSUB -eo output_%J.err

    # Requesting 3000 MB per task
    #BSUB -M 3000

    # Only 9 tasks per node.
    # All compute nodes are available for execution
    #BSUB -R "span[ptile=9]"

    #BSUB -W 01:00

    module purge
    module load intel openmpi

    mpirun ./my_mpi.exe

Queues ↩

There are several queues present in the machine and different users may access different queues. Each queue has different limits on the number of cores and the duration of jobs. You can check at any time which queues you have access to, and their limits, using:

    % bsc_queues

Check BSC Commands for more information.

Sequential executions

For any job that requires one node or fewer resources, the sequential queue is applied automatically. This queue is the only one that runs more than one job on the same node at a time. It also has the lowest priority in the machine, and the number of concurrently executed sequential jobs is limited to avoid disturbing the execution of large jobs.

If you request 16 processes per node or use the -x option, the full node will be assigned to your job. If you have memory problems, make sure to specify the exclusive flag, because just specifying a ptile may still share the node with other users.
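
As a minimal sketch (the binary name is hypothetical), a memory-hungry serial job can request a full node for itself like this:

    #!/bin/bash
    #BSUB -n 1
    #BSUB -x
    #BSUB -oo output_%J.out
    #BSUB -eo output_%J.err
    #BSUB -W 01:00

    ./big_serial.exe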

Software Environment ↩

All software and numerical libraries available at the cluster can be found at /apps/. If you need something that is not there please contact us to get it installed (see Getting Help).

C Compilers ↩

In the cluster you can find these C/C++ compilers:

icc / icpc -> Intel C/C++ Compilers

    % man icc
    % man icpc

gcc /g++ -> GNU Compilers for C/C++

    % man gcc
    % man g++

All invocations of the C or C++ compilers follow these suffix conventions for input files:

.C, .cc, .cpp, or .cxx -> C++ source file.
.c -> C source file
.i -> preprocessed C source file
.so -> shared object file
.o -> object file for ld command
.s -> assembler source file

By default, the preprocessor is run on both C and C++ source files.

These are the default sizes of the standard C/C++ datatypes on the machine:

Default datatype sizes on the machine

Type              Length (bytes)
bool (C++ only)   1
char              1
wchar_t           4
short             2
int               4
long              8
float             4
double            8
long double       16

Distributed Memory Parallelism

To compile MPI programs it is recommended to use the following handy wrappers: mpicc, mpicxx for C and C++ source code. You need to choose the Parallel environment first: module load openmpi / module load impi / module load poe. These wrappers will include all the necessary libraries to build MPI applications without having to specify all the details by hand.

    % mpicc a.c -o a.exe
    % mpicxx a.C -o a.exe 

Shared Memory Parallelism

OpenMP directives are fully supported by the Intel C and C++ compilers. To use it, the flag -qopenmp must be added to the compile line.

    % icc -qopenmp -o exename filename.c
    % icpc -qopenmp -o exename filename.C

You can also mix MPI + OpenMP code using -qopenmp with the MPI wrappers mentioned above.
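
For instance (file names are hypothetical, after loading a parallel environment module):

    % mpicc -qopenmp hybrid.c -o hybrid.exe
    % mpicxx -qopenmp hybrid.C -o hybrid.exe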

Automatic Parallelization

The Intel C and C++ compilers are able to automatically parallelize simple loop constructs, using the option “-parallel”:

    % icc -parallel a.c

FORTRAN Compilers ↩

In the cluster you can find these compilers:

ifort -> Intel Fortran Compilers

    % man ifort

gfortran -> GNU Compilers for FORTRAN

    % man gfortran

By default, the compilers expect all FORTRAN source files to have the extension “.f”, and all FORTRAN source files that require preprocessing to have the extension “.F”. The same applies to FORTRAN 90 source files with extensions “.f90” and “.F90”.

Distributed Memory Parallelism

In order to use MPI, again you can use the wrappers mpif77 or mpif90, depending on the source code type. You can always run man mpif77 to see a detailed list of options for configuring the wrappers, e.g. to change the default compiler.

    % mpif77 a.f -o a.exe

Shared Memory Parallelism

OpenMP directives are fully supported by the Intel Fortran compiler when the option “-qopenmp” is set:

    % ifort -qopenmp 

Automatic Parallelization

The Intel Fortran compiler will attempt to automatically parallelize simple loop constructs using the option “-parallel”:

    % ifort -parallel

Modules Environment ↩

The Environment Modules package (http://modules.sourceforge.net/) provides a dynamic modification of a user’s environment via modulefiles. Each modulefile contains the information needed to configure the shell for an application or a compilation. Modules can be loaded and unloaded dynamically, in a clean fashion. All popular shells are supported, including bash, ksh, zsh, sh, csh, tcsh, as well as some scripting languages such as perl.

Installed software packages are divided into five categories:

Modules tool usage

Modules can be invoked in two ways: by name alone or by name and version. Invoking them by name implies loading the default module version. This is usually the most recent version that has been tested to be stable (recommended) or the only version available.

    % module load intel

Invoking a module by version loads the specified version of the application. As of this writing, the previous command and the following one load the same module.

    % module load intel/2017.1

The most important commands for modules are these:

You can run “module help” any time to check the command’s usage and options or check the module(1) manpage for further information.
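
As a quick reference, a sketch of the standard Environment Modules commands (module names shown are examples):

    % module list                        # show currently loaded modules
    % module avail                       # list all modules available on the machine
    % module load intel                  # load the default version of a module
    % module unload intel                # unload a module
    % module switch intel intel/2017.1   # swap a loaded module for another version
    % module purge                       # unload all loaded modules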

BSC Commands ↩

The Support team at BSC provides some commands that improve users’ awareness and ease of use of our HPC machines. These commands are available through a special module (bsc/current) loaded at the beginning of any session. A short summary of these commands follows:

All available commands have a dedicated manpage (not all commands are available on all machines). You can find more information about these commands in their respective manpages:

    % man <bsc_command>

For example:

    % man bsc_quota

TotalView ↩

TotalView is a powerful, portable graphical debugger from Rogue Wave Software designed for HPC environments. It also includes MemoryScape and ReverseEngine. It can debug one or many processes and/or threads. It is compatible with MPI, OpenMP, Intel Xeon Phi and CUDA.

Users can access the latest installed version of TotalView (8.13) in:

    /apps/TOTALVIEW/totalview

Important: Remember to connect to the cluster with ssh -X and submit the jobs to the x11 queue, since TotalView uses a single window control.

There is a Quick View of TotalView available for new users. Further documentation and tutorials can be found on their website or in the cluster at:

    /apps/TOTALVIEW/totalview/doc/pdf

Tracing jobs with BSC Tools ↩

In this section you will find an introductory guide to obtaining execution traces on Nord. The tracing tool Extrae supports many different tracing mechanisms, programming models and configurations. For detailed explanations and advanced options, please check the complete Extrae User Guide.

The most recent stable version of Extrae is always located at:

    /apps/BSCTOOLS/extrae/latest

This package is compatible with the default MPI runtime in Nord (OpenMPI). Packages corresponding to older versions and enabling compatibility with other MPI runtimes (IntelMPI, MVAPICH) can be respectively found under this directory structure:

    /apps/BSCTOOLS/extrae/<choose-version>/<choose-runtime>/

In order to trace an execution, you have to load the extrae module and write a script that sets the variables configuring the tracing tool. Let’s call this script trace.sh. It must be executable (chmod +x ./trace.sh). Your job then needs to run this script before executing the application.

Example for MPI jobs:

    #!/bin/bash
    #BSUB -n 128
    #BSUB -o output_%J.out
    #BSUB -e output_%J.err
    #BSUB -R "span[ptile=16]"
    #BSUB -J job_name
    #BSUB -W 00:10

    module load extrae

    mpirun ./trace.sh ./app.exe

Example for threaded (OpenMP or pthreads) jobs:

    #!/bin/bash
    #BSUB -n 1
    #BSUB -oo output_%J.out
    #BSUB -eo output_%J.err
    #BSUB -J job_name
    #BSUB -W 00:10

    module load extrae

    ./trace.sh ./app.exe

Example of trace.sh script:

    #!/bin/bash

    export EXTRAE_CONFIG_FILE=./extrae.xml
    export LD_PRELOAD=${EXTRAE_HOME}/lib/<tracing-library>
    $*

Where:

Job Type                                         Tracing library                   An example to get started
MPI                                              libmpitrace.so (C codes)          MPI/ld-preload/job.lsf
                                                 libmpitracef.so (Fortran codes)
OpenMP                                           libomptrace.so                    OMP/run_ldpreload.sh
Pthreads                                         libpttrace.so                     PTHREAD/README
OmpSs                                            -                                 OMPSS/job.lsf
*Sequential job (manual instrumentation)         libseqtrace.so                    SEQ/run_instrumented.sh
**Automatic instrumentation of user functions    -                                 SEQ/run_dyninst.sh
  and parallel runtime calls

* Jobs that make explicit calls to the Extrae API do not load the tracing library via LD_PRELOAD, but link with the libraries instead.

** Jobs using automatic instrumentation via Dyninst neither load the tracing library via LD_PRELOAD nor link with it.

For other programming models and their combinations, check the full list of available tracing libraries at section 1.2.2 of the Extrae User Guide.

Getting help ↩

BSC provides users with excellent consulting assistance. User support consultants are available during normal business hours, Monday to Friday, 9 a.m. to 6 p.m. (CEST).

User questions and support are handled at: support@bsc.es

If you need assistance, please supply us with the nature of the problem, the date and time that the problem occurred, and the location of any other relevant information, such as output files. Please contact BSC if you have any questions or comments regarding policies or procedures.

Our address is:

Barcelona Supercomputing Center – Centro Nacional de Supercomputación
C/ Jordi Girona, 31, Edificio Capilla, 08034 Barcelona

Frequently Asked Questions (FAQ) ↩

You can check the answers to most common questions at BSC’s Support Knowledge Center. There you will find online and updated versions of our documentation, including this guide, and a listing with deeper answers to the most common questions we receive as well as advanced specific questions unfit for a general-purpose user guide.

Appendices ↩

SSH ↩

SSH is a program that enables secure logins over an insecure network. It encrypts all the data passing both ways, so that if it is intercepted it cannot be read. It also replaces old and insecure tools like telnet, rlogin, rcp, ftp, etc. SSH is client-server software; both machines must have ssh installed for it to work.

An SSH server is already installed on our machines; you must install an SSH client on your local machine. SSH is available without charge for almost all versions of UNIX (including Linux and macOS). For UNIX and derivatives, we recommend the OpenSSH client, downloadable from http://www.openssh.org, and for Windows users we recommend PuTTY, a free SSH client that can be downloaded from http://www.putty.org. Otherwise, any client compatible with SSH version 2 can be used. If you want to try a simpler client with multi-tab capabilities, we also recommend Solar-PuTTY (https://www.solarwinds.com/free-tools/solar-putty).

This section describes installing, configuring and using PuTTY on Windows machines, as it is the best-known Windows SSH client. Whatever your client, you will need to specify the following information:

For example, with the PuTTY client:

Putty client

This is the first window that you will see at PuTTY startup. Once finished, press the Open button. If it is your first connection to the machine, you will get a warning telling you that the host key from the server is unknown and asking whether you agree to cache the new host key; press Yes.

Putty certificate security alert

IMPORTANT: If you see this warning again and you have not modified or reinstalled the ssh client, please do not log in, and contact us as soon as possible (see Getting Help).

Finally, a new window will appear asking for your login and password:

Cluster login

Generating SSH keys with PuTTY

First of all, open PuTTY Key Generator. You should select Type RSA and 2048 or 4096 bits, then hit the “Generate” button.

Public key PuTTY window selection

After that, you will have to move the mouse pointer inside the blue rectangle, as in the picture:

PuTTY box where you have to move your mouse

You will see an output similar to the following picture when completed:

PuTTY dialog when completed

This is your public key. You can copy the text in the upper text box to a notepad and save the file. Then click on “Save private key”, as in the previous picture, and export this file to your desired path.

You can now close PuTTY Key Generator and open PuTTY.

To use your recently saved private key go to Connection -> SSH -> Auth, click on Browse… and select the file.

PuTTY SSH private key selection

Transferring files on Windows ↩

To transfer files to or from the cluster you need a secure FTP (SFTP) or secure copy (SCP) client. There are several different clients, but as previously mentioned, we recommend the PuTTY clients for transferring files: psftp and pscp. You can find them on the same web page as PuTTY (http://www.putty.org): just go to the PuTTY download page and you will see them in the “alternative binary files” section. They will most likely be included in the general PuTTY installer too.

Some other possible tools for users requiring graphical file transfers could be:

Using PSFTP

You will need a command window to execute psftp (press the Start button, click Run and type cmd). The program first asks for the machine name (mn1.bsc.es), and then for the username and password. Once you are connected, it works like a Unix command line.

With the help command you will obtain a list of all possible commands; the most useful ones are illustrated below.
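
A minimal session could look like this (file and directory names are hypothetical; the command set follows standard sftp usage):

    psftp> cd remote_directory      # change the remote directory
    psftp> lcd C:\local_directory   # change the local directory
    psftp> get remote_file          # download a file from the cluster
    psftp> put local_file           # upload a file to the cluster
    psftp> quit                     # close the session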

With pscp you can copy files from your local machine to the cluster, and from the cluster to your local machine. The syntax is the same as the cp command, except that for remote files you need to specify the remote machine:

Copy a file from the cluster:

    > pscp.exe username@mn1.bsc.es:remote_file local_file

Copy a file to the cluster:

    > pscp.exe local_file username@mn1.bsc.es:remote_file

Using X11 ↩

In order to start remote X applications you need an X server running on your local machine. Here are two of the most common X servers for Windows:

The only open-source X server listed here is Cygwin/X; you need to pay for the other.

Once the X server is running, run PuTTY with X11 forwarding enabled:

Putty X11 configuration

On macOS, XQuartz is the most common application for this purpose. You can download it from its website:

https://www.xquartz.org

For older versions of macOS or XQuartz you may need to add these commands to your .zshrc file and open a new terminal:

    export DISPLAY=:0
    /opt/X11/bin/xhost +

This will allow you to use the local terminal as well as xterm to launch graphical applications remotely.

If you installed another version of XQuartz in the past, you may need to launch the following commands to get a clean installation:

    $ launchctl unload /Library/LaunchAgents/org.macosforge.xquartz.startx.plist
    $ sudo launchctl unload /Library/LaunchDaemons/org.macosforge.xquartz.privileged_startx.plist
    $ sudo rm -rf /opt/X11* /Library/Launch*/org.macosforge.xquartz.* /Applications/Utilities/XQuartz.app /etc/*paths.d/*XQuartz
    $ sudo pkgutil --forget org.macosforge.xquartz.pkg

I tried running an X11 graphical application and got a GLX error. What can I do?

If you are running on a macOS/Linux system and, when you try to use some kind of graphical interface through SSH X11 forwarding, you get an error similar to this:

    X Error of failed request: BadValue (integer parameter out of range for operation)
    Major opcode of failed request: 154 (GLX)
    Minor opcode of failed request: 3 (X_GLXCreateContext)
    Value in failed request: 0x0
    Serial number of failed request: 61
    Current serial number in output stream: 62

Try this fix:

macOS:

    $ defaults find xquartz | grep domain

You should get something like ‘org.macosforge.xquartz.X11’ or ‘org.xquartz.x11’; use this text in the following command (we will use org.xquartz.x11 for this example):

    $ defaults write org.xquartz.x11 enable_iglx -bool true

Linux: add the following section to your Xorg configuration (typically /etc/X11/xorg.conf or a file under /etc/X11/xorg.conf.d/):

    Section "ServerFlags"
        Option "AllowIndirectGLX" "on"
        Option "IndirectGLX" "on"
    EndSection

This solves the error most of the time. The error is related to the fact that some OS versions have disabled indirect GLX by default, or disabled it at some point during an OS update.

Requesting and installing an X.509 user certificate ↩

If you are a BSC employee (and you also have a PRACE account), you may be interested in obtaining and configuring an X.509 Grid certificate. If that is the case, you should follow this guide. First, obtain a certificate following the details of this guide (you must be logged in to the BSC intranet):

Once you have finished requesting the certificate, you must download it in a “.p12” format. This procedure may be different depending on which browser you are using. For example, if you are using Mozilla Firefox, you should be able to do it following these steps:

Once you have obtained a copy of your certificate, you must set up the environment in your HPC account. To accomplish that, follow these steps:

    module load prace globus
    cd ~/.globus
    openssl pkcs12 -nocerts -in usercert.p12 -out userkey.pem 
    chmod 0400 userkey.pem 
    openssl pkcs12 -clcerts -nokeys -in usercert.p12 -out usercert.pem 
    chmod 0444 usercert.pem
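
To double-check the result, you can inspect the extracted certificate with openssl (a verification sketch, not part of the official procedure):

    % openssl x509 -in ~/.globus/usercert.pem -noout -subject -dates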

Once you have finished all the steps, your personal certificate should be fully installed.