Barcelona Supercomputing Center

CTE-POWER User's Guide

Power9 CTE User's Guide

Table of Contents

  1. Introduction
  2. System Overview
    1. Other relevant information
  3. Compiling applications
  4. Connecting to CTE-POWER
    1. Password Management
    2. Transferring files
  5. File Systems
    1. Root Filesystem
    2. GPFS Filesystem
    3. Local Hard Drive
    4. NVMe
    5. Quotas
  6. Running Jobs
    1. Submitting jobs
    2. Interactive Sessions
    3. Job directives
  7. Software Environment
    1. C Compilers
    2. FORTRAN Compilers
    3. Modules Environment
    4. BSC Commands
  8. Getting help
    1. Frequently Asked Questions (FAQ)
  9. Appendices
    1. SSH
    2. Transferring files
    3. Using X11
    4. Using the DDT debugger

Introduction

This user’s guide for the CTE IBM Power9 cluster is intended to provide the minimum amount of information needed by a new user of this system. As such, it assumes that the user is familiar with many of the standard features of supercomputing, such as the Unix operating system.

Here you can find most of the information you need to use our computing resources and the technical documentation about the machine. Please read this document carefully and, if any doubt arises, do not hesitate to contact us (see Getting help).

System Overview

CTE-POWER is a cluster based on IBM Power9 processors, with a Linux operating system and an InfiniBand interconnection network.

It has the following configuration:

The operating system is Red Hat Enterprise Linux Server 7.5 alternative.

Other relevant information

Compiling applications

For compiling applications the system provides GCC version 4.8.5, IBM XL compilers for C/C++ v13.1.6 and for Fortran v15.1.6.

Via modules you can find other compilers such as:

Connecting to CTE-POWER

The first thing you should know is your username and password. Once you have a login and its associated password you can get into the cluster through the following login nodes:

This will provide you with a shell in the login node. There you can compile and prepare your applications.

You must use Secure Shell (ssh) tools to log in to or transfer files into the cluster. We do not accept incoming connections over protocols such as telnet, ftp, rlogin, rcp, or rsh. Once you have logged into the cluster you cannot make outgoing connections, for security reasons.

Password Management

In order to change the password, you have to log in to a different machine. This connection must be established from your local machine.

    % ssh -l username

    username@dtransfer1:~> passwd
    Changing password for username.
    Old Password: 
    New Password: 
    Reenter New Password: 
    Password changed.

Mind that the password change takes about 10 minutes to become effective.

Transferring files

There are two ways to copy files from/to the Cluster:

Direct copy to the login nodes.

As said before, no connections are allowed from inside the cluster to the outside world, so all scp and sftp commands have to be executed from your local machines and never from the cluster. The usage examples are in the next section.

On a Windows system, most secure shell clients come with a tool for secure copy or secure ftp. Several tools meet these requirements; please refer to the Appendices, where you will find the most common ones and examples of use.

Data Transfer Machine

We provide special machines for file transfer (required for large amounts of data). These machines are dedicated to Data Transfer and are accessible through ssh with the same account credentials as the cluster. They are:

These machines share the GPFS filesystem with all other BSC HPC machines. Besides scp and sftp, they allow some other useful transfer protocols:

    localsystem$ scp localfile
    username's password:

    localsystem$ sftp
    username's password:
    sftp> put localfile

    localsystem$ scp localdir
    username's password:

    localsystem$ sftp
    username's password:
    sftp> get remotefile

    bbcp -V -z <USER><FILE> <DEST>
    bbcp -V <ORIG>  <USER><DEST>

    gftp-text ftps://<USER>
    get <FILE>
    put <FILE>

File Systems

IMPORTANT: It is your responsibility as a user of our facilities to back up all your critical data. We only guarantee a monthly backup of user data under /gpfs/home and /gpfs/projects.

Each user has several areas of disk space for storing files. These areas may have size or time limits; please read this whole section carefully to learn the usage policy of each of these filesystems. There are 4 different types of storage available inside a node:

Root Filesystem

The root file system, where the operating system is stored, has its own partition.

There is a separate partition of the local hard drive mounted on /tmp that can be used for storing user data as you can read in Local Hard Drive.

GPFS Filesystem

The IBM General Parallel File System (GPFS) is a high-performance shared-disk file system providing fast, reliable data access from all nodes of the cluster to a global filesystem. GPFS allows parallel applications simultaneous access to a set of files (even a single file) from any node that has the GPFS file system mounted while providing a high level of control over all file system operations. In addition, GPFS can read or write large blocks of data in a single I/O operation, thereby minimizing overhead.

An incremental backup will be performed monthly only for /gpfs/home and /gpfs/projects (not for /gpfs/scratch).

These are the GPFS filesystems available in the machine from all nodes:

Local Hard Drive

Every node has a local solid-state drive that can be used as local scratch space to store temporary files during the execution of one of your jobs. This space is mounted on the /scratch/tmp/$JOBID directory and is pointed to by the $TMPDIR environment variable. The amount of space within the /scratch filesystem is about 1.6 TB. Data stored on these local drives at the compute nodes is not available from the login nodes. You should use the directory referred to by $TMPDIR to save your temporary files during job executions. This directory is cleaned automatically after the job finishes.
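Because the local scratch directory is cleaned when the job ends, results have to be copied back to a shared filesystem before the job script finishes. The following is a minimal sketch of that staging pattern, not an official BSC recipe: the function name run_in_scratch, the file names and the line-counting placeholder are hypothetical, and the fallback to /tmp is only there so the snippet can be tried outside a job, where $TMPDIR may be unset.

```shell
#!/bin/bash
# Use the node-local scratch directory; fall back to /tmp outside a job.
scratch="${TMPDIR:-/tmp}"

# run_in_scratch INPUT OUTDIR
# Copies INPUT to local scratch, works on the local copy and copies the
# result to OUTDIR (e.g. a GPFS directory) before the job ends.
run_in_scratch() {
    local input="$1" outdir="$2" work
    work="$(mktemp -d "$scratch/work.XXXXXX")" || return 1
    cp "$input" "$work/input.dat" || return 1
    # Placeholder for the real computation: count the lines of the input.
    wc -l < "$work/input.dat" > "$work/result.txt"
    # Save the result, then remove the temporary working directory.
    cp "$work/result.txt" "$outdir/" && rm -rf "$work"
}
```

Inside a job script it could be called as run_in_scratch input.dat "$PWD", assuming the job runs from a directory on a shared filesystem.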

NVMe

Every node has two non-volatile memory express (NVMe) drives (3 TB each) that can be used as a local working directory to speed up the code by notably reducing the disk access time.

You can use them with the same methodology described for the local hard drive. There are environment variables to refer to the directories mounted on each drive. Use $NVME1DIR and $NVME2DIR to refer to each drive directory.

Quotas

A quota is the amount of storage available to a user or to a group’s users. You can picture it as a small disk readily available to you. A default value is applied to all users and groups and cannot be exceeded.

You can inspect your quota anytime you want using the following command from inside each filesystem:

     % bsc_quota

The command provides a readable output for the quota.

If you need more disk space in this filesystem or in any other of the GPFS filesystems, the person responsible for your project has to request the extra space, specifying the requested space and the reasons why it is needed. For more information or requests you can contact us (see Getting help).

Running Jobs

Slurm is the utility used for batch processing support, so all jobs must be run through it. This section provides information for getting started with job execution at the Cluster.

Submitting jobs

Jobs are submitted using the Slurm sbatch directives directly.

A job is the execution unit for Slurm. A job is defined by a text file containing a set of directives describing the job’s requirements, and the commands to execute.

In order to ensure the proper scheduling of jobs, there are limits on the number of nodes and CPUs that can be used at the same time by a group. You may check those limits using the command ‘bsc_queues’. If you need to run an execution bigger than the limits already granted, you may contact us (see Getting help).

Important accounting changes

To ensure fair and reliable CPU usage accounting information, we’ve enforced the need to use at least 40 threads for each GPU requested. In your job scripts, make sure that the number of threads used meets the requirement for the GPUs you request. Note that Slurm refers to each thread as if it were a physical CPU.

The value of “cpus-per-task” x “ntasks-per-node” should amount to at least those 40 threads for each GPU requested on the node. Remember that, by default, the value of “cpus-per-task” is 1.

If you can’t change the number of tasks in your job, you can edit the number of CPUs per task (#SBATCH --cpus-per-task=number). So that this does not affect your executions, you can choose the desired number of threads per task by setting the environment variable OMP_NUM_THREADS (this variable may not work for every application).

Otherwise, an error message will be displayed pointing out this issue:

    sbatch: error: Minimum cpus requested should be (nodes * gpus/node * 40). 
    Cpus requested: X. Gpus: Y, Required cpus: Z
    sbatch: error: Batch job submission failed: CPU count specification invalid
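The rule in the error message can be checked before submitting. The sketch below applies the formula shown above (nodes * gpus/node * 40) and compares it with the number of CPUs a job requests, taken here as tasks * cpus-per-task. The function names required_cpus and check_request are hypothetical helpers, not BSC or Slurm commands.

```shell
#!/bin/bash
# required_cpus NODES GPUS_PER_NODE
# Minimum number of CPUs (threads) that must be requested.
required_cpus() {
    echo $(( $1 * $2 * 40 ))
}

# check_request NODES GPUS_PER_NODE NTASKS CPUS_PER_TASK
# Succeeds when NTASKS * CPUS_PER_TASK reaches the required minimum.
check_request() {
    local need have
    need=$(required_cpus "$1" "$2")
    have=$(( $3 * $4 ))
    if [ "$have" -ge "$need" ]; then
        echo "ok: $have CPUs requested, $need required"
    else
        echo "too few: $have CPUs requested, $need required"
        return 1
    fi
}

# Examples:
#   check_request 1 2 2 40   prints "ok: 80 CPUs requested, 80 required"
#   check_request 1 2 16 4   prints "too few: 64 CPUs requested, 80 required"
```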

SBATCH commands

These are the basic directives to submit jobs with sbatch:

    sbatch <job_script>

submits a “job script” to the queue system (see Job directives).


    squeue

shows all the submitted jobs.

    scancel <job_id>

removes the job from the queue system, cancelling the execution of its processes if they are still running.
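When these commands are combined in a script, it helps to keep the job ID that sbatch reports. This is a minimal sketch, assuming sbatch prints its usual "Submitted batch job <id>" message; the helper name job_id_from_sbatch and the script name job.cmd are hypothetical.

```shell
#!/bin/bash
# job_id_from_sbatch
# Reads the output of sbatch on stdin and prints the numeric job ID.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Typical use on the cluster (left commented out, it needs Slurm):
#   jobid=$(sbatch job.cmd | job_id_from_sbatch)
#   squeue             # list the submitted jobs
#   scancel "$jobid"   # cancel the job if it is no longer needed
```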

Interactive Sessions

Allocation of an interactive session in the debug partition has to be done through Slurm:

    salloc -t 00:10:00 -n 1 -c 64 -J debug srun --pty /bin/bash

You may add -c <ncpus> to allocate n CPUs.

Job directives

A job must contain a series of directives to inform the batch system about the characteristics of the job. These directives appear as comments in the job script and have to conform to the sbatch syntax.

sbatch syntax is of the form:

     #SBATCH --directive=value

Additionally, the job script may contain a set of commands to execute. If not, an external script may be provided with the ‘executable’ directive. Here you may find the most common directives:

     # sbatch
     #SBATCH --qos=debug

Requests the debug queue for the job. This queue is only intended for small tests.

     # sbatch
     #SBATCH --time=HH:MM:SS

The limit of wall clock time. This is a mandatory field and you must set it to a value greater than real execution time for your application and smaller than the time limits granted to the user. Notice that your job will be killed after the time has passed.

     # sbatch
     #SBATCH -D pathname

The working directory of your job (i.e. where the job will run). If not specified, it is the current working directory at the time the job was submitted.

     # sbatch
     #SBATCH --error=file

The name of the file to collect the standard error output (stderr) of the job.

     # sbatch
     #SBATCH --output=file

The name of the file to collect the standard output (stdout) of the job.

     # sbatch
     #SBATCH --ntasks=number

The number of processes to start.

Optionally, you can specify how many threads each process would open with the directive:

     # sbatch
     #SBATCH --cpus-per-task=number

The number of cpus assigned to the job will be the total_tasks number * cpus_per_task number.

     # sbatch
     #SBATCH --ntasks-per-node=number

The number of tasks assigned to a node.

    # sbatch
    #SBATCH --gres=gpu:number

The number of GPUs assigned per node.

    #SBATCH --exclusive

To request an exclusive use of a compute node without sharing the resources with other users.

    #SBATCH --reservation=reservation_name

The reservation where your jobs will be allocated (assuming that your account has access to that reservation). On some occasions, node reservations can be granted for executions where only a set of accounts can run jobs. Useful for courses.

     # sbatch
     #SBATCH --switches=number@timeout

By default, Slurm schedules a job so that it uses the minimum number of switches. However, a user can request a specific network topology in order to run their job. Slurm will try to schedule the job for timeout minutes. If it is not possible to request number switches (from 1 to 14) after timeout minutes, Slurm will schedule the job by default.

    Variable        Meaning
    SLURM_JOBID     Specifies the job ID of the executing job
    SLURM_NPROCS    Specifies the total number of processes in the job
    SLURM_NNODES    Is the actual number of nodes assigned to run your job
    SLURM_PROCID    Specifies the MPI rank (or relative process ID) for the current process. The range is from 0 to (SLURM_NPROCS-1)
    SLURM_NODEID    Specifies the relative node ID of the current job. The range is from 0 to (SLURM_NNODES-1)
    SLURM_LOCALID   Specifies the node-local task ID for the process within a job
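These variables can be read from the job script or from each task, for example to label log lines. The following is a small sketch; the function name describe_task is hypothetical, and the "unset" defaults only exist so the function also works outside a Slurm job.

```shell
#!/bin/bash
# describe_task
# Prints one line describing where the current task runs, using the
# Slurm variables listed above. Variables that are not defined
# (e.g. outside a job) are reported as "unset".
describe_task() {
    echo "job=${SLURM_JOBID:-unset}" \
         "rank=${SLURM_PROCID:-unset}/${SLURM_NPROCS:-unset}" \
         "node=${SLURM_NODEID:-unset}/${SLURM_NNODES:-unset}" \
         "local=${SLURM_LOCALID:-unset}"
}
```

Calling describe_task at the start of each task gives a quick way to match output lines to ranks and nodes.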


sbatch examples

Example for a sequential job:

    #!/bin/bash
    #SBATCH --job-name="test_serial"
    #SBATCH -D .
    #SBATCH --output=serial_%j.out
    #SBATCH --error=serial_%j.err
    #SBATCH --ntasks=1
    #SBATCH --time=00:02:00
    ./serial_binary > serial.out

The job would be submitted using:

    > sbatch ptest.cmd

Example for a parallel job:

     #!/bin/bash
     #SBATCH --job-name=test_parallel
     #SBATCH -D .
     #SBATCH --output=mpi_%j.out
     #SBATCH --error=mpi_%j.err
     #SBATCH --ntasks=16
     #SBATCH --cpus-per-task=4
     #SBATCH --time=00:02:00
     #SBATCH --gres=gpu:2
     mpirun ./parallel_binary > parallel.output
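As a further illustration, the sketch below writes a single-GPU job script that requests 40 CPUs for its one GPU, in line with the accounting rule described under "Important accounting changes". It is an assumption-laden example rather than an official template: the file name gpu_job.cmd, the binary gpu_binary and the OMP_NUM_THREADS value are hypothetical.

```shell
#!/bin/bash
# Write a job script for 1 task using 1 GPU and 40 CPUs (threads).
cat > gpu_job.cmd <<'EOF'
#!/bin/bash
#SBATCH --job-name=test_gpu
#SBATCH -D .
#SBATCH --output=gpu_%j.out
#SBATCH --error=gpu_%j.err
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=40
#SBATCH --gres=gpu:1
#SBATCH --time=00:02:00

# 40 CPUs are requested for accounting; the application itself may use
# fewer threads (hypothetical value, may not apply to every application).
export OMP_NUM_THREADS=4
./gpu_binary > gpu.output
EOF

# On the cluster, the script would then be submitted with:
#   sbatch gpu_job.cmd
```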

Software Environment

All software and numerical libraries available at the cluster can be found at /apps/. If you need something that is not there please contact us to get it installed (see Getting Help).

C Compilers

In the cluster you can find these C/C++ compilers:

xlc / xlc++ -> IBM C/C++ Compilers

    % xlc --help
    % xlc++ --help

gcc / g++ -> GNU Compilers for C/C++

    % man gcc
    % man g++

pgcc / pgc++ -> Portland Group Compilers for C/C++

    % module load pgi
    % man pgcc
    % man pgc++

All invocations of the C or C++ compilers follow these suffix conventions for input files:

.C, .cc, .cpp, or .cxx -> C++ source file.
.c -> C source file
.i -> preprocessed C source file
.so -> shared object file
.o -> object file for ld command
.s -> assembler source file

By default, the preprocessor is run on both C and C++ source files.

These are the default sizes of the standard C/C++ datatypes on the machine:

Default datatype sizes on the machine

    Type               Length (bytes)
    bool (C++ only)    1
    char               1
    wchar_t            4
    short              2
    int                4
    long               8
    float              4
    double             8
    long double        16

Distributed Memory Parallelism

To compile MPI programs it is recommended to use the following handy wrappers: mpicc, mpicxx for C and C++ source code. You need to choose the Parallel environment first: module load openmpi / module load ibm_mpi. These wrappers will include all the necessary libraries to build MPI applications without having to specify all the details by hand.

    % mpicc a.c -o a.exe
    % mpicxx a.C -o a.exe 

Shared Memory Parallelism

OpenMP directives are supported by the GNU and PGI C and C++ compilers. To use them, add the flag -fopenmp (GNU) or -mp (PGI) to the compile line.

    % gcc -fopenmp -o exename filename.c
    % g++ -fopenmp -o exename filename.C

You can also mix MPI + OPENMP code using -fopenmp/-mp with the mpi wrappers mentioned above.
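Since the OpenMP flag depends on the compiler family, a build script can select it in one place. This is a small sketch under that assumption; the function name openmp_flag and the family labels "gnu" and "pgi" are hypothetical conventions, not part of any compiler or of the modules on the cluster.

```shell
#!/bin/bash
# openmp_flag FAMILY
# Prints the OpenMP flag for the given compiler family:
#   gnu -> -fopenmp, pgi -> -mp
openmp_flag() {
    case "$1" in
        gnu) echo "-fopenmp" ;;
        pgi) echo "-mp" ;;
        *)   echo "unknown compiler family: $1" >&2; return 1 ;;
    esac
}

# Example of a hybrid MPI + OpenMP build line (needs the MPI wrappers):
#   mpicc $(openmp_flag gnu) -o exename filename.c
```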

FORTRAN Compilers

In the cluster you can find these compilers:

xlf -> IBM Fortran Compiler

    % xlf -qhelp

gfortran -> GNU Compilers for FORTRAN

    % man gfortran

pgfortran -> Portland Group Compilers for Fortran

    % module load pgi
    % man pgfortran

By default, the compilers expect all FORTRAN source files to have the extension “.f”, and all FORTRAN source files that require preprocessing to have the extension “.F”. The same applies to FORTRAN 90 source files with extensions “.f90” and “.F90”.

Distributed Memory Parallelism

In order to use MPI, again you can use the wrappers mpif77 or mpif90 depending on the source code type. You can always run man mpif77 to see a detailed list of options to configure the wrappers, e.g. to change the default compiler.

    % mpif77 a.f -o a.exe

Shared Memory Parallelism

OpenMP directives are supported by the GNU/PGI Fortran compiler when the option -fopenmp/-mp is set:

    % gfortran -fopenmp
    % pgfortran -mp

Modules Environment

The Environment Modules package provides a dynamic modification of a user’s environment via modulefiles. Each modulefile contains the information needed to configure the shell for an application or a compilation. Modules can be loaded and unloaded dynamically, in a clean fashion. All popular shells are supported, including bash, ksh, zsh, sh, csh, tcsh, as well as some scripting languages such as perl.

Installed software packages are divided into five categories:

Modules tool usage

Modules can be invoked in two ways: by name alone or by name and version. Invoking them by name implies loading the default module version. This is usually the most recent version that has been tested to be stable (recommended) or the only version available.

    % module load pgi

Invoking by version loads the version specified of the application. As of this writing, the previous command and the following one load the same module.

    % module load pgi/18.1

The most important commands for modules are these:

You can run “module help” any time to check the command’s usage and options or check the module(1) manpage for further information.

BSC Commands

The Support team at BSC has provided some commands that are useful for users’ awareness and ease of use in our HPC machines. A short summary of these commands follows:

You can check more information about these commands through any of the following manpages:

    % man bsc_commands

Getting help

BSC provides users with excellent consulting assistance. User support consultants are available during normal business hours, Monday to Friday, 09:00 to 18:00 (CEST).

User questions and support are handled at:

If you need assistance, please supply us with the nature of the problem, the date and time that the problem occurred, and the location of any other relevant information, such as output files. Please contact BSC if you have any questions or comments regarding policies or procedures.

Our address is:

Barcelona Supercomputing Center – Centro Nacional de Supercomputación
C/ Jordi Girona, 31, Edificio Capilla 08034 Barcelona

Frequently Asked Questions (FAQ)

You can check the answers to most common questions at BSC’s Support Knowledge Center. There you will find online and updated versions of our documentation, including this guide, and a listing with deeper answers to the most common questions we receive as well as advanced specific questions unfit for a general-purpose user guide.

Appendices

SSH

SSH is a program that enables secure logins over an insecure network. It encrypts all the data passing both ways, so that if it is intercepted it cannot be read. It also replaces old and insecure tools like telnet, rlogin, rcp, ftp, etc. SSH is client-server software. Both machines must have ssh installed for it to work.

We have already installed an ssh server in our machines. You must have an ssh client installed in your local machine. SSH is available without charge for almost all versions of UNIX (including Linux and MacOS X). For UNIX and derivatives, we recommend using the OpenSSH client, and for Windows users we recommend using Putty, a free SSH client. Otherwise, any client compatible with SSH version 2 can be used.

This section describes installing, configuring and using the client on Windows machines. No matter your client, you will need to specify the following information:

For example, with the putty client:

[Figure: Putty client]

This is the first window that you will see at putty startup. Once finished, press the Open button. If it is your first connection to the machine, you will get a warning telling you that the host key from the server is unknown, and asking whether you agree to cache the new host key; press Yes.

[Figure: Putty certificate security alert]

IMPORTANT: If you see this warning another time and you haven’t modified or reinstalled the ssh client, please do not log in, and contact us as soon as possible (see Getting Help).

Finally, a new window will appear asking for your login and password:

[Figure: Cluster login]

Transferring files

To transfer files to or from the cluster you need a secure ftp (sftp) or secure copy (scp) client. There are several different clients but, as previously mentioned, we recommend using the Putty clients for transferring files: psftp and pscp. You can find them at the same web page as Putty.

Some other possible tools for users requiring graphical file transfers could be:


You will need a command window to execute psftp (press the start button, click run and type cmd). The program first asks for the machine name, and then for the username and password. Once you are connected, it’s like a Unix command line.

With the help command you will obtain a list of all possible commands. The most useful are:

You will be able to copy files from your local machine to the cluster, and from the cluster to your local machine. The syntax is the same as that of the cp command, except that for remote files you need to specify the remote machine:

Copy a file from the cluster:
> pscp.exe local_file
Copy a file to the cluster:
> pscp.exe local_file

Using X11

In order to start remote X applications you need an X-Server running in your local machine. Here is a list of the most common X-servers for Windows:

The only Open Source X-server listed here is Cygwin/X; you need to pay for the others.

Once the X-Server is running, run putty with X11 forwarding enabled:

[Figure: Putty X11 configuration]

Using the DDT debugger

Introduction to debugging with DDT

Debugging programs that run on MPI can be fairly cumbersome without the right tools, so we have provided our systems with the DDT program.

DDT is a debugger initially developed by Allinea, now property of ARM. The debugger is specifically designed to be used in HPC environments, as its purpose is to keep track of the state of the program in every MPI node/task it uses.

With DDT you can (among other things):

  * Interactively track and debug program crashes that may occur on certain nodes.
  * Track memory related problems in your programs.
  * Use offline (non-interactive) debugging for long running jobs.
  * Get more information about crashes.

We’ll begin explaining how to set up your environment and job scripts for a simple debugging session.

Basic interactive debugging with DDT

To debug with DDT using an interactive session (as if it was a typical debugger), you need to do some things: you need to compile your program with a debugging flag and then modify your job script so your program is launched with the option to connect to the debugger (note that this is only one of the ways you use DDT).

Compiling your program

To compile your program for debugging purposes, you need to add the following flags to your compiler:

  * -g (enabling executable debugging)
  * -O0 (do not apply optimizations)

For example, the compiling line would be rewritten in the following manner:

    $ mpicc application.c -o application.exe → $ mpicc -O0 -g application.c -o application.exe

Modifying your job script

Your job script needs to be modified so it can launch DDT when the job enters execution in the queue. To do that, you need to specify that you want to connect with the DDT debugger when the job is launched, loading the DDT module and adding these parameters to the line that launches the program:

    module load DDT
    mpirun ./application.exe → ddt --connect mpirun ./application.exe

Launching the job script

Finally, to launch the job script with the debugger you need to load the DDT module, start the program in background mode and then launch the modified job script:

    $ module load DDT   # if not already loaded
    $ ddt &
    $ sbatch your_modified_jobscript

We have provided a capture of a real modified job script as an example:

[Figure: Example of a job script]
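For readers without access to the capture, a modified job script could look roughly like the one written by the sketch below. It only combines the directives and the launch line described in this guide; the file name ddt_job.cmd, the job name, the number of tasks and the time limit are hypothetical values.

```shell
#!/bin/bash
# Write a job script that connects to a running DDT instance.
cat > ddt_job.cmd <<'EOF'
#!/bin/bash
#SBATCH --job-name=ddt_debug
#SBATCH -D .
#SBATCH --output=ddt_%j.out
#SBATCH --error=ddt_%j.err
#SBATCH --ntasks=4
#SBATCH --time=00:10:00
#SBATCH --qos=debug

module load DDT
# Original launch line: mpirun ./application.exe
ddt --connect mpirun ./application.exe
EOF

# On the login node, with DDT already started in the background:
#   sbatch ddt_job.cmd
```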

The DDT main screen will appear, but you have to wait until the job enters execution. To do interactive debugging, we strongly recommend using the debug queue as it normally has a shorter waiting time, but remember that you will have limited resources.

When the job enters execution, you will be prompted with the option to accept the incoming connection. It will give you some options before loading the debugger, mainly so it can know if you use OpenMP, CUDA or some sort of memory debugging.

[Figure: Debugging options]

Once the desired options are selected and the program is loaded, you will see the main GUI for the debugger.

[Figure: Main GUI]

Quick look at common utilities and general usage

DDT largely operates the same way as most classical debuggers for serial applications, with the distinct difference that it can effectively track the state of the execution of every MPI process involved. We’ve made a general legend of the different utilities present on the main GUI:

[Figure: Utility legend]

  1. General debugging actions.
  2. Process selector. You can also focus on single processes or threads of a process.
  3. Project tree.
  4. Code of the selected process.
  5. Variable and stack monitoring window.
  6. Input/Output and general tracking utility window.
  7. Evaluate window (used to view values for arbitrary expressions and global variables).

Apart from the process selector, everything is like a normal debugger and is used in a similar way.

Offline debugging

As we know, jobs can take a while to complete or even get into an execution state, so an interactive debugging session may not be the best solution if we expect them to take some time. DDT offers the possibility of offline debugging, allowing us to come back whenever the execution finishes. The execution will generate a file (either a .html or .txt) where you can check the parameters of the execution and the problems that it may have encountered.

To do it, you need to follow the next steps.

Compiling the application for debugging

For this step, you have to compile the application applying the same changes we made in the previous chapter:

    $ mpicc application.c -o application.exe → $ mpicc -O0 -g application.c -o application.exe

Modifying your job script

Make sure that your script loads the ddt module:

    module load DDT

Now modify your launch line, adding ddt and your desired flags. Note that you can choose between generating a .txt or a .html report. We will generate a .html report in this example:

    ddt --offline --output=report.html mpirun ./your_application.exe

Launching the job

To launch the job, you just need to launch it as if you were launching it normally. Once the execution finishes, the report file will be generated. If it was a .txt, you can check it on the login node itself. The HTML version is more user-friendly and interactive, but needs a web browser to display it, so you will need to transfer it to your local machine.

Here’s an example of a report:

[Figure: HTML offline report]

It may have caught your eye that there’s a “Memory Leak Report” tab. DDT allows memory debugging with different granularities, which can be really helpful. Let’s talk more about that in the following chapter.

Enabling memory debugging with DDT

DDT can track down memory related issues like invalid pointers, abnormal memory allocation, memory leaks and more. You can enable memory debugging using two different methods, one for interactive debugging and the other for offline debugging.

Interactive debugging

You don’t have to modify anything for this. When your job requests a connection to DDT, you can check the “Memory Debugging” option (which can be seen in Fig. 2) with the desired parameters.

Offline debugging

For offline debugging you will need to add a simple flag to the execution line inside your job script. Using the line we used for the offline debugging chapter as an example, add this flag:

    $ ddt --offline --mem-debug --output=report.html mpirun ./your_application.exe

With this, you should be able to have memory-related information inside your report.

Example of a debugging, step by step

To end this manual, we will provide some code and debug it using DDT. You can follow the same procedure yourself. You can get the source code here (copy it to your home folder and extract it):

(Assuming you’re inside your home folder)

    $ cp /apps/DDT/SRC/DDT_example.tar.gz ~
    $ tar xvf DDT_example.tar.gz

Inside the generated folder you will see some source code files (one in C and the other one in Fortran, we’ll use the C version), a job script and a makefile alongside a solutions folder.


Our job is to find and fix what is wrong with the source code, so the first step will be compiling our application using our makefile (feel free to check the contents). This makefile has an option to add the required compiling flags for debugging, so we’ll take advantage of it:

    $ make DEBUG=1

This will generate the required executable files for when we launch our job script.

Adapting the job script

The job script provided is functional as it is, but we will be doing an interactive debugging session, so you could be waiting for a while. To alleviate that, we will be using the debug queue, which shouldn’t have too many waiting jobs. To achieve that, add this line to your job script:

    #SBATCH --qos=debug

We’re almost ready to launch it!

Launching the program and the debugger

First we need to load our DDT module:

    $ module load DDT

Once we’ve done this, we can launch DDT as a background process:

    $ ddt &

As you read before, the DDT window will appear, but ignore it for now. Now it’s time to launch our job script:

    $ sbatch job.sub

It may take a while, but eventually your job will enter execution and DDT will prompt you with a little window telling you there’s an incoming connection. Accept it. In the next window you don’t need to check any box, just press “Run”.

Locating the issue with the debugger

First of all, let’s talk a bit about the program we are launching. It’s a matrix multiplication implemented with MPI, following this algorithm:

  1. Master initializes matrices A, B and C.
  2. Master slices the matrices A and C, sends them to slaves.
  3. Master and slaves perform the multiplication.
  4. Slaves send their results back to master.
  5. Master writes the result matrix C in an output file.

Here you have a diagram showing the data distribution:

[Figure: Data distribution]

Reading the code you can see the detailed implementation. To see if the program works, we can just execute it without any breakpoint. Let’s do that:

[Figure: Program flow control bar]

If everything works as expected (that is, the program does not really work), we should see that DDT prompts us with a notification that our program received a signal (SIGFPE, arithmetic exception) and stopped.

[Figure: Program crash]

DDT will give us some hints. The first one is the nature of the problem, in this case an integer division by zero. It also tells us (and shows) the line of code that raised the error. We can deduce that there’s something wrong with the operation “size/nslices”.

Using the window to our right, we can check the values of all variables affected by the current line of code, and we can see that the problem resides in the variable “nslices”, having 0 as its value.

[Figure: Variable values]

The variable “nslices” is a parameter given to the function “mmult” and it’s not changed anywhere inside it. That means that the value provided to our function is incorrect and we should check how the function was called. Looking through the code, we locate it:

[Figure: mmult call]

We can see the arguments that this call provided. Specifically, we’re interested in the “mr” variable, which in theory should be the one defining the number of slices used to partition the data of the matrices.

Inspecting the code, we can see that the “mr” variable is not what we thought it was. Why? Because in reality it is the variable that holds the identifier of our MPI rank. Our conclusion is that the error is simply passing the wrong variable as a function argument.

[Figure: Getting the process rank]

This explains why process 0 is the only one that gives us this problem, as it is the only one where “mr” equals zero. We also know that the right variable is defined in the code, so we only need to find it and pass it as the argument in the “mmult” call.
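The failure can be reproduced with plain integer arithmetic. The sketch below is written in shell rather than taken from the C source of the example: the function name slice_rows and the values 1024 and 4 are hypothetical stand-ins for “mmult”, “size” and the number of processes. Passing the rank of process 0 as the slice count triggers the division by zero, while passing the number of processes does not.

```shell
#!/bin/bash
# slice_rows SIZE NSLICES
# Prints the number of rows per slice (integer division SIZE / NSLICES)
# and refuses a slice count of zero instead of dividing by it.
slice_rows() {
    if [ "$2" -eq 0 ]; then
        echo "error: nslices is 0 (division by zero)" >&2
        return 1
    fi
    echo $(( $1 / $2 ))
}

# Examples:
#   slice_rows 1024 0   fails: this is what rank 0 passes when "mr" is used
#   slice_rows 1024 4   prints 256: the slice count is the number of processes
```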

Fixing the issue

Knowing that this program distributes the data into N slices of the matrices (one for each process), we can use the variable “nproc” shown above for that purpose. The only thing left to do is to apply the change to the function call:

    mmult(size, mr, mat_a, mat_b, mat_c); → mmult(size, nproc, mat_a, mat_b, mat_c);

And with this, the program should work now. Let’s recompile it and launch it again following the same steps we did for the first version, including the compilation. Once DDT is up and running, we can directly click the continue button. This time, DDT shouldn’t give us any problems and the execution should end normally, as shown here:

[Figure: Program termination without problems]

And this is it. We’ve debugged our first application! Although it is a rather simple application and fix, it’s a good exercise to grasp the methodology to use with DDT. We hope you find it useful in future debugging sessions.

Where can I learn more?

If you need more information about DDT and how to use it, check the reference manual: