
Frequently Asked Questions

General

Backup Policy

The backup policy at BSC for the HPC machines is to back up the contents of $HOME daily. For each existing file, a few versions may be stored. For deleted files, only the last version is kept after a few weeks. Files stored elsewhere in the filesystem are not backed up, unless you specifically ask the Support Team to back them up.

Which filesystems do I have available? What is their intended usage?

Most of our machines share a filesystem (GPFS), so all your data is accessible from your account on any of them. Quotas are enforced to limit the amount of user data on the filesystem. Those machines present the following structure:

  • /gpfs/home
  • /apps
  • /gpfs/projects
  • /gpfs/scratch
  • /gpfs/archive/hpc

There is also a local hard disk in each node. It is not shared and is accessible only from that node:

  • /scratch/tmp

Filesystems overview

| Filesystem | Intended use | Permissions | Backup | Extra comment |
|---|---|---|---|---|
| /gpfs/home | Personal files | User | Yes (daily) | |
| /apps | Applications | Only BSC Support | Yes | Contact BSC Support to request access to licensed software |
| /gpfs/projects | Inputs and results, or data shared with your group | Group | No (*) | (*) Backup can be requested from Support |
| /gpfs/scratch | Temporary or intermediate files | User | No | |
| /gpfs/archive/hpc | Long-term data storage | Group | No | Only browsable from Data Transfer and accessed through dtcommands |
| /scratch/tmp | Local disk. Temporary files that don't need to be recovered | User | No | |

/gpfs/home

This is your home filesystem, where you should store your personal and configuration files. It is private, even from other members of your team.

This filesystem is backed up once a day.

/apps

This is where applications are installed on the machine. Users are not allowed to install software or modify existing installations. Some applications require a license, and access to them may be restricted. If your group holds a valid license for such software and you can't access it, do not hesitate to contact our Support Team.

/gpfs/projects

This filesystem is intended to store the inputs and results of your executions and to share data with other group members.

This filesystem is not backed up.

/gpfs/scratch

This filesystem offers the highest performance for distributed reads and writes and should be used to store the temporary or intermediate files used by your executions. You may copy your inputs to this filesystem for the execution and then move the results to /gpfs/projects for analysis.

This filesystem is not backed up and data older than 15 days may be deleted without warning if space is needed.
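
The copy-in, run, copy-out pattern described above might look like the following minimal sketch. The directory layout under /gpfs/scratch is not specified here, so paths are passed as arguments and the ones in the comments are placeholders. `stage_in` and `stage_out` are hypothetical helper names, not BSC tools.

```shell
#!/bin/bash
# Hypothetical helpers illustrating the scratch workflow; not BSC-provided tools.

# Copy the contents of an input directory into a working directory on scratch.
stage_in() {
    local src="$1" work="$2"
    mkdir -p "$work" && cp -r "$src"/. "$work"/
}

# Copy results from the scratch working directory back to a projects directory.
stage_out() {
    local work="$1" dest="$2"
    mkdir -p "$dest" && cp -r "$work"/. "$dest"/
}

# Example (placeholder paths; substitute your own directories):
#   stage_in  /gpfs/projects/<unixgroup>/<inputs>  /gpfs/scratch/<your-run-dir>
#   ... run your job inside /gpfs/scratch/<your-run-dir> ...
#   stage_out /gpfs/scratch/<your-run-dir>  /gpfs/projects/<unixgroup>/<results>
```

Copying results out promptly matters because scratch data older than 15 days may be deleted.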

/gpfs/archive/hpc

This filesystem is intended for long-term data storage and is only browsable from Data Transfer. It may be accessed through dtcommands.

This data is not backed up. A project's data may remain there for up to a year after the project ends, to account for project renewals in non-contiguous periods.

/scratch/tmp

This disk is intended for temporary files that don't need to be recovered, like mesh partitions local to the node or MPI temporary communications between nodes. The data is not recoverable after the job. Do not confuse it with /gpfs/scratch.

How can I check how much free disk space I have available? What is a quota?

All the filesystems you have available have a quota set. These quotas limit the amount of space usable by a user or group of users. We provide some convenience tools to check the usage and limits of quotas depending on the machine you are connected to. Bear in mind that the filesystem is shared, so the limits and status are the same across all machines.

You can use the command bsc_quota from the bsc_commands bundle which aggregates information about all filesystems in a comprehensive way.

To check the quota of /gpfs/archive/hpc, use the dtquota command.
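
If a quota is close to its limit, standard tools can help locate what is using the space. The sketch below relies only on `du`, `sort` and `head`; `largest_entries` is a hypothetical helper name, not part of bsc_commands. It skips hidden entries, and scanning a large tree can take a while.

```shell
# Hypothetical helper: list the largest entries directly under a directory.
# Sizes are reported by du in KiB; hidden entries are not matched by the glob.
largest_entries() {
    local dir="${1:-$HOME}" count="${2:-5}"
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Example: the five largest entries in your home directory
#   largest_entries "$HOME" 5
```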

Why can't I log in to HPC Portal?

Your HPC Portal account is separate from the account you use to access the machines. It only works for HPC Portal and does not interfere in any way with your machines account. If it is your first time logging in, please follow the steps explained in the HPC Portal section.

Connections

Why can't I connect/download/update... from logins to ...?

For security reasons, our clusters are unable to open outgoing connections to other machines, either internal (other BSC facilities) or external, but they will accept incoming connections. You must upload all needed data to the cluster by yourself and download it the same way.

I am a PRACE user, how can I access MareNostrum 4?

In order to have access through ssh, PRACE users have to upload their public ssh-key in the HPC Accounts portal https://hpcaccounts.bsc.es/ssh-keys

For your safety, the web portal does not upload the keys automatically and we add them after human verification, during working hours (Monday-Friday 7h-18h CET/CEST).

Why is my SSH key not working?

If you have installed an SSH key on your account as an authorized key, but you are still asked for a password when you log in, you should check the following:

  • If you have changed the default filename of the file in which the key is saved, you should specify the full path to the private key when you log in with SSH. For example:

    $ ssh -i /home/user/.ssh/custom_key username@machine.bsc.es

  • If you have changed the permissions of your $HOME, $HOME/.ssh or $HOME/.ssh/authorized_keys, that may be the problem.

    • The $HOME directory must have "rwx" permissions for the "user", and it must not have "w" permissions for group or others.
    • The $HOME/.ssh must have "rwx" permissions for the "user", and it must not have any permissions for group or others.
    • The $HOME/.ssh/authorized_keys must have "rw" permissions for the "user", and it must not have any permissions for group or others.
  • If you have set a passphrase on your SSH key, you will need to enter the passphrase every time you log in.

  • If none of the previous points solves the problem, please contact Support.
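
The permission requirements listed above can be restored with `chmod`. The following is a minimal sketch to run on the machine where the authorized key is installed; `fix_ssh_permissions` is a hypothetical helper name, and it assumes the `.ssh` directory and `authorized_keys` file already exist.

```shell
# Hypothetical helper: apply the permissions listed above to a home directory.
fix_ssh_permissions() {
    local home_dir="${1:-$HOME}"
    # $HOME: rwx for the user, no write permission for group or others
    chmod u+rwx,go-w "$home_dir"
    # $HOME/.ssh: rwx for the user only
    chmod 700 "$home_dir/.ssh"
    # $HOME/.ssh/authorized_keys: rw for the user only
    chmod 600 "$home_dir/.ssh/authorized_keys"
}

# Example:
#   fix_ssh_permissions "$HOME"
```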

How can I open a GUI in the logins?

To open GUIs you need to connect with the -X parameter in your ssh connection (on Linux/OSX) or with some kind of X11 forwarding (on Windows). Example:

Linux/OSX

   ssh -X -l username <login.bsc.es>

ERROR: "/usr/bin/manpath: can't set the locale; make sure $LC_* and $LANG are correct" or "cannot set LC_CTYPE locale"

This error is related to the locale (the language dependent character encoding) of your system being different/incompatible with MareNostrum's. If you find yourself in this situation, please try the following:

All

    LANG=es_ES.UTF-8 ssh -l username <login.bsc.es>

MacOsX

Some Mac versions have a bug in the terminal that ignores the previous setting and causes this error. You should be able to disable this behaviour by unchecking "Set locale environment variables on startup" in Terminal Settings -> Advanced.

What is this "connect to host XXX.bsc.es port 22: Connection refused" error?

You have probably been banned from the login node you are trying to access after several failed password attempts. To check whether this is the problem, try to access another login node of the same machine.

To be unbanned, please send a mail to Support specifying your public IP address and the login node that refuses the connection. You can find your public IP address on websites like: https://whatismyipaddress.com/

Data sharing and deletion

If you want to give access to a folder to another user, our policy for every case is the following:

If you want to share a folder with a user of your same group

You can do it by yourself by changing the permissions of the folder. The command is:

chmod <-R> g+<r/w/x> path/to/folder

The option "-R" makes the change recursive, and "r/w/x" stand for read, write and execute permissions respectively. You can specify more than one at the same time. If the folder belongs to someone who is not available at the moment (on vacation, off work, etc.), you can contact Support to have the permissions granted, but the Responsible's authorization will be needed.
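
As a concrete instance of the command above, the sketch below grants group members read access recursively. It uses `g+rX` rather than `g+rx`: the capital `X` is standard `chmod` behaviour that adds execute permission only to directories (and to files that are already executable), so plain data files do not become executable. The helper name `share_with_group` is hypothetical.

```shell
# Hypothetical helper: give the group read access to a folder and everything in it.
# Directories also get the execute bit so that group members can enter them.
share_with_group() {
    chmod -R g+rX "$1"
}

# Example:
#   share_with_group path/to/folder
```

Group members also need execute permission on every parent directory of the folder in order to reach it.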

If you want to share a folder with a user of a different group

You will have to contact our Support Team, provided you meet all of these requirements:

  • Data is stored under /gpfs/projects/<unixgroup>
  • The Responsible of the unixgroup approves (put them in CC in the mail so we can check their "OK")

Sharing data under /gpfs/scratch or /gpfs/home is not allowed. After the Responsible has given their permission, we will grant you access to the data.

If you want to share a folder with a different group that is composed of the same people

Please contact our Support Team with the PI from the other group in copy (so they can give us their "OK"), and we will grant you the desired secondary group.

You need the data from someone who no longer works at BSC, or from yourself with another account

Please contact our Support Team specifying the files that you need and your username, so that we can change the data's owner to you. If you have several accounts, we can grant you the requested secondary group so that it is easier for you to navigate through the filesystem.

If you need to delete that data, unfortunately we cannot do that for you. After the owner of the data has been changed, you will have to delete it yourself.

Executions

I am preparing a pipeline and I want to quickly test job submission/job environment.

All accounts have access to a high-priority qos named "debug". Jobs submitted to this qos start executing sooner. However, it imposes certain limits on the job's time limit (2h) and size. Specific limits can be checked using the command "bsc_queues". Jobs can be submitted to the debug qos by adding the following line to the jobscript (SLURM):

#SBATCH --qos=debug

Additionally, on clusters where SLURM is available, users can request one or more compute nodes on which to try different commands interactively, in the same environment as a job. These resources can be requested from the terminal of any login node by running the following (1 task example):

$ salloc -n 1 --qos=debug

The salloc command can be combined with any of the other SLURM sbatch options in order to request a different number of tasks, nodes, etc.
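
A small test jobscript for the debug qos might look like the sketch below. The job name, output filename and 10-minute time limit are arbitrary choices for illustration; `job_summary` is a hypothetical helper that prints a few facts about the job environment. The `SLURM_*` variables are set by SLURM inside a job, so they fall back to "unset" if the script is run directly.

```shell
#!/bin/bash
#SBATCH --job-name=pipeline_test
#SBATCH --qos=debug
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --output=pipeline_test_%j.out

# Hypothetical helper: print where the job runs and what SLURM assigned to it.
job_summary() {
    echo "host=$(hostname)"
    echo "job_id=${SLURM_JOB_ID:-unset}"
    echo "ntasks=${SLURM_NTASKS:-unset}"
}

job_summary
```

Saved as, for example, test_job.sh, it would be submitted with `sbatch test_job.sh`; `%j` in the output filename is replaced by the job ID.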

I was copying data/compiling/executing something in the login and the process was killed. Why?

The logins in our facilities have a 5-minute CPU time limit. This means that any execution requiring more than that is automatically killed.

You may avoid this restriction by using the queues (interactive queue or standard job) to compile or execute, depending on your interactive needs. To transfer files, you may use Data Transfer commands inside our cluster's filesystems or connect to our Data Transfer Machines, where uploads and downloads are not time limited.

My job failed and I see a message like "OOM Killed..." in the logs. What is this?

This is a message from the OS kernel stating that your process was consuming more memory than the node's limit allows and was therefore killed.

If you encounter this problem, you should try to change how many processes are executed on the same node. We recommend contacting our Support Team if it is the first time you tweak these settings.
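
One way to reason about this: if each process needs roughly M GB and a node offers N GB to jobs, at most floor(N / M) processes fit on one node. The sketch below does that arithmetic. The figures in the example are made up, not the specification of any BSC machine, and `max_tasks_per_node` is a hypothetical helper.

```shell
# Hypothetical helper: how many processes fit in a node's memory (integer division).
max_tasks_per_node() {
    local node_mem_gb="$1" per_task_gb="$2"
    echo $(( node_mem_gb / per_task_gb ))
}

# Example with made-up numbers: 96 GB usable per node, 5 GB per process.
max_tasks_per_node 96 5   # prints 19
```

In a SLURM jobscript, the number of processes per node can then be capped with the `--ntasks-per-node` option.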

My job has been waiting for a long time. How can I see when will it be executed?

There is no reliable prediction of when a given job will start, because its priority in the queue depends on several factors and may change as other jobs arrive. The system is designed so that all jobs are eventually executed, but some may take longer than others.

If you are working on a cluster managed by SLURM, you can check the expected start time and the resources to be allocated for pending jobs by running:

$ squeue --start

You should treat that as an estimate, which can change over time. The job might start sooner or later than the time expected by the scheduler.

Performance Tuning

At [...] we saw that using the mpi option [...] gives up to [...] speedup. Have you seen a similar behavior at [...]?

We have seen that the tuning options of our different MPI implementations have very different outcomes depending on the application, the number of nodes used and sometimes even the input. We recommend that you benchmark different options to see which is best for your application and, if you have any doubts, contact the Support Team to look into the issue together. They will be happy to help you find the best possible environment and software stack for your execution.

Our general recommendations are:

  • Prefer Intel compilers in most cases (x86 nodes only; on PPC64 we usually use GNU compilers, with good results)
  • When using Intel MKL math libraries, prefer Intel compilers and the IntelMPI implementation over OpenMPI (to compile with MKL you can check the required linking flags and libraries here: https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html)
  • For big executions (more than 250 nodes), it is better to contact Support to check scalability and performance before launching production jobs, to make sure the execution will perform properly and the hours will be used effectively.

I have found performance problems, what should I do?

First, you should find a method to reproduce the problem and confirm after some tests that it is reproducible (sometimes a node can have memory/network problems, so if the performance problem only happened one time, most probably it was caused by a temporary problem).
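
A simple way to check reproducibility is to run the same test several times and compare the timings. The sketch below is one possible approach; `repeat_timed` is a hypothetical helper, and it only has whole-second resolution, so it is meant for runs that last at least several seconds.

```shell
# Hypothetical helper: run a command N times and print the wall-clock time of each run.
repeat_timed() {
    local runs="$1"; shift
    local i start end
    for i in $(seq 1 "$runs"); do
        start=$(date +%s)
        "$@" >/dev/null 2>&1
        end=$(date +%s)
        echo "run $i: $(( end - start )) s"
    done
}

# Example: time three runs of a (placeholder) test script
#   repeat_timed 3 ./my_test.sh
```

If only one run out of several is slow, a temporary node or network problem is the more likely cause.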

After that, you should provide all relevant information to the Support Team by e-mail together with instructions on how to reproduce your tests. Support Team will investigate the issue and contact you as soon as possible with our recommendations.

Support Team

How/when may I contact Support Team?

You may send an e-mail at any time, and it will be answered on the next working day during office hours (9:00 - 18:00 CET). Bank holidays are those of Barcelona. The address depends on your Access Project:

  • RES/BSC: support AT bsc DOT es
  • PRACE: prace UNDERSCORE support AT bsc DOT es

Caution

Please remember to include all the relevant information in your mail when contacting support:

  • Job Ids
  • Software and version used
  • Environment
  • Machine
  • Username
  • Exact steps that lead to the issue
  • Error messages
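
Part of the information above can be collected automatically. The following sketch prints a few of the items in a form that can be pasted into the e-mail; `support_context` is a hypothetical helper, not a BSC tool, and the remaining items (software versions, exact steps, error messages) still have to be added by hand.

```shell
# Hypothetical helper: print basic context to paste into a support e-mail.
# The optional first argument is the job ID (or IDs) related to the issue.
support_context() {
    echo "Username: $(id -un)"
    echo "Machine:  $(hostname)"
    echo "Job Ids:  ${1:-<fill in>}"
    echo "Date:     $(date)"
}

# Example:
#   support_context 1234567
```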

Who may access the resources of BSC?

Scientific access to BSC's HPC resources is granted through some national and international research projects:

How can I get access to BSC for computing?

There are periodic proposal submission deadlines for new and continuing projects, 3 times a year. Visit each Access Project's website to check when the next submission deadline is.