Performance Tools


Parallel architectures let us tackle more complex and ambitious problems each year. In many cases, however, the achieved performance falls far short of what the theoretical peak values promise. Performance analysis tools allow application developers to identify and characterize the inefficiencies that cause poor performance. We consider this analysis to be the necessary first step towards optimizing an application. Optimizing without prior analysis is like driving without directions: effort may be wasted improving parts of the code that are not the real performance bottlenecks.

We started developing performance analysis tools in 1991, initially for internal use only; they have been distributed since 2000. Flexibility, simplicity and the ability to combine qualitative and quantitative information were the properties we considered most important when we designed the tools 15 years ago. These properties allow us to keep working with the same tools today.


The main objective of the team is to use our tools efficiently to solve any performance problem or question we face, whether directly as users of the tools or through external users. This means that our tools should be easy to adapt to new platforms, environments and programs, and should scale, extend and evolve in the same way that applications, platforms and programming models do.

We consider that performance analysis is, in some sense, still an art in which the experience and intuition of the analyst drive the analysis and determine the quality of the results. For this reason, a second objective is to define methodologies and procedures that simplify the process of extracting information from performance data. We believe that if performance analysis did not require special skills or expertise, more people would be interested in applying it.


The set of performance tools that we develop, named CEPBA-Tools, currently comprises:

  • Paraver: A trace-based performance visualization and analysis tool that can be used to analyse any information expressed in its input trace format.
  • Dimemas: A simulation tool for the parametric analysis of the behaviour of message-passing applications on a configurable parallel platform.
  • Instrumentation packages: A set of programs and libraries that generate or translate Paraver and Dimemas traces. We have packages for instrumenting different programming models (MPI, OpenMP, mixed) on different platforms (IBM-AIX, Linux, SGI-IRIX, HP-Alpha), and translators from the IBM-AIXtrace and LTT formats.
  • Utilities: A set of small programs that cut, summarize, translate or accumulate the performance data in Paraver traces. They can be used independently, but they are also integrated within Paraver.
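
To give a feel for what a parametric analysis of a message-passing application on a configurable platform involves, the following is a minimal sketch. It is not Dimemas and does not use its input formats or its actual models: the `Platform` class, the linear cost model (each message costs latency plus size divided by bandwidth) and all function names are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """Hypothetical platform description (not the Dimemas configuration format)."""
    latency_s: float          # per-message startup latency, in seconds
    bandwidth_Bps: float      # link bandwidth, in bytes per second
    cpu_speedup: float = 1.0  # relative CPU speed (2.0 means twice as fast)


def predicted_time(compute_s, message_bytes, platform):
    """Predict the run time of one process under an assumed linear cost model.

    The process is assumed to perform `compute_s` seconds of computation on
    the reference CPU and to send each message in `message_bytes` with a
    blocking send, with no overlap between computation and communication.
    """
    compute = compute_s / platform.cpu_speedup
    comm = sum(platform.latency_s + size / platform.bandwidth_Bps
               for size in message_bytes)
    return compute + comm


def sweep_bandwidth(compute_s, message_bytes, latency_s, bandwidths_Bps):
    """Re-evaluate the same application behaviour for several bandwidths."""
    return {
        bw: predicted_time(compute_s, message_bytes, Platform(latency_s, bw))
        for bw in bandwidths_Bps
    }


if __name__ == "__main__":
    messages = [1_000_000] * 10  # ten messages of 1 MB each (made-up workload)
    results = sweep_bandwidth(2.0, messages, 1e-5, [1e8, 1e9, 1e10])
    for bandwidth, seconds in results.items():
        print(f"{bandwidth:.0e} B/s -> {seconds:.4f} s")
```

The point of the sketch is the workflow rather than the model: the application behaviour is captured once, and the platform parameters are then varied to see which of them the predicted time is sensitive to.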



