Parallel Programming Models

Overview: 

C/C++ and Fortran are widely used programming languages for sequential applications, and a variety of parallel programming models build on them. MPI is built as a set of message-passing primitives, usually called from C and Fortran. OpenMP proposes extensions to these languages. Other programming languages extend them in different ways, such as Unified Parallel C and Co-Array Fortran. There are also new language proposals, like Fortress (Sun), Chapel (Cray), and X10 (IBM), which promise improved programmer productivity. Our team explores such programming environments and provides support for the execution of parallel applications on top of some of them.

Objectives: 

The main goal of the team is to investigate new and current programming paradigms, together with the associated runtime system support, in order to deliver high performance for parallel applications. The target architectures range from multicore and SMT processors to shared- and distributed-memory systems. The Cell BE processor is currently one of our main targets.

Target architectures and programming models

  • At the programming-model level, we are proposing extensions to OpenMP to improve the expressiveness of the model. The Mercurium compiler is able to exploit multiple levels of parallelism and generate work from multiple simultaneously executing threads. Once parallelism is spawned at a coarse level, new opportunities for parallelism at a finer level result in the generation of work for all processors or for restricted groups of them. Although our primary focus is on OpenMP, we are also investigating ways to exploit parallelism in distributed-memory architectures, especially now that new systems are starting to offer local memories that must be managed by software (e.g., in the Cell processor).
  • The runtime system offers the basic services to spawn and join parallelism and to synchronize across threads. The NthLib API is simple enough to be used for multithreaded programming on a variety of multiprocessor platforms. The goal is to have a platform that is easy to modify to incorporate new functionality, as needed by new architectures. For instance, it is currently being used to experiment with new ways to exploit parallelism using local memories in the Cell processor.
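
The idea of multiple levels of parallelism can be sketched with standard nested OpenMP regions. This is only an illustration in plain OpenMP, not the extensions proposed by the team or code generated by Mercurium. The function name matrix_sum and the choice of two threads for the inner team are assumptions made for the example. Many OpenMP implementations run the inner region with a single thread unless nested parallelism is enabled (for example, through the OMP_NESTED environment variable in OpenMP 2.5).

```c
#include <assert.h>

/* Hypothetical example: sum a rows x cols matrix stored in row-major order.
 * The result does not depend on whether OpenMP is enabled at compile time. */
double matrix_sum(const double *a, int rows, int cols)
{
    double total = 0.0;

    assert(rows >= 0 && cols >= 0);

    /* Coarse level: the rows are distributed among the outer team of threads. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < rows; i++) {
        double row_sum = 0.0;

        /* Fine level: each outer thread may spawn a restricted inner team
         * (two threads here, an arbitrary choice) to work on its own row. */
        #pragma omp parallel for num_threads(2) reduction(+:row_sum)
        for (int j = 0; j < cols; j++) {
            row_sum += a[i * cols + j];
        }

        total += row_sum;
    }

    return total;
}
```

Calling matrix_sum on a 3 x 4 matrix holding the values 1 to 12 returns 78. The num_threads clause is one standard way to restrict the inner level to a small group of threads rather than to all available processors.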

Projects/Areas:

  • Nanos runtime system: NthLib is a user-level threads library primarily designed to provide runtime support as the backend of the Mercurium compiler. The main focus is to provide effective support for multiple levels of parallelism. The library supports not only the structured parallelism offered by OpenMP but also the execution of parallel tasks in a non-structured way. This is useful, for instance, to support applications that use the client-server model.
  • Mercurium compiler infrastructure: The infrastructure provides support for OpenMP 2.5 for Fortran 77/90 and C. A template-based mechanism allows the researcher to specify program transformations for each element of the OpenMP programming model.
  • NanosDSM: NanosDSM provides Distributed Shared Memory support for Nanos. It comes as a run-time library that keeps memory consistent across the nodes of a cluster, so that a parallel application can run as if it were on a shared-memory architecture.
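
The difference between structured parallelism and dynamically generated tasks, mentioned in the Nanos runtime system item above, can be illustrated with the task construct that OpenMP standardized in version 3.0. This is a minimal sketch in standard OpenMP, not the NthLib API, whose calls are not shown on this page. The function names fib and fib_task are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: each recursive call is spawned as an independent task,
 * so the amount of parallel work is not fixed by the loop or region structure. */
static long fib_task(int n)
{
    long x, y;

    if (n < 2) {
        return n;
    }

    /* x and y must be shared so the child tasks write to this frame's variables. */
    #pragma omp task shared(x)
    x = fib_task(n - 1);

    #pragma omp task shared(y)
    y = fib_task(n - 2);

    /* Wait for both child tasks before combining their results. */
    #pragma omp taskwait
    return x + y;
}

/* Hypothetical entry point: one thread creates the root task, and the rest
 * of the team executes tasks as they become available. */
long fib(int n)
{
    long result = 0;

    assert(n >= 0);

    #pragma omp parallel
    {
        #pragma omp single
        result = fib_task(n);
    }

    return result;
}
```

With this sketch, fib(10) returns 55. A production version would normally stop creating tasks below some problem size, because very small tasks cost more to schedule than to execute.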

PUBLICATIONS AND COMMUNICATIONS

2012

Vujic, N., et al. DMA++: On the Fly Data Realignment for On-Chip Memories. IEEE Transactions on Computers 61, 237-250 (2012).
Ayguadé, E., et al. Hybrid/Heterogeneous Programming with OmpSs and its Software/Hardware Implications. Programming Multi-Core and Many-Core Computing Systems (Wiley Series on Parallel and Distributed Computing) (2012). Available at <http://www.par.univie.ac.at/~pllana/manycore_book/>
Bueno-Hedo, J., et al. Productive Programming of GPU Clusters with OmpSs. 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2012) (2012). Available at <http://www.ipdps.org/ipdps2012/2012_advance_program.html>

2011

Ferrer, R., et al. Optimizing the Exploitation of Multicore Processors and GPUs with OpenMP and OpenCL. Lecture Notes in Computer Science 6548/2011, 215-229 (2011).
Alvanos, M., Tiotto, E., Farreras, M. & Martorell, X. Improving communication in PGAS environments: Data prefetching and aggregation in UPC. 20th Annual International Conference hosted by the Centre for Advanced Studies & Research (CASCON 2011) (2011). Available at <https://www-927.ibm.com/ibm/cas/cascon/>
Bueno, J., et al. Productive Cluster Programming with OmpSs. Euro-Par 2011 Parallel Processing 6852, 555-566 (2011).
Caballero, D., Ferrer, R., Duran, A., Martorell, X. & Ayguadé, E. User-directed Auto-vectorization in OmpSs. ACACES 2011. Poster Abstracts. Advanced Computer Architecture and Compilation for Embedded Systems (2011). Available at <http://www.hipeac.net/summerschool>
Royuela, S., Ferrer, R., Duran, A. & Martorell, X. Compiler Analysis for Improving OpenMP Code Generation. ACACES 2011. Poster Abstracts. Advanced Computer Architecture and Compilation for Embedded Systems (2011). Available at <http://www.hipeac.net/summerschool>
Alvarez, L., et al. Design space exploration for aggressive core replication schemes in CMPs. Proceedings of the 20th International Symposium on High Performance Distributed Computing, 269-270 (2011). doi:10.1145/1996130.1996169
