Identifying code phases using piece-wise linear regressions. 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS) 941-951 (2014).
Automatic Refinement of Parallel Applications Structure Detection. LSPP '12: Proceedings of the 2012 Workshop on Large-Scale Parallel Processing (2012).
A high-productivity task-based programming model for clusters. Concurr. Comput.: Pract. Exper. 24, 2421–2448 (2012).
Programming Multi-Core and Many-Core Computing Systems (Wiley Series on Parallel and Distributed Computing) (John Wiley & Sons, Inc., 2012). at <http://www.par.univie.ac.at/~pllana/manycore_book/>
Productive Programming of GPU Clusters with OmpSs. 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2012) 557-568 (2012). doi:10.1109/IPDPS.2012.58
ClusterSs: a Task-based Programming Model for Clusters. Proceedings of the 20th International ACM Symposium on High Performance Distributed Computing, San Jose, California, USA 267–268 (2011). doi:10.1145/1996130.1996168
Extracting the optimal sampling frequency of applications using spectral analysis. Concurrency and Computation: Practice and Experience (2011). doi:10.1002/cpe.1819
Folding: detailed analysis with coarse sampling. Tools for High Performance Computing 2011. Proceedings of the 5th International Workshop on Parallel Tools for High Performance Computing (2011).
Making the Best of Temporal Locality: Just-in-Time Renaming and Lazy Write-Back on the Cell/B.E. International Journal of High Performance Computing Applications 25 (2011).
OmpSs: A Proposal for Programming Heterogeneous Multi-Core Architectures. Parallel Processing Letters 21, 173-193 (2011).
Optimizing the Exploitation of Multicore Processors and GPUs with OpenMP and OpenCL. Lecture Notes in Computer Science 6548/2011, 215-229 (2011).
Parallel implementation of the integral histogram. Proceedings of the 13th international conference on Advanced concepts for intelligent vision systems 586–598 (2011). at <http://dl.acm.org/citation.cfm?id=2034246.2034306>
A Portable Implementation of the Integral Histogram in StarSs. Proceedings of the SC11 conference (2011).
Poster: programming clusters of GPUs with OmpSs. Proceedings of the international conference on Supercomputing 378–378 (2011). doi:10.1145/1995896.1995961
Productive Cluster Programming with OmpSs. Euro-Par 2011 Parallel Processing 6852, 555-566 (2011).
Simulating Whole Supercomputer Applications. IEEE Micro 31, 32-45 (2011).
A Study of Speculative Distributed Scheduling on the Cell/B.E. Proceedings of the 25th IEEE International Parallel & Distributed Processing Symposium (2011).
Trace Spectral Analysis toward Dynamic Levels of Detail. 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011, Tainan, Taiwan 332-339 (2011).
(in Performance Tuning of Scientific Applications. CRC Press. ISBN 978-1-4398-1569-4, 2011).
Unveiling Internal Evolution of Parallel Application Computation Phases. ICPP'2011: International Conference on Parallel Processing (ICPP) 155-164 (2011).
BSC contributions in Energy-aware Resource Management for Large Scale Distributed Systems. 1st Year Workshop of the COST Action IC0804 on Energy Efficiency in Large Scale Distributed Systems 76-79 (2010).
On-line Detection of Large-scale Parallel Application's Structure. 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS'2010) (2010).
Performance Data Extrapolation in Parallel Codes. ICPADS '10: Proceedings of the 16th International Conference on Parallel and Distributed Systems (2010).
Task Superscalar: An Out-of-Order Task Pipeline. IEEE/ACM Intl. Symp. on Microarchitecture (MICRO-43) 89-100 (2010). at <http://dx.doi.org/10.1109/MICRO.2010.13>
Task Superscalar: Using Processors as Functional Units. USENIX Workshop on Hot Topics In Parallelism (HotPar) (2010).
Automatic Detection of Parallel Applications Computation Phases. IPDPS '09: Proceedings of the 23rd IEEE International Parallel and Distributed Processing Symposium (2009).
Automatic evaluation of the computation structure of parallel applications. PDCAT'09: Proceedings of the 10th International Conference on Parallel and Distributed Computing, Applications and Technologies (2009).
Cores as Functional Units: A Task-Based, Out-of-Order, Dataflow Pipeline. Advanced Computer Architecture and Compilation for Embedded Systems (ACACES) (2009).
Impact of the memory hierarchy on shared memory architectures in multicore programming models. (2009).