Publications

Filters: Author is Xavier Martorell
Book Chapter
Ayguadé, E. et al. Programming Multi-Core and Many-Core Computing Systems (Wiley Series on Parallel and Distributed Computing) (John Wiley & Sons, Inc., 2012). at <http://www.par.univie.ac.at/~pllana/manycore_book/>
Conference Paper
Milovanovic, M. et al. Transactional Memory and OpenMP. International Workshop on OpenMP (IWOMP-2007) 37–53 (Springer-Verlag, 2007). at <http://capinfo.e.ac.upc.edu/PDFs/dir05/file003195.pdf>
International Conferences
Patt, Y. N., Foglia, P., Duesterwald, E., Faraboschi, P. & Martorell, X. 5th Int. Conf. on High Performance Embedded Architectures and Compilers (HiPEAC 2010). (2010). at <http://www.informatik.uni-trier.de/~ley/db/conf/hipeac/hipeac2010.html>
Oro, D., Fernandez, C., Segura, C., Martorell, X. & Hernando, J. Accelerating Boosting-based Face Detection on GPUs. Proc. of the 41st International Conference on Parallel Processing (2012). doi:10.1109/ICPP.2012.65
Bertran, R. et al. Accurate Energy Accounting for Shared Virtualized Environments using PMC-based Power Modeling Techniques. (2010).
Ferrer, R., Beltran, V., González, M., Martorell, X. & Ayguadé, E. Analysis of Task Offloading for Accelerators. (2010). at <http://www.springerlink.com/content/978-3-642-11514-1>
Guitart, J., Martorell, X., Torres, J. & Ayguadé, E. Application/Kernel Cooperation Towards the Efficient Execution of Shared-memory Parallel Java Codes. 17th IEEE International Parallel and Distributed Processing Symposium (IPDPS'03) (2003). doi:10.1109/IPDPS.2003.1213122
Alvanos, M., Tiotto, E., Farreras, M. & Martorell, X. Automatic Communication Coalescing for Irregular Computations in UPC Language. Proc. of the 2012 CASCON conference (2012). at <https://www-927.ibm.com/ibm/cas/cascon/paper.jsp>
González, C., Fernández, M., Jiménez, D., Álvarez, C. & Martorell, X. Automatic generation of application-specific hardware accelerators on OpenSPARC. International Symposium on Code Generation and Optimization (CGO 2011) (2011). at <http://capinfo.e.ac.upc.edu/PDFs/dir28/file003972.pdf>
Duran, A. et al. Automatic Thread Distribution for Nested Parallelism in OpenMP. (2005).
Duran, A., Teruel, X., Ferrer, R., Martorell, X. & Ayguadé, E. Barcelona OpenMP Tasks Suite: A Set of Benchmarks Targeting the Exploitation of Task Parallelism in OpenMP. (2009). at <http://www.computer.org/portal/web/csdl/doi/10.1109/ICPP.2009.64>
Cabarcas, F. et al. CellSim: A Cell Processor Simulation Infrastructure. 2007 Advanced Computer Architecture and Compilation for Embedded Systems (ACACES-07) 279–282 (2007).
Royuela, S., Duran, A. & Martorell, X. Compiler automatic discovery of OmpSs task dependencies. Proceedings of the workshop on Languages and Compilers for Parallel Computing (2012). at <http://www.kasahara.cs.waseda.ac.jp/lcpc2012/?page_id=98>
Bertran, R., González, M., Martorell, X., Navarro, N. & Ayguadé, E. Decomposable and Responsive Power Models for Multicore Processors using Performance Counters. (2010). at <http://doi.acm.org/10.1145/1810085.1810108>
Álvarez, L. et al. Design space exploration for aggressive core replication schemes in CMPs. Proceedings of the 20th international symposium on High performance distributed computing 269–270 (2011). doi:10.1145/1996130.1996169
Vujic, N. et al. DMA++: On the Fly Data Realignment for On-Chip Memories. 16th IEEE International Symposium on High-Performance Computer Architecture (2010).
Vujic, N., Álvarez, L., González, M., Martorell, X. & Ayguadé, E. DMA-circular: an enhanced high level programmable DMA controller for optimized management of on-chip local memories. Proceedings of the 9th conference on Computing Frontiers 113–122 (2012). doi:10.1145/2212908.2212925
Guitart, J., Martorell, X., Torres, J. & Ayguadé, E. Efficient Execution of Parallel Java Applications. 3rd Annual Workshop on Java for High Performance Computing 31–35 (2001). at <http://www.bsc.es/media/395.pdf>
Álvarez, L. et al. Hardware-software coherence protocol for the coexistence of caches and local memories. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis 89:1–89:11 (2012). at <http://dl.acm.org/citation.cfm?id=2388996.2389117>
Filgueras, A. et al. Heterogeneous tasking on SMP/FPGA SoCs: the case of OmpSs and the Zynq. 21st IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC) 290–291 (2013).
González, M. et al. Hybrid Access-Specific Software Cache Techniques for the Cell BE Architecture. (2008). at <http://www.eecg.toronto.edu/pact/>
Bueno-Hedo, J., Badia, R. M., Martorell, X., Ayguadé, E. & Labarta, J. Implementing OmpSs Support for Regions of Data in Architectures with Multiple Address Spaces. 27th International Conference on Supercomputing (ICS) 359–368 (2013).
Alvanos, M., Tiotto, E., Farreras, M. & Martorell, X. Improving communication in PGAS environments: Data prefetching and aggregation in UPC. 20th Annual International Conference hosted by the Centre for Advanced Studies & Research (CASCON 2011) (2011). at <https://www-927.ibm.com/ibm/cas/cascon/>
Alvanos, M., Farreras, M., Tiotto, E., Amaral, J. N. & Martorell, X. Improving Communication in PGAS Environments: Static and Dynamic Coalescing in UPC. 27th International Conference on Supercomputing (ICS) 129–138 (2013). doi:10.1145/2464996.2465006
Alvanos, M. et al. Improving Performance of All-to-all Communication Through Loop Scheduling in PGAS Environments. 27th International Conference on Supercomputing (ICS) 457–458 (2013).
Dadvand, P. et al. Migration of a Generic Multi-Physics Framework to HPC Environments. 23rd International Conference on Parallel Computational Fluid Dynamics (2011). at <http://parcfd2011.bsc.es/sites/default/files/abstracts/id124-pooyan.pdf>
Balart, J. et al. A Novel Asynchronous Software Cache Implementation for the Cell-BE Processor. (2007). at <http://portal.acm.org/citation.cfm?id=1433050>
Filgueras, A. et al. OmpSs@Zynq All-Programmable SoC Ecosystem. 22nd ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (2014). at <http://www.eecg.utoronto.ca/FPGA2014/>
Cabrera, D., Martorell, X., Gaydadjiev, G. N., Ayguadé, E. & Jiménez-González, D. OpenMP Extensions for FPGA Accelerators. (2009). at <http://portal.acm.org/citation.cfm?id=1812714>
Teruel, X. et al. OpenMP Tasking Analysis for Programmers. (2009). at <http://www-927.ibm.com/ibm/cas/cascon2009/papers.shtml>
Teruel, X. et al. OpenMP Tasks in IBM XL Compilers. (2008). at <http://www-927.ibm.com/ibm/cas/cascon/>
Almasi, G. et al. Optimization of MPI Collective Communication on Blue Gene/L Systems. (2005).
Ródenas, D. et al. Optimizing NANOS OpenMP for the IBM Cyclops Multithreaded Architecture. (2005).
Jiménez-González, D., Martorell, X. & Ramirez, A. Performance Analysis of Cell Broadband Engine for High Memory Bandwidth Applications. IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS-2007) 210–219 (2007).
Bueno, J. et al. Poster: programming clusters of GPUs with OMPSs. Proceedings of the international conference on Supercomputing 378–378 (2011). doi:10.1145/1995896.1995961
Bueno, J. et al. Productive Cluster Programming with OmpSs. Euro-Par 2011 Parallel Processing 6852, 555–566 (2011).
Bueno-Hedo, J. et al. Productive Programming of GPU Clusters with OmpSs. 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2012) 557–568 (2012). doi:10.1109/IPDPS.2012.58
Ayguadé, E. et al. A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures. (2009).
Bueno, J. et al. Reducing Data Access Latency in SDSM Systems using Runtime Optimizations. 19th Annual International Conference hosted by the Centre for Advanced Studies & Research (CASCON 2010) 160–173 (2010). doi:10.1145/1923947.1923965
Costa, J., Cortes, T., Martorell, X., Ayguadé, E. & Labarta, J. Running OpenMP Applications Efficiently on an Everything-shared SDSM. (2004).
Ciobanu, C., Martorell, X., Kuzmanov, G. K., Ramirez, A. & Gaydadjiev, G. N. Scalability Evaluation of a Polymorphic Register File: A CG Case Study. Architecture of Computing Systems - ARCS 2011 13–25 (2011). doi:10.1007/978-3-642-19137-4
Álvarez, M., Ramirez, A., Martorell, X., Ayguadé, E. & Valero, M. Scalability of Macroblock-level Parallelism for H.264 Decoding. Advanced Computer Architecture and Compilation for Embedded Systems. ACACES 2008, Poster 59–62 (2008).
Teruel, X., Martorell, X., Duran, A., Ferrer, R. & Ayguadé, E. Support for OpenMP Tasks in Nanos v4. (2007). at <http://www-927.ibm.com/ibm/cas/cascon>
Costa, J., Cortes, T., Martorell, X., Bueno-Hedo, J. & Ayguadé, E. Transient Congestion Avoidance in Software Distributed Shared Memory Systems. (2010). at <http://doi.ieeecomputersociety.org/10.1109/PDCAT.2010.32>
