MARTORELL BOFILL, XAVIER
Implementing OmpSs Support for Regions of Data in Architectures with Multiple Address Spaces. Proc. of the 27th International Conference on Supercomputing (2013).at <http://www.ics-conference.org/>
Accelerating Boosting-based Face Detection on GPUs. Proc. of the 41st International Conference on Parallel Processing (2012).doi:10.1109/ICPP.2012.65
Automatic Communication Coalescing for Irregular Computations in UPC Language. Proc. of the 2012 CASCON conference (2012).at <https://www-927.ibm.com/ibm/cas/cascon/paper.jsp>
Compiler automatic discovery of OmpSs task dependencies. Proceedings of the workshop on Languages and Compilers for Parallel Computing (2012).at <http://www.kasahara.cs.waseda.ac.jp/lcpc2012/?page_id=98>
Counter-Based Power Modeling Methods: Top-Down vs. Bottom-Up. The Computer Journal (2012).doi:10.1093/comjnl/bxs116
DMA++: On the Fly Data Realignment for On-Chip Memories. IEEE Transactions on Computers 61, 237-250 (2012).
DMA-circular: an enhanced high level programmable DMA controller for optimized management of on-chip local memories. Proceedings of the 9th conference on Computing Frontiers 113–122 (2012).doi:10.1145/2212908.2212925
Energy accounting for shared virtualized environments under DVFS using PMC-based power models. Future Generation Computer Systems 28, 457-468 (2012).
Extending OpenMP* with vector constructs for modern multicore SIMD architectures. Proceedings of the 8th international conference on OpenMP in a Heterogeneous World 59–72 (2012).doi:10.1007/978-3-642-30961-8_5
Hardware-software coherence protocol for the coexistence of caches and local memories. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis 89:1–89:11 (2012).at <http://dl.acm.org/citation.cfm?id=2388996.2389117>
Hybrid/Heterogeneous Programming with OmpSs and its Software/Hardware Implications. Programming Multi-Core and Many-Core Computing Systems (Wiley Series on Parallel and Distributed Computing) (2012).at <http://www.par.univie.ac.at/~pllana/manycore_book/>
On the Instrumentation of OpenMP and OmpSs Tasking Constructs. Euro-Par 2012: Parallel Processing Workshops. Lecture Notes in Computer Science 7640, 414-428 (2012).
Productive Programming of GPU Clusters with OmpSs. 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2012) 557-568 (2012).doi:http://doi.ieeecomputersociety.org/10.1109/IPDPS.2012.58
Automatic Generation and Testing of Application Specific Hardware Accelerators on a New Reconfigurable OpenSPARC Platform. 5th HiPEAC Workshop on Reconfigurable Computing (WRC 2011) 85-94 (2011).
Automatic generation of application-specific hardware accelerators on OpenSPARC. International Symposium on Code Generation and Optimization (CGO 2011) (2011).at <http://capinfo.e.ac.upc.edu/PDFs/dir28/file003972.pdf>
Compiler Analysis for Improving OpenMP Code Generation. ACACES 2011. Poster Abstracts. Advanced Computer Architecture and Compilation for Embedded Systems (2011).at <http://www.hipeac.net/summerschool>
Design space exploration for aggressive core replication schemes in CMPs. Proceedings of the 20th international symposium on High performance distributed computing 269–270 (2011).doi:http://doi.acm.org/10.1145/1996130.1996169
Energy accounting for shared virtualized environments under DVFS using PMC-based power models. Future Generation Computer Systems 28, 457-468 (2011).
Improving communication in PGAS environments: Data prefetching and aggregation in UPC. 20th Annual International Conference hosted by the Centre for Advanced Studies & Research (CASCON 2011) (2011).at <https://www-927.ibm.com/ibm/cas/cascon/>
Mercurium: Design Decisions for a S2S Compiler. Cetus Users and Compiler Infrastructure Workshop in conjunction with PACT 2011 (2011).
Migration of a Generic Multi-Physics Framework to HPC Environments. 23rd International Conference on Parallel Computational Fluid Dynamics (2011).at <http://parcfd2011.bsc.es/sites/default/files/abstracts/id124-pooyan.pdf>
OmpSs: A Proposal for Programming Heterogeneous Multi-Core Architectures. Parallel Processing Letters 21, 173-193 (2011).
Optimizing the Exploitation of Multicore Processors and GPUs with OpenMP and OpenCL. Lecture Notes in Computer Science 6548/2011, 215-229 (2011).
Poster: programming clusters of GPUs with OMPSs. Proceedings of the international conference on Supercomputing 378–378 (2011).doi:http://doi.acm.org/10.1145/1995896.1995961
Scalability Evaluation of a Polymorphic Register File: A CG Case Study. Architecture of Computing Systems - ARCS 2011 13-25 (2011).doi:10.1007/978-3-642-19137-4.
User-directed Auto-vectorization in OmpSs. ACACES 2011. Poster Abstracts. Advanced Computer Architecture and Compilation for Embedded Systems (2011).at <http://www.hipeac.net/summerschool>
5th Int. Conf. on High Performance Embedded Architectures and Compilers (HiPEAC 2010). (2010).at <http://www.informatik.uni-trier.de/~ley/db/conf/hipeac/hipeac2010.html>
ACOTES Project: Advanced Compiler Technologies for Embedded Streaming Harm Munk. International Journal of Parallel Programming 39, 397-450 (2010).
Analysis of Task Offloading for Accelerators. (2010).at <http://www.springerlink.com/content/978-3-642-11514-1>
Automatic Prefetch and Modulo Scheduling Transformations for the Cell BE Architecture. (2010).at <http://doi.ieeecomputersociety.org/10.1109/TPDS.2009.97>
Decomposable and Responsive Power Models for Multicore Processors using Performance Counters. (2010).at <http://doi.acm.org/10.1145/1810085.1810108>
DMA++: On the Fly Data Realignment for On-Chip Memories. 16th IEEE International Symposium on High-Performance Computer Architecture (2010).
GPFPGA: entorno para la generación automática de códigos HDL portables entre FPGAs. (2010).at <http://jcraconf.org/JCRA2010/>
Harmonizing serial optimizations with OpenMP. (2010).at <http://www.complang.tuwien.ac.at/cpc10/program.html>
Local Memory Design Space Exploration for High-Performance Computing. (2010).at <http://comjnl.oxfordjournals.org/content/early/2010/03/23/comjnl.bxq026.full.pdf+html>
Reducing Data Access Latency in SDSM Systems using Runtime Optimizations. 19th Annual International Conference hosted by the Centre for Advanced Studies & Research (CASCON 2010) 160-173 (2010).doi:10.1145/1923947.1923965
Achieving High Memory Performance from Heterogeneous Architectures with the SARC Programming Model. (2009).at <http://portal.acm.org/citation.cfm?id=1621963>
Adaptive and Speculative Memory Consistency Support for Multi-core Architectures with On-Chip Local Memories. (2009).at <http://nanos.ac.upc.edu/content/adaptive-and-speculative-memory-consistency-support-multi-core-architectures-chip-local-memo>
Barcelona OpenMP Tasks Suite: A Set of Benchmarks Targeting the Exploitation of Task Parallelism in OpenMP. (2009).at <http://www.computer.org/portal/web/csdl/doi/10.1109/ICPP.2009.64>


