Publications
Author: Xavier Martorell
Hybrid/Heterogeneous Programming with OmpSs and its Software/Hardware Implications. Programming Multi-Core and Many-Core Computing Systems (Wiley Series on Parallel and Distributed Computing) (2012). at <http://www.par.univie.ac.at/~pllana/manycore_book/>
Transactional Memory and OpenMP. International Workshop on OpenMP (IWOMP-2007) 37–53 (2007). at <http://capinfo.e.ac.upc.edu/PDFs/dir05/file003195.pdf>
5th Int. Conf. on High Performance Embedded Architectures and Compilers (HiPEAC 2010). (2010). at <http://www.informatik.uni-trier.de/~ley/db/conf/hipeac/hipeac2010.html>
Accelerating Boosting-based Face Detection on GPUs. Proc. of the 41st International Conference on Parallel Processing (2012). doi:10.1109/ICPP.2012.65
Analysis of Task Offloading for Accelerators. (2010). at <http://www.springerlink.com/content/978-3-642-11514-1>
Application/Kernel Cooperation Towards the Efficient Execution of Shared-memory Parallel Java Codes. 17th IEEE International Parallel and Distributed Processing Symposium (IPDPS'03) (2003). doi:10.1109/IPDPS.2003.1213122
Automatic Communication Coalescing for Irregular Computations in UPC Language. Proc. of the 2012 CASCON conference (2012). at <https://www-927.ibm.com/ibm/cas/cascon/paper.jsp>
Automatic generation of application-specific hardware accelerators on OpenSPARC. International Symposium on Code Generation and Optimization (CGO 2011) (2011). at <http://capinfo.e.ac.upc.edu/PDFs/dir28/file003972.pdf>
Barcelona OpenMP Tasks Suite: A Set of Benchmarks Targeting the Exploitation of Task Parallelism in OpenMP. (2009). at <http://www.computer.org/portal/web/csdl/doi/10.1109/ICPP.2009.64>
CellSim: A Cell Processor Simulation Infrastructure. 2007 Advanced Computer Architecture and Compilation for Embedded Systems (ACACES-07) 279-282 (2007).
Compiler automatic discovery of OmpSs task dependencies. Proceedings of the workshop on Languages and Compilers for Parallel Computing (2012). at <http://www.kasahara.cs.waseda.ac.jp/lcpc2012/?page_id=98>
Decomposable and Responsive Power Models for Multicore Processors using Performance Counters. (2010). at <http://doi.acm.org/10.1145/1810085.1810108>
Design space exploration for aggressive core replication schemes in CMPs. Proceedings of the 20th international symposium on High performance distributed computing 269–270 (2011). doi:10.1145/1996130.1996169
DMA++: On the Fly Data Realignment for On-Chip Memories. 16th IEEE International Symposium on High-Performance Computer Architecture (2010).
DMA-circular: an enhanced high level programmable DMA controller for optimized management of on-chip local memories. Proceedings of the 9th conference on Computing Frontiers 113–122 (2012). doi:10.1145/2212908.2212925
Efficient Execution of Parallel Java Applications. 3rd Annual Workshop on Java for High Performance Computing 31-35 (2001). at <http://www.bsc.es/media/395.pdf>
Hardware-software coherence protocol for the coexistence of caches and local memories. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis 89:1–89:11 (2012). at <http://dl.acm.org/citation.cfm?id=2388996.2389117>
Hybrid Access-Specific Software Cache Techniques for the Cell BE Architecture. (2008). at <http://www.eecg.toronto.edu/pact/>
Improving communication in PGAS environments: Data prefetching and aggregation in UPC. 20th Annual International Conference hosted by the Centre for Advanced Studies & Research (CASCON 2011) (2011). at <https://www-927.ibm.com/ibm/cas/cascon/>
Migration of a Generic Multi-Physics Framework to HPC Environments. 23rd International Conference on Parallel Computational Fluid Dynamics (2011). at <http://parcfd2011.bsc.es/sites/default/files/abstracts/id124-pooyan.pdf>
A Novel Asynchronous Software Cache Implementation for the Cell-BE Processor. (2007). at <http://portal.acm.org/citation.cfm?id=1433050>
OpenMP Tasking Analysis for Programmers. (2009). at <http://www-927.ibm.com/ibm/cas/cascon2009/papers.shtml>
Performance Analysis of Cell Broadband Engine for High Memory Bandwidth Applications. IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS-2007) 210-219 (2007).
Poster: programming clusters of GPUs with OMPSs. Proceedings of the international conference on Supercomputing 378–378 (2011). doi:10.1145/1995896.1995961
Productive Programming of GPU Clusters with OmpSs. 26th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2012) 557-568 (2012). doi:10.1109/IPDPS.2012.58
Reducing Data Access Latency in SDSM Systems using Runtime Optimizations. 19th Annual International Conference hosted by the Centre for Advanced Studies & Research (CASCON 2010) 160-173 (2010). doi:10.1145/1923947.1923965
Scalability Evaluation of a Polymorphic Register File: A CG Case Study. Architecture of Computing Systems - ARCS 2011 13-25 (2011). doi:10.1007/978-3-642-19137-4.
Scalability of Macroblock-level Parallelism for H.264 Decoding. Advanced Computer Architecture and Compilation for Embedded Systems. ACACES 2008, Poster 59-62 (2008).
ACOTES Project: Advanced Compiler Technologies for Embedded Streaming. Harm Munk. International Journal of Parallel Programming 39, 397-450 (2010).
Automatic Prefetch and Modulo Scheduling Transformations for the Cell BE Architecture. (2010). at <http://doi.ieeecomputersociety.org/10.1109/TPDS.2009.97>
Counter-Based Power Modeling Methods: Top-Down vs. Bottom-Up. The Computer Journal (2012). doi:10.1093/comjnl/bxs116
DMA++: On the Fly Data Realignment for On-Chip Memories. Computers, IEEE Transactions on 61, 237-250 (2012).
Energy accounting for shared virtualized environments under DVFS using PMC-based power models. Future Generation Computer Systems 28, 457-468 (2012).
Energy accounting for shared virtualized environments under DVFS using PMC-based power models. Future Generation Computer Systems 28, 457-468 (2011).


