Publications filtered by author: Alex Ramirez
Preliminary Analysis of the Cell BE Processor Limitations for Sequence Alignment Applications. Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS'08) 53-64 (2008).
Scalability of Macroblock-level Parallelism for H.264 Decoding. Advanced Computer Architecture and Compilation for Embedded Systems. ACACES 2008, Poster 59-62 (2008).
Task Management Analysis on the Cell BE. XIX Jornadas de Paralelismo, Castellón (Spain) 271-276 (2008).
The Abstract Streaming Machine: Compile-Time Performance Modelling of Stream Programs on Heterogeneous Multiprocessors. IX International Workshop on Systems, Architectures, Modeling, and Simulation (SAMOS Workshop IX) 12-23 (2009).
Available task-level parallelism on the Cell BE. Scientific Programming 17, 59-76 (2009).
Cores as Functional Units: A Task-Based, Out-of-Order, Dataflow Pipeline. Advanced Computer Architecture and Compilation for Embedded Systems (ACACES) (2009).
DIA: A Complexity-Effective Decoding Architecture. IEEE Transactions on Computers 58, 448-462 (2009).
Exploiting Different Levels of Parallelism in the Biological Sequence Comparison Problem. 4CCC. 4th Colombian Computing Conference (2009).
FlexDCP: a QoS framework for CMP architectures. ACM SIGOPS Operating Systems Review, Special Issue on the Interaction among the OS, Compilers, and Multicore Processors 43, (2009).
A Highly Scalable Parallel Implementation of H.264. Transactions on High-Performance Embedded Architectures and Compilers 4, (2009).
Mapping stream programs onto heterogeneous multiprocessor systems. International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES 2009) 57-66 (2009).
Parallel H.264 Decoding on an Embedded Multicore Processor. 4th International Conference on High-Performance Embedded Architectures and Compilers (HiPEAC'09) 404-418 (2009).
Parallel Scalability of Video Decoders. Journal of Signal Processing Systems 57, 173-194 (2009).
Performance Evaluation of Macroblock-level Parallelization of H.264 Decoding on a cc-NUMA Multiprocessor Architecture. Avances en Sistemas e Informática 6, 219-228 (2009).
Quantitative analysis of sequence alignment applications on multiprocessor architectures. 6th ACM conference on Computing frontiers 61-70 (2009).
Scalability of Macroblock-level parallelism for H.264 decoding. The Fifteenth International Conference on Parallel and Distributed Systems (ICPADS'09) (2009).
Thread to Core Assignment in SMT On-Chip Multiprocessors. 21st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'09) (2009).
Buffer sizing for self-timed stream programs on heterogeneous distributed memory multiprocessors. International Conference on High-Performance Embedded Architectures and Compilers (HiPEAC 2010) 96-110 (2010).
Can Manycores Support the Memory Requirements of Scientific Applications? Workshop on Applications for Multi and Many Core Processors (A4MMC) (2010).
Comparing last-level cache designs for CMP architectures. IFMT '10: International Forum on Next-Generation Multicore/Manycore Technologies (2010).
DMA++: On the Fly Data Realignment for On-Chip Memories. 16th IEEE International Symposium on High-Performance Computer Architecture (2010).
FlexDCP: a QoS framework for CMP architectures. ACM Operating Systems Review, Special Issue on the Interaction among the OS, Compilers, and Multicore Processors 43, 86-96 (2010).
Interleaving Granularity on High Bandwidth Memory Architecture for CMPs. Intl. Conf. on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS X) 250-257 (2010). Available at <http://dx.doi.org/10.1109/ICSAMOS.2010.5642060>
Long DNA Sequence Comparison on Multicore Architectures. 16th International Euro-Par Conference on Parallel Processing (2010). Available at <http://dx.doi.org/10.1007/978-3-642-15291-7_24>
Performance Evaluation of Macroblock-level Parallelization of H.264 Decoding on a cc-NUMA Multiprocessor Architecture. 4CCC. 4th Colombian Computing Conference, Bucaramanga (Colombia) (2010).
The SARC Architecture. IEEE Micro 30, 16-29 (2010).
Scalability Analysis of Progressive Alignment in a Multicore. International Workshop on Multi-Core Computing Systems (MuCoCoS 2010) (2010).
Scalability Analysis of Progressive Alignment on a Multicore. Fourth International Conference on Complex, Intelligent and Software Intensive Systems (CISIS '10) 889-894 (2010). Available at <http://dx.doi.org/10.1109/CISIS.2010.149>
Starsscheck: a tool to find errors in task-based parallel programs. 16th International Euro-Par Conference on Parallel Processing 2-13 (2010). Available at <http://portal.acm.org/citation.cfm?id=1887695.1887698>
Task Superscalar: An Out-of-Order Task Pipeline. IEEE/ACM Intl. Symp. on Microarchitecture (MICRO-43) 89-100 (2010). Available at <http://dx.doi.org/10.1109/MICRO.2010.13>
Task Superscalar: Using Processors as Functional Units. USENIX Workshop on Hot Topics In Parallelism (HotPar) (2010).
The Abstract Streaming Machine: Compile-Time Performance Modelling of Stream Programs on Heterogeneous Multiprocessors. Transactions on HiPEAC 5, (2011).
DiDi: Mitigating The Performance Impact of TLB Shootdowns Using A Shared TLB Directory. Parallel Architectures and Compilation Techniques (PACT) (2011).
Parameterizing Multicore Architectures for Multiple Sequence Alignment. 2011 International Conference on Computing Frontiers (2011).
Scalability Evaluation of a Polymorphic Register File: A CG Case Study. Architecture of Computing Systems - ARCS 2011 13-25 (2011). doi:10.1007/978-3-642-19137-4
Scalable multicore architectures for long DNA sequence comparison. Concurrency and Computation Practice and Experience 23, (2011).
Simulating Whole Supercomputer Applications. IEEE Micro 31, 32-45 (2011).
DMA++: On the Fly Data Realignment for On-Chip Memories. IEEE Transactions on Computers 61, 237-250 (2012).
Energy efficiency vs. performance of the numerical solution of PDEs: An application study on a low-power ARM-based cluster. Journal of Computational Physics 237, 132-150 (2012).
Hybrid/Heterogeneous Programming with OmpSs and its Software/Hardware Implications. Programming Multi-Core and Many-Core Computing Systems (Wiley Series on Parallel and Distributed Computing) (2012). Available at <http://www.par.univie.ac.at/~pllana/manycore_book/>
Kernel Partitioning of Streaming Applications: A Statistical Approach to an NP-complete Problem. International Symposium on Microarchitecture (MICRO-45) (2012). Available at <http://capinfo.e.ac.upc.edu/PDFs/dir01/file004119.pdf>
Prediction of regulatory regions using ReLA. 16th Annual International Conference on Research in Computational Molecular Biology, RECOMB (2012).
The low-power architecture approach towards exascale computing. Journal of Computational Science (2013). Available at <http://dx.doi.org/10.1016/j.jocs.2013.01.002>
Power/Performance evaluation of Energy Efficient Ethernet (EEE) for High Performance Computing. IEEE International Symposium on Performance Analysis of Systems and Software - ISPASS 2013 (2013).
Programmable and Scalable Reductions on Clusters. Proceedings of 27th IEEE International Parallel and Distributed Processing Symposium (IEEE IPDPS) (2013).