Publications by Alex Ramirez (151 results)
The low-power architecture approach towards exascale computing. Journal of Computational Science (2013). doi:http://dx.doi.org/10.1016/j.jocs.2013.01.002
Parallelizing general histogram application for CUDA architectures. 2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIII) 11-18 (2013). doi:10.1109/SAMOS.2013.6621100
Power/Performance evaluation of Energy Efficient Ethernet (EEE) for High Performance Computing. IEEE International Symposium on Performance Analysis of Systems and Software - ISPASS 2013 (2013).
Programmable and Scalable Reductions on Clusters. Proceedings of 27th IEEE International Parallel and Distributed Processing Symposium (IEEE IPDPS) (2013).
Trace filtering of multithreaded applications for CMP memory simulation. IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) 134-135 (2013).
DMA++: On the Fly Data Realignment for On-Chip Memories. IEEE Transactions on Computers 61, 237-250 (2012).
Energy efficiency vs. performance of the numerical solution of PDEs: An application study on a low-power ARM-based cluster. Journal of Computational Physics 237, 132-150 (2012).
Hybrid/Heterogeneous Programming with OmpSs and its Software/Hardware Implications. Programming Multi-Core and Many-Core Computing Systems (Wiley Series on Parallel and Distributed Computing) (2012). at <http://www.par.univie.ac.at/~pllana/manycore_book/>
Kernel Partitioning of Streaming Applications: A Statistical Approach to an NP-complete Problem. International Symposium on Microarchitecture (MICRO-45) (2012). at <http://capinfo.e.ac.upc.edu/PDFs/dir01/file004119.pdf>
Prediction of regulatory regions using ReLA. 16th Annual International Conference on Research in Computational Molecular Biology (RECOMB) (2012).
The Abstract Streaming Machine: Compile-Time Performance Modelling of Stream Programs on Heterogeneous Multiprocessors. Transactions on HiPEAC 5 (2011).
DiDi: Mitigating The Performance Impact of TLB Shootdowns Using A Shared TLB Directory. Parallel Architectures and Compilation Techniques (PACT) (2011).
Parameterizing Multicore Architectures for Multiple Sequence Alignment. 2011 International Conference on Computing Frontiers (2011).
Scalability Evaluation of a Polymorphic Register File: A CG Case Study. Architecture of Computing Systems - ARCS 2011 13-25 (2011). doi:10.1007/978-3-642-19137-4
Scalable multicore architectures for long DNA sequence comparison. Concurrency and Computation: Practice and Experience 23 (2011).
Simulating Whole Supercomputer Applications. IEEE Micro 31, 32-45 (2011).
Buffer sizing for self-timed stream programs on heterogeneous distributed memory multiprocessors. International conference on High-Performance Embedded Architectures and Compilers (HiPEAC) 2010 96-110 (2010).
Can Manycores Support the Memory Requirements of Scientific Applications? Workshop on Applications for Multi and Many Core Processors (A4MMC) (2010).
Comparing last-level cache designs for CMP architectures. IFMT '10: International Forum on Next-Generation Multicore/Manycore Technologies (2010).
DMA++: On the Fly Data Realignment for On-Chip Memories. 16th IEEE International Symposium on High-Performance Computer Architecture (2010).
FlexDCP: a QoS framework for CMP architectures. ACM Operating Systems Review, Special Issue on the Interaction among the OS, Compilers, and Multicore Processors 43, 86-96 (2010).
Interleaving Granularity on High Bandwidth Memory Architecture for CMPs. Intl. Conf. on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS X) 250-257 (2010). at <http://dx.doi.org/10.1109/ICSAMOS.2010.5642060>
Long DNA Sequence Comparison on Multicore Architectures. 16th international Euro-Par conference on Parallel processing (2010). at <http://dx.doi.org/10.1007/978-3-642-15291-7_24>
Performance Evaluation of Macroblock-level Parallelization of H.264 Decoding on a cc-NUMA Multiprocessor Architecture. 4CCC. 4th Colombian Computing Conference, Bucaramanga (Colombia) (2010).
The SARC Architecture. IEEE Micro 30, 16-29 (2010).
Scalability Analysis of Progressive Alignment in a Multicore. International Workshop on Multi-Core Computing Systems (MuCoCoS 2010) (2010).
Scalability Analysis of Progressive Alignment on a Multicore. Fourth International Conference on Complex, Intelligent and Software Intensive Systems (CISIS '10) 889-894 (2010). at <http://dx.doi.org/10.1109/CISIS.2010.149>
Starsscheck: a tool to find errors in task-based parallel programs. 16th international Euro-Par conference on Parallel processing 2-13 (2010). at <http://portal.acm.org/citation.cfm?id=1887695.1887698>
Task Superscalar: An Out-of-Order Task Pipeline. IEEE/ACM Intl. Symp. on Microarchitecture (MICRO-43) 89-100 (2010). at <http://dx.doi.org/10.1109/MICRO.2010.13>
Task Superscalar: Using Processors as Functional Units. USENIX Workshop on Hot Topics In Parallelism (HotPar) (2010).
The Abstract Streaming Machine: Compile-Time Performance Modelling of Stream Programs on Heterogeneous Multiprocessors. IX International Workshop on Systems, Architectures, Modeling, and Simulation (SAMOS Workshop IX) 12-23 (2009).
Available task-level parallelism on the Cell BE. Scientific Programming 17, 59-76 (2009).
Cores as Functional Units: A Task-Based, Out-of-Order, Dataflow Pipeline. Advanced Computer Architecture and Compilation for Embedded Systems (ACACES) (2009).
DIA: A Complexity-Effective Decoding Architecture. IEEE Transactions on Computers 58, 448-462 (2009).
Exploiting Different Levels of Parallelism in the Biological Sequence Comparison Problem. 4CCC. 4th Colombian Computing Conference (2009).
FlexDCP: a QoS framework for CMP architectures. ACM SIGOPS Operating Systems Review, Special Issue on the Interaction among the OS, Compilers, and Multicore Processors 43, 0163-5980 (2009).
A Highly Scalable Parallel Implementation of H.264. Transactions on High-Performance Embedded Architectures and Compilers 4 (2009).
Mapping stream programs onto heterogeneous multiprocessor systems. International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES 2009) 57-66 (2009).
Parallel H.264 Decoding on an Embedded Multicore Processor. 4th International Conference on High-Performance Embedded Architectures and Compilers (HiPEAC'09) 404-418 (2009).
Parallel Scalability of Video Decoders. Journal of Signal Processing Systems 57, 173-194 (2009).
Performance Evaluation of Macroblock-level Parallelization of H.264 Decoding on a cc-NUMA Multiprocessor Architecture. Avances en Sistemas e Informática 6, 219-228 (2009).
Quantitative analysis of sequence alignment applications on multiprocessor architectures. 6th ACM conference on Computing frontiers 61-70 (2009).
Scalability of Macroblock-level parallelism for H.264 decoding. The Fifteenth International Conference on Parallel and Distributed Systems (ICPADS'09) (2009).
Thread to Core Assignment in SMT On-Chip Multiprocessors. 21st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'09) (2009).
3D Die-Stacking Architectures: State of the Art. Advanced Computer Architecture and Compilation for Embedded Systems. ACACES 2008 203-207 (2008).
Analysis of Video Filtering on the Cell Processor. 2008 IEEE International Symposium on Circuits and Systems (ISCAS'08) 488-491 (2008).