Trace Filtering of Multithreaded Applications for CMP Memory Simulation. IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS-2013) 134–135 (2013).
Trace-driven simulation of multithreaded applications. 2011 IEEE International Symposium on Performance Analysis of Systems and Software 87--96 (2011).
Using a Reconfigurable L1 Data Cache for Efficient Version Management in Hardware Transactional Memory. Parallel Architectures and Compilation Techniques (PACT) 360–370 (2011).
Vector Extensions for Decision Support DBMS Acceleration. The 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO45) 166-176 (2012). doi:10.1109/MICRO.2012.24
An Analyzable Memory Controller for Hard Real-Time CMPs. IEEE Embedded Systems Letters 1, (2009).
Assessing Accelerator-based HPC Reverse Time Migration. Transactions on Parallel and Distributed Systems, Special Issue on Accelerators 22(1), 147-162 (2011).
Available task-level parallelism on the Cell BE. Scientific Programming 17, 59-76 (2009).
Better branch prediction through prophet/critic hybrids. IEEE Micro 25, 80-89 (2005).
A Case for Energy-Aware Accounting and Billing in Large-Scale Computing Facilities Cost Metrics and Design Implications. IEEE Micro (2011).
Characterizing Power and Temperature Behavior of POWER6-Based System. (invited paper). IEEE Journal of Emerging and Selected Topics in Circuits and Systems (2011).
CPU Accounting for Multicore Processors. IEEE Transactions on Computers 61, 251–264 (2012).
CPU accounting in CMP Processors. (2009).
DIA: A Complexity-Effective Decoding Architecture. IEEE Transactions on Computers 58, 448-462 (2009).
Dynamic Cache Partitioning Based on the MLP on Cache Misses. Transactions on HiPEAC 3, 1-21 (2008).
Dynamic Memory Instruction Bypassing. (2004).
Dynamic Memory Interval Test vs. Interprocedural Pointer Analiysis in Multimedia Applications. (2005).
Enlarging Instruction Streams. IEEE Transactions on Computers 56, 1342-1357 (2007).
Explaining Dynamic Cache Partitioning Speed Ups. IEEE Computer Architecture Letters 6, 1-12 (2007).
Fair CPU Time Accounting in CMP+SMT Processors. ACM Trans. Archit. Code Optim. 9, 50:1–50:25 (2013).
Feasibility of QoS for SMT by Resource Allocation. Lecture Notes in Computer Science (LNCS) 3149/2004, (2004).
FlexDCP: a QoS framework for CMP architectures. ACM Operating Systems Review, Special Issue on the Interaction among the OS, Compilers, and Multicore Processors 43, 86-96 (2010).
FlexDCP: a QoS framework for CMP architectures. ACM SIGOPS Operating System Review, Special Issue on the Interaction among the OS, Compilers, and Multicore Processors 43, 0163-5980 (2009).
Hardware Support for Accurate Per-task Energy Metering in Multicore Systems. ACM Trans. Archit. Code Optim. 10, 34:1–34:27 (2013).
A Highly Scalable Parallel Implementation of H.264. Transactions on High-Performance Embedded Architectures and Compilers 4, (2009).
Instruction Fetch Architectures and Code Layout Optimizations. Proceedings of the IEEE 89, 1588-1609 (2001).
Kilo-instruction Processors: Overcoming the Memory Wall. IEEE Micro 25, 48–57 (2005).
A latency conscious SMT branch predictor architecture. International Journal of High Performance Computing and Networking (IJHPCN) 2, 11-21 (2004).
A Low Complexity Fetch Architecture for High Performance Superscalar Processors. ACM Transactions on Architecture and Compiler Optimizations (TACO) 1, 220-245 (2004).
A Low-Complexity Fetch Architecture for High-Performance Superscalar Processors. ACM Transactions on Architecture and Code Optimization 1, 220-245 (2004).
Multicore Resource Management. IEEE Micro 28, 6-16 (2008).
Optimizing Long-Latency-Load-Aware Fetch Policies for SMT Processors. International Journal of High Performance Computing and Networking (IJHPCN) 2, (2004).
Performance Evaluation of Macroblock-level Parallelization of H.264 Decoding on a cc-NUMA Multiprocessor Architecture. Avances en Sistemas e Informática 6, 219-228 (2009).
Predictable Performance in SMT processors: Synergy Between the OS and SMTs. IEEE Transactions on Computers 55, 785-799 (2006).
On the Problem of Evaluating the Performance of Multiprogrammed Workloads. . IEEE Transactions on Computers 59, (2010).
Programmability and portability for exascale: Top down programming methodology and tools with StarSs. Journal of Computational Science 4, 450–456 (2013).
QoS for High-Performance SMT Processors in Embedded Systems. IEEE Micro 24, 24-31 (2004).