Publications
“RICH: Implementing Reductions in the Cache Hierarchy”, Proceedings of the 34th ACM International Conference on Supercomputing (ICS '20). pp. 1–13, 2020.
“Data Prefetching on In-order Processors”, 2018 International Conference on High Performance Computing & Simulation (HPCS). pp. 322–329, 2018.
“Evaluating Scientific Workflow Execution on an Asymmetric Multicore Processor”, Euro-Par 2017: Parallel Processing Workshops, Lecture Notes in Computer Science, vol. 10659. pp. 439–451, 2018.
“Reducing Data Movement on Large Shared Memory Systems by Exploiting Computation Dependencies”, Proceedings of the 2018 International Conference on Supercomputing (ICS '18). pp. 207–217, 2018.
“Runtime-assisted Cache Coherence Deactivation in Task Parallel Programs”, Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC). Piscataway, NJ, USA, pp. 35:1–35:12, 2018.
“Runtime-Guided Management of Stacked DRAM Memories in Task Parallel Programs”, Proceedings of the 2018 International Conference on Supercomputing (ICS '18). pp. 218–228, 2018.
“ATM: Approximate Task Memoization in the Runtime System”, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Orlando, FL, USA, pp. 1140–1150, 2017.
“General Purpose Task-Dependence Management Hardware for Task-Based Dataflow Programming Models”, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). pp. 244–253, 2017.
“libPRISM: An Intelligent Adaption of Prefetch and SMT Levels”, Proceedings of the 31st ACM International Conference on Supercomputing (ICS). 2017.
“Runtime-Assisted Shared Cache Insertion Policies Based on Re-reference Intervals”, Euro-Par 2017: Parallel Processing, Lecture Notes in Computer Science, vol. 10417. pp. 247–259, 2017.
“To Distribute or Not to Distribute: The Question of Load Balancing for Performance or Energy”, Euro-Par 2017: Parallel Processing, Lecture Notes in Computer Science, vol. 10417. pp. 710–722, 2017.
“An Integrated Vector-Scalar Design on an In-Order ARM Core”, ACM Transactions on Architecture and Code Optimization, vol. 14. pp. 1–26, 2017.
“Prediction of the impact of network switch utilization on application performance via active measurement”, Parallel Computing, vol. 67. pp. 38–56, 2017.
“Task scheduling techniques for asymmetric multi-core systems”, IEEE Transactions on Parallel and Distributed Systems, vol. 28. IEEE, pp. 2074–2087, 2017.
“CATA: Criticality Aware Task Acceleration for Multicore Processors”, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, Chicago, IL, USA, pp. 413–422, 2016.
“Future Vector Microprocessor Extensions for Data Aggregations”, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). IEEE, Seoul, South Korea, pp. 418–430, 2016.
“MUSA: A Multi-level Simulation Approach for Next-generation HPC Machines”, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Salt Lake City, Utah, pp. 45:1–45:12, 2016.
“Performance analysis of a hardware accelerator of dependence management for task-based dataflow programming models”, 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). IEEE, Uppsala, Sweden, pp. 225–234, 2016.
“POSTER: Exploiting Asymmetric Multi-Core Processors with Flexible System Software”, Proceedings of the 2016 International Conference on Parallel Architectures and Compilation (PACT '16). Haifa, Israel, pp. 415–417, 2016.