Naveen Cherukuri
Intel
Publication
Featured research published by Naveen Cherukuri.
IEEE International Conference on High Performance Computing, Data, and Analytics | 2009
Shoumeng Yan; Xiaocheng Zhou; Ying Gao; Hu Chen; Sai Luo; Peinan Zhang; Naveen Cherukuri; Ronny Ronen; Bratin Saha
Small scale chip multiprocessors are being shipped in volume by all microprocessor vendors. Many of these vendors are also investigating large scale chip multiprocessors targeted towards highly parallel workloads in media, graphics, and others. One of the most challenging aspects of architecting terascale processors is the design of a scalable memory hierarchy. Current proposals for providing coherent shared memory in terascale systems require a sophisticated coherence protocol and memory hierarchy. In this paper we propose an alternate memory configuration along with a programming model that significantly simplifies the terascale memory hierarchy. Our proposal still provides fully coherent shared memory but eliminates the hardware coherence protocol. Our programming model enables the programmer to better express the memory characteristic of terascale workloads. Finally, our proposed memory hierarchy performs better and is more scalable than conventional designs.
IEEE International Conference on High Performance Computing, Data, and Analytics | 2017
Sudheer Chunduri; Kevin Harms; Scott Parker; Vitali A. Morozov; Samuel Oshin; Naveen Cherukuri; Kalyan Kumaran
The increasing complexity of HPC systems has introduced new sources of variability, which can contribute to significant differences in run-to-run performance of applications. With components at various levels of the system contributing variability, application developers and system users are now faced with the difficult task of running and tuning their applications in an environment where run-to-run performance measurements can vary by as much as a factor of two to three. In this study, we classify, quantify, and present ways to mitigate the sources of run-to-run variability on Cray XC systems with Intel Xeon Phi processors and a dragonfly interconnect. We further demonstrate that the code-tuning performance observed in a variability-mitigating environment correlates with the performance observed in production running conditions.
Archive | 2004
Naveen Cherukuri; Aaron T. Spink; Phanindra K. Mannava; Tim Frodsham; Jeffrey R. Wilcox; Sanjay Dabral; David S. Dunning; Theodore Z. Schoenborn
Archive | 2007
Naveen Cherukuri; Jeffrey R. Wilcox; Sanjay Dabral; Phanindra K. Mannava; Aaron T. Spink; David S. Dunning; Tim Frodsham; Theodore Z. Schoenborn
Archive | 2004
Tim Frodsham; Michael J. Tripp; David J. O'Brien; Muraleedhara Navada; Naveen Cherukuri; Sanjay Dabral; David S. Dunning; Theodore Z. Schoenborn
Archive | 2004
Naveen Cherukuri; Sanjay Dabral; David S. Dunning; Tim Frodsham; Theodore Z. Schoenborn; Rahul R. Shah; Maurice B. Steinman
Archive | 2004
Maurice B. Steinman; Rahul R. Shah; Naveen Cherukuri; Aaron T. Spink; Allen J. Baum; Sanjay Dabral; Tim Frodsham; David S. Dunning; Theodore Z. Schoenborn
Archive | 2009
Naveen Cherukuri; Dennis W. Brzezinski; Ioannis Schoinas; Anahita Shayesteh; Akhilesh Kumar; Mani Azimi
Archive | 2004
Naveen Cherukuri; Sanjay Dabral; David S. Dunning; Tim Frodsham; Theodore Z. Schoenborn
Archive | 2004
Naveen Cherukuri; Sanjay Dabral; David S. Dunning; Tim Frodsham; Theodore Z. Schoenborn; Santanu Chaudhuri