Srinivas Chennupaty | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Srinivas Chennupaty is active.

Explore More

Publication

Featured researches published by Srinivas Chennupaty.

international symposium on computer architecture | 2010

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU

Victor W. Lee; Changkyu Kim; Jatin Chhugani; Michael E. Deisher; Daehyun Kim; Anthony D. Nguyen; Nadathur Satish; Mikhail Smelyanskiy; Srinivas Chennupaty; Per Hammarlund; Ronak Singhal; Pradeep Dubey

Recent advances in computing have led to an explosion in the amount of data being generated. Processing the ever-growing data in a timely manner has made throughput computing an important aspect for emerging applications. Our analysis of a set of important throughput computing kernels shows that there is an ample amount of parallelism in these kernels which makes them suitable for todays multi-core CPUs and GPUs. In the past few years there have been many studies claiming GPUs deliver substantial speedups (between 10X and 1000X) over multi-core CPUs on these kernels. To understand where such large performance difference comes from, we perform a rigorous performance analysis and find that after applying optimizations appropriate for both CPUs and GPUs the performance gap between an Nvidia GTX280 processor and the Intel Core i7-960 processor narrows to only 2.5x on average. In this paper, we discuss optimization techniques for both CPU and GPU, analyze what architecture features contributed to performance differences between the two architectures, and recommend a set of architectural features which provide significant improvement in architectural efficiency for throughput kernels.

IEEE Micro | 2014

Haswell: The Fourth-Generation Intel Core Processor

Per Hammarlund; Alberto J. Martinez; Atiq Bajwa; David L. Hill; Erik G. Hallnor; Hong Jiang; Martin G. Dixon; Michael N. Derr; Mikal C. Hunsaker; Rajesh Kumar; Randy B. Osborne; Ravi Rajwar; Ronak Singhal; Reynold V. D'Sa; Robert S. Chappell; Shiv Kaushik; Srinivas Chennupaty; Stephan J. Jourdan; Steve H. Gunther; Thomas A. Piazza; Ted Burton

Haswell, Intels fourth-generation core processor architecture, delivers a range of client parts, a converged core for the client and server, and technologies used across many products. It uses an optimized version of Intel 22-nm process technology. Haswell provides enhancements in power-performance efficiency, power management, form factor and cost, core and uncore microarchitecture, and the cores instruction set.

international solid-state circuits conference | 2014

5.9 Haswell: A family of IA 22nm processors

Nasser A. Kurd; Muntaquim Chowdhury; Edward A. Burton; Thomas P. Thomas; Christopher P. Mozak; Brent R. Boswell; Manoj B. Lal; Anant Deval; Jonathan P. Douglas; Mahmoud Elassal; Ankireddy Nalamalpu; Timothy M. Wilson; Matthew C. Merten; Srinivas Chennupaty; Wilfred Gomes; Rajesh Kumar

The 4th Generation Intel® Core™ processor, codenamed Haswell, is a family of products implemented on Intel 22nm Tri-gate process technology [1]. The primary goals for the Haswell program are platform integration and low power to enable smaller form factors. Haswell incorporates several building blocks, including: platform controller hubs (PCHs), memory, CPU, graphics and media processing engines, thus creating a portfolio of product segments from fan-less Ultrabooks™ to high-performance desktop, as shown in Fig. 5.9.1. It also integrates a number of new technologies: a fully integrated voltage regulator (VR) consolidating 5 platform VRs down to 1, on-die eDRAM cache for improved graphics performance, lower-power states, optimized IO interfaces, an Intel AVX2 instruction set that supports floating-point multiply-add (FMA), and 256b SIMD integer achieving 2× the number of floating-point and integer operations over its predecessor. The 22nm process is optimized for Haswell and includes 11 metal layers (2 additional metal layers vs. Ivy Bridge [2]), high-density metal-insulator-metal (MIM) capacitors, and is tuned for different leakage/speed targets based on the market segment. For example, in some low-power products, the process is optimized to reduce leakage by 75% at Vmin, while paying only 12% intrinsic device degradation at the high-voltage corner.

Archive | 1998

Instruction set extension using prefixes

Srinivas Chennupaty; Lance E. Hacking; Thomas Huff; Patrice Roussel; Shreekant S. Thakkar

Archive | 1998

Dual function system and method for shuffling packed data elements

Patrice Roussel; Srinivas Chennupaty; Micheal D. Cranford; Mohammed A F Abdallah; James S. Coke; Katherine Kong

Archive | 2009

Gathering and Scattering Multiple Data Elements

Christopher J. Hughes; Yen-Kuang Chen; Mayank Bomb; Jason W. Brandt; Mark J. Buxton; Mark J. Charney; Srinivas Chennupaty; Jesus Corbal; Martin G. Dixon; Milind Girkar; Jonathan C. Hall; Hideki Ido; Peter Lachner; Gilbert Neiger; Chris J. Newburn; Rajesh S. Parthasarathy; Bret L. Toll; Robert Valentine; Jeffrey G. Wiedemeier

Archive | 2014