Ganesh Venkatesh

University of California, San Diego

Publication

Featured research published by Ganesh Venkatesh.

architectural support for programming languages and operating systems | 2010

Conservation cores: reducing the energy of mature computations

Ganesh Venkatesh; Jack Sampson; Nathan Goulding; Saturnino Garcia; Vladyslav Bryksin; Jose Lugo-Martinez; Steven Swanson; Michael Bedford Taylor

Growing transistor counts, limited power budgets, and the breakdown of voltage scaling are currently conspiring to create a utilization wall that limits the fraction of a chip that can run at full speed at one time. In this regime, specialized, energy-efficient processors can increase parallelism by reducing the per-computation power requirements and allowing more computations to execute under the same power budget. To pursue this goal, this paper introduces conservation cores. Conservation cores, or c-cores, are specialized processors that focus on reducing energy and energy-delay instead of increasing performance. This focus on energy makes c-cores an excellent match for many applications that would be poor candidates for hardware acceleration (e.g., irregular integer codes). We present a toolchain for automatically synthesizing c-cores from application source code and demonstrate that they can significantly reduce energy and energy-delay for a wide range of applications. The c-cores support patching, a form of targeted reconfigurability, that allows them to adapt to new versions of the software they target. Our results show that conservation cores can reduce energy consumption by up to 16.0x for functions and by up to 2.1x for whole applications, while patching can extend the useful lifetime of individual c-cores to match that of conventional processors.

international symposium on microarchitecture | 2011

The GreenDroid Mobile Application Processor: An Architecture for Silicon's Dark Future

Nathan Goulding-Hotta; Jack Sampson; Ganesh Venkatesh; Saturnino Garcia; Joe Auricchio; Po-Chao Huang; Manish Arora; Siddhartha Nath; Vikram Bhatt; Jonathan Babb; Steven Swanson; Michael Bedford Taylor

This article discusses the GreenDroid mobile application processor. Dark silicon has emerged as the fundamental limiter in modern processor design. The GreenDroid mobile application processor demonstrates an approach that uses dark silicon to execute general-purpose smartphone applications with less energy than today's most energy-efficient designs.

international symposium on microarchitecture | 2011

QsCores: trading dark silicon for scalable energy efficiency with quasi-specific cores

Ganesh Venkatesh; Jack Sampson; Nathan Goulding-Hotta; Sravanthi Kota Venkata; Michael Bedford Taylor; Steven Swanson

Transistor density continues to increase exponentially, but power dissipation per transistor is improving only slightly with each generation of Moore's law. Given constant chip-level power budgets, this exponentially decreases the percentage of transistors that can switch at full frequency with each technology generation. Hence, while the transistor budget continues to increase exponentially, the power budget has become the dominant limiting factor in processor design. In this regime, utilizing transistors to design specialized cores that optimize energy-per-computation becomes an effective approach to improve system performance.
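The scaling argument in this abstract can be illustrated with a short calculation. This is a minimal sketch, not the authors' model: the function name and both scaling parameters are hypothetical, and the values passed in below are illustrative rather than taken from the paper. It assumes a fixed chip power budget, a transistor count that multiplies by `density_growth` each generation, and per-transistor power that divides by `power_improvement` each generation.

```python
def active_fraction(generations: int,
                    density_growth: float,
                    power_improvement: float) -> float:
    """Fraction of transistors that can switch at full frequency.

    Assumes a constant chip-level power budget and that the whole chip
    could be active at generation 0. Both scaling factors are
    illustrative assumptions, not figures from the abstract.
    """
    # Each generation, total full-speed power demand grows by
    # density_growth / power_improvement, so the usable fraction
    # shrinks by the inverse of that ratio.
    per_generation = power_improvement / density_growth
    return min(1.0, per_generation ** generations)


if __name__ == "__main__":
    # Hypothetical example: density doubles each generation while
    # per-transistor power does not improve at all.
    for gen in range(4):
        print(gen, active_fraction(gen, density_growth=2.0, power_improvement=1.0))
```

Whenever `power_improvement` is smaller than `density_growth`, the usable fraction falls geometrically with each generation, which is the exponential decrease the abstract describes.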

ieee hot chips symposium | 2010

GreenDroid: A mobile application processor for a future of dark silicon

Nathan Goulding; Jack Sampson; Ganesh Venkatesh; Saturnino Garcia; Joe Auricchio; Jonathan Babb; Michael Bedford Taylor; Steven Swanson

This article consists of a collection of slides from the authors' conference presentation on GreenDroid, a mobile application processor, and also assesses the future of dark silicon. Some of the specific topics discussed include: the special features, system specifications, and system design for the GreenDroid; system architectures; applications for use; platforms supported; processing capabilities; memory capabilities; and targeted markets for application processors.

high-performance computer architecture | 2011

Efficient complex operators for irregular codes

Jack Sampson; Ganesh Venkatesh; Nathan Goulding-Hotta; Saturnino Garcia; Steven Swanson; Michael Bedford Taylor

Complex “fat operators” are important contributors to the efficiency of specialized hardware. This paper introduces two new techniques for constructing efficient fat operators featuring up to dozens of operations with arbitrary and irregular data and memory dependencies. These techniques focus on minimizing critical path length and load-use delay, which are key concerns for irregular computations. Selective Depipelining (SDP) is a pipelining technique that allows fat operators containing several, possibly dependent, memory operations. SDP allows memory requests to operate at a faster clock rate than the datapath, saving power in the datapath and improving memory performance. Cachelets are small, customized, distributed L0 caches embedded in the datapath to reduce load-use latency. We apply these techniques to Conservation Cores (c-cores) to produce coprocessors that accelerate irregular code regions while still providing superior energy efficiency. On average, these enhanced c-cores reduce EDP by 2× and area by 35% relative to c-cores. They are up to 2.5× faster than a general-purpose processor and reduce energy consumption by up to 8× for a variety of irregular applications including several SPECINT benchmarks.

high performance embedded architectures and compilers | 2005

Exploiting a computation reuse cache to reduce energy in network processors

Bengu Li; Ganesh Venkatesh; Brad Calder; Rajiv Gupta

High-end routers are targeted at providing worst-case throughput guarantees over latency. Caches, on the other hand, are meant to help latency, not throughput, in a traditional processor, and provide no additional throughput for a balanced network processor design. This is why most high-end routers do not use caches for their data-plane algorithms. In this paper we examine how to use a cache for a balanced high-bandwidth network processor. We focus on using a cache not as a latency-saving mechanism, but as an energy-saving device. We propose using a Computation Reuse Cache that caches the answer to a query for data-plane algorithms, where the tags are the inputs to the query and the block is the result of the query. This allows the data-plane algorithm to perform a complete query in one cache access if there is a hit. This creates slack by reducing the number of instructions executed. We then exploit this slack by fetch-gating the data-plane algorithm while matching the worst-case throughput guarantees of the rest of the network processor. We evaluate the computation reuse cache for the network data-plane algorithms IP-lookup, Packet Classification, and NAT protocol.
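The tag-and-block organization described in this abstract can be sketched in software. This is a rough illustration under stated assumptions, not the paper's hardware design: the class name, the direct-mapped organization, the toy route table, and the `longest_prefix_match` helper are all hypothetical, and fetch-gating is not modeled. The query input serves as the tag, the stored block is the query result, and a hit answers the query in a single cache access.

```python
from typing import Callable, List, Optional, Tuple


class ComputationReuseCache:
    """Direct-mapped cache where tag = query input and block = query result."""

    def __init__(self, num_sets: int, compute: Callable[[int], str]) -> None:
        self.num_sets = num_sets
        self.compute = compute  # the full data-plane algorithm
        self.lines: List[Optional[Tuple[int, str]]] = [None] * num_sets
        self.hits = 0
        self.misses = 0

    def query(self, key: int) -> str:
        index = key % self.num_sets
        line = self.lines[index]
        if line is not None and line[0] == key:
            # Hit: the whole query is answered by one cache access.
            self.hits += 1
            return line[1]
        # Miss: run the full algorithm, then remember its answer.
        self.misses += 1
        result = self.compute(key)
        self.lines[index] = (key, result)
        return result


# Hypothetical route table: (prefix, prefix length, next hop).
ROUTES = [
    (0x0A000000, 8, "port1"),    # 10.0.0.0/8
    (0x0A010000, 16, "port2"),   # 10.1.0.0/16
    (0x00000000, 0, "default"),  # 0.0.0.0/0
]


def longest_prefix_match(addr: int) -> str:
    """Toy IP lookup: linear scan for the longest matching prefix."""
    best_len, best_hop = -1, ""
    for prefix, length, hop in ROUTES:
        mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF if length else 0
        if (addr & mask) == prefix and length > best_len:
            best_len, best_hop = length, hop
    return best_hop


if __name__ == "__main__":
    cache = ComputationReuseCache(16, longest_prefix_match)
    for addr in (0x0A010203, 0x0A010203, 0xC0A80001):
        print(hex(addr), cache.query(addr))
    print("hits:", cache.hits, "misses:", cache.misses)
```

Each hit skips the instructions of the full lookup, which is the slack the abstract says is then exploited by fetch-gating.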

field-programmable custom computing machines | 2011

Reducing the Energy Cost of Irregular Code Bases in Soft Processor Systems

Manish Arora; Jack Sampson; Nathan Goulding-Hotta; Jonathan Babb; Ganesh Venkatesh; Michael Bedford Taylor; Steven Swanson

This paper describes an architecture and FPGA synthesis tool chain for building specialized, energy-saving coprocessors called Irregular Code Energy Reducers (ICERs) for a wide range of unmodified C programs. FPGAs are increasingly used to build large-scale systems, and many large software systems contain relatively little code that is amenable to automatic, semi-automatic, or even manual parallelization. Whereas accelerator approaches have traditionally achieved energy benefits as a side effect from increasing performance via parallel execution, ICERs aim to achieve energy gains even on code with little exploitable parallelism. Traditional approaches to automatically generating accelerators from existing software rely on inferring parallel execution from serial code, so they face the same code analysis challenges as parallelizing compilers. In contrast, because the ICER approach targets energy rather than performance, it easily scales to large, irregular applications that are poor candidates for traditional acceleration. Our results show that, compared to a baseline system with soft processor cores, ICERs can reduce energy consumption by up to 9.5x for the code they target and 2.8x for whole applications.

field-programmable logic and applications | 2011

An Evaluation of Selective Depipelining for FPGA-Based Energy-Reducing Irregular Code Coprocessors

Jack Sampson; Manish Arora; Nathan Goulding-Hotta; Ganesh Venkatesh; Jonathan Babb; Vikram Bhatt; Steven Swanson; Michael Bedford Taylor

As the complexity of FPGA-based systems scales, the importance of efficiently handling irregular code increases. Recent work has proposed Irregular Code Energy Reducers (ICERs), a high-level synthesis approach for FPGAs that offers significant energy reduction for irregular code compared to a soft core processor. ICERs target the hot-spots of programs, and are seamlessly connected via a shared L1 cache with a soft processor that executes the cold code. This paper evaluates the application of the selective depipelining (SDP) technique to ICERs, which greatly reduces both the execution time and energy of irregular computations. SDP enables irregular computations to be expressed as large, fast, low-power combinational blocks. SDP maintains high memory bandwidth by scheduling the many potentially dependent memory operations within these blocks onto a high-frequency, highly-multiplexed coherent memory while scheduling combinational operations at a much lower frequency. SDP is a key enabler for improving the execution properties of irregular computations that are difficult to parallelize. We show that applying SDP to ICERs reduces energy-delay by 2.62× relative to ICERs. ICERs with SDP are up to 2.38× faster than a soft core processor and reduce energy consumption by up to 15.83× for a variety of irregular applications.

architectural support for programming languages and operating systems | 2006