Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Vinay K. Chippa is active.

Publication


Featured research published by Vinay K. Chippa.


Design Automation Conference | 2013

Analysis and characterization of inherent application resilience for approximate computing

Vinay K. Chippa; Srimat T. Chakradhar; Kaushik Roy; Anand Raghunathan

Approximate computing is an emerging design paradigm that enables highly efficient hardware and software implementations by exploiting the inherent resilience of applications to inexactness in their computations. Previous work in this area has demonstrated the potential for significant energy and performance improvements, but largely consists of ad hoc techniques that have been applied to a small number of applications. Taking approximate computing closer to mainstream adoption requires (i) a deeper understanding of inherent application resilience across a broader range of applications, (ii) tools that can quantitatively establish the inherent resilience of an application, and (iii) methods to quickly assess the potential of various approximate computing techniques for a given application. We make two key contributions in this direction. Our primary contribution is the analysis and characterization of inherent application resilience present in a suite of 12 widely used applications from the domains of recognition, data mining, and search. Based on this analysis, we present several new insights into the nature of resilience and its relationship to various key application characteristics. To facilitate our analysis, we propose a systematic framework for Application Resilience Characterization (ARC) that (a) partitions an application into resilient and sensitive parts and (b) characterizes the resilient parts using approximation models that abstract a wide range of approximate computing techniques. We believe that the key insights that we present can help shape further research in the area of approximate computing, while automatic resilience characterization frameworks such as ARC can greatly aid designers in the adoption of approximate computing.
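The characterization idea can be illustrated with a small sketch. This is not the ARC tool itself; the perturbation model, function names, and quality metric below are illustrative assumptions standing in for the abstract approximation models the paper describes:

```python
import random

def characterize_resilience(kernel, inputs, error_rates, seed=0):
    """Sweep an abstract approximation model over a (presumed resilient)
    kernel: perturb each exact result by a bounded random relative error
    and report the mean output degradation at each error rate."""
    rng = random.Random(seed)
    exact = [kernel(x) for x in inputs]
    profile = {}
    for rate in error_rates:
        approx = [y * (1 + rng.uniform(-rate, rate)) for y in exact]
        profile[rate] = sum(abs(a - e) / max(abs(e), 1e-12)
                            for a, e in zip(approx, exact)) / len(exact)
    return profile

# A degradation curve that stays flat as the error rate rises would mark
# the kernel as resilient; a steep curve would mark it as sensitive.
profile = characterize_resilience(lambda x: x * x, [1.0, 2.0, 3.0], [0.0, 0.05, 0.1])
```

In ARC's terms, the sensitive parts of an application would be excluded from such a sweep and always run exactly.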


International Symposium on Microarchitecture | 2013

Quality programmable vector processors for approximate computing

Swagath Venkataramani; Vinay K. Chippa; Srimat T. Chakradhar; Kaushik Roy; Anand Raghunathan

Approximate computing leverages the intrinsic resilience of applications to inexactness in their computations, to achieve a desirable trade-off between efficiency (performance or energy) and acceptable quality of results. To broaden the applicability of approximate computing, we propose quality programmable processors, in which the notion of quality is explicitly codified in the HW/SW interface, i.e., the instruction set. The ISA of a quality programmable processor contains instructions associated with quality fields to specify the accuracy level that must be met during their execution. We show that this ability to control the accuracy of instruction execution greatly enhances the scope of approximate computing, allowing it to be applied to larger parts of programs. The micro-architecture of a quality programmable processor contains hardware mechanisms that translate the instruction-level quality specifications into energy savings. Additionally, it may expose the actual error incurred during the execution of each instruction (which may be less than the specified limit) back to software. As a first embodiment of quality programmable processors, we present the design of Quora, an energy efficient, quality programmable vector processor. Quora utilizes a 3-tiered hierarchy of processing elements that provide distinctly different energy vs. quality trade-offs, and uses hardware mechanisms based on precision scaling with error monitoring and compensation to facilitate quality programmable execution. We evaluate an implementation of Quora with 289 processing elements in 45nm technology. The results demonstrate that leveraging quality-programmability leads to 1.05×–1.7× savings in energy for virtually no loss (< 0.5%) in application output quality, and 1.18×–2.1× energy savings for modest impact (<2.5%) on output quality. Our work suggests that quality programmable processors are a significant step towards bringing approximate computing to the mainstream.
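The instruction-level idea, a quality knob plus the actual incurred error exposed back to software, can be sketched in software. This is a toy model under assumed names, built on simple precision scaling, not Quora's actual microarchitecture:

```python
def scale_precision(x, drop):
    """Precision scaling: truncate the low-order `drop` bits of an operand."""
    return (x >> drop) << drop

def approx_multiply(a, b, drop):
    """Multiply with truncated operands and return the result together with
    the actual error incurred, the way a quality programmable instruction
    might monitor and report it back to software."""
    approx = scale_precision(a, drop) * scale_precision(b, drop)
    return approx, a * b - approx

result, error = approx_multiply(1000, 1234, drop=4)
# `error` is the monitored deviation; software can compensate for it or
# tighten the quality field on subsequent instructions.
```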


Design Automation Conference | 2010

Scalable effort hardware design: exploiting algorithmic resilience for energy efficiency

Vinay K. Chippa; Debabrata Mohapatra; Anand Raghunathan; Kaushik Roy; Srimat T. Chakradhar

Algorithms from several interesting application domains exhibit the property of inherent resilience to “errors” from extrinsic or intrinsic sources, offering entirely new avenues for performance and power optimization by relaxing the conventional requirement of exact (numerical or Boolean) equivalence between the specification and hardware implementation. We propose scalable effort hardware design as an approach to tap the reservoir of algorithmic resilience and translate it into highly efficient hardware implementations. The basic tenet of the scalable effort design approach is to identify mechanisms at each level of design abstraction (circuit, architecture and algorithm) that can be used to vary the computational effort expended towards generation of the correct (exact) result, and expose them as control knobs in the implementation. These scaling mechanisms can be utilized to achieve improved energy efficiency while maintaining an acceptable (and often, near identical) level of quality of the overall result. A second major tenet of the scalable effort design approach is that fully exploiting the potential of algorithmic resilience requires synergistic cross-layer optimization of scaling mechanisms identified at different levels of design abstraction. We have implemented an energy-efficient SVM classification chip based on the proposed scalable effort design approach. We present results from post-layout simulations and demonstrate that scalable effort hardware can achieve large energy reductions (1.2X-2.2X with no impact on classification accuracy, and 2.2X-4.1X with modest reductions in accuracy) across various data sets. Our results also establish that cross-layer optimization leads to much improved energy vs. quality tradeoffs compared to each of the individual techniques.


Design, Automation, and Test in Europe | 2011

Design of voltage-scalable meta-functions for approximate computing

Debabrata Mohapatra; Vinay K. Chippa; Anand Raghunathan; Kaushik Roy

Approximate computing techniques that exploit the inherent resilience in algorithms through mechanisms such as voltage over-scaling (VOS) have gained significant interest. In this work, we focus on meta-functions that represent computational kernels commonly found in application domains that demonstrate significant inherent resilience, namely Multimedia, Recognition and Data Mining. We propose design techniques (dynamic segmentation with multi-cycle error compensation, and delay budgeting for chained data path components) which enable the hardware implementations of these meta-functions to scale more gracefully under voltage over-scaling. The net effect of these design techniques is improved accuracy (fewer and smaller errors) under a wide range of over-scaled voltages. Results based on extensive transistor-level simulations demonstrate that the optimized meta-function implementations consume up to 30% less energy at iso-error rates, while achieving up to 27% lower error rates at iso-energy when compared to their baseline counterparts. System-level simulations for three applications, motion estimation, support vector machine based classification and k-means based clustering, are also presented to demonstrate the impact of the improved meta-functions at the application level.


Design Automation Conference | 2011

Dynamic effort scaling: managing the quality-efficiency tradeoff

Vinay K. Chippa; Anand Raghunathan; Kaushik Roy; Srimat T. Chakradhar

Several recently proposed design techniques leverage the inherent error resilience of applications for improved efficiency (energy or performance). Hardware and software systems that are thus designed may be viewed as “scalable effort systems”, since they offer the capability to modulate the effort that they expend towards computation, thereby allowing for tradeoffs between output quality and efficiency. We propose the concept of Dynamic Effort Scaling (DES), which refers to dynamic management of the control knobs that are exposed by scalable effort systems. We argue the need for DES by observing that the degree of resilience often varies significantly across applications, across datasets, and even within a dataset. We propose a general conceptual framework for DES by formulating it as a feedback control problem, wherein the scaling mechanisms are regulated with the goal of maintaining output quality within a certain specified limit. We present an implementation of Dynamic Effort Scaling in the context of a scalable-effort processor for Support Vector Machines, and evaluate it under various application scenarios and data sets. Our results clearly demonstrate the benefits of the proposed approach: statically setting the scaling mechanisms leads either to significant error overshoot or to significant energy savings being left unexploited. In contrast, DES is able to effectively regulate the output quality while maximally exploiting the time-varying resiliency in the workload.
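A feedback-controlled knob of this kind can be sketched in a few lines. The loop below adjusts a precision-scaling knob per input to hold relative error near a target; the knob, error metric, and update rule are illustrative assumptions, far simpler than the paper's controller:

```python
def dynamic_effort_scaling(stream, quality_limit, word_bits=16):
    """Per-input feedback control: truncate `drop` low-order bits of each
    value, raise effort (truncate fewer bits) when the observed error
    exceeds the limit, and relax effort when there is ample slack."""
    drop, results = 0, []
    for x in stream:
        approx = (x >> drop) << drop
        rel_err = abs(x - approx) / max(abs(x), 1)
        results.append(approx)
        if rel_err > quality_limit:
            drop = max(0, drop - 1)            # error overshoot: spend more effort
        elif rel_err < quality_limit / 4 and drop < word_bits - 1:
            drop += 1                          # resilient data: save effort
    return results, drop
```

A static choice of `drop` would either overshoot the error limit on hard inputs or leave savings unexploited on easy ones; the feedback loop tracks the data instead, which is the argument the abstract makes.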


Asilomar Conference on Signals, Systems and Computers | 2013

Approximate computing: An integrated hardware approach

Vinay K. Chippa; Swagath Venkataramani; Srimat T. Chakradhar; Kaushik Roy; Anand Raghunathan

Computing today is largely not about calculating a precise numerical end result. Instead, computing platforms are increasingly used to execute applications (such as search, analytics, sensor data processing, recognition, mining, and synthesis) for which “correctness” is defined as producing results that are good enough, or of sufficient quality. These applications are often intrinsically resilient to a large fraction of their computations being executed in an imprecise or approximate manner. However, the design of computing platforms continues to be guided by the principle that every computation must be executed with the same strict notion of correctness. Approximate computing departs from this long-held dogma, and exploits intrinsic application resilience to improve the efficiency (energy or speed) of computing platforms. We describe an integrated approach to approximate computing in hardware that consists of three key components. First, we present an automatic resilience characterization framework that allows the designer to quantitatively evaluate the intrinsic resilience of an application, and to quickly assess the potential of various approximate computing techniques. We then describe scalable effort hardware, an approach to approximate computing wherein hardware is designed with various scaling mechanisms, or knobs, that modulate the effort expended towards correctly performing an application's computations. Scaling mechanisms are identified at the algorithm, architecture, and circuit levels, and embodied in the hardware to provide a rich trade-off between computational accuracy and energy. Finally, dynamic effort scaling is proposed as a feedback control approach to modulate the scaling mechanisms at runtime in response to varying application requirements and data characteristics. To demonstrate the proposed concepts, we have designed and fabricated an energy-efficient Recognition and Mining (RM) processor in the TSMC 65nm process technology. Our measurement results demonstrate that approximate computing leads to 2-20X energy savings with minimal impact on output quality across a range of applications.


International Symposium on Low Power Electronics and Design | 2014

StoRM: a stochastic recognition and mining processor

Vinay K. Chippa; Swagath Venkataramani; Kaushik Roy; Anand Raghunathan

Recognition and Mining applications are becoming prevalent across the entire spectrum of computing platforms, and place very high demands on their capabilities. We propose a Stochastic Recognition and Mining processor (StoRM), which uses Stochastic Computing (SC) to efficiently realize computational kernels from these domains. Stochastic computing facilitates compact, power-efficient realization of arithmetic operations by representing and processing information as pseudo-random bit-streams. However, the overhead of conversion between representations, and the exponential relationship between precision and bit-stream length, are key challenges that limit the efficiency of stochastic designs. The proposed architecture for StoRM consists of a 2D array of Stochastic Processing Elements (StoPEs) with a streaming memory hierarchy, enabling binary-to-stochastic conversion to be amortized across rows or columns of StoPEs. We propose vector processing and segmented stochastic processing in the StoPEs to mitigate the unfavorable tradeoff between precision and bit-stream length. We also exploit the compactness of StoPEs to increase parallelism, thereby improving performance and energy efficiency. Finally, leveraging the resilience of RM applications to approximations in their computations, we design StoRM to support modulation of the stochastic bit-stream length, and utilize this capability to optimize energy for a desired output quality. StoRM achieves 2-3X energy-delay improvements over a conventional design without sacrificing output quality, and up to 10X (20X) improvements when up to 5% (10%) loss in output quality is allowed. Our results also demonstrate that the proposed design techniques greatly enhance the applicability and benefits of stochastic computing.
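The core trick of stochastic computing, and the precision-versus-length tradeoff the abstract mentions, is easy to show in a sketch. In the unipolar encoding, a value in [0, 1] becomes the probability of a 1 in a bit-stream, and a single AND gate multiplies two independent streams; the function names here are illustrative, not StoRM's:

```python
import random

def to_stochastic(p, length, rng):
    """Encode p in [0, 1] as a bit-stream whose fraction of 1s approximates p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def decode(bits):
    """Recover the encoded value as the fraction of 1s in the stream."""
    return sum(bits) / len(bits)

rng = random.Random(0)
a_bits = to_stochastic(0.5, 4096, rng)
b_bits = to_stochastic(0.6, 4096, rng)
# Bitwise AND of two independent unipolar streams multiplies the encoded
# values, since P(a=1 and b=1) = P(a=1) * P(b=1).
product = decode([a & b for a, b in zip(a_bits, b_bits)])
# product approximates 0.5 * 0.6 = 0.3; each extra bit of precision roughly
# doubles the required stream length, which is the exponential relationship
# StoRM's segmented stochastic processing targets.
```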


IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 2014

Scalable Effort Hardware Design

Vinay K. Chippa; Debabrata Mohapatra; Kaushik Roy; Srimat T. Chakradhar; Anand Raghunathan

Applications from several domains exhibit the property of inherent application resilience, offering entirely new avenues for performance and power optimization by relaxing the conventional requirement of exact (numerical or Boolean) equivalence between the specification and hardware implementation. We propose scalable effort hardware as a design approach to tap the reservoir of application resilience and translate it into highly efficient hardware implementations. The first tenet of the scalable effort design approach is to identify mechanisms at each level of design abstraction (circuit, architecture, and algorithm) that can be used to vary the computational effort expended toward generation of the correct (exact) result, and to expose these mechanisms as control knobs in the implementation. These scaling mechanisms can be utilized to achieve improved energy efficiency while maintaining an acceptable (and often, near identical) level of quality of the overall result. The second tenet of the scalable effort design approach is that fully exploiting the potential of application resilience requires synergistic cross-layer optimization of scaling mechanisms identified at different levels of design abstraction. We have implemented an energy-efficient recognition and mining (RM) processor based on the proposed scalable effort design approach. Results from the execution of support vector machine training and classification, generalized learning vector quantization training, and k-means clustering on the scalable effort RM processor show that it can achieve energy reductions of 1.2×-5× with negligible impact on output quality, and 2.2×-50× with moderate loss in output quality, across various data sets. Our results also establish that cross-layer optimization across different scaling mechanisms leads to higher energy savings (1.4×-2× on average) for a given output quality compared with each of the individual techniques.


International Symposium on Nanoscale Architectures | 2011

Energy efficient many-core processor for recognition and mining using spin-based memory

Rangharajan Venkatesan; Vinay K. Chippa; Charles Augustine; Kaushik Roy; Anand Raghunathan

Emerging workloads such as Recognition, Mining and Synthesis present great opportunities for many-core parallel computing, but also place significant demands on the memory system. Spin-based devices have shown great promise in enabling high-density, energy-efficient memory. In this paper, we present the design and evaluation of a many-core domain-specific processor for Recognition and Data Mining (RM) using spin-based memory. The RM processor has a two-level on-chip memory hierarchy consisting of a streaming access first-level memory and a random access second-level memory. Based on the memory access characteristics, we suggest the use of Domain Wall Memory (DWM) and Spin Transfer Torque Magnetic RAM (STT MRAM) to realize the first and second levels, respectively. We develop architectural models of DWM and STT MRAM, and use them to evaluate the proposed design and explore various architectural tradeoffs in the RM processor. We evaluate the proposed design by comparing it to a CMOS based design at the same 45nm technology node. For three representative RM algorithms (Support Vector Machines, k-means clustering, and GLVQ classification), the iso-area spin memory based design achieves an energy-delay product improvement of 1.5X–3X. Our results suggest that spin based memory technologies can enable significant improvements in energy efficiency and performance for highly parallel, data-intensive workloads.


IEEE Transactions on Nanotechnology | 2014

Domain-Specific Many-core Computing using Spin-based Memory

Rangharajan Venkatesan; Vinay K. Chippa; Charles Augustine; Kaushik Roy; Anand Raghunathan

Spin-based devices have shown great potential in enabling high-density, energy-efficient memory and are therefore considered highly promising for the design of future computing platforms. While the impact of spin-based devices on general-purpose computing platforms has been studied, they are yet to be explored in the context of domain-specific computing, where the characteristics of the devices can be matched to application characteristics through architectural customization, so as to maximize the benefits. We present the design and evaluation of a many-core domain-specific processor for the emerging application domains of Recognition and Mining (RM) using spin-based memories. The domain-specific processor has a two-level on-chip memory hierarchy consisting of a streaming access first-level memory and a random access second-level memory. Based on the memory access characteristics, we suggest the use of Domain Wall Memory (DWM) and Spin Transfer Torque Magnetic RAM (STT-MRAM) to realize the first and second levels, respectively. We develop architectural models of DWM and STT-MRAM, and use them to evaluate the proposed design and explore various architectural tradeoffs in the domain-specific processor. We evaluate the proposed design by comparing it to a CMOS-based baseline at the same technology node. For three representative RM algorithms (support vector machine, k-means clustering, and generalized learning vector quantization), the spin-memory-based design achieves an energy-delay product improvement of 1.5×-4× over the CMOS baseline at iso-area. Our results suggest that spin-based memory technologies can enable significant improvements in energy efficiency and performance for highly parallel, data-intensive workloads. Our study also highlights the importance of synergistic architectural exploration along with the use of emerging devices rather than simply considering them as drop-in replacements.

Collaboration


Dive into Vinay K. Chippa's collaborations.

Top Co-Authors
