Sadagopan Srinivasan
Intel
Publication
Featured research published by Sadagopan Srinivasan.
Operating Systems Review | 2011
Sadagopan Srinivasan; Li Zhao; Ramesh Illikkal; Ravishankar R. Iyer
Almost all hardware platforms to date have been homogeneous with one or more identical processors managed by the operating system (OS). However, recently, it has been recognized that power constraints and the need for domain-specific high performance computing may lead architects towards building heterogeneous architectures and platforms in the near future. In this paper, we consider three types of heterogeneous core architectures: (a) Virtual asymmetric cores: multiple processors that have identical core micro-architectures and ISA but each running at a different frequency point or perhaps having a different cache size, (b) Physically asymmetric cores: heterogeneous cores, each with a fundamentally different microarchitecture (in-order vs. out-of-order for instance) running at similar or different frequencies, with identical ISA and (c) Hybrid cores: multiple cores, where some cores have tightly-coupled hardware accelerators or special functional units. We show case studies that highlight why existing OS and hardware interaction in such heterogeneous architectures is inefficient, causing losses in application performance and throughput efficiency and a lack of quality of service. We then discuss hardware and software support needed to address these challenges in heterogeneous platforms and establish efficient heterogeneous environments for platforms in the next decade. In particular, we will outline a monitoring and prediction framework for heterogeneity along with software support to take advantage of this information. Based on measurements on real platforms, we will show that these proposed techniques can provide significant advantage in terms of performance and power efficiency in heterogeneous platforms.
international conference on computer design | 2009
Seung Eun Lee; Zhen Fang; Yong Zhang; Sadagopan Srinivasan; Ravi R. Iyer; Donald Newell
Mobile Augmented Reality (MAR) is an emerging visual computing application for the mobile internet device (MID). In one MAR usage model, the user points the handheld device at an object (like a wine bottle or a building) and the MID automatically recognizes and displays information regarding the object. Achieving this in software on the handheld requires significant compute processing for object recognition and matching. In this paper, we identify hotspot functions of the MAR workload on a low-power x86 platform that motivate acceleration. We present the detailed design of two hardware accelerators, one for object recognition (MAR-HA) and the other for match processing (MAR-MA). We also quantify the performance and area efficiency of the hardware accelerators. Our analysis shows that hardware acceleration has the potential to improve the individual hotspot functions by as much as 20x, and overall response time by 7x. As a result, user response time can be reduced significantly.
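The relationship between the hotspot speedup and the overall speedup quoted above can be explored with Amdahl's law. The sketch below is illustrative only: it assumes every accelerated function gains a uniform 20x (the abstract says "as much as 20x", so real per-function gains may be lower), and the function names are hypothetical. Under that assumption it estimates the share of runtime the hotspots would have to cover for a 7x overall gain; the abstract itself does not state this figure.

```python
def overall_speedup(hotspot_fraction: float, hotspot_speedup: float) -> float:
    """Amdahl's law: whole-task speedup when only a fraction is accelerated."""
    return 1.0 / ((1.0 - hotspot_fraction) + hotspot_fraction / hotspot_speedup)


def required_fraction(target_overall: float, hotspot_speedup: float) -> float:
    """Runtime share that must be accelerated to reach a target overall speedup."""
    return (1.0 - 1.0 / target_overall) / (1.0 - 1.0 / hotspot_speedup)


# Figures taken from the abstract: up to 20x on hotspots, 7x overall.
# Treating 20x as uniform across all hotspots is an assumption.
fraction = required_fraction(7.0, 20.0)
print(f"hotspot share of runtime under this assumption: {fraction:.3f}")
print(f"check: {overall_speedup(fraction, 20.0):.2f}x overall")
```

If the actual per-function speedups are below 20x, the hotspots would need to account for an even larger share of the original runtime to reach the same overall gain.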
international symposium on microarchitecture | 2011
Ravi R. Iyer; Sadagopan Srinivasan; Omesh Tickoo; Zhen Fang; Rameshkumar G. Illikkal; Steven Zhang; Vineet Chadha; Paul M. Stillwell; Seung Eun Lee
As smart mobile devices become pervasive, vendors are offering rich features supported by cloud-based servers to enhance the user experience. Such servers implement large-scale computing environments, where target data is compared to a massive preloaded database. CogniServe is a highly efficient recognition server for large-scale recognition that employs a heterogeneous architecture to provide low-power, high-throughput cores, along with application-specific accelerators.
high-performance computer architecture | 2011
Xiaowei Jiang; Asit K. Mishra; Li Zhao; Ravishankar R. Iyer; Zhen Fang; Sadagopan Srinivasan; Srihari Makineni; Paul Brett; Chita R. Das
In current Chip-multiprocessors (CMPs), a significant portion of the die is consumed by the last-level cache. Until recently, the balance of cache and core space has been primarily guided by the needs of single applications. However, as multiple applications or virtual machines (VMs) are consolidated on such a platform, researchers have observed that not all VMs or applications require a significant amount of cache space. In order to take advantage of this phenomenon, we explore the use of asymmetric last-level caches in a CMP platform. While asymmetric cache CMPs provide the benefit of reduced power and area, it is important to build in hardware/software support to appropriately schedule applications onto cores with suitable cache capacity. In this paper, we address this problem with our ACCESS architecture comprising: (a) asymmetric caches across a group of cores, (b) hardware support that enables prediction of cache performance on the different sized caches and (c) OS scheduler support to make use of the prediction capability and appropriately schedule applications onto cores with suitable cache capacity. Measurements on a working prototype using SPEC2006 benchmarks show that our ACCESS architecture can effectively schedule jobs in an asymmetric cache CMP and provide 23% performance improvement compared to a naive scheduler, and is 97% close to an oracle scheduler in making schedules.
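One way to picture prediction-driven scheduling on asymmetric caches is the minimal sketch below. It is not the ACCESS scheduler; it is a simple greedy policy written under stated assumptions: each application comes with a predicted miss rate for a "small" and a "large" cache (the hardware prediction step is taken as given), and the applications that would lose the most on a small cache receive the large-cache cores first. The function name, the data layout, and the numbers are all hypothetical.

```python
from typing import Dict


def assign_cores(predictions: Dict[str, Dict[str, float]],
                 cores: Dict[int, str]) -> Dict[str, int]:
    """Greedy placement: apps that benefit most from a large cache get one first.

    predictions: app name -> {"small": predicted misses, "large": predicted misses}
    cores:       core id  -> "small" or "large" (its last-level cache class)
    """
    if len(predictions) > len(cores):
        raise ValueError("more applications than cores")
    large = [c for c, size in cores.items() if size == "large"]
    small = [c for c, size in cores.items() if size == "small"]
    # Benefit = misses avoided by running on the large cache instead of the small one.
    by_benefit = sorted(
        predictions,
        key=lambda app: predictions[app]["small"] - predictions[app]["large"],
        reverse=True,
    )
    placement = {}
    for app in by_benefit:
        pool = large if large else small  # fall back once large caches run out
        placement[app] = pool.pop(0)
    return placement


# Made-up predicted misses per kilo-instruction, for illustration only.
predictions = {
    "app_a": {"small": 30.0, "large": 12.0},
    "app_b": {"small": 0.5, "large": 0.4},
    "app_c": {"small": 20.0, "large": 19.0},
    "app_d": {"small": 25.0, "large": 10.0},
}
cores = {0: "large", 1: "large", 2: "small", 3: "small"}
placement = assign_cores(predictions, cores)
print(placement)
```

Here the cache-sensitive applications (`app_a`, `app_d`) land on the large-cache cores, while the applications whose miss rate barely changes are placed on the small-cache cores.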
ieee international symposium on workload characterization | 2009
Sadagopan Srinivasan; Zhen Fang; Ravi R. Iyer; Steven Zhang; Mike Espig; Don Newell; Daniel M. Cermak; Yi Wu; Igor Kozintsev; Horst W. Haussecker
The introduction of low power general purpose processors (like the Intel® Atom™ processor) expands the capability of handheld and mobile internet devices (MIDs) to include compelling visual computing applications. One rapidly emerging visual computing usage model is known as mobile augmented reality (MAR). In the MAR usage model, the user is able to point the handheld camera to an object (like a wine bottle) or a set of objects (like an outdoor scene of buildings or monuments) and the device automatically recognizes and displays information regarding the object(s). Achieving this on the handheld requires significant compute processing resulting in a response time in the order of several seconds. In this paper, we analyze a MAR workload and identify the primary hotspot functions that incur a large fraction of the overall response time. We also present a detailed architectural characterization of the hotspot functions in terms of CPI, MPI, etc. We then implement and analyze the benefits of several software optimizations: (a) vectorization, (b) multi-threading, (c) cache conflict avoidance and (d) miscellaneous code optimizations that reduce the number of computations. We show that a 3X performance improvement in execution time can be achieved by implementing these optimizations. Overall, we believe our analysis provides a detailed understanding of the processing for a new domain of visual computing workloads (i.e. MAR) running on low power handheld compute platforms.
international conference on parallel architectures and compilation techniques | 2012
Kshitij Sudan; Sadagopan Srinivasan; Rajeev Balasubramonian; Ravi R. Iyer
Co-location of applications is a proven technique to improve hardware utilization. Recent advances in virtualization have made co-location of independent applications on shared hardware a common scenario in datacenters. Co-location while maintaining Quality-of-Service (QoS) for each application is a complex problem that is fast gaining relevance for these datacenters. The problem is exacerbated by the need for effective resource utilization at datacenter scales. In this work, we show that the memory system is a primary bottleneck in many workloads and is a more effective focal point when enforcing QoS. We examine four different memory system levers to enforce QoS: two that have been previously proposed, and two novel levers. We compare the effectiveness of each lever in minimizing power and resource needs, while enforcing QoS guarantees. We also evaluate the effectiveness of combining various levers and show that this combined approach can yield power reductions
measurement and modeling of computer systems | 2011
Sadagopan Srinivasan; Ravishankar R. Iyer; Li Zhao; Ramesh Illikkal
Designing heterogeneous chip multiprocessors (CMPs) with a mix of big cores (complex superscalar out-of-order pipelines) and small cores (simple in-order pipeline) is emerging as an attractive option for future architectures. Such architectures have the potential to deliver both high performance and power efficiency but this requires operating systems (OS) or virtual machine monitors (VMMs) to efficiently schedule each software thread on the type of core that is best suited for it. In this paper, we highlight the need for architectural support for OS scheduling in a heterogeneous CMP. We propose HeteroScouts, a hardware mechanism to assist the OS to efficiently predict the performance of a task on different cores in the platform.
ieee international symposium on workload characterization | 2010
Sadagopan Srinivasan; Li Zhao; Lin Sun; Zhen Fang; Peng Li; Tao Wang; Ravishankar Iyer; Ramesh Illikkal; Dong Liu
Optical Character Recognition (OCR) converts images of handwritten or printed text captured by camera or scanner into editable text. OCR has seen limited adoption in mobile platforms due to the performance constraints of these systems. Intel® Atom™ processors have enabled general purpose applications to be executed on handheld devices. In this paper, we analyze a reference implementation of the OCR workload on a low power general purpose processor and identify the primary hotspot functions that incur a large fraction of the overall response time. We also present a detailed architectural characterization of the hotspot functions in terms of CPI, MPI, etc. We then implement and analyze several software/algorithmic optimizations such as i) Multi-threading, ii) image sampling for a hotspot function and iii) miscellaneous code optimization. Our results show that up to 2X performance improvement in execution time of the application and almost 9X improvement for a hotspot can be achieved by using various software optimizations. We designed and implemented a hardware accelerator for one of the hotspots to further reduce the execution time and power. Overall, we believe our analysis provides a detailed understanding of the processing overheads for OCR running on a new class of low power compute platforms.
international conference on signal and image processing applications | 2011
Pavel S. Smirnov; Piotr Semenov; Mikhail Lyakh; Anthony L. Chun; Dmitry Gusev; Alexander Redkin; Sadagopan Srinivasan
A number of well-known computer vision algorithms for image feature detection use luminosity only or some specific color model. Although these methods are effective in many cases, it can be shown that these transformations of the full image information reduce detection performance due to method-induced restrictions. In this paper, we describe a formal approach to the construction of a multi-channel interest point detector for an arbitrary number of channels (regardless of data nature), which maximizes the benefits from the usage of information from these additional channels. We introduce the Generalized Robust Multi-channel (GRoM) feature detector prototype that is based upon the proposed approach, detail features of GRoM and include a set of illustrative examples to highlight its differentiation from existing methods.
international symposium on computers and communications | 2011
Michael E. Kounavis; Joel Morrissette; Sadagopan Srinivasan; Raj Yavatkar
We address the problem of detecting non-transient anomalies in visual information. By non-transient anomalies we mean changes in the way environments look that are persistent across time. Such changes may include leaving unattended bags in airport corridors, putting graffiti on building walls or damaging public property. Detecting non-transient anomalies is critical to security and surveillance in indoor and outdoor environments. We argue that existing off-the-shelf solutions to computer vision problems (e.g., image recognition, gesture recognition, text recognition) are not the most efficient when applied to detecting non-transient anomalies due to their associated computational overhead. In this paper we present a neural network-based architecture that addresses some of the limitations of the state of the art. To speed up computations, our architecture supports the processing of a large number of neurons in parallel. To reduce computational overheads, our architecture omits some of the Gaussian kernel-based feature extraction tasks performed by other systems. To classify visual anomalies as non-transient, our architecture uses a codebook-based algorithm which builds a history profile for every image segment. We describe our architecture and present some performance analysis.
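The idea of a per-segment history profile can be illustrated with a heavily simplified sketch. It is not the paper's algorithm: it assumes each image segment is summarized by a single scalar feature (for example, mean intensity), keeps a list of background codewords learned during training, and reports a change as non-transient only after the same unfamiliar value has been seen for a set number of consecutive frames. The class names, the tolerance, and the persistence threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Codeword:
    value: float         # representative feature value for the segment
    run_length: int = 0  # consecutive frames in which this codeword matched


@dataclass
class SegmentProfile:
    tolerance: float = 10.0   # max distance for a value to match a codeword
    persist_frames: int = 5   # frames a change must last to count as non-transient
    background: List[float] = field(default_factory=list)
    candidate: Optional[Codeword] = None

    def train(self, value: float) -> None:
        """Add a background codeword unless a close one already exists."""
        if not any(abs(value - b) <= self.tolerance for b in self.background):
            self.background.append(value)

    def observe(self, value: float) -> bool:
        """Return True once `value` belongs to a persistent non-background change."""
        if any(abs(value - b) <= self.tolerance for b in self.background):
            self.candidate = None  # segment looks normal again; forget the change
            return False
        if self.candidate is None or abs(value - self.candidate.value) > self.tolerance:
            self.candidate = Codeword(value)  # a different change restarts the run
        self.candidate.run_length += 1
        return self.candidate.run_length >= self.persist_frames


profile = SegmentProfile(persist_frames=3)
for v in (100.0, 102.0, 130.0):  # training frames (made-up values)
    profile.train(v)

frames = [101.0, 180.0, 99.0, 60.0, 62.0, 61.0, 61.0]
flags = [profile.observe(v) for v in frames]
print(flags)
```

In this run the single-frame spike to 180 is treated as transient, while the sustained shift to values around 60 is flagged once it has lasted three frames.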