Koushik Chakraborty | Researchain

Publication

Featured researches published by Koushik Chakraborty.

architectural support for programming languages and operating systems | 2006

Computation spreading: employing hardware migration to specialize CMP cores on-the-fly

Koushik Chakraborty; Philip M. Wells; Gurindar S. Sohi

In canonical parallel processing, the operating system (OS) assigns a processing core to a single thread from a multithreaded server application. Since different threads from the same application often carry out similar computation, albeit at different times, we observe extensive code reuse among different processors, causing redundancy (e.g., in our server workloads, 45-65% of all instruction blocks are accessed by all processors). Moreover, largely independent fragments of computation compete for the same private resources, causing destructive interference. Together, this redundancy and interference lead to poor utilization of private microarchitecture resources such as caches and branch predictors. We present Computation Spreading (CSP), which employs hardware migration to distribute a thread's dissimilar fragments of computation across the multiple processing cores of a chip multiprocessor (CMP), while grouping similar computation fragments from different threads together. This paper focuses on a specific example of CSP for OS-intensive server applications: separating application-level (user) computation from the OS calls it makes. When performing CSP, each core becomes temporally specialized to execute certain computation fragments, and the same core is repeatedly used for such fragments. We examine two specific thread assignment policies for CSP, and show that these policies, across four server workloads, are able to reduce instruction misses in private L2 caches by 27-58%, private L2 load misses by 0-19%, and branch mispredictions by 9-25%.

architectural support for programming languages and operating systems | 2008

Adapting to intermittent faults in multicore systems

Philip M. Wells; Koushik Chakraborty; Gurindar S. Sohi

Future multicore processors will be more susceptible to a variety of hardware failures. In particular, intermittent faults, caused in part by manufacturing, thermal, and voltage variations, can cause bursts of frequent faults that last from several cycles to several seconds or more. Due to practical limitations of circuit techniques, cost-effective reliability will likely require the ability to temporarily suspend execution on a core during periods of intermittent faults. We investigate three of the most obvious techniques for adapting to the dynamically changing resource availability caused by intermittent faults, and demonstrate their different system-level implications. We show that system software reconfiguration has very high overhead, that temporarily pausing execution on a faulty core can lead to cascading livelock, and that using spare cores has high fault-free cost. To remedy these and other drawbacks of the three baseline techniques, we propose using a thin hardware/firmware layer to manage an overcommitted system -- one where the OS is configured to use more virtual processors than the number of currently available physical cores. We show that this proposed technique can gracefully degrade performance during intermittent faults of various duration with low overhead, without involving system software, and without requiring spare cores.

international conference on parallel architectures and compilation techniques | 2006

Hardware support for spin management in overcommitted virtual machines

Philip M. Wells; Koushik Chakraborty; Gurindar S. Sohi

Multiprocessor operating systems (OSs) pose several unique and conflicting challenges to System Virtual Machines (System VMs). For example, most existing system VMs resort to gang scheduling a guest OS's virtual processors (VCPUs) to avoid OS synchronization overhead. However, gang scheduling is infeasible for some application domains, and is inflexible in other domains. In an overcommitted environment, an individual guest OS has more VCPUs than available physical processors (PCPUs), precluding the use of gang scheduling. In such an environment, we demonstrate a more than two-fold increase in runtime when transparently virtualizing a chip-multiprocessor's cores. To combat this problem, we propose a hardware technique to detect several cases when a VCPU is not performing useful work, and suggest preempting that VCPU to run a different, more productive VCPU. Our technique can dramatically reduce cycles wasted on OS synchronization, without requiring any semantic information from the software. We then present a case study, typical of server consolidation, to demonstrate the potential of more flexible scheduling policies enabled by our technique. We propose one such policy that logically partitions the CMP cores between guest VMs. This policy increases throughput by 10–25% for consolidated server workloads due to improved cache locality and core utilization, and substantially improves performance isolation in private caches.

architectural support for programming languages and operating systems | 2009

Mixed-mode multicore reliability

Philip M. Wells; Koushik Chakraborty; Gurindar S. Sohi

Future processors are expected to observe increasing rates of hardware faults. Using Dual-Modular Redundancy (DMR), two cores of a multicore can be loosely coupled to redundantly execute a single software thread, providing very high coverage from many different sources of faults. This reliability, however, comes at a high price in terms of per-thread IPC and overall system throughput. We make the observation that a user may want to run both applications requiring high reliability, such as financial software, and more fault-tolerant applications requiring high performance, such as media or web software, on the same machine at the same time. Yet a traditional DMR system must fully operate in redundant mode whenever any application requires high reliability. This paper proposes a Mixed-Mode Multicore (MMM), which enables most applications, including the system software, to run with high reliability in DMR mode, while applications that need high performance can avoid the penalty of DMR. Though conceptually simple, two key challenges arise: 1) care must be taken to protect reliable applications from any faults occurring to applications running in high performance mode, and 2) the desire to execute additional independent software threads for a performance application complicates the scheduling of computation to cores. After solving these issues, an MMM is shown to improve overall system performance, compared to a traditional DMR system, by approximately 2X when one reliable and one performance application are concurrently executing.

design automation conference | 2014

Fort-NoCs: Mitigating the Threat of a Compromised NoC

Dean Michael Ancajas; Koushik Chakraborty; Sanghamitra Roy

In this paper, we uncover a novel and imminent threat to an emerging computing paradigm: MPSoCs built with 3rd party IP NoCs. We demonstrate that a compromised NoC (C-NoC) can enable a range of security attacks with an accomplice software component. To counteract these threats, we propose Fort-NoCs, a series of techniques that work together to provide protection from a C-NoC in an MPSoC. Fort-NoCs's foolproof protection disables covert backdoor activation, and reduces the chance of a successful side-channel attack by “clouding” the information obtained by an attacker. Compared to recently proposed techniques, Fort-NoCs offers substantially better protection with lower overheads.

design automation conference | 2012

Towards graceful aging degradation in NoCs through an adaptive routing algorithm

Kshitij Bhardwaj; Koushik Chakraborty; Sanghamitra Roy

Continuous technology scaling has made aging mechanisms such as Negative Bias Temperature Instability (NBTI) and electromigration primary concerns in Network-on-Chip (NoC) designs. In this paper, we model the effects of these aging mechanisms on NoC components such as routers and links using a novel reliability metric called Traffic Threshold per Epoch (TTpE). We observe a critical need for a robust aging-aware routing algorithm that not only reduces power-performance overheads caused by aging degradation but also minimizes the stress experienced by heavily utilized routers and links. To solve this problem, we propose an aging-aware adaptive routing algorithm and a router microarchitecture that routes packets along the paths that are both least congested and experience minimum aging stress. After an extensive experimental analysis using real workloads, we observe average overhead reductions of 13% in network latency and 12.7% in Energy-Delay-Product-Per-Flit (EDPPF), and a 10.4% improvement in performance using our aging-aware routing algorithm.

Operating Systems Review | 2009

Dynamic heterogeneity and the need for multicore virtualization

Philip M. Wells; Koushik Chakraborty; Gurindar S. Sohi

As the computing industry enters the multicore era, exponential growth in the number of transistors on a chip continues to present challenges and opportunities for computer architects and system designers. We examine one emerging issue in particular: that of dynamic heterogeneity, which can arise, even among physically homogeneous cores, from changing reliability, power, or thermal conditions, different cache and TLB contents, or changing resource configurations. This heterogeneity results in a constantly varying pool of hardware resources, which greatly complicates software's traditional task of assigning computation to cores. In part to address dynamic heterogeneity, we argue that hardware should take a more active role in the management of its computation resources. We propose hardware techniques to virtualize the cores of a multicore processor, allowing hardware to flexibly reassign the virtual processors that are exposed, even to a single operating system, to any subset of the physical cores. We show that multicore virtualization operates with minimal overhead, and that it enables several novel resource management applications for improving both performance and reliability.

international symposium on low power electronics and design | 2012

Designing for dark silicon: a methodological perspective on energy efficient systems

Jason M. Allred; Sanghamitra Roy; Koushik Chakraborty

The emergence of dark silicon - a fundamental design constraint absent in the past generations - brings intriguing challenges and opportunities in microprocessor design. To gracefully embrace dark silicon, design methodologies must adapt themselves to identify progressive systems that can effectively exploit the growing dark silicon. We demonstrate that relying on traditional design metrics may lead to sub-optimal design choices with the rise of the dark silicon area. We provide a new metric to guide a dark silicon aware system design and propose a stochastic optimization algorithm for dark silicon aware multicore system design. Our design approach shows 7-23% benefit in upcoming technology generations.

design automation conference | 2012

Predicting timing violations through instruction-level path sensitization analysis

Sanghamitra Roy; Koushik Chakraborty

In this paper, we present a novel technique for early prediction of timing violations in high-performance pipelined microprocessors. We show that a static instruction in a microprocessor, identified by its Program Counter (PC), is an excellent predictor of an upcoming timing violation. Our analysis combines architectural data collected from real program execution with gate level logic analysis. Exploiting this PC based timing violation predictability, we propose a robust system design that predicts and tolerates timing violations seamlessly in a pipelined microprocessor. Under two different faulty environments, we show 20.9-89.8% and 14.6-80.6% average performance improvements in real programs over other state-of-the-art techniques, respectively.

international symposium on quality electronic design | 2011

Analysis and mitigation of NBTI aging in register file: An end-to-end approach

Saurabh Kothawade; Koushik Chakraborty; Sanghamitra Roy

Analysis and tackling of NBTI wearout effects are important design objectives in microprocessor designs. Application-induced stress, combined with circuit-architectural design styles, creates widely diverging wearout characteristics in a processor datapath. Moreover, in a typical case in desktop computing, different applications can interleave. This interleaving can cause destructive interference in stress patterns, leading to a substantially worse aging effect than an isolated application. We investigate NBTI wearout degradation in a register file using a comprehensive circuit-architectural analysis of SRAM cells, and show that recently proposed periodic bit inversion is unable to cope with interleaving application-induced stress. We propose two novel micro-architecture techniques to mitigate this limitation. Our techniques reduce the Static Noise Margin (SNM) degradation by 2.2X, while improving the degradation uncertainty by 14X over current state-of-the-art techniques. Our overhead analysis shows that both area and power overheads of our proposed technique can be minimal in the context of the reliability improvement it provides.
