Is this you? Create Your Porfile

Shankar Balachandran

Indian Institute of Technology Madras

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Shankar Balachandran is active.

Explore More

Publication

Featured researches published by Shankar Balachandran.

Bioinformatics | 2015

Fast-SL: an efficient algorithm to identify synthetic lethal sets in metabolic networks

Aditya Pratapa; Shankar Balachandran; Karthik Raman

MOTIVATION Synthetic lethal sets are sets of reactions/genes where only the simultaneous removal of all reactions/genes in the set abolishes growth of an organism. Previous approaches to identify synthetic lethal genes in genome-scale metabolic networks have built on the framework of flux balance analysis (FBA), extending it either to exhaustively analyze all possible combinations of genes or formulate the problem as a bi-level mixed integer linear programming (MILP) problem. We here propose an algorithm, Fast-SL, which surmounts the computational complexity of previous approaches by iteratively reducing the search space for synthetic lethals, resulting in a substantial reduction in running time, even for higher order synthetic lethals. RESULTS We performed synthetic reaction and gene lethality analysis, using Fast-SL, for genome-scale metabolic networks of Escherichia coli, Salmonella enterica Typhimurium and Mycobacterium tuberculosis. Fast-SL also rigorously identifies synthetic lethal gene deletions, uncovering synthetic lethal triplets that were not reported previously. We confirm that the triple lethal gene sets obtained for the three organisms have a precise match with the results obtained through exhaustive enumeration of lethals performed on a computer cluster. We also parallelized our algorithm, enabling the identification of synthetic lethal gene quadruplets for all three organisms in under 6 h. Overall, Fast-SL enables an efficient enumeration of higher order synthetic lethals in metabolic networks, which may help uncover previously unknown genetic interactions and combinatorial drug targets. AVAILABILITY AND IMPLEMENTATION The MATLAB implementation of the algorithm, compatible with COBRA toolbox v2.0, is available at https://github.com/RamanLab/FastSL CONTACT: [email protected] SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.

ieee computer society annual symposium on vlsi | 2011

A New Wirelength Model for Analytical Placement

B. N. Bhramar Ray; Shankar Balachandran

Minimization of Half-Perimeter Wire length (HPWL) is a commonly used objective for circuit placement. Analytical placers require approximations of it that are smooth, continuous and differentiable. This paper proposes a new mathematical model to approximate the HPWL cost function. We discuss the theory behind the model and show its convergence properties. We derive the error bounds of the new cost function and show several desirable properties of the new approximation model. We use the global and detailed placements produced by the NTUPlacer on ISPD 2004 benchmark suite to compare the smoothed approximation to two other approximation schemes namely the LogSumExp and CHKS based approximations. Our experiments validate our theoretical results and we show that our scheme has an average of 5\% error in the total wire length. We also discuss key implementation issues that can help in keeping the analytical placers based on this approximation numerically stable.

symposium on computer architecture and high performance computing | 2012

CSHARP: Coherence and SHaring Aware Cache Replacement Policies for Parallel Applications

Biswabandan Panda; Shankar Balachandran

Parallel applications are becoming mainstream and architectural techniques for multicores that target these applications are the need of the hour. Sharing of data by multiple threads and issues due to data coherence are unique to parallel applications. We propose CSHARP, a hardware framework that brings coherence and sharing awareness to any shared last level cache replacement policy. We use the degree of sharing of cache lines and the information present in coherence vectors to make replacement decisions. We apply CSHARP to a state-of-the-art cache replacement policy called TA-DRRIP to show its effectiveness. Our experiments on four core simulated system show that applying CSHARP on TA-DRRIP gives an extra 10% reduction in miss-rate at the LLC. Compared to LRU policy, CSHARP on TA-DRRIP shows a 18% miss-rate reduction and a 7% performance boost. We also show the scalability of our proposal by studying the hardware overhead and performance on a 8-core system.

design, automation, and test in europe | 2013

An efficient wirelength model for analytical placement

B. N. Bhramar Ray; Shankar Balachandran

Smooth approximations to half-perimeter wirelength are being investigated actively because of the recent increase in interest in analytical placement. It is necessary to not just provide smooth approximations but also to provide error analysis and convergence properties of these approximations. We present a new approximation scheme which uses a non-recursive approximation to the max function. We also show the convergence properties and the error bounds. The accuracy of our proposed scheme is better than those of the popular Logarithm-Sum-Exponential (LSE) wirelength model [7] and the recently proposed Weighted Average(WA) wirelength model[3]. We also experimentally validate the comparison by using global and detail placements produced by NTU Placer [1] on ISPD 2004 benchmark suite. The experimentations on benchmarks validate that the error bounds of our model are lower, with an average of 4% error in the total wirelength.

IEEE Computer Architecture Letters | 2016

Expert Prefetch Prediction: An Expert Predicting the Usefulness of Hardware Prefetchers

Biswabandan Panda; Shankar Balachandran

Hardware prefetching improves system performance by hiding and tolerating the latencies of lower levels of cache and off-chip DRAM. An accurate prefetcher improves system performance whereas an inaccurate prefetcher can cause cache pollution and consume additional bandwidth. Prefetch address filtering techniques improve prefetch accuracy by predicting the usefulness of a prefetch address and based on the outcome of the prediction, the prefetcher decides whether or not to issue a prefetch request. Existing techniques use only one signature to predict the usefulness of a prefetcher but no single predictor works well across all the applications. In this work, we propose weighted-majority filter, an expert way of predicting the usefulness of prefetch addresses. The proposed filter is adaptive in nature and uses the prediction of the best predictor(s) from a pool of predictors. Our filter is orthogonal to the underlying prefetching algorithm. We evaluate the effectiveness of our technique on 22 SPEC-2000/2006 applications. On an average, when employed with three state-of-the-art prefetchers such as AMPM, SMS, and GHB-PC/DC, our filter provides performance improvement of 8.1, 9.3, and 11 percent respectively.

international conference on parallel architectures and compilation techniques | 2013

TCPT: thread criticality-driven prefetcher throttling

Biswabandan Panda; Shankar Balachandran

A single parallel application running on a multicore system shows sub-linear speedup because of slow progress of one or more threads known as critical threads. Identifying critical threads and accelerating them can improve system performance. One of the metrics that correlate to thread criticality is the number of cache misses and the penalty associated with it. This paper proposes a throttling mechanism called TCPT which throttles hardware prefetchers by changing the prefetch degree based on the thread criticality.

international conference on vlsi design | 2012

Set-Cover Heuristics for Two-Level Logic Minimization

Ankit Kagliwal; Shankar Balachandran

Given a Boolean function, the Unate-Covering Problem (UCP) is NP-hard. This problem can be modeled as a set cover problem where minterms are the elements and implicants form the sets. Traditional solutions in logic synthesis use set cover algorithms that are oblivious to the special semantic of the elements and sets. We propose three new heuristics for the set-cover problem which are aware of the relationship between implicants and minterms. We show that the proposed heuristics are effective for breaking ties when a cyclic core is obtained. We evaluate the heuristics on a set of hard instances from BHOSLIB benchmark suite. We also replace ESPRESSOs set cover algorithm using these heuristics and compare the logic synthesis results. We further map the minimized Boolean equations using ABCs technology mapping tool using 2-input NAND gates and 4-input Lookup Tables (LUTs).

international conference on parallel architectures and compilation techniques | 2014

XStream: cross-core spatial streaming based MLC prefetchers for parallel applications in CMPs

Biswabandan Panda; Shankar Balachandran

Hardware prefetchers are commonly used to hide and tolerate off-chip memory latency. Prefetching techniques in the literature are designed for multiple independent sequential applications running on a multicore system. In contrast to multiple independent applications, a single parallel application running on a multicore system exhibits different behavior. In case of a parallel application, cores share and communicate data and code among themselves, and there is commonality in the demand miss streams across multiple cores. This gives an opportunity to predict the demand miss streams and communicate the predicted streams from one core to another, which we refer as cross-core stream communication. We propose cross-core spatial streaming (XStream), a practical and storage-efficient cross-core prefetching technique. XStream detects and predicts the cross-core spatial streams at the private mid level caches (MLCs) and sends the predicted streams in advance to MLC prefetchers of the predicted cores. We compare the effectiveness of XStream with the ideal cross-core spatial streamer. Experimental results demonstrate that, on an average (geomean), compared to the state-of-the-art spatial memory streaming, storage efficient XStream reduces the execution time by 11.3% (as high as 24%) and 9% (as high as 29.09%) for 4-core and 8-core systems respectively.

design, automation, and test in europe | 2014

Introducing Thread Criticality awareness in Prefetcher Aggressiveness Control

Biswabandan Panda; Shankar Balachandran

A single parallel application running on a multi-core system shows sub-linear speedup because of slow progress of one or more threads known as critical threads. Some of the reasons for the slow progress of threads are (1) load imbalance, (2) frequent cache misses and (3) effect of synchronization primitives. Identifying critical threads and minimizing their cache miss latencies can improve the overall execution time of a program. One way to hide and tolerate the cache misses is through hardware prefetching. Hardware prefetching is one of the most commonly used memory latency hiding techniques. Previous studies have shown the effectiveness of hardware prefetchers for multiprogrammed workloads (multiple sequential applications running independently on different cores). In contrast to multiprogrammed workloads, the performance of a single parallel application depends on the progress of slow progress(critical) threads. This paper introduces a prefetcher aggressiveness control mechanism called Thread Criticality-aware Prefetcher Aggressiveness Control (TCPAC). TCPAC controls the aggressiveness of the prefetchers at the L2 prefetching controllers (known as TCPAC-P), DRAM controller (known as TCPAC-D) and at the Last Level Cache (LLC) controller (known as TCPAC-C) using prefetch accuracy and thread progress. Each TCPAC sub-technique outperforms the respective state-of-the-art techniques such as HPAC [2], PADC [4], and PACMan [3] and the combination of all the TCPAC sub-techniques named as TCPAC-PDC outperforms the combination of HPAC, PADC, and PACMan. On an average, on a 8 core system, in terms of improvement in execution time, TCPAC-PDC outperforms the combination of HPAC, PADC, and PACMan by 7.61%. For 12 and 16 cores, TCPAC-PDC beats the state-of-the-art combinations by 7.21% and 8.32% respectively.

international conference on computer design | 2013

LPScan: An algorithm for supply scaling and switching activity minimization during test

Seetal Potluri; Satya Trinadh Adireddy; Chidhambaranathan Rajamanikkam; Shankar Balachandran

Existing low power testing techniques either focus on reducing the switching activity neglecting supply voltage, or perform supply voltage scaling without attempting to minimize switching activity. In this paper we propose LPScan (Low Power Scan), which integrates supply scaling and switching activity reduction in a single framework to reduce test power. For a shift frequency of 125MHz, the LPScan algorithm when applied to circuits from the ISCAS, OpenCores and ITC benchmark suite, produced power savings of 80% in the best case and 50% in the average case, compared to the best known algorithm [1].

Explore More