Publication


Featured research published by Shantanu Dutt.


Design Automation Conference | 1996

A probability-based approach to VLSI circuit partitioning

Shantanu Dutt; Wenyong Deng

Iterative-improvement 2-way min-cut partitioning is an important phase in most circuit partitioning tools. Most iterative improvement techniques for circuit netlists like the Fiduccia-Mattheyses (FM) method compute the gains of nodes using local netlist information that is only concerned with the immediate improvement in the cutset. This can lead to misleading gain calculations. Krishnamurthy suggested a lookahead (LA) gain calculation method to ameliorate this situation; however, as we show, it leaves considerable room for improvement. We present here a probabilistic gain computation approach called PROP that is capable of capturing the global and future implications of moving a node at the current time. Experimental results show that for the same number of runs, PROP performs much better than FM (by about 30%) and LA (by about 27%), and is also better than many recent state-of-the-art clustering-based partitioners like EIG1, WINDOW, MELO and PARABOLI by 15% to 57%. We also show that the space and time complexities of PROP are very reasonable. Our empirical timing results reveal that it is appreciably faster than the above clustering-based techniques, and only a little slower than FM and LA, both of which are very fast.
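
As a rough illustration of the local gain computation that the abstract attributes to FM-style methods, the sketch below computes the textbook FM gain of each node in a tiny hypergraph: nets that would leave the cutset minus nets that would enter it. The netlist, the node names and the function name `fm_gain` are hypothetical, and this is not the PROP method.

```python
def fm_gain(node, nets, side):
    """Textbook FM gain of moving `node` to the other side.

    nets: list of sets of node names (hyperedges), a made-up netlist.
    side: dict mapping node name -> 0 or 1.
    """
    gain = 0
    for net in nets:
        if node not in net:
            continue
        same = sum(1 for v in net if side[v] == side[node])
        other = len(net) - same
        if same == 1 and other > 0:
            gain += 1   # node is alone on its side: the net leaves the cutset
        elif other == 0 and len(net) > 1:
            gain -= 1   # net is uncut now: moving the node would cut it
    return gain


# Hypothetical 4-node example with two nets currently cut.
nets = [{"a", "b"}, {"a", "c"}, {"b", "c", "d"}]
side = {"a": 0, "b": 0, "c": 1, "d": 1}
gains = {v: fm_gain(v, nets, side) for v in side}
```

Here only node `c` has a positive gain. The computation looks solely at the immediate change in the cutset, which is the limitation the abstract points out.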


Journal of Parallel and Distributed Computing | 1991

Designing fault-tolerant systems using automorphisms

Shantanu Dutt; John P. Hayes

This paper presents a general theory for modeling and designing fault-tolerant multiprocessor systems in a systematic and efficient manner. We are concerned here with structural fault tolerance, defined as the ability to reconfigure around faults in order to preserve the interconnection structure of a multiprocessor. We represent multiprocessor systems by graphs whose node sets denote processors and whose edge sets denote dedicated interprocessor links. The fault-tolerant design and reconfiguration process of a multiprocessor is modeled by graph automorphisms. This automorphism-based methodology also models some important practical design features not previously addressed, including applicability to any multiprocessor structure and any number of faults. Low redundancy and efficient reconfigurability are also addressed. We apply our approach directly to a class of regular multiprocessor graphs termed circulant. For noncirculant graphs we give an algorithm to construct their circulant edge supergraphs efficiently. An application of the theory to the design of fault-tolerant hypercube multiprocessors is described. The resulting designs are shown to be far superior to those proposed in previous work.
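
To make the terms "circulant" and "automorphism" concrete, the following minimal sketch builds a small circulant graph and checks whether a node permutation preserves its edge set. The graph size, the jump set and the function names are assumptions chosen for illustration. The sketch does not reproduce the paper's design or reconfiguration procedure.

```python
def circulant_edges(n, jumps):
    """Edge set of the circulant graph on nodes 0..n-1 with the given jumps."""
    return {frozenset((i, (i + s) % n)) for i in range(n) for s in jumps}


def is_automorphism(perm, edges):
    """True if the node permutation `perm` (a list) maps the edge set onto itself."""
    mapped = {frozenset(perm[v] for v in e) for e in edges}
    return mapped == edges


n, jumps = 8, (1, 2)                         # hypothetical example graph
edges = circulant_edges(n, jumps)
rotation = [(i + 1) % n for i in range(n)]   # shift every node by one position
swap01 = [1, 0] + list(range(2, n))          # exchange nodes 0 and 1 only

rotation_ok = is_automorphism(rotation, edges)
swap_ok = is_automorphism(swap01, edges)
```

Rotating every node by one position preserves the structure of a circulant graph, whereas an arbitrary relabeling such as `swap01` generally does not.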


IEEE Transactions on Computers | 1992

Some practical issues in the design of fault-tolerant multiprocessors

Shantanu Dutt; John P. Hayes

Methods for modeling and implementing various practical aspects of fault-tolerant multiprocessor systems largely neglected in prior research are examined. The node-covering design approach is generalized to accommodate systems whose structure and failure mechanisms are represented by arbitrary graphs. Several new types of covering graphs are defined, which lead to various useful design tradeoffs. A new technique for incremental design is presented, using a class of switch implementations that reduce a system's interconnection costs. The reduction of other cost factors is also addressed, and methods are presented for VLSI layout area minimization, fast and distributed reconfiguration, efficient transfer of state information for software recovery, and the efficient use of local spares.


IEEE Transactions on Computers | 1990

On designing and reconfiguring k-fault-tolerant tree architectures

Shantanu Dutt; John P. Hayes

A general approach to designing tree-structured multiprocessors with optimal or near-optimal fault tolerance properties is developed. A multiprocessor architecture with a static interconnection network is represented by a graph whose nodes are processors and whose edges are interprocessor communication links. The design of k-fault-tolerant (FT) trees for arbitrary k is considered, with the primary goal of minimizing the number of spare nodes and edges. Also presented are strategies for reconfiguring a k-FT supergraph of a tree T around faults to obtain a fault-free tree isomorphic to T. A systematic methodology is presented for designing k-FT nonhomogeneous symmetric d-ary trees based on a concept termed node covering. The designs are shown to be optimal when k >


International Conference on VLSI Design | 1996

Node-covering based defect and fault tolerance methods for increased yield in FPGAs

Fran Hanchek; Shantanu Dutt

Fault tolerant techniques are proposed which make use of the reconfigurability of SRAM-based field programmable gate arrays (FPGAs). Based on the principle of node-covering, a routing discipline is developed that reserves unused wiring in the routing channels to allow each cell to cover (to be able to replace) its neighbor in a row. If testing identifies a faulty cell, switches are set to reconfigure the faulty cell out of the array. Not only can reconfiguration of the FPGA be performed by the user, but it can also be done at the factory in such a way as to be transparent to a user programming the array. This can result in substantial yield improvement.
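
The idea that each cell can cover its neighbor in a row can be sketched as a simple index remapping. The sketch assumes one spare cell at the right end of the row and at most one faulty cell. Those assumptions and the name `remap_row` are illustrative and are not taken from the paper.

```python
def remap_row(num_logical, faulty=None):
    """Map logical cells onto a row of num_logical + 1 physical cells.

    Assumes (for illustration only) one spare at the right end of the row
    and at most one faulty cell. Every logical cell at or to the right of
    the fault shifts to its right-hand physical neighbor.
    """
    if faulty is None:
        return list(range(num_logical))
    if not 0 <= faulty <= num_logical:
        raise ValueError("faulty index outside the row")
    return [j if j < faulty else j + 1 for j in range(num_logical)]


no_fault = remap_row(4)            # logical cell j sits on physical cell j
with_fault = remap_row(4, faulty=1)  # physical cell 1 is skipped
```

With physical cell 1 marked faulty, logical cells 1 to 3 move one position to the right and the spare at the end of the row absorbs the shift.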


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2008

Built-in-Self-Test of FPGAs With Provable Diagnosabilities and High Diagnostic Coverage With Application to Online Testing

Shantanu Dutt; Vinay Verma; Vishal Suthar

We present novel and efficient methods for built-in self-test (BIST) of field-programmable gate arrays (FPGAs) for detection and diagnosis of permanent faults in current, as well as emerging, technologies that are expected to have high fault densities. Our basic BIST methods can be used in both online and offline testing scenarios, although we focus on the former in this paper. We present 1- and 2-diagnosable BISTer designs that make up a ROving TEster (ROTE). Due to their provable diagnosabilities, these BISTers can avoid time-intensive adaptive diagnosis without significantly compromising diagnostic coverage, the percentage of faults correctly diagnosed. We also develop functional testing methods that test programmable logic blocks (PLBs) in only two circuit functions that will be mapped to them as the ROTE moves across a functioning FPGA. We extend our basic BISTer designs to those with test-pattern generators (TPGs) using multiple PLBs to more efficiently test the complex PLBs of current commercial FPGAs and to also prove the diagnosabilities of these designs. Simulation results show that our 1-diagnosable functional-test-based BISTer with a three-PLB TPG has very high diagnostic coverages. For example, for a random-fault distribution, our nonadaptive-diagnosis methods provide diagnostic coverages of 96% and 88% at fault densities of 10% and 25%, respectively, whereas the previous best nonadaptive-diagnosis method of the STAR-3 × 2 BISTer has diagnostic coverages of about 75% and 55% at these fault densities.


Hypercube Concurrent Computers and Applications | 1988

On allocating subcubes in a hypercube multiprocessor

Shantanu Dutt; John P. Hayes

In hypercube computers that support a multiuser environment, it is important for the operating system to be able to allocate subcubes of different dimensions. Previously proposed subcube allocation schemes, such as the buddy strategy, may fragment the hypercube excessively. We present a precise characterization of the subcube allocation problem and develop a general methodology to solve it. New subcube allocation and coalescing algorithms are described that have the goal of minimizing fragmentation. The concept of a maximal set of subcubes (MSS), which is useful in making allocations that result in a tightly packed hypercube, is introduced. The problems of allocating subcubes and of forming an MSS are formulated as decision problems, and shown to be NP-hard. Optimal algorithms for allocating subcubes and for forming an MSS are given. We suggest a heuristic procedure for efficiently coalescing a released cube with the existing free cubes. Finally, we present simulation results comparing several different allocation and coalescing strategies, which show that our methods provide a marked performance improvement over previous techniques.
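
The fragmentation that the abstract attributes to the buddy strategy can be seen in a small sketch. The class below follows the commonly described form of the buddy strategy, which only considers aligned blocks of 2^k consecutive node addresses. The class name and the example request sequence are assumptions, and the paper's own allocation algorithms are not shown.

```python
class BuddyAllocator:
    """Buddy-style subcube allocation over the 2**n nodes of an n-cube."""

    def __init__(self, n):
        self.n = n
        self.busy = [False] * (1 << n)

    def allocate(self, k):
        """Return the start address of a free aligned k-subcube, or None."""
        size = 1 << k
        for start in range(0, 1 << self.n, size):
            block = range(start, start + size)
            if not any(self.busy[v] for v in block):
                for v in block:
                    self.busy[v] = True
                return start
        return None

    def release(self, start, k):
        """Free the k-subcube that begins at address `start`."""
        for v in range(start, start + (1 << k)):
            self.busy[v] = False


cube = BuddyAllocator(3)                         # hypothetical 3-cube
starts = [cube.allocate(1) for _ in range(4)]    # four 1-cubes fill the cube
cube.release(2, 1)
cube.release(6, 1)
result = cube.allocate(2)                        # request a 2-cube
```

After the two releases, nodes 2, 3, 6 and 7 (binary x1x) are free and form a 2-dimensional subcube, yet the request fails because neither aligned block 0 to 3 nor 4 to 7 is entirely free.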


IEEE Transactions on Parallel and Distributed Systems | 1997

Scalable global and local hashing strategies for duplicate pruning in parallel A* graph search

Nihar R. Mahapatra; Shantanu Dutt

For many applications of the A* algorithm, the state space is a graph rather than a tree. The implication of this for parallel A* algorithms is that different processors may perform significant duplicated work if interprocessor duplicates are not pruned. In this paper, we consider the problem of duplicate pruning in parallel A* graph-search algorithms implemented on distributed-memory machines. A commonly used method for duplicate pruning uses a hash function to associate with each distinct node of the search space a particular processor to which duplicate nodes arising in different processors are transmitted and thereby pruned. This approach has two major drawbacks. First, load balance is determined solely by the hash function. Second, node transmissions for duplicate pruning are global; this can lead to hot spots and slower message delivery. To overcome these problems, we propose two different duplicate pruning strategies: 1) To achieve good load balance, we decouple the task of duplicate pruning from load balancing, by using a hash function for the former and a load balancing scheme for the latter. 2) A novel search-space partitioning scheme that allocates disjoint parts of the search space to disjoint subcubes in a hypercube (or disjoint processor groups in the target architecture), so that duplicate pruning is achieved with only intrasubcube or adjacent intersubcube communication. Thus message latency and hot-spot probability are greatly reduced. The above duplicate pruning schemes were implemented on an nCUBE2 hypercube multicomputer to solve the Traveling Salesman Problem (TSP). For uniformly distributed intercity costs, our strategies yield a speedup improvement of 13 to 35 percent on 1,024 processors over previous methods that do not prune any duplicates, and 13 to 25 percent over the previous hashing-only scheme. For normally distributed data the corresponding figures are 135 percent and 10 to 155 percent. Finally, we analyze the scalability of our parallel A* algorithms on k-ary n-cube networks in terms of the isoefficiency metric, and show that they have isoefficiency lower and upper bounds of Θ(P log P) and Θ(Pkn²), respectively.
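
The commonly used global-hashing scheme that the abstract describes as the baseline can be sketched as follows: every generated state is sent to the processor selected by a hash of the state, and that processor discards states it has already seen. The use of CRC32 as the hash, the tuple encoding of states and all function names are assumptions made for illustration. The two strategies proposed in the paper are not implemented here.

```python
import zlib


def owner(state, num_procs):
    """Processor that owns `state` under a global hash (CRC32 is an arbitrary choice)."""
    return zlib.crc32(repr(state).encode()) % num_procs


def prune_duplicates(generated, num_procs):
    """Simulate hash-based pruning.

    generated: list of (source_proc, state) pairs in arrival order.
    Returns (kept, pruned_count), where kept holds (owner_proc, state).
    """
    seen = [set() for _ in range(num_procs)]   # one duplicate table per processor
    kept, pruned = [], 0
    for _, state in generated:
        home = owner(state, num_procs)
        if state in seen[home]:
            pruned += 1                        # duplicate reached the same owner
        else:
            seen[home].add(state)
            kept.append((home, state))
    return kept, pruned


# Hypothetical partial tours generated independently on four processors.
generated = [(0, (1, 2, 3)), (1, (1, 3, 2)), (2, (1, 2, 3)),
             (3, (1, 3, 2)), (0, (2, 1, 3))]
kept, pruned = prune_duplicates(generated, num_procs=4)
```

Because the owner depends only on the hash, both the work distribution and the message destinations are fixed by the hash function, which corresponds to the two drawbacks listed in the abstract.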


Journal of Parallel and Distributed Computing | 1994

Scalable load balancing strategies for parallel A* algorithms

Shantanu Dutt; Nihar R. Mahapatra

In this paper, we develop load balancing strategies for scalable high-performance parallel A* algorithms suitable for distributed-memory machines. In parallel A* search, inefficiencies such as processor starvation and search of nonessential spaces (search spaces not explored by the sequential algorithm) grow with the number of processors P used, thus restricting its scalability. To alleviate this effect, we propose a novel parallel startup phase and an efficient dynamic load balancing strategy called the quality equalizing (QE) strategy. Our new parallel startup scheme executes optimally in Θ(log P) time and, in addition, achieves good initial load balance. The QE strategy possesses certain unique quantitative and qualitative load balancing properties that enable it to significantly reduce starvation and nonessential work. Consequently, we obtain a highly scalable parallel A* algorithm with an almost-linear speedup. The startup and load balancing schemes were employed in parallel A* algorithms to solve the Traveling Salesman Problem on an nCUBE2 hypercube multicomputer. The QE strategy yields average speedup improvements of about 20-185% and 15-120% at low and intermediate work densities (the ratio of the problem size to P), respectively, over three well-known load balancing methods: the round-robin (RR), the random communication (RC), and the neighborhood averaging (NA) strategies. The average speedup observed on 1024 processors is about 985, representing a very high efficiency of 0.96. Finally, we analyze and empirically evaluate the scalability of parallel A* algorithms in terms of the isoefficiency metric. Our analysis gives (1) a Θ(P log P) lower bound on the isoefficiency function of any parallel A* algorithm, and (2) a general expression for the upper bound on the isoefficiency function of our parallel A* algorithm using the QE strategy on any topology; for the hypercube and 2-D mesh architectures the upper bounds on the isoefficiency function are found to be Θ(P log² P) and Θ(P[formula]), respectively. Experimental results validate our analysis, and also show that parallel A* search has better scalability using the QE load balancing strategy than using the RR, RC, or NA strategies.
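
The efficiency figure quoted above follows from the usual definition of efficiency as speedup divided by processor count. The short sketch below reproduces that arithmetic and also evaluates how the Θ(P log P) lower bound grows when P doubles. The function names are hypothetical, constant factors hidden by the Θ notation are ignored, and the base-2 logarithm is an arbitrary choice.

```python
import math


def efficiency(speedup, num_procs):
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup / num_procs


def p_log_p(num_procs):
    """Growth term P * log2(P) of the isoefficiency lower bound."""
    return num_procs * math.log2(num_procs)


eff = efficiency(985, 1024)               # figures quoted in the abstract
growth = p_log_p(1024) / p_log_p(512)     # effect of doubling P from 512 to 1024
```

`eff` rounds to 0.96, matching the abstract. `growth` is slightly above 2, so under this bound the problem size has to more than double when the processor count doubles if efficiency is to stay constant.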


International Conference on Computer Aided Design | 1997

Partitioning around roadblocks: tackling constraints with intermediate relaxations

Shantanu Dutt; Halim Theny

Constraint satisfaction during partitioning and placement of VLSI circuits is an important problem, and effective techniques to address it lead to high-quality physical design solutions. This problem has, however, been cursorily treated in previous partitioning and placement research. Our work presented here addresses the balance-ratio constraint, and is a crucial first step to an effective solution to the general constraint-satisfaction problem. In current iterative-improvement mincut partitioners, the balance-ratio constraint is tackled by disallowing moves that violate it. These methods can lead to sub-optimal solutions since the process is biased against the movement of large cells and clusters of cells. We present techniques for an informed relaxation process that attempts to estimate whether relaxing the constraint temporarily will ultimately benefit the mincut objective. If so, then a violating move is allowed, otherwise it is disallowed. The violations are corrected in future moves so that the final solution satisfies the given constraint. On a set of ACM/SIGDA PROUD benchmark circuits with actual cell sizes, we obtained up to 38% and an average of 14.5% better cutsizes with as little as 13% time overhead using our techniques compared to the standard method of not allowing any relaxation.
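
The standard treatment that the abstract contrasts itself with, rejecting any move that violates the balance-ratio constraint, can be sketched as a single check. The function name, the integer size bounds and the example sizes are assumptions made for illustration. The informed relaxation technique of the paper is not shown.

```python
def move_allowed(cell_size, from_size, to_size, min_size, max_size):
    """Standard balance check: allow a move only if both parts stay in bounds.

    Sizes are in arbitrary integer area units. min_size and max_size are the
    smallest and largest part sizes permitted by the balance ratio.
    """
    new_from = from_size - cell_size
    new_to = to_size + cell_size
    return (min_size <= new_from <= max_size
            and min_size <= new_to <= max_size)


# Hypothetical 45-55 balance on a circuit of total size 100, split 50/50.
small_ok = move_allowed(3, 50, 50, 45, 55)   # parts become 47 and 53
large_ok = move_allowed(8, 50, 50, 45, 55)   # parts become 42 and 58
```

The small cell can move while the large one is rejected outright, which illustrates the bias against large cells and clusters that the abstract mentions.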

Collaboration


Top Co-Authors

Huan Ren

University of Illinois at Chicago

Hasan Arslan

University of Illinois at Chicago

Vishal Suthar

University of Illinois at Chicago

Halim Theny

University of Illinois at Chicago