Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Jagesh V. Sanghavi is active.

Publication


Featured research published by Jagesh V. Sanghavi.


IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 1993

ESPRESSO-SIGNATURE: a new exact minimizer for logic functions

Patrick C. McGeer; Jagesh V. Sanghavi; Robert K. Brayton; Alberto L. Sangiovanni-Vincentelli

An algorithm for exact two-level logic optimization that radically improves the Quine-McCluskey (QM) procedure is presented. The new algorithm derives the covering problem directly and implicitly, without generating the set of all prime implicants; it then generates only those prime implicants involved in the covering problem. A set of primes is represented by the cube of their intersection. Therefore, the unique set of sets of primes that forms the covering problem can be implicitly represented by a set of cubes that forms a minimum canonical cover. The minimum canonical cover is obtained starting from any initial cover, and the covering problem is then derived. The method is effective; it improves on the runtime and memory usage of ESPRESSO-EXACT by average factors of 1.78 and 1.19, respectively, on the 114 of 134 benchmark examples that could be completed by ESPRESSO-EXACT. Of the remaining 20 hard problems, 14 are solved exactly. For three of the remaining six, the covering problem is derived but cannot be solved exactly.
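The central trick of the abstract above, standing in for a whole set of primes with the single cube of their intersection, can be illustrated with a toy Python sketch (hypothetical helper name; cubes written in the usual {0, 1, -} positional notation):

```python
def intersect_cubes(cubes):
    """Intersection of a set of cubes, each a string over {'0','1','-'}.

    '-' leaves a variable unconstrained; the intersection keeps the more
    specific literal at each position, and a 0/1 conflict means the
    intersection is empty (None).
    """
    result = []
    for lits in zip(*cubes):
        fixed = {l for l in lits if l != '-'}
        if len(fixed) > 1:      # both 0 and 1 demanded -> empty cube
            return None
        result.append(fixed.pop() if fixed else '-')
    return ''.join(result)

# Two primes that both cover the minterm 011: their intersection cube
# stands in for the whole set of primes containing that minterm.
print(intersect_cubes(['0--', '-11']))  # -> '011'
```

One such cube per minterm is what lets the covering problem be represented implicitly, without enumerating all primes.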


Design Automation Conference | 1996

High performance BDD package by exploiting memory hierarchy

Jagesh V. Sanghavi; Rajeev K. Ranjan; Robert K. Brayton; Alberto L. Sangiovanni-Vincentelli

The success of binary decision diagram (BDD) based algorithms for verification depends on the availability of a high-performance package to manipulate very large BDDs. State-of-the-art BDD packages, based on the conventional depth-first technique, limit the size of the BDDs because of disorderly memory access patterns that result in unacceptably high elapsed time when the BDD size exceeds the main memory capacity. We present a high-performance BDD package that enables manipulation of very large BDDs by using an iterative breadth-first technique directed towards localizing the memory accesses to exploit the memory system hierarchy. The new memory-oriented performance features of this package are: 1) an architecture-independent customized memory management scheme, 2) the ability to issue multiple independent BDD operations (superscalarity), and 3) the ability to perform multiple BDD operations even when the operands of some BDD operations are the results of other operations yet to be completed (pipelining). A comprehensive set of BDD manipulation algorithms is implemented using the above techniques. Unlike the breadth-first algorithms presented in the literature, the new package is faster than the state-of-the-art BDD package by a factor of up to 15, even for BDD sizes that fit within the main memory. For BDD sizes that do not fit within the main memory, a performance improvement of up to a factor of 100 can be achieved.
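For contrast, here is a minimal sketch of the conventional depth-first apply that the abstract says such packages are built on (illustrative Python, not this package's API): recursion chases node pointers, so the memory accesses scatter, which is precisely what the breadth-first, levelized scheme avoids.

```python
from functools import lru_cache

# Toy BDD nodes: ('var_index', low_child, high_child) tuples,
# with the Python booleans True/False as terminal nodes.

def var(n):
    # Terminals sort below all variables.
    return n[0] if isinstance(n, tuple) else float('inf')

@lru_cache(maxsize=None)
def apply_and(f, g):
    """Depth-first AND of two BDDs with a memo table (computed table)."""
    if f is False or g is False:
        return False
    if f is True:
        return g
    if g is True:
        return f
    v = min(var(f), var(g))
    flo, fhi = (f[1], f[2]) if var(f) == v else (f, f)
    glo, ghi = (g[1], g[2]) if var(g) == v else (g, g)
    lo, hi = apply_and(flo, glo), apply_and(fhi, ghi)
    return lo if lo == hi else (v, lo, hi)   # reduction rule

x0 = (0, False, True)   # BDD for the variable x0
x1 = (1, False, True)   # BDD for the variable x1
print(apply_and(x0, x1))  # -> (0, False, (1, False, True))
```

The breadth-first alternative instead queues apply requests per variable level and sweeps levels in order, turning the scattered pointer chasing into streaming passes over level-sorted node arrays.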


Design Automation Conference | 1993

ESPRESSO-SIGNATURE: A New Exact Minimizer for Logic Functions

Patrick C. McGeer; Jagesh V. Sanghavi; Robert K. Brayton; Alberto Sangiovanni-Vincentelli

We present a new algorithm for exact two-level logic optimization which radically improves the Quine-McCluskey (QM) procedure. The new algorithm derives the covering problem directly and implicitly, without generating the set of all prime implicants; it then generates only those prime implicants involved in the covering problem. We represent a set of primes by the cube of their intersection. Therefore, the unique set of sets of primes which forms the covering problem can be implicitly represented by a set of cubes which forms a minimum canonical cover. We obtain the minimum canonical cover starting from any initial cover and then derive the covering problem. The method is effective; it improves on the runtime and memory usage of ESPRESSO-EXACT by average factors of 1.78 and 1.19, respectively, on the 114 of 134 benchmark examples that could be completed by ESPRESSO-EXACT. Of the remaining 20 hard problems, we solve 14 exactly. For 3 of the remaining 6, the covering problem is derived but could not be solved exactly.


International Conference on Computer Design | 1996

Binary decision diagrams on network of workstations

Rajeev K. Ranjan; Jagesh V. Sanghavi; Robert K. Brayton; Alberto L. Sangiovanni-Vincentelli

The success of all binary decision diagram (BDD) based synthesis and verification algorithms depends on the ability to efficiently manipulate very large BDDs. We present algorithms for the manipulation of very large BDDs on a network of workstations (NOW). A NOW provides a collection of main memories and disks that can be used effectively to create and manipulate very large BDDs. To make efficient use of the memory resources of a NOW while completing execution in a reasonable amount of wall-clock time, an extension of the breadth-first technique is used to manipulate BDDs. BDDs are partitioned such that the nodes for a set of consecutive variables are assigned to the same workstation. We present experimental results to demonstrate the capability of this approach and point towards its potential impact for manipulating very large BDDs.
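The partitioning rule described above, consecutive variable levels to the same workstation, can be sketched as follows (a hypothetical helper assuming a simple balanced split; the paper does not prescribe this exact division):

```python
def assign_levels(num_vars, num_workstations):
    """Map each BDD variable level to a workstation index.

    Nodes for a contiguous block of variables live on one workstation,
    so a levelized (breadth-first) operation streams between neighboring
    machines instead of scattering requests across the whole NOW.
    """
    per_ws, extra = divmod(num_vars, num_workstations)
    mapping, level = {}, 0
    for ws in range(num_workstations):
        count = per_ws + (1 if ws < extra else 0)  # balance the remainder
        for _ in range(count):
            mapping[level] = ws
            level += 1
    return mapping

print(assign_levels(8, 3))  # -> {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2}
```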


Archive | 1993

A New Exact Minimizer for Two-Level Logic Synthesis

Robert K. Brayton; Patrick C. McGeer; Jagesh V. Sanghavi; Alberto L. Sangiovanni-Vincentelli

We present a new algorithm for exact two-level logic optimization. It differs from the classical approach: rather than generating the set of all prime implicants of a function and then deriving a covering problem, we derive the covering problem directly and implicitly, and then generate only those primes involved in the covering problem. We represent a set of primes by the cube of their intersection. We then derive some properties of the sets of primes which form this set covering problem. We prove that the set of sets of primes which forms the covering problem for an incompletely specified logic function F is unique. Hence the corresponding set of cubes forms a minimum canonical cover for F. We give a successive reduction algorithm for finding the minimum canonical cover from any initial cover. Using the minimum canonical cover, we then generate only those primes involved in at least one minimal cover of F. We discuss two related heuristic minimization procedures: a relaxed form of the exact procedure, and an improved form of the ESPRESSO-II procedure. We give experimental results for the exact minimizer. The method is effective; solutions for 10 of the 20 hard examples in the ESPRESSO benchmark set are derived and proved minimum. In addition, for 5 of the remaining examples the minimum canonical cover is derived, but the covering problem remains to be solved exactly.


Conference on High Performance Computing (Supercomputing) | 1994

A parallel iterative linear solver for solving irregular grid semiconductor device matrices

Eric Tomacruz; Jagesh V. Sanghavi; Alberto L. Sangiovanni-Vincentelli

We present the use of parallel processors for the solution of the drift-diffusion semiconductor device equations using an irregular grid discretization. Preconditioning, partitioning, and communication scheduling algorithms are developed to implement an efficient and robust iterative linear solver with preconditioning. The parallel program is executed on a 64-node CM-5 and is compared with PILS (a solver for ill-conditioned systems) running on a single processor. We observe that parallel efficiency increases with problem size. We obtain 60% efficiency for CGS (a fast Lanczos-type solver for nonsymmetric linear systems) with no preconditioning for large problems. Using CGS with per-processor ILU preconditioning and magnitude threshold fill-in preconditioning for the CM-5, and CGS with ILU for PILS, we attain 50% efficiency for the solution of large matrices.
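As a rough illustration, the kind of CGS inner loop named above can be written in a few lines of NumPy. This is a textbook sketch of unpreconditioned CGS, not the paper's solver; the ILU and threshold-fill-in preconditioning are omitted.

```python
import numpy as np

def cgs(A, b, tol=1e-10, maxiter=200):
    """Unpreconditioned conjugate gradient squared (CGS) for A x = b,
    A possibly nonsymmetric (the usual textbook recurrence)."""
    x = np.zeros_like(b)
    r = b - A @ x
    rtilde = r.copy()               # fixed shadow residual
    rho_prev, q = 1.0, np.zeros_like(b)
    p = np.zeros_like(b)
    for i in range(maxiter):
        rho = rtilde @ r
        if i == 0:
            u = r.copy()
            p = u.copy()
        else:
            beta = rho / rho_prev
            u = r + beta * q
            p = u + beta * (q + beta * p)
        vhat = A @ p
        alpha = rho / (rtilde @ vhat)
        q = u - alpha * vhat
        uq = u + q
        x += alpha * uq             # squared contraction step
        r -= alpha * (A @ uq)
        rho_prev = rho
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
    return x

# Small nonsymmetric, diagonally dominant test system.
A = np.array([[4.0, 1.0, 0.0], [0.5, 5.0, 1.0], [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
x = cgs(A, b)
print(np.allclose(A @ x, b))  # -> True
```

The parallel version in the paper distributes the matrix-vector products and dot products across processors, which is where the partitioning and communication scheduling matter.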


[Proceedings] 1993 International Workshop on VLSI Process and Device Modeling (1993 VPAD) | 1993

Algorithms for Drift-Diffusion Device Simulation Using Massively Parallel Processors

E. Tomacruz; Jagesh V. Sanghavi; Alberto L. Sangiovanni-Vincentelli

Massively parallel processor (MPP) drift-diffusion device simulators have been presented in [1,2]. In both cases, most of the CPU time was spent in solving linear systems of equations (up to 95% reported in [1]). In this paper, we present methods for decreasing the total time for the iterative linear system solver by modifying the preconditioner for the iterative solver and by improving the initial guess for the Newton loop. In addition, we offer some general considerations regarding the parameters to use for the selection of an MPP architecture for device simulation. The steady-state drift-diffusion equations and discretization of [1] are used in this study. Our contributions for improving the preconditioning algorithm for a full Newton outer loop with a conjugate gradient squared inner loop are:

1. Extension of the partitioned natural ordering of [1] to the MIMD MPP CM-5 architecture. This ordering compared favorably with other well-known techniques, such as the red-black ordering and the natural ordering, when implemented on a SIMD CM-2. The CM-5 mesh structure is divided into rectangular blocks, each called a subdomain. Each subdomain is mapped to a processor, and the dimensions of the subdomain are powers of 2. The dimensions in each axis are equal or almost equal in order to form cubic subdomains. This minimizes the total surface area, which in turn minimizes the data length of communications between processors. A simple row ordering is used to map the subdomains to the CM-5 processors, since the fat-tree connections allow minimal penalty for communications between arbitrary processors [3].

2. Evaluation of a three-color ordering scheme and of nested dissection. These evaluations showed that while the three-color scheme improved the quality of the preconditioner, the CPU time required did not make it competitive overall with the simpler red-black ordering. The nested dissection ordering, where the basic blocks of decomposition are ordered according to a red-black scheme, was also shown not to be effective. It can be deduced that better results are obtained when the discarded norm of the fill-ins is minimized instead of their number.

3. Introduction of a new ordering scheme tailored for the CM-5: the block partitioned natural ordering. To allow each processor of the CM-5 to execute in parallel, each subdomain mapped into each processor should be disconnected from other subdomains while doing forward and backward substitution. Using the idea of not having the same cut points for the forward and backward substitution proposed in [1], a new preconditioner called the block partitioned natural ordering preconditioner is proposed. This preconditioner still cuts the links at the boundary of subdomains for forward substitution. However, the cut planes for the backward substitution are moved by an offset of one, as illustrated by Figure 1. Natural ordering backward substitution is done consecutively from set 1 to set 4. Set 2 is composed of two planes of nodes, set 3 is composed of three lines of nodes, and set 4 has one node. Doing simple subdomain partitioning for backward substitution would have disconnected the set 4 node from its three neighbor subdomains by processing it first; this partitioning gave poor results. The offset of one allows information to travel from one subdomain to another during the preconditioning of the linear matrix.

4. Inclusion of fill-ins within each CM-5 subdomain. The architecture and larger memory of the CM-5 make it possible to accommodate several levels of fill-ins within each subdomain while doing incomplete LU decomposition, forward substitution, and backward substitution. Allowing fill-ins only in the incomplete LU decomposition did not improve the linear solver. A significant reduction in the total number of inner-loop iterations is observed when fill-ins are also allowed in the forward and backward substitution process.

Results for a bipolar transistor described in [1] with Vbe = 0.8 and Vce = 1.0 are shown in Figures 3 and 4. PNO, NO, and BPNO signify partitioned natural ordering, natural ordering, and block partitioned natural ordering, respectively. The number attached to BPNO indicates the fill-in levels allowed. 64 processors with no vector units are used for CM-5 simulations, and 8k processors with floating-point accelerators are used for CM-2 simulations. Fill-ins decreased the total inner-loop iterations but not the CPU time. BPNO0 is two times faster than PNO-CM5 for the 32k mesh and produces the lowest CPU time for the CM-5. It is also more robust, since PNO does not converge for the 64k mesh. PNO still gives the best performance for the CM-2.

The Newton algorithm is known to perform best when a good initial estimate of the solution is given. The best initial guess is usually obtained by a projection from two previous solutions whose bias conditions differ only at one contact to a new applied bias at that contact. The initial guess for the first two solutions may be obtained by the initialization described in [1]. This can be accelerated by a multigrid initial guess which does not require any specific knowledge of the device or the region of operation in which it is simulated. It should be noted that voltage sweeps do not necessarily need to start with zero bias. Measuring threshold voltage or device breakdown, for example, only involves the simulation of a certain segment of the I-V curves. Hence, the first two bias points may significantly affect the total CPU time of voltage-segment simulations. The scheme is based on two coarse-grid mesh structures intertwined as shown in Figure 2. These coarse grids are constructed from 4 sets of discretization nodes. Coarse grid 1 is defined as the union of sets 1 and 2, and coarse grid 2 is defined as the union of sets 3 and 4. Set 1 is defined to be the nodes with even coordinate values in all three grid axes, and set 3 is defined to be the nodes with odd coordinate values in all three grid axes. Set 2 is defined to be the nodes connecting the nodes of set 1, and set 4 is the nodes connecting the nodes of set 3. If the solution of the equations on one of the coarse grids previously defined is carried out while the other coarse-grid mesh structure is used as a boundary condition, the nodes in sets 2 and 4 will have only two active neighbors, thus making possible the static elimination of the variables associated with those nodes. This ultimately allows the use of a smaller grid mesh structure composed solely of set 1 or 3. In addition, the elimination of set 2 or 4 decreases the number of variables, which decreases the search space of the linear solver and hence reduces the number of linear solver iterations.
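The subdomain-to-processor mapping described above (near-cubic blocks with power-of-2 sides, laid out in simple row order across processors) can be sketched as follows (illustrative names and block sizes, not the paper's code):

```python
def subdomain_of(node, block=(4, 4, 4), grid_blocks=(2, 2, 2)):
    """Map a mesh node (x, y, z) to a processor index.

    The mesh is cut into blocks of size `block` (sides chosen as powers
    of 2 to keep subdomains near-cubic, minimizing surface area and thus
    communication volume), and blocks are numbered in row-major order;
    on a fat tree, the block-to-processor placement is nearly free.
    """
    bx, by, bz = (c // s for c, s in zip(node, block))
    nx, ny, _ = grid_blocks
    return bx + by * nx + bz * nx * ny   # row-major subdomain index

print(subdomain_of((5, 0, 0)))  # -> 1: second block along the x axis
```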


Symposium on Frontiers of Massively Parallel Computation | 1995

A parallel graph partitioner on a distributed memory multiprocessor

Premal Buch; Jagesh V. Sanghavi; Alberto L. Sangiovanni-Vincentelli

In order to realize the full potential of speed-up from parallelization, it is essential to partition a problem into small tasks with minimal interactions, without making this process itself a bottleneck. We present a method for graph partitioning that is suitable for parallel implementation and scales well with the number of processors and the problem size. Our algorithm uses hierarchical partitioning. It exploits the parallel resources to minimize the dependence on the starting point with multiple starts at the higher levels of the hierarchy; these decrease at the lower levels as the algorithm zeroes in on the final partitioning. This is followed by a last-gasp phase that randomly collapses partitions and repartitions them to further improve the quality of the final solution. Each individual 2-way partitioning step can be performed by any standard partitioning algorithm. Results are presented on a set of benchmarks representing connectivity graphs of device and circuit simulation problems.
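One level of the multi-start idea can be illustrated with a toy 2-way partitioner (hypothetical names; a real implementation would refine each random start with a standard algorithm such as Kernighan-Lin rather than just score it):

```python
import random

def cut_size(graph, side):
    """Count edges crossing the partition; side maps node -> 0 or 1."""
    return sum(1 for u, vs in graph.items() for v in vs
               if u < v and side[u] != side[v])

def best_of_starts(graph, starts=8, seed=0):
    """Try several random balanced 2-way splits and keep the cheapest cut,
    mirroring the many-starts-at-the-top strategy described above."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    best = None
    for _ in range(starts):
        rng.shuffle(nodes)
        half = len(nodes) // 2
        side = {u: (0 if i < half else 1) for i, u in enumerate(nodes)}
        cost = cut_size(graph, side)
        if best is None or cost < best[0]:
            best = (cost, dict(side))
    return best

# Two triangles joined by a single bridge edge: the ideal cut size is 1.
g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
cost, side = best_of_starts(g)
print(cost)
```

Running the starts in parallel is what makes this cheap on a distributed-memory machine: each processor evaluates its own starts and only the best cut is exchanged.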


International Conference on VLSI Design | 1993

Minimization of Logic Functions Using Essential Signature Sets

Patrick C. McGeer; Jagesh V. Sanghavi; Robert K. Brayton; Alberto Sangiovanni-Vincentelli

We present a new algorithm for exact two-level logic optimization. It differs from the classical approach: rather than generating the set of all prime implicants of a function and then deriving a covering problem, we derive the covering problem directly and implicitly, and then generate only those primes involved in the covering problem. We represent a set of primes by the cube of their intersection. The set of sets of primes which forms the covering problem is unique. Hence the corresponding set of cubes forms a canonical cover. We give a successive reduction algorithm for finding the canonical cover from any initial cover; we then generate only those primes involved in at least one minimal cover. The method is effective; solutions for 10 of the 20 hard examples in the ESPRESSO benchmark set are derived and proved minimum. For 5 of the remaining examples the canonical cover is derived, but the covering problem remains to be solved exactly.


Proceedings of International Workshop on Numerical Modeling of Processes and Devices for Integrated Circuits: NUPAD V | 1994

Massively parallel device simulation using irregular grids

Jagesh V. Sanghavi; Eric Tomacruz; Alberto L. Sangiovanni-Vincentelli

Partitioning, communication scheduling, and preconditioning algorithms are developed to implement a parallel linear solver for an irregular-grid drift-diffusion device simulator. The parallel program is executed on a 64-node CM-5 and is compared with PILS running on a single processor. We obtain an average CPU-time speed-up of 46.1X for each CGS iteration with no preconditioning, and a speed-up of 33.6X for the solution of the matrix.

Collaboration


Dive into Jagesh V. Sanghavi's collaborations.

Top Co-Authors

Patrick C. McGeer (Lawrence Berkeley National Laboratory)
Eric Tomacruz (University of California)
E. Tomacruz (University of California)
Premal Buch (University of California)