Rajeev Murgai
Fujitsu
Publication
Featured research published by Rajeev Murgai.
design automation conference | 1990
Rajeev Murgai; Yoshihito Nishizaki; Narendra V. Shenoy; Robert K. Brayton; Alberto L. Sangiovanni-Vincentelli
The problem of combinational logic synthesis is addressed for two interesting and popular classes of programmable gate array architectures: table-look-up and multiplexor-based. The constraints imposed by some of these architectures require new algorithms for minimization of the number of basic blocks of the target architecture, taking into account the wiring resources.
international conference on computer aided design | 1991
Rajeev Murgai; Narendra V. Shenoy; Robert K. Brayton; Alberto L. Sangiovanni-Vincentelli
The authors address the problem of synthesis for a popular class of programmable gate array architectures: the table look-up architectures. These use look-up table memories to implement logic functions. The authors present improved techniques for minimizing the number of table look-up blocks used to implement a combinational circuit. On average, the results obtained on a set of benchmarks are 15-29% better than results obtained by previous approaches.
international conference on computer aided design | 1991
Rajeev Murgai; Narendra V. Shenoy; Robert K. Brayton; Alberto L. Sangiovanni-Vincentelli
The authors address the problem of delay optimization for programmable gate arrays. The main considerations are the number of levels in the circuit and the wiring delay. The authors propose a two-phase approach: the first phase involves delay optimizations during logic synthesis before placement, while the second uses logic resynthesis in the case of a timing-driven placement technique. Results and comparisons on benchmarks are presented.
international conference on computer aided design | 1991
Rajeev Murgai; Robert K. Brayton; Alberto L. Sangiovanni-Vincentelli
The authors address the problem of clustering a circuit for minimizing its delay, subject to capacity constraints on the clusters. They present an algorithm for combinational circuits and give sufficient conditions under which it is optimum. In addition, they address the problem of minimizing the number of clusters and nodes without increasing the maximum delay found by the algorithm. Finally, they extend the clustering algorithm to minimize the clock cycle of a sequential synchronous circuit.
design automation conference | 1992
Rajeev Murgai; Robert K. Brayton; Alberto L. Sangiovanni-Vincentelli
The authors address the problem of synthesis for a popular class of programmable gate array architectures, the multiplexer-based architectures. They present improved techniques for minimizing the number of basic blocks used to implement a combinational circuit. One source of improvement is the use of if-then-else DAGs (directed acyclic graphs) as subject graphs along with BDDs (binary decision diagrams). An important contribution is a very fast algorithm which always gives a match for a function onto the basic block of the architecture, when one exists. Results obtained on a number of benchmark examples are given.
international symposium on quality electronic design | 2006
Chao-Yang Yeh; Gustavo Wilke; Hongyu Chen; Subodh M. Reddy; Hoa-van Nguyen; Takashi Miyoshi; William W. Walker; Rajeev Murgai
This paper evaluates and compares different clock architectures, such as mesh, tree, and their hybrids, on several industrial designs. The goal of our study is to gain a quantitative understanding of engineering trade-offs between different architectures with respect to clock skew, latency, timing uncertainty, and power. This understanding will lead to guidelines for determining the best clock architecture for the design specification and constraints. To the best of our knowledge, no work has been published on evaluating and comparing these architectures on real industrial designs. Our study shows that mesh-based architectures are better than tree architectures for skew (< 1ps skew) and are more robust to variations (18% reduction in timing uncertainty as compared to tree). The power penalty associated with a mesh as compared to a tree was found to be between 10% and 40%. Use of multiple meshes can help reduce the power penalty.
european design and test conference | 1995
Rajeev Murgai; Robert K. Brayton; Alberto L. Sangiovanni-Vincentelli
In this age of portable electronic systems, the problem of logic synthesis for low power has acquired great importance. The most popular approach has been to target the widely-accepted two-phase paradigm of technology-independent optimization and technology mapping for power minimization. Before mapping, each function of a multi-level network is decomposed into two-input gates. How this decomposition is done can have a significant impact on the power dissipation of the final implementation. The problem of decomposition for low power was recently addressed by Pedram et al. (1993). However, they ignore the power consumption due to glitches, which can be a sizeable fraction of the total power. In this paper, we show how to obtain a transition-optimum binary tree decomposition (i.e., the one which has the minimum number of transitions in the worst case, including those due to glitches) for some specific functions (AND, OR, and EX-OR) under a zero gate delay model. For a non-zero gate delay model, we present conditions under which our algorithm yields an optimum solution for such functions. We propose a straightforward extension of this algorithm for arbitrary functions and Boolean networks. Experimental results on a set of standard combinational benchmarks indicate that on average, our algorithm generates networks (using two-input gates) that have 16% fewer transitions in the worst case than the networks generated by a simple-minded two-input technology-decomposition algorithm implemented in sis, a widely used logic synthesis system.
international conference on computer aided design | 1999
Rajeev Murgai
Fanout optimization is a fundamental problem in timing optimization. Most of the research has focused on the fanout optimization problem for a single net (the local fanout optimization problem, LFO). The real goal, however, is to optimize the delay through the entire circuit by fanout optimization. This is the global fanout optimization (GFO) problem. H. Touati (1990) claims that visiting nets of the network in a reverse topological order (from primary outputs to inputs), applying the optimum LFO algorithm to each net, computing the new required time at the source, and propagating the delay changes to the fanins yields a provably optimum solution to the GFO problem. This result implies that GFO is solvable in polynomial time if LFO is. We show that this is not the case. We prove that GFO is NP-complete even if there are a constant number of buffering choices at each net. We analyze Touati's result and point out the flaw in his argument. We then present sufficient conditions for the optimality of the reverse topological algorithm.
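The reverse topological traversal described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function name, the dictionary-based netlist, and the `local_opt` callback are all hypothetical, and the default `min` stands in for a real LFO step by modeling an unbuffered net. Per the abstract, this traversal is not optimal for GFO in general.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def propagate_required_times(fanouts, gate_delay, po_required, local_opt=min):
    """Visit nodes from primary outputs toward inputs and compute required times.

    fanouts:     node -> list of sink nodes driven by that node's net
    gate_delay:  node -> delay through that node (assumed fixed here)
    po_required: primary output -> required time
    local_opt:   hypothetical stand-in for a local fanout optimizer; it maps
                 the required times at the sink pins to the required time at
                 the net source (min models a net with no buffering)
    """
    required = {}
    # Passing fanouts as the dependency graph makes sinks come out before
    # their drivers, i.e. a reverse topological order of the circuit.
    for node in TopologicalSorter(fanouts).static_order():
        sinks = fanouts.get(node, [])
        if not sinks:
            required[node] = po_required[node]
        else:
            pin_required = [required[s] - gate_delay[s] for s in sinks]
            required[node] = local_opt(pin_required)
    return required

# Tiny illustrative netlist (all values are made up).
fanouts = {"a": ["g1", "g2"], "g1": ["out1"], "g2": ["out1", "out2"],
           "out1": [], "out2": []}
gate_delay = {"a": 0, "g1": 2, "g2": 1, "out1": 3, "out2": 1}
po_required = {"out1": 10, "out2": 6}
required = propagate_required_times(fanouts, gate_delay, po_required)
```

Replacing `local_opt` with a routine that chooses a buffering for each net gives the greedy scheme whose optimality the paper examines.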
design, automation, and test in europe | 2006
Subodh M. Reddy; Gustavo Wilke; Rajeev Murgai
Mesh architectures are used to distribute critical global signals on a chip, such as clock and power/ground. Redundancy created by mesh loops smooths out undesirable variations between signal nodes spatially distributed over the chip. However, one problem with mesh architectures is the difficulty of accurately analyzing large instances. Furthermore, variations in process and temperature, supply noise, and crosstalk noise cause uncertainty in the delay from clock source to flip-flops. In this paper, we study the problem of analyzing timing uncertainty in mesh-based clock architectures. We propose solutions for both pure mesh and (mesh + global-tree) architectures. The solutions can handle large design and mesh instances. The maximum error in uncertainty values reported by our solutions is 1-3ps with respect to the golden Monte Carlo simulations, which is at most 0.5% of the nominal clock latency of about 600ps.
international conference on computer design | 1995
Rajeev Murgai; Masahiro Fujita; Fumiyasu Hirose
Logic synthesis for look-up tables (LUTs) has received much attention in the past few years, since Xilinx introduced its LUT-based field-programmable gate array (FPGA) architectures. An m-input LUT can implement any Boolean function of up to m inputs. So the goal of synthesis for such architectures has been to synthesize a circuit in which each function can be implemented by one m-LUT such that either the total number of functions or the number of levels of the circuit is minimized. In this work, we focus on a different though related problem: synthesize the given circuit on a single memory or LUT L, which has a capacity of M bits. In addition to satisfying the memory constraint M, we also wish to minimize the total number of functions to be implemented. The main motivation for the problem comes from the problem of minimizing the simulation time on a hardware accelerator for logic simulation. This accelerator uses memory as a logic primitive. In fact, the problem is also relevant in the context of compiled-code or software simulation. Another situation where the problem arises is in synthesis for the FPGA architectures being proposed that have on-chip memory for storing programs and data. The unused memory locations can be used to store logic functions. We show that the existing LUT synthesis methods are inadequate to solve this problem. We propose techniques to solve the problem and present experimental evidence to demonstrate their effectiveness.
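As a small illustration of why an m-input LUT can implement any Boolean function of up to m inputs, the Python sketch below stores a function as a table of 2^m bits and evaluates it by address lookup. The helper names are hypothetical, and the capacity check assumes each k-input function occupies 2^k bits of the memory, which is a simplification rather than the paper's cost model.

```python
from itertools import product

def build_lut(func, m):
    """Tabulate an m-input Boolean function into a list of 2**m bits."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=m)]

def lut_eval(table, inputs):
    """Evaluate by lookup: the inputs form the address, first input as MSB."""
    address = 0
    for bit in inputs:
        address = (address << 1) | (1 if bit else 0)
    return table[address]

def fits_in_memory(input_counts, capacity_bits):
    """Assumed cost model: a k-input function needs 2**k bits of memory."""
    return sum(2 ** k for k in input_counts) <= capacity_bits

# A 3-input majority function stored as an 8-bit table.
majority = build_lut(lambda a, b, c: (a + b + c) >= 2, 3)
```

Under this assumed model, functions with 3, 4, and 2 inputs need 8 + 16 + 4 = 28 bits and so fit in a 32-bit memory, while a 5-input function together with a 3-input function (32 + 8 = 40 bits) does not.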