Shmuel Wimer
Bar-Ilan University
Publication
Featured research published by Shmuel Wimer.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 1987
Shmuel Wimer; Ron Y. Pinter; Jack A. Feldman
We describe an algorithm that maps a CMOS circuit diagram into an area-efficient, high-performance layout in the style of a transistor chain. It is superior to other published algorithms of this kind in terms of the class of input circuits it accepts, its efficiency, and the quality of the results it produces. The algorithm is intended for the automatic generation of basic cells in a custom or semicustom design environment, thereby relieving the designer of arduous mask definition. We show how our method was used to compose cells in a row into a functional slice (e.g., an adder) that can be used in, say, a data path.
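A common way to reason about transistor chaining is the well-known Euler-path formulation: if one ordering of the gate inputs traces an Euler trail through both the nMOS and the pMOS diffusion graphs, each network can be laid out as a single unbroken diffusion strip. The Python sketch below brute-forces that check for a small gate. It is a swapped-in textbook technique, not the algorithm described in this paper, and the net names (`OUT`, `GND`, `VDD`, internal nodes `x` and `y`), the dictionary encoding, and the AOI21 example are illustrative assumptions.

```python
from itertools import permutations

def is_trail(order, edges):
    """True if the transistors in `order` form one connected trail.

    edges maps a gate-input name to the two diffusion nets its transistor
    joins (a hypothetical encoding chosen for this sketch)."""
    first_u, first_v = edges[order[0]]
    # Try both orientations of the first transistor.
    for cur in (first_v, first_u):
        for gate in order[1:]:
            u, v = edges[gate]
            if cur == u:
                cur = v
            elif cur == v:
                cur = u
            else:
                break  # next transistor does not touch the current net
        else:
            return True
    return False

def common_euler_orderings(n_net, p_net):
    """Gate orderings that are Euler trails in both networks."""
    return [order for order in permutations(sorted(n_net))
            if is_trail(order, n_net) and is_trail(order, p_net)]

# AOI21, out = NOT(a*b + c): nMOS a-b in series, in parallel with c;
# pMOS a||b in series with c.
n_net = {"a": ("OUT", "x"), "b": ("x", "GND"), "c": ("OUT", "GND")}
p_net = {"a": ("VDD", "y"), "b": ("VDD", "y"), "c": ("y", "OUT")}
print(common_euler_orderings(n_net, p_net))
```

Orderings that place `c` between `a` and `b` are rejected because they break the pMOS trail. Exhaustive enumeration grows factorially with the number of inputs, which is one reason practical cell generators need smarter search.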
IEEE Transactions on Circuits and Systems | 1988
Shmuel Wimer; Israel Koren; Israel Cederbaum
Based on a graph model of floorplans and layouts, this paper discusses minimization of the area occupied by a layout, together with related results on network flows and on the rectilinear representation of planar graphs. Arbitrary floorplans are allowed. Given an arbitrary floorplan and the areas of the embedded building blocks, the existence and uniqueness of a zero-wasted-area layout are proved and characterized by a necessary and sufficient condition. Based on this condition, a scheme for generating zero-wasted-area layouts is described. Given a family of dual network pairs for which the product of dual arc lengths is invariant, it is proved that the minimal product of their longest paths is not smaller than the maximal product of their shortest paths. It is also shown that the maximal product of the flows in such a family of dual network pairs equals the sum of the arc-length products over all individual pairs of dual arcs. An efficient procedure for deriving a rectilinear representation of any planar graph is presented.
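To make the area quantity concrete: in a constraint-graph view of a floorplan (a standard modeling device assumed here, not taken from the paper), the layout width is the heaviest left-to-right path over block widths and the height is the heaviest bottom-to-top path over block heights. The sketch below computes both for a hypothetical three-block floorplan whose layout happens to have zero wasted area.

```python
from graphlib import TopologicalSorter

def longest_path(weights, edges):
    """Heaviest path in a DAG with node weights; (u, v) means u precedes v."""
    preds = {node: set() for node in weights}
    for u, v in edges:
        preds[v].add(u)
    finish = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        finish[node] = max((finish[p] for p in preds[node]), default=0) + weights[node]
    return max(finish.values())

# Hypothetical blocks as (width, height): A on the left, C below B on the right.
blocks = {"A": (2, 3), "B": (1, 1), "C": (1, 2)}
width = longest_path({b: w for b, (w, _) in blocks.items()}, [("A", "B"), ("A", "C")])
height = longest_path({b: h for b, (_, h) in blocks.items()}, [("C", "B")])
wasted = width * height - sum(w * h for w, h in blocks.values())
print(width, height, wasted)
```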
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 1989
Reuven Bar-Yehuda; Jack A. Feldman; Ron Y. Pinter; Shmuel Wimer
An algorithmic framework is presented for mapping CMOS circuit diagrams into area-efficient, high-performance layouts in the style of one-dimensional transistor arrays. By using efficient search techniques and accurate evaluation methods, the framework traverses the huge solution space typical of such problems extremely fast, yielding designs of hand-layout quality. In addition to generating circuits that meet prespecified layout constraints in the context of a fixed target image, it performs on-the-fly optimizations to meet secondary optimization criteria. A practical dynamic programming routing algorithm is used to accommodate the special conditions that arise in this context. This algorithm has been implemented and is currently used at IBM for cell-library generation.
IEEE Transactions on Very Large Scale Integration Systems | 2012
Shmuel Wimer; Israel Koren
Clock gating is nowadays a mainstream design methodology for reducing switching power consumption in VLSI chips. In this paper we develop a probabilistic model of the clock gating network that allows us to quantify the expected power savings and the implied overhead. Expressions for the power savings in a gated clock tree are presented, and the optimal gater fan-out is derived based on flip-flop (FF) toggling probabilities and process technology parameters. The resulting clock gating methodology saves 10% of the total clock tree switching power. The timing implications of the proposed gating scheme and the grouping of FFs for joint clock gating are also discussed. The analysis and the results match the experimental data obtained for a 3-D graphics processor and a 16-bit microcontroller, both designed in 65-nanometer technology.
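The trade-off behind an optimal gater fan-out can be illustrated with a deliberately simplified model. Assume k flip-flops share one gater, each toggles independently with probability p per cycle, the gated clock fires whenever at least one of them toggles (probability 1 - (1 - p)^k), and each gater adds a fixed overhead `gater_cost` measured in FF clock-pin loads. These assumptions and the cost expression are hypothetical and are not the paper's expressions, which also account for process technology parameters.

```python
def enable_probability(p, k):
    """Chance that at least one of k independently toggling FFs toggles."""
    return 1.0 - (1.0 - p) ** k

def savings_per_ff(p, k, gater_cost):
    """Clock-pin loads saved per FF per cycle in the toy model:
    ungated, an FF is clocked every cycle (1 load); gated, only when the
    shared enable fires, plus a 1/k share of the assumed gater overhead."""
    return 1.0 - enable_probability(p, k) - gater_cost / k

def best_fanout(p, gater_cost, k_max=64):
    """Fan-out k in 1..k_max that maximizes the toy-model savings."""
    return max(range(1, k_max + 1), key=lambda k: savings_per_ff(p, k, gater_cost))

print(best_fanout(p=0.05, gater_cost=2.0))
```

In this model, small groups pay too much gater overhead per FF, while large groups are clocked almost every cycle; the optimal fan-out balances the two effects.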
Design Automation Conference | 1988
Shmuel Wimer; Israel Koren; Israel Cederbaum
The problem of selecting an optimal implementation for each building block so that the area of the final layout is minimized is discussed. A polynomial algorithm that solves this problem for slicing floorplans was presented elsewhere, and it has been proved that for general (nonslicing) floorplans the problem is NP-complete. The authors suggest a branch-and-bound algorithm that proves very efficient and successfully handles large general nonslicing floorplans. They also show how the nonslicing and slicing algorithms can be combined to handle very large general floorplans efficiently.
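The polynomial algorithm for slicing floorplans mentioned above is not reproduced here, but its underlying idea of combining block implementations bottom-up through the slicing tree can be sketched. Each node keeps only non-dominated (width, height) shapes; a vertical cut adds widths and takes the maximum height, and a horizontal cut does the reverse. This all-pairs version is a simplifying assumption (a more efficient merge is possible), and the block names and sizes are hypothetical.

```python
from itertools import product

def prune(shapes):
    """Drop dominated shapes: keep (w, h) only if no kept shape is both
    at least as narrow and at least as short."""
    kept = []
    for w, h in sorted(set(shapes)):  # ascending width, then height
        if not kept or h < kept[-1][1]:
            kept.append((w, h))
    return kept

def combine(a, b, cut):
    """Shapes of two sub-floorplans joined by a vertical ("V") or horizontal ("H") cut."""
    if cut == "V":  # side by side: widths add, height is the max
        shapes = [(w1 + w2, max(h1, h2)) for (w1, h1), (w2, h2) in product(a, b)]
    else:  # stacked: heights add, width is the max
        shapes = [(max(w1, w2), h1 + h2) for (w1, h1), (w2, h2) in product(a, b)]
    return prune(shapes)

def shapes_of(node, impls):
    """node is a block name or a tuple (cut, left_subtree, right_subtree)."""
    if isinstance(node, str):
        return prune(impls[node])
    cut, left, right = node
    return combine(shapes_of(left, impls), shapes_of(right, impls), cut)

impls = {"A": [(2, 4), (4, 2)], "B": [(1, 3), (3, 1)], "C": [(2, 2), (1, 4)]}
tree = ("V", "A", ("H", "B", "C"))
shapes = shapes_of(tree, impls)
best = min(shapes, key=lambda s: s[0] * s[1])
print(shapes, best)
```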
IEEE Transactions on Very Large Scale Integration Systems | 2014
Shmuel Wimer; Israel Koren
Clock gating is a predominant technique for power saving. It is observed that the commonly used synthesis-based gating still leaves a large number of redundant clock pulses. Data-driven gating aims to disable these. To reduce the hardware overhead involved, flip-flops (FFs) are grouped so that they share a common clock enabling signal. The question of which group size maximizes the power savings was answered in a previous paper. Here we answer the question of which FFs should be placed in a group to maximize the power reduction. We propose a practical solution based on the toggling activity correlations of FFs and their physical proximity constraints in the layout. Our data-driven clock gating is integrated into a commercial Electronic Design Automation (EDA) backend design flow, achieving total power reduction of 15%-20% for various types of large-scale state-of-the-art industrial and academic designs in 40- and 65-nanometer process technologies. These savings are achieved on top of the savings obtained by clock gating synthesis performed by commercial EDA tools and by gating manually inserted into the register transfer level design.
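One simple way to act on toggling correlations under a proximity constraint is greedy pairing, sketched below. It is a hypothetical heuristic, not the paper's method: toggle traces are integer bitmasks (bit k set means the FF toggles in cycle k), a pair's cost is the number of clock pulses delivered to a member that does not toggle when both share one enable, and only FFs within `max_dist` of each other may be paired.

```python
import math
from itertools import combinations

def popcount(x):
    return bin(x).count("1")

def wasted_pulses(a, b):
    """Pulses delivered to a non-toggling FF when a and b share one enable."""
    return 2 * popcount(a | b) - popcount(a) - popcount(b)

def greedy_pairing(traces, positions, max_dist):
    """Pair FFs greedily by fewest wasted pulses, only within max_dist."""
    candidates = sorted(
        (wasted_pulses(traces[i], traces[j]), i, j)
        for i, j in combinations(range(len(traces)), 2)
        if math.dist(positions[i], positions[j]) <= max_dist
    )
    used, pairs = set(), []
    for _, i, j in candidates:
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    return pairs  # FFs left unpaired are not grouped in this sketch

# Hypothetical 8-cycle traces and layout positions for four FFs.
traces = [0b11000000, 0b11000000, 0b00001111, 0b00001110]
positions = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(greedy_pairing(traces, positions, max_dist=1.5))
```

FFs 0 and 1 toggle in exactly the same cycles, so sharing an enable wastes nothing; pairing FF 0 with FF 2 would instead deliver six wasted pulses.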
Digital Systems Design | 2011
Ran Manevich; Israel Cidon; Avinoam Kolodny; Isask’har Walter; Shmuel Wimer
As the number of applications and programmable units in CMPs and MPSoCs increases, the Network-on-Chip (NoC) encounters unpredictable, heterogeneous and time-dependent traffic loads. This motivates the introduction of adaptive routing mechanisms that balance the NoC load and achieve higher throughput than traditional oblivious routing schemes. An effective adaptive routing scheme should be based on a global view of the network state. However, most current adaptive routing schemes, inherited from off-chip networks, rely on distributed reactions to local congestion. In this paper we leverage unique on-chip capabilities and introduce a novel paradigm of centralized adaptive routing for NoCs. Our scheme continuously monitors the global traffic load in the network and adjusts packet routing accordingly to improve load balancing. We present a specific design for the mesh topology, in which XY or YX routes are adaptively selected for each source-destination pair. We show that our implementation, while lightweight and scalable in hardware cost, outperforms oblivious and distributed adaptive routing schemes in terms of load balancing and average packet delay.
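A software toy can illustrate choosing between XY and YX routes from a global load view. The sketch assumes a 2-D mesh with integer (x, y) router coordinates, flows given as (source, destination, rate), and a hypothetical greedy rule: each flow takes whichever route keeps its busiest link least loaded, with ties going to XY. The paper's continuous hardware monitoring is not modeled.

```python
def route(src, dst, order):
    """Directed links of a dimension-ordered route; order is "XY" or "YX"."""
    cur = list(src)
    links = []
    for axis in order:  # e.g. "XY": walk along X first, then along Y
        i = 0 if axis == "X" else 1
        while cur[i] != dst[i]:
            nxt = cur.copy()
            nxt[i] += 1 if dst[i] > cur[i] else -1
            links.append((tuple(cur), tuple(nxt)))
            cur = nxt
    return links

def peak_load(load, links, rate):
    """Heaviest link load if a flow of `rate` were added on `links`."""
    return max((load.get(link, 0.0) + rate for link in links), default=0.0)

def select_routes(flows):
    """Greedy central selection of XY or YX per flow (min() keeps XY on ties)."""
    load = {}
    choices = []
    for src, dst, rate in flows:
        best = min(("XY", "YX"), key=lambda o: peak_load(load, route(src, dst, o), rate))
        for link in route(src, dst, best):
            load[link] = load.get(link, 0.0) + rate
        choices.append(best)
    return choices, load

# Hypothetical traffic: the first flow occupies the bottom row, so the
# second flow is steered onto its YX route to avoid that row.
flows = [((0, 0), (2, 0), 1.0), ((0, 0), (2, 2), 1.0)]
choices, load = select_routes(flows)
print(choices)
```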
Discrete Optimization | 2011
Eranda Çela; Nina S. Schmuck; Shmuel Wimer; Gerhard J. Woeginger
We investigate a special case of the maximum quadratic assignment problem where one matrix is a product matrix and the other matrix is the distance matrix of a one-dimensional point set. We show that this special case, which we call the Wiener maximum quadratic assignment problem, is NP-hard in the ordinary sense and solvable in pseudo-polynomial time. Our approach also yields a polynomial time solution for the following problem from chemical graph theory: find a tree that maximizes the Wiener index among all trees with a prescribed degree sequence. This settles an open problem from the literature.
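For very small trees, the chemical-graph-theory question can be checked by brute force: every labeled tree in which vertex i has degree d_i corresponds to a Prüfer sequence in which i appears exactly d_i - 1 times. The sketch enumerates those sequences, decodes each tree, and computes its Wiener index (the sum of distances over all unordered vertex pairs). It is exponential and meant only as a sanity check; it is not the polynomial-time approach of the paper, and the function names are illustrative.

```python
from collections import deque
from itertools import permutations

def prufer_to_edges(seq, n):
    """Decode a Prüfer sequence into the edge list of a labeled tree on n vertices."""
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        leaf = min(i for i in range(n) if degree[i] == 1)  # smallest current leaf
        edges.append((leaf, v))
        degree[leaf] -= 1
        degree[v] -= 1
    u, w = [i for i in range(n) if degree[i] == 1]
    edges.append((u, w))
    return edges

def wiener_index(edges, n):
    """Sum of shortest-path distances over all unordered vertex pairs (BFS)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    total = 0
    for s in range(n):
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist)
    return total // 2  # each pair was counted from both ends

def max_wiener(degrees):
    """Largest Wiener index over trees in which vertex i has degree degrees[i]."""
    n = len(degrees)
    if n < 2 or min(degrees) < 1 or sum(degrees) != 2 * (n - 1):
        raise ValueError("not a tree degree sequence")
    seq = [v for v, d in enumerate(degrees) for _ in range(d - 1)]
    return max(wiener_index(prufer_to_edges(s, n), n) for s in set(permutations(seq)))

print(max_wiener([3, 2, 2, 1, 1, 1]))
```

For this degree sequence, the winning tree is a five-vertex path with an extra leaf on the degree-3 vertex placed next to one end, rather than at the center.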
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 1988
Shmuel Wimer; Israel Koren
The problem of general block placement in VLSI is considered, using the constructive approach in which blocks are selected and located one at a time. Some well-known strategies for selecting the next block to be located are presented, novel ones are proposed, and a methodology to evaluate them is established. It is then shown that the optimization problem arising in constructive placement can be reduced to several much simpler subproblems. Objective functions for locating the selected block to achieve a good layout are presented for three different metrics: squared Euclidean, rectilinear, and Euclidean. The corresponding optimization problems are formulated and solved analytically, using efficient computation schemes. These solutions have been implemented and are used in a real VLSI chip design environment. It is shown that the squared Euclidean and the rectilinear metrics are preferable to the Euclidean one.
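If overlaps with already placed blocks are ignored (a simplifying assumption; a real placer must respect them), the location minimizing the weighted connection cost has a closed form for two of the three metrics: the weighted centroid for squared Euclidean distance, and the per-axis weighted median for rectilinear distance. The connection format (x, y, weight) and the example values are hypothetical.

```python
def weighted_median(pairs):
    """A minimizer of sum(w * |t - v|) over t, for (v, w) pairs with w > 0."""
    pairs = sorted(pairs)
    half = sum(w for _, w in pairs) / 2
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= half:  # first value where half the total weight is reached
            return v
    raise ValueError("empty input")

def squared_euclidean_location(conns):
    """Minimizes sum(w * ((x - xi)**2 + (y - yi)**2)): the weighted centroid."""
    total = sum(w for _, _, w in conns)
    return (sum(w * xi for xi, _, w in conns) / total,
            sum(w * yi for _, yi, w in conns) / total)

def rectilinear_location(conns):
    """Minimizes sum(w * (|x - xi| + |y - yi|)): per-axis weighted medians."""
    return (weighted_median([(xi, w) for xi, _, w in conns]),
            weighted_median([(yi, w) for _, yi, w in conns]))

# Hypothetical connections to already placed blocks: (x, y, number of wires).
conns = [(0, 0, 1), (4, 0, 1), (10, 6, 2)]
print(squared_euclidean_location(conns))
print(rectilinear_location(conns))
```

For the Euclidean metric, the same problem (the Weber problem) has no closed-form solution in general and is usually solved iteratively, for example with Weiszfeld's algorithm. Note also that the rectilinear optimum need not be unique: here any x between 4 and 10 gives the same cost.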
Operations Research Letters | 2013
Shmuel Wimer
Data-driven clock gating reduces the total power consumption of VLSI chips by 20%. In this scheme, flip-flops are grouped so that they share a common clock signal. Finding the optimal clusters is the key to maximizing the power savings. Clustering by the minimal-cost perfect graph matching algorithm (MCPM), proposed in other works, is not optimal. We show that the optimal clustering problem is NP-hard and study the quality of the MCPM heuristic, showing experimentally that its results fall 5% above the optimal solution.
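For tiny instances, an exhaustive optimum can serve as a baseline against which a heuristic such as MCPM is measured. The sketch below assumes that the cost of a group is its size times the number of cycles in which any member toggles (the pulses its shared clock delivers) and partitions the FFs into groups of exactly k by memoized search. For k = 2 this is a minimum-cost perfect matching; the search is exponential in general, and the cost model, traces, and function names are illustrative assumptions only.

```python
from functools import lru_cache
from itertools import combinations

def popcount(x):
    return bin(x).count("1")

def group_cost(traces, group):
    """Clock pulses delivered to the group: its size times the number of
    cycles in which at least one member toggles (assumed cost model)."""
    enable = 0
    for ff in group:
        enable |= traces[ff]
    return len(group) * popcount(enable)

def optimal_grouping(traces, k):
    """Exact minimum-cost partition of the FFs into groups of exactly k."""
    if len(traces) % k:
        raise ValueError("number of FFs must be a multiple of k")

    @lru_cache(maxsize=None)
    def best(remaining):
        if not remaining:
            return 0, ()
        first, rest = remaining[0], remaining[1:]
        candidates = []
        # The lowest-indexed remaining FF must join some group; try every partner set.
        for partners in combinations(rest, k - 1):
            group = (first,) + partners
            left = tuple(ff for ff in rest if ff not in partners)
            cost, groups = best(left)
            candidates.append((cost + group_cost(traces, group), (group,) + groups))
        return min(candidates)

    return best(tuple(range(len(traces))))

# Hypothetical 8-cycle toggle traces (bit k set = FF toggles in cycle k).
traces = [0b11000000, 0b11000000, 0b00001111, 0b00001110]
print(optimal_grouping(traces, 2))
```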