Ruibing Lu
Purdue University
Publication
Featured research published by Ruibing Lu.
Design, Automation, and Test in Europe | 2002
Ruibing Lu; Guoan Zhong; Cheng-Kok Koh; Kai-Yuan Chao
We present a unified framework that considers flip-flop and repeater insertion and the placement of flip-flop/repeater blocks during RT or higher level design. We introduce the concept of independent feasible regions in which flip-flops and repeaters can be inserted in an interconnect to satisfy both delay and cycle time constraints. Experimental results show that, with flip-flop insertion, we greatly increase the ability of interconnects to meet timing constraints. Our results also show that it is necessary to perform interconnect optimization at early design steps, as the optimization will have an even greater impact on the chip layout as feature size continues to scale down.
IEEE Transactions on Very Large Scale Integration Systems | 2007
Ruibing Lu; Aiqun Cao; Cheng-Kok Koh
A high-performance communication architecture, SAMBA-bus, is proposed in this paper. In the SAMBA-bus architecture, multiple compatible bus transactions can be performed simultaneously with only a single bus access grant from the bus arbiter. Experimental results show that, compared with a traditional bus architecture, the SAMBA-bus architecture can have up to 3.5 times improvement in the effective bandwidth, and up to 15 times reduction in the average communication latency. In addition, the performance of the SAMBA-bus architecture is affected only slightly by arbitration latency, because bus transactions can be performed without waiting for the bus access grant from the arbiter. This feature is desirable in SoC designs with large numbers of modules and long communication delay between modules and the bus arbiter.
International Conference on Computer Aided Design | 2003
Ruibing Lu; Cheng-Kok Koh
This paper proposes for latency insensitive systems a performance optimization technique called channel buffer queue sizing, which is performed after relay station insertion in the physical design stage. It can be shown that proper queue sizing can reduce or even completely avoid the performance loss due to imbalanced relay stations insertion in reconvergent paths. Moreover, the problem of queue sizing and placement of the additional buffers for maximum performance is formulated and studied to properly allocate available chip areas in the layout to communication channels. An algorithm based on mixed integer linear programming is proposed. Experimental results show that queue sizing is effective in improving the performance of latency insensitive systems even under tight area constraints. Moreover, the proposed algorithm is sufficiently efficient in obtaining the optimal solution for systems of practical sizes.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2006
Ruibing Lu; Cheng-Kok Koh
This paper formally models and studies latency-insensitive systems (LISs) through max-plus algebra. We introduce state traces to model behaviors of LISs and obtain a formally proved performance upper bound achievable by latency-insensitive design. An implementation of the latency-insensitive protocol that can provide robust communication through back-pressure is also proposed. The intrinsic performance of the proposed implementation is acquired based on state traces. It is also proved that the proposed implementation can always reach the best performance achievable by latency-insensitive design.
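As a rough illustration of the max-plus algebra mentioned above, the sketch below iterates x(k+1) = A ⊗ x(k), where "addition" is max and "multiplication" is ordinary +. It is not the paper's state-trace model; the function names and the two-node example matrix are assumptions made up for illustration only.

```python
NEG_INF = float("-inf")  # the max-plus "zero": no connection between two nodes


def maxplus_matvec(A, x):
    """Max-plus product: result[i] = max over j of (A[i][j] + x[j])."""
    return [max(a + xj for a, xj in zip(row, x)) for row in A]


def iterate(A, x0, steps):
    """Return the list [x(0), x(1), ..., x(steps)] with x(k+1) = A (x) x(k)."""
    trace = [x0]
    for _ in range(steps):
        trace.append(maxplus_matvec(A, trace[-1]))
    return trace


# Hypothetical two-node ring: node 0 waits 2 time units on node 1,
# and node 1 waits 1 time unit on node 0.
A = [[NEG_INF, 2],
     [1, NEG_INF]]
trace = iterate(A, [0, 0], 4)
```

In this toy ring the event times grow by 3 every two iterations, so the long-run cycle time is 1.5 time units per event; this kind of asymptotic growth rate is the quantity that max-plus performance analysis typically bounds.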
Asia and South Pacific Design Automation Conference | 2004
Ruibing Lu; Cheng-Kok Koh
A split shared-bus architecture with multiple simultaneous bus accesses is proposed. Compared to traditional bus architectures, the performance of the proposed architecture is higher because of its ability to deliver multiple bus transactions in one bus cycle. We also propose an implementation of the arbiter, which not only detects and grants multiple compatible bus transactions, but also controls splitters properly to establish the communication paths for those transactions. Experimental results show that the bus architecture can have up to 2.3 times improvement in the effective bandwidth and up to 5 times reduction in the communication latency. Moreover, the arbiter implementation has reasonable area and timing cost, making it suitable for high performance SoC applications.
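The idea of granting several compatible transactions in one bus cycle can be sketched as follows. This is not the arbiter described in the paper: it assumes, hypothetically, that modules sit at integer positions along a linear bus, that a transaction occupies the span between its source and destination, and that two transactions are compatible when their spans do not overlap. The names `span`, `compatible`, and `grant` are invented for this sketch.

```python
def span(txn):
    """Return the (low, high) bus positions occupied by a (src, dst) transaction."""
    src, dst = txn
    return (min(src, dst), max(src, dst))


def compatible(t1, t2):
    """Assumed rule: two transactions are compatible if their spans do not overlap."""
    lo1, hi1 = span(t1)
    lo2, hi2 = span(t2)
    return hi1 <= lo2 or hi2 <= lo1


def grant(requests):
    """Greedily pick a largest set of pairwise compatible transactions.

    Sorting by the right end of each span and taking every request that
    starts at or after the last granted end is the classic interval
    scheduling greedy, which maximizes the number of granted requests.
    """
    granted = []
    last_hi = None
    for txn in sorted(requests, key=lambda t: span(t)[1]):
        lo, hi = span(txn)
        if last_hi is None or lo >= last_hi:
            granted.append(txn)
            last_hi = hi
    return granted


# Four pending requests as (source, destination) module positions.
granted = grant([(0, 3), (1, 2), (2, 5), (6, 4)])
```

Under these assumptions two of the four requests share the cycle, whereas a bus that serves one transaction per cycle would serve only one.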
Asia and South Pacific Design Automation Conference | 2005
Ruibing Lu; Aiqun Cao; Cheng-Kok Koh
SAMBA bus (Lu and Koh, 2003) is a high performance bus architecture that can deliver multiple transactions in one bus cycle under single-winner bus arbitration. The bus architecture displays several advantages, such as high bandwidth, low latency, and low performance penalty from arbitration delay, all of which make it more scalable than traditional buses. However, its scalability may be limited by the bus access logic delay. Because each module is connected to the bus through its interface unit, and the interface units are connected in series on the bus, the bus logic delay increases linearly as the bus size increases. In this paper, we propose to increase the scalability of SAMBA buses through two methods: control signal lookahead and module clustering. The control signal lookahead technique can determine the bus access control signal in advance, thereby reducing the effective delay of each interface unit. Module clustering, on the other hand, can reduce the number of interface units attached to a bus. Experimental results show that combining these two methods can effectively reduce the bus logic delay, and thus increase the scalability of SAMBA buses.
Asia and South Pacific Design Automation Conference | 2005
Aiqun Cao; Ruibing Lu; Cheng-Kok Koh
Logic duplication to resolve the logic reconvergent paths problem encountered in Domino logic synthesis is expensive in terms of area and power. In this paper, we propose a combined logic duplication minimization and technology mapping scheme for Domino circuits with complex gates. The logic duplication is performed as a post-layout step as the duplication cost is minimized based on accurate timing information. Experimental results show significant improvements in area, power, and delay.
ACM Transactions on Design Automation of Electronic Systems | 2006
Aiqun Cao; Ruibing Lu; Chen Li; Cheng-Kok Koh
Logic duplication, a commonly used synthesis technique to remove trapped inverters in reconvergent paths of Domino circuits, incurs high area and power penalties. In this article, we propose a synthesis scheme to reduce the duplication cost by allowing inverters in Domino logic under certain timing constraints for both simple and complex gates. Moreover, we can include the logic duplication minimization during technology mapping for synthesis of Domino circuits with complex gates. In order to guarantee the robustness of such Domino circuits, we perform the logic optimization as a postlayout step. Experimental results show significant reduction in duplication cost, which translates into significant improvements in area and power. As a byproduct, the timing performance is also improved owing to smaller layout area and/or logic depth.
Design, Automation, and Test in Europe | 2003
Ruibing Lu; Cheng-Kok Koh
We present a framework that considers global routing, repeater insertion, and flip-flop relocation for early interconnect planning. We formulate the interconnect retiming and flip-flop placement problem as a local area constrained retiming problem and solve it as a series of weighted minimum area retiming problems. Our method for early interconnect planning can reduce and even avoid design iterations between physical planning and high level designs. Experimental results show that our method can reduce the number of area violations by an average of 84% in a single interconnect planning step.
Design, Automation, and Test in Europe | 2003
Ruibing Lu; Cheng-Kok Koh