Kenneth S. Stevens | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Kenneth S. Stevens is active.

Explore More

Publication

Featured researches published by Kenneth S. Stevens.

hawaii international conference on system sciences | 1993

The Post Office experience: Designing a large asynchronous chip

Al Davis; B. Coates; Kenneth S. Stevens

The Post Office is an asynchronous, 300000 transistor, full-custom CMOS chip designed as the communication component for the Mayfly scalable parallel processor. Performance requirements led to the development of a design style which permits the design of sequential circuits operating under a restricted form of multiple input change signaling called burst-mode. The Post Office complexity forced the authors to develop a set of design tools capable of correctly synthesizing transistor circuits from state machine and equation specifications, and capable of verifying the correctness of the resultant circuitry using implementation specific timing assumptions. A case study of this design experience is provided. >

IEEE Journal of Solid-state Circuits | 2001

An asynchronous instruction length decoder

Kenneth S. Stevens; Shai Rotem; Ran Ginosar; Peter A. Beerel; Chris J. Myers; Kenneth Y. Yun; R. Koi; Charles E. Dike; Marly Roncken

This paper describes an investigation of potential advantages and pitfalls of applying an asynchronous design methodology to an advanced microprocessor architecture. A prototype complex instruction set length decoding and steering unit was implemented using self-timed circuits. [The Revolving Asynchronous Pentium/sup (R)/ Processor Instruction Decoder (RAPPID) design implemented the complete Pentium II/sup (R)/ 32-bit MMX instruction set.] The prototype chip was fabricated on a 0.25 /spl mu/m CMOS process and tested successfully. Results show significant advantages - in particular, performance of 2.5-4.5 instructions per nanosecond - with manageable risks using this design technology. The prototype achieves three times the throughput and half the latency, dissipating only half the power and requiring about the same area as the fastest commercial 400 MHz clocked circuit fabricated on the same process.

IEEE Transactions on Very Large Scale Integration Systems | 2003

Relative timing [asynchronous design]

Kenneth S. Stevens; Ran Ginosar; Shai Rotem

Relative timing (RT) is introduced as a method for asynchronous design. Timing requirements of a circuit are made explicit using relative timing. Timing can be directly added, removed, and optimized using this style. RT synthesis and verification are demonstrated on three example circuits, facilitating transformations from speed-independent circuits to burst-mode and pulse-mode circuits. Relative timing enables improved performance, area, power, and functional testability of up to a factor of 3/spl times/ in all three cases. This method is the foundation of optimized timed circuit designs used in an industrial test chip, and may be formalized and automated.

symposium on asynchronous circuits and systems | 2002

Relative timing based verification of timed circuits and systems

Hoshik Kim; Peter A. Beerel; Kenneth S. Stevens

Aggressive timed circuits, including synchronous and asynchronous self-resetting circuits, are particularly challenging to design and verify due to complicated timing constraints that must hold to ensure correct operation. Identifying a small, sufficient, and easily verifiable set of relative timing constraints simplifies both design and verification. However, the manual identification of these constraints is a complex and error-prone process. This paper presents the first systematic algorithm to generate and optimize relative timing constraints sufficient to guarantee correctness. The algorithm has been implemented in our RTCG tool and has been applied to several real-life circuits. In all cases, the tool successfully generates a sufficient set of easily verifiable relative timing constraints. Moreover the generated constraint sets are the same size or smaller than that of the hand-optimized constraints.

international conference on computer design | 2009

Automatic synthesis of computation interference constraints for relative timing verification

Yang Xu; Kenneth S. Stevens

Asynchronous sequential circuit or protocol design requires formal verification to ensure correct behavior under all operating conditions. However, most asynchronous circuits or protocols cannot be proven conformant to a specification without adding timing assumptions. Relative Timing (RT) is an approach to model and verify circuits and protocols that require timing assumptions to operate correctly. The process of creating path-based RT constraints has previously been done by hand with the aid of a formal verification engine. This time consuming and error prone method vastly restricts the application of RT and the capability to implement circuits and protocols. This paper describes an algorithm for automatic generation of RT constraints based on signal traces generated from a formal verification (FV) engine that supports relative timing constraints. This algorithm has been implemented in a CAD tool called Automatic Relative Timing Identifier based on Signal Traces (ARTIST) which has been embedded into the FV engine. A set of asynchronous and clocked designs and protocols have been verified and proven to be hazard-free with the RT constraints generated by ARTIST which would have taken months to perform by hand. A comparison of RT constraints between hand-generated and ARTIST generated constraints is also described in terms of efficiency and quality.

Electronic Notes in Theoretical Computer Science | 2008

Performance Evaluation of Elastic GALS Interfaces and Network Fabric

Junbok You; Yang Xu; Hosuk Han; Kenneth S. Stevens

This paper reports on the design of a test chip built to test a) a new latency insensitive network fabric protocol and circuits, b) a new synchronizer design, and c) how efficiently one can synchronize into a clocked domain when elastic interfaces are utilized. Simulations show that the latency insensitive network allows excellent characterization of network performance in terms of the cost of routing, amount of blocking due to congestion, and message buffering. The network routers show that peak performance near 100% link utilization is achieved under congestion and combining. This enables accurate high-level modeling of the behavior of the network fabric so that optimized network design, including placement and routing, can occur through high-level network synthesis tools. The chip also shows that when elastic interfaces are used at the boundary of clock synchronization points then efficient domain crossings can occur. Buffering at the synchronization points are required to allow for variability in clocking frequencies and correct data transmission. The asynchronous buffering and synchronization scheme is shown to perform over four times faster than the clocked interface.

IEEE Transactions on Very Large Scale Integration Systems | 2011

Energy and Performance Models for Synchronous and Asynchronous Communication

Kenneth S. Stevens; Pankaj Golani; Peter A. Beerel

Communication costs, which have the potential to throttle design performance as scaling continues, are mathematically modeled and compared for various pipeline methodologies. First-order models are created for common pipeline protocols, including clocked flopped, clocked time-borrowing latch, asynchronous two-phase, four-phase, delay-insensitive, single-track, and source synchronous. The models are parameterized for throughput, energy, and bandwidth. The models share common parameters for different pipeline protocols and implementations to enable a fair apple-to-apple comparison. The accuracy of the models are demonstrated for complete implementations of a subset of the protocols by applying 65-nm process simulated parameter values against the SPICE simulation of full pipeline implementations. One can determine when asynchronous communication is superior at the physical level to synchronous communication in terms of energy for a given bandwidth by applying actual or expected values of the parameters to various design targets. Comparisons between protocols at fixed targets also allow designers to understand tradeoffs between implementations that have a varying process, timing, and design requirements.

symposium on asynchronous circuits and systems | 2003

Energy and performance models for clocked and asynchronous communication

Kenneth S. Stevens

Parameterized first-order models for throughput, energy, and bandwidth are presented in this paper. Models are developed for many common pipeline methodologies, including clocked flopped, clocked time-borrowing latch protocols, asynchronous two-cycle, four-cycle, delay-insensitive, and source synchronous. The paper focuses on communication costs which have the potential to throttle design performance as scaling continues. The models can also be applied to logic. The equations share common parameters to allow apples-to-apples comparisons against different design targets and pipeline methodologies. By applying the parameters to various design targets, one can determine when unclocked communication is superior at the physical level to clocked communication in terms of energy for a given bandwidth. Comparisons between protocols at fixed targets also allow designers to understand tradeoffs between implementations that have a varying degree of timing assumptions and design requirements.

international symposium on advanced research in asynchronous circuits and systems | 1998

Average-case optimized technology mapping of one-hot domino circuits

Wei-Chun Chou; Peter A. Beerel; Ran Ginosar; Rakefet Kol; Chris J. Myers; Shai Rotem; Kenneth S. Stevens; Kenneth Y. Yun

This paper presents a technology mapping technique for optimizing the average-case delay of asynchronous combinational circuits implemented using domino logic and one-hot encoded outputs. The technique minimizes the critical path for common input patterns at the possible expense of making less common critical paths longer. To demonstrate the application of this technique, we present a case study of a combinational length decoding block, an integral component of an Asynchronous Instruction Length Decoder (AILD) which can be used in Pentium(R) processors. The experimental results demonstrate that the average-case delay of our mapped circuits can be dramatically lower than the worst-case delay of the circuits obtained using conventional worst-case mapping techniques.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2011

Design of an Energy-Efficient Asynchronous NoC and Its Optimization Tools for Heterogeneous SoCs

Daniel Gebhardt; Junbok You; Kenneth S. Stevens

The energy usage of on-chip interconnects is a concern for many system-on-chips targeting portable battery-powered devices. We have designed and evaluated a network-on-chip (NoC) for such an application, including tools to optimize for power and communication latency. Our asynchronous (clockless) network operates with efficient two-phase bundled-data links and four-phase routers. The topology and router floorplan is determined by our tool, ANetGen, which optimizes the network for energy and latency using simulated annealing and force-directed placement methods. We compare our solutions against a traditional synchronous NoC as specified by the COSI-2.0 framework and ORION 2.0 router and wire energy models. Traffic is simulated with SystemC functional models, and messages are generated with a “bursty” self-similar b-model. Results indicate our asynchronous network was more energy-efficient, lower in area, and provided comparable or superior message latency.

Explore More