Publication


Featured research published by Shai Rotem.


IEEE Journal of Solid-state Circuits | 2001

An asynchronous instruction length decoder

Kenneth S. Stevens; Shai Rotem; Ran Ginosar; Peter A. Beerel; Chris J. Myers; Kenneth Y. Yun; Rakefet Kol; Charles E. Dike; Marly Roncken

This paper describes an investigation of potential advantages and pitfalls of applying an asynchronous design methodology to an advanced microprocessor architecture. A prototype complex instruction set length decoding and steering unit was implemented using self-timed circuits. [The Revolving Asynchronous Pentium® Processor Instruction Decoder (RAPPID) design implemented the complete Pentium II® 32-bit MMX instruction set.] The prototype chip was fabricated on a 0.25 μm CMOS process and tested successfully. Results show significant advantages, in particular performance of 2.5-4.5 instructions per nanosecond, with manageable risks using this design technology. The prototype achieves three times the throughput and half the latency, dissipating only half the power and requiring about the same area as the fastest commercial 400 MHz clocked circuit fabricated on the same process.


IEEE Transactions on Very Large Scale Integration Systems | 2003

Relative timing [asynchronous design]

Kenneth S. Stevens; Ran Ginosar; Shai Rotem

Relative timing (RT) is introduced as a method for asynchronous design. Timing requirements of a circuit are made explicit using relative timing. Timing can be directly added, removed, and optimized using this style. RT synthesis and verification are demonstrated on three example circuits, facilitating transformations from speed-independent circuits to burst-mode and pulse-mode circuits. Relative timing enables improved performance, area, power, and functional testability of up to a factor of 3× in all three cases. This method is the foundation of optimized timed circuit designs used in an industrial test chip, and may be formalized and automated.


International Symposium on Advanced Research in Asynchronous Circuits and Systems | 1998

Average-case optimized technology mapping of one-hot domino circuits

Wei-Chun Chou; Peter A. Beerel; Ran Ginosar; Rakefet Kol; Chris J. Myers; Shai Rotem; Kenneth S. Stevens; Kenneth Y. Yun

This paper presents a technology mapping technique for optimizing the average-case delay of asynchronous combinational circuits implemented using domino logic and one-hot encoded outputs. The technique minimizes the critical path for common input patterns at the possible expense of making less common critical paths longer. To demonstrate the application of this technique, we present a case study of a combinational length decoding block, an integral component of an Asynchronous Instruction Length Decoder (AILD) which can be used in Pentium® processors. The experimental results demonstrate that the average-case delay of our mapped circuits can be dramatically lower than the worst-case delay of the circuits obtained using conventional worst-case mapping techniques.
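
The trade-off described in this abstract can be illustrated with a small sketch. All delays and probabilities below are hypothetical numbers chosen for illustration; they are not taken from the paper, and the two "mappings" are stand-ins rather than the authors' algorithm. The sketch only shows why shortening the path for common input patterns can lower the expected delay even when the worst-case path gets longer.

```python
# Hypothetical illustration: expected (average-case) delay vs. worst-case delay.
# Delays (arbitrary units) and probabilities are invented, not from the paper.

def average_delay(path_delays, pattern_probs):
    """Expected delay: sum over input patterns of P(pattern) * delay(pattern)."""
    return sum(pattern_probs[p] * path_delays[p] for p in pattern_probs)

def worst_delay(path_delays):
    """Worst-case delay: the slowest path over all input patterns."""
    return max(path_delays.values())

# Assumed input statistics: one common pattern class, one rare one.
pattern_probs = {"common": 0.9, "rare": 0.1}

# A conventional mapping balances every path against the worst case.
worst_case_mapping = {"common": 10, "rare": 10}
# An average-case mapping speeds up the common path and slows the rare one.
average_case_mapping = {"common": 6, "rare": 14}

avg_wc = average_delay(worst_case_mapping, pattern_probs)
avg_ac = average_delay(average_case_mapping, pattern_probs)

print("worst-case mapping:   avg", avg_wc, "worst", worst_delay(worst_case_mapping))
print("average-case mapping: avg", avg_ac, "worst", worst_delay(average_case_mapping))
```

In this made-up example the average-case mapping has the longer worst-case path but the shorter expected delay, which is the quantity that matters for an asynchronous circuit that completes as soon as its result is ready.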


Symposium on Code Generation and Optimization | 2010

TAO: two-level atomicity for dynamic binary optimizations

Edson Borin; Youfeng Wu; Cheng Wang; Wei Liu; Mauricio Breternitz; Shiliang Hu; Esfir Natanzon; Shai Rotem; Roni Rosner

Dynamic binary translation is a key component of Hardware/Software (HW/SW) co-design, which is an enabling technology for processor microarchitecture innovation. There are two well-known dynamic binary optimization techniques based on atomic execution support. Frame-based optimizations leverage processor pipeline support to enable atomic execution of hot traces. Region level optimizations employ transactional-memory-like atomicity support to aggressively optimize large regions of code. In this paper we propose a two-level atomic optimization scheme which not only overcomes the limitations of the two approaches, but also boosts the benefits of the two approaches effectively. Our experiment shows that the combined approach can achieve a total of 21.5% performance improvement over an aggressive out-of-order baseline machine and improve the performance over the frame-based approach by an additional 5.3%.


International Symposium on Advanced Research in Asynchronous Circuits and Systems | 2000

CA-BIST for asynchronous circuits: a case study on the RAPPID asynchronous instruction length decoder

Marly Roncken; Kenneth S. Stevens; Rajesh Pendurkar; Shai Rotem; Parimal Pal Chaudhuri

This paper presents a case study in low-cost noninvasive Built-in Self Test (BIST) for RAPPID, a large-scale 120,000-transistor asynchronous version of the Pentium® Pro Instruction Length Decoder which runs at 3.6 GHz. RAPPID uses a synchronous 0.25 micron CMOS library for static and domino logic, and has no Design-for-Test hooks other than some debug features. We explore the use of Cellular Automata (CA) for on-chip test pattern generation and response evaluation. More specifically, we look for fast ways to tune the CA-BIST to the RAPPID design, rather than using pseudo-random testing. The metric for tuning the CA-BIST pattern generation is based on an abstract hardware description model of the instruction length decoder which is independent of implementation details, and hence also independent of the asynchronous circuit style. Our CA-BIST solution uses a novel bootstrap procedure for generating the test patterns, which give complete coverage for this metric, and cover 94% of the testable stuck-at faults for the actual design at switch level. Analysis of the undetected and untestable faults shows that the same fault effects can be expected for a similar clocked circuit. This is encouraging evidence that testability is no excuse to avoid asynchronous design techniques in addition to high-performance synchronous solutions.
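
As a rough illustration of cellular-automaton pattern generation, here is a minimal sketch. It assumes a one-dimensional hybrid automaton mixing rules 90 and 150 with null (zero) boundaries, a configuration often discussed in the CA-based BIST literature; the abstract does not say which rules, width, or seed the RAPPID CA-BIST used, so the rule vector, seed, and function names here are hypothetical.

```python
# Minimal sketch of a 1-D hybrid rule-90/150 cellular automaton used as a
# test pattern generator. Rule vector, seed, and width are assumptions.

def ca_step(state, rules):
    """Advance the automaton by one step with null (zero) boundary cells."""
    n = len(state)
    nxt = []
    for i in range(n):
        left = state[i - 1] if i > 0 else 0
        right = state[i + 1] if i < n - 1 else 0
        bit = left ^ right            # rule 90: XOR of the two neighbours
        if rules[i] == 150:
            bit ^= state[i]           # rule 150 additionally XORs the cell itself
        nxt.append(bit)
    return nxt

def generate_patterns(seed, rules, count):
    """Return `count` successive states, starting with the seed itself."""
    patterns = []
    state = list(seed)
    for _ in range(count):
        patterns.append(tuple(state))
        state = ca_step(state, rules)
    return patterns

rules = [90, 150, 90, 150]            # hypothetical per-cell rule assignment
patterns = generate_patterns([1, 0, 0, 0], rules, 8)

for p in patterns:
    print("".join(str(b) for b in p))
```

Each generated state would be applied to the circuit under test as one input vector; tuning such a generator to a particular design, as the abstract describes, is a separate step that this sketch does not attempt.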


Design Automation Conference | 2002

Coordinated transformations for high-level synthesis of high performance microprocessor blocks

Sumit Gupta; Nick Savoiu; Nikil D. Dutt; Rajesh K. Gupta; Alexandru Nicolau; Timothy Kam; Michael Kishinevsky; Shai Rotem

High performance microprocessor designs are partially characterized by functional blocks consisting of a large number of operations that are packed into very few cycles (often single-cycle) with little or no resource constraints but tight bounds on the cycle time. Extreme parallelization, conditional and speculative execution of operations is essential to meet the processor performance goals. However, this is a tedious task for which classical high-level synthesis (HLS) formulations are inadequate and thus rarely used. In this paper, we present a new methodology for application of HLS targeted to such microprocessor functional blocks that can potentially speed up the design space exploration for microprocessor designs. Our methodology consists of a coordinated set of source-level and fine-grain parallelizing compiler transformations that targets these behavioral descriptions, specifically loop constructs in them and enables efficient chaining of operations and high-level synthesis of the functional blocks. As a case study in understanding the complexity and challenges in the use of HLS, we walk the reader through the detailed design of an instruction length decoder drawn from the Pentium®-family of processors. The chief contribution of this paper is formulation of a domain-specific methodology for application of high-level synthesis techniques to a domain that rarely, if ever, finds use for it.


International Symposium on Advanced Research in Asynchronous Circuits and Systems | 1999

Relative timing

Kenneth S. Stevens; Ran Ginosar; Shai Rotem


International Symposium on Advanced Research in Asynchronous Circuits and Systems | 1999

RAPPID: an asynchronous instruction length decoder

Shai Rotem; Kenneth S. Stevens; Ran Ginosar; Peter A. Beerel; Chris J. Myers; Kenneth Y. Yun; Rakefet Kol; Charles E. Dike; Marly Roncken; Boris Agapiev


Archive | 2015

Method And Apparatus For A Zero Voltage Processor Sleep State

Sanjeev Jahagirdar; Varghese George; John B. Conrad; Robert Milstrey; Stephen A. Fischer; Alon Naveh; Shai Rotem


Archive | 1993

Automatic design verification

Shai Rotem; Ze'ev Shtadler

Collaboration


Dive into Shai Rotem's collaborations.

Top Co-Authors

Ran Ginosar

Technion – Israel Institute of Technology

Kenneth Y. Yun

University of California

Peter A. Beerel

University of Southern California

Rakefet Kol

Technion – Israel Institute of Technology
