Michael A. Schuette | Researchain

Publication

Featured researches published by Michael A. Schuette.

IEEE Transactions on Computers | 1994

Exploiting instruction-level parallelism for integrated control-flow monitoring

Michael A. Schuette; John Paul Shen

Computer architectures are using increased degrees of instruction-level machine parallelism to achieve higher performance, e.g., superpipelined, superscalar and very long instruction word (VLIW) processors. Full utilization of such machine parallelism is difficult to achieve and sustain, resulting in the occurrence of idle resources at run time. This work explores the use of such idle resources for concurrent error detection in processors employing instruction-level machine parallelism. The Multiflow TRACE 14/300 processor, a VLIW machine, is chosen as an experimental vehicle. Experiments indicate that significant idle resources are likely to exist across a wide range of scientific applications for the TRACE 14/300. A methodology is presented for detecting transient control-flow errors, called available resource-driven control-flow monitoring (ARC), whose resource use can be tailored to the existence of idle resources in the processor. Results of applying ARC to the Multiflow TRACE 14/300 processor show that >99% of control-flow errors are detected with negligible performance overhead. These results demonstrate that ARC is highly effective in using the idle resources of a processor to achieve concurrent error detection at a very low cost.

intelligent vehicles symposium | 2005

RSVP II: a next generation automotive vector processor

Silviu Chiricescu; S. Chai; Kent D. Moat; Brian G. Lucas; P. May; J. Norris; Raymond B. Essick; Michael A. Schuette

A large number of sensors (e.g., video, radar, laser, ultrasound) that continuously monitor the environment are finding their way into the average automobile. The algorithms processing the data captured by these sensors are streaming in nature and require a high rate of computation. Due to the characteristics of the automotive environment, this computation has to be delivered under very low energy and cost budgets. The reconfigurable streaming vector processing (RSVP™) architecture is a vector coprocessor architecture which accelerates streaming data processing. This paper presents the RSVP architecture and its second implementation, RSVP II. Our results show significant speedups on data streaming functions running compiled code. On a lane tracking application, RSVP II shows impressive performance results. From a performance/

The Journal of Supercomputing | 1993

Instruction-level experimental evaluation of the multiflow TRACE 14/300 VLIW computer

Michael A. Schuette; John Paul Shen

and performance/mW perspective, RSVP architecture compares favorably with leading DSP architectures. The time to market is substantially reduced due to ease of programmability, elimination of hand-tuned assembly code, and support for software re-use through binary compatibility across multiple implementations.

international symposium on microarchitecture | 1991

An instruction-level performance analysis of the Multiflow TRACE 14/300

Michael A. Schuette; John Paul Shen

Advances in compiler technology have recently led to the introduction of the architectural paradigm known as the very long instruction word (VLIW) architecture. The Multiflow Trace series of processors is the first commercial line of processors with this architecture. This article presents experimental results concerning the performance and resource utilization of the TRACE 14/300 on a set of 11 common scientific programs written in both C and FORTRAN. Several characteristics of the application, architecture, implementation, and compiler that contribute to the observed results are identified. These characteristics include a conservative approach by the compiler in determining the existence of data dependence and disambiguating memory references, memory latency and resource dependences resulting from the TRACE 14/300 implementation, and actual data dependences that exist within the code. Alleviating the effects of the first three of these bottlenecks is found to improve the TRACE 14/300 performance by a factor of 1.55 on average. Performance of the TRACE 14/300 is also measured on several standard benchmarks, including the SPEC89 benchmark suite. Performance on the SPEC89 benchmarks is found to be comparable to the superscalar IBM RS/6000 when differences in implementation technology are considered. Concluding remarks concerning instruction-level parallel processing are also presented.

international conference on computer aided design | 1994

Embedded systems design for low energy consumption

Michael A. Schuette; John R. Barr

Advances in compiler technology have recently led to the introduction of a new architectural paradigm, called the Very Long Instruction Word (VLIW) architecture. The Multiflow TRACE series of processors is the first commercial line of processors with this architecture. Information on the performance of the TRACE is of significant value to the design of all processors intended to exploit fine-grain parallelism. This paper presents results concerning the performance and resource utilization of the TRACE 14/300 on a set of 11 common scientific programs written in both C and FORTRAN. Several characteristics of the application, architecture, implementation, and compiler that contribute to the observed results are identified. Performance of the TRACE 14/300 is also measured on several standard benchmarks, including the SPEC benchmark suite. Comparisons are made with results from other processors. The architectural effectiveness of the TRACE 14/300 appears to be better than most existing RISC workstations and is comparable to the best current superscalar workstations.

ieee intelligent vehicles symposium | 2004

RSVP™: an automotive vector processor

Silviu Chiricescu; Michael A. Schuette; Raymond B. Essick; Brian G. Lucas; P. May; Kent D. Moat; J. Norris

This tutorial covers the fundamentals of CMOS circuits that contribute to the consumption of energy in portable products, as well as guidelines for the design of systems in order to reduce energy consumption and prolong battery life. Circuit fundamentals will include a definition of terms, basic circuit elements, laws of operation, and basic circuit theory applying to energy consumption. We will then present three major principles of energy reduction: reducing the number of transitions, reducing the amount of switched capacitance, and reducing the operating voltage. Several guidelines that can be applied during the system design process, and that utilize the three major principles, are also presented.
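
The three principles named in the abstract above line up with the three factors in the usual first-order model of CMOS switching energy. The relation below is a textbook approximation, not a formula taken from the tutorial itself; the symbols N_trans (number of transitions), C_sw (capacitance switched per transition), and V_DD (operating voltage) are names chosen here for illustration.

```latex
E_{\text{dyn}} \;\propto\; N_{\text{trans}} \, C_{\text{sw}} \, V_{DD}^{2}
```

Because the voltage enters quadratically, it is the strongest lever of the three. As a hypothetical example, lowering V_DD from 3.3 V to 1.8 V with the other two factors unchanged scales switching energy by (1.8/3.3)^2, or roughly 0.30.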

international symposium on communications and information technologies | 2007

Techniques to improve motion compensation performance of H264 video decoder using a vector processor

Vijaya Yajnanarayana; Raghavan Subramaniyan; Michael A. Schuette

A myriad of sensors (e.g., video, radar, laser, ultrasound) continuously monitoring the environment are incorporated in future automobiles. The algorithms processing the data captured by these sensors are streaming in nature and require high levels of processing power. Due to the characteristics of the automotive market, this processing power has to be delivered under very low energy and cost budgets. The Reconfigurable Streaming Vector Processing (RSVP™) is a vector coprocessor architecture which accelerates streaming data processing. This paper presents the RSVP architecture, programming model, and a first implementation. Our results show significant speedups on data streaming functions. Running compiled code, RSVP outperforms an ARM9 host processor on average by a factor of 31 on a set of kernels. From a performance/

international symposium on microarchitecture | 2003

The Reconfigurable Streaming Vector Processor (RSVP™)

Silviu Ciricescu; Ray Essick; Brian Lucas; P. May; Kent D. Moat; J. Norris; Michael A. Schuette; Ali Saidi

and performance/mW perspective, RSVP compares favorably with leading DSP architectures. The time to market is substantially reduced due to ease of programmability, elimination of hand-tuned assembly code, and support for software re-use through binary compatibility across multiple implementations.

international test conference | 1983

On-Line Self-Monitoring Using Signatured Instruction Streams.

Michael A. Schuette; John Paul Shen

Motion compensation for video decoding in standards like H.264 requires a significant amount of computation. This is primarily because of the H.264 six-tap FIR filtering used for sub-sample computation. These algorithms typically take more than 50% of the computational time on a RISC processor like ARM. The novel algorithms proposed in this paper can be employed in systems which use vector processors as video decode accelerators to accelerate this process. The proposed algorithms are implemented on an H.264 video decode system with an ARM9 host processor and an RSVP vector processor as an accelerator for key decode algorithms. By employing the proposed algorithms, we were able to accelerate the motion compensation module by more than 4 times compared to a plain RISC implementation. This is achieved by efficiently vectorizing the data on which the FIR-filtering and reconstruction algorithms operate, together with an optimal representation of the FIR-filtering and reconstruction algorithms themselves on the vector processor.
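
For readers unfamiliar with the six-tap filter the abstract above refers to, the following is a minimal scalar sketch of H.264 half-sample luma interpolation along one row. It uses the tap weights (1, -5, 20, 20, -5, 1) with rounding and a shift by 5, as defined in the H.264 standard. It is a plain reference loop, not the vectorized algorithm proposed in the paper, and the function names and the choice to skip edge padding are assumptions made here for illustration.

```python
# Scalar reference for the H.264 six-tap half-sample luma filter.
# Hypothetical helper names; no picture-edge padding is performed.

TAPS = (1, -5, 20, 20, -5, 1)  # tap weights, which sum to 32


def clip_u8(value):
    """Clamp an integer to the 8-bit sample range 0..255."""
    return max(0, min(255, value))


def half_sample_row(row):
    """Return the half-sample values for one row of integer samples.

    Output element i lies halfway between row[i + 2] and row[i + 3],
    so a row of n samples yields n - 5 interpolated values.
    """
    out = []
    for i in range(len(row) - 5):
        acc = sum(t * p for t, p in zip(TAPS, row[i:i + 6]))
        # Add 16 for rounding, divide by 32 with an arithmetic shift, clamp.
        out.append(clip_u8((acc + 16) >> 5))
    return out


if __name__ == "__main__":
    # A step edge between 0 and 255.
    print(half_sample_row([0, 0, 0, 255, 255, 255]))
```

Each output sample costs six multiply-accumulates plus a round, shift, and clamp, and neighbouring outputs reuse five of the same six inputs, which is the kind of regular data access that lends itself to vector processing.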

field programmable logic and applications | 2002