Publication


Featured research published by Peter Petrov.


Ninth International Symposium on Hardware/Software Codesign. CODES 2001 (IEEE Cat. No.01TH8571) | 2001

Towards effective embedded processors in codesigns: customizable partitioned caches

Peter Petrov; Alex Orailoglu

This paper explores an application-specific customization technique for the data cache, one of the foremost area/power consuming and performance determining microarchitectural features of modern embedded processors. The automated methodology for customizing the processor microarchitecture that we propose results in increased performance, reduced power consumption, and improved determinism of critical system parts, while the fixed design ensures processor standardization. The resulting improvements help to enlarge the significant role of embedded processors in modern hardware/software codesign techniques by leading to increased processor utilization and reduced hardware cost. A novel methodology for static analysis and a field-reprogrammable implementation of a customizable cache controller that implements a partitioned cache structure are proposed. The simulation results show a significant decrease in miss ratio compared to traditional cache organizations.


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2001

Performance and power effectiveness in embedded processors - customizable partitioned caches

Peter Petrov; Alex Orailoglu

This paper explores an application-specific customization technique for the data cache, one of the foremost area/power consuming and performance determining microarchitectural features of modern embedded processors. The automated methodology for customizing the processor microarchitecture that we propose results in increased performance, reduced power consumption, and improved determinism of critical system parts, while the fixed design ensures processor standardization. The resulting improvements help to enlarge the significant role of embedded processors in modern hardware/software codesign techniques by leading to increased processor utilization and reduced hardware cost. A novel methodology for static analysis and a microarchitecturally field-reprogrammable implementation of a customizable cache controller that implements a partitioned cache structure are proposed. Partitioning the load/store instructions eliminates cache interference; hence, precise knowledge about the hit/miss behavior of the references within each partition becomes available, resulting in a significant reduction in tag reads and comparisons. Moreover, eliminating cache interference naturally leads to a significant reduction in the miss rate. The paper presents an algorithm for defining cache partitions, hardware support for customizable cache partitions, and a set of experimental results. The experimental results indicate significant improvements in both power consumption and miss rate.
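The interference effect described in this abstract can be illustrated with a toy model. The sketch below is not the paper's partitioning algorithm; it assumes a 4-line direct-mapped cache, two hypothetical reference streams "A" and "B", and a hand-picked partition assignment, purely to show why separating streams removes conflict misses.

```python
def count_misses(accesses, num_lines, line_of):
    """Count misses for a direct-mapped cache whose line choice is given by line_of."""
    cache = [None] * num_lines          # block number currently held by each line
    misses = 0
    for stream, block in accesses:
        line = line_of(stream, block)
        if cache[line] != block:        # miss: fetch the block into this line
            cache[line] = block
            misses += 1
    return misses

# Two streams that reuse their own blocks but collide in a shared cache
# (blocks 0 and 4 map to line 0, blocks 1 and 5 map to line 1).
accesses = [("A", 0), ("B", 4), ("A", 1), ("B", 5)] * 10

def shared(stream, block):
    """Conventional mapping: all streams share all 4 lines."""
    return block % 4

partition_base = {"A": 0, "B": 2}       # hypothetical assignment: 2 lines per stream

def partitioned(stream, block):
    """Partitioned mapping: each stream indexes only its own 2 lines."""
    return partition_base[stream] + block % 2

shared_misses = count_misses(accesses, 4, shared)
partitioned_misses = count_misses(accesses, 4, partitioned)
print(shared_misses, partitioned_misses)
```

In this toy trace every access misses in the shared cache, while the partitioned cache incurs only the four cold misses, which is also why hit/miss behavior inside a partition becomes predictable.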


Custom Integrated Circuits Conference | 2001

Low-cost, software-based self-test methodologies for performance faults in processor control subsystems

Sobeeh Almukhaizim; Peter Petrov; Alex Orailoglu

A software-based testing methodology for processor control subsystems, targeting hard-to-test performance faults in high-end embedded and general-purpose processors, is presented. An algorithm for directly controlling, using the instruction-set architecture only, the branch-prediction logic, a representative example of the class of processor control subsystems particularly prone to such performance faults, is outlined. Experimental results confirm the viability of the proposed methodology as a low-cost and effective answer to the problem of hard-to-test performance faults in processor architectures.


International Symposium on Systems Synthesis | 2001

Data cache energy minimizations through programmable tag size matching to the applications

Peter Petrov; Alex Orailoglu

An application-specific customization methodology for minimizing the energy dissipation in the data cache of embedded processors is presented. The data cache subsystem is one of the most power-consuming microarchitectural parts of embedded processors. We target the data cache tag operations and show how an exceedingly small number of tag bits, if any, are needed to compute the miss/hit behavior for the vast majority of load/store instructions executed within application loops. The energy needed to perform the tag reads and comparisons can thus be dramatically reduced. We follow up this conceptual enhancement with a presentation of an efficient, reprogrammable implementation that utilizes application-specific information to apply the suggested energy minimization approach. The experimental results confirm the expected significant decrease in energy dissipation for a set of important numerical kernels.
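A rough way to see why few tag bits matter inside a loop: if the addresses touched by the loop are known, only the tag bits that actually differ among them need to be compared. The sketch below is an illustration under assumed parameters (32-byte lines, 256 sets, two hypothetical 4 KB arrays); the function name and the address layout are not taken from the paper.

```python
def varying_tag_mask(addresses, offset_bits, index_bits):
    """Return a bit mask of tag positions that differ among the given (non-empty) addresses."""
    shift = offset_bits + index_bits
    tags = {addr >> shift for addr in addresses}
    reference = min(tags)
    mask = 0
    for tag in tags:
        mask |= tag ^ reference         # accumulate every bit that ever differs
    return mask

# Assumed loop working set: two 4 KB arrays of 4-byte elements.
a_base, b_base = 0x10000000, 0x10004000
addresses = [a_base + 4 * i for i in range(1024)] + [b_base + 4 * i for i in range(1024)]

mask = varying_tag_mask(addresses, offset_bits=5, index_bits=8)
bits_needed = bin(mask).count("1")
print(bin(mask), bits_needed)
```

For this assumed layout a single tag bit separates the two arrays, so the remaining tag bits carry no information for the loop's references.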


International Conference on Computer Design | 2003

Virtual page tag reduction for low-power TLBs

Peter Petrov; Alex Orailoglu

We present a methodology for a power-optimized, software-controlled translation lookaside buffer (TLB) organization. A highly reduced number of virtual page number (VPN) bits sufficient to perform physical address translation is efficiently identified and used when performing TLB lookups, delivering significant power reductions. Information regarding the virtual address space of the program code and data provided by the compiler is augmented with information regarding the dynamically linked libraries and the data allocated at run time by the loader, the dynamic linker, and the memory manager. The hardware support needed is limited to disabling bitlines of the tag arrays associated with the I-TLB and the D-TLB. Algorithms for identifying the reduced VPNs for power-optimized TLB operations, together with the required OS support, are presented.
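The idea of a reduced VPN can be sketched as a search for the fewest bit positions that still tell apart every page a program uses. The code below is a brute-force illustration only, assuming the complete set of VPNs is known in advance; the function name, the 12-bit VPN width, and the example page numbers are hypothetical, and the paper's own identification algorithms are not reproduced here.

```python
from itertools import combinations

def reduced_vpn_bits(vpns, width):
    """Return the smallest tuple of bit positions that distinguishes every VPN in vpns."""
    unique = set(vpns)
    for k in range(width + 1):                      # try smaller subsets first
        for bits in combinations(range(width), k):
            projected = {tuple((v >> b) & 1 for b in bits) for v in unique}
            if len(projected) == len(unique):       # no two pages collide
                return bits
    return tuple(range(width))

# Assumed working set of virtual pages (12-bit VPNs for brevity).
pages = [0x400, 0x401, 0x402, 0x7F0]
bits = reduced_vpn_bits(pages, width=12)
print(bits)
```

Only the selected bit positions would need active tag bitlines during a lookup; the remaining bitlines could stay disabled, which is where the power saving would come from.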


International Conference on Computer Aided Design | 2003

Compiler-Based Register Name Adjustment for Low-Power Embedded Processors

Peter Petrov; Alex Orailoglu

We present an algorithm for compiler-driven register name adjustment with the main goal of power minimization on instruction fetch and register file access. In most instruction set architecture (ISA) designs, the register fields reside in fixed positions within the instruction encoding, hence forming streams of indices on the instruction bus and to the register file address decoder. The number of bit transitions in these streams greatly determines the power consumption on the address bus and the register file decoder. While general-purpose registers are semantically indistinguishable and hence interchangeable, the particular register indices do have a direct impact on power consumption. The algorithms presented in this paper address this power minimization problem by reassigning/encoding the registers so that the bit transitions within the register index streams are minimized.
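The cost being minimized here, bit transitions in a stream of register indices, is easy to compute, and for a handful of registers the best renaming can even be found by exhaustive search. The sketch below is such a brute-force illustration, not the algorithm from the paper; the function names and the example stream are assumptions.

```python
from itertools import permutations

def bit_transitions(stream):
    """Total number of bit flips between consecutive register indices."""
    return sum(bin(a ^ b).count("1") for a, b in zip(stream, stream[1:]))

def best_renaming(stream, num_regs):
    """Exhaustively search register permutations (feasible only for tiny num_regs)."""
    best_map, best_cost = None, None
    for perm in permutations(range(num_regs)):
        cost = bit_transitions([perm[r] for r in stream])   # perm[r] is the new name of r
        if best_cost is None or cost < best_cost:
            best_map, best_cost = perm, cost
    return best_map, best_cost

# Assumed stream of register indices seen in one instruction field.
stream = [0, 3, 0, 3, 1, 2, 1, 2]
original_cost = bit_transitions(stream)
mapping, optimized_cost = best_renaming(stream, num_regs=4)
print(original_cost, optimized_cost, mapping)
```

Because the registers are interchangeable, renaming leaves program semantics untouched while cutting the transitions in this example from 13 to 7. A practical compiler pass would need a heuristic, since the number of permutations grows factorially with the register count.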


Digital Systems Design | 2003

Low-power branch target buffer for application-specific embedded processors

Peter Petrov; Alex Orailoglu

In this paper, we present a methodology for a low-power branch identification mechanism, which enables the design of extremely power-efficient branch predictors for embedded processors. The proposed technique utilizes application-specific information regarding the control-flow structure of the program's major loops. Such information is used to completely eliminate the power-hungry branch target buffer (BTB) lookups, which normally occur at every execution cycle. Exact application knowledge regarding the control-flow structure of the program obviates the power-expensive BTB operations, thus enabling the utilization of contemporary branch predictors in high-end, yet power-sensitive embedded processors. The utilization of exact application knowledge results not only in the complete elimination of the power-hungry BTB structure but also in perfect branch and target address identification. A cost-efficient and programmable hardware architecture for capturing the control-flow structure of the program is then presented. The hardware complexity of the proposed architecture is carefully analyzed in terms of power, performance, and area overhead. The proposed technique delivers power reductions in excess of 90% for a set of embedded benchmarks.


Asian Test Symposium | 2001

Faults in processor control subsystems: testing correctness and performance faults in the data prefetching unit

Sobeeh Almukhaizim; Peter Petrov; Alex Orailoglu

The processor control subsystems have for a long time been recognized as a bottleneck in the process of achieving complete fault coverage through various functional test propagation approaches. The difficult-to-test corner cases are further accentuated in fault-resilient control subsystems as no functional effect is incurred as a result of the fault, even though performance suffers. We investigate the construction of software programs, capable of providing full fault coverage at minimal hardware cost, for one such fault resilient subsystem in processor architecture: the data prefetching unit. Experimental results confirm the efficacy of the proposed method.


International Symposium on Microarchitecture | 2004

Transforming binary code for low-power embedded processors

Peter Petrov; Alex Orailoglu

Two program code transformation methodologies reduce the power consumption of instruction communication buses in embedded processors. Aimed at deep-submicron process technologies, these techniques offer an efficient solution for applications in which low power consumption is the key quality factor. We have developed two techniques for power minimization on the instruction bus of embedded processors. The first is compiler-driven register name adjustment (RNA), with the main goal of power minimization on instruction fetch and register file access. The second technique, more general in nature, incorporates transformations into the binary program code and necessitates hardware support on the processor side to efficiently restore the power-optimized program code.


International Symposium on Systems Synthesis | 2002

Low-power data memory communication for application-specific embedded processors

Peter Petrov; Alex Orailoglu

We propose a novel customization methodology for power reduction on the communication link between an embedded processor and its data memory. We target the address bus and show how, by utilizing application information about the memory references in the data-intensive program loops, a power-efficient address communication protocol can be established between the processor core and the data memory. The data memory controller thus generates the addresses for the various data streams with minimal run-time information from the processor engine, achieving significant power reductions on the address bus. Efficient reprogrammable hardware support is presented for enabling the proposed methodology. The experimental results demonstrate the efficacy of the approach for a set of data-intensive applications.
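The abstract does not spell out the protocol, so the following is only a sketch of the general idea under a simplifying assumption: a loop walks an array with a constant stride, the processor programs a memory-side generator once with a base and a stride, and the generator then produces each address locally instead of receiving it over the address bus. The class and function names are hypothetical.

```python
def conventional_bus_words(base, stride, count):
    """Conventional scheme: every address is driven on the address bus."""
    return [base + i * stride for i in range(count)]

class StreamAddressGenerator:
    """Hypothetical memory-side generator, programmed once per loop."""

    def __init__(self, base, stride):
        self.next_addr = base
        self.stride = stride

    def next_address(self):
        """Produce the next address of the stream without any bus traffic."""
        addr = self.next_addr
        self.next_addr += self.stride
        return addr

base, stride, count = 0x2000, 8, 16
sent_conventional = conventional_bus_words(base, stride, count)

generator = StreamAddressGenerator(base, stride)    # only base and stride cross the bus
generated = [generator.next_address() for _ in range(count)]
print(generated == sent_conventional)
```

In this simplified model the memory sees the same 16 addresses either way, but the second scheme transfers two setup values instead of 16 full addresses.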

Collaboration


Dive into Peter Petrov's collaborations.

Top Co-Authors

Alex Orailoglu

University of California
