Publication


Featured research published by Roman L. Lysecky.


design automation conference | 2003

Dynamic hardware/software partitioning: a first approach

Greg Stitt; Roman L. Lysecky; Frank Vahid

Partitioning an application among software running on a microprocessor and a hardware co-processor in on-chip configurable logic has been shown to improve performance and energy consumption in embedded systems. Meanwhile, dynamic software optimization methods have shown the usefulness and feasibility of runtime program optimization, but those optimizations do not achieve as much as partitioning. We introduce a first approach to dynamic hardware/software partitioning. We describe our system architecture and initial on-chip tools, including profiler, decompiler, synthesis, and placement and routing tools for a simplified configurable logic fabric, able to perform dynamic partitioning of real benchmarks. We show speedups averaging 2.6x for five benchmarks taken from Powerstone, NetBench, and our own benchmarks.
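
For orientation only, the sketch below mimics the flow the abstract lists: profile the running binary, pick the hottest loop, then decompile, synthesize, and place and route it for a simplified configurable logic fabric. All function names and numbers are hypothetical stand-ins, not the paper's actual on-chip tools.

```python
# Hypothetical sketch of the dynamic hardware/software partitioning flow:
# profile -> pick hot loop -> decompile -> synthesize -> place & route.
# Every name and number here is an illustrative stand-in.

def profile(binary):
    # On-chip profiler: find loops and the fraction of runtime spent in each.
    return [("loop_a", 0.60), ("loop_b", 0.15)]

def decompile(binary, loop_id):
    # Recover a high-level representation of the loop from the binary.
    return {"loop": loop_id, "ops": ["load", "mac", "store"]}

def synthesize(cdfg):
    # Map the recovered loop onto the simplified configurable logic fabric.
    return {"luts": 120, "nets": 85, "source": cdfg["loop"]}

def place_and_route(netlist):
    # Lean on-chip place & route producing a configuration bitstream.
    return b"\x00" * netlist["luts"]

def dynamic_partition(binary):
    loops = profile(binary)
    hot_loop, fraction = max(loops, key=lambda x: x[1])
    bitstream = place_and_route(synthesize(decompile(binary, hot_loop)))
    # The software loop would then be replaced by a call to the hardware
    # implementation; here we just report what was partitioned.
    return hot_loop, fraction, len(bitstream)

if __name__ == "__main__":
    print(dynamic_partition(b"application binary"))
```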


design automation conference | 2006

Warp Processors

Roman L. Lysecky; Greg Stitt; Frank Vahid

We describe a new processing architecture, known as a warp processor, that utilizes a field-programmable gate array (FPGA) to improve the speed and energy consumption of a software binary executing on a microprocessor. Unlike previous approaches that also improve software using an FPGA but do so using a special compiler, a warp processor achieves these improvements completely transparently and operates from a standard binary. A warp processor dynamically detects the binary's critical regions, reimplements those regions as a custom hardware circuit in the FPGA, and replaces the software region by a call to the new hardware implementation of that region. While not all benchmarks can be improved using warp processing, many can, and the improvements are dramatically better than those achievable by more traditional architecture improvements. The hardest part of warp processing is dynamically reimplementing code regions on an FPGA, requiring partitioning, decompilation, synthesis, placement, and routing tools, all having to execute with minimal computation time and data memory so as to coexist on chip with the main processor. We describe the results of developing our warp processor. We developed a custom FPGA fabric specifically designed to enable lean place and route tools, and we developed extremely fast and efficient versions of partitioning, decompilation, synthesis, technology mapping, placement, and routing. Warp processors achieve overall application speedups of 6.3X with energy savings of 66% across a set of embedded benchmark applications. We further show that our tools utilize acceptably small amounts of computation and memory, far less than traditional tools require. Our work illustrates the feasibility and potential of warp processing, and we foresee the possibility of warp processing becoming a feature in a variety of computing domains, including desktop, server, and embedded applications.
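
As a rough aid to intuition (with invented numbers, not the paper's measurements), the whole-application gains reported for warp processing follow Amdahl-style accounting: the runtime fraction spent in warped kernels and their hardware speedup bound the overall speedup, and energy falls roughly with execution time when average power stays comparable.

```python
# Amdahl-style accounting for warp processing, with made-up numbers.
# If 90% of the runtime sits in kernels that the FPGA runs 10x faster,
# the whole application speeds up by about 5.3x.

kernel_fraction = 0.90   # hypothetical share of runtime in warped kernels
kernel_speedup = 10.0    # hypothetical speedup of those kernels on the FPGA

overall_speedup = 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)

# If average power with the FPGA active stays close to software-only power,
# energy scales roughly with execution time.
relative_power = 1.1     # hypothetical: FPGA adds ~10% average power
energy_saving = 1.0 - relative_power / overall_speedup

print(f"overall speedup ~ {overall_speedup:.1f}x")   # ~5.3x
print(f"energy saving  ~ {energy_saving:.0%}")        # ~79%
```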


design, automation, and test in europe | 2005

A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning

Roman L. Lysecky; Frank Vahid

Field-programmable gate arrays (FPGAs) provide designers with the ability to create hardware circuits quickly. Increases in FPGA configurable logic capacity and decreasing FPGA costs have enabled designers to incorporate FPGAs more readily in their designs. FPGA vendors have begun providing configurable soft processor cores that can be synthesized onto their FPGA products. While FPGAs with soft processor cores provide designers with increased flexibility, such processors typically exhibit worse performance and energy consumption than hard-core processors. Previously, we proposed warp processing, a technique capable of optimizing a software application by dynamically and transparently re-implementing critical software kernels as custom circuits in on-chip configurable logic. We now study the potential of a MicroBlaze soft-core based warp processing system to eliminate the performance and energy overhead of a soft-core processor compared to a hard-core processor. We demonstrate that the soft-core based warp processor achieves average speedups of 5.8x and energy reductions of 57% compared to the soft core alone. Our data shows that a soft-core based warp processor yields performance and energy consumption competitive with existing hard-core processors, thus expanding the usefulness of soft processor cores on FPGAs to a broader range of applications.


IEEE Computer | 2008

Warp Processing: Dynamic Translation of Binaries to FPGA Circuits

Frank Vahid; Greg Stitt; Roman L. Lysecky

Warp processing dynamically and transparently transforms an executing microprocessor's binary kernels into customized field-programmable gate array (FPGA) circuits, commonly resulting in 2X to 100X speedups over executing on the microprocessor alone. A new architecture and set of dynamic CAD tools demonstrate warp processing's potential.


design, automation, and test in europe | 2004

A self-tuning cache architecture for embedded systems

Chuanjun Zhang; Frank Vahid; Roman L. Lysecky

Memory accesses can account for about half of a microprocessor system's power consumption. Customizing a microprocessor cache's total size, line size, and associativity to a particular program is well known to have tremendous benefits for performance and power. Customizing caches has until recently been restricted to core-based flows, in which a new chip will be fabricated. However, several configurable cache architectures have been proposed recently for use in pre-fabricated microprocessor platforms. Tuning those caches to a program is still, however, a cumbersome task left to designers, assisted in part by recent computer-aided design (CAD) tuning aids. We propose to move that CAD on-chip, which can greatly increase the acceptance of configurable caches. We introduce on-chip hardware implementing an efficient cache tuning heuristic that can automatically, transparently, and dynamically tune the cache to an executing program. We carefully designed the heuristic to avoid any cache flushing, since flushing is power and performance costly. By simulating numerous Powerstone and MediaBench benchmarks, we show that such a dynamic self-tuning cache can reduce memory-access energy by 45% to 55% on average, and by as much as 97%, compared with a four-way set-associative base cache, completely transparently to the programmer.
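
The abstract does not reproduce the heuristic itself; the following is a minimal sketch, assuming a one-parameter-at-a-time greedy search over total size, line size, and associativity guided by a measured energy figure. The search order and the energy model are illustrative assumptions, not the paper's exact heuristic.

```python
# Hypothetical sketch of an on-chip cache-tuning heuristic: adjust one
# parameter at a time (total size, then line size, then associativity),
# keeping a change only if measured memory-access energy drops.

SIZES = [2048, 4096, 8192]   # bytes
LINE_SIZES = [16, 32, 64]    # bytes
WAYS = [1, 2, 4]

def measure_energy(size, line, ways):
    # Stand-in for running the program and reading miss/energy counters;
    # a real self-tuning cache measures this in hardware.
    misses = 1e6 * (8192 / size) * (32 / line)
    return misses * 50 + 1e7 * ways   # fetch + per-access cost, arbitrary units

def tune():
    best = (SIZES[0], LINE_SIZES[0], WAYS[0])
    best_e = measure_energy(*best)
    for dim, options in enumerate((SIZES, LINE_SIZES, WAYS)):
        for value in options:
            cand = list(best)
            cand[dim] = value
            e = measure_energy(*cand)
            if e < best_e:
                best, best_e = tuple(cand), e
    return best, best_e

if __name__ == "__main__":
    print(tune())
```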


design, automation, and test in europe | 2004

A configurable logic architecture for dynamic hardware/software partitioning

Roman L. Lysecky; Frank Vahid

In previous work, we showed the benefits and feasibility of having a processor dynamically partition its executing software such that critical software kernels are transparently partitioned to execute as a hardware coprocessor on configurable logic - an approach we call warp processing. The configurable logic place and route step is the most computationally intensive part of such hardware/software partitioning, normally running for many minutes or hours on powerful desktop processors. In contrast, dynamic partitioning requires place and route to execute in just seconds and on a lean embedded processor. We have therefore designed a configurable logic architecture specifically for dynamic hardware/software partitioning. Through experiments with popular benchmarks, we show that by specifically focusing on the goal of software kernel speedup when designing the FPGA architecture, rather than on the more general goal of ASIC prototyping, we can perform place and route for our architecture 50 times faster, using 10,000 times less data memory, and 1,000 times less code memory, than popular commercial tools mapping to commercial configurable logic. Yet, we show that we obtain speedups (2x on average, and as much as 4x) and energy savings (33% on average, and up to 74%) when partitioning even just one loop, which are comparable to commercial tools and fabrics. Thus, our configurable logic architecture represents a good candidate for platforms that will support dynamic hardware/software partitioning, and enables ultra-fast desktop tools for hardware/software partitioning, and even for fast configurable logic design in general.
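
To illustrate why a fabric designed for lean tools matters, the toy placer below does a single greedy pass over a small grid, placing the most-connected logic elements first and choosing the free slot closest to already-placed neighbors. It is a hypothetical example of the kind of lightweight algorithm such a fabric enables, not the paper's actual place-and-route tool.

```python
# Toy greedy placer for a small grid fabric. With a simple, regular fabric,
# placement can be a fast greedy pass instead of hours of annealing.
# Entirely hypothetical; not the paper's algorithm.

GRID = 4  # 4x4 array of logic elements

# Netlist as adjacency: each LUT lists the LUTs it connects to.
NETLIST = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}

def greedy_place(netlist):
    placement = {}
    free = [(x, y) for x in range(GRID) for y in range(GRID)]
    # Place the most-connected LUTs first.
    for lut in sorted(netlist, key=lambda l: -len(netlist[l])):
        placed_neighbors = [placement[n] for n in netlist[lut] if n in placement]
        def cost(slot):
            # Total Manhattan distance to already-placed neighbors.
            return sum(abs(slot[0] - px) + abs(slot[1] - py)
                       for px, py in placed_neighbors)
        best = min(free, key=cost)
        placement[lut] = best
        free.remove(best)
    return placement

if __name__ == "__main__":
    print(greedy_place(NETLIST))
```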


ACM Transactions in Embedded Computing Systems | 2009

Design and implementation of a MicroBlaze-based warp processor

Roman L. Lysecky; Frank Vahid

While soft processor cores provided by FPGA vendors offer designers increased flexibility, such processors typically incur penalties in performance and energy consumption compared to hard processor core alternatives. The recently developed technology of warp processing can help reduce those penalties. Warp processing is the dynamic and transparent transformation of critical software regions from microprocessor execution to much faster circuit execution on an FPGA. In this article, we describe an implementation of a warp processor on Xilinx Virtex-II Pro and Spartan-3 FPGAs incorporating one or more MicroBlaze soft processor cores. We further provide a detailed analysis of the energy overhead of dynamically partitioning an application's kernels to hardware executing within an FPGA. Considering an implementation that periodically partitions the executing application once every minute, a MicroBlaze-based warp processor implemented on a Spartan-3 FPGA achieves average speedups of 5.8x and energy reductions of 49% compared to the MicroBlaze soft processor core alone, providing competitive performance and energy consumption compared to existing hard processor cores.
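
A back-of-the-envelope amortization (with invented numbers; the article reports measured values) shows why partitioning once per minute adds little overhead: the energy of one on-chip CAD pass is small next to a minute of application execution, so the net saving stays close to the saving of the warped application itself.

```python
# Amortizing the partitioning overhead over one partitioning interval.
# All figures below are hypothetical, chosen only to illustrate the accounting.

partition_interval_s = 60.0     # partition once every minute, as in the article
cad_energy_j = 2.0              # hypothetical energy of one on-chip CAD pass
baseline_power_w = 0.5          # hypothetical soft-core power
warped_energy_fraction = 0.45   # hypothetical: warped app uses 45% of baseline energy

baseline_energy = baseline_power_w * partition_interval_s           # 30 J per interval
warped_energy = baseline_energy * warped_energy_fraction + cad_energy_j
net_reduction = 1.0 - warped_energy / baseline_energy

print(f"net energy reduction per interval ~ {net_reduction:.0%}")   # ~48%
```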


international conference on computer aided design | 2006

Application-specific customization of parameterized FPGA soft-core processors

David Sheldon; Rakesh Kumar; Roman L. Lysecky; Frank Vahid; Dean M. Tullsen

Soft-core microprocessors mapped onto field-programmable gate arrays (FPGAs) represent an increasingly common embedded software implementation option. Modern FPGA soft-cores are parameterized to support application-specific customization, wherein pre-defined units, such as a multiplication unit or floating-point unit, may be included in the microprocessor architecture to speed up software execution at the expense of increased size. We introduce a methodology for fast application-specific customization of a parameterized FPGA soft core, using synthesis and execution to obtain size and performance data, in order to create a tool that can be used across a variety of tool platforms and FPGA devices. As synthesizing a soft core takes tens of minutes, developing heuristics that execute in an acceptable time of an hour or two, yet find near-optimal results, is a challenge. We consider two approaches: one uses a traditional CAD approach that performs an initial characterization using synthesis to create an abstract problem model and then explores the solution space using a knapsack algorithm; the other uses a synthesis-in-the-loop exploration approach. We compare the approaches for a variety of design constraints, on 11 EEMBC benchmarks, using an actual Xilinx soft-core processor, and for two different commercial Xilinx FPGA devices. Our results show that the approaches can generate a customized configuration exhibiting roughly 2x speedups over a base soft core, reaching within 4% of optimal in about 1.5 hours, including complete synthesis of the soft core onto the FPGA, compared to over 11 hours for exhaustive search. Our results also show that including synthesis in the loop, compared to a traditional CAD approach, improved speedups by an average of 20% when size constraints were tight. The approaches may also be applicable to soft-core processors targeted to ASICs in addition to FPGAs.
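
The selection step of the traditional CAD approach can be pictured as a 0/1 knapsack: each optional unit carries an area cost from the initial synthesis characterization and an estimated benefit for the application, and the tool picks the subset that fits the area budget. The sketch below brute-forces that selection for a tiny invented instance; unit names and numbers are illustrative, not from the paper.

```python
# Tiny 0/1 selection over optional soft-core units. For a handful of units,
# brute force stands in for the knapsack step; all costs and benefits are invented.

from itertools import combinations

UNITS = {                       # name: (area in LUTs, estimated cycles saved)
    "multiplier":   (250, 4.0e6),
    "barrel_shift": (180, 1.5e6),
    "divider":      (400, 0.8e6),
    "fpu":          (900, 6.0e6),
}
AREA_BUDGET = 1200              # hypothetical LUTs available beyond the base core

def best_configuration(units, budget):
    best, best_saved = (), 0.0
    for r in range(len(units) + 1):
        for combo in combinations(units, r):
            area = sum(units[u][0] for u in combo)
            saved = sum(units[u][1] for u in combo)
            if area <= budget and saved > best_saved:
                best, best_saved = combo, saved
    return best, best_saved

if __name__ == "__main__":
    print(best_configuration(UNITS, AREA_BUDGET))   # ('multiplier', 'fpu'), 1.0e7
```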


Communications of The ACM | 2015

Security challenges for medical devices

Johannes Sametinger; Jerzy W. Rozenblit; Roman L. Lysecky; Peter Ott

Implantable devices, often dependent on software, save countless lives. But how secure are they?


design automation conference | 2003

On-chip logic minimization

Roman L. Lysecky; Frank Vahid

While Boolean logic minimization is typically used in logic synthesis, logic minimization can be useful in numerous other applications. However, many of those applications, such as Internet Protocol routing table and network access control list reduction, require logic minimization during the application's runtime, and hence could benefit from minimization executing on-chip alongside the application. On-chip minimization can even enable dynamic hardware/software partitioning. We discuss requirements of on-chip logic minimization, and present our new on-chip logic minimization tool, ROCM. We compare with the well-known Espresso logic minimizer and show that ROCM is 10 times smaller, executes 10-20 times faster, and uses 3 times less data memory, with a mere 2% quality penalty, for the routing table and access control list applications. We show that ROCM solves real-sized problems on an ARM7 embedded processor in just seconds.
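
For a flavor of the kind of operation a two-level minimizer performs, the toy pass below merges cubes that differ in exactly one bit, the same step that collapses adjacent routing-table or access-control-list entries. It is a didactic sketch, not ROCM's actual algorithm.

```python
# Toy two-level minimization step: merge cubes (terms over {'0','1','-'})
# that differ in exactly one specified bit. Didactic sketch only.

def merge(a: str, b: str):
    # Return the combined cube if a and b differ in exactly one non-'-' bit.
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and '-' not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + '-' + a[i + 1:]
    return None

def one_pass(cubes):
    cubes = list(dict.fromkeys(cubes))          # drop duplicates, keep order
    merged, used = [], set()
    for i, a in enumerate(cubes):
        for j in range(i + 1, len(cubes)):
            m = merge(a, cubes[j])
            if m:
                merged.append(m)
                used.update((i, j))
    survivors = [c for k, c in enumerate(cubes) if k not in used]
    return list(dict.fromkeys(survivors + merged))

if __name__ == "__main__":
    # Four ACL-like entries over 4 bits collapse to the single cube "1-0-".
    rules = ["1000", "1001", "1100", "1101"]
    print(one_pass(one_pass(rules)))            # ['1-0-']
```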

Collaboration


Dive into Roman L. Lysecky's collaboration.

Top Co-Authors


Frank Vahid

University of California
