Publications


Featured research published by Oskar Mencer.


IEEE Computer | 1997

Seeking solutions in configurable computing

William H. Mangione-Smith; Brad L. Hutchings; David L. Andrews; André DeHon; Carl Ebeling; Reiner W. Hartenstein; Oskar Mencer; John Morris; Krishna V. Palem; Viktor K. Prasanna; Henk A. E. Spaanenburg

Configurable computing offers the potential of producing powerful new computing systems. Will current research overcome the dearth of commercial applicability to make such systems a reality? Unfortunately, no system to date has proven attractive or competitive enough to establish a commercial presence. We believe that ample opportunity exists for work in a broad range of areas. In particular, the configurable computing community should focus on refining the emerging architectures, producing more effective software/hardware APIs, building better application-development tools that incorporate models of hardware reconfiguration, and devising effective benchmarking strategies.


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2006

Accuracy-Guaranteed Bit-Width Optimization

Dong-U Lee; Altaf Abdul Gaffar; Ray C. C. Cheung; Oskar Mencer; Wayne Luk; George A. Constantinides

An automated static approach for optimizing bit widths of fixed-point feedforward designs with guaranteed accuracy, called MiniBit, is presented. Methods to minimize both the integer and fraction parts of fixed-point signals with the aim of minimizing the circuit area are described. For range analysis, the technique in this paper identifies the number of integer bits necessary to meet range requirements. For precision analysis, a semianalytical approach with analytical error models in conjunction with adaptive simulated annealing is employed to optimize the number of fraction bits. The analytical models make it possible to guarantee overflow/underflow protection and numerical accuracy for all inputs over the user-specified input intervals. Using a stream compiler for field-programmable gate arrays (FPGAs), the approach in this paper is demonstrated with polynomial approximation, RGB-to-YCbCr conversion, matrix multiplication, B-splines, and discrete cosine transform placed and routed on a Xilinx Virtex-4 FPGA. Improvements for a given design reduce the area and the latency by up to 26% and 12%, respectively, over a design using optimum uniform fraction bit widths. Studies show that MiniBit-optimized designs are within 1% of the area produced from the integer linear programming approach.
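
As a rough illustration of the range-analysis step described above, the sketch below computes how many integer bits (including the sign bit) a signed fixed-point signal needs to cover a known value range without overflow. This is a minimal, hypothetical C++ snippet, not the MiniBit tool; the precision analysis with analytical error models and adaptive simulated annealing is not shown.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// Minimal sketch of range analysis: given the interval [lo, hi] that a
// signal can take, return the number of integer bits (sign bit included)
// that a signed fixed-point representation needs to avoid overflow.
int integerBits(double lo, double hi) {
    double mag = std::max(std::fabs(lo), std::fabs(hi));
    if (mag < 1.0) return 1;  // only the sign bit is needed
    return static_cast<int>(std::ceil(std::log2(mag + 1.0))) + 1;
}

int main() {
    // Example: an intermediate signal known to stay within [-3.2, 100.7].
    std::printf("integer bits: %d\n", integerBits(-3.2, 100.7));  // prints 8
    return 0;
}
```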


Field-Programmable Custom Computing Machines | 1998

PAM-Blox: high performance FPGA design for adaptive computing

Oskar Mencer; Martin Morf; Michael J. Flynn

PAM-Blox are object-oriented circuit generators built on top of the PCI Pamette design environment, PamDC. They simplify high-performance FPGA design for adaptive computing by providing a hierarchy of optimized hardware objects described in C++. PAM-Blox consist of two major layers of abstraction: PamBlox are parameterizable simple elements such as counters and adders, with automatic placement of carry chains and support for flexible shapes; PaModules are more complex elements that may instantiate PamBlox, have fixed shapes, and are usually optimized for a specific data width. Examples of PaModules are multipliers, coordinate rotations (CORDICs), and special arithmetic units for encryption. Our approach differs from most other FPGA design tools in two ways. First, the designer has total control over placement at each level of the design hierarchy, which is the key to high-performance FPGA design. Second, the object interface was chosen carefully to encourage code reuse and simplify code sharing between designers. PAM-Blox are intended to be part of an open library that allows design sharing between members of the adaptive computing community.
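
To give a flavour of what a hierarchy of parameterizable hardware objects in C++ can look like, here is a heavily simplified, hypothetical sketch. The class and method names below are invented for illustration and are not the PAM-Blox or PamDC API; where the real generators emit placed FPGA circuits, this sketch only prints a description.

```cpp
#include <iostream>

// Hypothetical sketch of the PamBlox/PaModule idea: simple elements are
// parameterized by bit width and carry an explicit placement, while compound
// modules instantiate them.  Real PAM-Blox objects generate placed circuits
// through PamDC; here emit() only prints what would be built.

// A PamBlox-style simple element.
class SimpleElement {
public:
    SimpleElement(int width, int col, int row)
        : width_(width), col_(col), row_(row) {}
    virtual ~SimpleElement() = default;
    virtual void emit() const = 0;   // "generate" the circuit
protected:
    int width_, col_, row_;          // data width and designer-chosen placement
};

class Adder : public SimpleElement {
public:
    using SimpleElement::SimpleElement;
    void emit() const override {
        std::cout << width_ << "-bit adder with carry chain at ("
                  << col_ << "," << row_ << ")\n";
    }
};

// A PaModule-style compound element with a fixed shape: it instantiates
// simple elements and places them relative to its own origin.
class MultiplyAccumulate {
public:
    MultiplyAccumulate(int width, int col, int row)
        : width_(width), col_(col), row_(row),
          adder_(2 * width, col, row + 1) {}
    void emit() const {
        std::cout << width_ << "x" << width_ << " multiplier at ("
                  << col_ << "," << row_ << ")\n";
        adder_.emit();               // reuse the simple adder element
    }
private:
    int width_, col_, row_;
    Adder adder_;
};

int main() {
    MultiplyAccumulate mac(/*width=*/16, /*col=*/4, /*row=*/0);
    mac.emit();
    return 0;
}
```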


Field-Programmable Custom Computing Machines | 2003

Floating point unit generation and evaluation for FPGAs

Jian Liang; Russell Tessier; Oskar Mencer

Most commercial and academic floating point libraries for FPGAs (field programmable gate arrays) provide only a small fraction of all possible floating point units. In contrast, the floating point unit generation approach outlined in this paper allows for the creation of a vast collection of floating point units with differing throughput, latency, and area characteristics. Given performance requirements, our generation tool automatically chooses the proper implementation algorithm and architecture to create a compliant floating point unit. Our approach is fully integrated into standard C++ using ASC, a stream compiler for FPGAs, and the PAM-Blox II module generation environment. The floating point units created by our approach exhibit a factor of two latency improvement versus commercial FPGA floating point units, while consuming only half of the FPGA logic area.
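
The selection step can be pictured as a search over candidate implementations that meet the stated requirements at minimum cost. The sketch below is hypothetical: the Candidate fields and the example numbers are invented and do not come from the paper's generator.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical sketch of choosing a floating-point-unit variant that meets
// latency and throughput constraints with minimal area; all numbers invented.
struct Candidate {
    std::string name;          // algorithm/architecture combination
    int latencyCycles;
    int resultsPerCycle;
    int areaSlices;
};

const Candidate* choose(const std::vector<Candidate>& candidates,
                        int maxLatency, int minThroughput) {
    const Candidate* best = nullptr;
    for (const auto& c : candidates) {
        if (c.latencyCycles > maxLatency) continue;     // too slow
        if (c.resultsPerCycle < minThroughput) continue; // not enough throughput
        if (!best || c.areaSlices < best->areaSlices) best = &c;
    }
    return best;   // nullptr if no candidate satisfies the requirements
}

int main() {
    std::vector<Candidate> adders = {
        {"combinational", 1, 1, 900},
        {"2-stage pipelined", 2, 1, 650},
        {"iterative", 6, 0, 300},
    };
    if (const Candidate* c = choose(adders, /*maxLatency=*/3, /*minThroughput=*/1))
        std::cout << "selected: " << c->name << "\n";
    return 0;
}
```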


Field-Programmable Custom Computing Machines | 2004

Unifying bit-width optimisation for fixed-point and floating-point designs

Altaf Abdul Gaffar; Oskar Mencer; Wayne Luk

This paper presents a method that offers a uniform treatment for bit-width optimisation of both fixed-point and floating-point designs. Our work utilises automatic differentiation to compute the sensitivities of outputs to the bit-width of the various operands in the design. This sensitivity analysis enables us to explore and compare fixed-point and floating-point implementations for a particular design. As a result, we can automate the selection of the optimal number representation for each variable in a design to optimise area and performance. We implement our method in the BitSize tool targeting reconfigurable architectures, which takes user-defined constraints to direct the optimisation procedure. We illustrate our approach using applications such as ray-tracing and function approximation.
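
One way to picture how a sensitivity feeds back into a bit-width decision is simple first-order error propagation: a rounding error of about 2^-(f+1) in an operand perturbs the output by roughly |dy/dx| times that amount, so the operand's fraction bits f must be large enough for its share of the output error budget. The snippet below is a generic, hypothetical illustration of that reasoning, not the BitSize algorithm.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// Hypothetical first-order model: rounding operand x to f fraction bits
// introduces an error of roughly 2^-(f+1), which moves the output by about
// |dy/dx| * 2^-(f+1).  If each of n operands is allowed an equal share of
// the output error budget, solve for the smallest f that meets that share.
int fractionBits(double sensitivity, double outputErrorBudget, int nOperands) {
    double share = outputErrorBudget / nOperands;
    double f = std::log2(std::fabs(sensitivity) / share) - 1.0;
    return std::max(0, static_cast<int>(std::ceil(f)));
}

int main() {
    // Operand with |dy/dx| = 3.7, output accurate to 2^-16, 4 operands total.
    std::printf("fraction bits: %d\n",
                fractionBits(3.7, std::pow(2.0, -16), 4));  // prints 19
    return 0;
}
```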


Field-Programmable Technology | 2002

Floating-point bitwidth analysis via automatic differentiation

Altaf Abdul Gaffar; Oskar Mencer; Wayne Luk; Peter Y. K. Cheung; Nabeel Shirazi

Automatic bitwidth analysis is a key ingredient for high-level programming of FPGAs and high-level synthesis of VLSI circuits. The objective is to find the minimal number of bits to represent a value in order to minimise the circuit area and to improve the efficiency of the respective arithmetic operations, while satisfying user-defined numerical constraints. We present a novel approach to bitwidth or precision analysis for floating-point designs. The approach involves analysing the dataflow graph representation of a design to see how sensitive the output of a node is to changes in the outputs of other nodes: higher sensitivity requires higher precision and hence more output bits. We automate such sensitivity analysis by a mathematical method called automatic differentiation, which involves differentiating variables in a design with respect to other variables. We illustrate our approach by optimising the bitwidth for two examples, a discrete Fourier transform (DFT) implementation and a finite impulse response (FIR) filter implementation.
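
Forward-mode automatic differentiation can be illustrated with dual numbers: every value carries the derivative of itself with respect to one chosen input, so evaluating the design once also yields the sensitivity of the output to that input. The C++ sketch below is a minimal, hypothetical illustration of the mechanism, not the authors' tool.

```cpp
#include <cmath>
#include <cstdio>

// Forward-mode automatic differentiation with dual numbers:
// val carries the value, dot carries d(val)/d(x) for one chosen input x.
struct Dual {
    double val, dot;
};

Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.dot + b.dot}; }
Dual operator*(Dual a, Dual b) {
    return {a.val * b.val, a.dot * b.val + a.val * b.dot};  // product rule
}
Dual sin(Dual a) { return {std::sin(a.val), std::cos(a.val) * a.dot}; }

int main() {
    // Sensitivity of y = x*x + sin(x) to x at x = 0.5.
    Dual x{0.5, 1.0};                 // dot = 1: differentiate w.r.t. x
    Dual y = x * x + sin(x);
    // A large |dy/dx| means the output is sensitive to x, so x needs more
    // precision bits than a less sensitive operand would.
    std::printf("y = %f, dy/dx = %f\n", y.val, y.dot);
    return 0;
}
```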


IEEE Transactions on Computers | 2005

Optimizing hardware function evaluation

Dong-U Lee; Altaf Abdul Gaffar; Oskar Mencer; Wayne Luk

We present a methodology and an automated system for function evaluation unit generation. Our system selects the best function evaluation hardware for a given function, accuracy requirements, technology mapping, and optimization metrics, such as area, throughput, and latency. Function evaluation f(x) typically consists of range reduction and the actual evaluation on a small convenient interval such as [0, π/2) for sin(x). We investigate the impact of hardware function evaluation with range reduction for a given range and precision of x and f(x) on area and speed. An automated bit-width optimization technique for minimizing the sizes of the operators in the data paths is also proposed. We explore a vast design space for fixed-point sin(x), log(x), and √x accurate to one unit in the last place using MATLAB and ASC, a stream compiler for field-programmable gate arrays (FPGAs). In this study, we implement over 2,000 placed-and-routed FPGA designs, resulting in over 100 million application-specific integrated circuit (ASIC) equivalent gates. We provide optimal function evaluation results for range and precision combinations between 8 and 48 bits.
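
The range-reduction step mentioned above can be sketched in software as folding the argument into [0, π/2) and applying a quadrant correction afterwards. The snippet below is a generic double-precision illustration of that structure only; the paper's evaluators operate on fixed-point hardware data paths.

```cpp
#include <cmath>
#include <cstdio>

// Generic software illustration of range reduction for sin(x): fold x into
// r in [0, pi/2), remember the quadrant, evaluate on the small interval,
// and fix up the sign/phase afterwards.
double sinViaRangeReduction(double x) {
    const double pi = std::acos(-1.0);
    const double halfPi = pi / 2.0;
    long long k = static_cast<long long>(std::floor(x / halfPi));
    double r = x - static_cast<double>(k) * halfPi;   // r in [0, pi/2)
    switch (((k % 4) + 4) % 4) {                      // quadrant of x
        case 0:  return  std::sin(r);
        case 1:  return  std::cos(r);
        case 2:  return -std::sin(r);
        default: return -std::cos(r);
    }
}

int main() {
    double x = 10.0;
    std::printf("reduced: %f  reference: %f\n",
                sinViaRangeReduction(x), std::sin(x));
    return 0;
}
```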


Asilomar Conference on Signals, Systems and Computers | 1999

Fast division algorithm with a small lookup table

Patrick Hung; Hossam A. H. Fahmy; Oskar Mencer; Michael J. Flynn

This paper presents a new division algorithm, which requires two multiplication operations and a single lookup in a small table. The division algorithm takes two steps. The table lookup and the first multiplication are processed concurrently in the first step, and the second multiplication is executed in the next step. This divider uses a single multiplier and a lookup table with 2^m(2m+1) bits to produce 2m-bit results that are guaranteed correct to one ulp. By using a multiplier and a 12.5 KB lookup table, the basic algorithm generates a 24-bit result in two cycles.
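
The two-step structure described above can be reconstructed roughly as follows: normalize the divisor Y to [1, 2), split it as Y = Yh + Yl where Yh is taken from the top m fraction bits, and use 1/Y = (Yh - Yl)/(Yh^2 - Yl^2) ≈ (Yh - Yl)/Yh^2, so X/Y ≈ X*(Yh - Yl) multiplied by a tabulated 1/Yh^2. The exact table contents and fixed-point rounding in the paper may differ; the double-precision C++ sketch below shows only the structure, not the one-ulp guarantee.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Illustrative reconstruction of table-based division X/Y for Y in [1, 2):
// split Y = Yh + Yl (Yh from the top m fraction bits), look up 1/Yh^2, and
// use X/Y ~= X * (Yh - Yl) * (1/Yh^2), since 1/(Yh+Yl) = (Yh-Yl)/(Yh^2-Yl^2)
// and Yl^2 is negligible when Yl is small.  Two multiplications, one lookup.
struct Divider {
    int m;                       // table index width in bits
    std::vector<double> table;   // table[i] = 1 / Yh^2

    explicit Divider(int mBits) : m(mBits), table(1u << mBits) {
        for (std::size_t i = 0; i < table.size(); ++i) {
            double yh = 1.0 + (static_cast<double>(i) + 0.5) / (1u << m);
            table[i] = 1.0 / (yh * yh);
        }
    }

    double divide(double x, double y) const {         // requires y in [1, 2)
        std::size_t idx = static_cast<std::size_t>((y - 1.0) * (1u << m));
        double yh = 1.0 + (static_cast<double>(idx) + 0.5) / (1u << m);
        double yl = y - yh;                            // |yl| <= 2^-(m+1)
        double first  = x * (yh - yl);                 // step 1 (with lookup)
        double second = first * table[idx];            // step 2
        return second;
    }
};

int main() {
    Divider d(/*mBits=*/12);
    double x = 1.2345, y = 1.7321;
    std::printf("approx %.9f  exact %.9f\n", d.divide(x, y), x / y);
    return 0;
}
```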


Field-Programmable Logic and Applications | 2007

Automatic Accuracy-Guaranteed Bit-Width Optimization for Fixed and Floating-Point Systems

William George Osborne; Ray C. C. Cheung; José Gabriel F. Coutinho; Wayne Luk; Oskar Mencer

In this paper we present Minibit+, an approach that optimizes the bit-widths of fixed-point and floating-point designs, while guaranteeing accuracy. Our approach adopts different levels of analysis giving the designer the opportunity to terminate it at any stage to obtain a result. Range analysis is achieved using a combined affine and interval arithmetic approach to reduce the number of bits. Precision analysis involves a coarse-grain and fine-grain analysis. The best representation, in fixed-point or floating-point, for the numbers is then chosen based on the range, precision and latency. Three case studies are used: discrete cosine transform, B-Splines and RGB to YCbCr color conversion. Our analysis can run over 200 times faster than current approaches to this problem while producing more accurate results, on average within 2-3% of an exhaustive search.
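
The interval-arithmetic half of the range analysis can be sketched as propagating a [lo, hi] bound through each operator of the design; the affine part, which additionally tracks correlations between signals to tighten these bounds, is omitted here. This is a generic, hypothetical illustration rather than the Minibit+ implementation; the resulting bound then fixes the integer bit-width of the signal, as in the MiniBit sketch above.

```cpp
#include <algorithm>
#include <cstdio>

// Plain interval arithmetic: each signal carries a conservative [lo, hi]
// bound, and every operator maps input bounds to an output bound.
struct Interval {
    double lo, hi;
};

Interval add(Interval a, Interval b) { return {a.lo + b.lo, a.hi + b.hi}; }

Interval mul(Interval a, Interval b) {
    double p[4] = {a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi};
    return {*std::min_element(p, p + 4), *std::max_element(p, p + 4)};
}

int main() {
    // Bound y = x * c + d for x in [-1, 1], c in [0, 255], d in [-10, 10].
    Interval x{-1.0, 1.0}, c{0.0, 255.0}, d{-10.0, 10.0};
    Interval y = add(mul(x, c), d);
    std::printf("y in [%g, %g]\n", y.lo, y.hi);   // y in [-265, 265]
    return 0;
}
```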


Communications of the ACM | 2013

Moving from petaflops to petadata

Michael J. Flynn; Oskar Mencer; Veljko Milutinovic; Goran Rakocevic; Per Stenström; Roman Trobec; Mateo Valero

The race to build ever-faster supercomputers is on, with more contenders than ever before. However, the current goals set for this race may not lead to the fastest computation for particular applications.

Collaboration


Dive into Oskar Mencer's collaboration.

Top Co-Authors

Wayne Luk
Imperial College London

Oliver Pell
Imperial College London

Dong-U Lee
University of California