
Publications


Featured research published by Ralph D. Wittig.


International Symposium on Computer Architecture | 2009

Performance and power of cache-based reconfigurable computing

Andrew Putnam; Susan J. Eggers; Dave Bennett; Eric F. Dellinger; Jeff Mason; Henry E. Styles; Prasanna Sundararajan; Ralph D. Wittig

Many-cache is a memory architecture that efficiently supports caching in commercially available FPGAs. It facilitates FPGA programming for high-performance computing (HPC) developers by providing them with memory performance that is greater and power consumption that is less than their current CPU platforms, but without sacrificing their familiar, C-based programming environment. Many-cache creates multiple, multi-banked caches on top of an FPGA's small, independent memories, each targeting a particular data structure or region of memory in an application and each customized for the memory operations that access it. The caches are automatically generated from C source by the CHiMPS C-to-FPGA compiler. This paper presents the analyses and optimizations of the CHiMPS compiler that construct many-cache caches. An architectural evaluation of CHiMPS-generated FPGAs demonstrates a performance advantage of 7.8x (geometric mean) over CPU-only execution of the same source code, FPGA power usage that is on average 4.1x less, and consequently performance per watt that is also greater, by a geometric mean of 21.3x.
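As an illustration of the kind of C source such a flow consumes, the sketch below is hypothetical and not taken from the paper: a small kernel whose arrays have distinct access patterns, which is the situation in which a many-cache style compiler could assign each array its own customized cache.

/* Hypothetical C kernel of the kind a C-to-FPGA compiler such as CHiMPS
   might accept. Each array is a distinct data structure, so a many-cache
   compiler could give each its own small cache tuned to its accesses:
   coeff is read repeatedly (high reuse), signal is streamed, and result
   is write-only. */
#define TAPS 16

void fir_filter(const float *coeff, const float *signal,
                float *result, int n)
{
    for (int i = 0; i + TAPS <= n; i++) {
        float acc = 0.0f;
        for (int t = 0; t < TAPS; t++)
            acc += coeff[t] * signal[i + t];
        result[i] = acc;
    }
}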


ACM Transactions on Reconfigurable Technology and Systems | 2010

MPI as a Programming Model for High-Performance Reconfigurable Computers

Manuel Saldaña; Arun Patel; Christopher A. Madill; Daniel Nunes; Danyao Wang; Paul Chow; Ralph D. Wittig; Henry E. Styles; Andrew Putnam

High-Performance Reconfigurable Computers (HPRCs) consist of one or more standard microprocessors tightly coupled with one or more reconfigurable FPGAs. HPRCs have been shown to provide good speedups and good cost/performance ratios, but not necessarily ease of use, leading to slow acceptance of this technology. HPRCs introduce new design challenges, such as the lack of portability across platforms, incompatibilities with legacy code, user reluctance to change existing code bases, a prolonged learning curve, and the need for a system-level hardware/software co-design development flow. This article presents the evolution and current work on TMD-MPI, which started as an MPI-based programming model for multiprocessor systems-on-chip implemented in FPGAs and has since evolved to include multiple x86 processors. TMD-MPI is shown to address current design challenges in HPRC usage, suggesting that the MPI standard has enough syntax and semantics to program these new types of parallel architectures. Also presented is the TMD-MPI Ecosystem, which consists of research projects and tools developed around TMD-MPI to further improve HPRC usability. Finally, we present preliminary communication performance measurements.
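Because the article's claim is that standard MPI syntax and semantics are sufficient, the minimal example below is ordinary MPI in C, not the TMD-MPI implementation itself; it shows the point-to-point style of communication that would look the same whether a rank runs as an x86 process or as an FPGA compute engine.

/* Minimal standard MPI point-to-point exchange. Under the TMD-MPI
   argument, the same calls address a rank regardless of whether it is
   implemented in software on a processor or in hardware on an FPGA. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value = 42;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}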


International Workshop on High-Performance Reconfigurable Computing Technology and Applications | 2008

MPI as an abstraction for software-hardware interaction for HPRCs

Manuel Saldaña; Arun Patel; Christopher A. Madill; Daniel Nunes; Danyao Wang; Henry E. Styles; Andrew Putnam; Ralph D. Wittig; Paul Chow

High-performance reconfigurable computers (HPRCs) consist of one or more standard microprocessors tightly coupled with one or more reconfigurable FPGAs. HPRCs have been shown to provide good speedups and good cost/performance ratios, but not necessarily ease of use, leading to slow acceptance of this technology. HPRCs introduce new design challenges, such as the lack of portability across platforms, incompatibilities with legacy code, user reluctance to change existing code bases, a prolonged learning curve, and the need for a system-level hardware/software co-design development flow. This paper presents the evolution and current work on TMD-MPI, which started as an MPI-based programming model for multiprocessor systems-on-chip implemented in FPGAs and has since evolved to include multiple x86 processors. TMD-MPI is shown to address current design challenges in HPRC usage, suggesting that the MPI standard has enough syntax and semantics to program these new types of parallel architectures. Also presented is the TMD-MPI ecosystem, which consists of research projects and tools developed around TMD-MPI to further improve HPRC usability.


IEEE Micro | 2016

A 16-nm Multiprocessing System-on-Chip Field-Programmable Gate Array Platform

Sagheer Ahmad; Vamsi Boppana; Ilya K. Ganusov; Vinod K. Kathail; Vidya Rajagopalan; Ralph D. Wittig

This article presents the Zynq UltraScale+ MPSoC (multiprocessor system on chip), which builds on the Zynq-7000 family. Compared to the first-generation Zynq, MPSoC increases performance and power efficiency while significantly improving the integration level between the SoC and the field-programmable gate array (FPGA). It also further raises the programming abstraction with the introduction of a new heterogeneous system-wide compiler. At the hardware level, system-wide coherency and shared virtual memory bridge from the processor subsystem into the programmable logic. The new SDSoC (software-defined SoC) environment combines the ARM compiler with a high-level synthesis technology-based FPGA compiler and a full-system optimizing compiler to target all elements of the heterogeneous SoC from a common program source.
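To make the compilation model concrete, the sketch below is a hypothetical candidate function for hardware offload in an SDSoC-style flow; the pipeline pragma is Vivado HLS syntax, assumed here as the high-level-synthesis back end the abstract describes.

/* Hypothetical offload candidate: the system compiler would implement
   this function in programmable logic via high-level synthesis while
   its caller runs on the ARM cores. Hardware coherency and shared
   virtual memory allow plain pointers to be passed across the boundary. */
void vector_scale(const int *in, int *out, int len, int factor)
{
    for (int i = 0; i < len; i++) {
#pragma HLS PIPELINE II=1
        out[i] = in[i] * factor;
    }
}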


International Workshop on High-Performance Reconfigurable Computing Technology and Applications | 2008

Evaluating FPGAs for floating-point performance

Dave Strenski; Jim Simkins; Richard L. Walke; Ralph D. Wittig

Field-programmable gate arrays (FPGAs) have been available for more than 25 years. Initially they were used to simplify embedded processing circuits, and they then expanded into simulating application-specific integrated circuit (ASIC) designs. In the past few years they have grown in density and speed to replace ASICs in some applications and to assist microprocessors as attached accelerators. This paper calculates the peak floating-point performance for three types of FPGAs using 64-bit, 32-bit, and 24-bit word lengths and compares it with a reference quad-core microprocessor. These calculations are further refined to estimate the actual floating-point performance of these FPGAs, compared with the microprocessor both at and away from its optimal design point. Lastly, the paper explores the nature of floating-point calculations and looks at examples where the same algorithmic accuracy can be achieved with non-floating-point calculations.
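The peak-performance arithmetic the paper performs amounts to multiplying the number of floating-point units that fit in the fabric by the operations each completes per cycle and by the achievable clock rate; the small C program below reproduces the form of that calculation with placeholder figures, not the paper's data.

/* Illustrative peak floating-point calculation; all numbers are
   placeholders, not results from the paper. */
#include <stdio.h>

int main(void)
{
    double fp_units     = 200.0;  /* hypothetical units that fit at a given word length */
    double ops_per_unit = 2.0;    /* e.g., one multiply-add per cycle */
    double clock_ghz    = 0.25;   /* hypothetical achievable clock, in GHz */

    printf("Peak: %.1f GFLOPS\n", fp_units * ops_per_unit * clock_ghz);
    return 0;
}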


Field-Programmable Technology | 2015

OpenCL library of stream memory components targeting FPGAs

Jasmina Vasiljevic; Ralph D. Wittig; Paul R. Schumacher; Jeff Fifield; Fernando Martinez Vallina; Henry E. Styles; Paul Chow

In recent years, high-level languages and compilers such as OpenCL have improved both productivity and FPGA adoption on a wider scale. One of the challenges in the design of high-performance streaming FPGA applications is the iterative manual optimization of the numerous application buffers (e.g., arrays, FIFOs, and scratchpads). First, to achieve the desired throughput, the programmer faces the burden of analyzing the memory accesses of each application buffer and, based on the observed data locality, determining the optimal on-chip buffering and off-chip read/write access strategy. Second, to minimize throughput bottlenecks, the programmer has to carefully partition the limited on-chip memory resources among many application buffers. In this work we present an FPGA OpenCL library of pre-optimized stream memory components (SMCs). The library contains three types of SMCs, which implement frequently applied data transformations: 1) stencil, 2) transpose, and 3) tiling. The library generates SMCs that are optimized both for the specific data transformation they perform and for the user-specified data set size. Further, to ease the partitioning of on-chip memory resources among many application memories, the library automatically maps application buffers to on-chip and off-chip memory resources; this is achieved by letting the programmer specify an on-chip memory budget for each component. For on-chip memory, the SMCs perform data buffering to exploit data locality and maximize reuse. For off-chip memory accesses, the SMCs optimize read/write operations by performing data coalescing, bursting, and prefetching. We show that, using the SMC library, the programmer can quickly generate scalable, pre-optimized stream application memory components, reaching throughput targets without time-consuming manual memory optimization.
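For readers unfamiliar with the access patterns involved, the plain-C sketch below is an illustration only (the library's actual OpenCL interface is not reproduced here): it shows the 3-point stencil pattern that a stencil SMC is built to buffer, where adjacent iterations reuse two of the three inputs, so an on-chip line buffer lets each element be fetched from off-chip memory only once.

/* 3-point stencil access pattern, written as plain C for illustration.
   Each output reads three neighboring inputs; the overlap between
   consecutive iterations is the data locality a stencil SMC captures
   with on-chip buffering. */
void stencil3(const float *in, float *out, int n)
{
    for (int i = 1; i < n - 1; i++)
        out[i] = 0.25f * in[i - 1] + 0.5f * in[i] + 0.25f * in[i + 1];
}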


International Symposium on Microarchitecture | 2010

Guest Editors' Introduction: Hot Chips 21

Krste Asanovic; Ralph D. Wittig

Hot Chips has emerged as the leading forum to present new processor architectures and other developments in chip design, with an emphasis on deep technical discussion of the detailed workings of upcoming commercial products. A few selected presentations from leading academic researchers complement the industry focus. To encourage busy industry engineers to submit to the conference, the submission guidelines require only a short abstract and the conference proceedings contain only the presentation slides. From more than 75 submissions, the program committee selected 27 presentations for the main conference. This special issue contains seven full-length articles representing the important themes discussed at the 21st annual Hot Chips conference.


IEEE Hot Chips Symposium | 2010

28nm generation programmable families

Brad Taylor; Ralph D. Wittig

This article consists of a collection of slides from the authors' conference presentation on the Xilinx 7 Series FPGA family of products. Specific topics discussed include the special features, system specifications, and system design of these products; system architectures; applications; supported platforms; processing capabilities; memory capabilities; and targeted markets.


IEEE Micro | 2011

Big Chips

Krste Asanovic; Ralph D. Wittig

This introduction to the special issue provides a snapshot and a sampling of current activity related to the architecture and design of big chips.


Customizable Embedded Processors: Design Technologies and Applications | 2007

Designing Soft Processors for FPGAs

Goran Bilski; Sundarajarao Mohan; Ralph D. Wittig

This chapter shows the implementation of architectural elements such as buses, muxes, ALUs, register files, and FIFOs on FPGAs, and the tradeoffs involved in selecting different size parameters for these elements. The architecture of a soft processor is constrained by the FPGA implementation fabric. For example, the number of pipeline stages is limited to three in our example, because adding more pipeline stages increases the number and size of multiplexers in the processor, and the relatively high cost of these multiplexers (relative to ASIC implementations) reduces the possible speed advantage. However, these relative costs can change as the FPGA fabric evolves, and new optimizations might then be necessary. The speed and area of these implementations dictate whether a bus-based or mux-based processor implementation is chosen. The use of lookup tables (LUTs) instead of logic gates implies that some additional logic is free, but beyond a certain point the speed and area of an implementation begin to vary.
