
Publication


Featured research published by Stefan G. Berg.


Field-Programmable Custom Computing Machines | 1997

Mapping applications to the RaPiD configurable architecture

Carl Ebeling; Darren C. Cronquist; Paul Franklin; Jason Secosky; Stefan G. Berg

The goal of the RaPiD (Reconfigurable Pipelined Datapath) architecture is to provide high-performance configurable computing for a range of computationally intensive applications that demand special-purpose hardware. This is accomplished by mapping the computation into a deep pipeline using a configurable array of coarse-grained computational units. A key feature of RaPiD is the combination of static and dynamic control. While the underlying computational pipelines are configured statically, a limited amount of dynamic control is provided, which greatly increases the range and capability of applications that can be mapped to RaPiD. This paper illustrates this mapping and configuration for several important applications including an FIR filter, a 2-D DCT, motion estimation, and parametric curve generation; it also shows how static and dynamic control are used to perform complex computations.
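The idea of mapping an FIR filter onto a deep pipeline of coarse-grained MAC units can be sketched in software. The following toy model is illustrative only (it is not RaPiD's actual configuration or netlist): it chains multiply-accumulate stages in transposed form, one stage per tap, and compares the result against direct convolution.

```python
# Toy model of a deeply pipelined FIR filter: a linear array of MAC
# (multiply-accumulate) stages, one per tap, in transposed form.
# Structure and names are illustrative, not RaPiD's hardware.

def fir_pipeline(taps, samples):
    """Each stage adds its product to the partial sum flowing toward
    the output; stage 0 emits the completed y[n] every cycle."""
    n = len(taps)
    partial = [0.0] * n          # pipeline registers, one per MAC stage
    out = []
    for x in samples:            # one new input sample per "cycle"
        # Stage i computes taps[i]*x plus the partial sum from stage i+1.
        partial = [taps[i] * x + (partial[i + 1] if i + 1 < n else 0.0)
                   for i in range(n)]
        out.append(partial[0])
    return out

def fir_direct(taps, samples):
    """Reference: direct convolution with zero-padded history."""
    return [sum(taps[k] * (samples[i - k] if i - k >= 0 else 0.0)
                for k in range(len(taps)))
            for i in range(len(samples))]
```

Because every stage does one MAC per cycle, the pipeline sustains one output sample per cycle after being filled, which is the throughput property that makes such mappings attractive.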


Field-Programmable Custom Computing Machines | 1998

Specifying and compiling applications for RaPiD

Darren C. Cronquist; Paul Franklin; Stefan G. Berg; Carl Ebeling

Efficient, deeply pipelined implementations exist for a wide variety of important computation-intensive applications, and many special-purpose hardware machines have been built that take advantage of these pipelined computation structures. While these implementations achieve high performance, this comes at the expense of flexibility. On the other hand, flexible architectures proposed thus far have not been very efficient. RaPiD is a reconfigurable pipelined datapath architecture designed to provide a combination of performance and flexibility for a variety of applications. It uses a combination of static and dynamic control to efficiently implement pipelined computations. This control, however, is very complicated; specifying a computation's control circuitry directly would be prohibitively difficult. This paper describes how specifications of a pipelined computation in a suitably high-level language are compiled into the control required to implement that computation on the RaPiD architecture. The compiler extracts a statically configured datapath from this description, identifies the dynamic control signals required to execute the computation, and then produces the control program and decoding structure that generate these dynamic control signals.


International Conference on Computer Design | 2000

A register file with transposed access mode

Yoochang Jung; Stefan G. Berg; Donglok Kim; Yongmin Kim

We introduce a new register file architecture that provides both row-wise and column-wise accesses, thus allowing partitioned instructions to be used in column-wise processing without transposition overhead. This feature can accelerate 2D separable image and video processing algorithms, such as 2D convolution and 2D discrete cosine transform (DCT), by eliminating the transposition steps.
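The benefit of transposed access can be sketched with a small software model. The class and function names below are hypothetical (they are not the paper's hardware interface); the sketch shows how a separable 2-D operator runs a row pass and then a column pass directly on the same register file, with no intervening transposition step.

```python
# Sketch of a register file offering both row-wise and column-wise
# access modes. Hypothetical API, for illustration only.

class TransposedAccessRegFile:
    def __init__(self, n):
        self.n = n
        self.regs = [[0] * n for _ in range(n)]

    def write_row(self, r, values):          # normal (row-wise) access
        self.regs[r] = list(values)

    def read_row(self, r):
        return list(self.regs[r])

    def read_col(self, c):                   # transposed access mode
        return [self.regs[r][c] for r in range(self.n)]

    def write_col(self, c, values):
        for r in range(self.n):
            self.regs[r][c] = values[r]

def separable_2d(block, row_op, col_op):
    """Apply a separable 2-D operator: a 1-D pass over each row, then a
    1-D pass over each column read directly in column order."""
    n = len(block)
    rf = TransposedAccessRegFile(n)
    for r in range(n):
        rf.write_row(r, row_op(block[r]))
    for c in range(n):
        rf.write_col(c, col_op(rf.read_col(c)))
    return rf.regs
```

Without `read_col`/`write_col`, the column pass would first have to transpose the block (or gather elements one at a time), which is exactly the overhead the transposed access mode removes for algorithms like the 2-D DCT.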


Field-Programmable Logic and Applications | 2001

An Emulator for Exploring RaPiD Configurable Computing Architectures

Chris Fisher; Kevin Rennie; Guanbin Xing; Stefan G. Berg; Kevin Bolding; John Naegle; Daniel Parshall; Dmitriy Portnov; Adnan Sulejmanpasic; Carl Ebeling

The RaPiD project at the University of Washington has been studying configurable computing architectures optimized for coarse-grained data and computation units and deep computation pipelines. This research targets applications in the signal- and image-processing domain, since they make the greatest demand for computation and power in embedded and mobile computing applications, and these demands are increasing faster than Moore's law. This paper describes the RaPiD Emulator, a system that will allow the exploration of alternative configurable architectures in the context of benchmark applications running in real time. The RaPiD Emulator provides enough FPGA gates to implement large RaPiD arrays, along with a high-performance streaming memory architecture and high-bandwidth data interfaces to a host processor and external devices. Running at 50 MHz, the emulator is able to achieve over 1 GMAC/s.
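The throughput figure follows from simple arithmetic, sketched below. The function is illustrative (the abstract does not give the emulated array's unit count); it only makes explicit that a pipelined array sustaining one MAC per unit per cycle delivers clock rate times active units MACs per second.

```python
# Back-of-the-envelope throughput arithmetic (illustrative only).

def macs_per_second(clock_hz, active_mac_units):
    """A pipelined array doing one MAC per active unit per cycle."""
    return clock_hz * active_mac_units

# At a 50 MHz clock, exceeding 1 GMAC/s requires more than
# 1e9 / 50e6 = 20 concurrently active MAC units in the emulated array.
units_needed = 1e9 / 50e6
```

This is why a modest clock rate suffices: the performance comes from the width and depth of the pipeline, not from clock frequency.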


Conference on Multimedia Computing and Networking | 1998

Critical review of programmable media processor architectures

Stefan G. Berg; Weiyun Sun; Donglok Kim; Yongmin Kim

In the past several years, there has been a surge of new programmable mediaprocessors introduced to provide an alternative to ASICs and dedicated hardware circuits in the multimedia PC and embedded consumer electronics markets. These processors attempt to combine the programmability of multimedia-enhanced general-purpose processors with the performance and low cost of dedicated hardware. We have reviewed five current multimedia architectures and evaluated their strengths and weaknesses.


Journal of Systems Architecture | 2003

Use of embedded DRAMs in video and image computing

Coskun Mermer; Donglok Kim; Stefan G. Berg; Robert J. Gove; Yongmin Kim

We have evaluated the role of embedded dynamic random access memory (eDRAM) in the performance of programmable mediaprocessors, focusing on video/image computing. eDRAM's contributions to improving the total system performance can be assessed by measuring the number of CPU stall cycles caused by the memory transactions. We decomposed the CPU stall cycles into three components: latency due to row access, latency due to the pipeline of memory transactions, and burst transfer time. We used a cycle-accurate cache and eDRAM model to measure the system performance in executing selected low-level video/image computing functions on a mediaprocessor core. We simulated various values for the data bus width, page size, and row-access time of eDRAM, the pipeline delay of a memory transaction, and the data cache line size. While the wider data width of eDRAM does reduce the burst transfer time, the actual reduction in the total stall cycles when the width was expanded from 8 to 16 bytes was lower than expected, ranging from 6.2% to 18.9%. Instead, we found that the row-access latency and memory transaction pipeline delay represent the major portion of the CPU stall cycles. For example, in the case of a 32-byte-wide data bus, they account for 85.3–95.1% of the memory busy time during which data cache misses are serviced. We show how to lower the CPU stall time further, e.g., using a no-write-allocate data cache to reduce the total burst transfer time, efficient memory banking to reduce the number of eDRAM page misses, and various software/hardware methods to bring data to the cache before they are needed by the CPU. In particular, the regular memory access pattern in video/image computing allows several methods to enhance the memory performance in using eDRAM, e.g., enlarging the cache line size and data prefetching. This paper presents our methodology, experimental results, and findings, which would be useful to the design of future highly integrated systems on a chip with eDRAM.
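The three-component decomposition explains why widening the bus yields less saving than expected, and a minimal cost model makes this concrete. The parameter values below are made up for illustration (they are not the paper's simulator settings): only the burst term shrinks when the eDRAM data bus is widened, so the fixed latency terms cap the improvement.

```python
# Minimal stall-cycle model for one cache-miss service, illustrative
# parameter values only: total cost = row-access latency + memory
# transaction pipeline delay + burst transfer time.

def miss_stall_cycles(line_bytes, bus_bytes, row_access, pipe_delay):
    burst = -(-line_bytes // bus_bytes)   # ceil: cycles to move the line
    return row_access + pipe_delay + burst

# Example: 64-byte cache line, hypothetical latencies of 5 + 3 cycles.
narrow = miss_stall_cycles(64, 8,  row_access=5, pipe_delay=3)  # 8 + 8 = 16
wide   = miss_stall_cycles(64, 16, row_access=5, pipe_delay=3)  # 8 + 4 = 12
```

Doubling the bus width here halves only the burst term, cutting the total from 16 to 12 cycles: a 25% saving rather than 50%, mirroring the finding that the latency terms dominate the stall time.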


Archive | 2004

Program-directed cache prefetching for media processors

Stefan G. Berg; Donglok Kim; Yongmin Kim


Archive | 2004

Method and apparatus for compressing VLIW instruction and sharing subinstructions

Donglok Kim; Stefan G. Berg; Weiyun Sun; Yongmin Kim


Archive | 2001

Multi-ported memory having pipelined data banks

Stefan G. Berg; Donglok Kim; Yongmin Kim


Archive | 2000

Method and apparatus for processing compressed VLIW subinstruction opcodes

Stefan G. Berg; Donglok Kim; Yongmin Kim

Collaboration


Stefan G. Berg's frequent co-authors.

Top Co-Authors

Donglok Kim (University of Washington)

Yongmin Kim (University of Washington)

Weiyun Sun (University of Washington)

Carl Ebeling (University of Washington)

Chris Fisher (University of Washington)

Guanbin Xing (University of Washington)