Publication


Featured research published by Damir A. Jamsek.


IEEE Journal of Solid-State Circuits | 2008

An 8T-SRAM for Variability Tolerance and Low-Voltage Operation in High-Performance Caches

Leland Chang; Robert K. Montoye; Yutaka Nakamura; Kevin A. Batson; Richard J. Eickemeyer; Robert H. Dennard; Wilfried Haensch; Damir A. Jamsek

An eight-transistor (8T) cell is proposed to improve variability tolerance and low-voltage operation in high-speed SRAM caches. While the cell itself can be designed for exceptional stability and write margins, array-level implications must also be considered to achieve a viable memory solution. These constraints can be addressed by modifying traditional 6T-SRAM techniques and conceding some design complexity and area penalties. Altogether, 8T-SRAM can be designed without significant area penalty over 6T-SRAM while providing substantially improved variability tolerance and low-voltage operation with no need for secondary or dynamic power supplies. The proposed 8T solution is demonstrated in a high-performance 32 kb subarray designed in 65 nm PD-SOI CMOS that operates at 5.3 GHz at 1.2 V and 295 MHz at 0.41 V.


Symposium on VLSI Circuits | 2007

A 5.3GHz 8T-SRAM with Operation Down to 0.41V in 65nm CMOS

Leland Chang; Yutaka Nakamura; Robert K. Montoye; Jun Sawada; Andrew K. Martin; Kiyofumi Kinoshita; Fadi H. Gebara; Kanak B. Agarwal; Dhruva Acharyya; Wilfried Haensch; Kohji Hosokawa; Damir A. Jamsek

A 32 kb subarray demonstrates practical implementation of a 65 nm node 8T-SRAM cell for variability tolerance in high-speed caches. Ideal cell stability allows single-supply operation down to 0.41 V at 295 MHz without dynamic voltage techniques. Despite a larger cell, array area is competitive with 6T-SRAM due to higher array efficiency. With an LSDL decoder, a gated diode sense amplifier, and design tradeoffs enabled by the 8T cell, 5.3 GHz operation at 1.2 V is achieved.


IBM Journal of Research and Development | 2006

Limited switch dynamic logic circuits for high-speed low-power circuit design

Wendy Belluomini; Damir A. Jamsek; Andrew K. Martin; Chandler Todd McDowell; Robert K. Montoye; Hung C. Ngo; Jun Sawada

This paper describes a new circuit family, limited switch dynamic logic (LSDL). LSDL is a hybrid between a dynamic circuit and a static latch that combines the desirable properties of both circuit families. The paper also describes many enhancements and extensions to LSDL that increase its logical capability. Finally, it presents the results of two multiplier designs, one fabricated in 130-nm technology and one in 90-nm technology. The 130- and 90-nm designs respectively reach speeds up to 2.2 GHz and 8 GHz.


International Solid-State Circuits Conference | 2005

An 8GHz floating-point multiply

Wendy Belluomini; Damir A. Jamsek; Andrew K. Martin; Chandler Todd McDowell; Robert K. Montoye; Tuyet Nguyen; Hung Ngo; Jun Sawada; Ivan Vo; R. Datta

The implementation of the mantissa portion of a floating-point multiply (54×54b) is described. The 0.124mm² multiplier is implemented using limited switch dynamic logic and operates at speeds up to 8GHz in a 90nm SOI technology. The multiplier dissipates between 150mW and 1.8W as it scales between 2GHz and 8GHz.


Asia and South Pacific Design Automation Conference | 2009

Designing and optimizing compute kernels on NVIDIA GPUs

Damir A. Jamsek

The availability of high-performance compute capability in NVIDIA GPUs has expanded their use in CAD environments. We will describe the basic compute models, including host/device programming models and device multi-thread programming models, as well as optimization and performance tuning techniques.


IBM Journal of Research and Development | 2015

Feature detection for image analytics via FPGA acceleration

H.-Y. Chang; I. H.-R. Jiang; H. P. Hofstee; Damir A. Jamsek; G.-J. Nam

With the growth of multimedia data generation and consumption, image-based data analytics plays an increasingly important role in big data analytics systems. For image analytics, feature detection algorithms provide a foundation for a variety of image-based applications. These algorithms are typically computationally intensive and thus are good candidates for acceleration with field programmable gate arrays (FPGAs). In this paper, we investigate a Harris-Laplace variant of scale-invariant feature detection, a widely used image analytics algorithm, to demonstrate the capability of acceleration. Based on stream computing, we construct a fully pipelined implementation that can process one pixel per FPGA clock cycle. Our implementation significantly outperforms the existing published work. The proposed implementation adopts a single-precision floating-point representation and can detect the features of 640 × 480-pixel images at 540 frames per second. This throughput is sufficient for multistream real-time video interpretation.


International Conference on Conceptual Structures | 2013

Hardware Acceleration of an Efficient and Accurate Proton Therapy Monte Carlo

Thomas H. Osiecki; Min-yu Tsai; Anne E. Gattiker; Damir A. Jamsek; Sani R. Nassif; W. Evan Speight; Cliff C. N. Sze

Proton radiation therapy is one of the more effective forms of cancer treatment because of the high degree of selectivity afforded by the behavior of energetic protons in matter. But because radiation does not distinguish between tumor cells and healthy body tissue, it is important to insure that the radiation energy is deposited in the appropriate locations within a patient. This is even more important for proton beams because of the concentrated nature of the radiation energy dose they leave in a body. Predicting such dose distributions can be accurately done via complex and slow Monte Carlo based simulation (using tools such as Geant), but such simulators are too slow for use in interactive situations where a doctor is trying to determine the best beams to use for a particular patient. In this paper we report on an accurate but extremely fast Monte Carlo based proton dose distribution simulator code named Jack. The simulator uses the same physics as more complex tools, but leverages massive parallelization and a streamlined code architecture. The paper describes the state of Jack and shows runtime results for it with and without various hardware acceleration techniques. We benchmark Jack against Geant4.9.4.p01, a well established particle transport code, on a water phantom. Future plans are presented at the end for further speed enhancement and model development.


Very Large Data Bases | 2017

ExtraV: boosting graph processing near storage with a coherent accelerator

Jinho Lee; Heesu Kim; Sungjoo Yoo; Kiyoung Choi; H. Peter Hofstee; Gi-Joon Nam; Mark Richard Nutter; Damir A. Jamsek

In this paper, we propose ExtraV, a framework for near-storage graph processing. It is based on the novel concept of graph virtualization, which efficiently utilizes a cache-coherent hardware accelerator at the storage side to achieve performance and flexibility at the same time. ExtraV consists of four main components: 1) host processor, 2) main memory, 3) AFU (Accelerator Function Unit) and 4) storage. The AFU, a hardware accelerator, sits between the host processor and storage. Using a coherent interface that allows main memory accesses, it performs graph traversal functions that are common to various algorithms while the program running on the host processor (called the host program) manages the overall execution along with more application-specific tasks. Graph virtualization is a high-level programming model of graph processing that allows designers to focus on algorithm-specific functions. Realized by the accelerator, graph virtualization gives the host programs an illusion that the graph data reside on the main memory in a layout that fits with the memory access behavior of host programs even though the graph data are actually stored in a multi-level, compressed form in storage. We prototyped ExtraV on a Power8 machine with a CAPI-enabled FPGA. Our experiments on a real system prototype offer significant speedup compared to state-of-the-art software only implementations.


IBM Journal of Research and Development | 2015

Integrated high-performance data compression in the IBM z13

Anthony T. Sofia; C. C. Lewis; C. Jacobi; Damir A. Jamsek; Dale F. Riedy; Joerg-Stephan Vogt; Peter G. Sutton; R. W. St. John


Formal Methods | 2003

Verisym: Verifying Circuits by Symbolic Simulation

William Adams; Warren A. Hunt; Damir A. Jamsek
