Damir A. Jamsek
IBM
Publications
Featured research published by Damir A. Jamsek.
IEEE Journal of Solid-State Circuits | 2008
Leland Chang; Robert K. Montoye; Yutaka Nakamura; Kevin A. Batson; Richard J. Eickemeyer; Robert H. Dennard; Wilfried Haensch; Damir A. Jamsek
An eight-transistor (8T) cell is proposed to improve variability tolerance and low-voltage operation in high-speed SRAM caches. While the cell itself can be designed for exceptional stability and write margins, array-level implications must also be considered to achieve a viable memory solution. These constraints can be addressed by modifying traditional 6T-SRAM techniques at the cost of some added design complexity and area. Altogether, 8T-SRAM can be designed without a significant area penalty over 6T-SRAM while providing substantially improved variability tolerance and low-voltage operation, with no need for secondary or dynamic power supplies. The proposed 8T solution is demonstrated in a high-performance 32 kb subarray designed in 65 nm PD-SOI CMOS that operates at 5.3 GHz at 1.2 V and 295 MHz at 0.41 V.
Symposium on VLSI Circuits | 2007
Leland Chang; Yutaka Nakamura; Robert K. Montoye; Jun Sawada; Andrew K. Martin; Kiyofumi Kinoshita; Fadi H. Gebara; Kanak B. Agarwal; Dhruva Acharyya; Wilfried Haensch; Kohji Hosokawa; Damir A. Jamsek
A 32 kb subarray demonstrates a practical implementation of a 65 nm-node 8T-SRAM cell for variability tolerance in high-speed caches. Ideal cell stability allows single-supply operation down to 0.41 V at 295 MHz without dynamic voltage techniques. Despite a larger cell, array area is competitive with 6T-SRAM because of higher array efficiency. With an LSDL decoder, a gated-diode sense amplifier, and design trade-offs enabled by the 8T cell, 5.3 GHz operation at 1.2 V is achieved.
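The textbook first-order CMOS dynamic-power model, P ∝ C·V²·f, puts the low-voltage operating point quoted in these abstracts in perspective. The sketch below is only an illustration. It assumes that the switched capacitance and activity factor are the same at both operating points, and it ignores leakage. The abstracts state neither of these assumptions, and the function names are hypothetical.

```python
def relative_energy_per_access(v: float, v_ref: float) -> float:
    # First-order CMOS model: switching energy scales as C * V^2, with the
    # switched capacitance C assumed identical at both operating points.
    return (v / v_ref) ** 2


def relative_dynamic_power(v: float, f: float, v_ref: float, f_ref: float) -> float:
    # Dynamic power P = alpha * C * V^2 * f; alpha and C are assumed to cancel.
    return relative_energy_per_access(v, v_ref) * (f / f_ref)


# Operating points quoted in the abstracts for the 32 kb subarray.
V_HIGH, F_HIGH = 1.2, 5.3e9
V_LOW, F_LOW = 0.41, 295e6

energy_ratio = relative_energy_per_access(V_LOW, V_HIGH)
power_ratio = relative_dynamic_power(V_LOW, F_LOW, V_HIGH, F_HIGH)
print(f"energy per access: {energy_ratio:.3f}x, dynamic power: {power_ratio:.4f}x")
```

Under these assumptions, each switching event at 0.41 V costs roughly 12% of the energy it costs at 1.2 V. Dynamic power at 295 MHz is also well under 1% of the power at 5.3 GHz. Measured figures for the actual subarray would differ.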
IBM Journal of Research and Development | 2006
Wendy Belluomini; Damir A. Jamsek; Andrew K. Martin; Chandler Todd McDowell; Robert K. Montoye; Hung C. Ngo; Jun Sawada
This paper describes a new circuit family: limited switch dynamic logic (LSDL). LSDL is a hybrid between a dynamic circuit and a static latch that combines the desirable properties of both circuit families. The paper also describes many enhancements and extensions to LSDL that increase its logical capability. Finally, it presents the results of two multiplier designs, one fabricated in 130-nm technology and one in 90-nm technology. The 130- and 90-nm designs reach speeds of up to 2.2 GHz and 8 GHz, respectively.
International Solid-State Circuits Conference | 2005
Wendy Belluomini; Damir A. Jamsek; Andrew K. Martin; Chandler Todd McDowell; Robert K. Montoye; Tuyet Nguyen; Hung Ngo; Jun Sawada; Ivan Vo; R. Datta
The implementation of the mantissa portion of a floating-point multiply (54 × 54 b) is described. The 0.124 mm² multiplier is implemented using limited switch dynamic logic and operates at speeds up to 8 GHz in a 90 nm SOI technology. The multiplier dissipates between 150 mW and 1.8 W as it scales between 2 GHz and 8 GHz.
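The power and frequency end points quoted above translate directly into energy per clock cycle (E = P / f). The small sketch below does that arithmetic. It also checks the width of a full 54 × 54-bit product. The helper name is hypothetical, and the numbers come only from the abstract.

```python
def energy_per_cycle_pj(power_w: float, freq_hz: float) -> float:
    # Energy per clock cycle E = P / f, converted from joules to picojoules.
    return power_w / freq_hz * 1e12


# Power/frequency end points quoted in the abstract.
e_2ghz = energy_per_cycle_pj(0.150, 2e9)  # about 75 pJ
e_8ghz = energy_per_cycle_pj(1.8, 8e9)    # about 225 pJ

# A 54 x 54-bit multiply produces a product up to 108 bits wide.
max_operand = (1 << 54) - 1
product_bits = (max_operand * max_operand).bit_length()

print(f"{e_2ghz:.0f} pJ/cycle at 2 GHz, {e_8ghz:.0f} pJ/cycle at 8 GHz, {product_bits}-bit product")
```

Energy per cycle roughly triples while frequency quadruples, so power grows about twelvefold, from 150 mW to 1.8 W. The abstract does not say how the operating point (for example, supply voltage) was varied to reach the higher speeds.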
Asia and South Pacific Design Automation Conference | 2009
Damir A. Jamsek
The availability of high-performance compute capability in NVIDIA GPUs has expanded their use in CAD environments. We will describe the basic compute models, including host/device and device multi-thread programming models, as well as optimization and performance-tuning techniques.
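A CPU emulation can illustrate the host/device and multi-thread models mentioned above. In real CUDA, the kernel is written in CUDA C++ and launched as `kernel<<<grid, block>>>(...)`. Each thread computes its element from `blockIdx.x * blockDim.x + threadIdx.x`. The Python sketch below runs the same index arithmetic sequentially. The names `launch`, `saxpy_kernel` and `saxpy` are hypothetical, and the list copies merely stand in for host/device memory transfers.

```python
from typing import Callable, List


def launch(kernel: Callable[..., None], grid_dim: int, block_dim: int, *args) -> None:
    # Emulates a 1-D launch, kernel<<<grid_dim, block_dim>>>(*args), by running
    # every (block, thread) pair sequentially on the CPU. On a GPU these run in parallel.
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def saxpy_kernel(block_idx: int, block_dim: int, thread_idx: int,
                 n: int, a: float, x: List[float], y: List[float]) -> None:
    # Device-side work for one thread: global index = blockIdx.x * blockDim.x + threadIdx.x.
    i = block_idx * block_dim + thread_idx
    if i < n:  # bounds guard: the last block may contain surplus threads
        y[i] = a * x[i] + y[i]


def saxpy(a: float, x: List[float], y: List[float], block_dim: int = 4) -> List[float]:
    # Host-side code: choose the grid size and copy data "to" and "from" the device.
    n = len(x)
    grid_dim = (n + block_dim - 1) // block_dim  # ceiling division
    device_y = list(y)  # stands in for a host-to-device copy
    launch(saxpy_kernel, grid_dim, block_dim, n, a, x, device_y)
    return device_y  # stands in for the device-to-host copy


print(saxpy(2.0, [1.0, 2.0, 3.0, 4.0, 5.0], [10.0] * 5))
```

With five elements and four threads per block, the host launches two blocks. The bounds guard keeps the three surplus threads in the second block from writing past the end of the array, which is the usual pattern when the problem size is not a multiple of the block size.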
IBM Journal of Research and Development | 2015
H.-Y. Chang; I. H.-R. Jiang; H. P. Hofstee; Damir A. Jamsek; G.-J. Nam
With the growth of multimedia data generation and consumption, image-based data analytics plays an increasingly important role in big data analytics systems. For image analytics, feature detection algorithms provide a foundation for a variety of image-based applications. These algorithms are typically computationally intensive and thus are good candidates for acceleration with field-programmable gate arrays (FPGAs). In this paper, we investigate a Harris-Laplace variant of scale-invariant feature detection, a widely used image analytics algorithm, to demonstrate the potential of FPGA acceleration. Based on stream computing, we construct a fully pipelined implementation that can process one pixel per FPGA clock cycle. Our implementation significantly outperforms previously published work. The proposed implementation adopts a single-precision floating-point representation and can detect the features of 640 × 480-pixel images at 540 frames per second. This throughput is sufficient for multistream real-time video interpretation.
International Conference on Conceptual Structures | 2013
Thomas H. Osiecki; Min-yu Tsai; Anne E. Gattiker; Damir A. Jamsek; Sani R. Nassif; W. Evan Speight; Cliff C. N. Sze
Proton radiation therapy is one of the more effective forms of cancer treatment because of the high degree of selectivity afforded by the behavior of energetic protons in matter. But because radiation does not distinguish between tumor cells and healthy body tissue, it is important to ensure that the radiation energy is deposited in the appropriate locations within a patient. This is even more important for proton beams because of the concentrated nature of the radiation dose they deposit in the body. Such dose distributions can be predicted accurately with complex and slow Monte Carlo-based simulation (using tools such as Geant), but these simulators are too slow for interactive use, where a doctor is trying to determine the best beams for a particular patient. In this paper we report on an accurate but extremely fast Monte Carlo-based proton dose distribution simulator code-named Jack. The simulator uses the same physics as more complex tools, but leverages massive parallelization and a streamlined code architecture. The paper describes the state of Jack and shows runtime results for it with and without various hardware acceleration techniques. We benchmark Jack against Geant4.9.4.p01, a well-established particle transport code, on a water phantom. The paper closes with plans for further speed improvements and model development.
IBM Journal of Research and Development | 2015
Anthony T. Sofia; C. C. Lewis; C. Jacobi; Damir A. Jamsek; Dale F. Riedy; Joerg-Stephan Vogt; Peter G. Sutton; R. W. St. John
Formal Methods | 2003
William Adams; Warren A. Hunt; Damir A. Jamsek
Very Large Data Bases | 2017
Jinho Lee; Heesu Kim; Sungjoo Yoo; Kiyoung Choi; H. Peter Hofstee; Gi-Joon Nam; Mark Richard Nutter; Damir A. Jamsek
In this paper, we propose ExtraV, a framework for near-storage graph processing. It is based on the novel concept of graph virtualization, which efficiently uses a cache-coherent hardware accelerator at the storage side to achieve performance and flexibility at the same time. ExtraV consists of four main components: 1) host processor, 2) main memory, 3) AFU (Accelerator Function Unit) and 4) storage. The AFU, a hardware accelerator, sits between the host processor and storage. Using a coherent interface that allows main memory accesses, it performs graph traversal functions that are common to various algorithms, while the program running on the host processor (called the host program) manages the overall execution along with more application-specific tasks. Graph virtualization is a high-level programming model of graph processing that allows designers to focus on algorithm-specific functions. Realized by the accelerator, graph virtualization gives host programs the illusion that the graph data reside in main memory in a layout that fits the memory access behavior of host programs, even though the graph data are actually stored in a multi-level, compressed form in storage. We prototyped ExtraV on a Power8 machine with a CAPI-enabled FPGA. Experiments on a real system prototype show significant speedups over state-of-the-art software-only implementations.
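Graph virtualization, as described above, lets the host program traverse what looks like a plain in-memory adjacency layout, even though the data actually sit in storage in compressed form. The Python sketch below mimics that split in software only. The abstract does not describe ExtraV's compression format or interface. The delta-encoded CSR layout, the class names `CompressedGraph` and `VirtualGraphView`, and the `neighbors` method are therefore all hypothetical. In ExtraV, this expansion is performed by the FPGA AFU over a coherent interface, not by host code.

```python
from collections import deque
from typing import Dict, List


class CompressedGraph:
    """Hypothetical storage-side format: CSR with delta-encoded, sorted neighbor lists."""

    def __init__(self, adjacency: Dict[int, List[int]], num_vertices: int):
        self.num_vertices = num_vertices
        self.offsets = [0]        # offsets[v]..offsets[v+1] index v's deltas
        self.deltas: List[int] = []
        for v in range(num_vertices):
            prev = 0
            for u in sorted(adjacency.get(v, [])):
                self.deltas.append(u - prev)  # store gap to previous neighbor
                prev = u
            self.offsets.append(len(self.deltas))


class VirtualGraphView:
    """Stands in for the accelerator: hands the host plain, decoded neighbor lists."""

    def __init__(self, store: CompressedGraph):
        self._store = store

    def neighbors(self, v: int) -> List[int]:
        start, end = self._store.offsets[v], self._store.offsets[v + 1]
        decoded, acc = [], 0
        for d in self._store.deltas[start:end]:
            acc += d  # prefix sum undoes the delta encoding
            decoded.append(acc)
        return decoded


def bfs_levels(view: VirtualGraphView, source: int, num_vertices: int) -> List[int]:
    # Host program: an ordinary BFS that only ever calls view.neighbors().
    level = [-1] * num_vertices
    level[source] = 0
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in view.neighbors(v):
            if level[u] == -1:
                level[u] = level[v] + 1
                queue.append(u)
    return level


store = CompressedGraph({0: [2, 1], 1: [3], 2: [3], 3: [4]}, num_vertices=5)
print(bfs_levels(VirtualGraphView(store), source=0, num_vertices=5))
```

The BFS function never touches the delta-encoded arrays directly. That is the property graph virtualization is meant to provide: algorithm code stays unchanged while the storage layout is free to change.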