Jeff Sondeen
University of Southern California
Publications
Featured research published by Jeff Sondeen.
IEEE Transactions on Nuclear Science | 2007
Michael Bajura; Younes Boulghassoul; Riaz Naseer; Sandeepan DasGupta; Arthur F. Witulski; Jeff Sondeen; Scott Stansberry; Jeffrey Draper; Lloyd W. Massengill; John N. Damoulakis
A mathematical bit error rate (BER) model for upsets in memories protected by error-correcting codes (ECCs) and scrubbing is derived. This model is compared with expected upset rates for sub-100-nm SRAM memories in space environments. Because sub-100-nm SRAM memory cells can be upset by a critical charge (Qcrit) of 1.1 fC or less, they may exhibit significantly higher upset rates than those reported in earlier technologies. Because of this, single-bit-correcting ECCs may become impractical due to memory scrubbing rate limitations. The overhead needed for protecting memories with a triple-bit-correcting ECC is examined relative to an approximate 2X “process generation” scaling penalty in area, speed, and power.
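The scaling concern above can be illustrated with a simple back-of-the-envelope model (a sketch only, not the paper's derivation: it assumes independent bit upsets following a Poisson process, a binomial tail for multi-bit failures within one scrub interval, and illustrative parameter values):

```python
import math

def word_failure_prob(bits_per_word, upsets_per_bit_per_s,
                      scrub_interval_s, correctable_bits):
    """Probability that one ECC word accumulates more upsets than the
    code can correct before the next scrub (independent-upset sketch)."""
    # Probability a single bit upsets during one scrub interval
    # (Poisson arrival, so P(at least one upset) = 1 - exp(-rate * t)).
    p_bit = 1 - math.exp(-upsets_per_bit_per_s * scrub_interval_s)
    n = bits_per_word
    # The word survives if at most `correctable_bits` bits flip: binomial sum.
    p_ok = sum(math.comb(n, k) * p_bit**k * (1 - p_bit)**(n - k)
               for k in range(correctable_bits + 1))
    return 1 - p_ok

# Illustrative numbers (not from the paper): a 72-bit codeword,
# a high per-bit upset rate, scrubbed hourly vs. every minute.
p_secc = word_failure_prob(72, 1e-6, 3600, 1)  # single-bit-correcting ECC
p_tecc = word_failure_prob(72, 1e-6, 3600, 3)  # triple-bit-correcting ECC
p_fast = word_failure_prob(72, 1e-6, 60, 1)    # faster scrubbing instead
```

The sketch shows the trade the abstract describes: as the per-bit upset rate rises, a single-bit-correcting code must scrub much faster to hold the same word failure probability, whereas a stronger code achieves it at the original scrub rate.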
international symposium on circuits and systems | 2005
Taek-Jun Kwon; Jeff Sondeen; Jeffrey Draper
Hardware support for floating-point (FP) arithmetic is a mandatory feature of modern microprocessor design. There are many alternatives in floating-point unit (FPU) design, and overall performance can be greatly affected by the organization of a floating-point unit. In this paper, design considerations and trade-off factors are evaluated for two types of floating-point unit architecture and implementation, each optimized under different design goals. The implementation results of the proposed FPUs, based on standard cell methodology in TSMC 0.18 μm technology, show that both designs are well optimized for their target applications. A single-instruction-issue design is implemented in a very small area, while a design capable of concurrently executing FP add and multiply instructions is achievable with only a modest 24% area increase.
international conference on electronics, circuits, and systems | 2008
Taek-Jun Kwon; Jeff Sondeen; Jeffrey Draper
Hardware support for floating-point (FP) arithmetic is an essential feature of modern microprocessor design. Although division and square root are relatively infrequent operations in traditional general-purpose applications, they are indispensable and becoming increasingly important in many modern applications. In this paper, a fused floating-point multiply/divide/square root unit based on a Taylor-series expansion algorithm is presented. The implementation results of the proposed fused unit, based on standard cell methodology in IBM 90 nm technology, show that incorporating the square root function into an existing multiply/divide unit requires only a modest 23% area increase, and the same low latency (12 cycles) is achieved for both divide and square root operations. The proposed arithmetic unit also exhibits a reasonably good area-performance balance.
european solid-state circuits conference | 2003
Joong-Seok Moon; Taek-Jun Kwon; Jeff Sondeen; Jeffrey Draper
The data-intensive architecture (DIVA) system incorporates processing-in-memory (PIM) chips as smart-memory coprocessors to a microprocessor. This architecture exploits inherent memory bandwidth both on chip and across the system to target several classes of bandwidth-limited applications. One of the key capabilities of this architecture is wideword floating-point computation, which enables aggregate floating-point operations. Each PIM chip includes floating-point units supporting eight basic instructions and IEEE-754 compliant rounding and exceptions. Through pipeline scheduling and a hardware-efficient division algorithm, the resulting FPU is well balanced between area and performance. This paper details the design and implementation of this FPU based on standard cell methodology in 0.18 μm CMOS technology. Area, power dissipation, and performance are also discussed.
international symposium on circuits and systems | 2004
Taek-Jun Kwon; Joong-Seok Moon; Jeff Sondeen; Jeffrey Draper
The Data-Intensive Architecture (DIVA) system incorporates Processing-In-Memory (PIM) chips as smart-memory coprocessors to a microprocessor. This architecture exploits inherent memory bandwidth both on chip and across the system to target several classes of bandwidth-limited applications. A key capability of this architecture is the support of parallel single-precision floating-point operations. Each PIM chip includes eight single-precision FPUs, each of which supports eight basic instructions and IEEE-754 compliant rounding and exceptions. Through block sharing and a hardware-efficient division algorithm, the resulting FPU is well-balanced between area and performance. This paper focuses on the novel divide algorithm implemented and documents the fabrication and testing of a prototype FPU based on standard cell methodology in TSMC 0.18 μm CMOS technology.
great lakes symposium on vlsi | 2009
Young Hoon Kang; Jeff Sondeen; Jeffrey Draper
Networks-on-chip (NoCs) are becoming a critical design factor as chip multiprocessors (CMPs) and systems on a chip (SoCs) scale up with technology. Beyond the fundamental benefits of high bandwidth and scalability in on-chip networks, an added multicast capability can further enhance performance by reducing the network load and facilitating the coherence protocols of many-core CMPs [10]. This paper proposes a novel multicast router with dynamic packet fragmentation in on-chip networks. Packet fragmentation is performed to avoid deadlock in blocking situations, releasing the hold of an output virtual channel (VC) and allowing another packet to use the freed VC. Circuit simulation of the design implemented in IBM 90 nm technology shows that the proposed router reduces latency by 38.6% and consumes 9% less energy than a unicast baseline router at baseline saturation.
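The fragmentation idea can be sketched behaviorally as follows (a toy illustration under assumed flit/credit semantics; the function name, flit markers, and credit model are hypothetical, not the paper's router microarchitecture):

```python
def fragment_packet(payload_flits, credits_available):
    """Sketch of dynamic packet fragmentation: when a packet blocks
    mid-transmission, close the partial packet with a tail flit so the
    output virtual channel is released, and re-head the unsent remainder
    as an independent packet that can later arbitrate for any free VC."""
    sent = payload_flits[:credits_available]
    rest = payload_flits[credits_available:]
    fragment = ["HEAD"] + sent + ["TAIL"]          # tail flit frees the held VC
    remainder = (["HEAD"] + rest + ["TAIL"]) if rest else None
    return fragment, remainder

# A 4-flit payload that blocks after 2 flits is split into two
# self-contained packets instead of holding the VC while blocked.
frag, rem = fragment_packet(["A", "B", "C", "D"], 2)
```

The point of the mechanism is the deadlock angle: because no blocked packet ever holds a VC indefinitely, other packets (including multicast branches) can always make progress through the freed channel.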
signal processing systems | 2005
Jeffrey Draper; Tim Barrett; Jeff Sondeen; Sumit Dharampal Mediratta; Chang Woo Kang; Ihn Kim; Gokhan Daglikoca
The Data-Intensive Architecture (DIVA) system employs Processing-In-Memory (PIM) chips as smart-memory coprocessors. This architecture exploits inherent memory bandwidth both on chip and across the system to target several classes of bandwidth-limited applications, including multimedia applications and pointer-based and sparse-matrix computations. The DIVA project has built a prototype development system using PIM chips in place of standard DRAMs to demonstrate these concepts. We have recently ported several demonstration kernels to this platform and have exhibited a speedup of 35X on a matrix transpose operation. This paper focuses on the 32-bit scalar and 256-bit WideWord integer processing components of the first DIVA prototype PIM chip, which was fabricated in TSMC 0.18 μm technology. In conjunction with other publications, this paper demonstrates that impressive gains can be achieved with very little “smart” logic added to memory devices. A second PIM prototype that includes WideWord floating-point capability is scheduled to tape out in August 2003.
midwest symposium on circuits and systems | 2007
Taek-Jun Kwon; Jeff Sondeen; Jeffrey Draper
Hardware support for floating-point (FP) arithmetic is a mandatory feature of modern microprocessor design. Although division and square root are relatively infrequent operations in traditional general-purpose applications, they are indispensable and becoming increasingly important in many modern applications. Therefore, overall performance can be greatly affected by the algorithms and implementations used for designing FP-div and FP-sqrt units. In this paper, a fused floating-point multiply/divide/square root unit based on a Taylor-series expansion algorithm is proposed. We extended an existing multiply/divide fused unit to incorporate the square root function with little area and latency overhead, since Taylor's theorem enables us to compute approximations for many well-known functions with very similar forms. The proposed arithmetic unit exhibits a reasonably good area-performance balance.
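The shared-form property that Taylor's theorem gives divide and square root can be illustrated numerically (a floating-point sketch under assumed operand ranges; the coarse rounding stands in for a small lookup table, and real hardware would use fixed-point multipliers rather than these Python floats):

```python
import math

def taylor_reciprocal(y, terms=6):
    # Sketch: assume y is a normalized significand in [1, 2).
    # Take a coarse approximation c of y (rounded to 1/16, standing in
    # for a lookup table), write y = c*(1 + x) with x small, and expand
    # 1/(1 + x) = 1 - x + x^2 - x^3 + ... as a truncated Taylor series.
    c = round(y * 16) / 16
    x = y / c - 1.0
    series = sum((-x) ** k for k in range(terms))
    return series / c

def taylor_sqrt(y, terms=6):
    # Same structure for square root: sqrt(c*(1+x)) = sqrt(c)*(1+x)^(1/2),
    # expanded with generalized binomial coefficients C(1/2, k).
    # math.sqrt(c) stands in for a small table of roots of the coarse part.
    c = round(y * 16) / 16
    x = y / c - 1.0
    coeff, series = 1.0, 0.0
    for k in range(terms):
        series += coeff * x ** k
        coeff *= (0.5 - k) / (k + 1)   # C(1/2, k+1) from C(1/2, k)
    return math.sqrt(c) * series
```

Both functions reduce to the same skeleton, namely a table lookup, a small residual x, and a short power series in x, which is why folding square root into an existing multiply/divide unit costs so little extra hardware.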
international conference on vlsi design | 2004
Sumit Dharampal Mediratta; Jeff Sondeen; Jeffrey Draper
A key component of the Data-Intensive Architecture (DIVA) is the Processing-In-Memory (PIM) Routing Component (PiRC), which is responsible for efficient communication between PIM chips. This paper presents the design of a low-area, low-delay, low-power router for DIVA. A 58.5% saving in area and an 86% reduction in load on the clock, as compared to an earlier PIM router design, make the presented design ideal for use in the second version of DIVA, for which low area is a critical design requirement. This paper also compares the presented design with the earlier PIM router in terms of delay and power to justify the new design choice.
international symposium on circuits and systems | 2006
Tim Barrett; Sumit Dharampal Mediratta; Taek-Jun Kwon; Ravinder Singh; Sachit Chandra; Jeff Sondeen; Jeffrey Draper
The data-intensive architecture (DIVA) system incorporates processing-in-memory (PIM) chips as smart-memory coprocessors to a microprocessor. This architecture exploits inherent memory bandwidth both on chip and across the system to target several classes of bandwidth-limited applications. A recently developed PIM chip in TSMC 0.18 μm technology incorporates a DDR SDRAM interface for inclusion in commodity systems, such as the HP zx6000 workstation used on this project. Each PIM chip includes eight single-precision floating-point units (FPUs) in the wideword pipeline, enabling significant speedups in the target system. This paper focuses on the integration of new subcomponents into the PIM chip design, system integration, and measured system results, demonstrating the significant GFLOP/W advantage offered by PIM computing.