Jeff H. Derby
Research Triangle Park
Publications
Featured research published by Jeff H. Derby.
International Conference on Acoustics, Speech, and Signal Processing | 2003
Jeff H. Derby; Jaime H. Moreno
A low-power, high-performance, compiler-friendly DSP core has been under development at the IBM Communications Research & Development Center as part of its eLite DSP project. This DSP exploits instruction-level parallelism by packing multiple instructions into 64-bit long-instruction words, and data-level parallelism through SIMD techniques, so that SIMD operations can be applied both to dynamically composed vectors and to packed vectors. Dynamic composition of vectors is made possible by a vector pointer mechanism, which permits very flexible addressing of groups of four 16-bit elements in a large, multiport, scalar register file. This paper provides an overview of the architecture of this DSP core, with a focus on its SIMD features. We describe these features in some detail and discuss how they are used, taking a block FIR filter and a radix-4 FFT as examples.
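To make the idea of a dynamically composed vector concrete, here is a minimal sketch of a block FIR filter computed four outputs at a time. It assumes a hypothetical model, not the eLite instruction set: input samples sit in a scalar register file (a Python list), a vector pointer is simply a tuple of four register indices with stride 1, and `compose_vector` and `vmac` are illustrative names. Accumulators are treated as unbounded integers, so 16-bit saturation and register-port limits are not modeled.

```python
def compose_vector(register_file, pointer):
    """Gather the four 16-bit elements named by a vector pointer into one SIMD vector."""
    return [register_file[i] for i in pointer]


def vmac(acc, vec, scalar):
    """Four-lane multiply-accumulate: acc[i] += vec[i] * scalar."""
    return [a + v * scalar for a, v in zip(acc, vec)]


def block_fir(x, h):
    """y[n] = sum_k h[k] * x[n - k], computed four outputs per block."""
    if len(x) % 4:
        raise ValueError("this sketch assumes len(x) is a multiple of 4")
    pad = len(h) - 1
    # Zero history followed by the samples, all held in the scalar register file.
    register_file = [0] * pad + list(x)
    y = []
    for n0 in range(0, len(x), 4):
        acc = [0, 0, 0, 0]
        for k, hk in enumerate(h):
            base = n0 - k + pad  # register holding x[n0 - k]
            pointer = (base, base + 1, base + 2, base + 3)  # hypothetical stride-1 pointer
            acc = vmac(acc, compose_vector(register_file, pointer), hk)
        y.extend(acc)
    return y
```

For example, `block_fir([1, 2, 3, 4, 5, 6, 7, 8], [1, 2, 3])` yields the same eight outputs as a direct scalar FIR loop; the point of the pointer is that each tap reuses samples already in registers, shifted by one position, without reloading them as a packed vector.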
Global Communications Conference | 2001
Jeff H. Derby
Parallelization of the linear-feedback shift register used to compute the CRC has long been recognized as a way to increase throughput. In all applications of this technique reported previously, the achievable increase in throughput is limited by an increase in the circuit complexity within the feedback loop; for a circuit that processes M bits of the input sequence in parallel, the throughput increase, or speed-up, appears to be asymptotically limited to M/2. In this paper, we develop a state-space transformation for the M-bits-at-a-time CRC system that reduces the complexity of its feedback loop to exactly that of the original bit-at-a-time system. This simplification comes at the cost of increased circuit complexity outside the feedback loop; however, these blocks can be pipelined. Thus the transformed system can achieve a full speed-up factor of M compared to the bit-at-a-time system. The transformation introduced in this paper is general in that it is valid in any field. It can thus be applied to encoders for cyclic codes over arbitrary finite fields that process M elements of an input sequence in parallel.
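The starting point of the abstract, the M-bits-at-a-time CRC viewed as a state-space system over GF(2), can be sketched as follows. With a bit-serial LFSR s' = A s + b u, running M clocks gives s_{k+M} = A^M s_k + sum_i A^(M-1-i) b u_{k+i}. The sketch assumes a hypothetical CRC-8 with generator x^8 + x^2 + x + 1 (0x07), zero initial state and MSB-first bits, and stores GF(2) matrices as lists of column bitmasks. It only shows the untransformed M-step system, whose feedback matrix A^M is denser than A; the paper's state-space transformation, which brings the loop back to the complexity of A, is not reproduced here.

```python
# Hypothetical example: CRC-8 with g(x) = x^8 + x^2 + x + 1 (0x07), init 0, MSB first.
WIDTH = 8
MASK = (1 << WIDTH) - 1
POLY = 0x07  # low-order coefficients of g(x); the x^8 term is implicit


def step(state, bit):
    """One clock of the bit-at-a-time LFSR: s' = A s + b u over GF(2)."""
    feedback = ((state >> (WIDTH - 1)) & 1) ^ bit
    state = (state << 1) & MASK
    return state ^ POLY if feedback else state


def mat_vec(cols, vec):
    """GF(2) matrix (list of column bitmasks) times a bit vector."""
    out = 0
    for j, col in enumerate(cols):
        if (vec >> j) & 1:
            out ^= col
    return out


def parallel_matrices(m):
    """Return (A^M, B) so that s_{k+M} = A^M s_k + B [u_k, ..., u_{k+M-1}]."""
    a = [step(1 << j, 0) for j in range(WIDTH)]  # columns of A
    a_m = [1 << j for j in range(WIDTH)]         # start from the identity
    for _ in range(m):
        a_m = [mat_vec(a, col) for col in a_m]
    powers_b = [POLY]                            # A^0 b, since b = step(0, 1) = POLY
    for _ in range(m - 1):
        powers_b.append(mat_vec(a, powers_b[-1]))
    b_cols = [powers_b[m - 1 - i] for i in range(m)]  # input bit i sees A^(M-1-i) b
    return a_m, b_cols


def message_bits(data):
    for byte in data:
        for k in range(7, -1, -1):
            yield (byte >> k) & 1


def crc_bitwise(data):
    state = 0
    for bit in message_bits(data):
        state = step(state, bit)
    return state


def crc_parallel(data, m):
    """Consume M message bits per clock using the M-step state equations."""
    bits = list(message_bits(data))
    if len(bits) % m:
        raise ValueError("message length must be a multiple of M bits")
    a_m, b_cols = parallel_matrices(m)
    state = 0
    for i in range(0, len(bits), m):
        block = sum(bit << j for j, bit in enumerate(bits[i:i + m]))
        state = mat_vec(a_m, state) ^ mat_vec(b_cols, block)
    return state


def ones(cols):
    """Count of nonzero matrix entries: a rough proxy for XOR gates in the loop."""
    return sum(bin(c).count("1") for c in cols)
```

Comparing `ones(parallel_matrices(8)[0])` with `ones(parallel_matrices(1)[0])` shows the feedback matrix growing denser as M increases, which is the loop-complexity problem the abstract describes; the B matrix lies outside the loop and could be pipelined.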
Signal Processing Systems | 2000
John Glossner; Jaime H. Moreno; Mayan Moudgill; Jeff H. Derby; Erdem Hokenek; David Meltzer; Uzi Shvadron; Malcolm Scott Ware
We review the evolution of DSP architectures and compiler technology, and describe how compiler techniques are being used to optimize emerging DSP architectures. Such new architectures are characterized by the exploitation of data- and instruction-level parallelism while remaining an amenable target for a compiler, thereby reducing or eliminating the need to rely on assembly-language programming and/or architecture-specific compiler intrinsics to achieve highly efficient code. We also summarize our research results on an ultra-low-power compilable DSP architecture.
International Parallel and Distributed Processing Symposium | 2007
Stephen L. Olivier; Jan F. Prins; Jeff H. Derby; Ken V. Vu
The Cell processor offers substantial computational power, which can be used effectively only if application design and implementation are tuned to the Cell architecture. In this paper, we examine application characteristics that facilitate efficient use of the Cell processor and those that present obstacles to it. We also consider possible solutions designed to mitigate these inefficiencies. The target application in our study is the GROMACS molecular dynamics package. We have accelerated its most frequently used compute-intensive kernel while maintaining the constraints imposed by the structure of the surrounding program. The key contribution of this paper is the consideration of the kernel in the context of a complex end-to-end application with irregular data and code patterns, rather than as an isolated kernel. For this challenging scenario, our results show a 2x speedup over hand-tuned VMX/SSE code running on high-end PowerPC and x86 uniprocessor machines.
Journal of High Speed Networks | 1992
Israel Cidon; Jeff H. Derby; Inder S. Gopal; Bharath Kumar Kadaba
Fast Packet Switching (FPS) is emerging as the preferred technology for future high speed, integrated networks. Asynchronous Transfer Mode (ATM) is an approach to FPS that is in the process of standardization and is the preferred approach of the carrier community. Concurrently, alternative approaches to FPS based on variable sized packets have been proposed by segments of the data communication industry. These approaches include Frame Relay and an approach developed by IBM called PARIS. The purpose of this paper is to examine the suitability of ATM for data communications relative to some of these alternative approaches.
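One concrete quantity behind the question of ATM's suitability for data traffic is the cost of carrying variable-length packets in fixed 53-byte cells (5-byte header, 48-byte payload). The sketch below assumes AAL5 encapsulation (an 8-byte trailer, padded to a whole number of cells), which the abstract does not mention, and compares it with a variable-size frame carrying a fixed per-frame overhead whose value is a hypothetical parameter, not a figure for Frame Relay or PARIS.

```python
import math

CELL_BYTES = 53    # ATM cell: 5-byte header + 48-byte payload
CELL_PAYLOAD = 48
AAL5_TRAILER = 8   # assumption: AAL5 encapsulation (not named in the abstract)


def atm_wire_bytes(packet_len: int) -> int:
    """Link bytes needed to carry one packet segmented into cells via AAL5."""
    cells = math.ceil((packet_len + AAL5_TRAILER) / CELL_PAYLOAD)
    return cells * CELL_BYTES


def frame_wire_bytes(packet_len: int, per_frame_overhead: int) -> int:
    """Link bytes for a variable-size frame with a fixed per-frame overhead."""
    return packet_len + per_frame_overhead


def efficiency(packet_len: int, wire_bytes: int) -> float:
    return packet_len / wire_bytes


if __name__ == "__main__":
    OVERHEAD = 8  # hypothetical per-frame overhead for the variable-size scheme
    for n in (40, 41, 64, 576, 1500):
        atm = atm_wire_bytes(n)
        frame = frame_wire_bytes(n, OVERHEAD)
        print(f"{n:5d} B: ATM {atm:5d} B ({efficiency(n, atm):.0%}), "
              f"frame {frame:5d} B ({efficiency(n, frame):.0%})")
```

The jump from one cell at 40 bytes to two cells at 41 bytes shows how padding, on top of the fixed 5-byte header per cell, makes ATM's efficiency depend strongly on packet length, whereas the variable-size frame's overhead stays constant.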
Computer Networks and ISDN Systems | 1997
Willibald A. Doeringer; Douglas Dykeman; M. Peters; Haldon J Sandick; Ken V. Vu; Jeff H. Derby
We present a comprehensive architectural model of a modular communication infrastructure that provides interconnection between a wide variety of networks with like protocols across a common backbone network. We show that our concepts provide an appropriate framework for synchronous and asynchronous virtual channels, LAN interconnection, and standard internetworking, thus covering the most typical current connectivity requirements. The base architecture has been implemented, and the first products are being offered with interfaces for Frame Relay, Fiber Channel Standard, ATM, voice, and clear-channel services.
Computing Frontiers | 2006
Jeff H. Derby; Robert K. Montoye; José E. Moreira
There is increasing interest in the use of accelerators in computer systems. Accelerators are processor-attached hardware units that can perform certain functions faster than the conventional general-purpose processor. In this paper, we describe the VICTORIA PowerPC architecture, which is based on the iVMX accelerator technology. The iVMX accelerator extends the existing VMX architecture with indirect register addressing. This approach greatly extends the architected register space and opens the door to highly optimized vector algorithms that can sustain very high processing rates. The large register space is directly controlled by the executing code and offers enough storage to hold sizeable intermediate results, which helps reduce the negative effects of limited memory bandwidth and high memory latency. The iVMX accelerator is an example of an in-line accelerator; that is, the instructions that drive the accelerator are part of the same stream that drives the main processor. Compared to off-line accelerators, which execute their own instruction stream, in-line accelerators present a much more convenient programming model.
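The following toy model illustrates what indirect register addressing buys: an instruction names one of a few small register fields, and each field is resolved through a map entry to a register in a much larger physical file, so code can walk through data held in registers instead of reloading it from memory. Everything here is hypothetical (the class and method names, 32 map entries, a 1024-entry file, four lanes per vector); it is not the iVMX instruction set.

```python
LANES = 4  # assumption: four lanes per vector, as in a 128-bit VMX register of 32-bit words


class IndirectVectorRegisters:
    """Toy model of indirect register addressing (hypothetical, not the iVMX ISA).

    Instructions name one of `num_map` register fields; each field is resolved
    through a map entry to a register in a much larger physical file.
    """

    def __init__(self, num_physical: int, num_map: int = 32):
        self.regs = [[0] * LANES for _ in range(num_physical)]
        self.map = list(range(num_map))  # field i initially points at register i

    def set_map(self, field: int, physical: int) -> None:
        self.map[field] = physical

    def vadd(self, dst: int, a: int, b: int) -> None:
        """dst <- a + b lane by lane, each operand resolved through the map."""
        va = self.regs[self.map[a]]
        vb = self.regs[self.map[b]]
        self.regs[self.map[dst]] = [x + y for x, y in zip(va, vb)]


def sum_resident_block(unit, first: int, count: int, acc_reg: int) -> list:
    """Sum vectors already held in registers first .. first + count - 1.

    Field 0 is the accumulator and field 1 walks the block, so the loop
    issues no loads: the data stays in the large register file.
    """
    unit.regs[acc_reg] = [0] * LANES
    unit.set_map(0, acc_reg)
    for p in range(first, first + count):
        unit.set_map(1, p)  # retarget field 1 instead of loading from memory
        unit.vadd(0, 0, 1)
    return unit.regs[acc_reg]


if __name__ == "__main__":
    unit = IndirectVectorRegisters(num_physical=1024)
    for p in range(100, 200):
        unit.regs[p] = [p, 1, 0, -p]
    print(sum_resident_block(unit, 100, 100, acc_reg=0))
```

In this model the same two-operand `vadd` reaches any of the 1024 registers, which is the sense in which indirection enlarges the architected register space without widening the instruction's register fields.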
Global Communications Conference | 1988
R.J. Cherukuri; Jeff H. Derby; D. Japel
This paper addresses the potential for compatibility between the link-layer protocols used by LAN (local area network) terminals and those used by ISDN (integrated services digital network) terminals. The authors find that although the LAN logical link control (IEEE 802.2 Type 2) and the ISDN D-channel link-access protocol (LAPD) are not completely compatible with one another, it is possible to define an extended version of LAPD, referred to as LAPD+, that can interoperate with both LAPD and IEEE 802.2. They describe the necessary modifications to LAPD and point out that all but one are already included in a modified LAPD, referred to as LAP-V.120, that is defined in CCITT Draft Recommendation V.120. They suggest that the remaining incompatibility be resolved by modifying IEEE 802.2. Finally, the authors describe several additional extensions to LAPD that could enhance the general utility of LAPD+ for high-speed data communication in the ISDN environment.
Computing Frontiers | 2010
Alejandro Rico; Jeff H. Derby; Robert K. Montoye; Timothy H. Heil; Chen-Yong Cher; Pradip Bose
In this paper, we evaluate the performance and power of a processor-attached in-line accelerator. The accelerator provides high-performance SIMD computing and power efficiency by means of a very large register file and a set of vector multimedia extensions based on IBM's PowerPC VMX. Our experiments show significant performance improvements and power reduction compared to a baseline vector execution unit, mainly due to the drastic decrease in memory accesses made possible by the software-managed locality of the very large register file. Total execution time is reduced by 61% on average, while energy consumption is 55% lower.
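Reading the two reported averages as if they described a single workload (an approximation, since averages of per-benchmark ratios do not compose exactly), they imply roughly a 2.6x speedup, an energy-delay product about 82% lower, and an average power draw about 15% higher than the baseline vector unit. The arithmetic, with T for execution time, E for energy and P for average power (symbols introduced here, not taken from the paper):

```latex
\begin{aligned}
\frac{T_{\mathrm{base}}}{T_{\mathrm{acc}}} &\approx \frac{1}{1 - 0.61} \approx 2.56 \\
\frac{E_{\mathrm{acc}}}{E_{\mathrm{base}}} &\approx 1 - 0.55 = 0.45 \\
\frac{(E\,T)_{\mathrm{acc}}}{(E\,T)_{\mathrm{base}}} &\approx 0.45 \times 0.39 \approx 0.18 \\
\frac{P_{\mathrm{acc}}}{P_{\mathrm{base}}}
  = \frac{E_{\mathrm{acc}} / T_{\mathrm{acc}}}{E_{\mathrm{base}} / T_{\mathrm{base}}}
  &\approx \frac{0.45}{0.39} \approx 1.15
\end{aligned}
```

The higher average power follows only because the time saving outpaces the energy saving; it is an inference from the two averages, not a figure reported in the paper.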
IEEE Transactions on Signal Processing | 1996
Jeff H. Derby
This paper comments on a paper by Blommer and Wakefield (IEEE Trans. Signal Processing, vol. 42, no. 11, pp. 3245-3248, Nov. 1994). The author shows that the log spectral matching technique described by Blommer and Wakefield is equivalent to least-squares cepstral matching, whose application to the construction of pole-zero approximations was reported in an earlier publication. We review several key characteristics of the cepstral matching procedure and show that it is computationally much simpler than the log spectral matching technique.
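One way to see why cepstral matching can be cheap: for a minimum-phase pole-zero model H(z) = prod(1 - z_k z^-1) / prod(1 - p_k z^-1), expanding log H(z) term by term gives the cepstrum in closed form, c[n] = (sum_k p_k^n - sum_k z_k^n) / n for n >= 1, so evaluating a candidate model needs no logarithm or transform of its spectrum. The sketch below computes that closed form, an unweighted least-squares cepstral error (the weighting and the omission of the gain term c[0] are assumptions), and, for comparison only, the same coefficients obtained numerically from log|H| on a DFT grid. It is an illustration, not the fitting procedure from the paper.

```python
import cmath
import math


def model_cepstrum(poles, zeros, n_max):
    """c[1..n_max] of a minimum-phase pole-zero model (all roots inside |z| = 1).

    Expanding log H(z) in powers of z^-1 gives c[n] = (sum p_k**n - sum z_k**n) / n.
    """
    return [
        (sum(p ** n for p in poles) - sum(z ** n for z in zeros)).real / n
        for n in range(1, n_max + 1)
    ]


def ls_cepstral_error(target, poles, zeros):
    """Unweighted least-squares distance between a target cepstrum and the model's."""
    model = model_cepstrum(poles, zeros, len(target))
    return sum((t - m) ** 2 for t, m in zip(target, model))


def numeric_cepstrum(poles, zeros, n_max, n_fft=512):
    """Same coefficients the slow way: log|H| on a DFT grid, then an inverse DFT."""
    log_mag = []
    for k in range(n_fft):
        zinv = cmath.exp(-2j * math.pi * k / n_fft)
        h = 1 + 0j
        for z in zeros:
            h *= 1 - z * zinv
        for p in poles:
            h /= 1 - p * zinv
        log_mag.append(math.log(abs(h)))
    # For a minimum-phase H the real cepstrum equals c[n] / 2 for n >= 1, hence the 2.
    return [
        2 * sum(v * math.cos(2 * math.pi * k * n / n_fft) for k, v in enumerate(log_mag)) / n_fft
        for n in range(1, n_max + 1)
    ]


if __name__ == "__main__":
    poles = [0.9 * cmath.exp(0.4j), 0.9 * cmath.exp(-0.4j)]  # conjugate pair
    zeros = [0.5]
    fast = model_cepstrum(poles, zeros, 10)
    slow = numeric_cepstrum(poles, zeros, 10)
    print(max(abs(a - b) for a, b in zip(fast, slow)))
    print(ls_cepstral_error(fast, poles, [0.4]))
```

The closed-form path costs a few power sums per coefficient, while the numeric path needs a full frequency grid and a logarithm per candidate model; an optimizer minimizing `ls_cepstral_error` over pole and zero locations would call only the cheap path.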