Publication


Featured research published by Duncan G. Elliott.


IEEE Design & Test of Computers | 1999

Computational RAM: implementing processors in memory

Duncan G. Elliott; M. Stumm; W.M. Snelgrove; C. Cojocaru; R. Mckenzie

Computational RAM is a processor-in-memory architecture that makes highly effective use of internal memory bandwidth by pitch-matching simple processing elements to memory columns. Computational RAM can function either as a conventional memory chip or as a SIMD (single-instruction stream, multiple-data stream) computer. When used as a memory, computational RAM is competitive with conventional DRAM in terms of access time, packaging and cost. Adding logic to memory is not a simple question of bolting together two existing designs. The paper considers how computational RAM integrates processing power with memory by using an architecture that preserves and exploits the features of memory.
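As a rough illustration of the SIMD idea in this abstract, the sketch below models one processing element per memory column, with every element executing the same step on the bit its column delivers. The bit-serial operand layout (one bit per row, least significant bit first), the function name `simd_bit_serial_add`, and the list-of-rows memory model are assumptions made for illustration; the abstract does not specify them.

```python
def simd_bit_serial_add(mem, a_rows, b_rows, sum_rows):
    """Add two operands in every column at once.

    mem is a list of rows; each row holds one bit per column.
    a_rows, b_rows and sum_rows list the row indices of each operand,
    least significant bit first (an assumed layout).
    Returns the final carry bit of every column.
    """
    n_cols = len(mem[0])
    carry = [0] * n_cols  # one carry register per hypothetical PE
    for ra, rb, rs in zip(a_rows, b_rows, sum_rows):
        row_a, row_b = mem[ra], mem[rb]  # one row access feeds all PEs
        # Every PE performs the same full-adder step on its own column.
        mem[rs] = [a ^ b ^ c for a, b, c in zip(row_a, row_b, carry)]
        carry = [(a & b) | (c & (a ^ b))
                 for a, b, c in zip(row_a, row_b, carry)]
    return carry


# Usage: four columns, 3-bit operands stored in rows 0-2 and 3-5.
a_vals, b_vals = [1, 2, 3, 7], [1, 3, 5, 1]
mem = ([[(v >> k) & 1 for v in a_vals] for k in range(3)]
       + [[(v >> k) & 1 for v in b_vals] for k in range(3)]
       + [[0] * 4 for _ in range(3)])
carry_out = simd_bit_serial_add(mem, [0, 1, 2], [3, 4, 5], [6, 7, 8])
```

The number of steps depends only on the operand width, not on the number of columns, which is why wide internal memory bandwidth translates directly into parallelism in this kind of model.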


Integration | 2008

A scalable LDPC decoder ASIC architecture with bit-serial message exchange

Tyler L. Brandon; Robert Hang; Gary Block; Vincent C. Gaudet; Bruce F. Cockburn; Sheryl L. Howard; Christian Giasson; Keith Boyle; Paul A. Goud; Siavash Sheikh Zeinoddin; Anthony Rapley; Stephen Bates; Duncan G. Elliott; Christian Schlegel

We present a scalable bit-serial architecture for ASIC realizations of low-density parity check (LDPC) decoders. To demonstrate the architecture's potential, we describe a decoder implementation for a (256,128) regular-(3,6) LDPC code that has a decoded information throughput of 250 Mbps, a core area of 6.96 mm^2 in 180-nm 6-metal CMOS, and an energy efficiency of 7.56 nJ per uncoded bit at low signal-to-noise ratios. The decoder is fully block-parallel, with all bits of each 256-bit codeword being processed by 256 variable nodes and 128 parity check nodes that together form an 8-stage iteration pipeline. Extrinsic messages are exchanged bit-serially between the variable and parity check nodes to significantly reduce the interleaver wiring. Parity check node processing is also bit-serial. The silicon implementation performs 32 iterations of the min-sum decoding algorithm on two staggered codewords in the same pipeline. The results of a supplementary layout study show that the reduced wiring congestion makes the decoder readily scalable up to the longer kilobit-size LDPC codewords that appear in important emerging communication standards.
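For readers unfamiliar with min-sum decoding, the following is a minimal sketch of the textbook min-sum update at a single parity check node. It is written word-parallel in software and is not the bit-serial circuit the abstract describes; the function name `min_sum_check_node` and the example message values are hypothetical.

```python
def min_sum_check_node(msgs):
    """Textbook min-sum update for one parity check node.

    msgs holds the incoming messages (signed reliabilities), one per edge.
    The outgoing message on each edge takes the product of the signs and
    the minimum magnitude of all the OTHER incoming messages.
    """
    out = []
    for i in range(len(msgs)):
        others = msgs[:i] + msgs[i + 1:]
        sign = 1
        for m in others:
            if m < 0:
                sign = -sign
        out.append(sign * min(abs(m) for m in others))
    return out


# A degree-6 check node, matching the regular-(3,6) code in the abstract.
extrinsic = min_sum_check_node([3, -1, 4, -2, 5, 6])
```

Because each output needs only sign products and a minimum, the operation lends itself to serial, comparison-based hardware, which is consistent with the bit-serial node processing mentioned above.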


IEEE Transactions on Circuits and Systems | 2006

Termination Sequence Generation Circuits for Low-Density Parity-Check Convolutional Codes

Stephen Bates; Duncan G. Elliott; Ramkrishna Swamy

Low-density parity-check convolutional codes (LDPC-CCs) complement their popular block-oriented counterparts and may be more suitable in certain communication applications. These include streaming voice, video, and packet switching networks. In order to use these codes efficiently we must generate termination sequences similar to those used in conventional convolutional codes. In this paper, we present a construction method for termination sequence generation circuits suitable for field-programmable gate arrays and application-specific integrated circuits. This method uses linear algebra to determine the termination sequence for a small number of states of the encoder and converts these solutions into a sequential circuit. Results are presented for several realizations of termination circuits for a (128,3,6) LDPC-CC.
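The abstract does not give the linear system that is solved, so the sketch below only illustrates the general kind of computation involved: solving A x = b over GF(2) by Gaussian elimination. The function name `solve_gf2` and the small example matrix are hypothetical and are not taken from the paper; in a termination setting, x would stand for the unknown termination bits and b for a condition derived from the encoder state.

```python
def solve_gf2(A, b):
    """Solve A x = b over GF(2) with Gauss-Jordan elimination.

    A is a list of rows of 0/1 values and b a list of 0/1 values.
    Returns one solution (free variables set to 0), or None if the
    system is inconsistent.
    """
    rows = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    n = len(A[0])
    pivots = []
    r = 0
    for c in range(n):
        # Find a row at or below r with a 1 in column c.
        p = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        # Clear column c in every other row (addition over GF(2) is XOR).
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                rows[i] = [x ^ y for x, y in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    if any(row[n] for row in rows[r:]):
        return None  # a row reads 0 = 1
    x = [0] * n
    for i, c in enumerate(pivots):
        x[c] = rows[i][n]
    return x


# Hypothetical 3x3 example.
A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
b = [1, 0, 1]
x = solve_gf2(A, b)
```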


Memory Technology Design and Testing | 2002

An investigation into crosstalk noise in DRAM structures

Michael Redeker; Bruce F. Cockburn; Duncan G. Elliott

The 2001 ITRS roadmap predicts continued aggressive progress towards deep submicron linewidths for at least the next 15 years. In this article we describe the results of a simulation study into the effects of crosstalk among DRAM wordlines and bitlines for present and future technology nodes predicted by the roadmap. An analog simulator was used to solve the associated transmission line equations derived from Maxwell's equations in the time domain. Hence, we considered not only interconnect resistances and capacitances, but also inductances and realistic wave propagation effects. The circuit parameters of the simulation models were extracted from standard DRAM geometries implied by the roadmap data. Various bitline-bitline and wordline-wordline coupling scenarios were then studied in simulation. Our results suggest that down to the 22-nm node, single bitline twisting will continue to be effective against bitline-bitline coupling, but that wordline-wordline coupling effects will become more problematic.


Compound Semiconductor Integrated Circuit Symposium | 2007

Design and Test of a 175-Mb/s, Rate-1/2 (128,3,6) Low-Density Parity-Check Convolutional Code Encoder and Decoder

Ramkrishna Swamy; Stephen Bates; Tyler L. Brandon; Bruce F. Cockburn; Duncan G. Elliott; John C. Koob; Zhengang Chen

Low-density parity-check block codes (LDPC-BCs) are quickly becoming the forward error correcting code of choice for emerging communication standards. However, low-density parity-check convolutional codes (LDPC-CCs), the convolutional counterpart of LDPC-BCs, seem to be better suited in applications with streaming data or variable sized packets. A rate-1/2, (128,3,6) LDPC-CC ASIC has been implemented in 180-nm, 1.8-V CMOS technology. We present the VLSI architecture of a register-based LDPC-CC encoder and decoder that includes an on-chip, pseudo-random additive white Gaussian noise channel emulator. The decoder comprises a pipeline of ten identical processing units and attains up to 175 Mb/s of decoded throughput.


IEEE Transactions on Circuits and Systems | 2010

Jointly Designed Architecture-Aware LDPC Convolutional Codes and High-Throughput Parallel Encoders/Decoders

Zhengang Chen; Tyler L. Brandon; Duncan G. Elliott; Stephen Bates; Witold A. Krzymien; Bruce F. Cockburn

A novel design approach is proposed for low-density parity-check convolutional codes (LDPC-CCs) that jointly optimizes the code, encoder and decoder to achieve high-throughput parallel encoding and decoding. A series of implementation-oriented constraints are applied to construct architecture-aware (AA) codes by introducing algebraic structures into the parity-check matrix. The resulting AA codes have bit error rate performance comparable to other published LDPC-CCs. Given these AA LDPC-CCs, new architectures are proposed for a parallel LDPC-CC encoder with built-in termination and an LDPC-CC decoder that is parallel in the node dimension as well as pipelined in the iteration dimension. ASIC synthesis results for a 90-nm CMOS process show that the proposed encoder and the decoding processor achieve 2.0-Gbps throughputs at 250-MHz clock frequencies within silicon areas of 0.1 mm^2 and 1 mm^2, respectively.


Global Communications Conference | 2006

CTH08-5: Efficient Encoding and Termination of Low-Density Parity-Check Convolutional Codes

Zhengang Chen; Stephen Bates; Duncan G. Elliott; Tyler L. Brandon

Low-density parity-check convolutional codes (LDPC-CCs) have been shown to have similar capacity-approaching performance to LDPC block codes. Their encoder structure is simple and efficient. However, the encoder termination, which is required when the codes are applied to finite-length data frames, increases the encoder complexity and reduces the effective code rate. The LDPC-CC encoding and termination problems are discussed in this paper. A novel all-phase termination scheme is proposed with lower implementation complexity and a smaller loss in code rate than existing methods. Finally, a system architecture for the LDPC-CC encoder with all-phase termination is given with some analyses.


IEEE Transactions on Very Large Scale Integration Systems | 2005

Design of a 3-D fully depleted SOI computational RAM

John C. Koob; Daniel A. Leder; Raymond J. Sung; Tyler L. Brandon; Duncan G. Elliott; Bruce F. Cockburn; Lisa G. McIlrath

We introduce a three-dimensional (3-D) processor-in-memory integrated circuit design that provides progressively increasing processing power as the number of stacked dies increases, while incurring no extra design effort or mask sets. Innovative techniques for processor/memory redundancy and fast global bus evaluation are described. The architecture can be augmented with a nearest-neighbor physical 3-D communications network that can substantially reduce interconnect lengths and relieve routing congestion. The test chip, with 128 Kb of memory and 512 processing elements (PEs) on two fully depleted silicon-on-insulator (SOI) dies, can achieve a peak of 170 billion-bit-operations per second at 400 MHz.


Memory Technology Design and Testing | 1999

A comparative simulation study of four multilevel DRAMs

Gershom Birk; Duncan G. Elliott; Bruce F. Cockburn

Multilevel DRAM (MLDRAM) attempts to increase storage density by recording more than one bit per cell. Several different two-bit-per-cell schemes have been described in the literature; however, it is difficult to compare them directly because the original papers use different technologies and operating conditions. This paper presents a detailed simulation study that compares three published MLDRAM schemes, along with a new MLDRAM scheme that combines the speed of an MLDRAM proposed by Furuyama et al. (1989) and the noise cancellation techniques of an MLDRAM proposed by Gillingham (1996). Our SPICE simulation models use the same array size and process models for each to allow us to make direct comparisons.
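To make the two-bit-per-cell idea concrete, here is a minimal sketch that stores a bit pair as one of four voltage levels and recovers it by comparing against three reference thresholds. The evenly spaced levels, the midpoint thresholds, and the names `encode` and `sense` are assumptions chosen for illustration; they do not describe any of the specific MLDRAM schemes compared in the paper.

```python
def encode(bits, vdd=1.0):
    """Map a (msb, lsb) bit pair to one of four evenly spaced cell voltages."""
    code = (bits[0] << 1) | bits[1]
    return vdd * code / 3


def sense(v, vdd=1.0):
    """Recover the bit pair by counting how many thresholds the voltage exceeds."""
    thresholds = [vdd / 6, vdd / 2, 5 * vdd / 6]  # midpoints between levels
    code = sum(v > t for t in thresholds)
    return (code >> 1, code & 1)


# Round trip for all four stored values.
pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
recovered = [sense(encode(p)) for p in pairs]
```

Under these assumptions the spacing between adjacent levels is one third of the supply voltage, so the tolerable noise on a stored level is one sixth of the supply. This reduced margin is one reason noise cancellation matters when more than one bit is stored per cell.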


IEEE Transactions on Circuits and Systems | 2012

Deeply Pipelined Digit-Serial LDPC Decoding

Philip A. Marshall; Vincent C. Gaudet; Duncan G. Elliott

Highly parallel VLSI implementations of low-density parity-check (LDPC) decoders have a large number of interconnections, which can result in designs with low logic density. Bit-serial architectures have been developed that reduce the number of wires needed; however, they do not fully realize the potential for deeply pipelined serial data processing. Digit-online arithmetic allows operations to be performed in a serial, digit-by-digit manner, making it ideal for use in implementing a digit-serial LDPC decoder. Digit-online circuits for the primitive operations required for an offset min-sum LDPC decoder are simple, and allow deep pipelining at the digit level. A new hardware architecture for LDPC decoding is demonstrated, using redundant number systems for the internal representation of values. We present post-layout decoder results for the (2048, 1723) 10GBASE-T LDPC code in a general-purpose 65 nm CMOS technology. The decoder requires a core area of 10.89 mm^2 and operates at a clock frequency of 980 MHz. The decoder can simultaneously decode two 4-bit frames at 41.8 Gbit/s or one 10-bit frame at 20.9 Gbit/s.

Collaboration

Top Co-Authors

Jie Han

University of Alberta
