Ahmad Darabiha
University of Toronto
Publication
Featured research published by Ahmad Darabiha.
IEEE Journal of Solid-State Circuits | 2008
Ahmad Darabiha; A. Chan Carusone; Frank R. Kschischang
This paper investigates VLSI architectures for low-density parity-check (LDPC) decoders amenable to low-voltage and low-power operation. First, a highly-parallel decoder architecture with low routing overhead is described. Second, we propose an efficient method to detect early convergence of the iterative decoder and terminate the computations, thereby reducing dynamic power. We report on a bit-serial fully-parallel LDPC decoder fabricated in a 0.13-μm CMOS process and show how the above techniques affect the power consumption. With early termination, the prototype is capable of decoding with 10.4 pJ/bit/iteration, while performing within 3 dB of the Shannon limit at a BER of 10^-5 and with 3.3 Gb/s total throughput. If operated from a 0.6 V supply, the energy consumption can be further reduced to 2.7 pJ/bit/iteration while maintaining a total throughput of 648 Mb/s, due to the highly-parallel architecture. To demonstrate the applicability of the proposed architecture for longer codes, we also report on a bit-serial fully-parallel decoder for the (2048, 1723) LDPC code in the 10GBase-T standard synthesized with a 90-nm CMOS library.
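As a rough illustration of early termination in general, the sketch below stops an iterative decoder as soon as the current hard decisions satisfy every parity check. This is the textbook syndrome test, not necessarily the detection method proposed in the paper. The toy parity-check matrix `H`, the function names, and the simple bit-flipping decoder standing in for a real message-passing decoder are all assumptions made for the example.

```python
# Hypothetical toy parity-check matrix (a (7,4) Hamming code), for illustration only.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def syndrome(H, bits):
    """Return one parity bit per check row; 0 means that check is satisfied."""
    return [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]


def decode_with_early_termination(H, received, max_iters=10):
    """Bit-flipping decoder (a stand-in) that stops once all checks pass.

    Returns (decoded_bits, iterations_used).
    """
    bits = list(received)
    for it in range(max_iters):
        s = syndrome(H, bits)
        if not any(s):
            # Converged: the remaining iterations are skipped entirely.
            return bits, it
        # For each bit, count the unsatisfied checks it takes part in.
        votes = [
            sum(s[r] * H[r][c] for r in range(len(H)))
            for c in range(len(bits))
        ]
        # Flip the bit involved in the most unsatisfied checks.
        bits[votes.index(max(votes))] ^= 1
    return bits, max_iters
```

A clean codeword exits before doing any work, and a lightly corrupted one exits as soon as it is repaired, so iterations are spent only on frames that need them.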
computer vision and pattern recognition | 2003
Ahmad Darabiha; Jonathan Rose; W. James MacLean
This paper describes the implementation of a stereo depth measurement algorithm in hardware on field programmable gate arrays (FPGAs). This system generates 8-bit sub-pixel disparities on 256 by 360 pixel images at video rate (30 frames/sec). The algorithm implemented is a multi-resolution, multi-orientation phase-based technique called local weighted phase-correlation (Fleet, 1994). The hardware implementation runs more than 300 times faster than the same algorithm in software. In this paper, we describe the programmable hardware platform, the base stereo vision algorithm and the design of the hardware. We include various trade-offs required to make the hardware small enough to fit on our system and fast enough to work at video rate. We also show sample outputs from the functioning hardware. Although this paper is specifically focused on phase-based stereo vision FPGA realizations, most of the design issues are common to other DSP and vision applications.
international symposium on circuits and systems | 2006
Ahmad Darabiha; Anthony Chan Carusone; Frank R. Kschischang
We propose a bit-serial LDPC decoding scheme to reduce interconnect complexity in fully-parallel low-density parity-check decoders. Bit-serial decoding also facilitates efficient implementation of wordlength-programmable LDPC decoding, which is essential for gear shift decoding. To simplify the implementation of bit-serial decoding, we propose a new approximation to the check update function in the min-sum decoding algorithm. The new check update rule computes only the absolute minimum and applies a correction to outgoing messages if required. We present a 650-Mbps bit-serial (480, 355) RS-based LDPC decoder implemented on a single Altera Stratix EP1S80 FPGA device. To our knowledge, this is the fastest FPGA-based LDPC decoder reported in the literature.
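To make the check update concrete, here is a minimal sketch comparing the standard min-sum rule with a simplified variant that tracks only the single overall minimum magnitude. The function names are hypothetical, and the simplified variant applies no correction at all: the abstract does not describe the paper's correction rule, so it is deliberately left out here.

```python
def check_update_min_sum(incoming):
    """Standard min-sum: each outgoing message excludes its own input."""
    out = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        negatives = sum(1 for m in others if m < 0)
        sign = -1 if negatives % 2 else 1
        out.append(sign * min(abs(m) for m in others))
    return out


def check_update_single_min(incoming):
    """Simplified variant: every output uses the one overall minimum magnitude.

    Uncorrected, for illustration only; this is not the paper's exact rule.
    """
    overall_min = min(abs(m) for m in incoming)
    total_negatives = sum(1 for m in incoming if m < 0)
    out = []
    for m in incoming:
        # Parity of negative signs among the other inputs.
        others_negatives = total_negatives - (1 if m < 0 else 0)
        out.append(-overall_min if others_negatives % 2 else overall_min)
    return out


messages = [3, -1, 4, -2]
print(check_update_min_sum(messages))     # [1, -2, 1, -1]
print(check_update_single_min(messages))  # [1, -1, 1, -1]
```

In this example the two rules disagree only on the edge that supplied the minimum (index 1), which is where standard min-sum needs the second-smallest magnitude instead.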
international symposium on circuits and systems | 2005
Ahmad Darabiha; Anthony Chan Carusone; Frank R. Kschischang
A 3.2-Gbit/sec 2048-bit parallel LDPC decoder is implemented in a 0.18-μm CMOS process. We employ two new techniques to address the interconnect problem: A broadcasting technique reduces the total amount of check-to-variable interconnect wires by more than 40%. A hierarchical placement algorithm places the variable and check nodes in the top-level hierarchy of the design and reduces the maximum wire length by up to 50%.
field-programmable custom computing machines | 2004
Navid Azizi; Ian Kuon; Aaron Egier; Ahmad Darabiha; Paul Chow
Current high-performance applications are typically implemented on large-scale general-purpose distributed or multiprocessing systems often based on commodity microprocessors. Field-Programmable Gate Arrays (FPGAs) have now reached a level of sophistication that they too could be used for such applications. In this paper we explore the feasibility of using FPGAs to implement large-scale application-specific computations by way of a case study that implements a novel molecular dynamics system. The system has been designed such that it is scalable and parallelizable. On the Transmogrifier 3 (TM3), the system performs calculations on an 8,192 particle system in 37 seconds at 26 MHz. This implementation shows that by scaling to more modern parts running at 100 MHz, a speedup of over 20x can be achieved compared to a state-of-the-art microprocessor. This can also be achieved at less cost, using less power and taking less space than a standard microprocessor-based system, while maintaining the computational precision required.
IEEE Transactions on Circuits and Systems II: Express Briefs | 2008
Ahmad Darabiha; Anthony Chan Carusone; Frank R. Kschischang
Two design techniques are proposed for high-throughput low-density parity-check (LDPC) decoders. A broadcasting technique mitigates routing congestion by reducing the total global wirelength. An interlacing technique increases the decoder throughput by processing two consecutive frames simultaneously. The brief discusses how these techniques can be used for both fully parallel and partially parallel LDPC decoders. For fully parallel decoders with code lengths in the range of a few thousand bits, the half-broadcasting technique reduces the total global wirelength by about 26% without any hardware overhead. The block interlacing scheme is applied to the design of two fully parallel decoders, increasing the throughput by 60% and 71% at the cost of 5.5% and 9.5% gate count overhead, respectively.
machine vision applications | 2006
Ahmad Darabiha; W. James MacLean; Jonathan Rose
This paper describes the implementation of a stereo-vision system using Field Programmable Gate Arrays (FPGAs). Reconfigurable hardware, including FPGAs, is an attractive platform for implementing vision algorithms due to its ability to exploit parallelism often found in these algorithms, and due to the speed with which applications can be developed as compared to hardware. The system outputs 8-bit, subpixel disparity estimates for 256 × 360 pixel images at 30 fps. A local-weighted phase correlation algorithm for stereo disparity [Fleet, D. J.: Int. Conf. Syst. Man Cybernetics 1:48–54 (1994)] is implemented. Despite the complexity of performing correlations on multiscale, multiorientation phase data, the system runs as much as 300 times faster in hardware than its software implementation. This paper describes the hardware platform used, the algorithm, and the issues encountered during its hardware implementation. Of particular interest is the implementation of multiscale, steerable filters, which are widely used in computer vision algorithms. Several trade-offs (reducing the number of filter orientations from three to two, using fixed-point computation, changing the location of one localized low-pass filter, and using L1 instead of L2 norms) were required to both fit the design into the available hardware and to achieve video-rate processing. Finally, results from the system are given both for synthetic data sets as well as several standard stereo-pair test images.
custom integrated circuits conference | 2007
Ahmad Darabiha; Anthony Chan Carusone; Frank R. Kschischang
A bit-serial architecture for multi-Gbps LDPC decoding is demonstrated to alleviate the routing congestion which is the main limitation for LDPC decoders. We report on a 3.3-Gbps 0.13-μm CMOS prototype. It occupies 7.3-mm² core area with 1416-mW maximum power consumption from a 1.2-V supply. We demonstrate how early termination and supply voltage scaling can improve the decoder energy efficiency. Finally, the same architecture is applied to a (2048, 1723) LDPC code compliant with the 10GBase-T standard.
field programmable gate arrays | 2004
Ian Kuon; Navid Azizi; Ahmad Darabiha; Aaron Egier; Paul Chow
Current high-performance supercomputing applications are typically implemented on large-scale general-purpose distributed or multiprocessing systems often based on commodity microprocessors. FPGAs have now reached a level of sophistication that they too could be used for such applications. We explore the feasibility of using FPGAs to implement large-scale application-specific computations by way of a case study that implements a novel Molecular Dynamics system. The system has been designed such that it is scalable and parallelizable. On the Transmogrifier 3, the system performs calculations on an 8,192 particle system in 37 seconds at 26MHz. This implementation shows that by scaling to more modern parts running at 100MHz and using a better architecture, a speedup of over 20x can be achieved compared to a state-of-the-art microprocessor. This can also be achieved at less cost, using less power and taking less space than a standard microprocessor-based system, while maintaining the computational precision required.
Archive | 2003
Ahmad Darabiha