Saeid Nooshabadi
Michigan Technological University
Publications
Featured research published by Saeid Nooshabadi.
IEEE Transactions on Circuits and Systems II: Express Briefs | 2007
Victor Navarro-Botello; Juan A. Montiel-Nelson; Saeid Nooshabadi
This brief presents a new CMOS logic family based on the feedthrough evaluation concept and analyzes its sensitivity to technology parameters for practical applications. Feedthrough logic (FTL) allows a computational block to perform a partial evaluation before its input signals are valid, and then to complete a quick final evaluation as soon as the inputs arrive. FTL is well suited to arithmetic circuits whose critical path is a long cascade of inverting gates. Furthermore, FTL-based circuits perform better at high fan-out and high switching frequencies, owing to both lower delay and lower dynamic power consumption. Experimental results for practical circuits show that low-power FTL achieves a propagation delay 4.1 times smaller and 35.6% lower energy consumption, with a similar combined delay × power consumption × active area product (0.7% worse), while being less sensitive to power supply, temperature, capacitive load, and process variations than standard CMOS technologies.
IEEE Transactions on Circuits and Systems | 2006
Michael Dyer; David Taubman; Saeid Nooshabadi
JPEG2000 is a recently standardized image compression algorithm. At its heart is the coding scheme known as embedded block coding with optimal truncation (EBCOT), which accounts for the majority of the algorithm's processing time. The EBCOT scheme consists of a bit-plane coder coupled to an MQ arithmetic coder. Recent bit-plane coder architectures can produce symbols at a higher rate than existing MQ arithmetic coders can absorb, so a high-throughput MQ arithmetic coder is required. We examine existing MQ arithmetic coder architectures and develop novel techniques capable of absorbing the high symbol rate of high-performance bit-plane coders, while also providing flexible design choices.
IEEE Transactions on Circuits and Systems II: Express Briefs | 2006
Jose C. Garcia; Juan A. Montiel-Nelson; Saeid Nooshabadi
A high-speed, low-power driver employing a single bootstrap capacitor is reported. Compared with the fastest reported bootstrapped driver (Chen et al., 2003) under similar loading conditions and circuit parameters, it provides a six-fold improvement in power dissipation, 15% higher speed, and an 8.7% reduction in active area.
IEEE Transactions on Circuits and Systems for Video Technology | 2006
Saeid Nooshabadi; David Taubman; Michael Dyer
The block coder, a key module in the JPEG2000 image compression system, presents challenges for the realization of a high-throughput, low-hardware-cost VLSI architecture. Although efficient architectures have been proposed for block coders operating in specific modes, existing generic block coder architectures offer low throughput relative to their hardware cost. In this paper, we present a low-cost, high-throughput VLSI architecture for a generic block coder. Concurrent symbol processing (CSP) is used to improve the throughput of the block coder's submodules, the bit-plane coder (BPC) and the arithmetic coder (AC). The proposed BPC processes one stripe-column per clock cycle during every coding pass and generates up to 10 context-data (CxD) pairs per clock cycle. The proposed AC processes two CxD pairs per clock cycle. Throughput is further increased by column speedup and novel run-mode skipping techniques in the BPC module. The hardware cost of the proposed block coder is reduced by an optimal two-subbank BPC memory architecture. Additionally, image statistics are used to choose efficient configuration parameters for the VLSI architecture. The proposed block coder is implemented on Altera Stratix FPGA and TSMC 0.18-µm ASIC platforms. Implementation results show average throughputs of 16.23 and 73.42 Msamples/s on the FPGA and ASIC platforms, respectively. The block-coder test chip has 22515 gates and a chip area of 2.33 mm². Compared with similar existing architectures, it has the highest throughput versus hardware cost performance.
Midwest Symposium on Circuits and Systems | 2004
David Taubman; Saeid Nooshabadi
The bit-plane coder is part of the JPEG2000 embedded block coder, and its throughput plays a key role in determining the overall throughput of a JPEG2000 encoder. In this paper we present a parallel pipelined VLSI architecture for the bit-plane encoder that processes a complete stripe-column concurrently during every pass. The hardware requirements and critical path delay of the proposed technique are compared with those of existing solutions. Experimental results show that the proposed architecture achieves 2.6 times the throughput of existing architectures, with a comparatively small increase in hardware cost.
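For readers unfamiliar with the term, in JPEG2000 the bit-plane coder visits a code-block in horizontal stripes four samples high; within each stripe it scans column by column, and each column of (up to) four samples is a stripe-column. The short Python sketch below only enumerates that scan order, which is the unit the architecture above processes concurrently. The names `stripe_columns` and `scan_order` are hypothetical, and the sketch does not model context formation, coding passes, or the paper's pipeline.

```python
"""Sketch of the JPEG2000 code-block scan order (stripe-columns).

Assumption: the standard stripe height of four rows. Function names are
hypothetical and chosen for illustration only.
"""
from typing import Iterator, List, Tuple

STRIPE_HEIGHT = 4  # JPEG2000 code-blocks are scanned in stripes of four rows


def stripe_columns(height: int, width: int) -> Iterator[List[Tuple[int, int]]]:
    """Yield each stripe-column as a list of (row, col) coordinates.

    Stripes are visited top to bottom, columns within a stripe left to right,
    and samples within a stripe-column top to bottom. The last stripe holds
    fewer than four rows when the height is not a multiple of four.
    """
    for stripe_top in range(0, height, STRIPE_HEIGHT):
        stripe_bottom = min(stripe_top + STRIPE_HEIGHT, height)
        for col in range(width):
            yield [(row, col) for row in range(stripe_top, stripe_bottom)]


def scan_order(height: int, width: int) -> List[Tuple[int, int]]:
    """Flatten the stripe-columns into the sample-by-sample scan order."""
    return [pos for column in stripe_columns(height, width) for pos in column]


if __name__ == "__main__":
    # A 6x3 block: one full stripe (rows 0-3) and one partial stripe (rows 4-5).
    for column in stripe_columns(6, 3):
        print(column)
```

A hardware architecture that handles one stripe-column per cycle would, in this picture, consume one yielded list per clock rather than one coordinate.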
IEEE Transactions on Computers | 2011
Todor Mladenov; Saeid Nooshabadi; Keseon Kim
Raptor codes have proven very suitable for mobile broadcast and multicast multimedia content delivery, yet their computational complexity has not been investigated in the context of embedded systems. At the heart of Raptor codes are the matrix inversion and vector decoding operations. This paper analyzes the performance, energy profile, and resource implications of two matrix inversion and decoding algorithms for the Raptor decoder, Gaussian elimination (GE) and the 3rd Generation Partnership Project (3GPP) standard algorithm (SA), on a system-on-chip (SoC) platform with a soft-core embedded processor. We investigate the effect of cache size, memory type, and memory mapping on the performance of both algorithms. We show that with an appropriate data-to-memory mapping, GE achieves a speedup factor of 5.77 with respect to SA. This paper also proposes a dedicated peripheral hardware block that achieves 5.90 times better performance than the software while consuming 5.5 times less energy, when the symbol size and data path word length are small (32 bits). We show that parallel processing in hardware, wider word lengths, and larger symbol sizes T improve performance while reducing energy consumption. Extending the hardware word length and symbol size T to 128 bits yields a performance improvement factor of 6.73 in favor of the hardware, while energy consumption is reduced by a factor of 3.8.
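At its core, Raptor decoding amounts to solving a linear system over GF(2) in which each received symbol is the XOR of a subset of unknown symbols. The following is a minimal sketch of plain Gauss-Jordan elimination for such a system; it is neither the paper's implementation nor the 3GPP standard algorithm. It assumes the system has full column rank, stores each matrix row as an integer bitmask (bit j set means unknown j participates), and represents symbols as equal-length byte strings. The names `gf2_solve`, `encode`, and `xor_bytes` are hypothetical.

```python
"""Sketch: solve A x = b over GF(2), where each unknown x_j is a data symbol.

Assumptions (illustrative only): rows are int bitmasks, symbols are
equal-length bytes, and the system has full column rank. Extra (redundant)
rows are allowed, as when more encoded symbols arrive than are needed.
"""
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Symbol-wide XOR, the only arithmetic GF(2) decoding needs."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(mask: int, sources: List[bytes]) -> bytes:
    """Hypothetical helper: XOR together the sources selected by mask."""
    out = bytes(len(sources[0]))
    for j, symbol in enumerate(sources):
        if (mask >> j) & 1:
            out = xor_bytes(out, symbol)
    return out


def gf2_solve(rows: List[int], symbols: List[bytes], n: int) -> List[bytes]:
    """Recover the n unknown symbols by Gauss-Jordan elimination over GF(2)."""
    rows, symbols = list(rows), list(symbols)
    for col in range(n):
        # Pick a pivot among the rows not yet used as pivots.
        pivot = next((r for r in range(col, len(rows)) if (rows[r] >> col) & 1), None)
        if pivot is None:
            raise ValueError(f"matrix is singular at column {col}")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        symbols[col], symbols[pivot] = symbols[pivot], symbols[col]
        # Clear this column from every other row; the same XOR is applied to the data.
        for r in range(len(rows)):
            if r != col and (rows[r] >> col) & 1:
                rows[r] ^= rows[col]
                symbols[r] = xor_bytes(symbols[r], symbols[col])
    # Row i is now the unit vector e_i, so symbols[i] holds unknown x_i.
    return symbols[:n]


if __name__ == "__main__":
    sources = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    masks = [0b011, 0b110, 0b111, 0b101]  # the last row is redundant
    received = [encode(m, sources) for m in masks]
    print(gf2_solve(masks, received, 3) == sources)
```

Every row operation here is a bitmask XOR plus a symbol-wide XOR, which gives some intuition for why a wider hardware data path and larger symbol size T can pay off, though the sketch makes no claim about the paper's measured figures.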
IEEE Transactions on Biomedical Circuits and Systems | 2010
Chul Kim; Saeid Nooshabadi
A novel tunable, all-digital ultra-wideband pulse generator (PG) has been implemented in a standard 0.18-µm complementary metal-oxide-semiconductor (CMOS) process for implantable medical applications. The chip achieves an ultra-low dynamic energy consumption of 27 pJ per pulse, with no static current flow, at a 200-MHz pulse repetition frequency (PRF) with a 1.8-V power supply, and occupies a small area of 90 × 50 µm². The PG generates tunable pulse width, amplitude, and transmit (Tx) power using simple circuitry, through precise timing control of the H-bridge output stage. The all-digital architecture allows easy integration into a standard CMOS process, making it the most suitable candidate for in-vivo biotelemetry applications.
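As a quick consistency check on the reported figures, and assuming the 27 pJ is spent once per pulse with pulses emitted continuously at the stated PRF, the average dynamic power is energy per pulse times repetition rate, and the footprint follows from the stated dimensions:

$$P_{\text{dyn}} = E_{\text{pulse}} \times \text{PRF} = 27\ \text{pJ} \times 200\ \text{MHz} = 5.4\ \text{mW}$$

$$A = 90\ \mu\text{m} \times 50\ \mu\text{m} = 4500\ \mu\text{m}^2 = 0.0045\ \text{mm}^2$$

At lower duty cycles or PRFs the average power would scale down proportionally, since no static current is drawn between pulses.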
International Symposium on Circuits and Systems | 2007
Jose C. Garcia; Juan A. Montiel-Nelson; Saeid Nooshabadi
This paper reports the design of a high-performance, adaptive low/high-swing CMOS driver circuit (mj-driver) suitable for driving global interconnects with large capacitive loads. When implemented in 0.13-µm CMOS technology, the mj-driver is 16% faster, consumes 3% less power, and has a 19% lower energy-delay product than a counterpart driver in a diode-connected configuration. In addition, the mj-driver has 47% lower active area and requires only one set of sizing for optimum performance at 1 V and 0.8 V. Furthermore, unlike its counterpart, which exhibits a 30% variation in output swing voltage as the load varies, the proposed driver's output voltage swing remains unchanged with the output load. Comparisons of the proposed driver with a conventional full-swing CMOS driver are also presented, indicating significant energy savings due to the reduced swing voltage. The proposed driver can switch from low-swing to high-swing mode through a line-monitoring mechanism.
Design, Automation, and Test in Europe | 2006
Jose C. Garcia; Juan A. Montiel-Nelson; Saeid Nooshabadi
This paper reports a high-speed, low-power direct-indirect bootstrapped full-swing CMOS inverter driver circuit (bfi-driver). Simulation results based on 0.13-µm triple-well CMOS technology show that, when operated at 1 V, the bfi-driver is 94% faster and consumes 22% less power than a counterpart direct bootstrap circuit (Bellaour, 1995).
IEEE Transactions on Consumer Electronics | 2012
Zafar Iqbal; Saeid Nooshabadi; Heung-No Lee
The use of wireless communications for metropolitan area networks (MANs) in consumer electronics has increased significantly in the recent past. This paper presents a performance analysis of four channel coding and interleaving schemes for MIMO-OFDM communication systems. The schemes are compared in terms of BER, hardware implementation resource requirements, and power dissipation. The paper also presents a memory-efficient, low-latency interleaver implementation technique for the MIMO-OFDM communication system. It is shown that, among the four coding and interleaving schemes studied, cross-antenna coding with per-antenna interleaving performs best under all SNR conditions and for all modulation schemes. It is also the best scheme in terms of hardware resource requirements and power dissipation, which are particularly important in consumer electronics. Using the proposed interleaver, a MIMO-OFDM-based transmitter employing a double-data-stream 2×2 MIMO spatial multiplexing system is then built.
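To make the interleaving step concrete, here is a minimal sketch of a generic row-column block interleaver: data is conceptually written row by row into a rows × cols array and read out column by column, so adjacent coded bits end up far apart. This is a textbook interleaver, not the memory-efficient technique proposed in the paper and not the exact two-permutation interleaver defined by the wireless MAN standards. Like many hardware designs, it computes each read address with a formula instead of storing and transposing the array; the names `read_index`, `interleave`, and `deinterleave` are hypothetical.

```python
"""Sketch of a generic row-column block interleaver (illustrative only).

Write row-wise into a rows x cols array, read column-wise. The array is never
built; the read address is computed directly, as an address generator would.
"""
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def read_index(k: int, rows: int, cols: int) -> int:
    """Input position read at output position k (row-in, column-out)."""
    return (k % rows) * cols + k // rows


def interleave(data: Sequence[T], rows: int, cols: int) -> List[T]:
    """Permute one block of exactly rows * cols items."""
    if len(data) != rows * cols:
        raise ValueError("block length must equal rows * cols")
    return [data[read_index(k, rows, cols)] for k in range(rows * cols)]


def deinterleave(data: Sequence[T], rows: int, cols: int) -> List[T]:
    """Invert interleave() by writing each item back to its source position."""
    if len(data) != rows * cols:
        raise ValueError("block length must equal rows * cols")
    out = list(data)  # every slot is overwritten below, since read_index is a permutation
    for k, item in enumerate(data):
        out[read_index(k, rows, cols)] = item
    return out


if __name__ == "__main__":
    block = list(range(6))
    mixed = interleave(block, rows=2, cols=3)
    print(mixed)
    print(deinterleave(mixed, rows=2, cols=3) == block)
```

In a per-antenna arrangement, one such interleaver would simply be applied independently to each antenna's coded stream; how the paper actually organizes its memory is not modeled here.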