Navid Azizi
University of Toronto
Publications
Featured research published by Navid Azizi.
IEEE Transactions on Very Large Scale Integration Systems | 2003
Navid Azizi; Farid N. Najm; Andreas Moshovos
We introduce a novel family of asymmetric dual-Vt static random access memory cell designs that reduce leakage power in caches while maintaining low access latency. Our designs exploit the strong bias toward zero at the bit level exhibited by the memory value stream of ordinary programs. Compared to conventional symmetric high-performance cells, our cells offer significant leakage reduction in the zero state and, in some cases, also in the one state, albeit to a lesser extent. A novel sense amplifier, in combination with dummy bitlines, allows for read times to be on par with conventional symmetric cells. With one cell design, leakage is reduced by 7× (in the zero state) with no performance degradation, but with a stability degradation of 6%. Another cell design reduces leakage by 2× (in the zero state) with no performance or stability loss. An alternative cell design reduces leakage by 58× (in the zero state) with a performance degradation of 1%, an area increase of 2.4%, and no stability degradation.
field-programmable custom computing machines | 2004
Navid Azizi; Ian Kuon; Aaron Egier; Ahmad Darabiha; Paul Chow
Current high-performance applications are typically implemented on large-scale general-purpose distributed or multiprocessing systems, often based on commodity microprocessors. Field-Programmable Gate Arrays (FPGAs) have now reached a level of sophistication such that they too could be used for such applications. In this paper we explore the feasibility of using FPGAs to implement large-scale application-specific computations by way of a case study that implements a novel molecular dynamics system. The system has been designed such that it is scalable and parallelizable. On the Transmogrifier 3 (TM3), the system performs calculations on an 8,192-particle system in 37 seconds at 26 MHz. This implementation shows that by scaling to more modern parts running at 100 MHz, a speedup of over 20× can be achieved compared to a state-of-the-art microprocessor. This can also be achieved at less cost, using less power and taking less space than a standard microprocessor-based system, while maintaining the computational precision required.
custom integrated circuits conference | 2006
Navid Azizi; Farid N. Najm
Modern integrated circuits require careful attention to the soft-error rate (SER) resulting from bit upsets, which are normally caused by alpha particle or neutron hits. These events, also referred to as single-event upsets (SEUs), will become more problematic in future technologies. This paper presents a binary content-addressable memory (CAM) design with high immunity to SEUs. Conventionally, error-correcting codes (ECC) have been used in SRAMs to address this issue, but these techniques are not immediately applicable to CAMs because they depend on processing the full contents of the memory word outside the array, which is not possible in a normal CAM access. The proposed design consists of a new matching technique that uses coding to increase the Hamming distance between words, in conjunction with a modified matchline sensing scheme. The result is a CAM design that reduces the SER with no increase in delay or power dissipation, and with only a 12% increase in area.
international symposium on signals circuits and systems | 2004
Navid Azizi; Farid N. Najm
We introduce a new Static Random Access Memory (SRAM) cell that offers high stability and reduces gate leakage power in caches while maintaining low access latency. Our design exploits the strong bias towards zero at the bit level exhibited by the memory value stream of ordinary programs. Compared to a conventional symmetric high-performance cell, our new cell reduces total leakage by more than 24% in the zero state at high temperature. With one cell design, total cache leakage is reduced by 24% at high temperature with no performance or stability loss. At low temperatures, where gate leakage is dominant, our cell reduces total cache leakage by 43%. We show that the new cell can be combined in an orthogonal fashion with asymmetric dual-Vt cells to lower both gate and subthreshold leakage, reducing total leakage by 45% to 60% with comparable performance and stability.
IEEE Transactions on Very Large Scale Integration Systems | 2005
Andreas Moshovos; Babak Falsafi; Farid N. Najm; Navid Azizi
In this paper, we make the case for building high-performance asymmetric-cell caches (ACCs) that employ recently proposed asymmetric SRAMs to reduce leakage proportionally to the number of resident zero bits. Because ACCs target memory value content (independent of cell activity and access patterns), they complement prior proposals for reducing cache leakage that target memory access characteristics. Through detailed simulation and leakage estimation using a commercial 0.13-μm CMOS process model, we show that: 1) on average 75% of resident data cache bits and 64% of resident instruction cache bits are zero; 2) while prior research carefully evaluated the fraction of accessed zero bytes, we show that a high fraction of accessed zero bytes is neither a necessary nor a sufficient condition for a high fraction of resident zero bits; 3) the zero-bit program behavior persists even when we restrict our attention to live data, thereby complementing prior leakage-saving techniques that target inactive cells; and 4) ACCs can reduce leakage on average by 4.3× compared to a conventional data cache without any performance loss, and by 9× at the cost of a 5% increase in overall cache access latency.
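The dependence of leakage on resident zero bits described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' simulation methodology: the function names, the sample buffer, and the simplifying assumption that a cell holding a one leaks exactly as much as a conventional cell are all hypothetical, chosen only to show the arithmetic.

```python
def zero_bit_fraction(data: bytes) -> float:
    """Return the fraction of bits in `data` that are 0."""
    if not data:
        raise ValueError("data must be non-empty")
    total_bits = 8 * len(data)
    one_bits = sum(bin(b).count("1") for b in data)
    return (total_bits - one_bits) / total_bits


def relative_leakage(zero_fraction: float, zero_state_reduction: float) -> float:
    """Leakage relative to a symmetric-cell array (1.0 means no savings).

    Illustrative assumption only: a cell holding 0 leaks
    1 / zero_state_reduction of a conventional cell, and a cell
    holding 1 leaks the same as a conventional cell.
    """
    if zero_state_reduction <= 0:
        raise ValueError("zero_state_reduction must be positive")
    return zero_fraction / zero_state_reduction + (1.0 - zero_fraction)


# Hypothetical 4-byte buffer standing in for resident cache contents.
sample = bytes([0x00, 0x00, 0x0F, 0xFF])
frac = zero_bit_fraction(sample)
print(f"zero-bit fraction: {frac:.3f}")
print(f"relative leakage at 5x zero-state reduction: {relative_leakage(frac, 5.0):.3f}")
```

Under this simplified model the overall saving is bounded by the fraction of one bits, which is why the abstract's emphasis on resident (rather than accessed) zero bits matters.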
design automation conference | 2006
Georges Nabaa; Navid Azizi; Farid N. Najm
Process-induced threshold voltage variations bring about fluctuations in circuit delay that affect the FPGA timing yield. We propose an adaptive FPGA architecture that compensates for these fluctuations. The architecture includes an additional characterizer circuit that classifies logic and routing blocks on each die according to their performance. Based on this classification, the architecture adaptively body-biases these resources by either speeding up the slow blocks or by slowing down the leaky ones. This procedure mitigates the effect of the variations and provides a better yield. We further diminish leakage by slowing down areas of the FPGA that have a positive slack. Overall, this architecture minimizes the timing variance of within-die and die-to-die Vth variations by up to 3.45× and reduces leakage power in the non-critical areas of the FPGA by 3× with no effect on frequency.
design automation conference | 2005
Navid Azizi; Muhammad M. Khellah; Vivek De; Farid N. Najm
We present a new methodology which takes into consideration the effect of within-die (WID) process variations on a low-voltage parallel system. We show that in the presence of process variations one should use a higher supply voltage than would otherwise be predicted to minimize the power consumption of parallel systems. Previous analyses, which ignored WID process variations, provide a lower, nonoptimal supply voltage which can underestimate the energy/operation by 8.2×. We also present a novel technique to limit the effect of temperature variations in a parallel system. As temperature increases, the scheme reduces the power increase by 43%, allowing the system to remain at its optimal supply voltage across different temperatures.
design automation conference | 2006
Navid Azizi; Farid N. Najm
Modern integrated circuits require careful attention to the soft-error rate (SER) resulting from bit upsets, which are normally caused by alpha particle or neutron hits. These events, also referred to as single-event upsets (SEUs), will become more problematic in future technologies. This paper presents a ternary content-addressable memory (TCAM) design with high immunity to SEUs. Conventionally, error-correcting codes (ECC) have been used in SRAMs to address this issue, but these techniques are not immediately applicable to CAMs because they depend on processing the full contents of the memory word outside the array, which is not possible in a normal CAM access. We propose a family of TCAM cells that reduce the SER at the cost of some area increase. An SER reduction of up to 40% can be obtained with an 18% increase in area; another design reduces the SER by 16% with only a 5% increase in area.
ieee international newcas conference | 2005
Navid Azizi; Farid N. Najm
We propose a fine-grained scheme to compensate for within-die variations in dynamic logic to reduce the variation in leakage, delay and noise margin through body-biasing. We first show that the amount of body-bias compensation needed depends on the correlation that exists between gates, and then analytically show the possible reduction in the variance of the leakage of both a single and multiple dynamic logic gates. We then design a circuit to implement the system which provides the reduction in the variance of the leakage, delay and noise margin of dynamic logic gates and show that it produces a close match to the analytical results. In our design, the variance of a typical test circuit is reduced by 27% and the variance of the path delay is reduced by 33%.
custom integrated circuits conference | 2005
Navid Azizi; Farid N. Najm
We propose new programmable FPGA look-up tables (LUTs) that can operate in two different modes: high-performance or low-power. Selection between the two modes is realized by an extra SRAM cell that can be shared by a number of LUTs. In high-performance mode, the LUTs provide similar power and performance to a conventional LUT. In low-power mode, one LUT reduces leakage by 53%, while another reduces leakage by 53% and 80% when outputting a logic-0 and logic-1, respectively, which can lead to an average leakage reduction of up to 76%. In low-power mode, delay is increased by 5% to 20% compared to a conventional LUT. The technique scales well and further reduces leakage for new FPGA architectures that use larger LUTs.