Alireza Shafaei

University of Southern California

Publication

Featured research published by Alireza Shafaei.

design automation conference | 2013

Optimization of quantum circuits for interaction distance in linear nearest neighbor architectures

Alireza Shafaei; Mehdi Saeedi; Massoud Pedram

Optimization of the interaction distance between qubits to map a quantum circuit into one-dimensional quantum architectures is addressed. The problem is formulated as the Minimum Linear Arrangement (MinLA) problem. To achieve this, an interaction graph is constructed for a given circuit, and multiple instances of the MinLA problem for selected subcircuits of the initial circuit are formulated and solved. In addition, a lookahead technique that examines different subcircuit candidates is applied to improve the cost of the proposed solution. Experiments on quantum circuits for quantum Fourier transform and reversible benchmarks show the effectiveness of the approach.
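To make the MinLA formulation concrete, the following is a minimal sketch, not the authors' solver. It assumes a circuit is given as a list of two-qubit gates (pairs of distinct qubit indices), builds the weighted interaction graph, and finds a minimum linear arrangement by exhaustive search, which is only feasible for a handful of qubits. The gate list and all function names are hypothetical, and the paper's subcircuit selection and lookahead are not modeled.

```python
from collections import Counter
from itertools import permutations


def interaction_graph(gates):
    """Map each unordered qubit pair to the number of gates acting on it."""
    weights = Counter()
    for a, b in gates:
        weights[tuple(sorted((a, b)))] += 1
    return weights


def arrangement_cost(order, weights):
    """Sum of weight * distance over all interacting pairs for a linear order."""
    position = {qubit: index for index, qubit in enumerate(order)}
    return sum(w * abs(position[a] - position[b]) for (a, b), w in weights.items())


def min_linear_arrangement(qubits, weights):
    """Brute-force MinLA: try every ordering (assumption: very few qubits)."""
    return min(permutations(qubits), key=lambda order: arrangement_cost(order, weights))


# Hypothetical circuit: each tuple is one two-qubit gate.
gates = [(0, 2), (0, 2), (1, 2), (0, 3)]
weights = interaction_graph(gates)
identity_cost = arrangement_cost((0, 1, 2, 3), weights)
best_order = min_linear_arrangement([0, 1, 2, 3], weights)
best_cost = arrangement_cost(best_order, weights)
print(identity_cost, best_order, best_cost)
```

In this toy instance the interaction graph is a path, so an ordering exists in which every interacting pair is adjacent. Real instances need a heuristic or exact MinLA solver rather than enumeration.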

ieee computer society annual symposium on vlsi | 2014

FinCACTI: Architectural Analysis and Modeling of Caches with Deeply-Scaled FinFET Devices

Alireza Shafaei; Yanzhi Wang; Xue Lin; Massoud Pedram

This paper presents FinCACTI, a cache modeling tool based on CACTI which also supports deeply-scaled FinFET devices as well as more robust SRAM cells. In particular, FinFET devices optimized using advanced device simulators for a 7nm process serve as the case study of the paper. Based on this 7nm FinFET process, characteristics of 6T and 8T SRAMs are calculated, and comparison results show that under the same stability requirements the 8T cell has smaller area and leakage power. SRAM and technological parameters of the 7nm FinFET are then incorporated into FinCACTI. According to architecture-level simulations, the 8T SRAM is suggested as the choice of memory cell for 7nm FinFET. Moreover, a 4MB cache in 7nm FinFET compared with 22nm (32nm) CMOS under the same access latencies achieves 5x (9x) and 11x (24x) reduction in read energy and area, respectively.

ieee computer society annual symposium on vlsi | 2014

5nm FinFET Standard Cell Library Optimization and Circuit Synthesis in Near- and Super-Threshold Voltage Regimes

Qing Xie; Xue Lin; Yanzhi Wang; Mohammad Javad Dousti; Alireza Shafaei; Majid Ghasemi-Gol; Massoud Pedram

The FinFET device has been proposed as a promising substitute for the traditional bulk CMOS-based device at the nanoscale, due to its extraordinary properties such as improved channel controllability, high ON/OFF current ratio, reduced short-channel effects, and relative immunity to gate line-edge roughness. In addition, the near-ideal subthreshold behavior indicates the potential application of FinFET circuits in the near-threshold supply voltage regime, which consumes an order of magnitude less energy than the regular strong-inversion circuits operating in the super-threshold supply voltage regime. This paper presents a design flow for creating standard cells in the 5nm FinFET technology node, covering both near-threshold and super-threshold operation, and for building a Liberty-format standard cell library. The circuit synthesis results of various combinational and sequential circuits based on the 5nm FinFET standard cell library show up to 40X circuit speed improvement and three orders of magnitude energy reduction compared to those of 45nm bulk CMOS technology.

asia and south pacific design automation conference | 2014

Qubit placement to minimize communication overhead in 2D quantum architectures

Alireza Shafaei; Mehdi Saeedi; Massoud Pedram

Regular, local-neighbor topologies of quantum architectures restrict interactions to adjacent qubits, which in turn increases the latency of quantum circuits mapped to these architectures. To alleviate this effect, optimization methods that consider qubit-to-qubit interactions in 2D grid architectures are presented in this paper. The proposed approaches benefit from a Mixed Integer Programming (MIP) formulation of the qubit placement problem. Simulation results on various benchmarks show an average 27% reduction in communication overhead between qubits compared to the best results of previous work.
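The placement objective can be illustrated with a small sketch. It does not reproduce the paper's MIP formulation: exhaustive search stands in for the MIP solver, and communication overhead is modeled, as an assumption, by the Manhattan distance between two interacting qubits minus one (adjacent qubits cost nothing), weighted by how often they interact. The interaction counts, grid size, and function names are all hypothetical.

```python
from itertools import permutations


def manhattan(p, q):
    """Grid distance between two (row, col) cells."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def placement_cost(placement, interactions):
    """Assumed overhead model: weight * (distance - 1) per interacting pair."""
    return sum(
        w * (manhattan(placement[a], placement[b]) - 1)
        for (a, b), w in interactions.items()
    )


def best_placement(qubits, rows, cols, interactions):
    """Exhaustive search over cell assignments (a stand-in for a MIP solver)."""
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    best_cost, best_map = None, None
    for chosen in permutations(cells, len(qubits)):
        placement = dict(zip(qubits, chosen))
        cost = placement_cost(placement, interactions)
        if best_cost is None or cost < best_cost:
            best_cost, best_map = cost, placement
    return best_cost, best_map


# Hypothetical interaction counts between qubit pairs.
interactions = {(0, 1): 3, (1, 2): 2, (2, 3): 1, (0, 3): 1}
row_major = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}
naive_cost = placement_cost(row_major, interactions)
optimal_cost, optimal_map = best_placement([0, 1, 2, 3], 2, 2, interactions)
print(naive_cost, optimal_cost, optimal_map)
```

Here the four interactions form a cycle that fits a 2x2 grid exactly, so the search finds a placement with zero overhead while the row-major placement does not.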

design, automation, and test in europe | 2013

Reversible logic synthesis of k-input, m-output lookup tables

Alireza Shafaei; Mehdi Saeedi; Massoud Pedram

Improving circuit realization of known quantum algorithms by CAD techniques has benefits for quantum experimentalists. In this paper, we address the problem of synthesizing a given k-input, m-output lookup table (LUT) by a reversible circuit. This problem has interesting applications in Shor's well-known number-factoring algorithm and in quantum walk on sparse graphs. For LUT synthesis, our approach targets the number of control lines in multiple-control Toffoli gates to reduce synthesis cost. To achieve this, we propose a multi-level optimization technique for reversible circuits to benefit from shared cofactors. To reuse output qubits and/or zero-initialized ancillae, we un-compute intermediate cofactors. Our experiments reveal that the proposed LUT synthesis has a significant impact on reducing the size of modular exponentiation circuits for Shor's quantum factoring algorithm, oracle circuits in quantum walk on sparse graphs, and the well-known MCNC benchmarks.

high-performance computer architecture | 2017

Pilot Register File: Energy Efficient Partitioned Register File for GPUs

Mohammad Abdel-Majeed; Alireza Shafaei; Hyeran Jeon; Massoud Pedram; Murali Annavaram

GPU adoption for general purpose computing has been accelerating. To support a large number of concurrently active threads, GPUs are provisioned with a very large register file (RF). The RF power consumption is a critical concern. One option to reduce the power consumption dramatically is to use near-threshold voltage (NTV) to operate the RF. However, operating MOSFET devices at NTV is fraught with stability and reliability concerns. The adoption of FinFET devices in the chip industry is providing a promising path to operate the RF at NTV while satisfactorily tackling the stability and reliability concerns. However, the fundamental problem of NTV operation, namely slow access latency, remains. To tackle this challenge, in this paper we propose to build a partitioned RF using FinFET technology. The partitioned RF design exploits our observation that applications exhibit a strong preference to utilize a small subset of their registers. One way to exploit this behavior is to cache the RF content, as has been proposed in recent works. However, caching leads to unnecessary area overheads since a fraction of the RF must be replicated. Furthermore, we show that caching is not efficient as we increase the number of issued instructions per cycle, which is the expected trend in GPU designs. The proposed partitioned RF splits the registers into two partitions: the highly accessed registers are stored in a small RF that switches between high and low power modes. We use the FinFET's back gate control to provide low overhead switching between the two power modes. The remaining registers are stored in a large RF partition that always operates at NTV. The assignment of the registers to the two partitions will be based on statistics collected by a hybrid profiling technique that combines the compiler based profiling and the pilot warp profiling technique proposed in this paper. The partitioned FinFET RF is able to save 39% and 54% of the RF leakage and the dynamic energy, respectively, and suffers less than 2% performance overhead.

great lakes symposium on vlsi | 2014

Squash: a scalable quantum mapper considering ancilla sharing

Mohammad Javad Dousti; Alireza Shafaei; Massoud Pedram

Quantum algorithms for solving problems of interesting size often result in circuits with a very large number of qubits and quantum gates. Fortunately, these algorithms also tend to contain a small number of repetitively-used quantum kernels. Identifying the quantum logic blocks that implement such quantum kernels is critical to the complexity management for realizing the corresponding quantum circuit. Moreover, quantum computation requires some type of quantum error correction coding to combat decoherence, which in turn results in a large number of ancilla qubits in the circuit. Sharing the ancilla qubits among quantum operations (even though this sharing can increase the overall circuit latency) is important in order to curb the resource demand of the quantum algorithm. This paper presents a multi-core reconfigurable quantum processor architecture, called Requp, which supports a layered approach to mapping a quantum algorithm and ancilla sharing. More precisely, a scalable quantum mapper, called Squash, is introduced, which divides a given quantum circuit into a number of quantum kernels--each kernel comprises k parts such that each part will run on exactly one of k available cores. Experimental results demonstrate that Squash can handle large-scale quantum algorithms while providing an effective mechanism for sharing ancilla qubits.

IEEE Circuits and Systems Magazine | 2016

Layout Optimization for Quantum Circuits with Linear Nearest Neighbor Architectures

Massoud Pedram; Alireza Shafaei

This paper is concerned with the physical design of quantum logic circuits. More precisely, it addresses the problem of minimizing the number of required qubit reorderings (achieved by inserting explicit SWAP gates) when mapping a quantum circuit into a linear nearest neighbor quantum architecture. First, an interaction graph that captures the interaction distances among various qubits in the quantum circuit is constructed. The interaction graph is utilized to partition the quantum circuit into a set of subcircuits such that the number of required qubit reorderings within each subcircuit is provably no more than a given threshold. Next, a Minimum Linear Arrangement problem for each subcircuit is formulated and solved to achieve the minimum number of internal qubit reorderings and determine the subcircuit input and output qubit orderings. Finally, a bubble sort algorithm is repeatedly employed to minimize the number of qubit reorderings that are required between the consecutive subcircuits. Experiments done on various quantum Fourier transform circuits as well as various reversible logic circuits demonstrate the effectiveness of the proposed approach.
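The final step, reordering qubits between consecutive subcircuits, can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: it takes the output ordering of one subcircuit and the input ordering of the next (both hypothetical here) and uses a single bubble sort to list the adjacent SWAPs that transform one into the other. The number of swaps a bubble sort performs equals the number of inversions between the two orderings, which is the minimum achievable with adjacent swaps.

```python
def swaps_between(current, target):
    """Return adjacent swaps, as (i, i + 1) position pairs, turning current into target."""
    rank = {qubit: index for index, qubit in enumerate(target)}
    order = list(current)
    swaps = []
    # Classic bubble sort on the target rank of each qubit.
    for end in range(len(order) - 1, 0, -1):
        for i in range(end):
            if rank[order[i]] > rank[order[i + 1]]:
                order[i], order[i + 1] = order[i + 1], order[i]
                swaps.append((i, i + 1))
    return swaps, order


# Hypothetical orderings on a four-qubit line.
output_order = ["q0", "q1", "q2", "q3"]
next_input_order = ["q2", "q0", "q3", "q1"]
swap_list, final_order = swaps_between(output_order, next_input_order)
print(len(swap_list), swap_list, final_order)
```

Each returned pair corresponds to one SWAP gate inserted between the two subcircuits on neighboring line positions.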

reversible computation | 2013

Constant-Factor optimization of quantum adders on 2D quantum architectures

Mehdi Saeedi; Alireza Shafaei; Massoud Pedram

Quantum arithmetic circuits have practical applications in various quantum algorithms. In this paper, we address quantum addition on 2-dimensional nearest-neighbor architectures based on the work presented by Choi and Van Meter (JETC 2012). To this end, we propose new circuit structures for some basic blocks in the adder, and reduce communication overhead by adding concurrency to consecutive blocks and also by parallel execution of expensive Toffoli gates. The proposed optimizations reduce total depth from

international symposium on quality electronic design | 2016