Richard Dorrance
University of California, Los Angeles
Publication
Featured research published by Richard Dorrance.
international electron devices meeting | 2012
Juan G. Alzate; P. Khalili Amiri; Pramey Upadhyaya; Sergiy Cherepov; Jian Zhu; Mark Lewis; Richard Dorrance; J. A. Katine; J. Langer; K. Galatsis; Dejan Markovic; Ilya Krivorotov; Kang L. Wang
We demonstrate voltage-induced (non-STT) switching of nanoscale, high-resistance voltage-controlled magnetic tunnel junctions (VMTJs) with pulses down to 10 ns. We show ~10x reduction in switching energies (compared to STT) with leakage currents < 10^5 A/cm^2. Switching dynamics, from quasi-static to the nanosecond regime, are studied in detail. Finally, a strategy for eliminating the need for external magnetic fields, where switching is performed by set/reset voltages of different amplitudes but same polarity, is proposed and verified experimentally.
field programmable gate arrays | 2014
Richard Dorrance; Fengbo Ren; Dejan Markovic
Sparse Matrix-Vector Multiplication (SpMxV) is a widely used mathematical operation in many high-performance scientific and engineering applications. In recent years, tuned software libraries for multi-core microprocessors (CPUs) and graphics processing units (GPUs) have become the status quo for computing SpMxV. However, the computational throughput of these libraries for sparse matrices tends to be significantly lower than that of dense matrices, mostly because the compression formats required to efficiently store sparse matrices mismatch traditional computing architectures. This paper describes an FPGA-based SpMxV kernel that scales to efficiently utilize the available memory bandwidth and computing resources. Benchmarking on a Virtex-5 SX95T FPGA demonstrates an average computational efficiency of 91.85%. The kernel achieves a peak computational efficiency of 99.8%, a >50x improvement over two Intel Core i7 processors (i7-2600 and i7-4770) and a >300x improvement over two NVIDIA GPUs (GTX 660 and GTX Titan), when running the MKL and cuSPARSE sparse-BLAS libraries, respectively. In addition, the SpMxV FPGA kernel achieves higher performance than its CPU and GPU counterparts while using only 64 single-precision processing elements, with an overall 38-50x improvement in energy efficiency.
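To make the operation concrete, here is a minimal software sketch of SpMxV over a compressed matrix. It assumes the Compressed Sparse Row (CSR) layout as one common compression format; the abstract does not name the format the kernel uses, and the function name `spmv_csr` and the example matrix are illustrative only. The indirect access `x[col_idx[k]]` is the kind of irregular memory pattern that tends to fit conventional CPU and GPU architectures poorly.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries, row by row
    col_idx : column index of each nonzero
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Indirect (gather) access into x, driven by the stored column index.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Illustrative 3x3 matrix (not from the paper):
# A = [[4, 0, 0],
#      [0, 0, 2],
#      [1, 0, 3]]
values = [4.0, 2.0, 1.0, 3.0]
col_idx = [0, 2, 0, 2]
row_ptr = [0, 1, 2, 4]
x = [1.0, 2.0, 3.0]

y = spmv_csr(values, col_idx, row_ptr, x)
print(y)  # [4.0, 6.0, 10.0]
```

Only the four stored nonzeros are touched, so the work scales with the number of nonzeros rather than with the full matrix size.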
IEEE Transactions on Magnetics | 2015
Hochul Lee; Juan G. Alzate; Richard Dorrance; Xue Qing Cai; Dejan Markovic; Pedram Khalili Amiri; Kang L. Wang
A high-speed and low-power preread and write sense amplifier (PWSA) is presented for magnetoresistive RAM (MRAM). The sense amplifier incorporates a writing circuit for MRAM bits switched via timing of precessional dynamics (~GHz speed) in a magnetic tunnel junction (MTJ). By combining read and write functions in a single power-efficient circuit, the PWSA allows for fast read and write operations while minimizing the bit error rate after data programming. The PWSA circuit is designed based on a 65 nm CMOS technology, and the magnetic dynamics are captured by a Verilog-A compact model based on macrospin behavior for MTJs. Using the preread and comparison steps in the data program operation, we are able to reduce write power consumption by up to 50% under random data input conditions. Furthermore, using the voltage-controlled magnetic anisotropy effect for precessional switching, more than 10× reduction of write power and transistor size both in the memory cell and the write circuit is achieved, compared with using the spin transfer torque effect. The circuit achieves 2 ns read time, 1.8 ns write time, and 8 ns total data program operation time (consisting of two read steps, one write step, and a pass/fail check step) using this PWSA concept, and a 2× larger sensing margin through the current feedback circuit.
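The preread-and-compare idea can be illustrated with a small behavioral sketch. It models only the number of write pulses issued, not the circuit, the read energy, or the MTJ dynamics, and the helper `program_word` is hypothetical rather than taken from the paper. Under the assumption of independent, uniformly random stored and incoming bits, about half of the bits already hold the desired value, which is one plausible reading of the "up to 50%" write-power figure.

```python
import random


def program_word(stored, incoming):
    """Write only the bits that differ from what a preread returns.

    Returns the new stored word and the number of write pulses issued.
    """
    new_word = list(stored)
    pulses = 0
    for i, (old_bit, wanted_bit) in enumerate(zip(stored, incoming)):
        if old_bit != wanted_bit:  # comparison step after the preread
            new_word[i] = wanted_bit
            pulses += 1
    return new_word, pulses


rng = random.Random(0)  # fixed seed so the run is repeatable
n_bits = 10_000
stored = [rng.randint(0, 1) for _ in range(n_bits)]
incoming = [rng.randint(0, 1) for _ in range(n_bits)]

new_stored, pulses = program_word(stored, incoming)
fraction = pulses / n_bits
print(f"bits written: {fraction:.1%} of the word")
```

A write-everything scheme would issue `n_bits` pulses, so the printed fraction approximates the relative write activity that remains after the comparison step.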
international symposium on quality electronic design | 2012
Fengbo Ren; Henry Park; Richard Dorrance; Yuta Toriyama; C.-K. Ken Yang; Dejan Markovic
With scaling of CMOS and Magnetic Tunnel Junction (MTJ) devices, conventional low-current reading techniques for STT-RAMs face challenges in achieving reliability and performance improvements that are expected from scaled devices. The challenges arise from the increasing variability of the CMOS sensing current and the reduction in MTJ switching current. This paper proposes a short-pulse reading circuit, based on a body-voltage sensing scheme, to mitigate the scaling issues. Compared to existing sensing techniques, our technique shows substantially higher read margin (RM) despite a much shorter sensing time. A narrow current pulse applied to an MTJ significantly reduces the probability of read disturbance. The RM analysis is validated by Monte-Carlo simulations in a 65-nm CMOS technology with both CMOS and MTJ variations considered. Simulation results show that our technique is able to provide over 300 mV RM at a GHz frequency across process-voltage-temperature (PVT) variations, while the reference designs require 4.3 ns and 2.3 ns sensing time for a 200 mV RM, respectively. The effective read energy per bit required by the proposed sensing circuit is around 195 fJ in the nominal case.
IEEE Electron Device Letters | 2013
Richard Dorrance; Juan G. Alzate; Sergiy Cherepov; Pramey Upadhyaya; Ilya Krivorotov; J. A. Katine; Juergen Langer; Kang L. Wang; Pedram Khalili Amiri; Dejan Markovic
This letter presents a diode-magnetic tunnel junction (MTJ) magnetic random access memory cell in a 65-nm complementary metal-oxide-semiconductor compatible process. A voltage-controlled magnetic anisotropy switching mechanism, in addition to STT, allows for a unipolar set/reset write scheme, where voltage pulses of the same polarity, but different amplitudes, are used to switch the MTJs. A small crossbar array is constructed from 65-nm MTJs fabricated on a silicon wafer, with switching voltages ~1 V and thermal stability greater than 10 years, with discrete germanium diodes as access devices to allow for read/write operations. The crossbar architecture can be extended to multiple layers to create a 3-D stackable, nonvolatile memory with a sub-1F^2 effective cell size.
international symposium on nanoscale architectures | 2011
Henry Park; Richard Dorrance; Amr Amin; Fengbo Ren; Dejan Markovic; C.-K. Ken Yang
The density of STT-RAMs is limited by the area cost and width of the access device in a cell, since it needs to support the programming currents. This paper explores a cell structure that shares each cell's access transistor with multiple MTJ memory elements. The feasibility and limitations of such a cell structure are explored for both reading and writing of the memory. The analytical and simulation results indicate that only a small amount of sharing is possible, and that MTJs that can handle a high read current without disturbing the cell are needed.
symposium on vlsi circuits | 2016
Richard Dorrance; Dejan Markovic
A DSP for sparse-BLAS is realized in 40 nm CMOS. Featuring an efficient data stream reordering scheme and an intelligent, CSC-aware memory controller, the DSP achieves a peak energy efficiency of 190 GFLOPS/W at 0.6 V, 160 MHz, and a peak performance of 4.12 GFLOPS at 1 V, 515 MHz, showing more than 6,600×, 2,700×, 1,100×, and 450× higher energy efficiency than state-of-the-art CPU, GPU, DSP, and FPGA hardware designs, respectively.
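For readers unfamiliar with the Compressed Sparse Column (CSC) layout the memory controller is described as being aware of, the following is a minimal software sketch of a matrix-vector product over CSC data. It is not the DSP's datapath or reordering scheme; the function name `spmv_csc` and the example matrix are illustrative assumptions. It shows the column-oriented traversal, in which each input element is read once and its contributions are scattered into the output.

```python
def spmv_csc(values, row_idx, col_ptr, x, n_rows):
    """Compute y = A @ x for a matrix A stored in CSC form.

    values  : nonzero entries, column by column
    row_idx : row index of each nonzero
    col_ptr : col_ptr[j]..col_ptr[j+1] delimits the nonzeros of column j
    """
    y = [0.0] * n_rows
    for col in range(len(col_ptr) - 1):
        x_j = x[col]  # each input element is read exactly once
        for k in range(col_ptr[col], col_ptr[col + 1]):
            # Scatter: accumulate into the output row named by the stored index.
            y[row_idx[k]] += values[k] * x_j
    return y


# Illustrative 3x3 matrix (not from the paper):
# A = [[4, 0, 0],
#      [0, 0, 2],
#      [1, 0, 3]]
values = [4.0, 1.0, 2.0, 3.0]
row_idx = [0, 2, 1, 2]
col_ptr = [0, 2, 2, 4]
x = [1.0, 2.0, 3.0]

y = spmv_csc(values, row_idx, col_ptr, x, n_rows=3)
print(y)  # [4.0, 6.0, 10.0]
```

The empty middle column costs nothing beyond a repeated entry in `col_ptr`, which is why the format stores very sparse matrices compactly.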
IEEE Transactions on Electron Devices | 2012
Richard Dorrance; Fengbo Ren; Yuta Toriyama; Amr Amin Hafez; Chih Kong Ken Yang; Dejan Markovic
Archive | 2013
Pedram Khalili Amiri; Richard Dorrance; Dejan Markovic; Kang L. Wang
Archive | 2013
Pedram Khalili Amiri; Richard Dorrance; Dejan Markovic; Kang L. Wang