Mehdi Saligane
University of Michigan
Publication
Featured research published by Mehdi Saligane.
international solid-state circuits conference | 2016
Yiqun Zhang; Mahmood Khayatzadeh; Kaiyuan Yang; Mehdi Saligane; Nathaniel Ross Pinckney; Massimo Alioto; David T. Blaauw; Dennis Sylvester
It is well known that technology scaling has led to increasing process/voltage/temperature/aging margins that substantially degrade performance and power in modern processors and SoCs. One approach to address these large timing margins is the use of specialized registers on critical paths that perform error detection and correction (EDAC) [1-5]. While promising, the previously proposed implementations have been limited in several ways. Most notably, they often incur large overheads beyond conventional register designs (e.g., 8-to-44 additional transistors per register). This becomes an obstacle for commercial designs and, hence, there have been no reported implementations of EDAC approaches within substantial commercial processors. Finally, the performance gain from EDAC approaches has not been thoroughly quantified in relation to competing, lower overhead approaches such as frequency binning and canary circuits/critical path monitors [6].
symposium on vlsi circuits | 2016
Yiqun Zhang; Kaiyuan Yang; Mehdi Saligane; David T. Blaauw; Dennis Sylvester
An AES hardware accelerator targeting energy-efficient, low-cost mobile and IoT applications is fabricated in 40nm CMOS. The proposed design eliminates the ShiftRow stage in conventional AES implementations and replaces flip-flops in data and key storage with latches using re-timing, saving 25% area and 69% power. Along with a 2-stage Sbox in native GF(2^4)^2 composite-field computation and glitch reduction techniques, this results in a compact 2228 gate design achieving 446 Gbps/W and 46.2 Mbps throughput at 0.47V.
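As a rough reading aid, the two headline figures in this abstract can be converted into energy per bit and an implied power draw. The sketch below assumes that the 446 Gbps/W efficiency and the 46.2 Mbps throughput refer to the same 0.47V operating point, which the abstract suggests but does not state outright; the function names are illustrative only.

```python
def energy_per_bit_pj(efficiency_gbps_per_w: float) -> float:
    """Energy per processed bit in picojoules for a given Gbps/W efficiency."""
    bits_per_joule = efficiency_gbps_per_w * 1e9  # (bit/s)/W equals bit/J
    return 1e12 / bits_per_joule


def implied_power_uw(throughput_mbps: float, efficiency_gbps_per_w: float) -> float:
    """Power in microwatts implied by throughput divided by efficiency.

    Assumption: both figures were measured at the same operating point.
    """
    watts = (throughput_mbps * 1e6) / (efficiency_gbps_per_w * 1e9)
    return watts * 1e6


if __name__ == "__main__":
    e_bit = energy_per_bit_pj(446.0)
    print(f"energy per bit: {e_bit:.3f} pJ")
    print(f"energy per 128-bit AES block: {128 * e_bit:.1f} pJ")
    print(f"implied power: {implied_power_uw(46.2, 446.0):.1f} uW")
```

Under that assumption the figures correspond to roughly 2.2 pJ per bit and on the order of 100 μW; these are derived values, not numbers reported in the abstract.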
international solid-state circuits conference | 2016
Mahmood Khayatzadeh; Mehdi Saligane; Jingcheng Wang; Massimo Alioto; David T. Blaauw; Dennis Sylvester
SRAM is a key building block in systems-on-chip and usually limits their voltage scalability, due to the major impact of process/voltage/temperature (PVT) variations at low voltages [1]. Assist techniques to extend SRAM operating voltage range improve the bit cell read/write stability [1-5], but cannot mitigate variations in the internal sensing delay that is needed to develop the targeted bitline (BL) voltage. Hence, large guard bands and performance margins are still needed to ensure correct operation. These margins increase as supply voltage is lowered (Fig. 17.3.1) and must be addressed especially when the SRAM is coupled with margin-less processor designs (e.g., Razor).
custom integrated circuits conference | 2015
Mehdi Saligane; Mahmood Khayatzadeh; Yiqun Zhang; Seokhyeon Jeong; David T. Blaauw; Dennis Sylvester
Accurate, compact thermal sensors are desirable in many applications, including on-chip temperature monitoring for processors with dynamic throttling and reliability management. Modern thermal sensors are limited in either area, robustness, or accuracy. This work sidesteps strong linearity requirements for reference and PTAT elements in the sensor by performing a higher-order fitting of more relaxed PTAT and CTAT elements using an embedded calculation compute engine. A compact 24 × 10 μm sensing element (40nm CMOS) achieves inaccuracy of <1°C across 30 chips with 2-point calibration and a resolution of 0.02°C.
symposium on vlsi circuits | 2017
Qing Dong; Supreet Jeloka; Mehdi Saligane; Yejoong Kim; Masaru Kawaminami; Akihiko Harada; Satoru Miyoshi; David T. Blaauw; Dennis Sylvester
A 4+2T SRAM is proposed that offers searching and logic functions. The cell uses the N-well as the write wordline (WL) and eliminates the access transistors. Decoupled read paths enable reliable multi-word activation for in-memory Boolean logic functions. The SRAM can reconfigure to BCAM/TCAM for searching operations, with 0.13fJ/search/bit at 0.35V. Forty test chips in 55nm deeply depleted channel (DDC) technology achieve worst-case 0.3 V VDDmin.
IEEE Journal of Solid-state Circuits | 2018
Yiqun Zhang; Mahmood Khayatzadeh; Kaiyuan Yang; Mehdi Saligane; Nathaniel Ross Pinckney; Massimo Alioto; David T. Blaauw; Dennis Sylvester
This paper presents iRazor, a lightweight error detection and correction approach, to suppress the cycle time margin that is traditionally added to very large scale integration systems to tolerate process, voltage, and temperature variations. iRazor is based on a novel current-based detector, which is embedded in flip-flops on potentially critical paths. The proposed iRazor flip-flop requires only three additional transistors, yielding only 4.3% area penalty over a standard D flip-flop. The proposed scheme is implemented in an ARM Cortex-R4 microprocessor in 40 nm through an automated iRazor flip-flop insertion flow. To gain an insight into the effectiveness of the proposed scheme, iRazor is compared to other popular techniques that mitigate the impact of variations, through the analysis of the worst case margin in 40 silicon dies. To the best of the authors’ knowledge, this is the first paper that compares the measured cycle time margin and the power efficiency improvements offered by frequency binning and various canary approaches. Results show that iRazor achieves 26%–34% performance gain and 33%–41% energy reduction compared to a baseline design across the 0.6- to 1-V voltage range, at the cost of 13.6% area overhead.
IEEE Journal of Solid-state Circuits | 2018
Ziyun Li; Qing Dong; Mehdi Saligane; Benjamin P. Kempke; Luyao Gong; Zhengya Zhang; Ronald G. Dreslinski; Dennis Sylvester; David T. Blaauw; Hun-Seok Kim
This paper presents a single-chip, high-performance, and energy-efficient stereo vision depth-estimation processor for micro aerial vehicles (MAVs). The proposed processor implements the state-of-the-art semi-global matching (SGM) algorithm to deliver full high-definition (HD, 1920 × 1080) stereo-depth outputs with a maximum of 38 frames/s throughput. Algorithm-architecture co-optimization is conducted, introducing overlapping block-based processing that eliminates very large on-chip memory and off-chip DRAM. We exploit inherent data parallelism in the algorithm by processing 128 local disparity costs and aggregating the SGM costs along four paths for all 128 disparities in parallel. A dependence-resolving scan associated with 16-stage deep pipeline is introduced to hide the data dependence between neighboring pixels in the SGM algorithm. Moreover, we propose a customized ultra-high bandwidth dual-port SRAM that utilizes the unique memory access characteristic of SGM to achieve highly energy-efficient memory access at a very high on-chip memory bandwidth of 1.64 Tb/s. The fabricated processor produces 512 levels of depth information for each pixel at full HD resolution with 30-frames/s performance, consuming 836 mW from a 0.75-V supply in TSMC 40-nm GP CMOS. We ported the design on a quadcopter MAV to demonstrate its performance in realistic real-time flight.
symposium on vlsi circuits | 2017
Yu Zeng; Tae-Kwang Jang; Qing Dong; Mehdi Saligane; Dennis Sylvester; David T. Blaauw
international solid-state circuits conference | 2017
Ziyun Li; Qing Dong; Mehdi Saligane; Benjamin P. Kempke; Shijia Yang; Zhengya Zhang; Ronald G. Dreslinski; Dennis Sylvester; David T. Blaauw; Hun-Seok Kim
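For readers unfamiliar with semi-global matching, the cost aggregation that the stereo-vision abstract above parallelizes across 128 disparities and four paths can be sketched in software for a single path. This is a minimal illustration of the textbook SGM recurrence, not the processor's implementation; the function name, the penalty values `p1` and `p2`, and the tiny three-disparity input are all assumptions chosen for the example.

```python
import numpy as np


def aggregate_left_to_right(cost, p1=1, p2=3):
    """Aggregate matching costs along one left-to-right path of an image row.

    cost: integer array of shape (width, num_disparities).
    p1, p2: hypothetical smoothness penalties for a disparity change of
            exactly one level and for any larger jump, respectively.
    """
    cost = np.asarray(cost, dtype=np.int64)
    width, _ = cost.shape
    big = np.iinfo(np.int64).max // 2  # stands in for "no neighbor"
    agg = np.empty_like(cost)
    agg[0] = cost[0]
    for x in range(1, width):
        prev = agg[x - 1]
        prev_min = prev.min()
        from_lower = np.concatenate(([big], prev[:-1])) + p1   # came from d - 1
        from_higher = np.concatenate((prev[1:], [big])) + p1   # came from d + 1
        from_jump = prev_min + p2                              # any other disparity
        best = np.minimum(np.minimum(prev, from_lower),
                          np.minimum(from_higher, from_jump))
        # Subtracting prev_min keeps the aggregated values bounded.
        agg[x] = cost[x] + best - prev_min
    return agg


if __name__ == "__main__":
    row_cost = [[0, 5, 5], [5, 0, 5], [5, 5, 0]]
    aggregated = aggregate_left_to_right(row_cost)
    print(aggregated)
    print(aggregated.argmin(axis=1))  # disparity chosen per pixel on this path
```

A full SGM pipeline sums such aggregated costs over several path directions before picking the minimum-cost disparity per pixel. Each pixel's result depends on its predecessor along the path, which is the neighboring-pixel data dependence the abstract refers to.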
IEEE Journal of Solid-state Circuits | 2018
Qing Dong; Supreet Jeloka; Mehdi Saligane; Yejoong Kim; Masaru Kawaminami; Akihiko Harada; Satoru Miyoshi; Makoto Yasuda; David T. Blaauw; Dennis Sylvester
This paper presents a PLL-assisted crystal oscillator using a current switching phase detector (PD) with intrinsic 90° phase offset for IoT applications. The PLL provides accurate pulse injection timing into the XO, sustaining its oscillation at only 100mV amplitude and ensuring robust operation across PVT. This technique achieves high energy injection efficiency and avoids the use of power-hungry amplifiers. Measured power is 1.7nW at room temperature, and operation is demonstrated from −20 to 80°C and across 3 corner wafers.