Mihai Lefter
Delft University of Technology
Publication
Featured research published by Mihai Lefter.
international conference on optimization of electrical and electronic equipment | 2012
Saleh Safiruddin; Demid Borodin; Mihai Lefter; George Razvan Voicu; Sorin Cotofana
Achieving dependable computing systems is becoming increasingly difficult as CMOS integrated circuit technology scaling reaches sub-22nm ranges and faces physical limitations. Dependable computing is also a major concern with the various new technologies that are being investigated to overcome the physical limitations of CMOS technology. 3D integration, though initially proposed as a way of achieving speedup of integrated circuits without the need for scaling, offers many new opportunities for dependable computing. 3D integration adds two new dimensions to the design space: (i) the z-dimension, as now the application can be mapped on parts of the circuit that are placed in different planes, and (ii) the R-dimension, as different planes can be selected with different reliabilities. This greatly expands the solution space and provides many opportunities to deal with new and existing challenges. In this paper we identify important strategies to achieve dependable computing by exploring the opportunities that 3D integration offers. We present system-level approaches for alleviating underlying technology reliability shortcomings and investigate the opportunities opened up by TSV-based 3D integration with emphasis on the system reliability point of view. Our investigation clearly indicates that the proposed 3D dependable computing paradigms, if developed and further explored, can facilitate the continuation of the trend of reducing package size and increasing transistor densities, and allow for the successful utilization of novel emerging unreliable devices.
international symposium on circuits and systems | 2013
Marius Enachescu; Mihai Lefter; Antonios Bazigos; Adrian M. Ionescu; Sorin Cotofana
In this paper, we introduce a Nano-Electro-Mechanical Field Effect Transistor (NEMFET) based logic family tailored to the implementation of low speed and ultra low energy functional units and processors. Basic Boolean gates implemented with NEMFETs only are analysed and compared against equivalent CMOS realisations. Our simulations suggest that the proposed short-circuit current free NEMFET gates exhibit up to 10x dynamic energy reduction and up to 2 orders of magnitude less leakage, at the expense of 10 to 20x slower operation, when compared with CMOS counterparts. We also analyse the fan-in influence on gate performance and observe that the NEMFET gate energy advantage increases with fan-in. Finally, we consider a 3D-Stacked hybrid NEMFET-CMOS computation platform running a heartbeat rate monitor application and demonstrate that NEMFET based logic is an enabling factor for the implementation of “zero-energy” operated systems.
Intelligent Decision Technologies | 2013
Mottaqiallah Taouil; Mihai Lefter; Said Hamdioui
3D-Stacked IC (3D-SIC) based on Through-Silicon-Vias (TSV) is an emerging technology that provides many benefits such as low power, high bandwidth 3D memories and heterogeneous integration. One of the attractive applications making use of such benefits is the stacking of memory dies on logic. System integrators for such applications have to provide an appropriate test strategy. However, they have to deal with black box IPs, as IP providers usually refuse to share the IP content. Moreover, IP providers dislike including JTAG in memory dies. Therefore, developing low cost and high quality test approaches, while taking these constraints into consideration, is of great importance. This paper presents a framework of interconnect test approaches for memories stacked on logic, and looks further than the JTAG solutions, which are the only ones proposed so far. The benefits and drawbacks of each possible solution are extensively discussed for stacked memories both with and without MBISTs, placed on the memory dies or on a separate logic die.
design, automation, and test in europe | 2013
Mihai Lefter; George Razvan Voicu; Mottaqiallah Taouil; Marius Enachescu; Said Hamdioui; Sorin Cotofana
In this paper we address lower level issues related to 3D inter-die memory repair in an attempt to evaluate the actual potential of this approach for current and foreseeable technology developments. We propose several implementation schemes for both inter-die row and column repair and evaluate their impact in terms of area and delay. Our analysis suggests that current state-of-the-art TSV dimensions allow inter-die column repair schemes at the expense of reasonable area overhead. For row repair, however, most memory configurations require TSV dimensions to scale down by at least one order of magnitude in order to make this approach a possible candidate for 3D memory repair. We also performed a theoretical analysis of the implications of the proposed 3D repair schemes on the memory access time, which indicates that no substantial delay overhead is expected and that many delay versus energy consumption tradeoffs are possible.
application specific systems architectures and processors | 2013
George Razvan Voicu; Mihai Lefter; Marius Enachescu; Sorin Cotofana
In this paper, we address the design of wide-operand addition units in the context of the emerging Through-Silicon Vias (TSV) based 3D Stacked IC (3D-SIC) technology. To this end we first identify and classify the potential of the direct folding approach on existing fast prefix adders, and then discuss the cost and performance of each strategy. Our analysis identifies as a major direct folding drawback the utilization of different structures on each tier. Thus, in order to alleviate this, we propose a novel 3D Stacked Hybrid Prefix/Carry-Select Adder with identical tier structure, which potentially makes the manufacturing of hardware wide-operand adders a reality. Such an N-bit carry select adder can be implemented with K identical tier stacked ICs, where each tier contains two N/K-bit fast prefix adders operating in parallel according to the computation anticipation principle. Their carry-out signals are cascaded through TSVs in order to perform the selection of the sums accordingly, which results in a delay with the asymptotic notation of O(log(N/K) + K). To evaluate the practical implications of direct folding and of the hybrid prefix/carry-select approaches we perform a thorough case study of 65 nm CMOS 3D adder implementations for different operand sizes and number of tiers, and analyze various possible design tradeoffs. Our simulations indicate the hybrid prefix/carry-select approach can achieve speed gains over 3D folding based designs of between 29% and 54%, for 512-bit up to 4096-bit adders, respectively. Even though 3D folding requires less real estate, when considering a more appropriate metric for 3D design, i.e., delay-footprint-cost product, the hybrid prefix/carry-select approach substantially outperforms the folding one and provides delay-footprint-cost reductions between 17.97% and 94.05%.
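The carry-select organization described in this abstract can be illustrated with a small behavioural model. This is a minimal sketch under stated assumptions, not the authors' hardware design: the function names `tier_sums` and `carry_select_add` are hypothetical, Python's integer addition stands in for each N/K-bit fast prefix adder, and the sequential loops only model what the tiers would compute in parallel. Each tier precomputes its sum for carry-in 0 and carry-in 1, and the incoming carry (passed through TSVs in the real design) selects one of the two, which is where the O(log(N/K) + K) delay comes from: one prefix-adder delay plus K selection steps.

```python
def tier_sums(a_slice: int, b_slice: int, width: int):
    """Return [(sum, carry_out) for carry-in 0, (sum, carry_out) for carry-in 1]."""
    mask = (1 << width) - 1
    results = []
    for carry_in in (0, 1):
        total = a_slice + b_slice + carry_in  # stands in for a prefix adder
        results.append((total & mask, total >> width))
    return results


def carry_select_add(a: int, b: int, n_bits: int, k_tiers: int):
    """Add two n_bits-wide operands using k_tiers identical slices.

    Returns (sum modulo 2**n_bits, final carry-out).
    """
    if n_bits % k_tiers != 0:
        raise ValueError("n_bits must be divisible by k_tiers")
    width = n_bits // k_tiers
    mask = (1 << width) - 1

    # Step 1: every tier computes both candidate results independently
    # (in hardware these run in parallel, one pair of adders per tier).
    candidates = []
    for tier in range(k_tiers):
        a_slice = (a >> (tier * width)) & mask
        b_slice = (b >> (tier * width)) & mask
        candidates.append(tier_sums(a_slice, b_slice, width))

    # Step 2: the carry travels from tier to tier and selects one
    # precomputed candidate per tier (K selection steps in total).
    result, carry = 0, 0
    for tier, options in enumerate(candidates):
        partial, carry = options[carry]
        result |= partial << (tier * width)
    return result, carry
```

For example, `carry_select_add(2**512 - 1, 1, 512, 4)` returns `(0, 1)`: the carry generated in the lowest tier propagates through all four tiers and leaves as the final carry-out.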
international symposium on circuits and systems | 2015
Mihai Lefter; George Razvan Voicu; Sorin Cotofana
3D-Stacked IC (3D-SIC) based on Through-Silicon-Vias (TSV) is an emerging technology that enables heterogeneous integration and high-bandwidth, low-latency interconnection. In this paper we propose a novel 3D cache architecture that leverages a wide TSV-based data link distributed over the entire memory array to support two orthogonal interfaces: (i) a vertical one, with a large data width, and (ii) a side one, with a lower data width but with more bank-type access ports, which reduces the bank conflict probability. Our simulations indicate that our proposal substantially outperforms planar counterparts in terms of access time, energy, and footprint while providing high bandwidth, low bank conflict rate, and an enriched access mechanism set.
international symposium on nanoscale architectures | 2012
Saleh Safiruddin; Mihai Lefter; Demid Borodin; George Razvan Voicu; Sorin Cotofana
In this paper we present a zero-performance-overhead online fault detection and diagnosis scheme that exploits the vertical proximity of hardware inherent in 3D stacked integrated circuits (3D-SIC). We consider a 3D stacked processor executing independent instruction streams from different threads, on each die. We propose the vertical clustering of functionally identical computational blocks in order to enable the utilization of the 3D specific low-latency interlayer communication infrastructure. The clustering facilitates the parallel re-execution of instructions on idle units located in the proximity of the units which initially computed them and in this way creates the means for fault diagnosis and detection. We detail the control, interconnection communication infrastructure, instruction distribution, and results processing policies required for our scheme. To determine the effectiveness of the approach, we evaluate its performance in terms of diagnosis latency and percentage of verified operations on 3 to 8 core processors implemented on 3 to 8 tier 3D-SICs, respectively, by means of simulations. Our experiments indicate that the diagnosis latency ranges from 9 to 5 cycles, for 3 to 8 cores, respectively. For transient fault detection our simulations indicate that 86% to 94% of all executed instructions are verified, for 3 to 8 cores, respectively. When only one of the layers is protected against transient faults the number of verified operations increases to 94% to 99%, for the same simulation conditions. This suggests that, if certain conditions are fulfilled at design time, our approach can completely protect one instruction stream identified as being critical for the application. Our simulations clearly indicate that the proposed scheme has the potential to improve the 3D stacked integrated circuits dependability with no performance overhead and at the expense of little area overhead.
international symposium on nanoscale architectures | 2017
Mihai Lefter; Thomas Marconi; George Razvan Voicu; Sorin Cotofana
In this paper we propose a novel error correction scheme/architecture specially tailored for polyhedral memories which: (i) allows for the formation of long codewords without interfering with the memory architecture/addressing mode/data granularity and (ii) makes use of codecs located on a dedicated tier of the 3D memory stack. For a transparent error correction process we propose an online memory scrubbing policy that performs the error detection and correction decoupled from the normal memory operation. To evaluate our proposal we consider as a case study a 4-die 4-MB polyhedral memory and simulate code implementations of various data widths. The simulations indicate that our proposal outperforms state-of-the-art single error correction schemes in terms of error correction capability, being able to diminish the Word Error Rates (WER) by many orders of magnitude, e.g., WER from 10^-10 to 10^-21 are achieved for bit error probabilities between 10^-4 and 10^-6, while requiring less redundancy overhead. The scrubbing mechanism hides the codec latency and provides up to 10% and 25% write and read latency reductions, respectively. In addition, by relocating the encoders/decoders from the memory dies to a dedicated one, a 13% footprint reduction is obtained and parallel energy-effective scrubbing can be enabled, which results in further WER reductions.
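To make the Word Error Rate metric concrete, the sketch below computes it under a simple textbook assumption that is not taken from the paper: bits fail independently with probability p, and a codeword is lost whenever more than t of its n bits are flipped. The function name `word_error_rate` and the example parameters are hypothetical, and the numbers it produces are not the paper's results.

```python
from math import comb


def word_error_rate(n_bits: int, t_correctable: int, p_bit: float) -> float:
    """Probability that more than t_correctable of n_bits are flipped,
    assuming independent bit errors with probability p_bit.

    The binomial tail is summed directly rather than computed as one minus
    the probability of a correctable word, which would lose precision when
    the result is tiny.
    """
    return sum(
        comb(n_bits, k) * p_bit**k * (1.0 - p_bit) ** (n_bits - k)
        for k in range(t_correctable + 1, n_bits + 1)
    )


# Hypothetical comparison: same codeword length, different correction power.
for t in (1, 2, 3):
    print(t, word_error_rate(n_bits=72, t_correctable=t, p_bit=1e-4))
```

Under this model, raising the number of correctable errors per codeword lowers the WER steeply at small bit error probabilities, which is the kind of gain that stronger codes over longer codewords aim for.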
international symposium on nanoscale architectures | 2014
Mihai Lefter; Marius Enachescu; George Razvan Voicu; Sorin Cotofana
In this paper we propose to utilise 3D-stacked hybrid memories as an alternative to traditional CMOS SRAMs in L1 and L2 cache implementations and analyse the potential implications of this approach on the processor performance, measured in terms of Instructions-per-Cycle (IPC) and energy consumption. The 3D hybrid memory cell relies on: (i) a Short Circuit Current Free Nano-Electro-Mechanical Field Effect Transistor (SCCF NEMFET) based inverter for data storage; and (ii) adjacent CMOS-based logic for read/write operations and data preservation. We compare 3D Stacked Hybrid NEMFET-CMOS Caches (3DS-HNCC) of various capacities against state-of-the-art 45 nm low power CMOS SRAM counterparts (2D-CC). All the proposed implementations provide two orders of magnitude static energy reduction (due to the NEMFETs' extremely low OFF current) and a slightly increased dynamic energy consumption, while requiring an approximately 55% larger footprint. The read access time is equivalent, while for write operations it is about 3 ns higher, as it is dominated by the mechanical movement of the NEMFET's suspended gate. In order to determine if the write latency overhead inflicts any performance penalty, we consider as an evaluation vehicle a state-of-the-art mobile out-of-order processor core equipped with 32-kB instruction and data L1 caches, and a unified 2-MB L2 cache. We evaluate different scenarios, utilizing both 3DS-HNCC and 2D-CC at different hierarchy levels, on a set of SPEC 2000 benchmarks. Our simulations indicate that for the considered applications, despite their increased write access time, 3DS-HNCC L2 caches inflict insignificant IPC penalty while providing, on average, 38% energy savings, when compared with 2D-CC. For L1 instruction caches the IPC penalty is also almost insignificant, while for L1 data caches IPC decreases between 1% and 12% were measured.
international conference on computer design | 2017
Mihai Lefter; George Razvan Voicu; Thomas Marconi; Valentin Savin; Sorin Cotofana
In this paper we introduce a novel error resilient memory architecture potentially applicable to a large range of memory technologies. In contrast with state-of-the-art memory error correction schemes, which rely on (extended Hamming) Error Correcting Codes (ECC), we make use of Low Density Parity Check (LDPC) codes due to their error correction capabilities, which are close to the Shannon performance limit. To allow for a cost-effective implementation we build our approach on top of a 3D memory organization whose inherently fast and customizable wide-I/O vertical access allows for a smooth transfer of the required long LDPC codewords to/from a die dedicated to error correction. To make the error correction process transparent to the memory users, e.g., processing cores, we propose an online memory scrubbing policy that performs the LDPC-based error detection and correction decoupled from the normal memory operation. For evaluation purposes we consider 3D memories protected by the proposed LDPC mechanism with code implementations of various data widths. Simulation results indicate that our proposal clearly outperforms state-of-the-art ECC schemes, with fault tolerance improvements by a factor of 4710× being obtained when compared to extended Hamming ECC. Furthermore, we implement instances of the proposed memory concept equipped with different LDPC codecs on a commercial 40nm low-power CMOS technology and evaluate them on actual memory traces in terms of error correction capability, area, latency, and energy. Our results indicate that the LDPC protected memories offer substantially improved error correction capabilities, when compared to state-of-the-art extended Hamming ECC, being able to assure clean runs for memory error rates α