Qiuling Zhu
Carnegie Mellon University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Qiuling Zhu.
ieee international d systems integration conference | 2013
Qiuling Zhu; Berkin Akin; H. Ekin Sumbul; Fazle Sadi; James C. Hoe; Lawrence T. Pileggi; Franz Franchetti
This paper introduces a 3D-stacked logic-in-memory (LiM) system that integrates the 3D die-stacked DRAM architecture with the application-specific LiM IC to accelerate important data-intensive computing. The proposed system comprises a fine-grained rank-level 3D die-stacked DRAM device and extra LiM layers implementing logic-enhanced SRAM blocks that are dedicated to a particular application. Through silicon vias (TSVs) are used for vertical interconnections providing the required bandwidth to support the high performance LiM computing. We performed a comprehensive 3D DRAM design space exploration and exploit the efficient architectures to accelerate the computing that can balance the performance and power. Our experiments demonstrate orders of magnitude of performance and power efficiency improvements compared with the traditional multithreaded software implementation on modern CPU.
hardware-oriented security and trust | 2014
Kaushik Vaidyanathan; Renzhi Liu; H. Ekin Sumbul; Qiuling Zhu; Franz Franchetti; Lawrence T. Pileggi
Split fabrication, the process of splitting an IC into an untrusted and trusted tier, facilitates access to the most advanced semiconductor manufacturing capabilities available in the world without requiring disclosure of design intent. While researchers have investigated the security of logic blocks in the context of split fabrication, the security of IP blocks, another key component of an SoC, has not been examined. Our security analysis of IP block designs, specifically embedded memory and analog components, shows that they are vulnerable to “recognition attacks” at the untrusted foundry due to the use of standardized floorplans and leaf cell layouts. We propose methodologies to design these blocks efficiently and securely, and demonstrate their effectiveness using 130nm split fabricated testchips.
ieee high performance extreme computing conference | 2013
Qiuling Zhu; Tobias Graf; H. Ekin Sumbul; Lawrence T. Pileggi; Franz Franchetti
This paper introduces a 3D-stacked logic-in-memory (LiM) system to accelerate the processing of sparse matrix data that is held in a 3D DRAM system. We build a customized content addressable memory (CAM) hardware structure to exploit the inherent sparse data patterns and model the LiM based hardware accelerator layers that are stacked in between DRAM dies for the efficient sparse matrix operations. Through silicon vias (TSVs) are used to provide the required high inter-layer bandwidth. Furthermore, we adapt the algorithm and data structure to fully leverage the underlying hardware capabilities, and develop the necessary design framework to facilitate the design space evaluation and LiM hardware synthesis. Our simulation demonstrates more than two orders of magnitude of performance and energy efficiency improvements compared with the traditional multithreaded software implementation on modern processors.
application specific systems architectures and processors | 2012
Qiuling Zhu; Kaushik Vaidyanathan; Ofer Shacham; Mark Horowitz; Lawrence T. Pileggi; Franz Franchetti
This paper presents a design methodology forhardware synthesis of application-specific logic-in-memory(LiM) blocks. Logic-in-memory designs tightly integrate specializedcomputation logic with embedded memory, enablingmore localized computation, thus save energy consumption. Asa demonstration, we present an end-to-end design frameworkto automatically synthesize an interpolation based logic-in-memoryblock named interpolation memory, which combinesa seed table with simple arithmetic logic to efficiently evaluatefunctions. In order to support multiple consecutive seed dataaccess that is required in the interpolation operation, wesynthesize the physical memory into the novel rectangular accesssmart memory blocks. We evaluated a large designspace of interpolation memories in sub-20 nm commercialCMOS technology by using the proposed design framework.Furthermore, we implemented a logic-in-memory based computedtomography (CT) medical image reconstruction systemand our experimental results show that the logic-in-memorycomputing method achieves orders of magnitude of energysaving compared with the traditional in-processor computing.
international conference on acoustics, speech, and signal processing | 2012
Qiuling Zhu; Christian R. Berger; Eric Turner; Lawrence T. Pileggi; Franz Franchetti
In this paper we present a local interpolation-based variant of the well-known polar format algorithm used for synthetic aperture radar (SAR) image formation. We develop the algorithm to match the capabilities of the application-specific logic-in-memory processing paradigm, which off-loads lightweight computation directly into the SRAM and DRAM. Our proposed algorithm performs filtering, an image perspective transformation, and a local 2D interpolation and supports partial and low-resolution reconstruction. We implement our customized SAR grid interpolation logic-in-memory hardware in advanced 32nm silicon technology. Our high-level design tools allow to instantiate various optimized design choices to fit image processing and hardware needs of application designers. Our simulation results show that the logic-in-memory approach has the potential to enable substantial improvements in energy efficiency without sacrificing image quality.
Journal of Micro-nanolithography Mems and Moems | 2014
Kaushik Vaidyanathan; Qiuling Zhu; Lars W. Liebmann; Kafai Lai; Stephen Wu; Renzhi Liu; Yandong Liu; Andzrej Strojwas; Lawrence T. Pileggi
Abstract. For the past four decades, cost and features have driven complementary metal-oxide semiconductor (CMOS) scaling. Severe lithography and material limitations seen below the 20-nm node, however, are challenging the fundamental premise of affordable CMOS scaling. Just continuing to co-optimize leaf cell circuit and layout designs with process technology does not enable us to exploit the challenges of sub-20-nm CMOS. For affordable scaling, it is imperative to work past sub-20-nm technology impediments while exploiting its features. To this end, we propose to broaden the scope of design technology co-optimization (DTCO) to be more holistic by including microarchitecture design and computer-aided design, along with circuits, layout, and process technology. Furthermore, we undertook such a holistic DTCO for all critical design elements such as embedded memory, standard cell logic, analog components, and physical synthesis in a 14-nm process. Measurements results from experimental designs in a representative 14-nm process from IBM demonstrate the efficacy of the proposed approach.
signal processing systems | 2013
Qiuling Zhu; Christian R. Berger; Eric Turner; Lawrence T. Pileggi; Franz Franchetti
In this paper we present a local interpolation-based variant of the well-known polar format algorithm used for synthetic aperture radar (SAR) image formation. We develop the algorithm to match the capabilities of the application-specific logic-in-memory processing paradigm, which off-loads lightweight computation directly into the SRAM and DRAM. Our proposed algorithm performs filtering, an image perspective transformation, and a local 2D interpolation, and supports partial and low-resolution reconstruction. We implement our customized SAR grid interpolation logic-in-memory hardware in advanced 14 nm silicon technology. Our high-level design tools allow to instantiate various optimized design choices to fit image processing and hardware needs of application designers. Our simulation results show that the logic-in-memory approach has the potential to enable substantial improvements in energy efficiency without sacrificing image quality.
design automation conference | 2015
H. Ekin Sumbul; Kaushik Vaidyanathan; Qiuling Zhu; Franz Franchetti; Lawrence T. Pileggi
For deeply scaled digital integrated systems, the power required for transporting data between memory and logic can exceed the power needed for computation, thereby limiting the efficacy of synthesizing logic and compiling memory independently. Logic-in-Memory (LiM) architectures address this challenge by embedding logic within the memory block to perform basic operations on data locally for specific functions. While custom smart memories have been successfully constructed for various applications, a fully automated LiM synthesis flow enables architectural exploration that has heretofore not been possible. In this paper we present a tool and design methodology for LiM physical synthesis that performs co-design of algorithms and architectures to explore system level trade-offs. The resulting layouts and timing models can be incorporated within any physical synthesis tool. Silicon results shown in this paper demonstrate a 250x performance improvement and 310x energy savings for a data-intensive application example.
ifip ieee international conference on very large scale integration | 2012
Qiuling Zhu; Lawrence T. Pileggi; Franz Franchettis
As nanoscale lithography challenges mandate greater pattern regularity and commonality for logic and memory circuits, new opportunities are created to affordably synthesize more powerful smart memory blocks for specific applications. Leveraging the ability to embed logic inside the memory block boundary, we demonstrate the synthesis of smart memory architectures that exploits the inherent memory address patterns of the backprojection algorithm to enable efficient parallel image reconstruction at minimum hardware overhead. An end-to-end design framework in sub-20nm CMOS technologies was constructed for the physical synthesis of smart memories and evaluation of the huge design space. Our experimental results show that customizing memory for the computerized tomography (CT) parallel backprojection can achieve more than 30% area and power savings while offering significant performance improvements with marginal sacrifice of image accuracy.
2012 IEEE/IFIP 20th International Conference on VLSI and System-on-Chip (VLSI-SoC) | 2012
Qiuling Zhu; Lawrence T. Pileggi; Franz Franchetti
As nanoscale lithography challenges mandate greater pattern regularity and commonality for logic and memory circuits, new opportunities are created to affordably synthesize more powerful smart memory blocks for specific applications. Leveraging the ability to embed logic inside the memory block boundary, we demonstrate the synthesis of smart memory architectures that exploits the inherent memory address patterns of the backprojection algorithm to enable efficient parallel image reconstruction at minimum hardware overhead. An end-to-end design framework in sub-20nm CMOS technologies was constructed for the physical synthesis of smart memories and evaluation of the huge design space. Our experimental results show that customizing memory for the computerized tomography (CT) parallel backprojection can achieve more than 30% area and power savings while offering significant performance improvements with marginal sacrifice of image accuracy.