Computer Science Hardware Architecture - Researchain

Featured Researches

A Range Matching CAM for Hierarchical Defect Tolerance Technique in NRAM Structures

Due to the small size of nanoscale devices, they are highly prone to process disturbances which results in manufacturing defects. Some of the defects are randomly distributed throughout the nanodevice layer. Other disturbances tend to be local and lead to cluster defects caused by factors such as layer misintegration and line width variations. In this paper, we propose a method for identifying cluster defects from random ones. The motivation is to repair the cluster defects using rectangular ranges in a range matching content-addressable memory (RM-CAM) and random defects using triple-modular redundancy (TMR). It is believed a combination of these two approaches is more effective for repairing defects at high error rate with less resource. With the proposed fault repairing technique, defect recovery results are examined for different fault distribution scenarios. Also the mapping circuit structure required for two conceptual 32*32 and 64*64 bit RAMs are presented and their speed, power and transistor count are reported.

Hardware Architecture

A Retrospective Recount of Computer Architecture Research with a Data-Driven Study of Over Four Decades of ISCA Publications

This study began with a research project, called DISCvR, conducted at the IBM-ILLINOIS Center for Cognitive Computing Systems Reseach. The goal of DISCvR was to build a practical NLP based AI pipeline for document understanding which will help us better understand the computation patterns and requirements of modern computing systems. While building such a prototype, an early use case came to us thanks to the 2017 IEEE/ACM International Symposium on Microarchitecture (MICRO-50) Program Co-chairs, Drs. Hillery Hunter and Jaime Moreno. They asked us if we can perform some data-driven analysis of the past 50 years of MICRO papers and show some interesting historical perspectives on MICRO's 50 years of publication. We learned two important lessons from that experience: (1) building an AI solution to truly understand unstructured data is hard in spite of the many claimed successes in natural language understanding; and (2) providing a data-driven perspective on computer architecture research is a very interesting and fun project. Recently we decided to conduct a more thorough study based on all past papers of International Symposium on Computer Architecture (ISCA) from 1973 to 2018, which resulted this article. We recognize that we have just scratched the surface of natural language understanding of unstructured data, and there are many more aspects that we can improve. But even with our current study, we felt there were enough interesting findings that may be worthwhile to share with the community. Hence we decided to write this article to summarize our findings so far based only on ISCA publications. Our hope is to generate further interests from the community in this topic, and we welcome collaboration from the community to deepen our understanding both of the computer architecture research and of the challenges of NLP-based AI solutions.

Hardware Architecture

A Ring Router Microarchitecture for NoCs

Network-on-Chip (NoC) has become a popular choice for connecting a large number of processing cores in chip multiprocessor design. In a conventional NoC design, most of the area in the router is occupied by the buffers and the crossbar switch. These two components also consume the majority of the router's power. Much of the research in NoC has been based on the conventional router microarchitecture. We propose a novel router microarchitecture that treats the router itself as a small network of the ring topology. It eliminates the large crossbar switch in the conventional design. In addition, network latency is much reduced. Simulation and circuit synthesis show that the proposed microarchitecture can reduce the latency, area and power by 53%, 34% and 27%, respectively, compared to the conventional design.

Hardware Architecture

A Study of Runtime Adaptive Prefetching for STTRAM L1 Caches

Spin-Transfer Torque RAM (STTRAM) is a promising alternative to SRAM in on-chip caches due to several advantages. These advantages include non-volatility, low leakage, high integration density, and CMOS compatibility. Prior studies have shown that relaxing and adapting the STTRAM retention time to runtime application needs can substantially reduce overall cache energy without significant latency overheads, due to the lower STTRAM write energy and latency in shorter retention times. In this paper, as a first step towards efficient prefetching across the STTRAM cache hierarchy, we study prefetching in reduced retention STTRAM L1 caches. Using SPEC CPU 2017 benchmarks, we analyze the energy and latency impact of different prefetch distances in different STTRAM cache retention times for different applications. We show that expired_unused_prefetches---the number of unused prefetches expired by the reduced retention time STTRAM cache---can accurately determine the best retention time for energy consumption and access latency. This new metric can also provide insights into the best prefetch distance for memory bandwidth consumption and prefetch accuracy. Based on our analysis and insights, we propose Prefetch-Aware Retention time Tuning (PART) and Retention time-based Prefetch Control (RPC). Compared to a base STTRAM cache, PART and RPC collectively reduced the average cache energy and latency by 22.24% and 24.59%, respectively. When the base architecture was augmented with the state-of-the-art near-side prefetch throttling (NST), PART+RPC reduced the average cache energy and latency by 3.50% and 3.59%, respectively, and reduced the hardware overhead by 54.55%

Hardware Architecture

A Survey of Aging Monitors and Reconfiguration Techniques

CMOS technology scaling makes aging effects an important concern for the design and fabrication of integrated circuits. Aging deterioration reduces the useful life of a circuit, making it fail earlier. This deterioration can affect all portions of a circuit and impacts its performance and reliability. Contemporary literature shows solutions to monitor and mitigate aging using hardware and software monitoring mechanisms and reconfiguration techniques. The goal of this review of the state-of-the-art is to identify existing monitoring and reconfiguration solutions for aging. This survey evaluates the aging research, focusing the years from 2012 to 2019, and proposes a classification for monitors and reconfiguration techniques. Results show that the most common monitor type used for aging detection is to monitor timing errors, and the most common reconfiguration technique used to deal with aging is voltage scaling. Furthermore, most of the literature contributions are in the digital field, using hardware solutions for monitoring aging in circuits. There are few literature contributions in the analog area, being the scope of this survey in the digital domain. By scrutinizing these solutions, this survey points directions for further research and development of aging monitors and reconfiguration techniques

Hardware Architecture

A Survey of Machine Learning Applied to Computer Architecture Design

Machine learning has enabled significant benefits in diverse fields, but, with a few exceptions, has had limited impact on computer architecture. Recent work, however, has explored broader applicability for design, optimization, and simulation. Notably, machine learning based strategies often surpass prior state-of-the-art analytical, heuristic, and human-expert approaches. This paper reviews machine learning applied system-wide to simulation and run-time optimization, and in many individual components, including memory systems, branch predictors, networks-on-chip, and GPUs. The paper further analyzes current practice to highlight useful design strategies and identify areas for future work, based on optimized implementation strategies, opportune extensions to existing work, and ambitious long term possibilities. Taken together, these strategies and techniques present a promising future for increasingly automated architectural design.

Hardware Architecture

A Survey of Novel Cache Hierarchy Designs for High Workloads

Traditional on-die, three-level cache hierarchy design is very commonly used but is also prone to latency, especially at the Level 2 (L2) cache. We discuss three distinct ways of improving this design in order to have better performance. Performance is especially important for systems with high workloads. The first method proposes to eliminate L2 altogether while proposing a new prefetching technique, the second method suggests increasing the size of L2, while the last method advocates the implementation of optical caches. After carefully contemplating the results in performance gains and the advantages and disadvantages of each method, we found the last method to be the best of the three.

Hardware Architecture

A Survey of Resource Management for Processing-in-Memory and Near-Memory Processing Architectures

Due to amount of data involved in emerging deep learning and big data applications, operations related to data movement have quickly become the bottleneck. Data-centric computing (DCC), as enabled by processing-in-memory (PIM) and near-memory processing (NMP) paradigms, aims to accelerate these types of applications by moving the computation closer to the data. Over the past few years, researchers have proposed various memory architectures that enable DCC systems, such as logic layers in 3D stacked memories or charge sharing based bitwise operations in DRAM. However, application-specific memory access patterns, power and thermal concerns, memory technology limitations, and inconsistent performance gains complicate the offloading of computation in DCC systems. Therefore, designing intelligent resource management techniques for computation offloading is vital for leveraging the potential offered by this new paradigm. In this article, we survey the major trends in managing PIM and NMP-based DCC systems and provide a review of the landscape of resource management techniques employed by system designers for such systems. Additionally, we discuss the future challenges and opportunities in DCC management.

Hardware Architecture

A Survey of System Architectures and Techniques for FPGA Virtualization

FPGA accelerators are gaining increasing attention in both cloud and edge computing because of their hardware flexibility, high computational throughput, and low power consumption. However, the design flow of FPGAs often requires specific knowledge of the underlying hardware, which hinders the wide adoption of FPGAs by application developers. Therefore, the virtualization of FPGAs becomes extremely important to create a useful abstraction of the hardware suitable for application developers. Such abstraction also enables the sharing of FPGA resources among multiple users and accelerator applications, which is important because, traditionally, FPGAs have been mostly used in single-user, single-embedded-application scenarios. There are many works in the field of FPGA virtualization covering different aspects and targeting different application areas. In this survey, we review the system architectures used in the literature for FPGA virtualization. In addition, we identify the primary objectives of FPGA virtualization, based on which we summarize the techniques for realizing FPGA virtualization. This survey helps researchers to efficiently learn about FPGA virtualization research by providing a comprehensive review of the existing literature.

Hardware Architecture

A Survey on Coarse-Grained Reconfigurable Architectures from a Performance Perspective

With the end of both Dennard's scaling and Moore's law, computer users and researchers are aggressively exploring alternative forms of computing in order to continue the performance scaling that we have come to enjoy. Among the more salient and practical of the post-Moore alternatives are reconfigurable systems, with Coarse-Grained Reconfigurable Architectures (CGRAs) seemingly capable of striking a balance between performance and programmability. In this paper, we survey the landscape of CGRAs. We summarize nearly three decades of literature on the subject, with a particular focus on the premise behind the different CGRAs and how they have evolved. Next, we compile metrics of available CGRAs and analyze their performance properties in order to understand and discover knowledge gaps and opportunities for future CGRA research specialized towards High-Performance Computing (HPC). We find that there are ample opportunities for future research on CGRAs, in particular with respect to size, functionality, support for parallel programming models, and to evaluate more complex applications.

Ready to get started?

Join us today

Archive Your Research