
Publication


Featured research published by Mike Ignatowski.


IEEE International Symposium on High-Performance Computer Architecture (HPCA) | 2015

Heterogeneous memory architectures: A HW/SW approach for mixing die-stacked and off-package memories

Mitesh R. Meswani; Sergey Blagodurov; David A. Roberts; John Slice; Mike Ignatowski; Gabriel H. Loh

Die-stacked DRAM is a technology that will soon be integrated in high-performance systems. Recent studies have focused on hardware caching techniques to make use of the stacked memory, but these approaches require complex changes to the processor and also cannot leverage the stacked memory to increase the system's overall memory capacity. In this work, we explore the challenges of exposing the stacked DRAM as part of the system's physical address space. This non-uniform memory access (NUMA)-style approach greatly simplifies the hardware and increases the physical memory capacity of the system, but pushes the burden of managing the heterogeneous memory architecture (HMA) to the software layers. We first explore simple (and somewhat impractical) schemes to manage the HMA, and then refine the mechanisms to address a variety of hardware and software implementation challenges. In the end, we present an HMA approach with low hardware and software impact that can dynamically tune itself to different application scenarios, achieving performance even better than the (impractical-to-implement) baseline approaches.
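The software-managed migration the abstract describes can be pictured with a toy hot-page policy: pages start in slow off-package memory, and pages that cross an access-count threshold are moved into the small, fast die-stacked DRAM. This is an illustrative sketch only; the class name, threshold, and eviction rule are invented here and are not the paper's actual mechanism.

```python
# Toy model of a two-level heterogeneous memory architecture (HMA):
# hot pages migrate into a small fast tier, evicting the coldest
# resident page when the fast tier is full.

class HmaManager:
    def __init__(self, fast_capacity_pages, hot_threshold):
        self.fast_capacity = fast_capacity_pages
        self.hot_threshold = hot_threshold
        self.access_counts = {}   # page -> accesses observed
        self.in_fast = set()      # pages resident in stacked DRAM

    def touch(self, page):
        """Record one access; migrate the page if it just became hot."""
        self.access_counts[page] = self.access_counts.get(page, 0) + 1
        if (self.access_counts[page] >= self.hot_threshold
                and page not in self.in_fast):
            self._migrate(page)

    def _migrate(self, page):
        if len(self.in_fast) >= self.fast_capacity:
            # Evict the coldest page currently in fast memory.
            coldest = min(self.in_fast, key=lambda p: self.access_counts[p])
            self.in_fast.remove(coldest)
        self.in_fast.add(page)

mgr = HmaManager(fast_capacity_pages=2, hot_threshold=3)
for page in [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]:
    mgr.touch(page)
print(sorted(mgr.in_fast))  # → [2, 3]  (page 1, now coldest, was evicted)
```

The paper's refined schemes replace this kind of naive per-access bookkeeping with mechanisms cheap enough to run without hardware support, which is exactly the implementation challenge the abstract alludes to.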


Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness | 2013

A new perspective on processing-in-memory architecture design

Dong Ping Zhang; Nuwan Jayasena; Alexander Lyashevsky; Joseph L. Greathouse; Mitesh R. Meswani; Mark Nutter; Mike Ignatowski

As computation becomes increasingly limited by data movement and energy consumption, exploiting locality throughout the memory hierarchy becomes critical for maintaining the performance scaling that many have come to expect from the computing industry. Moving computation closer to main memory presents an opportunity to reduce the overheads associated with data movement. We explore the potential of using 3D die stacking to move memory-intensive computations closer to memory. This approach to processing-in-memory addresses some drawbacks of prior research on in-memory computing and appears commercially viable in the foreseeable future. We show promising early results from this approach and identify areas that are in need of research to unlock its full potential.


IEEE Micro | 2015

Achieving Exascale Capabilities through Heterogeneous Computing

Michael J. Schulte; Mike Ignatowski; Gabriel H. Loh; Bradford M. Beckmann; William C. Brantley; Sudhanva Gurumurthi; Nuwan Jayasena; Indrani Paul; Steven K. Reinhardt; Gregory Rodgers

This article provides an overview of AMD's vision for exascale computing, and in particular, how heterogeneity will play a central role in realizing this vision. Exascale computing requires high levels of performance capabilities while staying within stringent power budgets. Using hardware optimized for specific functions is much more energy efficient than implementing those functions with general-purpose cores. However, supercomputer customers strongly prefer not to pay for custom components designed only for high-end high-performance computing systems. Therefore, high-volume GPU technology becomes a natural choice for energy-efficient data-parallel computing. To fully realize the GPU's capabilities, the authors envision exascale computing nodes that compose integrated CPUs and GPUs (that is, accelerated processing units), along with the hardware and software support to enable scientists to effectively run their scientific experiments on an exascale system. The authors discuss the hardware and software challenges in building a heterogeneous exascale system and describe ongoing research efforts at AMD to realize their exascale vision.


International Conference on Architecture of Computing Systems (ARCS) | 2012

New memory organizations for 3D DRAM and PCMs

Ademola Fawibe; Jared Sherman; Krishna M. Kavi; Mike Ignatowski; David E. Mayhew

The memory wall (the gap between processing and storage speeds) remains a concern to computer systems designers. Caches have played a key role in hiding the performance gap by keeping recently accessed information in fast memories closer to the processor. Multi- and many-core systems are placing severe demands on caches, exacerbating the performance disparity between memory and processors. New memory technologies including 3D stacked DRAMs, solid state disks (SSDs) such as those built using flash technologies, and phase change memories (PCM) may alleviate the problem: 3D DRAMs and SSDs present lower latencies than conventional, off-chip DRAMs and magnetic disk drives. However, these technologies force us to rethink how address spaces should be organized into pages and how virtual addresses should be translated into physical pages. In this paper, we present some preliminary ideas in this connection, and evaluate these new organizations using SPEC CPU2006 benchmarks.
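The rethinking of page organization that the abstract mentions can be pictured as splitting an address into page, sub-page, and offset fields, so that only the touched sub-pages of a large page need to be moved from slow memory. The field widths below are arbitrary, chosen for illustration; they are not the configurations evaluated in the paper.

```python
# Illustrative address decomposition for a page/sub-page organization.
# A 64 KiB page is divided into 1 KiB sub-pages (64 per page), so a
# translation can track residency at sub-page granularity.

PAGE_BITS = 16       # 64 KiB large pages (assumed width)
SUBPAGE_BITS = 10    # 1 KiB sub-pages (assumed width)

def decompose(vaddr):
    """Split a virtual address into (page, sub-page index, offset)."""
    page = vaddr >> PAGE_BITS
    subpage = (vaddr >> SUBPAGE_BITS) & ((1 << (PAGE_BITS - SUBPAGE_BITS)) - 1)
    offset = vaddr & ((1 << SUBPAGE_BITS) - 1)
    return page, subpage, offset

page, subpage, offset = decompose(0x12345)
print(page, subpage, offset)  # → 1 8 837
```

Tracking sub-page residency trades larger translation metadata for less wasted transfer bandwidth on sparse access patterns, which is the design-space trade-off the paper's SPEC CPU2006 evaluation explores.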


IEEE International Conference on High Performance Computing, Data, and Analytics | 2014

Toward efficient programmer-managed two-level memory hierarchies in exascale computers

Mitesh R. Meswani; Gabriel H. Loh; Sergey Blagodurov; David A. Roberts; John Slice; Mike Ignatowski

Future exascale systems will require very aggressive memory systems simultaneously delivering huge storage capacities and multi-TB/s bandwidths. To achieve the bandwidth targets, in-package, die-stacked memory technologies will likely be necessary. However, these integrated memories do not provide enough capacity to achieve the overall per-node memory size requirements. As a result, conventional off-package memory (e.g., DIMMs) will still be needed. This creates a two-level memory (TLM) organization where a portion of the machine's memory space provides high bandwidth, and the remainder provides capacity at a lower level of performance. Effective use of such a heterogeneous memory organization may require co-design of the software applications along with the advancements in memory architecture. In this paper, we explore the efficacy of programmer-driven approaches to managing a TLM system, using three Exascale proxy applications as case studies.
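A programmer-driven TLM approach implies some way for the application to say which buffers are bandwidth-critical. The sketch below shows what such an interface could look like; the class and parameter names are invented for this example and are not the interface studied in the paper.

```python
# Sketch of a programmer-managed two-level memory (TLM): the programmer
# tags allocations as bandwidth-critical, and the allocator places them
# in fast in-package memory until it runs out, spilling to capacity memory.

class TlmAllocator:
    def __init__(self, fast_bytes):
        self.fast_free = fast_bytes

    def alloc(self, nbytes, bandwidth_critical=False):
        """Return which tier the buffer was placed in: 'fast' or 'capacity'."""
        if bandwidth_critical and self.fast_free >= nbytes:
            self.fast_free -= nbytes
            return "fast"
        return "capacity"

tlm = TlmAllocator(fast_bytes=1 << 30)              # 1 GiB of stacked DRAM
a = tlm.alloc(512 << 20, bandwidth_critical=True)   # hot working set
b = tlm.alloc(512 << 20, bandwidth_critical=True)   # fills fast memory
c = tlm.alloc(1 << 20, bandwidth_critical=True)     # spills to capacity
print(a, b, c)  # → fast fast capacity
```

The burden this places on the programmer, identifying the bandwidth-critical structures in each proxy application, is precisely what the paper's case studies evaluate.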


IEEE International Symposium on High-Performance Computer Architecture (HPCA) | 2017

Design and Analysis of an APU for Exascale Computing

Thiruvengadam Vijayaraghavan; Yasuko Eckert; Gabriel H. Loh; Michael J. Schulte; Mike Ignatowski; Bradford M. Beckmann; William C. Brantley; Sudhanva Gurumurthi; Nuwan Jayasena; Indrani Paul; Matthew Poremba; Steven E. Raasch; Steven K. Reinhardt; Greg Sadowski; Vilas Sridharan; Joseph L. Greathouse; Wei Huang; Arun Karunanithi; Onur Kayiran; Mitesh R. Meswani

The challenges to push computing to exaflop levels are difficult given desired targets for memory capacity, memory bandwidth, power efficiency, reliability, and cost. This paper presents a vision for an architecture that can be used to construct exascale systems. We describe a conceptual Exascale Node Architecture (ENA), which is the computational building block for an exascale supercomputer. The ENA consists of an Exascale Heterogeneous Processor (EHP) coupled with an advanced memory system. The EHP provides a high-performance accelerated processing unit (CPU+GPU), in-package high-bandwidth 3D memory, and aggressive use of die-stacking and chiplet technologies to meet the requirements for exascale computing in a balanced manner. We present initial experimental analysis to demonstrate the promise of our approach, and we discuss remaining open research challenges for the community.


International Conference on Architecture of Computing Systems (ARCS) | 2015

Processing-in-Memory: Exploring the Design Space

Marko Scrbak; Mahzabeen Islam; Krishna M. Kavi; Mike Ignatowski; Nuwan Jayasena

With the emergence of 3D-DRAM, Processing-in-Memory has once more become of great interest to the research community and industry. In this paper, we present our observations on a subset of the PIM design space. We show how the architectural choices for PIM core frequency and cache sizes will affect the overall power consumption and energy efficiency. Our findings include detailed power consumption modeling for an ARM-like core as a PIM core. We show the maximum number of PIM cores we can place in the logic layer with respect to a power budget. In addition, we explore the optimal design choices for the number of cores as a function of frequency, utilization, and energy efficiency.
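The "maximum number of PIM cores under a power budget" question reduces to simple division once per-core power at each frequency is known. The numbers below are invented placeholders; the paper derives its figures from detailed power modeling of an ARM-like core.

```python
# Back-of-envelope sizing of PIM cores in the logic layer of a 3D-DRAM
# stack under a fixed power budget. Per-core power values are assumed
# for illustration, not taken from the paper.

def max_pim_cores(power_budget_w, core_power_w):
    """How many cores fit in the budget at a given per-core power."""
    return int(power_budget_w // core_power_w)

# Hypothetical per-core power at three operating frequencies (Hz -> W).
core_power = {0.5e9: 0.15, 1.0e9: 0.35, 2.0e9: 0.9}
budget_w = 10.0
for freq, watts in sorted(core_power.items()):
    print(f"{freq / 1e9:.1f} GHz: {max_pim_cores(budget_w, watts)} cores")
```

Because dynamic power grows superlinearly with frequency, a fixed budget buys disproportionately more low-frequency cores, which is why the paper treats core count, frequency, and utilization as a joint design choice.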


European Conference on Parallel Processing (Euro-Par) | 2014

Improving Node-Level MapReduce Performance Using Processing-in-Memory Technologies

Mahzabeen Islam; Marko Scrbak; Krishna M. Kavi; Mike Ignatowski; Nuwan Jayasena

Processing-in-Memory (PIM) is the concept of moving computation as close as possible to memory. This decreases the movement of data between the central processor and the memory system, and hence improves energy efficiency through reduced memory traffic. In this paper we present our approach to embedding processing cores in 3D-stacked memories, and evaluate the use of such a system for Big Data analytics. We present a simple server architecture that employs several energy-efficient PIM cores in multiple 3D-DRAM units, where the server acts as a node of a cluster for Big Data analyses using the MapReduce programming framework. Our preliminary analyses show that on a single node up to 23% energy savings on the processing units can be achieved while reducing execution time by up to 8.8%. Additional energy savings can result from simplifying the system memory buses. We believe such energy-efficient systems with PIM capability will become viable in the near future because of their potential to scale the memory wall.
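Since energy is average power times time, the two reported figures together imply a drop in average power on the processing units as well. A quick check of that arithmetic, using only the numbers stated in the abstract:

```python
# Relating the reported savings: energy = average_power * time, so the
# implied average-power reduction follows from the two figures above.

energy_saving = 0.23   # up to 23% less energy on the processing units
time_saving = 0.088    # up to 8.8% shorter execution time

relative_energy = 1 - energy_saving     # 0.77 of baseline energy
relative_time = 1 - time_saving         # 0.912 of baseline runtime
relative_power = relative_energy / relative_time
print(f"implied average-power reduction: {1 - relative_power:.1%}")  # → 15.6%
```

In other words, the PIM configuration is not merely finishing sooner; it is also drawing noticeably less power while running, consistent with the low-power cores doing the memory-intensive work.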


International Conference on Architecture of Computing Systems (ARCS) | 2013

A multi-core memory organization for 3-D DRAM as main memory

Jared Sherman; Krishna M. Kavi; Brandon Potter; Mike Ignatowski

There is a growing interest in using 3-D DRAM structures and non-volatile memories such as Phase Change Memories (PCM) to both improve access latencies and reduce energy consumption in multicore systems. These new memory technologies present both opportunities and challenges to computer systems design.

In this paper we address how such memories should be organized to fully benefit from these technologies. We propose to keep 3-D DRAMs as main memory systems, but use non-volatile memories as backing store. In this connection, we view DRAM-based main memory both as a cache memory and as main memory. The cache-like addressing allows for fast address translation and better memory allocation among multiple processes. We explore a set of wide-ranging design parameters for page sizes, sub-page sizes, TLB sizes, and sizes of write buffers.
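"Cache-like addressing" of a DRAM main memory means a physical address can be split into tag, set, and offset fields, so locating a page is an index-and-compare rather than a table walk. The sketch below illustrates the idea with a direct-mapped tag store; the sizes and field widths are arbitrary assumptions, not the configurations the paper evaluates.

```python
# Viewing 3-D DRAM main memory with cache-like addressing over a PCM
# backing store. Sizes here are invented for illustration only.

OFFSET_BITS = 12       # 4 KiB pages (assumed)
SET_BITS = 10
NUM_SETS = 1 << SET_BITS

def split(paddr):
    """Split a physical address into (tag, set index, page offset)."""
    offset = paddr & ((1 << OFFSET_BITS) - 1)
    set_idx = (paddr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = paddr >> (OFFSET_BITS + SET_BITS)
    return tag, set_idx, offset

tags = [None] * NUM_SETS   # direct-mapped tag store for the DRAM

def access(paddr):
    """Return True on a DRAM hit, False on a miss to the PCM backing store."""
    tag, set_idx, _ = split(paddr)
    hit = tags[set_idx] == tag
    tags[set_idx] = tag        # fill the set on a miss
    return hit

# Two addresses mapping to the same set but different tags conflict:
print(access(0x0040_3123), access(0x0040_3123), access(0x1040_3123))
# → False True False
```

The page-size, sub-page-size, and TLB-size parameters the paper sweeps all change how these fields are carved up, and hence the balance between conflict misses and translation cost.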


Journal of Systems Architecture | 2017

Exploring the Processing-in-Memory design space

Marko Scrbak; Mahzabeen Islam; Krishna M. Kavi; Mike Ignatowski; Nuwan Jayasena

With the emergence of 3D-DRAM, Processing-in-Memory has once more become of great interest to the research community and industry. Here we present our observations on a subset of the PIM design space. We show how the architectural choices for PIM core frequency and cache sizes will affect the overall power consumption and energy efficiency. We include a detailed power consumption breakdown for an ARM-like core as a PIM core. We show the maximum possible number of PIM cores we can place in the logic layer with respect to a predefined power budget. Additionally, we catalog additional sources of power consumption in a system with PIM such as 3D-DRAM link power and discuss the possible power reduction techniques. We describe the shortcomings of using ARM-like cores for PIM and discuss other alternatives for the PIM cores. Finally, we explore the optimal design choices for the number of cores as a function of performance, utilization, and energy efficiency.

Collaboration


Dive into Mike Ignatowski's collaborations.

Top Co-Authors


Krishna M. Kavi

University of North Texas


John Slice

Advanced Micro Devices


Mahzabeen Islam

University of North Texas


Marko Scrbak

University of North Texas
