
Publication


Featured research published by Marius Grannæs.


High Performance Computing and Communications | 2009

A Quantitative Study of Memory System Interference in Chip Multiprocessor Architectures

Magnus Jahre; Marius Grannæs; Lasse Natvig

The potential for destructive interference between running processes is increased as Chip Multiprocessors (CMPs) share more on-chip resources. We believe that understanding the nature of memory system interference is vital to achieve good fairness/complexity/performance trade-offs in CMPs. Our goal in this work is to quantify the latency penalties due to interference in all hardware-controlled, shared units (i.e. the on-chip interconnect, shared cache and memory bus). To achieve this, we simulate a wide variety of realistic CMP architectures. In particular, we vary the number of cores, interconnect topology, shared cache size and off-chip memory bandwidth. We observe that interference in the off-chip memory bus accounts for between 63% and 87% of the total interference impact while the impact of cache capacity interference can be lower than indicated by previous studies (between 5% and 32% of the total impact). In addition, as much as 11% of the total impact can be due to uncontrolled allocation of shared cache Miss Status Holding Registers (MSHRs).


International Conference on Computer Design | 2008

Low-cost open-page prefetch scheduling in chip multiprocessors

Marius Grannæs; Magnus Jahre; Lasse Natvig

The pressure on off-chip memory increases significantly as more cores compete for the same resources. A CMP deals with the memory wall by exploiting thread-level parallelism (TLP), shifting the focus from reducing overall memory latency to increasing memory throughput. This extends to the memory controller, where the 3D structure of modern DRAM is exploited to increase throughput. Traditionally, prefetching reduces latency by fetching data before it is needed. In this paper we explore how prefetching can be used to increase memory throughput. We present our own low-cost open-page prefetch scheduler that exploits the 3D structure of DRAM when issuing prefetches. We show that because of the complex structure of modern DRAM, prefetches can be made cheaper than ordinary reads, making prefetching beneficial even when prefetcher accuracy is low. As a result, prefetching with good coverage is more important than high accuracy. By exploiting this observation, our low-cost open-page scheme increases performance and QoS. Furthermore, we explore how prefetches should be scheduled in a state-of-the-art memory controller by examining sequential, scheduled region, CZone/delta correlation and reference prediction table prefetchers.
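The abstract's central observation, that a prefetch hitting an already-open DRAM page is much cheaper than one requiring a precharge and activate, can be sketched as a toy policy. The class name, row size and address decoding below are illustrative assumptions, not the paper's actual design:

```python
class OpenPagePrefetchScheduler:
    """Toy model: one open row per DRAM bank; a request to the open row is a
    cheap row-hit, anything else needs an expensive precharge + activate."""

    ROW_SIZE = 4096  # bytes covered by one DRAM row (illustrative value)

    def __init__(self, num_banks=8):
        self.num_banks = num_banks
        self.open_row = {b: None for b in range(num_banks)}

    def _decode(self, addr):
        # Consecutive addresses within one ROW_SIZE region map to the same
        # bank and row, so sequential prefetches tend to be row-hits.
        row = addr // self.ROW_SIZE
        bank = row % self.num_banks
        return bank, row

    def issue_demand(self, addr):
        """Demand reads always issue; a row-miss opens a new row."""
        bank, row = self._decode(addr)
        hit = self.open_row[bank] == row
        self.open_row[bank] = row
        return "row-hit" if hit else "row-miss"

    def issue_prefetch(self, addr):
        """Prefetches issue only when they hit an already-open row, so even
        an inaccurate prefetch never pays (or causes) an activate."""
        bank, row = self._decode(addr)
        return "issued" if self.open_row[bank] == row else "dropped"
```

In this model a low-accuracy prefetcher costs little, which mirrors the abstract's claim that coverage matters more than accuracy when prefetches are cheap.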


High Performance Embedded Architectures and Compilers | 2010

Multi-level hardware prefetching using low complexity delta correlating prediction tables with partial matching

Marius Grannæs; Magnus Jahre; Lasse Natvig

This paper presents a low-complexity, table-based approach to delta correlation prefetching. Our approach uses a table indexed by the load address which stores the latest deltas observed. By storing deltas rather than full miss addresses, considerable space is saved and pattern matching is made easier. The delta history can predict repeating patterns with long periods by using delta correlation. In addition, we propose L1 hoisting, a technique for moving data from the L2 to the L1 using the same underlying table structure, and partial matching, which reduces the spatial resolution in the delta stream to expose more patterns. We evaluate our prefetching technique using the simulator framework used in the Data Prefetching Championship. This allows us to use the original code submitted to the contest to fairly evaluate several alternative prefetching techniques. Our prefetching technique increases performance by 87% on average (6.6X max) on SPEC2006.
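The table-and-delta-correlation mechanism the abstract describes can be sketched roughly as follows. This is a hedged illustration, not the paper's implementation: the table here is keyed by a load identifier (a PC-like value), the history depth is arbitrary, and partial matching is omitted.

```python
from collections import deque


class DCPTEntry:
    """Per-load state: last miss address plus a short history of deltas."""

    def __init__(self, last_addr, num_deltas=6):
        self.last_addr = last_addr
        self.deltas = deque(maxlen=num_deltas)


class DCPT:
    """Sketch of a delta-correlating prediction table (illustrative sizes)."""

    def __init__(self, num_deltas=6):
        self.table = {}  # keyed by a per-load identifier
        self.num_deltas = num_deltas

    def access(self, load_id, addr):
        """Record a miss; return a list of predicted prefetch addresses."""
        entry = self.table.get(load_id)
        if entry is None:
            self.table[load_id] = DCPTEntry(addr, self.num_deltas)
            return []
        delta = addr - entry.last_addr
        entry.last_addr = addr
        if delta != 0:
            entry.deltas.append(delta)
        return self._predict(entry, addr)

    def _predict(self, entry, addr):
        """Delta correlation: find the most recent earlier occurrence of the
        last two deltas, then replay the deltas that followed it."""
        d = list(entry.deltas)
        if len(d) < 3:
            return []
        pattern = d[-2:]
        for i in range(len(d) - 3, -1, -1):
            if d[i:i + 2] == pattern:
                preds, a = [], addr
                for delta in d[i + 2:]:
                    a += delta
                    preds.append(a)
                return preds
        return []
```

Storing small deltas instead of full miss addresses is what gives the scheme its storage efficiency: a repeating stride pattern of any base address collapses to the same few history entries.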


High Performance Embedded Architectures and Compilers | 2010

DIEF: an accurate interference feedback mechanism for chip multiprocessor memory systems

Magnus Jahre; Marius Grannæs; Lasse Natvig

Chip Multi-Processors (CMPs) commonly share hardware-controlled on-chip units that are unaware that memory requests are issued by independent processors. Consequently, the resources a process receives will vary depending on the behavior of the processes it is co-scheduled with. Resource allocation techniques can avoid this problem if they are provided with an accurate interference estimate. Our Dynamic Interference Estimation Framework (DIEF) provides this service by dynamically estimating the latency a process would experience with exclusive access to all hardware-controlled, shared resources. Since the total interference latency is the sum of the interference latency in each shared unit, the system designer can choose estimation techniques to achieve the desired accuracy/complexity trade-off. In this work, we provide high-accuracy estimation techniques for the on-chip interconnect, shared cache and memory bus. This DIEF implementation has an average relative estimate error between -0.4% and 4.7% and a standard deviation between 2.4% and 5.8%.
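The additive structure the abstract relies on, total interference equals the sum of per-unit interference, and each unit's interference is the shared-mode latency minus the latency the process would see with exclusive access, can be written as a small sketch. The unit names and latency numbers in the test are made up for illustration:

```python
def interference_estimate(per_unit_latencies):
    """per_unit_latencies: {unit: (measured_shared, estimated_private)},
    both in cycles. Returns (per-unit interference, total interference),
    where each unit's interference is shared latency minus the estimated
    exclusive-access latency for that unit."""
    per_unit = {
        unit: shared - private
        for unit, (shared, private) in per_unit_latencies.items()
    }
    return per_unit, sum(per_unit.values())
```

Because the total decomposes per unit, an implementer can, as the abstract notes, mix cheap and accurate estimators unit by unit without changing the framework.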


Automation, Robotics and Control Systems | 2011

Exploring the prefetcher/memory controller design space: an opportunistic prefetch scheduling strategy

Marius Grannæs; Magnus Jahre; Lasse Natvig

Prefetching is a well-known technique for bridging the memory gap. By predicting future memory references, the prefetcher can fetch data from main memory and insert it into the cache so that overall performance is increased. Modern memory controllers reorder memory requests to exploit the 3D structure of modern DRAM interfaces. In particular, prioritizing memory requests that use open pages increases throughput significantly. In this work, we investigate the prefetcher/memory controller design space along three dimensions: prefetching heuristic, prefetch scheduling strategy and available memory bandwidth. In particular, we evaluate 5 different prefetchers and 6 prefetch scheduling strategies. Through this extensive investigation, we observed that prior prefetch scheduling strategies often cause memory bus contention in bandwidth-constrained CMPs, which in turn causes performance regressions. To avoid this problem, we propose a novel prefetch scheduling heuristic called Opportunistic Prefetch Scheduling that selectively prioritizes prefetches to open DRAM pages such that performance regressions are minimized. Opportunistic prefetch scheduling reduces performance regressions by 6.7X and 5.2X, while improving performance by 17% and 20% for sequential and scheduled region prefetching, compared to the direct scheduling strategy.
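The "selectively prioritizes prefetches to open DRAM pages" idea can be sketched as a request-picking policy. The queue representation and the exact priority order below are illustrative assumptions, not the paper's design:

```python
def pick_next(queue, open_rows):
    """Pick the next memory request to issue.

    queue: list of dicts {'type': 'demand'|'prefetch', 'bank': b, 'row': r}.
    open_rows: {bank: currently open row id}.

    Assumed priority: (1) demand row-hits, (2) prefetch row-hits,
    (3) remaining demands. Row-miss prefetches are never picked here; they
    wait until the bus is otherwise idle, so they cannot crowd out demands."""

    def row_hit(req):
        return open_rows.get(req['bank']) == req['row']

    for wanted in (lambda r: r['type'] == 'demand' and row_hit(r),
                   lambda r: r['type'] == 'prefetch' and row_hit(r),
                   lambda r: r['type'] == 'demand'):
        for req in queue:
            if wanted(req):
                return req
    return None  # only row-miss prefetches remain; issue them when idle
```

Under bandwidth pressure this policy only ever spends bus cycles on cheap (row-hit) prefetches, which is the mechanism the abstract credits for avoiding the regressions of earlier scheduling strategies.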


Journal of Embedded Computing | 2006

Destructive-read in embedded DRAM, impact on power consumption

Haakon Dybdahl; Per Gunnar Kjeldsberg; Marius Grannæs; Lasse Natvig


Journal of Instruction-level Parallelism | 2011

Storage Efficient Hardware Prefetching using Delta-Correlating Prediction Tables

Marius Grannæs; Magnus Jahre; Lasse Natvig


Archive | 2010

Managing Chip Multiprocessor Memory Systems with Miss Bandwidth Allocations

Magnus Jahre; Marius Grannæs; Lasse Natvig


CMP-MSI 2008 | 2008

Hardware Prefetching Using Shadow Tagging

Marius Grannæs; Lasse Natvig


Lecture Notes in Computer Science | 2006

Cache write-back schemes for embedded destructive-read DRAM

Haakon Dybdahl; Marius Grannæs; Lasse Natvig

Collaboration


Dive into Marius Grannæs's collaborations.

Top Co-Authors

Lasse Natvig

Norwegian University of Science and Technology

Magnus Jahre

Norwegian University of Science and Technology

Haakon Dybdahl

Norwegian University of Science and Technology

Per Gunnar Kjeldsberg

Norwegian University of Science and Technology