Mats Brorsson | Researchain

Publication

Featured research published by Mats Brorsson.

international symposium on computer architecture | 1993

An adaptive cache coherence protocol optimized for migratory sharing

Per Stenström; Mats Brorsson; Lars Sandberg

Parallel programs that use critical sections and are executed on a shared-memory multiprocessor with a write-invalidate protocol result in invalidation actions that could be eliminated. For this type of sharing, called migratory sharing, each processor typically causes a cache miss followed by an invalidation request which could be merged with the preceding cache-miss request. In this paper we propose an adaptive protocol that invokes this optimization dynamically for migratory blocks. For other blocks, the protocol works as an ordinary write-invalidate protocol. We show that the protocol is a simple extension to a write-invalidate protocol. Based on a program-driven simulation model of an architecture similar to the Stanford DASH, and a set of four benchmarks, we evaluate the potential performance improvements of the protocol. We find that it effectively eliminates most single invalidations which improves the performance by reducing the shared access penalty and the network traffic.
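
The merging of the invalidation with the preceding cache miss can be illustrated with a toy message count. This is a minimal sketch, not the protocol from the paper: the function `run`, the single-block directory model, and the detection rule (an upgrade request arriving when exactly two caches hold the block and the requester is not the last writer) are simplifying assumptions made for illustration.

```python
def run(accesses, adaptive):
    """Count coherence requests for one block (toy model, assumed rules).

    accesses is a list of (cpu, op) pairs with op "r" or "w".
    Returns (miss_requests, invalidation_requests).
    """
    sharers = set()      # caches currently holding a copy
    exclusive = None     # cpu holding the block with write permission
    last_writer = None
    migratory = False    # set once the block is classified as migratory
    misses = invalidations = 0

    for cpu, op in accesses:
        if op == "r":
            if cpu in sharers:
                continue  # read hit
            misses += 1
            if adaptive and migratory:
                # Merged request: the read miss returns an exclusive copy,
                # so the write that follows needs no invalidation.
                sharers = {cpu}
                exclusive = cpu
            else:
                sharers.add(cpu)
                exclusive = None
        else:
            if exclusive == cpu:
                last_writer = cpu
                continue  # write hit, permission already held
            if cpu not in sharers:
                misses += 1  # write miss: fetch and invalidate in one request
            else:
                invalidations += 1  # separate invalidation (upgrade) request
                if (adaptive and len(sharers) == 2
                        and last_writer not in (None, cpu)):
                    migratory = True  # assumed detection rule
            sharers = {cpu}
            exclusive = cpu
            last_writer = cpu
    return misses, invalidations


# Four processors take turns doing a read-modify-write, as in a critical section.
trace = [(cpu, op) for cpu in range(4) for op in ("r", "w")]
print(run(trace, adaptive=False))  # (4, 4)
print(run(trace, adaptive=True))   # (4, 2)
```

In this toy trace the adaptive variant still pays for the first two invalidations, which is what lets it classify the block; every later hand-over costs one request instead of two.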

annual simulation symposium | 1993

The CacheMire Test Bench – A Flexible and Effective Approach for Simulation of Multiprocessors

Mats Brorsson; Fredrik Dahlgren; Håkan Nilsson; Per Stenström

Concurrency and Computation: Practice and Experience | 2000

OdinMP/CCp - a portable implementation of OpenMP for C

Christian Brunschen; Mats Brorsson

We describe here the design and performance of OdinMP/CCp, a portable compiler for C programs that use the OpenMP directives for parallel processing with shared memory. OdinMP/CCp was written in Java for portability reasons; it takes a C program with OpenMP directives and produces a C program for POSIX threads. We describe some of the ideas behind the design of OdinMP/CCp and show some performance results achieved on an SGI Origin 2000 and a Sun E10000. Speedup measurements relative to a sequential version of the test programs show that OpenMP programs using OdinMP/CCp exhibit excellent performance on the Sun E10000 and reasonable performance on the Origin 2000.
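
The general idea of turning a parallel loop into explicit threads can be sketched as follows. This is an illustration only, written in Python rather than C, and it does not reproduce the code OdinMP/CCp generates: the helper `parallel_for`, its static chunking, and the use of `join` as the closing barrier are all assumptions chosen to show the shape of such a transformation.

```python
import threading


def parallel_for(n, num_threads, body):
    """Run body(i) for every i in range(n), split statically across threads.

    Hypothetical helper: the loop body is outlined into a worker function,
    and each thread executes one contiguous chunk of the iteration space.
    """
    chunk = (n + num_threads - 1) // num_threads  # ceiling division

    def worker(tid):
        start = tid * chunk
        stop = min(start + chunk, n)
        for i in range(start, stop):  # empty range if start >= n
            body(i)

    threads = [threading.Thread(target=worker, args=(tid,))
               for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # plays the role of the barrier that ends the parallel loop


# Usage: each iteration writes a distinct element, so no locking is needed.
squares = [0] * 10


def store_square(i):
    squares[i] = i * i


parallel_for(len(squares), 4, store_square)
```

Because every iteration touches its own element, the threads never write to the same location; a loop with shared updates would need additional synchronization.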

parallel computing | 1998

A Comparative Characterization of Communication Patterns in Applications Using MPI and Shared Memory on an IBM SP2

Sven Karlsson; Mats Brorsson

In this paper we analyze the characteristics of communication in three different applications, FFT, Barnes and Water, on an IBM SP2. We contrast the communication using two different programming models: message passing, represented by MPI, and shared memory, represented by a state-of-the-art distributed virtual shared memory package, TreadMarks. We show that while communication time and busy times are comparable for small systems, the communication patterns are fundamentally different, leading to poor performance for TreadMarks-based applications when the number of processors increases. This is due to the request/reply technique used in TreadMarks, which results in a large fraction of very small messages. However, if the application can be tuned to reduce the impact of small message communication, it is possible to achieve acceptable performance at least up to 32 nodes. Our measurements also show that TreadMarks programs tend to cause a more even network load compared to MPI programs.

workshop on computer architecture education | 2002

MipsIt: a simulation and development environment using animation for computer architecture education

Mats Brorsson

Computer animation is a tool which nowadays is used in more and more fields. In this paper we describe the use of computer animation to support the learning of computer organization itself. MipsIt is a system consisting of a software development environment, a system and cache simulator and a highly flexible microarchitecture simulator used for pipeline studies. It has been in use for several years now and constitutes an important tool in the education at Lund University and KTH, Royal Institute of Technology in Sweden.

IEEE Computer | 1997

Boosting the performance of shared memory multiprocessors

Per Stenström; Mats Brorsson; Fredrik Dahlgren; Håkan Grahn; Michel Dubois

Proposed hardware optimizations to CC-NUMA machines (shared-memory multiprocessors that use cache consistency protocols) can shorten the time processors lose because of cache misses and invalidations. The authors look at cost-performance trade-offs for each.

ieee international conference on high performance computing, data, and analytics | 2002

A fully compliant OpenMP implementation on software distributed shared memory

Sven Karlsson; Sung-Woo Lee; Mats Brorsson

OpenMP is a relatively new industry standard for programming parallel computers with a shared memory programming model. Given that clusters of workstations are a cost-effective solution for building parallel platforms, it would of course be highly interesting if the OpenMP model could be extended to these systems as well as to the standard shared memory architectures for which it was originally intended. We present in this paper a fully compliant implementation of the OpenMP specification 1.0 for C targeting networks of workstations. We have used an experimental software distributed shared memory system called Coherent Virtual Machine to implement a run-time library which is the target of a source-to-source OpenMP translator also developed in this project. The system has been evaluated using an OpenMP micro-benchmark suite to evaluate the effect of some memory coherence protocol improvements. We have also used OpenMP versions of three Splash-2 applications, obtaining reasonable speedups on an IBM SP2 machine. This is also the first study to investigate the subtle mechanisms of consistency in OpenMP on software distributed shared memory systems.

international conference on big data and cloud computing | 2015

Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server

Ahsan Javed Awan; Mats Brorsson; Vladimir Vlassov; Eduard Ayguadé

In the last decade, data analytics has rapidly progressed from traditional disk-based processing to modern in-memory processing. However, little effort has been devoted to enhancing performance at the micro-architecture level. This paper characterizes the performance of in-memory data analytics using the Apache Spark framework. We use a single-node NUMA machine and identify the bottlenecks hampering the scalability of workloads. We also quantify the inefficiencies at the micro-architecture level for various data analysis workloads. Through empirical evaluation, we show that Spark workloads do not scale linearly beyond twelve threads, due to work time inflation and thread-level load imbalance. Further, at the micro-architecture level, we observe memory-bound latency to be the major cause of work time inflation.

compilers, architecture, and synthesis for embedded systems | 2002

An adaptive chip-multiprocessor architecture for future mobile terminals

Mladen Nikitovic; Mats Brorsson

Power consumption has become an increasingly important factor in the field of computer architecture. It affects issues such as heat dissipation and packaging cost, which in turn affect the design and cost of a mobile terminal. Today, a lot of effort is put into the design of architectures and software implementation to increase performance. However, little is done on a system level to minimize power consumption, which is crucial in mobile systems. We propose an adaptive chip-multiprocessor (CMP) architecture, where the number of active processors is dynamically adjusted to the current workload need in order to save energy while preserving performance. The architecture is suitable for future mobile terminals where we anticipate a bursty and performance-demanding workload. We have carried out an evaluation of the performance and power consumption of the proposed architecture using previously validated high-level simulation models. Our experiments show that orders of magnitude in power consumption can be saved compared to a conventional architecture at a negligible performance cost. The method used is complementary to other power saving techniques such as voltage and frequency scaling.
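
Adjusting the number of active processors to the workload can be pictured with a small sketch. It is not the architecture or the simulation model evaluated in the paper: the function `active_cores`, the per-core capacity, and the bursty demand values below are hypothetical and only show how a controller might pick a core count per time interval.

```python
import math


def active_cores(demand, per_core_capacity, max_cores):
    """Smallest core count whose combined capacity covers the demand.

    Hypothetical policy: at least one core stays on, and the count is
    capped at max_cores even if the demand exceeds total capacity.
    """
    needed = math.ceil(demand / per_core_capacity)
    return max(1, min(max_cores, needed))


# Assumed bursty workload: mostly light demand with one heavy interval.
demand_per_interval = [0.2, 0.1, 3.5, 0.3]
schedule = [active_cores(d, per_core_capacity=1.0, max_cores=4)
            for d in demand_per_interval]
print(schedule)       # [1, 1, 4, 1]
print(sum(schedule))  # 7 core-intervals powered, versus 16 with all four always on
```

The count of powered core-intervals is only a rough stand-in for energy; it ignores wake-up cost, idle power, and the voltage and frequency scaling that the abstract describes as complementary.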

Ibm Systems Journal | 1997

Predicting the performance of distributed virtual shared-memory applications

Eric W. Parsons; Mats Brorsson; Kenneth C. Sevcik

The use of networks of workstations for parallel computing is becoming increasingly common. Networks of workstations are attractive for a large class of parallel applications that can tolerate the ...
