Mats Brorsson
Royal Institute of Technology
Publication
Featured research published by Mats Brorsson.
international symposium on computer architecture | 1993
Per Stenström; Mats Brorsson; Lars Sandberg
Parallel programs that use critical sections and are executed on a shared-memory multiprocessor with a write-invalidate protocol result in invalidation actions that could be eliminated. For this type of sharing, called migratory sharing, each processor typically causes a cache miss followed by an invalidation request which could be merged with the preceding cache-miss request. In this paper we propose an adaptive protocol that invokes this optimization dynamically for migratory blocks. For other blocks, the protocol works as an ordinary write-invalidate protocol. We show that the protocol is a simple extension to a write-invalidate protocol. Based on a program-driven simulation model of an architecture similar to the Stanford DASH, and a set of four benchmarks, we evaluate the potential performance improvements of the protocol. We find that it effectively eliminates most single invalidations which improves the performance by reducing the shared access penalty and the network traffic.
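The saving described in this abstract can be illustrated with a toy request counter. This is only a sketch under stated assumptions, not the protocol from the paper: the `Directory` class, the single-block model, and in particular the detection heuristic (two copies, one held by the last writer) are hypothetical, and reverting a block from migratory back to ordinary sharing is omitted.

```python
from typing import Optional, Set


class Directory:
    """Toy home node for ONE memory block; counts coherence requests."""

    def __init__(self, adaptive: bool) -> None:
        self.adaptive = adaptive
        self.sharers: Set[int] = set()      # caches holding a copy
        self.owner: Optional[int] = None    # cache holding it exclusively
        self.last_writer: Optional[int] = None
        self.migratory = False
        self.requests = 0

    def read(self, cpu: int) -> None:
        if cpu in self.sharers:
            return                          # cache hit, no request
        self.requests += 1                  # read-miss request
        if self.adaptive and self.migratory:
            # Merge the expected invalidation into the miss:
            # hand over an exclusive copy right away.
            self.sharers = {cpu}
            self.owner = cpu
        else:
            self.sharers.add(cpu)
            self.owner = None               # previous owner is downgraded

    def write(self, cpu: int) -> None:
        if self.owner == cpu:
            self.last_writer = cpu          # already exclusive: local hit
            return
        self.requests += 1                  # invalidation (ownership) request
        # ASSUMED heuristic: exactly two copies, the other one belongs
        # to the last writer, so the block looks migratory.
        if (self.adaptive and cpu in self.sharers and len(self.sharers) == 2
                and self.last_writer not in (None, cpu)
                and self.last_writer in self.sharers):
            self.migratory = True
        self.sharers = {cpu}
        self.owner = cpu
        self.last_writer = cpu


def run(adaptive: bool, turns: int, cpus: int = 4) -> int:
    """Each turn models one critical section: read the block, then write it."""
    d = Directory(adaptive)
    for t in range(turns):
        cpu = t % cpus
        d.read(cpu)
        d.write(cpu)
    return d.requests


plain = run(adaptive=False, turns=8)
adaptive = run(adaptive=True, turns=8)
print(plain, adaptive)
```

In this model every critical section costs two requests under plain write-invalidate, and one request once the block has been classified as migratory.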
annual simulation symposium | 1993
Mats Brorsson; Fredrik Dahlgren; Håkan Nilsson; Per Stenström
The CacheMire Test Bench – A Flexible and Effective Approach for Simulation of Multiprocessors
Concurrency and Computation: Practice and Experience | 2000
Christian Brunschen; Mats Brorsson
We describe here the design and performance of OdinMP/CCp, which is a portable compiler for C programs using the OpenMP directives for parallel processing with shared memory. OdinMP/CCp was written in Java for portability reasons and takes a C program with OpenMP directives and produces a C program for POSIX threads. We describe some of the ideas behind the design of OdinMP/CCp and show some performance results achieved on an SGI Origin 2000 and a Sun E10000. Speedup measurements relative to a sequential version of the test programs show that OpenMP programs using OdinMP/CCp exhibit excellent performance on the Sun E10000 and reasonable performance on the Origin 2000.
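The general idea of turning a parallel loop into thread code can be sketched as follows. This is a conceptual illustration in Python, not OdinMP/CCp output (which is C for POSIX threads): the `parallel_for` helper, the static chunking, and the example loop body are all assumptions made for the sketch.

```python
import threading


def parallel_for(n, body, num_threads=4):
    """Run body(i) for every i in range(n), split statically across threads."""

    def worker(tid):
        # "Outlined" loop body: each thread executes one contiguous chunk.
        chunk = (n + num_threads - 1) // num_threads
        for i in range(tid * chunk, min(n, (tid + 1) * chunk)):
            body(i)

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()        # implicit barrier at the end of the parallel region


squares = [0] * 10


def store_square(i):
    squares[i] = i * i  # each iteration writes its own slot, so no lock needed


parallel_for(len(squares), store_square)
print(squares)
```

The join loop plays the role of the barrier that ends a parallel region; a real translator also has to handle data-sharing clauses, scheduling options, and reductions, none of which are modelled here.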
parallel computing | 1998
Sven Karlsson; Mats Brorsson
In this paper we analyze the characteristics of communication in three different applications, FFT, Barnes and Water, on an IBM SP2. We contrast the communication using two different programming models: message-passing, MPI, and shared memory, represented by a state-of-the-art distributed virtual shared memory package, TreadMarks. We show that while communication time and busy times are comparable for small systems, the communication patterns are fundamentally different, leading to poor performance for TreadMarks-based applications when the number of processors increases. This is due to the request/reply technique used in TreadMarks, which results in a large fraction of very small messages. However, if the application can be tuned to reduce the impact of small message communication, it is possible to achieve acceptable performance at least up to 32 nodes. Our measurements also show that TreadMarks programs tend to cause a more even network load compared to MPI programs.
workshop on computer architecture education | 2002
Mats Brorsson
Computer animation is a tool which nowadays is used in more and more fields. In this paper we describe the use of computer animation to support the learning of computer organization itself. MipsIt is a system consisting of a software development environment, a system and cache simulator and a highly flexible microarchitecture simulator used for pipeline studies. It has been in use for several years now and constitutes an important tool in the education at Lund University and KTH, Royal Institute of Technology in Sweden.
IEEE Computer | 1997
Per Stenström; Mats Brorsson; Fredrik Dahlgren; Håkan Grahn; Michel Dubois
Proposed hardware optimizations to CC-NUMA machines (shared-memory multiprocessors that use cache consistency protocols) can shorten the time processors lose because of cache misses and invalidations. The authors look at cost-performance trade-offs for each.
ieee international conference on high performance computing, data, and analytics | 2002
Sven Karlsson; Sung-Woo Lee; Mats Brorsson
OpenMP is a relatively new industry standard for programming parallel computers with a shared memory programming model. Given that clusters of workstations are a cost-effective solution for building parallel platforms, it would of course be highly interesting if the OpenMP model could be extended to these systems as well as to the standard shared memory architectures for which it was originally intended. We present in this paper a fully compliant implementation of the OpenMP specification 1.0 for C targeting networks of workstations. We have used an experimental software distributed shared memory system called Coherent Virtual Machine to implement a run-time library which is the target of a source-to-source OpenMP translator also developed in this project. The system has been evaluated using an OpenMP micro-benchmark suite in order to evaluate the effect of some memory coherence protocol improvements. We have also used OpenMP versions of three Splash-2 applications, obtaining reasonable speedups on an IBM SP2 machine. This is also the first study to investigate the subtle mechanisms of consistency in OpenMP on software distributed shared memory systems.
international conference on big data and cloud computing | 2015
Ahsan Javed Awan; Mats Brorsson; Vladimir Vlassov; Eduard Ayguadé
In the last decade, data analytics have rapidly progressed from traditional disk-based processing to modern in-memory processing. However, little effort has been devoted to enhancing performance at the micro-architecture level. This paper characterizes the performance of in-memory data analytics using the Apache Spark framework. We use a single-node NUMA machine and identify the bottlenecks hampering the scalability of workloads. We also quantify the inefficiencies at the micro-architecture level for various data analysis workloads. Through empirical evaluation, we show that Spark workloads do not scale linearly beyond twelve threads, due to work time inflation and thread-level load imbalance. Further, at the micro-architecture level, we observe memory-bound latency to be the major cause of work time inflation.
compilers, architecture, and synthesis for embedded systems | 2002
Mladen Nikitovic; Mats Brorsson
Power consumption has become an increasingly important factor in the field of computer architecture. It affects issues such as heat dissipation and packaging cost, which in turn affect the design and cost of a mobile terminal. Today, a lot of effort is put into the design of architectures and software implementation to increase performance. However, little is done on a system level to minimize power consumption, which is crucial in mobile systems. We propose an adaptive chip-multiprocessor (CMP) architecture, where the number of active processors is dynamically adjusted to the current workload need in order to save energy while preserving performance. The architecture is suitable in future mobile terminals where we anticipate a bursty and performance-demanding workload. We have carried out an evaluation of the performance and power consumption of the proposed architecture using previously validated high-level simulation models. Our experiments show that orders of magnitude in power consumption can be saved compared to a conventional architecture at a negligible performance cost. The method used is complementary to other power saving techniques such as voltage and frequency scaling.
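The idea of matching the number of active processors to the workload can be sketched as a simple policy. This is a hypothetical illustration only: the `active_cores` function, the per-core capacity model, and the demand trace are assumptions made for the sketch and do not come from the paper, which does not describe its policy in the abstract.

```python
import math


def active_cores(demand, per_core_capacity, total_cores):
    """Smallest number of cores that covers the demand, clamped to 1..total."""
    needed = math.ceil(demand / per_core_capacity)
    return max(1, min(total_cores, needed))


def schedule(trace, per_core_capacity=1.0, total_cores=4):
    """Map a trace of workload demands to a trace of active core counts."""
    return [active_cores(d, per_core_capacity, total_cores) for d in trace]


# Hypothetical bursty demand, in units of one core's capacity.
trace = [0.0, 0.5, 2.5, 9.0, 0.1]
print(schedule(trace))
```

Under this policy a single core stays on during idle periods and all cores are enabled only during bursts; the energy saved would depend on per-core power figures that are not given here.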
Ibm Systems Journal | 1997
Eric W. Parsons; Mats Brorsson; Kenneth C. Sevcik
The use of networks of workstations for parallel computing is becoming increasingly common. Networks of workstations are attractive for a large class of parallel applications that can tolerate the ...