Peter A. Jonsson
Luleå University of Technology
Publications
Featured research published by Peter A. Jonsson.
Symposium on Principles of Programming Languages (POPL) | 2009
Peter A. Jonsson; Johan Nordlander
Previous deforestation and supercompilation algorithms may introduce accidental termination when applied to call-by-value programs. This hides looping bugs from the programmer and changes the behavior of a program depending on whether it is optimized or not. We present a supercompilation algorithm for a higher-order call-by-value language and prove that the algorithm both terminates and preserves termination properties. The algorithm uses strictness information to decide whether or not to substitute, and compares favorably with previous call-by-name transformations.
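The hazard is easiest to see concretely. Below is a minimal C analogue of the problem (an illustration of the termination issue only, not code from the paper): if a transformation discards a call whose result is unused, it can make a diverging program terminate, hiding exactly the kind of looping bug the abstract describes.

    #include <stdio.h>

    /* A function that never returns. */
    static int loop(void) {
        for (;;) { }
    }

    int main(void) {
        int unused = loop();  /* under call-by-value, execution hangs here */
        (void)unused;
        /* A transformation that removes the unused call makes the line
           below reachable, silently changing the program's behavior. */
        printf("done\n");
        return 0;
    }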
International Workshop on OpenMP (IWOMP) | 2013
Ananya Muddukrishna; Peter A. Jonsson; Vladimir Vlassov; Mats Brorsson
Modern parallel computer systems exhibit Non-Uniform Memory Access (NUMA) behavior. For best performance, a parallel program therefore has to match its data allocation and the scheduling of its computations to the memory architecture of the machine. Done manually, this is a tedious process, and because each system has its own peculiarities, it also yields programs that are not performance-portable.
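A standard manual remedy that this line of work aims to automate is first-touch placement: initialize data with the same thread-to-iteration mapping that later computes on it, so the OS places each page on the node of the thread that uses it. A minimal OpenMP sketch in C (illustrative only, not the authors' code):

    #include <stdlib.h>

    #define N (1 << 26)

    int main(void) {
        double *a = malloc(N * sizeof *a);

        /* First touch: each thread initializes the pages it will later
           compute on, so the OS allocates them on that thread's node. */
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < N; i++)
            a[i] = 0.0;

        /* The same static schedule reuses the node-local placement. */
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < N; i++)
            a[i] = 2.0 * a[i] + 1.0;

        free(a);
        return 0;
    }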
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP) | 2016
Ananya Muddukrishna; Peter A. Jonsson; Artur Podobas; Mats Brorsson
Average programmers struggle to solve performance problems in OpenMP programs with tasks and parallel for-loops. Existing performance analysis tools visualize OpenMP task performance from the runtime system's perspective, where task execution is interleaved with other tasks in an unpredictable order. Problems with OpenMP parallel for-loops are similarly difficult to resolve, since tools only visualize aggregate thread-level statistics such as load imbalance without zooming in to per-chunk granularity. This runtime- and thread-oriented visualization provides poor support for understanding problems with task and chunk execution time, parallelism, and memory hierarchy utilization, forcing average programmers to rely on experts or on tedious trial-and-error tuning. We present grain graphs, a new OpenMP performance analysis method that visualizes grains -- the computation performed by a task or a parallel for-loop chunk instance -- and highlights problems such as low parallelism, work inflation, and poor parallelization benefit at the grain level. We demonstrate that grain graphs can quickly reveal performance problems in standard OpenMP programs that are difficult to detect and characterize in fine detail using existing visualizations, simplifying OpenMP performance analysis. This enables average programmers to make portable optimizations for poorly performing OpenMP programs, reducing pressure on experts and removing the need for tedious trial-and-error tuning.
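A grain here is a single task instance or loop chunk. The following C sketch shows the kind of per-grain record such a method is built on, using hypothetical hand-rolled instrumentation rather than the grain-graph tooling itself:

    #include <omp.h>
    #include <stdio.h>

    #define N 1024
    #define CHUNK 64

    /* Deliberately uneven per-iteration work, so grains differ in size. */
    static void work(int i) {
        volatile double x = 0.0;
        for (int k = 0; k < 100000 * (i % 7 + 1); k++)
            x += k;
    }

    int main(void) {
        #pragma omp parallel
        #pragma omp single
        for (int lo = 0; lo < N; lo += CHUNK) {
            #pragma omp task firstprivate(lo)
            {
                int hi = lo + CHUNK < N ? lo + CHUNK : N;
                double t0 = omp_get_wtime();
                for (int i = lo; i < hi; i++)
                    work(i);
                /* One record per grain: its range, thread, and duration. */
                printf("grain [%d,%d) thread=%d time=%.6f s\n",
                       lo, hi, omp_get_thread_num(), omp_get_wtime() - t0);
            }
        }
        return 0;
    }

Plotted per grain rather than per thread, such records expose exactly the problems listed above: overly long grains, low parallelism, and work inflation.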
International Journal of Occupational Safety and Ergonomics | 2003
Andi Wijaya; Peter A. Jonsson; Örjan Johansson
A field study evaluated different seat designs with respect to minimizing vibration transmission and reducing the discomfort experienced by drivers subjected to transient vibration. Two seat designs (sliding or fixed in the horizontal direction) were compared in an experiment that varied sitting posture, speed, and type of obstacle. The comparison was made by assessing discomfort and perceived motion and by vibration measurement, with ten professional drivers as participants. The Maximum Transient Vibration Value (MTVV) and the Vibration Dose Value (VDV) were used in the evaluation. The results showed that a sliding seat is superior at attenuating horizontal vibration containing transients. It was also perceived as causing less overall and low-back discomfort than a fixed seat.
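For readers unfamiliar with the two metrics, both come from ISO 2631-1 (a general summary, not specific to this study). With a_w(t) the frequency-weighted acceleration:

    \mathrm{VDV} = \left( \int_0^T a_w^4(t)\,dt \right)^{1/4}, \qquad
    \mathrm{MTVV} = \max_{t_0} \left( \frac{1}{\tau} \int_{t_0-\tau}^{t_0} a_w^2(t)\,dt \right)^{1/2}, \quad \tau = 1\,\mathrm{s}

Because VDV raises acceleration to the fourth power, it weights short transient peaks, such as those from driving over obstacles, far more heavily than RMS-based measures, which is why both metrics suit the transient vibration studied here.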
Scientific Programming | 2015
Ananya Muddukrishna; Peter A. Jonsson; Mats Brorsson
Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can now be felt on-chip in manycore processors. Distributing data across NUMA nodes and manycore processor caches is necessary to reduce the impact of nonuniform latencies. However, techniques for distributing data are error-prone and fragile and require low-level architectural knowledge. Existing task scheduling policies favor quick load balancing at the expense of locality and ignore NUMA node/manycore cache access latencies when scheduling. Locality-aware scheduling, in conjunction with or as a replacement for existing scheduling, is necessary to minimize NUMA effects and sustain performance. We present a data distribution and locality-aware scheduling technique for task-based OpenMP programs executing on NUMA systems and manycore processors. Our technique relieves the programmer of thinking about NUMA system and manycore processor architecture details by delegating data distribution to the runtime system, and it uses task data dependence information to guide the scheduling of OpenMP tasks so as to reduce data stall times. We demonstrate our technique on a four-socket AMD Opteron machine with eight NUMA nodes and on the TILEPro64 processor, showing that data distribution and locality-aware task scheduling improve performance by up to 69% for scientific benchmarks compared to default policies, while providing an architecture-oblivious approach for programmers.
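The task data dependence information referred to above is what OpenMP depend clauses express; a runtime can read the addresses in those clauses and schedule each task near the NUMA node that holds its data. A minimal hedged sketch of the mechanism in C (illustrative of the idea, not the authors' runtime):

    #include <stdlib.h>

    #define BLOCKS 8
    #define BS 4096

    static void init(double *b, int n)    { for (int i = 0; i < n; i++) b[i] = i; }
    static void process(double *b, int n) { for (int i = 0; i < n; i++) b[i] *= 2.0; }

    int main(void) {
        double *blk[BLOCKS];
        for (int k = 0; k < BLOCKS; k++)
            blk[k] = malloc(BS * sizeof **blk);

        #pragma omp parallel
        #pragma omp single
        for (int k = 0; k < BLOCKS; k++) {
            /* 'out' names the block this task produces. */
            #pragma omp task depend(out: blk[k][0:BS]) firstprivate(k)
            init(blk[k], BS);

            /* 'in' tells the runtime which block this task reads; a
               locality-aware scheduler can place it on the node where
               that block was first touched. */
            #pragma omp task depend(in: blk[k][0:BS]) firstprivate(k)
            process(blk[k], BS);
        }

        for (int k = 0; k < BLOCKS; k++) free(blk[k]);
        return 0;
    }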
Partial Evaluation and Semantics-Based Program Manipulation (PEPM) | 2011
Peter A. Jonsson; Johan Nordlander
Supercompilation algorithms can perform powerful optimizations but sometimes suffer from code explosion, which results in huge binaries that can hurt performance on a modern processor. We present a supercompilation algorithm that is fast enough to speculatively supercompile expressions and discard the result if it turns out badly. This allows us to supercompile large parts of the imaginary and spectral subsets of the nofib benchmark suite in a matter of seconds while keeping the binary size increase below 5%.
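The speculate-and-discard control loop is separable from the transformation itself. A hedged C sketch of that driver logic, with hypothetical types and helper names used purely for illustration:

    #include <stddef.h>

    /* Hypothetical IR and helpers; these names are illustrative only. */
    typedef struct Expr Expr;
    extern Expr  *supercompile(const Expr *e); /* fast speculative pass  */
    extern size_t code_size(const Expr *e);    /* residual size estimate */

    /* Keep the supercompiled expression only if it stays within a
       fixed code-growth budget; otherwise fall back to the original. */
    const Expr *speculate(const Expr *e, double budget) {
        const Expr *r = supercompile(e);
        if (code_size(r) <= (size_t)((double)code_size(e) * (1.0 + budget)))
            return r;   /* within budget: keep the optimized version */
        return e;       /* code explosion: discard the speculation   */
    }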
PLOS ONE | 2015
Ananya Muddukrishna; Peter A. Jonsson; Mats Brorsson
Programmers struggle to understand the performance of task-based OpenMP programs because profiling tools only report thread-based performance. Performance tuning also requires task-based performance information in order to balance per-task memory hierarchy utilization against exposed task parallelism. We provide a cost-effective method to extract detailed task-based performance information from OpenMP programs. We demonstrate the utility of our method by quickly diagnosing performance problems and characterizing the exposed task parallelism and per-task instruction profiles of benchmarks in the widely used Barcelona OpenMP Tasks Suite. Using our method, programmers can tune performance faster and understand performance tradeoffs more effectively than with existing tools.
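A hedged sketch of the kind of per-task measurement involved, using plain wall-clock instrumentation in C (the paper's actual method and counters may differ):

    #include <omp.h>
    #include <stdio.h>

    /* Recursive task kernel that emits one timing record per task.
       The recorded time is inclusive: it covers the task body and the
       child tasks it waits for. */
    static long fib(int n) {
        long a, b;
        if (n < 2) return n;
        #pragma omp task shared(a) firstprivate(n)
        {
            double t0 = omp_get_wtime();
            a = fib(n - 1);
            printf("task n=%d thread=%d time=%.6f s\n",
                   n - 1, omp_get_thread_num(), omp_get_wtime() - t0);
        }
        b = fib(n - 2);
        #pragma omp taskwait
        return a + b;
    }

    int main(void) {
        long r;
        #pragma omp parallel
        #pragma omp single
        r = fib(20);
        printf("fib(20) = %ld\n", r);
        return 0;
    }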
Journal of Sound and Vibration | 2005
Peter A. Jonsson; Örjan Johansson
Archive | 2008
Peter A. Jonsson; Johan Nordlander
International Valentin Turchin Memorial Workshop on Metacomputation in Russia | 2010
Peter A. Jonsson; Johan Nordlander