Eduard Mehofer
University of Vienna
Publications
Featured research published by Eduard Mehofer.
european conference on parallel processing | 2009
Sabri Pllana; Siegfried Benkner; Eduard Mehofer; Lasse Natvig; Fatos Xhafa
In this position paper we argue that an intelligent program development environment that proactively supports the user helps a mainstream programmer to overcome the difficulties of programming multi-core computing systems. We propose a programming environment based on intelligent software agents that enables users to work at a high level of abstraction while automating low-level implementation activities. The programming environment supports program composition in a model-driven development fashion using parallel building blocks and proactively assists the user during major phases of program development and performance tuning. We highlight the potential benefits of using such a programming environment with usage scenarios. An experiment with a parallel building block on a Sun UltraSPARC T2 Plus processor shows how the system may assist the programmer in achieving performance improvements.
IEEE Transactions on Parallel and Distributed Systems | 2002
Jens Knoop; Eduard Mehofer
Data locality and workload balance are key factors for getting high performance out of data-parallel programs on multiprocessor architectures. Data-parallel languages such as High Performance Fortran (HPF) thus offer means that allow a programmer both to specify data distributions and to change them dynamically in order to maintain these properties. On the other hand, redistributions can be quite expensive and can significantly degrade a program's performance. They must thus be reduced to a minimum. In this article, we present a novel, aggressive approach for avoiding unnecessary remappings, which works by eliminating partially dead and partially redundant distribution changes. Basically, this approach evolves from extending and combining two algorithms for these optimizations, each achieving optimal results on its own. In contrast to the sequential setting, the data-parallel setting leads naturally to a family of algorithms of varying power and efficiency, allowing requirement-customized solutions. The power and flexibility of the new approach are demonstrated by various examples, which range from typical HPF fragments to real-world programs. Performance measurements underline its importance and show its effectiveness on different hardware platforms and in different settings.
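To make the core idea concrete, here is a minimal sketch of the simplest case this line of work targets: a distribution change that is overwritten before the array is ever used under it is dead and can be removed. All names and the operation encoding below are invented for illustration; the actual algorithm handles partially dead and partially redundant changes on arbitrary control flow, not just straight-line sequences.

```python
# Hedged sketch: removing dead redistributions in a straight-line
# sequence of operations. A redistribute of an array that is followed
# by another redistribute of the same array, with no intervening use,
# never influenced the program and can be dropped.

def eliminate_dead_redistributions(ops):
    """ops: list of ("redistribute", array, layout) or ("use", array)."""
    result = []
    pending = {}  # array -> index in result of a not-yet-used redistribute
    for op in ops:
        if op[0] == "redistribute":
            _, arr, _layout = op
            if arr in pending:
                # Previous redistribution of arr was never used: it is dead.
                result[pending[arr]] = None
            pending[arr] = len(result)
            result.append(op)
        else:  # ("use", arr): the current layout is now live
            pending.pop(op[1], None)
            result.append(op)
    return [op for op in result if op is not None]

ops = [("redistribute", "A", "BLOCK"),   # dead: overwritten before any use
       ("redistribute", "A", "CYCLIC"),
       ("use", "A")]
print(eliminate_dead_redistributions(ops))
# [('redistribute', 'A', 'CYCLIC'), ('use', 'A')]
```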
compiler construction | 2001
Eduard Mehofer; Bernhard Scholz
Classical data flow analysis determines whether a data flow fact may hold or does not hold at some program point. Probabilistic data flow systems compute a range, i.e. a probability, with which a data flow fact will hold at some program point. In this paper we develop a novel, practicable framework for probabilistic data flow problems. In contrast to other approaches, we utilize execution history for calculating the probabilities of data flow facts. In this way we achieve significantly better results. Effectiveness and efficiency of our approach are shown by compiling and running the SPECint95 benchmark suite.
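As a rough illustration of what such a system computes, the following sketch propagates a single data flow fact through a control flow graph, merging predecessor values weighted by profiled edge frequencies instead of taking a plain may/must join. The function and its encoding are invented; the paper's framework goes further by exploiting execution history rather than treating edges independently.

```python
# Hedged sketch: frequency-weighted propagation of one data flow fact.
# prob[n] estimates the probability that the fact holds on entry to n.

def prob_dataflow(nodes, edges, freq, gen, kill, iters=50):
    """edges: list of (src, dst); freq[(src, dst)]: profiled edge count.
    gen/kill: sets of nodes that generate/kill the single tracked fact."""
    prob = {n: 0.0 for n in nodes}
    out = {n: 0.0 for n in nodes}
    for _ in range(iters):  # iterate to a fixed point
        for n in nodes:
            preds = [(s, d) for (s, d) in edges if d == n]
            total = sum(freq[e] for e in preds)
            if total:  # frequency-weighted merge of predecessor OUT values
                prob[n] = sum(freq[e] * out[e[0]] for e in preds) / total
            # local transfer function for the tracked fact
            out[n] = 1.0 if n in gen else (0.0 if n in kill else prob[n])
    return prob

# Tiny diamond CFG: the fact is generated on the hot path, killed on the cold one.
nodes = ["s", "l", "r", "j"]
edges = [("s", "l"), ("s", "r"), ("l", "j"), ("r", "j")]
freq = {("s", "l"): 90, ("s", "r"): 10, ("l", "j"): 90, ("r", "j"): 10}
p = prob_dataflow(nodes, edges, freq, gen={"l"}, kill={"r"})
print(round(p["j"], 2))  # 0.9: the fact holds at the join 90% of the time
```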
Sigplan Notices | 2000
Eduard Mehofer; Bernhard Scholz
Traditionally, optimization is performed statically, independent of the actual execution environment. For generating highly optimized code, however, runtime information can be used to adapt a program to different environments. Probabilistic data flow systems exploit runtime information on representative input data to compute the probability with which data flow facts may hold. Probabilistic data flow analysis can guide sophisticated optimizing transformations, resulting in better performance. In comparison, classical data flow analysis does not take runtime information into account: all paths are weighted equally, irrespective of whether they are executed never, rarely, or heavily. In this paper we present the best solution that can theoretically be obtained for probabilistic data flow problems and compare it with the state-of-the-art one-edge approach. We show that the differences can be considerable and that improvements are crucial. However, the theoretically best solution is too expensive in general, and feasible approaches are required. We therefore develop an efficient approach that employs two-edge profiling and classical data flow analysis, and we show that its results are significantly better than those of the state-of-the-art one-edge approach.
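The gap between one-edge and two-edge profiling can be seen with a tiny invented trace: if execution strictly alternates between two correlated paths, single-edge counts suggest all four path combinations occur, while counts of consecutive edge pairs retain the correlation. This is only a sketch of the profiling granularity, not of the authors' analysis itself.

```python
# Hedged sketch (invented edge labels): one-edge vs. two-edge profiles.
from collections import Counter

trace = ["a", "c", "b", "d"] * 100   # edges taken, in execution order

one_edge = Counter(trace)                     # per-edge counts only
two_edge = Counter(zip(trace, trace[1:]))     # consecutive edge pairs

print(one_edge["a"], one_edge["c"])           # 100 100: looks independent,
                                              # as if path a->d also occurred
print(two_edge[("a", "c")], two_edge[("a", "d")])  # 100 0: correlation kept
```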
international conference on parallel architectures and compilation techniques | 2002
Bernhard Scholz; Eduard Mehofer
Efficient use of machine resources in high-performance computer systems requires highly optimizing compilers with sophisticated analyses. Static analysis often fails to identify the frequently executed portions of a program, which are the places where optimizations achieve the greatest benefit. This paper introduces a novel data flow frequency analysis framework that computes the frequency with which a data flow fact will hold at some program point, based on profiling information. Several approaches that approximate these frequencies based on k-edge profiling have been presented; however, no feasible approach for obtaining the accurate solution has existed so far. Recently, efficient techniques for recording whole program paths (WPPs) have been developed. Our approach for computing data flow frequencies yields an accurate solution and utilizes WPPs to obtain it in reasonable time. Our experiments show that, for the SPEC benchmark suite, the execution time of WPP-based frequency analysis is only a fraction of the overall compilation time.
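A brute-force reference point for what "accurate" means here: replaying the recorded path through a one-fact transfer function and counting where the fact holds. The encoding is hypothetical, and the point of the paper is precisely that the same result can be obtained far more efficiently on the compressed WPP representation than on the raw trace replayed below.

```python
# Hedged sketch: exact data flow frequencies by replaying a program path.
from collections import Counter

def fact_frequency(path, gen, kill):
    """path: sequence of executed nodes; gen/kill: node sets for one fact.
    Returns how often the fact held on entry to each node."""
    holds, freq = False, Counter()
    for n in path:
        if holds:
            freq[n] += 1          # fact holds on entry to this instance of n
        holds = True if n in gen else (False if n in kill else holds)
    return freq

path = ["s", "g", "u", "k", "g", "u"]   # hypothetical executed node trace
print(fact_frequency(path, gen={"g"}, kill={"k"}))
# Counter({'u': 2, 'k': 1, 'g': 1})
```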
european conference on parallel processing | 2003
Bernhard Scholz; Eduard Mehofer; R. Nigel Horspool
Partial redundancy elimination (PRE) techniques play an important role in optimizing compilers. Many optimizations, such as elimination of redundant expressions, communication optimizations, and load-reuse optimizations, employ PRE as an underlying technique for improving the efficiency of a program.
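A textbook instance of the redundancy PRE removes, written out as a hypothetical before/after pair: `a + b` is computed on only one path into a join and recomputed afterwards, so it is partially redundant; inserting the computation on the other path makes the recomputation fully redundant and replaceable by a temporary.

```python
# Hedged sketch (invented variables): the classic PRE transformation.

def before(a, b, cond):
    if cond:
        x = a + b      # a + b computed here ...
    else:
        x = a - b
    return x, a + b    # ... and recomputed after the join on the cond path

def after(a, b, cond):
    if cond:
        t = a + b      # existing computation now feeds a temporary
        x = t
    else:
        x = a - b
        t = a + b      # inserted: makes a + b fully available at the join
    return x, t        # recomputation replaced by the temporary
```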
Concurrency and Computation: Practice and Experience | 2001
Thomas Fahringer; Peter Blaha; Andreas Hössinger; J. Luitz; Eduard Mehofer; Hans Moritsch; Bernhard Scholz
Several large real-world applications have been developed for distributed and parallel architectures. We examine two different program development approaches. The first is the use of a high-level programming paradigm, which dramatically reduces the time to create a parallel program, though sometimes at the cost of reduced performance; here, a source-to-source compiler has been employed to automatically compile programs written in the high-level paradigm into message-passing codes. The second is manual program development using a low-level programming paradigm such as message passing, which enables the programmer to fully exploit a given architecture at the cost of a time-consuming and error-prone effort. Performance tools play a central role in supporting the performance-oriented development of applications for distributed and parallel architectures. SCALA, a portable instrumentation, measurement, and post-execution performance analysis system for distributed and parallel programs, has been used to analyze and guide the application development: by selectively instrumenting and measuring the code versions, by comparing performance information of several program executions, by computing a variety of important performance metrics, by detecting performance bottlenecks, and by relating performance information back to the input program. We show several experiments in which SCALA is applied to real-world applications. These experiments are conducted on a NEC Cenju-4 distributed-memory machine and a cluster of heterogeneous workstations and networks.
Journal of Parallel and Distributed Computing | 1999
Thomas Fahringer; Eduard Mehofer
This paper presents a new approach for optimizing communication of data-parallel programs. Our techniques are based on unidirectional bit-vector data flow analyses that enable vectorizing, coalescing, and aggregating communication, and overlapping communication with computation, both within and across loop nests. Previous techniques are based on fixed communication optimization strategies whose quality is very sensitive to changes of machine and problem sizes. Our algorithm is novel in that we carefully examine the tradeoffs between enhancing communication latency hiding and reducing the number and volume of messages, by systematically evaluating a reasonable set of promising communication placements for a given program that covers several (possibly conflicting) objectives guiding communication. We use P3T, a state-of-the-art performance estimator, to ensure communication buffer safety and to select the best of the created communication placements. First results show that our method yields a significant reduction in communication costs and demonstrate the effectiveness of this analysis in improving the performance of programs.
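The underlying machinery is a standard iterative bit-vector analysis. The following generic sketch (invented API, Python ints as bit sets) shows the forward "may" form of such equations, with IN[n] the union of the predecessors' OUT sets and OUT[n] = GEN[n] | (IN[n] & ~KILL[n]), solved to a fixed point over an arbitrary control flow graph; the paper's specific GEN/KILL sets for communication are more involved.

```python
# Hedged sketch: a forward unidirectional bit-vector data flow solver.

def solve_forward_may(nodes, preds, gen, kill):
    """preds: node -> list of predecessors; gen/kill: node -> bit vector
    (plain Python ints used as bit sets, one bit per data flow fact)."""
    IN = {n: 0 for n in nodes}
    OUT = {n: gen[n] for n in nodes}
    changed = True
    while changed:                      # iterate to the least fixed point
        changed = False
        for n in nodes:
            new_in = 0
            for p in preds[n]:
                new_in |= OUT[p]        # meet (union) over predecessors
            new_out = gen[n] | (new_in & ~kill[n])
            if new_in != IN[n] or new_out != OUT[n]:
                IN[n], OUT[n], changed = new_in, new_out, True
    return IN, OUT

# Two facts (bits 0 and 1) on a tiny diamond CFG:
preds = {"s": [], "l": ["s"], "r": ["s"], "j": ["l", "r"]}
gen = {"s": 0b00, "l": 0b01, "r": 0b10, "j": 0b00}
kill = {"s": 0b00, "l": 0b10, "r": 0b01, "j": 0b00}
IN, OUT = solve_forward_may(["s", "l", "r", "j"], preds, gen, kill)
print(bin(IN["j"]))  # 0b11: each fact may reach the join along one path
```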
international conference on parallel architectures and compilation techniques | 1997
Thomas Fahringer; Eduard Mehofer
The paper presents a novel approach to reduce the communication costs of programs for distributed-memory machines. The techniques are based on unidirectional bit-vector data flow analyses that enable vectorizing and coalescing communication, overlapping communication with computation, and eliminating redundant messages and redundant data transfers, both within and across loop nests. The data flow analysis differs from previous techniques in that it does not require explicitly modeling balanced communication placement and loops, and it does not employ interval analysis. The techniques are based on simple yet highly effective data flow equations which are solved iteratively for arbitrary control flow graphs. Moving communication earlier to hide latency has been shown to dramatically increase communication buffer sizes and can even cause run-time errors. The authors use P3T, a state-of-the-art performance estimator, to create a buffer-safe program. By accurately estimating both the required communication buffer sizes and the implied communication times of every single communication in a program, one can selectively choose communications that must be delayed in order to ensure a correct communication placement while maximizing communication latency hiding. Experimental results are presented to demonstrate the efficacy of the communication optimization strategy.
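One way to picture the selection step: hoisting a communication hides latency but ties up buffer space, so under a known capacity some communications must stay at their late placement. The greedy heuristic below is purely illustrative (all names and numbers invented); the paper instead relies on P3T's accurate per-communication buffer and time estimates to decide which placements to delay.

```python
# Hedged sketch: buffer-safe selection of communications to hoist.

def place_communications(comms, capacity):
    """comms: list of (name, buffer_bytes, latency_hidden_if_hoisted).
    Returns (hoisted, delayed), preferring the biggest latency wins."""
    hoisted, delayed, used = [], [], 0
    for name, size, gain in sorted(comms, key=lambda c: -c[2]):
        if used + size <= capacity:     # still buffer-safe: hoist it
            hoisted.append(name)
            used += size
        else:                           # would overflow buffers: delay it
            delayed.append(name)
    return hoisted, delayed

print(place_communications(
    [("A", 40, 9), ("B", 80, 5), ("C", 30, 7)], capacity=100))
# (['A', 'C'], ['B'])
```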
european conference on parallel processing | 1997
Jens Knoop; Eduard Mehofer
Dynamic data redistribution is a key technique for maintaining data locality and workload balance in data-parallel languages like HPF. On the other hand, redistributions can be very expensive and significantly degrade a program's performance. In this article, we present a novel and aggressive approach for avoiding unnecessary remappings by eliminating partially dead and partially redundant distribution changes. Basically, this approach evolves from extending and combining two algorithms for these optimizations, each achieving optimal results for sequential programs. Optimality, however, becomes more intricate through the combination. Unlike the sequential setting, the data-parallel setting leads to a hierarchy of algorithms of varying power and efficiency, fitting a user's individual needs. The power and flexibility of the new approach are demonstrated by illustrative examples. First practical experiences underline its importance and effectiveness.