Eduard Mehofer
University of Vienna
Publications
Featured research published by Eduard Mehofer.
european conference on parallel processing | 2009
Sabri Pllana; Siegfried Benkner; Eduard Mehofer; Lasse Natvig; Fatos Xhafa
In this position paper we argue that an intelligent program development environment that proactively supports the user helps a mainstream programmer to overcome the difficulties of programming multi-core computing systems. We propose a programming environment based on intelligent software agents that enables users to work at a high level of abstraction while automating low-level implementation activities. The programming environment supports program composition in a model-driven development fashion using parallel building blocks and proactively assists the user during major phases of program development and performance tuning. We highlight the potential benefits of using such a programming environment with usage scenarios. An experiment with a parallel building block on a Sun UltraSPARC T2 Plus processor shows how the system may assist the programmer in achieving performance improvements.
IEEE Transactions on Parallel and Distributed Systems | 2002
Jens Knoop; Eduard Mehofer
Data locality and workload balance are key factors for getting high performance out of data-parallel programs on multiprocessor architectures. Data-parallel languages such as High Performance Fortran (HPF) thus offer means that allow a programmer both to specify data distributions and to change them dynamically in order to maintain these properties. On the other hand, redistributions can be quite expensive and can significantly degrade a program's performance. They must thus be reduced to a minimum. In this article, we present a novel, aggressive approach for avoiding unnecessary remappings, which works by eliminating partially dead and partially redundant distribution changes. Basically, this approach evolves from extending and combining two algorithms for these optimizations, each achieving optimal results on its own. In contrast to the sequential setting, the data-parallel setting leads naturally to a family of algorithms of varying power and efficiency, allowing requirement-customized solutions. The power and flexibility of the new approach are demonstrated by various examples, which range from typical HPF fragments to real-world programs. Performance measurements underline its importance and show its effectiveness on different hardware platforms and in different settings.
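To make the core idea concrete, here is a minimal sketch of the simplest case this line of work targets: a distribution change that is overwritten before the array is ever used under it is dead and can be removed. All names and the operation encoding below are invented for illustration; the actual algorithm handles partially dead and partially redundant changes on arbitrary control flow, not just straight-line sequences.

```python
# Hedged sketch: removing dead redistributions in a straight-line
# sequence of operations. A redistribute of an array that is followed
# by another redistribute of the same array, with no intervening use,
# never influenced the program and can be dropped.

def eliminate_dead_redistributions(ops):
    """ops: list of ("redistribute", array, layout) or ("use", array)."""
    result = []
    pending = {}  # array -> index in result of a not-yet-used redistribute
    for op in ops:
        if op[0] == "redistribute":
            _, arr, _layout = op
            if arr in pending:
                # Previous redistribution of arr was never used: it is dead.
                result[pending[arr]] = None
            pending[arr] = len(result)
            result.append(op)
        else:  # ("use", arr): the current layout is now live
            pending.pop(op[1], None)
            result.append(op)
    return [op for op in result if op is not None]

ops = [("redistribute", "A", "BLOCK"),   # dead: overwritten before any use
       ("redistribute", "A", "CYCLIC"),
       ("use", "A")]
print(eliminate_dead_redistributions(ops))
# [('redistribute', 'A', 'CYCLIC'), ('use', 'A')]
```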
compiler construction | 2001
Eduard Mehofer; Bernhard Scholz
Classical data flow analysis determines whether a data flow fact may hold or does not hold at some program point. Probabilistic data flow systems compute a range, i.e. a probability, with which a data flow fact will hold at some program point. In this paper we develop a novel, practicable framework for probabilistic data flow problems. In contrast to other approaches, we utilize execution history for calculating the probabilities of data flow facts. In this way we achieve significantly better results. Effectiveness and efficiency of our approach are shown by compiling and running the SPECint95 benchmark suite.
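As a rough illustration of what such a system computes, the following sketch propagates a single data flow fact through a control flow graph, merging predecessor values weighted by profiled edge frequencies instead of taking a plain may/must join. The function and its encoding are invented; the paper's framework goes further by exploiting execution history rather than treating edges independently.

```python
# Hedged sketch: frequency-weighted propagation of one data flow fact.
# prob[n] estimates the probability that the fact holds on entry to n.

def prob_dataflow(nodes, edges, freq, gen, kill, iters=50):
    """edges: list of (src, dst); freq[(src, dst)]: profiled edge count.
    gen/kill: sets of nodes that generate/kill the single tracked fact."""
    prob = {n: 0.0 for n in nodes}
    out = {n: 0.0 for n in nodes}
    for _ in range(iters):  # iterate to a fixed point
        for n in nodes:
            preds = [(s, d) for (s, d) in edges if d == n]
            total = sum(freq[e] for e in preds)
            if total:  # frequency-weighted merge of predecessor OUT values
                prob[n] = sum(freq[e] * out[e[0]] for e in preds) / total
            # local transfer function for the tracked fact
            out[n] = 1.0 if n in gen else (0.0 if n in kill else prob[n])
    return prob

# Tiny diamond CFG: the fact is generated on the hot path, killed on the cold one.
nodes = ["s", "l", "r", "j"]
edges = [("s", "l"), ("s", "r"), ("l", "j"), ("r", "j")]
freq = {("s", "l"): 90, ("s", "r"): 10, ("l", "j"): 90, ("r", "j"): 10}
p = prob_dataflow(nodes, edges, freq, gen={"l"}, kill={"r"})
print(round(p["j"], 2))  # 0.9: the fact holds at the join 90% of the time
```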
Sigplan Notices | 2000
Eduard Mehofer; Bernhard Scholz
Traditionally, optimization is performed statically, independent of the actual execution environment. For generating highly optimized code, however, runtime information can be used to adapt a program to different environments. Probabilistic data flow systems exploit runtime information on representative input data to compute the probability with which data flow facts may hold. Probabilistic data flow analysis can guide sophisticated optimizing transformations, resulting in better performance. In comparison, classical data flow analysis does not take runtime information into account: all paths are weighted equally, irrespective of whether they are executed never, rarely, or heavily. In this paper we present the best solution that can theoretically be obtained for probabilistic data flow problems and compare it with the state-of-the-art one-edge approach. We show that the differences can be considerable and that improvements are crucial. However, the theoretically best solution is too expensive in general, and feasible approaches are required. We therefore develop an efficient approach that employs two-edge profiling and classical data flow analysis, and we show that its results are significantly better than those of the state-of-the-art one-edge approach.
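The gap between one-edge and two-edge profiling can be seen with a tiny invented trace: if execution strictly alternates between two correlated paths, single-edge counts suggest all four path combinations occur, while counts of consecutive edge pairs retain the correlation. This is only a sketch of the profiling granularity, not of the authors' analysis itself.

```python
# Hedged sketch (invented edge labels): one-edge vs. two-edge profiles.
from collections import Counter

trace = ["a", "c", "b", "d"] * 100   # edges taken, in execution order

one_edge = Counter(trace)                     # per-edge counts only
two_edge = Counter(zip(trace, trace[1:]))     # consecutive edge pairs

print(one_edge["a"], one_edge["c"])           # 100 100: looks independent,
                                              # as if path a->d also occurred
print(two_edge[("a", "c")], two_edge[("a", "d")])  # 100 0: correlation kept
```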
international conference on parallel architectures and compilation techniques | 2002
Bernhard Scholz; Eduard Mehofer
Efficient use of machine resources in high-performance computer systems requires highly optimizing compilers with sophisticated analyses. Static analysis often fails to identify the frequently executed portions of a program, which are the places where optimizations achieve the greatest benefit. This paper introduces a novel data flow frequency analysis framework that computes the frequency with which a data flow fact will hold at some program point, based on profiling information. Several approaches that approximate these frequencies based on k-edge profiling have been presented; however, no feasible approach for obtaining the accurate solution has existed so far. Recently, efficient techniques for recording whole program paths (WPPs) have been developed. Our approach for computing data flow frequencies yields an accurate solution and utilizes WPPs to obtain it in reasonable time. Our experiments show that, for the SPEC benchmark suite, the execution time of WPP-based frequency analysis is only a fraction of the overall compilation time.
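A brute-force reference point for what "accurate" means here: replaying the recorded path through a one-fact transfer function and counting where the fact holds. The encoding is hypothetical, and the point of the paper is precisely that the same result can be obtained far more efficiently on the compressed WPP representation than on the raw trace replayed below.

```python
# Hedged sketch: exact data flow frequencies by replaying a program path.
from collections import Counter

def fact_frequency(path, gen, kill):
    """path: sequence of executed nodes; gen/kill: node sets for one fact.
    Returns how often the fact held on entry to each node."""
    holds, freq = False, Counter()
    for n in path:
        if holds:
            freq[n] += 1          # fact holds on entry to this instance of n
        holds = True if n in gen else (False if n in kill else holds)
    return freq

path = ["s", "g", "u", "k", "g", "u"]   # hypothetical executed node trace
print(fact_frequency(path, gen={"g"}, kill={"k"}))
# Counter({'u': 2, 'k': 1, 'g': 1})
```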
european conference on parallel processing | 2003
Bernhard Scholz; Eduard Mehofer; R. Nigel Horspool
Partial redundancy elimination (PRE) techniques play an important role in optimizing compilers. Many optimizations, such as elimination of redundant expressions, communication optimizations, and load-reuse optimizations, employ PRE as an underlying technique for improving the efficiency of a program.
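A textbook instance of the redundancy PRE removes, written out as a hypothetical before/after pair: `a + b` is computed on only one path into a join and recomputed afterwards, so it is partially redundant; inserting the computation on the other path makes the recomputation fully redundant and replaceable by a temporary.

```python
# Hedged sketch (invented variables): the classic PRE transformation.

def before(a, b, cond):
    if cond:
        x = a + b      # a + b computed here ...
    else:
        x = a - b
    return x, a + b    # ... and recomputed after the join on the cond path

def after(a, b, cond):
    if cond:
        t = a + b      # existing computation now feeds a temporary
        x = t
    else:
        x = a - b
        t = a + b      # inserted: makes a + b fully available at the join
    return x, t        # recomputation replaced by the temporary
```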
Concurrency and Computation: Practice and Experience | 2001
Thomas Fahringer; Peter Blaha; Andreas Hössinger; J. Luitz; Eduard Mehofer; Hans Moritsch; Bernhard Scholz
Several large real-world applications have been developed for distributed and parallel architectures. We examine two different program development approaches. The first is the use of a high-level programming paradigm, which dramatically reduces the time to create a parallel program, though sometimes at the cost of reduced performance; here, a source-to-source compiler has been employed to automatically compile programs written in the high-level paradigm into message-passing codes. The second is manual program development using a low-level programming paradigm such as message passing, which enables the programmer to fully exploit a given architecture at the cost of a time-consuming and error-prone effort. Performance tools play a central role in supporting the performance-oriented development of applications for distributed and parallel architectures. SCALA, a portable instrumentation, measurement, and post-execution performance analysis system for distributed and parallel programs, has been used to analyze and guide the application development: by selectively instrumenting and measuring the code versions, by comparing performance information of several program executions, by computing a variety of important performance metrics, by detecting performance bottlenecks, and by relating performance information back to the input program. We show several experiments in which SCALA is applied to real-world applications. These experiments are conducted on a NEC Cenju-4 distributed-memory machine and a cluster of heterogeneous workstations and networks.
Journal of Parallel and Distributed Computing | 1999
Thomas Fahringer; Eduard Mehofer
This paper presents a new approach for optimizing communication of data-parallel programs. Our techniques are based on unidirectional bit-vector data flow analyses that enable vectorizing, coalescing, and aggregating communication, and overlapping communication with computation, both within and across loop nests. Previous techniques are based on fixed communication optimization strategies whose quality is very sensitive to changes of machine and problem sizes. Our algorithm is novel in that we carefully examine the tradeoffs between enhancing communication latency hiding and reducing the number and volume of messages, by systematically evaluating a reasonable set of promising communication placements for a given program that covers several (possibly conflicting) objectives guiding communication. We use P3T, a state-of-the-art performance estimator, to ensure communication buffer safety and to select the best of the created communication placements. First results show that our method yields a significant reduction in communication costs and demonstrate the effectiveness of this analysis in improving the performance of programs.
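The underlying machinery is a standard iterative bit-vector analysis. The following generic sketch (invented API, Python ints as bit sets) shows the forward "may" form of such equations, with IN[n] the union of the predecessors' OUT sets and OUT[n] = GEN[n] | (IN[n] & ~KILL[n]), solved to a fixed point over an arbitrary control flow graph; the paper's specific GEN/KILL sets for communication are more involved.

```python
# Hedged sketch: a forward unidirectional bit-vector data flow solver.

def solve_forward_may(nodes, preds, gen, kill):
    """preds: node -> list of predecessors; gen/kill: node -> bit vector
    (plain Python ints used as bit sets, one bit per data flow fact)."""
    IN = {n: 0 for n in nodes}
    OUT = {n: gen[n] for n in nodes}
    changed = True
    while changed:                      # iterate to the least fixed point
        changed = False
        for n in nodes:
            new_in = 0
            for p in preds[n]:
                new_in |= OUT[p]        # meet (union) over predecessors
            new_out = gen[n] | (new_in & ~kill[n])
            if new_in != IN[n] or new_out != OUT[n]:
                IN[n], OUT[n], changed = new_in, new_out, True
    return IN, OUT

# Two facts (bits 0 and 1) on a tiny diamond CFG:
preds = {"s": [], "l": ["s"], "r": ["s"], "j": ["l", "r"]}
gen = {"s": 0b00, "l": 0b01, "r": 0b10, "j": 0b00}
kill = {"s": 0b00, "l": 0b10, "r": 0b01, "j": 0b00}
IN, OUT = solve_forward_may(["s", "l", "r", "j"], preds, gen, kill)
print(bin(IN["j"]))  # 0b11: each fact may reach the join along one path
```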
international conference on parallel architectures and compilation techniques | 1997
Thomas Fahringer; Eduard Mehofer
The paper presents a novel approach to reduce the communication costs of programs for distributed-memory machines. The techniques are based on unidirectional bit-vector data flow analyses that enable vectorizing and coalescing communication, overlapping communication with computation, and eliminating redundant messages and redundant data transfers, both within and across loop nests. The data flow analysis differs from previous techniques in that it does not require explicitly modeling balanced communication placement and loops, and it does not employ interval analysis. The techniques are based on simple yet highly effective data flow equations which are solved iteratively for arbitrary control flow graphs. Moving communication earlier to hide latency has been shown to dramatically increase communication buffer sizes and can even cause run-time errors. The authors use P3T, a state-of-the-art performance estimator, to create a buffer-safe program. By accurately estimating both the required communication buffer sizes and the implied communication times of every single communication in a program, one can selectively choose communications that must be delayed in order to ensure a correct communication placement while maximizing communication latency hiding. Experimental results are presented to demonstrate the efficacy of the communication optimization strategy.
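One way to picture the selection step: hoisting a communication hides latency but ties up buffer space, so under a known capacity some communications must stay at their late placement. The greedy heuristic below is purely illustrative (all names and numbers invented); the paper instead relies on P3T's accurate per-communication buffer and time estimates to decide which placements to delay.

```python
# Hedged sketch: buffer-safe selection of communications to hoist.

def place_communications(comms, capacity):
    """comms: list of (name, buffer_bytes, latency_hidden_if_hoisted).
    Returns (hoisted, delayed), preferring the biggest latency wins."""
    hoisted, delayed, used = [], [], 0
    for name, size, gain in sorted(comms, key=lambda c: -c[2]):
        if used + size <= capacity:     # still buffer-safe: hoist it
            hoisted.append(name)
            used += size
        else:                           # would overflow buffers: delay it
            delayed.append(name)
    return hoisted, delayed

print(place_communications(
    [("A", 40, 9), ("B", 80, 5), ("C", 30, 7)], capacity=100))
# (['A', 'C'], ['B'])
```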
european conference on parallel processing | 1997
Jens Knoop; Eduard Mehofer
Dynamic data redistribution is a key technique for maintaining data locality and workload balance in data-parallel languages like HPF. On the other hand, redistributions can be very expensive and significantly degrade a program's performance. In this article, we present a novel and aggressive approach for avoiding unnecessary remappings by eliminating partially dead and partially redundant distribution changes. Basically, this approach evolves from extending and combining two algorithms for these optimizations, each achieving optimal results for sequential programs. Optimality, however, becomes more intricate through the combination. Unlike the sequential setting, the data-parallel setting leads to a hierarchy of algorithms of varying power and efficiency, fitting a user's individual needs. The power and flexibility of the new approach are demonstrated by illustrative examples. First practical experiences underline its importance and effectiveness.