Network


Latest external collaboration at the country level.

Hotspot


Dive into the research topics where Dror E. Maydan is active.

Publication


Featured research published by Dror E. Maydan.


international symposium on microarchitecture | 1996

Combining loop transformations considering caches and scheduling

Michael Wolf; Dror E. Maydan; Ding-Kai Chen

The performance of modern microprocessors is greatly affected by cache behavior, instruction scheduling, register allocation and loop overhead. High-level loop transformations such as fission, fusion, tiling, interchanging and outer loop unrolling (e.g., unroll and jam) are well known to be capable of improving all these aspects of performance. Difficulties arise because these machine characteristics and these optimizations are highly interdependent. Interchanging two loops might, for example, improve cache behavior but make it impossible to allocate registers in the inner loop. Similarly, unrolling or interchanging a loop might individually hurt performance but doing both simultaneously might help performance. Little work has been published on how to combine these transformations into an efficient and effective compiler algorithm. In this paper, we present a model that estimates total machine cycle time taking into account cache misses, software pipelining, register pressure and loop overhead. We then develop an algorithm to intelligently search through the various possible transformations, using our machine model to select the set of transformations leading to the best overall performance. We have implemented this algorithm as part of the MIPSpro commercial compiler system. We give experimental results showing that our approach is both effective and efficient in optimizing numerical programs.
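As an illustrative sketch (not the paper's search algorithm), the snippet below shows one of the transformations the machine model evaluates: loop tiling. Tiling reorders iterations so a small block of the arrays stays cache-resident, without changing the computed result. The function names and the tile size `T` are illustrative assumptions.

```python
def matmul_naive(A, B, n):
    # Straightforward triple loop: the k-loop streams through all of B.
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_tiled(A, B, n, T=2):
    # Same arithmetic, reordered into T x T x T tiles; each tile touches a
    # small working set of A, B and C, improving cache reuse.
    C = [[0] * n for _ in range(n)]
    for ii in range(0, n, T):
        for jj in range(0, n, T):
            for kk in range(0, n, T):
                for i in range(ii, min(ii + T, n)):
                    for j in range(jj, min(jj + T, n)):
                        for k in range(kk, min(kk + T, n)):
                            C[i][j] += A[i][k] * B[k][j]
    return C
```

Both orders compute identical results; the paper's contribution is the model and search that decide which combination of such transformations (and which parameters, such as tile sizes) pays off on a given machine.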


programming language design and implementation | 1991

Efficient and exact data dependence analysis

Dror E. Maydan; John L. Hennessy; Monica S. Lam

Data dependence testing is the basic step in detecting loop level parallelism in numerical programs. The problem is equivalent to integer linear programming and thus in general cannot be solved efficiently. Current methods in use employ inexact methods that sacrifice potential parallelism in order to improve compiler efficiency. This paper shows that in practice, data dependence can be computed exactly and efficiently. There are three major ideas that lead to this result. First, we have developed and assembled a small set of efficient algorithms, each one exact for special case inputs. Combined with a moderately expensive backup test, they are exact for all the cases we have seen in practice. Second, we introduce a memoization technique to save results of previous tests, thus avoiding calling the data dependence routines multiple times on the same input. Third, we show that this approach can both be extended to compute distance and direction vectors and to use unknown symbolic terms without any loss of accuracy or efficiency. We have implemented our algorithm in the SUIF system, a general purpose compiler system developed at Stanford. We ran the algorithm on the PERFECT Club Benchmarks and our data dependence analyzer gave an exact solution in all cases efficiently.
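A minimal sketch of the two ideas combined, under simplifying assumptions: one classic exact special-case test (the GCD test, for a single affine subscript with loop bounds ignored) wrapped in memoization so repeated queries are answered from a cache. The paper's actual suite of tests and backup test is richer than this.

```python
from functools import lru_cache
from math import gcd

@lru_cache(maxsize=None)   # memoization: identical queries are answered once
def gcd_test(a, b, c, d):
    """May references A[a*i + b] and A[c*j + d] touch the same element?

    Ignoring loop bounds, a*i + b = c*j + d has an integer solution
    if and only if gcd(a, c) divides d - b.
    """
    g = gcd(a, c)
    if g == 0:             # both subscripts are constants
        return b == d
    return (d - b) % g == 0
```

For example, `A[2*i]` and `A[2*j + 1]` can never overlap (2 does not divide 1), while `A[2*i]` and `A[4*j + 2]` can.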


symposium on principles of programming languages | 1993

Array-data flow analysis and its use in array privatization

Dror E. Maydan; Saman P. Amarasinghe; Monica S. Lam

Data-flow analysis of scalar variables and data dependence analysis on array elements are two important program analyses used in optimizing and parallelizing compilers. Traditional data-flow analysis models accesses of array elements simply as accesses to the entire array, and is inadequate for parallelizing loops in array-based programs. On the other hand, data dependence analysis differentiates between different array elements but is flow-insensitive. This paper studies the combination of these two analyses: data-flow analysis of accesses to individual array elements. The problem of finding precise array data-flow information in the domain of loop nests where the loop bounds and array indices are affine functions of loop indices was first formulated by Feautrier. Feautrier's algorithm, based on parametric integer programming techniques, is general but inefficient. This paper presents an efficient algorithm that can find the same precise information for many of the programs found in practice. In this paper, we argue that data-flow analysis of individual array elements is necessary for effective automatic parallelization. In particular, we demonstrate the use of array data-flow analysis in an important optimization known as array privatization. By demonstrating that array data-flow analysis can be computed efficiently and by showing the importance of the optimizations enabled by the analysis, this paper suggests that array data-flow analysis may become just as important in future optimizing and parallelizing compilers as data-flow and data dependence analysis are in today's compilers.
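An illustrative example (not the paper's analysis) of why element-level data-flow information enables privatization: a work array that is completely written before it is read in every iteration. Whole-array dependence analysis sees a loop-carried dependence on `t`; element-wise data-flow analysis proves each iteration's reads are covered by its own writes, so `t` can be privatized and the outer loop run in parallel. The function names are illustrative assumptions.

```python
def serial(a, n, m):
    out = [0] * n
    t = [0] * m                  # one shared work array across iterations
    for i in range(n):
        for j in range(m):
            t[j] = a[i] + j      # every element of t written ...
        out[i] = sum(t)          # ... before any element is read
    return out

def privatized(a, n, m):
    out = [0] * n
    for i in range(n):           # iterations are now independent
        t = [0] * m              # private copy of the work array
        for j in range(m):
            t[j] = a[i] + j
        out[i] = sum(t)
    return out
```

The two versions compute the same result; only the privatized form is safe to parallelize, and only element-level flow analysis can license the transformation automatically.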


programming language design and implementation | 1997

Data distribution support on distributed shared memory multiprocessors

Rohit Chandra; Ding-Kai Chen; Robert Cox; Dror E. Maydan; Nenad Nedeljkovic; Jennifer-Ann M. Anderson

Cache-coherent multiprocessors with distributed shared memory are becoming increasingly popular for parallel computing. However, obtaining high performance on these machines requires that an application execute with good data locality. In addition to making effective use of caches, it is often necessary to distribute data structures across the local memories of the processing nodes, thereby reducing the latency of cache misses. We have designed a set of abstractions for performing data distribution in the context of explicitly parallel programs and implemented them within the SGI MIPSpro compiler system. Our system incorporates many unique features to enhance both programmability and performance. We address the former by providing a very simple programming model with extensive support for error detection. Regarding performance, we carefully design the user abstractions with the underlying compiler optimizations in mind, we incorporate several optimization techniques to generate efficient code for accessing distributed data, and we provide a tight integration of these techniques with other optimizations within the compiler. Our initial experience suggests that the directives are easy to use and can yield substantial performance gains, in some cases by as much as a factor of 3 over the same codes without distribution.
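A minimal sketch (not the SGI implementation) of the address arithmetic behind a block distribution, the kind of mapping such directives describe: an array is split into contiguous chunks, one per processing node, so each chunk lives in that node's local memory. Both function names are illustrative assumptions.

```python
def block_owner(i, n, p):
    """Node owning element i of an n-element array block-distributed over p nodes."""
    block = -(-n // p)           # ceil(n / p): elements per node
    return i // block

def local_index(i, n, p):
    """Offset of element i within its owner's local chunk."""
    block = -(-n // p)
    return i % block
```

Conceptually, the compiler lowers a global reference `a[i]` into an (owner, offset) pair like this; the optimization techniques the paper describes exist to keep that translation cheap on the hot path.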


languages and compilers for parallel computing | 1992

Data Dependence and Data-Flow Analysis of Arrays

Dror E. Maydan; S. Amarasinghe; Monica S. Lam

The power of any compiler is derived from, and also limited by, its program analyzers. Finding the right abstraction for program analysis is crucial in the development of compiler technology. For the abstraction to be useful, it must include sufficient information to support the code optimizations and transformations. In addition, it must be tractable to extract the information from at least a large enough set of programs. Data dependence, distance vectors and direction vectors are important data abstractions that have been proven useful for parallelization and loop transformations, and they are applicable to a large set of programs. The parallelizing compiler research frontier is currently pushing the limits of traditional data dependence analysis. For example, evaluation of today's compiler technology suggests that reuse of data arrays in programs greatly reduces the opportunities for parallelism. Many more loops can be parallelized by privatizing the work arrays, that is, assigning a separate copy of the array to each processor. The information needed to support such an optimization is not just memory disambiguation, that is, if two references can refer to the same location. To determine if each iteration can have its own copy of the array, we need data-flow analysis on individual array elements. Another important research topic is code generation for distributed memory machines. The compiler is responsible for maintaining the coherence of data across processors. Generating efficient distributed memory code requires exact data-flow relationships between accesses to individual array locations. This paper proposes several analysis techniques that are useful for higher level optimizations. As we develop new data dependence abstractions, it is useful to relate the different abstractions in a uniform mathematical framework. We show that the various existing data dependence abstractions can be viewed as simply different approximations to the same dependence problem. For example, we say that two references are dependent as long as one dynamic pair of the instances of the two references can refer to the same location. Dependence levels, direction vectors and distance vectors provide more accurate approximations. We can define the approximations precisely using the mathematical concept of equivalence classes. Simple data dependence is a lower limit of the approximation in that all pairs of instances are said to belong to the same single equivalence class. The upper
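A toy illustration (brute-force enumeration, not a compiler algorithm) of the abstraction hierarchy the paper relates: for the statement `a[i][j] = a[i-1][j+1]` in a small 2-D loop nest, enumerate the dynamic write/read instance pairs that touch the same element, then summarize them by their distance vector. The function name and nest shape are illustrative assumptions.

```python
def distance_vectors(n):
    # Record which iteration writes each element a[i][j].
    writes = {}
    for i in range(1, n):
        for j in range(n - 1):
            writes[(i, j)] = (i, j)          # write of a[i][j]
    # For each read a[i-1][j+1], compute (reader - writer) when both exist.
    deps = set()
    for i in range(1, n):
        for j in range(n - 1):
            src = writes.get((i - 1, j + 1))
            if src is not None:
                deps.add((i - src[0], j - src[1]))
    return deps
```

Every dependent instance pair collapses to the single distance vector (1, -1): the exact pairwise relation is the finest abstraction, and distance vectors, direction vectors and plain dependence are successively coarser equivalence classes over it.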


International Journal of Parallel Programming | 1995

Effectiveness of data dependence analysis

Dror E. Maydan; John L. Hennessy; Monica S. Lam

Data dependence testing is the basic step in detecting loop level parallelism in numerical programs. The problem is undecidable in the general case. Therefore, work has been concentrated on a simplified problem, affine memory disambiguation. In this simpler domain, array references and loop bounds are assumed to be linear integer functions of loop variables. Dataflow information is ignored. For this domain, we have shown that in practice the problem can be solved accurately and efficiently.(1) This paper studies empirically the effectiveness of this domain restriction: how many real references are affine and flow-insensitive. We use Larus's llpp system(2) to find all the data dependences dynamically. We compare these to the results given by our affine memory disambiguation system. This system is exact for all the cases we see in practice. We show that while the affine approximation is reasonable, memory disambiguation is not a sufficient approximation for data dependence analysis. We propose extensions to improve the analysis.
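A hedged sketch of the dynamic half of such a methodology (the study itself used Larus's llpp instrumentation, not this code): log which iteration last touched each memory location during an actual run and report the loop-carried conflicts. A sound static analyzer must report at least the dependences observed this way, which is what makes the comparison meaningful.

```python
def dynamic_conflicts(trace):
    """trace: list of (iteration, address) accesses in program order.
    Returns the set of iteration pairs that touch a common address."""
    last = {}                            # address -> last iteration to touch it
    conflicts = set()
    for it, addr in trace:
        if addr in last and last[addr] != it:
            conflicts.add((last[addr], it))
        last[addr] = it
    return conflicts
```

A run where iterations 0 and 1 both touch `a[0]` yields the single cross-iteration conflict (0, 1); any static test that misses it is unsound, and any static test that reports many pairs beyond the observed ones is losing parallelism.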


Archive | 2009

Generation and Use of an ASIP Software Tool Chain

Sterling Augustine; Marc Gauthier; Steve Leibson; Peter Macliesh; Grant Martin; Dror E. Maydan; Nenad Nedeljkovic; Bob Wilson

Software-development tool chains are hardware-dependent by their nature, because compilers and assemblers targeted to specific processors must generate target-specific code. However, a processor that is both configurable and extensible, with a variable instruction set architecture (ISA) melded to a basic architecture compounds the problems of adapting the software development tools to specific processor configurations. The only tractable way to support such extensible processor ISAs is through a highly automated tool-generation flow that allows the dynamic creation and adaptation of the development-tool chain to a specific instance of the processor. To be of practical use, this process (automated tool generation) must transpire in minutes. This chapter discusses the issues of application-specific instruction set processor (ASIP) configurability and extensibility as they relate to all the elements of a software development tool chain ranging from an integrated development environment (IDE) to compilers, profilers, instruction-set simulators (ISS), operating systems, and many other development tools and middleware. In addition to drawing out the issues involved, we illustrate possible solutions to these hardware-dependent software (HdS) problems by drawing on the experience of developing Tensilica’s Xtensa processor, as an example.


Archive | 2000

Parallel Programming in OpenMP

Rohit Chandra; Leonardo Dagum; Dave Kohr; Dror E. Maydan; Jeff McDonald; Ramesh Menon


Archive | 2001

Automated processor generation system for designing a configurable processor and method for the same

Earl A. Killian; Ricardo E. Gonzalez; Ashish B. Dixit; Monica S. Lam; Walter D. Lichtenstein; Christopher Rowen; John C. Ruttenberg; Robert P. Wilson; Albert Wang; Dror E. Maydan


Archive | 2008

Automated processor generation system and method for designing a configurable processor

Albert Wang; Richard Ruddell; David William Goodwin; Earl A. Killian; Nupur Bhattacharyya; Marines Puig Medina; Walter D. Lichtenstein; Pavlos Konas; Rangarajan Srinivasan; Christopher Mark Songer; Akilesh Parameswar; Dror E. Maydan; Ricardo E. Gonzalez

Collaboration


Dive into Dror E. Maydan's collaboration.

Top Co-Authors

Jeff McDonald

Computer Sciences Corporation
