Publication


Featured research published by Michael G. Burke.


Symposium on Principles of Programming Languages | 1993

Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects

Jong-Deok Choi; Michael G. Burke; Paul R. Carini

We present practical approximation methods for computing interprocedural aliases and side effects for a program written in a language that includes pointers, reference parameters and recursion. We present the following results: 1) An algorithm for flow-sensitive interprocedural alias analysis which is more precise and efficient than the best interprocedural method known. 2) An extension of traditional flow-insensitive alias analysis which accommodates pointers and provides a framework for a family of algorithms which trade off precision for efficiency. 3) An algorithm which correctly computes side effects in the presence of pointers. Pointers cannot be correctly handled by conventional methods for side effect analysis. 4) An alias naming technique which handles dynamically allocated objects and guarantees the correctness of data-flow analysis. 5) A compact representation based on transitive reduction which does not result in a loss of precision and improves precision in some cases. 6) A method for intraprocedural alias analysis which is based on a sparse representation.
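
As one way to picture what a flow-insensitive treatment of pointers involves, here is a minimal, self-contained sketch of a flow-insensitive points-to computation over simple pointer assignments. It is illustrative only and is not the algorithm of this paper; the statement forms, class names, and alias criterion are assumptions made for the example.

    import java.util.*;

    // Illustrative flow-insensitive points-to sketch (not the paper's algorithm):
    // statement order is ignored and facts only grow, so the result is a
    // conservative "may point to" relation from which may-aliases are read off.
    public class FlowInsensitivePointsTo {
        // p = &x  (ADDR), p = q  (COPY)
        record Stmt(String kind, String lhs, String rhs) {}

        static Map<String, Set<String>> solve(List<Stmt> stmts) {
            Map<String, Set<String>> pts = new HashMap<>();
            boolean changed = true;
            while (changed) {                       // iterate to a fixed point
                changed = false;
                for (Stmt s : stmts) {
                    Set<String> lhs = pts.computeIfAbsent(s.lhs(), k -> new HashSet<>());
                    if (s.kind().equals("ADDR")) {
                        changed |= lhs.add(s.rhs());
                    } else {                        // COPY: lhs may point wherever rhs may
                        changed |= lhs.addAll(pts.getOrDefault(s.rhs(), Set.of()));
                    }
                }
            }
            return pts;
        }

        public static void main(String[] args) {
            List<Stmt> prog = List.of(
                new Stmt("ADDR", "p", "x"),
                new Stmt("COPY", "q", "p"),
                new Stmt("ADDR", "q", "y"));
            // p may point to {x}; q may point to {x, y}; p and q may alias
            // because their points-to sets intersect.
            System.out.println(solve(prog));
        }
    }

The precision/efficiency trade-offs the abstract describes come from how such sets, and the statements that feed them, are represented and evaluated.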


Proceedings of the ACM 1999 conference on Java Grande | 1999

The Jalapeño dynamic optimizing compiler for Java

Michael G. Burke; Jong-Deok Choi; Stephen J. Fink; David Grove; Michael Hind; Vivek Sarkar; Mauricio J. Serrano; Vugranam C. Sreedhar; Harini Srinivasan; John Whaley

[Figure 4: Overview of the BC2IR algorithm. Main loop: initialize, choose a basic block from the set, then run the abstract interpretation loop: parse bytecode, update the symbolic state, and rectify the state with successor basic blocks.]

[Figure 5: An example Java program]

    class t1 {
      static float foo(A a, B b, float c1, float c3) {
        float c2 = c1 / c3;
        return (c1 * a.f1 + c2 * a.f2 + c3 * b.f1);
      }
    }

An element-wise meet operation is used on the stack operands to update the symbolic state [38]. When a backward branch whose target is the middle of an already-generated basic block is encountered, the basic block is split at that point. If the stack is not empty at the start of the split BB, the basic block must be regenerated because the initial states may be incorrect. The initial state of a BB may also be incorrect due to as-of-yet-unseen control flow joins. To minimize the number of times HIR is generated for a BB, a simple greedy algorithm is used for selecting BBs in the main loop: the BB with the lowest starting bytecode index is chosen. This simple heuristic relies on the fact that, except for loops, all control flow constructs are generated in topological order, and that the control flow graph is reducible. Surprisingly, for programs compiled with current Java compilers, the greedy algorithm finds the optimal ordering in practice. (The optimal order for basic block generation, i.e., the one that minimizes the number of regenerations, is a topological order ignoring the back edges; however, because BC2IR computes the control flow graph in the same pass, it cannot compute the optimal order a priori.)

Example: Figure 5 shows an example Java source program of class t1, and Figure 6 shows the HIR for method foo of the example. The number in the first column of each HIR instruction is the index of the bytecode from which the instruction is generated. Before compiling class t1, we compiled and loaded class B, but not class A. As a result, the HIR instructions for accessing fields of class A (bytecode indices 7 and 14 in Figure 6) are getfield_unresolved, while the HIR instruction accessing a field of class B (bytecode index 21) is a regular getfield instruction. Also notice that there is only one null_check instruction covering both getfield_unresolved instructions; this is a result of BC2IR's on-the-fly optimizations.

[Figure 6: HIR of method foo(). l and t are virtual registers for local variables and temporary operands, respectively.]

    0  LABEL0 B0@0
    2  float_div            l4(float) = l2(float), l3(float)
    7  null_check           l0(A, NonNull)
    7  getfield_unresolved  t5(float) = l0(A), <A.f1>
    10 float_mul            t6(float) = l2(float), t5(float)
    14 getfield_unresolved  t7(float) = l0(A, NonNull), <A.f2>
    17 float_mul            t8(float) = l4(float), t7(float)
    18 float_add            t9(float) = t6(float), t8(float)
    21 null_check           l1(B, NonNull)
    21 getfield             t10(float) = l1(B), <B.f1>
    24 float_mul            t11(float) = l3(float), t10(float)
    25 float_add            t12(float) = t9(float), t11(float)
    26 float_return         t12(float)
       END_BBLOCK B0@0

5.2 On-the-Fly Analyses and Optimizations

To illustrate our approach to on-the-fly optimizations, we consider copy propagation as an example. Java bytecode often contains sequences that perform a calculation and store the result into a local variable (see Figure 7). A simple copy propagation can eliminate most of the unnecessary temporaries. When storing from a temporary into a local variable, BC2IR inspects the most recently generated instruction. If its result is the same temporary, the instruction is modified to write the value directly to the local variable instead.

[Figure 7: Example of limited copy propagation and dead code elimination]

    Java bytecode    Generated IR (optimization off)    Generated IR (optimization on)
    iload x          INT_ADD t_int, x_int, 5            INT_ADD y_int, x_int, 5
    iconst 5         INT_MOVE y_int, t_int
    iadd
    istore y
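
The limited copy propagation just described can be pictured with a small sketch. The instruction representation and method names below are invented for illustration and do not reflect the actual BC2IR data structures.

    import java.util.*;

    // Minimal sketch of the limited copy propagation shown in Figure 7, under
    // the simplifying assumption that IR instructions are plain mutable records.
    public class LimitedCopyProp {
        static class Instr {
            String op; String result; List<String> uses;
            Instr(String op, String result, List<String> uses) {
                this.op = op; this.result = result; this.uses = uses;
            }
            public String toString() { return op + " " + result + ", " + String.join(", ", uses); }
        }

        List<Instr> generated = new ArrayList<>();

        // Called when translating "istore local" whose value is the temporary `temp`.
        void emitStoreToLocal(String local, String temp) {
            Instr last = generated.isEmpty() ? null : generated.get(generated.size() - 1);
            if (last != null && temp.equals(last.result)) {
                last.result = local;            // write directly into the local variable
            } else {
                generated.add(new Instr("INT_MOVE", local, List.of(temp)));
            }
        }

        public static void main(String[] args) {
            LimitedCopyProp gen = new LimitedCopyProp();
            // iload x; iconst 5; iadd; istore y  ==>  INT_ADD y_int, x_int, 5  (no INT_MOVE)
            gen.generated.add(new Instr("INT_ADD", "t_int", List.of("x_int", "5")));
            gen.emitStoreToLocal("y_int", "t_int");
            gen.generated.forEach(System.out::println);
        }
    }
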
Other optimizations such as constant propagation, dead code elimination, register renaming for local variables, method inlining, etc. are performed during the translation process. Further details are provided in [38].

6 Jalapeño Optimizing Compiler Back-end

In this section, we describe the back-end of the Jalapeño Optimizing Compiler.

6.1 Lowering of the IR

After high-level analyses and optimizations are performed, HIR is lowered to low-level IR (LIR). In contrast to HIR, the LIR expands instructions into operations that are specific to the Jalapeño JVM implementation, such as object layouts or the parameter-passing mechanisms of the Jalapeño JVM. For example, operations in HIR to invoke methods of an object or of a class consist of a single instruction, closely matching the corresponding bytecode instructions such as invokevirtual/invokestatic. These single-instruction HIR operations are lowered (i.e., converted) into multiple-instruction LIR operations that invoke the methods based on the virtual-function-table layout. These multiple LIR operations expose more opportunities for low-level optimizations.

[Figure 8: LIR of method foo()]

    0  LABEL0 B0@0
    2  float_div            l4(float) = l2(float), l3(float)           (n1)
    7  null_check           l0(A, NonNull)                             (n2)
    7  getfield_unresolved  t5(float) = l0(A), <A.f1>                  (n3)
    10 float_mul            t6(float) = l2(float), t5(float)           (n4)
    14 getfield_unresolved  t7(float) = l0(A, NonNull), <A.f2>         (n5)
    17 float_mul            t8(float) = l4(float), t7(float)           (n6)
    18 float_add            t9(float) = t6(float), t8(float)           (n7)
    21 null_check           l1(B, NonNull)                             (n8)
    21 float_load           t10(float) = @{ l1(B), -16 }               (n9)
    24 float_mul            t11(float) = l3(float), t10(float)         (n10)
    25 float_add            t12(float) = t9(float), t11(float)         (n11)
    26 return               t12(float)                                 (n12)
       END_BBLOCK B0@0

Example: Figure 8 shows the LIR for method foo of the example in Figure 5. The labels (n1) through (n12) on the far right of each instruction indicate the corresponding node in the data dependence graph shown in Figure 9.

6.2 Dependence Graph Construction

We construct an instruction-level dependence graph, used during BURS code generation (Section 6.3), for each basic block. The graph captures register true/anti/output dependences, memory true/anti/output dependences, and control dependences. The current implementation of memory dependences makes conservative assumptions about alias information. Synchronization constraints are modeled by introducing synchronization dependence edges between synchronization operations (monitorenter and monitorexit) and memory operations. These edges prevent code motion of memory operations across synchronization points. Java exception semantics [29] is modeled by exception dependence edges, which connect different exception points in a basic block. Exception dependence edges are also added between register write operations of local variables and exception points in the basic block. Exception dependence edges between register operations and exception points need not be added if the corresponding method does not have catch blocks.

[Figure 9: Dependence graph of the basic block in method foo(), showing register-true, exception, and control dependence edges among nodes n1 through n12.]
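
To make the dependence graph construction concrete, here is a toy sketch restricted to register true dependences within one basic block; the real Jalapeño graph also records anti/output, memory, control, exception, and synchronization edges, and the types below are illustrative only.

    import java.util.*;

    // Toy sketch of per-basic-block dependence edge construction, restricted to
    // register true dependences (a definition followed by a use of the same register).
    public class RegTrueDeps {
        record Instr(int id, Set<String> defs, Set<String> uses) {}
        record Edge(int from, int to, String kind) {}

        static List<Edge> build(List<Instr> block) {
            List<Edge> edges = new ArrayList<>();
            Map<String, Integer> lastDef = new HashMap<>();
            for (Instr i : block) {
                for (String u : i.uses()) {
                    Integer def = lastDef.get(u);
                    if (def != null) edges.add(new Edge(def, i.id(), "reg_true"));
                }
                for (String d : i.defs()) lastDef.put(d, i.id());
            }
            return edges;
        }

        public static void main(String[] args) {
            List<Instr> bb = List.of(
                new Instr(1, Set.of("t5"), Set.of("l0")),        // getfield  t5 = l0.f1
                new Instr(2, Set.of("t6"), Set.of("l2", "t5")),  // float_mul t6 = l2, t5
                new Instr(3, Set.of("t9"), Set.of("t6", "t8"))); // float_add t9 = t6, t8
            // Prints 1 -> 2 and 2 -> 3 as reg_true edges.
            build(bb).forEach(e -> System.out.println(e.from() + " -> " + e.to() + " [" + e.kind() + "]"));
        }
    }
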
This precise modeling of dependence constraints allows us to perform more aggressive code generation.

Example: Figure 9 shows the dependence graph for the single basic block in method foo() of Figure 5. The graph, constructed from the LIR for the method, shows register-true dependence edges, exception dependence edges, and a control dependence edge from the first instruction to the last instruction in the basic block. There are no memory dependence edges because the basic block contains only loads and no stores, and we do not currently model load-load input dependences. (The addition of load-load memory dependences will be necessary to correctly support the Java memory model for multithreaded programs that contain data races.) An exception dependence edge is created between an instruction that tests for an exception (such as null_check) and an instruction that depends on the result of the test (such as getfield).

6.3 BURS-based Retargetable Code Generation

In this section, we address the problem of using tree-pattern-matching systems to perform retargetable code generation after code optimization in the Jalapeño Optimizing Compiler [33]. Our solution is based on partitioning a basic block dependence graph (defined in Section 6.2) into trees that can be given as input to a BURS-based tree-pattern-matching system [15]. Unlike previous approaches to partitioning DAGs for tree pattern matching (e.g., [17]), our approach considers partitioning in the presence of memory and exception dependences (not just register-true dependences). We have defined legality constraints for this partitioning, and developed a partitioning algorithm that incorporates code duplication.

[Figure 10: Example of tree pattern matching for the PowerPC, showing the input LIR, the DAG/tree built from it, the relevant grammar rules, and the emitted instructions.]

    input LIR:
      move r2 = r0
      not  r3 = r1
      and  r4 = r2, r3
      cmp  r5 = r4, 0
      if   r5, !=, LBL

    input grammar (relevant rules):
      RULE  PATTERN                                  COST
      1     reg: REGISTER                            0
      2     reg: MOVE(reg)                           0
      3     reg: NOT(reg)                            1
      4     reg: AND(reg, reg)                       1
      5     reg: CMP(reg, INTEGER)                   1
      6     stm: IF(reg)                             1
      7     stm: IF(CMP(AND(reg, NOT(reg)), ZERO))   2

    emitted instructions:
      andc. r4, r0, r1
      bne   LBL

Figure 10 shows a simple example of pattern matching for the PowerPC. The data dependence graph is partitioned into trees before using BURS. Then, pattern matching is applied on the trees using a grammar (relevant fragments are illustrated in Figure 10). Each grammar rule has an associated cost, in this case the number of instructions that the rule will generate. For example, rule 2 has a zero cost because it is used to eliminate unnecessary register moves, i.e., coalescing. Although rules 3, 4, 5, and 6 could be used to parse the tree, the pattern matching selects rules 1, 2, and 7 as the least-cost cover of the tree. Once these rules are selected, the selected code is emitted as MIR instructions. Thus, for our example, only two PowerPC instructions are emitted.
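
For a rough feel of cost-driven rule selection, the sketch below labels a tree bottom-up with one rule per operator. It deliberately omits the multi-level patterns (such as rule 7 above) and the DAG-partitioning step that make real BURS-based selection cheaper; all names and the rule table are assumptions for the example.

    import java.util.*;

    public class TinyBurs {
        record Node(String op, List<Node> kids) {}
        record Rule(String op, int cost, String emit) {}

        // One single-operator rule per opcode; real BURS grammars also contain
        // multi-level patterns such as rule 7 above and pick the cheapest cover.
        static final List<Rule> RULES = List.of(
            new Rule("REGISTER", 0, ""),
            new Rule("INTEGER", 0, ""),
            new Rule("MOVE", 0, ""),       // coalesced away, like rule 2 above
            new Rule("NOT", 1, "not"),
            new Rule("AND", 1, "and"),
            new Rule("CMP", 1, "cmp"),
            new Rule("IF", 1, "bc"));

        // Bottom-up labeling: sum the children's costs, apply this node's rule,
        // and emit one instruction for every rule with nonzero cost.
        static int label(Node n, List<String> out) {
            int cost = 0;
            for (Node k : n.kids()) cost += label(k, out);
            Rule r = RULES.stream().filter(x -> x.op().equals(n.op())).findFirst().orElseThrow();
            if (r.cost() > 0) out.add(r.emit());
            return cost + r.cost();
        }

        public static void main(String[] args) {
            // The tree from Figure 10: IF(CMP(AND(MOVE(r0), NOT(r1)), 0))
            Node tree = new Node("IF", List.of(new Node("CMP", List.of(
                new Node("AND", List.of(
                    new Node("MOVE", List.of(new Node("REGISTER", List.of()))),
                    new Node("NOT", List.of(new Node("REGISTER", List.of()))))),
                new Node("INTEGER", List.of())))));
            List<String> code = new ArrayList<>();
            int cost = label(tree, code);
            // Prints cost = 4, code = [not, and, cmp, bc]; the multi-level rule 7
            // covers the same tree with only 2 instructions, which is why BURS matching pays off.
            System.out.println("cost = " + cost + ", code = " + code);
        }
    }
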


Compiler Construction | 1986

Interprocedural dependence analysis and parallelization

Michael G. Burke; Ron K. Cytron

The area of dependence analysis has served as grounds for fruitful research as well as practical implementation. Compilers and tools that utilize dependence information can generate code that takes advantage of parallel resources and storage hierarchies on modern architectures. Here, we offer some historical background on the context and thinking that fostered our 1986 paper. We also attempt to summarize the direction research in this area has taken since the paper's appearance. We present a method that combines a deep analysis of program dependences with a broad analysis of the interaction among procedures. The method is more efficient than existing methods: we reduce many tests, performed separately by existing methods, to a single test. The method is more precise than existing methods with respect to references to multi-dimensional arrays and dependence information hidden by procedure calls. The method is more general than existing methods: we accommodate potentially aliased variables and structures of differing shapes that share storage. We accomplish the above through a unified approach that integrates subscript analysis with aliasing and interprocedural information.
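
As background for the kind of subscript analysis the abstract mentions, the following is a textbook single-subscript GCD dependence test; it is a standard illustration, not the combined method of this paper.

    // Illustrative GCD dependence test: for accesses A[a*i + b] and A[c*i + d]
    // inside a loop, a dependence can exist only if gcd(a, c) divides (d - b).
    public class GcdTest {
        static boolean mayDepend(int a, int b, int c, int d) {
            int g = gcd(Math.abs(a), Math.abs(c));
            return g == 0 ? b == d : (d - b) % g == 0;
        }
        static int gcd(int x, int y) { return y == 0 ? x : gcd(y, x % y); }

        public static void main(String[] args) {
            // A[2*i] written and A[2*i + 1] read: gcd(2,2)=2 does not divide 1, so no dependence.
            System.out.println(mayDepend(2, 0, 2, 1));   // false
            // A[2*i] and A[4*i + 2]: gcd(2,4)=2 divides 2, so a dependence may exist.
            System.out.println(mayDepend(2, 0, 4, 2));   // true
        }
    }
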


International Conference on Supercomputing | 2014

An overview of the PTRAN analysis system for multiprocessing

Frances E. Allen; Michael G. Burke; Philippe Charles; Ron Cytron; Jeanne Ferrante

PTRAN (Parallel TRANslator) is a system for automatically restructuring sequential FORTRAN programs for execution on parallel architectures. This paper describes PTRAN-A: the currently operational analysis phase of PTRAN. The analysis is both broad and deep, incorporating interprocedural information into dependence analysis. The system is organized around a persistent database of program and procedure information. PTRAN incorporates several new, fast algorithms in a pragmatic design.


ACM Transactions on Programming Languages and Systems | 1999

Interprocedural pointer alias analysis

Michael Hind; Michael G. Burke; Paul R. Carini; Jong-Deok Choi

We present practical approximation methods for computing and representing interprocedural aliases for a program written in a language that includes pointers, reference parameters, and recursion. We present the following contributions: (1) a framework for interprocedural pointer alias analysis that handles function pointers by constructing the program call graph while alias analysis is being performed; (2) a flow-sensitive interprocedural pointer alias analysis algorithm; (3) a flow-insensitive interprocedural pointer alias analysis algorithm; (4) a flow-insensitive interprocedural pointer alias analysis algorithm that incorporates kill information to improve precision; (5) empirical measurements of the efficiency and precision of the three interprocedural alias analysis algorithms.
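
A rough sketch of the idea behind the first contribution, adding call edges from function-pointer targets as they are discovered during the analysis, might look as follows; the data structures and names are assumptions for illustration only.

    import java.util.*;

    // Sketch: an indirect call through a function pointer adds an edge for every
    // function the pointer may currently point to, and each newly discovered
    // callee is queued for (re)analysis, so the call graph grows with the aliases.
    public class OnTheFlyCallGraph {
        Map<String, Set<String>> pointsTo = new HashMap<>();   // fp -> {functions}
        Map<String, Set<String>> callGraph = new HashMap<>();  // caller -> callees
        Deque<String> worklist = new ArrayDeque<>();

        void indirectCall(String caller, String funcPtr) {
            for (String callee : pointsTo.getOrDefault(funcPtr, Set.of())) {
                if (callGraph.computeIfAbsent(caller, k -> new HashSet<>()).add(callee)) {
                    worklist.add(callee);   // new edge: (re)analyze the callee
                }
            }
        }

        public static void main(String[] args) {
            OnTheFlyCallGraph cg = new OnTheFlyCallGraph();
            cg.pointsTo.put("fp", Set.of("foo", "bar"));   // fp may point to foo or bar
            cg.indirectCall("main", "fp");
            System.out.println(cg.callGraph);   // {main=[foo, bar]} (set order may vary)
            System.out.println(cg.worklist);
        }
    }
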


ACM Transactions on Programming Languages and Systems | 1990

An interval-based approach to exhaustive and incremental interprocedural data-flow analysis

Michael G. Burke

We reformulate interval analysis so that it can be applied to any monotone data-flow problem, including the nonfast problems of flow-insensitive interprocedural analysis. We then develop an incremental interval analysis technique that can be applied to the same class of problems. When applied to flow-insensitive interprocedural data-flow problems, the resulting algorithms are simple, practical, and efficient. With a single update, the incremental algorithm can accommodate any sequence of program changes that does not alter the structure of the program call graph. It can also accommodate a large class of structural changes. For alias analysis, we develop an incremental algorithm that obtains the exact solution as computed by an exhaustive algorithm. Finally, we develop a transitive closure algorithm that is particularly well suited to the very sparse matrices associated with the problems we address.
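
The very sparse matrices the abstract mentions are naturally stored as adjacency lists; the generic reachability computation below illustrates transitive closure over that representation, and is a standard formulation rather than the specific algorithm developed in the paper.

    import java.util.*;

    // Reachability-based transitive closure over a sparse adjacency-list graph,
    // e.g. a program call graph: reach(n) is the set of nodes reachable from n.
    public class SparseTransitiveClosure {
        static Map<String, Set<String>> closure(Map<String, List<String>> adj) {
            Map<String, Set<String>> reach = new HashMap<>();
            for (String start : adj.keySet()) {
                Set<String> seen = new LinkedHashSet<>();
                Deque<String> stack = new ArrayDeque<>(adj.get(start));
                while (!stack.isEmpty()) {                  // depth-first search from each node
                    String n = stack.pop();
                    if (seen.add(n)) stack.addAll(adj.getOrDefault(n, List.of()));
                }
                reach.put(start, seen);
            }
            return reach;
        }

        public static void main(String[] args) {
            Map<String, List<String>> callGraph = Map.of(
                "main", List.of("p"), "p", List.of("q"), "q", List.of());
            // reach sets: main -> {p, q}, p -> {q}, q -> {}
            System.out.println(closure(callGraph));
        }
    }
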


Scientific Programming - Exploring Languages for Expressing Medium to Massive On-Chip Parallelism | 2010

Concurrent Collections

Zoran Budimlic; Michael G. Burke; Vincent Cavé; Kathleen Knobe; Geoff Lowney; Ryan R. Newton; Jens Palsberg; David M. Peixotto; Vivek Sarkar; Frank Schlimbach; Sagnak Tasirlar

We introduce the Concurrent Collections (CnC) programming model. CnC supports flexible combinations of task and data parallelism while retaining determinism. CnC is implicitly parallel, with the user providing high-level operations along with semantic ordering constraints that together form a CnC graph. We formally describe the execution semantics of CnC and prove that the model guarantees deterministic computation. We evaluate the performance of CnC implementations on several applications and show that CnC offers performance and scalability equivalent to or better than that offered by lower-level parallel programming models.
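
One ingredient behind CnC's determinism guarantee is dynamic single assignment of data items: each item is put exactly once, so the value a step observes cannot depend on scheduling order. The toy collection below illustrates only that property and is not the CnC API; its names are invented for the example.

    import java.util.concurrent.*;

    // Not the CnC API: a toy single-assignment "item collection" illustrating the
    // dynamic-single-assignment property that underlies CnC's determinism.
    public class SingleAssignmentItems<K, V> {
        private final ConcurrentMap<K, V> items = new ConcurrentHashMap<>();

        public void put(K tag, V value) {
            if (items.putIfAbsent(tag, value) != null)
                throw new IllegalStateException("item " + tag + " already written");
        }

        public V get(K tag) { return items.get(tag); }   // a real runtime would defer the step until available

        public static void main(String[] args) {
            SingleAssignmentItems<Integer, String> cells = new SingleAssignmentItems<>();
            cells.put(1, "hello");
            System.out.println(cells.get(1));
            // cells.put(1, "world");   // would throw: violates dynamic single assignment
        }
    }
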


Journal of Parallel and Distributed Computing | 1988

An overview of the PTRAN analysis system for multiprocessing

Frances E. Allen; Michael G. Burke; Philippe Charles; Ron Cytron; Jeanne Ferrante

PTRAN (Parallel TRANslator) is a system for automatically restructuring sequential FORTRAN programs for execution on parallel architectures. This paper describes PTRAN-A: the currently operational analysis phase of PTRAN. The analysis is both broad and deep, incorporating interprocedural information into dependence analysis. The system is organized around a persistent database of program and procedure information. PTRAN incorporates several new, fast algorithms in a pragmatic design.


Languages and Compilers for Parallel Computing | 1994

Flow-Insensitive Interprocedural Alias Analysis in the Presence of Pointers

Michael G. Burke; Paul R. Carini; Jong-Deok Choi; Michael Hind

Data-flow analysis algorithms can be classified into two categories: flow-sensitive and flow-insensitive. To improve efficiency, flow-insensitive interprocedural analyses do not make use of the intraprocedural control flow information associated with individual procedures. Since pointer-induced aliases can change within a procedure, applying known flow-insensitive analyses can result in either incorrect or overly conservative solutions. In this paper, we present a flow-insensitive data-flow analysis algorithm that computes interprocedural pointer-induced aliases. We improve the precision of our analysis by (1) making use of certain types of kill information that can be precomputed efficiently, and (2) computing the aliases generated in each procedure instead of the aliases holding at the exit of each procedure. We improve the efficiency of our algorithm by introducing a technique called deferred evaluation.
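
To illustrate how precomputed kill information can sharpen a flow-insensitive result, here is a loose sketch that filters out alias pairs involving variables known to be unconditionally overwritten in a procedure; it is an illustration of the idea only, not the paper's algorithm.

    import java.util.*;

    // Sketch: alias pairs whose members are certainly killed in the procedure are
    // dropped before the procedure's effects are propagated onward.
    public class KillFiltering {
        record AliasPair(String a, String b) {}

        static Set<AliasPair> applyKill(Set<AliasPair> incoming, Set<String> killed) {
            Set<AliasPair> out = new HashSet<>();
            for (AliasPair p : incoming) {
                if (!killed.contains(p.a()) && !killed.contains(p.b())) out.add(p);
            }
            return out;
        }

        public static void main(String[] args) {
            Set<AliasPair> in = Set.of(new AliasPair("*p", "x"), new AliasPair("*q", "y"));
            // p is unconditionally reassigned in the procedure, so <*p, x> no longer holds;
            // only <*q, y> survives the kill filter.
            System.out.println(applyKill(in, Set.of("*p")));
        }
    }
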


Programming Language Design and Implementation | 2000

A framework for interprocedural optimization in the presence of dynamic class loading

Vugranam C. Sreedhar; Michael G. Burke; Jong-Deok Choi

Dynamic class loading during program execution in the Java Programming Language is an impediment to generating code that is as efficient as code generated using static whole-program analysis and optimization. Whole-program analysis and optimization is possible for languages, such as C++, that do not allow new classes and/or methods to be loaded during program execution. One solution for performing whole-program analysis and avoiding incorrect execution after a new class is loaded is to invalidate and recompile affected methods. Runtime invalidation and recompilation mechanisms can be expensive in both space and time and, therefore, generally restrict optimization. To address these drawbacks, we propose a new framework, called the extant analysis framework, for interprocedural optimization of programs that support dynamic class (or method) loading. Given a set of classes comprising the closed world, we perform an offline static analysis which partitions references into two categories: (1) unconditionally extant references, which point only to objects whose runtime type is guaranteed to be in the closed world; and (2) conditionally extant references, which point to objects whose runtime type is not guaranteed to be in the closed world. Optimizations solely dependent on the first category can be statically performed, and are guaranteed to be correct even with any future class/method loading. Optimizations dependent on the second category are guarded by dynamic tests, called extant safety tests, for correct execution behavior. We describe the properties of extant safety tests, and provide algorithms for their generation and placement.
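
The flavor of a guarded optimization can be sketched as follows. The exact-class check merely stands in for an extant safety test and the classes are invented, so this is only an illustration of the framework's general shape, not its actual tests or placement algorithms.

    // Sketch: a cheap runtime guard picks between a specialized fast path and the
    // general virtual call, so later class loading cannot make the code incorrect.
    public class GuardedDevirtualization {
        static class Shape { double area() { return 0.0; } }
        static final class Circle extends Shape {
            final double r; Circle(double r) { this.r = r; }
            @Override double area() { return Math.PI * r * r; }
        }

        static double areaOptimized(Shape s) {
            if (s.getClass() == Circle.class) {          // guard: a known, closed-world type
                Circle c = (Circle) s;
                return Math.PI * c.r * c.r;              // inlined, specialized fast path
            }
            return s.area();                              // fallback for types loaded later
        }

        public static void main(String[] args) {
            System.out.println(areaOptimized(new Circle(2.0)));
        }
    }
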

