Andreas Krall

Vienna University of Technology

Publication

Featured research published by Andreas Krall.

International Journal of Parallel Programming | 2000

Compilation techniques for multimedia processors

Andreas Krall; Sylvain Lelait

The huge processing power needed by multimedia applications has led to multimedia extensions in the instruction set of microprocessors which exploit subword parallelism. Examples of these extended instruction sets are the Visual Instruction Set of the UltraSPARC processor, the AltiVec instruction set of the PowerPC processor, the MMX and ISS extensions of the Pentium processors, and the MAX-2 instruction set of the HP PA-RISC processor. Currently, these extensions can only be used by programs written in assembly language, through system libraries or by calling specialized macros in a high-level language. Therefore, these instructions are not used by most applications. We propose two code generation techniques to produce native code using these multimedia extensions for programs written in a high-level language: classical vectorization and vectorization by unrolling. Vectorization by unrolling is simpler than classical vectorization since data dependence analysis is reduced to acyclic control flow graph analysis. Furthermore, we address the problem of unaligned memory accesses. This can be handled by both static analysis and dynamic runtime checking. Preliminary experimental results for a code generator for the UltraSPARC VIS instruction set show that speedups of up to a factor of 4.8 are possible, and that vectorization by unrolling is much simpler but as effective as classical vectorization.
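
The subword parallelism mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's code generator and does not use VIS instructions. It only emulates, in plain Python, one packed add over four 8-bit lanes held in a 32-bit word, and a loop processed four elements at a time in the spirit of vectorization by unrolling. The names `packed_add_u8` and `add_arrays` are hypothetical.

```python
# Illustrative only: emulates a packed 8-bit add on a 32-bit word.
LOW7 = 0x7F7F7F7F   # low seven bits of each 8-bit lane
HIGH = 0x80808080   # top bit of each 8-bit lane


def packed_add_u8(a: int, b: int) -> int:
    """Add four unsigned 8-bit lanes packed in 32-bit words, wrapping per lane."""
    # Adding only the low seven bits keeps every carry inside its own lane.
    low_sum = (a & LOW7) + (b & LOW7)
    # The top bit of each lane is the XOR of both top bits and the carry into it.
    return (low_sum ^ ((a ^ b) & HIGH)) & 0xFFFFFFFF


def add_arrays(xs, ys):
    """Element-wise byte add, handling four elements per loop iteration."""
    assert len(xs) == len(ys) and len(xs) % 4 == 0
    out = []
    for i in range(0, len(xs), 4):
        a = int.from_bytes(bytes(xs[i:i + 4]), "little")
        b = int.from_bytes(bytes(ys[i:i + 4]), "little")
        out.extend(packed_add_u8(a, b).to_bytes(4, "little"))
    return out
```

Here `add_arrays([250, 1, 2, 3], [10, 1, 1, 1])` yields `[4, 2, 3, 4]`: the first lane wraps around without disturbing its neighbours.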

International Conference on Parallel Architectures and Compilation Techniques | 1998

Efficient JavaVM just-in-time compilation

Andreas Krall

Conventional compilers are designed for producing highly optimized code without paying much attention to compile time. The design goals of Java just-in-time compilers are different: produce fast code at the smallest possible compile time. In this article we present a very fast algorithm for translating JavaVM byte code to high quality machine code for RISC processors. This algorithm combines instructions, performs copy elimination and coalescing, and does register allocation. It comprises three passes: basic block determination; stack analysis and register preallocation; and final register allocation and machine code generation. This algorithm replaces an older one in the CACAO JavaVM implementation, reducing the compile time by a factor of seven and producing slightly faster machine code. The speedup comes mainly from the following simplifications: fixed assignment of registers at basic block boundaries, a simple register allocator, better exception handling, better memory management, and fine tuning of the implementation. The CACAO system is currently faster than every JavaVM implementation for the Alpha processor and generates machine code for all used methods of the javac compiler and its libraries in 60 milliseconds on an Alpha workstation.

Conference on Object-Oriented Programming Systems, Languages, and Applications | 1997

Efficient type inclusion tests

Jan Vitek; R. Nigel Horspool; Andreas Krall

A type inclusion test determines whether one type is a subtype of another. Efficient type testing techniques exist for single subtyping, but not for languages with multiple subtyping. To date, the fast constant-time technique relies on a binary matrix encoding of the subtype relation with quadratic space requirements. In this paper, we present three new encodings of the subtype relation: the packed encoding (PE), the bit-packed encoding (BPE) and the compact encoding (CE). These encodings have different characteristics. The bit-packed encoding delivers the best compression rates: on average 85% for real-life programs. The packed encoding performs type inclusion tests in only 4 machine instructions. We present a fast algorithm for computing these encodings, which runs in less than 13 milliseconds for PE and BPE, and 23 milliseconds for CE on an Alpha processor. Finally, we compare our results with other constant-time type inclusion tests on a suite of 11 large benchmark hierarchies.
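
The binary matrix baseline that this abstract refers to can be sketched as follows. This is an illustration of the quadratic-space matrix idea only, not of the packed, bit-packed or compact encodings from the paper. The function names and the example hierarchy are hypothetical, and the hierarchy is assumed to be acyclic.

```python
def build_subtype_matrix(parents):
    """Build a bit matrix for a type hierarchy.

    parents maps each type name to a list of its direct supertypes.
    Returns (index, rows), where bit j of rows[i] is set when type i
    is a subtype of type j. Space is quadratic in the number of types.
    """
    index = {name: i for i, name in enumerate(parents)}
    rows = [0] * len(parents)
    done = set()

    def visit(name):
        if name in done:
            return rows[index[name]]
        row = 1 << index[name]          # every type is a subtype of itself
        for parent in parents[name]:
            row |= visit(parent)        # inherit all supertypes of each parent
        rows[index[name]] = row
        done.add(name)
        return row

    for name in parents:
        visit(name)
    return index, rows


def is_subtype(index, rows, sub, sup):
    """Constant-time test: a single bit lookup in the matrix."""
    return bool((rows[index[sub]] >> index[sup]) & 1)


# Hypothetical hierarchy with multiple subtyping.
HIERARCHY = {
    "Object": [],
    "Comparable": ["Object"],
    "Printable": ["Object"],
    "Number": ["Comparable", "Printable"],
}
INDEX, ROWS = build_subtype_matrix(HIERARCHY)
```

With this hierarchy, `is_subtype(INDEX, ROWS, "Number", "Printable")` is true and `is_subtype(INDEX, ROWS, "Comparable", "Printable")` is false. The test itself is cheap; the cost is one row of bits per type, which is the space problem the abstract's encodings address.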

Concurrency and Computation: Practice and Experience | 1997

CACAO — A 64-bit JavaVM just-in-time compiler

Andreas Krall; Reinhard Grafl

This paper describes the design and implementation of CACAO, a just-in-time compiler for Java. The CACAO system translates Java byte code on demand into native code for the ALPHA processor. During this translation process the stack-oriented Java byte code is transformed into a register-oriented intermediate code. Local variables and stack locations are replaced by pseudo-registers eliminating the 32-bit restriction on address types. A fast register allocation algorithm is applied to map the pseudo-registers to machine registers. During code generation, field offsets are computed for proper alignment on 64-bit architectures. Even though the CACAO system has to incur loading and compilation time, it executes Java programs up to 85 times faster than the JDK interpreter, and up to seven times faster than the kaffe JIT compiler. It is slightly slower than equivalent C programs compiled at the highest optimization level.
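
The transformation from stack-oriented code to register-oriented code described in this abstract can be sketched on a toy instruction set. The opcodes (`load`, `const`, `add`, `mul`, `store`) and the function name `stack_to_registers` are assumptions made for illustration; they are not JavaVM byte codes or CACAO's intermediate representation, and the sketch does no register allocation or copy elimination.

```python
def stack_to_registers(code):
    """Translate toy stack code into three-address code.

    code is a list of tuples such as ("load", "x"), ("const", 2),
    ("add",), ("mul",) or ("store", "y"). Each stack slot is replaced
    by a fresh pseudo-register t1, t2, ...
    """
    out = []
    stack = []      # compile-time model of the operand stack (register names)
    counter = 0

    def fresh():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    for ins in code:
        op = ins[0]
        if op in ("load", "const"):
            reg = fresh()
            out.append(f"{reg} = {ins[1]}")
            stack.append(reg)
        elif op in ("add", "mul"):
            right = stack.pop()
            left = stack.pop()
            reg = fresh()
            symbol = "+" if op == "add" else "*"
            out.append(f"{reg} = {left} {symbol} {right}")
            stack.append(reg)
        elif op == "store":
            out.append(f"{ins[1]} = {stack.pop()}")
        else:
            raise ValueError(f"unknown opcode: {op}")
    return out
```

For the stack code of `y = (x + 2) * z`, that is `load x`, `const 2`, `add`, `load z`, `mul`, `store y`, the sketch emits `t1 = x`, `t2 = 2`, `t3 = t1 + t2`, `t4 = z`, `t5 = t3 * t4`, `y = t5`. The operand stack exists only at translation time; the emitted code refers to pseudo-registers alone.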

Software - Practice and Experience | 2002

Vmgen: a generator of efficient virtual machine interpreters

M. Anton Ertl; David Gregg; Andreas Krall; Bernd Paysan

In a virtual machine interpreter, the code for each virtual machine instruction has similarities to code for other instructions. We present an interpreter generator that takes simple virtual machine instruction descriptions as input and generates C code for processing the instructions in several ways: execution, virtual machine code generation, disassembly, tracing, and profiling. The generator is designed to support efficient interpreters: it supports threaded code, caching the top‐of‐stack item in a register, combining simple instructions into superinstructions, and other optimizations. We have used the generator to create interpreters for Forth and Java. The resulting interpreters are faster than other interpreters for the same languages and they are typically 2–10 times slower than code produced by native‐code compilers. We also present results for the effects of the individual optimizations supported by the generator.
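
The idea of combining simple instructions into superinstructions can be sketched with a toy stack machine. This is not Vmgen output or its input syntax. The opcodes `lit`, `add` and `lit_add` and the functions `combine` and `run` are hypothetical, and the toy machine has no branches, so the combiner does not need to worry about jump targets.

```python
def combine(program):
    """Replace each adjacent ("lit", n), ("add",) pair by one ("lit_add", n)."""
    out = []
    i = 0
    while i < len(program):
        is_pair = (
            i + 1 < len(program)
            and program[i][0] == "lit"
            and program[i + 1] == ("add",)
        )
        if is_pair:
            out.append(("lit_add", program[i][1]))
            i += 2
        else:
            out.append(program[i])
            i += 1
    return out


def run(program):
    """Interpret a program; return the final stack and the dispatch count."""
    stack = []
    dispatches = 0
    for ins in program:
        dispatches += 1
        op = ins[0]
        if op == "lit":
            stack.append(ins[1])
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "lit_add":
            stack[-1] += ins[1]     # superinstruction: no push/pop pair needed
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack, dispatches


PROGRAM = [("lit", 1), ("lit", 2), ("add",), ("lit", 3), ("add",)]
```

Running `PROGRAM` directly takes five dispatches, while `run(combine(PROGRAM))` reaches the same stack `[6]` in three. Fewer dispatches and less stack traffic are the kind of saving superinstructions aim for.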

European Conference on Object-Oriented Programming | 1997

Near optimal hierarchical encoding of types

Andreas Krall; Jan Vitek; R. Nigel Horspool

A type inclusion test is a procedure to decide whether two types are related by a given subtyping relationship. An efficient implementation of the type inclusion test plays an important role in the performance of object-oriented programming languages with multiple subtyping like C++, Eiffel or Java. There are well-known methods for performing fast constant-time type inclusion tests that use a hierarchical bit vector encoding of the partially ordered set representing the type hierarchy. The number of instructions required by the type inclusion test is proportional to the length of those bit vectors. We present a new algorithm based on graph coloring which computes a near optimal hierarchical encoding of type hierarchies. The new algorithm improves significantly on previous results: it is faster, simpler and generates smaller bit vectors.

Design and Diagnostics of Electronic Circuits and Systems | 2011

DODT: Increasing requirements formalism using domain ontologies for improved embedded systems development

Stefan Farfeleder; Thomas Moser; Andreas Krall; Tor Stålhane; Herbert Zojer; Christian Panis

In times of ever-growing system complexity and thus increasing possibilities for errors, high-quality requirements are crucial to prevent design errors in later project phases and to facilitate design verification and validation. To ensure and improve the consistency, completeness and correctness of requirements, formal languages have been introduced as an alternative to using natural language (NL) requirement descriptions. However, in many cases existing NL requirements must be taken into account. So far, the formalization of those requirements has been a primarily manual task, which is therefore both cumbersome and error-prone. We introduce the tool DODT that semi-automatically transforms NL requirements into semi-formal boilerplate requirements. The transformation builds upon a domain ontology (DO) containing knowledge of the problem domain and upon natural language processing techniques. The tool strongly reduced the required manual effort for the transformation. In addition, the quality of the requirements was improved.

Concurrency and Computation: Practice and Experience | 1998

Monitors and exceptions: how to implement Java efficiently

Andreas Krall; Mark Probst

Efficient implementation of monitors and exceptions is crucial for the performance of Java. One implementation of threads showed a factor of 30 difference in runtime on some benchmark programs. This paper describes an efficient implementation of monitors for Java as used in the CACAO just-in-time compiler. With this implementation the thread overhead is less than 40% for typical application programs and can be completely eliminated for some applications. This paper also gives the implementation details of the new exception handling scheme in CACAO. The new approach reduces the size of the generated native code by a half and allows null pointers to be checked by hardware. By using these techniques, the CACAO system has become the fastest JavaVM implementation for the Alpha processor.

Programming Language Design and Implementation | 1994

Improving semi-static branch prediction by code replication

Andreas Krall

Speculative execution on superscalar processors demands substantially better branch prediction than what has been previously available. In this paper we present code replication techniques that improve the accuracy of semi-static branch prediction to a level comparable to dynamic branch prediction schemes. Our technique uses profiling to collect information about the correlation between different branches and about the correlation between the subsequent outcomes of a single branch. Using this information and code replication the outcome of branches is represented in the program state. Our experiments have shown that the misprediction rate can almost be halved while the code size is increased by one third.

Extended Semantic Web Conference | 2011

Ontology-driven guidance for requirements elicitation

Stefan Farfeleder; Thomas Moser; Andreas Krall; Tor Stålhane; Inah Omoronyia; Herbert Zojer

Requirements managers aim at keeping their sets of requirements well-defined, consistent and up to date throughout a project's life cycle. Semantic web technologies have found many valuable applications in the field of requirements engineering, with most of them focusing on requirements analysis. However, the usability of results originating from such requirements analyses strongly depends on the quality of the original requirements, which often are defined using natural language expressions without meaningful structures. In this work we present the prototypic implementation of a semantic guidance system used to assist requirements engineers with capturing requirements using a semiformal representation. The semantic guidance system uses concepts, relations and axioms of a domain ontology to provide a list of suggestions the requirements engineer can build on to define requirements. The semantic guidance system is evaluated based on a domain ontology and a set of requirements from the aerospace domain. The evaluation results show that the semantic guidance system effectively supports requirements engineers in defining well-structured requirements.
