Nikolay Mateev | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Nikolay Mateev is active.

Explore More

Publication

Featured researches published by Nikolay Mateev.

international conference on supercomputing | 2000

Synthesizing transformations for locality enhancement of imperfectly-nested loop nests

Nawaaz Ahmed; Nikolay Mateev; Keshav Pingali

We present an approach for synthesizing transformations to enhance locality in imperfectly-nested loops. The key idea is to embed the iteration space of every statement in a loop nest into a special iteration space called the product space. The product space can be viewed as a perfectly-nested loop nest, so embedding generalizes techniques like code sinking and loop fusion that are used in ad hoc ways in current compilers to produce perfectly-nested loops from imperfectly-nested ones. In contrast to these ad hoc techniques however, our embeddings are chosen carefully to enhance locality. The product space is then transformed further to enhance locality, after which fully permutable loops are tiled, and code is generated. We evaluate the effectiveness of this approach for dense numerical linear algebra benchmarks, relaxation codes, and the tomcatv code from the SPEC benchmarks.

international symposium on microarchitecture | 2002

DELI: a new run-time control point

Giuseppe Desoli; Nikolay Mateev; Evelyn Duesterwald; Paolo Faraboschi; Joseph A. Fisher

The Dynamic Execution Layer Interface (DELI) offers the following unique capability: it provides fine-grain control over the execution of programs, by allowing its clients to observe and optionally manipulate every single instruction - at run time - just before it runs. DELI accomplishes this by opening up art interface to the layer between the execution of software and hardware. To avoid the slowdown, DELI caches a private copy of the executed code and always runs out of its own private cache. In addition to giving powerful control to clients, DELI opens up caching and linking to ordinary emulators and just-in-time compilers, which their get the reuse benefits of the same mechanism. For example, emulators themselves call also use other clients, to mix emulation with already existing services, native code, and other emulators. This paper describes the basic aspects of DELI, including the underlying caching and linking mechanism, the Hardware Abstraction Mechanism (HAM), the Binary-Level Translation (BLT) infrastructure, and the Application Programming Interface (API) exposed to the clients. We also cover some of the services that clients could offer through the DELI, such as ISA emulation, software patching, and sandboxing. Finally, we consider a case study of emulation in detail: the emulation of a PocketPC system on the Lx/ST210 embedded VLIW processor. In this case, DELI enables us to achieve near-native performance, and to mix-and-match native and emulated code.

conference on high performance computing (supercomputing) | 2000

Tiling Imperfectly-nested Loop Nests

Nawaaz Ahmed; Nikolay Mateev; Keshav Pingali

Tiling is one of the more important transformations for enhancing locality of reference in programs. Intuitively, tiling a set of loops achieves the effect of interleaving iterations of these loops. Tiling of perfectly-nested loop nests (which are loop nests in which all assignment statements are contained in the innermost loop) is well understood. In practice, many loop nests are imperfectly-nested, so existing compilers use heuristics to try to find a sequence of transformations that convert such loop nests into perfectly-nested ones, but these heuristics do not always succeed. In this paper, we propose a novel approach to tiling imperfectly-nested loop nests. The key idea is to embed the iteration space of every statement in the imperfectly-nested loop nest into a special space called the product space which is tiled to produce the final code. We evaluate the effectiveness of this approach for dense numerical linear algebra benchmarks, relaxation codes, and the tomcatv code from the SPEC benchmarks. No other single approach in the literature can tile all these codes automatically.

international conference on supercomputing | 2000

Next-generation generic programming and its application to sparse matrix computations

Nikolay Mateev; Keshav Pingali; Paul Stodghill; Vladimir Kotlyar

The contributions of this paper are the following.We introduce a new variety of generic programming in which algorithm implementors use a different API than data structure designers, the gap between the APIs being bridged by restructuring compilers. One view of this approach is that it exploits restructuring compiler technology to perform a novel kind of template instantiation. We demonstrate the usefulness of this new generic programming technology by deploying it in a system that generates efficient sparse codes from high-level algorithms and specifications of sparse matrix formats. We argue that sparse matrix formats should be viewed as indexed-sequential access data structures (in the database sense), and show that appropriate abstractions of the index structure of common formats can be conveyed to a restructuring compiler through the type system of a modern language that supports inheritance and templates.

International Journal of Parallel Programming | 2001

Synthesizing Transformations for Locality Enhancement of Imperfectly-Nested Loop Nests

Nawaaz Ahmed; Nikolay Mateev; Keshav Pingali

Linear loop transformations and tiling are known to be very effective for enhancing locality of reference in perfectly-nested loops. However, they cannot be applied directly to imperfectly-nested loops. Some compilers attempt to convert imperfectly-nested loops into perfectly-nested loops by using statement sinking, loop fusion, etc., and then apply locality enhancing transformations to the resulting perfectly-nested loops, but the approaches used are fairly ad hoc and may fail even for simple programs. In this paper, we present a systematic approach for synthesizing transformations to enhance locality in imperfectly-nested loops. The key idea is to embed the iteration space of each statement into a special iteration space called the product space. The product space can be viewed as a perfectly-nested loop nest, so embedding generalizes techniques like statement sinking and loop fusion which are used in ad hoc ways in current compilers to produce perfectly-nested loops from imperfectly-nested ones. In contrast to these ad hoc techniques however, our embeddings are chosen carefully to enhance locality. The product space can itself be transformed to increase locality further, after which fully permutable loops can be tiled. The final code generation step may produce imperfectly-nested loops as output if that is desirable. We present experimental evidence for the effectiveness of this approach, using dense numerical linear algebra benchmarks, relaxation codes, and the tomcatv code from the SPEC benchmarks.

ACM Transactions on Programming Languages and Systems | 2003

Fractal symbolic analysis

Vijay Menon; Keshav Pingali; Nikolay Mateev

Modern compilers restructure programs to improve their efficiency. Dependence analysis is the most widely used technique for proving the correctness of such transformations, but it suffers from the limitation that it considers only the memory locations read and written by a statement without considering what is being computed by that statement. Exploiting the semantics of program statements permits more transformations to be proved correct, and is critical for automatic restructuring of codes such as LU with partial pivoting.One approach to exploiting the semantics of program statements is symbolic analysis and comparison of programs.In principle, this technique is very powerful, but in practice, it is intractable for all but the simplest programs.In this paper, we propose a new form of symbolic analysis and comparison of programs which is appropriate for use in restructuring compilers. Fractal symbolic analysis is an approximate symbolic analysis that compares a program and its transformed version by repeatedly simplifying these programs until symbolic analysis becomes tractable while ensuring that equality of the simplified programs is sufficient to guarantee equality of the original programs.Fractal symbolic analysis combines some of the power of symbolic analysis with the tractability of dependence analysis. We discuss a prototype implementation of fractal symbolic analysis, and show how it can be used to solve the long-open problem of verifying the correctness of transformations required to improve the cache performance of LU factorization with partial pivoting.

conference on high performance computing (supercomputing) | 2000

A Framework for Sparse Matrix Code Synthesis from High-level Specifications

Nawaaz Ahmed; Nikolay Mateev; Keshav Pingali; Paul Stodghill

We present compiler technology for synthesizing sparse matrix code from (i) dense matrix code, and (ii) a description of the index structure of a sparse matrix. Our approach is to embed statement instances into a Cartesian product of statement iteration and data spaces, and to produce efficient sparse code by identifying common enumerations for multiple references to sparse matrices. The approach works for imperfectly-nested codes with dependences, and produces sparse code competitive with hand-written library code for the Basic Linear Algebra Subroutines (BLAS).

european conference on parallel processing | 2000

Left-Looking to Right-Looking and Vice Versa: An Application of Fractal Symbolic Analysis to Linear Algebra Code Restructuring

Nikolay Mateev; Vijay Menon; Keshav Pingali

We have recently developed a new program analysis strategy called fractal symbolic analysis that addresses some of limitations of techniques such as dependence analysis. In this paper, we show how fractal symbolic analysis can be used to convert between left-looking and right-looking versions of three kernels of central importance in computational science: Cholesky factorization, LU factorization with pivoting, and triangular solve.

embedded software | 2002

A New Facility for Dynamic Control of Program Execution: DELI

Giuseppe Desoli; Nikolay Mateev; Evelyn Duesterwald; Paolo Faraboschi; Josh Fisher

The DELI (Dynamic Execution Layer Interface) provides fine-grain control over the execution of programs, by allowing its clients to observe and optionally manipulate every single instruction at run time. It accomplishes this by opening up an interface to the layer between the execution of application software and hardware. To avoid the 100x implicit slowdown, DELI uses a technique typical of modern emulators: it caches fragments of the executable and always runs out of that cache. Unlike previous systems, DELI exposes the caching through a common interface, so that emulators themselves can take advantage of other DELI clients. This enables mixing emulation with already existing services and native code. In this paper, we describe the basic aspects of DELI: the underlying caching and linking mechanism, the Hardware Abstraction Mechanism (HAM), the Binary-Level Translation (BLT) infrastructure, and the Application Programming Interface (API). We also cover some uses, such as ISA emulation and software patching. Finally, we present emulation results of a PocketPC system on an embedded VLIW processor, where we achieve almost-native performance, and show how to mix-and-match native and emulated code.

Archive | 2001