Kleanthis Psarris | Researchain

Publication

Featured research published by Kleanthis Psarris.

IEEE Transactions on Parallel and Distributed Systems | 1991

The I test: an improved dependence test for automatic parallelization and vectorization

Xiangyun Kong; David Klappholz; Kleanthis Psarris

The I test is a subscript dependence test which extends both the range of applicability and the accuracy of the GCD and Banerjee tests (U. Banerjee, 1976), standard subscript dependence tests used to determine whether loops may be parallelized/vectorized. It is shown that the I test is useful when, in the event that a positive result must be reported, a definitive positive is of more use than a tentative positive and when insufficient loop iterations are known for the Banerjee test to apply.

Parallel Computing | 1999

Compilation techniques for parallel systems

Rajiv Gupta; Santosh Pande; Kleanthis Psarris; Vivek Sarkar

Over the past two decades tremendous progress has been made in both the design of parallel architectures and the compilers needed for exploiting parallelism on such architectures. In this paper we summarize the advances in compilation techniques for uncovering and effectively exploiting parallelism at various levels of granularity. We begin by describing the program analysis techniques through which parallelism is detected and expressed in the form of a program representation. Next, compilation techniques for scheduling instruction level parallelism (ILP) are discussed along with the relationship between the nature of compiler support and the type of processor architecture. Compilation techniques for exploiting loop and task level parallelism on shared-memory multiprocessors (SMPs) are summarized. Locality optimizations that must be used in conjunction with parallelization techniques for achieving high performance on machines with complex memory hierarchies are also discussed. Finally, we provide an overview of compilation techniques for distributed memory machines that must perform partitioning of both code and data for parallel execution. Communication optimization and code generation issues that are unique to such compilers are also briefly discussed.

IEEE Transactions on Parallel and Distributed Systems | 2004

An experimental evaluation of data dependence analysis techniques

Kleanthis Psarris; Konstantinos Kyriakopoulos

Optimizing compilers rely upon program analysis techniques to detect data dependences between program statements. Data dependence information captures the essential ordering constraints of the statements in a program that need to be preserved in order to produce valid optimized and parallel code. Data dependence testing is very important for automatic parallelization, vectorization, and any other code transformation. In this paper, we examine the impact of data dependence analysis in practice. A number of data dependence tests have been proposed in the literature. In each test, there are different trade-offs between accuracy and efficiency. We present an experimental evaluation of several data dependence tests, including the Banerjee test, the I-Test, and the Omega test. We compare these tests in terms of data dependence accuracy, compilation efficiency, effectiveness in parallelization, and program execution performance. We analyze the reasons why a data dependence test can be inexact and we explain how the examined tests handle such cases. We run various experiments using the Perfect Club Benchmarks and the scientific library Lapack. We present the measured accuracy of each test and the reasons for any approximation. We compare these tests in terms of efficiency and we analyze the trade-offs between accuracy and efficiency. We also determine the impact of each data dependence test on the total compilation time. Finally, we measure the number of loops parallelized by each test and we compare the execution performance of each benchmark on a multiprocessor. Our results indicate that the Omega test is more accurate, but also very inefficient in the cases where the other two tests are inaccurate. In general, the cost of the Omega test is high and uses a significant percentage of the total compilation time. Furthermore, the difference in accuracy of the Omega test over the Banerjee test and the I-Test does not improve parallelization and program execution performance.

International Conference on Supercomputing | 2003

The impact of data dependence analysis on compilation and program parallelization

Kleanthis Psarris; Konstantinos Kyriakopoulos

Optimizing compilers rely upon program analysis techniques to detect data dependences between program statements. Data dependence information captures the essential ordering constraints of the statements in a program that need to be preserved in order to produce valid optimized and parallel code. Data dependence testing is very important for automatic parallelization, vectorization and any other code transformation. In this paper we examine the impact of data dependence analysis in practice. A number of data dependence tests have been proposed in the literature. In each test there are different tradeoffs between accuracy and efficiency. We present an experimental evaluation of several data dependence tests, including the Banerjee test, the I-Test and the Omega test. We compare these tests in terms of data dependence accuracy, compilation efficiency, effectiveness in parallelization and program execution performance. We analyze the reasons why a data dependence test can be inexact and we explain how the examined tests handle such cases. We run various experiments using the Perfect Club Benchmarks and the scientific library Lapack. We present the measured accuracy of each test and the reasons for any approximation. We compare these tests in terms of efficiency and we analyze the tradeoffs between accuracy and efficiency. We also determine the impact of each data dependence test on the total compilation time. Finally, we measure the number of loops parallelized by each test and we compare the execution performance of each benchmark on a multiprocessor. Our results indicate that the Omega test is more accurate, but also very inefficient in the cases where the other two tests are inaccurate. In general the cost of the Omega test is high and accounts for a significant percentage of the total compilation time. Furthermore, the difference in accuracy of the Omega test over the Banerjee test and the I-Test does not improve parallelization and program execution performance.

International Conference on Parallel Architectures and Compilation Techniques | 1999

Data dependence testing in practice

Kleanthis Psarris; Konstantinos Kyriakopoulos

Data dependence analysis is a fundamental step in an optimizing compiler. The results of the analysis enable the compiler to identify code fragments that can be executed in parallel. A number of data dependence tests have been proposed in the literature. In each test there are different tradeoffs between accuracy and efficiency. In this paper we present an experimental evaluation of several data dependence tests, including the Banerjee test, the I-Test and the Omega test. We compare these tests in terms of accuracy and efficiency. We run various experiments using the Perfect Club Benchmarks and the scientific libraries Eispack, Linpack and Lapack. Several observations and conclusions are derived from the experimental results, which are displayed and analyzed in this paper.

Journal of Parallel and Distributed Computing | 1996

The Banerjee-Wolfe and GCD Tests on Exact Data Dependence Information

Kleanthis Psarris

The GCD test and the Banerjee-Wolfe test are the two tests traditionally used to determine statement data dependence, subject to direction vectors, in automatic vectorization/parallelization of loops. In an earlier study, a sufficient condition for the accuracy of the Banerjee-Wolfe test was stated and proved. In that work, we only considered the case of general data dependence, i.e., the case of data dependence without direction vector information. In this paper, we extend the previous result to the case of data dependence subject to an arbitrary direction vector. We also state and prove a sufficient condition for the accuracy of a combination of the GCD and Banerjee-Wolfe tests. Furthermore, we show that the sufficient conditions for the accuracy of the Banerjee-Wolfe test and the accuracy of a combination of the GCD and Banerjee-Wolfe tests are necessary conditions as well. Finally, we demonstrate how these results can be used in actual practice to obtain exact data dependence information.

IEEE Transactions on Parallel and Distributed Systems | 2012

Accelerating Matrix Operations with Improved Deeply Pipelined Vector Reduction

Yi-Gang Tai; Chia-Tien Dan Lo; Kleanthis Psarris

Many scientific or engineering applications involve matrix operations, in which reduction of vectors is a common operation. If the core operator of the reduction is deeply pipelined, which is usually the case, dependencies between the input data elements cause data hazards. To tackle this problem, we propose a new reduction method with low latency and high pipeline utilization. The performance of the proposed design is evaluated for both single data set and multiple data set scenarios. Further, QR decomposition is used to demonstrate how the proposed method can accelerate its execution. We implement the design on an FPGA and compare its results to other methods.

Parallel Computing | 2002

Program analysis techniques for transforming programs for parallel execution

Kleanthis Psarris

In a multiple processor system, computer programs have to be redesigned to efficiently use the parallel processors and deliver higher performance. One major approach is automatic detection of parallelism, in which existing conventional sequential programs are translated into parallel programs, in order to benefit from the presence of multiple processors. Optimizing compilers rely upon program analysis techniques to detect data dependences between program statements. The results of the analysis enable the compiler to identify code fragments that can be executed in parallel. The proposed dependence analysis techniques fall into two different categories: tests that are efficient but approximate, and tests that are exact but exponential. In this paper, we show that exact data dependence information can be computed efficiently in practice. The Banerjee inequality and the GCD test are the two tests traditionally used to determine statement data dependence in automatic parallelization of loops. These tests are approximate in the sense that they are necessary but not sufficient conditions for data dependence. In an earlier work we formally studied the accuracy of the Banerjee and GCD tests and derived a set of conditions that can be tested along with the Banerjee inequality and the GCD test to obtain exact data dependence information. In this work, we perform an empirical study to explain and demonstrate the accuracy of the Banerjee and GCD tests in actual practice. Our experiments indicate that exact data dependence information can be computed in linear time in practice.
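
As a rough illustration of why the GCD test and the Banerjee inequality are necessary but not sufficient conditions, the following minimal sketch applies both to a single linear subscript equation a_1*x_1 + ... + a_n*x_n = c, with each x_k confined to loop bounds [L_k, U_k]. It is not taken from the paper: the function names, the (coeffs, c, bounds) encoding, and the restriction to one subscript with no direction vector are assumptions made for this example.

```python
from functools import reduce
from math import gcd


def gcd_test(coeffs, c):
    """GCD test: an integer solution of sum(a_k * x_k) = c can exist
    only if gcd(a_1, ..., a_n) divides c. Loop bounds are ignored."""
    g = reduce(gcd, (abs(a) for a in coeffs), 0)
    if g == 0:  # every coefficient is zero
        return c == 0
    return c % g == 0


def banerjee_test(coeffs, c, bounds):
    """Banerjee inequality without direction vectors: a real solution
    inside the loop bounds can exist only if c lies between the minimum
    and maximum of sum(a_k * x_k) over the bounded region."""
    lo = sum(a * (l if a > 0 else u) for a, (l, u) in zip(coeffs, bounds))
    hi = sum(a * (u if a > 0 else l) for a, (l, u) in zip(coeffs, bounds))
    return lo <= c <= hi


def may_depend(coeffs, c, bounds):
    """Report a possible dependence only if both necessary conditions
    hold. True means "maybe dependent"; False proves independence."""
    return gcd_test(coeffs, c) and banerjee_test(coeffs, c, bounds)


# A[2*i] vs A[2*j + 1], 1 <= i, j <= 10  ->  2*i - 2*j = 1
print(may_depend([2, -2], 1, [(1, 10), (1, 10)]))   # False: gcd 2 does not divide 1
# A[i] vs A[j + 20], 1 <= i, j <= 10     ->  i - j = 20
print(may_depend([1, -1], 20, [(1, 10), (1, 10)]))  # False: 20 is outside [-9, 9]
# A[i] vs A[j + 3], 1 <= i, j <= 10      ->  i - j = 3
print(may_depend([1, -1], 3, [(1, 10), (1, 10)]))   # True: dependence cannot be ruled out
```

A `False` result is definitive, while a `True` result only says that neither test could disprove the dependence. That gap is what the additional conditions mentioned in the abstract are meant to close.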

Journal of Parallel and Distributed Computing | 1996

Program Repartitioning on Varying Communication Cost Parallel Architectures

Santosh Pande; Kleanthis Psarris

In an earlier work, a Threshold Scheduling Algorithm was proposed to schedule the functional (DAG) parallelism in a program on distributed memory systems. In this work, we address the issue of adapting the schedule for a set of distributed memory architectures with the same computation costs but higher communication costs. We introduce a new concept of dominant edges of a schedule to denote those edges which dictate the schedule time of the destination nodes due to the changes in their communication costs. Using this concept, we derive the conditions under which the schedule on the whole graph, or at least on part of it, can be reused for a different architecture, keeping the cost of program repartitioning and rescheduling to a minimum. We demonstrate the practical significance of the method by incorporating it in the compiler backend for targeting Sisal (Streams and Iterations in a Single Assignment Language) on a family of Intel i860 architectures, Gamma, Delta, and Paragon, which vary in their communication costs. It is shown that almost 30 to 65% of the schedule can be reused unchanged, thereby avoiding program repartitioning to a large degree. The remainder of the schedule can be regenerated through a linear algorithm at run time.

ACM Symposium on Applied Computing | 2008

Hardware implementation for network intrusion detection rules with regular expression support

Chia-Tien Dan Lo; Yi-Gang Tai; Kleanthis Psarris

Signature-based network intrusion detection systems (NIDSs), such as Snort and Bro, rely on a rule database that describes traffic patterns for known attacks. They examine each packet flowing through a network segment and report suspicious packets to assure security. An attack signature may be represented in terms of fields in a packet such as source/destination IP addresses, source/destination ports, protocols, specific contents in payload, etc. Typically, a Perl Compatible Regular Expression (PCRE) is used to describe a specific content in the payload which may identify an attack. Our study shows that over 60% of the execution time in an NIDS is spent performing string comparisons against a signature database of over 5,950 tokens and over 1,763 PCREs. This paper proposes to extend a bit-parallel algorithm to support multi-byte processing and PCRE. This design takes a segment of bytes from the payload of a packet and detects all possible tokens including those crossing text segment boundaries. A tool is designed to generate VHDL code from a rule set automatically. Performance results are reported.
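
The abstract does not say which bit-parallel algorithm is extended, so the sketch below uses the classic Shift-And matcher purely as an illustration of the general idea. It is a software analogue processing one byte per step, not the multi-byte hardware design or the PCRE support described in the paper, and the function name `shift_and_search` is hypothetical.

```python
def shift_and_search(pattern: bytes, text: bytes) -> list[int]:
    """Return the start offsets of every occurrence of pattern in text,
    using the Shift-And bit-parallel algorithm (one byte per step)."""
    m = len(pattern)
    if m == 0:
        return []

    # masks[b] has bit i set when pattern[i] == b.
    masks = [0] * 256
    for i, b in enumerate(pattern):
        masks[b] |= 1 << i

    accept = 1 << (m - 1)  # bit that signals a complete match
    state = 0              # bit i set: pattern[:i + 1] matches a suffix of the text read so far
    hits = []
    for pos, b in enumerate(text):
        # Extend every partial match by one byte and start a new one.
        state = ((state << 1) | 1) & masks[b]
        if state & accept:
            hits.append(pos - m + 1)
    return hits


print(shift_and_search(b"abab", b"abababab"))  # [0, 2, 4]
```

Because the whole matcher state is a single integer, it can be carried over from one payload segment to the next, which is how a matcher of this kind can report tokens that straddle a segment boundary.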
