Publication


Featured research published by Theodore S. Papatheodorou.


Conference on High Performance Computing (Supercomputing) | 2000

Is Data Distribution Necessary in OpenMP?

Dimitrios S. Nikolopoulos; Theodore S. Papatheodorou; Constantine D. Polychronopoulos; Jesús Labarta; Eduard Ayguadé

This paper investigates the performance implications of data placement in OpenMP programs running on modern ccNUMA multiprocessors. Data locality and minimization of the rate of remote memory accesses are critical for sustaining high performance on these systems. We show that, due to the low remote-to-local memory access latency ratio of state-of-the-art ccNUMA architectures, reasonably balanced page placement schemes, such as round-robin or random distribution of pages, incur modest performance losses. We also show that performance leaks stemming from suboptimal page placement can be remedied with a smart user-level page migration engine. The main body of the paper describes how the OpenMP runtime environment can use page migration to implement implicit data distribution and redistribution schemes without programmer intervention. Our experimental results support the effectiveness of these mechanisms and provide a proof of concept that there is no need to introduce data distribution directives in OpenMP, thereby preserving the portability of the programming model.
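
As an illustration of the alternative the paper argues for, the following minimal C/OpenMP sketch (not taken from the paper) relies on the operating system's first-touch page placement together with a consistent static loop schedule, rather than on any data distribution directive.

/* Minimal sketch: with first-touch placement, each page of a[] is placed
 * on the node of the thread that first writes it, so using the same
 * static schedule for initialization and computation keeps most
 * accesses node-local without distribution directives. */
#include <omp.h>
#include <stdlib.h>

#define N (1 << 24)

int main(void)
{
    double *a = malloc(N * sizeof *a);

    /* Parallel first-touch initialization. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        a[i] = 0.0;

    /* The compute loop reuses the same thread-to-iteration mapping,
     * so each thread mostly touches locally placed pages. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        a[i] = 2.0 * a[i] + 1.0;

    free(a);
    return 0;
}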


International Conference on Parallel Processing | 2000

User-level dynamic page migration for multiprogrammed shared-memory multiprocessors

Dimitrios S. Nikolopoulos; Theodore S. Papatheodorou; Constantine D. Polychronopoulos; Jesús Labarta; Eduard Ayguadé

This paper presents algorithms for improving the performance of parallel programs on multiprogrammed shared-memory NUMA multiprocessors, via the use of user-level dynamic page migration. The idea that drives the algorithms is that a page migration engine can perform accurate and timely page migrations in a multiprogrammed system if it can correlate page reference information with scheduling information obtained from the operating system. The necessary page migrations can be performed as a response to scheduling events that break the implicit association between threads and their memory affinity sets. We present two algorithms that use feedback from the kernel scheduler to aggressively migrate pages upon thread migrations. The first algorithm exploits the iterative nature of parallel programs, while the second targets generic codes without making assumptions on their structure. Performance evaluation on an SGI Origin2000 shows that our page migration algorithms provide substantial improvements in throughput of up to 264% compared to the native IRIX 6.5.5 page placement and migration schemes.
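
A minimal C sketch of the idea of reacting to scheduler feedback follows; the affinity-set bookkeeping and the get_thread_node/migrate_page helpers are hypothetical placeholders, not the paper's actual kernel interface.

#include <stddef.h>
#include <stdio.h>

#define MAX_THREADS 64
#define MAX_PAGES   4096

struct affinity_set {
    void  *pages[MAX_PAGES];  /* pages mostly accessed by this thread */
    size_t npages;
    int    last_node;         /* node the thread ran on last interval */
};

static struct affinity_set aff[MAX_THREADS];

/* Placeholder for scheduler feedback: a real system would query the
 * kernel for the node currently running thread tid. */
static int get_thread_node(int tid) { return tid % 2; }

/* Placeholder for the page-migration primitive. */
static void migrate_page(void *page, int node)
{
    printf("migrating page %p to node %d\n", page, node);
}

/* Called at the end of each monitoring interval: if the kernel moved a
 * thread to another node, move the thread's affinity set after it. */
static void react_to_scheduling_events(int nthreads)
{
    for (int tid = 0; tid < nthreads; tid++) {
        int node = get_thread_node(tid);
        if (node == aff[tid].last_node)
            continue;                      /* thread stayed put */
        for (size_t i = 0; i < aff[tid].npages; i++)
            migrate_page(aff[tid].pages[i], node);
        aff[tid].last_node = node;
    }
}

int main(void)
{
    static double data[2][1024];
    aff[0] = (struct affinity_set){ .pages = { data[0] }, .npages = 1, .last_node = 0 };
    aff[1] = (struct affinity_set){ .pages = { data[1] }, .npages = 1, .last_node = 0 };
    react_to_scheduling_events(2);         /* thread 1 moved, its page follows */
    return 0;
}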


International Conference on Parallel Processing | 2003

Scheduling algorithms with bus bandwidth considerations for SMPs

Christos D. Antonopoulos; Dimitrios S. Nikolopoulos; Theodore S. Papatheodorou

The bus that connects processors to memory is known to be a major architectural bottleneck in SMPs. However, both software and scheduling policies for these systems generally focus on memory hierarchy optimizations and do not address the bus bandwidth limitations directly. We first present experimental results indicating that bus saturation can slow applications down by a factor of almost three. Motivated by these results, we introduce two scheduling policies that take into account the bus bandwidth consumption of applications. The necessary information is provided by performance monitoring counters, which are present in all modern processors. Our algorithms organize jobs so that processes with high-bandwidth and low-bandwidth demands are co-scheduled to improve bus bandwidth utilization without saturating the bus. We found that our scheduler is effective with applications of varying bandwidth requirements, from very low to close to the limit of saturation. We also tuned our scheduler for robustness in the presence of bursts of high bus bandwidth consumption from individual jobs. The new scheduling policies improve system throughput by up to 68% (26% on average) in comparison with the standard Linux scheduler.
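
A toy illustration of the co-scheduling idea, not the paper's actual policy: given per-job bandwidth figures that would come from performance monitoring counters, pair high- and low-bandwidth jobs so that each pair stays under an assumed bus saturation limit.

#include <stdio.h>
#include <stdlib.h>

struct job {
    int    id;
    double bw;   /* measured bus bandwidth demand (e.g. GB/s), from counters */
};

static int cmp_bw(const void *a, const void *b)
{
    double d = ((const struct job *)a)->bw - ((const struct job *)b)->bw;
    return (d > 0) - (d < 0);
}

/* Greedy pairing: sort jobs by bandwidth demand, then co-schedule the most
 * demanding remaining job with the least demanding one whenever their
 * combined demand fits under the bus saturation limit. */
static void pair_jobs(struct job *jobs, int n, double bus_limit)
{
    qsort(jobs, n, sizeof jobs[0], cmp_bw);
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        if (lo < hi && jobs[lo].bw + jobs[hi].bw <= bus_limit) {
            printf("co-schedule job %d (%.1f) with job %d (%.1f)\n",
                   jobs[hi].id, jobs[hi].bw, jobs[lo].id, jobs[lo].bw);
            lo++;
        } else {
            printf("run job %d (%.1f) alone\n", jobs[hi].id, jobs[hi].bw);
        }
        hi--;
    }
}

int main(void)
{
    struct job jobs[] = { {1, 2.8}, {2, 0.4}, {3, 1.9}, {4, 0.7}, {5, 3.1} };
    pair_jobs(jobs, 5, 3.5);   /* assumed bus saturation limit in GB/s */
    return 0;
}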


International Conference on Supercomputing | 2000

A case for user-level dynamic page migration

Dimitrios S. Nikolopoulos; Theodore S. Papatheodorou; Constantine D. Polychronopoulos; Jesús Labarta; Eduard Ayguadé

This paper presents user-level dynamic page migration, a runtime technique which transparently enables parallel programs to tune their memory performance on distributed shared memory multiprocessors, using feedback obtained from dynamic monitoring of memory activity. Our technique exploits the iterative nature of parallel programs and information available to the program both at compile time and at runtime in order to improve the accuracy and timeliness of page migrations, as well as to better amortize the overhead, compared to page migration engines implemented in the operating system. We present an adaptive page migration algorithm based on a competitive and a predictive criterion. The competitive criterion is used to correct poor page placement decisions of the operating system, while the predictive criterion makes the algorithm responsive to scheduling events that necessitate immediate page migrations, such as preemptions and migrations of threads. We also present a new technique for preventing page ping-pong and a mechanism for monitoring the performance of page migration algorithms at runtime and tuning their sensitive parameters accordingly. Our experimental evidence on an SGI Origin2000 shows that unmodified OpenMP codes linked with our runtime system for dynamic page migration are effectively immune to the page placement strategy of the operating system and the associated problems with data locality. Furthermore, our runtime system achieves solid performance improvements compared to the IRIX 6.5.5 page migration engine, for single parallel OpenMP codes and multiprogrammed workloads.
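
The C sketch below shows the general shape of a competitive migration test and a simple ping-pong freeze of the kind the abstract describes; the names, data structures and thresholds are illustrative assumptions, and it presumes per-node, per-page reference counters such as those the Origin2000 exposes.

#include <stdint.h>
#include <stdbool.h>

#define MAX_NODES      64
#define PINGPONG_LIMIT 2      /* illustrative freeze threshold */

struct page_state {
    uint64_t refs[MAX_NODES];     /* per-node reference counts for this page */
    int      home_node;           /* node currently holding the page */
    unsigned recent_migrations;   /* migrations within the current interval */
    bool     frozen;              /* set when the page starts to ping-pong */
};

/* Competitive criterion: propose migrating the page to the remote node
 * whose reference count exceeds the home node's count by a threshold
 * factor; return -1 if the page should stay where it is. */
static int migration_target(const struct page_state *p, int nnodes,
                            double threshold)
{
    uint64_t home = p->refs[p->home_node], best_refs = 0;
    int best = -1;

    for (int n = 0; n < nnodes; n++) {
        if (n != p->home_node && p->refs[n] > best_refs) {
            best_refs = p->refs[n];
            best = n;
        }
    }
    return (best >= 0 && (double)best_refs > threshold * (double)home)
               ? best : -1;
}

/* Ping-pong prevention: a page that keeps bouncing between nodes is
 * frozen in place until the end of the monitoring interval. */
static bool allow_migration(struct page_state *p)
{
    if (p->frozen)
        return false;
    if (++p->recent_migrations > PINGPONG_LIMIT) {
        p->frozen = true;
        return false;
    }
    return true;
}

int main(void)
{
    struct page_state p = { .home_node = 0 };
    p.refs[0] = 100; p.refs[3] = 450;      /* node 3 references the page heavily */

    int target = migration_target(&p, 4, 2.0);
    if (target >= 0 && allow_migration(&p))
        p.home_node = target;              /* would trigger an actual migration */
    return p.home_node;                    /* 3 in this example */
}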


D-Lib Magazine | 2010

The Use of Metadata for Educational Resources in Digital Repositories: Practices and Perspectives

Dimitrios A. Koutsomitropoulos; Andreas D. Alexopoulos; Georgia D. Solomou; Theodore S. Papatheodorou

The wide availability of educational resources is a common objective for universities, libraries, archives and other knowledge-intensive institutions. Although generic metadata specifications (such as Dublin Core) seem to fulfill the need for documenting web-distributed objects, educational resources demand more specialized treatment and characterization. In this article we focus on the use of learning-object-specific metadata in digital repositories, as primarily embodied in the LOM (Learning Object Metadata) standard. We review relevant standards and practices, especially noting the importance of application profiling paradigms. A widespread institutional repository platform is offered by DSpace. We discuss our implementation of LOM metadata in this system as well as our interoperability extensions. To this end, we propose a potential LOM-to-DC mapping that we have put into use in DSpace. Finally, we introduce our implementation of an LOM ontology, as a basis for delivering Semantic Web services over educational resources.
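
For orientation, the sketch below lists a few LOM-to-Dublin-Core element correspondences of the kind such a crosswalk contains; the entries follow the commonly cited LOM/DC crosswalk and are illustrative only, not necessarily the article's exact mapping.

#include <stdio.h>

/* Illustrative LOM-to-Dublin-Core correspondences (common crosswalk
 * entries; the article's actual mapping may differ). */
struct crosswalk_entry {
    const char *lom_element;
    const char *dc_element;
};

static const struct crosswalk_entry lom_to_dc[] = {
    { "general.title",        "dc.title"       },
    { "general.description",  "dc.description" },
    { "general.keyword",      "dc.subject"     },
    { "general.language",     "dc.language"    },
    { "lifecycle.contribute", "dc.creator"     },
    { "technical.format",     "dc.format"      },
    { "rights.description",   "dc.rights"      },
};

int main(void)
{
    for (size_t i = 0; i < sizeof lom_to_dc / sizeof lom_to_dc[0]; i++)
        printf("%-22s -> %s\n", lom_to_dc[i].lom_element,
               lom_to_dc[i].dc_element);
    return 0;
}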


Lecture Notes in Computer Science | 2000

UPMLIB: A Runtime System for Tuning the Memory Performance of OpenMP Programs on Scalable Shared-Memory Multiprocessors

Dimitrios S. Nikolopoulos; Theodore S. Papatheodorou; Constantine D. Polychronopoulos; Jesús Labarta; Eduard Ayguadé

We present the design and implementation of UPMLIB, a runtime system that provides transparent facilities for dynamically tuning the memory performance of OpenMP programs on scalable shared-memory multiprocessors with hardware cache coherence. UPMLIB integrates information from the compiler and the operating system to implement algorithms that perform accurate and timely page migrations. The algorithms and the associated mechanisms correlate memory reference information with the semantics of parallel programs and with scheduling events that break the association between threads and the data for which they have memory affinity at runtime. Our experimental evidence shows that UPMLIB makes OpenMP programs immune to the page placement strategy of the operating system, thus obviating the need to introduce data placement directives in OpenMP. Furthermore, UPMLIB provides solid throughput improvements in multiprogrammed execution environments.


International Conference on Supercomputing | 1998

Kernel-level scheduling for the nano-threads programming model

Eleftherios D. Polychronopoulos; Xavier Martorell; Dimitrios S. Nikolopoulos; Jesús Labarta; Theodore S. Papatheodorou; Nacho Navarro

Multiprocessor systems are increasingly becoming the systems of choice for low- and high-end servers, running such diverse tasks as number crunching, large-scale simulations, database engines and World Wide Web server applications. With such diverse workloads, system utilization and throughput, as well as execution time, become important performance metrics. In this paper we present efficient kernel scheduling policies and propose a new kernel-user interface aimed at supporting efficient parallel execution in diverse workload environments. Our approach relies on support for user-level threads, which are used to exploit parallelism within applications, and a two-level scheduling policy which coordinates the number of resources allocated by the kernel with the number of threads generated by each application. We compare our scheduling policies with the native gang scheduling policy of the IRIX 6.4 operating system on a Silicon Graphics Origin2000. Our experimental results show substantial performance gains in terms of overall workload execution times, individual application execution times, and cache performance.
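
A toy sketch of the two-level scheduling idea, not the paper's policy or kernel-user interface: the kernel side grants each application a share of processors capped by its request (an equipartition-style step), and each application would then create only as many user-level threads as it was granted.

#include <stdio.h>

#define NCPUS 16

struct app {
    int id;
    int requested;   /* threads the application would like to run */
    int granted;     /* processors the kernel actually allocates */
};

/* Equipartition-style allocation: give each unsatisfied application an
 * equal share, capped by its request, redistributing leftovers. */
static void allocate(struct app *apps, int napps)
{
    int remaining = NCPUS, unsatisfied = napps;
    for (int i = 0; i < napps; i++)
        apps[i].granted = 0;

    while (remaining > 0 && unsatisfied > 0) {
        int share = remaining / unsatisfied;
        if (share == 0) share = 1;
        int progress = 0;
        unsatisfied = 0;
        for (int i = 0; i < napps && remaining > 0; i++) {
            int want = apps[i].requested - apps[i].granted;
            if (want <= 0) continue;
            int give = want < share ? want : share;
            if (give > remaining) give = remaining;
            apps[i].granted += give;
            remaining -= give;
            progress += give;
            if (apps[i].granted < apps[i].requested)
                unsatisfied++;
        }
        if (progress == 0) break;
    }
}

int main(void)
{
    struct app apps[] = { {1, 12, 0}, {2, 8, 0}, {3, 2, 0} };
    allocate(apps, 3);
    for (int i = 0; i < 3; i++)
        printf("app %d: requested %d, granted %d (would create %d threads)\n",
               apps[i].id, apps[i].requested, apps[i].granted, apps[i].granted);
    return 0;
}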


International Journal of Parallel Programming | 2001

The Architectural and Operating System Implications on the Performance of Synchronization on ccNUMA Multiprocessors

Dimitrios S. Nikolopoulos; Theodore S. Papatheodorou

This paper investigates the performance of synchronization algorithms on ccNUMA multiprocessors, from the perspectives of the architecture and the operating system. In contrast with previous related studies that emphasized the relative performance of synchronization algorithms, this paper takes a new approach by analyzing the sources of synchronization latency on ccNUMA architectures and how this latency can be reduced by leveraging hardware and software schemes, in both dedicated and multiprogrammed execution environments. From the architectural perspective, the paper identifies the implications of directory-based cache coherence for the latency and scalability of synchronization instructions, and examines whether and how simple hardware that accelerates these instructions can be leveraged to reduce synchronization latency. From the operating systems perspective, the paper evaluates, in a unified framework, user-level, kernel-level and hybrid algorithms for implementing scalable synchronization in multiprogrammed execution environments. In addition to addressing these issues, the paper contributes a new methodology for implementing fast synchronization algorithms on ccNUMA multiprocessors. The relevant experiments are conducted on the SGI Origin2000, a popular commercial ccNUMA multiprocessor.
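
As a generic example of the class of user-level synchronization algorithms whose latency on ccNUMA machines is dominated by coherence traffic to the lock's cache line, here is a minimal test-and-test-and-set spinlock with exponential backoff in C; it is illustrative and not one of the paper's specific algorithms.

#include <stdatomic.h>

typedef struct { atomic_int locked; } spinlock_t;

static void spin_lock(spinlock_t *l)
{
    unsigned backoff = 1;
    for (;;) {
        /* "Test" phase: spin on a cached copy so the lock word is not
         * bounced between caches while someone else holds it. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
        /* "Test-and-set" phase: try to grab the lock atomically. */
        if (!atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
            return;
        /* Lost the race: back off before retrying to reduce contention
         * at the home node of the lock's cache line. */
        for (volatile unsigned i = 0; i < backoff; i++)
            ;
        if (backoff < (1u << 16))
            backoff <<= 1;
    }
}

static void spin_unlock(spinlock_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

int main(void)
{
    spinlock_t lock = { 0 };
    spin_lock(&lock);
    /* critical section would go here */
    spin_unlock(&lock);
    return 0;
}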


Parallel Computing | 1989

Parallel algorithms and architectures for multisplitting iterative methods

Theodore S. Papatheodorou; Yiannis G. Saridakis

The Multi-Splitting (MS) iterative method, designed exclusively for multiprocessor environments, is considered for the solution of large systems of linear equations. A general parallel algorithm is devised and implemented on a modular two-level parallel architecture, which utilizes systolic arrays as building blocks, to demonstrate the point iteration. A particular three-term member of the MS family is applied, for the parallel block iterative solution, to Poisson's equation discretized by the collocation method.
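
For reference, the generic multisplitting iteration for solving A x = b can be written as follows; this is the general form of the method, not the paper's particular three-term member.

% A = M_k - N_k, k = 1,...,K, with nonsingular M_k and nonnegative
% diagonal weighting matrices E_k satisfying \sum_{k=1}^{K} E_k = I.
\[
  x^{(m+1)} \;=\; \sum_{k=1}^{K} E_k\, M_k^{-1}\!\bigl( N_k\, x^{(m)} + b \bigr),
  \qquad m = 0, 1, 2, \ldots
\]
% Each of the K terms can be computed independently on a separate
% processor; the weighted sum then combines the local updates.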


Mathematics and Computers in Simulation | 1989

Parallel (‖) ELLPACK: an expert system for parallel processing of partial differential equations

Elias N. Houstis; John R. Rice; Theodore S. Papatheodorou

We are developing an expert system environment for solving elliptic partial differential equations (PDEs) defined on two- and three-dimensional domains on MIMD-type parallel machines. According to its design objectives, it will provide a uniform programming environment for implementing parallel MIMD PDE solvers, automatic partitioning and allocation of the PDE computation, a very high level problem specification language, an interactive high-level environment for grid selection, a domain partitioning and mapping facility, a uniform environment for obtaining software engineering measurements, and a graphical display of solution output.

Collaboration


Dive into Theodore S. Papatheodorou's collaborations.

Top Co-Authors

Jesús Labarta

Barcelona Supercomputing Center
