Polyvios Pratikakis
University of Maryland, College Park
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Polyvios Pratikakis.
programming language design and implementation | 2006
Polyvios Pratikakis; Jeffrey S. Foster; Michael Hicks
One common technique for preventing data races in multi-threaded programs is to ensure that all accesses to shared locations are consistently protected by a lock. We present a tool called LOCKSMITH for detecting data races in C programs by looking for violations of this pattern. We call the relationship between locks and the locations they protect consistent correlation, and the core of our technique is a novel constraint-based analysis that infers consistent correlation context-sensitively, using the results to check that locations are properly guarded by locks. We present the core of our algorithm for a simple formal language λ> which we have proven sound, and discuss how we scale it up to an algorithm that aims to be sound for all of C. We develop several techniques to improve the precision and performance of the analysis, including a sharing analysis for inferring thread locality; existential quantification for modeling locks in data structures; and heuristics for modeling unsafe features of C such as type casts. When applied to several benchmarks, including multi-threaded servers and Linux device drivers, LOCKSMITH found several races while producing a modest number of false alarm.
symposium on principles of programming languages | 2008
Iulian Neamtiu; Michael Hicks; Jeffrey S. Foster; Polyvios Pratikakis
This paper presents a generalization of standard effect systems that we call contextual effects. A traditional effect system computes the effect of an expression e. Our system additionally computes the effects of the computational context in which e occurs. More specifically, we computethe effect of the computation that has already occurred(the prior effect) and the effect of the computation yet to take place (the future effect). Contextual effects are useful when the past or future computation of the program is relevant at various program points. We present two substantial examples. First, we show how prior and future effects can be used to enforce transactional version consistency(TVC), a novel correctness property for dynamic software updates. TV Censures that programmer-designated transactional code blocks appear to execute entirely at the same code version, even if a dynamic update occurs in the middle of the block. Second, we show how future effects can be used in the analysis of multi-threaded programs to find thread-shared locations. This is an essential step in applications such as data race detection.
conference on object-oriented programming systems, languages, and applications | 2004
Polyvios Pratikakis; Jaime Spacco; Michael Hicks
A <i>proxy</i> object is a surrogate or placeholder that controls access to another target object. Proxies can be used to support distributed programming, lazy or parallel evaluation, access control, and other simple forms of behavioral reflection. However, <i>wrapper proxies</i> (like <i>futures</i> or <i>suspensions</i> for yet-to-be-computed results) can require significant code changes to be used in statically-typed languages, while proxies more generally can inadvertently violate assumptions of transparency, resulting in subtle bugs. To solve these problems, we have designed and implemented a simple framework for proxy programming that employs a static analysis based on qualifier inference, but with additional novelties. Code for using wrapper proxies is automatically introduced via a classfile-to-classfile transformation, and potential violations of transparency are signaled to the programmer. We have formalized our analysis and proven it sound. Our framework has a variety of applications, including support for asynchronous method calls returning futures. Experimental results demonstrate the benefits of our framework: programmers are relieved of managing and/or checking proxy usage, analysis times are reasonably fast, overheads introduced by added dynamic checks are negligible, and performance improvements can be significant. For example, changing two lines in a simple RMI-based peer-to-peer application and then using our framework resulted in a large performance gain.
ACM Transactions on Programming Languages and Systems | 2011
Polyvios Pratikakis; Jeffrey S. Foster; Michael Hicks
Locksmith is a static analysis tool for automatically detecting data races in C programs. In this article, we describe each of Locksmiths component analyses precisely, and present systematic measurements that isolate interesting trade-offs between precision and efficiency in each analysis. Using a benchmark suite comprising stand-alone applications and Linux device drivers totaling more than 200,000 lines of code, we found that a simple no-worklist strategy yielded the most efficient interprocedural dataflow analysis; that our sharing analysis was able to determine that most locations are thread-local, and therefore need not be protected by locks; that modeling C structs and void pointers precisely is key to both precision and efficiency; and that context sensitivity yields a much more precise analysis, though with decreased scalability. Put together, our results illuminate some of the key engineering challenges in building Locksmith and data race detection analyses in particular, and constraint-based program analyses in general.
acm sigplan symposium on principles and practice of parallel programming | 2012
George Tzenakis; Angelos Papatriantafyllou; John Kesapides; Polyvios Pratikakis; Hans Vandierendonck; Dimitrios S. Nikolopoulos
Reasoning about synchronization, ordering and conflicting memory accesses makes parallel programming difficult, error-prone and hard to test, debug and maintain. Task-parallel programming models such as OpenMP, Cilk and Sequoia offer a more structured way of expressing parallelism than threads, but still require the programmer to manually find and enforce any ordering or memory dependencies among tasks. Programming models with implicit parallelism such as SvS, OoOJava, or StarSs lift this limitation by automatically inferring parallelism and dependencies, requiring the programmer to describe the memory footprint of each task. Current limitations of these systems require the programmer to restrict task footprints into either whole and isolated program objects, one-dimensional array ranges, or static compile-time regions; all producing over-approximations and false dependencies that reduce the available parallelism in the program. This paper presents BDDT, a task-parallel runtime system that dynamically discovers and resolves dependencies among parallel tasks. BDDT allows the programmer to specify detailed task footprints on any memory address range or multidimensional array tile. BDDT uses a block-based dependence analysis with arbitrary granularity, making it easier to apply to
Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness | 2011
Polyvios Pratikakis; Hans Vandierendonck; Spyros Lyberis; Dimitrios S. Nikolopoulos
The currently dominant programming models to write software for multicore processors use threads that run over shared memory. However, as the core count increases, cache coherency protocols get very complex and ineffective, and maintaining a shared memory abstraction becomes expensive and impractical. Moreover, writing multithreaded programs is notoriously difficult, as the programmer needs to reason about all the possible thread interleavings and interactions, including the myriad of implicit, non-obvious, and often unpredictable thread interactions through shared memory. Overall, as processors get more cores and parallel software becomes mainstream, the shared memory model reaches its limits regarding ease of programming and efficiency. This position paper presents two ideas aiming to solve the problem. First, we restrict the way the programmer expresses parallelism: The program is a collection of possibly recursive tasks, where each task is atomic and cannot communicate with any other task during its execution. Second, we relax the requirement for coherent shared memory: Each task defines its memory footprint, and is guaranteed to have exclusive access to that memory during its execution. Using this model, we can then define a runtime system that transparently performs the data transfers required among cores without cache coherency, and also produces a deterministic execution of the program, provably equivalent to its sequential elision.
static analysis symposium | 2006
Polyvios Pratikakis; Jeffrey S. Foster; Michael Hicks
In programming languages, existential quantification is useful for describing relationships among members of a structured type. For example, we may have a list in which there exists some mutual exclusion lock l in each list element such that l protects the data stored in that element. With this information, a static analysis can reason about the relationship between locks and locations in the list even when the precise identity of the lock and/or location is unknown. To facilitate the construction of such static analyses, this paper presents a context-sensitive label flow analysis algorithm with support for existential quantification. Label flow analysis is a core part of many static analysis systems. Following Rehof et al, we use context-free language (CFL) reachability to develop an efficient O(n3) label flow inference algorithm. We prove the algorithm sound by reducing its derivations to those in a system based on polymorphically-constrained types, in the style of Mossin. We have implemented a variant of our analysis as part of a data race detection tool for C programs.
APPT 2013 Revised Selected Papers of the 10th International Symposium on Advanced Parallel Processing Technologies - Volume 8299 | 2013
Georgios Tzenakis; Angelos Papatriantafyllou; Hans Vandierendonck; Polyvios Pratikakis; Dimitrios S. Nikolopoulos
We present BDDT, a task-parallel runtime system that dynamically discovers and resolves dependencies among parallel tasks. BDDT allows the programmer to specify detailed task footprints on any memory address range, multidimensional array tile or dynamic region. BDDT uses a block-based dependence analysis with arbitrary granularity. The analysis is applicable to existing C programs without having to restructure object or array allocation, and provides flexibility in array layouts and tile dimensions. We evaluate BDDT using a representative set of benchmarks, and we compare it to SMPSs the equivalent runtime system in StarSs and OpenMP. BDDT performs comparable to or better than SMPSs and is able to cope with task granularity as much as one order of magnitude finer than SMPSs. Compared to OpenMP, BDDT performs up to 3.9× better for benchmarks that benefit from dynamic dependence analysis. BDDT provides additional data annotations to bypass dependence analysis. Using these annotations, BDDT outperforms OpenMP also in benchmarks where dependence analysis does not discover additional parallelism, thanks to a more efficient implementation of the runtime system.
international symposium on memory management | 2012
Spyros Lyberis; Polyvios Pratikakis; Dimitrios S. Nikolopoulos; Martin Schulz; Todd Gamblin; Bronis R. de Supinski
Constantly increasing hardware parallelism poses more and more challenges to programmers and language designers. One approach to harness the massive parallelism is to move to task-based programming models that rely on runtime systems for dependency analysis and scheduling. Such models generally benefit from the existence of a global address space. This paper presents the parallel memory allocator of the Myrmics runtime system, in which multiple allocator instances organized in a tree hierarchy cooperate to implement a global address space with dynamic region support on distributed memory machines. The Myrmics hierarchical memory allocator is step towards improved productivity and performance in parallel programming. Productivity is improved through the use of dynamic regions in a global address space, which provide a convenient shared memory abstraction for dynamic and irregular data structures. Performance is improved through scaling on manycore systems without system-wide cache coherency. We evaluate the stand-alone allocator on an MPI-based x86 cluster and find that it scales well for up to 512 worker cores, while it can outperform Unified Parallel C by a factor of 3.7-10.7x.
Proceedings of the 20th European MPI Users' Group Meeting on | 2013
Christi Symeonidou; Polyvios Pratikakis; Angelos Bilas; Dimitrios S. Nikolopoulos
We present DRASync, a region-based allocator that implements a global address space abstraction for MPI programs with pointer-based data structures. The main features of DRASync are: (a) it amortizes communication among nodes to allow efficient parallel allocation in a global address space; (b) it takes advantage of bulk deallocation and good locality with pointer-based data structures. (c) it supports ownership semantics of regions by nodes akin to reader-writer locks, which makes for a high-level, intuitive synchronization tool in MPI programs, without sacrificing message-passing performance. We evaluate DRASync against a state-of-the-art distributed allocator and find that it produces comparable performance while offering a higher level abstraction to programmers.