Gabriel Parmer
George Washington University
Publication
Featured research published by Gabriel Parmer.
Real-Time Systems Symposium | 2008
Gabriel Parmer; Richard West
This paper presents the design of user-level scheduling hierarchies in the Composite component-based system. The motivation for this is centered around the design of a system that is both dependable and predictable, and which is configurable to the needs of specific applications. Untrusted application developers can safely develop services and policies that are isolated in protection domains outside the kernel. To ensure predictability, Composite needs to enforce timing control over user-space services. Moreover, it must provide a means by which asynchronous events, such as interrupts, are handled in a timely manner without jeopardizing the system. Towards this end, we describe the features of Composite that allow user-defined scheduling policies to be composed for the purposes of combined interrupt and task management. A significant challenge arises from the need to synchronize access to shared data structures (e.g., scheduling queues) without allowing untrusted code to disable interrupts or use atomic instructions that lock the memory bus. Additionally, efficient upcall mechanisms are needed to deliver asynchronous event notifications in accordance with policy-specific priorities, without undue recourse to schedulers. We show how these issues are addressed in Composite by comparing several hierarchies of scheduling policies that manage both tasks and the interrupts on which they depend. Studies show how it is possible to implement guaranteed differentiated services as part of the handling of I/O requests from a network device while avoiding livelock. Microbenchmarks indicate that the costs of implementing and invoking user-level schedulers in Composite are on par with, or less than, those in other systems, with thread switches more than twice as fast as in Linux.
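To picture the synchronization constraint described above, here is a minimal user-space sketch in the style of a restartable region: rather than disabling interrupts or using bus-locking atomic instructions, the critical section's bounds would be registered with the kernel, which rolls a preempted thread back to the section's start. All names are illustrative and are not Composite's actual API.

```c
/* ras_sketch.c: a hypothetical sketch of a "restartable" critical
 * section protecting a user-level scheduler's run queue.  Nothing here
 * is Composite's real interface; it only illustrates the idea. */
#include <stddef.h>
#include <stdio.h>

struct thread {
    int            id;
    struct thread *next;
};

static struct thread *runqueue_head;

/* In a real system the section's bounds would be registered with the
 * kernel, which restarts any thread preempted between them; here the
 * registration is a stub so the sketch stays self-contained. */
static void ras_register(void *start, void *end) { (void)start; (void)end; }

static struct thread *sched_dequeue(void)
{
    struct thread *t;
    /* --- begin restartable section: preemption-atomic, no bus locks --- */
    t = runqueue_head;
    if (t) runqueue_head = t->next;
    /* --- end restartable section --- */
    return t;
}

int main(void)
{
    struct thread a = { 1, NULL }, b = { 2, &a };
    runqueue_head = &b;
    ras_register(NULL, NULL);   /* placeholder: no kernel support here */
    for (struct thread *t; (t = sched_dequeue()) != NULL; )
        printf("dequeued thread %d\n", t->id);
    return 0;
}
```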
Real-Time Technology and Applications Symposium | 2011
Gabriel Parmer; Richard West
This paper presents HiRes, a system structured around predictable, hierarchical resource management (HRM). Applications and different subsystems use customized resource managers that control the allocation and usage of memory, CPU, and I/O. This increased resource-management flexibility enables subsystems with different timing constraints to specialize resource management around meeting those requirements. In HiRes, subsystems delegate the management of resources to other subsystems, thus creating the resource management hierarchy. In delegating the control of resources, the subsystem focuses on providing isolation between competing subsystems. To make HRM both predictable and efficient, HiRes ensures that, regardless of a subsystem's depth in the hierarchy, the overheads of resource usage and control remain constant. In doing so, HiRes encourages HRM as a fundamental system design technique. Results show that HiRes has competitive performance with existing systems, and that HRM naturally provides both strong isolation guarantees and flexible, efficient subsystem control over resources.
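A toy sketch of the delegation pattern described above, under the assumption that each resource is charged directly to its owning manager, so the cost of an allocation does not grow with depth in the hierarchy. The types and names are hypothetical, not HiRes's interface.

```c
/* Hypothetical sketch of hierarchical resource delegation: a parent
 * manager grants part of its budget to a child subsystem, and charges
 * against a manager touch only that manager (constant overhead). */
#include <stdio.h>

struct rmgr {
    const char  *name;
    struct rmgr *parent;   /* delegation hierarchy */
    long         budget;   /* e.g., memory pages this manager controls */
};

/* Delegate part of a parent's budget to a child subsystem. */
static int delegate(struct rmgr *parent, struct rmgr *child, long amount)
{
    if (parent->budget < amount) return -1;
    parent->budget -= amount;
    child->budget  += amount;
    child->parent   = parent;
    return 0;
}

/* Charging touches only the owning manager, independent of depth. */
static int charge(struct rmgr *m, long amount)
{
    if (m->budget < amount) return -1;
    m->budget -= amount;
    return 0;
}

int main(void)
{
    struct rmgr root = { "root", NULL, 1024 }, vm = { "vm", NULL, 0 };
    delegate(&root, &vm, 256);
    if (charge(&vm, 16) == 0)
        printf("vm now holds %ld pages\n", vm.budget);
    return 0;
}
```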
International Conference on Cluster Computing | 2004
Xin Qi; Gabriel Parmer; Richard West
Cluster computing environments built from commodity hardware have provided a cost-effective solution for many scientific and high-performance applications. Likewise, middleware techniques have provided the basis for large-scale applications to communicate and exchange data across the various end-hosts in a distributed system. Unfortunately, middleware services are typically encapsulated in user-level address spaces that suffer from scheduling delays and communication overheads induced by the host kernel. For various high performance distributed computing applications such overheads are unacceptable. This work therefore addresses the problem of providing an efficient end-host architecture to support application-specific communication services at user-level, without the need to explicitly schedule such services or copy data via the kernel. We briefly describe a sandboxing mechanism that allows applications to configure and deploy services at user-level that may execute in the context of any address space. Using Linux as the basis for our approach, we focus specifically on the implementation of a user-space network protocol stack that avoids copying data via the kernel when communicating with the network interface. Our approach enables services to efficiently process and forward data via proxies, or intermediate hosts, in the communication path of high performance data streams. Unlike other user-level networking implementations, our method makes no special hardware requirements. Results show that we achieve a substantial increase in throughput, and a reduction in jitter, over comparable user-space communication methods.
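The kernel-copy avoidance described above typically rests on shared-memory descriptor rings between the user-level protocol stack and the network interface. Below is a minimal single-producer/single-consumer ring of that general kind; it is an illustrative sketch, not the paper's implementation.

```c
/* A minimal SPSC descriptor ring: the producer (e.g., the NIC or its
 * driver) publishes packet descriptors, and the user-level stack
 * consumes them in place, so payloads are never copied via the kernel. */
#include <stdint.h>
#include <stdio.h>

#define RING_SLOTS 8   /* power of two */

struct pkt_desc { void *buf; uint32_t len; };

struct ring {
    volatile uint32_t head, tail;   /* producer and consumer cursors */
    struct pkt_desc   slot[RING_SLOTS];
};

/* Producer side. */
static int ring_put(struct ring *r, struct pkt_desc d)
{
    if (r->head - r->tail == RING_SLOTS) return -1;   /* full */
    r->slot[r->head % RING_SLOTS] = d;
    __sync_synchronize();           /* publish the slot before the cursor */
    r->head++;
    return 0;
}

/* Consumer side (the user-level protocol stack). */
static int ring_get(struct ring *r, struct pkt_desc *d)
{
    if (r->tail == r->head) return -1;                /* empty */
    *d = r->slot[r->tail % RING_SLOTS];
    __sync_synchronize();           /* read the slot before releasing it */
    r->tail++;
    return 0;
}

int main(void)
{
    static struct ring rx;
    char payload[] = "frame";
    struct pkt_desc d;

    ring_put(&rx, (struct pkt_desc){ payload, sizeof payload });
    if (ring_get(&rx, &d) == 0)
        printf("consumed %u bytes with no kernel copy\n", d.len);
    return 0;
}
```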
Embedded Software | 2012
Gedare Bloom; Gabriel Parmer; Bhagirath Narahari; Rahul Simha
Hardware support can reduce the time spent operating on data structures by exploiting circuit-level parallelism. Such hardware data structures (HWDSs) can reduce the latency and jitter of data structure operations, which can benefit real-time systems by reducing worst-case execution times (WCETs). For example, a hardware priority queue (HWPQ) can enqueue and dequeue prioritized items in constant time with low variance; the best software implementations take logarithmic time for at least one of the enqueue or dequeue operations. The main problems with HWDSs are the limited size of hardware and the complexity of sharing it. In this paper we show that software support can help circumvent the size and sharing limitations of hardware so that applications can benefit from an HWDS. We evaluate our work by showing how the choice of software or hardware affects the schedulability of task sets that use multiple priority queues of varying sizes. We model task behavior on two applications that are important in real-time and embedded domains: the grey-weighted distance transform for topology mapping and Dijkstra's algorithm for GPS navigation. Our results indicate that HWDSs can reduce the WCET of applications even when an HWDS is shared by multiple data structures or when data structure sizes exceed HWDS size constraints.
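A sketch of the size-limit workaround the paper evaluates: enqueue into the hardware queue while it has room, spill into a software structure otherwise, and compare the two heads at dequeue time. The hardware side is emulated here with a small sorted array (a real HWPQ would perform these operations in constant time), and the whole interface is hypothetical.

```c
/* Hypothetical HWPQ-with-software-spill sketch.  HWPQ_CAP models the
 * hardware's fixed capacity; overflow items live in a software area. */
#include <stdio.h>

#define HWPQ_CAP 4   /* assumed fixed hardware capacity */

static int hw[HWPQ_CAP], hw_n;   /* emulates the constant-time HWPQ */
static int sw[64],       sw_n;   /* software overflow structure */

/* Keep each array sorted ascending; a real HWPQ does this in O(1). */
static void insert_sorted(int *a, int *n, int key)
{
    int i = (*n)++;
    while (i > 0 && a[i - 1] > key) { a[i] = a[i - 1]; i--; }
    a[i] = key;
}

/* Use hardware while it has room; otherwise spill to software. */
static void pq_enqueue(int key)
{
    if (hw_n < HWPQ_CAP) insert_sorted(hw, &hw_n, key);
    else                 insert_sorted(sw, &sw_n, key);
}

/* The global minimum is the smaller of the two heads. */
static int pq_dequeue(void)
{
    int *a, *n;
    if (hw_n && (!sw_n || hw[0] <= sw[0])) { a = hw; n = &hw_n; }
    else                                   { a = sw; n = &sw_n; }
    int key = a[0];
    for (int i = 1; i < *n; i++) a[i - 1] = a[i];
    (*n)--;
    return key;
}

int main(void)
{
    int keys[] = { 7, 3, 9, 1, 5, 2 };   /* more items than HWPQ_CAP */
    for (int i = 0; i < 6; i++) pq_enqueue(keys[i]);
    while (hw_n + sw_n) printf("%d ", pq_dequeue());
    printf("\n");                        /* prints: 1 2 3 5 7 9 */
    return 0;
}
```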
Real-Time Technology and Applications Symposium | 2014
Qi Wang; Gabriel Parmer
With the increasing use of multi- and many-core processors in real-time and embedded systems, software's ability to utilize those cores to increase system capability and functionality is important. Of particular interest is intra-task parallelism, whereby a single task is able to harness the computational power of multiple cores to do processing of a complexity that is untenable on a single core. This paper introduces the design and implementation of FJOS, a system supporting predictable and efficient fork/join intra-task parallelism. FJOS is implemented using abstractions that are close to the hardware, and decouples parallelism management from thread coordination, yielding efficient fast-path operations. Compared to a traditional fork/join implementation, results show that FJOS has less overhead, is more scalable up to 40 cores, and can generally make better use of parallelism. We modify a response-time analysis to integrate system overheads to assess schedulability in a hard real-time environment, and design an effective algorithm for assigning task computation to cores. This assignment more than triples effective system utilization, and when implementation overheads are considered, FJOS maintains high system utilization, thus providing a strong foundation for predictable, real-time intra-task parallelism.
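For reference, the fork/join pattern the paper targets looks roughly like the conventional pthreads rendition below; FJOS provides its own lower-level primitives close to the hardware, so this sketches only the programming model, not FJOS's API.

```c
/* Classic fork/join: split an iteration space across workers (fork),
 * then wait for all of them and combine partial results (join). */
#include <pthread.h>
#include <stdio.h>

#define WORKERS 4
#define N 1000000L

static long long partial[WORKERS];

static void *worker(void *arg)
{
    long id = (long)arg;
    long long sum = 0;
    for (long i = id; i < N; i += WORKERS) sum += i;   /* strided slice */
    partial[id] = sum;
    return NULL;
}

int main(void)
{
    pthread_t tid[WORKERS];
    long long total = 0;

    /* fork: one worker per slice of the iteration space */
    for (long i = 0; i < WORKERS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);

    /* join: wait for every slice, then combine the partial results */
    for (long i = 0; i < WORKERS; i++) {
        pthread_join(tid[i], NULL);
        total += partial[i];
    }
    printf("sum = %lld\n", total);   /* equals N*(N-1)/2 */
    return 0;
}
```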
IEEE Transactions on Software Engineering | 2012
Gabriel Parmer; Richard West
As software systems become increasingly complex, the likelihood of faults and unexpected behaviors naturally increases. Today, everything from mobile devices to large-scale servers features many millions of lines of code. Compile-time checks and offline verification methods are unlikely to capture all system states and control flow interactions of a running system. For this reason, many researchers have developed methods to contain faults at runtime by using software- and hardware-based techniques to define protection domains. However, these approaches tend to impose isolation boundaries on software components that are static, and thus remain intact while the system is running. An unfortunate consequence of statically structured protection domains is that they may impose undue overhead on the communication between separate components. This paper proposes a new runtime technique that trades communication cost for fault isolation. We describe Mutable Protection Domains (MPD) in the context of our Composite operating system. MPD dynamically adapts hardware isolation between interacting software components, depending on observed communication “hot-paths,” with the purpose of maximizing fault isolation where possible. In this sense, MPD naturally tends toward a system of maximal component isolation, while collapsing protection domains where costs are prohibitive. By increasing isolation for low-cost interacting components, MPD limits the scope of impact of future unexpected faults. We demonstrate the utility of MPD using a webserver, and identify different hot-paths for different workloads that dictate adaptations to system structure. Experiments show up to 40 percent improvement in throughput compared to a statically organized system, while maintaining high fault isolation.
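One way to picture the adaptation loop is the hypothetical policy below: sample invocation counts across each protection boundary, collapse the boundary on hot paths, and restore it when traffic cools. The thresholds are invented, and merge_domains()/split_domains() stand in for the kernel operations MPD would invoke.

```c
/* Hypothetical MPD-style policy sketch, not Composite's real code. */
#include <stdio.h>

struct edge {
    const char *client, *server;
    unsigned    invocations;   /* observed this sampling period */
    int         merged;        /* 1 if the boundary is collapsed */
};

#define HOT_THRESHOLD  10000
#define COLD_THRESHOLD 1000

static void merge_domains(struct edge *e) { e->merged = 1; }
static void split_domains(struct edge *e) { e->merged = 0; }

/* Run once per sampling period over every inter-component edge. */
static void mpd_policy(struct edge *e)
{
    if (!e->merged && e->invocations > HOT_THRESHOLD)
        merge_domains(e);   /* trade isolation for cheap invocations */
    else if (e->merged && e->invocations < COLD_THRESHOLD)
        split_domains(e);   /* restore fault isolation on a cold path */
    e->invocations = 0;     /* start a fresh period */
}

int main(void)
{
    struct edge e = { "http_parser", "content_cache", 25000, 0 };
    mpd_policy(&e);
    printf("%s->%s merged: %d\n", e.client, e.server, e.merged);
    return 0;
}
```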
European Conference on Computer Systems | 2016
Qi Wang; Timothy Stamler; Gabriel Parmer
As systems continue to increase the number of cores within cache coherency domains, traditional techniques for enabling parallel computation on data-structures are increasingly strained. A single contended cache-line bouncing between different caches can prohibit continued performance gains with additional cores. New abstractions and mechanisms are required to reassess how data-structure consistency can be provided, while maintaining stable per-core access latencies. This paper presents the Parallel Sections (ParSec) abstraction for mediating access to shared data-structures. Fundamental to the approach is a new form of scalable memory reclamation that leverages fast local access to real time to globally order system events. This approach attempts to minimize coherency traffic, while harnessing the benefit of shared read-mostly cache-lines. We show that the co-management of scalable memory reclamation, memory allocation, locking, and namespace management enables scalable system service implementation. We apply ParSec to both memcached and virtual memory management in a microkernel, and find order-of-magnitude performance increases on a four-socket, 40-core machine, and 30x lower 99th-percentile latencies for virtual memory management.
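The reclamation idea lends itself to a compact sketch: each core publishes the time at which it entered its current parallel section, and a removed node may be freed once every core's published entry time is later than the node's removal time. The code below emulates synchronized timestamps with a counter and is illustrative only, not ParSec's implementation.

```c
/* Time-stamp-based quiescence sketch: per-core entry times decide when
 * a logically removed node is safe to physically free. */
#include <stdint.h>
#include <stdio.h>

#define NCORES 4
#define IDLE   UINT64_MAX   /* core not inside a parallel section */

static volatile uint64_t enter_time[NCORES] = { IDLE, IDLE, IDLE, IDLE };
static uint64_t now;   /* stand-in for a synchronized clock (e.g., TSC) */

static void parsec_enter(int core) { enter_time[core] = ++now; }
static void parsec_exit (int core) { enter_time[core] = IDLE; }

/* A node removed at time t is safe to free once no core's current
 * section began at or before t. */
static int quiesced(uint64_t removal_time)
{
    for (int c = 0; c < NCORES; c++)
        if (enter_time[c] <= removal_time) return 0;
    return 1;
}

int main(void)
{
    parsec_enter(0);
    uint64_t removal = ++now;   /* node unlinked here */
    printf("safe now? %d\n", quiesced(removal));   /* 0: core 0 still in */
    parsec_exit(0);
    parsec_enter(0);            /* core 0 re-entered after the removal */
    printf("safe now? %d\n", quiesced(removal));   /* 1: all post-removal */
    return 0;
}
```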
Real-Time Technology and Applications Symposium | 2015
Qi Wang; Yuxin Ren; Matt Scaperoth; Gabriel Parmer
Multi- and many-core systems are increasingly prevalent in embedded systems. Additionally, isolation requirements between different partitions and criticalities are gaining in importance. This difficult combination is not well addressed by current software systems. Parallel systems require consistency guarantees on shared data-structures, often provided by locks that use predictable resource sharing protocols. However, as the number of cores increases, even a single shared cache-line (e.g., for the lock) can cause significant interference. In this paper, we present a clean-slate design of the SPeCK kernel, the next generation of our Composite OS, that attempts to provide a strong version of scalable predictability: predictability bounds established on a single core remain constant with an increase in cores. Results show that, despite using a non-preemptive kernel, SPeCK has strong scalable predictability and low average-case overheads, and demonstrates better response times than a state-of-the-art preemptive system.
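The single-cache-line concern above is the classic false-sharing problem. The sketch below shows the standard mitigation: pad per-core state to cache-line boundaries so updates on one core never invalidate a line another core is using. This illustrates the general technique, not SPeCK's kernel design.

```c
/* Per-core counters, each on its own cache line, so increments on one
 * core cause no coherency traffic on any other core's line. */
#include <stdio.h>

#define CACHE_LINE 64   /* assumed line size in bytes */
#define NCORES     40

struct percore {
    unsigned long count;
    char pad[CACHE_LINE - sizeof(unsigned long)];   /* fill the line */
} __attribute__((aligned(CACHE_LINE)));

static struct percore stats[NCORES];

static void tick(int core) { stats[core].count++; }   /* no line bouncing */

int main(void)
{
    tick(0);
    tick(39);
    printf("sizeof(struct percore) = %zu\n", sizeof(struct percore));
    return 0;
}
```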
IEEE Computer Architecture Letters | 2012
Jie Chen; Guru Venkataramani; Gabriel Parmer
Debugging an application for power has a wide array of benefits, ranging from minimizing thermal hotspots to reducing the likelihood of CPU malfunction. In this work, we justify the need for power debugging, and show that performance debugging of a parallel application does not automatically guarantee power balance across multiple cores. We perform experiments and show our results using two case-study benchmarks: Volrend from SPLASH-2 and Bodytrack from PARSEC 1.0.
Real-Time Systems Symposium | 2012
Qi Wang; Jiguo Song; Gabriel Parmer; Andrew Sweeney; Guru Venkataramani
In addition to predictability, both reliability and security are increasingly important for embedded systems. To limit the scope of errant behavior in open and mixed-criticality systems, a common approach is to raise isolation barriers between software components. However, this decentralizes memory management across all system components, with memory often cached in each application so that it is quickly accessible. This paper introduces the TMEM system for increasing memory utilization while optimizing for application end-to-end constraints, such as meeting deadlines. In addition to the traditional spatial multiplexing of memory, TMEM introduces the predictable temporal multiplexing of the memory cached within a system component, and memory scheduling that continually reallocates memory between components to best benefit the system. We find that TMEM is able to maintain the efficiency of caches while also lowering both task tardiness and system memory requirements.
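A hypothetical sketch of the kind of memory-scheduling step described above: periodically move cached pages from the component with the most slack to the most tardy one. The field names and the policy are invented for illustration and are not TMEM's interface.

```c
/* Toy memory-scheduling step: rebalance cached pages toward the
 * component whose tasks are running latest. */
#include <stdio.h>

struct component {
    const char *name;
    long cached_pages;   /* memory currently cached by this component */
    long tardiness_ms;   /* how late its tasks are running */
};

/* One step: take a quantum of pages from the least-pressured
 * component and grant them to the most tardy one. */
static void mem_schedule(struct component *c, int n, long quantum)
{
    struct component *donor = &c[0], *needy = &c[0];
    for (int i = 1; i < n; i++) {
        if (c[i].tardiness_ms < donor->tardiness_ms) donor = &c[i];
        if (c[i].tardiness_ms > needy->tardiness_ms) needy = &c[i];
    }
    if (donor != needy && donor->cached_pages >= quantum) {
        donor->cached_pages -= quantum;
        needy->cached_pages += quantum;
    }
}

int main(void)
{
    struct component comps[] = {
        { "logger", 512, 0 },
        { "sensor", 128, 12 },
    };
    mem_schedule(comps, 2, 64);
    printf("%s: %ld pages, %s: %ld pages\n",
           comps[0].name, comps[0].cached_pages,
           comps[1].name, comps[1].cached_pages);
    return 0;
}
```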