Featured Researches

Operating Systems

Automatic Verification of Message-Based Device Drivers

We develop a practical solution to the problem of automatic verification of the interface between device drivers and the OS. Our solution relies on a combination of improved driver architecture and verification tools. It supports drivers written in C and can be implemented in any existing OS, which sets it apart from previous proposals for verification-friendly drivers. Our Linux-based evaluation shows that this methodology amplifies the power of existing verification tools in detecting driver bugs, making it possible to verify properties beyond the reach of traditional techniques.

Read more
Operating Systems

Avoiding Scalability Collapse by Restricting Concurrency

Saturated locks often degrade the performance of a multithreaded application, leading to the so-called scalability collapse problem. This problem arises when a growing number of threads circulating through a saturated lock causes overall application performance to fade or even drop abruptly. It is particularly (but not solely) acute on oversubscribed systems (systems with more threads than available hardware cores). In this paper, we introduce GCR (generic concurrency restriction), a mechanism that aims to avoid scalability collapse. GCR, designed as a generic, lock-agnostic wrapper, intercepts lock acquisition calls and decides when threads are allowed to proceed with the acquisition of the underlying lock. Furthermore, we present GCR-NUMA, a non-uniform memory access (NUMA)-aware extension of GCR that strives to ensure that the threads allowed to acquire the lock run on the same socket. An extensive evaluation that includes more than two dozen locks, three machines, and three benchmarks shows that GCR brings substantial speedup (in many cases, up to three orders of magnitude) under contention and growing thread counts, while introducing nearly negligible slowdown when the underlying lock is not contended. GCR-NUMA brings even larger performance gains, starting at even lighter lock contention.
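
As a rough illustration of the idea (not the paper's GCR implementation), the sketch below shows a lock-agnostic wrapper that admits only a bounded number of threads to the underlying lock while the rest wait passively; the threshold, yielding policy, and initializer are assumptions made for brevity.

    /* A rough sketch of a concurrency-restricting wrapper, not the paper's
     * GCR implementation. At most MAX_ACTIVE threads are admitted to the
     * acquisition of the underlying lock; the rest wait passively. */
    #include <pthread.h>
    #include <sched.h>
    #include <stdatomic.h>

    #define MAX_ACTIVE 4                  /* hypothetical activation threshold */

    typedef struct {
        pthread_mutex_t inner;            /* any underlying lock could be used */
        atomic_int      active;           /* threads currently past the gate   */
    } gcr_lock_t;

    #define GCR_LOCK_INIT { PTHREAD_MUTEX_INITIALIZER, 0 }

    void gcr_lock(gcr_lock_t *l)
    {
        /* Gate: admit at most MAX_ACTIVE threads to the real acquisition. */
        for (;;) {
            int a = atomic_load(&l->active);
            if (a < MAX_ACTIVE &&
                atomic_compare_exchange_weak(&l->active, &a, a + 1))
                break;
            sched_yield();                /* passive waiters stay off the lock */
        }
        pthread_mutex_lock(&l->inner);
    }

    void gcr_unlock(gcr_lock_t *l)
    {
        pthread_mutex_unlock(&l->inner);
        atomic_fetch_sub(&l->active, 1);  /* open a slot for a waiting thread */
    }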

Read more
Operating Systems

Aware: Controlling App Access to I/O Devices on Mobile Platforms

Smartphones' cameras, microphones, and displays enable users to capture and view memorable moments of their lives. However, adversaries can trick users into authorizing malicious apps that exploit weaknesses in current mobile platforms to misuse such on-board I/O devices, stealthily capturing photos, videos, and screen content without the users' consent. Contemporary mobile operating systems fail to prevent such misuse of I/O devices by authorized apps because there is no binding between users' interactions and the accesses to I/O devices performed by these apps. In this paper, we propose Aware, a security framework for authorizing app requests to perform operations using I/O devices, which binds app requests with user intentions to make all uses of certain I/O devices explicit. We evaluate our defense mechanisms through laboratory-based experimentation and a user study involving 74 human subjects, whose ability to identify undesired operations targeting I/O devices increased significantly. Without Aware, only 18% of the participants were able to identify attacks from the tested RAT apps. Aware systematically blocks all the attacks in the absence of user consent and supports users in identifying 82% of the social-engineering attacks tested to hijack approved requests, including more sophisticated forms of social engineering not yet present in available RATs. Aware introduces a maximum performance overhead of only 4.79% on operations targeting I/O devices. Aware shows that a combination of system defenses and user-interface design can significantly strengthen protections for controlling the use of on-board I/O devices.
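
To illustrate the binding idea in the abstract sense (this is not Aware's actual interface), the sketch below authorizes an I/O request only if it can be tied to a recent, user-initiated interaction with the requesting app; the structure names and the freshness window are assumptions.

    /* An illustrative sketch of the binding idea, not Aware's actual
     * interface: an I/O request is authorized only if it can be tied to a
     * recent, user-initiated interaction with the same app. The structure
     * names and the freshness window are assumptions. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define INTERACTION_WINDOW_MS 500     /* hypothetical freshness bound */

    struct user_interaction {
        uint32_t app_id;                  /* app the user interacted with       */
        uint32_t widget_id;               /* e.g., an on-screen "record" button */
        uint64_t timestamp_ms;
    };

    struct io_request {
        uint32_t app_id;                  /* app asking for camera/microphone   */
        uint32_t device_id;
        uint64_t issued_ms;
    };

    bool authorize(const struct io_request *req,
                   const struct user_interaction *ui)
    {
        /* Bind the request to the user's intention: same app, and the request
         * follows the interaction within a short window. */
        return ui != NULL &&
               ui->app_id == req->app_id &&
               req->issued_ms >= ui->timestamp_ms &&
               req->issued_ms - ui->timestamp_ms <= INTERACTION_WINDOW_MS;
    }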

Read more
Operating Systems

BPF for storage: an exokernel-inspired approach

The overhead of the kernel storage path accounts for half of the access latency for new NVMe storage devices. We explore using BPF to reduce this overhead by injecting user-defined functions deep in the kernel's I/O processing stack. When issuing a series of dependent I/O requests, this approach can increase IOPS by over 2.5x and cut latency in half by bypassing kernel layers and avoiding user-kernel boundary crossings. However, bypassing the file system and block layer must not sacrifice important properties such as the file system's safety guarantees and the translation between physical block addresses and file offsets. We sketch potential solutions to these problems, inspired by exokernel file systems from the late 90s, whose time, we believe, has finally come!
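
The sketch below is purely illustrative of the kind of user-defined logic the paper proposes pushing into the kernel: a resubmission function that, given a just-read B-tree node, decides which block to fetch next so that a chain of dependent reads never crosses back into user space. The hook name, node layout, and calling convention are hypothetical.

    /* Illustrative only: the kind of user-defined resubmission logic that
     * could be injected deep in the kernel I/O stack. The hook name, node
     * layout, and calling convention are hypothetical. Returning a block
     * number asks the lower layer to reissue the read; returning 0 completes
     * the request and delivers the current block to the application. */
    #include <stdint.h>
    #include <string.h>

    struct btree_node {
        uint32_t nkeys;
        uint64_t keys[64];
        uint64_t children[65];            /* child block numbers */
        uint8_t  is_leaf;
    };

    uint64_t resubmit_hook(const void *block, uint64_t target_key)
    {
        struct btree_node node;
        memcpy(&node, block, sizeof(node));

        if (node.is_leaf)
            return 0;                     /* reached a leaf: stop chaining */

        uint32_t i = 0;
        while (i < node.nkeys && target_key >= node.keys[i])
            i++;
        return node.children[i];          /* next dependent read, issued
                                             without a user-kernel crossing */
    }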

Read more
Operating Systems

BRAVO -- Biased Locking for Reader-Writer Locks

Designers of modern reader-writer locks confront a difficult trade-off related to reader scalability. Locks that have a compact memory representation for active readers will typically suffer under high-intensity, read-dominated workloads when the "reader indicator" state is updated frequently by a diverse set of threads, causing cache invalidation and coherence traffic. Other designs, such as cohort reader-writer locks, use distributed reader indicators, one per NUMA node. This improves reader-reader scalability but also increases the size of each lock instance. We propose a simple transformation, BRAVO, that augments any existing reader-writer lock, adding just two integer fields to the lock instance. Readers make their presence known to writers by hashing their thread's identity with the lock address, forming an index into a visible readers table. Readers attempt to install the lock address into that element of the table, making their existence known to potential writers. All locks and threads in an address space can share the visible readers table. Updates by readers tend to be diffused over the table, resulting in a NUMA-friendly design. Crucially, readers of the same lock tend to write to different locations in the table, reducing coherence traffic. Specifically, BRAVO allows a simple compact lock to be augmented so as to provide scalable concurrent reading with only a modest increase in footprint.
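
A minimal sketch of the reader fast path described above, assuming one global visible-readers table and a simple hash of thread identity and lock address; the writer-side revocation of the read bias and the policy for re-enabling it are omitted.

    /* A sketch of the reader fast path, assuming one global visible-readers
     * table shared by all locks. Writer-side revocation of the read bias and
     * the policy for re-enabling it are omitted. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stddef.h>
    #include <stdint.h>

    #define VR_TABLE_SIZE 4096

    static _Atomic(uintptr_t) visible_readers[VR_TABLE_SIZE];

    typedef struct {
        pthread_rwlock_t underlying;      /* any existing reader-writer lock */
        atomic_int       read_bias;       /* is the fast path enabled?        */
    } bravo_lock_t;

    static _Thread_local char thread_marker;  /* its address serves as thread identity */

    static size_t vr_slot(bravo_lock_t *l)
    {
        /* Mix the thread identity with the lock address so that readers of
         * the same lock land in different slots, reducing coherence traffic. */
        uintptr_t h = (uintptr_t)&thread_marker ^ (uintptr_t)l;
        h ^= h >> 17;
        return h % VR_TABLE_SIZE;
    }

    int bravo_read_lock(bravo_lock_t *l, size_t *slot_out)
    {
        if (atomic_load(&l->read_bias)) {
            size_t s = vr_slot(l);
            uintptr_t empty = 0;
            /* Publish this reader by installing the lock address. */
            if (atomic_compare_exchange_strong(&visible_readers[s], &empty,
                                               (uintptr_t)l)) {
                /* Re-check: a writer may have revoked the bias meanwhile. */
                if (atomic_load(&l->read_bias)) {
                    *slot_out = s;
                    return 0;             /* fast path: underlying lock untouched */
                }
                atomic_store(&visible_readers[s], 0);
            }
        }
        *slot_out = (size_t)-1;
        return pthread_rwlock_rdlock(&l->underlying);   /* slow path */
    }

    void bravo_read_unlock(bravo_lock_t *l, size_t slot)
    {
        if (slot != (size_t)-1)
            atomic_store(&visible_readers[slot], 0);    /* fast-path readers */
        else
            pthread_rwlock_unlock(&l->underlying);
    }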

Read more
Operating Systems

Barrier Enabled IO Stack for Flash Storage

This work is dedicated to eliminating the overhead of guaranteeing the storage order in the modern IO stack. Existing block devices resort to a prohibitively expensive mechanism to ensure the storage order among write requests: interleaving successive write requests with a transfer and a flush. Exploiting the cache barrier command of Flash storage, we overhaul the IO scheduler, the dispatch module, and the filesystem so that these layers are orchestrated to preserve the ordering constraints imposed by the application all the way to the storage device. The key ingredients of the Barrier Enabled IO Stack are epoch-based IO scheduling, order-preserving dispatch, and dual-mode journaling. The Barrier Enabled IO Stack eliminates the root cause of the excessive overhead in enforcing the storage order. Dual-mode journaling in BarrierFS dedicates separate threads to effectively decouple the control plane and data plane of the journal commit. We implement the Barrier Enabled IO Stack on a server as well as on a mobile platform. SQLite performance increases by 270% and 75% on the server and on the smartphone, respectively. When the durability of a transaction is relaxed, SQLite and MySQL performance increases by as much as 73x and 43x, respectively, on server storage.
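
To make the application-visible contrast concrete, the sketch below compares the conventional transfer-and-flush pattern with an order-only barrier, shown here as a hypothetical fbarrier() call guarded by a feature macro; the exact interface exposed by the barrier-enabled stack may differ.

    /* A sketch of the application-visible contrast, not the paper's code.
     * Today, ordering the data write after the journal record requires a
     * full fsync(), which also forces durability; an order-only primitive
     * (shown as a hypothetical fbarrier() behind a feature macro) would let
     * the barrier-enabled layers preserve the order without waiting for the
     * transfer and flush. */
    #include <unistd.h>

    void journaled_update(int journal_fd, int data_fd,
                          const char *rec, size_t rec_len,
                          const char *data, size_t data_len)
    {
        if (write(journal_fd, rec, rec_len) < 0)
            return;
        fsync(journal_fd);                /* expensive: waits for transfer + flush */
        if (write(data_fd, data, data_len) < 0)
            return;
    }

    #ifdef BARRIER_ENABLED_STACK
    void journaled_update_ordered(int journal_fd, int data_fd,
                                  const char *rec, size_t rec_len,
                                  const char *data, size_t data_len)
    {
        if (write(journal_fd, rec, rec_len) < 0)
            return;
        fbarrier(journal_fd);             /* hypothetical: order only, no waiting */
        if (write(data_fd, data, data_len) < 0)
            return;
    }
    #endif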

Read more
Operating Systems

Blocking time under basic priority inheritance: Polynomial bound and exact computation

The Priority Inheritance Protocol (PIP) is arguably the best-known protocol for resource sharing under real-time constraints, and its importance in modern applications is undisputed. Nevertheless, because jobs may be blocked under PIP for a variety of reasons, determining a job's maximum blocking time can be difficult, and thus far no exact method for computing it has been proposed. Existing analysis methods are inefficient, inaccurate, and of limited applicability. This article proposes a new characterization of the problem that yields a polynomial method for bounding the blocking time and an exact, optimally efficient method for computing the blocking time under priority inheritance, both with general applicability.
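
For readers unfamiliar with the protocol itself, the sketch below illustrates the basic priority-inheritance rule being analysed: a lock holder temporarily inherits the priority of any higher-priority job it blocks. The job and resource structures are illustrative only and are not the article's analysis model.

    /* A minimal sketch of the basic priority-inheritance rule: when a job
     * blocks on a resource, the holder temporarily inherits the blocked
     * job's priority (if higher) and drops it on release. Larger numbers
     * mean higher priority; the structures are illustrative only. */
    #include <stddef.h>

    typedef struct job {
        int base_prio;                    /* assigned priority                */
        int cur_prio;                     /* effective priority (inheritance) */
    } job_t;

    typedef struct resource {
        job_t *holder;                    /* NULL if the resource is free     */
    } resource_t;

    void request(job_t *j, resource_t *r)
    {
        if (r->holder == NULL) {
            r->holder = j;                /* uncontended: take the resource   */
        } else if (r->holder->cur_prio < j->cur_prio) {
            r->holder->cur_prio = j->cur_prio;  /* holder inherits priority   */
            /* ...j then blocks until r is released... */
        }
    }

    void release(job_t *j, resource_t *r)
    {
        r->holder = NULL;
        /* Drop the inherited priority; a full implementation would recompute
         * it from the resources j still holds and the jobs still blocked. */
        j->cur_prio = j->base_prio;
    }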

Read more
Operating Systems

Boomerang: Real-Time I/O Meets Legacy Systems

This paper presents Boomerang, an I/O system that integrates a legacy non-real-time OS with one that is customized for timing-sensitive tasks. A relatively small RTOS benefits from the pre-existing libraries, drivers, and services of the legacy system. Additionally, timing-critical tasks are isolated from less critical tasks by securely partitioning machine resources among the separate OSes. Boomerang guarantees end-to-end processing delays on input data whose outputs must be generated within specific time bounds. We show how to construct composable task pipelines in Boomerang that combine functionality spanning a custom RTOS and a legacy Linux system. By dedicating time-critical I/O to the RTOS, we ensure that the complementary services provided by Linux are sufficiently predictable to meet end-to-end service guarantees. While Boomerang benefits from spatial isolation, it also outperforms a standalone Linux system that uses deadline-based CPU reservations for pipeline tasks. We also show how Boomerang outperforms ACRN, a virtualized system designed for automotive use.

Read more
Operating Systems

Browsix: Bridging the Gap Between Unix and the Browser

Applications written to run on conventional operating systems typically depend on OS abstractions like processes, pipes, signals, sockets, and a shared file system. Porting these applications to the web currently requires extensive rewriting or hosting significant portions of code server-side because browsers present a nontraditional runtime environment that lacks OS functionality. This paper presents Browsix, a framework that bridges the considerable gap between conventional operating systems and the browser, enabling unmodified programs expecting a Unix-like environment to run directly in the browser. Browsix comprises two core parts: (1) a JavaScript-only system that makes core Unix features (including pipes, concurrent processes, signals, sockets, and a shared file system) available to web applications; and (2) extended JavaScript runtimes for C, C++, Go, and Node.js that support running programs written in these languages as processes in the browser. Browsix supports running a POSIX shell, making it straightforward to connect applications together via pipes. We illustrate Browsix's capabilities via case studies that demonstrate how it eases porting legacy applications to the browser and enables new functionality. We demonstrate a Browsix-enabled LaTeX editor that operates by executing unmodified versions of pdfLaTeX and BibTeX. This browser-only LaTeX editor can render documents in seconds, making it fast enough to be practical. We further demonstrate how Browsix lets us port a client-server application to run entirely in the browser for disconnected operation. Creating these applications required less than 50 lines of glue code and no code modifications, demonstrating how easily Browsix can be used to build sophisticated web applications from existing parts without modification.
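
For a flavor of the kind of unmodified POSIX code Browsix targets, the example below connects a parent and a child process with a pipe; nothing in it is Browsix-specific, which is exactly the point: the same source relies only on standard Unix abstractions that Browsix recreates in the browser.

    /* A tiny example of the kind of unmodified POSIX code Browsix targets:
     * a parent and a child process connected by a pipe. It relies only on
     * standard Unix abstractions (processes, pipes, file descriptors) and
     * contains nothing Browsix-specific. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fds[2];
        if (pipe(fds) != 0)
            return 1;

        if (fork() == 0) {                /* child: write one message */
            close(fds[0]);
            const char *msg = "hello from a child process\n";
            ssize_t w = write(fds[1], msg, strlen(msg));
            (void)w;
            return 0;
        }

        close(fds[1]);                    /* parent: read and print it */
        char buf[128];
        ssize_t n = read(fds[0], buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            fputs(buf, stdout);
        }
        wait(NULL);
        return 0;
    }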

Read more
Operating Systems

Building Resilient Cloud Over Unreliable Commodity Infrastructure

Cloud computing has emerged as a successful computing paradigm for efficiently utilizing managed compute infrastructure such as high-speed rack-mounted servers connected with high-speed networking and reliable storage. Such infrastructure is usually dedicated, physically secured, and backed by reliable power and networking. However, much of our idle compute capacity resides in unmanaged infrastructure such as idle desktops, lab machines, physically distant server machines, and laptops. We present a scheme to utilize this idle compute capacity on a best-effort basis and provide high availability even in the face of failure of individual components or facilities. We run virtual machines on the commodity infrastructure and present a cloud interface to our end users. The primary challenge is to maintain availability in the presence of node failures, network failures, and power failures. We run multiple copies of a Virtual Machine (VM) redundantly on geographically dispersed physical machines to achieve availability; if one running copy of a VM fails, we seamlessly switch over to another running copy. We use VM record/replay to implement this redundancy and switchover. So far, we have implemented VM record/replay for uniprocessor machines over Linux/KVM and are currently working on VM record/replay for shared-memory multiprocessor machines. We report initial experimental results based on our implementation.

Read more
