Featured Researches

Operating Systems

Efficient Kernel Object Management for Tiered Memory Systems with KLOC

Software-controlled heterogeneous memory systems have the potential to improve performance, efficiency, and cost tradeoffs in emerging systems. Delivering on this promise requires an efficient operating system (OS) mechanisms and policies for data management. Unfortunately, modern OSes do not support efficient tiering of data between heterogeneous memories. While this problem is known (and is being studied) for application-level data pages, the question of how best to tier OS kernel objects has largely been ignored. We show that careful kernel object management is vital to the performance of software-controlled tiered memory systems. We find that the state-of-the-art OS page management research leaves considerable performance on the table by overlooking how best to tier, migrate, and manage kernel objects like inodes, dentry caches, journal blocks, network socket buffers, etc., associated with the filesystem and networking stack. In response, we characterize hotness, reuse, and liveness properties of kernel objects to develop appropriate tiering/migration mechanisms and policies. We evaluate our proposal using a real-system emulation framework on large-scale workloads like RocksDB, Redis, Cassandra, and Spark and achieve 1.4X to 4X higher throughput compared to the prior art.

Read more
Operating Systems

Efficient Schedulability Test for Dynamic-Priority Scheduling of Mixed-Criticality Real-Time Systems

Systems in many safety-critical application domains are subject to certification requirements. In such a system, there are typically different applications providing functionalities that have varying degrees of criticality. Consequently, the certification requirements for functionalities at these different criticality levels are also varying, with very high levels of assurance required for a highly critical functionality, whereas relatively low levels of assurance required for a less critical functionality. Considering the timing assurance given to various applications in the form of guaranteed budgets within deadlines, a theory of real-time scheduling for such multi-criticality systems has been under development in the recent past. In particular, an algorithm called Earliest Deadline First with Virtual Deadlines (EDF-VD) has shown a lot of promise for systems with two criticality levels, especially in terms of practical performance demonstrated through experiment results. In this paper we design a new schedulability test for EDF-VD that extend these performance benefits to multi-criticality systems. We propose a new test based on demand bound functions and also present a novel virtual deadline assignment strategy. Through extensive experiments we show that the proposed technique significantly outperforms existing strategies for a variety of generic real-time systems.

Read more
Operating Systems

Efficient Synchronization Primitives for GPUs

In this paper, we revisit the design of synchronization primitives---specifically barriers, mutexes, and semaphores---and how they apply to the GPU. Previous implementations are insufficient due to the discrepancies in hardware and programming model of the GPU and CPU. We create new implementations in CUDA and analyze the performance of spinning on the GPU, as well as a method of sleeping on the GPU, by running a set of memory-system benchmarks on two of the most common GPUs in use, the Tesla- and Fermi-class GPUs from NVIDIA. From our results we define higher-level principles that are valid for generic many-core processors, the most important of which is to limit the number of atomic accesses required for a synchronization operation because atomic accesses are slower than regular memory accesses. We use the results of the benchmarks to critique existing synchronization algorithms and guide our new implementations, and then define an abstraction of GPUs to classify any GPU based on the behavior of the memory system. We use this abstraction to create suitable implementations of the primitives specifically targeting the GPU, and analyze the performance of these algorithms on Tesla and Fermi. We then predict performance on future GPUs based on characteristics of the abstraction. We also examine the roles of spin waiting and sleep waiting in each primitive and how their performance varies based on the machine abstraction, then give a set of guidelines for when each strategy is useful based on the characteristics of the GPU and expected contention.

Read more
Operating Systems

Efficient System-Enforced Deterministic Parallelism

Deterministic execution offers many benefits for debugging, fault tolerance, and security. Running parallel programs deterministically is usually difficult and costly, however - especially if we desire system-enforced determinism, ensuring precise repeatability of arbitrarily buggy or malicious software. Determinator is a novel operating system that enforces determinism on both multithreaded and multi-process computations. Determinator's kernel provides only single-threaded, "shared-nothing" address spaces interacting via deterministic synchronization. An untrusted user-level runtime uses distributed computing techniques to emulate familiar abstractions such as Unix processes, file systems, and shared memory multithreading. The system runs parallel applications deterministically both on multicore PCs and across nodes in a cluster. Coarse-grained parallel benchmarks perform and scale comparably to - sometimes better than - conventional systems, though determinism is costly for fine-grained parallel applications.

Read more
Operating Systems

Efficient and Playful Tools to Teach Unix to New Students

Teaching Unix to new students is a common tasks in many higher schools. This paper presents an approach to such course where the students progress autonomously with the help of the teacher. The traditional textbook is complemented with a wiki, and the main thread of the course is a game, in the form of a treasure hunt. The course finishes with a lab exam, where students have to perform practical manipulations similar to the ones performed during the treasure hunt. The exam is graded fully automatically. This paper discusses the motivations and advantages of the approach, and gives an overall view of the tools we developed. The tools are available from the web, and open-source, hence re-usable outside the Ensimag.

Read more
Operating Systems

Efficient, Dynamic Multi-tenant Edge Computation in EdgeOS

In the future, computing will be immersed in the world around us -- from augmented reality to autonomous vehicles to the Internet of Things. Many of these smart devices will offer services that respond in real time to their physical surroundings, requiring complex processing with strict performance guarantees. Edge clouds promise a pervasive computational infrastructure a short network hop away from end devices, but today's operating systems are a poor fit to meet the goals of scalable isolation, dense multi-tenancy, and predictable performance required by these emerging applications. In this paper we present EdgeOS, a micro-kernel based operating system that meets these goals by blending recent advances in real-time systems and network function virtualization. EdgeOS introduces a Featherweight Process model that offers lightweight isolation and supports extreme scalability even under high churn. Our architecture provides efficient communication mechanisms, and low-overhead per-client isolation. To achieve high performance networking, EdgeOS employs kernel bypass paired with the isolation properties of Featherweight Processes. We have evaluated our EdgeOS prototype for running high scale network middleboxes using the Click software router and endpoint applications using memcached. EdgeOS reduces startup latency by 170X compared to Linux processes and over five orders of magnitude compared to containers, while providing three orders of magnitude latency improvement when running 300 to 1000 edge-cloud memcached instances on one server.

Read more
Operating Systems

Elevating commodity storage with the SALSA host translation layer

To satisfy increasing storage demands in both capacity and performance, industry has turned to multiple storage technologies, including Flash SSDs and SMR disks. These devices employ a translation layer that conceals the idiosyncrasies of their mediums and enables random access. Device translation layers are, however, inherently constrained: resources on the drive are scarce, they cannot be adapted to application requirements, and lack visibility across multiple devices. As a result, performance and durability of many storage devices is severely degraded. In this paper, we present SALSA: a translation layer that executes on the host and allows unmodified applications to better utilize commodity storage. SALSA supports a wide range of single- and multi-device optimizations and, because is implemented in software, can adapt to specific workloads. We describe SALSA's design, and demonstrate its significant benefits using microbenchmarks and case studies based on three applications: MySQL, the Swift object store, and a video server.

Read more
Operating Systems

Enabling Failure-resilient Intermittent Systems Without Runtime Checkpointing

Self-powered intermittent systems typically adopt runtime checkpointing as a means to accumulate computation progress across power cycles and recover system status from power failures. However, existing approaches based on the checkpointing paradigm normally require system suspension and/or logging at runtime. This paper presents a design which overcomes the drawbacks of checkpointing-based approaches, to enable failure-resilient intermittent systems. Our design allows accumulative execution and instant system recovery under frequent power failures while enforcing the serializability of concurrent task execution to improve computation progress and ensuring data consistency without system suspension during runtime, by leveraging the characteristics of data accessed in hybrid memory. We integrated the design into FreeRTOS running on a Texas Instruments device. Experimental results show that our design can still accumulate progress when the power source is too weak for checkpointing-based approaches to make progress, and improves the computation progress by up to 43% under a relatively strong power source, while reducing the recovery time by at least 90%.

Read more
Operating Systems

End-User Effects of Microreboots in Three-Tiered Internet Systems

Microreboots restart fine-grained components of software systems "with a clean slate," and only take a fraction of the time needed for full system reboot. Microreboots provide an application-generic recovery technique for Internet services, which can be supported entirely in middleware and requires no changes to the applications or any a priori knowledge of application semantics. This paper investigates the effect of microreboots on end-users of an eBay-like online auction application; we find that microreboots are nearly as effective as full reboots, but are significantly less disruptive in terms of downtime and lost work. In our experiments, microreboots reduced the number of failed user requests by 65% and the perceived downtime by 78% compared to a server process restart. We also show how to replace user-visible transient failures with transparent call-retry, at the cost of a slight increase in end-user-visible latency during recovery. Due to their low cost, microreboots can be used aggressively, even when their necessity is less than certain, hence adding to the reduced recovery time a reduction in the fault detection time, which further improves availability.

Read more
Operating Systems

Energy Minimization for Parallel Real-Time Systems with Malleable Jobs and Homogeneous Frequencies

In this work, we investigate the potential utility of parallelization for meeting real-time constraints and minimizing energy. We consider malleable Gang scheduling of implicit-deadline sporadic tasks upon multiprocessors. We first show the non-necessity of dynamic voltage/frequency regarding optimality of our scheduling problem. We adapt the canonical schedule for DVFS multiprocessor platforms and propose a polynomial-time optimal processor/frequency-selection algorithm. We evaluate the performance of our algorithm via simulations using parameters obtained from a hardware testbed implementation. Our algorithm has up to a 60 watt decrease in power consumption over the optimal non-parallel approach.

Read more

Ready to get started?

Join us today