Charles Edwin Killian

Publication

Featured research published by Charles Edwin Killian.

Programming Language Design and Implementation | 2007

Mace: language support for building distributed systems

Charles Edwin Killian; James W. Anderson; Ryan Braud; Ranjit Jhala; Amin Vahdat

Building distributed systems is particularly difficult because of the asynchronous, heterogeneous, and failure-prone environment where these systems must run. Tools for building distributed systems must strike a compromise between reducing programmer effort and increasing system efficiency. We present Mace, a C++ language extension and source-to-source compiler that translates a concise but expressive distributed system specification into a C++ implementation. Mace overcomes the limitations of low-level languages by providing a unified framework for networking and event handling, and the limitations of high-level languages by allowing programmers to write program components in a controlled and structured manner in C++. By imposing structure and restrictions on how applications can be written, Mace supports debugging at a higher level, including support for efficient model checking and causal-path debugging. Because Mace programs compile to C++, programmers can use existing C++ tools, including optimizers, profilers, and debuggers to analyze their systems.
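
The abstract describes Mace as giving programmers a unified framework for networking and event handling. As a rough illustration of that event-driven structure only, the Python sketch below dispatches events through a table keyed by (state, event). It is not Mace syntax or the C++ that Mace generates. `PingService`, its states, and its event names are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple


class PingService:
    """Hypothetical event-driven service: each (state, event) pair maps to one handler."""

    def __init__(self) -> None:
        self.state = "init"
        self.peers: List[str] = []
        self.outbox: List[Tuple[str, str]] = []  # (destination, message) pairs queued for sending
        # Transition table: an event is handled only in the states listed for it.
        self.transitions: Dict[Tuple[str, str], Callable[[Any], None]] = {
            ("init", "join"): self.on_join,
            ("joined", "timer"): self.on_timer,
            ("joined", "deliver_ping"): self.on_ping,
        }

    def dispatch(self, event: str, arg: Any = None) -> bool:
        """Run the handler for (current state, event); return False if the event is ignored."""
        handler = self.transitions.get((self.state, event))
        if handler is None:
            return False
        handler(arg)
        return True

    def on_join(self, peers: Any) -> None:
        self.peers = list(peers)
        self.state = "joined"

    def on_timer(self, _: Any) -> None:
        for peer in self.peers:
            self.outbox.append((peer, "ping"))

    def on_ping(self, src: Any) -> None:
        self.outbox.append((str(src), "pong"))
```

Every handler is guarded by an explicit state, so an event that arrives in the wrong state is ignored rather than handled. In this example, a timer that fires before `join` is dropped. The abstract credits restrictions of this kind with making model checking and higher-level debugging tractable.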

ACM Transactions on Computer Systems | 2008

High-bandwidth data dissemination for large-scale distributed systems

Dejan Kostic; Alex C. Snoeren; Amin Vahdat; Ryan Braud; Charles Edwin Killian; James W. Anderson; Jeannie R. Albrecht; Adolfo Rodriguez; Erik Vandekieft

This article focuses on the multireceiver data dissemination problem. Initially, IP multicast formed the basis for efficiently supporting such distribution. More recently, overlay networks have emerged to support point-to-multipoint communication. Both techniques focus on constructing trees rooted at the source to distribute content among all interested receivers. We argue, however, that trees have two fundamental limitations for data dissemination. First, since all data comes from a single parent, participants must often continuously probe in search of a parent with an acceptable level of bandwidth. Second, due to packet losses and failures, available bandwidth is monotonically decreasing down the tree. To address these limitations, we present Bullet, a data dissemination mesh that takes advantage of the computational and storage capabilities of end hosts to create a distribution structure where a node receives data in parallel from multiple peers. For the mesh to deliver improved bandwidth and reliability, we need to solve several key problems: (i) disseminating disjoint data over the mesh, (ii) locating missing content, (iii) finding who to peer with (peering strategy), (iv) retrieving data at the right rate from all peers (flow control), and (v) recovering from failures and adapting to dynamically changing network conditions. Additionally, the system should be self-adjusting and should have few user-adjustable parameter settings. We describe our approach to addressing all of these problems in a working, deployed system across the Internet. Bullet outperforms state-of-the-art systems, including BitTorrent, by 25-70% and exhibits strong performance and reliability in a range of deployment settings. In addition, we find that, relative to tree-based solutions, Bullet reduces the need to perform expensive bandwidth probing.
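
Two of the problems listed above can be illustrated with a small sketch: (i) disseminating disjoint data and (ii) locating missing content. The sketch uses two hypothetical helpers. `split_disjoint` has a sender hand each block to exactly one child. `plan_recovery` has a node request the blocks it lacks from peers, spreading requests across them so data arrives in parallel. It does not model Bullet's actual mechanisms, flow control, or failure recovery.

```python
from typing import Dict, List, Set


def split_disjoint(blocks: List[int], children: List[str]) -> Dict[str, Set[int]]:
    """Send each block to exactly one child (round-robin), so children hold disjoint subsets."""
    assignment: Dict[str, Set[int]] = {c: set() for c in children}
    for i, block in enumerate(blocks):
        assignment[children[i % len(children)]].add(block)
    return assignment


def plan_recovery(have: Set[int], peers: Dict[str, Set[int]], total: int) -> Dict[int, str]:
    """For each missing block, pick the least-loaded peer that holds it.
    Blocks that no peer holds are left out of the plan."""
    load: Dict[str, int] = {p: 0 for p in peers}
    plan: Dict[int, str] = {}
    for block in sorted(set(range(total)) - have):
        holders = [p for p, held in peers.items() if block in held]
        if holders:
            chosen = min(holders, key=lambda p: (load[p], p))  # tie-break by name
            plan[block] = chosen
            load[chosen] += 1
    return plan
```

Balancing requests by load is one simple way to receive from several peers at once. A real mesh would also weigh each peer's available bandwidth.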

Foundations of Software Engineering | 2010

Finding latent performance bugs in systems implementations

Charles Edwin Killian; Karthik Nagaraj; Salman Pervez; Ryan Braud; James W. Anderson; Ranjit Jhala

Robust distributed systems commonly employ high-level recovery mechanisms enabling the system to recover from a wide variety of problematic environmental conditions such as node failures, packet drops and link disconnections. Unfortunately, these recovery mechanisms also effectively mask additional serious design and implementation errors, disguising them as latent performance bugs that severely degrade end-to-end system performance. These bugs typically go unnoticed due to the challenge of distinguishing between a bug and an intermittent environmental condition that must be tolerated by the system. We present techniques that can automatically pinpoint latent performance bugs in systems implementations, in the spirit of recent advances in model checking by systematic state space exploration. The techniques proceed by automating the process of conducting random simulations, identifying performance anomalies, and analyzing anomalous executions to pinpoint the circumstances leading to performance degradation. Because we built MACEPC on the MACE toolkit, it can test our implementations directly, without modification. We have applied MACEPC to five thoroughly tested and trusted distributed systems implementations. MACEPC was able to find significant, previously unknown, long-standing performance bugs in each of the systems, and led to fixes that significantly improved the end-to-end performance of the systems.
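
The abstract describes a pipeline of random simulations, anomaly detection, and analysis. The first two stages can be sketched as follows. Everything here is an assumption for illustration. `toy_join` is a made-up system with a planted slow recovery path. Flagging runs that cost more than three times the median is a simple heuristic, not MACEPC's detector. Each run is seeded, so a flagged execution can be replayed exactly, which a later analysis stage would need.

```python
import random
import statistics
from typing import Callable, List, Tuple


def toy_join(rng: random.Random) -> int:
    """Hypothetical system: simulated steps until a node collects 3 acks.
    A dropped message is normally retried on the next step, but a planted
    'latent bug' sometimes waits for a slow 50-step backup timer instead."""
    steps, acks = 0, 0
    while acks < 3:
        steps += 1
        if rng.random() < 0.2:        # message dropped
            if rng.random() < 0.05:   # buggy path: slow recovery masks the bug
                steps += 50
            continue
        acks += 1
    return steps


def find_anomalies(system: Callable[[random.Random], int], runs: int, seed: int,
                   factor: float = 3.0) -> List[Tuple[int, int]]:
    """Run `runs` seeded simulations; return (seed, cost) for runs whose cost
    exceeds `factor` times the median cost, i.e. candidate performance anomalies."""
    results = [(seed + i, system(random.Random(seed + i))) for i in range(runs)]
    median = statistics.median(cost for _, cost in results)
    return [(s, c) for s, c in results if c > factor * median]
```

The system still completes in every run, which mirrors the abstract's point: recovery hides the bug, and only the performance distribution reveals it.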

Compiler Construction | 2009

Live Debugging of Distributed Systems

Darren Dao; Jeannie R. Albrecht; Charles Edwin Killian; Amin Vahdat

Debugging distributed systems is challenging. Although incremental debugging during development finds some bugs, developers are rarely able to fully test their systems under realistic operating conditions prior to deployment. While deploying a system exposes it to realistic conditions, debugging requires the developer to: (i) detect a bug, (ii) gather the system state necessary for diagnosis, and (iii) sift through the gathered state to determine a root cause. In this paper, we present MaceODB, a tool to assist programmers with debugging deployed distributed systems. Programmers define a set of runtime properties for their system, which MaceODB checks for violations during execution. Once MaceODB detects a violation, it provides the programmer with the information to determine its root cause. We have been able to diagnose several non-trivial bugs in existing mature distributed systems using MaceODB; we discuss two of these bugs in this paper. Benchmarks indicate that the approach has low overhead and is suitable for in situ debugging of deployed systems.
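
The core check can be sketched briefly: evaluate programmer-defined properties against system state and record any violation together with the state that caused it. The sketch assumes a consistent global snapshot is already available, which glosses over how a deployed tool actually gathers distributed state. `PropertyChecker`, the snapshot layout, and the `at_most_one_leader` property are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Snapshot = Dict[str, Dict[str, object]]  # node id -> that node's local variables


@dataclass
class Violation:
    prop: str
    snapshot: Snapshot  # copy of the state that violated the property, kept for diagnosis


@dataclass
class PropertyChecker:
    properties: Dict[str, Callable[[Snapshot], bool]]
    violations: List[Violation] = field(default_factory=list)

    def check(self, snapshot: Snapshot) -> List[str]:
        """Evaluate every property on one snapshot; record and return the names that fail."""
        failed = [name for name, pred in self.properties.items() if not pred(snapshot)]
        for name in failed:
            copied = {node: dict(state) for node, state in snapshot.items()}
            self.violations.append(Violation(name, copied))
        return failed


def at_most_one_leader(snap: Snapshot) -> bool:
    """Example safety property: no two nodes believe they are leader at once."""
    return sum(1 for state in snap.values() if state.get("is_leader")) <= 1
```

The checker copies the offending snapshot, so the programmer can inspect the state at the moment of violation, corresponding to step (ii) in the abstract.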

Symposium on Cloud Computing | 2013

EventWave: programming model and runtime support for tightly-coupled elastic cloud applications

Wei-Chiu Chuang; Bo Sang; Sunghwan Yoo; Rui Gu; Milind Kulkarni; Charles Edwin Killian

An attractive approach to leveraging the ability of cloud-computing platforms to provide resources on demand is to build elastic applications, which can dynamically scale up or down based on resource requirements. To ease the development of elastic applications, it is useful for programmers to write applications with simple sequential semantics, without considering elasticity, and rely on runtime support to provide that elasticity. While this approach has been useful in restricted domains, such as MapReduce, existing programming models for general distributed applications do not expose enough information about their inherent organization of state and computation to provide such transparent elasticity. We introduce EventWave, an event-driven programming model that allows developers to design elastic programs with inelastic semantics while naturally exposing isolated state and computation with programmatic parallelism. In addition, we describe the runtime mechanism that exploits the exposed parallelism to provide elasticity. Finally, we evaluate our implementation through microbenchmarks and case studies to demonstrate that EventWave can provide efficient, scalable, transparent elasticity for applications run in the cloud.
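
One way to picture inelastic semantics with transparent elasticity is a runtime table. The table maps isolated units of state (called contexts here) to nodes and routes each event to whichever node currently hosts its context. The sketch assumes that model. `ContextPlacement` and its round-robin policy are hypothetical and are not EventWave's runtime.

```python
from typing import Dict, List


class ContextPlacement:
    """Hypothetical runtime table mapping named contexts (isolated state) to nodes."""

    def __init__(self, contexts: List[str], nodes: List[str]) -> None:
        self.contexts = sorted(contexts)
        self.host: Dict[str, str] = {}
        self.scale_to(nodes)

    def scale_to(self, nodes: List[str]) -> List[str]:
        """Reassign contexts round-robin over `nodes`; return the contexts that moved."""
        moved = []
        for i, ctx in enumerate(self.contexts):
            new_host = nodes[i % len(nodes)]
            if self.host.get(ctx) not in (None, new_host):
                moved.append(ctx)  # this context's state would have to migrate
            self.host[ctx] = new_host
        return moved

    def route(self, ctx: str) -> str:
        """An event targeting `ctx` runs on the node currently hosting that context."""
        return self.host[ctx]
```

Program logic only ever names contexts, so scaling from one node to two changes placement without changing any handler code. Round-robin reassignment was chosen for brevity. It moves more contexts than necessary, and a practical runtime would try to minimize migration.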

Communication Systems and Networks | 2012

Hierarchy-aware distributed overlays in data centers using DC2

Karthik Nagaraj; Hitesh Khandelwal; Charles Edwin Killian; Ramana Rao Kompella

Popular online services such as social networks, e-commerce, and bidding are routinely hosted in large-scale data centers. Group communication systems (e.g., multicast) and distributed key-value stores are among the most essential building blocks for these services. Due to their scaling requirements, overlay networks such as distributed hash tables (DHTs) have traditionally been used in such systems. Modern hierarchical datacenter networks and global services running across datacenters pose unique challenges that traditional systems are ill-equipped to handle. For instance, the inherent multi-rooted tree topology, with oversubscription at the core, translates into less bandwidth at the upper levels of the trees; traditional systems do not take this into consideration, which wastes precious network resources. To solve this problem, we introduce a hierarchy-aware distributed overlay framework called DC2 for large-scale and highly dynamic services. We build two applications, DC2-Multicast and DC2-Store, on top of DC2. In our experiments using a real prototype deployed over 700 nodes running over a ModelNet topology with 2 datacenters, we found that DC2-Multicast reduces message latencies by several orders of magnitude and reduces node and link stress by a factor of 2 to 3. We also find a reduction in object lookup latency by a factor of 8.
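
The oversubscription problem suggests preferring peers that are close in the data-center hierarchy, so that traffic avoids the scarce upper-level links. The sketch below assumes each node is labeled with a hypothetical (datacenter, pod, rack) tuple. It picks the candidate whose path climbs the fewest hierarchy levels. This illustrates hierarchy awareness in general, not DC2's actual peer-selection rules.

```python
from typing import List, Tuple

Location = Tuple[str, str, str]  # (datacenter, pod, rack): a hypothetical addressing scheme


def hierarchy_distance(a: Location, b: Location) -> int:
    """Levels a path must climb: 0 = same rack, 1 = same pod, 2 = same DC, 3 = other DC."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return len(a) - common


def pick_parent(me: Location, candidates: List[Tuple[str, Location]]) -> str:
    """Choose the candidate closest in the hierarchy, breaking ties by node name."""
    return min(candidates, key=lambda c: (hierarchy_distance(me, c[1]), c[0]))[0]
```

A hierarchy-oblivious DHT would instead pick peers by identifier distance, which can place most overlay edges across the oversubscribed core.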

High Performance Distributed Computing | 2011

InContext: simple parallelism for distributed applications

Sunghwan Yoo; Hyojeong Lee; Charles Edwin Killian; Milind Kulkarni

As networking services, such as DHTs, provide increasingly complex functionality, providing acceptable performance will require parallelizing their operations on individual nodes. Unfortunately, the event-driven style in which these applications have traditionally been written makes it difficult to reason about parallelism, and providing safe, efficient parallel implementations of distributed systems remains a challenge. In this paper, we introduce a declarative programming model based on contexts, which allows programmers to specify the sharing behavior of event handlers. Programs that adhere to the programming model can be safely parallelized according to an abstract execution model, with parallel behavior that is well-defined with respect to the expected sequential behavior. The declarative nature of the programming model allows conformance to be captured as a safety property that can be verified using a model checker. We develop a prototype implementation of our abstract execution model and show that distributed applications written in our programming model can be automatically and efficiently parallelized. To recover additional parallelism, we present an optimization to the implementation based on state snapshots that permits more events to proceed in parallel. We evaluate our prototype implementation through several case studies and demonstrate significant speedup over optimized sequential implementations.
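
The idea that declared sharing behavior makes safe parallelism checkable can be illustrated with read/write declarations. In this hypothetical sketch, each event lists the contexts it reads or writes. Two events conflict when they share a context and at least one of them writes it. A greedy scheduler places each event after every earlier event it conflicts with. The declaration format and `schedule` are assumptions, not InContext's programming model.

```python
from typing import Dict, List, Tuple

Access = Dict[str, str]  # context name -> "read" or "write" (hypothetical declaration format)


def conflicts(a: Access, b: Access) -> bool:
    """Two events conflict if they touch a common context and at least one writes it."""
    return any(ctx in b and "write" in (mode, b[ctx]) for ctx, mode in a.items())


def schedule(events: List[Tuple[str, Access]]) -> List[List[str]]:
    """Greedily group events into batches whose members may run in parallel.
    Each event lands after every earlier event it conflicts with, so conflicting
    events keep their sequential order; only non-conflicting events are reordered."""
    batches: List[List[Tuple[str, Access]]] = []
    for name, acc in events:
        earliest = 0
        for i, batch in enumerate(batches):
            if any(conflicts(acc, other) for _, other in batch):
                earliest = i + 1
        if earliest == len(batches):
            batches.append([])
        batches[earliest].append((name, acc))
    return [[name for name, _ in batch] for batch in batches]
```

Consider four events: `e1` and `e2` read `routing`, `e3` writes `routing`, and `e4` writes `stats`. For these, `schedule` returns `[["e1", "e2", "e4"], ["e3"]]`. The readers and the unrelated writer share a batch, while the `routing` writer waits for the readers.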

International Conference on Distributed Computing Systems | 2014

Turret: A Platform for Automated Attack Finding in Unmodified Distributed System Implementations

Hyojeong Lee; Jeff Seibert; Endadul Hoque; Charles Edwin Killian; Cristina Nita-Rotaru

Security and performance are critical goals for distributed systems. The increased design complexity, incomplete expertise of developers, and limited functionality of existing testing tools often result in bugs and vulnerabilities that prevent implementations from achieving their design goals in practice. Many of these bugs, vulnerabilities, and misconfigurations manifest after the code has already been deployed, making the debugging process difficult and costly. In this paper, we present Turret, a platform for automatically finding performance attacks in unmodified implementations of distributed systems. Turret does not require the user to provide any information about vulnerabilities and runs the implementation in the same operating system setup as the deployment, with an emulated network. Turret uses a new attack finding algorithm and several optimizations that allow it to find attacks in a matter of minutes. We ran Turret on 5 different distributed system implementations specifically designed to tolerate insider attacks, and found 30 performance attacks, 24 of which were not previously reported to the best of our knowledge.
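
The abstract does not spell out Turret's attack-finding algorithm. The following is therefore only a brute-force baseline for the same question: which malicious actions on which message types degrade performance? It assumes a hypothetical emulation hook `run(message_type, action)` that returns throughput while one insider misbehaves. `toy_run` is a made-up stand-in for that hook, and the action list is also an assumption.

```python
from itertools import product
from typing import Callable, List, Tuple

# Hypothetical set of ways an insider node might misbehave on one message type.
ACTIONS = ["drop", "delay", "duplicate", "lie"]


def find_attacks(message_types: List[str],
                 run: Callable[[str, str], float],
                 baseline: float,
                 threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Try every (message type, action) pair and report those whose measured
    throughput falls below `threshold` times the benign baseline, worst first."""
    found = []
    for msg, action in product(message_types, ACTIONS):
        perf = run(msg, action)
        if perf < threshold * baseline:
            found.append((msg, action, perf))
    return sorted(found, key=lambda t: t[2])


def toy_run(msg: str, action: str) -> float:
    """Made-up stand-in for an emulated run; returns throughput under the given attack."""
    if msg == "Heartbeat" and action == "delay":
        return 40.0
    if msg == "Data" and action == "drop":
        return 70.0
    return 98.0
```

Exhaustive enumeration costs one emulated run per (message type, action) pair, and the count grows quickly once actions take parameters. The abstract attributes Turret's minutes-scale search to its algorithm and optimizations.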

Discrete Mathematics | 2004

Antipodal Gray codes

Charles Edwin Killian; Carla D. Savage

An n-bit Gray code is a circular listing of the 2^n n-bit strings so that successive strings differ only in one bit position. An n-bit antipodal Gray code has the additional property that the complement of any string appears exactly n steps away in the list. The problem of determining for which values of n antipodal Gray codes can exist was posed by Hunter Snevily, who showed them to be possible for n=1, 2, 3, and 4. In this paper, we show they are not possible for odd n>3 or for n=6. However, we provide a recursive construction to prove existence when n is a power of 2. The question remains open for any even n>6 which is not a power of 2.
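
The definitions above are easy to check mechanically. The sketch below reads "exactly n steps away" as circular distance n and stores each string as an integer whose bits are the string's bits. It validates a candidate code and includes a small backtracking search that is practical only for tiny n. This is a brute-force check, not the paper's recursive construction for powers of 2.

```python
from typing import List, Optional


def is_antipodal_gray(code: List[int], n: int) -> bool:
    """Check that `code` is a circular n-bit Gray code in which every string's
    complement sits exactly n steps away (in either direction) around the cycle."""
    size, mask = 1 << n, (1 << n) - 1
    if len(code) != size or sorted(code) != list(range(size)):
        return False
    pos = {x: i for i, x in enumerate(code)}
    for i, x in enumerate(code):
        if bin(x ^ code[(i + 1) % size]).count("1") != 1:  # successive strings differ in one bit
            return False
        d = (pos[x ^ mask] - i) % size
        if min(d, size - d) != n:                          # complement is n steps away
            return False
    return True


def reflected_gray(n: int) -> List[int]:
    """Standard binary reflected Gray code: the i-th string is i XOR (i >> 1)."""
    return [i ^ (i >> 1) for i in range(1 << n)]


def find_antipodal(n: int) -> Optional[List[int]]:
    """Backtracking search (tiny n only) for an antipodal Gray code starting at 0."""
    size, mask = 1 << n, (1 << n) - 1
    path, pos = [0], {0: 0}

    def extend() -> bool:
        i = len(path)
        if i == size:
            return bin(path[-1] ^ path[0]).count("1") == 1  # the cycle must close
        for bit in range(n):
            y = path[-1] ^ (1 << bit)
            if y in pos:
                continue
            j = pos.get(y ^ mask)
            if j is not None and (i - j) not in (n, size - n):
                continue  # complement already placed at the wrong circular distance
            path.append(y)
            pos[y] = i
            if extend():
                return True
            path.pop()
            del pos[y]
        return False

    return path if extend() else None
```

The binary reflected Gray code passes the check for n=3 but fails for n=4, where 0000 and 1111 end up six steps apart. Existence for n=4 therefore needs a different ordering, which the search finds.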

Symposium on Operating Systems Principles | 2005

Experiences with Pip: finding unexpected behavior in distributed systems

Patrick Reynolds; Janet L. Wiener; Jeffrey C. Mogul; Mehul A. Shah; Charles Edwin Killian; Amin Vahdat

Bugs in complex distributed systems are often hard to find. Many bugs reflect discrepancies between a system's behavior and the programmer's assumptions about that behavior. Differences may be in correctness, in performance characteristics, or both. Our debugging framework, Pip, compares actual behavior with expected behavior and visualizes both. Pip consists of two tools to help reconcile assumptions and actual behavior: an automatic expectations checker and an interactive behavior-explorer GUI.
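
One way to illustrate comparing actual behavior with expected behavior is to match traced event sequences against a pattern. Pip's own expectation language is not reproduced here. The sketch assumes each causal path has been flattened into a list of hypothetical `kind:name` event tokens, and it expresses one expectation as a regular expression.

```python
import re
from typing import Dict, List

# Hypothetical expectation: a request is answered after one or more backend query/reply round trips.
EXPECTED = re.compile(r"recv:request( send:query recv:reply)+ send:response")


def check_paths(paths: Dict[str, List[str]]) -> Dict[str, bool]:
    """Return, for each traced path id, whether its event sequence matches the expectation."""
    return {pid: EXPECTED.fullmatch(" ".join(events)) is not None
            for pid, events in paths.items()}
```

Under this expectation, a path that answers a request without any backend round trip is reported as unexpected. That is the kind of discrepancy a programmer would then inspect.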
