Robert Soulé | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Robert Soulé is active.

Explore More

Publication

Featured researches published by Robert Soulé.

ACM Computing Surveys | 2014

A catalog of stream processing optimizations

Martin Hirzel; Robert Soulé; Scott Schneider; Bugra Gedik; Robert Grimm

Various research communities have independently arrived at stream processing as a programming model for efficient and parallel computing. These communities include digital signal processing, databases, operating systems, and complex event processing. Since each community faces applications with challenging performance requirements, each of them has developed some of the same optimizations, but often with conflicting terminology and unstated assumptions. This article presents a survey of optimizations for stream processing. It is aimed both at users who need to understand and guide the system’s optimizer and at implementers who need to make engineering tradeoffs. To consolidate terminology, this article is organized as a catalog, in a style similar to catalogs of design patterns or refactorings. To make assumptions explicit and help understand tradeoffs, each optimization is presented with its safety constraints (when does it preserve correctness?) and a profitability experiment (when does it improve performance?). We hope that this survey will help future streaming system builders to stand on the shoulders of giants from not just their own community.

conference on emerging network experiment and technology | 2014

Merlin: A Language for Provisioning Network Resources

Robert Soulé; Shrutarshi Basu; Parisa Jalili Marandi; Fernando Pedone; Robert Kleinberg; Emin Gün Sirer; Nate Foster

This paper presents Merlin, a new framework for managing resources in software-defined networks. With Merlin, administrators express high-level policies using programs in a declarative language. The language includes logical predicates to identify sets of packets, regular expressions to encode forwarding paths, and arithmetic formulas to specify bandwidth constraints. The Merlin compiler maps these policies into a constraint problem that determines bandwidth allocations using parameterizable heuristics. It then generates code that can be executed on the network elements to enforce the policies. To allow network tenants to dynamically adapt policies to their needs, Merlin provides mechanisms for delegating control of sub-policies and for verifying that modifications made to sub-policies do not violate global constraints. Experiments demonstrate the expressiveness and effectiveness of Merlin on real-world topologies and applications. Overall, Merlin simplifies network administration by providing high-level abstractions for specifying network policies that provision network resources.

hot topics in networks | 2013

Managing the network with Merlin

Robert Soulé; Shrutarshi Basu; Robert Kleinberg; Emin Gün Sirer; Nate Foster

This paper presents the Merlin network management framework. With Merlin, administrators express network policy using programs in a declarative language based on logical predicates and regular expressions. The Merlin compiler automatically partitions these programs into components that can be placed on a variety of devices including switches, middleboxes, and end hosts. It uses a constraint solver and parameterizable heuristics to allocate resources such as paths and bandwidth. To ease the administration of federated networks, Merlin provides mechanisms for delegating management of sub-policies to tenants, along with tools for verifying that delegated sub-policies do not violate global constraints. Overall, Merlin simplifies the task of network administration by providing high-level abstractions for directly specifying network policy.

european symposium on programming | 2010

A universal calculus for stream processing languages

Robert Soulé; Martin Hirzel; Robert Grimm; Bugra Gedik; Henrique Andrade; Vibhore Kumar; Kun-Lung Wu

Stream processing applications such as algorithmic trading, MPEG processing, and web content analysis are ubiquitous and essential to business and entertainment. Language designers have developed numerous domain-specific languages that are both tailored to the needs of their applications, and optimized for performance on their particular target platforms. Unfortunately, the goals of generality and performance are frequently at odds, and prior work on the formal semantics of stream processing languages does not capture the details necessary for reasoning about implementations. This paper presents Brooklet, a core calculus for stream processing that allows us to reason about how to map languages to platforms and how to optimize stream programs. We translate from three representative languages, CQL, StreamIt, and Sawzall, to Brooklet, and show that the translations are correct. We formalize three popular and vital optimizations, data-parallel computation, operator fusion, and operator re-ordering, and show under which conditions they are correct. Language designers can use Brooklet to specify exactly how new features or languages behave. Language implementors can use Brooklet to show exactly under which circumstances new optimizations are correct. In ongoing work, we are developing an intermediate language for streaming that is based on Brooklet. We are implementing our intermediate language on System S, IBMs high-performance streaming middleware.

acm special interest group on data communication | 2015

NetPaxos: consensus at network speed

Huynh Tu Dang; Daniele Sciascia; Marco Canini; Fernando Pedone; Robert Soulé

This paper explores the possibility of implementing the widely deployed Paxos consensus protocol in network devices. We present two different approaches: (i) a detailed design description for implementing the full Paxos logic in SDN switches, which identifies a sufficient set of required OpenFlow extensions; and (ii) an alternative, optimistic protocol which can be implemented without changes to the OpenFlow API, but relies on assumptions about how the network orders messages. Although neither of these protocols can be fully implemented without changes to the underlying switch firmware, we argue that such changes are feasible in existing hardware. Moreover, we present an evaluation that suggests that moving Paxos logic into the network would yield significant performance benefits for distributed applications.

acm special interest group on data communication | 2016

Paxos Made Switch-y

Huynh Tu Dang; Marco Canini; Fernando Pedone; Robert Soulé

The Paxos protocol is the foundation for building many fault-tolerant distributed systems and services. This paper posits that there are significant performance benefits to be gained by implementing Paxos logic in network devices. Until recently, the notion of a switch-based implementation of Paxos would be a daydream. However, new flexible hardware is on the horizon that will provide customizable packet processing pipelines needed to implement Paxos. While this new hardware is still not readily available, several vendors and consortia have made the programming languages that target these devices public. This paper describes an implementation of Paxos in one of those languages, P4. Implementing Paxos provides a critical use case for P4, and will help drive the requirements for data plane languages in general. In the long term, we imagine that consensus could someday be offered as a network service, just as point-to-point communication is provided today.

distributed event-based systems | 2013

Dynamic expressivity with static optimization for streaming languages

Robert Soulé; Michael I. Gordon; Saman P. Amarasinghe; Robert Grimm; Martin Hirzel

Developers increasingly use streaming languages to write applications that process large volumes of data with high throughput. Unfortunately, when picking which streaming language to use, they face a difficult choice. On the one hand, dynamically scheduled languages allow developers to write a wider range of applications, but cannot take advantage of many crucial optimizations. On the other hand, statically scheduled languages are extremely performant, but have difficulty expressing many important streaming applications. This paper presents the design of a hybrid scheduler for stream processing languages. The compiler partitions the streaming application into coarse-grained subgraphs separated by dynamic rate boundaries. It then applies static optimizations to those subgraphs. We have implemented this scheduler as an extension to the StreamIt compiler. To evaluate its performance, we compare it to three scheduling techniques used by dynamic systems (OS thread, demand, and no-op) on a combination of micro-benchmarks and real-world inspired synthetic benchmarks. Our scheduler not only allows the previously static version of StreamIt to run dynamic rate applications, but it outperforms the three dynamic alternatives. This demonstrates that our scheduler strikes the right balance between expressivity and performance for stream processing languages.

symposium on operating systems principles | 2017

NetCache: Balancing Key-Value Stores with Fast In-Network Caching

Xin Jin; Xiaozhou Li; Haoyu Zhang; Robert Soulé; Jeongkeun Lee; Nate Foster; Changhoon Kim; Ion Stoica

We present NetCache, a new key-value store architecture that leverages the power and flexibility of new-generation programmable switches to handle queries on hot items and balance the load across storage nodes. NetCache provides high aggregate throughput and low latency even under highly-skewed and rapidly-changing workloads. The core of NetCache is a packet-processing pipeline that exploits the capabilities of modern programmable switch ASICs to efficiently detect, index, cache and serve hot key-value items in the switch data plane. Additionally, our solution guarantees cache coherence with minimal overhead. We implement a NetCache prototype on Barefoot Tofino switches and commodity servers and demonstrate that a single switch can process 2+ billion queries per second for 64K items with 16-byte keys and 128-byte values, while only consuming a small portion of its hardware resources. To the best of our knowledge, this is the first time that a sophisticated application-level functionality, such as in-network caching, has been shown to run at line rate on programmable switches. Furthermore, we show that NetCache improves the throughput by 3-10x and reduces the latency of up to 40% of queries by 50%, for high-performance, in-memory key-value stores.

very large data bases | 2010

From a stream of relational queries to distributed stream processing

Qiong Zou; Huayong Wang; Robert Soulé; Martin Hirzel; Henrique Andrade; Bugra Gedik; Kun-Lung Wu

Applications from several domains are now being written to process live data originating from hardware and software-based streaming sources. Many of these applications have been written relying solely on database and data warehouse technologies, despite their lack of need for transactional support and ACID properties. In several extreme high-load cases, this approach does not scale to the processing speeds that these applications demand. In this paper we demonstrate an application acceleration approach whereby a regular ODBC-based application is converted into a true streaming application with minimal disruption from a software engineering standpoint. We showcase our approach on three real-world applications. We experimentally demonstrate the substantial performance improvements that can be observed when contrasting the accelerated implementation with the original database-oriented implementation.

symposium on sdn research | 2017

P4FPGA: A Rapid Prototyping Framework for P4

Han Wang; Robert Soulé; Huynh Tu Dang; Ki Suh Lee; Vishal Shrivastav; Nate Foster; Hakim Weatherspoon

This paper presents P4FPGA, a new tool for developing and evaluating data plane applications. P4FPGA is an open-source compiler and runtime. The compiler extends the P4.org reference compiler with a custom backend that generates FPGA code. P4FPGA supports different architecture configurations, depending on the needs of the particular application. We have benchmarked several representative P4 programs, and our experiments show that code generated by P4FPGA runs at line-rate at all packet sizes with latencies comparable to commercial ASICs. By combining high-level programming abstractions offered by P4 with a flexible and powerful hardware target, P4FPGA allows developers to rapidly prototype and deploy new data plane applications.

Explore More