Patrick Crowley

Washington University in St. Louis

Publication

Featured research published by Patrick Crowley.

acm special interest group on data communication | 2014

Named data networking

Lixia Zhang; Alexander Afanasyev; Jeffrey A Burke; Van Jacobson; Kimberly C. Claffy; Patrick Crowley; Christos Papadopoulos; Lan Wang; Beichuan Zhang

Named Data Networking (NDN) is one of five projects funded by the U.S. National Science Foundation under its Future Internet Architecture Program. NDN has its roots in an earlier project, Content-Centric Networking (CCN), which Van Jacobson first publicly presented in 2006. The NDN project investigates Jacobson's proposed evolution from today's host-centric network architecture (IP) to a data-centric network architecture (NDN). This conceptually simple shift has far-reaching implications for how we design, develop, deploy, and use networks and applications. We describe the motivation and vision of this new architecture, and its basic components and operations. We also provide a snapshot of its current design, development status, and research challenges. More information about the project, including prototype implementations, publications, and annual reports, is available on named-data.net.

acm special interest group on data communication | 2006

Algorithms to accelerate multiple regular expressions matching for deep packet inspection

Sailesh Kumar; Sarang Dharmapurikar; Fang Yu; Patrick Crowley; Jonathan S. Turner

There is a growing demand for network devices capable of examining the content of data packets in order to improve network security and provide application-specific services. Most high performance systems that perform deep packet inspection implement simple string matching algorithms to match packets against a large but finite set of strings. However, there is growing interest in the use of regular expression-based pattern matching, since regular expressions offer superior expressive power and flexibility. Deterministic finite automata (DFA) representations are typically used to implement regular expressions. However, DFA representations of regular expression sets arising in network applications require large amounts of memory, limiting their practical application. In this paper, we introduce a new representation for regular expressions, called the Delayed Input DFA (D2FA), which substantially reduces space requirements as compared to a DFA. A D2FA is constructed by transforming a DFA via incrementally replacing several transitions of the automaton with a single default transition. Our approach dramatically reduces the number of distinct transitions between states. For a collection of regular expressions drawn from current commercial and academic systems, a D2FA representation reduces transitions by more than 95%. Given the substantially reduced space requirements, we describe an efficient architecture that can perform deep packet inspection at multi-gigabit rates. Our architecture uses multiple on-chip memories in such a way that each remains uniformly occupied and accessed over a short duration, thus effectively distributing the load and enabling high throughput. Our architecture can provide cost-effective packet content scanning at OC-192 rates with memory requirements that are consistent with current ASIC technology.
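
The default-transition idea in this abstract can be illustrated with a small sketch. It is not the paper's construction: the toy DFA, the hand-picked default assignment, and all function names below are assumptions made for illustration. A state keeps only the labeled transitions that differ from those of its default state, and a lookup follows default transitions until a labeled one is found.

```python
from itertools import product

def compress(dfa, default):
    """Drop every transition a state shares with its default state.

    dfa:     state -> {symbol: next_state}, complete over the alphabet
    default: state -> default state; assumed to form a forest (no cycles),
             with root states simply absent from the mapping
    """
    labeled = {}
    for state, trans in dfa.items():
        parent = default.get(state)
        if parent is None:
            labeled[state] = dict(trans)  # roots keep all transitions
        else:
            labeled[state] = {c: t for c, t in trans.items()
                              if dfa[parent][c] != t}
    return labeled

def step(labeled, default, state, symbol):
    """Follow default transitions until a labeled transition exists."""
    while symbol not in labeled[state]:
        state = default[state]
    return labeled[state][symbol]

def run_dfa(dfa, start, text):
    state = start
    for ch in text:
        state = dfa[state][ch]
    return state

def run_compressed(labeled, default, start, text):
    state = start
    for ch in text:
        state = step(labeled, default, state, ch)
    return state

def stored(table):
    """Number of labeled transitions stored (default pointers not counted)."""
    return sum(len(trans) for trans in table.values())

# Toy DFA over {a, b, c} that reaches state 2 once "ab" has been seen.
DFA = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 1, "b": 2, "c": 0},
    2: {"a": 2, "b": 2, "c": 2},
}
DEFAULT = {1: 0}  # hand-picked: state 1 defers to state 0
LABELED = compress(DFA, DEFAULT)

def agrees(max_len=4):
    """Check both automata end in the same state on all short inputs."""
    return all(
        run_dfa(DFA, 0, w) == run_compressed(LABELED, DEFAULT, 0, w)
        for n in range(max_len + 1)
        for w in product("abc", repeat=n)
    )
```

In this toy case state 1 keeps only its transition on `b`, since its other transitions match those of state 0. The saving comes at the cost of possibly visiting more than one state per input character.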

computing frontiers | 2006

Dynamic thread assignment on heterogeneous multiprocessor architectures

Michela Becchi; Patrick Crowley

In a multi-programmed computing environment, threads of execution exhibit different runtime characteristics and hardware resource requirements. Not only do the behaviors of distinct threads differ, but each thread may also present diversity in its performance and resource usage over time. A heterogeneous chip multiprocessor (CMP) architecture consists of processor cores and caches of varying size and complexity. Prior work has shown that heterogeneous CMPs can meet the needs of a multi-programmed computing environment better than a homogeneous CMP system. In fact, the use of a combination of cores with different caches and instruction issue widths better accommodates threads with different computational requirements. A central issue in the design and use of heterogeneous systems is to determine an assignment of tasks to processors which better exploits the hardware resources in order to improve performance. In this paper we argue that the benefits of heterogeneous CMPs are bolstered by the usage of a dynamic assignment policy, i.e., a runtime mechanism which observes the behavior of the running threads and exploits thread migration between the cores. We validate our analysis by means of simulation. Specifically, our model assumes a combination of Alpha EV5 and Alpha EV6 processors and of integer and floating point programs from the SPEC2000 benchmark suite. We show that a dynamic assignment can outperform a static one by 20% to 40% on average and by as much as 80% in extreme cases, depending on the degree of multithreading simulated.

acm special interest group on data communication | 2007

Supercharging PlanetLab: a high performance, multi-application, overlay network platform

Jonathan S. Turner; Patrick Crowley; John D. DeHart; Amy Freestone; Brandon Heller; Fred Kuhns; Sailesh Kumar; John W. Lockwood; Jing Lu; Michael Wilson; Charles Wiseman; David M. Zar

In recent years, overlay networks have become an important vehicle for delivering Internet applications. Overlay network nodes are typically implemented using general purpose servers or clusters. We investigate the performance benefits of more integrated architectures, combining general-purpose servers with high performance Network Processor (NP) subsystems. We focus on PlanetLab as our experimental context and report on the design and evaluation of an experimental PlanetLab platform capable of much higher levels of performance than typical system configurations. To make it easier for users to port applications, the system supports a fast path/slow path application structure that facilitates the mapping of the most performance-critical parts of an application onto an NP subsystem, while allowing the more complex control and exception-handling to be implemented within the programmer-friendly environment provided by conventional servers. We report on implementations of two sample applications, an IPv4 router, and a forwarding application for the Internet Indirection Infrastructure. We demonstrate an 80x improvement in packet processing rates and comparable reductions in latency.

conference on emerging network experiment and technology | 2007

A hybrid finite automaton for practical deep packet inspection

Michela Becchi; Patrick Crowley

Deterministic finite automata (DFAs) are widely used to perform regular expression matching in linear time. Several techniques have been proposed to compress DFAs in order to reduce memory requirements. Unfortunately, many real-world IDS regular expressions include complex terms that result in an exponential increase in the number of DFA states. Since all recent proposals use an initial DFA as a starting point, they cannot be used as comprehensive regular expression representations in an IDS. In this work we propose a hybrid automaton which addresses this issue by combining the benefits of deterministic and non-deterministic finite automata. We test our proposal on Snort rule-sets and we validate it on real traffic traces. Finally, we address and analyze the worst case behavior of our scheme and compare it to traditional ones.

architectures for networking and communications systems | 2007

An improved algorithm to accelerate regular expression evaluation

Michela Becchi; Patrick Crowley

Modern network intrusion detection systems need to perform regular expression matching at line rate in order to detect the occurrence of critical patterns in packet payloads. While deterministic finite automata (DFAs) allow this operation to be performed in linear time, they may exhibit prohibitive memory requirements. In [9], Kumar et al. propose Delayed Input DFAs (D2FAs), which provide a trade-off between the memory requirements of the compressed DFA and the number of states visited for each character processed, which corresponds directly to the memory bandwidth required to evaluate regular expressions. In this paper we introduce a general compression technique that results in at most 2N state traversals when processing a string of length N. In comparison to the D2FA approach, our technique achieves comparable levels of compression, with lower provable bounds on memory bandwidth (or greater compression for a given bandwidth bound). Moreover, our proposed algorithm has lower complexity, is suitable for scenarios where a compressed DFA needs to be dynamically built or updated, and fosters locality in the traversal process. Finally, we also describe a novel alphabet reduction scheme for DFA-based structures that can yield further dramatic reductions in data structure size.
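
The abstract does not spell out how the 2N bound is obtained. One way such a bound can arise, assumed here purely for illustration, is to allow default transitions only toward states strictly closer to the start state: each labeled transition raises the depth by at most one and each default transition lowers it by at least one, so a string of length N causes at most N default transitions on top of its N labeled ones. The sketch below uses a toy DFA and hypothetical function names, and requires every state to be reachable from the start state.

```python
from collections import deque
from itertools import product

def depths(dfa, start):
    """Breadth-first distance of each state from the start state."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for target in dfa[state].values():
            if target not in depth:
                depth[target] = depth[state] + 1
                queue.append(target)
    return depth

def compress_by_depth(dfa, start):
    """Give each state the shallower state sharing most transitions as default."""
    depth = depths(dfa, start)
    labeled, default = {}, {}
    for state, trans in dfa.items():
        best, best_shared = None, 0
        for cand in dfa:
            if depth[cand] < depth[state]:  # defaults must decrease depth
                shared = sum(dfa[cand][c] == t for c, t in trans.items())
                if shared > best_shared:
                    best, best_shared = cand, shared
        if best is None:
            labeled[state] = dict(trans)  # no useful default: keep everything
        else:
            default[state] = best
            labeled[state] = {c: t for c, t in trans.items()
                              if dfa[best][c] != t}
    return labeled, default

def traverse(labeled, default, start, text):
    """Return (final_state, number_of_state_traversals) for the input."""
    state, count = start, 0
    for ch in text:
        while ch not in labeled[state]:
            state = default[state]  # default transition, consumes no input
            count += 1
        state = labeled[state][ch]  # labeled transition, consumes ch
        count += 1
    return state, count

# Toy DFA over {a, b, c} that reaches state 2 once "ab" has been seen.
TOY = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 1, "b": 2, "c": 0},
    2: {"a": 2, "b": 2, "c": 2},
}
TOY_LABELED, TOY_DEFAULT = compress_by_depth(TOY, 0)

def bound_holds(max_len=5):
    """Check equivalence with the DFA and the 2N bound on all short inputs."""
    for n in range(max_len + 1):
        for word in product("abc", repeat=n):
            expected = 0
            for ch in word:
                expected = TOY[expected][ch]
            state, count = traverse(TOY_LABELED, TOY_DEFAULT, 0, word)
            if state != expected or count > 2 * n:
                return False
    return True
```

The default-selection heuristic here is a simple exhaustive search chosen for brevity. It should not be read as the algorithm proposed in the paper.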

international symposium on computer architecture | 1998

Execution characteristics of desktop applications on Windows NT

Dennis Lee; Patrick Crowley; Jean-Loup Baer; Thomas E. Anderson; Brian N. Bershad

This paper examines the performance of desktop applications running on the Microsoft Windows NT operating system on Intel x86 processors, and contrasts these applications to the programs in the integer SPEC95 benchmark suite. We present measurements of basic instruction set and program characteristics, and detailed simulation results of the way these programs use the memory system and processor branch architecture. We show that the desktop applications have similar characteristics to the integer SPEC95 benchmarks for many of these metrics. However, compared to the integer SPEC95 applications, desktop applications have larger instruction working sets, execute instructions in a greater number of unique functions, cross DLL boundaries frequently, and execute a greater number of indirect calls.

international conference on supercomputing | 2000

Characterizing processor architectures for programmable network interfaces

Patrick Crowley; Marc E. Fiuczynski; Jean-Loup Baer; Brian N. Bershad

The rapid advancements of networking technology have boosted potential bandwidth to the point that the cabling is no longer the bottleneck. Rather, the bottlenecks lie at the crossing points, the nodes of the network, where data traffic is intercepted or forwarded. As a result, there has been tremendous interest in speeding those nodes, making the equipment run faster by means of specialized chips to handle data trafficking. The Network Processor is the blanket name thrown over such chips in their varied forms. To date, no performance data exist to aid in the decision of what processor architecture to use in next-generation network processors. Our goal is to remedy this situation. In this study, we characterize both the application workloads that network processors need to support as well as emerging applications that we anticipate may be supported in the future. Then, we consider the performance of three sample benchmarks drawn from these workloads on several state-of-the-art processor architectures, including: an aggressive, out-of-order, speculative super-scalar processor, a fine-grained multithreaded processor, a single chip multiprocessor, and a simultaneous multithreaded processor (SMT). The network interface environment is simulated in detail, and our results indicate that SMT is the architecture best suited to this environment.

international conference on computer communications | 2008

Peacock Hashing: Deterministic and Updatable Hashing for High Performance Networking

Sailesh Kumar; Jonathan S. Turner; Patrick Crowley

Hash tables are extensively used in networking to implement data structures that associate a set of keys to a set of values, as they provide O(1) query, insert, and delete operations. However, at moderate or high loads collisions are quite frequent, which not only increases the access time but also induces non-determinism in the performance. Due to this non-determinism, the performance of these hash tables degrades sharply in multi-threaded network processor based environments, where a collection of threads perform the hashing operations in a loosely synchronized manner. In such systems, it is critical to keep the hash operations more deterministic. A recent series of papers has proposed schemes that employ a compact on-chip memory to enable deterministic and fast hash queries. While effective, these schemes require substantial on-chip memory, roughly 10 bits for every entry in the hash table. This limits their general usability, specifically in the network processor context, where on-chip resources are scarce. In this paper, we propose a novel hash table construction called Peacock hash, which reduces the on-chip memory by more than 10-fold while keeping a high degree of determinism in performance. This significantly reduced on-chip memory not only makes Peacock hashing much more appealing for general use but also makes it an attractive choice for the implementation of a hash hardware accelerator on a network processor.

architectures for networking and communications systems | 2008

Efficient regular expression evaluation: theory to practice

Michela Becchi; Patrick Crowley

Several algorithms and techniques have been proposed recently to accelerate regular expression matching and enable deep packet inspection at line rate. This work aims to provide a comprehensive practical evaluation of existing techniques, extending them and analyzing their compatibility. The study focuses on two hardware architectures: memory-based ASICs and FPGAs.
