Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Peter A. Buhr is active.

Publication


Featured research published by Peter A. Buhr.


European Conference on Computer Systems | 2007

Comparing the performance of web server architectures

David Pariag; Tim Brecht; Ashif S. Harji; Peter A. Buhr; Amol Shukla; David R. Cheriton

In this paper, we extensively tune and then compare the performance of web servers based on three different server architectures. The μserver utilizes an event-driven architecture, Knot uses the highly-efficient Capriccio thread library to implement a thread-per-connection model, and WatPipe uses a hybrid of events and threads to implement a pipeline-based server that is similar in spirit to a staged event-driven architecture (SEDA) server like Haboob. We describe modifications made to the Capriccio thread library to use Linux's zero-copy sendfile interface. We then introduce the SYmmetric Multi-Processor Event Driven (SYMPED) architecture in which relatively minor modifications are made to a single process event-driven (SPED) server (the μserver) to allow it to continue processing requests in the presence of blocking due to disk accesses. Finally, we describe our C++ implementation of WatPipe, which, although utilizing a pipeline-based architecture, excludes the dynamic controls over event queues and thread pools used in SEDA. When comparing the performance of these three server architectures on the workload used in our study, we arrive at different conclusions than previous studies. In spite of recent improvements to threading libraries and our further improvements to Capriccio and Knot, both the event-based μserver and pipeline-based WatPipe server provide better throughput (by about 18%). We also observe that when using blocking sockets to send data to clients, the performance obtained with some architectures is quite good and in one case is noticeably better than when using non-blocking sockets.
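The comparison above hinges on two mechanisms: zero-copy sendfile and the choice between blocking and non-blocking sockets. Below is a minimal sketch, not code from the μserver, Knot, or WatPipe, of how an event-driven server might push a static file through Linux's sendfile on a non-blocking socket, yielding back to its event loop when the socket would block; the helper name and interface are hypothetical.

#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>
#include <cerrno>

// Hypothetical helper: send an open file over a connected, non-blocking socket.
// Returns true when the whole file has been written; returns false when the
// socket would block (the event loop should re-arm it) or on error (check errno).
bool send_static_file(int sock_fd, int file_fd, off_t& offset, off_t file_size) {
    while (offset < file_size) {
        ssize_t n = sendfile(sock_fd, file_fd, &offset, file_size - offset);
        if (n > 0) continue;                      // sendfile advanced offset for us
        if (n == -1 && errno == EINTR) continue;  // interrupted: retry
        return false;                             // EAGAIN/EWOULDBLOCK or a real error
    }
    return true;                                  // file completely sent
}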


IEEE Transactions on Software Engineering | 2000

Advanced exception handling mechanisms

Peter A. Buhr; W.Y.R. Mok

It is no longer possible to consider exception handling as a secondary issue in language design, or even worse, a mechanism added after the fact via a library approach. Exception handling is a primary feature in language design and must be integrated with other major features, including advanced control flow, objects, coroutines, concurrency, real-time, and polymorphism. Integration is crucial as there are both obvious and subtle interactions between exception handling and other language features. Unfortunately, many exception handling mechanisms work only with a subset of the features and in the sequential domain. A framework for a comprehensive, easy to use, and extensible exception handling mechanism is presented for a concurrent, object-oriented environment. The environment includes language constructs with separate execution stacks, e.g. coroutines and tasks, so the exception environment is significantly more complex than the normal single-stack situation. The pros and cons of various exception features are examined, along with feature interaction with other language mechanisms. Both exception termination and resumption models are examined in this environment, and previous criticisms of the resumption model, a feature commonly missing in modern languages, are addressed.
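The termination/resumption distinction examined above can be made concrete in a few lines. The sketch below is not μC++'s mechanism; it simulates resumption in standard C++ with a dynamically installed handler function (a hypothetical resume_handler slot) and falls back to ordinary throw/catch termination when no handler is installed.

#include <functional>
#include <stdexcept>
#include <iostream>

// Crude resumption scheme: the raiser consults a dynamically installed handler
// and continues executing after the handler returns a corrected value.
std::function<int(int)> resume_handler;          // hypothetical handler slot

int parse_or_resume(int raw) {
    if (raw < 0) {
        if (resume_handler) return resume_handler(raw);  // "resume": fix up and carry on
        throw std::invalid_argument("negative input");   // otherwise, termination model
    }
    return raw;
}

int main() {
    resume_handler = [](int) { return 0; };      // handler supplies a default value
    std::cout << parse_or_resume(-5) << '\n';    // prints 0: execution resumed at the raise point

    resume_handler = nullptr;
    try {
        parse_or_resume(-5);                     // no handler: stack unwinds to the catch clause
    } catch (const std::invalid_argument& e) {
        std::cout << "terminated: " << e.what() << '\n';
    }
}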


Software - Practice and Experience | 1992

μC++: concurrency in the object-oriented language C++

Peter A. Buhr; Glen Ditchfield; Richard A. Stroobosscher; B. M. Younger; C. Robert Zarnke

We present a design, including its motivation, for introducing concurrency into C++. The design work is based on a set of requirements and elementary execution properties that generate a corresponding set of programming language constructs needed to express concurrency. The new constructs continue to support object-oriented facilities such as inheritance and code reuse. Features that allow flexibility in accepting and subsequently postponing servicing of requests are provided. Currently, a major portion of the design is implemented, supporting concurrent programs on shared-memory uniprocessor and multiprocessor computers.
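To make the task idea concrete without reproducing μC++ syntax, here is a minimal sketch in standard C++ of an active object: a class whose instances own their own thread, accept requests into a queue, and service them later, loosely in the spirit of the constructs described above. The class and member names are invented for illustration.

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

class Task {                                   // hypothetical name, not a uC++ _Task
    std::queue<std::function<void()>> requests;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;
    std::thread body;                          // the task's own execution context

    void main() {                              // services queued requests until shut down
        for (;;) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&]{ return done || !requests.empty(); });
            if (done && requests.empty()) return;
            auto req = std::move(requests.front());
            requests.pop();
            lk.unlock();
            req();                             // accept and service one request
        }
    }
public:
    Task() : body(&Task::main, this) {}
    void call(std::function<void()> req) {     // enqueue a request for later servicing
        { std::lock_guard<std::mutex> lk(m); requests.push(std::move(req)); }
        cv.notify_one();
    }
    ~Task() {                                  // destruction drains the queue and joins the thread
        { std::lock_guard<std::mutex> lk(m); done = true; }
        cv.notify_one();
        body.join();
    }
};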


ACM Computing Surveys | 1995

Monitor classification

Peter A. Buhr; Michel Fortier; Michael H. Coffin

One of the most natural, elegant, and efficient mechanisms for synchronization and communication, especially for systems with shared memory, is the monitor. Over the past twenty years many kinds of monitors have been proposed and implemented, and many modern programming languages provide some form of monitor for concurrency control. This paper presents a taxonomy of monitors that encompasses all the extant monitors and suggests others not found in the literature or in existing programming languages. It discusses the semantics and performance of the various kinds of monitors suggested by the taxonomy, and it discusses programming techniques suitable to each.
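As a point of reference for the taxonomy, here is a minimal monitor sketch in standard C++: a bounded buffer whose public operations execute under mutual exclusion and block on condition variables. std::condition_variable gives signal-and-continue semantics (the signaller keeps running), which is only one of the disciplines a monitor classification distinguishes; the class is illustrative, not from the paper.

#include <condition_variable>
#include <mutex>
#include <queue>

template <typename T>
class BoundedBuffer {                          // the monitor: every public member acquires the lock
    std::queue<T> q;
    const std::size_t capacity;
    std::mutex m;
    std::condition_variable not_full, not_empty;
public:
    explicit BoundedBuffer(std::size_t cap) : capacity(cap) {}

    void insert(T item) {
        std::unique_lock<std::mutex> lk(m);
        not_full.wait(lk, [&]{ return q.size() < capacity; });  // block while full
        q.push(std::move(item));
        not_empty.notify_one();                // signaller continues; waiter reacquires the lock later
    }
    T remove() {
        std::unique_lock<std::mutex> lk(m);
        not_empty.wait(lk, [&]{ return !q.empty(); });          // block while empty
        T item = std::move(q.front());
        q.pop();
        not_full.notify_one();
        return item;
    }
};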


Software - Practice and Experience | 1990

The μSystem: Providing light-weight concurrency on shared-memory multiprocessor computers running UNIX

Peter A. Buhr; Richard A. Stroobosscher

This paper presents a description of the μSystem, which is a library of C routines that provide light‐weight concurrency on uniprocessor and multiprocessor computers running the UNIX operating system. A discussion of the run‐time structure of a μSystem program is given, which includes the following concepts: coroutines, tasks, virtual processors and clusters. Next the routines that implement these concepts are discussed in detail. Finally, some performance figures from the μSystem are given and discussed, followed by a comparison of the μSystem with other similar systems.
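The run-time structure described above, with tasks made ready on a cluster and executed by a set of virtual processors, can be sketched very loosely as a work queue serviced by a pool of kernel threads. The sketch below is standard C++, not the μSystem's C interface, and the names Cluster, submit, and vproc_main are invented for illustration.

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class Cluster {                                  // hypothetical name for illustration
    std::queue<std::function<void()>> ready;     // tasks waiting for a virtual processor
    std::mutex m;
    std::condition_variable cv;
    bool shutdown = false;
    std::vector<std::thread> vprocs;             // the cluster's virtual processors

    void vproc_main() {                          // each virtual processor runs ready tasks
        for (;;) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&]{ return shutdown || !ready.empty(); });
            if (shutdown && ready.empty()) return;
            auto task = std::move(ready.front());
            ready.pop();
            lk.unlock();
            task();
        }
    }
public:
    explicit Cluster(unsigned nprocs) {
        for (unsigned i = 0; i < nprocs; ++i)
            vprocs.emplace_back(&Cluster::vproc_main, this);
    }
    void submit(std::function<void()> task) {    // make a task ready on this cluster
        { std::lock_guard<std::mutex> lk(m); ready.push(std::move(task)); }
        cv.notify_one();
    }
    ~Cluster() {
        { std::lock_guard<std::mutex> lk(m); shutdown = true; }
        cv.notify_all();
        for (auto& p : vprocs) p.join();
    }
};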


Measurement and Modeling of Computer Systems | 1996

KDB: a multi-threaded debugger for multi-threaded applications

Peter A. Buhr; Martin Karsten; Jun Shih

Concurrent programs contain both sequential and concurrent errors. While deadlock and race conditions are unique to concurrent programs, there also exist algorithmic design errors, such as inhibiting concurrency, which are unknown in the sequential domain. Recently, there has been a large effort in debugging race conditions [16], both statically [8] and dynamically [7], and to a lesser extent, deadlock [13]. Our experience shows that concurrent errors occur with diminishing frequency in the order: traditional sequential errors, algorithmic design errors, deadlock, race conditions. However, the difficulty in determining and fixing these errors grows exponentially from sequential errors to race conditions. Our experience also shows that the frequency of deadlock and race conditions diminishes significantly when high-level concurrency constructs (e.g., task, monitor, actor, etc.) are used, versus thread and lock programming. We believe the best way to improve concurrent debugging capabilities and significantly reduce debugging time is to use high-level concurrency constructs with a symbolic debugger that truly understands it is debugging a concurrent program, coupled with a cooperative concurrent run-time system that actively participates in the debugging process. Additionally, the debugger must provide independent and concurrent access to every thread of control in the target program. Such a debugger handles a large set of errors in concurrent programs, leaving esoteric errors to specialized debugging tools. Ultimately, a debugger and specialized tools must complement each other. Our experience comes from designing high-level concurrent extensions for C++, called μC++ [5], using μC++ to build itself, a debugger and visualization toolkit [4], a database toolkit [3], and using μC++ to teach concurrency to undergraduate students. μC++ is a shared-memory user-level thread library that runs on symmetric multiprocessor architectures (e.g., SUN, DEC, SGI, Sequent); user-level threads are executed by multiple kernel threads associated with shared memory, which provides true parallelism when appropriate hardware is available. Furthermore, μC++ provides several high-level concurrency constructs, e.g., coroutines, monitors, and tasks, for composing an application; hence, programmers do not work at the level of threads.


Real-Time Technology and Applications Symposium | 2005

Solution space for fixed-priority with preemption threshold

Jiongxiong Chen; Ashif S. Harji; Peter A. Buhr

This paper reaffirms that fixed-priority with preemption threshold (FPPT) is an important form of real-time scheduling algorithm, which fills the gap between fixed-priority preemptive (FPP) and fixed-priority nonpreemptive (FPNP). When a task set is schedulable by FPPT, there may exist multiple valid preemption threshold assignments, which provide useful scheduling options. All valid assignments form a solution space that is delimited by a minimal and maximal assignment. A mechanism is presented to generate part of the valid assignments once the minimal and maximal assignments are known. The known algorithm to compute the minimal assignment starts at FPP, and the known algorithm to compute the maximal assignment starts from any valid assignment. This paper presents algorithms to compute the minimal and maximal assignments starting from FPNP, and the proofs for the correctness of these algorithms are also presented.
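The core rule behind FPPT is small enough to state in code: each task carries both a fixed priority and a preemption threshold at least as large, and an arriving task preempts the running one only if its priority exceeds the running task's threshold. The sketch below assumes larger numbers mean higher priority; FPP corresponds to the threshold equalling the priority, and FPNP to raising every threshold to the maximum so nothing preempts. The types and values are hypothetical, not from the paper.

#include <iostream>

struct RtTask {                  // hypothetical record for illustration
    int priority;                // fixed priority (assumption: larger value = more urgent)
    int threshold;               // preemption threshold, with priority <= threshold
};

// Does an arriving task preempt the currently running one under FPPT?
bool preempts(const RtTask& arriving, const RtTask& running) {
    return arriving.priority > running.threshold;
}

int main() {
    RtTask running{2, 3};                                  // priority 2, threshold raised to 3
    std::cout << preempts(RtTask{3, 3}, running) << '\n';  // 0: would preempt under FPP, not FPPT
    std::cout << preempts(RtTask{4, 4}, running) << '\n';  // 1: priority 4 exceeds threshold 3
}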


Asia-Pacific Workshop on Systems | 2011

Our troubles with Linux and why you should care

Ashif S. Harji; Peter A. Buhr; Tim Brecht

Linux provides researchers with a full-fledged operating system that is widely used and open source. However, due to its complexity and rapid development, care should be exercised when using Linux for performance experiments, especially in systems research. The size and continual evolution of the Linux code-base make it difficult to understand and, as a result, to decipher and explain the reasons for performance improvements. In addition, the rapid kernel development cycle means that experimental results can be viewed as out of date, or meaningless, very quickly. We demonstrate that this viewpoint is incorrect because kernel changes can and have introduced both bugs and performance degradations. This paper describes some of our experiences using the Linux kernel as a platform for conducting performance evaluations and some performance regressions we have found. Our results show that these performance regressions can be serious (e.g., repeating identical experiments produces large variability in results) and long-lived despite having a large negative effect on performance (one problem has existed for more than 3 years). Based on these experiences, we argue: it is sometimes reasonable to use an older kernel version, experimental results need careful analysis to explain why a performance effect occurs, and publishing papers validating prior research is essential.


ACM International Conference on Systems and Storage | 2012

Comparing high-performance multi-core web-server architectures

Ashif S. Harji; Peter A. Buhr; Tim Brecht

In this paper, we study how web-server architecture and implementation affect performance when trying to obtain high throughput on a 4-core system servicing static content. We focus on static content as a growing number of servers are dedicated to workloads comprised of songs, photos, software, and videos chunked for HTTP downloads. Two representative static-content workloads are used: one serviced entirely from the file-system cache and the other requiring significant disk I/O. We focus on 4-core systems because: 1) it is a widely used configuration in data-centers and cloud services, 2) recent studies show large SMP systems may operate more efficiently when subdivided into smaller subsystems, 3) understanding performance with a smaller number of cores is essential before scaling to a larger number of cores, and 4) 4 cores may be sufficient for many web servers. Two high-performance web servers, with event-driven (μserver) and pipelined (WatPipe) architectures, are developed and tested for a multi-core environment. By carefully implementing and tuning the two web servers, both achieve performance comparable to running independent copies of the server on each processor (N-copy). The new web servers achieve high throughput (4,000--6,000 Mbps) with 40,000 to 70,000 connections per second; performance in all cases is better than nginx, lighttpd, and Apache. We conclude that implementation and tuning of web servers is perhaps more important than server architecture. We also find it is better to use blocking rather than non-blocking calls to sendfile when the requested files do not all fit in the file-system cache.
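The final conclusion turns on a single socket option: whether sendfile is issued on a blocking or a non-blocking socket. That toggle is the standard fcntl O_NONBLOCK flag; the helper below is a hypothetical sketch, not code from either server.

#include <fcntl.h>

// Hypothetical helper: make a connected socket blocking or non-blocking.
// With a blocking socket, sendfile waits for socket-buffer space instead of
// returning EAGAIN to the caller.
bool set_socket_blocking(int sock_fd, bool blocking) {
    int flags = fcntl(sock_fd, F_GETFL, 0);
    if (flags == -1) return false;
    flags = blocking ? (flags & ~O_NONBLOCK) : (flags | O_NONBLOCK);
    return fcntl(sock_fd, F_SETFL, flags) != -1;
}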


SIGPLAN Notices | 1985

A case for teaching multi-exit loops to beginning programmers

Peter A. Buhr

While programming using the WHILE and REPEAT constructs, as in Pascal, is well established and taught regularly, programming using a LOOP construct with multiple exits is not. I feel that this much more general construct is more useful and should be taught in introductory programming courses.
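The construct advocated above maps directly onto the mid-exit loop idiom available in C-family languages. The sketch below is an illustration, not code from the paper: it reads values until a sentinel or end of input, with the exit test in the middle of the loop body, avoiding the priming read a WHILE-style loop would need.

#include <iostream>

int main() {
    for (;;) {                            // LOOP
        int x;
        std::cin >> x;                    // read
        if (!std::cin || x == 0) break;   // EXIT in the middle, on end of input or a 0 sentinel
        std::cout << x * x << '\n';       // process only values that pass the exit test
    }
}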

Collaboration


Dive into Peter A. Buhr's collaborations.

Top Co-Authors

Tim Brecht

University of Waterloo
