Is this you? Create Your Porfile

Emery D. Berger

University of Massachusetts Amherst

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Emery D. Berger is active.

Explore More

Publication

Featured researches published by Emery D. Berger.

architectural support for programming languages and operating systems | 2000

Hoard: a scalable memory allocator for multithreaded applications

Emery D. Berger; Kathryn S. McKinley; Robert D. Blumofe; Paul R. Wilson

Parallel, multithreaded C and C++ programs such as web servers, database managers, news servers, and scientific applications are becoming increasingly prevalent. For these applications, the memory allocator is often a bottleneck that severely limits program performance and scalability on multiprocessor systems. Previous allocators suffer from problems that include poor performance and scalability, and heap organizations that introduce false sharing. Worse, many allocators exhibit a dramatic increase in memory consumption when confronted with a producer-consumer pattern of object allocation and freeing. This increase in memory consumption can range from a factor of P (the number of processors) to unbounded memory consumption.This paper introduces Hoard, a fast, highly scalable allocator that largely avoids false sharing and is memory efficient. Hoard is the first allocator to simultaneously solve the above problems. Hoard combines one global heap and per-processor heaps with a novel discipline that provably bounds memory consumption and has very low synchronization costs in the common case. Our results on eleven programs demonstrate that Hoard yields low average fragmentation and improves overall program performance over the standard Solaris allocator by up to a factor of 60 on 14 processors, and up to a factor of 18 over the next best allocator we tested.

programming language design and implementation | 2006

DieHard: probabilistic memory safety for unsafe languages

Emery D. Berger; Benjamin G. Zorn

Applications written in unsafe languages like C and C++ are vulnerable to memory errors such as buffer overflows, dangling pointers, and reads of uninitialized data. Such errors can lead to program crashes, security vulnerabilities, and unpredictable behavior. We present DieHard, a runtime system that tolerates these errors while probabilistically maintaining soundness. DieHard uses randomization and replication to achieve probabilistic memory safety by approximating an infinite-sized heap. DieHards memory manager randomizes the location of objects in a heap that is at least twice as large as required. This algorithm prevents heap corruption and provides a probabilistic guarantee of avoiding memory errors. For additional safety, DieHard can operate in a replicated mode where multiple replicas of the same application are run simultaneously. By initializing each replica with a different random seed and requiring agreement on output, the replicated version of Die-Hard increases the likelihood of correct execution because errors are unlikely to have the same effect across all replicas. We present analytical and experimental results that show DieHards resilience to a wide range of memory errors, including a heap-based buffer overflow in an actual application.

conference on object-oriented programming systems, languages, and applications | 2009

Grace: safe multithreaded programming for C/C++

Emery D. Berger; Ting Yang; Tongping Liu; Gene Novark

The shift from single to multiple core architectures means that programmers must write concurrent, multithreaded programs in order to increase application performance. Unfortunately, multithreaded applications are susceptible to numerous errors, including deadlocks, race conditions, atomicity violations, and order violations. These errors are notoriously difficult for programmers to debug. This paper presents Grace, a software-only runtime system that eliminates concurrency errors for a class of multithreaded programs: those based on fork-join parallelism. By turning threads into processes, leveraging virtual memory protection, and imposing a sequential commit protocol, Grace provides programmers with the appearance of deterministic, sequential execution, while taking advantage of available processing cores to run code concurrently and efficiently. Experimental results demonstrate Graces effectiveness: with modest code changes across a suite of computationally-intensive benchmarks (1-16 lines), Grace can achieve high scalability and performance while preventing concurrency errors.

conference on object-oriented programming systems, languages, and applications | 2002

Reconsidering custom memory allocation

Emery D. Berger; Benjamin G. Zorn; Kathryn S. McKinley

Programmers hoping to achieve performance improvements often use custom memory allocators. This in-depth study examines eight applications that use custom allocators. Surprisingly, for six of these applications, a state-of-the-art general-purpose allocator (the Lea allocator) performs as well as or better than the custom allocators. The two exceptions use regions, which deliver higher performance (improvements of up to 44%). Regions also reduce programmer burden and eliminate a source of memory leaks. However, we show that the inability of programmers to free individual objects within regions can lead to a substantial increase in memory consumption. Worse, this limitation precludes the use of regions for common programming idioms, reducing their usefulness.We present a generalization of general-purpose and region-based allocators that we call reaps. Reaps are a combination of regions and heaps, providing a full range of region semantics with the addition of individual object deletion. We show that our implementation of reaps provides high performance, outperforming other allocators with region-like semantics. We then use a case study to demonstrate the space advantages and software engineering benefits of reaps in practice. Our results indicate that programmers needing fast regions should use reaps, and that most programmers considering custom allocators should instead use the Lea allocator.

symposium on operating systems principles | 2011

Dthreads: efficient deterministic multithreading

Tongping Liu; Charlie Curtsinger; Emery D. Berger

Multithreaded programming is notoriously difficult to get right. A key problem is non-determinism, which complicates debugging, testing, and reproducing errors. One way to simplify multithreaded programming is to enforce deterministic execution, but current deterministic systems for C/C++ are incomplete or impractical. These systems require program modification, do not ensure determinism in the presence of data races, do not work with general-purpose multithreaded programs, or run up to 8.4× slower than pthreads. This paper presents Dthreads, an efficient deterministic multithreading system for unmodified C/C++ applications that replaces the pthreads library. Dthreads enforces determinism in the face of data races and deadlocks. Dthreads works by exploding multithreaded applications into multiple processes, with private, copy-on-write mappings to shared memory. It uses standard virtual memory protection to track writes, and deterministically orders updates by each thread. By separating updates from different threads, Dthreads has the additional benefit of eliminating false sharing. Experimental results show that Dthreads substantially outperforms a state-of-the-art deterministic runtime system, and for a majority of the benchmarks evaluated here, matches and occasionally exceeds the performance of pthreads.

programming language design and implementation | 2001

Composing high-performance memory allocators

Emery D. Berger; Benjamin G. Zorn; Kathryn S. McKinley

Current general-purpose memory allocators do not provide sufficient speed or flexibility for modern high-performance applications. Highly-tuned general purpose allocators have per-operation costs around one hundred cycles, while the cost of an operation in a custom memory allocator can be just a handful of cycles. To achieve high performance, programmers often write custom memory allocators from scratch - a difficult and error-prone process. In this paper, we present a flexible and efficient infrastructure for building memory allocators that is based on C++ templates and inheritance. This novel approach allows programmers to build custom and general-purpose allocators as heap layers that can be composed without incurring any additional runtime overhead or additional programming cost. We show that this infrastructure simplifies allocator construction and results in allocators that either match or improve the performance of heavily-tuned allocators written in C, including the Kingsley allocator and the GNU obstack library. We further show this infrastructure can be used to rapidly build a general-purpose allocator that has performance comparable to the Lea allocator, one of the best uniprocessor allocators available. We thus demonstrate a clean, easy-to-use allocator interface that seamlessly combines the power and efficiency of any number of general and custom allocators within a single application.

computer and communications security | 2010

DieHarder: securing the heap

Gene Novark; Emery D. Berger

Heap-based attacks depend on a combination of memory management error and an exploitable memory allocator. Many allocators include ad hoc countermeasures against particular exploits but their effectiveness against future exploits has been uncertain. This paper presents the first formal treatment of the impact of allocator design on security. It analyzes a range of widely-deployed memory allocators, including those used by Windows, Linux, FreeBSD and OpenBSD, and shows that they remain vulnerable to attack. It them presents DieHarder, a new allocator whose design was guided by this analysis. DieHarder provides the highest degree of security from heap-based attacks of any practical allocator of which we are aware while imposing modest performance overhead. In particular, the Firefox web browser runs as fast with DieHarder as with the Linux allocator.

programming language design and implementation | 2007

Exterminator: automatically correcting memory errors with high probability

Gene Novark; Emery D. Berger; Benjamin G. Zorn

Programs written in C and C++ are susceptible to memory errors, including buffer overflows and dangling pointers. These errors, whichcan lead to crashes, erroneous execution, and security vulnerabilities, are notoriously costly to repair. Tracking down their location in the source code is difficult, even when the full memory state of the program is available. Once the errors are finally found, fixing them remains challenging: even for critical security-sensitive bugs, the average time between initial reports and the issuance of a patch is nearly one month. We present Exterminator, a system that automatically correct sheap-based memory errors without programmer intervention. Exterminator exploits randomization to pinpoint errors with high precision. From this information, Exterminator derives runtime patches that fix these errors both in current and subsequent executions. In addition, Exterminator enables collaborative bug correction by merging patches generated by multiple users. We present analytical and empirical results that demonstrate Exterminators effectiveness at detecting and correcting both injected and real faults.

ACM Transactions in Embedded Computing Systems | 2013

PROARTIS: Probabilistically Analyzable Real-Time Systems

Francisco J. Cazorla; Eduardo Quiñones; Tullio Vardanega; Liliana Cucu; Benoit Triquet; Guillem Bernat; Emery D. Berger; Jaume Abella; Franck Wartel; Michael Houston; Luca Santinelli; Leonidas Kosmidis; Code Lo; Dorin Maxim

Static timing analysis is the state-of-the-art practice of ascertaining the timing behavior of current-generation real-time embedded systems. The adoption of more complex hardware to respond to the increasing demand for computing power in next-generation systems exacerbates some of the limitations of static timing analysis. In particular, the effort of acquiring (1) detailed information on the hardware to develop an accurate model of its execution latency as well as (2) knowledge of the timing behavior of the program in the presence of varying hardware conditions, such as those dependent on the history of previously executed instructions. We call these problems the timing analysis walls. In this vision-statement article, we present probabilistic timing analysis, a novel approach to the analysis of the timing behavior of next-generation real-time embedded systems. We show how probabilistic timing analysis attacks the timing analysis walls; we then illustrate the mathematical foundations on which this method is based and the challenges we face in the effort of efficiently implementing it. We also present experimental evidence that shows how probabilistic timing analysis reduces the extent of knowledge about the execution platform required to produce probabilistically accurate WCET estimations.

architectural support for programming languages and operating systems | 2008

Archipelago: trading address space for reliability and security

Vitaliy B. Lvin; Gene Novark; Emery D. Berger; Benjamin G. Zorn

Memory errors are a notorious source of security vulnerabilities that can lead to service interruptions, information leakage and unauthorized access. Because such errors are also difficult to debug, the absence of timely patches can leave users vulnerable to attack for long periods of time. A variety of approaches have been introduced to combat these errors, but these often incur large runtime overheads and generally abort on errors, threatening availability. This paper presents Archipelago, a runtime system that takes advantage of available address space to substantially reduce the likelihood that a memory error will affect program execution. Archipelago randomly allocates heap objects far apart in virtual address space, effectively isolating each object from buffer overflows. Archipelago also protects against dangling pointer errors by preserving the contents of freed objects after they are freed. Archipelago thus trades virtual address space---a plentiful resource on 64-bit systems---for significantly improved program reliability and security, while limiting physical memory consumption by tracking the working set of an application and compacting cold objects. We show that Archipelago allows applications to continue to run correctly in the face of thousands of memory errors. Across a suite of server applications, Archipelagos performance overhead is 6% on average (between -7% and 22%), making it especially suitable to protect servers that have known security vulnerabilities due to heap memory errors.

Explore More