Publications


Featured research published by Derek Bruening.


Symposium on Code Generation and Optimization | 2011

Practical memory checking with Dr. Memory

Derek Bruening; Qin Zhao

Memory corruption, reading uninitialized memory, using freed memory, and other memory-related errors are among the most difficult programming bugs to identify and fix due to the delay and non-determinism linking the error to an observable symptom. Dedicated memory checking tools are invaluable for finding these errors. However, such tools are difficult to build, and because they must monitor all memory accesses by the application, they incur significant overhead. Accuracy is another challenge: memory errors are not always straightforward to identify, and numerous false positive error reports can make a tool unusable. A third obstacle to creating such a tool is that it depends on low-level operating system and architectural details, making it difficult to port to other platforms and difficult to target proprietary systems like Windows. This paper presents Dr. Memory, a memory checking tool that operates on both Windows and Linux applications. Dr. Memory handles the complex and not fully documented Windows environment, and avoids reporting false positive memory leaks that plague traditional leak locating algorithms. Dr. Memory employs efficient instrumentation techniques; a direct comparison with the state-of-the-art Valgrind Memcheck tool reveals that Dr. Memory is twice as fast as Memcheck on average and up to four times faster on individual benchmarks.
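The abstract lists the error classes a memory checker must catch: unaddressable accesses, reads of uninitialized memory, and use of freed memory. A common way to detect them is to keep a shadow state for every application byte. The sketch below is a minimal illustration under stated assumptions: three byte states (unaddressable, undefined, defined), a Python dict standing in for shadow memory, and hypothetical malloc/free/read/write hooks. It is not Dr. Memory's implementation. In particular, production checkers typically propagate definedness and report an uninitialized value only when it is actually used, whereas this sketch reports on every read for simplicity.

```python
UNADDRESSABLE = "unaddressable"  # not allocated, or already freed
UNDEFINED = "undefined"          # allocated but never written
DEFINED = "defined"              # written at least once


class ShadowChecker:
    """Hypothetical byte-granularity shadow-state checker (illustration only)."""

    def __init__(self):
        self.shadow = {}   # app byte address -> state; missing means unaddressable
        self.errors = []   # list of (kind, first offending address)

    def _state(self, addr):
        return self.shadow.get(addr, UNADDRESSABLE)

    def _set(self, addr, size, state):
        for a in range(addr, addr + size):
            self.shadow[a] = state

    def malloc(self, addr, size):
        self._set(addr, size, UNDEFINED)      # fresh heap memory is uninitialized

    def free(self, addr, size):
        self._set(addr, size, UNADDRESSABLE)  # later accesses are use-after-free

    def write(self, addr, size):
        for a in range(addr, addr + size):
            if self._state(a) == UNADDRESSABLE:
                self.errors.append(("unaddressable write", a))
                return
        self._set(addr, size, DEFINED)

    def read(self, addr, size):
        for a in range(addr, addr + size):
            state = self._state(a)
            if state == UNADDRESSABLE:
                self.errors.append(("unaddressable read", a))
                return
            if state == UNDEFINED:
                self.errors.append(("uninitialized read", a))
                return


# Usage: an uninitialized read, then a use-after-free read.
checker = ShadowChecker()
checker.malloc(0x100, 8)
checker.read(0x100, 4)    # reads bytes that were never written
checker.write(0x100, 4)
checker.read(0x100, 4)    # fine: these bytes are now defined
checker.free(0x100, 8)
checker.read(0x104, 1)    # reads freed memory
print(checker.errors)     # [('uninitialized read', 256), ('unaddressable read', 260)]
```

Because every access consults the shadow state, the checker's cost scales with the number of memory accesses, which is why the abstract stresses efficient instrumentation.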


Virtual Execution Environments | 2012

Transparent dynamic instrumentation

Derek Bruening; Qin Zhao; Saman P. Amarasinghe

Process virtualization provides a virtual execution environment within which an unmodified application can be monitored and controlled while it executes. The provided layer of control can be used for purposes ranging from sandboxing to compatibility to profiling. The additional operations required for this layer are performed clandestinely alongside regular program execution. Software dynamic instrumentation is one method for implementing process virtualization: it dynamically instruments an application so that the application's code and the inserted code are interleaved. DynamoRIO is a process virtualization system, implemented using software code cache techniques, that allows users to build customized dynamic instrumentation tools. There are many challenges to building such a runtime system. One major obstacle is transparency. To support executing arbitrary applications, DynamoRIO must be fully transparent so that an application cannot distinguish between running inside the virtual environment and native execution. In addition, any extra operations required by a particular tool must avoid interfering with the behavior of the application. Transparency has historically been provided on an ad hoc basis, in reaction to problems observed in target applications. This paper identifies a necessary set of transparency requirements for running mainstream Windows and Linux applications. We discuss possible solutions to each transparency issue, evaluate tradeoffs between different choices, and identify cases where maintaining transparency is not practically solvable. We believe this will provide a guideline for better design and implementation of transparent dynamic instrumentation, as well as of other process virtualization systems that use software code caches.


Virtual Execution Environments | 2011

Dynamic cache contention detection in multi-threaded applications

Qin Zhao; David Koh; Syed Raza; Derek Bruening; Weng-Fai Wong; Saman P. Amarasinghe

In today's multi-core systems, cache contention due to true and false sharing can cause unexpected and significant performance degradation. A detailed understanding of a given multi-threaded application's behavior is required to precisely identify such performance bottlenecks. Traditionally, however, such diagnostic information can only be obtained after lengthy simulation of the memory hierarchy. In this paper, we present a novel approach that efficiently analyzes interactions between threads to determine thread correlation and detect true and false sharing. It is based on the following key insight: although the slowdown caused by cache contention depends on factors including the thread-to-core binding and parameters of the memory hierarchy, the amount of data sharing is primarily a function of the cache line size and application behavior. Using memory shadowing and dynamic instrumentation, we implemented a tool that obtains detailed sharing information between threads without simulating the full complexity of the memory hierarchy. The runtime overhead of our approach, a 5x slowdown on average relative to native execution, is significantly less than that of detailed cache simulation. The information collected allows programmers to identify the degree of cache contention in an application, the correlation among its threads, and the sources of significant false sharing. Using our approach, we were able to improve the performance of some applications by up to 12x. For other contention-intensive applications, we were able to shed light on the obstacles that prevent their performance from scaling to many cores.
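The key insight, that the amount of sharing depends mainly on cache line size and access behavior, can be illustrated offline. The sketch below assumes a recorded trace of (thread id, address, size, is_write) tuples and a 64-byte line by default; the function name and trace format are hypothetical. It labels a line as true sharing when one thread writes a byte that another thread also accesses, and as false sharing when threads touch disjoint bytes of the same line and at least one of them writes. The paper's tool instead gathers sharing information online, using memory shadowing and dynamic instrumentation.

```python
from collections import defaultdict


def classify_sharing(accesses, line_size=64):
    """Classify contended cache lines in a memory trace (hypothetical format).

    accesses: iterable of (thread_id, addr, size, is_write).
    Returns {line_index: "true" | "false"} for lines touched by at least
    two threads where at least one thread writes.
    """
    accessed = defaultdict(lambda: defaultdict(set))  # line -> tid -> bytes read or written
    written = defaultdict(lambda: defaultdict(set))   # line -> tid -> bytes written
    for tid, addr, size, is_write in accesses:
        for a in range(addr, addr + size):
            line = a // line_size
            accessed[line][tid].add(a)
            if is_write:
                written[line][tid].add(a)

    result = {}
    for line, per_thread in accessed.items():
        writers = written.get(line, {})
        if len(per_thread) < 2 or not writers:
            continue  # private or read-only lines cause no coherence traffic
        # True sharing: some byte written by one thread is touched by another.
        true_sharing = any(
            writers[w] & per_thread[other]
            for w in writers
            for other in per_thread
            if other != w
        )
        result[line] = "true" if true_sharing else "false"
    return result


trace = [
    (0, 0, 4, True), (1, 4, 4, True),        # two 4-byte fields on the same 64-byte line
    (0, 64, 4, True), (1, 64, 4, False),     # one thread writes a value another reads
    (0, 128, 8, False), (1, 128, 8, False),  # read-only sharing
]
print(classify_sharing(trace))               # {0: 'false', 1: 'true'}
print(classify_sharing(trace, line_size=4))  # {16: 'true'}: only the true sharing remains
```

Running the same trace with a smaller line size shows why the line size, and not the rest of the memory hierarchy, determines how much false sharing exists.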


Symposium on Code Generation and Optimization | 2010

Umbra: efficient and scalable memory shadowing

Qin Zhao; Derek Bruening; Saman P. Amarasinghe

Shadow value tools use metadata to track properties of application data at the granularity of individual machine instructions. These tools provide effective means of monitoring and analyzing the runtime behavior of applications. However, the high runtime overhead stemming from fine-grained monitoring often limits the use of such tools. Furthermore, 64-bit architectures pose a new challenge to building efficient memory shadowing tools. Current tools are not able to efficiently monitor the full 64-bit address space due to limitations in their shadow metadata translation. This paper presents an efficient and scalable memory shadowing framework called Umbra. Employing a novel translation scheme, Umbra supports efficient mapping from application data to shadow metadata for both 32-bit and 64-bit applications. Umbra's translation scheme does not rely on any platform features and is not restricted to any specific shadow memory size. We also present several mapping optimizations and general dynamic instrumentation techniques that substantially reduce runtime overhead, and demonstrate their effectiveness on a real-world shadow value tool. We show that shadow memory translation overhead can be reduced to just 133% on average.
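To make the translation problem concrete: a shadow value tool must map every application address to the address of its metadata on each memory access. The sketch below is a simplified region-table translator, not Umbra's actual scheme. It assumes a 1:1 shadow scale, gives each application region its own displacement, and keeps a one-entry cache of the last region used so that accesses with good locality skip the table search. The class and method names are hypothetical.

```python
import bisect


class ShadowTranslator:
    """Hypothetical region-table mapping: shadow = app_addr + disp(region).

    Assumes 1 byte of metadata per application byte (1:1 scale).
    """

    def __init__(self):
        self.bases = []    # sorted region start addresses, for binary search
        self.regions = []  # parallel list of (start, end, displacement)
        self.last = None   # most recently used region: a one-entry cache
        self.hits = 0      # lookups answered by the cache

    def add_region(self, start, end, shadow_start):
        i = bisect.bisect_left(self.bases, start)
        self.bases.insert(i, start)
        self.regions.insert(i, (start, end, shadow_start - start))

    def translate(self, addr):
        r = self.last
        if r is not None and r[0] <= addr < r[1]:
            self.hits += 1  # fast path: same region as the previous lookup
            return addr + r[2]
        i = bisect.bisect_right(self.bases, addr) - 1  # last region starting <= addr
        if i < 0 or not addr < self.regions[i][1]:
            raise KeyError(f"unmapped address {addr:#x}")
        self.last = self.regions[i]
        return addr + self.last[2]


t = ShadowTranslator()
t.add_region(0x400000, 0x500000, shadow_start=0x10000000)              # low region
t.add_region(0x7F0000000000, 0x7F0000100000, shadow_start=0x20000000)  # high 64-bit region
print(hex(t.translate(0x400010)))        # 0x10000010 (table lookup)
print(hex(t.translate(0x400020)))        # 0x10000020 (served by the cache)
print(hex(t.translate(0x7F0000000008)))  # 0x20000008
print(t.hits)                            # 1
```

A per-region displacement lets sparse 64-bit regions map into a compact shadow area, which a single linear scale-and-offset mapping cannot always do.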


Virtual Execution Environments | 2008

Process-shared and persistent code caches

Derek Bruening; Vladimir Kiriansky

Software code caches are increasingly being used to amortize the runtime overhead of tools such as dynamic optimizers, simulators, and instrumentation engines. The additional memory consumed by these caches, along with the data structures used to manage them, limits the scalability of dynamic tool deployment. Inter-process sharing of code caches significantly improves the ability to efficiently apply code caching tools to many processes simultaneously. In this paper, we present a method of code cache sharing among processes for dynamic tools operating on native applications. Our design also supports code cache persistence for improved cold code execution in short-lived processes or long initialization sequences. Sharing raises security concerns, and we show how to achieve sharing without risk of privilege escalation and with read-only code caches and associated data structures. We evaluate process-shared and persisted code caches implemented in the DynamoRIO industrial-strength dynamic instrumentation engine, where we achieve a two-thirds reduction in both memory usage and startup time.


International Symposium on Memory Management | 2010

Efficient memory shadowing for 64-bit architectures

Qin Zhao; Derek Bruening; Saman P. Amarasinghe

Shadow memory is used by dynamic program analysis tools to store metadata for tracking properties of application memory. The efficiency of mapping between application memory and shadow memory has substantial impact on the overall performance of such analysis tools. However, traditional memory mapping schemes that work well on 32-bit architectures cannot easily be ported to 64-bit architectures due to the much larger 64-bit address space. This paper presents EMS64, an efficient memory shadowing scheme for 64-bit architectures. By taking advantage of application reference locality and unused regions in the 64-bit address space, EMS64 provides a fast and flexible memory mapping scheme without relying on any underlying platform features or requiring any specific shadow memory size. Our experiments show that EMS64 is able to reduce the runtime shadow memory translation overhead to 81% on average, which almost halves the overhead of the fastest 64-bit shadow memory system we are aware of.
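The idea of exploiting unused regions of the address space can be illustrated with the simplest possible policy: apply one displacement to every application region, and accept it only if all resulting shadow regions fit in the address space and overlap no application or other shadow region. This is a hypothetical sketch, not EMS64's algorithm; the function name, the candidate list, and the toy 64 KiB address space are assumptions chosen to keep the numbers small.

```python
def find_displacement(app_regions, candidates, space_end):
    """Return the first displacement d such that shadow = addr + d is
    collision-free for every half-open region [start, end) in app_regions,
    or None if no candidate works."""
    for d in candidates:
        shadows = [(start + d, end + d) for start, end in app_regions]
        if any(s < 0 or e > space_end for s, e in shadows):
            continue  # shadow would fall outside the address space
        # Sorted by start, a set of intervals is disjoint iff each interval
        # ends no later than its successor begins.
        occupied = sorted(list(app_regions) + shadows)
        if all(a_end <= b_start
               for (_, a_end), (b_start, _) in zip(occupied, occupied[1:])):
            return d
    return None


app = [(0x1000, 0x3000), (0x8000, 0x9000)]
# 0x1000 would place the first shadow region on top of the first app region.
print(find_displacement(app, [0x1000, 0x4000], space_end=0x10000))  # 16384, i.e. 0x4000
```

With a single displacement, translation is one addition per access, which is why finding an unused placement matters; the cost is that a valid displacement may not exist for every memory layout.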


Symposium on Code Generation and Optimization | 2015

Optimizing binary translation of dynamically generated code

Byron Hawkins; Brian Demsky; Derek Bruening; Qin Zhao

Dynamic binary translation serves as a core technology that enables a wide range of important tools such as profiling, bug detection, program analysis, and security. Many target applications include large amounts of dynamically generated code, which poses a special performance challenge in maintaining consistency between the source application and the translated application. This paper introduces two approaches for optimizing binary translation of JITs and other dynamic code generators. First, we present a system of efficient source code annotations that allow developers to demarcate dynamic code regions and identify code changes within those regions. The second technique avoids the annotation and source code requirements by automatically inferring the presence of a JIT and instrumenting its write instructions with translation consistency operations. We implemented these techniques in DynamoRIO and demonstrate performance improvements on JIT applications of up to 7.3× over the state-of-the-art DBT systems, base DynamoRIO and Pin.
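The consistency problem arises because a translator caches translated copies of code that the application may later overwrite. A minimal sketch of a generic coarse-grained remedy tracks which pages each cached block came from and discards those blocks when a write touches one of those pages. The class, the assumed 4 KiB page size, and the string placeholders standing in for translated code are hypothetical; this is not the paper's annotation or JIT-inference technique.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size


class CodeCache:
    """Hypothetical code cache with page-granularity consistency."""

    def __init__(self):
        self.blocks = {}               # source pc -> translated block
        self.pages = defaultdict(set)  # page number -> pcs of blocks on it
        self.translations = 0          # how many times we had to translate

    @staticmethod
    def _pages(addr, size):
        return range(addr // PAGE_SIZE, (addr + size - 1) // PAGE_SIZE + 1)

    def execute(self, pc, block_size):
        """Return the translation for the block at pc, translating on a miss."""
        if pc not in self.blocks:
            self.translations += 1
            self.blocks[pc] = f"translated block for {pc:#x}"  # placeholder
            for page in self._pages(pc, block_size):
                self.pages[page].add(pc)
        return self.blocks[pc]

    def on_app_write(self, addr, size):
        """Invalidate every cached block sharing a page with the written bytes."""
        for page in self._pages(addr, size):
            for pc in self.pages.pop(page, set()):
                # A block spanning several pages may stay listed under its other
                # pages; that can only cause extra (safe) invalidations later.
                self.blocks.pop(pc, None)


cache = CodeCache()
cache.execute(0x1000, 16)
cache.execute(0x1000, 16)      # reused from the cache
cache.on_app_write(0x1008, 4)  # the JIT rewrites code on the same page
cache.execute(0x1000, 16)      # must be retranslated
cache.on_app_write(0x5000, 4)  # write to an unrelated page
cache.execute(0x1000, 16)      # still cached
print(cache.translations)      # 2
```

Because invalidation here works at page granularity, a JIT that writes data next to its generated code can trigger repeated retranslation, which is the kind of overhead finer-grained consistency operations target.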


Symposium on Code Generation and Optimization | 2013

Instant profiling: Instrumentation sampling for profiling datacenter applications

Hyoun Kyu Cho; Tipp Moseley; Richard E. Hank; Derek Bruening; Scott A. Mahlke

Profile-guided optimization has huge potential to save costs for datacenters. Hardware performance monitoring units enable profiling with negligible overhead, and they have proven effective in helping programmers find code regions to optimize by monitoring datacenter applications continuously on live traffic. However, these hardware features are inflexible and often buggy, limiting the types of data that can be gathered. Instrumentation-based profiling can complement or replace hardware functionality by providing more flexible and targeted information gathering. Unfortunately, the overhead of existing instrumentation mechanisms prevents their use in production runs. To be usable in datacenters, a profiling mechanism must impose overheads of less than a few percent, in terms of both throughput and latency, while still generating meaningful profile data. This paper presents instant profiling, an instrumentation sampling technique using dynamic binary translation. Instead of instrumenting the entire execution, instant profiling periodically interleaves native execution and instrumented execution according to configurable profiling duration and frequency parameters. It further reduces the latency degradation of initial profiling phases by pre-populating a software code cache. We evaluate the performance and effectiveness of this new profiling technique on the SPEC CINT2006 benchmark suite and two datacenter application benchmarks. We show that it is well suited for deployment to datacenters, incurring less than 6% slowdown and 3% computational overhead on average.
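The periodic interleaving of native and instrumented execution can be modeled as a fixed duty cycle. The sketch below assumes that the program is instrumented for the first `duration` units of every `period` units of progress and that instrumented execution runs `instr_slowdown` times slower than native. The function names and this cost model are illustrative assumptions, and the model ignores the code-cache warm-up cost that the paper reduces by pre-populating the cache.

```python
def phase(t, period, duration):
    """Execution mode at progress point t under a fixed duty cycle (hypothetical)."""
    if not 0 < duration <= period:
        raise ValueError("need 0 < duration <= period")
    return "instrumented" if t % period < duration else "native"


def estimated_slowdown(period, duration, instr_slowdown):
    """Overall slowdown vs. native if a fraction duration/period of the work
    runs instrumented at instr_slowdown x and the rest runs natively."""
    f = duration / period
    return (1 - f) + f * instr_slowdown


# Alternates instrumented and native windows: instrumented at t = 0, 10, 20.
print([phase(t, period=10, duration=1) for t in range(0, 30, 5)])
# Sampling 1% of execution with a 3x instrumentation cost gives about 1.02x overall.
print(estimated_slowdown(period=1000, duration=10, instr_slowdown=3.0))
```

Under this model the added overhead is (duration/period) × (instr_slowdown − 1), so halving the sampling ratio halves the added overhead; that trade-off between profile coverage and cost is what the duration and frequency parameters expose.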


USENIX Annual Technical Conference | 2012

AddressSanitizer: a fast address sanity checker

Konstantin Serebryany; Derek Bruening; Alexander Potapenko; Dmitriy Vyukov


Archive | 2006

Constraint injection system for immunizing software programs against vulnerabilities and attacks

Saman P. Amarasinghe; Bharath Chandramohan; Charles Renert; Derek Bruening; Vladimir Kiriansky; Timothy Garnett; Sandy Wilbourn; Warren Wu

Collaboration


Top Co-Authors

Saman P. Amarasinghe

Massachusetts Institute of Technology

Vladimir Kiriansky

Massachusetts Institute of Technology