
Publication


Featured research published by Matthew J. Bridges.


international symposium on microarchitecture | 2004

RIFLE: An Architectural Framework for User-Centric Information-Flow Security

Neil Vachharajani; Matthew J. Bridges; Jonathan Chang; Ram Rangan; Guilherme Ottoni; Jason A. Blome; George A. Reis; Manish Vachharajani; David I. August

Even as modern computing systems allow the manipulation and distribution of massive amounts of information, users of these systems are unable to manage the confidentiality of their data in a practical fashion. Conventional access control security mechanisms cannot prevent the illegitimate use of privileged data once access is granted. For example, information provided by a user during an online purchase may be covertly delivered to malicious third parties by an untrustworthy web browser. Existing information-flow security mechanisms do provide this assurance, but only for programmer-specified policies enforced during program development as a static analysis on special-purpose type-safe languages. Not only are these techniques not applicable to many commonly used programs, but they leave the user with no defense against malicious programmers or altered binaries. In this paper, we propose RIFLE, a runtime information-flow security system designed from the user's perspective. By addressing information-flow security using architectural support, RIFLE gives users a practical way to enforce their own information-flow security policy on all programs. We prove that, contrary to statements in the literature, run-time systems like RIFLE are no less secure than existing language-based techniques. Using a model of the architectural framework and a binary translator, we demonstrate RIFLE's correctness and illustrate that the performance cost is reasonable.
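The core runtime idea, sketched very roughly below in software: every value carries a confidentiality label, operations propagate labels, and the output channel enforces the user's policy. This is a minimal illustration only; RIFLE itself tracks flows (including implicit flows) with architectural support and binary translation, not a hand-written wrapper type like the hypothetical Tagged used here.

```cpp
// Minimal software sketch of runtime information-flow tracking in the spirit
// of RIFLE. Labels are a single "secret" bit propagated by hand; implicit
// flows and hardware support are not modeled.
#include <cstdio>
#include <stdexcept>

struct Tagged {            // a value paired with its confidentiality label
    long long value;
    bool secret;           // true if derived from confidential input
};

// Label propagation: a result is secret if either operand is.
Tagged add(Tagged a, Tagged b) { return {a.value + b.value, a.secret || b.secret}; }
Tagged mul(Tagged a, Tagged b) { return {a.value * b.value, a.secret || b.secret}; }

// Output channel enforcing the user's policy: secret data must not leave.
void send_to_network(Tagged v) {
    if (v.secret)
        throw std::runtime_error("policy violation: secret data reaching the network");
    std::printf("sent %lld\n", v.value);
}

int main() {
    Tagged card{4111111111111111LL, true};      // confidential user input
    Tagged price{42, false};                    // public data

    send_to_network(mul(price, {2, false}));    // fine: public operands only
    try {
        send_to_network(add(card, price));      // blocked: result is tainted
    } catch (const std::exception& e) {
        std::printf("%s\n", e.what());
    }
}
```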


international symposium on microarchitecture | 2007

Revisiting the Sequential Programming Model for Multi-Core

Matthew J. Bridges; Neil Vachharajani; Yun Zhang; Thomas B. Jablin; David I. August

Single-threaded programming is already considered a complicated task. The move to multi-threaded programming only increases the complexity and cost involved in software development, due to the need to rewrite legacy code, retrain programmers, spend more time debugging, and avoid race conditions, deadlocks, and other problems associated with parallel programming. To address these costs, other approaches, such as automatic thread extraction, have been explored. Unfortunately, the amount of parallelism that has been automatically extracted is generally insufficient to keep many cores busy. This paper argues that this lack of parallelism is not an intrinsic limitation of the sequential programming model, but rather occurs for two reasons. First, there exists no framework for automatic thread extraction that brings together key existing state-of-the-art compiler and hardware techniques. This paper shows that such a framework can yield scalable parallelization on several SPEC CINT2000 benchmarks. Second, existing sequential programming languages force programmers to define a single legal program outcome, rather than allowing for a range of legal outcomes. This paper shows that natural extensions to the sequential programming model enable parallelization for the remainder of the SPEC CINT2000 suite. Our experience demonstrates that, with changes to only 60 lines of source code, all of the C benchmarks in the SPEC CINT2000 suite were parallelizable by automatic thread extraction. This process, constrained by the limits of modern optimizing compilers, yielded a speedup of 454% on these applications.
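A rough sketch of the "range of legal outcomes" idea follows, assuming a hypothetical commutative-style marker in the spirit of the paper's extensions (the exact annotations and compiler support are not reproduced here): once the programmer declares that calls to an id-generation routine may execute in any order, the loop-carried dependence through its counter no longer forces a single outcome, and iterations can be distributed across threads, with every interleaving equally legal.

```cpp
// Sketch only: if call order to new_id() is declared irrelevant (hypothetical
// annotation below), the sole loop-carried dependence disappears and the loop
// can run in parallel. Different runs may assign different ids; all are legal.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

std::atomic<int> counter{0};

// #pragma commutative   <-- hypothetical marker: calls may execute in any order
int new_id() { return counter.fetch_add(1); }

int main() {
    std::vector<int> id(8);
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t)
        workers.emplace_back([t, &id] {
            for (int i = t; i < 8; i += 4)   // iterations split across threads
                id[i] = new_id();            // any assignment order is acceptable
        });
    for (auto& w : workers) w.join();
    for (int i = 0; i < 8; ++i) std::printf("item %d -> id %d\n", i, id[i]);
}
```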


symposium on code generation and optimization | 2008

Parallel-stage decoupled software pipelining

Easwaran Raman; Guilherme Ottoni; Arun Raman; Matthew J. Bridges; David I. August

In recent years, the microprocessor industry has embraced chip multiprocessors (CMPs), also known as multi-core architectures, as the dominant design paradigm. For existing and new applications to make effective use of CMPs, it is desirable that compilers automatically extract thread-level parallelism from single-threaded applications. DOALL is a popular automatic technique for loop-level parallelization employed successfully in the domains of scientific and numeric computing. While DOALL generally scales well with the number of iterations of the loop, its applicability is limited by the presence of loop-carried dependences. A parallelization technique with greater applicability is decoupled software pipelining (DSWP), which parallelizes loops even in the presence of loop-carried dependences. However, the scalability of DSWP is limited by the size of the loop body and the number of recurrences it contains, which are usually smaller than the loop iteration count. This work proposes a novel non-speculative compiler parallelization technique called parallel-stage decoupled software pipelining (PS-DSWP). The goal of PS-DSWP is to combine the applicability of DSWP with the scalability of DOALL parallelization. A key insight of PS-DSWP is that, after isolating the recurrences in their own stages in DSWP, portions of the loop suitable for DOALL parallelization may be exposed. PS-DSWP extends DSWP to benefit from these opportunities, utilizing multiple threads to execute the same stage of a DSWPed loop in parallel. This paper describes the PS-DSWP transformation in detail and discusses its implementation in a research compiler. PS-DSWP produces an average speedup of 114% (up to a maximum of 155%) with 6 threads on loops from a set of 5 applications. Our experiments also demonstrate that PS-DSWP achieves better scalability with the number of threads than DSWP.
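A hand-written sketch of the resulting thread structure, under simplifying assumptions (a locked in-memory queue stands in for inter-core communication queues, and the partition is written by hand rather than produced by the compiler): the pointer-chasing recurrence of a list traversal stays in one sequential stage, while the independent per-node work forms a parallel stage replicated across several worker threads.

```cpp
// Hand-written sketch of a PS-DSWP-style partition: the pointer-chasing
// recurrence stays in one sequential stage; the independent per-node work
// becomes a parallel stage replicated across workers.
#include <atomic>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Node { int payload; Node* next; };

std::queue<Node*> work;            // stage-to-stage communication queue
std::mutex m;
std::condition_variable cv;
bool done = false;
std::atomic<long> total{0};

void sequential_stage(Node* head) {            // carries the loop recurrence
    for (Node* n = head; n; n = n->next) {
        std::lock_guard<std::mutex> g(m);
        work.push(n);
        cv.notify_one();
    }
    { std::lock_guard<std::mutex> g(m); done = true; }
    cv.notify_all();
}

void parallel_stage() {                        // replicated: no carried dependence
    for (;;) {
        Node* n;
        {
            std::unique_lock<std::mutex> g(m);
            cv.wait(g, [] { return !work.empty() || done; });
            if (work.empty()) return;
            n = work.front(); work.pop();
        }
        total += n->payload * n->payload;      // independent per-node work
    }
}

int main() {
    std::vector<Node> nodes(1000);
    for (int i = 0; i < 1000; ++i)
        nodes[i] = {i, i + 1 < 1000 ? &nodes[i + 1] : nullptr};

    std::thread producer(sequential_stage, &nodes[0]);
    std::vector<std::thread> workers;
    for (int i = 0; i < 3; ++i) workers.emplace_back(parallel_stage);
    producer.join();
    for (auto& w : workers) w.join();
    std::printf("total = %ld\n", total.load());
}
```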


international conference on parallel architectures and compilation techniques | 2007

Speculative Decoupled Software Pipelining

Neil Vachharajani; Ram Rangan; Easwaran Raman; Matthew J. Bridges; Guilherme Ottoni; David I. August

In recent years, microprocessor manufacturers have shifted their focus from single-core to multi-core processors. To avoid burdening programmers with the responsibility of parallelizing their applications, some researchers have advocated automatic thread extraction. A recently proposed technique, decoupled software pipelining (DSWP), has demonstrated promise by partitioning loops into long-running, fine-grained threads organized into a pipeline. Using a pipeline organization and execution decoupled by inter-core communication queues, DSWP offers increased execution efficiency that is largely independent of inter-core communication latency. This paper proposes adding speculation to DSWP and evaluates an automatic approach for its implementation. By speculating past infrequent dependences, the benefit of DSWP is increased by making it applicable to more loops, facilitating better-balanced threads, and enabling parallelized loops to be run on more cores. Unlike prior speculative threading proposals, speculative DSWP focuses on breaking dependence recurrences. By speculatively breaking these recurrences, instructions that were formerly restricted to a single thread to ensure decoupling are now free to span multiple threads. Using an initial automatic compiler implementation and a validated processor model, this paper demonstrates significant gains using speculation for 4-core chip multiprocessor models running a variety of codes.
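The speculate/validate/recover contract behind this approach can be illustrated on a hand-chunked loop, as sketched below. This is not the SpecDSWP pipeline itself: the real technique uses compiler-inserted checks and memory versioning to commit or squash speculative work, whereas here a rarely-firing loop-carried update (the hypothetical rare_event/limit pair) is simply assumed absent, checked after the fact, and re-executed sequentially on misspeculation.

```cpp
// Sketch of speculation past an infrequent loop-carried dependence: execute a
// chunk in parallel assuming `limit` stays fixed, validate afterwards, and on
// misspeculation discard the chunk's results and re-run it in order.
#include <cstdio>
#include <thread>
#include <vector>

int work_item(int i) { return (i * 37) % 1000; }       // per-iteration work
bool rare_event(int i) { return i == 7777; }           // infrequent dependence source

int main() {
    const int N = 10000, CHUNK = 1000;
    std::vector<int> out(N, 0);
    int limit = 50;                                     // rarely updated carried state

    for (int base = 0; base < N; base += CHUNK) {
        std::vector<int> spec(CHUNK);

        // Speculative parallel execution of the chunk, assuming `limit` is stable.
        std::vector<std::thread> ts;
        for (int t = 0; t < 4; ++t)
            ts.emplace_back([&, t] {
                for (int i = t; i < CHUNK; i += 4)
                    spec[i] = work_item(base + i) + limit;
            });
        for (auto& th : ts) th.join();

        // Validation: did the speculated-away dependence actually fire?
        bool misspeculated = false;
        for (int i = 0; i < CHUNK; ++i)
            if (rare_event(base + i)) { misspeculated = true; break; }

        if (!misspeculated) {
            for (int i = 0; i < CHUNK; ++i) out[base + i] = spec[i];   // commit
        } else {
            // Recovery: discard speculative results and re-run the chunk in order.
            for (int i = 0; i < CHUNK; ++i) {
                out[base + i] = work_item(base + i) + limit;
                if (rare_event(base + i)) limit += 100;                // dependence fires
            }
        }
    }
    std::printf("out[N-1] = %d, final limit = %d\n", out[N - 1], limit);
}
```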


international symposium on microarchitecture | 2008

Revisiting the Sequential Programming Model for the Multicore Era

Matthew J. Bridges; Neil Vachharajani; Yun Zhang; Thomas B. Jablin; David I. August

Automatic parallelization has thus far not been successful at extracting scalable parallelism from general programs. An aggressive automatic thread extraction framework, coupled with natural extensions to the sequential programming model that allow for a range of legal outcomes rather than forcing programmers to define a single legal program outcome, will let programmers achieve the performance of parallel programming via the simpler sequential model.


symposium on code generation and optimization | 2005

Practical and Accurate Low-Level Pointer Analysis

Bolei Guo; Matthew J. Bridges; Spyridon Triantafyllis; Guilherme Ottoni; Easwaran Raman; David I. August

Pointer analysis is traditionally performed once, early in the compilation process, upon an intermediate representation (IR) with source-code semantics. However, performing pointer analysis only once at this level imposes a phase-ordering constraint, causing alias information to become stale after subsequent code transformations. Moreover, high-level pointer analysis cannot be used at link time or run time, where the source code is unavailable. This paper advocates performing pointer analysis on a low-level intermediate representation. We present the first context-sensitive and partially flow-sensitive points-to analysis designed to operate at the assembly level. As we will demonstrate, low-level pointer analysis can be as accurate as high-level analysis. Additionally, our low-level pointer analysis enables a quantitative comparison of propagating high-level pointer analysis results through subsequent code transformations, versus recomputing them at the low level. We show that, for C programs, the former practice is considerably less accurate than the latter.
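For readers unfamiliar with points-to analysis, the toy sketch below shows the constraint-and-fixpoint structure such analyses share, using a flow-insensitive, Andersen-style formulation over a made-up four-statement IR. It is deliberately much weaker than the paper's analysis, which is context-sensitive, partially flow-sensitive, and operates on assembly-level code.

```cpp
// Toy flow-insensitive, Andersen-style points-to analysis over a tiny IR.
// Shows only the constraint/fixpoint structure, not the paper's algorithm.
#include <cstdio>
#include <map>
#include <set>
#include <string>
#include <vector>

enum Kind { ADDR_OF, COPY, STORE, LOAD };             // p = &x, p = q, *p = q, p = *q
struct Stmt { Kind kind; std::string dst, src; };

int main() {
    std::vector<Stmt> ir = {
        {ADDR_OF, "p", "x"},   // p = &x
        {ADDR_OF, "q", "y"},   // q = &y
        {COPY,    "r", "p"},   // r = p
        {STORE,   "r", "q"},   // *r = q   -> x may point to y
        {LOAD,    "s", "r"},   // s = *r   -> s may point to y
    };

    std::map<std::string, std::set<std::string>> pts;   // points-to sets
    bool changed = true;
    while (changed) {                                    // iterate to a fixpoint
        changed = false;
        auto add = [&](const std::string& v, const std::string& target) {
            if (pts[v].insert(target).second) changed = true;
        };
        for (const Stmt& s : ir) {
            const auto src_set = pts[s.src];             // copies: safe to mutate pts
            const auto dst_set = pts[s.dst];
            switch (s.kind) {
            case ADDR_OF: add(s.dst, s.src); break;
            case COPY:    for (const auto& t : src_set) add(s.dst, t); break;
            case STORE:   for (const auto& d : dst_set)  // *dst = src
                              for (const auto& t : src_set) add(d, t);
                          break;
            case LOAD:    for (const auto& d : src_set)  // dst = *src
                              for (const auto& t : pts[d]) add(s.dst, t);
                          break;
            }
        }
    }
    for (const auto& [var, targets] : pts) {
        std::printf("%s -> {", var.c_str());
        for (const auto& t : targets) std::printf(" %s", t.c_str());
        std::printf(" }\n");
    }
}
```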


programming language design and implementation | 2006

A framework for unrestricted whole-program optimization

Spyridon Triantafyllis; Matthew J. Bridges; Easwaran Raman; Guilherme Ottoni; David I. August

Procedures have long been the basic units of compilation in conventional optimization frameworks. However, procedures are typically formed to serve software engineering rather than optimization goals, arbitrarily constraining code transformations. Techniques such as aggressive inlining and interprocedural optimization have been developed to alleviate this problem, but, due to code growth and compile-time issues, these can be applied only sparingly. This paper introduces the Procedure Boundary Elimination (PBE) compilation framework, which allows unrestricted whole-program optimization. PBE allows all intra-procedural optimizations and analyses to operate on arbitrary subgraphs of the program, regardless of the original procedure boundaries and without resorting to inlining. In order to control compilation time, PBE also introduces novel extensions of region formation and encapsulation. PBE enables targeted code specialization, which recovers the specialization benefits of inlining while keeping code growth in check. This paper shows that PBE attains better performance than inlining with half the code growth.


IEEE Computer Architecture Letters | 2006

From sequential programs to concurrent threads

Guilherme Ottoni; Ram Rangan; Adam Stoler; Matthew J. Bridges; David I. August

Chip multiprocessors are of increasing importance due to difficulties in achieving higher clock frequencies in uniprocessors, but their success depends on finding useful work for the processor cores. This paper addresses this challenge by presenting a simple compiler approach that extracts non-speculative thread-level parallelism from sequential codes. We present initial results from this technique targeting a validated dual-core processor model, achieving speedups ranging from 9% to 48%, with an average of 25%, for important benchmark loops over their single-threaded versions. We also identify important next steps found during our pursuit of higher degrees of automatic threading.


compiler construction | 2006

Selective runtime memory disambiguation in a dynamic binary translator

Bolei Guo; Youfeng Wu; Cheng Wang; Matthew J. Bridges; Guilherme Ottoni; Neil Vachharajani; Jonathan Chang; David I. August

Alias analysis, traditionally performed statically, is unsuited for a dynamic binary translator (DBT) due to incomplete control-flow information and the high complexity of an accurate analysis. Whole-program profiling, however, shows that most memory references do not alias. The current technique used in DBTs to disambiguate memory references, instruction inspection, is too simple and can only disambiguate one-third of potential aliases. To achieve effective memory disambiguation while keeping a tight bound on analysis overhead, we propose an efficient heuristic algorithm that strategically selects key memory dependences to disambiguate with runtime checks. These checks have little runtime overhead and, in the common case where aliasing does not occur, enable aggressive optimizations, particularly scheduling. We demonstrate that a small number of checks, inserted with a low-overhead analysis, can approach optimal scheduling, where all false memory dependences are removed. Simulation shows that better scheduling alone improves overall performance by 5%.
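The shape of such a check, in a hand-written sketch (the heuristic selection of which dependences to check, and the DBT machinery itself, are not modeled): a cheap overlap test guards an aggressively reordered version of a loop, with the original conservatively ordered loop as the fallback.

```cpp
// Sketch of a runtime disambiguation check: if a cheap test proves dst and src
// do not overlap, take a reordered (unrolled) version in which loads have been
// freely hoisted past stores; otherwise keep the original conservative order.
#include <cstdint>
#include <cstdio>

void copy_scaled(int* dst, const int* src, int n) {
    std::uintptr_t d = reinterpret_cast<std::uintptr_t>(dst);
    std::uintptr_t s = reinterpret_cast<std::uintptr_t>(src);
    bool disjoint = d + n * sizeof(int) <= s || s + n * sizeof(int) <= d;

    if (disjoint) {
        int i = 0;
        for (; i + 4 <= n; i += 4) {          // safe to hoist all four loads
            int a = src[i], b = src[i + 1], c = src[i + 2], e = src[i + 3];
            dst[i] = 2 * a; dst[i + 1] = 2 * b; dst[i + 2] = 2 * c; dst[i + 3] = 2 * e;
        }
        for (; i < n; ++i) dst[i] = 2 * src[i];
    } else {
        for (int i = 0; i < n; ++i) dst[i] = 2 * src[i];   // possible aliasing
    }
}

int main() {
    int buf[12] = {1, 2, 3, 4, 5, 6, 7, 8, 0, 0, 0, 0};
    copy_scaled(buf + 4, buf, 8);             // overlapping call takes the safe path
    for (int v : buf) std::printf("%d ", v);
    std::printf("\n");
}
```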


programming language design and implementation | 2006

Automatic instruction scheduler retargeting by reverse-engineering

Matthew J. Bridges; Neil Vachharajani; Guilherme Ottoni; David I. August

In order to generate high-quality code for modern processors, a compiler must aggressively schedule instructions, maximizing resource utilization for execution efficiency. For a compiler to produce such code, it must avoid structural hazards by being aware of the processor's available resources and of how these resources are utilized by each instruction. Unfortunately, the most prevalent approach to constructing such a scheduler, manually discovering and specifying this information, is both tedious and error-prone. This paper presents a new approach which, when given a processor or processor model, automatically determines this information. After establishing that the problem of perfectly determining a processor's structural hazards through probing is not solvable, this paper proposes a heuristic algorithm that discovers most of this information in practice. This can be used either to alleviate the problems associated with manual creation or to verify an existing specification. Scheduling with these automatically derived structural hazards yields almost all of the performance gain achieved using perfect hazard information.
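A small sketch of the inference step, using synthetic probe results rather than real measurements: pairs of instruction classes whose observed back-to-back issue time exceeds the no-conflict baseline are recorded as structural hazards in a table a list scheduler could consult.

```cpp
// Sketch only: the probe cycle counts below are synthetic stand-ins for the
// measurements the paper obtains by running probe programs on the processor
// or its model; pairs slower than the baseline are flagged as hazards.
#include <cstdio>
#include <map>
#include <string>
#include <utility>

int main() {
    using OpPair = std::pair<std::string, std::string>;
    const int baseline = 1;                    // cycles when two ops co-issue

    std::map<OpPair, int> probe = {
        {{"IALU", "IALU"}, 1},   // two integer ALU ops issue together
        {{"IALU", "LOAD"}, 1},
        {{"LOAD", "LOAD"}, 2},   // a single load port serializes them
        {{"FMUL", "FMUL"}, 2},   // a single FP multiplier
        {{"FMUL", "IALU"}, 1},
    };

    // Derived structural-hazard table a list scheduler could consult.
    std::map<OpPair, bool> hazard;
    for (const auto& [classes, cycles] : probe)
        hazard[classes] = cycles > baseline;

    for (const auto& [classes, conflict] : hazard)
        std::printf("%-4s + %-4s : %s\n", classes.first.c_str(), classes.second.c_str(),
                    conflict ? "structural hazard" : "can co-issue");
}
```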
