Publications
Featured research published by Mark W. Stephenson.
Symposium on Code Generation and Optimization | 2010
Mark W. Stephenson; Ram Rangan; Emmanuel Yashchin; Eric Van Hensbergen
We introduce mainstream computing, a collaborative system that dynamically monitors a program via runtime assertion checks to ensure that it is running according to expectation. Rather than enforcing strict, statically defined assertions, our system allows users to run with a set of assertions that are statistically guaranteed to fail at a rate bounded by a user-defined probability, pfail. For example, a user can request a set of assertions that will fail at most 0.5% of the times the application is invoked. Users who believe their usage of an application is mainstream can use relatively large settings for pfail. Higher values of pfail provide stricter regulation of the application, which likely enhances security but also inhibits some legitimate program behaviors; in contrast, program behavior is unregulated when pfail = 0, leaving the user vulnerable to attack. We show that our prototype is able to detect denial-of-service attacks, integer overflows, frees of uninitialized memory, boundary violations, and an injection attack. In addition, we perform experiments with a mainstream computing system designed to protect against soft errors.
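As a rough illustration of the idea (a minimal C sketch, not the paper's system; the assertion names, observed failure rates, and the simple union-bound selection heuristic are all assumptions), a runtime could enable only those checks whose combined expected failure rate fits within the user's pfail budget:

/* Illustrative sketch only. Enables runtime assertions greedily while the
 * summed expected failure rate, taken from hypothetical training runs,
 * stays within the user-specified pfail bound. */
#include <stdio.h>
#include <stdlib.h>

struct assertion {
    const char *name;
    double observed_fail_rate;  /* fraction of training runs that violated it */
    int enabled;
};

/* Enable assertions while the total expected failure rate stays <= pfail. */
static void select_assertions(struct assertion *a, int n, double pfail)
{
    double budget = pfail;
    for (int i = 0; i < n; i++) {
        if (a[i].observed_fail_rate <= budget) {
            a[i].enabled = 1;
            budget -= a[i].observed_fail_rate;
        }
    }
}

/* A guarded runtime check: only fires if the assertion was selected. */
static void check(const struct assertion *a, int condition)
{
    if (a->enabled && !condition) {
        fprintf(stderr, "mainstream check '%s' violated; aborting\n", a->name);
        abort();
    }
}

int main(void)
{
    struct assertion asserts[] = {
        { "loop_trip_count <= 10000", 0.002, 0 },
        { "alloc_size < 1 MiB",       0.010, 0 },
    };
    select_assertions(asserts, 2, 0.005);     /* user asked for pfail = 0.5% */

    int trip_count = 42;
    check(&asserts[0], trip_count <= 10000);  /* enabled: fits within the budget */
    check(&asserts[1], 1);                    /* disabled: would exceed pfail */
    return 0;
}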
IEEE International Symposium on Workload Characterization | 2007
Vipin Sachdeva; Evan Speight; Mark W. Stephenson; Lei Chen
This paper examines several mechanisms to improve the performance of life science applications on high-performance computer architectures typically designed for more traditional supercomputing tasks. In particular, we look at the detailed performance characteristics of some of the most popular sequence alignment and homology applications on the POWER5 architecture offering from IBM. Through detailed analysis of performance counter information collected from the hardware, we identify that the main performance bottleneck in the current POWER5 architecture for these applications is the high branch misprediction penalty of the most time-consuming kernels of these codes. Utilizing our PowerPC full system simulation environment, we show the performance improvement afforded by adding conditional assignments to the PowerPC ISA. We also show the impact of changing the number of functional units to a mix more appropriate for the characteristics of bioinformatics applications. Finally, we examine the benefit of removing the two-cycle penalty for taken branches currently in the POWER5 architecture, which stems from the lack of a branch target buffer. Addressing these three performance-limiting aspects provides an average 64% improvement in application performance.
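A minimal C sketch of the kind of transformation conditional assignments enable (the scoring loop and variable names are invented for illustration and are not taken from the paper): a data-dependent branch is rewritten as a conditional assignment that a compiler may lower to a select-style instruction such as isel on Power, avoiding the misprediction penalty.

/* Illustrative only: branchy vs. branch-free forms of an unpredictable
 * comparison of the sort found in sequence-alignment inner loops. */
#include <stddef.h>
#include <stdio.h>

/* Branchy form: the comparison outcome is data-dependent and hard to predict. */
int max_score_branchy(const int *scores, size_t n)
{
    int best = 0;
    for (size_t i = 0; i < n; i++) {
        if (scores[i] > best)
            best = scores[i];
    }
    return best;
}

/* Branch-free form: the ternary is a natural target for a conditional
 * assignment / select instruction instead of a conditional branch. */
int max_score_select(const int *scores, size_t n)
{
    int best = 0;
    for (size_t i = 0; i < n; i++) {
        int s = scores[i];
        best = (s > best) ? s : best;
    }
    return best;
}

int main(void)
{
    int scores[6] = { 3, -1, 7, 2, 9, 4 };
    printf("%d %d\n", max_score_branchy(scores, 6), max_score_select(scores, 6));
    return 0;
}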
High-Performance Computer Architecture | 2009
Mark W. Stephenson; Lixin Zhang; Ram Rangan
The benefits of out-of-order (OOO) processing are well known, as is the effectiveness of predicated execution for unpredictable control flow. However, as previous research has demonstrated, these techniques are at odds with one another. One common approach to reconciling their differences is to simplify the form of predication supported by the architecture. For instance, the only form of predication supported by modern OOO processors is a simple conditional move. We argue that it is the simplicity of conditional move that has allowed its widespread adoption, but we also show that this simplicity compromises its effectiveness as a compilation target. In this paper, we introduce a generalized form of hammock predication, called predicated mutually exclusive groups, that requires few modifications to an existing processor pipeline yet presents the compiler with abundant predication opportunities. In comparison to non-predicated code running on an aggressively clocked baseline system, our technique achieves an 8% speedup averaged across three important benchmark suites.
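The hammock shape the abstract refers to can be sketched at the source level as follows (illustrative C only; the actual mechanism is ISA-level predication, which portable C cannot express, and the arithmetic here is made up):

/* A hammock: a single-entry, single-exit if/else whose two sides are
 * mutually exclusive. */
#include <stdio.h>

int hammock(int cond, int a, int b)
{
    int r;
    if (cond)          /* "then" side of the hammock       */
        r = a * 3;
    else               /* "else" side: mutually exclusive  */
        r = b - 7;
    return r;
}

/* After if-conversion, both sides are computed under predicates and exactly
 * one result is committed, e.g. in pseudo-predicated form:
 *   p  = (cond != 0)
 *   r1 = a * 3   if  p     -- predicated, no branch
 *   r2 = b - 7   if !p     -- the two writes are mutually exclusive
 *   r  = p ? r1 : r2
 * The select-based C below only mimics that shape. */
int hammock_if_converted(int cond, int a, int b)
{
    int r1 = a * 3;
    int r2 = b - 7;
    return cond ? r1 : r2;
}

int main(void)
{
    printf("%d %d\n", hammock(1, 2, 5), hammock_if_converted(0, 2, 5));
    return 0;
}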
IEEE International Conference on High Performance Computing Data and Analytics | 2013
Ignacio Laguna; Edgar A. León; Martin Schulz; Mark W. Stephenson
With the increasing number of components in HPC systems, transient faults will become commonplace. Today, network transient faults, such as lost or corrupted network packets, are addressed by middleware libraries at the cost of high memory usage and packet retransmissions. These costs, however, can be eliminated using application-level fault tolerance. In this paper, we propose recovery methods for transient network faults at the application level. These methods reconstruct missing or corrupted data via interpolation. We derive a realistic fault model using network fault rates from a production HPC cluster and use it to demonstrate the effectiveness of our reconstruction methods in an FFT kernel. We found that the normalized root-mean-square error for FFT computations can be as low as 0.1%, demonstrating that network faults can be handled at the application level with low perturbation in applications whose computed data is smooth.
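A minimal, self-contained sketch of the reconstruction idea under stated assumptions (the signal values, the single lost sample, and the use of simple linear interpolation are all illustrative; the paper's kernels and error figures are not reproduced here):

/* Illustrative only: if one sample of a smooth signal is lost to a network
 * fault, reconstruct it by linear interpolation of its neighbors and report
 * the normalized root-mean-square error against the original. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double original[8] = { 0.0, 0.7, 1.4, 2.0, 2.7, 3.3, 4.1, 4.8 };
    double received[8];
    for (int i = 0; i < 8; i++) received[i] = original[i];

    int lost = 4;                                   /* index of the dropped value */
    received[lost] = 0.5 * (received[lost - 1] + received[lost + 1]);  /* interpolate */

    /* Normalized RMSE over the whole signal. */
    double sq_err = 0.0, min = original[0], max = original[0];
    for (int i = 0; i < 8; i++) {
        double d = received[i] - original[i];
        sq_err += d * d;
        if (original[i] < min) min = original[i];
        if (original[i] > max) max = original[i];
    }
    double nrmse = sqrt(sq_err / 8.0) / (max - min);
    printf("normalized RMSE after reconstruction: %.4f%%\n", 100.0 * nrmse);
    return 0;
}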
International Conference on Supercomputing | 2013
Ramakrishnan Rajamony; Mark W. Stephenson; William E. Speight
Archive | 2008
Mark W. Stephenson; Ram Rangan
Archive | 2010
Elmootazbellah Nabil Elnozahy; Mark W. Stephenson
Archive | 2008
Ram Rangan; Mark W. Stephenson; Lixin Zhang