Bryan Marker | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Bryan Marker is active.

Explore More

Publication

Featured researches published by Bryan Marker.

ACM Transactions on Mathematical Software | 2013

Elemental: A New Framework for Distributed Memory Dense Matrix Computations

Jack Poulson; Bryan Marker; Robert A. van de Geijn; Jeff R. Hammond; Nichols A. Romero

Parallelizing dense matrix computations to distributed memory architectures is a well-studied subject and generally considered to be among the best understood domains of parallel computing. Two packages, developed in the mid 1990s, still enjoy regular use: ScaLAPACK and PLAPACK. With the advent of many-core architectures, which may very well take the shape of distributed memory architectures within a single processor, these packages must be revisited since the traditional MPI-based approaches will likely need to be extended. Thus, this is a good time to review lessons learned since the introduction of these two packages and to propose a simple yet effective alternative. Preliminary performance results show the new solution achieves competitive, if not superior, performance on large clusters.

ACM Transactions on Mathematical Software | 2016

The BLIS Framework: Experiments in Portability

Field G. Van Zee; Tyler M. Smith; Bryan Marker; Tze Meng Low; Robert A. van de Geijn; Francisco D. Igual; Mikhail Smelyanskiy; Xianyi Zhang; Michael Kistler; Vernon Austel; John A. Gunnels; Lee Killough

BLIS is a new software framework for instantiating high-performance BLAS-like dense linear algebra libraries. We demonstrate how BLIS acts as a productivity multiplier by using it to implement the level-3 BLAS on a variety of current architectures. The systems for which we demonstrate the framework include state-of-the-art general-purpose, low-power, and many-core architectures. We show, with very little effort, how the BLIS framework yields sequential and parallel implementations that are competitive with the performance of ATLAS, OpenBLAS (an effort to maintain and extend the GotoBLAS), and commercial vendor implementations such as AMD’s ACML, IBM’s ESSL, and Intel’s MKL libraries. Although most of this article focuses on single-core implementation, we also provide compelling results that suggest the framework’s leverage extends to the multithreaded domain.

ieee international conference on high performance computing data and analytics | 2012

Designing Linear Algebra Algorithms by Transformation: Mechanizing the Expert Developer

Bryan Marker; Jack Poulson; Don S. Batory; Robert A. van de Geijn

To implement dense linear algebra algorithms for distributed-memory computers, an expert applies knowledge of the domain, the target architecture, and how to parallelize common operations. This is often a rote process that becomes tedious for a large collection of algorithms. We have developed a way to encode this expert knowledge such that it can be applied by a system to generate mechanically the same (and sometimes better) highly-optimized code that an expert creates by hand. This paper illustrates how we have encoded a subset of this knowledge and how our system applies it and searches a space of generated implementations automatically.

Journal of Neuroscience Methods | 2013

Automating ultrasonic vocalization analyses: the WAAVES program.

James M. Reno; Bryan Marker; Lawrence K. Cormack; Timothy Schallert; Christine L. Duvauchelle

BACKGROUND Human emotion is a crucial component of drug abuse and addiction. Ultrasonic vocalizations (USVs) elicited by rodents are a highly translational animal model of emotion in drug abuse studies. A major roadblock to comprehensive use of USV data is the overwhelming burden to attain accurate USV assessment in a timely manner. One of the most accurate methods of analyzing USVs, human auditory detection with simultaneous spectrogram inspection, requires USV sound files to be played back 4% normal speed. NEW METHOD WAAVES (WAV-file Automated Analysis of Vocalizations Environment Specific) is an automated USV assessment program utilizing MATLABs Signal and Image Processing Toolboxes in conjunction with a series of customized filters to separate USV calls from background noise, and accurately tabulate and categorize USVs as flat or frequency-modulated (FM) calls. In the current report, WAAVES functionality is demonstrated by USV analyses of cocaine self-administration data collected over 10 daily sessions. RESULTS WAAVES counts are significantly correlated with human auditory counts (r(48)=0.9925; p<0.001). Statistical analyses used WAAVES output to examine individual differences in USV responses to cocaine, cocaine-associated cues and relationships between USVs, cocaine intake and locomotor activity. COMPARISON WITH EXISTING METHOD WAAVES output is highly accurate and provides tabulated data in approximately 0.3% of the time required when using human auditory detection methods. CONCLUSIONS The development of a customized USV analysis program, such as WAAVES streamlines USV assessment and enhances the ability to utilize USVs as a tool to advance drug abuse research and ultimately develop effective treatments.

european conference on parallel processing | 2007

Toward scalable matrix multiply on multithreaded architectures

Bryan Marker; Field G. Van Zee; Kazushige Goto; Gregorio Quintana-Ortí; Robert A. van de Geijn

We show empirically that some of the issues that affected the design of linear algebra libraries for distributed memory architectures will also likely affect such libraries for shared memory architectures with many simultaneous threads of execution, including SMP architectures and future multicore processors. The always-important matrix-matrix multiplication is used to demonstrate that a simple one-dimensional data partitioning is suboptimal in the context of dense linear algebra operations and hinders scalability. In addition we advocate the publishing of low-level interfaces to supporting operations, such as the copying of data to contiguous memory, so that library developers may further optimize parallel linear algebra implementations. Data collected on a 16 CPU Itanium2 server supports these observations.

international conference on conceptual structures | 2013

Code Generation and Optimization of Distributed-memory Dense Linear Algebra Kernels

Bryan Marker; Don S. Batory; Robert A. van de Geijn

Design by Transformation (DxT) is an approach to software development that encodes domain-specific programs as graphs and expert design knowledge as graph transformations. The goal of DxT is to mechanize the generation of highly-optimized code. This paper demonstrates how DxT can be used to transform sequential specifications of an important set of Dense Linear Algebra (DLA) kernels, the level-3 Basic Linear Algebra Subprograms (BLAS3), into high-performing library routines targeting distributed-memory (cluster) architectures. Getting good BLAS3 performance for such platforms requires deep domain knowledge, so their implementations are manually coded by experts. Unfortunately, there are few such experts and developing the full variety of BLAS3 implementations takes a lot of repetitive e ort. A prototype tool, DxTer, automates this tedious task. We explain how we build on previous work to represent loops and multiple loop-based algorithms in DxTer. Performance results on a BlueGene/P parallel supercomputer show that the generated code meets or beats implementations that are hand-coded by a human expert and outperforms the widely used ScaLAPACK library.

ieee international conference on high performance computing data and analytics | 2013

A case study in mechanically deriving dense linear algebra code

Bryan Marker; Don S. Batory; Robert A. van de Geijn

Design by Transformation (DxT) is a top-down approach to mechanically derive high-performance algorithms for dense linear algebra. We use DxT to derive the implementation of a representative matrix operation, two- sided Trmm. We start with a knowledge base of transformations that were encoded for a simpler set of operations, the level-3 BLAS, and add only a few transformations to accommodate the more complex two- sided Trmm. These additions explode the search space of our prototype system, DxTer, requiring the novel techniques defined in this paper to eliminate large segments of the search space that contain suboptimal algorithms. Performance results for the mechanically optimized implementations on 8192 cores of a BlueGene/P architecture are given.

Concurrency and Computation: Practice and Experience | 2012

Programming many-core architectures - a case study: dense matrix computations on the Intel single-chip cloud computer processor

Bryan Marker; Ernie Chan; Jack Poulson; Robert A. van de Geijn; Rob F. Van der Wijngaart; Timothy G. Mattson; Theodore E. Kubaska

A message passing, distributed‐memory parallel computer on a chip is one possible design for future, many‐core architectures. We discuss initial experiences with the Intel Single‐chip Cloud Computer research processor, which is a prototype architecture that incorporates 48 cores on a single die that can communicate via a small, shared, on‐die buffer. The experiment is to port a state‐of‐the‐art, distributed‐memory, dense matrix library, Elemental, to this architecture and gain insight from the experience. We show that programmability addressed by this library, especially the proper abstraction for collective communication, greatly aids the porting effort. This enables us to support a wide range of functionality with limited changes to the library code. Copyright

software language engineering | 2013

Dark Knowledge and Graph Grammars in Automated Software Design

Don S. Batory; Rui Carlos Araújo Gonçalves; Bryan Marker; Janet Siegmund

Mechanizing the development of hard-to-write and costly-to-maintain software is the core problem of automated software design. Encoding expert knowledge (a.k.a. dark knowledge) about a software domain is central to its solution. We assert that a solution can be cast in terms of the ideas of language design and engineering. Graph grammars can be a foundation for modern automated software development. The sentences of a grammar are designs of complex dataflow systems. We explain how graph grammars provide a framework to encode expert knowledge, produce correct-by-construction derivations of dataflow applications, enable the generation of high-performance code, and improve how software design of dataflow applications can be taught to undergraduates.

generative programming and component engineering | 2012

Pushouts in software architecture design

Taylor L. Riché; Rui Carlos Araújo Gonçalves; Bryan Marker; Don S. Batory

A classical approach to program derivation is to progressively extend a simple specification and then incrementally refine it to an implementation. We claim this approach is hard or impractical when reverse engineering legacy software architectures. We present a case study that shows optimizations and pushouts---in addition to refinements and extensions---are essential for practical stepwise development of complex software architectures.

Explore More