Publications

Featured research published by Field G. Van Zee.


ACM Transactions on Mathematical Software | 2009

Programming matrix algorithms-by-blocks for thread-level parallelism

Gregorio Quintana-Ortí; Enrique S. Quintana-Ortí; Robert A. van de Geijn; Field G. Van Zee; Ernie Chan

With the emergence of thread-level parallelism as the primary means for continued performance improvement, the programmability issue has reemerged as an obstacle to the use of architectural advances. We argue that evolving legacy libraries for dense and banded linear algebra is not a viable solution due to constraints imposed by early design decisions. We propose a philosophy of abstraction and separation of concerns that provides a promising solution in this problem domain. The first abstraction, FLASH, allows algorithms to express computation with matrices consisting of contiguous blocks, facilitating algorithms-by-blocks. Operand descriptions are registered for a particular operation a priori by the library implementor. A runtime system, SuperMatrix, uses this information to identify data dependencies between suboperations, allowing them to be scheduled to threads out-of-order and executed in parallel. But not all classical algorithms in linear algebra lend themselves to conversion to algorithms-by-blocks. We show how our recently proposed LU factorization with incremental pivoting and a closely related algorithm-by-blocks for the QR factorization, both originally designed for out-of-core computation, overcome this difficulty. Anecdotal evidence regarding the development of routines with a core functionality demonstrates how the methodology supports high productivity while experimental results suggest that high performance is abundantly achievable.
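The storage-by-blocks idea behind FLASH can be sketched in a few lines: the matrix is kept as a grid of contiguous blocks, and the algorithm is expressed over whole blocks (units of data) with each block update serving as a unit of computation. This is a minimal pure-Python illustration under those assumptions, not the FLASH API; all helper names are made up.

```python
def to_blocks(A, b):
    """Partition a dense n x n matrix (list of rows) into a grid of b x b blocks."""
    grid = len(A) // b
    return [[[row[jb * b:(jb + 1) * b] for row in A[ib * b:(ib + 1) * b]]
             for jb in range(grid)] for ib in range(grid)]

def from_blocks(Gb):
    """Reassemble a dense matrix from a grid of blocks."""
    grid, b = len(Gb), len(Gb[0][0])
    return [[Gb[ib][jb][i][j] for jb in range(grid) for j in range(b)]
            for ib in range(grid) for i in range(b)]

def block_update(C, A, B):
    """One unit of computation: C += A * B on individual b x b blocks."""
    b = len(A)
    for i in range(b):
        for j in range(b):
            C[i][j] += sum(A[i][k] * B[k][j] for k in range(b))

def gemm_by_blocks(Cb, Ab, Bb):
    """The algorithm-by-blocks: every loop iteration touches whole blocks only."""
    grid = len(Ab)
    for i in range(grid):
        for j in range(grid):
            for k in range(grid):
                block_update(Cb[i][j], Ab[i][k], Bb[k][j])
```

Because each `block_update` call names exactly the blocks it reads and writes, a runtime like SuperMatrix can treat each call as a schedulable task.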


ACM Transactions on Mathematical Software | 2015

BLIS: A Framework for Rapidly Instantiating BLAS Functionality

Field G. Van Zee; Robert A. van de Geijn

The BLAS-like Library Instantiation Software (BLIS) framework is a new infrastructure for rapidly instantiating Basic Linear Algebra Subprograms (BLAS) functionality. Its fundamental innovation is that virtually all computation within level-2 (matrix-vector) and level-3 (matrix-matrix) BLAS operations can be expressed and optimized in terms of very simple kernels. While others have had similar insights, BLIS reduces the necessary kernels to what we believe is the simplest set that still supports the high performance that the computational science community demands. Higher-level framework code is generalized and implemented in ISO C99 so that it can be reused and/or reparameterized for different operations (and different architectures) with little to no modification. Inserting high-performance kernels into the framework facilitates the immediate optimization of any BLAS-like operations which are cast in terms of these kernels, and thus the framework acts as a productivity multiplier. Users of BLAS-dependent applications are given a choice of using the traditional Fortran-77 BLAS interface, a generalized C interface, or any other higher level interface that builds upon this latter API. Preliminary performance of level-2 and level-3 operations is observed to be competitive with two mature open source libraries (OpenBLAS and ATLAS) as well as an established commercial product (Intel MKL).
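The kernel-centric design can be sketched as follows: one tiny "microkernel" updates an MR x NR tile of C, and generic outer loops partition the problem and call it. This is a toy Python sketch of the structure only; the real BLIS loops are in C, include packing and cache blocking, and use architecture-specific MR and NR.

```python
MR, NR = 2, 2  # register-block sizes; real values are architecture-specific

def microkernel(kc, a, b, C, i0, j0):
    """The one performance-critical kernel: C[i0:i0+MR, j0:j0+NR] += a * b,
    where a is MR x kc and b is kc x NR."""
    for i in range(MR):
        for j in range(NR):
            C[i0 + i][j0 + j] += sum(a[i][p] * b[p][j] for p in range(kc))

def gemm(C, A, B):
    """Generic framework code: tile C and cast all work onto the microkernel.
    Assumes dimensions divisible by MR and NR; packing is omitted."""
    m, k, n = len(A), len(A[0]), len(B[0])
    for i0 in range(0, m, MR):
        for j0 in range(0, n, NR):
            a = [row[:] for row in A[i0:i0 + MR]]
            b = [[B[p][j0 + j] for j in range(NR)] for p in range(k)]
            microkernel(k, a, b, C, i0, j0)
```

Swapping in an optimized `microkernel` for a new architecture would, in this scheme, immediately accelerate every operation cast in terms of it.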


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2008

SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks

Ernie Chan; Field G. Van Zee; Paolo Bientinesi; Enrique S. Quintana-Ortí; Gregorio Quintana-Ortí; Robert A. van de Geijn

This paper describes SuperMatrix, a runtime system that parallelizes matrix operations for SMP and/or multi-core architectures. We use this system to demonstrate how code described at a high level of abstraction can achieve high performance on such architectures while completely hiding the parallelism from the library programmer. The key insight entails viewing matrices hierarchically, consisting of blocks that serve as units of data where operations over those blocks are treated as units of computation. The implementation transparently enqueues the required operations, internally tracking dependencies, and then executes the operations utilizing out-of-order execution techniques inspired by superscalar microarchitectures. This separation of concerns allows library developers to implement algorithms without concerning themselves with the parallelization aspect of the problem. Different heuristics for scheduling operations can be implemented in the runtime system independent of the code that enqueues the operations. Results gathered on a 16 CPU ccNUMA Itanium2 server demonstrate excellent performance.
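The runtime's approach can be sketched as: enqueue block operations as tasks tagged with the blocks they read and write, infer flow, anti, and output dependencies from those tags, then execute any ready task. The toy scheduler below is an illustrative sketch under those assumptions, not the actual SuperMatrix implementation; the task tuples and heuristic are made up.

```python
from collections import defaultdict

def build_dependencies(tasks):
    """tasks: list of (name, reads, writes) over block ids.
    Returns {task index: set of earlier task indices it must wait for}."""
    last_writer = {}
    readers = defaultdict(list)
    deps = {i: set() for i in range(len(tasks))}
    for i, (_, reads, writes) in enumerate(tasks):
        for blk in reads:                      # flow dependence (read-after-write)
            if blk in last_writer:
                deps[i].add(last_writer[blk])
        for blk in writes:                     # anti/output dependence (WAR, WAW)
            if blk in last_writer:
                deps[i].add(last_writer[blk])
            deps[i].update(readers[blk])
        for blk in reads:
            readers[blk].append(i)
        for blk in writes:
            last_writer[blk] = i
            readers[blk] = []
    return deps

def schedule(tasks):
    """Execute tasks out-of-order: any task whose dependencies are done may run."""
    deps = build_dependencies(tasks)
    done, order = set(), []
    while len(done) < len(tasks):
        ready = [i for i in range(len(tasks)) if i not in done and deps[i] <= done]
        i = ready[-1]  # arbitrary pick; a real runtime applies a scheduling heuristic
        done.add(i)
        order.append(tasks[i][0])
    return order
```

The separation the paper describes shows up here: the dependence analysis and the pick-a-ready-task heuristic are independent of whatever code enqueued the tasks.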


Computing in Science and Engineering | 2009

The libflame Library for Dense Matrix Computations

Field G. Van Zee; Ernie Chan; Robert A. van de Geijn; Enrique S. Quintana-Ortí; Gregorio Quintana-Ortí

Researchers from the Formal Linear Algebra Method Environment (Flame) project have developed new methodologies for analyzing, designing, and implementing linear algebra libraries. These solutions, which have culminated in the libflame library, seem to solve many of the programmability problems that have arisen with the advent of multicore and many-core architectures.


ACM Transactions on Mathematical Software | 2016

The BLIS Framework: Experiments in Portability

Field G. Van Zee; Tyler M. Smith; Bryan Marker; Tze Meng Low; Robert A. van de Geijn; Francisco D. Igual; Mikhail Smelyanskiy; Xianyi Zhang; Michael Kistler; Vernon Austel; John A. Gunnels; Lee Killough

BLIS is a new software framework for instantiating high-performance BLAS-like dense linear algebra libraries. We demonstrate how BLIS acts as a productivity multiplier by using it to implement the level-3 BLAS on a variety of current architectures. The systems for which we demonstrate the framework include state-of-the-art general-purpose, low-power, and many-core architectures. We show, with very little effort, how the BLIS framework yields sequential and parallel implementations that are competitive with the performance of ATLAS, OpenBLAS (an effort to maintain and extend the GotoBLAS), and commercial vendor implementations such as AMD’s ACML, IBM’s ESSL, and Intel’s MKL libraries. Although most of this article focuses on single-core implementation, we also provide compelling results that suggest the framework’s leverage extends to the multithreaded domain.


ACM Transactions on Mathematical Software | 2006

Accumulating Householder transformations, revisited

Thierry Joffrain; Tze Meng Low; Enrique S. Quintana-Ortí; Robert A. van de Geijn; Field G. Van Zee

A theorem related to the accumulation of Householder transformations into a single orthogonal transformation, known as the compact WY transform, is presented. It provides a simple characterization of the computation of this transformation and suggests an alternative algorithm for computing it. It also suggests an alternative transformation, the UT transform, which has the same utility as the compact WY transform but requires less computation and has similar stability properties. That alternative transformation was first published over a decade ago but has gone unnoticed by the community.
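The accumulation can be checked numerically for two reflectors. Writing each Householder transformation as H_i = I - u_i u_i^T / tau_i with tau_i = u_i^T u_i / 2, the UT-style accumulated form H_1 H_2 = I - U T^{-1} U^T (T upper triangular, T_ii = tau_i, T_12 = u_1^T u_2) expands for k = 2 as below. This sketch follows the standard formulation from the literature; the paper's exact notation and API may differ, and the helper names are illustrative.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def householder(u):
    """H = I - u u^T / tau with tau = u^T u / 2."""
    tau = sum(x * x for x in u) / 2.0
    n = len(u)
    return [[(1.0 if i == j else 0.0) - u[i] * u[j] / tau
             for j in range(n)] for i in range(n)]

def ut_accumulate(u1, u2):
    """Accumulated form I - U T^{-1} U^T, expanded explicitly for k = 2."""
    t11 = sum(x * x for x in u1) / 2.0
    t22 = sum(x * x for x in u2) / 2.0
    t12 = sum(a * b for a, b in zip(u1, u2))
    n = len(u1)
    Q = [[(1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            Q[i][j] -= (u1[i] * u1[j] / t11 + u2[i] * u2[j] / t22
                        - (t12 / (t11 * t22)) * u1[i] * u2[j])
    return Q
```

Multiplying `householder(u1)` by `householder(u2)` directly and comparing against `ut_accumulate(u1, u2)` confirms the two forms agree.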


ACM Transactions on Mathematical Software | 2008

Scalable parallelization of FLAME code via the workqueuing model

Field G. Van Zee; Paolo Bientinesi; Tze Meng Low; Robert A. van de Geijn

We discuss the OpenMP parallelization of linear algebra algorithms that are coded using the Formal Linear Algebra Methods Environment (FLAME) API. This API expresses algorithms at a higher level of abstraction, avoids the use of loop and array indices, and represents these algorithms as they are formally derived and presented. We report on two implementations of the workqueuing model, neither of which requires the use of explicit indices to specify parallelism. The first implementation uses the experimental taskq pragma, which may influence the adoption of a similar construct into OpenMP 3.0. The second workqueuing implementation is domain-specific to FLAME but allows us to illustrate the benefits of sorting tasks according to their computational cost prior to parallel execution. In addition, we discuss how scalable parallelization of dense linear algebra algorithms via OpenMP will require a two-dimensional partitioning of operands, much like a 2D data distribution is needed on distributed memory architectures. We illustrate the issues and solutions by discussing the parallelization of the symmetric rank-k update and report impressive performance on an SGI system with 14 Itanium2 processors.
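The benefit of cost-sorting can be seen with a small scheduling model: hand tasks of unequal cost to the least-loaded of several workers, once in arrival order and once sorted largest-first. The costs and worker count below are made up for illustration; this is not code from the paper.

```python
def makespan(costs, workers):
    """Greedy list scheduling: each task goes to the currently least-loaded
    worker; the makespan is the load of the busiest worker."""
    loads = [0] * workers
    for c in costs:
        loads[loads.index(min(loads))] += c
    return max(loads)

# A mix of a few expensive tasks and many cheap ones (hypothetical costs).
costs = [9, 1, 1, 1, 8, 1, 1, 1, 7, 1]
unsorted_time = makespan(costs, 3)
sorted_time = makespan(sorted(costs, reverse=True), 3)
```

Sorting largest-first lets the cheap tasks fill in around the expensive ones at the end of the schedule, which is exactly the load-imbalance effect the paper exploits.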


European Conference on Parallel Processing | 2007

Toward scalable matrix multiply on multithreaded architectures

Bryan Marker; Field G. Van Zee; Kazushige Goto; Gregorio Quintana-Ortí; Robert A. van de Geijn

We show empirically that some of the issues that affected the design of linear algebra libraries for distributed memory architectures will also likely affect such libraries for shared memory architectures with many simultaneous threads of execution, including SMP architectures and future multicore processors. The always-important matrix-matrix multiplication is used to demonstrate that a simple one-dimensional data partitioning is suboptimal in the context of dense linear algebra operations and hinders scalability. In addition we advocate the publishing of low-level interfaces to supporting operations, such as the copying of data to contiguous memory, so that library developers may further optimize parallel linear algebra implementations. Data collected on a 16 CPU Itanium2 server supports these observations.
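Why a one-dimensional partitioning hinders scalability can be shown with the standard counting argument: tally how many matrix elements each of p threads must read to compute its part of C = A * B on n x n operands. These formulas are a back-of-the-envelope sketch of that argument, not measurements from the paper.

```python
import math

def reads_per_thread_1d(n, p):
    """1D row partitioning: each thread owns n/p rows of C, so it reads
    its n/p rows of A plus ALL of B."""
    return (n // p) * n + n * n

def reads_per_thread_2d(n, p):
    """2D tile partitioning with an r x r thread grid, r = sqrt(p): each
    thread owns an (n/r) x (n/r) tile of C, so it reads only n/r block
    rows of A and n/r block columns of B."""
    r = math.isqrt(p)
    return 2 * (n // r) * n
```

With the 1D scheme the n^2 term for B does not shrink as p grows, while the 2D scheme's reads fall as 1/sqrt(p), mirroring the distributed-memory analysis the paper invokes.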


Computing in Science and Engineering | 2009

Introducing: The Libflame Library for Dense Matrix Computations

Field G. Van Zee; Ernie Chan; Robert A. van de Geijn; Enrique S. Quintana-Ortí; Gregorio Quintana-Ortí

As part of the FLAME project, we have been diligently developing new methodologies for analyzing, designing, and implementing linear algebra libraries. While we did not know it when we started, these techniques appear to solve many of the programmability problems that now face us with the advent of multicore and many-core architectures. These efforts have culminated in a new library, libflame, which strives to replace similar libraries that date back to the late 20th century. With this paper, we introduce the scientific computing community to this library.


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2005

Extracting SMP parallelism for dense linear algebra algorithms from high-level specifications

Tze Meng Low; Robert A. van de Geijn; Field G. Van Zee

We show how to exploit high-level information, available as part of the derivation of provably correct algorithms, so that SMP parallelism can be systematically identified. Recent research has shown that loop-based dense linear algebra algorithms can be systematically derived from the mathematical specification of the operation. Fundamental to the methodology is the determination of loop-invariants (in the sense of Dijkstra and Hoare) from which correct loops can be systematically derived. We show how the high-level specification of the operation together with these loop-invariants can be exploited to detect the independence of loop iterations. This in turn then allows a Workqueuing Model to be used to implement and parallelize the algorithms using a feature proposed for OpenMP 3.0, task queues. Although performance is not the main feature of this paper, performance is reported on a 4 CPU Itanium2 server for a concrete example, the symmetric rank-k update operation.
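The independence argument can be illustrated on the paper's concrete example, the symmetric rank-k update C := A * A^T. Each (i, j) task writes a distinct block of the lower triangle of C and reads only fixed block rows of A, so the iterations are mutually independent and may run in any order, which is what makes them safe to place on a task queue. This pure-Python sketch uses illustrative names, not the derivation machinery or OpenMP code from the paper.

```python
def syrk_by_blocks(A_blocks, order=None):
    """Lower triangle of C := A * A^T, with A stored as a list of block rows.
    Each (i, j) task writes only C[(i, j)], so any execution order is valid."""
    nb = len(A_blocks)
    tasks = order or [(i, j) for i in range(nb) for j in range(i + 1)]
    C = {}
    for i, j in tasks:
        Ai, Aj = A_blocks[i], A_blocks[j]
        C[(i, j)] = [[sum(Ai[r][k] * Aj[s][k] for k in range(len(Ai[0])))
                      for s in range(len(Aj))] for r in range(len(Ai))]
    return C
```

Running the task list forwards and backwards yields identical results, the property a workqueuing runtime relies on when it dequeues tasks in whatever order threads become free.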

Collaboration

Dive into Field G. Van Zee's collaborations.

Top co-authors:

Ernie Chan, University of Texas at Austin
Tyler M. Smith, University of Texas at Austin
Tze Meng Low, Carnegie Mellon University
Bryan Marker, University of Texas at Austin
G. Joseph Elizondo, University of Texas at Austin
Francisco D. Igual, Complutense University of Madrid