Tyler M. Smith | Researchain

Publication

Featured research published by Tyler M. Smith.

ACM Transactions on Mathematical Software | 2016

The BLIS Framework: Experiments in Portability

Field G. Van Zee; Tyler M. Smith; Bryan Marker; Tze Meng Low; Robert A. van de Geijn; Francisco D. Igual; Mikhail Smelyanskiy; Xianyi Zhang; Michael Kistler; Vernon Austel; John A. Gunnels; Lee Killough

BLIS is a new software framework for instantiating high-performance BLAS-like dense linear algebra libraries. We demonstrate how BLIS acts as a productivity multiplier by using it to implement the level-3 BLAS on a variety of current architectures. The systems for which we demonstrate the framework include state-of-the-art general-purpose, low-power, and many-core architectures. We show, with very little effort, how the BLIS framework yields sequential and parallel implementations that are competitive with the performance of ATLAS, OpenBLAS (an effort to maintain and extend the GotoBLAS), and commercial vendor implementations such as AMD’s ACML, IBM’s ESSL, and Intel’s MKL libraries. Although most of this article focuses on single-core implementation, we also provide compelling results that suggest the framework’s leverage extends to the multithreaded domain.

ACM Transactions on Mathematical Software | 2016

Analytical Modeling Is Enough for High-Performance BLIS

Tze Meng Low; Francisco D. Igual; Tyler M. Smith; Enrique S. Quintana-Ortí

We show how the BLAS-like Library Instantiation Software (BLIS) framework, which provides a more detailed layering of the GotoBLAS (now maintained as OpenBLAS) implementation, allows one to analytically determine tuning parameters for high-end instantiations of the matrix-matrix multiplication. This is of both practical and scientific importance, as it greatly reduces the development effort required for the implementation of the level-3 BLAS while also advancing our understanding of how hierarchically layered memories interact with high-performance software. This allows the community to move on from valuable engineering solutions (empirical autotuning) to scientific understanding (analytical insight).
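
To make the "layering" concrete, the following is a minimal sketch of a blocked matrix multiplication in which the tuning parameters appear as loop strides. The five-loop structure around a small micro-kernel follows the commonly described GotoBLAS/BLIS layout; the function names and the default block sizes (`mc`, `kc`, `nc`, `mr`, `nr`) are illustrative assumptions, not values derived in the paper, and the packing of A and B into contiguous buffers is omitted.

```python
def naive_gemm(a, b):
    """Reference product of two nested-list matrices."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def blocked_gemm(a, b, mc=4, kc=4, nc=4, mr=2, nr=2):
    """Blocked C = A * B; block sizes are placeholders, not tuned values."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for jc in range(0, n, nc):                # loop 5: column panels of B and C
        j_end = min(jc + nc, n)
        for pc in range(0, k, kc):            # loop 4: slices of the k dimension
            p_end = min(pc + kc, k)
            for ic in range(0, m, mc):        # loop 3: row blocks of A and C
                i_end = min(ic + mc, m)
                for jr in range(jc, j_end, nr):        # loop 2: nr-wide slivers
                    for ir in range(ic, i_end, mr):    # loop 1: mr-tall slivers
                        # Micro-kernel: rank-1 updates on an mr x nr tile of C.
                        for p in range(pc, p_end):
                            for i in range(ir, min(ir + mr, i_end)):
                                aip = a[i][p]
                                for j in range(jr, min(jr + nr, j_end)):
                                    c[i][j] += aip * b[p][j]
    return c
```

In a real library the three outer block sizes would be chosen so that the corresponding blocks of A and B stay resident in particular cache levels, and the two inner ones would match the register file; the abstract's claim is that these choices can be made analytically rather than by search.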

Developmental Biology | 2016

TTG2 controls the developmental regulation of seed coat tannins in Arabidopsis by regulating vacuolar transport steps in the proanthocyanidin pathway.

Antonio Gonzalez; Matthew A. Brown; Greg Hatlestad; Neda Akhavan; Tyler M. Smith; Austin Hembd; Joshua Moore; David Montes; Trenell Mosley; Juan Resendez; Huy Nguyen; Lyndsey Wilson; Annabelle Campbell; Duncan Sudarshan; Alan Lloyd

The brown color of Arabidopsis seeds is caused by the deposition of proanthocyanidins (PAs or condensed tannins) in their inner testa layer. A transcription factor complex consisting of TT2, TT8 and TTG1 controls expression of PA biosynthetic genes, just as similar TTG1-dependent complexes have been shown to control flavonoid pigment pathway gene expression in general. However, PA synthesis is controlled by at least one other gene. TTG2 mutants lack the pigmentation found in wild-type seeds, but produce other flavonoid compounds, such as anthocyanins in the shoot, suggesting that TTG2 regulates genes in the PA biosynthetic branch of the flavonoid pathway. We analyzed the expression of PA biosynthetic genes within the developing seeds of ttg2-1 and wild-type plants for potential TTG2 regulatory targets. We found that expression of TT12, encoding a MATE type transporter, is dependent on TTG2 and that TTG2 can bind to the upstream regulatory region of TT12, suggesting that TTG2 directly regulates TT12. Ectopic expression of TT12 in ttg2-1 plants partially restores seed coat pigmentation. Moreover, we show that TTG2 regulation of TT12 is dependent on TTG1 and that TTG1 and TTG2 physically interact. The observation that TTG1 interacts with TTG2, a WRKY type transcription factor, suggests the existence of a novel TTG1-containing complex, and an addendum to the existing paradigm of flavonoid pathway regulation.

ACM Transactions on Mathematical Software | 2017

Implementing High-performance Complex Matrix Multiplication via the 3m and 4m Methods

Field G. Van Zee; Tyler M. Smith

In this article, we explore the implementation of complex matrix multiplication. We begin by briefly identifying various challenges associated with the conventional approach, which calls for a carefully written kernel that implements complex arithmetic at the lowest possible level (i.e., assembly language). We then set out to develop a method of complex matrix multiplication that avoids the need for complex kernels altogether. This constraint promotes code reuse and portability within libraries such as Basic Linear Algebra Subprograms and BLAS-Like Library Instantiation Software (BLIS) and allows kernel developers to focus their efforts on fewer and simpler kernels. We develop two alternative approaches—one based on the 3m method and one that reflects the classic 4m formulation—each with multiple variants, all of which rely only on real matrix multiplication kernels. We discuss the performance characteristics of these “induced” methods and observe that the assembly-level method actually resides along the 4m spectrum of algorithmic variants. Implementations are developed within the BLIS framework, and testing on modern hardware confirms that while the less numerically stable 3m method yields the fastest runtimes, the more stable (and thus widely applicable) 4m method’s performance is somewhat limited due to implementation challenges that appear inherent in nature.
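
The arithmetic behind the two approaches can be sketched as follows. This is the textbook formulation of the 4m and 3m methods on matrices split into real and imaginary parts, written on nested lists; the helper names are hypothetical, and the sketch does not reflect how the BLIS implementations are actually organized. `real_matmul` stands in for a tuned real-domain kernel.

```python
def real_matmul(a, b):
    """Real matrix product; placeholder for an optimized real kernel."""
    return [[sum(row[p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for row in a]


def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def gemm_4m(ar, ai, br, bi):
    """Four real products: Cr = Ar*Br - Ai*Bi, Ci = Ar*Bi + Ai*Br."""
    cr = mat_sub(real_matmul(ar, br), real_matmul(ai, bi))
    ci = mat_add(real_matmul(ar, bi), real_matmul(ai, br))
    return cr, ci


def gemm_3m(ar, ai, br, bi):
    """Three real products, at the cost of extra additions."""
    t1 = real_matmul(ar, br)
    t2 = real_matmul(ai, bi)
    t3 = real_matmul(mat_add(ar, ai), mat_add(br, bi))
    cr = mat_sub(t1, t2)                 # Ar*Br - Ai*Bi
    ci = mat_sub(mat_sub(t3, t1), t2)    # (Ar+Ai)(Br+Bi) - Ar*Br - Ai*Bi
    return cr, ci


def complex_reference(ar, ai, br, bi):
    """Same product computed with Python's built-in complex type."""
    a = [[complex(x, y) for x, y in zip(rr, ri)] for rr, ri in zip(ar, ai)]
    b = [[complex(x, y) for x, y in zip(rr, ri)] for rr, ri in zip(br, bi)]
    c = [[sum(row[p] * b[p][j] for p in range(len(b)))
          for j in range(len(b[0]))] for row in a]
    return [[z.real for z in r] for r in c], [[z.imag for z in r] for r in c]
```

The subtraction `t3 - t1 - t2` in the 3m variant is where cancellation can occur in floating point, which is consistent with the abstract describing 3m as the less numerically stable of the two.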

IEEE International Conference on High Performance Computing, Data, and Analytics | 2016

Strassen's Algorithm Reloaded

Jianyu Huang; Tyler M. Smith; Greg Henry; Robert A. van de Geijn

We dispel “street wisdom” regarding the practical implementation of Strassen's algorithm for matrix-matrix multiplication (DGEMM). Conventional wisdom: it is only practical for very large matrices. Our implementation is practical for small matrices. Conventional wisdom: the matrices being multiplied should be relatively square. Our implementation is practical for rank-k updates, where k is relatively small (a shape of importance for libraries like LAPACK). Conventional wisdom: it inherently requires substantial workspace. Our implementation requires no workspace beyond buffers already incorporated into conventional high-performance DGEMM implementations. Conventional wisdom: a Strassen DGEMM interface must pass in workspace. Our implementation requires no such workspace and can be plug-compatible with the standard DGEMM interface. Conventional wisdom: it is hard to demonstrate speedup on multi-core architectures. Our implementation demonstrates speedup over conventional DGEMM even on an Intel® Xeon Phi™ coprocessor utilizing 240 threads. We show how a distributed memory matrix-matrix multiplication also benefits from these advances.
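
For reference, one level of the classic Strassen recursion can be sketched as below. This is the textbook formulation with explicit temporaries on nested lists, assuming square inputs of even dimension; it is not the paper's implementation, which by the abstract's account avoids exactly the kind of extra workspace this sketch allocates. All function names are hypothetical.

```python
def matmul(a, b):
    """Conventional product, used for the seven half-size multiplications."""
    return [[sum(row[p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for row in a]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split(m):
    """Split an even-dimension square matrix into four quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def strassen_one_level(a, b):
    """One level of Strassen: 7 half-size products instead of 8."""
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    m1 = matmul(add(a11, a22), add(b11, b22))
    m2 = matmul(add(a21, a22), b11)
    m3 = matmul(a11, sub(b12, b22))
    m4 = matmul(a22, sub(b21, b11))
    m5 = matmul(add(a11, a12), b22)
    m6 = matmul(sub(a21, a11), add(b11, b12))
    m7 = matmul(sub(a12, a22), add(b21, b22))
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    # Reassemble the four quadrants into one matrix.
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom
```

The temporaries `m1` through `m7` and the operand sums are the "substantial workspace" of the conventional view; the abstract states that the authors' implementation needs none beyond the buffers a high-performance DGEMM already uses.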

International Parallel and Distributed Processing Symposium | 2014