Tyler M. Smith
University of Texas at Austin
Publications
Featured research published by Tyler M. Smith.
ACM Transactions on Mathematical Software | 2016
Field G. Van Zee; Tyler M. Smith; Bryan Marker; Tze Meng Low; Robert A. van de Geijn; Francisco D. Igual; Mikhail Smelyanskiy; Xianyi Zhang; Michael Kistler; Vernon Austel; John A. Gunnels; Lee Killough
BLIS is a new software framework for instantiating high-performance BLAS-like dense linear algebra libraries. We demonstrate how BLIS acts as a productivity multiplier by using it to implement the level-3 BLAS on a variety of current architectures. The systems for which we demonstrate the framework include state-of-the-art general-purpose, low-power, and many-core architectures. We show, with very little effort, how the BLIS framework yields sequential and parallel implementations that are competitive with the performance of ATLAS, OpenBLAS (an effort to maintain and extend the GotoBLAS), and commercial vendor implementations such as AMD’s ACML, IBM’s ESSL, and Intel’s MKL libraries. Although most of this article focuses on single-core implementation, we also provide compelling results that suggest the framework’s leverage extends to the multithreaded domain.
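The sketch below shows, in a minimal form, the loop structure that BLIS inherits from the GotoBLAS. It consists of five loops around a small micro-kernel, and A and B are packed into contiguous micro-panels before the inner loops run. The blocking parameters MC, KC, NC, MR and NR and all function names are hypothetical, and the values are deliberately tiny so that every loop runs several times. Real BLIS instantiations are written in C and assembly and use architecture-tuned values.

```python
# Hypothetical blocking parameters, far smaller than tuned values,
# chosen so that every loop runs several times on a small example.
MC, KC, NC = 4, 3, 6   # cache blocking for A (MC x KC) and B (KC x NC)
MR, NR = 2, 2          # register block computed by the micro-kernel


def pack_a(A, ic, pc, mc, kc):
    """Copy the mc x kc block of A at (ic, pc) into MR-row micro-panels.

    panel[p][i] holds A[ic + ir + i][pc + p]; rows past mc are zero-padded.
    """
    return [[[A[ic + ir + i][pc + p] if ir + i < mc else 0.0
              for i in range(MR)]
             for p in range(kc)]
            for ir in range(0, mc, MR)]


def pack_b(B, pc, jc, kc, nc):
    """Copy the kc x nc block of B at (pc, jc) into NR-column micro-panels."""
    return [[[B[pc + p][jc + jr + j] if jr + j < nc else 0.0
              for j in range(NR)]
             for p in range(kc)]
            for jr in range(0, nc, NR)]


def micro_kernel(kc, a_panel, b_panel, C, i0, j0, mr, nr):
    """C[i0:i0+mr, j0:j0+nr] += (MR x kc micro-panel) @ (kc x NR micro-panel)."""
    ab = [[0.0] * NR for _ in range(MR)]   # accumulators (the "registers")
    for p in range(kc):                    # one rank-1 update per p
        a_col, b_row = a_panel[p], b_panel[p]
        for i in range(MR):
            for j in range(NR):
                ab[i][j] += a_col[i] * b_row[j]
    for i in range(mr):                    # write back only the valid part
        for j in range(nr):
            C[i0 + i][j0 + j] += ab[i][j]


def gemm(A, B, C):
    """C += A @ B using five loops around the micro-kernel."""
    m, k, n = len(A), len(B), len(B[0])
    for jc in range(0, n, NC):                              # 5th loop
        nc = min(NC, n - jc)
        for pc in range(0, k, KC):                          # 4th loop
            kc = min(KC, k - pc)
            b_panels = pack_b(B, pc, jc, kc, nc)
            for ic in range(0, m, MC):                      # 3rd loop
                mc = min(MC, m - ic)
                a_panels = pack_a(A, ic, pc, mc, kc)
                for jb, jr in enumerate(range(0, nc, NR)):      # 2nd loop
                    for ib, ir in enumerate(range(0, mc, MR)):  # 1st loop
                        micro_kernel(kc, a_panels[ib], b_panels[jb], C,
                                     ic + ir, jc + jr,
                                     min(MR, mc - ir), min(NR, nc - jr))
```

In this sketch, only micro_kernel would need architecture-specific code. The packing routines and the outer loops are portable, which illustrates the kind of reuse the abstract credits for the framework's productivity gains.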
ACM Transactions on Mathematical Software | 2016
Tze Meng Low; Francisco D. Igual; Tyler M. Smith; Enrique S. Quintana-Ortí
We show how the BLAS-like Library Instantiation Software (BLIS) framework, which provides a more detailed layering of the GotoBLAS (now maintained as OpenBLAS) implementation, allows one to analytically determine tuning parameters for high-end instantiations of the matrix-matrix multiplication. This is of both practical and scientific importance, as it greatly reduces the development effort required for the implementation of the level-3 BLAS while also advancing our understanding of how hierarchically layered memories interact with high-performance software. This allows the community to move on from valuable engineering solutions (empirical autotuning) to scientific understanding (analytical insight).
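The following worked equation illustrates the kind of analytical constraint such a model yields. It assumes a simplified form of the register-blocking condition and should not be read as a quotation of the paper's full model. To keep every vector FMA unit busy despite its latency, the m_r × n_r register block must hold at least N_VEC · L_VFMA · N_VFMA independent accumulators. Here N_VEC is the number of elements per vector register, L_VFMA is the FMA latency in cycles, and N_VFMA is the number of FMA units. Double precision on Intel Haswell has 4 doubles per 256-bit register, a 5-cycle FMA latency, and 2 FMA units:

```latex
m_r\, n_r \;\ge\; N_{\mathrm{VEC}} \, L_{\mathrm{VFMA}} \, N_{\mathrm{VFMA}} \;=\; 4 \cdot 5 \cdot 2 \;=\; 40
```

For example, a 6 × 8 register block (48 accumulators) satisfies this bound, whereas a 4 × 8 block (32 accumulators) does not.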
Developmental Biology | 2016
Antonio Gonzalez; Matthew A. Brown; Greg Hatlestad; Neda Akhavan; Tyler M. Smith; Austin Hembd; Joshua Moore; David Montes; Trenell Mosley; Juan Resendez; Huy Nguyen; Lyndsey Wilson; Annabelle Campbell; Duncan Sudarshan; Alan Lloyd
The brown color of Arabidopsis seeds is caused by the deposition of proanthocyanidins (PAs or condensed tannins) in their inner testa layer. A transcription factor complex consisting of TT2, TT8 and TTG1 controls expression of PA biosynthetic genes, just as similar TTG1-dependent complexes have been shown to control flavonoid pigment pathway gene expression in general. However, PA synthesis is controlled by at least one other gene. TTG2 mutants lack the pigmentation found in wild-type seeds, but produce other flavonoid compounds, such as anthocyanins in the shoot, suggesting that TTG2 regulates genes in the PA biosynthetic branch of the flavonoid pathway. We analyzed the expression of PA biosynthetic genes within the developing seeds of ttg2-1 and wild-type plants for potential TTG2 regulatory targets. We found that expression of TT12, encoding a MATE-type transporter, is dependent on TTG2 and that TTG2 can bind to the upstream regulatory region of TT12, suggesting that TTG2 directly regulates TT12. Ectopic expression of TT12 in ttg2-1 plants partially restores seed coat pigmentation. Moreover, we show that TTG2 regulation of TT12 is dependent on TTG1 and that TTG1 and TTG2 physically interact. The observation that TTG1 interacts with TTG2, a WRKY-type transcription factor, suggests the existence of a novel TTG1-containing complex, which would extend the existing paradigm of flavonoid pathway regulation.
ACM Transactions on Mathematical Software | 2017
Field G. Van Zee; Tyler M. Smith
In this article, we explore the implementation of complex matrix multiplication. We begin by briefly identifying various challenges associated with the conventional approach, which calls for a carefully written kernel that implements complex arithmetic at the lowest possible level (i.e., assembly language). We then set out to develop a method of complex matrix multiplication that avoids the need for complex kernels altogether. This constraint promotes code reuse and portability within libraries such as Basic Linear Algebra Subprograms and BLAS-Like Library Instantiation Software (BLIS) and allows kernel developers to focus their efforts on fewer and simpler kernels. We develop two alternative approaches—one based on the 3m method and one that reflects the classic 4m formulation—each with multiple variants, all of which rely only on real matrix multiplication kernels. We discuss the performance characteristics of these “induced” methods and observe that the assembly-level method actually resides along the 4m spectrum of algorithmic variants. Implementations are developed within the BLIS framework, and testing on modern hardware confirms that while the less numerically stable 3m method yields the fastest runtimes, the more stable (and thus widely applicable) 4m method’s performance is somewhat limited due to implementation challenges that appear inherent in nature.
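The sketch below gives the simplest, unblocked form of the two induced approaches. Each one computes a complex product using only a real matrix product. Here matmul is a hypothetical stand-in for a real GEMM kernel. The 4m version uses four real products. The 3m version uses one common three-product formulation, which saves one product at the cost of extra additions and is the less numerically stable of the two. The article's variants apply these ideas at different levels of the BLIS algorithm, and this sketch does not attempt that.

```python
def matmul(X, Y):
    """Plain real matrix product; stands in for a real-only GEMM kernel."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def add_scaled(X, Y, beta=1):
    """Elementwise X + beta * Y."""
    return [[x + beta * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def cgemm_4m(Ar, Ai, Br, Bi):
    """(Ar + i*Ai)(Br + i*Bi) using four real products."""
    Cr = add_scaled(matmul(Ar, Br), matmul(Ai, Bi), -1)  # Ar Br - Ai Bi
    Ci = add_scaled(matmul(Ar, Bi), matmul(Ai, Br))      # Ar Bi + Ai Br
    return Cr, Ci


def cgemm_3m(Ar, Ai, Br, Bi):
    """The same product using three real products."""
    T1 = matmul(Ar, Br)
    T2 = matmul(Ai, Bi)
    T3 = matmul(add_scaled(Ar, Ai), add_scaled(Br, Bi))  # (Ar+Ai)(Br+Bi)
    Cr = add_scaled(T1, T2, -1)                          # T1 - T2
    Ci = add_scaled(add_scaled(T3, T1, -1), T2, -1)      # T3 - T1 - T2
    return Cr, Ci
```

Both functions touch only real data, so a library following this approach would need only real micro-kernels.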
IEEE International Conference on High Performance Computing, Data, and Analytics | 2016
Jianyu Huang; Tyler M. Smith; Greg Henry; Robert A. van de Geijn
We dispel “street wisdom” regarding the practical implementation of Strassen’s algorithm for matrix-matrix multiplication (DGEMM). Conventional wisdom: it is only practical for very large matrices. Our implementation is practical for small matrices. Conventional wisdom: the matrices being multiplied should be relatively square. Our implementation is practical for rank-k updates, where k is relatively small (a shape of importance for libraries like LAPACK). Conventional wisdom: it inherently requires substantial workspace. Our implementation requires no workspace beyond buffers already incorporated into conventional high-performance DGEMM implementations. Conventional wisdom: a Strassen DGEMM interface must pass in workspace. Our implementation requires no such workspace and can be plug-compatible with the standard DGEMM interface. Conventional wisdom: it is hard to demonstrate speedup on multi-core architectures. Our implementation demonstrates speedup over conventional DGEMM even on an Intel® Xeon Phi™ coprocessor utilizing 240 threads. We show how a distributed memory matrix-matrix multiplication also benefits from these advances.
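The sketch below shows one level of Strassen's algorithm in a minimal form. It assumes that m, k and n are all even and computes C += A B. Each of the seven products M = (X ± Y)(V ± W) is formed one element at a time and added directly into every quadrant of C that needs it, so no temporary matrices are allocated. This only loosely mirrors the abstract's idea of folding operand sums into the buffers that a conventional DGEMM already uses. The names STRASSEN_TERMS, fused_product and strassen_one_level are hypothetical. For brevity, the operand sums are recomputed inside the loops, whereas an efficient implementation would form them once, for example while packing.

```python
# (coefficient, block_row, block_col) triples; (0, 0) is the top-left quadrant.
# Entry t lists the A operands, B operands, and C destinations of product M_{t+1}.
STRASSEN_TERMS = [
    ([(1, 0, 0), (1, 1, 1)], [(1, 0, 0), (1, 1, 1)], [(1, 0, 0), (1, 1, 1)]),  # M1
    ([(1, 1, 0), (1, 1, 1)], [(1, 0, 0)], [(1, 1, 0), (-1, 1, 1)]),            # M2
    ([(1, 0, 0)], [(1, 0, 1), (-1, 1, 1)], [(1, 0, 1), (1, 1, 1)]),            # M3
    ([(1, 1, 1)], [(1, 1, 0), (-1, 0, 0)], [(1, 0, 0), (1, 1, 0)]),            # M4
    ([(1, 0, 0), (1, 0, 1)], [(1, 1, 1)], [(-1, 0, 0), (1, 0, 1)]),            # M5
    ([(1, 1, 0), (-1, 0, 0)], [(1, 0, 0), (1, 0, 1)], [(1, 1, 1)]),            # M6
    ([(1, 0, 1), (-1, 1, 1)], [(1, 1, 0), (1, 1, 1)], [(1, 0, 0)]),            # M7
]


def fused_product(A, B, C, a_terms, b_terms, c_terms, m2, k2, n2):
    """For M = (sum of A quadrants) @ (sum of B quadrants), add coef * M
    into each listed quadrant of C, one element of M at a time."""
    for i in range(m2):
        for j in range(n2):
            mij = 0
            for p in range(k2):
                a = sum(c * A[r * m2 + i][q * k2 + p] for c, r, q in a_terms)
                b = sum(c * B[r * k2 + p][q * n2 + j] for c, r, q in b_terms)
                mij += a * b
            for c, r, q in c_terms:
                C[r * m2 + i][q * n2 + j] += c * mij


def strassen_one_level(A, B, C):
    """C += A @ B using seven half-size products instead of eight."""
    m, k, n = len(A), len(B), len(B[0])
    if m % 2 or k % 2 or n % 2:
        raise ValueError("this sketch assumes m, k and n are all even")
    for a_terms, b_terms, c_terms in STRASSEN_TERMS:
        fused_product(A, B, C, a_terms, b_terms, c_terms, m // 2, k // 2, n // 2)
```

Because C is only ever updated in place, the routine has the same shape as an ordinary C += A B call, and it also accepts rank-k shapes such as a 6 × 2 times 2 × 4 product.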
International Parallel and Distributed Processing Symposium | 2014
Tyler M. Smith; Robert A. van de Geijn; Mikhail Smelyanskiy; Jeff R. Hammond; Field G. Van Zee
arXiv: Mathematical Software | 2016
Richard Veras; Tze Meng Low; Tyler M. Smith; Robert A. van de Geijn; Franz Franchetti
Archive | 2014
Tyler M. Smith; Robert A. van de Geijn; Mikhail Smelyanskiy; Jeff R. Hammond; Field G. Van Zee
arXiv: Computational Complexity | 2017
Tyler M. Smith; Robert A. van de Geijn
arXiv: Mathematical Software | 2016
Jianyu Huang; Tyler M. Smith; Greg Henry; Robert A. van de Geijn