Publications


Featured research published by Michael M. Wolf.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2010

Factors impacting performance of multithreaded sparse triangular solve

Michael M. Wolf; Michael A. Heroux; Erik G. Boman

As computational science applications grow more parallel with multi-core supercomputers having hundreds of thousands of computational cores, it will become increasingly difficult for solvers to scale. Our approach is to use hybrid MPI/threaded numerical algorithms to solve these systems in order to reduce the number of MPI tasks and increase the parallel efficiency of the algorithm. However, we need efficient threaded numerical kernels to run on the multi-core nodes in order to achieve good parallel efficiency. In this paper, we focus on improving the performance of a multithreaded triangular solver, an important kernel for preconditioning. We analyze three factors that affect the parallel performance of this threaded kernel and obtain good scalability on the multi-core nodes for a range of matrix sizes.
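
Parallelism in a sparse triangular solve is limited by the dependency structure of the matrix: a row cannot be eliminated until every row it depends on is finished. Below is a minimal Python/SciPy sketch of level scheduling, one common way to expose this row-level parallelism; it is illustrative only and not the multithreaded kernel studied in the paper.

```python
# A minimal sketch of level scheduling for a sparse lower-triangular solve,
# one common way to expose row-level parallelism; illustrative only, not the
# paper's multithreaded kernel.
import numpy as np
import scipy.sparse as sp

def level_schedule(L):
    """Group rows of a lower-triangular CSR matrix into levels: rows within
    a level do not depend on each other and can be solved concurrently."""
    L = L.tocsr()
    n = L.shape[0]
    level = np.zeros(n, dtype=int)
    for i in range(n):
        deps = L.indices[L.indptr[i]:L.indptr[i + 1]]
        deps = deps[deps < i]                     # strictly-lower dependencies
        if deps.size:
            level[i] = level[deps].max() + 1
    return [np.flatnonzero(level == l) for l in range(level.max() + 1)]

def solve_by_levels(L, b):
    """Solve L x = b one level at a time; the inner loop over `rows` has no
    dependencies and is where threads would be applied."""
    L = L.tocsr()
    x = np.zeros_like(b, dtype=float)
    for rows in level_schedule(L):
        for i in rows:                            # parallelizable loop
            cols = L.indices[L.indptr[i]:L.indptr[i + 1]]
            vals = L.data[L.indptr[i]:L.indptr[i + 1]]
            off = cols < i
            x[i] = (b[i] - vals[off] @ x[cols[off]]) / vals[~off][0]
    return x

# Example on a random unit-lower-triangular system.
A = sp.random(200, 200, density=0.02, format="csr", random_state=0)
L = (sp.tril(A, k=-1) + sp.eye(200)).tocsr()
b = np.ones(200)
print(np.allclose(L @ solve_by_levels(L, b), b))
```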


IEEE High Performance Extreme Computing Conference | 2015

A task-based linear algebra building blocks approach for scalable graph analytics

Michael M. Wolf; Jonathan W. Berry; Dylan T. Stark

It is challenging to obtain scalable HPC performance on real applications, especially for data science applications with irregular memory access and computation patterns. To drive co-design efforts in architecture, system, and application design, we are developing miniapps representative of data science workloads. These in turn stress the state of the art in GraphBLAS-like Graph Algorithm Building Blocks (GABB). In this work, we outline a GraphBLAS-like, linear algebra-based approach to miniTri, one such miniapp. We describe a task-based prototype implementation and give initial scalability results.
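
As a rough illustration of what a linear algebra-based formulation of a graph computation can look like, the sketch below enumerates triangles with sparse matrix operations (a masked product on the lower-triangular part of the adjacency matrix). This is a generic Python/SciPy toy with a made-up input, not the actual miniTri formulation or its task decomposition.

```python
# A toy sketch of posing triangle enumeration as sparse linear algebra, in the
# spirit of "building blocks" approaches; not the actual miniTri formulation.
import numpy as np
import scipy.sparse as sp

def triangles_linear_algebra(A):
    """Enumerate triangles of an undirected graph given its (symmetric, 0/1,
    zero-diagonal) adjacency matrix A."""
    L = sp.tril(A, k=-1).tocsr()          # keep each edge once, as (i, j) with i > j
    # Masked sparse product: (L @ L^T) restricted to the sparsity of L gives,
    # for each edge (i, j), the number of triangle-closing vertices k < j.
    counts = (L @ L.T).multiply(L).tocoo()
    tris = []
    for i, j in zip(counts.row, counts.col):
        # Recover the actual third vertices by intersecting neighbor lists.
        ni = L.indices[L.indptr[i]:L.indptr[i + 1]]
        nj = L.indices[L.indptr[j]:L.indptr[j + 1]]
        for k in np.intersect1d(ni, nj):
            tris.append((int(k), int(j), int(i)))
    return tris

# A 4-clique contains exactly four triangles.
A = sp.csr_matrix(np.ones((4, 4)) - np.eye(4))
print(sorted(triangles_linear_algebra(A)))
# [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
```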


Archive | 2004

X-band linear collider R&D in accelerating structures through advanced computing

Zenghai Li; Nathan Folwell; Lixin Ge; Adam Guetz; V. Ivanov; Marc Kowalski; Cho-Kuen Ng; Greg Schussman; Ravindra Uplenchwar; Michael M. Wolf; Kwok Ko

This paper describes a major computational effort that addresses key design issues in the high gradient accelerating structures for the proposed X-band linear collider, GLC/NLC. Supported by the US DOE’s Accelerator Simulation Project, SLAC is developing a suite of parallel electromagnetic codes based on unstructured grids for modeling RF structures with higher accuracy and on a scale previously not possible. The new simulation tools have played an important role in the R&D of X-Band accelerating structures, in cell design, wakefield analysis and dark current studies.


Journal of Physics: Conference Series | 2009

Advances in parallel partitioning, load balancing and matrix ordering for scientific computing

Erik G. Boman; Cédric Chevalier; Karen Dragon Devine; Ilya Safro; Michael M. Wolf

We summarize recent advances in partitioning, load balancing, and matrix ordering for scientific computing performed by members of the CSCAPES SciDAC institute.


IEEE High Performance Extreme Computing Conference | 2017

Fast linear algebra-based triangle counting with KokkosKernels

Michael M. Wolf; Mehmet Deveci; Jonathan W. Berry; Simon D. Hammond; Sivasankaran Rajamanickam

Triangle counting serves as a key building block for a set of important graph algorithms in network science. In this paper, we address the IEEE HPEC Static Graph Challenge problem of triangle counting, focusing on obtaining the best parallel performance on a single multicore node. Our implementation uses a linear algebra-based approach to triangle counting that has grown out of work related to our miniTri data analytics miniapplication [1] and our efforts to pose graph algorithms in the language of linear algebra. We leverage KokkosKernels to implement this approach efficiently on multicore architectures. Our performance results are competitive with the fastest known graph traversal-based approaches and are significantly faster than the Graph Challenge reference implementations, up to 670,000 times faster than the C++ reference and 10,000 times faster than the Python reference on a single Intel Haswell node.
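
The following sketch illustrates the general linear algebra-based counting idea, a masked sparse matrix product on the lower-triangular part of the adjacency matrix, and checks it against a simple traversal-based count. It is a Python/SciPy toy for exposition on a random input; the paper's implementation is a tuned KokkosKernels C++ kernel.

```python
# A minimal sketch of linear algebra-based triangle counting (masked sparse
# product) checked against a simple traversal-based count; illustrative only.
import numpy as np
import scipy.sparse as sp

def count_triangles_linalg(A):
    """Count triangles via a masked sparse product on the strictly
    lower-triangular part L of the adjacency matrix: sum((L @ L^T) .* L)."""
    L = sp.tril(A, k=-1).tocsr()
    return int((L @ L.T).multiply(L).sum())

def count_triangles_traversal(A):
    """Count triangles by intersecting the adjacency lists of each edge's
    endpoints, visiting each edge (j, i) with j < i once."""
    A = A.tocsr()
    total = 0
    for i in range(A.shape[0]):
        nbrs_i = A.indices[A.indptr[i]:A.indptr[i + 1]]
        for j in nbrs_i[nbrs_i < i]:
            nbrs_j = A.indices[A.indptr[j]:A.indptr[j + 1]]
            total += np.intersect1d(nbrs_i, nbrs_j).size
    return total // 3   # each triangle is found once per edge

# Random symmetric 0/1 adjacency matrix with no self-loops.
G = sp.random(500, 500, density=0.02, format="csr", random_state=1)
A = ((G + G.T) > 0).astype(int)
A.setdiag(0)
A.eliminate_zeros()
print(count_triangles_linalg(A), count_triangles_traversal(A))
```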


IEEE High Performance Extreme Computing Conference | 2013

A nested dissection partitioning method for parallel sparse matrix-vector multiplication

Erik G. Boman; Michael M. Wolf

We consider how to map sparse matrices across processes to reduce communication costs in parallel sparse matrix-vector multiplication, a ubiquitous kernel in high performance computing. Our main contributions are: (i) an exact graph model for communication with general (two-dimensional) matrix distribution, and (ii) a recursive partitioning algorithm based on nested dissection that approximately solves this model. We have implemented our algorithm using hypergraph partitioning software to enable a fair comparison with existing methods. We present partitioning results for sparse structurally symmetric matrices from several application areas. Our new method is competitive with the best 2D algorithm (fine-grain hypergraph model) in terms of communication volume, but requires fewer messages. The nested dissection method is almost as fast to compute as 1D methods and the communication volume is significantly reduced (up to 97%) compared to a 1D layout. Further improvements in quality may be possible by small modifications to existing nested dissection ordering software.
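
The quantity such partitioners try to reduce is the communication volume of the sparse matrix-vector product. The sketch below computes that volume for a given 1D row distribution (where the owner of row i also owns x[i] and y[i]); it is an illustrative metric calculation in Python/SciPy with made-up matrices and partitions, not the paper's 2D nested dissection partitioner.

```python
# A small sketch of the cost these partitioning methods try to reduce: the
# communication volume of y = A x under a 1D row distribution, where the
# owner of row i also owns x[i] and y[i]. Illustrative only.
import numpy as np
import scipy.sparse as sp

def spmv_comm_volume(A, part):
    """part[i] = process owning row i (and entries x[i], y[i]).
    Returns the number of x-entries that must be communicated."""
    A = A.tocoo()
    nparts = part.max() + 1
    needed = np.zeros((A.shape[1], nparts), dtype=bool)
    # The process owning row i needs x[j] for every nonzero A[i, j].
    needed[A.col, part[A.row]] = True
    volume = 0
    for j in range(A.shape[1]):
        requesters = np.flatnonzero(needed[j])
        volume += np.sum(requesters != part[j])   # owner sends to the rest
    return int(volume)

# Compare a contiguous block partition against a random one on a banded matrix.
n, p = 1024, 8
A = sp.diags([1, 1, 1, 1, 1], [-2, -1, 0, 1, 2], shape=(n, n), format="csr")
block = np.repeat(np.arange(p), n // p)
rng = np.random.default_rng(0)
random_part = rng.integers(0, p, size=n)
print(spmv_comm_volume(A, block), spmv_comm_volume(A, random_part))
```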


IEEE High Performance Extreme Computing Conference | 2016

Advantages to modeling relational data using hypergraphs versus graphs

Michael M. Wolf; Alicia M. Klinvex; Daniel M. Dunlavy

Driven by the importance of relational aspects of data to decision-making, graph algorithms have been developed, based on simplified pairwise relationships, to solve a variety of problems. However, evidence has shown that hypergraphs, generalizations of graphs whose edges (hyperedges) can connect any number of vertices, can better model complex, non-pairwise relationships in data and lead to better informed decisions. In this work, we compare graph and hypergraph models in the context of spectral clustering. For these problems, we demonstrate that hypergraphs are computationally more efficient and can better model complex, non-pairwise relationships for many datasets.
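
A small sketch of the modeling difference: a hypergraph can be stored as a vertex-by-hyperedge incidence matrix, while the common graph approximation (clique expansion) replaces each hyperedge with pairwise edges, losing the group relationship. The data below are hypothetical, and the comparison stops well short of the spectral clustering study in the paper.

```python
# A small sketch of hypergraph vs. graph modeling of relational data:
# incidence matrix H vs. its clique-expansion graph. Illustrative only.
import numpy as np
import scipy.sparse as sp

# Hypothetical relational data: each hyperedge groups the participants of one
# event; the third hyperedge relates four vertices at once.
hyperedges = [(0, 1, 2), (2, 3), (3, 4, 5, 6)]
n_vertices = 7

rows = [v for e in hyperedges for v in e]
cols = [j for j, e in enumerate(hyperedges) for _ in e]
H = sp.csr_matrix((np.ones(len(rows)), (rows, cols)),
                  shape=(n_vertices, len(hyperedges)))

# Clique expansion: A = H H^T with the diagonal removed. The 4-vertex
# hyperedge becomes 6 pairwise edges, and the fact that it was a single
# group relationship is lost.
A = (H @ H.T).tolil()
A.setdiag(0)
A = A.tocsr()
A.eliminate_zeros()

print("hyperedge pins (nonzeros of H):", H.nnz)
print("pairwise edges after expansion:", A.nnz // 2)
```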


IEEE High Performance Extreme Computing Conference | 2016

Kokkos/Qthreads task-parallel approach to linear algebra based graph analytics

Michael M. Wolf; H. Carter Edwards; Stephen L. Olivier

The GraphBLAS effort to standardize a set of graph algorithm building blocks in terms of linear algebra primitives promises to deliver high-performing graph algorithms and greatly impact the analysis of big data. However, there are challenges with this approach, which our data analytics miniapp miniTri exposes. In this paper, we improve upon a previously proposed task-parallel approach to the linear algebra-based miniTri formulation, addressing these challenges and describing a Kokkos/Qthreads task-parallel implementation that performs as well as or slightly better than the highly optimized, baseline OpenMP data-parallel implementation.
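
To illustrate the task-parallel decomposition idea in a self-contained way, the toy below splits a masked sparse product (the core of a linear algebra triangle computation) into independent row-block tasks, with Python futures standing in for Kokkos/Qthreads tasks. It is a sketch with assumed inputs, not the paper's implementation.

```python
# A toy stand-in for the task-parallel idea: split the masked sparse product
# into independent row-block tasks. Python futures stand in for Kokkos/Qthreads
# tasks; this illustrates only the decomposition, not the real tasking runtime.
from concurrent.futures import ThreadPoolExecutor
import numpy as np
import scipy.sparse as sp

def block_task(L, rows):
    """One task: triangles whose largest vertex lies in this row block."""
    Lb = L[rows, :]
    return int((Lb @ L.T).multiply(Lb).sum())

def count_triangles_tasked(A, n_tasks=4):
    """Sum the per-block partial counts produced by independent tasks."""
    L = sp.tril(A, k=-1).tocsr()
    blocks = np.array_split(np.arange(L.shape[0]), n_tasks)
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        futures = [pool.submit(block_task, L, rows) for rows in blocks]
        return sum(f.result() for f in futures)

G = sp.random(2000, 2000, density=0.005, format="csr", random_state=2)
A = ((G + G.T) > 0).astype(int)
print(count_triangles_tasked(A))
```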


arXiv: Distributed, Parallel, and Cluster Computing | 2018

Sparse Matrix-Matrix Multiplication on Multilevel Memory Architectures: Algorithms and Experiments

Mehmet Deveci; Simon D. Hammond; Michael M. Wolf; Sivasankaran Rajamanickam

Architectures with multiple classes of memory media are becoming a common part of mainstream supercomputer deployments. So-called multi-level memories offer differing characteristics for each memory component, including variation in bandwidth, latency, and capacity. This paper investigates the performance of sparse matrix-matrix multiplication kernels on two leading high-performance computing architectures: Intel's Knights Landing processor and NVIDIA's Pascal GPU. We describe a data placement method and a chunking-based algorithm for our kernels that exploit the existence of multiple memory spaces in each hardware platform. We evaluate the performance of these methods against standard algorithms that rely on the auto-caching mechanisms. Our results show that standard algorithms that exploit cache reuse perform as well as multi-memory-aware algorithms on architectures such as KNL, where the memory subsystems have similar latencies. However, on architectures such as GPUs, where memory subsystems differ significantly in both bandwidth and latency, multi-memory-aware methods are crucial for good performance. In addition, our new approaches permit the user to run problems that require larger capacities than the fastest memory of each compute node without depending on software-managed cache mechanisms.
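
The chunking idea can be sketched independently of any particular memory system: compute C = A * B one row chunk of A at a time, sizing each chunk against an assumed fast-memory budget. The Python/SciPy toy below shows only the chunking logic; the budget heuristic and sizes are made up, and real implementations additionally control placement across HBM/DDR or GPU/host memory, which this sketch cannot express.

```python
# A minimal sketch of chunked sparse matrix-matrix multiplication: when the
# full product will not fit in fast memory, compute C = A @ B in row chunks
# of A, each sized against an (assumed) fast-memory nonzero budget.
import numpy as np
import scipy.sparse as sp

def chunked_spgemm(A, B, fast_mem_nnz=50_000):
    """Multiply sparse A and B one row chunk of A at a time."""
    A = A.tocsr()
    chunks = []
    start = 0
    while start < A.shape[0]:
        end = start
        budget = 0
        # Grow the chunk until the (crudely) estimated output size exceeds
        # the budget; at least one row is always taken.
        while end < A.shape[0] and budget <= fast_mem_nnz:
            row_nnz = A.indptr[end + 1] - A.indptr[end]
            budget += row_nnz * max(1, B.nnz // B.shape[0])
            end += 1
        chunks.append(A[start:end, :] @ B)   # partial product: the piece that
        start = end                          # would be kept in fast memory
    return sp.vstack(chunks).tocsr()

A = sp.random(5000, 5000, density=0.002, format="csr", random_state=3)
B = sp.random(5000, 5000, density=0.002, format="csr", random_state=4)
C = chunked_spgemm(A, B)
print(abs(C - A @ B).max())   # 0.0: chunked result matches the direct product
```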


IEEE High Performance Extreme Computing Conference | 2015

Improving the performance of graph analysis through partitioning with sampling

Michael M. Wolf; Benjamin A. Miller

Numerous applications focus on the analysis of entities and the connections between them, and such data are naturally represented as graphs. In particular, the detection of a small subset of vertices with anomalous coordinated connectivity is of broad interest, for problems such as detecting strange traffic in a computer network or unknown communities in a social network. Eigenspace analysis of large-scale graphs is useful for dimensionality reduction of these large, noisy data sets into a more tractable analysis problem. When performing this sort of analysis across many parallel processes, the data partitioning scheme may have a significant impact on the overall running time. Previous work demonstrated that partitioning based on a sampled subset of edges still yields a substantial improvement in running time. In this work, we study this further, exploring how different sampling strategies, graph community structure, and the vertex degree distribution affect the partitioning quality. We show that sampling is an effective technique when partitioning for data analytics problems with community-like structure.
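
A toy version of the sampling idea is sketched below: keep each edge with some probability, partition the sampled graph (spectral bisection stands in for the production partitioners used in the paper), and evaluate the resulting edge cut on the full graph. The graph generator, sampling rate, and sizes are all made up for illustration.

```python
# A toy version of partitioning with sampling: bisect a sampled subgraph and
# measure the edge cut that the resulting partition induces on the full graph.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import laplacian

def planted_partition(n=400, p_in=0.1, p_out=0.01, seed=0):
    """Random graph with two planted communities (assumed test input)."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n // 2)
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return sp.csr_matrix((upper + upper.T).astype(float))

def sample_edges(A, keep=0.5, seed=1):
    """Keep each edge independently with probability `keep`."""
    rng = np.random.default_rng(seed)
    U = sp.triu(A, k=1).tocoo()
    mask = rng.random(U.nnz) < keep
    S = sp.csr_matrix((U.data[mask], (U.row[mask], U.col[mask])), shape=A.shape)
    return S + S.T

def spectral_bisect(A):
    """Split vertices by the sign of the Fiedler vector of the graph Laplacian
    (dense eigensolve, fine at this toy size)."""
    _, vecs = np.linalg.eigh(laplacian(A).toarray())
    return (vecs[:, 1] > 0).astype(int)

def edge_cut(A, part):
    U = sp.triu(A, k=1).tocoo()
    return int(np.sum(part[U.row] != part[U.col]))

A = planted_partition()
print("cut, partition from full graph:   ", edge_cut(A, spectral_bisect(A)))
print("cut, partition from sampled graph:", edge_cut(A, spectral_bisect(sample_edges(A))))
```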

Collaboration


Dive into Michael M. Wolf's collaborations.

Top Co-Authors

Erik G. Boman, Sandia National Laboratories
Karen Dragon Devine, Sandia National Laboratories
Cédric Chevalier, Sandia National Laboratories
Daniel M. Dunlavy, Sandia National Laboratories
Jonathan W. Berry, Sandia National Laboratories
Alicia M. Klinvex, Sandia National Laboratories