Michael Voss
Intel
Publication
Featured research published by Michael Voss.
international parallel and distributed processing symposium | 2008
Arch D. Robison; Michael Voss; Alexey Kukanov
Intel® Threading Building Blocks (Intel® TBB) is a C++ library for parallel programming. Its templates for generic parallel loops are built upon nested parallelism and a work-stealing scheduler. This paper discusses optimizations where the high-level algorithm inspects or biases stealing. Two optimizations are discussed in detail. The first dynamically optimizes grain size based on observed stealing. The second improves prior work that exploits cache locality by biased stealing. This paper shows that in a task stealing environment, deferring task spawning can improve performance in some contexts. Performance results for simple kernels are presented.
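As a rough illustration of what "grain size" means for a recursively split loop, here is a minimal serial sketch in standard C++. It does not use TBB and is not the paper's optimization: `split_sum` and its `leaves` counter are hypothetical names, and the grain size is fixed rather than adapted to observed stealing. It only shows how a range is halved until each chunk is small enough to run as one unit of work.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of TBB: sums data[begin, end) by recursive
// halving, stopping once a subrange holds at most `grain` elements.
// `leaves` counts how many leaf chunks were processed.
long long split_sum(const std::vector<int>& data, std::size_t begin,
                    std::size_t end, std::size_t grain, std::size_t& leaves) {
    assert(begin <= end && end <= data.size());
    if (grain == 0) grain = 1;  // avoid endless splitting of one-element ranges
    if (end - begin <= grain) {
        ++leaves;  // a task-based runtime could run this chunk as a single task
        return std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
    }
    std::size_t mid = begin + (end - begin) / 2;
    // In a work-stealing runtime, one half could be taken by an idle thread.
    return split_sum(data, begin, mid, grain, leaves) +
           split_sum(data, mid, end, grain, leaves);
}
```

A larger grain size yields fewer, coarser chunks (less scheduling overhead, less available parallelism); a smaller one yields the opposite. The abstract's first optimization concerns choosing this trade-off dynamically.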
international workshop on openmp | 2001
Jay Hoeflinger; Bob Kuhn; Wolfgang E. Nagel; Paul M. Petersen; Hrabri Rajic; Sanjiv Shah; Jeffrey S. Vetter; Michael Voss; Renee Woo
As cluster computing has grown, so has its use for large scientific calculations. Recently, many researchers have experimented with using MPI between nodes of a clustered machine and OpenMP within a node to manage the use of parallel processing. Unfortunately, very few tools are available for doing an integrated analysis of an MPI/OpenMP program. KAI Software, Pallas GmbH and the US Department of Energy have partnered to build such a tool, VGV. VGV is designed for doing scalable performance analysis - that is, to make the performance analysis process qualitatively the same for small cluster machines as it is for the largest ASCI systems. This paper describes VGV and gives a flavor of how to find performance problems using it.
international symposium on performance analysis of systems and software | 2010
Alexei Alexandrov; Douglas R. Armstrong; Hrabri Rajic; Michael Voss; Donald Hayes
Performing modeling and visualization of task-based parallel algorithms is challenging. Libraries such as Intel Threading Building Blocks (TBB) and Microsoft's Parallel Patterns Library provide high-level algorithms that are implemented using low-level tasks. Current tools present performance at this lower level. Developers like to tune and debug at the same level as the coding abstraction, so in this paper we propose tools and a two-step methodology that target this level of abstraction. In the first step, the system-level metrics of utilization and overhead are collected to determine if performance is acceptable. If a problem is suspected, the second step of our methodology projects these metrics onto the algorithms contained in the application. Using these projections, many common performance issues can be quickly diagnosed. We demonstrate our methodology using a prototype implementation that is integrated with the Intel Threading Building Blocks library. We show the flexibility of the approach by analyzing three applications, including a client-server benchmark that uses a parallel_for nested within a parallel pipeline.
international workshop on openmp | 2018
Vishakha Agrawal; Michael Voss; Pablo Reble; Vasanth R. Tovinkere; Jeff J. Hammond; Michael Klemm
With the introduction of task dependences, the OpenMP API considerably extended the expressiveness of its task-based parallel programming model. With task dependences, programmers no longer have to rely on global synchronization mechanisms like task barriers. Instead they can locally synchronize a restricted subset of generated tasks by expressing an execution order through the depend clause. With the OpenMP tools interface of Technical Report 6 of the OpenMP API specification, it becomes possible to monitor task creation and execution along with the corresponding dependence information of these tasks. We use this information to construct a Task Dependence Graph (TDG) for the Flow Graph Analyzer (FGA) tool of Intel® Advisor. The TDG representation is used in FGA for deriving metrics and performance prediction and analysis of task-based OpenMP codes. We apply the FGA tool to two sample application kernels and expose issues in their usage of OpenMP tasks.
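The local ordering that the depend clause expresses can be sketched with three small tasks. This is a minimal illustration, not code from the paper: the function name `run_dependent_tasks` and the variables are hypothetical. The tasks form a small dependence graph (the first task writes `a`, the second reads `a` and writes `b`, the third reads both), and no task barrier appears between them.

```cpp
#include <cassert>

// Hypothetical example: three tasks ordered only through depend clauses.
int run_dependent_tasks() {
    int a = 0, b = 0, c = 0;
    #pragma omp parallel
    #pragma omp single
    {
        // Producer of a.
        #pragma omp task depend(out: a) shared(a)
        a = 1;

        // Runs only after the task that writes a has finished.
        #pragma omp task depend(in: a) depend(out: b) shared(a, b)
        b = a + 1;

        // Runs only after both earlier tasks have finished.
        #pragma omp task depend(in: a, b) depend(out: c) shared(a, b, c)
        c = a + b;
    }
    // The implicit barrier that ends the single/parallel region guarantees
    // all three tasks are complete before c is read here.
    return c;
}
```

Compiled without OpenMP support, the pragmas are ignored and the statements run in program order, which satisfies the same dependences. A tool that observes task creation and the dependence information, as the abstract describes, would see an edge from the first task to the second and edges from both to the third.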
Archive | 2001
Matthias S. Mueller; Barbara M. Chapman; Bronis R. de Supinski; Allen D. Malony; Michael Voss
Archive | 2009
Wooyoung Kim; Michael Voss
MS | 2011
Wooyoung Kim; Michael Voss
Archive | 2017
James Reinders; Michael Voss; Pablo Reble; Rafael Asenjo-Plaza
Archive | 2014
Michael Voss; Vasanth R. Tovinkere; Jaime Arteaga; Sergey Vinogradov