Publication


Featured research published by Simon J. Pennycook.


Proceedings of the Third International Workshop on Accelerator Programming Using Directives | 2016

A modern memory management system for OpenMP

Jason Sewall; Simon J. Pennycook; Alejandro Duran; Xinmin Tian; R. Narayanaswamy

Modern computers with multi-/many-core processors and accelerators feature a sophisticated and deep memory hierarchy, potentially including distinct main memory, high-bandwidth memory, texture memory and scratchpad memory. The performance characteristics of these memories are varied, and studies have demonstrated the importance of using them effectively. In this paper, we propose an extension of the OpenMP API to address the needs of programmers to efficiently optimize their applications to use new memory technologies in a platform-agnostic and portable fashion. Our proposal separately exposes the characteristics of memory resources (such as kind) and the characteristics of allocations (such as alignment), and is fully compatible with existing OpenMP constructs.


Future Generation Computer Systems | 2017

Implications of a metric for performance portability

Simon J. Pennycook; Jason Sewall; V.W. Lee

The term “performance portability” has been informally used in computing to refer to a variety of notions which generally include: (1) the ability to run one application across multiple hardware platforms; and (2) achieving some notional level of performance on these platforms. However, there has been a noticeable lack of consensus on the precise meaning of the term, and authors’ conclusions regarding their success (or failure) to achieve performance portability have thus far been subjective. Comparing one approach to performance portability with another has generally been marked with vague claims and verbose, qualitative explanation of the comparison. This article presents a concise definition for performance portability and an associated metric that accurately capture the performance and portability of an application across different platforms. Through retroactive application of this metric to previous research and a review of numerous programming languages, frameworks and libraries, we devise and suggest tractable approaches to code specialization which can aid the community in developing highly performance-portable applications with minimal impact to productivity.
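The abstract does not spell the metric out. As a rough illustration, the sketch below assumes the commonly cited form of the published definition: the harmonic mean of an application's performance efficiency over a set of platforms, and zero if any platform in the set is unsupported. The function name, the dictionary input, and the example efficiency values are hypothetical and are not taken from the article.

```python
from typing import Mapping, Optional


def performance_portability(efficiencies: Mapping[str, Optional[float]]) -> float:
    """Harmonic mean of per-platform efficiencies (each in (0, 1]).

    Assumption: a platform the application cannot run on is recorded as
    None (or a non-positive efficiency), which forces the score to 0.0.
    """
    if not efficiencies:
        raise ValueError("the platform set must not be empty")

    reciprocal_sum = 0.0
    for platform, eff in efficiencies.items():
        if eff is None or eff <= 0.0:
            # Unsupported on at least one platform: not portable across this set.
            return 0.0
        if eff > 1.0:
            raise ValueError(f"efficiency for {platform!r} must not exceed 1.0")
        reciprocal_sum += 1.0 / eff

    return len(efficiencies) / reciprocal_sum


# Hypothetical efficiencies (fraction of best-known performance per platform).
score = performance_portability({"cpu": 0.5, "gpu": 1.0})
unsupported = performance_portability({"cpu": 0.9, "gpu": None})
```

Because the harmonic mean is dominated by its smallest term, a single poorly performing platform pulls the score down sharply, which is the behaviour one would want from a metric that rewards consistent performance across platforms.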


International Workshop on OpenMP | 2016

Workstealing and Nested Parallelism in SMP Systems

Larry Meadows; Simon J. Pennycook; Alex Duran; Terry Wilmarth; Jim Cownie

We present a workstealing scheduler and show its use in two separate areas: (1) to enable hierarchical parallelism and per-core load balancing in stencil codes, and (2) to reduce overhead in per-thread load balancing in particle codes.


High Performance Parallelism Pearls, Volume 2: Multicore and Many-core Programming Approaches | 2016

Cosmic Microwave Background Analysis: Nested Parallelism in Practice

James P. Briggs; Simon J. Pennycook; James R. Fergusson; Juha Jäykkä; E. P. S. Shellard

This chapter discusses the steps taken to optimize and modernize Modal, a cosmological statistical analysis code for studying the formation of the early universe developed by theoretical physicists at the University of Cambridge. In order to achieve higher levels of performance and to reduce the memory footprint, the optimization work included introducing nested parallelism. The chapter explores the different nested parallelism approaches available in OpenMP, discussing the strengths and weaknesses of each and their increasing relevance to current and future many-core microarchitectures.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2017

IXPUG: Experiences on Intel Knights Landing at the One Year Mark

Estela Suarez; Michael Lysaght; Simon J. Pennycook; Richard A. Gerber

One year on since the launch of the 2nd generation Knights Landing (KNL) Intel Xeon Phi platform, a significant amount of application experience has been gathered by the user community. This provided IXPUG (the Intel Xeon Phi User Group) a timely opportunity to share insights on how to best exploit this new many-core processor, and in particular, on how to achieve high performance on current and upcoming large-scale KNL-based systems.


arXiv: Performance | 2016

A Metric for Performance Portability

Simon J. Pennycook; Jason Sewall; Victor W. Lee


Journal of Computational Physics | 2016

Separable projection integrals for higher-order correlators of the cosmic microwave sky

James P. Briggs; Simon J. Pennycook; James R. Fergusson; Juha Jäykkä; E. P. S. Shellard


arXiv: Cosmology and Nongalactic Astrophysics | 2018

CosmoFlow: Using Deep Learning to Learn the Universe at Scale

Amrita Mathuriya; Deborah Bard; Peter Mendygral; Lawrence Meadows; James Arnemann; Lei Shao; Siyu He; Tuomas Karna; Daina Moise; Simon J. Pennycook; Kristyn J. Maschhoff; Jason Sewall; Nalini Kumar; Shirley Ho; Michael F. Ringenburg; Prabhat; Victor W. Lee


arXiv: Performance | 2017

High-Performance Code Generation though Fusion and Vectorization

Jason Sewall; Simon J. Pennycook


Archive | 2015

Cosmic Microwave Background Analysis

James P. Briggs; Simon J. Pennycook; James R. Fergusson; Juha Jäykkä; E. P. S. Shellard
