Publication


Featured research published by James C. Sexton.


Physical Review Letters | 1995

Numerical evidence for the observation of a scalar glueball.

James C. Sexton; A. Vaccarino; D. Weingarten

We compute from lattice QCD in the valence (quenched) approximation the partial decay widths of the lightest scalar glueball to pairs of pseudoscalar quark-antiquark states. These predictions and values obtained earlier for the scalar glueball's mass are in good agreement with the observed properties of f_J(1710) and inconsistent with all other observed meson resonances. © 1995 The American Physical Society.


IBM Journal of Research and Development | 2005

Optimizing task layout on the Blue Gene/L supercomputer

Gyan Bhanot; Alan Gara; Philip Heidelberger; Eoin M. Lawless; James C. Sexton; Robert Walkup

A general method for optimizing problem layout on the Blue Gene®/L (BG/L) supercomputer is described. The method takes as input the communication matrix of an arbitrary problem as an array with entries C(i, j), which represents the data communicated from domain i to domain j. Given C(i, j), we implement a heuristic map that attempts to sequentially map a domain and its communication neighbors either to the same BG/L node or to near-neighbor nodes on the BG/L torus, while keeping the number of domains mapped to a BG/L node constant. We then generate a Markov chain of maps using Monte Carlo simulation with free energy F = Σ_{i,j} C(i, j) H(i, j), where H(i, j) is the smallest number of hops on the BG/L torus between domain i and domain j. For two large parallel applications, SAGE and UMT2000, the method was tested against the default Message Passing Interface rank order layout on up to 2,048 BG/L nodes. It produced maps that improved communication efficiency by up to 45%.
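
The Monte Carlo stage of this procedure is simple enough to sketch. The Python fragment below is an illustrative reconstruction, not code from the paper: it assumes one domain per node, proposes random swaps of two domains, and accepts or rejects each swap with the Metropolis rule applied to the free energy F = Σ_{i,j} C(i, j) H(i, j), with H computed as the hop distance on a periodic torus. The function names, annealing schedule, and the small ring-pattern example at the end are arbitrary choices for demonstration.

    import math
    import random

    def torus_hops(a, b, dims):
        """Smallest number of hops between torus coordinates a and b on a periodic torus."""
        return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

    def free_energy(mapping, C, coords, dims):
        """F = sum_{i,j} C(i, j) * H(i, j), where H is the torus hop distance."""
        n = len(C)
        return sum(C[i][j] * torus_hops(coords[mapping[i]], coords[mapping[j]], dims)
                   for i in range(n) for j in range(n) if C[i][j])

    def optimize_map(C, coords, dims, steps=20000, temp=1.0, cooling=0.9995, seed=0):
        """Anneal a domain-to-node map by Metropolis-accepted swaps of two domains."""
        rng = random.Random(seed)
        mapping = list(range(len(C)))      # start from the default rank-order map
        f = free_energy(mapping, C, coords, dims)
        for _ in range(steps):
            i, j = rng.sample(range(len(mapping)), 2)
            mapping[i], mapping[j] = mapping[j], mapping[i]
            f_new = free_energy(mapping, C, coords, dims)
            if f_new <= f or rng.random() < math.exp((f - f_new) / temp):
                f = f_new                  # accept the proposed swap
            else:
                mapping[i], mapping[j] = mapping[j], mapping[i]  # reject: undo it
            temp *= cooling
        return mapping, f

    # Tiny example: 8 domains with a ring communication pattern on a 2x2x2 torus.
    dims = (2, 2, 2)
    coords = [(x, y, z) for x in range(2) for y in range(2) for z in range(2)]
    n = len(coords)
    C = [[1 if j == (i + 1) % n else 0 for j in range(n)] for i in range(n)]
    print(optimize_map(C, coords, dims, steps=2000))

A real implementation would update F incrementally for each proposed swap rather than recomputing the full double sum; the quadratic recomputation here only keeps the sketch short.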


Conference on High Performance Computing (Supercomputing) | 2006

Large-scale electronic structure calculations of high-Z metals on the BlueGene/L platform

Francois Gygi; Erik W. Draeger; Martin Schulz; Bronis R. de Supinski; John A. Gunnels; Vernon Austel; James C. Sexton; Franz Franchetti; Stefan Kral; Christoph W. Ueberhuber; Juergen Lorenz

First-principles simulations of high-Z metallic systems using the Qbox code on the BlueGene/L supercomputer demonstrate unprecedented performance and scaling for a quantum simulation code. Specifically designed to take advantage of massively-parallel systems like BlueGene/L, Qbox demonstrates excellent parallel efficiency and peak performance. A sustained peak performance of 207.3 TFlop/s was measured on 65,536 nodes, corresponding to 56.5% of the theoretical full machine peak using all 128k CPUs.
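
As a rough consistency check on the quoted fraction of peak: assuming the widely cited BlueGene/L figure of 2.8 GFlop/s of peak per CPU (our assumption, not a number given in the abstract), 131,072 CPUs give roughly 367 TFlop/s of theoretical peak, and 207.3/367 ≈ 0.565, in line with the stated 56.5%.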


Conference on High Performance Computing (Supercomputing) | 2005

Large-Scale First-Principles Molecular Dynamics simulations on the BlueGene/L Platform using the Qbox code

Francois Gygi; Robert Kim Yates; Juergen Lorenz; Erik W. Draeger; Franz Franchetti; Christoph W. Ueberhuber; Bronis R. de Supinski; Stefan Kral; John A. Gunnels; James C. Sexton

We demonstrate that the Qbox code supports unprecedented large-scale First-Principles Molecular Dynamics (FPMD) applications on the BlueGene/L supercomputer. Qbox is an FPMD implementation specifically designed for large-scale parallel platforms such as BlueGene/L. Strong scaling tests for a Materials Science application show an 86% scaling efficiency between 1024 and 32,768 CPUs. Measurements of performance by means of hardware counters show that 36% of the peak FPU performance can be attained.
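
Read against the conventional definition of strong-scaling efficiency for a fixed problem size, E = T(p0)·p0 / (T(p1)·p1) (our framing, not necessarily the exact metric used in the paper), an 86% efficiency between 1,024 and 32,768 CPUs means the run on 32x more CPUs was roughly 0.86 × 32 ≈ 27.5x faster.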


Physical Review Letters | 1993

Hadron mass predictions of the valence approximation to lattice QCD

F. Butler; H. Chen; James C. Sexton; A. Vaccarino; Don Weingarten

We evaluate the infinite-volume, continuum limits of eight hadron mass ratios predicted by lattice QCD with Wilson quarks in the valence (quenched) approximation. Each predicted ratio differs from the corresponding observed value by less than 6%.


Presented at SciDAC 2006, Denver, CO, United States, Jun 25-29, 2006 | 2006

Simulating solidification in metals at high pressure: The drive to petascale computing

Frederick H. Streitz; James N. Glosli; Mehul Patel; Bor Chan; Robert Kim Yates; Bronis R. de Supinski; James C. Sexton; John A. Gunnels

We investigate solidification in metal systems ranging in size from 64,000 to 524,288,000 atoms on the IBM BlueGene/L computer at LLNL. Using the newly developed ddcMD code, we achieve performance rates as high as 103 TFlop/s, with 101.7 TFlop/s sustained over a 7-hour run on 131,072 CPUs. We demonstrate superb strong and weak scaling. Our calculations are significant as they represent the first atomic-scale model of metal solidification to proceed, without finite-size effects, from spontaneous nucleation and growth of solid out of the liquid, through the coalescence phase, and into the onset of coarsening. Thus, our simulations represent the first step towards an atomistic model of nucleation and growth that can directly link atomistic to mesoscopic length scales.


International Symposium on Performance Analysis of Systems and Software | 2008

Next-Generation Performance Counters: Towards Monitoring Over Thousand Concurrent Events

Valentina Salapura; Karthik Ganesan; Alan Gara; Michael Karl Gschwind; James C. Sexton; Robert Walkup

We present a novel performance monitor architecture, implemented in the Blue Gene/P supercomputer. This performance monitor supports the tracking of a large number of concurrent events by using a hybrid counter architecture. The counters have their low-order data implemented in registers, which are concurrently updated, while the high-order counter data is maintained in a dense SRAM array that is updated from the registers on a regular basis. The performance monitoring architecture includes support for per-event thresholding and fast event notification, using a two-phase interrupt-arming and triggering protocol. A first implementation provides 256 concurrent 64-bit counters, which offers up to a 64x increase in the number of counters compared to performance monitors typically found in microprocessors today and thereby dramatically expands the capabilities of counter-based performance tuning.
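
The hybrid counter idea can be illustrated with a short sketch. The Python below is illustrative only; the class name, register width, and spill policy are our assumptions, not the Blue Gene/P implementation. It keeps a narrow, frequently updated "register" portion per event and periodically folds it into a wide backing array standing in for the SRAM, raising a notification when an armed threshold is crossed.

    class HybridCounters:
        """Sketch of a hybrid counter bank: narrow low-order registers updated on
        every event, periodically spilled into wide high-order counters."""

        def __init__(self, num_events, reg_bits=12, on_threshold=None):
            self.reg_max = 1 << reg_bits      # register overflows at 2**reg_bits
            self.regs = [0] * num_events      # low-order part, written on every event
            self.sram = [0] * num_events      # high-order part, updated on spill
            self.thresholds = {}              # event -> armed threshold value
            self.on_threshold = on_threshold  # callback(event, value) on crossing

        def count(self, event, increment=1):
            """Fast path: bump the small register; spill early if it would overflow."""
            self.regs[event] += increment
            if self.regs[event] >= self.reg_max:
                self._spill(event)

        def arm_threshold(self, event, value):
            """Phase 1 of the two-phase protocol: arm a threshold for an event."""
            self.thresholds[event] = value

        def spill_all(self):
            """Periodic maintenance: fold all registers into the wide counters."""
            for event in range(len(self.regs)):
                self._spill(event)

        def read(self, event):
            """Full counter value: wide part plus the pending register part."""
            return self.sram[event] + self.regs[event]

        def _spill(self, event):
            self.sram[event] += self.regs[event]
            self.regs[event] = 0
            armed = self.thresholds.get(event)
            if armed is not None and self.sram[event] >= armed and self.on_threshold:
                del self.thresholds[event]    # phase 2: trigger once, then disarm
                self.on_threshold(event, self.sram[event])

A caller would arm a threshold on the event of interest, drive count() from the instrumentation fast path, and invoke spill_all() on a timer, mirroring the register-to-SRAM update described above.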


Parallel Computing | 2006

Minimal data copy for dense linear algebra factorization

Fred G. Gustavson; John A. Gunnels; James C. Sexton

The full-format data structures of dense linear algebra hurt the performance of its factorization algorithms. Full-format rectangular matrices are the input and output of the Level 3 BLAS. It follows that the LAPACK and Level 3 BLAS approach has a basic performance flaw. We describe a new result showing that representing a matrix A as a collection of square blocks reduces the amount of data reformatting required by dense linear algebra factorization algorithms from O(n^3) to O(n^2). On an IBM Power3 processor our implementation of Cholesky factorization achieves 92% of peak performance, whereas conventional full-format LAPACK DPOTRF achieves 77% of peak performance. All programming for our new data structures may be accomplished in standard Fortran, through the use of higher-dimensional full-format arrays. Thus, new compiler support may not be necessary. We also discuss the role of concatenating submatrices to facilitate hardware streaming. Finally, we discuss a new concept which we call the L1/L0 cache interface.
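
To make the square-block idea concrete, here is a minimal Python sketch (our illustration, with hypothetical function names; the paper's actual Fortran data structures are not reproduced here) that copies a full row-major matrix into nb x nb square blocks once, in a single O(n^2) pass, after which a blocked factorization can address each tile contiguously instead of re-copying panels inside the O(n^3) factorization loop.

    def to_square_blocks(A, nb):
        """Copy a full (row-major, list-of-lists) n x n matrix into nb x nb blocks.

        Returns a dict keyed by block coordinates (bi, bj); each value is a
        contiguous nb x nb tile, zero-padded at the ragged edges."""
        n = len(A)
        nblk = (n + nb - 1) // nb
        blocks = {}
        for bi in range(nblk):
            for bj in range(nblk):
                tile = [[0.0] * nb for _ in range(nb)]
                for i in range(nb):
                    for j in range(nb):
                        gi, gj = bi * nb + i, bj * nb + j
                        if gi < n and gj < n:
                            tile[i][j] = A[gi][gj]
                blocks[(bi, bj)] = tile
        return blocks

    def from_square_blocks(blocks, n, nb):
        """Inverse copy: reassemble the full row-major matrix from its blocks."""
        A = [[0.0] * n for _ in range(n)]
        for (bi, bj), tile in blocks.items():
            for i in range(nb):
                for j in range(nb):
                    gi, gj = bi * nb + i, bj * nb + j
                    if gi < n and gj < n:
                        A[gi][gj] = tile[i][j]
        return A

Because every tile is stored contiguously, a blocked Cholesky or LU can hand individual tiles straight to Level 3 BLAS kernels without further reformatting, which is the effect the O(n^3)-to-O(n^2) claim refers to.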


Conference on High Performance Computing (Supercomputing) | 2006

The BlueGene/L supercomputer and quantum ChromoDynamics

Pavlos M. Vranas; Gyan Bhanot; Matthias A. Blumrich; Dong Chen; Alan Gara; Philip Heidelberger; Valentina Salapura; James C. Sexton

We describe our methods for performing quantum chromodynamics (QCD) simulations that sustain up to 20% of peak performance on BlueGene supercomputers. We present our methods, scaling properties, and first cutting-edge results relevant to QCD. We show how this enables unprecedented computational scale that brings lattice QCD to the next generation of calculations. We present our QCD simulation that achieved 12.2 Teraflops sustained performance with perfect speedup to 32K CPU cores. Among other things, these calculations are critical for cosmology, for the heavy-ion experiments at RHIC-BNL, and for the upcoming experiments at CERN-Geneva. Furthermore, we demonstrate how QCD dramatically exposes memory and network latencies inherent in any computer system and propose that QCD should be used as a new, powerful HPC benchmark. Our sustained performance demonstrates the excellent properties of the BlueGene/L system.


Archive | 2014

Programming Abstractions for Data Locality

Adrian Tate; Amir Kamil; Anshu Dubey; Armin Groblinger; Brad Chamberlain; Brice Goglin; Harold C. Edwards; Chris J. Newburn; David Padua; Didem Unat; Emmanuel Jeannot; Frank Hannig; Tobias Gysi; Hatem Ltaief; James C. Sexton; Jesús Labarta; John Shalf; Karl Fuerlinger; Kathryn O'Brien; Leonidas Linardakis; Maciej Besta; Marie-Christine Sawley; Mark James Abraham; Mauro Bianco; Miquel Pericàs; Naoya Maruyama; Paul H. J. Kelly; Peter Messmer; Robert B. Ross; Romain Ciedat

The goal of the workshop and this report is to identify common themes and standardize concepts for locality-preserving abstractions for exascale programming models. Current software tools are built on the premise that computing is the most expensive component, but we are rapidly moving to an era in which computing is cheap and massively parallel while data movement dominates energy and performance costs. In order to respond to exascale systems (the next generation of high performance computing systems), the scientific computing community needs to refactor its applications to align with the emerging data-centric paradigm. Our applications must be evolved to express information about data locality. Unfortunately, current programming environments offer few ways to do so. They ignore the incurred cost of communication and simply rely on hardware cache coherency to virtualize data movement. With the increasing importance of task-level parallelism on future systems, task models have to support constructs that express data locality and affinity. At the system level, communication libraries implicitly assume that all processing elements are equidistant from one another. In order to take advantage of emerging technologies, application developers need a set of programming abstractions to describe data locality for the new computing ecosystem. The new programming paradigm should be more data-centric and should allow developers to describe how to decompose and how to lay out data in memory.

Fortunately, there are many emerging concepts, such as constructs for tiling, data layout, array views, task and thread affinity, and topology-aware communication libraries, for managing data locality. There is an opportunity to identify commonalities in strategy, enabling us to combine the best of these concepts to develop a comprehensive approach to expressing and managing data locality on exascale programming systems. These programming model abstractions can expose crucial information about data locality to the compiler and runtime system to enable performance-portable code. The research question is to identify the right level of abstraction, which includes techniques that range from template libraries all the way to completely new languages, to achieve this goal.
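
As one small illustration of the kind of locality-expressing construct the report surveys, the Python sketch below is purely illustrative and does not correspond to any specific language or library discussed in the report: it pairs a tiled decomposition of a 2-D index space with an explicit placement map, so that both the layout and the affinity of each tile are visible to whatever schedules the work.

    from dataclasses import dataclass

    @dataclass
    class Tile:
        """A rectangular view into a global 2-D index space, pinned to a locale."""
        row0: int
        col0: int
        rows: int
        cols: int
        locale: int      # e.g. a NUMA domain, node, or device id

    def tile_decomposition(nrows, ncols, tile_rows, tile_cols, num_locales):
        """Decompose an nrows x ncols index space into tiles and assign locales round-robin.

        Keeping the decomposition and the placement together is the point: a runtime
        can schedule a task on tile.locale, next to the data it touches."""
        tiles = []
        k = 0
        for r in range(0, nrows, tile_rows):
            for c in range(0, ncols, tile_cols):
                tiles.append(Tile(r, c,
                                  min(tile_rows, nrows - r),
                                  min(tile_cols, ncols - c),
                                  locale=k % num_locales))
                k += 1
        return tiles

    # Example: a 1024 x 1024 array in 256 x 256 tiles spread across 4 locales.
    for t in tile_decomposition(1024, 1024, 256, 256, num_locales=4)[:4]:
        print(t)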

Collaboration


Dive into James C. Sexton's collaborations.

Top Co-Authors

Bronis R. de Supinski

Lawrence Livermore National Laboratory

Robert Kim Yates

Lawrence Livermore National Laboratory
