Christopher Lamb | Researchain

Publication

Featured research published by Christopher Lamb.

International Conference on Parallel Processing | 2011

Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs

Allen D. Malony; Scott Biersdorff; Sameer Shende; Heike Jagode; Stanimire Tomov; Guido Juckeland; Robert Dietrich; Duncan Poole; Christopher Lamb

The power of GPUs is giving rise to heterogeneous parallel computing, with new demands on programming environments, runtime systems, and tools to deliver high-performing applications. This paper studies the problems associated with performance measurement of heterogeneous machines with GPUs. A heterogeneous computation model and alternative host-GPU measurement approaches are discussed to set the stage for reporting new capabilities for heterogeneous parallel performance measurement in three leading HPC tools: PAPI, Vampir, and the TAU Performance System. Our work leverages the new CUPTI tool support in NVIDIA's CUDA device library. Heterogeneous benchmarks from the SHOC suite are used to demonstrate the measurement methods and tool support.

High Performance Interconnects | 2015

UCX: An Open Source Framework for HPC Network APIs and Beyond

Pavel Shamis; Manjunath Gorentla Venkata; M. Graham Lopez; Matthew B. Baker; Oscar R. Hernandez; Yossi Itigin; Mike Dubman; Gilad Shainer; Richard L. Graham; Liran Liss; Yiftah Shahar; Sreeram Potluri; Davide Rossetti; Donald Becker; Duncan Poole; Christopher Lamb; Sameer Kumar; Craig B. Stunkel; George Bosilca; Aurelien Bouteiller

This paper presents Unified Communication X (UCX), a set of network APIs and their implementations for high throughput computing. UCX comes from the combined effort of national laboratories, industry, and academia to design and implement a high-performing and highly scalable network stack for next-generation applications and systems. The UCX design makes it possible to tailor its APIs and network functionality to a wide variety of application domains and hardware. We envision these APIs satisfying the networking needs of many programming models, such as the Message Passing Interface (MPI), OpenSHMEM, Partitioned Global Address Space (PGAS) languages, task-based paradigms, and I/O-bound applications. To evaluate the design, we implement the APIs and protocols and measure the performance of overhead-critical network primitives that are fundamental to implementing many parallel programming models and system libraries. Our results show that the latency, bandwidth, and message rate achieved by the portable UCX prototype are very close to those of the underlying driver. With UCX, we achieved a message exchange latency of 0.89 µs, a bandwidth of 6138.5 MB/s, and a message rate of 14 million messages per second. As far as we know, this is the highest bandwidth and message rate publicly reported for any network stack on this hardware.

Archive | 2007

Message queuing system for parallel integrated circuit architecture and related method of operation

Monier Maher; Jean Pierre Bordes; Christopher Lamb; Sanjay J. Patel

Archive | 2017

Software-Assisted Instruction Level Execution Preemption

Philip Alexander Cuadra; Christopher Lamb; Lacky V. Shah

Archive | 2007

External memory accessing DMA request scheduling in IC of parallel processing engines according to completion notification queue occupancy level

Monier Maher; Jean Pierre Bordes; Christopher Lamb; Sanjay J. Patel

Archive | 2011

Instruction level execution preemption

Lacky V. Shah; Gregory Scott Palmer; Gernot Schaufler; Samuel H. Duncan; Philip Browning Johnson; Shirish Gadre; Robert Ohannessian; Nicholas Wang; Christopher Lamb; Philip Alexander Cuadra; Timothy John Purcell

Archive | 2008