
Publication


Featured research published by Roger A. Pearce.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2010

Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory

Roger A. Pearce; Maya Gokhale; Nancy M. Amato

Processing large graphs is becoming increasingly important for many domains, such as social networks and bioinformatics. Unfortunately, many algorithms and implementations do not scale with increasing graph sizes. As a result, researchers have attempted to meet the growing data demands using parallel and external memory techniques. We present a novel asynchronous approach to compute Breadth-First Search (BFS), Single-Source Shortest Paths, and Connected Components for large graphs in shared memory. Our highly parallel asynchronous approach hides data latency due to both poor locality and delays in the underlying graph data storage. We present an experimental study applying our technique to both In-Memory and Semi-External Memory graphs utilizing multi-core processors and solid-state memory devices. Our experiments using synthetic and real-world datasets show that our asynchronous approach is able to overcome data latencies and provide significant speedup over alternative approaches. For example, on billion-vertex graphs our asynchronous BFS achieves up to 14x speedup on 16 cores.
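The asynchronous approach the abstract describes is, at its core, a label-correcting traversal: a vertex's level may be set provisionally and corrected later, which removes the per-level synchronization barrier of textbook BFS. A minimal single-threaded sketch of that idea (the function name and example graph are illustrative, not from the paper):

```python
from collections import deque

def label_correcting_bfs(adj, source):
    """Label-correcting BFS: vertices may be relaxed out of order and
    re-enqueued, trading some redundant work for the freedom to process
    any ready vertex -- the key idea behind asynchronous traversal."""
    level = {v: float("inf") for v in adj}
    level[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if level[u] + 1 < level[v]:   # correct v's label if a shorter path appeared
                level[v] = level[u] + 1
                queue.append(v)           # v may be enqueued more than once
    return level

adj = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
print(label_correcting_bfs(adj, 0))  # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```

In a parallel setting each worker can pull from the queue independently; correctness follows because levels only ever decrease.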


IEEE Computer | 2008

Hardware Technologies for High-Performance Data-Intensive Computing

Maya Gokhale; Jonathan D. Cohen; Andy Yoo; William Marcus Miller; Arpith C. Jacob; Craig D. Ulmer; Roger A. Pearce

Data-intensive problems challenge conventional computing architectures with demanding CPU, memory, and I/O requirements. Experiments with three benchmarks suggest that emerging hardware technologies can significantly boost performance of a wide range of applications by increasing compute cycles and bandwidth and reducing latency.


International Conference on Robotics and Automation (ICRA) | 2003

Extracting optimal paths from roadmaps for motion planning

Jinsuck Kim; Roger A. Pearce; Nancy M. Amato

We present methods for extracting optimal paths from motion planning roadmaps. Our system enables any combination of optimization criteria, such as collision detection, kinematic/dynamic constraints, or minimum clearance, and relaxed definitions of the goal state, to be used when selecting paths from roadmaps. Our algorithm is an augmented version of Dijkstra's shortest-path algorithm, which allows edge weights to be defined relative to the current path. We present simulation results maximizing minimum path clearance, minimizing localization effort, and enforcing kinematic/dynamic constraints.
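One of the criteria mentioned, maximizing minimum path clearance, illustrates what "edge weights defined relative to the current path" means: the cost of extending a path is the minimum of the clearance accumulated so far and the new edge's clearance. A hedged single-criterion sketch in that spirit (not the paper's implementation; names and the example roadmap are illustrative):

```python
import heapq

def widest_path(adj, source, goal):
    """Dijkstra-style search where a path's 'length' is the minimum
    clearance along it, and we maximize that minimum -- so an edge's
    effective weight depends on the path taken to reach it."""
    # adj: {u: [(v, edge_clearance), ...]}
    best = {source: float("inf")}
    heap = [(-best[source], source)]        # max-heap via negation
    while heap:
        neg_c, u = heapq.heappop(heap)
        c = -neg_c
        if u == goal:
            return c
        if c < best.get(u, -float("inf")):
            continue                        # stale heap entry
        for v, w in adj.get(u, []):
            cand = min(c, w)                # path clearance = min over its edges
            if cand > best.get(v, -float("inf")):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    return None

adj = {"s": [("a", 5), ("b", 2)], "a": [("g", 3)], "b": [("g", 9)]}
print(widest_path(adj, "s", "g"))  # 3 (route s-a-g: clearances 5 then 3)
```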


WAFR | 2008

RESAMPL: A Region-Sensitive Adaptive Motion Planner

Samuel Rodriguez; Shawna L. Thomas; Roger A. Pearce; Nancy M. Amato

Automatic motion planning has applications ranging from traditional robotics to computer-aided design to computational biology and chemistry. While randomized planners, such as probabilistic roadmap methods (PRMs) or rapidly-exploring random trees (RRTs), have been highly successful in solving many high-degree-of-freedom problems, there are still many scenarios in which we need better methods, e.g., problems involving narrow passages or containing multiple regions best suited to different planners.


International Parallel and Distributed Processing Symposium (IPDPS) | 2013

Scaling Techniques for Massive Scale-Free Graphs in Distributed (External) Memory

Roger A. Pearce; Maya Gokhale; Nancy M. Amato

We present techniques to process large scale-free graphs in distributed memory. Our aim is to scale to trillions of edges, and our research is targeted at leadership class supercomputers and clusters with local non-volatile memory, e.g., NAND Flash. We apply an edge list partitioning technique, designed to accommodate high-degree vertices (hubs) that create scaling challenges when processing scale-free graphs. In addition to partitioning hubs, we use ghost vertices to represent the hubs to reduce communication hotspots. We present a scaling study with three important graph algorithms: Breadth-First Search (BFS), K-Core decomposition, and Triangle Counting. We also demonstrate scalability on BG/P Intrepid by comparing to best known Graph500 results [1]. We show results on two clusters with local NVRAM storage that are capable of traversing trillion-edge scale-free graphs. By leveraging node-local NAND Flash, our approach can process thirty-two times larger datasets with only a 39% performance degradation in Traversed Edges Per Second (TEPS).


IEEE International Conference on High Performance Computing, Data, and Analytics | 2014

Faster parallel traversal of scale free graphs at extreme scale with vertex delegates

Roger A. Pearce; Maya Gokhale; Nancy M. Amato

At extreme scale, irregularities in the structure of scale-free graphs such as social network graphs limit our ability to analyze these important and growing datasets. A key challenge is the presence of high-degree vertices (hubs), that leads to parallel workload and storage imbalances. The imbalances occur because existing partitioning techniques are not able to effectively partition high-degree vertices. We present techniques to distribute storage, computation, and communication of hubs for extreme scale graphs in distributed memory supercomputers. To balance the hub processing workload, we distribute hub data structures and related computation among a set of delegates. The delegates coordinate using highly optimized, yet portable, asynchronous broadcast and reduction operations. We demonstrate scalability of our new algorithmic technique using Breadth-First Search (BFS), Single Source Shortest Path (SSSP), K-Core Decomposition, and Page-Rank on synthetically generated scale-free graphs. Our results show excellent scalability on large scale-free graphs up to 131K cores of the IBM BG/P, and outperform the best known Graph500 performance on BG/P Intrepid by 15%.
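The delegate idea can be caricatured in a few lines: pick a degree threshold, treat vertices above it as hubs, and spread hub edges across partitions instead of sending them all to one owner. This is an illustrative serial sketch of that partitioning step, not the paper's distributed implementation (names and the threshold are assumptions):

```python
from collections import Counter

def partition_with_delegates(edges, num_parts, hub_threshold):
    """Assign low-degree vertices' edges to a single hashed owner, but
    spread a hub's edges across partitions (where delegate copies of the
    hub would live) so no one partition absorbs an entire hub."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    hubs = {v for v, d in degree.items() if d >= hub_threshold}
    parts = [[] for _ in range(num_parts)]
    for u, v in edges:
        if u in hubs and v not in hubs:
            anchor = v            # place hub edges by the non-hub endpoint
        else:
            anchor = u
        parts[hash(anchor) % num_parts].append((u, v))
    return parts, hubs

# A star graph: vertex 0 is a hub, its edges spread over all partitions.
edges = [(0, i) for i in range(1, 11)]
parts, hubs = partition_with_delegates(edges, 4, 5)
print(hubs, [len(p) for p in parts])
```

In the real system, delegates additionally coordinate via asynchronous broadcast/reduction so updates to a hub reach every copy.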


International Parallel and Distributed Processing Symposium (IPDPS) | 2012

On the Role of NVRAM in Data-intensive Architectures: An Evaluation

Brian Van Essen; Roger A. Pearce; Sasha Ames; Maya Gokhale

Data-intensive applications are best suited to high-performance computing architectures that contain large quantities of main memory. Creating these systems with DRAM-based main memory remains costly and power-intensive. Due to improvements in density and cost, non-volatile random access memories (NVRAM) have emerged as compelling storage technologies to augment traditional DRAM. This work explores the potential of future NVRAM technologies to store program state at performance comparable to DRAM. We have developed the PerMA NVRAM simulator that allows us to explore applications with working sets ranging up to hundreds of gigabytes per node. The simulator is implemented as a Linux device driver that allows application execution at native speeds. Using the simulator we show the impact of future technology generations of I/O-bus-attached NVRAM on an unstructured-access, level-asynchronous, Breadth-First Search (BFS) graph traversal algorithm. Our simulations show that within a couple of technology generations, a system architecture with local high performance NVRAM will be able to effectively augment DRAM to support highly concurrent data-intensive applications with large memory footprints. However, improvements will be needed in the I/O stack to deliver this performance to applications. The simulator shows that future technology generations of NVRAM in conjunction with an improved I/O runtime will enable parallel data-intensive applications to offload in-memory data structures to NVRAM with minimal performance loss.


ACM Multimedia | 2014

The Placing Task: A Large-Scale Geo-Estimation Challenge for Social-Media Videos and Images

Jaeyoung Choi; Bart Thomee; Gerald Friedland; Liangliang Cao; Karl Ni; Damian Borth; Benjamin Elizalde; Luke R. Gottlieb; Carmen J. Carrano; Roger A. Pearce; Douglas N. Poland

The Placing Task is a yearly challenge offered by the MediaEval Multimedia Benchmarking Initiative that requires participants to develop algorithms that automatically predict the geo-location of social media videos and images. We introduce a recent development of a new standardized web-scale geo-tagged dataset for Placing Task 2014, which contains 5.5 million photos and 35,000 videos. This standardized benchmark with a large persistent dataset allows the research community to easily evaluate new algorithms and to analyze their performance with respect to the state-of-the-art approaches. We discuss the characteristics of this year's Placing Task along with the description of the new dataset components and how they were collected.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2011

A scalable eigensolver for large scale-free graphs using 2D graph partitioning

Andy Yoo; Allison H. Baker; Roger A. Pearce; Van Emden Henson

Eigensolvers are important tools for analyzing and mining useful information from scale-free graphs. Such graphs are used in many applications and can be extremely large. Unfortunately, existing parallel eigensolvers do not scale well for these graphs due to the high communication overhead in the parallel matrix-vector multiplication (MatVec). We develop a MatVec algorithm based on 2D edge partitioning that significantly reduces the communication costs and embed it into a popular eigensolver library. We demonstrate that the enhanced eigensolver can attain two orders of magnitude performance improvement compared to the original on a state-of-the-art massively parallel machine. We illustrate the performance of the embedded MatVec by computing eigenvalues of a scale-free graph with 300 million vertices and 5 billion edges, the largest scale-free graph analyzed by any in-memory parallel eigensolver, to the best of our knowledge.
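The communication saving of 2D partitioning comes from the fact that block (i, j) of the matrix needs only the j-th slice of the vector, and its partial result is reduced along block row i, so each process exchanges data with O(pr + pc) peers rather than all of them. A serial sketch of that structure (illustrative only; it uses a small dense matrix for brevity and assumes n divides evenly by the grid dimensions):

```python
def spmv_2d(A, x, pr, pc):
    """Matrix-vector multiply over a simulated pr-by-pc process grid.
    Each (i, j) block multiplies against the j-th slice of x; partial
    results are accumulated (the 'reduction') into row block i of y."""
    n = len(A)
    rb, cb = n // pr, n // pc           # block row / column sizes
    y = [0.0] * n
    for i in range(pr):                 # block row: owns a slice of y
        for j in range(pc):             # block column: owns a slice of x
            for r in range(i * rb, (i + 1) * rb):
                y[r] += sum(A[r][c] * x[c]
                            for c in range(j * cb, (j + 1) * cb))
    return y

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
x = [1.0, 1.0, 1.0, 1.0]
print(spmv_2d(A, x, 2, 2))  # [10.0, 26.0, 42.0, 58.0], same as a plain MatVec
```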


Cluster Computing | 2015

DI-MMAP: a scalable memory-map runtime for out-of-core data-intensive applications

Brian Van Essen; Henry Hsieh; Sasha Ames; Roger A. Pearce; Maya Gokhale

We present DI-MMAP, a high-performance runtime that memory-maps large external data sets into an application’s address space and shows significantly better performance than the Linux mmap system call. Our implementation is particularly effective when used with high performance locally attached Flash arrays on highly concurrent, latency-tolerant data-intensive HPC applications. We describe the kernel module and show performance results on a benchmark test suite, a new bioinformatics metagenomic classification application, and on a level-asynchronous Breadth-First Search (BFS) graph traversal algorithm. Using DI-MMAP, the metagenomics classification application performs up to 4× better than standard Linux mmap. A fully external memory configuration of BFS executes up to 7.44× faster than traditional mmap. Finally, we demonstrate that DI-MMAP shows scalable out-of-core performance for BFS traversal in main memory constrained scenarios. Such scalable memory constrained performance would allow a system with a fixed amount of memory to solve a larger problem as well as provide memory QoS guarantees for systems running multiple data-intensive applications.
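DI-MMAP itself is a Linux kernel module, but the interface it accelerates can be illustrated with Python's standard mmap module: a file is mapped into the address space and its pages are faulted in on demand, rather than read eagerly into memory. This is the stock mmap path that DI-MMAP improves on, not DI-MMAP itself; file names here are illustrative.

```python
import mmap
import os
import struct
import tempfile

# Write an array of 64-bit integers to disk (standing in for an
# out-of-core data structure such as a BFS level array).
path = os.path.join(tempfile.mkdtemp(), "levels.bin")
values = list(range(1000))
with open(path, "wb") as f:
    f.write(struct.pack(f"<{len(values)}q", *values))

# Memory-map the file and sum it; pages are brought in on demand by the
# OS instead of the whole file being read up front.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    n = len(mm) // 8
    total = sum(struct.unpack_from("<q", mm, i * 8)[0] for i in range(n))
    mm.close()

print(total)  # 499500
```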

Collaboration


Dive into Roger A. Pearce's collaborations.

Top Co-Authors

Maya Gokhale (Lawrence Livermore National Laboratory)
Brian Van Essen (Lawrence Livermore National Laboratory)
Marco Morales (Instituto Tecnológico Autónomo de México)
Sasha Ames (Lawrence Livermore National Laboratory)
Keita Iwabuchi (Tokyo Institute of Technology)
Geoffrey Sanders (Lawrence Livermore National Laboratory)
Karl Ni (Lawrence Livermore National Laboratory)