Chad Scherrer
Pacific Northwest National Laboratory
Publications
Featured research published by Chad Scherrer.
Computing Frontiers | 2007
Jarek Nieplocha; Andres Marquez; John Feo; Daniel G. Chavarría-Miranda; George Chin; Chad Scherrer; Nathaniel Beagley
The resurgence of current and upcoming multithreaded architectures and programming models led us to conduct a detailed study to understand the potential of these platforms to increase the performance of data-intensive, irregular scientific applications. Our study is based on a power system state estimation application and a novel anomaly detection application applied to network traffic data. We also conducted a detailed evaluation of the platforms using microbenchmarks in order to gain insight into their architectural capabilities and their interaction with programming models and application software. The evaluation was performed on the Cray MTA-2 and the Sun Niagara.
International Parallel and Distributed Processing Symposium | 2008
Daniel G. Chavarría-Miranda; Andres Marquez; Jaroslaw Nieplocha; Kristyn J. Maschhoff; Chad Scherrer
This paper describes our early experiences with a pre-production Cray XMT system that implements a scalable shared memory architecture with hardware support for multithreading. Unlike its predecessor, the Cray MTA-2, which had very limited I/O capability, the Cray XMT offers Lustre, a scalable high-performance parallel filesystem. Therefore, it enables development of out-of-core applications that can deal with very large data sets that otherwise would not fit in the system's main memory. Our application performs statistically-based anomaly detection for categorical data that can be used for analysis of Internet traffic data. Experimental results indicate that the pre-production version of the machine is able to achieve good performance and scalability for the in-core and out-of-core versions of the application.
IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2010
Eric Goodman; David J. Haglin; Chad Scherrer; Daniel G. Chavarría-Miranda; Jace A. Mogill; John Feo
Two of the most commonly used hashing strategies, linear probing and hashing with chaining, are adapted for efficient execution on a Cray XMT. These strategies are designed to minimize memory contention. Datasets that follow a power-law distribution pose significant performance challenges for shared-memory parallel hashing implementations. Experimental results show good scalability up to 128 processors on two power-law datasets with different data types: integer and string. These implementations can be used in a wide range of applications.
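As a rough illustration of the two strategies named above, here is a minimal, serial Python sketch of linear probing and chaining used as counting hash tables. The class names and table sizes are hypothetical, and the paper's Cray XMT implementations add fine-grained synchronization to limit memory contention, which is not modeled here.

    class LinearProbingTable:
        """Open addressing: on a collision, probe the next slot."""
        def __init__(self, capacity=1024):
            self.slots = [None] * capacity       # each slot holds (key, count) or None
            self.capacity = capacity

        def increment(self, key):
            i = hash(key) % self.capacity
            while self.slots[i] is not None and self.slots[i][0] != key:
                i = (i + 1) % self.capacity      # linear probe (no resizing in this sketch)
            count = 0 if self.slots[i] is None else self.slots[i][1]
            self.slots[i] = (key, count + 1)

    class ChainingTable:
        """Each bucket holds a list ("chain") of [key, count] entries."""
        def __init__(self, capacity=1024):
            self.buckets = [[] for _ in range(capacity)]

        def increment(self, key):
            chain = self.buckets[hash(key) % len(self.buckets)]
            for entry in chain:
                if entry[0] == key:
                    entry[1] += 1
                    return
            chain.append([key, 1])

Under a power-law key distribution, a few hot keys dominate, so on a shared-memory machine updates concentrate on a few table locations; that contention is what the paper's XMT-specific adaptations are designed to avoid.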
Visualization for Computer Security | 2008
William A. Pike; Chad Scherrer; Sean J. Zabriskie
To effectively identify and respond to cyber threats, computer security analysts must understand the scale, motivation, methods, source, and target of an attack. Central to developing this situational awareness is the analyst’s world knowledge that puts these attributes in context. What known exploits or new vulnerabilities might an anomalous traffic pattern suggest? What organizational, social, or geopolitical events help forecast or explain attacks and anomalies? Few visualization tools support creating, maintaining, and applying this knowledge of the threat landscape. Through a series of formative workshops with practicing security analysts, we have developed a visualization approach inspired by the human process of contextualization; this system, called NUANCE, creates evolving behavioral models of network actors at organizational and regional levels, continuously monitors external textual information sources for themes that indicate security threats, and automatically determines if behavior indicative of those threats is present on a network.
International Parallel and Distributed Processing Symposium | 2007
Chad Scherrer; Nathaniel Beagley; Jarek Nieplocha; Andres Marquez; John Feo; Daniel G. Chavarría-Miranda
The problem of counting specified combinations of a given set of variables arises in many statistical and data mining applications. To solve this problem, we introduce the PDtree data structure, which avoids exponential time and space complexity associated with prior work by allowing user specification of the tree structure. A straightforward parallelization approach using a Cray MTA-2 provides a speedup that is linear in the number of processors, but introduces nondeterminism into probability estimates. We prove a general convergence result that bounds the non-deterministic deviation of probability estimates relative to a sequential implementation. Beyond PDtrees, this convergence result applies to any counting application that takes a multithreaded streaming approach.
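The PDtree's internal structure is not spelled out in this abstract; the hypothetical Python sketch below only illustrates the underlying counting problem, tallying user-specified combinations of variables in a single streaming pass rather than all possible combinations.

    from collections import Counter

    def count_combinations(records, combos):
        """records: iterable of dicts mapping variable name -> value.
        combos: the variable combinations to count, each a tuple of names."""
        counts = {combo: Counter() for combo in combos}
        for rec in records:                      # one streaming pass over the data
            for combo in combos:
                key = tuple(rec[v] for v in combo)
                counts[combo][key] += 1
        return counts

    # Toy example: count a marginal and a pairwise combination.
    data = [{"src": "a", "dst": "x"}, {"src": "a", "dst": "y"}, {"src": "b", "dst": "x"}]
    print(count_combinations(data, [("src",), ("src", "dst")]))

In the multithreaded streaming setting the paper analyzes, concurrent increments can interleave differently from run to run; that is the source of the nondeterminism in probability estimates that the convergence result bounds.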
International Conference on e-Science | 2009
Nathaniel Beagley; Chad Scherrer; Yan Shi; Brian H. Clowers; William F. Danielson; Anuj R. Shah
The massive data sets produced by the high-throughput, multidimensional mass spectrometry instruments used in proteomics create challenges in data acquisition, storage and analysis. Data compression can help mitigate some of these problems but at the cost of less efficient data access, which directly impacts the computational time of data analysis. We have developed a compression methodology that 1) is optimized for a targeted mass spectrometry proteomics data set and 2) provides the benefits of size and speed from compression while increasing analysis efficiency by allowing extraction of segments of uncompressed data from a file without having to uncompress the entire file. This paper describes our compression algorithm, presents comparative metrics of compression size and speed, and explores approaches for applying the algorithm to a generalized data set.
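The paper's format is tailored to mass spectrometry proteomics data, but the segment-level access it describes can be sketched generically as block-wise compression with an offset index, so that one block can be decompressed without inflating the rest of the file. The block size, codec, and function names below are illustrative assumptions, not the paper's algorithm.

    import zlib

    BLOCK = 64 * 1024                            # arbitrary block size for the sketch

    def compress_blocks(data: bytes):
        blocks, index, offset = [], [], 0
        for i in range(0, len(data), BLOCK):
            comp = zlib.compress(data[i:i + BLOCK])
            blocks.append(comp)
            index.append(offset)                 # byte offset where this block starts
            offset += len(comp)
        return b"".join(blocks), index

    def read_block(compressed: bytes, index, n):
        start = index[n]
        end = index[n + 1] if n + 1 < len(index) else len(compressed)
        return zlib.decompress(compressed[start:end])   # inflate only the requested block

    payload = bytes(range(256)) * 1000
    comp, idx = compress_blocks(payload)
    assert read_block(comp, idx, 1) == payload[BLOCK:2 * BLOCK]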
Advances in Computers | 2010
Anuj R. Shah; Joshua N. Adkins; Douglas J. Baxter; William R. Cannon; Daniel G. Chavarría-Miranda; Sutanay Choudhury; Ian Gorton; Deborah K. Gracio; Todd D. Halter; Navdeep Jaitly; John R. Johnson; Richard T. Kouzes; Matthew C. Macduff; Andres Marquez; Matthew E. Monroe; Christopher S. Oehmen; William A. Pike; Chad Scherrer; Oreste Villa; Bobbie-Jo M. Webb-Robertson; Paul D. Whitney; Nino Zuljevic
The total quantity of digital information in the world is growing at an alarming rate. Scientists and engineers are contributing heavily to this data “tsunami” by gathering data using computing and instrumentation at incredible rates. As data volumes and complexity grow, it becomes increasingly arduous to extract valuable information and derive knowledge from the data. Addressing these demands of ever-growing data volumes and complexity requires game-changing advances in software, hardware, and algorithms. Solution technologies must also scale to handle the increased data collection and processing rates while simultaneously accelerating timely and effective analysis. This need for ever-faster data processing and manipulation, as well as for algorithms that scale to high-volume data sets, has given birth to a new paradigm or discipline known as “data-intensive computing.” In this chapter, we define data-intensive computing, identify the challenges of massive data, outline solutions for hardware, software, and analytics, and discuss a number of applications in the areas of biology, cyber security, and atmospheric research.
International Parallel and Distributed Processing Symposium | 2009
Chad Scherrer; Timothy R. Shippert; Andres Marquez
The Cray XMT provides hardware support for parallel algorithms that would be communication- or memory-bound on other machines. Unfortunately, even if an algorithm meets these criteria, performance suffers if the algorithm is too numerically intensive. We present a lookup-based approach that achieves a significant performance advantage over explicit calculation. We describe an approach to balancing memory bandwidth against on-chip floating point capabilities, leading to further speedup. Finally, we provide table lookup algorithms for a number of common functions.
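As a hypothetical illustration of the lookup-based approach, the Python sketch below replaces calls to an expensive function with a precomputed table plus linear interpolation, trading memory traffic for floating-point work. The grid size, input range, and the choice of log as the example are arbitrary; how the paper balances the XMT's memory bandwidth against its on-chip floating-point capability is not reproduced here.

    import math

    N, LO, HI = 4096, 1e-6, 1.0                  # table resolution and input range (arbitrary)
    STEP = (HI - LO) / (N - 1)
    TABLE = [math.log(LO + i * STEP) for i in range(N)]

    def log_lookup(x):
        """Approximate math.log(x) for LO <= x <= HI via table lookup."""
        pos = (x - LO) / STEP
        i = min(int(pos), N - 2)                 # index of the bracketing table entry
        frac = pos - i
        return TABLE[i] * (1.0 - frac) + TABLE[i + 1] * frac

    print(log_lookup(0.5), math.log(0.5))        # lookup value vs. exact value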
Neural Information Processing Systems | 2012
Chad Scherrer; Ambuj Tewari; Mahantesh Halappanavar; David J. Haglin
International Conference on Machine Learning | 2012
Chad Scherrer; Mahantesh Halappanavar; Ambuj Tewari; David J. Haglin