
E. Wes Bethel

Lawrence Berkeley National Laboratory

Publications

Featured research published by E. Wes Bethel.

Conference on High Performance Computing (Supercomputing) | 2000

Using High-Speed WANs and Network Data Caches to Enable Remote and Distributed Visualization

E. Wes Bethel; Brian Tierney; Jason Lee; Dan Gunter; Stephen Lau

Visapult is a prototype application and framework for remote visualization of large scientific datasets. We approach the technical challenges of tera-scale visualization with a unique architecture that employs high speed WANs and network data caches for data staging and transmission. This architecture allows for the use of available cache and compute resources at arbitrary locations on the network. High data throughput rates and network utilization are achieved by parallelizing I/O at each stage in the application, and by pipelining the visualization process. On the desktop, the graphics interactivity is effectively decoupled from the latency inherent in network applications. We present a detailed performance analysis of the application, and improvements resulting from field-test analysis conducted as part of the DOE Combustion Corridor project.

IEEE Computer Graphics and Applications | 2010

Extreme Scaling of Production Visualization Software on Diverse Architectures

Hank Childs; David Pugmire; Sean Ahern; Brad Whitlock; Mark Howison; Prabhat; Gunther H. Weber; E. Wes Bethel

This article presents the results of experiments studying how the pure-parallelism paradigm scales to massive data sets, including 16,000 or more cores on trillion-cell meshes, the largest data sets published to date in the visualization literature. The findings on scaling characteristics and bottlenecks contribute to understanding how pure parallelism will perform in the future.

IEEE International Conference on High Performance Computing, Data, and Analytics | 2011

Parallel index and query for large scale data analysis

Jerry Chi-Yuan Chou; Mark Howison; Brian Austin; Kesheng Wu; Ji Qiang; E. Wes Bethel; Arie Shoshani; Oliver Rübel; Prabhat; Robert D. Ryne

Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for processing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that addresses these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process massive datasets on modern supercomputing platforms. We apply FastQuery to processing of a massive 50TB dataset generated by a large scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for interesting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.
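To give a rough sense of the kind of index-accelerated query described above, here is a minimal sketch of a binned bitmap index in plain Python. It is not the FastQuery or FastBit API; the function names, the binning scheme, and the use of Python integers as bitmasks are all assumptions made for illustration. Each bin owns one bitmask with a bit per row, so a threshold query ORs the bitmasks of fully qualifying bins and checks raw values only in the one bin that straddles the threshold.

```python
from bisect import bisect_right


def build_bitmap_index(values, bin_edges):
    """Build one integer bitmask per bin; bit i is set if values[i] is in that bin.

    Hypothetical helper: bin b holds values v with bin_edges[b-1] <= v < bin_edges[b].
    """
    bitmaps = [0] * (len(bin_edges) + 1)
    for row, value in enumerate(values):
        bitmaps[bisect_right(bin_edges, value)] |= 1 << row
    return bitmaps


def rows_at_least(values, bin_edges, bitmaps, threshold):
    """Return row numbers whose value is >= threshold, using the index."""
    boundary_bin = bisect_right(bin_edges, threshold)
    # Every row in a bin above the boundary bin qualifies without a value check.
    sure = 0
    for bitmap in bitmaps[boundary_bin + 1:]:
        sure |= bitmap
    # Rows in the boundary bin are only candidates and need the raw value.
    maybe = bitmaps[boundary_bin]
    hits = []
    for row in range(len(values)):
        bit = 1 << row
        if sure & bit or (maybe & bit and values[row] >= threshold):
            hits.append(row)
    return hits
```

In this toy version the raw values are consulted only for rows in the boundary bin; production systems additionally compress the bitmaps and distribute the work, which this sketch does not attempt.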

Analytical Chemistry | 2013

OpenMSI: A High-Performance Web-Based Platform for Mass Spectrometry Imaging

Oliver Rübel; Annette M. Greiner; Shreyas Cholia; Katherine Louie; E. Wes Bethel; Trent R. Northen; Benjamin P. Bowen

Mass spectrometry imaging (MSI) enables researchers to probe endogenous molecules directly within the architecture of the biological matrix. Unfortunately, efficient access, management, and analysis of the data generated by MSI approaches remain major challenges to this rapidly developing field. Despite the availability of numerous dedicated file formats and software packages, it is a widely held viewpoint that the biggest challenge is simply opening, sharing, and analyzing a file without loss of information. Here we present OpenMSI, a software framework and platform that addresses these challenges via an advanced, high-performance, extensible file format and Web API for remote data access (http://openmsi.nersc.gov). The OpenMSI file format supports storage of raw MSI data, metadata, and derived analyses in a single, self-describing format based on HDF5 and is supported by a large range of analysis software (e.g., Matlab and R) and programming languages (e.g., C++, Fortran, and Python). Careful optimization of the storage layout of MSI data sets using chunking, compression, and data replication accelerates common, selective data access operations while minimizing data storage requirements, and is a critical enabler of rapid data I/O. The OpenMSI file format has been shown to provide >2000-fold improvement for image access operations, enabling spectrum and image retrieval in less than 0.3 s across the Internet even for 50 GB MSI data sets. To make remote high-performance compute resources accessible for analysis and to facilitate data sharing and collaboration, we describe an easy-to-use yet powerful Web API, enabling fast and convenient access to MSI data, metadata, and derived analysis results stored remotely to facilitate high-performance data analysis and enable implementation of Web-based data sharing, visualization, and analysis.

Bioinformatics | 2004

Phylo-VISTA: interactive visualization of multiple DNA sequence alignments

Nameeta Shah; Olivier Couronne; Len A. Pennacchio; Michael Brudno; Serafim Batzoglou; E. Wes Bethel; Edward M. Rubin; Bernd Hamann; Inna Dubchak

MOTIVATION: The power of multi-sequence comparison for biological discovery is well established. The need for new capabilities to visualize and compare cross-species alignment data is intensified by the growing number of genomic sequence datasets being generated for an ever-increasing number of organisms. To be efficient, these visualization algorithms must consistently accommodate a wide range of evolutionary distances in a comparison framework based upon phylogenetic relationships. RESULTS: We have developed Phylo-VISTA, an interactive tool for analyzing multiple alignments by visualizing a similarity measure for multiple DNA sequences. The complexity of visual presentation is effectively organized using a framework based upon interspecies phylogenetic relationships. The phylogenetic organization supports rapid, user-guided interspecies comparison. To aid in navigation through large sequence datasets, Phylo-VISTA leverages concepts from VISTA that provide a user with the ability to select and view data at varying resolutions. Multiresolution data visualization and analysis, combined with the phylogenetic framework for interspecies comparison, produces a highly flexible and powerful tool for visual data analysis of multiple sequence alignments. AVAILABILITY: Phylo-VISTA is available at http://www-gsd.lbl.gov/phylovista. It requires an Internet browser with Java Plug-in 1.4.2 and it is integrated into the global alignment program LAGAN at http://lagan.stanford.edu

IEEE Transactions on Visualization and Computer Graphics | 2007

Variable Interactions in Query-Driven Visualization

Luke J. Gosink; John C. Anderson; E. Wes Bethel; Kenneth I. Joy

Our ability to generate ever-larger, increasingly complex data has established the need for scalable methods that identify, and provide insight into, important variable trends and interactions. Query-driven methods are among the small subset of techniques that are able to address both large and highly complex datasets. This paper presents a new method that increases the utility of query-driven techniques by visually conveying statistical information about the trends that exist between variables in a query. In this method, correlation fields, created between pairs of variables, are used with the cumulative distribution functions of variables expressed in a user's query. This integrated use of cumulative distribution functions and correlation fields visually reveals, with respect to the solution space of the query, statistically important interactions between any three variables, and allows for trends between these variables to be readily identified. We demonstrate our method by analyzing interactions between variables in two flame-front simulations.

IEEE International Conference on High Performance Computing, Data, and Analytics | 2008

High performance multivariate visual data exploration for extremely large data

Oliver Rübel; Prabhat; Kesheng Wu; Hank Childs; Jeremy S. Meredith; Cameron Geddes; E. Cormier-Michel; Sean Ahern; Gunther H. Weber; Peter Messmer; Hans Hagen; Bernd Hamann; E. Wes Bethel

One of the central challenges in modern science is the need to quickly derive knowledge and understanding from large, complex collections of data. We present a new approach that deals with this challenge by combining and extending techniques from high performance visual data analysis and scientific data management. This approach is demonstrated within the context of gaining insight from complex, time-varying datasets produced by a laser wakefield accelerator simulation. Our approach leverages histogram-based parallel coordinates both for visual information display and as a vehicle for guiding a data mining operation. Data extraction and subsetting are implemented with state-of-the-art index/query technology. This approach, while applied here to accelerator science, is generally applicable to a broad set of science applications, and is implemented in a production-quality visual data analysis infrastructure. We conduct a detailed performance analysis and demonstrate good scalability on a distributed memory Cray XT4 system.

Archive | 2003

Virtual-Reality Based Interactive Exploration of Multiresolution Data

Oliver Kreylos; E. Wes Bethel; Terry J. Ligocki; Bernd Hamann

We describe a system supporting the interactive exploration of three-dimensional scientific data sets in a virtual reality (VR) environment. This system aids a scientist in understanding a data set by interactively placing and manipulating visualization primitives, e.g., isosurfaces or streamlines, and thereby finding features in the data and understanding its overall structure.

Algorithmic Finance | 2013

A Big Data Approach to Analyzing Market Volatility

Kesheng Wu; E. Wes Bethel; Ming Gu; David Leinweber; Oliver Ruebel

Understanding the microstructure of the financial market requires the processing of a vast amount of data related to individual trades, and sometimes even multiple levels of quotes. This requires computing resources that are not easily available to financial academics and regulators. Fortunately, data-intensive scientific research has developed a series of tools and techniques for working with a large amount of data. In this work, we demonstrate that these techniques are effective for market data analysis by computing an early warning indicator called Volume-synchronized Probability of Informed trading (VPIN) on a massive set of futures trading records. The test data contains five and a half years' worth of trading data for about 100 of the most liquid futures contracts, includes about 3 billion trades, and occupies 140 GB as text files. By using (1) a more efficient file format for storing the trading records, (2) more effective data structures and algorithms, and (3) parallelized computations, we are able to explore 16,000 different parameter combinations for computing VPIN in less than 20 hours on a 32-core IBM DataPlex machine. On average, computing VPIN of one futures contract over 5.5 years takes around 1.5 seconds on one core, which demonstrates that a modest computer is sufficient to monitor a vast number of trading activities in real time, an ability that could be valuable to regulators. By examining a large number of parameter combinations, we are also able to identify the parameter settings that improve the prediction accuracy from 80% to 93%.
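For readers unfamiliar with VPIN, the following is a simplified sketch of the final averaging step, assuming trades have already been grouped into equal-volume buckets and each bucket's volume has already been split into buy and sell portions. The classification step, the file format, and the parallelization discussed in the abstract are not shown, and the function and parameter names are hypothetical rather than taken from the paper. Under those assumptions, VPIN over a window of buckets is the summed absolute buy/sell imbalance divided by the total volume in the window.

```python
def vpin_series(buy_volumes, sell_volumes, window):
    """Return one VPIN value per sliding window of `window` consecutive buckets.

    Assumes (illustratively) that bucket i carries buy_volumes[i] of
    buyer-initiated volume and sell_volumes[i] of seller-initiated volume.
    """
    if len(buy_volumes) != len(sell_volumes):
        raise ValueError("buy and sell series must have the same length")
    if window < 1:
        raise ValueError("window must be at least 1")
    series = []
    for end in range(window, len(buy_volumes) + 1):
        buys = buy_volumes[end - window:end]
        sells = sell_volumes[end - window:end]
        # Order-flow imbalance summed over the buckets in the window.
        imbalance = sum(abs(b - s) for b, s in zip(buys, sells))
        # Total traded volume in the window (window * bucket size
        # when all buckets are the same size).
        total = sum(buys) + sum(sells)
        series.append(imbalance / total)
    return series
```

The bucket size and the window length are among the tunable parameters that a study like this one would sweep over; this sketch simply takes them as given.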

IEEE International Conference on High Performance Computing, Data, and Analytics | 2012

Multi-core and many-core shared-memory parallel raycasting volume rendering optimization and tuning

E. Wes Bethel; Mark Howison

Given the computing industry trend of increasing processing capacity by adding more cores to a chip, the focus of this work is tuning the performance of a staple visualization algorithm, raycasting volume rendering, for shared-memory parallelism on multi-core CPUs and many-core GPUs. Our approach is to vary tunable algorithmic settings, along with known algorithmic optimizations and two different memory layouts, and measure performance in terms of absolute runtime and L2 memory cache misses. Our results indicate there is a wide variation in runtime performance on all platforms, as much as 254% for the tunable parameters we test on multi-core CPUs and 265% on many-core GPUs, and the optimal configurations vary across platforms, often in a non-obvious way. For example, our results indicate the optimal configurations on the GPU occur at a crossover point between those that maintain good cache utilization and those that saturate computational throughput. This result is likely to be extremely difficult to predict with an empirical performance model for this particular algorithm because it has an unstructured memory access pattern that varies locally for individual rays and globally for the selected viewpoint. Our results also show that optimal parameters on modern architectures are markedly different from those in previous studies run on older architectures. In addition, given the dramatic performance variation across platforms for both optimal algorithm settings and performance results, there is a clear benefit for production visualization and analysis codes to adopt a strategy for performance optimization through auto-tuning. These benefits will likely become more pronounced in the future as the number of cores per chip and the cost of moving data through the memory hierarchy both increase.
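The auto-tuning strategy mentioned above can be pictured as an exhaustive sweep over tunable settings that keeps whichever configuration measures best. The sketch below is a generic illustration under that assumption, not the authors' tuning harness; `autotune`, `time_once`, and the example parameter names are hypothetical. The cost function defaults to wall-clock time but can be swapped for another metric such as a cache-miss counter.

```python
import itertools
import time


def time_once(kernel, config):
    """Default cost: wall-clock seconds for one run of kernel(**config)."""
    start = time.perf_counter()
    kernel(**config)
    return time.perf_counter() - start


def autotune(kernel, param_grid, cost=time_once):
    """Try every combination in param_grid and return (best_config, best_cost).

    param_grid maps a parameter name to the list of values to try.
    """
    names = sorted(param_grid)
    best_config, best_cost = None, float("inf")
    for combo in itertools.product(*(param_grid[name] for name in names)):
        config = dict(zip(names, combo))
        measured = cost(kernel, config)
        if measured < best_cost:
            best_config, best_cost = config, measured
    return best_config, best_cost
```

A call might look like `autotune(render, {"tile_size": [4, 8, 16, 32], "layout": ["array", "z_order"]})`, where `render` stands in for the kernel being tuned. Because the best settings vary by platform, a sweep like this would be rerun on each target machine rather than hard-coding its result.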