Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Janine C. Bennett is active.

Publication


Featured research published by Janine C. Bennett.


IEEE International Conference on High Performance Computing Data and Analytics | 2012

Combining in-situ and in-transit processing to enable extreme-scale scientific analysis

Janine C. Bennett; Hasan Abbasi; Peer-Timo Bremer; Ray W. Grout; Attila Gyulassy; Tong Jin; Scott Klasky; Hemanth Kolla; Manish Parashar; Valerio Pascucci; Philippe Pierre Pebay; David C. Thompson; Hongfeng Yu; Fan Zhang; Jacqueline H. Chen

With the onset of extreme-scale computing, I/O constraints make it increasingly difficult for scientists to save a sufficient amount of raw simulation data to persistent storage. One potential solution is to change the data analysis pipeline from a post-process centric to a concurrent approach based on either in-situ or in-transit processing. In this context computations are considered in-situ if they utilize the primary compute resources, while in-transit processing refers to offloading computations to a set of secondary resources using asynchronous data transfers. In this paper we explore the design and implementation of three common analysis techniques typically performed on large-scale scientific simulations: topological analysis, descriptive statistics, and visualization. We summarize algorithmic developments, describe a resource scheduling system to coordinate the execution of various analysis workflows, and discuss our implementation using the DataSpaces and ADIOS frameworks that support efficient data movement between in-situ and in-transit computations. We demonstrate the efficiency of our lightweight, flexible framework by deploying it on the Jaguar XK6 to analyze data generated by S3D, a massively parallel turbulent combustion code. Our framework allows scientists dealing with the data deluge at extreme scale to perform analyses at increased temporal resolutions, mitigate I/O costs, and significantly improve the time to insight.
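
The contrast between the two modes can be illustrated with a small schematic sketch (not the paper's implementation, which uses DataSpaces and ADIOS on real secondary resources): in-situ analysis runs on the simulation's own resources, while in-transit analysis offloads data asynchronously, emulated here by a worker thread fed through a queue. The function names are illustrative assumptions.

```python
import queue
import threading

def analyze(chunk):
    """Stand-in for any analysis kernel, e.g. descriptive statistics."""
    return sum(chunk) / len(chunk)

def run_in_situ(chunks):
    """In-situ: analysis shares the primary resource with the simulation loop."""
    return [analyze(c) for c in chunks]

def run_in_transit(chunks):
    """In-transit: analysis runs on a secondary resource; data transfers are
    asynchronous, so the 'simulation' loop below is not blocked by analysis."""
    q, results = queue.Queue(), []

    def worker():
        while True:
            c = q.get()
            if c is None:  # sentinel: no more data
                return
            results.append(analyze(c))

    t = threading.Thread(target=worker)
    t.start()
    for c in chunks:  # simulation continues while the worker drains the queue
        q.put(c)
    q.put(None)
    t.join()
    return results
```

Both paths produce the same results; the difference is where, and with what overlap, the analysis executes.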


IEEE Transactions on Visualization and Computer Graphics | 2004

Topological segmentation in three-dimensional vector fields

Karim Mahrous; Janine C. Bennett; Gerik Scheuermann; Bernd Hamann; Kenneth I. Joy

We present a new method for topological segmentation in steady three-dimensional vector fields. Depending on desired properties, the algorithm replaces the original vector field by a derived segmented data set, which is utilized to produce separating surfaces in the vector field. We define the concept of a segmented data set, develop methods that produce the segmented data by sampling the vector field with streamlines, and describe algorithms that generate the separating surfaces. This method is applied to generate local separatrices in the field, defined by a movable boundary region placed in the field. The resulting partitions can be visualized using standard techniques, yielding a visualization of the vector field at a higher level of abstraction.
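
The streamline-sampling step can be sketched in two dimensions (the paper works in 3D) with simple forward Euler integration; the step size and step count below are illustrative assumptions.

```python
def streamline(field, seed, step=0.1, n_steps=100):
    """Trace a streamline from a seed point by repeatedly stepping a small
    distance along the vector field evaluated at the current position."""
    x, y = seed
    points = [(x, y)]
    for _ in range(n_steps):
        vx, vy = field(x, y)
        x, y = x + step * vx, y + step * vy
        points.append((x, y))
    return points

# Example: a uniform rightward field carries the seed along the x-axis.
path = streamline(lambda x, y: (1.0, 0.0), (0.0, 0.0), step=0.1, n_steps=10)
```

In the paper's setting, many such streamlines are seeded across the domain, and each sample point is labeled by where its streamline terminates, yielding the segmented data set from which separating surfaces are extracted.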


IEEE Symposium on Large Data Analysis and Visualization | 2011

Analysis of large-scale scalar data using hixels

David C. Thompson; Joshua A. Levine; Janine C. Bennett; Peer-Timo Bremer; Attila Gyulassy; Valerio Pascucci; Philippe Pierre Pebay

One of the greatest challenges for today's visualization and analysis communities is the massive amount of data generated by state-of-the-art simulations. Traditionally, the increase in spatial resolution has driven most of the data explosion, but more recently, ensembles of simulations with multiple results per data point and stochastic simulations storing individual probability distributions are increasingly common. This paper introduces a new data representation for scalar data, called hixels, that stores a histogram of values for each sample point of a domain. The histograms may be created by spatial down-sampling, binning ensemble values, or polling values from a given distribution. In this manner, hixels form a compact yet information-rich approximation of large-scale data. In essence, hixels trade off data size and complexity for scalar-value “uncertainty”. Based on this new representation, we propose new feature detection algorithms using a combination of topological and statistical methods. In particular, we show how to approximate topological structures from hixel data, extract structures from multi-modal distributions, and render uncertain isosurfaces. In all three cases we demonstrate how using hixels compares to traditional techniques and provides new capabilities to recover prominent features that would otherwise be either infeasible to compute or ambiguous to infer. We use a collection of computed tomography data and large-scale combustion simulations to illustrate our techniques.
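
The spatial down-sampling path of the hixel construction can be sketched as follows: each block of a scalar field stores a histogram of the values it contains rather than a single value. The block size, bin count, and value range below are illustrative assumptions, not the paper's parameters.

```python
from collections import Counter

def to_hixels(grid, block=2, bins=4, lo=0.0, hi=1.0):
    """Down-sample a 2D list of scalars into a grid of per-block histograms
    (one Counter of bin-index -> count per block), i.e. a grid of hixels."""
    width = (hi - lo) / bins
    ny, nx = len(grid), len(grid[0])
    hixels = []
    for j in range(0, ny, block):
        row = []
        for i in range(0, nx, block):
            counts = Counter()
            for y in range(j, min(j + block, ny)):
                for x in range(i, min(i + block, nx)):
                    # Clamp so the maximum value lands in the last bin.
                    b = min(int((grid[y][x] - lo) / width), bins - 1)
                    counts[b] += 1
            row.append(counts)
        hixels.append(row)
    return hixels
```

A 2x2 block of values {0.1, 0.2, 0.8, 0.9} thus collapses to one hixel with two counts in the lowest bin and two in the highest, preserving the bimodal character that a single averaged value would destroy.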


International Conference on Cluster Computing | 2009

Numerically stable, single-pass, parallel statistics algorithms

Janine C. Bennett; Ray W. Grout; Philippe Pierre Pebay; Diana C. Roe; David C. Thompson

Statistical analysis is widely used for countless scientific applications in order to analyze and infer meaning from data. A key challenge of any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. In this paper we derive a series of formulas that allow for single-pass, yet numerically robust, pairwise parallel and incremental updates of both arbitrary-order centered statistical moments and co-moments. Using these formulas, we have built an open source parallel statistics framework that performs principal component analysis (PCA) in addition to computing descriptive, correlative, and multi-correlative statistics. The results of a scalability study demonstrate numerically stable, near-optimal scalability on up to 128 processes, and results are presented in which the statistical framework is used to process large-scale turbulent combustion simulation data with 1500 processes.
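
The pairwise update idea can be sketched for the lowest-order case, the mean and second centered moment (M2), in the spirit of the formulas the paper derives for arbitrary order: combining two partial summaries (n, mean, M2) yields the summary of the union without revisiting the raw data, which is what makes the computation both single-pass and embarrassingly parallel.

```python
def summarize(xs):
    """Single-pass (streaming, numerically stable) summary of one chunk."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        d = x - mean
        mean += d / n
        m2 += d * (x - mean)
    return n, mean, m2

def combine(a, b):
    """Pairwise merge of two partial summaries into the summary of the union."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2
```

In a distributed run, each process calls summarize on its local data and the partial results are reduced with combine; the sample variance is then m2 / (n - 1). Avoiding the naive sum-of-squares formula is what provides the numerical robustness.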


IEEE Transactions on Visualization and Computer Graphics | 2011

Feature-Based Statistical Analysis of Combustion Simulation Data

Janine C. Bennett; Vaidyanathan Krishnamoorthy; Shusen Liu; Ray W. Grout; Evatt R. Hawkes; Jacqueline H. Chen; Jason F. Shepherd; Valerio Pascucci; Peer-Timo Bremer

We present a new framework for feature-based statistical analysis of large-scale scientific data and demonstrate its effectiveness by analyzing features from Direct Numerical Simulations (DNS) of turbulent combustion. Turbulent flows are ubiquitous and account for transport and mixing processes in combustion, astrophysics, fusion, and climate modeling, among other disciplines. They are also characterized by coherent structure or organized motion, i.e., nonlocal entities whose geometrical features can directly impact molecular mixing and reactive processes. While traditional multi-point statistics provide correlative information, they lack nonlocal structural information, and hence fail to provide mechanistic causality information between organized fluid motion and mixing and reactive processes. Hence, it is of great interest to capture and track flow features and their statistics together with their correlation with relevant scalar quantities, e.g., temperature or species concentrations. In our approach we encode the set of all possible flow features by pre-computing merge trees augmented with attributes, such as statistical moments of various scalar fields, e.g., temperature, as well as length-scales computed via spectral analysis. The computation is performed in an efficient streaming manner in a pre-processing step and results in a collection of meta-data that is orders of magnitude smaller than the original simulation data. This meta-data is sufficient to support a fully flexible and interactive analysis of the features, allowing for arbitrary thresholds, providing per-feature statistics, and creating various global diagnostics such as cumulative distribution functions (CDFs), histograms, or time-series. We combine the analysis with a rendering of the features in a linked-view browser that enables scientists to interactively explore, visualize, and analyze the equivalent of one terabyte of simulation data. We highlight the utility of this new framework for combustion science; however, it is applicable to many other science domains.


IEEE International Conference on High Performance Computing Data and Analytics | 2014

In-situ feature extraction of large scale combustion simulations using segmented merge trees

Aaditya G. Landge; Valerio Pascucci; Attila Gyulassy; Janine C. Bennett; Hemanth Kolla; Jacqueline H. Chen; Peer-Timo Bremer

The ever-increasing amount of data generated by scientific simulations, coupled with system I/O constraints, is fueling a need for in-situ analysis techniques. Of particular interest are approaches that produce reduced data representations while maintaining the ability to redefine, extract, and study features in a post-process to obtain scientific insights. This paper presents two variants of in-situ feature extraction techniques using segmented merge trees, which encode a wide range of threshold-based features. The first approach is a fast, low-communication-cost technique that generates an exact solution but has limited scalability. The second is a scalable, local approximation that nevertheless is guaranteed to correctly extract all features up to a predefined size. We demonstrate both variants using some of the largest combustion simulations available on leadership-class supercomputers. Our approach allows state-of-the-art, feature-based analysis to be performed in-situ at significantly higher frequency than currently possible and with negligible impact on the overall simulation runtime.
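
The threshold-based features a merge tree encodes are the connected components of superlevel sets, {x : f(x) >= t}. A full segmented merge tree tracks how these components appear and merge across all thresholds at once; as a hedged sketch, the one-dimensional, single-threshold case reduces to finding contiguous runs above the threshold.

```python
def superlevel_features(values, threshold):
    """Return contiguous index ranges (start, end) where values >= threshold,
    i.e. the connected components of the 1D superlevel set."""
    features, start = [], None
    for i, v in enumerate(values):
        if v >= threshold and start is None:
            start = i                          # a new component begins
        elif v < threshold and start is not None:
            features.append((start, i - 1))    # the component ends
            start = None
    if start is not None:
        features.append((start, len(values) - 1))
    return features
```

Precomputing the merge tree in-situ means such components, and per-component statistics, remain extractable for any threshold in a post-process, without access to the raw field.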


IEEE International Conference on High Performance Computing Data and Analytics | 2013

Exploring power behaviors and trade-offs of in-situ data analytics

Marc Gamell; Ivan Rodero; Manish Parashar; Janine C. Bennett; Hemanth Kolla; Jacqueline H. Chen; Peer-Timo Bremer; Aaditya G. Landge; Attila Gyulassy; Patrick S. McCormick; Scott Pakin; Valerio Pascucci; Scott Klasky

As scientific applications target exascale, challenges related to data and energy are becoming dominating concerns. For example, coupled simulation workflows are increasingly adopting in-situ data processing and analysis techniques to address costs and overheads due to data movement and I/O. However, it is also critical to understand these overheads and associated trade-offs from an energy perspective. The goal of this paper is to explore data-related energy/performance trade-offs for end-to-end simulation workflows running at scale on current high-end computing systems. Specifically, this paper presents: (1) an analysis of the data-related behaviors of a combustion simulation workflow with an in-situ data analytics pipeline, running on the Titan system at ORNL; (2) a power model based on system power and data exchange patterns, which is empirically validated; and (3) the use of the model to characterize the energy behavior of the workflow and to explore energy/performance trade-offs on current as well as emerging systems.
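
A purely illustrative model in the spirit the abstract describes splits workflow energy into a baseline power draw over the runtime plus a cost proportional to data exchanged; the function name and coefficients below are made-up assumptions, not the paper's validated model.

```python
def workflow_energy(runtime_s, base_power_w, bytes_moved, joules_per_byte):
    """Illustrative energy estimate (joules): baseline system draw over the
    workflow runtime plus a per-byte cost for data movement."""
    return base_power_w * runtime_s + bytes_moved * joules_per_byte

# Example: a 10 s run at 100 W baseline, moving 1 MB at 1 microjoule/byte.
energy = workflow_energy(10.0, 100.0, 1_000_000, 1e-6)
```

Even this toy form exposes the trade-off the paper studies: in-situ processing may lengthen the runtime term while shrinking the data-movement term, and the net energy effect depends on both coefficients.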


IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2011

Design and Performance of a Scalable, Parallel Statistics Toolkit

Philippe Pierre Pebay; David C. Thompson; Janine C. Bennett; Ajith Arthur Mascarenhas

Most statistical software packages implement a broad range of techniques but do so in an ad hoc fashion, leaving users who do not have a broad knowledge of statistics at a disadvantage, since they may not understand all the implications of a given analysis or how to test the validity of results. These packages are also largely serial in nature, or target multicore architectures instead of distributed-memory systems, or provide only a small number of statistics in parallel. This paper surveys a collection of parallel implementations of statistics algorithms developed as part of a common framework over the last three years. The framework strategically groups modeling techniques with associated verification and validation techniques to make the underlying assumptions of the statistics more clear. Furthermore, it employs a design pattern specifically targeted for distributed-memory parallelism, where architectural advances in large-scale high-performance computing have been focused. Moment-based statistics (which include descriptive, correlative, and multicorrelative statistics, principal component analysis (PCA), and k-means statistics) scale nearly linearly with the data set size and number of processes. Entropy-based statistics (which include order and contingency statistics) do not scale well when the data in question is continuous or quasi-diffuse but do scale well when the data is discrete and compact. We confirm and extend our earlier results by now establishing near-optimal scalability with up to 10,000 processes.


IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV) | 2013

A provably-robust sampling method for generating colormaps of large data

David C. Thompson; Janine C. Bennett; C. Seshadhri; Ali Pinar

First impressions from initial renderings of data are crucial for directing further exploration and analysis. In most visualization systems, default colormaps are generated by simply linearly interpolating color in some space based on a value's placement between the minimum and maximum taken on by the dataset. We design a simple sampling-based method for generating colormaps that highlights important features. We use random sampling to determine the distribution of values observed in the data. The sample size required is independent of the dataset size and only depends on certain accuracy parameters. This leads to a computationally cheap and robust algorithm for colormap generation. Our approach (1) uses perceptual color distance to produce palettes from color curves, (2) allows the user to either emphasize or de-emphasize prominent values in the data, (3) uses quantiles to map distinct colors to values based on their frequency in the dataset, and (4) supports the highlighting of either inter- or intra-mode variations in the data.
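
The quantile idea in step (3) can be sketched as follows: sample the data, then place color boundaries at evenly spaced empirical quantiles, so each color covers a roughly equal share of the data's frequency rather than an equal share of its value range. The sample size and function names below are illustrative assumptions.

```python
import random

def quantile_cuts(data, n_colors, sample_size=1000, seed=0):
    """Return n_colors - 1 cut points, taken at evenly spaced quantiles of a
    random sample, so each color bin covers ~equal data frequency."""
    rng = random.Random(seed)
    k = min(sample_size, len(data))
    sample = sorted(rng.sample(data, k))
    return [sample[(i * k) // n_colors] for i in range(1, n_colors)]

def color_index(value, cuts):
    """Map a value to a color index by counting how many cuts it exceeds."""
    return sum(1 for c in cuts if value >= c)
```

Because the cut points come from a fixed-size sample, the cost is independent of dataset size, matching the abstract's claim that accuracy, not data volume, determines the sample size needed.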


International Conference on Cluster Computing | 2010

Computing Contingency Statistics in Parallel: Design Trade-Offs and Limiting Cases

Philippe Pierre Pebay; David C. Thompson; Janine C. Bennett

Statistical analysis is typically used to reduce the dimensionality of and infer meaning from data. A key challenge of any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. Many statistical techniques, e.g., descriptive statistics or principal component analysis, are based on moments and co-moments and, using robust online update formulas, can be computed in an embarrassingly parallel manner, amenable to a map-reduce style implementation. In this paper we focus on contingency tables, through which numerous derived statistics such as joint and marginal probability, point-wise mutual information, information entropy, and χ² independence statistics can be directly obtained. However, contingency tables can become large as data size increases, requiring a correspondingly large amount of communication between processors. This potential increase in communication prevents optimal parallel speedup and is the main difference with moment-based statistics (which we discussed in [1]) where the amount of inter-processor communication is independent of data size. Here we present the design trade-offs which we made to implement the computation of contingency tables in parallel. We also study the parallel speedup and scalability properties of our open source implementation. In particular, we observe optimal speed-up and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.
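
The derived statistics a contingency table supports can be sketched minimally: per-process tables are plain co-occurrence counts, the reduce step is a dictionary sum (this is where the data-size-dependent communication arises), and quantities like pointwise mutual information fall out of the merged table. Function names here are illustrative, not the toolkit's API.

```python
from collections import Counter
from math import log2

def contingency(pairs):
    """Map step: count co-occurrences of (x, y) category pairs locally."""
    return Counter(pairs)

def merge(tables):
    """Reduce step: sum per-process tables into one global table."""
    total = Counter()
    for t in tables:
        total.update(t)
    return total

def pmi(table, x, y):
    """Pointwise mutual information of categories x and y, from joint and
    marginal probabilities read off the contingency table."""
    n = sum(table.values())
    p_xy = table[(x, y)] / n
    p_x = sum(v for (a, _), v in table.items() if a == x) / n
    p_y = sum(v for (_, b), v in table.items() if b == y) / n
    return log2(p_xy / (p_x * p_y))
```

The communication cost of merge grows with the number of distinct (x, y) pairs, which is why scalability degrades on quasi-diffuse (near-unique-valued) data but holds when categories are discrete and compact.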

Collaboration


Dive into Janine C. Bennett's collaborations.

Top Co-Authors

Hemanth Kolla
Sandia National Laboratories

David C. Thompson
University of Texas at Austin

Peer-Timo Bremer
Lawrence Livermore National Laboratory

Jacqueline H. Chen
Sandia National Laboratories

Ray W. Grout
National Renewable Energy Laboratory

David S. Hollman
Sandia National Laboratories

Jeremiah J. Wilke
Sandia National Laboratories