Publication


Featured research published by Kerstin Kleese van Dam.


International Journal of High Performance Computing Applications | 2018

The future of scientific workflows

Ewa Deelman; Tom Peterka; Ilkay Altintas; Christopher D. Carothers; Kerstin Kleese van Dam; Kenneth Moreland; Manish Parashar; Lavanya Ramakrishnan; Jeffrey S. Vetter

Today’s computational, experimental, and observational sciences rely on computations that involve many related tasks. The success of a scientific mission often hinges on the computer automation of these workflows. In April 2015, the US Department of Energy (DOE) invited a diverse group of domain and computer scientists from national laboratories supported by the Office of Science, the National Nuclear Security Administration, from industry, and from academia to review the workflow requirements of DOE’s science and national security missions, to assess the current state of the art in science workflows, to understand the impact of emerging extreme-scale computing systems on those workflows, and to develop requirements for automated workflow management in future and existing environments. This article is a summary of the opinions of over 50 leading researchers attending this workshop. We highlight use cases, computing systems, workflow needs and conclude by summarizing the remaining challenges this community sees that inhibit large-scale scientific workflows from becoming a mainstream tool for extreme-scale science.


2016 New York Scientific Data Summit (NYSDS) | 2016

Data provenance hybridization supporting extreme-scale scientific workflow applications

Todd O. Elsethagen; Eric G. Stephan; Bibi Raju; Malachi Schram; Matt C. Macduff; Darren J. Kerbyson; Kerstin Kleese van Dam; Alok Singh; Ilkay Altintas

As high performance computing (HPC) infrastructures continue to grow in capability and complexity, so do the applications that they serve. HPC and distributed-area computing (DAC) (e.g. grid and cloud) users are looking increasingly toward workflow solutions to orchestrate their complex application coupling, pre- and post-processing needs. To that end, the US Department of Energy Integrated end-to-end Performance Prediction and Diagnosis for Extreme Scientific Workflows (IPPD) project is currently investigating an integrated approach to prediction and diagnosis of these extreme-scale scientific workflows. To gain insight and a more quantitative understanding of a workflow’s performance, our method includes not only the capture of traditional provenance information, but also the capture and integration of system environment metrics, which help to give context and explanation for a workflow’s execution. In this paper, we describe IPPD’s provenance management solution (ProvEn) and its hybrid data store combining both of these data provenance perspectives. We discuss design and implementation details that include provenance disclosure, scalability, data integration, and query and analysis capabilities. We also present use case examples for climate modeling and thermal modeling application domains.


International Conference on e-Science | 2016

Management, analysis, and visualization of experimental and observational data — The convergence of data and computing

E. Wes Bethel; M. Greenwald; Kerstin Kleese van Dam; Manish Parashar; Stefan M. Wild; H. Steven Wiley

Scientific user facilities — particle accelerators, telescopes, colliders, supercomputers, light sources, sequencing facilities, and more — operated by the U.S. Department of Energy (DOE) Office of Science (SC) generate ever increasing volumes of data at unprecedented rates from experiments, observations, and simulations. At the same time there is a growing community of experimentalists that require real-time data analysis feedback, to enable them to steer their complex experimental instruments to optimized scientific outcomes and new discoveries. Recent efforts in DOE-SC have focused on articulating the data-centric challenges and opportunities facing these science communities. Key challenges include difficulties coping with data size, rate, and complexity in the context of both real-time and post-experiment data analysis and interpretation. Solutions will require algorithmic and mathematical advances, as well as hardware and software infrastructures that adequately support data-intensive scientific workloads. This paper presents the summary findings of a workshop held by DOE-SC in September 2015, convened to identify the major challenges and the research that is needed to meet those challenges.


International Conference on Big Data | 2016

Leveraging large sensor streams for robust cloud control

Alok Singh; Eric G. Stephan; Todd O. Elsethagen; Matt C. Macduff; Bibi Raju; Malachi Schram; Kerstin Kleese van Dam; Darren J. Kerbyson; Ilkay Altintas

Today’s dynamic computing deployment for commercial and scientific applications is propelling us to an era where minor inefficiencies can snowball into significant performance and operational bottlenecks. Data center operations increasingly rely on sensor-based control systems for key decision insights. Increased sampling frequencies, cheaper storage, and the prolific deployment of sensors are producing massive volumes of operational data. However, there is a lag between the rapid development of analytical techniques and their widespread practical deployment. We present empirical evidence of the potential of analytical techniques for operations management in computing and data centers. Using machine learning modeling techniques on data from a real instrumented cluster, we demonstrate that predictive modeling on operational sensor data can directly reduce systems operations monitoring costs and improve system reliability.


2016 New York Scientific Data Summit (NYSDS) | 2016

Streaming data analysis on the wire

Dimitrios Katramatos; Meng Yue; Shinjae Yoo; Kerstin Kleese van Dam; Jin Xu; Jiayao Zhang

In the era of Big Data, more data can potentially be found in transit at any given moment than in storage. In this paper we discuss the feasibility of applying certain forms of processing on the wire, i.e., while data is in flight through the network. In a similar manner to cybersecurity processes, and exploiting software-defined networking concepts, we seek to design a framework for Analysis on the Wire that can, on demand, perform computations on data streaming through the network at specific network nodes. We further discuss use cases and perform a preliminary evaluation of the proposed framework.


2016 New York Scientific Data Summit (NYSDS) | 2016

Software tools for X-ray photon correlation and X-ray speckle visibility spectroscopy

Sameera K. Abeykoon; Yugang Zhang; Eric D. Dill; Thomas A Caswell; Daniel Allan; Arman Akilic; Lutz Wiegart; S. B. Wilkins; Annie Heroux; Kerstin Kleese van Dam; M. Sutton; Andrei Fluerasu

A set of new data analysis software tools has been developed for the study of structural dynamics of materials using coherent scattering and photon correlation techniques. The new software tools can readily process high-throughput, multidimensional data, enabling studies of slow and fast dynamics of materials using X-ray Speckle Visibility Spectroscopy and X-ray Photon Correlation Spectroscopy techniques. They support a wide range of user expertise, from novice to developer, and are available in the scikit-beam Python package at https://github.com/scikit-beam/scikit-beam.


International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications | 2018

Performance Visualization for TAU Instrumented Scientific Workflows

Cong Xie; Wei Xu; Sungsoo Ha; Kevin A. Huck; Sameer Shende; Hubertus Van Dam; Kerstin Kleese van Dam; Klaus Mueller

In exascale scientific computing, it is essential to efficiently monitor, evaluate and improve performance. Visualization, and especially visual analytics, are useful and indispensable techniques in the exascale computing era to enable such a human-centered experience. In this ongoing work, we present a visual analytics framework for performance evaluation of scientific workflows. Ultimately, we aim to solve two current challenges: the capability to deal with workflows, and scalability toward exascale scenarios. As steps toward these goals, we first incorporate the TAU (Tuning and Analysis Utilities) instrumentation tool and improve it to accommodate workflow measurements. We then establish a web-based visualization framework, whose back end handles data storage, query and aggregation, while the front end presents the visualization and handles user interaction. To support scalability, a few level-of-detail mechanisms are developed. Finally, a chemistry workflow use case is adopted to verify our methods.


Visual Informatics | 2018

MultiSciView: Multivariate Scientific X-ray Image Visual Exploration with Cross-Data Space Views

Wen Zhong; Wei Xu; Kevin G. Yager; Gregory S. Doerk; Jian Zhao; Yunke Tian; Sungsoo Ha; Cong Xie; Yuan Zhong; Klaus Mueller; Kerstin Kleese van Dam

X-ray images obtained from synchrotron beamlines are large-scale, high-resolution and high-dynamic-range grayscale data encoding multiple complex properties of the measured materials. They are typically associated with a variety of metadata, which increases their inherent complexity. There is a wealth of information embedded in these data, but so far scientists lack modern exploration tools to unlock these hidden treasures. To bridge this gap, we propose MultiSciView, a multivariate scientific x-ray image visualization and exploration system for beamline-generated x-ray scattering data. Our system is composed of three complementary and coordinated interactive visualizations that enable exploration across the images and their associated attribute and feature spaces. The first is a multi-level scatterplot visualization dedicated to image exploration at attribute, image, and pixel scales. The second is a histogram-based attribute cross filter with which users can extract desired subset patterns from data. The third is an attribute projection visualization designed for capturing global attribute correlations. We demonstrate our framework by way of a case study involving a real-world material scattering dataset. We show that our system can efficiently explore large-scale x-ray images, accurately identify preferred image patterns, anomalous images and erroneous experimental settings, and effectively advance the comprehension of material nanostructure properties.


arXiv: Distributed, Parallel, and Cluster Computing | 2017

Building near-real-time processing pipelines with the spark-MPI platform

Nikolay Malitsky; Aashish Chaudhary; Sébastien Jourdain; Matt Cowan; Patrick O’Leary; Marcus D. Hanwell; Kerstin Kleese van Dam

Advances in detectors and computational technologies provide new opportunities for applied research and the fundamental sciences. Concurrently, dramatic increases in the three V’s (Volume, Velocity, and Variety) of experimental data and the scale of computational tasks have produced a demand for new real-time processing systems at experimental facilities. Recently, this demand was addressed by the Spark-MPI approach, which connects the Spark data-intensive platform with the MPI high-performance framework. In contrast with existing data management and analytics systems, Spark introduced a new middleware based on resilient distributed datasets (RDDs), which decoupled various data sources from high-level processing algorithms. The RDD middleware significantly advanced the scope of data-intensive applications, spreading from SQL queries to machine learning to graph processing. Spark-MPI further extended the Spark ecosystem with MPI applications using the Process Management Interface. The paper explores this integrated platform within the context of online ptychographic and tomographic reconstruction pipelines.


2017 New York Scientific Data Summit (NYSDS) | 2017

Parallelizing x-ray photon correlation spectroscopy software tools using python multiprocessing

Sameera K. Abeykoon; Meifeng Lin; Kerstin Kleese van Dam

Third-generation synchrotron facilities, designed to deliver highly intense and bright X-ray beams, together with new area detectors capable of achieving high dynamic ratios and fast frame rates, have enabled novel coherent X-ray scattering experiments. X-ray Photon Correlation Spectroscopy is one such technique; it measures nano- and mesoscale dynamics in materials. The scikit-beam Python analysis library developed at the National Synchrotron Light Source-II at Brookhaven National Laboratory contains a serial version of X-ray Photon Correlation Spectroscopy software tools to perform streaming analysis of structural dynamics of materials, which can be time consuming given the anticipated fast data rates and high image resolutions at the National Synchrotron Light Source-II. Therefore, it is essential to parallelize these data analysis tools to achieve the best performance on the available workstations that contain multi-core processors. In this paper, we report the progress that we have made in using the Python multiprocessing module to parallelize the time-correlation functions in scikit-beam. We compare the results from different multiprocessing approaches and discuss the pros and cons associated with each method.
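
As a rough illustration of the general idea in the abstract above, the following minimal sketch distributes a simplified time-correlation calculation across worker processes with the Python multiprocessing module. It is not the scikit-beam implementation and not the approach evaluated in the paper: the function names (g2_single, g2_serial, g2_parallel), the plain normalization g2(lag) = <I(t) I(t + lag)> / <I>^2, and the choice to parallelize over independent regions of interest are all assumptions made for illustration. Intensities are assumed to have a non-zero mean, and every lag is assumed to be shorter than the series.

```python
from multiprocessing import Pool


def g2_single(args):
    """Simplified one-time correlation for one intensity time series.

    `args` is a (series, lags) tuple so the function can be used with Pool.map.
    Hypothetical helper: this is not scikit-beam's multi-tau correlator.
    """
    series, lags = args
    mean_sq = (sum(series) / len(series)) ** 2
    result = []
    for lag in lags:
        n = len(series) - lag
        # Average of I(t) * I(t + lag) over all valid start times t.
        corr = sum(series[t] * series[t + lag] for t in range(n)) / n
        result.append(corr / mean_sq)
    return result


def g2_serial(all_series, lags):
    """Reference version: process each region of interest one after another."""
    return [g2_single((series, lags)) for series in all_series]


def g2_parallel(all_series, lags, processes=2):
    """Process each region of interest in a separate worker task."""
    with Pool(processes=processes) as pool:
        return pool.map(g2_single, [(series, lags) for series in all_series])


if __name__ == "__main__":
    # Two made-up regions of interest: one fluctuating, one constant.
    rois = [[1.0, 2.0, 1.0, 2.0, 1.0, 2.0], [3.0] * 6]
    lags = [0, 1, 2]
    print("serial:  ", g2_serial(rois, lags))
    print("parallel:", g2_parallel(rois, lags))
```

Splitting the work by region of interest keeps each task independent, so no shared state is needed between processes; whether this split or another one (for example, by lag or by frame block) performs best depends on the data and is not settled by this sketch.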

Collaboration

Top Co-Authors

Eric G. Stephan (Pacific Northwest National Laboratory)
Ilkay Altintas (University of California)
Bibi Raju (Pacific Northwest National Laboratory)
Cong Xie (Stony Brook University)
Todd O. Elsethagen (Pacific Northwest National Laboratory)
Wei Xu (Brookhaven National Laboratory)
Alok Singh (University of California)
Darren J. Kerbyson (Pacific Northwest National Laboratory)
Malachi Schram (Pacific Northwest National Laboratory)