Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Valerie Hendrix is active.

Publication


Featured research published by Valerie Hendrix.


International Conference on e-Science | 2014

Experiences with User-Centered Design for the Tigres Workflow API

Lavanya Ramakrishnan; Sarah S. Poon; Valerie Hendrix; Daniel K. Gunter; Gilberto Pastorello; Deborah A. Agarwal

Scientific data volumes have been growing exponentially. This has resulted in the need for new tools that enable users to operate on and analyze data. Cyberinfrastructure tools, including workflow tools, developed in the last few years have often fallen short of user needs and suffered from a lack of wider adoption. The User-Centered Design (UCD) process has been used as an effective approach to develop usable software with high adoption rates. However, UCD has largely been applied to user interfaces, and there has been limited work applying UCD to application program interfaces and cyberinfrastructure tools. We use an adapted version of UCD that we refer to as Scientist-Centered Design (SCD) to engage with users in the design and development of Tigres, a workflow application programming interface. Tigres provides a simple set of programming templates (e.g., sequence, parallel, split, merge) that can be used to compose and execute computational and data transformation pipelines. In this paper, we describe Tigres and discuss our experiences with the use of UCD for the initial development of Tigres. Our experience to date is that the UCD process not only resulted in better requirements gathering but also heavily influenced the architecture design and implementation details. User engagement during the development of tools such as Tigres is critical to ensure usability and increase adoption.
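The template idea the abstract describes can be illustrated in a few lines of plain Python. This is a hypothetical sketch of the *concept* only: the helper names `sequence`, `parallel`, and `merge` mirror the template vocabulary from the paper but are not the actual Tigres API.

```python
# Illustrative sketch of template-based pipeline composition.
# These helpers are hypothetical and do NOT reproduce the real Tigres API.

def sequence(tasks, data):
    """Run tasks one after another, feeding each output to the next."""
    for task in tasks:
        data = task(data)
    return data

def parallel(tasks, data):
    """Apply every task to the same input independently."""
    return [task(data) for task in tasks]

def merge(task, results):
    """Combine the outputs of a parallel stage with a single task."""
    return task(results)

# Toy pipeline: normalize the data, compute two statistics, then merge them.
normalized = sequence([lambda xs: [x / max(xs) for x in xs]], [2, 4, 8])
stats = parallel([min, max], normalized)
summary = merge(lambda rs: {"min": rs[0], "max": rs[1]}, stats)
```

The point of the sketch is that a user composes a pipeline by naming patterns (sequence, parallel, merge) rather than by wiring tasks together by hand, which is the usability argument the UCD process surfaced.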


Network Operations and Management Symposium | 2012

Scalable analysis of network measurements with Hadoop and Pig

Taghrid Samak; Daniel K. Gunter; Valerie Hendrix

The deployment of ubiquitous distributed monitoring infrastructure such as perfSONAR is greatly increasing the availability and quality of network performance data. Cross-cutting analyses are now possible that can detect anomalies and provide real-time automated alerts to network management services. However, scaling these analyses to the volumes of available data remains a difficult task. Although there is significant research into offline analysis techniques, most of these approaches do not address the systems and scalability issues. This work presents an analysis framework incorporating industry best practices and tools to perform large-scale analyses. Our framework integrates the expressiveness of Pig, the scalability of Hadoop, and the analysis and visualization capabilities of R to achieve a significant increase in both speed and power of analysis. Evaluation of our framework on a large dataset of real measurements from perfSONAR demonstrates a large speedup and novel statistical capabilities.
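The Pig layer in such a framework essentially expresses group-and-aggregate operations over measurement records. A minimal Python sketch of the same logic, with made-up perfSONAR-like fields (the record layout and host names are illustrative assumptions, not the paper's actual schema):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical throughput records: (source host, destination host, Mbps).
records = [
    ("hostA", "hostB", 940.0),
    ("hostA", "hostB", 910.0),
    ("hostA", "hostC", 480.0),
    ("hostA", "hostC", 620.0),
]

# Pure-Python equivalent of a Pig GROUP BY (src, dst) with AVG(throughput).
groups = defaultdict(list)
for src, dst, mbps in records:
    groups[(src, dst)].append(mbps)

averages = {pair: mean(vals) for pair, vals in groups.items()}
```

In Pig, the same computation is a two-line script that Hadoop parallelizes across the cluster, which is where the scalability claim comes from.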


Remote Sensing | 2016

Global surface net-radiation at 5 km from MODIS Terra

Manish Verma; Joshua B. Fisher; Kaniska Mallick; Youngryel Ryu; Hideki Kobayashi; Alexandre Guillaume; Gregory Moore; Lavanya Ramakrishnan; Valerie Hendrix; Sebastian Wolf; Munish Sikka; Gerard Kiely; Georg Wohlfahrt; Bert Gielen; Olivier Roupsard; Piero Toscano; M. Altaf Arain; Alessandro Cescatti

Reliable and fine resolution estimates of surface net-radiation are required for estimating latent and sensible heat fluxes between the land surface and the atmosphere. However, fine resolution estimates of net-radiation are not currently available, and consequently it is challenging to develop multi-year estimates of evapotranspiration at scales that can capture land surface heterogeneity and are relevant for policy and decision-making. We developed and evaluated a global net-radiation product at 5 km and 8-day resolution by combining mutually consistent atmosphere and land data from the Moderate Resolution Imaging Spectroradiometer (MODIS) on board Terra. Comparison with net-radiation measurements from 154 globally distributed sites (414 site-years) from the FLUXNET and Surface Radiation Budget Network (SURFRAD) showed that the net-radiation product agreed well with measurements across seasons and climate types in the extratropics (Willmott's index ranged from 0.74 for boreal to 0.63 for Mediterranean sites). The mean absolute deviation between the MODIS and measured net-radiation ranged from 38.0 ± 1.8 W·m−2 in boreal to 72.0 ± 4.1 W·m−2 in tropical climates. The mean bias was small, constituting only 11%, 0.7%, 8.4%, 4.2%, 13.3%, and 5.4% of the mean absolute error in daytime net-radiation in boreal, Mediterranean, temperate-continental, temperate, semi-arid, and tropical climates, respectively. To assess the accuracy of the broader spatiotemporal patterns, we upscaled the error-quantified MODIS net-radiation and compared it with estimates from the coarse spatial (1° × 1°) but high temporal resolution gridded net-radiation product from the Clouds and the Earth's Radiant Energy System (CERES). Our estimates agreed closely with the net-radiation estimates from CERES: the difference between the two was less than 10 W·m−2 over 94% of the total land area. The MODIS net-radiation product will be a valuable resource for the science community studying turbulent fluxes and the energy budget at the Earth's surface.
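The agreement statistics quoted in the abstract follow standard formulas; as a sketch, Willmott's index of agreement is d = 1 − Σ(P−O)² / Σ(|P−Ō| + |O−Ō|)², where O are observed and P predicted values and Ō is the observed mean. The sample net-radiation values below are invented for illustration and are not the paper's data.

```python
def willmott_index(obs, pred):
    """Willmott's index of agreement (0 = no agreement, 1 = perfect)."""
    o_mean = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - o_mean) + abs(o - o_mean)) ** 2
              for o, p in zip(obs, pred))
    return 1.0 - num / den

def mean_absolute_deviation(obs, pred):
    """Mean absolute difference between predicted and observed values."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)

# Toy tower-vs-satellite net-radiation values in W/m^2 (illustrative only).
observed = [120.0, 150.0, 90.0, 200.0]
predicted = [110.0, 160.0, 100.0, 190.0]
d = willmott_index(observed, predicted)
mad = mean_absolute_deviation(observed, predicted)
```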


Future Generation Computer Systems | 2014

CAMP: Community Access MODIS Pipeline

Valerie Hendrix; Lavanya Ramakrishnan; Youngryel Ryu; Catharine van Ingen; Keith Jackson; Deborah A. Agarwal

The Moderate Resolution Imaging Spectroradiometer (MODIS) instrument's land and atmosphere data are important to many scientific analyses that study processes at both local and global scales. The Terra and Aqua MODIS satellites acquire data of the entire Earth's surface every one or two days in 36 spectral bands. MODIS data provide information that complements many ground-based observations but are extremely critical when studying global phenomena such as gross photosynthesis and evapotranspiration. However, data procurement and processing can be challenging and cumbersome due to the volume of the data and the scale of the analyses. For example, the very first step in MODIS data processing is to ensure that all products are in the same resolution and coordinate system. The reprojection step involves a complex inverse gridding algorithm and requires downloading tens of thousands of files for a single year, which is often infeasible to perform on a scientist's desktop. Thus, the use of large-scale resource environments such as high performance computing (HPC) environments is becoming crucial for processing MODIS data. However, HPC environments have traditionally been used for tightly coupled applications and present several challenges for managing data-intensive pipelines. We have developed a data-processing pipeline that downloads the MODIS swath products and reprojects the data to a sinusoidal system on an HPC system. The 10-year archive of the reprojected data generated using the pipeline is made available through a web portal. In this paper, we detail a system architecture (CAMP) to manage the lifecycle of MODIS data, including procurement, storage, processing and dissemination. Our system architecture was developed in the context of the MODIS reprojection pipeline but is extensible to other analyses of MODIS data. Additionally, our work provides a framework and valuable experiences for future developments and deployments of data-intensive pipelines from other scientific domains on HPC systems.
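The reprojection step the abstract mentions maps latitude/longitude to the MODIS sinusoidal grid. The forward formulas are x = R·λ·cos(φ) and y = R·φ, with angles in radians; the sketch below uses the sphere radius conventionally associated with the MODIS sinusoidal grid (6371007.181 m), which should be treated as an assumption rather than a detail from this paper.

```python
import math

R = 6371007.181  # sphere radius (m) commonly used for the MODIS sinusoidal grid

def to_sinusoidal(lat_deg, lon_deg):
    """Forward sinusoidal projection: lat/lon in degrees -> (x, y) in metres."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return R * lon * math.cos(lat), R * lat

def from_sinusoidal(x, y):
    """Inverse projection, as needed when gridding swath pixels."""
    lat = y / R
    lon = x / (R * math.cos(lat))
    return math.degrees(lat), math.degrees(lon)

# Round-trip an arbitrary sample point.
x, y = to_sinusoidal(40.0, -105.0)
lat_back, lon_back = from_sinusoidal(x, y)
```

The actual CAMP pipeline performs the harder inverse-gridding problem (locating swath pixels for each grid cell) at scale; this sketch only shows the coordinate mapping underneath it.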


Workflows in Support of Large-Scale Science | 2014

Combining workflow templates with a shared space-based execution model

Javier Rojas Balderrama; Matthieu Simonin; Lavanya Ramakrishnan; Valerie Hendrix; Christine Morin; Deborah A. Agarwal; Cédric Tedeschi

The growth of scientific data has made data analysis a critical step in the scientific process. The next-generation scientific data analysis environment needs to address two challenges: (i) productivity of the end user and (ii) scalability of the workflows. Meeting both goals requires us to revisit the design and implementation of workflow tools. In this paper, we study the interaction of Tigres and HOCL-TS towards meeting these goals. Tigres and HOCL-TS have evolved separately; however, their complementary foci allow us to study these issues in greater detail. We describe the pros and cons of an approach that integrates Tigres and HOCL-TS, along with an HOCL-TS extension supporting common non-functional requirements, such as logging and monitoring, that can be made available to users through the Tigres API.


Cluster Computing and the Grid | 2016

Tigres Workflow Library: Supporting Scientific Pipelines on HPC Systems

Valerie Hendrix; James Fox; Devarshi Ghoshal; Lavanya Ramakrishnan

The growth in scientific data volumes has resulted in the need for new tools that enable users to operate on and analyze data on large-scale resources. In the last decade, a number of scientific workflow tools have emerged. These tools often target distributed environments and often need expert help to compose and execute the workflows. Data-intensive workflows are often ad hoc; they involve an iterative development process that includes users composing and testing their workflows on desktops and scaling up to larger systems. In this paper, we present the design and implementation of Tigres, a workflow library that supports the iterative development cycle of data-intensive workflows. Tigres provides an application programming interface to a set of programming templates (i.e., sequence, parallel, split, merge) that can be used to compose and execute computational and data pipelines. We discuss the results of our evaluation of scientific and synthetic workflows, showing that Tigres performs with minimal template overheads (a mean of 1.3 seconds over all experiments). We also discuss various factors (e.g., I/O performance, execution mechanisms) that affect the performance of scientific workflows on HPC systems.
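The parallel template in particular maps naturally onto Python's standard executor model. The sketch below is a hedged illustration of running one stage's tasks concurrently and collecting results in input order; it is not Tigres internals, and the function name is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel(task, inputs, max_workers=4):
    """Apply one task to many inputs concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs, which is
        # what a workflow's downstream merge stage typically expects.
        return list(pool.map(task, inputs))

# Toy parallel stage: square each input chunk.
results = run_parallel(lambda n: n * n, [1, 2, 3, 4])
```

A library like this can swap the executor (threads, processes, batch jobs) behind the same template call, which is one way to support the desktop-to-HPC scaling path the paper describes.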


IEEE International Conference on Cloud Computing Technology and Science | 2014

Storage and Data Life Cycle Management in Cloud Environments with FRIEDA

Lavanya Ramakrishnan; Devarshi Ghoshal; Valerie Hendrix; Eugen Feller; Pradeep Kumar Mantha; Christine Morin

Infrastructure as a Service (IaaS) clouds provide a composable environment that is attractive for mid-range, high-throughput and data-intensive scientific workloads. However, the flexibility of IaaS clouds presents unique challenges for storage and data management in these environments. Users currently rely on manual and/or ad hoc methods to manage storage selection, storage configuration and data management. We address these challenges with a novel approach to storage and data life cycle management through FRIEDA (Flexible Robust Intelligent Elastic Data Management), an application-specific storage and data management framework for composable infrastructure environments.


20th International Conference on Computing in High Energy and Nuclear Physics (CHEP), October 14-18, 2013, Amsterdam, Netherlands | 2014

Using Puppet to contextualize computing resources for ATLAS analysis on Google Compute Engine

Henrik Ohman; S. Panitkin; Valerie Hendrix

With the advent of commercial as well as institutional and national clouds, new opportunities for on-demand computing resources become available to the HEP community. The new cloud technologies also come with new challenges, and one such challenge is the contextualization of computing resources with regard to the requirements of the user and their experiment. In particular, on Google's new cloud platform, Google Compute Engine (GCE), uploading users' virtual machine images is not possible. This precludes the application of ready-to-use technologies like CernVM and forces users to build and contextualize their own VM images from scratch. We investigate the use of Puppet to facilitate contextualization of cloud resources on GCE, with particular regard to ease of configuration and dynamic resource scaling.


Concurrency and Computation: Practice and Experience | 2017

Web-based visual data exploration for improved radiological source detection

Gunther H. Weber; Mark S. Bandstra; Daniel H. Chivers; Hamdy Elgammal; Valerie Hendrix; John Kua; Jonathan S. Maltz; Krishna Muriki; Yeongshnn Ong; Kai Song; Michael J. Quinlan; Lavanya Ramakrishnan; Brian J. Quiter

Radiation detection can provide a reliable means of detecting radiological material. Such capabilities can help to prevent nuclear and/or radiological attacks, but reliable detection in uncontrolled surroundings requires algorithms that account for environmental background radiation. The Berkeley Data Cloud (BDC) facilitates the development of such methods by providing a framework to capture, store, analyze, and share data sets. In the era of big data, both the size and variety of data make it difficult to explore and find data sets of interest and to manage the data. Thus, in the context of big data, visualization is critical for checking data consistency and validity, identifying gaps in data coverage, searching for data relevant to an analyst's use cases, and choosing input parameters for analysis. Downloading the data and exploring it on an analyst's desktop using traditional tools is no longer feasible due to the size of the data. This paper describes the design and implementation of a visualization system that addresses the problems associated with data exploration within the context of the BDC. The visualization system is based on a JavaScript front end communicating via REST with a back-end web server.


Archive | 2017

Berkeley Nuclear Data Cloud

Mark S. Bandstra; Joshua Boverhof; Daniel H. Chivers; Shreyas Cholia; Hamdy Elgammal; Valerie Hendrix; Joshi Tenzing; John Kua; Benson Ma; Jonathan S. Maltz; Krishna Muriki; Yeongshnn Ong; Michael J. Quinlan; Brian J. Quiter; Lavanya Ramakrishnan; Kai Song; Gunther H. Weber

Collaboration


Dive into Valerie Hendrix's collaborations.

Top Co-Authors

Lavanya Ramakrishnan, Lawrence Berkeley National Laboratory
Devarshi Ghoshal, Indiana University Bloomington
Deborah A. Agarwal, Lawrence Berkeley National Laboratory
Brian J. Quiter, Lawrence Berkeley National Laboratory
Daniel H. Chivers, Lawrence Berkeley National Laboratory
Daniel K. Gunter, Lawrence Berkeley National Laboratory
Gunther H. Weber, Lawrence Berkeley National Laboratory
Hamdy Elgammal, Lawrence Berkeley National Laboratory
John Kua, Lawrence Berkeley National Laboratory
Jonathan S. Maltz, Lawrence Berkeley National Laboratory