Catharine van Ingen
Microsoft
Publication
Featured research published by Catharine van Ingen.
Global Biogeochemical Cycles | 2011
Youngryel Ryu; Dennis D. Baldocchi; Hideki Kobayashi; Catharine van Ingen; Jie Li; T. Andy Black; Jason Beringer; Eva van Gorsel; Alexander Knohl; Beverly E. Law; Olivier Roupsard
BESS estimates showed linear relations with measurements of solar irradiance (r² = 0.95, relative bias: 8%), gross primary productivity (r² = 0.86, relative bias: 5%) and evapotranspiration (r² = 0.86, relative bias: 15%) in data from 33 flux towers that cover seven plant functional types across arctic to tropical climatic zones. A sensitivity analysis revealed that the gross primary productivity and evapotranspiration computed in BESS were most sensitive to leaf area index and solar irradiance, respectively. We quantified the mean global terrestrial estimates of gross primary productivity and evapotranspiration between 2001 and 2003 as 118 ± 26 Pg C yr⁻¹ and 500 ± 104 mm yr⁻¹ (equivalent to 63,000 ± 13,100 km³ yr⁻¹), respectively. BESS-derived gross primary productivity and evapotranspiration estimates were consistent with the estimates from independent machine-learning, data-driven products, but the process-oriented structure has the advantage of diagnosing the sensitivity of mechanisms. The process-based BESS is able to offer gridded biophysical variables everywhere from local to global land scales at an 8-day interval over multiple years.
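The site-level validation quoted above rests on two statistics, r² and relative bias, computed between model estimates and flux-tower measurements. The following is a minimal sketch of that comparison; the arrays hold invented stand-in values, not FLUXNET or BESS data.

```python
# Sketch of the validation statistics cited above: r-squared of a linear
# fit and relative bias between modeled and observed values. The GPP
# numbers below are illustrative placeholders, not tower records.
import numpy as np

def r_squared(observed, modeled):
    """Coefficient of determination of a least-squares linear fit."""
    slope, intercept = np.polyfit(observed, modeled, 1)
    predicted = slope * observed + intercept
    ss_res = np.sum((modeled - predicted) ** 2)
    ss_tot = np.sum((modeled - np.mean(modeled)) ** 2)
    return 1.0 - ss_res / ss_tot

def relative_bias(observed, modeled):
    """Mean model-minus-observation difference, relative to the mean observation."""
    return (np.mean(modeled) - np.mean(observed)) / np.mean(observed)

# Hypothetical 8-day GPP values (g C m^-2 d^-1) at one tower.
tower = np.array([1.2, 3.4, 5.1, 7.8, 6.3, 4.0, 2.2])
model = np.array([1.0, 3.6, 5.5, 8.1, 6.0, 4.3, 2.0])

print(f"r² = {r_squared(tower, model):.2f}, "
      f"relative bias = {relative_bias(tower, model):+.1%}")
```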
international conference on cloud computing | 2010
Yogesh Simmhan; Catharine van Ingen; Girish Subramanian; Jie Li
The widely discussed scientific data deluge creates a need to computationally scale out eScience applications beyond the local desktop and cope with variable loads over time. Cloud computing offers a scalable, economic, on-demand model well matched to these needs. Yet cloud computing creates gaps that must be crossed to move existing science applications to the cloud. In this article, we propose a Generic Worker framework to deploy and invoke science applications in the cloud with minimal user effort and predictable, cost-effective performance. Our framework addresses three distinct challenges posed by the cloud: the complexity of application deployment, invocation of cloud applications from desktop clients, and efficient transparent data transfers across the desktop and the cloud. We present an implementation of the Generic Worker for the Microsoft Azure Cloud and evaluate its use for a genomics application. Our evaluation shows that the user complexity to port and scale the application is substantially reduced while introducing a negligible performance overhead of less than 5% for the genomics application when scaling to 20 VM instances.
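The core pattern is a worker that pulls tasks from a queue, stages inputs, runs the unmodified science executable, and publishes outputs. Below is a minimal local sketch of that loop; the real framework targets Azure queues and blob storage, for which queue.Queue and the local filesystem stand in here, and the task fields and executable name are invented for illustration.

```python
# Local sketch of the Generic Worker pattern: poll a task queue, stage
# inputs, shell out to an unmodified application, publish the outputs.
import queue
import shutil
import subprocess
from pathlib import Path

tasks = queue.Queue()
tasks.put({"exe": "blast_search", "inputs": ["genome.fa"], "output": "hits.txt"})

def generic_worker(task_queue, store=Path("cloud_store"), scratch=Path("scratch")):
    scratch.mkdir(exist_ok=True)
    while not task_queue.empty():
        task = task_queue.get()
        # Stage inputs from the (simulated) cloud store to local scratch space.
        local_inputs = []
        for name in task["inputs"]:
            local = scratch / name
            shutil.copy(store / name, local)
            local_inputs.append(str(local))
        # Invoke the unmodified application binary with its staged inputs.
        out_path = scratch / task["output"]
        subprocess.run([task["exe"], *local_inputs, "-o", str(out_path)], check=True)
        # Publish the result back to the store so desktop clients can fetch it.
        shutil.copy(out_path, store / task["output"])

# generic_worker(tasks) would drain the queue once "cloud_store" holds
# genome.fa and the blast_search binary is on the PATH.
```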
2009 Third International Conference on Advanced Engineering Computing and Applications in Sciences | 2009
Yogesh Simmhan; Roger S. Barga; Catharine van Ingen; Edward D. Lazowska; Alexander S. Szalay
Scientific workflows have gained popularity for modeling and executing in silico experiments by scientists for problem-solving. These workflows primarily engage in computation and data transformation tasks to perform scientific analysis in the Science Cloud. Increasingly, workflows are also used to manage scientific data as it arrives from external sensors and is prepared to become science-ready and available for use in the Cloud. While not directly part of the scientific analysis, these workflows operating behind the Cloud on behalf of the "data valets" play an important role in the end-to-end management of scientific data products. They share several features with traditional scientific workflows: both are data intensive and use Cloud resources. However, they also differ in significant respects, for example, in the reliability required, scheduling constraints and the use of the provenance collected. In this article, we investigate these two classes of workflows – Science Application workflows and Data Preparation workflows – and use them to derive common and distinct requirements on workflow systems for eScience in the Cloud. We use workflow examples from two collaborations, the NEPTUNE oceanography project and the Pan-STARRS astronomy project, to draw out our comparison. Our analysis of these workflow classes can guide the evolution of workflow systems to support emerging applications in the Cloud; the Trident Scientific Workbench is one workflow system that has directly benefited from this analysis to meet the needs of these two eScience projects.
Concurrency and Computation: Practice and Experience | 2013
Jane Hunter; Abdulmonem Alabri; Catharine van Ingen
The Internet, Web 2.0 and social networking technologies are enabling citizens to actively participate in 'citizen science' projects by contributing data to scientific programmes via the Web. However, the limited training, knowledge and expertise of contributors can lead to poor quality, misleading or even malicious data being submitted. Consequently, the scientific community often perceives citizen science data as not worthy of use in serious scientific research, which in turn leads to poor retention rates for volunteers. In this paper, we describe a technological framework that combines data quality improvements and trust metrics to enhance the reliability of citizen science data. We describe how online social trust models can provide a simple and effective mechanism for measuring the trustworthiness of community-generated data. We also describe filtering services that remove unreliable or untrusted data and enable scientists to confidently reuse citizen science data. The resulting software services are evaluated in the context of the CoralWatch project, a citizen science project that uses volunteers to collect comprehensive data on coral reef health.
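To make the trust-and-filter idea concrete, here is a minimal sketch: each volunteer accrues a reputation from how often past submissions agreed with expert review, and a filtering service keeps only records from sufficiently trusted contributors. The scoring rule, the 0.6 threshold, and the example records are illustrative assumptions, not the CoralWatch metrics.

```python
# Sketch of a social-trust score plus a filtering service for
# community-generated observations. All values are invented.
from dataclasses import dataclass

@dataclass
class Volunteer:
    name: str
    confirmed: int   # past submissions confirmed by experts or peers
    disputed: int    # past submissions flagged as unreliable

    def trust(self) -> float:
        # Laplace-smoothed agreement rate, so new volunteers start near 0.5.
        return (self.confirmed + 1) / (self.confirmed + self.disputed + 2)

def filter_submissions(submissions, threshold=0.6):
    """Drop records whose contributor falls below the trust threshold."""
    return [s for s in submissions if s["by"].trust() >= threshold]

alice = Volunteer("alice", confirmed=18, disputed=2)   # trust ~0.86
bob = Volunteer("bob", confirmed=1, disputed=4)        # trust ~0.29
data = [{"reef": "Heron Island", "health": 4, "by": alice},
        {"reef": "Heron Island", "health": 1, "by": bob}]
print(filter_submissions(data))  # only alice's record survives
```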
international conference on data engineering | 2009
Sebastian Michel; Ali Salehi; Liqian Luo; Nicholas Dawes; Karl Aberer; Guillermo Barrenetxea; Mathias Bavay; Aman Kansal; K. Ashwin Kumar; Suman Nath; Marc Parlange; Stewart Tansley; Catharine van Ingen; Feng Zhao; Yongluan Zhou
A sensor network data gathering and visualization infrastructure is demonstrated, comprising the Global Sensor Networks (GSN) middleware and Microsoft SensorMap. Users are invited to actively participate in the process of monitoring real-world deployments and can inspect measured data in the form of contour plots overlaid onto a high-resolution map and a digital topographic model. Users can go back in time virtually to search for interesting events or simply to visualize the temporal dependencies in the data. The system presented is not only interesting and visually enticing for non-expert users but brings substantial benefits to environmental scientists. The easily installed data acquisition component, together with the powerful data sharing and visualization platform, opens up new ground in collaborative data gathering and interpretation in the spirit of Web 2.0 applications.
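The "go back in time" feature amounts to selecting a historical window from archived readings before rendering them. A minimal sketch of that query follows; GSN exposes this through its middleware, for which a plain list of (timestamp, station, value) tuples stands in here, and the station name and values are invented.

```python
# Sketch of a time-window query over archived sensor readings,
# the kind of selection behind the demo's historical playback.
from datetime import datetime

readings = [
    (datetime(2009, 1, 14, 9, 0), "station_7", -8.2),
    (datetime(2009, 1, 14, 9, 30), "station_7", -7.9),
    (datetime(2009, 1, 15, 9, 0), "station_7", -11.4),
]

def window(data, start, end):
    """Return readings whose timestamp falls in [start, end)."""
    return [r for r in data if start <= r[0] < end]

past = window(readings, datetime(2009, 1, 14), datetime(2009, 1, 15))
print(past)  # the two Jan 14 readings, ready to overlay on the map
```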
Archive | 2012
Dario Papale; Deborah A. Agarwal; Dennis D. Baldocchi; R. B. Cook; Joshua B. Fisher; Catharine van Ingen
If I have seen further,” Sir Isaac Newton wrote to Robert Hooke in 1676, “it is by standing on the shoulders of giants.
International Journal of Agricultural and Environmental Information Systems | 2011
Jane Hunter; Peter Becker; Abdulmonem Alabri; Catharine van Ingen; Eva Abal
The Health-e-Waterways Project is a multi-disciplinary collaboration between the University of Queensland, Microsoft Research and the South East Queensland Healthy Waterways Partnership (SEQ-HWP). This project develops the underlying technological framework and set of services to enable streamlined access to the expanding collection of real-time, near-real-time and static datasets related to water resource management in South East Queensland. More specifically, the system enables water resource managers to access the datasets being captured by the various agencies participating in the SEQ-HWP Ecosystem Health Monitoring Program (EHMP). It also provides online access to the statistical data processing tools that enable users to analyse the data and generate online ecosystem report cards dynamically via a Web mapping interface. The authors examine the development of ontologies and semantic querying tools to integrate disparate datasets and relate management actions to water quality indicators for specific regions and periods. This semantic data integration approach enables scientists and resource managers to identify which actions are having an impact on which parameters and adapt the management strategies accordingly. This paper provides an overview of the semantic technologies developed to underpin the adaptive management framework that is the central philosophy behind the SEQ-HWP.
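The semantic-query idea can be sketched with RDF triples that encode management actions and water-quality observations, then queried for actions targeting an indicator that degraded in a region. The ontology terms, catchment name, and trend values below are invented for illustration; the real project defines its own vocabularies. Requires the rdflib package.

```python
# Sketch of semantic data integration: triples linking a management
# action to a water-quality indicator, queried with SPARQL.
from rdflib import Graph, Literal, Namespace

HW = Namespace("http://example.org/healthywaterways#")  # hypothetical vocabulary
g = Graph()
g.add((HW.RiparianReplanting, HW.targetsIndicator, HW.TotalNitrogen))
g.add((HW.TotalNitrogen, HW.observedIn, HW.LockyerCatchment))
g.add((HW.TotalNitrogen, HW.trend, Literal("degrading")))

q = """
SELECT ?action WHERE {
  ?action hw:targetsIndicator ?ind .
  ?ind hw:observedIn hw:LockyerCatchment ;
       hw:trend "degrading" .
}
"""
# Which actions address an indicator that is degrading in this catchment?
for row in g.query(q, initNs={"hw": HW}):
    print(row.action)  # -> ...RiparianReplanting
```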
international conference on e-science | 2009
Marty Humphrey; Deborah A. Agarwal; Catharine van Ingen
Many of today’s large-scale scientific projects attempt to collect data from a diverse set of sources. The traditional campaign-style approach to “synthesis” efforts gathers data through a single concentrated effort, and the data contributors know in advance exactly who will use their data and why. At even moderate scales, the cost and time required to find, gather, collate, normalize, and customize data in order to build a synthesis dataset can quickly outweigh the value of the resulting dataset. By explicitly identifying and addressing the different requirements for each data role (author, publisher, curator, and consumer), our data management architecture for large-scale shared scientific data enables the creation of such synthesis datasets that continue to grow and evolve with new data, data annotations, participants, and use rules. We show the effectiveness of our approach in the context of the FLUXNET Synthesis Dataset, one of the largest ongoing biogeophysical experiments.
international conference on e-science | 2009
Yogesh Simmhan; Catharine van Ingen; Alexander S. Szalay; Roger S. Barga; J. N. Heasley
The growing amount of scientific data from sensors and field observations is posing a challenge to "data valets" responsible for managing them in data repositories. These repositories, built on commodity clusters, need to reliably ingest data continuously and ensure its availability to a wide user community. Workflows provide several benefits for modeling data-intensive science applications, and many of these benefits can help manage data ingest pipelines too. But workflows are not a panacea in themselves, and data valets need to consider several issues when designing workflows that behave reliably on fault-prone hardware while retaining the consistency of the scientific data. In this paper, we propose workflow designs for reliable data ingest in a distributed environment and identify workflow framework features to support resilience. We illustrate these using the data pipeline for the Pan-STARRS repository, one of the largest digital surveys, which accumulates 100 TB of data annually to support 300 astronomers.
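Two resilience patterns such ingest designs typically lean on are bounded retries around flaky steps and idempotent steps that a crash-recovery re-run cannot corrupt or duplicate. A minimal sketch of both follows; the load function and batch identifier are placeholders, not Pan-STARRS pipeline code.

```python
# Sketch of retry-with-backoff and idempotent ingest, two building
# blocks for workflows that must survive fault-prone hardware.
import time

def with_retries(step, attempts=3, backoff_seconds=2.0):
    """Run step(); on failure, wait and retry up to the attempt limit."""
    for i in range(attempts):
        try:
            return step()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(backoff_seconds * (2 ** i))

ingested = set()  # stands in for the repository's record of completed loads

def load_into_repository(file_id):
    if file_id in ingested:      # idempotence: skip work already committed
        return "skipped"
    # ... parse, validate, and load the file here ...
    ingested.add(file_id)
    return "loaded"

print(with_retries(lambda: load_into_repository("detection_batch_0042")))
print(with_retries(lambda: load_into_repository("detection_batch_0042")))  # skipped
```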
Future Generation Computer Systems | 2014
Valerie Hendrix; Lavanya Ramakrishnan; Youngryel Ryu; Catharine van Ingen; Keith Jackson; Deborah A. Agarwal
The Moderate Resolution Imaging Spectroradiometer (MODIS) instrument's land and atmosphere data are important to many scientific analyses that study processes at both local and global scales. The Terra and Aqua MODIS satellites acquire data of the entire Earth's surface every one or two days in 36 spectral bands. MODIS data provide information that complements many ground-based observations but are critical when studying global phenomena such as gross photosynthesis and evapotranspiration. However, data procurement and processing can be challenging and cumbersome due to the volume of the data and the scale of the analyses. For example, the very first step in MODIS data processing is to ensure that all products are in the same resolution and coordinate system. The reprojection step involves a complex inverse gridding algorithm and requires downloading tens of thousands of files for a single year, which is often infeasible on a scientist's desktop. Thus, the use of large-scale resource environments such as high performance computing (HPC) systems is becoming crucial for processing MODIS data. However, HPC environments have traditionally been used for tightly coupled applications and present several challenges for managing data-intensive pipelines. We have developed a data-processing pipeline that downloads the MODIS swath products and reprojects the data to a sinusoidal system on an HPC system. The 10-year archive of the reprojected data generated using the pipeline is made available through a web portal. In this paper, we detail a system architecture (CAMP) that manages the lifecycle of MODIS data, including procurement, storage, processing and dissemination. Our system architecture was developed in the context of the MODIS reprojection pipeline but is extensible to other analyses of MODIS data. Additionally, our work provides a framework and valuable experiences for future development and deployment of data-intensive pipelines from other scientific domains on HPC systems.
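A lifecycle pipeline of this shape can be sketched as discrete download, reproject, and publish stages with a persistent record of which granules have completed each stage, so an interrupted HPC run resumes rather than restarts. The stage bodies below are placeholders (the real reprojection implements MODIS's inverse gridding onto a sinusoidal grid), and the granule name and state-file layout are illustrative assumptions, not the CAMP design.

```python
# Sketch of a resumable multi-stage pipeline: each granule's completed
# stages are checkpointed to disk so a rerun skips finished work.
import json
from pathlib import Path

STATE = Path("pipeline_state.json")

def load_state():
    return json.loads(STATE.read_text()) if STATE.exists() else {}

def save_state(state):
    STATE.write_text(json.dumps(state))

def run_stage(granule, stage, work, state):
    done = state.setdefault(granule, [])
    if stage in done:            # resume support: skip completed stages
        return
    work(granule)
    done.append(stage)
    save_state(state)            # checkpoint after every stage

def download(g):  print(f"fetch swath {g} from the archive")   # placeholder
def reproject(g): print(f"regrid {g} to sinusoidal tiles")     # placeholder
def publish(g):   print(f"expose {g} through the web portal")  # placeholder

state = load_state()
for granule in ["MOD06_L2.A2003001.0005"]:   # hypothetical granule ID
    for stage, work in [("download", download),
                        ("reproject", reproject),
                        ("publish", publish)]:
        run_stage(granule, stage, work, state)
```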