Arcot Rajasekar

University of North Carolina at Chapel Hill

Publication

Featured research published by Arcot Rajasekar.

Environmental Modelling and Software | 2016

Using a data grid to automate data preparation pipelines required for regional-scale hydrologic modeling

Mirza M. Billah; Jonathan L. Goodall; Ujjwal Narayan; Bakinam T. Essawy; Venkat Lakshmi; Arcot Rajasekar; Reagan Moore

Modeling a regional-scale hydrologic system introduces major data challenges related to the access and transformation of heterogeneous datasets into the information needed to execute a hydrologic model. These data preparation activities are difficult to automate, making the reproducibility and extensibility of model simulations conducted by others difficult or even impossible. This study addresses this challenge by demonstrating how the integrated Rule Oriented Data Management System (iRODS) can be used to support data processing pipelines needed when using data-intensive models to simulate regional-scale hydrologic systems. Focusing on the Variable Infiltration Capacity (VIC) model as a case study, data preparation steps are sequenced using rules within iRODS. VIC and iRODS are applied to study hydrologic conditions in the Carolinas, USA during the period 1998-2007 to better understand impacts of drought within the region. The application demonstrates how iRODS can support hydrologic modelers to create more reproducible and extensible model-based analyses. An approach for data processing to support hydrologic modeling is presented. The approach uses federated data grids and server-side data processing. The approach is demonstrated using the Variable Infiltration Capacity (VIC) model. The demonstration focuses on simulating drought in the Carolinas, USA.

Earth and Space Science | 2016

Server-side workflow execution using data grid technology for reproducible analyses of data-intensive hydrologic systems

Bakinam T. Essawy; Jonathan L. Goodall; Hao Xu; Arcot Rajasekar; James D. Myers; Tracy A. Kugler; Mirza M. Billah; Reagan Moore

Many geoscience disciplines utilize complex computational models for advancing understanding and sustainable management of Earth systems. Executing such models and their associated data preprocessing and postprocessing routines can be challenging for a number of reasons including (1) accessing and preprocessing the large volume and variety of data required by the model, (2) postprocessing large data collections generated by the model, and (3) orchestrating data processing tools, each with unique software dependencies, into workflows that can be easily reproduced and reused. To address these challenges, the work reported in this paper leverages the Workflow Structured Object functionality of the Integrated Rule-Oriented Data System and demonstrates how it can be used to access distributed data, encapsulate hydrologic data processing as workflows, and federate with other community-driven cyberinfrastructure systems. The approach is demonstrated for a study investigating the impact of drought on populations in the Carolinas region of the United States. The analysis leverages computational modeling along with data from the Terra Populus project and data management and publication services provided by the Sustainable Environment-Actionable Data project. The work is part of a larger effort under the DataNet Federation Consortium project that aims to demonstrate data and computational interoperability across cyberinfrastructure developed independently by scientific communities.

IEEE International Symposium on Policies for Distributed Systems and Networks | 2011

Demonstration of Policy-Guided Data Preservation Using iRODS

Mike C. Conway; Reagan Moore; Arcot Rajasekar; Jean-Yves Nief

Policy-oriented data management is an emerging field that provides automation of large collection management using policies encoded as machine-executable rules. Digital archives provide long term preservation of collections that can span decades across multiple technology refreshes. Policy-based data management is critical for digital archives because of the need to maintain integrity across long spans of time. The management of preservation policies for large collections can be an onerous task. Typical policies may control ingest of records, extraction of required provenance information, replication of records, retention and disposition, and validation of assessment criteria. The iRODS (the integrated Rule-Oriented Data System) is an exemplar that provides distributed data management under a policy-oriented framework that can be applied to long-term data archiving. We will present a demonstration of the preservation environment, including the composition of preservation policies and automated enforcement of policies. A user interface framework will be demonstrated that manages ingest of records into a staging area, automates application of preservation policies, and tracks status of the processing steps.

International Conference on Intelligent Systems, Modelling and Simulation | 2010

Applying Rules as Policies for Large-Scale Data Sharing

Arcot Rajasekar; Reagan Moore; Mike Wan; Wayne Schroeder; Adil Hasan

Large scientific projects need collaborative data sharing environments. For projects like the Ocean Observatories Initiative (OOI), the Temporal Dynamics of Learning Center (TDLC) and the Large Synoptic Survey Telescope (LSST), the amount of data collected will be on the order of petabytes, stored across distributed heterogeneous resources under multiple administrative organizations. Policy-oriented data management is essential in such collaborations. The integrated Rule-Oriented Data System (iRODS) is a peer-to-peer, federated server-client architecture that uses a distributed rule engine for data management to apply policies encoded as rules. The rules are triggered on data management events (ingestion, access, modifications, annotations, format conversion, etc.) as well as periodically (to check integrity of the data collections, intelligent data archiving and placement, load balancing, etc.). Rules are applied by system administrators (e.g. for resource creation, user management, etc.) and by individual users, groups and data providers to tailor the sharing and access of data for their own needs. In this paper, we discuss the architecture of the iRODS middleware system and some of the applications of the software.

Collaboration Technologies and Systems | 2009

Universal view and open policy: Paradigms for collaboration in data grids

Arcot Rajasekar; Reagan Moore; Michael Wan; Wayne Schroeder

Large-scale Data Grid Systems (LDGS) facilitate collaborative sharing of large collections (petabytes and hundreds of millions of objects) containing files, databases and data streams that are geographically distributed across heterogeneous resources and multiple administrative domains. LDGS provide a “universal view” of the distributed data, resources, users and methods and hide the idiosyncrasies and the heterogeneity of the underlying infrastructure and protocols, enhancing user collaborations. To improve transparency, an “open policy” system is needed by which data providers and administrators can describe the exact processes and policies that implement LDGS services. We consider policies and processes as the essential defining characteristics of a productive LDGS collaboration. We have implemented an LDGS, called the integrated Rule-Oriented Data System (iRODS), which provides a universal view while enabling an open policy environment for publishing descriptions of the available services. The open policy environment is supported by a distributed workflow/rule engine. The services are encoded as rules in a high-level workflow language that transparently describes the underlying functionality. Well-defined semantics are used to control the composition of the workflow functions, called micro-services, to map to the desired client-level actions. In this paper, we describe the iRODS system from the “universal view” and “open policy” perspective and show its scalability for managing more than 10 million files.

Proceedings of the 2010 Roadmap for Digital Preservation Interoperability Framework Workshop | 2010

iRODS policy sets as standards for preservation

Reagan Moore; Arcot Rajasekar; Mike Wan

Digital data have a life cycle, of which preservation represents one stage. A traditional life cycle corresponds to creation of a collection, sharing of data, publication of data, and preservation of data. Data management at each stage can be represented by a set of policies and procedures that represent the consensus of a broad user community. In this view, preservation corresponds to the policies and procedures that are required for future users to understand the properties that have been maintained within a collection. Policy-based data management systems support policy evolution, making it possible to include preservation policies within a standard framework that is capable of supporting the evolution of policies across all stages of the data life cycle. An example is the iRODS integrated Rule-Oriented Data System, http://irods.diceresearch.org.

GREE '14 Proceedings of the 2014 Third GENI Research and Educational Experiment Workshop | 2014

A Framework for Integration of Rule-Oriented Data Management Policies with Network Policies

Shu Huang; Hao Xu; Yufeng Xin; Leasa Brieger; Reagan Moore; Arcot Rajasekar

Traditionally, data management software running on top of the Internet has had very limited primitives for interacting with the networking layer. This limitation has become a major roadblock to developing next-generation data management applications that require high bandwidth and dynamic network configuration. In this work, we present a policy-driven software framework that acts as an adaptation layer between the data management software and SDN networks. This framework allows a tight coupling between the data grid and the network and therefore makes complex workflow-like cross-layer computation possible. We have prototyped this adaptation layer integrated with iRODS, a popular policy-driven data grid software, and Floodlight, a popular OpenFlow controller, and demonstrate how network policies become part of the overall data grid policies to improve application performance.

International Conference on Social Computing | 2013

Sociometric Methods for Relevancy Analysis of Long Tail Science Data

Arcot Rajasekar; Sharlini Sankaran; Howard Lander; Thomas M. Carsey; Jonathan Crabtree; Mercè Crosas; Gary King; Hye-Chung Kum; Justin Zhan

As the push towards electronic storage, publication, curation, and discoverability of research data collected in multiple research domains has grown, so too have the massive numbers of small to medium datasets that are highly distributed and not easily discoverable - a region of data that is sometimes referred to as the long tail of science. The rapidly increasing, sheer volume of these long tail data presents one aspect of the Big Data problem: how does one more easily access, discover, use, and reuse long tail data to lead to new multidisciplinary collaborative research and scientific advancement? In this paper, we describe Data Bridge, a new e-science collaboration environment that will realize the potential of long tail data by implementing algorithms and tools to more easily enable data discoverability and reuse. Data Bridge will define different types of semantic bridges that link diverse datasets by applying a set of sociometric network analysis (SNA) and relevance algorithms. We will measure relevancy by examining different ways datasets can be related to each other: data to data, user to data, and method to data connections. Through analysis of metadata and ontology, by pattern analysis and feature extraction, through usage tools and models, and via human connections, Data Bridge will create an environment for long tail data that is greater than the sum of its parts. In the project's initial phase, we will test and validate the new tools with real-world data contained in the Dataverse Network, the largest social science data repository. In this short paper, we discuss the background and vision for the Data Bridge project, and present an introduction to the proposed SNA algorithms and analytical tools that are relevant for discoverability of long tail science data.

2013 Workshop Series on Big Data Benchmarking, WBDB.cn and WBDB.us. | 2013

Big Data Operations: Basis for Benchmarking a Data Grid

Arcot Rajasekar; Reagan Moore; Shu Huang; Yufeng Xin

Data Operations over the wide area network are very complex. The end-to-end implementations vary significantly in their efficiency, failure recovery and transactional management. Benchmarking for these operations is vital as we go forward given the exponential growth in data size. The critical evaluation of the types of data operations performed within large-scale data management systems and the comparison of the efficiency of the operations across implementations is an appropriate topic for benchmarking in a big data framework. In this paper, we identify the various operations that are important in large-scale data management and discuss a few of these in terms of data grid benchmarking. These operations form a set of core abstractions that can define interactions with big data systems by domain-centric scientific or business workflow applications. We chose these operational abstractions from our experience in dealing with large-scale distributed systems and with data-intensive computation.

International Geoscience and Remote Sensing Symposium | 2010

Cyber infrastructure for Community Remote Sensing

Arcot Rajasekar; Reagan Moore; Mike Wan; Wayne Schroeder

Community Remote Sensing (CRS) is an emerging field where information is collected about the environment by the general public and then integrated into collections to provide a holistic view of the environment with local details. We argue the need for a common architecture for the cyber-infrastructure that will be necessary to cater to the needs of Community Remote Sensing systems. We identify the challenges that such a cyber-infrastructure (CRS-CI) has to meet and propose five principles as solutions to meet these challenges. Finally, we describe the integrated Rule-Oriented Data System, a data grid middleware built upon these principles, which provides an exemplar implementation for CRS-CI.
