Arun Jagatheesan

University of California, San Diego

Publication

Featured research published by Arun Jagatheesan.

ieee international conference on high performance computing data and analytics | 2010

DASH: a Recipe for a Flash-based Data Intensive Supercomputer

Jiahua He; Arun Jagatheesan; Sandeep Gupta; Jeffrey L Bennett; Allan Snavely

Data intensive computing can be defined as computation involving large datasets and complicated I/O patterns. Data intensive computing is challenging because there is a five-orders-of-magnitude latency gap between main memory DRAM and spinning hard disks; the result is that an inordinate amount of time in data intensive computing is spent accessing data on disk. To address this problem we designed and built a prototype data intensive supercomputer named DASH that exploits flash-based Solid State Drive (SSD) technology and also virtually aggregated DRAM to fill the latency gap. DASH uses commodity parts including Intel® X25-E flash drives and distributed shared memory (DSM) software from ScaleMP®. The system is highly competitive with several commercial offerings by several metrics including achieved IOPS (input output operations per second), IOPS per dollar of system acquisition cost, IOPS per watt during operation, and IOPS per gigabyte (GB) of available storage. We present here an overview of the design of DASH, an analysis of its cost efficiency, and a detailed recipe for how we designed and tuned it for high data-performance. Lastly, we show that when running data-intensive scientific applications from graph theory, biology, and astronomy, we achieved as much as two orders-of-magnitude speedup compared to the same applications run on traditional architectures.

international conference on management of data | 2003

Data grid management systems

Arun Jagatheesan; Arcot Rajasekar

Data Grids are being built across the world as the next generation data handling systems to manage petabytes of inter-organizational data and storage space. A data grid (datagrid) is a logical name space consisting of storage resources and digital entities that is created by the cooperation of autonomous organizations and their users based on the coordination of local and global policies. Data Grid Management Systems (DGMSs) provide services for the confluence of organizations and management of inter-organizational data and resources in the datagrid. The objective of the tutorial is to provide an introduction to the opportunities and challenges of this emerging technology. Novices and experts would benefit from this tutorial. The tutorial would cover introduction, use cases, design philosophies, architecture, research issues, existing technologies and demonstrations. Hands-on sessions for the participants to use and get a feel for the existing technologies could be provided based on the availability of internet connections.

international symposium on autonomous decentralized systems | 2001

Brokering based self organizing e-service communities

Sumi Helal; Mei Wang; Arun Jagatheesan; Raja Krithivasan

The rapid evolution of the Internet and its business tools is enabling the transformation and deployment of business processes as highly modular e-services that can be flexibly and dynamically composed to form ad-hoc workflow. This research prepares for the proliferation of automated, Internet-based workflow, by contributing a suite of protocols for self-organizing brokering communities that enables the discovery of relevant e-services. We present protocols and architecture for e-service brokering communities, and discuss their use in workbase, an Internet-based, automated workflow system over e-services. The brokering protocols are based on a three-tier architecture of agents, brokers and superbrokers. We also present an infrastructure for dynamically composing new services from existing e-services on the Internet. An implementation of brokering communities using JKQML is provided along with the architecture and design of our e-services and workflow concepts.

international conference on e science | 2006

Production Storage Resource Broker Data Grids

Reagan Moore; Sheau Yen Chen; Wayne Schroeder; Arcot Rajasekar; Michael Wan; Arun Jagatheesan

International data grids are now being built that support joint management of shared collections. An emerging strategy is to build multiple independent data grids, each managed by the local institution. The data grids are then federated to enable controlled sharing of files. We examine the management issues associated with maintaining federations of production data grids, including management of access controls, coordinated sharing of name spaces, replication of data between data grids, and expansion of the data grid federation.

high performance distributed computing | 2004

Gridflow description, query, and execution at SCEC using the SDSC matrix

Jonathan Weinberg; Arun Jagatheesan; Allen Ding; Marcio Faerman; Yuanfang Hu

While conventional workflow systems have been around for many years, the deployment of analogous systems onto a grid infrastructure introduces a number of unique questions and challenges. Innovative approaches to grid workflow (gridflow) are needed to leverage the heterogeneity, autonomy, dynamic behavior, and wide-area distribution that characterize grid resources. The Matrix Project carries out research and development to deliver the language descriptions and protocols necessary to build collaborative gridflow management systems for the emerging grid infrastructures. We describe here our activities to date including development of the data grid language (DGL) and the usage of the matrix gridflow management system by the Southern California Earthquake Center (SCEC) to manage its gridflows.

very large data bases | 2003

Grid data management systems & services

Arun Jagatheesan; Reagan Moore; Norman W. Paton; Paul Watson

The Grid is an emerging infrastructure for providing coordinated and consistent access to distributed, heterogeneous computational and information storage resources amongst autonomous organizations. Data grids are being built across the world as the next generation data handling systems for sharing access to data and storage systems within multiple administrative domains. A data grid provides logical name spaces for digital entities and storage resources to create global identifiers that are location independent. Data grid systems provide services on the logical name space for the manipulation, management, and organization of digital entities. Databases are increasingly being used within Grid applications for data and metadata management, and several groups are now developing services for the access and integration of structured data on the Grid. The service-based approach to making data available on the Grid is being encouraged by the adoption of the Open Grid Services Architecture (OGSA), which is bringing about the integration of the Grid with Web Service technologies. The tutorial will introduce the Grid, and examine the requirements, issues and possible solutions for integrating data into the Grid. It will take examples from current systems, in particular the SDSC Storage Resource Broker and the OGSA-Database Access and Integration project.

very large data bases | 2005

Datagridflows: managing long-run processes on datagrids

Arun Jagatheesan; Jonathan Weinberg; Reena Mathew; Allen Ding; Erik Vandekieft; Daniel Moore; Reagan Moore; Lucas Gilbert; Mark Tran; Jeffrey Kuramoto

This paper is an introduction to Datagridflows. Until recently, datagrids were generally considered over-hyped and the associated technologies not widely embraced in the academic community. Today, datagrids have become a reality and an important technology for managing large, unstructured data and storage resources distributed over autonomous administrative domains. The datagrids that are operating in production give us an idea of new requirements and challenges that will be faced in future datagrid environments. One such requirement is the coordinated execution of long-run data management processes in datagrids. We term these processes “datagridflows”. This new area provides exciting opportunities and challenges to researchers in distributed computing and distributed databases. This paper is intended to introduce these challenges to other researchers, including those new to grid computing. We provide motivation through discussion of datagridflow requirements and real production scenarios. We introduce current work on datagridflow technologies including the Datagrid Language (DGL) for describing datagridflows in datagrids.

ieee international conference on high performance computing data and analytics | 2010