Chaitanya K. Baru

University of California, San Diego

Publication

Featured research published by Chaitanya K. Baru.

International Conference on Management of Data | 1999

XML-based information mediation with MIX

Chaitanya K. Baru; Amarnath Gupta; Bertram Ludäscher; Richard Marciano; Yannis Papakonstantinou; Pavel Velikhov; Vincent Chu

The MIX mediator system, MIXm, is developed as part of the MIX Project at the San Diego Supercomputer Center and the University of California, San Diego. MIXm uses XML as the common model for data exchange. Mediator views are expressed in XMAS (XML Matching And Structuring Language), a declarative XML query language. To facilitate user-friendly query formulation and for optimization purposes, MIXm employs XML DTDs as a structural description (in effect, a “schema”) of the exchanged data. The novel features of the system include:

- Data exchange and integration rely solely on XML, i.e., instance and schema information is represented by XML documents and XML DTDs, respectively. XML queries are denoted in XMAS, which builds upon ideas of languages like XML-QL, MSL, Yat, and UnQL. Additionally, XMAS features powerful grouping and order constructs for generating new integrated XML “objects” from existing ones.
- The graphical user interface BBQ (Blended Browsing and Querying) is driven by the mediator view DTD and integrates browsing and querying of XML data. Complex queries can be constructed in an intuitive way, resembling QBE. Due to the nested nature of XML data and DTDs, BBQ provides graphical means to specify the nesting and grouping of query results.
- Query evaluation can be demand-driven, i.e., driven by the user's navigation into the mediated view.

IBM Systems Journal | 1995

DB2 parallel edition

Chaitanya K. Baru; Gilles Fecteau; Ambuj Goyal; Hui-I Hsiao; Anant Jhingran; Sriram Padmanabhan; George P. Copeland; Walter G. Wilson

The rate of increase in database size and response-time requirements has outpaced advancements in processor and mass storage technology. One way to satisfy the increasing demand for processing power and input/output bandwidth in database applications is to have a number of processors, loosely or tightly coupled, serving database requests concurrently. Technologies developed during the last decade have made commercial parallel database systems a reality, and these systems have made an inroad into the stronghold of traditionally mainframe-based large database applications. This paper describes the DB2® Parallel Edition product that evolved from a prototype developed at IBM Research in Hawthorne, New York, and now is being jointly developed with the IBM Toronto laboratory.

International Conference on Management of Data | 1995

An overview of DB2 parallel edition

Chaitanya K. Baru; Gilles Fecteau

In this paper, we describe the architecture and features of DB2 Parallel Edition (PE). DB2 PE belongs to the IBM family of open DB2 client/server database products including DB2/6000, DB2/2, DB2 for HP-UX, and DB2 for the Solaris Operating Environment. DB2 PE employs a shared-nothing architecture in which the database system consists of a set of independent logical database nodes. Each logical node represents a collection of system resources, including processes, main memory, disk storage, and communications, managed by an independent database manager. The logical nodes use message passing to exchange data with each other. Tables are partitioned across nodes using a hash partitioning strategy. The cost-based parallel query optimizer takes table partitioning information into account when generating parallel plans for execution by the runtime system. A DB2 PE system can be configured to contain one or more logical nodes per physical processor. For example, the system can be configured to implement one node per processor in a shared-nothing MPP system or multiple nodes in a symmetric multiprocessor (SMP) system. This paper provides an overview of the storage model, query optimization, runtime system, utilities, and performance of DB2 Parallel Edition.
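The hash partitioning mentioned in this abstract can be illustrated with a minimal sketch. It is not DB2 PE's actual partitioning function: the names `node_for_key` and `partition_rows`, the CRC32 hash, and the sample table are all assumptions chosen for illustration.

```python
import zlib
from collections import defaultdict


def node_for_key(key, num_nodes):
    """Map a partitioning-key value to a node number (hypothetical helper).

    CRC32 is used only because it is deterministic across runs; the hash
    function used by a real system is not stated in the abstract.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def partition_rows(rows, key_column, num_nodes):
    """Assign each row to exactly one node based on its partitioning key."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for_key(row[key_column], num_nodes)].append(row)
    return dict(nodes)


# Illustrative table: eight rows spread over four logical nodes.
rows = [{"cust_id": i, "amount": 10 * i} for i in range(8)]
placement = partition_rows(rows, "cust_id", num_nodes=4)
for node, node_rows in sorted(placement.items()):
    print(node, [r["cust_id"] for r in node_rows])
```

Because the node is a pure function of the key, rows with equal key values always land on the same node, which is the kind of placement information an optimizer can use when planning joins on that key.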

Technology Conference on Performance Evaluation and Benchmarking | 2012

Setting the Direction for Big Data Benchmark Standards

Chaitanya K. Baru; Milind Bhandarkar; Raghunath Nambiar; Meikel Poess; Tilmann Rabl

The Workshop on Big Data Benchmarking (WBDB2012), held on May 8-9, 2012 in San Jose, CA, served as an incubator for several promising approaches to define a big data benchmark standard for industry. Through an open forum for discussions on a number of issues related to big data benchmarking (including definitions of big data terms, benchmark processes, and auditing), the attendees were able to extend their own view of big data benchmarking as well as communicate their own ideas, which ultimately led to the formation of small working groups to continue collaborative work in this area. In this paper, we summarize the discussions and outcomes from this first workshop, which was attended by about 60 invitees representing 45 different organizations, including industry and academia. Workshop attendees were selected based on their experience and expertise in the areas of management of big data, database systems, performance benchmarking, and big data applications. There was consensus among participants about both the need and the opportunity for defining benchmarks to capture the end-to-end aspects of big data applications. Following the model of TPC benchmarks, it was felt that big data benchmarks should not only include metrics for performance, but also price/performance, along with a sound foundation for fair comparison through audit mechanisms. Additionally, the benchmarks should consider several costs relevant to big data systems including total cost of acquisition, setup cost, and the total cost of ownership, including energy cost. The second Workshop on Big Data Benchmarking will be held in December 2012 in Pune, India, and the third meeting is being planned for July 2013 in Xi’an, China.

International Conference on Computing for Geospatial Research Applications | 2011

OpenTopography: a services oriented architecture for community access to LIDAR topography

Sriram Krishnan; Christopher J. Crosby; Viswanath Nandigam; Minh Q. Phan; Charles Cowart; Chaitanya K. Baru; J. Ramon Arrowsmith

High-resolution topography data acquired with LIDAR (Light Detection and Ranging) remote sensing technology have emerged as a fundamental tool for Earth science research. Because these acquisitions are often undertaken with federal and state funds at significant cost, it is important to maximize the impact of these geospatial data by providing online access to a range of potential users. The National Science Foundation-funded OpenTopography Facility, hosted at the San Diego Supercomputer Center (SDSC), has developed a Geospatial Cyberinfrastructure (GCI) to enable online access to Earth science-oriented high-resolution LIDAR topography data, online processing tools, and derivative products. Leveraging high performance computational and data storage resources available at SDSC, OpenTopography provides access to terabytes of point cloud data, standard digital elevation models, and Google Earth image data, all co-located with computational resources for higher-level data processing. This paper describes the motivation, goals, and technical details of the Services Oriented Architecture (SOA) and underlying cyberinfrastructure platform implemented by OpenTopography. The use of an SOA and the co-location of processing and data resources are unique to the field of LIDAR topography data processing. Together they lay a foundation for an open system that hosts and provides access to data and computational tools for these important scientific data, and they serve as an exemplar for similar community-oriented cyberinfrastructure systems for large geospatial data and processing.

IEEE Transactions on Computers | 1989

Database operations in a cube-connected multicomputer system

Chaitanya K. Baru; Ophir Frieder

Parallel architectures for database processing should incorporate parallel CPU as well as parallel I/O (disk access) capability. The need to support parallel I/O gives rise to two important issues: data combination and non-uniform data distribution. Strategies for performing database operations in a cube-connected multicomputer system with parallel I/O are presented in this paper. The cube interconnection subsumes many other structures such as the tree, ring, etc. This property is exploited to efficiently support database operations such as Select, Aggregate, Join, and Project. The strategies presented here are unique in that they account for the non-uniform distribution of data across parallel paths by incorporating data redistribution steps as part of the overall algorithm. The two main data redistribution operations used are tuple balancing and merging. A simple analysis of the join and project operations is carried out assuming non-uniform data distributions. A more detailed simulation and study of issues related to query processing will be carried out as part of future work.
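To make the cube interconnection and the idea of tuple balancing concrete, the following is a rough sketch under stated assumptions. It is not the algorithm from the paper: it uses the generic dimension-exchange scheme, in which each node evens out its tuple count with its neighbor along one cube dimension at a time. The function names `neighbors` and `balance_tuple_counts` are hypothetical.

```python
def neighbors(node, dim):
    """Nodes adjacent to `node` in a `dim`-dimensional cube.

    Two nodes are adjacent when their binary labels differ in exactly one bit.
    """
    return [node ^ (1 << d) for d in range(dim)]


def balance_tuple_counts(counts):
    """Even out per-node tuple counts by pairwise exchange, one dimension at a time.

    Assumption: dimension exchange stands in for the paper's tuple balancing,
    whose exact steps are not given in the abstract.
    """
    n = len(counts)
    dim = n.bit_length() - 1
    if n == 0 or 1 << dim != n:
        raise ValueError("number of nodes must be a power of two")
    counts = list(counts)
    for d in range(dim):
        for node in range(n):
            partner = node ^ (1 << d)
            if node < partner:  # handle each pair once
                total = counts[node] + counts[partner]
                counts[node] = (total + 1) // 2  # lower-numbered node keeps the extra tuple
                counts[partner] = total // 2
    return counts


print(neighbors(5, 3))                                 # [4, 7, 1]
print(balance_tuple_counts([8, 0, 0, 0, 0, 0, 0, 0]))  # [1, 1, 1, 1, 1, 1, 1, 1]
print(balance_tuple_counts([5, 3, 9, 1]))              # [5, 5, 4, 4]
```

Each pass only moves tuples between directly connected nodes, so a skewed distribution is smoothed in as many passes as the cube has dimensions, and the total number of tuples is unchanged.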

Database and Expert Systems Applications | 1999

XViews: XML views of relational schemas

Chaitanya K. Baru

The Mediation of Information using XML (MIX) project is a joint effort between the San Diego Supercomputer Center (SDSC) and the Database Lab at the University of California, San Diego, where we are investigating the use of XML as the medium for information modeling and information interchange among heterogeneous information sources. Relational databases represent an important type of information source. We discuss issues in providing XML document views of relational schemas. We refer to these as XViews. We also discuss related work from a project funded by DARPA and the US Patent and Trademark Office where we investigated issues in mapping SGML document type definitions to relational schemas. The work described here reflects initial results from the above-mentioned projects.

IEEE International Conference on Cloud Computing Technology and Science | 2010

Evaluation of MapReduce for Gridding LIDAR Data

Sriram Krishnan; Chaitanya K. Baru; Christopher J. Crosby

The MapReduce programming model, introduced by Google, has become popular over the past few years as a mechanism for processing large amounts of data using shared-nothing parallelism. In this paper, we investigate the use of MapReduce technology for a local gridding algorithm for the generation of Digital Elevation Models (DEM). The local gridding algorithm utilizes the elevation information from LIDAR (Light Detection and Ranging) measurements contained within a circular search area to compute the elevation of each grid cell. The method is data parallel, lending itself to implementation using the MapReduce model. Here, we compare our initial C++ implementation of the gridding algorithm to a MapReduce-based implementation, and present observations on the performance (in particular, price/performance) and the implementation complexity. We also discuss the applicability of MapReduce technologies for related applications.
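The local gridding idea described in this abstract can be sketched in a map/reduce style as follows. This is an in-memory illustration under assumptions, not the authors' implementation: the function names, the use of a plain mean as the per-cell elevation, and the sample points are all hypothetical, and no MapReduce framework is involved.

```python
import math
from collections import defaultdict


def map_point(point, cell_size, radius):
    """Map step: emit ((col, row), z) for each cell whose center is within `radius` of the point."""
    x, y, z = point
    min_col = math.floor((x - radius) / cell_size)
    max_col = math.floor((x + radius) / cell_size)
    min_row = math.floor((y - radius) / cell_size)
    max_row = math.floor((y + radius) / cell_size)
    for col in range(min_col, max_col + 1):
        for row in range(min_row, max_row + 1):
            center_x = (col + 0.5) * cell_size
            center_y = (row + 0.5) * cell_size
            if math.hypot(center_x - x, center_y - y) <= radius:
                yield (col, row), z


def reduce_cell(elevations):
    """Reduce step: combine the elevations gathered for one cell (mean is an assumption)."""
    return sum(elevations) / len(elevations)


def grid_points(points, cell_size, radius):
    """Group map output by cell, then reduce each group to a single elevation."""
    groups = defaultdict(list)
    for point in points:
        for cell, z in map_point(point, cell_size, radius):
            groups[cell].append(z)
    return {cell: reduce_cell(zs) for cell, zs in groups.items()}


# Illustrative (x, y, z) points, not real LIDAR data.
points = [(0.5, 0.5, 10.0), (0.6, 0.4, 20.0), (2.5, 0.5, 30.0)]
dem = grid_points(points, cell_size=1.0, radius=0.5)
print(dem)  # {(0, 0): 15.0, (2, 0): 30.0}
```

Each point is processed independently in the map step and each cell independently in the reduce step, which is what makes the method data parallel.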

ACM International Conference on Digital Libraries | 1999

XML-based information mediation for digital libraries

Chaitanya K. Baru; Vincent Chu; Amarnath Gupta; Bertram Ludäscher; Richard Marciano; Yannis Papakonstantinou; Pavel Velikhov

We demonstrate a prototype distributed architecture for a digital library, using technology being developed under the MIX Project at the San Diego Supercomputer Center (SDSC) and the University of California, San Diego. The architecture is based on XML-based modeling of metadata; use of an XML query language, and associated mediator middleware, to query distributed metadata sources; and the use of a storage system middleware to access distributed, archived data sets.

IWDM | 1988

Join on a Cube: Analysis, Simulation, and Implementation

Chaitanya K. Baru; Ophir Frieder; Dilip D. Kandlur; Mark E. Segal

Our recent research effort has been in studying database processing on a cube-connected multicomputer system. This paper discusses one part of our work, viz., the study of the join operation. Novel data redistribution operations are employed to improve the performance of the various database operations, including join. Though a simple analysis is provided, the data redistribution operations are, in general, difficult to characterize analytically. Thus, a simulation and an implementation were carried out to study the performance of these operations and the join operation. Issues involved in the simulation and implementation and a discussion of the results from both are presented in this paper.
