Vinayak R. Borkar
University of California, Irvine
Publication
Featured research published by Vinayak R. Borkar.
Distributed and Parallel Databases | 2011
Alexander Behm; Vinayak R. Borkar; Michael J. Carey; Raman Grover; Chen Li; Nicola Onose; Rares Vernica; Alin Deutsch; Yannis Papakonstantinou; Vassilis J. Tsotras
ASTERIX is a new data-intensive storage and computing platform project spanning UC Irvine, UC Riverside, and UC San Diego. In this paper we provide an overview of the ASTERIX project, starting with its main goal—the storage and analysis of data pertaining to evolving-world models. We describe the requirements and associated challenges, and explain how the project is addressing them. We provide a technical overview of ASTERIX, covering its architecture, its user model for data and queries, and its approach to scalable query processing and data management. ASTERIX utilizes a new scalable runtime computational platform called Hyracks that is also discussed at an overview level; we have recently made Hyracks available in open source for use by other interested parties. We also relate our work on ASTERIX to the current state of the art and describe the research challenges that we are currently tackling as well as those that lie ahead.
Extending Database Technology | 2011
Foto N. Afrati; Vinayak R. Borkar; Michael J. Carey; Neoklis Polyzotis; Jeffrey D. Ullman
We survey the recent wave of extensions to the popular map-reduce systems, including those that have begun to address the implementation of recursive queries using the same computing environment as map-reduce. A central problem is that recursive tasks cannot deliver their output only at the end, which makes recovery from failures much more complicated than in map-reduce and its nonrecursive extensions. We propose several algorithmic ideas for efficient implementation of recursions in the map-reduce environment and discuss several alternatives for supporting recovery from failures without restarting the entire job.
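As a rough illustration of the kind of recursive query discussed above, the following sketch computes a transitive closure as a sequence of map-reduce-style rounds, each joining the newly discovered paths with the edge set (semi-naive evaluation). It is not the algorithm proposed in the paper; the function names `map_phase`, `reduce_phase`, and `transitive_closure` are hypothetical, and everything runs in a single process rather than on a cluster.

```python
from collections import defaultdict


def map_phase(delta, edges):
    """Emit (join_key, tagged_value) pairs for new paths and for edges."""
    for (x, y) in delta:
        yield y, ("path", x)   # path x -> y, keyed by its endpoint y
    for (y, z) in edges:
        yield y, ("edge", z)   # edge y -> z, keyed by its source y


def reduce_phase(pairs):
    """Group by key, then join paths ending at a node with edges leaving it."""
    groups = defaultdict(lambda: ([], []))
    for key, (tag, value) in pairs:
        groups[key][0 if tag == "path" else 1].append(value)
    for starts, ends in groups.values():
        for x in starts:
            for z in ends:
                yield (x, z)   # derived path x -> z


def transitive_closure(edges):
    """Repeat map/reduce rounds until a round derives no new paths."""
    edges = set(edges)
    closure = set(edges)
    delta = set(edges)         # paths discovered in the previous round
    rounds = 0
    while delta:
        derived = set(reduce_phase(map_phase(delta, edges)))
        delta = derived - closure
        closure |= delta
        rounds += 1
    return closure, rounds


if __name__ == "__main__":
    closure, rounds = transitive_closure([(1, 2), (2, 3), (3, 4)])
    print(sorted(closure), rounds)
```

In this simplified form each round is a separate job, so the closure accumulated after a round is a natural point from which to resume after a failure. The difficulty the abstract describes arises when rounds are pipelined and partial output is consumed before a task finishes.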
Very Large Data Bases | 2014
Yingyi Bu; Vinayak R. Borkar; Jianfeng Jia; Michael J. Carey; Tyson Condie
There is a growing need for distributed graph processing systems that are capable of gracefully scaling to very large graph datasets. Unfortunately, this challenge has not been easily met due to the intense memory pressure imposed by process-centric, message passing designs that many graph processing systems follow. Pregelix is a new open source distributed graph processing system that is based on an iterative dataflow design that is better tuned to handle both in-memory and out-of-core workloads. As such, Pregelix offers improved performance characteristics and scaling properties over current open source systems (e.g., we have seen up to 15X speedup compared to Apache Giraph and up to 35X speedup compared to distributed GraphLab), and more effective use of available machine resources to support Big(ger) Graph Analytics.
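To make the idea of an iterative dataflow design more concrete, the sketch below expresses one Pregel-style superstep as two relational-style steps: a group-by that combines incoming messages per destination, followed by a join of the combined messages with the vertex state. It is only an in-memory illustration under assumed names (`superstep`, `connected_components`) and does not reflect Pregelix's actual query plans or its out-of-core operators. It assumes every vertex appears as a key in `edges` and that undirected edges are listed in both directions.

```python
def superstep(vertices, messages, edges):
    """One iteration: combine messages, then join them with vertex state."""
    # Group-by destination with a min combiner.
    combined = {}
    for dst, val in messages:
        combined[dst] = min(val, combined.get(dst, val))

    new_vertices = dict(vertices)
    outgoing = []
    # Join combined messages with vertex state on the vertex id.
    for vid, val in combined.items():
        if val < new_vertices[vid]:
            new_vertices[vid] = val
            # Only vertices whose label changed send messages onward.
            for nbr in edges.get(vid, ()):
                outgoing.append((nbr, val))
    return new_vertices, outgoing


def connected_components(edges):
    """Label each vertex with the smallest vertex id in its component."""
    vertices = {v: v for v in edges}
    messages = [(n, v) for v, nbrs in edges.items() for n in nbrs]
    while messages:
        vertices, messages = superstep(vertices, messages, edges)
    return vertices


if __name__ == "__main__":
    graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
    print(connected_components(graph))
```

Because each superstep is phrased as group-by and join over two collections, a dataflow engine is free to choose in-memory or disk-based implementations of those operators, which is the property the abstract emphasizes.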
International Symposium on Memory Management | 2013
Yingyi Bu; Vinayak R. Borkar; Guoqing Xu; Michael J. Carey
Over the past decade, the increasing demands on data-driven business intelligence have led to the proliferation of large-scale, data-intensive applications that often have huge amounts of data (often at terabyte or petabyte scale) to process. An object-oriented programming language such as Java is often the developer's choice for implementing such applications, primarily due to its quick development cycle and rich community resources. While the use of such languages makes programming easier, significant performance problems can often be seen: the combination of the inefficiencies inherent in a managed run-time system and the impact of the huge amount of data to be processed in the limited memory space often leads to memory bloat and performance degradation at a surprisingly early stage. This paper proposes a bloat-aware design paradigm towards the development of efficient and scalable Big Data applications in object-oriented, GC-enabled languages. To motivate this work, we first perform a study on the impact of several typical memory bloat patterns. These patterns are summarized from the user complaints on the mailing lists of two widely-used open-source Big Data applications. Next, we discuss our design paradigm to eliminate bloat. Using examples and real-world experience, we demonstrate that programming under this paradigm does not incur significant programming burden. We have implemented a few common data processing tasks both using this design and using the conventional object-oriented design. Our experimental results show that this new design paradigm is extremely effective in improving performance: even for the moderate-size data sets processed, we have observed 2.5x+ performance gains, and the improvement grows substantially with the size of the data set.
Very Large Data Bases | 2009
Roger J. Bamford; Vinayak R. Borkar; Matthias Brantner; Peter Fischer; Daniela Florescu; David Graf; Donald Kossmann; Tim Kraska; Dan Muresan; Sorin Nasoi; Markos Zacharioudakis
This paper describes a number of XQuery-related projects. Its goal is to show that XQuery is a useful tool for many different application scenarios. In particular, this paper tries to correct a common myth that XQuery is merely a query language and that SQL is the better query language. Instead, XQuery is a full-fledged programming language for Web applications and services. Furthermore, this paper tries to correct a second myth that XQuery is slow. This paper gives an overview of the state of the art in XQuery implementation and optimization techniques and discusses one particular open-source XQuery processor, Zorba, in more detail. Among other things, this paper presents an XQuery Benchmark Service which helps practitioners and XQuery processor vendors to find performance problems in an XQuery processor.
Data and Knowledge Engineering | 2003
Yannis Papakonstantinou; Vinayak R. Borkar; Maxim Orgiyan; Konstantinos Stathatos; Lucian Suta; Vasilis Vassalos; Pavel Velikhov
We describe the Enosys XML integration platform, focusing on the query language, algebra, and architecture of its query processor. The platform enables the development of eBusiness applications in customer relationship management, e-commerce, supply chain management, and decision support. These applications often require that data be integrated dynamically from multiple information sources. The Enosys platform allows one to build (virtual and/or materialized) integrated XML views of multiple sources, using XML queries as view definitions. During run-time, the application issues XML queries against the views. Queries and views are translated into the XCQL algebra and are combined into a single algebra expression/plan. Query plan composition and query plan decomposition challenges are faced in this process. Finally, the query processor lazily evaluates the result, using an appropriate adaptation of relational database iterator models to XML. The paper describes the platform architecture and components, the supported XML query language and the query processor architecture. It focuses on the underlying XML query algebra, which differs from the algebras that have been considered by W3C in that it is particularly tuned to semistructured data and to optimization and efficient evaluation in a system that follows the conventional architecture of database systems.
International Journal of Web Services Research | 2006
Vinayak R. Borkar; Michael J. Carey; Nitin Mangtani; Denny McKinney; Rahul Patel; Sachin Thatte
In this paper, we address the question, “In the brave new world of Web services and service-oriented architectures (SOA), how does data fit in?” We bring data modeling concepts to bear on the world of services, yielding an approach in which enterprise data access is handled by a collection of interrelated data services. We show how the approach can be realized on a foundation of XML standards, namely XML Schema, Web services, and XQuery. We show that this approach provides a uniform and declarative framework for integrating enterprise data assets that are drawn from disparate underlying sources, including both queryable and non-queryable data sources as well as data that is encapsulated by Web services. We also explain how the approach yields data services that are easily and efficiently reusable.
International Conference on Data Engineering | 2006
Sunil Pradyumna Jigyasu; Sujeet Banerjee; Vinayak R. Borkar; Michael J. Carey; Kanad Pravin Dixit; Anil Malkani; Sachin Thatte
SQL has long been the standard language for retrieving and manipulating data in relational database systems. XML has become the standard format for data exchange, and XQuery is on its way to becoming the standard language for querying XML data. The BEA AquaLogic Data Services Platform provides a service-oriented, XML-based view of heterogeneous enterprise data sources and allows this view to be queried using XQuery. AquaLogic DSP includes a JDBC driver that connects the old (SQL) world with the new (XML) world via a SQL-to-XQuery translator. This paper outlines the issues related to creating such a driver and details the approach used to translate SQL queries into XQuery expressions. The paper also touches on performance considerations related to handling XML query results in a context where JDBC result sets are the desired output format.
International Conference on Data Engineering | 2008
Vinayak R. Borkar; Michael J. Carey; Daniel Engovatov; Dmitry Lychagin; Till Westmann; Warren Wong
The AquaLogic Data Services Platform (ALDSP) is a BEA middleware platform for creating services that access and manipulate information drawn from multiple heterogeneous sources of data. The integration logic for read services is specified declaratively using the XQuery language. ALDSP 3.0, available in December 2007, includes a new XQuery-based Scripting Extension, XQSE, that enables developers to write procedural as well as declarative logic without leaving the XQuery world. In this paper, we describe the XQSE extensions to XQuery and show how they help to support important new classes of data services in ALDSP 3.0.
International Conference on Data Engineering | 2009
Michael Blow; Vinayak R. Borkar; Michael J. Carey; Christopher James Hillery; Alexander Kotopoulis; Dmitry Lychagin; Radu Preotiuc-Pietro; Panagiotis Reveliotis; Joshua Spiegel; Till Westmann
The BEA AquaLogic Data Services Platform (ALDSP) is a middleware platform for creating services that integrate and manipulate information from disparate enterprise data sources. This paper provides a technical overview of the all-new update support in ALDSP 3.0, released in January 2008. It describes the update side of data services, our unique model for making update automation transparent and flexible, and the use of the XQuery Scripting Extension (XQSE) for further customizing the system's default handling of updates. It also gives an overview of the ALDSP update processing machinery, including the automatic generation of update maps from read functions, translation of update maps into Update Virtual Machine (UVM) programs, the UVM instruction interpreter, and SQL generation for updates to data drawn from relational data sources.