Rajasekar Krishnamurthy

Publication

Featured research published by Rajasekar Krishnamurthy.

international conference on data engineering | 2011

SystemML: Declarative machine learning on MapReduce

Amol Ghoting; Rajasekar Krishnamurthy; Edwin P. D. Pednault; Berthold Reinwald; Vikas Sindhwani; Shirish Tatikonda; Yuanyuan Tian; Shivakumar Vaithyanathan

MapReduce is emerging as a generic parallel programming paradigm for large clusters of machines. This trend combined with the growing need to run machine learning (ML) algorithms on massive datasets has led to an increased interest in implementing ML algorithms on MapReduce. However, the cost of implementing a large class of ML algorithms as low-level MapReduce jobs on varying data and machine cluster sizes can be prohibitive. In this paper, we propose SystemML in which ML algorithms are expressed in a higher-level language and are compiled and executed in a MapReduce environment. This higher-level language exposes several constructs including linear algebra primitives that constitute key building blocks for a broad class of supervised and unsupervised ML algorithms. The algorithms expressed in SystemML are compiled and optimized into a set of MapReduce jobs that can run on a cluster of machines. We describe and empirically evaluate a number of optimization strategies for efficiently executing these algorithms on Hadoop, an open-source MapReduce implementation. We report an extensive performance evaluation on three ML algorithms on varying data and cluster sizes.

international conference on management of data | 2001

A general technique for querying XML documents using a relational database system

Jayavel Shanmugasundaram; Eugene J. Shekita; Jerry Kiernan; Rajasekar Krishnamurthy; Efstratios Viglas; Jeffrey F. Naughton; Igor Tatarinov

There has been recent interest in using relational database systems to store and query XML documents. Each of the techniques proposed in this context works by (a) creating tables for the purpose of storing XML documents (also called relational schema generation), (b) storing XML documents by shredding them into rows in the created tables, and (c) converting queries over XML documents into SQL queries over the created tables. Since relational schema generation is a physical database design issue -- dependent on factors such as the nature of the data, the query workload and availability of schemas -- there have been many techniques proposed for this purpose. Currently, each relational schema generation technique requires its own query processor to efficiently convert queries over XML documents into SQL queries over the created tables. In this paper, we present an efficient technique whereby the same query-processor can be used for all such relational schema generation techniques. This greatly simplifies the task of relational schema generation by eliminating the need to write a special-purpose query processor for each new solution to the problem. In addition, our proposed technique enables users to query seamlessly across relational data and XML documents. This provides users with unified access to both relational and XML data without them having to deal with separate databases.
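The three steps (a) to (c) named in this abstract can be illustrated with a minimal sketch. It assumes one particular schema-generation choice, a single generic "edge" table with one row per element, and handles only child-axis paths such as /bib/book/title. It is not the query processor the paper presents; the table layout and the names shred, path_to_sql, and query are hypothetical and exist only for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text, conn):
    """Steps (a) and (b): create an edge table and store one row per XML element."""
    conn.execute(
        "CREATE TABLE edge (id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT, text TEXT)"
    )
    next_id = 0

    def visit(elem, parent_id):
        nonlocal next_id
        next_id += 1
        my_id = next_id
        conn.execute(
            "INSERT INTO edge VALUES (?, ?, ?, ?)",
            (my_id, parent_id, elem.tag, (elem.text or "").strip()),
        )
        for child in elem:
            visit(child, my_id)

    # The root element has no parent, so its parent column is NULL.
    visit(ET.fromstring(xml_text), None)


def path_to_sql(path):
    """Step (c): translate a non-empty child-axis path into one self-join per step."""
    steps = [s for s in path.split("/") if s]
    froms, wheres, params = [], [], []
    for i, tag in enumerate(steps):
        froms.append(f"edge e{i}")
        wheres.append(f"e{i}.tag = ?")
        params.append(tag)
        if i == 0:
            wheres.append("e0.parent IS NULL")
        else:
            wheres.append(f"e{i}.parent = e{i - 1}.id")
    last = f"e{len(steps) - 1}"
    sql = (
        f"SELECT {last}.text FROM " + ", ".join(froms)
        + " WHERE " + " AND ".join(wheres)
        + f" ORDER BY {last}.id"
    )
    return sql, params


def query(conn, path):
    """Run the translated SQL and return the text of the matching elements."""
    sql, params = path_to_sql(path)
    return [row[0] for row in conn.execute(sql, params)]


DOC = (
    "<bib><book><title>A</title></book>"
    "<book><title>B</title><year>2001</year></book></bib>"
)

conn = sqlite3.connect(":memory:")
shred(DOC, conn)
titles = query(conn, "/bib/book/title")
print(titles)
```

With a different schema-generation technique (for example, one table per element type), only shred and the body of path_to_sql would change in this sketch, which is the coupling between storage layout and query translation that the abstract describes.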

international xml database symposium | 2003

XML-to-SQL Query Translation Literature: The State of the Art and Open Problems

Rajasekar Krishnamurthy; Raghav Kaushik; Jeffrey F. Naughton

Recently, the database research literature has seen an explosion of publications with the goal of using an RDBMS to store and/or query XML data. The problems addressed and solved in this area are diverse. This diversity renders it difficult to know how the various results presented fit together, and even makes it hard to know what open problems remain. As a first step to rectifying this situation, we present a classification of the problem space and discuss how almost 40 papers fit into this classification. As a result of this study, we find that some basic questions are still open. In particular, for the XML publishing of relational data and for “schema-based” shredding of XML documents into relations, there is no published algorithm for translating even simple path expression queries (with the // axis) into SQL when the XML schema is recursive.

international conference on management of data | 2009

SystemT: a system for declarative information extraction

Rajasekar Krishnamurthy; Yunyao Li; Sriram Raghavan; Frederick R. Reiss; Shivakumar Vaithyanathan; Huaiyu Zhu

As applications within and outside the enterprise encounter increasing volumes of unstructured data, there has been renewed interest in the area of information extraction (IE) -- the discipline concerned with extracting structured information from unstructured text. Classical IE techniques developed by the NLP community were based on cascading grammars and regular expressions. However, due to the inherent limitations of grammar-based extraction, these techniques are unable to: (i) scale to large data sets, and (ii) support the expressivity requirements of complex information tasks. At the IBM Almaden Research Center, we are developing SystemT, an IE system that addresses these limitations by adopting an algebraic approach. By leveraging well-understood database concepts such as declarative queries and cost-based optimization, SystemT enables scalable execution of complex information extraction tasks. In this paper, we motivate the SystemT approach to information extraction. We describe our extraction algebra and demonstrate the effectiveness of our optimization techniques in providing orders of magnitude reduction in the running time of complex extraction tasks.

international conference on data engineering | 2008

An Algebraic Approach to Rule-Based Information Extraction

Frederick R. Reiss; Sriram Raghavan; Rajasekar Krishnamurthy; Huaiyu Zhu; Shivakumar Vaithyanathan

Traditional approaches to rule-based information extraction (IE) have primarily been based on regular expression grammars. However, these grammar-based systems have difficulty scaling to large data sets and large numbers of rules. Inspired by traditional database research, we propose an algebraic approach to rule-based IE that addresses these scalability issues through query optimization. The operators of our algebra are motivated by our experience in building several rule-based extraction programs over diverse data sets. We present the operators of our algebra and propose several optimization strategies motivated by the text-specific characteristics of our operators. Finally we validate the potential benefits of our approach by extensive experiments over real-world blog data.

very large data bases | 2003

Mixed mode XML query processing

Alan Halverson; Josef Burger; Leonidas Galanis; Ameet Kini; Rajasekar Krishnamurthy; Ajith Nagaraja Rao; Feng Tian; Stratis D. Viglas; Yuan Wang; Jeffrey F. Naughton; David J. DeWitt

Querying XML documents typically involves both tree-based navigation and pattern matching similar to that used in structured information retrieval domains. In this paper, we show that for good performance, a native XML query processing system should support query plans that mix these two processing paradigms. We describe our prototype native XML system, and report on experiments demonstrating that even for simple queries, there are a number of options for how to combine tree-based navigation and structural joins based on information retrieval-style inverted lists, and that these options can have widely varying performance. We present ways of transparently using both techniques in a single system, and provide a cost model for identifying efficient combinations of the techniques. Our preliminary experimental results prove the viability of our approach.

empirical methods in natural language processing | 2008

Regular Expression Learning for Information Extraction

Yunyao Li; Rajasekar Krishnamurthy; Sriram Raghavan; Shivakumar Vaithyanathan; H. V. Jagadish

Regular expressions have served as the dominant workhorse of practical information extraction for several years. However, there has been little work on reducing the manual effort involved in building high-quality, complex regular expressions for information extraction tasks. In this paper, we propose ReLIE, a novel transformation-based algorithm for learning such complex regular expressions. We evaluate the performance of our algorithm on multiple datasets and compare it against the CRF algorithm. We show that ReLIE, in addition to being an order of magnitude faster, outperforms CRF under conditions of limited training data and cross-domain data. Finally, we show how the accuracy of CRF can be improved by using features extracted by ReLIE.

international conference on management of data | 2006

Avatar semantic search: a database approach to information retrieval

Eser Kandogan; Rajasekar Krishnamurthy; Sriram Raghavan; Shivakumar Vaithyanathan; Huaiyu Zhu

We present Avatar Semantic Search, a prototype search engine that exploits annotations in the context of classical keyword search. The process of annotations is accomplished offline by using high-precision information extraction techniques to extract facts, concepts, and relationships from text. These facts and concepts are represented and indexed in a structured data store. At runtime, keyword queries are interpreted in the context of these extracted facts and converted into one or more precise queries over the structured store. In this demonstration we describe the overall architecture of the Avatar Semantic Search engine. We also demonstrate the superiority of the AVATAR approach over traditional keyword search engines using the Enron email data set and a blog corpus.

IEEE Data(base) Engineering Bulletin | 2015

Extracting, Linking and Integrating Data from Public Sources: A Financial Case Study

Douglas Burdick; Mauricio A. Hernández; Howard Ho; Georgia Koutrika; Rajasekar Krishnamurthy; Lucian Popa; Ioana Stanoi; Shivakumar Vaithyanathan; Sanjiv Ranjan Das

We present Midas, a system that uses complex data processing to extract and aggregate facts from a large collection of structured and unstructured documents into a set of unified, clean entities and relationships. Midas focuses on data for financial companies and is based on periodic filings with the U.S. Securities and Exchange Commission (SEC) and Federal Deposit Insurance Corporation (FDIC). We show that, by using data aggregated by Midas, we can provide valuable insights about financial institutions either at the whole system level or at the individual company level. The key technology components that we implemented in Midas and that enable the various financial applications are: information extraction, entity resolution, mapping and fusion, all on top of a scalable infrastructure based on Hadoop. We describe our experience in building the Midas system and also outline the key research questions that remain to be addressed towards building a generic, high-level infrastructure for large-scale data integration from public sources.

very large data bases | 2004

Efficient XML-to-SQL query translation: where to add the intelligence?

Rajasekar Krishnamurthy; Raghav Kaushik; Jeffrey F. Naughton

We consider the efficiency of queries generated by XML-to-SQL translation. We first show that published XML-to-SQL query translation algorithms are suboptimal in that they often translate simple path expressions into complex SQL queries even when much simpler equivalent SQL queries exist. There are two logical ways to deal with this problem. One could generate suboptimal SQL queries using a fairly naive translation algorithm, and then attempt to optimize the resulting SQL; or one could use a more intelligent translation algorithm with the hopes of generating efficient SQL directly. We show that optimizing the SQL after it is generated is problematic, becoming intractable even in simple scenarios; by contrast, designing a translation algorithm that exploits information readily available at translation time is a promising alternative. To support this claim, we present a translation algorithm that exploits translation time information to generate efficient SQL for path expression queries over tree schemas.
