
Publication


Featured research published by Sriram Raghavan.


international conference on management of data | 2009

SystemT: a system for declarative information extraction

Rajasekar Krishnamurthy; Yunyao Li; Sriram Raghavan; Frederick R. Reiss; Shivakumar Vaithyanathan; Huaiyu Zhu

As applications within and outside the enterprise encounter increasing volumes of unstructured data, there has been renewed interest in the area of information extraction (IE) -- the discipline concerned with extracting structured information from unstructured text. Classical IE techniques developed by the NLP community were based on cascading grammars and regular expressions. However, due to the inherent limitations of grammar-based extraction, these techniques are unable to: (i) scale to large data sets, and (ii) support the expressivity requirements of complex information extraction tasks. At the IBM Almaden Research Center, we are developing SystemT, an IE system that addresses these limitations by adopting an algebraic approach. By leveraging well-understood database concepts such as declarative queries and cost-based optimization, SystemT enables scalable execution of complex information extraction tasks. In this paper, we motivate the SystemT approach to information extraction. We describe our extraction algebra and demonstrate the effectiveness of our optimization techniques in providing orders of magnitude reduction in the running time of complex extraction tasks.
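A minimal Python sketch of the algebraic idea described in the abstract: extraction primitives produce sets of span tuples, and those operators compose like a relational query plan. The operator names and the example are illustrative only, not SystemT's actual AQL operators.

```python
import re

# Sketch of extraction-as-algebra: each operator maps span tuples to span
# tuples, so operators compose like a query plan. Names are hypothetical.

def extract_regex(pattern, text):
    """Leaf operator: produce (start, end, match) spans from a regex."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]

def follows(left, right, max_gap):
    """Join operator: pair spans where `right` starts shortly after `left` ends."""
    return [(l, r) for l in left for r in right if 0 <= r[0] - l[1] <= max_gap]

text = "call Alice Smith at 555-1234 today."
names = extract_regex(r"[A-Z][a-z]+ [A-Z][a-z]+", text)
phones = extract_regex(r"\d{3}-\d{4}", text)
contacts = follows(names, phones, max_gap=4)  # name followed closely by a phone
```

Because the operators are declarative over spans, an optimizer is free to reorder or rewrite such a plan, which is the source of the speedups the paper reports.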


international conference on data engineering | 2008

An Algebraic Approach to Rule-Based Information Extraction

Frederick R. Reiss; Sriram Raghavan; Rajasekar Krishnamurthy; Huaiyu Zhu; Shivakumar Vaithyanathan

Traditional approaches to rule-based information extraction (IE) have primarily been based on regular expression grammars. However, these grammar-based systems have difficulty scaling to large data sets and large numbers of rules. Inspired by traditional database research, we propose an algebraic approach to rule-based IE that addresses these scalability issues through query optimization. The operators of our algebra are motivated by our experience in building several rule-based extraction programs over diverse data sets. We present the operators of our algebra and propose several optimization strategies motivated by the text-specific characteristics of our operators. Finally we validate the potential benefits of our approach by extensive experiments over real-world blog data.


empirical methods in natural language processing | 2008

Regular Expression Learning for Information Extraction

Yunyao Li; Rajasekar Krishnamurthy; Sriram Raghavan; Shivakumar Vaithyanathan; H. V. Jagadish

Regular expressions have served as the dominant workhorse of practical information extraction for several years. However, there has been little work on reducing the manual effort involved in building high-quality, complex regular expressions for information extraction tasks. In this paper, we propose ReLIE, a novel transformation-based algorithm for learning such complex regular expressions. We evaluate the performance of our algorithm on multiple datasets and compare it against the CRF algorithm. We show that ReLIE, in addition to being an order of magnitude faster, outperforms CRF under conditions of limited training data and cross-domain data. Finally, we show how the accuracy of CRF can be improved by using features extracted by ReLIE.
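A toy sketch of the transformation-based learning loop the abstract describes: start from a seed regex, try candidate rewrites, and greedily keep any rewrite that improves F1 on a small labeled set. The candidate transformations below are made up for illustration; they are not ReLIE's actual transformation set.

```python
import re

# Simplified transformation-based regex refinement: greedily apply candidate
# rewrites to a seed pattern, keeping ones that improve F1 on labeled data.

def f1(pattern, examples):
    tp = fp = fn = 0
    for text, gold in examples:
        found = set(m.group() for m in re.finditer(pattern, text))
        tp += len(found & gold)
        fp += len(found - gold)
        fn += len(gold - found)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def refine(seed, candidates, examples):
    best, best_score = seed, f1(seed, examples)
    improved = True
    while improved:
        improved = False
        for transform in candidates:
            trial = transform(best)
            score = f1(trial, examples)
            if score > best_score:
                best, best_score = trial, score
                improved = True
    return best, best_score

examples = [("ext 555-1234 room 42", {"555-1234"})]
candidates = [
    lambda p: p + r"-\d+",        # toy rewrite: extend with a dashed group
    lambda p: r"\b" + p + r"\b",  # toy rewrite: anchor at word boundaries
]
best_pattern, best_score = refine(r"\d+", candidates, examples)
```

Here the seed `\d+` over-matches (F1 of 0 on the example), and one rewrite step recovers the full extension pattern.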


international conference on management of data | 2006

Avatar semantic search: a database approach to information retrieval

Eser Kandogan; Rajasekar Krishnamurthy; Sriram Raghavan; Shivakumar Vaithyanathan; Huaiyu Zhu

We present Avatar Semantic Search, a prototype search engine that exploits annotations in the context of classical keyword search. The annotation process is performed offline by using high-precision information extraction techniques to extract facts, concepts, and relationships from text. These facts and concepts are represented and indexed in a structured data store. At runtime, keyword queries are interpreted in the context of these extracted facts and converted into one or more precise queries over the structured store. In this demonstration we describe the overall architecture of the Avatar Semantic Search engine. We also demonstrate the superiority of the AVATAR approach over traditional keyword search engines using the Enron email data set and a blog corpus.
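The runtime step described above can be sketched in a few lines of Python: facts extracted offline sit in a structured table, keywords that match known concept instances become structured predicates, and the remaining keywords stay as plain text terms. The schema and concept dictionary here are entirely hypothetical, not Avatar's actual design.

```python
# Toy sketch of the "database approach": interpret a keyword query against
# concepts extracted offline, then answer it as a precise structured lookup.

facts = [  # hypothetical (document, concept, value) rows extracted offline
    ("msg-17", "person", "Ken Lay"),
    ("msg-17", "meeting_date", "2001-08-14"),
    ("msg-42", "person", "Ken Lay"),
]

concepts = {"person": ["ken lay"], "meeting_date": []}

def interpret(keywords):
    """Turn keywords into (concept, value) predicates where possible;
    unmatched keywords remain plain text terms."""
    preds, plain = [], []
    for kw in keywords:
        concept = next((c for c, vals in concepts.items() if kw.lower() in vals), None)
        if concept:
            preds.append((concept, kw))
        else:
            plain.append(kw)
    return preds, plain

def run(preds):
    rows = facts
    for concept, value in preds:
        docs = {s for s, c, v in rows if c == concept and v.lower() == value.lower()}
        rows = [r for r in rows if r[0] in docs]
    return sorted({r[0] for r in rows})

preds, plain = interpret(["Ken Lay", "meeting"])
docs = run(preds)
```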


international world wide web conferences | 2007

Navigating the intranet with high precision

Huaiyu Zhu; Sriram Raghavan; Shivakumar Vaithyanathan; Alexander Löser

Despite the success of web search engines, search over large enterprise intranets still suffers from poor result quality. Earlier work [6] that compared intranets and the Internet from the viewpoint of keyword search has pointed to several reasons why the search problem is quite different in these two domains. In this paper, we address the problem of providing high quality answers to navigational queries in the intranet (e.g., queries intended to find product or personal home pages, service pages, etc.). Our approach is based on offline identification of navigational pages, intelligent generation of term-variants to associate with each page, and the construction of separate indices exclusively devoted to answering navigational queries. Using a testbed of 5.5M pages from the IBM intranet, we present evaluation results that demonstrate that for navigational queries, our approach of using custom indices produces results of significantly higher precision than those produced by a general purpose search algorithm.


international conference on data engineering | 2015

Indexing and matching trajectories under inconsistent sampling rates

Sayan Ranu; Deepak P; Aditya Telang; Prasad M. Deshpande; Sriram Raghavan

Quantifying the similarity between two trajectories is a fundamental operation in analysis of spatio-temporal databases. While a number of distance functions exist, the recent shift in the dynamics of the trajectory generation procedure violates one of their core assumptions: a consistent and uniform sampling rate. In this paper, we formulate a robust distance function called Edit Distance with Projections (EDwP) to match trajectories under inconsistent and variable sampling rates through dynamic interpolation. This is achieved by deploying the idea of projections that goes beyond matching only the sampled points while aligning trajectories. To enable efficient trajectory retrievals using EDwP, we design an index structure called TrajTree. TrajTree derives its pruning power by employing the unique combination of bounding boxes with Lipschitz embedding. Extensive experiments on real trajectory databases demonstrate EDwP to be up to 5 times more accurate than the state-of-the-art distance functions. Additionally, TrajTree increases the efficiency of trajectory retrievals by up to an order of magnitude over existing techniques.
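The dynamic-programming skeleton of an edit-distance-style trajectory match can be sketched as follows. This toy version charges the Euclidean gap for aligning two sampled points and a fixed cost for skipping one; EDwP's actual projections and dynamic interpolation, which handle the inconsistent sampling rates, are omitted for brevity.

```python
import math

# Toy edit-distance-style trajectory distance: align, or skip at fixed cost.
# This is only the DP skeleton; EDwP's projections are not implemented here.

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def traj_edit_distance(t1, t2, skip_cost=1.0):
    n, m = len(t1), len(t2)
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i < n:  # skip a point of t1
                dp[i + 1][j] = min(dp[i + 1][j], dp[i][j] + skip_cost)
            if j < m:  # skip a point of t2
                dp[i][j + 1] = min(dp[i][j + 1], dp[i][j] + skip_cost)
            if i < n and j < m:  # align t1[i] with t2[j]
                dp[i + 1][j + 1] = min(dp[i + 1][j + 1],
                                       dp[i][j] + dist(t1[i], t2[j]))
    return dp[n][m]
```

Identical trajectories score 0; a distant pair of singleton trajectories may be cheaper to skip than to align, which is the degree of freedom EDwP's projections refine.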


symposium on principles of database systems | 2010

Understanding queries in a search database system

Ronald Fagin; Benny Kimelfeld; Yunyao Li; Sriram Raghavan; Shivakumar Vaithyanathan

It is well known that a search engine can significantly benefit from an auxiliary database, which can suggest interpretations of the search query by means of the involved concepts and their interrelationship. The difficulty is to translate abstract notions like concept and interpretation into a concrete search algorithm that operates over the auxiliary database. To surpass existing heuristics, there is a need for a formal basis, which is realized in this paper through the framework of a search database system, where an interpretation is identified as a parse. It is shown that the parses of a query can be generated in polynomial time in the combined size of the input and the output, even if parses are restricted to those having a nonempty evaluation. Identifying that one parse is more specific than another is important for ranking answers, and this framework captures the precise semantics of being more specific; moreover, performing this comparison between parses is tractable. Lastly, the paper studies the problem of finding the most specific parses. Unfortunately, this problem turns out to be intractable in the general case. However, under reasonable assumptions, the parses can be enumerated in an order of decreasing specificity, with polynomial delay and polynomial space.


international conference on data mining | 2014

Inferring Uncertain Trajectories from Partial Observations

Prithu Banerjee; Sayan Ranu; Sriram Raghavan

The explosion in the availability of GPS-enabled devices has resulted in an abundance of trajectory data. In reality, however, the majority of these trajectories are collected at a low sampling rate and only provide partial observations on their actually traversed routes. Consequently, they are mired with uncertainty. In this paper, we develop a technique called InferTra to infer uncertain trajectories from network-constrained partial observations. Rather than predicting the most likely route, the inferred uncertain trajectory takes the form of an edge-weighted graph and summarizes all probable routes in a holistic manner. For trajectory inference, InferTra employs Gibbs sampling by learning a Network Mobility Model (NMM) from a database of historical trajectories. Extensive experiments on real trajectory databases show that the graph-based approach of InferTra is up to 50% more accurate, 20 times faster, and immensely more versatile than state-of-the-art techniques.
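The "uncertain trajectory" output form can be illustrated with a small sketch: instead of committing to one most-likely route, many sampled candidate routes (e.g., drawn by a Gibbs sampler over a mobility model) are summarized as an edge-weighted graph, where each weight is the fraction of samples traversing that road edge. The sampler itself is out of scope here; the routes below are assumed given.

```python
from collections import Counter

# Summarize sampled routes as an edge-weighted graph: weight = fraction of
# sampled routes that traverse the edge. Routes are lists of road-network nodes.

def summarize(sampled_routes):
    counts = Counter()
    for route in sampled_routes:
        for edge in zip(route, route[1:]):
            counts[edge] += 1
    n = len(sampled_routes)
    return {edge: c / n for edge, c in counts.items()}

routes = [["A", "B", "C"], ["A", "B", "D", "C"], ["A", "B", "C"]]
graph = summarize(routes)  # e.g. ("A", "B") carries weight 1.0
```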


symposium on principles of database systems | 2011

Rewrite rules for search database systems

Ronald Fagin; Benny Kimelfeld; Yunyao Li; Sriram Raghavan; Shivakumar Vaithyanathan

The results of a search engine can be improved by consulting auxiliary data. In a search database system, the association between the user query and the auxiliary data is driven by rewrite rules that augment the user query with a set of alternative queries. This paper develops a framework that formalizes the notion of a rewrite program, which is essentially a collection of hedge-rewriting rules. When applied to a search query, the rewrite program produces a set of alternative queries that constitutes a least fixpoint (lfp). The main focus of the paper is on the lfp-convergence of a rewrite program, where a rewrite program is lfp-convergent if the least fixpoint of every search query is finite. Determining whether a given rewrite program is lfp-convergent is undecidable; to accommodate that, the paper proposes a safety condition, and shows that safety guarantees lfp-convergence, and that safety can be decided in polynomial time. The effectiveness of the safety condition in capturing lfp-convergence is illustrated by an application to a rewrite program in an implemented system that is intended for widespread use.
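The least-fixpoint computation at the heart of the abstract can be sketched in a few lines: apply every rule to every query collected so far until no new alternative query appears. The rules below are plain string substitutions, far simpler than the paper's hedge-rewriting rules, but the fixpoint loop has the same shape.

```python
# Toy least fixpoint of a rewrite program: saturate the set of alternative
# queries under simple substring-rewrite rules. Rules are illustrative only.

def lfp(query, rules):
    seen = {query}
    frontier = [query]
    while frontier:
        q = frontier.pop()
        for old, new in rules:
            if old in q:
                rewritten = q.replace(old, new)
                if rewritten not in seen:
                    seen.add(rewritten)
                    frontier.append(rewritten)
    return seen

rules = [("laptop", "notebook"), ("notebook", "portable computer")]
alternatives = lfp("cheap laptop", rules)
```

This particular rule set is lfp-convergent; a rule such as `("a", "aa")` would make the loop diverge, which is exactly the kind of behavior the paper's safety condition rules out.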


international conference on computer communications | 2015

Trajectory aware macro-cell planning for mobile users

Shubhadip Mitra; Sayan Ranu; Vinay Kolar; Aditya Telang; Arnab Bhattacharya; Ravi Kokku; Sriram Raghavan

We handle the problem of efficient user-mobility driven macro-cell planning in cellular networks. As cellular networks embrace heterogeneous technologies (including long range 3G/4G and short range WiFi, Femto-cells, etc.), most traffic generated by static users gets absorbed by the short-range technologies, thereby increasingly leaving mobile user traffic to macro-cells. To this end, we consider a novel approach that factors in the trajectories of mobile users as well as the impact of city geographies and their associated road networks for macro-cell planning. Given a budget k of base-stations that can be upgraded, our approach selects a deployment that improves the largest number of user trajectories. The generic formulation incorporates the notion of quality of service of a user trajectory as a parameter to allow different application-specific requirements, and operator choices. We show that the proposed trajectory utility maximization problem is NP-hard, and design multiple heuristics. We evaluate our algorithms with real and synthetic datasets emulating different city geographies to demonstrate their efficacy. For instance, with an upgrade budget k of 20%, our algorithms perform 3-8 times better in improving the user quality of service on trajectories when compared to greedy location-based base-station upgrades.
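One natural heuristic for the NP-hard upgrade-selection problem is greedy max-coverage: with a budget of k upgrades, repeatedly pick the base station that improves the most not-yet-covered trajectories. The sketch below assumes a precomputed station-to-trajectory coverage map; the paper's actual formulation also weighs per-trajectory quality of service, which this toy version ignores.

```python
# Greedy max-coverage sketch for base-station upgrade selection.
# coverage maps each candidate station to the trajectory ids it would improve.

def greedy_upgrade(coverage, k):
    chosen, covered = [], set()
    for _ in range(k):
        best = max(coverage, key=lambda s: len(coverage[s] - covered), default=None)
        if best is None or not coverage[best] - covered:
            break  # no remaining station improves any new trajectory
        chosen.append(best)
        covered |= coverage[best]
    return chosen, covered

coverage = {
    "bs1": {"t1", "t2", "t3"},
    "bs2": {"t3", "t4"},
    "bs3": {"t4"},
}
chosen, covered = greedy_upgrade(coverage, k=2)
```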
