Ramakrishna Varadarajan

Publication

Featured research published by Ramakrishna Varadarajan.

conference on information and knowledge management | 2006

A system for query-specific document summarization

Ramakrishna Varadarajan; Vagelis Hristidis

There has been a great amount of work on query-independent summarization of documents. However, due to the success of Web search engines, query-specific document summarization (query result snippets) has become an important problem that has received little attention. We present a method to create query-specific summaries by identifying the most query-relevant fragments and combining them using the semantic associations within the document. In particular, we first add structure to the documents in the preprocessing stage and convert them to document graphs. Then, the best summaries are computed by calculating the top spanning trees on the document graphs. We present and experimentally evaluate efficient algorithms that support computing summaries in interactive time. Furthermore, the quality of our summarization method is compared to current approaches using a user survey.

conference on information and knowledge management | 2005

Structure-based query-specific document summarization

Ramakrishna Varadarajan; Vagelis Hristidis

Summarization of text documents is increasingly important with the amount of data available on the Internet. The large majority of current approaches view documents as linear sequences of words and create query-independent summaries. However, ignoring the structure of the document degrades the quality of summaries. Furthermore, the popularity of web search engines requires query-specific summaries. We present a method to create query-specific summaries by adding structure to documents by extracting associations between their fragments.

extending database technology | 2009

Flexible and efficient querying and ranking on hyperlinked data sources

Ramakrishna Varadarajan; Vagelis Hristidis; Louiqa Raschid; Maria-Esther Vidal; Luis-Daniel Ibanez; Héctor Rodríguez-Drumond

There has been an explosion of hyperlinked data in many domains, e.g., the biological Web. Expressive query languages and effective ranking techniques are required to convert this data into browsable knowledge. We propose the Graph Information Discovery (GID) framework to support sophisticated user queries on a rich web of annotated and hyperlinked data entries, where query answers need to be ranked in terms of some customized ranking criteria, e.g., PageRank or ObjectRank. GID has a data model that includes a schema graph and a data graph, and an intuitive query interface. The GID framework allows users to easily formulate queries consisting of sequences of hard filters (selection predicates) and soft filters (ranking criteria); it can also be combined with other specialized graph query languages to enhance their ranking capabilities. GID queries have well-defined semantics and are implemented by a set of physical operators, each of which produces a ranked result graph. We discuss rewriting opportunities to provide an efficient evaluation of GID queries. Soft filters are a key feature of GID and are implemented using authority flow ranking techniques; these are query-dependent rankings and are expensive to compute at runtime. We present approximate optimization techniques for GID soft filter queries based on the properties of random walks, using novel path-length-bound and graph-sampling approximation techniques. We experimentally validate our optimization techniques on large biological and bibliographic datasets. Our techniques can produce high-quality (top-K) answers with savings of up to an order of magnitude compared with the evaluation time for the exact solution.
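The split between hard filters and soft filters can be illustrated with a small sketch. This is not the GID implementation: the function names `hard_filter` and `soft_filter`, the toy graph, the damping value, and the scoring rule are all assumptions made for illustration. The soft filter here propagates a unit of authority from the nodes selected by the hard filter along paths of bounded length, which is only a simplified stand-in for authority-flow rankings such as ObjectRank.

```python
def hard_filter(nodes, predicate):
    """Selection predicate: keep the ids of nodes whose attributes match."""
    return {node_id for node_id, attrs in nodes.items() if predicate(attrs)}


def soft_filter(edges, seeds, damping=0.85, max_path_len=3):
    """Rank nodes by authority flowing from the seeds over bounded-length paths.

    Hypothetical scoring rule: each seed starts with equal mass, and at every
    step a node passes damping * mass, split evenly, to its out-neighbours.
    """
    if not seeds:
        return []
    out_edges = {}
    for src, dst in edges:
        out_edges.setdefault(src, []).append(dst)

    frontier = {seed: 1.0 / len(seeds) for seed in seeds}
    scores = {}
    for _ in range(max_path_len):
        next_frontier = {}
        for node, mass in frontier.items():
            targets = out_edges.get(node, [])
            for target in targets:
                share = damping * mass / len(targets)
                next_frontier[target] = next_frontier.get(target, 0.0) + share
        for node, mass in next_frontier.items():
            scores[node] = scores.get(node, 0.0) + mass
        frontier = next_frontier
    # Highest score first; ties broken by node id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data graph (assumed): genes linking to papers.
nodes = {
    "g1": {"type": "gene"},
    "g2": {"type": "gene"},
    "p1": {"type": "paper"},
    "p2": {"type": "paper"},
}
edges = [("g1", "p1"), ("g1", "p2"), ("g2", "p1")]

seeds = hard_filter(nodes, lambda attrs: attrs["type"] == "gene")
ranked = soft_filter(edges, seeds)
```

In this toy graph the paper linked from both genes ranks above the paper linked from only one. Bounding the path length caps the work per query, at the cost of ignoring authority that would arrive over longer paths.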

international conference on data engineering | 2013

Materialization strategies in the Vertica analytic database: Lessons learned

Lakshmikant Shrinivas; Sreenath Bodagala; Ramakrishna Varadarajan; Ariel Cary; Vivek Bharathan; Chuck Bear

Column store databases allow for various tuple reconstruction strategies (also called materialization strategies). Early materialization is easy to implement but generally performs worse than late materialization. Late materialization is more complex to implement, and usually performs much better than early materialization, although there are situations where it is worse. We identify these situations, which essentially revolve around joins where neither input fits in memory (also called spilling joins). Sideways information passing techniques provide a viable solution to get the best of both worlds. We demonstrate how early materialization combined with sideways information passing allows us to get the benefits of late materialization, without the bookkeeping complexity or worse performance for spilling joins. It also provides some other benefits to query processing in Vertica due to positive interaction with compression and sort orders of the data. In this paper, we report our experiences with late and early materialization, highlight their strengths and weaknesses, and present the details of our sideways information passing implementation. We show experimental results of comparing these materialization strategies, which highlight the significant performance improvements provided by our implementation of sideways information passing (up to 72% on some TPC-H queries).
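The general idea of sideways information passing can be sketched in a few lines. This is a minimal illustration, not Vertica's implementation: the function names, the row-as-dict representation, and the use of an exact key set as the filter are assumptions. The join's build side publishes its join keys, and the scan of the other input uses them to drop rows that cannot match before they reach the join.

```python
def scan_with_sip(rows, key_col, sip_keys):
    """Scan the probe input, dropping rows whose join key cannot match."""
    for row in rows:
        if row[key_col] in sip_keys:
            yield row


def hash_join_with_sip(build_rows, probe_rows, build_key, probe_key):
    """Hash join that passes the build-side keys sideways to the probe scan."""
    table = {}
    for row in build_rows:
        table.setdefault(row[build_key], []).append(row)

    # The filter handed "sideways" to the scan of the other join input.
    sip_keys = set(table)

    joined = []
    for probe in scan_with_sip(probe_rows, probe_key, sip_keys):
        for build in table[probe[probe_key]]:
            joined.append({**build, **probe})
    return joined


# Hypothetical inputs.
customers = [
    {"cust_id": 1, "name": "a"},
    {"cust_id": 2, "name": "b"},
]
orders = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 3},
    {"order_id": 12, "cust_id": 1},
]

result = hash_join_with_sip(customers, orders, "cust_id", "cust_id")
```

Here the order for customer 3 is discarded at scan time, so only two rows ever reach the join. A real system might pass a compact or approximate filter instead of an exact key set; that choice is outside what the abstract states.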

international conference on data engineering | 2014

DBDesigner: A customizable physical design tool for Vertica Analytic Database

Ramakrishna Varadarajan; Vivek Bharathan; Ariel Cary; Jaimin Mukesh Dave; Sreenath Bodagala

In this paper, we present Vertica's customizable physical design tool, called the DBDesigner (DBD), that produces designs optimized for various scenarios and applications. For a given workload and space budget, DBD automatically recommends a physical design that optimizes query performance, storage footprint, fault tolerance and recovery to meet different customer requirements. Vertica is a distributed, massively parallel columnar database that physically organizes data into projections. Projections are attribute subsets from one or more tables with tuples sorted by one or more attributes, that are replicated or segmented (distributed) on cluster nodes. The key challenges involved in projection design are picking appropriate column sets, sort orders, cluster data distributions and column encodings. To achieve the desired trade-off between query performance and storage footprint, DBD operates under three different design policies: (a) load-optimized, (b) query-optimized or (c) balanced. These policies indirectly control the number of projections proposed and queries optimized to achieve the desired balance. To cater to query workloads that evolve over time, DBD also operates in a comprehensive and incremental design mode. In addition, DBD lets users override specific features of projection design based on their intimate knowledge about the data and query workloads. We present the complete physical design algorithm, describing in detail how projection candidates are efficiently explored and evaluated using the optimizer's cost and benefit model. Our experimental results show that DBD produces good physical designs that satisfy a variety of customer use cases.

Information Systems | 2013

Comparing top-k XML lists

Ramakrishna Varadarajan; Fernando Farfán; Vagelis Hristidis

Systems that produce ranked lists of results are abundant. For instance, Web search engines return ranked lists of Web pages. There has been work on distance measures for list permutations, like Kendall tau and Spearman's footrule, as well as extensions to handle top-k lists, which are more common in practice. In addition to ranking whole objects (e.g., Web pages), there is an increasing number of systems that provide keyword search on XML or other semistructured data, and produce ranked lists of XML sub-trees. Unfortunately, previous distance measures are not suitable for ranked lists of sub-trees since they do not account for the possible overlap between the returned sub-trees. That is, two sub-trees differing by a single node would be considered separate objects. In this paper, we present the first distance measures for ranked lists of sub-trees, and show under what conditions these measures are metrics. Furthermore, we present algorithms to efficiently compute these distance measures. Finally, we evaluate and compare the proposed measures on real data using three popular XML keyword proximity search systems.
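For reference, the two classical permutation distances named in the abstract can be written down directly. The sketch below covers only the baseline case of two full rankings over the same items; it does not implement the top-k extensions or the sub-tree measures proposed in the paper, and the function names and sample lists are assumptions.

```python
from itertools import combinations


def kendall_tau_distance(a, b):
    """Number of item pairs that the two rankings order differently."""
    pos_b = {item: rank for rank, item in enumerate(b)}
    # combinations(a, 2) yields (x, y) with x ranked before y in a,
    # so the pair is discordant when b ranks x after y.
    return sum(1 for x, y in combinations(a, 2) if pos_b[x] > pos_b[y])


def spearman_footrule(a, b):
    """Sum over items of the absolute difference between their two ranks."""
    pos_a = {item: rank for rank, item in enumerate(a)}
    pos_b = {item: rank for rank, item in enumerate(b)}
    return sum(abs(pos_a[item] - pos_b[item]) for item in pos_a)


# Two hypothetical rankings of the same four pages.
ranking_1 = ["p1", "p2", "p3", "p4"]
ranking_2 = ["p2", "p1", "p4", "p3"]

tau = kendall_tau_distance(ranking_1, ranking_2)
footrule = spearman_footrule(ranking_1, ranking_2)
```

Both functions treat every item as an atomic object, which is the limitation the abstract points out: two sub-trees that differ by a single node would count as entirely different items.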

web age information management | 2012

WYSIWYE: An Algebra for Expressing Spatial and Textual Rules for Information Extraction

Vijil Chenthamarakshan; Ramakrishna Varadarajan; Prasad M. Deshpande; Raghuram Krishnapuram; Knut Stolze

The visual layout of a webpage can provide valuable clues for certain types of Information Extraction (IE) tasks. In traditional rule-based IE frameworks, these layout cues are mapped to rules that operate on the HTML source of the webpages. In contrast, we have developed a framework in which the rules can be specified directly at the layout level. This has many advantages, since the higher level of abstraction leads to simpler extraction rules that are largely independent of the source code of the page, and, therefore, more robust. It can also enable specification of new types of rules that are not otherwise possible. To the best of our knowledge, there is no general framework that allows declarative specification of information extraction rules based on spatial layout. Our framework is complementary to traditional text-based rule frameworks and allows a seamless combination of spatial layout-based rules with traditional text-based rules. We describe the algebra that enables such a system and its efficient implementation using standard relational and text indexing features of a relational database. We demonstrate the simplicity and efficiency of this system for a task involving the extraction of software system requirements from software product pages.

international acm sigir conference on research and development in information retrieval | 2006

Searching the web using composed pages

Ramakrishna Varadarajan; Vagelis Hristidis; Tao Li

Given a user keyword query, current Web search engines return a list of pages ranked by their “goodness” with respect to the query. However, this technique misses results whose contents are distributed across multiple physical pages and are connected via hyperlinks and frames [3]. That is, it is often the case that no single page contains all query keywords. Li et al. [3] make a first step towards this problem by returning a tree of hyperlinked pages that collectively contain all query keywords. The limitation of this approach is that it operates at the page-level granularity, which ignores the specific context where the keywords are found within the pages. More importantly, it is cumbersome for the user to locate the most desirable tree of pages due to the amount of data in each page tree and a large number of page trees.

IEEE Transactions on Knowledge and Data Engineering | 2010

Using Proximity Search to Estimate Authority Flow

Vagelis Hristidis; Yannis Papakonstantinou; Ramakrishna Varadarajan

Authority flow and proximity search have been used extensively in measuring the association between entities in data graphs, ranging from the web to relational and XML databases. These two ranking factors have been used and studied separately in the past. In addition to their semantic differences, a key advantage of proximity search is the existence of efficient execution algorithms. In contrast, due to the complexity of calculating the authority flow, current systems only use precomputed authority flows at runtime. This limitation prevents authority flow from being used more effectively as a ranking factor. In this paper, we present a comparative analysis of the two ranking factors. We present an efficient approximation of authority flow based on proximity search. We analytically estimate the approximation error and how this affects the ranking of the results of a query.

international conference on data engineering | 2008