Is this you? Create Your Porfile

Vagelis Hristidis

University of California, Riverside

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Vagelis Hristidis is active.

Explore More

Publication

Featured researches published by Vagelis Hristidis.

very large data bases | 2002

Discover: keyword search in relational databases

Vagelis Hristidis; Yannis Papakonstantinou

DISCOVER operates on relational databases and facilitates information discovery on them by allowing its user to issue keyword queries without any knowledge of the database schema or of SQL. DISCOVER returns qualified joining networks of tuples, that is, sets of tuples that are associated because they join on their primary and foreign keys and collectively contain all the keywords of the query. DISCOVER proceeds in two steps. First the Candidate Network Generator generates all candidate networks of relations, that is, join expressions that generate the joining networks of tuples. Then the Plan Generator builds plans for the efficient evaluation of the set of candidate networks, exploiting the opportunities to reuse common subexpressions of the candidate networks. We prove that DISCOVER finds without redundancy all relevant candidate networks, whose size can be data bound, by exploiting the structure of the schema. We prove that the selection of the optimal execution plan (way to reuse common subexpressions) is NP-complete. We provide a greedy algorithm and we show that it provides near-optimal plan execution time cost. Our experimentation also provides hints on tuning the greedy algorithm.

very large data bases | 2004

Objectrank: authority-based keyword search in databases

Andrey Balmin; Vagelis Hristidis; Yannis Papakonstantinou

The ObjectRank system applies authority-based ranking to keyword search in databases modeled as labeled graphs. Conceptually, authority originates at the nodes (objects) containing the keywords and flows to objects according to their semantic connections. Each node is ranked according to its authority with respect to the particular keywords. One can adjust the weight of global importance, the weight of each keyword of the query, the importance of a result actually containing the keywords versus being referenced by nodes containing them, and the volume of authority flow via each type of semantic connection. Novel performance challenges and opportunities are addressed. First, schemas impose constraints on the graph, which are exploited for performance purposes. Second, in order to address the issue of authority ranking with respect to the given keywords (as opposed to Googles global PageRank) we precompute single keyword ObjectRanks and combine them during run time. We conducted user surveys and a set of performance experiments on multiple real and synthetic datasets, to assess the semantic meaningfulness and performance of ObjectRank.

international conference on data engineering | 2008

Keyword Search on Spatial Databases

I. De Felipe; Vagelis Hristidis; Naphtali Rishe

Many applications require finding objects closest to a specified location that contains a set of keywords. For example, online yellow pages allow users to specify an address and a set of keywords. In return, the user obtains a list of businesses whose description contains these keywords, ordered by their distance from the specified address. The problems of nearest neighbor search on spatial data and keyword search on text data have been extensively studied separately. However, to the best of our knowledge there is no efficient method to answer spatial keyword queries, that is, queries that specify both a location and a set of keywords. In this work, we present an efficient method to answer top-k spatial keyword queries. To do so, we introduce an indexing structure called IR2-Tree (Information Retrieval R-Tree) which combines an R-Tree with superimposed text signatures. We present algorithms that construct and maintain an IR2-Tree, and use it to answer top-k spatial keyword queries. Our algorithms are experimentally compared to current methods and are shown to have superior performance and excellent scalability.

international conference on data engineering | 2003

Keyword proximity search on XML graphs

Vagelis Hristidis; Yannis Papakonstantinou; Andrey Balmin

XKeyword provides efficient keyword proximity queries on large XML graph databases. A query is simply a list of keywords and does not require any schema or query language knowledge for its formulation. XKeyword is built on a relational database and, hence, can accommodate very large graphs. Query evaluation is optimized by using the graphs schema. In particular, XKeyword consists of two stages. In the preprocessing stage a set of keyword indices are built along with indexed path relations that describe particular patterns of paths in the graph. In the query processing stage plans are developed that use a near optimal set of path relations to efficiently locate the keyword query results. The results are presented graphically using the novel idea of interactive result graphs, which are populated on-demand according to the users navigation and allow efficient information discovery. We provide theoretical and experimental points for the selection of the appropriate set of precomputed path relations. We also propose and experimentally evaluate algorithms to minimize the number of queries sent to the database to output the top-K results.

web search and data mining | 2012

Correlating financial time series with micro-blogging activity

Eduardo J. Ruiz; Vagelis Hristidis; Carlos Castillo; Aristides Gionis; Alejandro Jaimes

We study the problem of correlating micro-blogging activity with stock-market events, defined as changes in the price and traded volume of stocks. Specifically, we collect messages related to a number of companies, and we search for correlations between stock-market events for those companies and features extracted from the micro-blogging messages. The features we extract can be categorized in two groups. Features in the first group measure the overall activity in the micro-blogging platform, such as number of posts, number of re-posts, and so on. Features in the second group measure properties of an induced interaction graph, for instance, the number of connected components, statistics on the degree distribution, and other graph-based properties. We present detailed experimental results measuring the correlation of the stock market events with these features, using Twitter as a data source. Our results show that the most correlated features are the number of connected components and the number of nodes of the interaction graph. The correlation is stronger with the traded volume than with the price of the stock. However, by using a simulator we show that even relatively small correlations between price and micro-blogging features can be exploited to drive a stock trading strategy that outperforms other baseline strategies.

IEEE Transactions on Knowledge and Data Engineering | 2006

Keyword proximity search in XML trees

Vagelis Hristidis; Nick Koudas; Yannis Papakonstantinou; Divesh Srivastava

Recent works have shown the benefits of keyword proximity search in querying XML documents in addition to text documents. For example, given query keywords over Shakespeares plays in XML, the user might be interested in knowing how the keywords cooccur. In this paper, we focus on XML trees and define XML keyword, proximity queries to return the (possibly heterogeneous) set of minimum connecting trees (MCTs) of the matches to the individual keywords in the query. We consider efficiently executing keyword proximity queries on labeled trees (XML) in various settings: 1) when the XML database has been preprocessed and 2) when no indices are available on the XML database. We perform a detailed experimental evaluation to study the benefits of our approach and show that our algorithms considerably outperform prior algorithms and other applicable approaches.

very large data bases | 2004

Probabilistic ranking of database query results

Surajit Chaudhuri; Gautam Das; Vagelis Hristidis; Gerhard Weikum

We investigate the problem of ranking answers to a database query when many tuples are returned. We adapt and apply principles of probabilistic models from Information Retrieval for structured data. Our proposed solution is domain independent. It leverages data and workload statistics and correlations. Our ranking functions can be further customized for different applications. We present results of preliminary experiments which demonstrate the efficiency as well as the quality of our ranking system.

statistical and scientific database management | 2009

Experiences on Processing Spatial Data with MapReduce

Ariel Cary; Zhengguo Sun; Vagelis Hristidis; Naphtali Rishe

The amount of information in spatial databases is growing as more data is made available. Spatial databases mainly store two types of data: raster data (satellite/aerial digital images), and vector data (points, lines, polygons). The complexity and nature of spatial databases makes them ideal for applying parallel processing. MapReduce is an emerging massively parallel computing model, proposed by Google. In this work, we present our experiences in applying the MapReduce model to solve two important spatial problems: (a) bulk-construction of R-Trees and (b) aerial image quality computation, which involve vector and raster data, respectively. We present our results on the scalability of MapReduce, and the effect of parallelism on the quality of the results. Our algorithms were executed on a Google&IBM cluster, which became available to us through an NSF-supported program. The cluster supports the Hadoop framework --- an open source implementation of MapReduce. Our results confirm the excellent scalability of the MapReduce framework in processing parallelizable problems.

ACM Transactions on Database Systems | 2008

Authority-based keyword search in databases

Vagelis Hristidis; Heasoo Hwang; Yannis Papakonstantinou

Our system applies authority-based ranking to keyword search in databases modeled as labeled graphs. Three ranking factors are used: the relevance to the query, the specificity and the importance of the result. All factors are handled using authority-flow techniques that exploit the link-structure of the data graph, in contrast to traditional Information Retrieval. We address the performance challenges in computing the authority flows in databases by using precomputation and exploiting the database schema if present. We conducted user surveys and performance experiments on multiple real and synthetic datasets, to assess the semantic meaningfulness and performance of our system.

ACM Transactions on Database Systems | 2006

Probabilistic information retrieval approach for ranking of database query results

Surajit Chaudhuri; Gautam Das; Vagelis Hristidis; Gerhard Weikum

We investigate the problem of ranking the answers to a database query when many tuples are returned. In particular, we present methodologies to tackle the problem for conjunctive and range queries, by adapting and applying principles of probabilistic models from information retrieval for structured data. Our solution is domain independent and leverages data and workload statistics and correlations. We evaluate the quality of our approach with a user survey on a real database. Furthermore, we present and experimentally evaluate algorithms to efficiently retrieve the top ranked results, which demonstrate the feasibility of our ranking system.

Explore More