Davood Rafiei
University of Alberta
Publication
Featured research published by Davood Rafiei.
international conference on management of data | 1997
Davood Rafiei; Alberto O. Mendelzon
We study a set of linear transformations on the Fourier series representation of a sequence that can be used as the basis for similarity queries on time-series data. We show that our set of transformations is rich enough to formulate operations such as moving average and time warping. We present a query processing algorithm that uses the underlying R-tree index of a multidimensional data set to answer similarity queries efficiently. Our experiments show that the performance of this algorithm is competitive with that of processing ordinary (exact match) queries using the index, and much faster than sequential scanning. We relate our transformations to the general framework for similarity queries of Jagadish et al.
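As a rough illustration of why a moving average fits this framework, the sketch below checks that a circular moving average in the time domain equals a coefficient-wise multiplication in the Fourier domain (the convolution theorem). The function names, the circular boundary handling, and the multiplier-only form of the transformation are assumptions made for this example, not details taken from the paper.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(coeffs):
    """Inverse of dft() above."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def circular_moving_average(x, w):
    """Time-domain average of each point with its w - 1 predecessors (wrapping around)."""
    n = len(x)
    return [sum(x[(t - j) % n] for j in range(w)) / w for t in range(n)]


def moving_average_multiplier(n, w):
    """Fourier coefficients of the averaging kernel: one multiplier per frequency."""
    kernel = [1.0 / w if j < w else 0.0 for j in range(n)]
    return dft(kernel)


def moving_average_via_fourier(x, w):
    """Apply the moving average as a linear map on the Fourier coefficients."""
    multiplier = moving_average_multiplier(len(x), w)
    scaled = [a_k * x_k for a_k, x_k in zip(multiplier, dft(x))]
    return [value.real for value in idft(scaled)]
```

Both routes give the same smoothed sequence up to floating-point error, so a query such as "find sequences whose 3-point moving averages are close" can, under these assumptions, be phrased as a transformation applied to the stored Fourier coefficients.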
international world wide web conferences | 2000
Davood Rafiei; Alberto O. Mendelzon
The textual content of the Web enriched with the hyperlink structure surrounding it can be a useful source of information for querying and searching. This paper presents a search process where the input is the URL of a page, and the output is a ranked set of topics on which the page has a reputation. For example, if the input is www.gamelan.com, then a possible output is "Java". We propose several algorithmic formulations of the notion of reputation using simple random walk models of Web-browsing behavior. We give preliminary test results on the effectiveness of these algorithms.
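One way to picture a random walk model of this kind is a surfer who, at each step, either jumps to a random page that mentions the topic or follows an outgoing link. The sketch below is a hypothetical, personalized-PageRank-style formulation written for illustration only; the function name, the `jump_prob` parameter, and the handling of pages without links are assumptions, and it is not claimed to match any of the paper's formulations.

```python
def topic_reputation(links, pages_on_topic, jump_prob=0.15, iterations=50):
    """Estimate how often a topic-biased random surfer visits each page.

    links: dict mapping every page to a list of pages it links to
           (each link target must also be a key of the dict).
    pages_on_topic: non-empty set of pages whose text mentions the topic.
    """
    pages = sorted(links)
    start = 1.0 / len(pages_on_topic)
    rank = {p: (start if p in pages_on_topic else 0.0) for p in pages}

    for _ in range(iterations):
        nxt = {p: 0.0 for p in pages}
        for p in pages:
            mass = (1.0 - jump_prob) * rank[p]
            out = links[p]
            if out:
                # Follow one of the outgoing links uniformly at random.
                for q in out:
                    nxt[q] += mass / len(out)
            else:
                # Assumption: a surfer stuck on a dead end restarts on a topic page.
                for q in pages_on_topic:
                    nxt[q] += mass * start
        # With probability jump_prob, jump to a random page mentioning the topic.
        for q in pages_on_topic:
            nxt[q] += jump_prob * start
        rank = nxt
    return rank
```

Running this once per candidate topic and sorting the topics by the score a given URL receives would yield a ranked topic list of the kind the abstract describes, under the stated assumptions.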
international conference on management of data | 2006
Fan Deng; Davood Rafiei
Traditional duplicate elimination techniques are not applicable to many data stream applications. In general, precisely eliminating duplicates in an unbounded data stream is not feasible in many streaming scenarios. Therefore, we aim to approximately eliminate duplicates in streaming environments given limited space. Based on a well-known bitmap sketch, we introduce a data structure, the Stable Bloom Filter (SBF), and a novel and simple algorithm. The basic idea is as follows: since there is no way to store the whole history of the stream, the SBF continuously evicts stale information to make room for more recent elements. After deriving some properties of the SBF analytically, we show that a tight upper bound on the false positive rate is guaranteed. In our empirical study, we compare the SBF to alternative methods. The results show that our method is superior in terms of both accuracy and time efficiency when a fixed small space and an acceptable false positive rate are given.
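The eviction idea can be sketched as a Bloom-filter-like array of small counters: before an element is recorded, a few randomly chosen cells are decremented so that old information gradually fades. The class below is a minimal sketch under assumed parameters (`num_cells`, `num_hashes`, `max_value`, `decrements`) and an assumed SHA-256-based hashing scheme; it is not the authors' implementation and makes no claim about the paper's parameter choices.

```python
import hashlib
import random


class StableBloomFilter:
    """Approximate duplicate detector that forgets stale elements over time."""

    def __init__(self, num_cells=1024, num_hashes=3, max_value=3,
                 decrements=10, seed=0):
        self.cells = [0] * num_cells      # small counters instead of single bits
        self.num_hashes = num_hashes      # cells probed per element
        self.max_value = max_value        # value a cell is reset to on insert
        self.decrements = decrements      # random cells aged per insert
        self.rng = random.Random(seed)

    def _indexes(self, item):
        # Derive num_hashes cell positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.cells)

    def seen_then_add(self, item):
        """Return True if item looks like a duplicate, then record it."""
        idx = list(self._indexes(item))
        duplicate = all(self.cells[i] > 0 for i in idx)

        # Evict stale information: age a few randomly chosen cells.
        for _ in range(self.decrements):
            j = self.rng.randrange(len(self.cells))
            if self.cells[j] > 0:
                self.cells[j] -= 1

        # Refresh the cells of the current element.
        for i in idx:
            self.cells[i] = self.max_value
        return duplicate
```

Because cells are aged on every insert, an element that has not been seen for a long time may be reported as new again (a false negative), which is the price paid for keeping space fixed on an unbounded stream.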
international conference on data engineering | 1999
Davood Rafiei
We study similarity queries for time series data, where similarity is defined in terms of a set of linear transformations on the Fourier series representation of a sequence. We have shown in an earlier work that this set of transformations is rich enough to formulate operations such as moving average and time scaling. In this paper, we present a new algorithm for processing queries that define similarity in terms of multiple transformations instead of a single one. The idea is, instead of searching the index multiple times and each time applying a single transformation, to search the index only once and apply a collection of transformations simultaneously to the index. Our experimental results on both synthetic and real data show that the new algorithm for simultaneously processing multiple transformations is much faster than sequential scanning or index traversal using one transformation at a time. We also examine the possibility of composing transformations in a query or of rewriting a query expression such that the resulting query can be efficiently evaluated.
international world wide web conferences | 2010
Davood Rafiei; Krishna Bharat; Anand Shukla
Result diversity is a topic of great importance as more facets of queries are discovered and users expect to find their desired facets in the first page of the results. However, the underlying questions of how diversity interplays with quality and when preference should be given to one or both are not well understood. In this work, we model the problem as expectation maximization and study the challenges of estimating the model parameters and reaching an equilibrium. One model parameter, for example, is the correlation between pages, which we estimate using the textual contents of pages and click data (when available). We conduct experiments on diversifying randomly selected queries from a query log and queries chosen from the disambiguation topics of Wikipedia. Our algorithm improves upon Google in terms of the diversity of random queries, retrieving 14% to 38% more aspects of queries in the top 5, while maintaining a precision very close to Google. On a more selective set of queries that are expected to benefit from diversification, our algorithm improves upon Google in terms of precision and diversity of the results, and significantly outperforms another baseline system for result diversification.
IEEE Transactions on Knowledge and Data Engineering | 2000
Davood Rafiei; Alberto O. Mendelzon
We study similarity queries for time series data where similarity is defined, in a fairly general way, in terms of a distance function and a set of affine transformations on the Fourier series representation of a sequence. We identify a safe set of transformations supporting a wide variety of comparisons and show that this set is rich enough to formulate operations such as moving average and time scaling. We also show that queries expressed using safe transformations can efficiently be computed without prior knowledge of the transformations. We present a query processing algorithm that uses the underlying multidimensional index built over the data set to efficiently answer similarity queries. Our experiments show that the performance of this algorithm is competitive with that of processing ordinary (exact match) queries using the index, and much faster than sequential scanning. We propose a generalization of this algorithm for handling multiple transformations simultaneously, and give experimental results on the performance of the generalized algorithm.
ieee visualization | 2005
Davood Rafiei
We study the problem of visualizing large networks and develop techniques for effectively abstracting a network and reducing the size to a level that can be clearly viewed. Our size reduction techniques are based on sampling, where only a sample instead of the full network is visualized. We propose a randomized notion of focus that specifies a part of the network and the degree to which it needs to be magnified. Visualizing a sample allows our method to overcome the scalability issues inherent in visualizing massive networks. We report some characteristics that frequently occur in large networks and the conditions under which they are preserved when sampling from a network. This can be useful in selecting a proper sampling scheme that yields a sample with characteristics similar to those of the original network. Our method is built on top of a relational database, so it can be easily and efficiently implemented using any off-the-shelf database software. As a proof of concept, we implement our methods and report some of our experiments over the movie database and the connectivity graph of the Web.
Social Network Analysis and Mining | 2014
Aibek Makazhanov; Davood Rafiei; Muhammad Waqar
We study the problem of predicting the political preference of users on the Twitter network, showing that the political preference of users can be predicted from their Twitter behavior towards political parties. We show this by building prediction models based on a variety of contextual and behavioral features, training the models by resorting to a distant supervision approach and considering party candidates to have a predefined preference towards their respective parties. A language model for each party is learned from the content of the tweets by the party candidates, and the preference of a user is assessed based on the alignment of user tweets with the language models of the parties. We evaluate our work in the context of two real elections: the 2012 Albertan and 2013 Pakistani general elections. In both cases, we show that our model outperforms, in terms of the F-measure, sentiment and text classification approaches and is on par with the human annotators. We further use our model to analyze the preference changes over the course of the election campaign and report results that would be difficult to attain by human annotators.
very large data bases | 2008
Reza Sherkat; Davood Rafiei
We study the problem of efficiently evaluating similarity queries on histories, where a history is a d-dimensional time series for d ≥ 1. While there are some solutions for time-series and spatio-temporal trajectories where typically d ≤ 3, we are not aware of any work that examines the problem for larger values of d. In this paper, we address the problem in its general case and propose a class of summaries for histories with a few interesting properties. First, for commonly used distance functions such as the Lp-norm, LCSS, and DTW, the summaries can be used to efficiently prune some of the histories that cannot be in the answer set of the queries. Second, histories can be indexed based on their summaries, hence the qualifying candidates can be efficiently retrieved. To further reduce the number of unnecessary distance computations for false positives, we propose a finer level approximation of histories, and an algorithm to find an approximation with the least maximum distance estimation error. Experiments on real and synthetic datasets of 17-dimensional histories confirm that the combination of our feature extraction approaches and the indexability of our summaries can improve upon existing methods and scale up to larger values of d and larger database sizes.
database and expert systems applications | 2006
Davood Rafiei; Daniel L. Moise; Dabo Sun
Detecting structural similarities between XML documents has been the subject of several recent works, and the proposed algorithms mostly use tree edit distance between the corresponding trees of XML documents. However, evaluating a tree edit distance is computationally expensive and does not easily scale up to large collections. We show in this paper that a tree edit distance computation is often not necessary and can be avoided. In particular, we propose a concise structural summary of XML documents and show that a comparison based on this summary is both fast and effective. Our experimental evaluation shows that this method does an excellent job of grouping documents generated by the same DTD, outperforming some of the previously proposed solutions based on a tree comparison. Furthermore, the time complexity of the algorithm is linear in the size of the structural description.
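To give a feel for comparing documents through a summary rather than a tree edit distance, the sketch below reduces each document to its set of distinct root-to-node tag paths and compares two documents with the Jaccard coefficient of those sets. Both the path-set summary and the Jaccard measure are assumptions chosen for illustration; the abstract does not specify the form of the paper's summary, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def path_summary(xml_text):
    """Collect the distinct root-to-node tag paths of an XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)  # repeated siblings collapse into a single path
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def summary_similarity(xml_a, xml_b):
    """Jaccard similarity of two path summaries, a value in [0, 1]."""
    a, b = path_summary(xml_a), path_summary(xml_b)
    return len(a & b) / len(a | b)
```

Each document is traversed once to build its summary, and repeated elements do not enlarge it, so documents produced from the same schema tend to share most of their paths regardless of how many records they contain.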