Alex Thomo
University of Victoria
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Alex Thomo.
symposium on principles of database systems | 2003
Gösta Grahne; Alex Thomo
In this paper we consider general path constraints for semistructured databases. Our general constraints do not suffer from the limitations of the path constraints previously studied in the literature. We investigate the containment of regular path queries under general path constraints. We show that when the path constraints and queries are expressed by words, as opposed to languages, the containment problem becomes equivalent to the word rewrite problem for a corresponding semi-Thue system. Consequently, if the corresponding semi-Thue system has an undecidable word problem, the word query containment problem will be undecidable too. Also, we show that there are word constraints, where the corresponding semi-Thue system has a decidable word rewrite problem, but the general query containment under these word constraints is undecidable. In order to overcome this, we exhibit a large, practical class of word constraints with a decidable general query containment problem.Based on the query containment under constraints, we reason about constrained rewritings -using views- of regular path queries. We give a constructive characterization for computing optimal constrained rewritings using views.
Theoretical Computer Science | 2003
Gösta Grahne; Alex Thomo
Rewriting queries using views is a powerful technique that has applications in query optimization, data integration, data warehousing etc. Query rewriting in relational databases is by now rather well investigated. However, in the framework of semistructured data the problem of rewriting has received much less attention. In this paper we focus on extracting as much information as possible from algebraic rewritings for the purpose of optimizing regular path queries. The cases when we can find a complete exact rewriting of a query using a set a views are very ideal. However, there is always information available in the views, even if this information is only partial. We introduce lower and possibility partial rewritings and provide algorithms for computing them. These rewritings are algebraic in their nature, i.e. we use only the algebraic view definitions for computing the rewritings. This fact makes them a main memory product which can be used for reducing secondary memory and remote access. We give two algorithms for utilizing the partial lower and partial possibility rewritings in the context of query optimization.
Social Network Analysis and Mining | 2013
Nikolay Korovaiko; Alex Thomo
Trust relationships between users in various online communities are notoriously hard to model for computer scientists. It can be easily verified that trying to infer trust based on the social network alone is often inefficient. Therefore, the avenue we explore is applying Data Mining algorithms to unearth latent relationships and patterns from background data. In this paper, we focus on a case where the background data are user ratings for online product reviews. We consider as a testing ground a large dataset provided by Epinions.com that contains a trust network as well as user ratings for reviews on products from a wide range of categories. In order to predict trust we define and compute a critical set of features, which we show to be highly effective in providing the basis for trust predictions. Then, we show that state-of-the-art classifiers can do an impressive job in predicting trust based on our extracted features. For this, we employ a variety of measures to evaluate the classification based on these features. We show that by carefully collecting and synthesizing readily available background information, such as ratings for online reviews, one can accurately predict social links based on trust.
conference on information and knowledge management | 2008
Marina Barsky; Ulrike Stege; Alex Thomo; Chris Upton
We propose a new method to build persistent suffix trees for indexing the genomic data. Our algorithm DiGeST (Disk-Based Genomic Suffix Tree) improves significantly over previous work in reducing the random access to the input string and performing only two passes over disk data. DiGeST is based on the two-phase multi-way merge sort paradigm using a concise binary representation of the DNA alphabet. Furthermore, our method scales to larger genomic data than managed before.
acm symposium on applied computing | 2005
Dan C. Stefanescu; Alex Thomo; Lida Thomo
Nowadays, we are required to deal with more complex data, prime examples of which are data on the Web, XML data, biological data, etc. There are already proposed abstractions to handle these kinds of data, in particular in terms of semistructured data models. A semistructured model conceives a database essentially as a finite directed labeled graph whose nodes represent objects, and whose edges represent relationships between objects. In this paper, we focus on path queries, which are considered the basic querying mechanism for semistructured data. In essence, such queries are used to navigate, or discover paths that conform to specifications captured by regular expressions. In order to make the navigation more useful, we consider generalized path queries, in which the symbols could optionally be weighted by numbers. Such numbers can express a variety of information about the data that the query could possibly match or navigate.Motivated by the plethora of todays applications utilizing Web services and peer-to-peer architectures, we present a distributed algorithm for evaluating generalized path queries. We follow a realistic model with distributed (non-shared) memory and message-passing between processors. An optimal solution to the problem lies in the intersection of ideas related to distributed query evaluation, distributed shortest path computation, and queueing systems.
international workshop on the web and databases | 2000
Gösta Grahne; Alex Thomo
Rewriting queries using views is a powerful technique that has applications in data integration, data warehousing and query optimization. Query rewriting in relational databases is by now rather well investigated. However, in the framework of semistructured data the problem of rewriting has received much less attention. In this paper we identify some difficulties with currently known methods for using rewritings in semistructured databases. We study the problem in a realistic setting, proposed in information integration systems such as the Information Manifold, in which the data sources are modelled as sound views over a global schema. We give a new rewriting, which we call the possibility rewriting, that can be used in pruning the search space when answering queries using views. The possibility rewriting can be computed in time polynomial in the size of the original query and the view definitions. Finally, we show by means of a realistic example that our method can reduce the search space by an order of magnitude.
international conference on database theory | 2007
Gösta Grahne; Alex Thomo; William W. Wadge
In this paper, we introduce preferential regular path queries. These are regular path queries whose symbols are annotated with preference weights for “scaling” up or down the intrinsic importance of matching a symbol against a (semistructured) database edge label. Annotated regular path queries are expressed syntactically as annotated regular expressions. We interpret these expressions in a uniform semiring framework, which allows different semantics specializations for the same syntactic annotations. For our preference queries, we study three important aspects: (1) (progressive) query answering (2) (certain) query answering in LAV data-integration systems, and (3) query containment and equivalence. In all of these, we obtain important positive results, which encourage the use of our preference framework for enhanced querying of semistructured databases.
Annals of Mathematics and Artificial Intelligence | 2006
Gösta Grahne; Alex Thomo
We give a general framework for approximate query processing in semistructured databases. We focus on regular path queries, which are the integral part of most of the query languages for semistructured databases. To enable approximations, we allow the regular path queries to be distorted. The distortions are expressed in the system by using weighted regular expressions, which correspond to weighted regular transducers. After defining the notion of weighted approximate answers we show how to compute them in order of their proximity to the query. In the new approximate setting, query containment has to be redefined in order to take into account the quantitative proximity information in the query answers. For this, we define the approximate containment, and its variants k-containment and reliable contain-ment. Then, we give an optimal algorithm for deciding the k-containment. Regarding the reliable approximate containment, we show that it is polynomial time equivalent to the notorious limitedness problem in distance automata.
foundations of information and knowledge systems | 2004
Gösta Grahne; Alex Thomo
We give a general framework for approximate query processing in semistructured databases. We focus on regular path queries, which are the integral part of most of the query languages for semistructured databases. To enable approximations, we allow the regular path queries to be distorted. The distortions are expressed in the system by using weighted regular expressions, which correspond to weighted regular transducers. After defining the notion of weighted approximate answers we show how to compute them in order of their proximity to the query. In the new approximate setting, query containment has to be redefined in order to take into account the quantitative proximity information in the query answers. For this, we define approximate containment, and its variants k-containment and reliable containment. Then, we give an optimal algorithm for deciding the k-containment. Regarding the reliable approximate containment, we show that it is polynomial time equivalent to the notorious limitedness problem in distance automata.
conference on information and knowledge management | 2009
Marina Barsky; Ulrike Stege; Alex Thomo; Chris Upton
A suffix tree is a fundamental data structure for string searching algorithms. Unfortunately, when it comes to the use of suffix trees in real-life applications, the current methods for constructing suffix trees do not scale for large inputs. All the existing practical algorithms perform random access to the input string, thus requiring that the input be small enough to be kept in main memory. We are the first to present an algorithm which is able to construct suffix trees for input sequences significantly larger than the size of the available main memory. As a proof of concept, we show that our method allows to build the suffix tree for 12GB of real DNA sequences in 26 hours on a single machine with 2GB of RAM. This input is four times the size of the Human Genome, and the construction of suffix trees for inputs of such magnitude was never reported before.