Gösta Grahne | Researchain

Publication

Featured research published by Gösta Grahne.

IEEE Transactions on Knowledge and Data Engineering | 2005

Fast algorithms for frequent itemset mining using FP-trees

Gösta Grahne; Jianfei Zhu

Efficient algorithms for mining frequent itemsets are crucial for mining association rules as well as for many other data mining tasks. Methods for mining frequent itemsets have been implemented using a prefix-tree structure, known as an FP-tree, for storing compressed information about frequent itemsets. Numerous experimental results have demonstrated that these algorithms perform extremely well. In this paper, we present a novel FP-array technique that greatly reduces the need to traverse FP-trees, thus obtaining significantly improved performance for FP-tree-based algorithms. Our technique works especially well for sparse data sets. Furthermore, we present new algorithms for mining all, maximal, and closed frequent itemsets. Our algorithms use the FP-tree data structure in combination with the FP-array technique efficiently and incorporate various optimization techniques. We also present experimental results comparing our methods with existing algorithms. The results show that our methods are the fastest for many cases. Even though the algorithms consume much memory when the data sets are sparse, they are still the fastest ones when the minimum support is low. Moreover, they are always among the fastest algorithms and consume less memory than other methods when the data sets are dense.
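As a rough illustration of the prefix-tree structure the abstract refers to, the following minimal sketch builds a basic FP-tree in two passes over a toy data set. It does not implement the FP-array technique or the mining algorithms of the paper; the names `FPNode`, `build_fp_tree`, and `count_nodes` and the sample transactions are assumptions made for this example.

```python
from collections import Counter


class FPNode:
    """One node of an FP-tree: an item, a count, and child links."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build a prefix tree over the frequent items of each transaction."""
    # First pass: count item supports (each item at most once per transaction).
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    root = FPNode(None)
    # Second pass: insert each transaction with its frequent items sorted by
    # descending support (ties broken by item name) so shared prefixes merge.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent


def count_nodes(node):
    """Number of item nodes strictly below the given node."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Hypothetical toy data set: four transactions over items a-d.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"]]
root, frequent = build_fp_tree(transactions, min_support=2)
```

With a minimum support of 2, item `d` is dropped and the three transactions that start with `a` share a single branch, which is the compression effect the abstract describes.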

Theoretical Computer Science | 1991

On the representation and querying of sets of possible worlds

Serge Abiteboul; Paris C. Kanellakis; Gösta Grahne

We represent a set of possible worlds using an incomplete information database. The representation techniques that we study range from the very simple Codd-table (a relation over constants and uniquely occurring variables called nulls) to much more complex mechanisms involving views of conditioned-tables (programs applied to Codd-tables augmented by equality and inequality conditions). (1) We provide matching upper and lower bounds on the data-complexity of testing containment, membership and uniqueness for sets of possible worlds. We fully classify these problems with respect to our representations. (2) We investigate the data-complexity of querying incomplete information databases for both possible and certain facts. For each fixed positive existential query on conditioned-tables we present a polynomial time algorithm solving the possible fact problem. We match this upper bound by two NP-completeness lower bounds, when the fixed query contains either negation or recursion and is applied to Codd-tables. Finally, we show that the certain fact problem is coNP-complete, even for a fixed first order query applied to a Codd-table.
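To make the notions of possible and certain facts concrete, here is a small brute-force sketch over a Codd-table. It assumes a finite domain of constants (the abstract does not restrict the domain this way), considers only membership of a tuple in the table itself rather than in a query answer, and enumerates all valuations, so it is exponential in the number of nulls. The names `Null`, `possible_worlds`, `is_possible`, and `is_certain` are hypothetical.

```python
from itertools import product


class Null:
    """A null (variable) in a Codd-table; each instance occurs exactly once."""

    def __init__(self, name):
        self.name = name


def possible_worlds(table, domain):
    """Yield every complete relation obtained by valuating the nulls."""
    nulls = [v for row in table for v in row if isinstance(v, Null)]
    for values in product(domain, repeat=len(nulls)):
        # Null objects hash by identity, so each null gets its own value.
        valuation = dict(zip(nulls, values))
        yield frozenset(
            tuple(valuation[v] if isinstance(v, Null) else v for v in row)
            for row in table
        )


def is_possible(fact, table, domain):
    """True if the fact holds in at least one possible world."""
    return any(fact in world for world in possible_worlds(table, domain))


def is_certain(fact, table, domain):
    """True if the fact holds in every possible world."""
    return all(fact in world for world in possible_worlds(table, domain))


# Hypothetical example: bob's course is unknown (a null).
table = [("alice", "math"), ("bob", Null("x"))]
domain = ["math", "physics"]  # assumed finite domain for the null
```

Here `("bob", "physics")` is possible but not certain, while `("alice", "math")` is certain because it contains no nulls.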

International Conference on Database Theory | 1999

Tableau Techniques for Querying Information Sources through Global Schemas

Gösta Grahne; Alberto O. Mendelzon

The foundational homomorphism techniques introduced by Chandra and Merlin for testing containment of conjunctive queries have recently attracted renewed interest due to their central role in information integration applications. We show that generalizations of the classical tableau representation of conjunctive queries are useful for computing query answers in information integration systems where information sources are modeled as views defined on a virtual global schema. We consider a general situation where sources may or may not be known to be correct and complete. We characterize the set of answers to a global query and give algorithms to compute a finite representation of this possibly infinite set, as well as its certain and possible approximations. We show how to rewrite a global query in terms of the sources in two special cases, and show that one of these is equivalent to the Information Manifold rewriting of Levy et al.
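The classical containment test mentioned at the start of the abstract can be sketched as follows: a conjunctive query q1 is contained in q2 exactly when some homomorphism maps the body of q2 into the body of q1 and the head of q2 onto the head of q1. This sketch is a brute-force search, assumes queries contain only variables (no constants) and heads of equal arity, and uses a hypothetical encoding of a query as a `(head, body)` pair; it does not cover the generalized tableaux of the paper.

```python
from itertools import product


def contained_in(q1, q2):
    """Return True if q1 is contained in q2 (homomorphism from q2 to q1)."""
    head1, body1 = q1
    head2, body2 = q2
    vars2 = sorted({v for _, args in body2 for v in args} | set(head2))
    terms1 = sorted({v for _, args in body1 for v in args} | set(head1))
    atoms1 = {(pred, tuple(args)) for pred, args in body1}

    # Try every mapping from the variables of q2 to the variables of q1.
    for image in product(terms1, repeat=len(vars2)):
        h = dict(zip(vars2, image))
        # The head of q2 must be mapped exactly onto the head of q1.
        if tuple(h[v] for v in head2) != tuple(head1):
            continue
        # Every body atom of q2 must land on some body atom of q1.
        if all((pred, tuple(h[v] for v in args)) in atoms1
               for pred, args in body2):
            return True
    return False


# Q1(x) :- E(x, y), E(y, z)   "x starts a path of length two"
q_path2 = (("x",), [("E", ("x", "y")), ("E", ("y", "z"))])
# Q2(x) :- E(x, u)            "x has an outgoing edge"
q_edge = (("x",), [("E", ("x", "u"))])
```

Every node that starts a path of length two has an outgoing edge, so `contained_in(q_path2, q_edge)` holds, while the converse containment does not.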

International Conference on Data Engineering | 2000

Efficient mining of constrained correlated sets

Gösta Grahne; Laks V. S. Lakshmanan; Xiaohong Wang

Studies the problem of efficiently computing correlated item sets satisfying given constraints. We call them valid correlated item sets. It turns out that constraints can have subtle interactions with correlated item sets, depending on their underlying properties. We show that, in general, the set of minimal valid correlated item sets does not coincide with that of minimal correlated item sets that are valid, and we characterize classes of constraints for which these sets coincide. We delineate the meaning of these two spaces and give algorithms for computing them. We also give an analytical evaluation of their performance and validate our analysis with a detailed experimental evaluation.

Archive | 1991

The problem of incomplete information in relational databases

Gösta Grahne

Relational databases.- Semantic aspects of incomplete information.- Syntactic and algorithmic aspects of incomplete information.- Computational complexity aspects of incomplete information.- Some conclusive aspects.

Symposium on Principles of Database Systems | 2003

Query containment and rewriting using views for regular path queries under constraints

Gösta Grahne; Alex Thomo

In this paper we consider general path constraints for semistructured databases. Our general constraints do not suffer from the limitations of the path constraints previously studied in the literature. We investigate the containment of regular path queries under general path constraints. We show that when the path constraints and queries are expressed by words, as opposed to languages, the containment problem becomes equivalent to the word rewrite problem for a corresponding semi-Thue system. Consequently, if the corresponding semi-Thue system has an undecidable word problem, the word query containment problem will be undecidable too. Also, we show that there are word constraints, where the corresponding semi-Thue system has a decidable word rewrite problem, but the general query containment under these word constraints is undecidable. In order to overcome this, we exhibit a large, practical class of word constraints with a decidable general query containment problem. Based on the query containment under constraints, we reason about constrained rewritings, using views, of regular path queries. We give a constructive characterization for computing optimal constrained rewritings using views.
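The word rewrite problem for a semi-Thue system asks whether one word can be transformed into another by repeatedly replacing an occurrence of a rule's left-hand side with its right-hand side. Because the problem is undecidable in general, the sketch below is only a bounded breadth-first search: it answers `True` or `False` when it can and `None` when it gives up. The function name `rewrites_to`, the `max_steps` bound, and the example rules are assumptions for illustration and are not taken from the paper.

```python
from collections import deque


def rewrites_to(source, target, rules, max_steps=10_000):
    """Bounded search for a derivation of target from source.

    Returns True if a derivation is found, False if every reachable word
    was explored without finding target, and None if the bound was hit.
    """
    seen = {source}
    queue = deque([source])
    steps = 0
    while queue:
        word = queue.popleft()
        if word == target:
            return True
        steps += 1
        if steps > max_steps:
            return None  # undecidable in general, so give up explicitly
        for lhs, rhs in rules:
            # Apply the rule at every position where lhs occurs.
            start = word.find(lhs)
            while start != -1:
                successor = word[:start] + rhs + word[start + len(lhs):]
                if successor not in seen:
                    seen.add(successor)
                    queue.append(successor)
                start = word.find(lhs, start + 1)
    return False


# Hypothetical rule sets over the alphabet {a, b}.
swap_rules = [("ab", "ba")]   # move a's to the right of b's
grow_rules = [("a", "aa")]    # words can only grow, search never ends
```

With `swap_rules`, `"aabb"` rewrites to `"bbaa"` but `"ba"` does not rewrite to `"ab"`; with `grow_rules` the set of reachable words is infinite, so a search for an unreachable target stops only because of the bound.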

International Conference on Data Mining | 2004

Mining frequent itemsets from secondary memory

Gösta Grahne; Jianfei Zhu

Mining frequent itemsets is at the core of mining association rules, and is by now quite well understood algorithmically for main memory databases. In this paper, we investigate approaches to mining frequent itemsets when the database or the data structures used in the mining are too large to fit in main memory. Experimental results show that our techniques reduce the required disk accesses by orders of magnitude, and enable truly scalable data mining.

Symposium on Principles of Database Systems | 1994

Reasoning about strings in databases

Gösta Grahne; Matti Nykänen; Esko Ukkonen

In order to enable the database programmer to reason about relations over strings of arbitrary length we introduce alignment logic, a modal extension of relational calculus. In addition to relations, a state in the model consists of a two-dimensional array where the strings are aligned on top of each other. The basic modality in the language (a transpose, or “slide”) allows for a rearrangement of the alignment, and more complex formulas can be formed using a syntax reminiscent of regular expressions, in addition to the usual connectives and quantifiers. It turns out that the computational counterpart of the string-based portion of the logic is the class of multitape two-way finite state automata, which are devices particularly well suited for the implementation of string matching. A computational counterpart of the full logic is obtained from relational algebra by extending the selection operator into filters based on these multitape machines. Safety of formulas in alignment logic implies that new strings generated from old ones have to be of bounded length. While an undecidable property in general, this boundedness is decidable for an important subclass of formulas. As far as expressive power is concerned, alignment logic includes previous proposals for querying string databases, and gives full Turing computability. The language can be restricted to define exactly regular sets and sets in the polynomial hierarchy.

Theoretical Computer Science | 2003

Algebraic rewritings for optimizing regular path queries

Gösta Grahne; Alex Thomo

Rewriting queries using views is a powerful technique that has applications in query optimization, data integration, data warehousing, etc. Query rewriting in relational databases is by now rather well investigated. However, in the framework of semistructured data the problem of rewriting has received much less attention. In this paper we focus on extracting as much information as possible from algebraic rewritings for the purpose of optimizing regular path queries. The cases when we can find a complete exact rewriting of a query using a set of views are ideal. However, there is always information available in the views, even if this information is only partial. We introduce lower and possibility partial rewritings and provide algorithms for computing them. These rewritings are algebraic in their nature, i.e. we use only the algebraic view definitions for computing the rewritings. This fact makes them a main memory product which can be used for reducing secondary memory and remote access. We give two algorithms for utilizing the partial lower and partial possibility rewritings in the context of query optimization.

Symposium on Principles of Database Systems | 1989

Horn tables - an efficient tool for handling incomplete information in databases

Gösta Grahne

The basic semantic assumption is that an incomplete information database is a set of possible worlds (i.e. a set of complete databases). The main issue is then the problem of representing the set of possible worlds in a fashion that is suitable for storing and processing the database. There now exist a variety of representations, including the so-called logical databases [15, 20] and algebraic generalizations of ordinary relations [2, 4, 10, 12]. In particular, [2] presents a hierarchy of representations obtained by allowing variables and conditions on variables as entries in relations.