Jan Daciuk
Gdańsk University of Technology
Publication
Featured research published by Jan Daciuk.
Algorithmica | 2009
Rafael C. Carrasco; Jan Daciuk; Mikel L. Forcada
We describe an algorithm that allows the incremental addition or removal of unranked ordered trees to a minimal frontier-to-root deterministic finite-state tree automaton (DTA). The algorithm takes a tree t and a minimal DTA A as input; it outputs a minimal DTA A′ which accepts the language L(A) accepted by A incremented (or decremented) with the tree t. The algorithm can be used to efficiently maintain dictionaries which store large collections of trees or tree fragments.
International Conference on Implementation and Application of Automata | 2007
Rafael C. Carrasco; Jan Daciuk; Mikel L. Forcada
A frontier-to-root deterministic finite-state tree automaton (DTA) can be used as a compact data structure to store collections of unranked ordered trees. DTAs are usually sparser than string automata, as most transitions are undefined; therefore, special care must be taken in order to minimize them efficiently. However, it is difficult to find simple and detailed descriptions of the minimization procedure in the published literature. Here, we fully describe a simple implementation of the standard minimization algorithm that runs in O(|A|^2) time, with |A| being the size of the DTA.
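The idea of storing a collection of unranked ordered trees in a frontier-to-root automaton can be illustrated with a minimal sketch. It is not the authors' implementation and it does not minimize anything: it only assigns one state per distinct subtree, so repeated subtrees share a transition. The class name `TreeStore`, the tuple encoding of trees as `(label, children)`, and the method names are all hypothetical choices made for this example.

```python
class TreeStore:
    """Toy frontier-to-root DTA for a finite set of unranked ordered trees.

    A tree is a pair (label, children), where children is a tuple of trees.
    A transition maps (label, tuple of child states) to a state.
    """

    def __init__(self):
        self.delta = {}     # (label, child states) -> state number
        self.final = set()  # states reached at the root of a stored tree

    def _state(self, tree, create):
        label, children = tree
        child_states = []
        for child in children:
            q = self._state(child, create)
            if q is None:            # undefined transition below: reject
                return None
            child_states.append(q)
        key = (label, tuple(child_states))
        if key not in self.delta:
            if not create:
                return None
            self.delta[key] = len(self.delta)
        return self.delta[key]

    def add(self, tree):
        """Store a tree, reusing states of subtrees already seen."""
        self.final.add(self._state(tree, True))

    def accepts(self, tree):
        """Run the automaton bottom-up and test the root state."""
        q = self._state(tree, False)
        return q is not None and q in self.final


noun_phrase = ("NP", (("N", ()),))
sentence = ("S", (noun_phrase, ("VP", (("V", ()), noun_phrase))))

store = TreeStore()
store.add(sentence)
```

In this example the tree has seven nodes but only five transitions are created, because the two identical `NP` subtrees map to the same state. A minimal DTA could be smaller still, since minimization also merges states that are distinct subtrees but behave identically in every context.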
Intelligent Information Systems | 2004
Jan Daciuk
Finite-state machines are widely used as dictionaries in natural language processing. They offer fast processing time and low memory requirements. We present a new algorithm for adding new words to the language of a cyclic finite-state automaton. The algorithm is an extension to cyclic automata of Watson's semi-incremental algorithm for acyclic automata. The conversion is done in the spirit of Carrasco and Forcada's algorithm for adding new words to cyclic automata. The new algorithm makes use of sorted data in order not to reprocess states that are added or modified as the result of adding the whole set of words. This should make it faster than Carrasco and Forcada's algorithm.
Theoretical Computer Science | 2012
Jan Daciuk; Dawid Weiss
This paper is a follow-up to Jan Daciuk's experiments on space-efficient finite-state automata representation that can be used directly for traversals in main memory (Daciuk, 2000) [4]. We investigate several techniques for reducing the memory footprint of minimal automata, mainly exploiting the fact that transition labels and transition pointer offset values are not evenly distributed and so are suitable for compression. We achieve a size gain of around 20%-30% compared to the original representation given in [4]. This result is comparable to state-of-the-art dictionary compression techniques such as the LZ-trie method (Ristov and Laporte, 1999) [15], but remains memory and CPU efficient during construction.
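One generic way to exploit an uneven distribution of pointer offsets is a variable-length integer code, in which small values take fewer bytes than large ones. The sketch below shows such a code (seven payload bits per byte, with the high bit marking continuation). It is offered only as an illustration of the general idea; the abstract does not say which encodings the paper uses, and the function names are hypothetical.

```python
def encode_varint(n):
    """Encode a non-negative integer, least significant 7-bit group first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more groups follow
        else:
            out.append(byte)         # last group: high bit clear
            return bytes(out)


def decode_varint(data, pos=0):
    """Decode one integer starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

With this scheme any offset below 128 occupies a single byte, so a representation in which most transitions point to nearby states shrinks compared with fixed-width pointers, while decoding remains a simple loop that can run during traversal.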
Intelligent Information Systems | 2005
Jan Daciuk; Denis Maurel; Agata Savary
Minimal perfect hashing provides a mapping between a set of n unique words and n consecutive numbers. When implemented with minimal finite-state automata, the mapping is determined only by the (usually alphabetical) order of words in the set. Addition of new words would change the order of words already in the language of the automaton, changing the whole mapping, and making it useless in many domains. Therefore, we call it static. Dynamic minimal perfect hashing assigns consecutive numbers to consecutive words as they are added to the language of the automaton. Dynamic perfect hashing is important in many domains, including text retrieval and databases. We investigate three methods for its implementation.
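The static scheme described above can be sketched as follows. For simplicity the sketch uses a trie, which is a deterministic acyclic automaton that has not been minimized; the numbering technique (store in each state the number of words recognized from it, and add up the counts of the branches skipped during the lookup) works the same way on a minimal automaton. All names (`State`, `build`, `word_to_number`) are hypothetical and chosen for this example only.

```python
class State:
    def __init__(self):
        self.edges = {}     # label -> target State
        self.final = False
        self.count = 0      # number of words recognized from this state


def _fill_counts(state):
    state.count = int(state.final) + sum(
        _fill_counts(target) for target in state.edges.values())
    return state.count


def build(words):
    """Build a trie for the words and annotate each state with its count."""
    root = State()
    for word in words:
        state = root
        for ch in word:
            state = state.edges.setdefault(ch, State())
        state.final = True
    _fill_counts(root)
    return root


def word_to_number(root, word):
    """Return the alphabetical rank of word (0-based), or None if absent."""
    index = 0
    state = root
    for ch in word:
        if ch not in state.edges:
            return None
        if state.final:
            index += 1      # the prefix read so far is itself a smaller word
        for label in sorted(state.edges):
            if label == ch:
                break
            index += state.edges[label].count  # skip words in earlier branches
        state = state.edges[ch]
    return index if state.final else None


before = build(["bat", "car", "card", "cat"])
after = build(["bar", "bat", "car", "card", "cat"])
```

Here `word_to_number(before, "bat")` is 0, but after the word "bar" is added the same lookup on `after` gives 1, and every later word shifts as well. This is the sense in which the mapping is static: the numbers depend only on the alphabetical order of the whole set.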
International Conference on Implementation and Application of Automata | 2005
Jan Daciuk; Denis Maurel; Agata Savary
Pseudo-minimal automata ([1],[2]) are minimal acyclic automata that have a proper element (a transition or a state) for each word belonging to the language of the automaton. That proper element is not shared with any other word, and it can be used for implementing a function on words belonging to the language. For instance, dynamic perfect hashing (e.g. a mapping from n unique words to n consecutive numbers, such that addition of new elements does not change the order of the previous elements) can be implemented using a pseudo-minimal automaton ([3]).
Theoretical Informatics and Applications | 2009
Rafael C. Carrasco; Jan Daciuk
We describe a technique that maps unranked trees to arbitrary hash codes using a bottom-up deterministic tree automaton (DTA). In contrast to other hashing techniques based on automata, our procedure builds a pseudo-minimal DTA for this purpose. A pseudo-minimal automaton may be larger than the minimal one accepting the same language but, in turn, it contains proper elements (states or transitions which are unique) for every input accepted by the automaton. Therefore, pseudo-minimal DTA are a suitable structure to implement stable hashing schemes, that is, schemes where the output for every key can be determined prior to the automaton construction. We provide incremental procedures to build the pseudo-minimal DTA and the mapping that associates an integer value to every transition that will be used to compute the hash codes. This incremental construction allows for the incorporation of new trees and their hash codes without the need to rebuild the whole DTA from scratch.
Archive | 2008
Jan Daciuk; Rafael C. Carrasco
Treizième conférence annuelle sur le traitement automatique des langues naturelles | 2006
Denis Maurel; Jan Daciuk
Lecture Notes in Computer Science | 2006
Jan Daciuk; Denis Maurel; Agata Savary