Marinella Sciortino | Researchain

Publication

Featured research published by Marinella Sciortino.

Journal of the ACM | 2005

Boosting textual compression in optimal linear time

Paolo Ferragina; Raffaele Giancarlo; Giovanni Manzini; Marinella Sciortino

We provide a general boosting technique for Textual Data Compression. Qualitatively, it takes a good compression algorithm and turns it into an algorithm with a better compression performance guarantee. It displays the following remarkable properties: (a) it can turn any memoryless compressor into a compression algorithm that uses the “best possible” contexts; (b) it is very simple and optimal in terms of time; and (c) it admits a decompression algorithm again optimal in time. To the best of our knowledge, this is the first boosting technique displaying these properties.

Technically, our boosting technique builds upon three main ingredients: the Burrows-Wheeler Transform, the Suffix Tree data structure, and a greedy algorithm to process them. Specifically, we show that there exists a proper partition of the Burrows-Wheeler Transform of a string s that shows a deep combinatorial relation with the kth order entropy of s. That partition can be identified via a greedy processing of the suffix tree of s with the aim of minimizing a proper objective function over its nodes. The final compressed string is then obtained by compressing individually each substring of the partition by means of the base compressor we wish to boost.

Our boosting technique is inherently combinatorial because it does not need to assume any prior probabilistic model about the source emitting s, and it does not deploy any training, parameter estimation and learning. Various corollaries are derived from this main achievement. Among the others, we show analytically that using our booster, we get better compression algorithms than some of the best existing ones, that is, LZ77, LZ78, PPMC and the ones derived from the Burrows-Wheeler Transform. Further, we settle analytically some long-standing open problems about the algorithmic structure and the performance of BWT-based compressors. Namely, we provide the first family of BWT algorithms that do not use Move-To-Front or Symbol Ranking as a part of the compression process.

Information Processing Letters | 2003

Burrows-Wheeler transform and Sturmian words

Sabrina Mantaci; Antonio Restivo; Marinella Sciortino

Michael Burrows and David Wheeler introduced in 1994 (cf. [1]) a reversible transformation on strings (BWT from now on) that has aroused considerable interest and curiosity in the field of Data Compression. Such a transformation produces a permutation BWT(w) of an input string w that is easier to compress than the original one, in the sense that there exist very simple universal data compression algorithms providing surprisingly good performance when the original string is preprocessed by applying BWT. Actually, they achieve compression rates that are very close to the best known compression rates. Moreover, due to their simplicity, such algorithms can be implemented with relatively low complexity. The underlying idea consists of grouping together symbols that appear in similar contexts. The output of such a transformation is a sequence in which many instances of the same character are very likely to appear close to one another. A theoretical analysis of some BWT-based algorithms is available in [7].
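The transformation described above can be illustrated with a minimal Python sketch. It assumes the textbook formulation based on sorting all cyclic rotations of w (quadratic space, for illustration only); the function names `bwt` and `inverse_bwt` are hypothetical and are not taken from the paper.

```python
def bwt(w):
    """Return (last column of the sorted rotations, row index of w)."""
    rotations = sorted(w[i:] + w[:i] for i in range(len(w)))
    last = "".join(r[-1] for r in rotations)
    return last, rotations.index(w)


def inverse_bwt(last, index):
    """Rebuild w from the last column and the row index of w."""
    # A stable sort of the positions by character maps each row of the
    # first column to the matching position in the last column.
    order = sorted(range(len(last)), key=lambda i: last[i])
    out = []
    j = index
    for _ in range(len(last)):
        j = order[j]
        out.append(last[j])
    return "".join(out)


if __name__ == "__main__":
    transformed, idx = bwt("banana")
    print(transformed, idx)
    print(inverse_bwt(transformed, idx))
```

The row index is returned together with the permuted string because the permutation alone does not identify which rotation was the input; with both, the transformation can be inverted.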

Theoretical Computer Science | 2007

An extension of the Burrows–Wheeler Transform

Sabrina Mantaci; Antonio Restivo; Giovanna Rosone; Marinella Sciortino

We describe and highlight a generalization of the Burrows-Wheeler Transform (bwt) to a multiset of words. The extended transformation, denoted by ebwt, is reversible. Moreover, it allows one to define a bijection between the words over a finite alphabet A and the finite multisets of conjugacy classes of primitive words in A^*. Besides its mathematical interest, the extended transform can be useful for applications in the context of string processing. In the last part of this paper we illustrate one such application, providing a similarity measure between sequences based on ebwt.
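A rough Python sketch of an extended transform on a multiset of words follows. It rests on assumptions not stated in the abstract: the input words are non-empty and primitive, and conjugates are ordered by comparing their infinite periodic repetitions, which is approximated here by comparing prefixes of length twice the longest word. The function name `ebwt` and the returned origin labels are illustrative choices.

```python
def ebwt(words):
    """Return (output string, origin index of each output character).

    Assumes every word is non-empty and primitive.
    """
    max_len = max(len(w) for w in words)
    rows = []
    for idx, w in enumerate(words):
        for i in range(len(w)):
            conj = w[i:] + w[:i]
            # Compare conjugates through a prefix of their infinite
            # repetition; length 2 * max_len separates distinct ones.
            reps = -(-2 * max_len // len(conj))  # ceiling division
            key = (conj * reps)[: 2 * max_len]
            rows.append((key, conj[-1], idx))
    rows.sort(key=lambda r: r[0])
    return "".join(r[1] for r in rows), [r[2] for r in rows]


if __name__ == "__main__":
    out, origins = ebwt(["ab", "abb"])
    print(out, origins)
```

The origin labels record which input word each output character came from, which is the kind of information a similarity measure based on the transform could draw on.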

Theoretical Computer Science | 2002

Words and forbidden factors

Filippo Mignosi; Antonio Restivo; Marinella Sciortino

Given a finite or infinite word v, we consider the set M(v) of minimal forbidden factors of v. We show that the set M(v) is of fundamental importance in determining the structure of the word v. In the case of a finite word w we consider two parameters that are related to the size of M(w): the first counts the minimal forbidden factors of w and the second gives the length of the longest minimal forbidden factor of w. We derive sharp upper and lower bounds for both parameters. We also prove that the second parameter is related to the minimal period of the word w. We are further interested in the algorithmic point of view. Indeed, we design linear-time algorithms for the following two problems: (i) given w, construct the set M(w) and, conversely, (ii) given M(w), reconstruct the word w. In the case of an infinite word x, we consider the following two functions: g_x, which counts, for each n, the allowed factors of x of length n, and f_x, which counts, for each n, the minimal forbidden factors of x of length n. We address the following general problem: what information about the structure of x can be derived from the pair (g_x, f_x)? We prove that these two functions characterize, up to the automorphism exchanging the two letters, the language of factors of each single infinite Sturmian word.
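To make the set M(w) concrete, here is a brute-force Python sketch. It assumes the usual definition: a word is a minimal forbidden factor of w if it is not a factor of w while all of its proper factors are. This is an illustration only, not the linear-time construction mentioned in the abstract, and the function names are hypothetical.

```python
def factors(w):
    """All factors (substrings) of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def minimal_forbidden_factors(w, alphabet):
    """Brute-force M(w) over the given alphabet."""
    fact = factors(w)
    # Letters that never occur are minimal forbidden factors of length 1.
    mff = {a for a in alphabet if a not in fact}
    # Longer ones have the form a + v + b, where a + v and v + b occur
    # in w but a + v + b does not.
    for v in fact:
        for a in alphabet:
            for b in alphabet:
                if a + v in fact and v + b in fact and a + v + b not in fact:
                    mff.add(a + v + b)
    return mff


if __name__ == "__main__":
    m = minimal_forbidden_factors("ab", "ab")
    # The two parameters discussed above: size of M(w) and longest element.
    print(sorted(m), len(m), max(len(u) for u in m))
```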

International Journal of Approximate Reasoning | 2008

Distance measures for biological sequences: Some recent approaches

Sabrina Mantaci; Antonio Restivo; Marinella Sciortino

Sequence comparison has become an essential tool in modern molecular biology. In fact, in biomolecular sequences high similarity usually implies significant functional or structural similarity. Traditional approaches use techniques based on sequence alignment, which measure character-level differences. However, recent developments in whole-genome sequencing technology give rise to the need for similarity measures able to capture rearrangements involving large segments contained in the sequences. This paper illustrates different methods recently introduced for the alignment-free comparison of biological sequences. The goal of the paper is both to highlight the peculiarities of each approach, focusing on its advantages and disadvantages, and to find the common features of all these methods.

Combinatorial Pattern Matching | 2005

An extension of the Burrows-Wheeler Transform and applications to sequence comparison and data compression

Sabrina Mantaci; Antonio Restivo; Giovanna Rosone; Marinella Sciortino

We introduce a generalization of the Burrows-Wheeler Transform (BWT) that can be applied to a multiset of words. The extended transformation, denoted by E, is reversible but, unlike BWT, it is also surjective. The E transformation allows one to define a distance between two sequences, which we apply here to the problem of the whole mitochondrial genome phylogeny. Moreover, we offer some considerations on compressing a set of words by using the E transformation as preprocessing.

Theory of Computing Systems / Mathematical Systems Theory | 2008

A New Combinatorial Approach to Sequence Comparison

Sabrina Mantaci; Antonio Restivo; Giovanna Rosone; Marinella Sciortino

In this paper we introduce a new alignment-free method for comparing sequences which is combinatorial by nature and uses neither a compressor nor any information-theoretic notion. Such a method is based on an extension of the Burrows-Wheeler Transform, a transformation widely used in the context of Data Compression. The new extended transformation takes as input a multiset of sequences and produces as output a string obtained by a suitable rearrangement of the characters of all the input sequences. By using such a transformation we give a general method for comparing sequences that takes into account how much the characters coming from the different input sequences are mixed in the output string. Such a method is tested on a real data set for the whole mitochondrial genome phylogeny problem. However, the goal of this paper is to introduce a new and general methodology for automatic categorization of sequences.

Theoretical Computer Science | 2006

Word assembly through minimal forbidden words

Gabriele Fici; Filippo Mignosi; Antonio Restivo; Marinella Sciortino

We give a linear-time algorithm to reconstruct a finite word w over a finite alphabet A of constant size starting from a finite set of factors of w satisfying a suitable hypothesis. We use combinatorial techniques based on minimal forbidden words, which were introduced in previous papers. This improves a previous algorithm which worked under a stronger hypothesis.

Combinatorial Pattern Matching | 2003

Optimal partitions of strings: a new class of Burrows-Wheeler compression algorithms

Raffaele Giancarlo; Marinella Sciortino

The Burrows-Wheeler transform [1] is one of the mainstays of lossless data compression. In most cases, its output is fed to Move to Front or other variations of symbol ranking compression. One of the main open problems [2] is to establish whether Move to Front, or more generally symbol ranking compression, is an essential part of the compression process. We settle this question positively by providing a new class of Burrows-Wheeler algorithms that use optimal partitions of strings, rather than symbol ranking, for the additional step. Our technique is a quite surprising specialization to strings of partitioning techniques devised by Buchsbaum et al. [3] for two-dimensional table compression. Following Manzini [4], we analyze two algorithms in the new class in terms of the k-th order empirical entropy of a string and, for both algorithms, we obtain better compression guarantees than the ones reported in [4] for Burrows-Wheeler algorithms that use Move to Front.
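For readers unfamiliar with the measure used in this analysis, the sketch below computes the k-th order empirical entropy of a string in Python. It assumes the standard definition: for each context of length k, collect the characters that follow its occurrences, weight the zeroth-order entropy of that collection by its size, and divide the total by the length of the string. The function names `h0` and `hk` are illustrative.

```python
from collections import Counter, defaultdict
from math import log2


def h0(s):
    """Zeroth-order empirical entropy of s, in bits per symbol."""
    n = len(s)
    if n == 0:
        return 0.0
    return -sum(c / n * log2(c / n) for c in Counter(s).values())


def hk(s, k):
    """k-th order empirical entropy of s, in bits per symbol."""
    if k == 0:
        return h0(s)
    followers = defaultdict(list)
    for i in range(len(s) - k):
        # Record the character that follows each length-k context.
        followers[s[i:i + k]].append(s[i + k])
    return sum(len(f) * h0(f) for f in followers.values()) / len(s)


if __name__ == "__main__":
    print(h0("abab"), hk("abab", 1))
```

In the string "abab" the two letters are equally frequent, so the zeroth-order entropy is one bit per symbol, while each letter fully determines the next one, so the first-order entropy is zero.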

Developments in Language Theory | 2007

Suffix automata and standard Sturmian words

Marinella Sciortino; Luca Q. Zamboni

Blumer et al. showed (cf. [3,2]) that the suffix automaton of a word w must have at least |w|+1 states and at most 2|w|-1 states. In this paper we characterize the language L of all binary words w whose minimal suffix automaton S(w) has exactly |w| + 1 states; they are precisely all prefixes of standard Sturmian words. In particular, we give an explicit construction of the suffix automaton of words that are palindromic prefixes of standard words. Moreover, we establish a necessary and sufficient condition on S(w) which ensures that if w ∈ L and a ∈ {0, 1} then wa ∈ L. By using such a condition, we show how to construct the automaton S(wa) from S(w). More generally, we provide a simple construction that, starting from an automaton recognizing all suffixes of a word w over a finite alphabet A, allows one to obtain an automaton that recognizes the suffixes of wa, a ∈ A.
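The state counts discussed above can be checked experimentally with the following Python sketch. It uses the well-known online construction of the suffix automaton, which is a swapped-in textbook technique and not the construction given in the paper. The helper `fibonacci_prefix` is a hypothetical name; it takes prefixes of the Fibonacci word as a convenient example of a standard Sturmian word.

```python
def suffix_automaton_states(w):
    """Number of states of the suffix automaton of w (online construction)."""
    nxt, link, length = [{}], [-1], [0]
    last = 0
    for c in w:
        cur = len(nxt)
        nxt.append({})
        link.append(-1)
        length.append(length[last] + 1)
        p = last
        while p != -1 and c not in nxt[p]:
            nxt[p][c] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][c]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone keeps q's transitions with a shorter length.
                clone = len(nxt)
                nxt.append(dict(nxt[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and nxt[p].get(c) == q:
                    nxt[p][c] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return len(nxt)


def fibonacci_prefix(n):
    """Prefix of length n of the Fibonacci word (morphism a -> ab, b -> a)."""
    s = "a"
    while len(s) < n:
        s = "".join("ab" if ch == "a" else "a" for ch in s)
    return s[:n]


if __name__ == "__main__":
    for n in range(1, 11):
        w = fibonacci_prefix(n)
        print(w, suffix_automaton_states(w))
```

Each state beyond |w| + 1 corresponds to a clone created during the construction, so a word lies in L exactly when no clone is ever needed. The word abb, for instance, forces one clone and reaches the upper bound of 2|w| - 1 = 5 states.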
