Gabriel Valiente

Polytechnic University of Catalonia

Publication

Featured research published by Gabriel Valiente.

Pattern Recognition Letters | 2001

A graph distance metric combining maximum common subgraph and minimum common supergraph

Mirtha-Lina Fernández; Gabriel Valiente

The relationship between two important problems in pattern recognition using attributed relational graphs, the maximum common subgraph and the minimum common supergraph of two graphs, is established by means of simple constructions that allow the maximum common subgraph to be obtained from the minimum common supergraph, and vice versa. On this basis, a new graph distance metric is proposed for measuring similarities between objects represented by attributed relational graphs. The proposed metric can be computed by a straightforward extension of any algorithm that implements error-correcting graph matching, when run under an appropriate cost function, and the extension only takes time linear in the size of the graphs.
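As a rough illustration of the kind of metric described above, the following sketch assumes that the distance is the size of the minimum common supergraph minus the size of the maximum common subgraph, and that the supergraph size equals |G1| + |G2| - |mcs|. Size is taken to be the node count, graphs are unlabeled and undirected, subgraphs are induced, and a brute-force search stands in for an error-correcting graph matching algorithm. All of these are simplifying assumptions for illustration, not details stated in the abstract, and the function names are hypothetical.

```python
from itertools import combinations, permutations


def mcs_size(nodes1, edges1, nodes2, edges2):
    """Node count of a maximum common induced subgraph (brute force, tiny graphs only)."""
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    # Try the largest candidate size first and stop at the first match.
    for k in range(min(len(nodes1), len(nodes2)), 0, -1):
        for sub in combinations(nodes1, k):
            for image in permutations(nodes2, k):
                m = dict(zip(sub, image))
                # The mapping must preserve both edges and non-edges.
                if all(
                    (frozenset((u, v)) in e1) == (frozenset((m[u], m[v])) in e2)
                    for u, v in combinations(sub, 2)
                ):
                    return k
    return 0


def graph_distance(nodes1, edges1, nodes2, edges2):
    """Assumed form of the metric: |minimum common supergraph| - |maximum common subgraph|."""
    common = mcs_size(nodes1, edges1, nodes2, edges2)
    supergraph = len(nodes1) + len(nodes2) - common  # assumed size relation
    return supergraph - common


triangle = (["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")])
path = ([1, 2, 3], [(1, 2), (2, 3)])
print(graph_distance(*triangle, *path))
```

The exhaustive search is exponential and only meant to make the definition concrete; the abstract's point is that, given an error-correcting matching, the extra work is linear.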

Mathematical Structures in Computer Science | 2002

Constraint satisfaction algorithms for graph pattern matching

Javier Larrosa; Gabriel Valiente

Graph pattern matching is a central problem in many application fields. Since it is NP-complete, we cannot expect to find algorithms with a good worst-case performance. However, there is still room for general procedures with a good average performance. In this paper we explore four different solving approaches within the constraint satisfaction framework, and introduce a new algorithm, which we call nRF+. The algorithm is a refinement of really full look ahead that takes advantage of the problem structure in order to enhance the look ahead procedure. We give a formal proof that nRF+ is superior to the other approaches in terms of number of visited nodes. An additional contribution of this paper is the introduction of a new benchmark for testing algorithms in this domain. It is formed by a large set of well-defined graphs of very diverse nature. In this benchmark, we show that nRF+ can efficiently solve a broad range of problems, while still leaving many problem instances unsolved. We encourage the use of this challenging benchmark for evaluating future algorithms.

BMC Bioinformatics | 2007

Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment

Paolo Ferragina; Raffaele Giancarlo; Valentina Greco; Giovanni Manzini; Gabriel Valiente

Background: Similarity of sequences is a key mathematical notion for classification and phylogenetic studies in biology. It is currently handled primarily using alignments. However, alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov complexity, and universality is its most novel and striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity), NCD (Normalized Compression Dissimilarity) and CD (Compression Dissimilarity). Their applicability and robustness are tested on various data sets, yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM and mostly at a qualitative level, no comparison among UCD, NCD and CD is available, and no comparison of USM with existing methods, whether alignment-based or not, seems to be available.
Results: We experimentally test the USM methodology by using 25 compressors, all three of its known approximations and six data sets of relevance to molecular biology. This offers the first systematic and quantitative experimental assessment of this methodology, which naturally complements the many theoretical and the preliminary experimental results available. Moreover, we compare the USM methodology with both alignment-based and alignment-free methods. We may group our experiments into two sets. The first one, performed via ROC (Receiver Operating Characteristic) analysis, aims at assessing the intrinsic ability of the methodology to discriminate and classify biological sequences and structures. A second set of experiments aims at assessing how well two commonly available classification algorithms, UPGMA (Unweighted Pair Group Method with Arithmetic Mean) and NJ (Neighbor Joining), can use the methodology to perform their task, their performance being evaluated against gold standards and with the use of well-known statistical indexes, i.e., the F-measure and the partition distance. Based on the experiments, several conclusions can be drawn and, from them, novel valuable guidelines for the use of USM on biological data. The main ones are reported next.
Conclusion: UCD and NCD are indistinguishable, i.e., they yield nearly the same values of the statistical indexes we have used, across experiments and data sets, while CD is almost always worse than both. UPGMA seems to yield better classification results than NJ, i.e., better values of the statistical indexes (10% difference or above), on a substantial fraction of experiments, compressors and USM approximation choices. The compression program PPMd, based on PPM (Prediction by Partial Matching), for generic data, and Gencompress, for DNA, are the best performers among the compression algorithms we have used, although the difference in performance, as measured by statistical indexes, between them and the other algorithms depends critically on the data set and may not be as large as expected. PPMd used with UCD or NCD and UPGMA on sequence data is very close in performance to the alignment methods, although slightly worse (less than 2% difference on the F-measure). Yet, it scales well with data set size and it can work on data other than sequences. In summary, our quantitative analysis naturally complements the rich theory behind USM and supports the conclusion that the methodology is worth using because of its robustness, flexibility, scalability, and competitiveness with existing techniques. In particular, the methodology applies to all biological data in textual format. The software and data sets are available under the GNU GPL at the supplementary material web page.
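To make the compression-based approach concrete, here is a minimal sketch of one of the three approximations, NCD, using its commonly cited form NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The choice of zlib as compressor is an assumption made for convenience; it is not claimed to be one of the 25 compressors in the study, and the function names are hypothetical.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in for C)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression dissimilarity of two byte strings."""
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy inputs: a repetitive DNA-like string and an unrelated byte pattern.
dna = b"ACGT" * 200
other = bytes(range(256)) * 3

print("self:", ncd(dna, dna))
print("unrelated:", ncd(dna, other))
```

Values near 0 suggest the two inputs share most of their structure, and values near 1 suggest little shared structure. A pairwise matrix of such values could then be passed to a clustering method such as UPGMA or NJ, as in the second set of experiments.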

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2009

Comparison of Tree-Child Phylogenetic Networks

Gabriel Cardona; Francesc Rosselló; Gabriel Valiente

Phylogenetic networks are a generalization of phylogenetic trees that allow for the representation of nontreelike evolutionary events, like recombination, hybridization, or lateral gene transfer. While much progress has been made to find practical algorithms for reconstructing a phylogenetic network from a set of sequences, all attempts to endow a class of phylogenetic networks (strictly extending the class of phylogenetic trees) with a well-founded distance measure have, to the best of our knowledge and with the only exception of the bipartition distance on regular networks, failed so far. In this paper, we present and study a new meaningful class of phylogenetic networks, called tree-child phylogenetic networks, and we provide an injective representation of these networks as multisets of vectors of natural numbers, their path multiplicity vectors. We then use this representation to define a distance on this class that extends the well-known Robinson-Foulds distance for phylogenetic trees and to give an alignment method for pairs of networks in this class. Simple polynomial algorithms for reconstructing a tree-child phylogenetic network from its path multiplicity vectors, for computing the distance between two tree-child phylogenetic networks, and for aligning a pair of tree-child phylogenetic networks are provided. They have been implemented as a Perl package and a Java applet, which can be found at http://bioinfo.uib.es/~recerca/phylonetworks/mudistance/.
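The path multiplicity representation can be sketched in a few lines. The code below assumes that the vector of a node counts, for each leaf, the number of distinct paths from that node to the leaf, and that the distance is the size of the symmetric difference of the two multisets of vectors. The dictionary encoding of a network and the function names are hypothetical choices for illustration; this is not the Perl package or Java applet mentioned in the abstract.

```python
from collections import Counter


def mu_vectors(children, leaves):
    """Multiset of path multiplicity vectors of a rooted DAG.

    children: dict mapping each internal node to a list of its children.
    leaves: ordered list of leaf labels (fixes the vector coordinates).
    """
    index = {leaf: i for i, leaf in enumerate(leaves)}
    memo = {}

    def mu(v):
        if v in memo:
            return memo[v]
        kids = children.get(v, [])
        if not kids:
            # A leaf reaches only itself, by exactly one (trivial) path.
            vec = tuple(1 if i == index[v] else 0 for i in range(len(leaves)))
        else:
            # Paths from v are the paths from its children, summed per leaf.
            vec = tuple(sum(col) for col in zip(*(mu(c) for c in kids)))
        memo[v] = vec
        return vec

    nodes = set(children) | {c for kids in children.values() for c in kids}
    return Counter(mu(v) for v in nodes)


def mu_distance(net1, net2, leaves):
    """Size of the symmetric difference of the two multisets of vectors."""
    m1, m2 = mu_vectors(net1, leaves), mu_vectors(net2, leaves)
    return sum(((m1 - m2) + (m2 - m1)).values())


# Two trees on leaves a, b, c: ((a,b),c) and ((a,c),b).
t1 = {"r": ["x", "c"], "x": ["a", "b"]}
t2 = {"r": ["y", "b"], "y": ["a", "c"]}
print(mu_distance(t1, t2, ["a", "b", "c"]))
```

On trees, each vector is the indicator of a cluster, which is why a distance of this form reduces to counting clusters present in one tree but not the other.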

String Processing and Information Retrieval | 2001

An efficient bottom-up distance between trees

Gabriel Valiente

A new bottom-up distance measure for labeled trees, which is based on the largest common forest of the trees and has the threefold advantage of independence of particular edit costs, low complexity, and coverage of ordered and unordered trees, is introduced and related in this paper to other distance measures published in the literature. Algorithms for computing the bottom-up distance in time linear in the number of nodes are given in full detail.

Briefings in Bioinformatics | 2012

Reference databases for taxonomic assignment in metagenomics

Monica Santamaria; Bruno Fosso; Arianna Consiglio; Giorgio De Caro; Giorgio Grillo; Flavio Licciulli; Sabino Liuni; Marinella Marzano; Daniel Alonso-Alemany; Gabriel Valiente

Metagenomics is providing unprecedented access to environmental microbial diversity. The amplicon-based metagenomics approach involves the PCR-targeted sequencing of a genetic locus that meets several requirements: it must be ubiquitous in the taxonomic range of interest, variable enough to discriminate between different species but flanked by highly conserved sequences, and of suitable size to be sequenced on next-generation platforms. The internal transcribed spacers 1 and 2 (ITS1 and ITS2) of the ribosomal DNA operon and one or more hyper-variable regions of the 16S ribosomal RNA gene are typically used to identify fungal and bacterial species, respectively. In this context, reliable reference databases and taxonomies are crucial to assign amplicon sequence reads to the correct phylogenetic ranks. Several resources provide consistent phylogenetic classification of publicly available 16S ribosomal DNA sequences, whereas the state of ribosomal internal transcribed spacer reference databases is notably less advanced. In this review, we aim to give an overview of existing reference resources for both types of markers, highlighting strengths and possible shortcomings of their use for metagenomics purposes. Moreover, we present a new database, ITSoneDB, of well-annotated and phylogenetically classified ITS1 sequences to be used as a reference collection in metagenomic studies of environmental fungal communities. ITSoneDB is available for download and browsing at http://itsonedb.ba.itb.cnr.it/.

Bioinformatics | 2008

A distance metric for a class of tree-sibling phylogenetic networks

Gabriel Cardona; Mercè Llabrés; Francesc Rosselló; Gabriel Valiente

Motivation: The presence of reticulate evolutionary events in phylogenies turns phylogenetic trees into phylogenetic networks. These events imply in particular that there may exist multiple evolutionary paths from a non-extant species to an extant one, and this multiplicity makes the comparison of phylogenetic networks much more difficult than the comparison of phylogenetic trees. In fact, all attempts to define a sound distance measure on the class of all phylogenetic networks have failed so far. Thus, the only practical solutions have been either the use of rough estimates of similarity (based on comparison of the trees embedded in the networks), or narrowing the class of phylogenetic networks to a certain class where such a distance is known and can be efficiently computed. The first approach has the problem that one may identify two networks as equivalent when they are not; the second one has the drawback that there may not exist algorithms to reconstruct such networks from biological sequences. Results: We present in this article a distance measure on the class of semi-binary tree-sibling time consistent phylogenetic networks, which generalize tree-child time consistent phylogenetic networks, and thus also galled-trees. The practical interest of this distance measure is twofold: it can be computed in polynomial time by means of simple algorithms, and there also exist polynomial-time algorithms for reconstructing networks of this class from DNA sequence data. Availability: The Perl package Bio::PhyloNetwork, included in the BioPerl bundle, implements many algorithms on phylogenetic networks, including the computation of the distance presented in this article. Supplementary information: Some counterexamples, proofs of the results not included in this article, and some computational experiments are available at Bioinformatics online.

BMC Bioinformatics | 2008

Extended Newick: it is time for a standard representation of phylogenetic networks

Gabriel Cardona; Francesc Rosselló; Gabriel Valiente

Background: Phylogenetic trees resulting from molecular phylogenetic analysis are available in Newick format from specialized databases. When it comes to phylogenetic networks, however, which provide an explicit representation of reticulate evolutionary events such as recombination, hybridization or lateral gene transfer, the lack of a standard format for their representation has hindered the publication of explicit phylogenetic networks in the specialized literature and their incorporation in specialized databases. Two different proposals to represent phylogenetic networks exist: as a single Newick string (where each hybrid node is split once for each parent) or as a set of Newick strings (one for each hybrid node plus another one for the phylogenetic network). Results: The standard we advocate as extended Newick format describes a whole phylogenetic network with k hybrid nodes as a single Newick string with k repeated nodes, and this representation is unique once the phylogenetic network is drawn or the ordering among children nodes is fixed. The extended Newick format facilitates phylogenetic data sharing and exchange, and also allows for the practical use of phylogenetic networks in computer programs and scripts. This standard has been recently agreed upon by a number of computational biologists, is already supported by several phylogenetic tools, and avoids the different drawbacks of using an a priori unknown number of Newick strings without any additional mark-up to represent a phylogenetic network. Conclusion: The adoption of the extended Newick format as a standard for the representation of phylogenetic networks is an important step towards the publication of explicit phylogenetic networks in peer-reviewed journals and their incorporation in a future database of published phylogenetic networks.
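A small sketch may help show what a single string with repeated nodes looks like. It assumes the tagging convention in which each hybrid node carries a marker of the form `#`, an optional type such as `H`, and an integer identifier, and in which every occurrence of the same identifier refers to the same node. The example string and the helper name are hypothetical, and the regular expression only locates tags; it is not a full parser of the format.

```python
import re
from collections import Counter

# Assumed tag shape: '#', optional letters for the event type, then an integer id.
HYBRID_TAG = re.compile(r"#([A-Za-z]*)(\d+)")


def hybrid_occurrences(enewick: str) -> Counter:
    """Count how many times each hybrid node identifier appears in the string."""
    return Counter(int(num) for _type, num in HYBRID_TAG.findall(enewick))


# One hybrid node (#H1) with child B and two parents: it is written twice,
# once with its subtree and once as a bare reference.
example = "((A,(B)#H1),(#H1,C));"

for node_id, count in hybrid_occurrences(example).items():
    print(f"hybrid node {node_id} appears {count} times")
```

A hybrid node with two parents appears twice in the string, so a reader of the format can rebuild the network by merging occurrences that share an identifier.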

String Processing and Information Retrieval | 2000

An image similarity measure based on graph matching

Ricardo A. Baeza-Yates; Gabriel Valiente

The problem of computing the similarity between two images is transformed to that of approximating the distance between two extended region adjacency graphs, which are extracted from the images in time and space linear in the number of pixels. Invariance to translation and rotation is thus achieved. Invariance to scaling is also achieved by taking the relative size of regions into account. Furthermore, the method provides a trade-off between pixel similarity threshold and approximation of the distance measure, which can be used to bound the error in image recognition as well as the time complexity of the computation.

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2009

Metrics for Phylogenetic Networks I: Generalizations of the Robinson-Foulds Metric

Gabriel Cardona; Mercè Llabrés; Francesc Rosselló; Gabriel Valiente

The assessment of phylogenetic network reconstruction methods requires the ability to compare phylogenetic networks. This is the first in a series of papers devoted to the analysis and comparison of metrics for tree-child time-consistent phylogenetic networks on the same set of taxa. In this paper, we study three metrics that have already been introduced in the literature: the Robinson-Foulds distance, the tripartition distance, and the mu-distance. They generalize to networks the classical Robinson-Foulds or partition distance for phylogenetic trees. We analyze the behavior of these metrics by studying their least and largest values and when they achieve them. As a by-product of this study, we obtain tight bounds on the size of a tree-child time-consistent phylogenetic network.
