Publication


Featured research published by Susana Vinga.


Bioinformatics | 2003

Alignment-free sequence comparison: a review

Susana Vinga; Jonas S. Almeida

MOTIVATION Genetic recombination and, in particular, genetic shuffling are at odds with sequence comparison by alignment, which assumes conservation of contiguity between homologous segments. A variety of theoretical foundations are being used to derive alignment-free methods that overcome this limitation. The formulation of alternative metrics for dissimilarity between sequences and their algorithmic implementations are reviewed. RESULTS The overwhelming majority of work on alignment-free sequence comparison has taken place in the past two decades, with most reports published in the past 5 years. Two main categories of methods have been proposed: methods based on word (oligomer) frequency, and methods that do not require resolving the sequence into fixed-length word segments. The first category is based on the statistics of word frequency, on distances in a Cartesian space defined by the frequency vectors, and on the information content of the frequency distribution. The second category includes the use of Kolmogorov complexity and chaos theory. Despite their low visibility, alignment-free metrics are in fact already widely used as pre-selection filters for alignment-based querying in large-scale applications. Recent work is furthering their use as a scale-independent methodology capable of recognizing homology when loss of contiguity is beyond the possibility of alignment. AVAILABILITY Most of the alignment-free algorithms reviewed were implemented in MATLAB code and are available at http://bioinformatics.musc.edu/resources.html
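As a minimal sketch of the first category (word-frequency methods), the Python below counts overlapping words of length k and takes the Euclidean distance between the two count vectors. It is an illustration written for this page, not the MATLAB code referenced in the abstract; the function names and the DNA alphabet default are assumptions.

```python
from collections import Counter
from itertools import product
from math import sqrt


def word_counts(seq: str, k: int, alphabet: str = "ACGT") -> list:
    """Counts of overlapping k-letter words, ordered lexicographically over the alphabet."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(w)] for w in product(alphabet, repeat=k)]


def euclidean_word_distance(a: str, b: str, k: int = 2) -> float:
    """Euclidean distance between the k-word count vectors of two sequences."""
    va, vb = word_counts(a, k), word_counts(b, k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))


# Swapping the two halves keeps almost every 2-letter word, so the distance stays
# small even though the contiguity that an alignment relies on is broken.
print(euclidean_word_distance("AACCGGTT", "GGTTAACC"))  # 1.4142135623730951, i.e. sqrt(2)
```

Only the words CG and TA differ between the two toy sequences, which is why the distance is sqrt(2); an alignment of the same pair would have to break at the swap point.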


BMC Systems Biology | 2008

Parameter optimization in S-system models.

Marco Vilela; I-Chun Chou; Susana Vinga; Ana Tereza Ribeiro de Vasconcelos; Eberhard O. Voit; Jonas S. Almeida

Background: The inverse problem of identifying the topology of biological networks from their time series responses is a cornerstone challenge in systems biology. We tackle this challenge here through the parameterization of S-system models. It was previously shown that parameter identification can be performed as an optimization based on the decoupling of the differential S-system equations, which results in a set of algebraic equations. Results: A novel parameterization solution is proposed for the identification of S-system models from time series when no information about the network topology is known. The method is based on eigenvector optimization of a matrix formed from multiple regression equations of the linearized decoupled S-system. Furthermore, the algorithm is extended to the optimization of network topologies with constraints on metabolites and fluxes. These constraints rejoin the system in cases where it had been fragmented by decoupling. We demonstrate with synthetic time series why the algorithm can be expected to converge in most cases. Conclusion: A procedure was developed that facilitates automated reverse engineering tasks for biological networks using S-systems. The proposed method of eigenvector optimization constitutes an advancement over S-system parameter identification from time series using a recent method called Alternating Regression. The proposed method overcomes convergence issues encountered in alternating regression by identifying nonlinear constraints that restrict the search space to computationally feasible solutions. Because the parameter identification is still performed for each metabolite separately, the modularity and linear time characteristics of the alternating regression method are preserved. Simulation studies illustrate how the proposed algorithm identifies the correct network topology out of a collection of models that all fit the dynamical time series essentially equally well.
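The S-system form and the decoupling step described above can be sketched in Python. In an S-system each variable obeys dX_i/dt = α_i ∏_j X_j^g_ij − β_i ∏_j X_j^h_ij; decoupling replaces the left-hand side with slopes estimated from the time series, so each metabolite yields its own independent set of algebraic equations. This is an illustrative sketch with hypothetical function names and a toy two-variable network; it does not implement the eigenvector optimization proposed in the paper.

```python
from math import prod


def s_system_rhs(x, alpha, g, beta, h):
    """dX_i/dt = alpha_i * prod_j X_j**g[i][j] - beta_i * prod_j X_j**h[i][j]."""
    return [
        alpha[i] * prod(xj ** g[i][j] for j, xj in enumerate(x))
        - beta[i] * prod(xj ** h[i][j] for j, xj in enumerate(x))
        for i in range(len(x))
    ]


def decoupled_residuals(xs, slopes, alpha_i, g_i, beta_i, h_i):
    """Residuals of the algebraic equations for ONE metabolite i after decoupling:
    S_i(t_k) - (alpha_i * prod_j X_j(t_k)**g_ij - beta_i * prod_j X_j(t_k)**h_ij).
    `slopes` holds estimated derivatives S_i(t_k), e.g. from a smoothed time series."""
    return [
        s - (alpha_i * prod(xj ** gij for xj, gij in zip(x, g_i))
             - beta_i * prod(xj ** hij for xj, hij in zip(x, h_i)))
        for x, s in zip(xs, slopes)
    ]


# Hypothetical network: X2 inhibits production of X1 (g[0][1] = -1),
# X1 activates production of X2 (g[1][0] = 1), and each variable's
# degradation depends on its own level (h diagonal = 0.5).
alpha, beta = [1.0, 1.0], [1.0, 1.0]
g = [[0.0, -1.0], [1.0, 0.0]]
h = [[0.5, 0.0], [0.0, 0.5]]
print(s_system_rhs([2.0, 1.0], alpha, g, beta, h))  # about [-0.414, 1.0], i.e. [1 - sqrt(2), 1]

# With exact slopes, the true parameters satisfy metabolite 0's equations on their own.
xs = [[2.0, 1.0], [1.5, 0.8], [1.0, 1.2]]
slopes = [s_system_rhs(x, alpha, g, beta, h)[0] for x in xs]
print(decoupled_residuals(xs, slopes, alpha[0], g[0], beta[0], h[0]))  # [0.0, 0.0, 0.0]
```

Because each residual function involves only one metabolite's parameters, a fit can be run per metabolite, which is the modularity the abstract refers to.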


Biotechnology Advances | 2013

From physiology to systems metabolic engineering for the production of biochemicals by lactic acid bacteria.

Paula Gaspar; Ana Luísa Carvalho; Susana Vinga; Helena Santos; Ana Rute Neves

The lactic acid bacteria (LAB) are a functionally related group of low-GC Gram-positive bacteria known essentially for their roles in bioprocessing of foods and animal feeds. Due to their extensive industrial use and enormous economic value, LAB have been intensively studied, and a large body of comprehensive data on their metabolism and genetics has been generated over the years. This knowledge has been instrumental in the implementation of successful applications in the food industry, such as the selection of robust starter cultures with desired phenotypic traits. The advent of genomics, functional genomics and high-throughput experimentation combined with powerful computational tools currently allows for a systems-level understanding of these food industry workhorses. The technological developments in the last decade have provided the foundation for the use of LAB in applications beyond the classic food fermentations. Here we discuss recent metabolic engineering strategies to improve particular cellular traits of LAB and to design LAB cell factories for the bioproduction of value-added chemicals.


BMC Bioinformatics | 2007

Automated smoother for the numerical decoupling of dynamics models

Marco Vilela; Carlos Cristiano H. Borges; Susana Vinga; Ana Tereza Ribeiro de Vasconcelos; Helena Santos; Eberhard O. Voit; Jonas S. Almeida

Background: Structure identification of dynamic models for complex biological systems is the cornerstone of their reverse engineering. Biochemical Systems Theory (BST) offers a particularly convenient solution because its parameters are kinetic-order coefficients, which directly identify the topology of the underlying network of processes. We have previously proposed a numerical decoupling procedure that allows the identification of multivariate dynamic models of complex biological processes. While described here within the context of BST, this procedure is generally applicable to signal extraction. Our original implementation relied on artificial neural networks (ANN), which caused slight, undesirable bias during the smoothing of the time courses. As an alternative, we propose here an adaptation of Whittaker's smoother and demonstrate its role within a robust, fully automated structure identification procedure. Results: In this report we propose a robust, fully automated solution for signal extraction from time series, which is the prerequisite for the efficient reverse engineering of biological systems models. Whittaker's smoother is reformulated within the context of information theory and extended by the development of adaptive signal segmentation to account for heterogeneous noise structures. The resulting procedure can be used on arbitrary time series with a nonstationary noise process; it is illustrated here with metabolic profiles obtained from in vivo NMR experiments. The smoothed solution, which is free of parametric bias, permits differentiation, which is crucial for the numerical decoupling of systems of differential equations. Conclusion: The method is applicable to signal extraction from time series with nonstationary noise structure and can be applied in the numerical decoupling of systems of differential equations into algebraic equations; it thus constitutes a rather general tool for the reverse engineering of mechanistic model descriptions from multivariate experimental time series.
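For readers unfamiliar with the Whittaker smoother, here is a minimal dense-matrix sketch in plain Python. It solves (I + λDᵀD)z = y, where D takes d-th order differences, trading fidelity to the data against roughness of the result. It shows only the classical smoother with a fixed, user-chosen λ; the information-theoretic reformulation and adaptive segmentation described in the abstract are not reproduced, and all function names are hypothetical.

```python
def difference_matrix(n, d=2):
    """(n - d) x n matrix whose rows take d-th order differences of a length-n vector."""
    D = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(d):
        D = [[D[i + 1][j] - D[i][j] for j in range(n)] for i in range(len(D) - 1)]
    return D


def solve(A, b):
    """Gaussian elimination with partial pivoting (adequate for small dense systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def whittaker_smooth(y, lam=10.0, d=2):
    """Minimise sum (y - z)^2 + lam * sum (D z)^2 by solving (I + lam * D^T D) z = y."""
    n = len(y)
    D = difference_matrix(n, d)
    A = [[(1.0 if j == k else 0.0) + lam * sum(row[j] * row[k] for row in D)
          for k in range(n)] for j in range(n)]
    return solve(A, [float(v) for v in y])


def roughness(z, d=2):
    """Sum of squared d-th order differences of z."""
    return sum(sum(r * v for r, v in zip(row, z)) ** 2 for row in difference_matrix(len(z), d))


noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
smooth = whittaker_smooth(noisy, lam=10.0)
print(roughness(noisy))  # 24.0
print(roughness(smooth) < roughness(noisy))  # True
```

With second-order differences, a straight line has zero roughness penalty and passes through unchanged, so the smoother removes curvature noise without biasing linear trends; larger λ gives smoother output.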


Antimicrobial Agents and Chemotherapy | 2005

Analysis of the Genetic Variability of Virulence-Related Loci in Epidemic Clones of Methicillin-Resistant Staphylococcus aureus

A. R. Gomes; Susana Vinga; Mihaela Zavolan; H. de Lencastre

ABSTRACT Methicillin-resistant Staphylococcus aureus (MRSA) isolates have previously been classified into major epidemic clonal types by pulsed-field gel electrophoresis in combination with multilocus sequence typing (MLST) and staphylococcal cassette chromosome mec typing. We aimed to investigate whether genetic variability in potentially polymorphic domains of virulence-related factors could provide another level of differentiation in a diverse collection of epidemic MRSA clones. The regions sequenced in strains representative of epidemic clones and in genetically related methicillin-susceptible S. aureus isolates from the 1960s included the R domains of clfA and clfB; the D, W, and M regions of fnbA and fnbB; and three regions in the agr operon. Sequence variation ranged from very conserved regions, such as those for RNAIII and the agr interpromoter region, to the highly polymorphic R regions of the clf genes. The sequences of the clf R domains could be grouped into six major sequence types on the basis of the sequences in their 3′ regions. Six sequence types were also observed for the fnb sequences at the amino acid level. From an evolutionary point of view, it was interesting that a small DNA stretch at the 3′ end of the clf R-domain sequence and the fnb sequences agreed with the results of MLST for this set of strains. In particular, clfB R-domain sequences, which had a high discriminatory capacity and defined types congruent with those obtained by other molecular typing methods, are potentially useful for the typing of S. aureus. Clone- and strain-specific sequence motifs in the clf and fnb genes may represent useful additions to a typing methodology with a DNA array.


BMC Systems Biology | 2009

Identification of neutral biochemical network models from time series data.

Marco Vilela; Susana Vinga; Marco A Grivet Mattoso Maia; Eberhard O. Voit; Jonas S. Almeida

Background: The major difficulty in modeling biological systems from multivariate time series is the identification of parameter sets that endow a model with dynamical behaviors sufficiently similar to the experimental data. Directly related to this parameter estimation issue is the task of identifying the structure and regulation of ill-characterized systems. Both tasks are simplified if the mathematical model is canonical, i.e., if it is constructed according to strict guidelines. Results: In this report, we propose a method for the identification of admissible parameter sets of canonical S-systems from biological time series. The method is based on a Monte Carlo process that is combined with an improved version of our previous parameter optimization algorithm. The method maps the parameter space into the network space, which characterizes the connectivity among components, by creating an ensemble of decoupled S-system models that imitate the dynamical behavior of the time series with sufficient accuracy. The concept of sloppiness is revisited in the context of these S-system models, with an exploration not only of different parameter sets that produce similar dynamical behaviors but also of different network topologies that yield dynamical similarity. Conclusion: The proposed parameter estimation methodology was applied to actual time series data from the glycolytic pathway of the bacterium Lactococcus lactis and led to ensembles of models with different network topologies. In parallel, the parameter optimization algorithm was applied to the same dynamical data upon imposing a pre-specified network topology derived from prior biological knowledge, and the results from both strategies were compared. The results suggest that the proposed method may serve as a powerful exploration tool for testing hypotheses and designing new experiments.


BMC Bioinformatics | 2002

Universal sequence map (USM) of arbitrary discrete sequences

Jonas S. Almeida; Susana Vinga

Background: For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but has not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space conserving all its statistical properties. Ideally, such a representation would allow scale-independent sequence analysis, without the context of a fixed memory length. A simple example would consist of being able to infer the homology between two sequences solely by comparing the coordinates of any two homologous units. Results: We have successfully identified such an iterative function for the bijective mapping of discrete sequences into objects of continuous state space that enable scale-independent sequence analysis. The technique, named Universal Sequence Mapping (USM), is applicable to sequences of arbitrary length with an arbitrary number of unique units and generates a representation where map distance estimates sequence similarity. The novel USM procedure is based on earlier work by these and other authors on the properties of Chaos Game Representation (CGR). The latter enables the representation of four-unit-type sequences (such as DNA) as an order-free Markov chain transition table. The properties of USM are illustrated with test data and can be verified for other data by using the accompanying web-based tool: http://bioinformatics.musc.edu/~jonas/usm/. Conclusions: USM is shown to enable a statistical mechanics approach to sequence analysis. The scale-independent representation frees sequence analysis from the need to assume a memory length in the investigation of syntactic rules.
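The Chaos Game Representation on which USM builds can be sketched in a few lines of Python: each symbol moves the current point halfway toward the corner of the unit square assigned to that symbol. The corner assignment below (A, C, G, T at (0,0), (0,1), (1,1), (1,0)) is one common convention and an assumption here, as is the function name; USM's generalization to arbitrary alphabets is not shown.

```python
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr(seq, corners=CORNERS):
    """CGR coordinates of every prefix of seq, starting from the centre (0.5, 0.5)."""
    x, y = 0.5, 0.5
    points = []
    for s in seq:
        cx, cy = corners[s]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway toward the symbol's corner
        points.append((x, y))
    return points


print(cgr("ACG")[-1])  # (0.5625, 0.8125)

# Sequences sharing a suffix of length k end within 2**-k of each other per coordinate,
# so proximity on the map reflects shared context without fixing a word length.
a, b = cgr("TTTTACG")[-1], cgr("GGGGACG")[-1]
print(max(abs(a[0] - b[0]), abs(a[1] - b[1])) <= 2 ** -3)  # True
```

Each step halves any earlier difference between two trajectories, which is why the most recent symbols dominate a point's position and why the map behaves like a variable-order Markov table.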


Bioinformatics | 2004

Comparative evaluation of word composition distances for the recognition of SCOP relationships

Susana Vinga; Rodrigo Gouveia-Oliveira; Jonas S. Almeida

MOTIVATION Alignment-free metrics were recently reviewed by the authors but have not until now been the object of a comparative study. This paper compares the classification accuracy of the word composition metrics reviewed there. It also presents a new distance definition between protein sequences, the W-metric, which bridges alignment metrics, such as scores produced by the Smith-Waterman algorithm, and methods based solely on L-tuple composition, such as Euclidean distance and information content. RESULTS The comparative study reported here used the SCOP/ASTRAL protein structure hierarchical database and assessed the discriminant value of alternative sequence dissimilarity measures by calculating areas under the Receiver Operating Characteristic (ROC) curves. Although alignment methods resulted in very good classification accuracy at the family and superfamily levels, alignment-free distances, in particular Standard Euclidean Distance, are as good as alignment algorithms when sequence similarity is lower, such as for the recognition of fold or class relationships. This observation justifies the use of such distances to pre-filter homologous proteins, since word statistics are computed much faster than alignments. AVAILABILITY All MATLAB code used to generate the data is available upon request to the authors. Additional material is available at http://bioinformatics.musc.edu/wmetric
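The discriminant value of a dissimilarity measure, as assessed in this study, can be summarized by the area under the ROC curve. A minimal Python sketch, assuming lower distances should indicate related pairs: the AUC equals the probability that a randomly chosen related pair has a smaller distance than a randomly chosen unrelated pair, with ties counted as one half. The function name and the toy distances are illustrative only.

```python
def auc_from_distances(related, unrelated):
    """ROC AUC for a distance where smaller values should mean 'related'.
    Computed by counting pairwise comparisons (Mann-Whitney form); ties count 1/2."""
    wins = 0.0
    for r in related:
        for u in unrelated:
            if r < u:
                wins += 1.0
            elif r == u:
                wins += 0.5
    return wins / (len(related) * len(unrelated))


# Hypothetical distances: a perfect separator scores 1.0, an uninformative one about 0.5.
print(auc_from_distances([0.1, 0.2], [0.3, 0.4]))  # 1.0
print(auc_from_distances([0.1, 0.5], [0.3]))       # 0.5
```

The pairwise loop is quadratic in the number of pairs; for large databases the same value can be obtained from ranks after a single sort.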


Briefings in Bioinformatics | 2014

Information theory applications for biological sequence analysis

Susana Vinga

Abstract Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison have greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has provided models based on communication systems theory to describe information transmission channels at the cell level and during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
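As an example of the block-entropy estimation mentioned in this review, the Python sketch below computes the plug-in Shannon entropy, in bits, of the overlapping k-word distribution of a sequence. Plug-in estimates are biased downward when k is large relative to the sequence length; the function name is hypothetical and no bias correction is attempted.

```python
from collections import Counter
from math import log2


def block_entropy(seq, k):
    """Plug-in Shannon entropy (bits) of the overlapping k-word frequencies of seq."""
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    n = len(words)
    # sum of p * log2(1/p) over observed words, with p = count / n
    return sum((c / n) * log2(n / c) for c in Counter(words).values())


print(block_entropy("AAAAAAAA", 1))  # 0.0 (a single repeated symbol carries no information)
print(block_entropy("ACGTACGT", 1))  # 2.0 (four equiprobable symbols)
```

Comparing H_k across increasing k (for example through the differences H_{k+1} - H_k) is one common way such estimates are used to probe how much context a sequence carries.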


BMC Bioinformatics | 2007

Local Renyi entropic profiles of DNA sequences

Susana Vinga; Jonas S. Almeida

Background: In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition therein explored was based on the Rényi entropy of probability density function (pdf) estimates obtained with Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. Results: The new methodology enables two results. On the one hand, it shows that the entropic profiles are directly related to the statistical significance of motifs, allowing the study of under- and over-representation of segments. On the other hand, by spanning the parameters of the kernel function, it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables, and additional material and examples are made publicly available at http://kdbio.inesc-id.pt/~svinga/ep/. Conclusion: The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and the inference of motif structures.
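The quadratic (order-2) Rényi entropy of a Parzen-window density estimate has a closed form when the kernel is Gaussian: with kernel width σ on N points, the integral of the squared density equals the average, over all point pairs, of a Gaussian with variance 2σ² per coordinate evaluated at their separation. The Python sketch below applies this to CGR coordinates. It illustrates a Gaussian-kernel version of the earlier definition mentioned in the abstract (the kernel choice is an assumption here), not the fractal kernel or the local entropic profiles introduced in this paper; σ, the corner convention and the function names are also assumptions.

```python
from math import exp, log, pi

CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr(seq):
    """CGR coordinates of every prefix of seq (corner assignment is an assumed convention)."""
    x, y = 0.5, 0.5
    pts = []
    for s in seq:
        cx, cy = CORNERS[s]
        x, y = (x + cx) / 2, (y + cy) / 2
        pts.append((x, y))
    return pts


def renyi2_parzen(points, sigma):
    """Quadratic Renyi entropy -ln(integral of f^2) of a 2-D Gaussian Parzen estimate.
    The integral equals the mean over all point pairs of a Gaussian density with
    variance 2*sigma**2 per coordinate, evaluated at the pair's separation."""
    var = 2.0 * sigma ** 2
    norm = 1.0 / (2.0 * pi * var)  # 2-D Gaussian density at zero separation
    total = 0.0
    for xi, yi in points:
        for xj, yj in points:
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            total += norm * exp(-d2 / (2.0 * var))
    return -log(total / len(points) ** 2)


# A repetitive sequence collapses toward one corner of the map, giving lower entropy
# than a sequence whose points fall in four separate regions.
low = renyi2_parzen(cgr("A" * 40), sigma=0.1)
high = renyi2_parzen(cgr("ACGT" * 10), sigma=0.1)
print(low < high)  # True
```

The kernel width σ sets the resolution at which the map is inspected: small σ resolves long shared contexts, while large σ blurs the map toward a short-context summary.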

Collaboration


Top Co-Authors

Jonas S. Almeida

University of Texas MD Anderson Cancer Center

Rafael S. Costa

Instituto Superior Técnico

André Veríssimo

Instituto Superior Técnico

Ana Rute Neves

Universidade Nova de Lisboa

J.M. Lemos

Instituto Superior Técnico

Eberhard O. Voit

Georgia Institute of Technology