Alexandra M. Carvalho

Publication

Featured research published by Alexandra M. Carvalho.

Nucleic Acids Research | 2007

YEASTRACT-DISCOVERER: new tools to improve the analysis of transcriptional regulatory associations in Saccharomyces cerevisiae

Pedro T. Monteiro; Nuno D. Mendes; Miguel C. Teixeira; Sofia d’Orey; Sandra Tenreiro; Nuno P. Mira; Hélio Pais; Alexandre P. Francisco; Alexandra M. Carvalho; Artur B. Lourenço; Isabel Sá-Correia; Arlindo L. Oliveira; Ana T. Freitas

The Yeast search for transcriptional regulators and consensus tracking (YEASTRACT) information system (www.yeastract.com) was developed to support the analysis of transcription regulatory associations in Saccharomyces cerevisiae. Last updated in September 2007, this database contains over 30 990 regulatory associations between Transcription Factors (TFs) and target genes and includes 284 specific DNA binding sites for 108 characterized TFs. Computational tools are also provided to facilitate the exploitation of the gathered data when solving a number of biological questions, in particular the ones that involve the analysis of global gene expression results. In this new release, YEASTRACT includes DISCOVERER, a set of computational tools that can be used to identify complex motifs over-represented in the promoter regions of co-regulated genes. The motifs identified are then clustered in families, represented by a position weight matrix and are automatically compared with the known transcription factor binding sites described in YEASTRACT. Additionally, in this new release, it is possible to generate graphic depictions of transcriptional regulatory networks for documented or potential regulatory associations between TFs and target genes. The visual display of these networks of interactions is instrumental in functional studies. Tutorials are available on the system to exemplify the use of all the available tools.
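DISCOVERER represents each motif family by a position weight matrix (PWM). The following is a minimal sketch of what such a matrix does. It is not YEASTRACT's code. It builds a log-odds PWM from equal-length aligned sites and scores every window of a promoter sequence. The uniform background (0.25 per base), the pseudocount of 1 and the helper names `build_pwm` and `scan` are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

ALPHABET = "ACGT"

def build_pwm(sites: List[str], pseudocount: float = 1.0,
              background: float = 0.25) -> List[Dict[str, float]]:
    """Log-odds PWM (log2 of observed/background) from aligned, equal-length sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(length):
        column = [s[i] for s in sites]
        # Pseudocounts keep unseen bases from producing log(0).
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pwm

def scan(pwm: List[Dict[str, float]], sequence: str) -> List[Tuple[int, float]]:
    """Score every window of an uppercase ACGT sequence; returns (start, score) pairs."""
    width = len(pwm)
    return [
        (start, sum(pwm[i][sequence[start + i]] for i in range(width)))
        for start in range(len(sequence) - width + 1)
    ]

if __name__ == "__main__":
    pwm = build_pwm(["TGACTC", "TGACTA", "TGAGTC", "TGACTC"])
    best = max(scan(pwm, "AATGACTCAGGTTTT"), key=lambda hit: hit[1])
    print(best[0])  # prints 2, the start of the consensus window TGACTC
```

Comparing a discovered motif with known binding sites then amounts to comparing such matrices column by column. The similarity measure YEASTRACT uses is not stated here.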

Latin American Symposium on Theoretical Informatics | 2006

RISOTTO: fast extraction of motifs with mismatches

Nadia Pisanti; Alexandra M. Carvalho; Laurent Marsan; Marie-France Sagot

We present in this paper an exact algorithm for motif extraction. Efficiency is achieved by means of an improvement in the algorithm and data structures that applies to the whole class of motif inference algorithms based on suffix trees. An average case complexity analysis shows a gain over the best known exact algorithm for motif extraction. A full implementation was developed and made available online. Experimental results show that the proposed algorithm is more than two times faster than the best known exact algorithm for motif extraction.
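To make the underlying problem concrete, here is a brute-force sketch of motif extraction with mismatches. It reports every length-k string over ACGT that occurs, with at most e mismatches (Hamming distance), in at least a quorum q of the input sequences. This is the naive exhaustive baseline, not RISOTTO's suffix-tree algorithm. The function names and parameters (`max_mismatches`, `quorum`) are hypothetical.

```python
from itertools import product
from typing import List

def occurs_with_mismatches(motif: str, sequence: str, max_mismatches: int) -> bool:
    """True if some window of `sequence` is within Hamming distance `max_mismatches` of `motif`."""
    k = len(motif)
    for start in range(len(sequence) - k + 1):
        mismatches = 0
        for a, b in zip(motif, sequence[start:start + k]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this window cannot match; try the next one
        if mismatches <= max_mismatches:
            return True
    return False

def extract_motifs(sequences: List[str], k: int, max_mismatches: int,
                   quorum: int, alphabet: str = "ACGT") -> List[str]:
    """Enumerate all |alphabet|^k candidates and keep those meeting the quorum."""
    motifs = []
    for letters in product(alphabet, repeat=k):
        candidate = "".join(letters)
        support = sum(occurs_with_mismatches(candidate, s, max_mismatches) for s in sequences)
        if support >= quorum:
            motifs.append(candidate)
    return motifs
```

The 4^k enumeration is what suffix-tree methods avoid. They extend candidates symbol by symbol and prune a branch as soon as its support drops below the quorum.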

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2006

An Efficient Algorithm for the Identification of Structured Motifs in DNA Promoter Sequences

Alexandra M. Carvalho; Ana T. Freitas; Arlindo L. Oliveira; Marie-France Sagot

We propose a new algorithm for identifying cis-regulatory modules in genomic sequences. The proposed algorithm, named RISO, uses a new data structure, called box-link, to store information about conserved regions that occur in a well-ordered and regularly spaced manner in the data set sequences. This type of conserved region, called a structured motif, is extremely relevant in the study of gene regulatory mechanisms since it can effectively represent promoter models. The complexity analysis shows a time and space gain over the best known exact algorithms that is exponential in the spacings between binding sites. A full implementation of the algorithm was developed and made available online. Experimental results show that the algorithm is much faster than existing ones, sometimes by more than four orders of magnitude. The application of the method to biological data sets shows its ability to extract relevant consensi.
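As an illustration of what counts as an occurrence of a structured motif, the sketch below checks a given motif against a sequence. The motif is a list of boxes plus, between consecutive boxes, an allowed spacing range (dmin, dmax). It uses exact matching and naive recursion over the spacings. It does not implement RISO or box-links, whose point is precisely to avoid re-traversing those spacings. The name `structured_occurrences` and the input format are hypothetical.

```python
from typing import List, Tuple

def structured_occurrences(boxes: List[str], gaps: List[Tuple[int, int]],
                           sequence: str) -> List[int]:
    """Start positions where boxes[0], gap, boxes[1], ... occur in order.

    gaps[i] = (dmin, dmax) bounds the number of symbols between boxes[i] and boxes[i + 1].
    """
    if len(gaps) != len(boxes) - 1:
        raise ValueError("need exactly one spacing range between consecutive boxes")

    def match_from(box_index: int, pos: int) -> bool:
        box = boxes[box_index]
        # Slicing past the end yields a shorter string, so this also handles overruns.
        if sequence[pos:pos + len(box)] != box:
            return False
        if box_index == len(boxes) - 1:
            return True
        end = pos + len(box)
        dmin, dmax = gaps[box_index]
        return any(match_from(box_index + 1, end + d) for d in range(dmin, dmax + 1))

    return [p for p in range(len(sequence)) if match_from(0, p)]

if __name__ == "__main__":
    # Two boxes separated by 3 to 5 arbitrary symbols (toy values).
    print(structured_occurrences(["TTGACA", "TATAAT"], [(3, 5)], "TTGACAGGGGTATAAT"))
```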

Asia-Pacific Bioinformatics Conference | 2005

A highly scalable algorithm for the extraction of cis-regulatory regions

Alexandra M. Carvalho; Ana T. Freitas; Arlindo L. Oliveira; Marie-France Sagot

In this paper we propose a new algorithm for identifying cis-regulatory modules in genomic sequences. In particular, the algorithm extracts structured motifs, defined as a collection of highly conserved regions with pre-specified sizes and spacings between them. This type of motif is extremely relevant in the study of gene regulatory mechanisms since it can effectively represent promoter models. The proposed algorithm uses a new data structure, called box-link, to store information about conserved regions that occur in a well-ordered and regularly spaced manner in the dataset sequences. The complexity analysis shows a time and space gain over previous algorithms that is exponential in the spacings between binding sites. Experimental results show that the algorithm is much faster than existing ones, sometimes by more than two orders of magnitude. The application of the method to biological datasets shows its ability to extract relevant consensi.

ACM Symposium on Applied Computing | 2004

A parallel algorithm for the extraction of structured motifs

Alexandra M. Carvalho; Arlindo L. Oliveira; Ana T. Freitas; Marie-France Sagot

In this work we propose a parallel algorithm for the efficient extraction of binding-site consensus from genomic sequences. This algorithm, based on an existing approach, extracts structured motifs, which consist of an ordered collection of p ≥ 1 boxes with sizes and spacings between them specified by given parameters. The contents of the boxes, which represent the extracted motifs, are unknown at the start of the process and are found by the algorithm using a suffix tree as the fundamental data structure. By partitioning the structured motif search space, we divide the most demanding part of the algorithm among a number of processors that can be loosely coupled. In this way we obtain, under conditions that are easily met, a speedup that is linear in the number of available processing units. This speedup is verified by both theoretical and experimental analysis, also presented in this paper.
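The partitioning idea can be sketched on a much simpler problem: single-box motifs with exact occurrences and exhaustive search, rather than structured motifs on a suffix tree. The candidate space is split by first symbol into four disjoint subspaces. Each subspace is searched by an independent task, and results are merged only at the end. The function names, the split by one symbol and the quorum criterion are assumptions for illustration, not the paper's scheme.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from itertools import product
from typing import List, Sequence, Tuple

ALPHABET = "ACGT"

def search_partition(task: Tuple[str, int, List[str], int]) -> List[str]:
    """Exhaustively test every length-k candidate that starts with `prefix`."""
    prefix, k, sequences, quorum = task
    found = []
    for rest in product(ALPHABET, repeat=k - len(prefix)):
        candidate = prefix + "".join(rest)
        # Support = number of sequences containing an exact occurrence.
        if sum(candidate in s for s in sequences) >= quorum:
            found.append(candidate)
    return found

def parallel_extract(sequences: Sequence[str], k: int, quorum: int,
                     executor: Executor) -> List[str]:
    """Split the candidate space by first symbol and search the parts concurrently."""
    if k < 1:
        raise ValueError("k must be at least 1")
    tasks = [(prefix, k, list(sequences), quorum) for prefix in ALPHABET]
    # The subspaces are disjoint, so workers never communicate until the merge.
    parts = executor.map(search_partition, tasks)
    return sorted(m for part in parts for m in part)

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(parallel_extract(["ACGTT", "CACGT", "GGGGG"], 3, 2, pool))
```

In CPython, threads do not speed up this CPU-bound loop. For actual parallelism, pass a `concurrent.futures.ProcessPoolExecutor` instead, created under an `if __name__ == "__main__":` guard. Because the tasks are independent, speedup is limited mainly by how evenly the candidates are spread across partitions.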

String Processing and Information Retrieval | 2004

Efficient Extraction of Structured Motifs Using Box-Links

Alexandra M. Carvalho; Ana T. Freitas; Arlindo L. Oliveira; Marie-France Sagot

In this paper we propose a new data structure for the efficient extraction of structured motifs from DNA sequences. A structured motif is defined as a collection of highly conserved motifs with pre-specified sizes and spacings between them. The new data structure, called box-link, stores the information on how to jump over the spacings that separate each motif in a structured motif. A factor tree, a variation of a suffix tree, endowed with box-links provides the means for the efficient extraction of structured motifs.

Algorithms for Molecular Biology | 2012

Pattern matching through Chaos Game Representation: bridging numerical and discrete data structures for biological sequence analysis

Susana Vinga; Alexandra M. Carvalho; Alexandre P. Francisco; Luís M. S. Russo; Jonas S. Almeida

Background: Chaos Game Representation (CGR) is an iterated function that bijectively maps discrete sequences into a continuous domain. As a result, discrete sequences can be the object of statistical and topological analyses otherwise reserved for numerical systems. Characteristically, the CGR coordinates of substrings sharing an L-long suffix are located within a distance of 2^-L of each other. In the two decades since its original proposal, CGR has been generalized beyond its original focus on genomic sequences and has been successfully applied to a wide range of problems in bioinformatics. This report explores the possibility that it can be further extended to approach algorithms that rely on discrete, graph-based representations.

Results: The exploratory analysis described here consisted of selecting foundational string problems and refactoring them using CGR-based algorithms. We found that CGR can take the role of suffix trees and emulate sophisticated string algorithms, efficiently solving exact and approximate string matching problems such as finding all palindromes and tandem repeats, and matching with mismatches. The common feature of these problems is that they use longest common extension (LCE) queries as subtasks of their procedures, which we show can be answered in constant time with CGR. Additionally, we show that CGR can be used as a rolling hash function within the Rabin-Karp algorithm.

Conclusions: The analysis of biological sequences relies on algorithmic foundations facing mounting challenges, both logistic (performance) and analytical (lack of a unifying mathematical framework). CGR is found to provide the latter and to promise the former: graph-based data structures for sequence analysis operations are entailed by numerical-based data structures produced by CGR maps, providing a unifying analytical framework for a diversity of pattern matching problems.
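The suffix property can be checked with a small sketch of the classic CGR map for DNA. Each nucleotide is assigned a corner of the unit square. Starting from the centre (1/2, 1/2), each step moves halfway toward the corner of the next symbol. After L shared symbols, any initial difference (at most 1 per coordinate) has been halved L times. The corner assignment below is one common convention and is assumed here. Exact fractions are used so the bound can be tested without rounding error.

```python
from fractions import Fraction
from typing import List, Tuple

Point = Tuple[Fraction, Fraction]

# Assumed corner convention; other assignments work the same way.
CORNERS = {
    "A": (Fraction(0), Fraction(0)),
    "C": (Fraction(0), Fraction(1)),
    "G": (Fraction(1), Fraction(1)),
    "T": (Fraction(1), Fraction(0)),
}

def cgr(sequence: str) -> List[Point]:
    """CGR coordinates of every prefix of `sequence` (one point per symbol)."""
    x, y = Fraction(1, 2), Fraction(1, 2)
    points = []
    for symbol in sequence:
        cx, cy = CORNERS[symbol]
        # Move halfway from the current point toward the symbol's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points

def chebyshev(p: Point, q: Point) -> Fraction:
    """Largest per-coordinate difference between two CGR points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

if __name__ == "__main__":
    # "GACGT" and "TTCGT" share the 3-long suffix "CGT", so their points lie within 2^-3.
    p, q = cgr("GACGT")[-1], cgr("TTCGT")[-1]
    print(chebyshev(p, q) <= Fraction(1, 8))
```

Read in the other direction, the distance between two CGR points bounds how long a common suffix they can share. That is the intuition behind using coordinates in place of suffix-tree traversals.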

Pattern Recognition | 2014

Hybrid learning of Bayesian multinets for binary classification

Alexandra M. Carvalho; Pedro Adão; Paulo Mateus

We propose a scoring criterion, named mixture-based factorized conditional log-likelihood (mfCLL), which allows for efficient hybrid learning of mixtures of Bayesian networks in binary classification tasks. The learning procedure is decoupled into foreground and background learning, the foreground being the single concept of interest that we want to distinguish from a highly complex background. The overall procedure is hybrid, as the foreground is learned discriminatively whereas the background is learned generatively. The learning algorithm is shown to run in polynomial time for network structures such as trees and consistent κ-graphs. To gauge the performance of the mfCLL scoring criterion, we carry out a comparison with state-of-the-art classifiers. Results obtained with a large suite of benchmark datasets show that mfCLL-trained classifiers are a competitive alternative and should be taken into consideration.

International Conference on Machine Learning and Applications | 2007

Learning Bayesian networks consistent with the optimal branching

Alexandra M. Carvalho; Arlindo L. Oliveira

We introduce a polynomial-time algorithm to learn Bayesian networks whose structure is restricted to nodes with in-degree at most k and to edges consistent with the optimal branching; we call these consistent k-graphs (CkG). The optimal branching is used as a heuristic for a primary causality order between network variables, which is subsequently refined, according to a given score, into an optimal CkG Bayesian network. This approach enlarges the search space exponentially in the number of nodes, relative to trees, while keeping a polynomial-time bound. The proposed algorithm can be applied to scores that decompose over the network structure, such as the well-known LL, MDL, AIC, BIC, K2, BD, BDe, BDeu and MIT scores. We tested the proposed algorithm on a classification task and show that the induced classifier always scores better than or the same as the Naive Bayes and Tree Augmented Naive Bayes classifiers. Experiments on the UCI repository show that, in many cases, the improved scores translate into increased classification accuracy.
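Why a decomposable score keeps this tractable can be shown with a simplified sketch. Assume a total order of the variables is already given; deriving it from the optimal branching is not shown. Each variable may then take at most k parents among its predecessors. Every such graph is acyclic, and the total score is a sum of per-node terms, so each node's parent set can be optimized independently. For fixed k, that means O(n^k) candidate sets per node. BIC, one of the scores listed above, is used as the local score. The data format (tuples of discrete values) and the function names are assumptions, and this is not the authors' implementation.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

Row = Tuple[int, ...]

def bic_local_score(data: List[Row], child: int, parents: FrozenSet[int]) -> float:
    """BIC term for one node: log-likelihood minus (log N / 2) * free parameters."""
    n = len(data)
    parent_list = sorted(parents)
    joint = Counter((tuple(row[p] for p in parent_list), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parent_list) for row in data)
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    r = len({row[child] for row in data})          # observed states of the child
    q = 1
    for p in parent_list:
        q *= len({row[p] for row in data})         # observed parent configurations
    return loglik - 0.5 * math.log(n) * (r - 1) * q

def learn_order_consistent(data: List[Row], order: Sequence[int], k: int) -> Dict[int, FrozenSet[int]]:
    """Best parent set (size <= k, drawn from predecessors in `order`) for each node."""
    structure = {}
    for i, child in enumerate(order):
        predecessors = list(order[:i])
        candidates = (
            frozenset(ps)
            for size in range(min(k, len(predecessors)) + 1)
            for ps in combinations(predecessors, size)
        )
        # Decomposability: this choice does not affect any other node's term.
        structure[child] = max(candidates, key=lambda ps: bic_local_score(data, child, ps))
    return structure

if __name__ == "__main__":
    # Toy data: column 1 copies column 0; column 2 is independent of both.
    data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 2
    print(learn_order_consistent(data, [0, 1, 2], k=1))
```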

Australasian Joint Conference on Artificial Intelligence | 2007

Efficient Learning of Bayesian Network Classifiers

Alexandra M. Carvalho; Arlindo L. Oliveira; Marie-France Sagot

We introduce a Bayesian network classifier less restrictive than Naive Bayes (NB) and Tree Augmented Naive Bayes (TAN) classifiers. Considering that learning an unrestricted network is infeasible, the proposed classifier is constrained to be consistent with the breadth-first search order of an optimal TAN. We propose an efficient algorithm to learn such classifiers for any score that decomposes over the network structure, including the well-known scores based on information theory and Bayesian scoring functions. We show that the induced classifier always scores better than or the same as the NB and TAN classifiers. Experiments on modeling transcription factor binding sites show that, in many cases, the improved scores translate into increased classification accuracy.
