Publication


Featured research published by Carlos A. C. Bastos.


Bioinformatics | 2009

Genome analysis with inter-nucleotide distances

Vera Afreixo; Carlos A. C. Bastos; Armando J. Pinho; Sara P. Garcia; Paulo Jorge S. G. Ferreira

Motivation: DNA sequences can be represented by sequences of four symbols, but it is often useful to convert the symbols into real or complex numbers for further analysis. Several mapping schemes have been used in the past, but they seem unrelated to any intrinsic characteristic of DNA. The objective of this work was to find a mapping scheme that is directly related to DNA characteristics and useful for discriminating between different species. Mathematical models that explore DNA correlation structures may contribute to a better understanding of DNA and to a concise description of it. Results: We developed a methodology to process DNA sequences based on inter-nucleotide distances. Our main contribution is a method to obtain genomic signatures for complete genomes, based on the inter-nucleotide distances, that are able to discriminate between different species. Using these signatures and hierarchical clustering, it is possible to build phylogenetic trees. Phylogenetic trees lead to genome differentiation and allow the inference of phylogenetic relations. The phylogenetic trees generated in this work display related species close to each other, suggesting that the inter-nucleotide distances are able to capture essential information about the genomes. To create the genomic signature, we construct a vector that describes the inter-nucleotide distance distribution of a complete genome and compare it with the reference distance distribution, which is the distribution of a sequence where the nucleotides are placed randomly and independently. The residual, or relative error, between the data and the reference distribution is used to compare the DNA sequences of different organisms.
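A minimal sketch of the signature idea, assuming the inter-nucleotide distance is the difference between positions of consecutive occurrences of the same base (adjacent occurrences give distance 1). If bases are placed randomly and independently with probability p, these gaps follow a geometric distribution, which this sketch uses as the reference. The function names, the `max_d` cutoff and the exact relative-error normalization are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def inter_nucleotide_distances(seq, base):
    """Distances between consecutive occurrences of `base` (adjacent = 1)."""
    positions = [i for i, s in enumerate(seq) if s == base]
    return [b - a for a, b in zip(positions, positions[1:])]


def distance_signature(seq, base, max_d=20):
    """Relative error between the empirical distance distribution of `base`
    and a geometric reference (independent, randomly placed symbols).
    Hypothetical helper: cutoff and normalization are assumptions."""
    d = inter_nucleotide_distances(seq, base)
    p = seq.count(base) / len(seq)  # probability of `base` in the sequence
    if not d or p >= 1:
        return []  # signature undefined for absent or constant bases
    counts = Counter(d)
    total = len(d)
    signature = []
    for k in range(1, max_d + 1):
        f_obs = counts.get(k, 0) / total
        f_ref = p * (1 - p) ** (k - 1)  # geometric reference P(distance = k)
        signature.append((f_obs - f_ref) / f_ref)
    return signature
```

One such vector per base (A, C, G, T), concatenated, could then be fed to any hierarchical clustering routine to build a tree.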


PLOS ONE | 2011

On the representability of complete genomes by multiple competing finite-context (Markov) models

Armando J. Pinho; Paulo Jorge S. G. Ferreira; António J. R. Neves; Carlos A. C. Bastos

A finite-context (Markov) model of a given order yields the probability distribution of the next symbol in a sequence of symbols, given the recent past up to a depth equal to that order. Markov modeling has long been applied to DNA sequences, for example to find gene-coding regions. With the first studies came the discovery that DNA sequences are non-stationary: distinct regions require distinct model orders. Since then, Markov and hidden Markov models have been extensively used to describe the gene structure of prokaryotes and eukaryotes. However, to our knowledge, a comprehensive study about the potential of Markov models to describe complete genomes is still lacking. We address this gap in this paper. Our approach relies on (i) multiple competing Markov models of different orders, (ii) careful programming techniques that allow orders as large as sixteen, (iii) adequate inverted repeat handling, and (iv) probability estimates suited to the wide range of context depths used. To measure how well a model fits the data at a particular position in the sequence, we use the negative logarithm of the probability estimate at that position. The measure yields information profiles of the sequence, which are of independent interest. The average over the entire sequence, which amounts to the average number of bits per base needed to describe the sequence, is used as a global performance measure. Our main conclusion is that, from the probabilistic or information-theoretic point of view and according to this performance measure, multiple competing Markov models explain entire genomes almost as well as, or even better than, state-of-the-art DNA compression methods such as XM, which rely on very different statistical models. This is surprising, because Markov models are local (short-range), in contrast with the statistical models underlying other methods, which exploit the extensive data repetitions in DNA sequences and therefore have a non-local character.
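The information profile described above can be illustrated with a single adaptive order-k model: at each position, the cost is the negative base-2 logarithm of the probability the model assigns to the actual base, and the mean cost is the bits-per-base figure. This is a simplified sketch under stated assumptions: it uses additive (alpha) smoothing as a hypothetical estimator, one model instead of several competing ones, and no inverted-repeat handling.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def information_profile(seq, k, alpha=1.0):
    """Per-position -log2 P(x_n | previous k symbols) under an adaptive
    order-k finite-context model with additive (alpha) smoothing."""
    counts = defaultdict(lambda: [0] * len(ALPHABET))
    profile = []
    for n in range(len(seq)):
        ctx = seq[max(0, n - k):n]  # shorter context near the start
        c = counts[ctx]
        sym = ALPHABET.index(seq[n])
        p = (c[sym] + alpha) / (sum(c) + alpha * len(ALPHABET))
        profile.append(-math.log2(p))
        c[sym] += 1  # update after "coding", so a decoder could mirror it
    return profile


def bits_per_base(seq, k, alpha=1.0):
    """Average of the information profile (global performance measure)."""
    prof = information_profile(seq, k, alpha)
    return sum(prof) / len(prof)
```

A crude form of competition is to evaluate several orders and keep the cheapest, e.g. `min(range(17), key=lambda k: bits_per_base(seq, k))`; the paper's models compete in a more elaborate way.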


IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control | 1999

Spectrum of Doppler ultrasound signals from nonstationary blood flow

Carlos A. C. Bastos; Peter J. Fish; Francisco Vaz

A new formulation for the Doppler signal generation process in pulsatile flow has been developed enabling easier identification and quantification of the mechanisms involved in spectral broadening and the development of a simple estimation formula for the measured rms spectral width. The accuracy of the estimation formula was tested by comparing it with the spectral widths found by using conventional spectral estimation on simulated Doppler signals from pulsatile flow. The influence of acceleration, sample volume size, and time window duration on the Doppler spectral width was investigated for flow with blunt and parabolic velocity profiles passing through Gaussian-shaped sample volumes. Our results show that, for short duration windows, the spectral width is dominated by window broadening and that acceleration has a small effect on the spectral width. For long duration windows, the effect of acceleration must be taken into account. The size of the sample volume affects the spectral width of the Doppler signal in two ways: by intrinsic broadening and by the range of velocities passing through it. These effects act in opposite directions. The simple spectral width estimation formula was shown to have excellent agreement with widths calculated using the model and indicates the potential for correcting not only for window and nonstationarity broadening but also for intrinsic broadening.
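The paper's estimation formula is not reproduced here. As a sketch of the quantity being estimated, the measured rms spectral width is conventionally the square root of the second central moment of the Doppler power spectrum about its mean frequency. The helper below assumes a sampled spectrum given as parallel lists of frequencies and powers.

```python
import math


def mean_and_rms_width(freqs, power):
    """Mean frequency and rms spectral width (square root of the second
    central moment) of a sampled power spectrum."""
    total = sum(power)
    f_mean = sum(f * p for f, p in zip(freqs, power)) / total
    var = sum((f - f_mean) ** 2 * p for f, p in zip(freqs, power)) / total
    return f_mean, math.sqrt(var)


# Usage: a synthetic Gaussian-shaped spectrum centred at 1000 Hz, sigma 50 Hz.
freqs = list(range(0, 2001))
power = [math.exp(-((f - 1000) ** 2) / (2 * 50.0 ** 2)) for f in freqs]
f_mean, width = mean_and_rms_width(freqs, power)
```

For a Gaussian-shaped spectrum the rms width recovers the Gaussian's standard deviation, which makes it a convenient check when comparing an estimation formula against simulated spectra.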


International Conference on Acoustics, Speech, and Signal Processing | 2009

DNA coding using finite-context models and arithmetic coding

Armando J. Pinho; António J. R. Neves; Carlos A. C. Bastos; Paulo Jorge S. G. Ferreira

The interest in DNA coding has been growing with the availability of extensive genomic databases. Although only two bits are sufficient to encode the four DNA bases, efficient lossless compression methods are still needed due to the size of DNA sequences and because standard compression algorithms do not perform well on DNA sequences. As a result, several specific coding methods have been proposed. Most of these methods are based on searching procedures for finding exact or approximate repeats. Low-order finite-context models have only been used as secondary, fallback mechanisms. In this paper, we show that finite-context models can also be used as main DNA encoding methods. We propose a coding method based on two finite-context models that compete for the encoding of data on a block-by-block basis. The experimental results confirm the effectiveness of the proposed method.
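A sketch of block-wise competition between two finite-context models, assuming ideal code lengths (the -log2 probabilities that an arithmetic coder approaches) instead of an actual arithmetic coder. Both models are updated with every symbol, so a decoder that reads the per-block choice can keep identical state; one bit of side information per block signals which model was used. The orders, block size and additive estimator are hypothetical choices.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class FiniteContextModel:
    """Adaptive order-k model with additive smoothing (hypothetical sketch)."""

    def __init__(self, k, alpha=1.0):
        self.k, self.alpha = k, alpha
        self.counts = defaultdict(lambda: [0] * len(ALPHABET))

    def _ctx(self, seq, n):
        return seq[max(0, n - self.k):n]

    def cost(self, seq, n):
        """Ideal number of bits to code seq[n] given its preceding context."""
        c = self.counts[self._ctx(seq, n)]
        sym = ALPHABET.index(seq[n])
        p = (c[sym] + self.alpha) / (sum(c) + self.alpha * len(ALPHABET))
        return -math.log2(p)

    def update(self, seq, n):
        self.counts[self._ctx(seq, n)][ALPHABET.index(seq[n])] += 1


def competing_code_length(seq, k_low=2, k_high=8, block=64):
    """Total estimated bits when each block is coded by the cheaper model,
    plus 1 bit per block to signal the choice. Returns (bits, choices)."""
    models = [FiniteContextModel(k_low), FiniteContextModel(k_high)]
    total, choices = 0.0, []
    for start in range(0, len(seq), block):
        block_cost = [0.0, 0.0]
        for n in range(start, min(start + block, len(seq))):
            for i, m in enumerate(models):
                block_cost[i] += m.cost(seq, n)
            for m in models:  # both models see every symbol
                m.update(seq, n)
        best = min(range(len(models)), key=lambda i: block_cost[i])
        choices.append(best)
        total += 1 + block_cost[best]
    return total, choices
```

Comparing `total` with `2 * len(seq)` shows whether the competing models beat the plain two-bits-per-base representation.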


PLOS ONE | 2011

Minimal Absent Words in Prokaryotic and Eukaryotic Genomes

Sara P. Garcia; Armando J. Pinho; João M. O. S. Rodrigues; Carlos A. C. Bastos; Paulo Jorge S. G. Ferreira

Minimal absent words have been computed in genomes of organisms from all domains of life. Here, we explore different sets of minimal absent words in the genomes of 22 organisms (one archaeon, thirteen bacteria and eight eukaryotes). We investigate whether the mutational biases that may explain the deficit of the shortest absent words in vertebrates are also pervasive in other absent words, namely in minimal absent words, and whether they extend to other organisms. We find that the compositional biases observed for the shortest absent words in vertebrates are not uniform throughout different sets of minimal absent words. We further investigate the hypothesis of the inheritance of minimal absent words through common ancestry from the similarity in dinucleotide relative abundances of different sets of minimal absent words, and find that this inheritance may be exclusive to vertebrates.
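A word is a minimal absent word of a sequence if it does not occur in the sequence but its longest proper prefix and longest proper suffix both do. The brute-force sketch below applies that definition directly; it enumerates all 4^k candidates, so it is only an illustration for small lengths, not the method used for whole genomes.

```python
from itertools import product


def minimal_absent_words(seq, max_len, alphabet="ACGT"):
    """Words absent from seq whose longest proper prefix and suffix occur in
    seq, for lengths 1..max_len. Brute force: only for small max_len."""
    present = {""}  # the empty word occurs in every sequence
    for k in range(1, max_len + 1):
        present.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    maws = []
    for k in range(1, max_len + 1):
        for letters in product(alphabet, repeat=k):
            w = "".join(letters)
            if w not in present and w[:-1] in present and w[1:] in present:
                maws.append(w)
    return maws
```

Grouping the resulting words by length or by dinucleotide composition gives the kind of sets the abstract compares across organisms.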


IEEE Transactions on Biomedical Engineering | 2006

A Three-State Model for DNA Protein-Coding Regions

Armando J. Pinho; António J. R. Neves; Vera Afreixo; Carlos A. C. Bastos; Paulo Jorge S. G. Ferreira

It is known that the protein-coding regions of DNA are usually characterized by a three-base periodicity. In this paper, we exploit this property, studying a DNA model based on three deterministic states, where each state implements a finite-context model. The experimental results obtained confirm the appropriateness of the proposed approach, showing compression gains relative to the single finite-context model counterpart. Additionally, and potentially more interesting than the compression gain itself, we observe that the entropy associated with each of the three base positions of a codon differs and that this variation is not the same among the organisms analyzed.
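A sketch of the three-state idea: three independent order-k finite-context models, one per codon position, selected deterministically by position modulo 3. Reporting the average cost per state gives a rough per-position entropy estimate. The reading-frame offset, model order and additive estimator are illustrative assumptions.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def three_state_profile(seq, k=2, alpha=1.0, frame=0):
    """Code seq with three order-k finite-context models that cycle
    deterministically with codon position; return mean bits/base per state."""
    counts = [defaultdict(lambda: [0] * len(ALPHABET)) for _ in range(3)]
    bits, n_sym = [0.0] * 3, [0] * 3
    for n, base in enumerate(seq):
        state = (n - frame) % 3  # codon position 0, 1 or 2
        c = counts[state][seq[max(0, n - k):n]]
        s = ALPHABET.index(base)
        p = (c[s] + alpha) / (sum(c) + alpha * len(ALPHABET))
        bits[state] += -math.log2(p)
        n_sym[state] += 1
        c[s] += 1  # only the active state's model is updated
    return [b / m if m else 0.0 for b, m in zip(bits, n_sym)]
```

Unequal values across the three returned numbers correspond to the observation that the three codon positions carry different amounts of information.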


International Conference of the IEEE Engineering in Medicine and Biology Society | 2001

Blood and wall signal simulator for Doppler ultrasound signal analysis algorithm development

Peter J. Fish; Carlos A. C. Bastos; Francisco Vaz

Doppler ultrasound instruments, used for the detection and monitoring of vascular disease, require a means of separating the large, low-frequency Doppler signal from the vessel wall from the signal arising from blood, followed by a means of analysing the blood flow signal in order to characterise the flow conditions. This is normally achieved by using a high-pass filter that removes the signal reflected from the vessel wall. Unfortunately, the filter also removes the low-frequency Doppler signals arising from slow-moving blood. A better signal segmentation method that reduces the loss of signal from slowly moving blood is needed to permit the measurement of lower blood velocities. A signal simulator that generates Doppler signals including the contributions from blood and vessel wall would be very useful for the development of new Doppler signal segmentation methods. This work presents a new simulator incorporating the contribution of blood and vessel wall movements; the characteristics of the simulator output signal are similar to those found in practice.
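The trade-off described above can be seen with a toy wall filter. This sketch assumes a first-order IIR high-pass filter (a discretized RC circuit) with a hypothetical 200 Hz cutoff and measures its steady-state gain for pure tones standing in for wall motion, slow blood and fast blood. It is not the simulator presented in the paper, and clinical wall filters are typically of higher order.

```python
import math


def high_pass(x, fs, fc):
    """First-order IIR high-pass filter (discretized RC circuit)."""
    rc = 1.0 / (2 * math.pi * fc)
    a = rc / (rc + 1.0 / fs)
    y = [x[0]]
    for i in range(1, len(x)):
        y.append(a * (y[-1] + x[i] - x[i - 1]))
    return y


def rms(v):
    return math.sqrt(sum(s * s for s in v) / len(v))


def tone_gain(f, fs=10_000.0, fc=200.0, n=20_000):
    """Steady-state amplitude gain of the filter for a pure tone at f Hz."""
    x = [math.sin(2 * math.pi * f * t / fs) for t in range(n)]
    y = high_pass(x, fs, fc)
    skip = n // 2  # discard the start-up transient
    return rms(y[skip:]) / rms(x[skip:])


# Hypothetical frequencies: wall clutter, slow blood, fast blood.
gains = {f: tone_gain(f) for f in (20.0, 100.0, 2000.0)}
```

The 100 Hz tone is attenuated substantially along with the 20 Hz wall tone, which is the loss of slow-blood signal that motivates better segmentation methods.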


Ultrasonics | 2000

Doppler power spectrum from a Gaussian sample volume

Carlos A. C. Bastos; Peter J. Fish; Robin Steel; Francisco Vaz

A closed-form expression for the Doppler power spectrum due solely to the range of blood velocities passing through a Gaussian sample volume placed anywhere in a vessel under conditions of axisymmetric flow, uniform backscatter and negligible intrinsic spectral broadening has been derived. The formulation presented here allows the independent specification of the sample volume position and width, in the three dimensions, and enables simple estimations of spectral shape for pulsed wave Doppler systems. Simpler expressions were derived for the cases of symmetric sample volume projections onto the vessel cross-section and/or sample volumes centred in the vessel. Closed form expressions were derived for mean frequency and spectral width in the case of a symmetric sample volume projection centred in the vessel. The effects of sample volume size and position on the Doppler spectral width and mean frequency are shown for a range of velocity profiles.
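The closed-form expressions are not reproduced here. As a numerical counterpart under the same assumptions (axisymmetric flow, uniform backscatter, negligible intrinsic broadening), this sketch weights the velocity at each point of the vessel cross-section by a Gaussian sample-volume sensitivity and returns the weighted mean velocity and its rms spread. Doppler frequency is proportional to velocity, so these map to mean frequency and spectral width up to a constant. The grid integration and parameter names are assumptions.

```python
import math


def sample_volume_moments(profile, radius=1.0, x0=0.0, y0=0.0,
                          sx=0.3, sy=0.3, n=201):
    """Weighted mean and rms spread of velocity seen by a Gaussian sample
    volume projected onto the vessel cross-section, by grid integration.
    profile(r) gives the axial velocity at radial position r."""
    h = 2 * radius / (n - 1)
    w_sum = v_sum = v2_sum = 0.0
    for i in range(n):
        x = -radius + i * h
        for j in range(n):
            y = -radius + j * h
            r = math.hypot(x, y)
            if r > radius:
                continue  # outside the vessel lumen
            w = math.exp(-((x - x0) ** 2) / (2 * sx ** 2)
                         - ((y - y0) ** 2) / (2 * sy ** 2))
            v = profile(r)
            w_sum += w
            v_sum += w * v
            v2_sum += w * v * v
    mean = v_sum / w_sum
    return mean, math.sqrt(max(v2_sum / w_sum - mean ** 2, 0.0))


# Blunt (plug) flow gives no velocity spread; a very wide sample volume on
# parabolic flow averages uniformly over the lumen (mean = half of v_max).
plug = sample_volume_moments(lambda r: 1.0)
parabolic_wide = sample_volume_moments(lambda r: 1.0 - r * r, sx=100.0, sy=100.0)
```

Varying `sx`, `sy`, `x0` and `y0` reproduces qualitatively how sample-volume size and position change the mean frequency and spectral width.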


Biostatistics | 2015

Analysis of single-strand exceptional word symmetry in the human genome: new measures

Vera Afreixo; João M. O. S. Rodrigues; Carlos A. C. Bastos

Some previous studies suggest the extension of Chargaff's second rule (the phenomenon of symmetry in a single DNA strand) to long DNA words. However, in random sequences generated under an independent symbol model where complementary nucleotides have equal occurrence probabilities, we expect the phenomenon of symmetry to hold for any word length. In this work, we develop new statistical methods to measure the exceptional symmetry. Exceptional symmetry is a refinement of Chargaff's second parity rule that highlights the words whose frequency of occurrence is similar to that of their reversed complement but dissimilar to the frequencies of occurrence of other words which contain the same number of nucleotides A or T. We analyze words of lengths up to 12 in the complete human genome and in each chromosome separately. We assess exceptional symmetry globally, by word group, and by word. We conclude that the global symmetry present in the human genome is clearly exceptional and significant. The chromosomes present distinct exceptional symmetry profiles. There are several exceptional word groups and exceptional words with a strong exceptional symmetry.
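The ingredients of exceptional symmetry can be tabulated directly: for each word, its count, the count of its reversed complement, and the mean count of the words sharing its number of A or T nucleotides. Because complementation swaps A with T and C with G, a word and its reversed complement always fall in the same group. This is a descriptive sketch with hypothetical helper names; the paper's new statistical measures are not reproduced.

```python
from collections import Counter, defaultdict
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    return word.translate(COMPLEMENT)[::-1]


def word_counts(seq, k):
    """Overlapping counts of all words of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def symmetry_by_at_group(seq, k):
    """For each word of length k: (its count, count of its reversed
    complement, mean count of words with the same A+T content)."""
    counts = word_counts(seq, k)
    groups = defaultdict(list)
    for letters in product("ACGT", repeat=k):
        w = "".join(letters)
        groups[w.count("A") + w.count("T")].append(w)
    rows = {}
    for words in groups.values():
        group_mean = sum(counts[w] for w in words) / len(words)
        for w in words:
            rows[w] = (counts[w], counts[reverse_complement(w)], group_mean)
    return rows
```

A word whose first two numbers are close to each other but far from the third is a candidate for exceptional symmetry in the sense described above.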


Journal of Theoretical Biology | 2013

The breakdown of the word symmetry in the human genome

Vera Afreixo; Carlos A. C. Bastos; Sara P. Garcia; João M. O. S. Rodrigues; Armando J. Pinho; Paulo Jorge S. G. Ferreira

Previous studies have suggested that Chargaff's second rule may hold for relatively long words (above 10 nucleotides), but this has not been conclusively shown. In particular, the following questions remain open: Is the phenomenon of symmetry statistically significant? If so, what is the word length above which significance is lost? Can deviations in symmetry due to the finite size of the data be identified? This work addresses these questions by studying word symmetries in the human genome, chromosomes and transcriptome. To rule out finite-length effects, the results are compared with those obtained from random control sequences built to satisfy Chargaff's second parity rule. We use several techniques to evaluate the phenomenon of symmetry, including Pearson's correlation coefficient, total variational distance, a novel word symmetry distance, as well as traditional and equivalence statistical tests. We conclude that word symmetries are statistically significant in the human genome for word lengths up to 6 nucleotides. For longer words, we present evidence that the phenomenon may not be as prevalent as previously thought.
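Two of the measures named above can be sketched for a fixed word length k: Pearson's correlation coefficient between the count of each word and the count of its reversed complement, and the total variation distance between the word distribution and its reversed-complement counterpart. Since reverse complementation is a bijection on words, both count vectors share the same total. The function name is hypothetical, and the paper's novel word symmetry distance and statistical tests are not reproduced.

```python
import math
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def symmetry_stats(seq, k):
    """Pearson correlation between n(w) and n(revcomp(w)) over all 4**k
    words, and total variation distance between the two distributions."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    x = [counts[w] for w in words]
    y = [counts[w.translate(COMPLEMENT)[::-1]] for w in words]
    n = len(words)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy) if sxx and syy else float("nan")
    total = sum(x)  # equals sum(y): revcomp permutes the words
    tvd = 0.5 * sum(abs(a - b) for a, b in zip(x, y)) / total
    return r, tvd
```

Running this for increasing k on a genome and on random control sequences is one way to see at which word length the observed symmetry stops standing out.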
