Publication


Featured research published by Armando J. Pinho.


Nucleic Acids Research | 2012

GReEn: a tool for efficient compression of genome resequencing data

Armando J. Pinho; Diogo Pratas; Sara P. Garcia

Research in the genomic sciences is confronted with the volume of sequencing and resequencing data increasing at a higher pace than that of data storage and communication resources, shifting a significant part of research budgets from the sequencing component of a project to the computational one. Hence, being able to efficiently store sequencing and resequencing data is a problem of paramount importance. In this article, we describe GReEn (Genome Resequencing Encoding), a tool for compressing genome resequencing data using a reference genome sequence. It overcomes some drawbacks of the recently proposed tool GRS, namely, the possibility of compressing sequences that cannot be handled by GRS, faster running times and compression gains of over 100-fold for some sequences. This tool is freely available for non-commercial use at ftp://ftp.ieeta.pt/~ap/codecs/GReEn1.tar.gz.
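
The compression principle behind reference-based tools can be illustrated with a toy example. The sketch below only shows the general idea; GReEn itself relies on a statistical copy model of the reference and arithmetic coding rather than an explicit list of differences, and the function names are invented for illustration.

```python
# Illustrative sketch only: a toy reference-based encoder, not GReEn's actual
# algorithm.  The point it illustrates is that a resequenced genome differs
# from the reference in relatively few positions, so storing only those
# differences is cheap compared with storing the whole sequence.

def toy_encode(reference: str, target: str):
    """Return the positions and bases where target differs from an aligned
    reference of the same length."""
    assert len(reference) == len(target)
    return [(i, t) for i, (r, t) in enumerate(zip(reference, target)) if r != t]

def toy_decode(reference: str, diffs) -> str:
    out = list(reference)
    for pos, base in diffs:
        out[pos] = base
    return "".join(out)

if __name__ == "__main__":
    ref = "ACGTACGTACGT"
    tgt = "ACGTACCTACGT"          # single substitution at position 6
    diffs = toy_encode(ref, tgt)  # [(6, 'C')]
    assert toy_decode(ref, diffs) == tgt
```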


IEEE Signal Processing Letters | 2002

An online preprocessing technique for improving the lossless compression of images with sparse histograms

Armando J. Pinho

This article addresses the problem of improving the efficiency of lossless compression of images with sparse histograms. An online preprocessing technique is proposed, which, although very simple, is able to provide significant improvements in the compression ratio of the images that it targets and shows good robustness on other images.
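
As background, the idea exploited by this family of techniques can be sketched as follows (a minimal offline illustration, not the paper's online procedure; all names are hypothetical): an image with a sparse histogram uses only a few of the available intensity values, and remapping the used values to consecutive integers reduces the magnitude of the residuals a predictive coder has to encode.

```python
# Illustrative sketch of offline histogram packing (the paper proposes an
# online variant; this only shows the underlying idea).  Intensity values
# that actually occur are remapped to consecutive integers, shrinking
# prediction residuals for coders such as JPEG-LS.

def pack_histogram(pixels):
    """Map the distinct values in `pixels` to 0..K-1, preserving order.
    Returns the packed pixels and the table needed to invert the mapping."""
    values = sorted(set(pixels))
    forward = {v: i for i, v in enumerate(values)}
    packed = [forward[p] for p in pixels]
    return packed, values          # values[i] recovers the original intensity

def unpack_histogram(packed, values):
    return [values[p] for p in packed]

if __name__ == "__main__":
    img = [0, 64, 64, 192, 0, 255]        # sparse histogram: only 4 values used
    packed, table = pack_histogram(img)   # [0, 1, 1, 2, 0, 3]
    assert unpack_histogram(packed, table) == img
```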


Bioinformatics | 2009

Genome analysis with inter-nucleotide distances

Vera Afreixo; Carlos A. C. Bastos; Armando J. Pinho; Sara P. Garcia; Paulo Jorge S. G. Ferreira

Motivation: DNA sequences can be represented by sequences of four symbols, but it is often useful to convert the symbols into real or complex numbers for further analysis. Several mapping schemes have been used in the past, but they seem unrelated to any intrinsic characteristic of DNA. The objective of this work was to find a mapping scheme directly related to DNA characteristics and that would be useful in discriminating between different species. Mathematical models to explore DNA correlation structures may contribute to a better knowledge of DNA and to finding a concise DNA description. Results: We developed a methodology to process DNA sequences based on inter-nucleotide distances. Our main contribution is a method to obtain genomic signatures for complete genomes, based on the inter-nucleotide distances, that are able to discriminate between different species. Using these signatures and hierarchical clustering, it is possible to build phylogenetic trees. Phylogenetic trees lead to genome differentiation and allow the inference of phylogenetic relations. The phylogenetic trees generated in this work display related species close to each other, suggesting that the inter-nucleotide distances are able to capture essential information about the genomes. To create the genomic signature, we construct a vector which describes the inter-nucleotide distance distribution of a complete genome and compare it with the reference distance distribution, which is the distribution of a sequence where the nucleotides are placed randomly and independently. It is the residual or relative error between the data and the reference distribution that is used to compare the DNA sequences of different organisms. Contact: [email protected]
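
A minimal sketch of the distance-based signature described in the abstract is given below. The pooling over nucleotides, the geometric reference law and all names are assumptions made for illustration, not the paper's exact formulation.

```python
# Hedged sketch of an inter-nucleotide distance signature.

from collections import defaultdict

def distance_distribution(seq, max_d=20):
    """Empirical distribution of distances between consecutive occurrences
    of the same nucleotide, pooled over A, C, G and T."""
    last, counts, total = {}, defaultdict(int), 0
    for i, base in enumerate(seq):
        if base in last:
            d = i - last[base]
            if d <= max_d:
                counts[d] += 1
                total += 1
        last[base] = i
    return [counts[d] / total for d in range(1, max_d + 1)]

def reference_distribution(p=0.25, max_d=20):
    """In an i.i.d. sequence, same-nucleotide distances follow a geometric
    law: P(d) = p * (1 - p)**(d - 1)."""
    return [p * (1 - p) ** (d - 1) for d in range(1, max_d + 1)]

def signature(seq, max_d=20):
    """Relative error between observed and reference distributions, used
    here as a per-genome feature vector for clustering."""
    emp = distance_distribution(seq, max_d)
    ref = reference_distribution(0.25, max_d)
    return [(e - r) / r for e, r in zip(emp, ref)]

if __name__ == "__main__":
    print(signature("ACGTACGTTTACGATCGGATCCA" * 10))
```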


IEEE Transactions on Image Processing | 2004

A survey on palette reordering methods for improving the compression of color-indexed images

Armando J. Pinho; António J. R. Neves

Palette reordering is a well-known and very effective approach for improving the compression of color-indexed images. In this paper, we provide a survey of palette reordering methods, and we give experimental results comparing the ability of seven of them to improve the compression efficiency of JPEG-LS and lossless JPEG 2000. We concluded that the pairwise merging heuristic proposed by Memon et al. is the most effective, but also the most computationally demanding. Moreover, we found that the second most effective method is a modified version of Zeng's reordering technique, which was 3%-5% worse than pairwise merging, but much faster.
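
For readers unfamiliar with the setting, the sketch below illustrates why reordering is possible at all (it is not any of the surveyed heuristics): a permutation applied consistently to the palette and to the index image leaves the decoded colors unchanged, while altering the index differences that the lossless coder must model.

```python
# Minimal sketch of palette reindexing; names and the permutation are
# illustrative only, not a specific reordering method from the survey.

def reindex(indices, palette, perm):
    """perm[old_index] = new_index.  Returns the new index image and palette."""
    inv = [0] * len(perm)
    for old, new in enumerate(perm):
        inv[new] = old
    new_indices = [perm[i] for i in indices]
    new_palette = [palette[inv[j]] for j in range(len(perm))]
    return new_indices, new_palette

if __name__ == "__main__":
    palette = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
    indices = [0, 2, 2, 1, 0]
    new_idx, new_pal = reindex(indices, palette, perm=[2, 0, 1])
    # Decoded colors are identical before and after reindexing.
    assert [palette[i] for i in indices] == [new_pal[i] for i in new_idx]
```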


PLOS ONE | 2011

On the representability of complete genomes by multiple competing finite-context (Markov) models.

Armando J. Pinho; Paulo Jorge S. G. Ferreira; António J. R. Neves; Carlos A. C. Bastos

A finite-context (Markov) model of order k yields the probability distribution of the next symbol in a sequence of symbols, given the recent past up to depth k. Markov modeling has long been applied to DNA sequences, for example to find gene-coding regions. With the first studies came the discovery that DNA sequences are non-stationary: distinct regions require distinct model orders. Since then, Markov and hidden Markov models have been extensively used to describe the gene structure of prokaryotes and eukaryotes. However, to our knowledge, a comprehensive study about the potential of Markov models to describe complete genomes is still lacking. We address this gap in this paper. Our approach relies on (i) multiple competing Markov models of different orders, (ii) careful programming techniques that allow orders as large as sixteen, (iii) adequate inverted repeat handling, and (iv) probability estimates suited to the wide range of context depths used. To measure how well a model fits the data at a particular position in the sequence we use the negative logarithm of the probability estimate at that position. The measure yields information profiles of the sequence, which are of independent interest. The average over the entire sequence, which amounts to the average number of bits per base needed to describe the sequence, is used as a global performance measure. Our main conclusion is that, from the probabilistic or information-theoretic point of view and according to this performance measure, multiple competing Markov models explain entire genomes almost as well as or even better than state-of-the-art DNA compression methods, such as XM, which rely on very different statistical models. This is surprising because Markov models are local (short-range), in contrast with the statistical models underlying other methods, which exploit the extensive data repetitions in DNA sequences and therefore have a non-local character.
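
The modeling and evaluation machinery can be illustrated with a single finite-context model. The sketch below is hedged: it uses an assumed additive smoothing estimator, and the paper's combination of several competing models of different orders and its inverted-repeat handling are omitted.

```python
# Minimal sketch: one adaptive finite-context (Markov) model of order k over
# the DNA alphabet, reporting the per-symbol information content
# -log2 P(next | context) and its average in bits per base.

import math
from collections import defaultdict

def bits_per_base(seq, k=3, alpha=1.0):
    alphabet = "ACGT"
    counts = defaultdict(lambda: defaultdict(int))
    profile = []
    for i in range(k, len(seq)):
        ctx, sym = seq[i - k:i], seq[i]
        c = counts[ctx]
        total = sum(c.values()) + alpha * len(alphabet)
        p = (c[sym] + alpha) / total     # estimate before updating the counts
        profile.append(-math.log2(p))    # one point of the information profile
        c[sym] += 1                      # adaptive update
    return sum(profile) / len(profile), profile

if __name__ == "__main__":
    avg, _ = bits_per_base("ACGT" * 2000, k=3)
    print(f"{avg:.3f} bits per base")    # periodic data: well below 2 bits/base
```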


IEEE Signal Processing Letters | 2002

Why does histogram packing improve lossless compression rates?

Paulo Jorge S. G. Ferreira; Armando J. Pinho

The performance of state-of-the-art lossless image coding methods [such as JPEG-LS, lossless JPEG-2000, and context-based adaptive lossless image coding (CALIC)] can be considerably improved by a recently introduced preprocessing technique that can be applied whenever the images have sparse histograms. Bitrate savings of up to 50% have been reported, but so far no theoretical explanation of the fact has been advanced. This letter addresses this issue and analyzes the effect of the technique in terms of the interplay between histogram packing and the image total variation, emphasizing the lossless JPEG-2000 case.
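
For reference, the quantity involved is the total variation of the pixel sequence. The definition below is the standard one, and the monotonicity remark is an informal reading of the kind of interplay the letter analyzes, not a quotation of its argument.

```latex
% Total variation of the pixel sequence x_1, ..., x_N (standard definition;
% the letter's exact notation may differ):
\[
  \mathrm{TV}(x) = \sum_{i=1}^{N-1} \lvert x_{i+1} - x_i \rvert .
\]
% Histogram packing replaces each used intensity by its rank among the used
% values.  At most |x_{i+1} - x_i| - 1 used values can lie strictly between
% two pixel values, so every term |x_{i+1} - x_i| can only decrease or stay
% the same, and packing therefore never increases TV(x).
```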


IEEE Signal Processing Letters | 2004

A note on Zeng's technique for color reindexing of palette-based images

Armando J. Pinho; António J. R. Neves

Palette reindexing is a well-known and very effective approach for improving the compression of color-indexed images. In this letter, we address the reindexing technique proposed by Zeng et al. and we show how its performance can be improved through a theoretically motivated choice of parameters. Experimental results show the practical appropriateness of the proposed modification.


Bioinformatics | 2014

MFCompress: a compression tool for FASTA and multi-FASTA data.

Armando J. Pinho; Diogo Pratas

Motivation: The data deluge phenomenon is becoming a serious problem in most genomic centers. To alleviate it, general purpose tools, such as gzip, are used to compress the data. However, although pervasive and easy to use, these tools fall short when the intention is to reduce the data as much as possible, for example for medium- and long-term storage. A number of algorithms have been proposed for the compression of genomics data, but unfortunately only a few of them have been made available as usable and reliable compression tools. Results: In this article, we describe one such tool, MFCompress, specially designed for the compression of FASTA and multi-FASTA files. In comparison to gzip and applied to multi-FASTA files, MFCompress can provide additional average compression gains of almost 50%, i.e. it potentially doubles the available storage, although at the cost of some more computation time. On highly redundant datasets, and in comparison with gzip, 8-fold size reductions have been obtained. Availability: Both source code and binaries for several operating systems are freely available for non-commercial use at http://bioinformatics.ua.pt/software/mfcompress/. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


BMC Bioinformatics | 2009

On finding minimal absent words

Armando J. Pinho; Paulo Jorge S. G. Ferreira; Sara P. Garcia; João M. O. S. Rodrigues

Background: The problem of finding the shortest absent words in DNA data has been recently addressed, and algorithms for its solution have been described. It has been noted that longer absent words might also be of interest, but the existing algorithms only provide generic absent words by trivially extending the shortest ones. Results: We show how absent words relate to the repetitions and structure of the data, and define a new and larger class of absent words, called minimal absent words, that still captures the essential properties of the shortest absent words introduced in recent works. The words of this new class are minimal in the sense that if their leftmost or rightmost character is removed, then the resulting word is no longer an absent word. We describe an algorithm for generating minimal absent words that, in practice, runs in approximately linear time. An implementation of this algorithm is publicly available at ftp://www.ieeta.pt/~ap/maws. Conclusion: Because the set of minimal absent words that we propose is much larger than the set of the shortest absent words, it is potentially more useful for applications that require a richer variety of absent words. Nevertheless, the number of minimal absent words is still manageable since it grows at most linearly with the string size, unlike generic absent words that grow exponentially. Both the algorithm and the concepts upon which it depends shed additional light on the structure of absent words and complement the existing studies on the topic.
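
The definition lends itself to a direct, if inefficient, enumeration. The sketch below follows the abstract's characterization literally; the paper's algorithm runs in approximately linear time, unlike this brute-force illustration.

```python
# Brute-force sketch of minimal absent words: w is minimal absent if w does
# not occur in s, but w with its leftmost character removed and w with its
# rightmost character removed both occur in s.

from itertools import product

def minimal_absent_words(s, max_len=4, alphabet="ACGT"):
    present = set()
    for n in range(1, max_len + 1):
        for i in range(len(s) - n + 1):
            present.add(s[i:i + n])
    maws = []
    for n in range(2, max_len + 1):
        for w in map("".join, product(alphabet, repeat=n)):
            if w not in present and w[1:] in present and w[:-1] in present:
                maws.append(w)
    return maws

if __name__ == "__main__":
    print(minimal_absent_words("ACGTACGTAC", max_len=3))
```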


Portuguese Conference on Artificial Intelligence | 2007

An omnidirectional vision system for soccer robots

António J. R. Neves; Gustavo A. Corrente; Armando J. Pinho

This paper describes a complete and efficient vision system developed for the robotic soccer team of the University of Aveiro, CAMBADA (Cooperative Autonomous Mobile roBots with Advanced Distributed Architecture). The system consists of a FireWire camera mounted vertically on top of the robot. A hyperbolic mirror placed above the camera reflects the 360 degrees of the field around the robot. The omnidirectional system is used to find the ball and the goals, and to detect the presence of obstacles and the white lines used by our localization algorithm. In this paper we present a set of algorithms to efficiently extract the color information from the acquired images and, in a second phase, extract the information of all objects of interest. Our vision system architecture uses a distributed paradigm where the main tasks, namely image acquisition, color extraction, object detection and image visualization, are separated into several processes that can run at the same time. We developed an efficient color extraction algorithm based on lookup tables and a radial model for object detection. Our participation in the last national robotics contest, ROBOTICA 2007, where we obtained first place in the Medium Size League of robotic soccer, shows the effectiveness of our algorithms. Moreover, our experiments show that the system is fast and accurate, with a maximum processing time that is independent of the robot position and of the number of objects found in the field.
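
The lookup-table color extraction mentioned in the abstract can be sketched roughly as follows; the thresholds, class names and table layout are assumptions for illustration, not CAMBADA's actual calibration.

```python
# Hedged sketch of lookup-table color classification.  Every possible
# (quantized) pixel value is classified once, offline, so per-frame
# classification is a single table lookup per pixel.

def build_lut():
    """Index the table by (r >> 3, g >> 3, b >> 3), i.e. 32 levels per channel."""
    lut = bytearray(32 * 32 * 32)            # 0 = unknown
    classes = {"orange_ball": 1, "green_field": 2, "white_line": 3}
    for r in range(32):
        for g in range(32):
            for b in range(32):
                idx = (r << 10) | (g << 5) | b
                if r > 24 and g > 12 and b < 8:
                    lut[idx] = classes["orange_ball"]
                elif g > 16 and r < 16 and b < 16:
                    lut[idx] = classes["green_field"]
                elif r > 24 and g > 24 and b > 24:
                    lut[idx] = classes["white_line"]
    return lut

def classify(lut, r, g, b):
    return lut[((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3)]

if __name__ == "__main__":
    lut = build_lut()
    print(classify(lut, 255, 140, 0))        # orange-ish pixel -> class 1
```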
