Alvaro J. González | Researchain

Publication

Featured research published by Alvaro J. González.

BMC Bioinformatics | 2010

Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines

Alvaro J. González; Li Liao

Background: Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations of current experimental methods have motivated the development of computational methods for predicting PPIs. Because protein interactions generally occur via domains rather than whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have used information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles.
Results: In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMMs), in which interacting residues in domains are explicitly modeled according to the three-dimensional structural information available at the Protein Data Bank (PDB). Features of the domains are first extracted as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross-validation experiments with a set of interacting protein pairs adopted from the 3DID database. Prediction accuracy improves significantly over InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure.
Conclusions: We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines.
Datasets and source code are freely available on the web at http://liao.cis.udel.edu/pub/svdsvm. Implemented in Matlab and supported on Linux and MS Windows.
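
The pair representation and the leave-one-out protocol described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the toy vectors stand in for SVD-selected Fisher scores, and a nearest-centroid rule is swapped in for the support vector machine so the example needs no external library. It is not the authors' Matlab implementation.

```python
from typing import List, Sequence


def concat_pair(features_a: Sequence[float], features_b: Sequence[float]) -> List[float]:
    """Represent a domain pair by concatenating the two domains' feature vectors."""
    return list(features_a) + list(features_b)


def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (NOT an SVM): pick the label whose class centroid is closest."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def leave_one_out_accuracy(xs, ys) -> float:
    """Hold out each example once, train on the rest, and report the fraction correct."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Toy data: label 1 = interacting pair, label 0 = non-interacting pair.
pairs = [
    concat_pair([0.0, 0.0], [0.0, 0.0]),
    concat_pair([0.1, 0.0], [0.0, 0.1]),
    concat_pair([1.0, 1.0], [1.0, 1.0]),
    concat_pair([0.9, 1.0], [1.0, 0.9]),
]
labels = [0, 0, 1, 1]
accuracy = leave_one_out_accuracy(pairs, labels)
```

Each held-out pair is classified by a model that never saw it, which is what makes leave-one-out a usable estimate when the number of known interacting pairs is small.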

Bioinformatics | 2013

Prediction of contact matrix for protein–protein interaction

Alvaro J. González; Li Liao; Cathy H. Wu

Motivation: Prediction of protein-protein interaction has become an important part of systems biology in reverse engineering biological networks for a better understanding of the molecular biology of the cell. Although significant progress has been made in prediction accuracy, most computational methods only predict whether two proteins interact, not which residues interact, information that can be very valuable for understanding interaction mechanisms and designing modulation of the interaction. In this work, we developed a computational method to predict the interacting residue pairs, i.e. the contact matrix, for interacting protein domains. The rows and columns of the matrix correspond to the residues in the two interacting domains, and its values (1 or 0) indicate whether the corresponding residues do or do not interact.
Results: Our method is based on supervised learning using support vector machines. For each domain involved in a given domain-domain interaction (DDI), an interaction profile hidden Markov model (ipHMM) is first built for the domain family, and then each residue position of a member domain sequence is represented as a 20-dimensional vector of Fisher scores, characterizing how similar it is to the family profile at that position. Each element of the contact matrix for a sequence pair is represented by a feature vector obtained by concatenating the vectors of the two corresponding residues, and the task is to predict the element value (1 or 0) from the feature vector. A support vector machine is trained for a given DDI, using either a consensus contact matrix or contact matrices for individual sequence pairs, and is tested by leave-one-out cross-validation. The performance averaged over a set of 115 DDIs collected from the 3DID database shows significant improvement (sensitivity up to 85% and specificity up to 85%) compared with a multiple sequence alignment-based method previously reported in the literature (sensitivity 57% and specificity 78%).
Supplementary information: Supplementary data are available at Bioinformatics online.
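
As a rough illustration of the data layout described above, the sketch below builds one concatenated feature vector per contact-matrix element and scores a predicted matrix by sensitivity and specificity. The function names and the toy matrices are hypothetical, and the Fisher-score vectors are filled with placeholder values rather than derived from an ipHMM.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def pair_features(scores_a: List[List[float]], scores_b: List[List[float]]):
    """features[i][j] = per-residue vector of residue i in domain A
    concatenated with that of residue j in domain B."""
    return [[row_a + row_b for row_b in scores_b] for row_a in scores_a]


def sensitivity_specificity(true: Matrix, pred: Matrix) -> Tuple[float, float]:
    """Compare a predicted 0/1 contact matrix with the true one, element by element."""
    tp = fp = tn = fn = 0
    for row_t, row_p in zip(true, pred):
        for t, p in zip(row_t, row_p):
            if t == 1 and p == 1:
                tp += 1
            elif t == 1:
                fn += 1
            elif p == 1:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Placeholder 20-dimensional vectors: 2 residues in domain A, 3 in domain B.
domain_a = [[0.1] * 20, [0.2] * 20]
domain_b = [[0.3] * 20, [0.4] * 20, [0.5] * 20]
features = pair_features(domain_a, domain_b)

true_contacts = [[1, 0, 0], [0, 1, 0]]
pred_contacts = [[1, 0, 1], [0, 0, 0]]
sens, spec = sensitivity_specificity(true_contacts, pred_contacts)
```

With 20 scores per residue, every element of the contact matrix becomes a 40-dimensional classification example, so a single domain pair already yields as many examples as the matrix has cells.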

International Conference on Bioinformatics | 2009

Constrained Fisher Scores Derived from Interaction Profile Hidden Markov Models Improve Protein to Protein Interaction Prediction

Alvaro J. González; Li Liao

Protein-protein interaction plays critical roles in cellular functions. In this paper, we propose a computational method to predict protein-protein interaction using support vector machines and constrained Fisher scores derived from interaction profile hidden Markov models (ipHMMs) that characterize the domains involved in the interaction. The constrained Fisher scores are obtained as the gradient, with respect to the model parameters, of the posterior probability for the protein to be aligned with the ipHMM, conditioned on a specified path through the model state space; here we use the most probable path, as determined by the Viterbi algorithm. The method is tested by leave-one-out cross-validation experiments with a set of interacting protein pairs adopted from the 3DID database. Prediction accuracy, measured by ROC score, improves significantly over previous methods.
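
For readers unfamiliar with the decoding step, the following is a minimal sketch of the textbook Viterbi algorithm on a generic discrete HMM, computed in log space. It is not an ipHMM and does not compute Fisher scores; the two states, the three-symbol alphabet and all probabilities are made-up toy values.

```python
import math


def _log(p: float) -> float:
    """Logarithm that maps probability 0 to -infinity instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state path for obs and its joint probability."""
    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + _log(trans_p[r][s])) for r in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + _log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, math.exp(best[-1][last])


# Toy model: "N" = non-interacting state, "I" = interacting state (hypothetical labels).
states = ["N", "I"]
start_p = {"N": 0.6, "I": 0.4}
trans_p = {"N": {"N": 0.7, "I": 0.3}, "I": {"N": 0.4, "I": 0.6}}
emit_p = {
    "N": {"a": 0.5, "b": 0.4, "c": 0.1},
    "I": {"a": 0.1, "b": 0.3, "c": 0.6},
}
path, prob = viterbi(["a", "b", "c"], states, start_p, trans_p, emit_p)
```

Conditioning on this single path, rather than summing over all paths, is what the abstract refers to as the constraint on the Fisher scores.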

IEEE Transactions on Nanobioscience | 2013

Predicting Interacting Residues Using Long-Distance Information and Novel Decoding in Hidden Markov Models

Colin Kern; Alvaro J. González; Li Liao; K. Vijay-Shanker

Identification of interacting residues involved in protein-protein and protein-ligand interaction is critical for the prediction and understanding of the interaction and has practical impact on mutagenesis and drug design. In this work, we introduce a new decoding algorithm, ETB-Viterbi, with an early traceback mechanism, and apply it to interaction profile hidden Markov models (ipHMMs) to enable optimized incorporation of long-distance correlations between interacting residues, leading to improved prediction accuracy. The method was applied to and tested on a set of domain-domain interaction families from the 3DID database, and showed statistically significant improvement in accuracy measured by F-score. To assess the method's effectiveness and robustness in capturing the correlation signals, sets of simulated data based on the 3DID dataset with controllable correlation between interacting residues were also used, as well as reversed sequence orientation. The prediction consistently improves as the correlations increase and is not significantly affected by sequence orientation.
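
The evaluation metric can be made concrete with a short sketch. It assumes the balanced F-score (F1, the harmonic mean of precision and recall) over per-residue 0/1 labels; the abstract does not say which F-score variant the authors used, and the function name and labels below are illustrative only.

```python
def f_score(true_labels, predicted_labels) -> float:
    """Balanced F-score (F1) for binary per-residue labels, where 1 = interacting."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no correct positive prediction: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


true_sites = [1, 1, 0, 0, 1]
predicted_sites = [1, 0, 0, 1, 1]
score = f_score(true_sites, predicted_sites)
```

Because interacting residues are usually a small minority of positions, a metric that ignores true negatives is a natural choice here: labelling everything as non-interacting earns a score of zero rather than a high accuracy.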

BMC Bioinformatics | 2008

Clustering exact matches of pairwise sequence alignments by weighted linear regression

Alvaro J. González; Li-Ying Liao

Background: At intermediate stages of genome assembly projects, when a number of contigs have been generated and their validity needs to be verified, it is desirable to align these contigs to a reference genome when one is available. The interest is not in a detailed base-level alignment between a contig and the reference genome, but rather in a rough estimate of where the contig aligns to the reference genome, specifically the starting and ending positions of that region. This information is very useful in ordering the contigs and facilitates post-assembly analysis such as gap closure and repeat resolution. Programs such as BLAST and MUMmer can quickly align and identify high-similarity segments between two sequences. Seen in a dot plot, these segments tend to agglomerate along a diagonal but can also be disrupted by gaps or shifted away from the main diagonal due to mismatches between the contig and the reference. Visually inspecting the dot plot to identify the regions covered by a large number of contigs from sequence assembly projects is tedious and practically impossible. A forced global alignment between a contig and the reference is not only time consuming but often meaningless.
Results: We have developed an algorithm that takes the coordinates of all the exact matches or high-similarity local alignments, clusters them with respect to the main diagonal in the dot plot using a weighted linear regression technique, and identifies the starting and ending coordinates of the region of interest.
Conclusion: This algorithm complements existing pairwise sequence alignment packages by replacing the time-consuming seed extension phase with a weighted linear regression on the alignment seeds. Experiments show that the gain in execution time can be substantial without compromising accuracy. This method should be of great utility to sequence assembly and genome comparison projects.
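
One way to picture the idea is the sketch below: fit a weighted least-squares line through the match coordinates, discard the matches that lie far from it, and read the region boundaries off the survivors. The weighting by match length, the trimming rule, the tolerance and all names are assumptions made for illustration; the abstract does not specify these details, so this should not be read as the published algorithm.

```python
from typing import List, Tuple

Match = Tuple[float, float, float]  # (contig_pos, reference_pos, match_length)


def weighted_fit(points: List[Match]) -> Tuple[float, float]:
    """Weighted least-squares line y = intercept + slope * x, weighted by match length."""
    total = sum(w for _, _, w in points)
    mean_x = sum(w * x for x, _, w in points) / total
    mean_y = sum(w * y for _, y, w in points) / total
    sxx = sum(w * (x - mean_x) ** 2 for x, _, w in points)
    sxy = sum(w * (x - mean_x) * (y - mean_y) for x, y, w in points)
    if sxx == 0:
        raise ValueError("need matches at two or more distinct contig positions")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def locate_region(matches: List[Match], tol: float = 50.0):
    """Refit repeatedly, dropping the worst off-diagonal match, until all fit within tol."""
    kept = list(matches)
    while True:
        slope, intercept = weighted_fit(kept)
        residuals = [abs(y - (intercept + slope * x)) for x, y, _ in kept]
        worst = max(range(len(kept)), key=lambda i: residuals[i])
        if residuals[worst] <= tol or len(kept) <= 2:
            break
        del kept[worst]
    start = min(y for _, y, _ in kept)
    end = max(y + w for _, y, w in kept)
    return start, end, slope, intercept


# Three matches on the diagonal reference = contig + 1000, plus one spurious short match.
matches = [(0, 1000, 100), (200, 1200, 150), (500, 1500, 120), (300, 9000, 10)]
start, end, slope, intercept = locate_region(matches)
```

The cost is linear in the number of seeds per refit, which is the intuition behind replacing seed extension with a regression when only the region boundaries are needed.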

Journal of Electronic Imaging | 2008

Alpha stable modeling of human visual systems for digital halftoning in rectangular and hexagonal grids

Alvaro J. González; Jan Bacca Rodríguez; Gonzalo R. Arce; Daniel L. Lau

Human visual system (HVS) modeling has become a critical component in the design of digital halftoning algorithms. Methods that exploit the characteristics of the HVS include the direct binary search (DBS) and optimized tone-dependent halftoning approaches. The spatial sensitivity of the HVS is low-pass in nature, reflecting the physiological characteristics of the eye. Several HVS models have been proposed in the literature, among them the widely used Nasanen’s exponential model, which was later shown to be constrained in shape. Richer models are needed to attain better halftone attributes and to control the appearance of undesired patterns. As an alternative, models based on mixtures of bivariate Gaussian density functions have been proposed. The mathematical characteristics of the HVS model thus play a key role in the synthesis of model-based halftoning. In this work, alpha stable functions, an elegant class of functions richer than mixed Gaussians, are exploited to design HVS models for two different contexts: monochrome halftoning over rectangular and hexagonal sampling grids. In both scenarios, alpha stable models prove to be more efficient than Gaussian mixtures, as they use fewer parameters to characterize the tails and bandwidth of the model. It is shown that a decrease in the model’s bandwidth leads to homogeneous halftone patterns, and conversely, models with heavier tails yield smoother textures. These characteristics, together with their simplicity, make alpha stable models a powerful tool for HVS characterization.
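
To give a feel for why two parameters can suffice, the sketch below evaluates the standard characteristic-function form of a symmetric alpha-stable law, exp(-gamma * |f|^alpha), as a radially symmetric low-pass response. The abstract does not state the functional form or parameterization used in the paper, so the formula, the function name and the parameter names are assumptions for illustration only.

```python
import math


def sas_response(freq: float, alpha: float, gamma: float) -> float:
    """Symmetric alpha-stable characteristic-function form exp(-gamma * |freq|**alpha).

    alpha in (0, 2] sets the shape (alpha = 2 is Gaussian-shaped, alpha = 1 is a
    plain exponential); gamma > 0 sets the scale, i.e. how quickly the response falls off.
    """
    if not 0 < alpha <= 2:
        raise ValueError("alpha must lie in (0, 2]")
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return math.exp(-gamma * abs(freq) ** alpha)


# Compare a Gaussian-shaped response with an exponential one at a few radial frequencies.
frequencies = [0.0, 1.0, 3.0]
gaussian_like = [sas_response(f, alpha=2.0, gamma=1.0) for f in frequencies]
exponential_like = [sas_response(f, alpha=1.0, gamma=1.0) for f in frequencies]
```

In this form, one parameter changes the shape of the fall-off and the other its scale, whereas a Gaussian mixture needs a weight and a spread for every component it adds.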

Electronic Imaging | 2006

Alpha stable human visual system models for digital halftoning

Alvaro J. González; Jan Bacca; Gonzalo R. Arce; Daniel L. Lau

Human visual system (HVS) modeling has become a critical component in the design of digital halftoning algorithms. Methods that exploit the characteristics of the HVS include the direct binary search (DBS) and optimized tone-dependent halftoning approaches. The spatial sensitivity of the HVS is low-pass in nature, reflecting the physiological characteristics of the eye. Several HVS models have been proposed in the literature, among them the widely used Nasanen’s exponential model. As shown experimentally by Kim and Allebach [1], Nasanen’s model is constrained in shape, and richer models are needed to attain better halftone attributes and to control the appearance of undesired patterns. As an alternative, they proposed a class of HVS models based on mixtures of bivariate Gaussian density functions. The mathematical characteristics of the HVS model thus play a key role in the synthesis of model-based halftoning. In this work, alpha stable functions, an elegant class of models richer than mixed Gaussians, are exploited. These are more efficient than Gaussian mixtures, as they use fewer parameters to characterize the tails and bandwidth of the model. It is shown that a decrease in the model’s bandwidth leads to homogeneous halftone patterns, and conversely, models with heavier tails yield smoother textures. These characteristics, together with their simplicity, make alpha stable models a powerful tool for HVS characterization.

Bioinformatics and Biomedicine | 2009

Predicting functional sites in biological sequences using canonical correlation analysis

Alvaro J. González; Li Liao; Cathy H. Wu

Protein functional site prediction plays a key role in understanding protein function and in protein engineering. In this work, we developed a novel method that uses canonical correlation analysis to predict protein ligand binding sites. The method was tested on a well-known benchmark dataset and consistently outperformed the existing method Xdet, which is based on Pearson correlation, improving the lowest and highest ranked positives by more than 18% and 22%, respectively.
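
For reference, the standard objective of canonical correlation analysis is given below. The symbols are generic textbook notation rather than the paper's: X and Y are two random vectors, a and b are projection vectors, and the Sigma terms are the corresponding covariance blocks. How the paper constructs X and Y from the sequence data is not stated in the abstract.

```latex
\rho^{*}
  = \max_{a,\,b}\; \operatorname{corr}\!\left(a^{\top}X,\; b^{\top}Y\right)
  = \max_{a,\,b}\;
    \frac{a^{\top}\Sigma_{XY}\,b}
         {\sqrt{a^{\top}\Sigma_{XX}\,a}\;\sqrt{b^{\top}\Sigma_{YY}\,b}}
```

When X and Y are each one-dimensional, the maximum reduces to the absolute value of the Pearson correlation, which is one way to see CCA as a multivariate generalization of a Pearson-based score.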

Bioinformatics and Biomedicine | 2012

Improving interacting residue prediction using long-distance information in hidden Markov models

Colin Kern; Alvaro J. González; Li Liao; K. Vijay-Shanker

Identification of interacting residues involved in protein-protein and protein-ligand interaction is critical for the prediction and understanding of the interaction and has practical impact on mutagenesis and drug design. In this work, we introduce a new decoding algorithm, ETB-Viterbi, with an early traceback mechanism, built into interaction profile hidden Markov models (ipHMMs) so that they can incorporate the long-distance correlations between interacting residues and improve prediction accuracy. The method was applied to and tested on a set of domain-domain interaction families from the 3DID database, and showed statistically significant improvement in accuracy measured by F-score. To assess the method's effectiveness in capturing the correlation signals, sets of simulated data based on the 3DID dataset with controllable correlation between interacting residues were also used, and the prediction was shown to improve consistently as the correlations increase.

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2012