Huzefa Rangwala | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Huzefa Rangwala is active.

Explore More

Publication

Featured researches published by Huzefa Rangwala.

Bioinformatics | 2005

Profile-based direct kernels for remote homology detection and fold recognition

Huzefa Rangwala; George Karypis

MOTIVATIONnProtein remote homology detection is a central problem in computational biology. Supervised learning algorithms based on support vector machines are currently one of the most effective methods for remote homology detection. The performance of these methods depends on how the protein sequences are modeled and on the method used to compute the kernel function between them.nnnRESULTSnWe introduce two classes of kernel functions that are constructed by combining sequence profiles with new and existing approaches for determining the similarity between pairs of protein sequences. These kernels are constructed directly from these explicit protein similarity measures and employ effective profile-to-profile scoring schemes for measuring the similarity between pairs of proteins. Experiments with remote homology detection and fold recognition problems show that these kernels are capable of producing results that are substantially better than those produced by all of the existing state-of-the-art SVM-based methods. In addition, the experiments show that these kernels, even when used in the absence of profiles, produce results that are better than those produced by existing non-profile-based schemes.nnnAVAILABILITYnThe programs for computing the various kernel functions are available on request from the authors.

Proteins | 2008

fRMSDPred: Predicting local RMSD between structural fragments using sequence information

Huzefa Rangwala; George Karypis

The effectiveness of comparative modeling approaches for protein structure prediction can be substantially improved by incorporating predicted structural information in the initial sequence‐structure alignment. Motivated by the approaches used to align protein structures, this article focuses on developing machine learning approaches for estimating the RMSD value of a pair of protein fragments. These estimated fragment‐level RMSD values can be used to construct the alignment, assess the quality of an alignment, and identify high‐quality alignment segments. We present algorithms to solve this fragment‐level RMSD prediction problem using a supervised learning framework based on support vector regression and classification that incorporates protein profiles, predicted secondary structure, effective information encoding schemes, and novel second‐order pairwise exponential kernel functions. Our comprehensive empirical study shows superior results compared with the profile‐to‐profile scoring schemes. We also show that for protein pairs with low sequence similarity (less than 12% sequence identity) these new local structural features alone or in conjunction with profile‐based information lead to alignments that are considerably accurate than those obtained by schemes that use only profile and/or predicted secondary structure information. Proteins 2008.

BMC Bioinformatics | 2006

Building multiclass classifiers for remote homology detection and fold recognition

Huzefa Rangwala; George Karypis

BackgroundProtein remote homology detection and fold recognition are central problems in computational biology. Supervised learning algorithms based on support vector machines are currently one of the most effective methods for solving these problems. These methods are primarily used to solve binary classification problems and they have not been extensively used to solve the more general multiclass remote homology prediction and fold recognition problems.ResultsWe present a comprehensive evaluation of a number of methods for building SVM-based multiclass classification schemes in the context of the SCOP protein classification. These methods include schemes that directly build an SVM-based multiclass model, schemes that employ a second-level learning approach to combine the predictions generated by a set of binary SVM-based classifiers, and schemes that build and combine binary classifiers for various levels of the SCOP hierarchy beyond those defining the target classes.ConclusionAnalyzing the performance achieved by the different approaches on four different datasets we show that most of the proposed multiclass SVM-based classification approaches are quite effective in solving the remote homology prediction and fold recognition problems and that the schemes that use predictions from binary models constructed for ancestral categories within the SCOP hierarchy tend to not only lead to lower error rates but also reduce the number of errors in which a superfamily is assigned to an entirely different fold and a fold is predicted as being from a different SCOP class. Our results also show that the limited size of the training data makes it hard to learn complex second-level models, and that models of moderate complexity lead to consistently better results.

computational systems bioinformatics | 2008

Improving Homology Models for Protein-Ligand Binding Sites

Chris Kauffman; Huzefa Rangwala; George Karypis

In order to improve the prediction of protein-ligand binding sites through homology modeling, we incorporate knowledge of the binding residues into the modeling framework. Residues are identified as binding or nonbinding based on their true labels as well as labels predicted from structure and sequence. The sequence predictions were made using a support vector machine framework which employs a sophisticated window-based kernel. Binding labels are used with a very sensitive sequence alignment method to align the target and template. Relevant parameters governing the alignment process are searched for optimal values. Based on our results, homology models of the binding site can be improved if a priori knowledge of the binding residues is available. For target-template pairs with low sequence identity and high structural diversity our sequence-based prediction method provided sufficient information to realize this improvement.

Bioinformatics | 2007

Incremental window-based protein sequence alignment algorithms

Huzefa Rangwala; George Karypis

MOTIVATIONnProtein sequence alignment plays a critical role in computational biology as it is an integral part in many analysis tasks designed to solve problems in comparative genomics, structure and function prediction, and homology modeling.nnnMETHODSnWe have developed novel sequence alignment algorithms that compute the alignment between a pair of sequences based on short fixed- or variable-length high-scoring subsequences. Our algorithms build the alignments by repeatedly selecting the highest scoring pairs of subsequences and using them to construct small portions of the final alignment. We utilize PSI-BLAST generated sequence profiles and employ a profile-to-profile scoring scheme derived from PICASSO.nnnRESULTSnWe evaluated the performance of the computed alignments on two recently published benchmark datasets and compared them against the alignments computed by existing state-of-the-art dynamic programming-based profile-to-profile local and global sequence alignment algorithms. Our results show that the new algorithms achieve alignments that are comparable with or better than those achieved by existing algorithms. Moreover, our results also showed that these algorithms can be used to provide better information as to which of the aligned positions are more reliable--a critical piece of information for comparative modeling applications.

european conference on machine learning | 2008

TOPTMH: topology predictor for transmembrane α-helices

Rezwan Ahmed; Huzefa Rangwala; George Karypis

Alpha-helical transmembrane proteins mediate many key biological processes and represent 20%-30% of all genes in many organisms. Due to the difficulties in experimentally determining their high-resolution 3D structure, computational methods to predict the location and orientation of transmembrane helix segments using sequence information are essential. We present TOPTMH, a new transmembrane helix topology prediction method that combines support vector machines, hidden Markov models, and a widely used rule-based scheme. The contribution of this work is the development of a prediction approach that first uses a binary SVM classifier to predict the helix residues and then it employs a pair of HMM models that incorporate the SVM predictions and hydropathy-based features to identify the entire transmembrane helix segments by capturing the structural characteristics of these proteins. TOPTMH outperforms state-of-the-art prediction methods and achieves the best performance on an independent static benchmark.

computational systems bioinformatics | 2007

fRMSDPred: predicting local RMSD between structural fragments using sequence information.

Huzefa Rangwala; George Karypis

The effectiveness of comparative modeling approaches for protein structure prediction can be substantially improved by incorporating predicted structural information in the initial sequence-structure alignment. Motivated by the approaches used to align protein structures, this paper focuses on developing machine learning approaches for estimating the RMSD value of a pair of protein fragments. These estimated fragment-level RMSD values can be used to construct the alignment, assess the quality of an alignment, and identify high-quality alignment segments. We present algorithms to solve this fragment-level RMSD prediction problem using a supervised learning framework based on support vector regression and classification that incorporates protein profiles, predicted secondary structure, effective information encoding schemes, and novel second-order pairwise exponential kernel functions. Our comprehensive empirical study shows superior results compared to the profile-to-profile scoring schemes.

asia pacific bioinformatics conference | 2007

fRMSDAlign: Protein Sequence Alignment Using Predicted Local Structure Information for Pairs with Low Sequence Identity

Huzefa Rangwala; George Karypis

As the sequence identity between a pair of proteins decreases, alignment strategies that are based on sequence and/or sequence profiles become progressively less effective in identifying the correct structural correspondence between residue pairs. This significantly reduces the ability of comparative modelingbased approaches to build accurate structural models. Incorporating into the alignment process predicted information about the local structure of the protein holds the promise of significantly improving the alignment quality of distant proteins. This paper studies the impact on the alignment quality of a new class of predicted local structural features that measure how well fixed-length backbone fragments centered around each residue-pair align with each other. It presents a comprehensive experimental evaluation comparing these new features against existing state-of-the-art approaches utilizing profile-based and predicted secondary-structure information. It shows that for protein pairs with low sequence similarity (less than 12% sequence identity) the new structural features alone or in conjunction with profile-based information lead to alignments that are considerably better than those obtained by previous schemes.

Archive | 2007