Publication


Featured research published by Brian R. King.


Genome Biology | 2007

ngLOC: an n-gram-based Bayesian method for estimating the subcellular proteomes of eukaryotes

Brian R. King; Chittibabu Guda

We present a method called ngLOC, an n-gram-based Bayesian classifier that predicts the localization of a protein sequence over ten distinct subcellular organelles. A tenfold cross-validation result shows an accuracy of 89% for sequences localized to a single organelle, and 82% for those localized to multiple organelles. An enhanced version of ngLOC was developed to estimate the subcellular proteomes of eight eukaryotic organisms: yeast, nematode, fruitfly, mosquito, zebrafish, chicken, mouse, and human.
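
The idea of an n-gram-based Bayesian classifier can be illustrated with a minimal naive Bayes sketch. This is not the ngLOC implementation: the class name `NGramNaiveBayes`, the add-alpha smoothing, the n-gram length of 3, and the toy sequences and labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def ngrams(seq, n):
    """Return all overlapping n-grams of a sequence."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


class NGramNaiveBayes:
    """Toy multinomial naive Bayes over n-gram counts (illustrative only)."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n
        self.alpha = alpha                  # add-alpha smoothing (assumed)
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.class_totals = Counter()       # label -> total n-grams seen
        self.class_docs = Counter()         # label -> number of sequences
        self.vocab = set()

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            grams = ngrams(seq, self.n)
            self.counts[label].update(grams)
            self.class_totals[label] += len(grams)
            self.class_docs[label] += 1
            self.vocab.update(grams)
        return self

    def log_scores(self, seq):
        """Log prior plus summed log likelihood of each n-gram, per class."""
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab) + 1    # one extra slot for unseen n-grams
        scores = {}
        for label in self.class_docs:
            score = math.log(self.class_docs[label] / total_docs)
            denom = self.class_totals[label] + self.alpha * vocab_size
            for gram in ngrams(seq, self.n):
                score += math.log((self.counts[label][gram] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, seq):
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)


# Made-up sequences and labels, purely for demonstration.
train_seqs = ["MKKLLLAAAKKLL", "MKKLLAAKKLLAA", "DEDEGSDEDEGS", "GSDEDEDEGSDE"]
train_labels = ["mito", "mito", "nuc", "nuc"]
model = NGramNaiveBayes(n=3).fit(train_seqs, train_labels)
print(model.predict("KKLLAAKK"))
```

Because `log_scores` returns a score for every class, a multi-location prediction could in principle be read off the top few classes rather than only the best one; how ngLOC itself does this is not described in the abstract.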


BMC Research Notes | 2012

ngLOC: software and web server for predicting protein subcellular localization in prokaryotes and eukaryotes

Brian R. King; Suleyman Vural; Sanjit Pandey; Alex Barteau; Chittibabu Guda

Background: Understanding protein subcellular localization is a necessary component toward understanding the overall function of a protein. Numerous computational methods have been published over the past decade, with varying degrees of success. Despite the large number of published methods in this area, only a small fraction of them are available for researchers to use in their own studies. Of those that are available, many are limited by predicting only a small number of organelles in the cell. Additionally, the majority of methods predict only a single location for a sequence, even though it is known that a large fraction of the proteins in eukaryotic species shuttle between locations to carry out their function.

Findings: We present a software package and a web server for predicting the subcellular localization of protein sequences based on the ngLOC method. ngLOC is an n-gram-based Bayesian classifier that predicts subcellular localization of proteins both in prokaryotes and eukaryotes. The overall prediction accuracy varies from 89.8% to 91.4% across species. This program can predict 11 distinct locations each in plant and animal species. ngLOC also predicts 4 and 5 distinct locations on gram-positive and gram-negative bacterial datasets, respectively.

Conclusions: ngLOC is a generic method that can be trained by data from a variety of species or classes for predicting protein subcellular localization. The standalone software is freely available for academic use under the GNU GPL, and the ngLOC web server is also accessible at http://ngloc.unmc.edu.


BMC Bioinformatics | 2013

Mining for class-specific motifs in protein sequence classification

Satish Mahadevan Srinivasan; Suleyman Vural; Brian R. King; Chittibabu Guda

Background: In protein sequence classification, identification of the sequence motifs or n-grams that can precisely discriminate between classes is a more interesting scientific question than the classification itself. A number of classification methods aim at accurate classification but fail to explain which sequence features indeed contribute to the accuracy. We hypothesize that sequences in lower denominations (n-grams) can be used to explore the sequence landscape and to identify class-specific motifs that discriminate between classes during classification. Discriminative n-grams are short peptide sequences that are highly frequent in one class but are either minimally present or absent in other classes. In this study, we present a new substitution-based scoring function for identifying discriminative n-grams that are highly specific to a class.

Results: We present a scoring function based on discriminative n-grams that can effectively discriminate between classes. The scoring function initially harvests the entire set of 4- to 8-grams from the protein sequences of different classes in the dataset. Similar n-grams of the same size are combined to form new n-grams, where the similarity is defined by positive amino acid substitution scores in the BLOSUM62 matrix. Substitution has resulted in a large increase in the number of discriminatory n-grams harvested. Due to the unbalanced nature of the dataset, the frequencies of the n-grams are normalized using a dampening factor, which gives more weight to the n-grams that appear in fewer classes and vice versa. After the n-grams are normalized, the scoring function identifies discriminative 4- to 8-grams for each class that are frequent enough to be above a selection threshold. By mapping these discriminative n-grams back to the protein sequences, we obtained contiguous n-grams that represent short class-specific motifs in protein sequences. Our method fared well compared to an existing motif finding method known as Wordspy. We have validated our enriched set of class-specific motifs against the functionally important motifs obtained from the NLSdb, Prosite and ELM databases. We demonstrate that this method is very generic and can thus be widely applied to detect class-specific motifs in many protein sequence classification tasks.

Conclusion: The proposed scoring function and methodology are able to identify class-specific motifs using discriminative n-grams derived from the protein sequences. The implementation of amino acid substitution scores for similarity detection, and the dampening factor to normalize the unbalanced datasets, have a significant effect on the performance of the scoring function. Our multipronged validation tests demonstrate that this method can detect class-specific motifs from a wide variety of protein sequence classes, with a potential application to detecting proteome-specific motifs of different organisms.
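
The dampening and threshold steps described above can be sketched roughly as follows. The abstract does not give the scoring formula, so this sketch substitutes a TF-IDF-style weight (relative frequency within a class times the log of the inverse class frequency) as a stand-in; the function names, the threshold value, and the toy sequences are hypothetical, and the BLOSUM62-based merging of similar n-grams is omitted entirely.

```python
import math
from collections import Counter


def ngram_counts(seqs, n):
    """Count all overlapping n-grams across a list of sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(s[i:i + n] for i in range(len(s) - n + 1))
    return counts


def discriminative_ngrams(class_seqs, n=4, threshold=0.1):
    """Return, per class, the n-grams whose dampened score meets the threshold.

    Score (assumed, not the paper's formula):
        relative frequency in the class * log(num_classes / classes containing it)
    An n-gram present in every class therefore scores zero.
    """
    counts = {label: ngram_counts(seqs, n) for label, seqs in class_seqs.items()}
    num_classes = len(counts)

    # Number of classes in which each n-gram occurs at least once.
    class_freq = Counter()
    for c in counts.values():
        class_freq.update(c.keys())

    selected = {}
    for label, c in counts.items():
        total = sum(c.values())
        scored = {}
        for gram, k in c.items():
            damp = math.log(num_classes / class_freq[gram])
            score = (k / total) * damp
            if score >= threshold:
                scored[gram] = score
        selected[label] = scored
    return selected


# Toy sequences for two made-up classes.
toy = {
    "nuclear": ["PKKKRKV", "APKKKRKVG"],
    "other": ["GAGAGAGA", "AGAGKKKR"],
}
for label, grams in discriminative_ngrams(toy).items():
    print(label, sorted(grams))
```

In this toy run the 4-gram `KKKR` occurs in both classes, so its dampening factor is zero and it is never selected, while n-grams confined to one class survive only if they are frequent enough there.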


PLOS ONE | 2009

A top-down approach to infer and compare domain-domain interactions across eight model organisms

Chittibabu Guda; Brian R. King; Lipika R. Pal; Purnima Guda

Knowledge of specific domain-domain interactions (DDIs) is essential to understand the functional significance of protein interaction networks. Despite the availability of an enormous amount of data on protein-protein interactions (PPIs), very little is known about specific DDIs occurring in them. Here, we present a top-down approach to accurately infer functionally relevant DDIs from PPI data. We created a comprehensive, non-redundant dataset of 209,165 experimentally-derived PPIs by combining datasets from five major interaction databases. We introduced an integrated scoring system that uses a novel combination of a set of five orthogonal scoring features covering the probabilistic, evolutionary, evidence-based, spatial and functional properties of interacting domains, which can map the interacting propensity of two domains in many dimensions. This method outperforms similar existing methods both in the accuracy of prediction and in the coverage of domain interaction space. We predicted a set of 52,492 high-confidence DDIs to carry out cross-species comparison of DDI conservation in eight model species including human, mouse, Drosophila, C. elegans, yeast, Plasmodium, E. coli and Arabidopsis. Our results show that only 23% of these DDIs are conserved in at least two species and only 3.8% in at least 4 species, indicating a rather low conservation across species. Pair-wise analysis of DDI conservation revealed a ‘sliding conservation’ pattern between the evolutionarily neighboring species. Our methodology and the high-confidence DDI predictions generated in this study can help to better understand the functional significance of PPIs at the modular level, thus can significantly impact further experimental investigations in systems biology research.
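
The conservation percentages quoted above are fractions of DDIs observed in at least k species. A small sketch of that bookkeeping, assuming a hypothetical mapping from each DDI to the set of species in which it was predicted (the domain names and species sets below are invented and do not come from the study):

```python
def conservation_fractions(ddi_species, min_species=(2, 4)):
    """Fraction of DDIs found in at least k species, for each k in min_species.

    ddi_species maps a DDI identifier to the set of species where it occurs.
    """
    total = len(ddi_species)
    if total == 0:
        raise ValueError("no DDIs supplied")
    return {
        k: sum(1 for species in ddi_species.values() if len(species) >= k) / total
        for k in min_species
    }


# Invented placeholder data, for illustration only.
toy_ddis = {
    "domA-domB": {"human", "mouse", "yeast", "fly"},
    "domA-domC": {"human", "mouse"},
    "domB-domD": {"yeast"},
    "domC-domE": {"human"},
    "domD-domE": {"E. coli"},
}
print(conservation_fractions(toy_ddis))
```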


Scientific Programming | 2008

Semi-supervised learning for classification of protein sequence data

Brian R. King; Chittibabu Guda

Protein sequence data continue to become available at an exponential rate. Annotation of functional and structural attributes of these data lags far behind, with only a small fraction of the data understood and labeled by experimental methods. Classification methods that are based on semi-supervised learning can increase the overall accuracy of classifying partly labeled data in many domains, but very few methods exist that have shown their effect on protein sequence classification. We show how proven methods from text classification can be applied to protein sequence data, as we consider both existing and novel extensions to the basic methods, and demonstrate restrictions and differences that must be considered. We demonstrate comparative results against the transductive support vector machine, and show superior results on the most difficult classification problems. Our results show that large repositories of unlabeled protein sequence data can indeed be used to improve predictive performance, particularly in situations where there are fewer labeled protein sequences available, and/or the data are highly unbalanced in nature.


EURASIP Journal on Bioinformatics and Systems Biology | 2014

Application of discrete Fourier inter-coefficient difference for assessing genetic sequence similarity

Brian R. King; Maurice F. Aburdene; Alexander P. Thompson; Zach Warres

Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success for detection of coding regions in a gene. Recently, these methods are being used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence that is used to generate relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.
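
The abstract does not define the ICD transformation, so the sketch below only illustrates the general family of technique it belongs to: map a DNA sequence to four binary indicator signals (one per base), take a discrete Fourier transform, and use differences between adjacent power-spectrum coefficients as an alignment-free signature. Every name here (`power_spectrum`, `difference_signature`, `signature_distance`), the zero-padding to a common `size`, and the Euclidean distance are assumptions, not the published method.

```python
import cmath
import math

BASES = "ACGT"


def power_spectrum(seq, size):
    """Summed DFT power of the four base-indicator signals.

    The sequence is implicitly zero-padded to `size`; coefficients
    0 .. size // 2 are returned.
    """
    if size < len(seq):
        raise ValueError("size must be at least the sequence length")
    spectrum = []
    for k in range(size // 2 + 1):
        power = 0.0
        for base in BASES:
            # DFT coefficient k of the 0/1 indicator signal for this base.
            coeff = sum(
                cmath.exp(-2j * cmath.pi * k * t / size)
                for t, symbol in enumerate(seq)
                if symbol == base
            )
            power += abs(coeff) ** 2
        spectrum.append(power)
    return spectrum


def difference_signature(seq, size):
    """Differences between adjacent power coefficients, skipping the DC term."""
    p = power_spectrum(seq, size)
    return [p[k + 1] - p[k] for k in range(1, len(p) - 1)]


def signature_distance(a, b, size):
    """Euclidean distance between two signatures of equal length."""
    sa = difference_signature(a, size)
    sb = difference_signature(b, size)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sa, sb)))


# A period-3 toy sequence concentrates its power at coefficient size / 3.
print([round(v, 6) for v in power_spectrum("ACGACGACGACG", 12)])
print(signature_distance("ACGACGACGACG", "ACGTTGACGACG", 12))
```

Pairwise distances of this kind could then be fed to a standard distance-based tree builder; that step is outside the scope of this sketch.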


The Open Applied Informatics Journal | 2009

Estimation of Subcellular Proteomes in Bacterial Species

Brian R. King; Lance Latham; Chittibabu Guda

Computational methods for predicting the subcellular localization of bacterial proteins play a crucial role in the ongoing efforts to annotate the function of these proteins and to suggest potential drug targets. These methods, used in combination with other experimental and computational methods, can play an important role in biomedical research by annotating the proteomes of a wide variety of bacterial species. We use the ngLOC method, a Bayesian classifier that predicts the subcellular localization of a protein based on the distribution of n-grams in a curated dataset of experimentally-determined proteins. Subcellular localization was predicted with an overall accuracy of 89.7% and 89.3% for Gram-negative and Gram-positive bacteria protein sequences, respectively. Through the use of a confidence score threshold, we improve the precision to 96.6% while covering 84.4% of Gram-negative bacterial data, and 96.0% while covering 87.9% of Gram-positive data. We use this method to estimate the subcellular proteomes of ten Gram-negative species and five Gram-positive species, covering an average of 74.7% and 80.6% of the proteome for Gram-negative and Gram-positive sequences, respectively. The current method is useful for large-scale analysis and annotation of the subcellular proteomes of bacterial species. We demonstrate that our method has excellent predictive performance while achieving superior proteome coverage compared to other popular methods such as PSORTb and PLoc.
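
The trade-off between precision and coverage under a confidence score threshold can be made concrete with a short sketch. The tuple layout `(predicted, actual, confidence)`, the function name, the location labels, and the scores below are hypothetical; how ngLOC computes its confidence score is not stated in the abstract.

```python
def precision_and_coverage(predictions, threshold):
    """Keep only predictions with confidence >= threshold.

    Returns (precision on the kept predictions, fraction of data kept).
    """
    if not predictions:
        raise ValueError("no predictions supplied")
    kept = [(pred, actual) for pred, actual, conf in predictions if conf >= threshold]
    coverage = len(kept) / len(predictions)
    precision = sum(pred == actual for pred, actual in kept) / len(kept) if kept else 0.0
    return precision, coverage


# Made-up predictions: (predicted location, true location, confidence score).
toy_predictions = [
    ("CYT", "CYT", 0.95),
    ("IM", "IM", 0.90),
    ("OM", "PP", 0.40),
    ("EXC", "EXC", 0.80),
    ("CYT", "IM", 0.55),
]
for t in (0.0, 0.6):
    print(t, precision_and_coverage(toy_predictions, t))
```

Raising the threshold discards the low-confidence calls, which here are the wrong ones, so precision rises while coverage falls.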


International Conference on Bioinformatics | 2014

Predicting protein contact maps by bagging decision trees

Chuqiao Ren; Brian R. King

Substantial progress has been made in the prediction of protein structure. Despite these achievements, the computational complexity of protein folding remains a challenge. Instead, many methods aim to predict a protein contact map from sequence. We introduce an ensemble method for protein contact map prediction based on bagging multiple decision trees. The amino acid alphabet is clustered to improve generality. A random sampling method is used to address the large class imbalance in contact maps. Our results show that our technique performs favorably against existing methods.
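
Bagging with class-balanced sampling can be sketched in a few lines. To stay dependency-free, this example bags decision stumps (depth-one trees) rather than full decision trees, and it balances classes by drawing equally many examples per class in each bootstrap sample. Both choices, along with the feature values and function names, are assumptions for illustration and are not taken from the paper.

```python
import random
from collections import Counter


def train_stump(rows):
    """Fit a depth-one tree: the (feature, threshold) split with fewest errors."""
    best = None
    for f in range(len(rows[0][0])):
        for thr in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= thr]
            right = [y for x, y in rows if x[f] > thr]
            if not left or not right:
                continue
            l_label = Counter(left).most_common(1)[0][0]
            r_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_label for y in left) + sum(y != r_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, thr, l_label, r_label)
    if best is None:  # no usable split: always predict the majority label
        majority = Counter(y for _, y in rows).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def balanced_bootstrap(rows, rng):
    """Sample with replacement, the same number of rows from every class."""
    by_class = {}
    for row in rows:
        by_class.setdefault(row[1], []).append(row)
    size = min(len(members) for members in by_class.values())
    sample = []
    for members in by_class.values():
        sample.extend(rng.choice(members) for _ in range(size))
    return sample


def bagged_stumps(rows, num_trees=25, seed=0):
    rng = random.Random(seed)
    return [train_stump(balanced_bootstrap(rows, rng)) for _ in range(num_trees)]


def predict(ensemble, x):
    """Majority vote over all stumps."""
    votes = Counter(l if x[f] <= thr else r for f, thr, l, r in ensemble)
    return votes.most_common(1)[0][0]


# Made-up residue-pair features; label 1 = contact, 0 = no contact (imbalanced).
rows = [((2, 0.9), 1), ((3, 0.8), 1), ((4, 0.7), 1)] + [
    ((v, s), 0)
    for v, s in [(10, 0.2), (12, 0.4), (15, 0.1), (20, 0.3), (25, 0.5),
                 (30, 0.2), (40, 0.1), (45, 0.3), (50, 0.4)]
]
ensemble = bagged_stumps(rows)
print(predict(ensemble, (1, 0.5)), predict(ensemble, (50, 0.5)))
```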


PLOS ONE | 2018

State tobacco control expenditures and tax paid cigarette sales

John A. Tauras; Xin Xu; Jidong Huang; Brian R. King; S. René Lavinghouze; Karla S. Sneegas; Frank J. Chaloupka

This research is the first nationally representative study to examine the relationship between cigarette sales and actual state-level tobacco control spending in each of the five categories of the CDC’s Best Practices for Comprehensive Tobacco Control Programs. We employed several alternative two-way fixed-effects regression techniques to estimate the determinants of cigarette sales in the United States for the years 2008–2012. State spending on tobacco control was found to have a negative and significant impact on cigarette sales in all models that were estimated. Spending in the areas of cessation interventions, health communication interventions, and state and community interventions was found to have a negative impact on cigarette sales in all models that were estimated, whereas spending in the areas of surveillance and evaluation, and administration and management, was found to have negative effects on cigarette sales in only some models. Our models predict that states that spend up to seven times their current levels could still see significant reductions in cigarette sales. The findings from this research could help inform further investments in state tobacco control programs.
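
A two-way fixed-effects regression removes effects that are constant per state and effects that are constant per year before estimating a slope. For a balanced panel with a single regressor, this can be sketched with the within transformation below. The function names and all numbers are made up for illustration; the study's actual models, covariates, and data are not reproduced here.

```python
def two_way_within(panel):
    """Demean a balanced panel by unit and by period.

    panel maps (unit, period) -> value. Each value becomes
    value - unit mean - period mean + grand mean.
    """
    units = sorted({u for u, _ in panel})
    periods = sorted({t for _, t in panel})
    grand = sum(panel.values()) / len(panel)
    unit_mean = {u: sum(panel[(u, t)] for t in periods) / len(periods) for u in units}
    period_mean = {t: sum(panel[(u, t)] for u in units) / len(units) for t in periods}
    return {
        (u, t): v - unit_mean[u] - period_mean[t] + grand
        for (u, t), v in panel.items()
    }


def fixed_effects_slope(x, y):
    """OLS slope of y on x after removing unit and period fixed effects."""
    xd, yd = two_way_within(x), two_way_within(y)
    numerator = sum(xd[key] * yd[key] for key in xd)
    denominator = sum(v * v for v in xd.values())
    return numerator / denominator


# Synthetic panel: outcome = state effect + year effect - 2 * spending.
spending = {
    ("A", 2008): 1.0, ("A", 2009): 2.0, ("A", 2010): 4.0,
    ("B", 2008): 2.0, ("B", 2009): 2.0, ("B", 2010): 3.0,
    ("C", 2008): 5.0, ("C", 2009): 7.0, ("C", 2010): 6.0,
}
state_effect = {"A": 100.0, "B": 120.0, "C": 90.0}
year_effect = {2008: 0.0, 2009: -3.0, 2010: -5.0}
sales = {
    (s, yr): state_effect[s] + year_effect[yr] - 2.0 * x
    for (s, yr), x in spending.items()
}
print(fixed_effects_slope(spending, sales))
```

Because the synthetic outcome is built with a slope of -2 and no noise, the estimator should recover that value up to floating-point error, regardless of the state and year effects.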


International Conference on Bioinformatics | 2013

ngPhylo: N-Gram Modeled Proteins with Substitution Matrices for Phylogenetic Analysis

Brigitte Hofmeister; Brian R. King

Phylogenetic tree constructions are important for understanding evolution and species relatedness. Most methods require a multiple sequence alignment (MSA) to be performed prior to inducing the phylogenetic tree. MSAs, however, are computationally expensive and increasingly error prone as the number of sequences increases, as the average sequence length increases, and as the sequences in the set become more divergent. We introduce a new method called ngPhylo, an n-gram-based method that addresses many of the limitations of MSA-based phylogenetic methods, and computes alignment-free phylogenetic analyses on large sets of proteins that also have long sequences. Unlike other methods, we incorporate the use of standard substitution matrices to improve similarity measures between sequences. Our results show that ngPhylo produces phylogenies highly similar to those of existing MSA-based methods while requiring fewer computational resources.
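
An alignment-free comparison of this kind can be sketched by representing each protein as a vector of n-gram counts and measuring a distance between those vectors. The sketch below uses plain cosine distance on exact 3-gram matches; it does not include the substitution-matrix weighting that the abstract describes, and the function names and toy sequences are assumptions.

```python
import math
from collections import Counter


def ngram_profile(seq, n=3):
    """Count the overlapping n-grams of a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def cosine_distance(p, q):
    """1 - cosine similarity between two n-gram count profiles."""
    dot = sum(count * q[gram] for gram, count in p.items() if gram in q)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def distance_matrix(seqs, n=3):
    """Pairwise distances between named sequences, as a nested dict."""
    profiles = {name: ngram_profile(s, n) for name, s in seqs.items()}
    return {
        a: {b: cosine_distance(profiles[a], profiles[b]) for b in profiles}
        for a in profiles
    }


# Made-up sequences: "a" and "b" differ by one residue, "c" is unrelated.
toy = {"a": "MKVLAAGIV", "b": "MKVLAAGLV", "c": "DDEEDDEE"}
for name, row in distance_matrix(toy).items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

A matrix like this could be passed to any distance-based tree construction method, such as neighbor joining, without ever computing an alignment.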

Collaboration


Top Co-Authors

Chittibabu Guda

University of Nebraska Medical Center

Suleyman Vural

University of Nebraska Medical Center

Ashwin Satyanarayana

New York City College of Technology

Frank J. Chaloupka

University of Illinois at Chicago