Publication


Featured research published by Wyatt T. Clark.


Proteins | 2008

An integrated approach to inferring gene-disease associations in humans.

Predrag Radivojac; Kang Peng; Wyatt T. Clark; Brandon Peters; Amrita Mohan; Sean M. Boyle; Sean D. Mooney

One of the most important tasks of modern bioinformatics is the development of computational tools that can be used to understand and treat human disease. To date, a variety of methods have been explored and algorithms for candidate gene prioritization are gaining in their usefulness. Here, we propose an algorithm for detecting gene–disease associations based on the human protein–protein interaction network, known gene–disease associations, protein sequence, and protein functional information at the molecular level. Our method, PhenoPred, is supervised: first, we mapped each gene/protein onto the spaces of disease and functional terms based on distance to all annotated proteins in the protein interaction network. We also encoded sequence, function, physicochemical, and predicted structural properties, such as secondary structure and flexibility. We then trained support vector machines to detect gene–disease associations for a number of terms in Disease Ontology and provided evidence that, despite the noise/incompleteness of experimental data and unfinished ontology of diseases, identification of candidate genes can be successful even when a large number of candidate disease terms are predicted on simultaneously. Availability: www.phenopred.org. Proteins 2008.
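The network-mapping step in this abstract can be illustrated with a minimal sketch. It assumes an unweighted interaction graph, uses breadth-first hop counts as the distance, and keeps only the shortest distance per disease term, whereas the abstract speaks of distances to all annotated proteins. The toy graph, gene names, disease labels, and the function names `bfs_distances` and `distance_features` are hypothetical and are not taken from PhenoPred.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop-count distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def distance_features(graph, gene, annotations, terms):
    """For each term, the shortest distance from gene to any other protein
    annotated with that term (None when no such protein is reachable)."""
    dist = bfs_distances(graph, gene)
    features = []
    for term in terms:
        reachable = [dist[p] for p in annotations.get(term, ())
                     if p in dist and p != gene]
        features.append(min(reachable) if reachable else None)
    return features


# Made-up network: A - B - C - D, with E isolated.
toy_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"], "E": []}
toy_annotations = {"disease_1": ["D"], "disease_2": ["B", "E"], "disease_3": ["E"]}
toy_terms = ["disease_1", "disease_2", "disease_3"]
features_for_A = distance_features(toy_graph, "A", toy_annotations, toy_terms)
```

A vector like `features_for_A` could then be combined with sequence-derived features and passed to a classifier such as a support vector machine, which is the supervised step the abstract describes.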


PLOS Computational Biology | 2011

Testing the Ortholog Conjecture with Comparative Functional Genomic Data from Mammals

Nathan L. Nehrt; Wyatt T. Clark; Predrag Radivojac; Matthew W. Hahn

A common assumption in comparative genomics is that orthologous genes share greater functional similarity than do paralogous genes (the “ortholog conjecture”). Many methods used to computationally predict protein function are based on this assumption, even though it is largely untested. Here we present the first large-scale test of the ortholog conjecture using comparative functional genomic data from human and mouse. We use the experimentally derived functions of more than 8,900 genes, as well as an independent microarray dataset, to directly assess our ability to predict function using both orthologs and paralogs. Both datasets show that paralogs are often a much better predictor of function than are orthologs, even at lower sequence identities. Among paralogs, those found within the same species are consistently more functionally similar than those found in a different species. We also find that paralogous pairs residing on the same chromosome are more functionally similar than those on different chromosomes, perhaps due to higher levels of interlocus gene conversion between these pairs. In addition to offering implications for the computational prediction of protein function, our results shed light on the relationship between sequence divergence and functional divergence. We conclude that the most important factor in the evolution of function is not amino acid sequence, but rather the cellular context in which proteins act.


Proteins | 2011

Analysis of protein function and its prediction from amino acid sequence

Wyatt T. Clark; Predrag Radivojac

Understanding protein function is one of the keys to understanding life at the molecular level. It is also important in the context of human disease because many conditions arise as a consequence of alterations of protein function. The recent availability of relatively inexpensive sequencing technology has resulted in thousands of complete or partially sequenced genomes with millions of functionally uncharacterized proteins. Such a large volume of data, combined with the lack of high‐throughput experimental assays to functionally annotate proteins, contributes to the growing importance of automated function prediction. Here, we study proteins annotated by Gene Ontology (GO) terms and estimate the accuracy of functional transfer from protein sequence only. We find that the transfer of GO terms by pairwise sequence alignments is only moderately accurate, showing a surprisingly small influence of sequence identity (SID) in a broad range (30–100%). We developed and evaluated a new predictor of protein function, functional annotator (FANN), from amino acid sequence. The predictor exploits a multioutput neural network framework which is well suited to simultaneously modeling dependencies between functional terms. Experiments provide evidence that FANN‐GO (predictor of GO terms; available from http://www.informatics.indiana.edu/predrag) outperforms standard methods such as transfer by global or local SID as well as GOtcha, a method that incorporates the structure of GO. Proteins 2011.


Nature Communications | 2015

Analysis of deletion breakpoints from 1,092 humans reveals details of mutation mechanisms

Alexej Abyzov; Shantao Li; Daniel Rhee Kim; Marghoob Mohiyuddin; Adrian M. Stütz; Nicholas F. Parrish; Xinmeng Jasmine Mu; Wyatt T. Clark; Ken Chen; Jan O. Korbel; Hugo Y. K. Lam; Charles Lee; Mark Gerstein

Investigating genomic structural variants at basepair resolution is crucial for understanding their formation mechanisms. We identify and analyze 8,943 deletion breakpoints in 1,092 samples from the 1000 Genomes Project. We find breakpoints have more nearby SNPs and indels than the genomic average, likely a consequence of relaxed selection. By investigating the correlation of breakpoints with DNA methylation, Hi-C interactions, and histone marks and the substitution patterns of nucleotides near them, we find that breakpoints with the signature of non-allelic homologous recombination (NAHR) are associated with open chromatin. We hypothesize that some NAHR deletions occur without DNA replication and cell division, in embryonic and germline cells. In contrast, breakpoints associated with non-homologous (NH) mechanisms often have sequence micro-insertions, templated from later replicating genomic sites, spaced at two characteristic distances from the breakpoint. These micro-insertions are consistent with template-switching events and suggest a particular spatiotemporal configuration for DNA during the events.


Bioinformatics | 2013

Information-theoretic evaluation of predicted ontological annotations.

Wyatt T. Clark; Predrag Radivojac

Motivation: The development of effective methods for the prediction of ontological annotations is an important goal in computational biology, with protein function prediction and disease gene prioritization gaining wide recognition. Although various algorithms have been proposed for these tasks, evaluating their performance is difficult owing to problems caused both by the structure of biomedical ontologies and biased or incomplete experimental annotations of genes and gene products. Results: We propose an information-theoretic framework to evaluate the performance of computational protein function prediction. We use a Bayesian network, structured according to the underlying ontology, to model the prior probability of a protein’s function. We then define two concepts, misinformation and remaining uncertainty, that can be seen as information-theoretic analogs of precision and recall. Finally, we propose a single statistic, referred to as semantic distance, that can be used to rank classification models. We evaluate our approach by analyzing the performance of three protein function predictors of Gene Ontology terms and provide evidence that it addresses several weaknesses of currently used metrics. We believe this framework provides useful insights into the performance of protein function prediction tools. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
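The two quantities named in this abstract can be sketched in a few lines. The abstract does not give formulas, so this sketch assumes that each ontology term carries a precomputed information content in bits (the dictionary `toy_ia`, with made-up values), that remaining uncertainty sums this content over true terms the predictor missed, and that misinformation sums it over predicted terms that are not true. Combining the two with a Euclidean norm is one plausible way to obtain a single number; the exact statistic used in the paper may differ.

```python
def remaining_uncertainty(true_terms, predicted_terms, ia):
    """Information content of true terms that were not predicted
    (plays the role of a recall-like error)."""
    return sum(ia[t] for t in true_terms - predicted_terms)


def misinformation(true_terms, predicted_terms, ia):
    """Information content of predicted terms that are not true
    (plays the role of a precision-like error)."""
    return sum(ia[t] for t in predicted_terms - true_terms)


def semantic_distance(ru, mi, k=2):
    """Distance of the point (ru, mi) from the origin under a k-norm.
    The choice k=2 is an assumption of this sketch."""
    return (ru ** k + mi ** k) ** (1.0 / k)


# Hypothetical information content per term, in bits.
toy_ia = {"t1": 1.0, "t2": 2.0, "t3": 3.0, "t4": 4.0}
true_terms = {"t1", "t2", "t3"}
predicted = {"t1", "t2", "t4"}

ru = remaining_uncertainty(true_terms, predicted, toy_ia)
mi = misinformation(true_terms, predicted, toy_ia)
s2 = semantic_distance(ru, mi)
```

In this toy case the predictor misses `t3` and wrongly adds `t4`, so both error terms are non-zero. A perfect prediction would give zero for both.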


Bioinformatics | 2014

The impact of incomplete knowledge on the evaluation of protein function prediction: a structured-output learning perspective.

Yuxiang Jiang; Wyatt T. Clark; Iddo Friedberg; Predrag Radivojac

Motivation: The automated functional annotation of biological macromolecules is a problem of computational assignment of biological concepts or ontological terms to genes and gene products. A number of methods have been developed to computationally annotate genes using standardized nomenclature such as Gene Ontology (GO). However, questions remain about the possibility for development of accurate methods that can integrate disparate molecular data as well as about an unbiased evaluation of these methods. One important concern is that experimental annotations of proteins are incomplete. This raises questions as to whether and to what degree currently available data can be reliably used to train computational models and estimate their performance accuracy. Results: We study the effect of incomplete experimental annotations on the reliability of performance evaluation in protein function prediction. Using the structured-output learning framework, we provide theoretical analyses and carry out simulations to characterize the effect of growing experimental annotations on the correctness and stability of performance estimates corresponding to different types of methods. We then analyze real biological data by simulating the prediction, evaluation and subsequent re-evaluation (after additional experimental annotations become available) of GO term predictions. Our results agree with previous observations that incomplete and accumulating experimental annotations have the potential to significantly impact accuracy assessments. We find that their influence reflects a complex interplay between the prediction algorithm, performance metric and underlying ontology. However, using the available experimental data and under realistic assumptions, our results also suggest that current large-scale evaluations are meaningful and almost surprisingly reliable. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


Human Mutation | 2011

Prediction of functional regulatory SNPs in monogenic and complex disease

Yiqiang Zhao; Wyatt T. Clark; Matthew Mort; David Neil Cooper; Predrag Radivojac; Sean D. Mooney

Next‐generation sequencing (NGS) technologies are yielding ever higher volumes of human genome sequence data. Given this large amount of data, it has become both a possibility and a priority to determine how disease‐causing single nucleotide polymorphisms (SNPs) detected within gene regulatory regions (rSNPs) exert their effects on gene expression. Recently, several studies have explored whether disease‐causing polymorphisms have attributes that can distinguish them from those that are neutral, attaining moderate success at discriminating between functional and putatively neutral regulatory SNPs. Here, we have extended this work by assessing the utility of both SNP‐based features (those associated only with the polymorphism site and the surrounding DNA) and gene‐based features (those derived from the associated gene in whose regulatory region the SNP lies) in the identification of functional regulatory polymorphisms involved in either monogenic or complex disease. Gene‐based features were found to be capable of both augmenting and enhancing the utility of SNP‐based features in the prediction of known regulatory mutations. Adopting this approach, we achieved an AUC of 0.903 for predicting regulatory SNPs. Finally, our tool predicted 225 new regulatory SNPs with a high degree of confidence, with 105 of the 225 falling into linkage disequilibrium blocks of reported disease‐associated genome‐wide association studies SNPs. Hum Mutat 32:1183–1190, 2011. ©2011 Wiley‐Liss, Inc.
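For readers unfamiliar with the AUC figure quoted in this abstract, the following minimal sketch shows what the statistic measures: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The scores and labels are made up for illustration and have no connection to the study's data or tool.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up classifier scores; label 1 = functional SNP, 0 = neutral SNP.
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
toy_labels = [1, 0, 1, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration. A rank-based computation would be preferable for large datasets.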


Proceedings of the National Academy of Sciences of the United States of America | 2014

Comparative analysis of pseudogenes across three phyla

Cristina Sisu; Baikang Pei; Jing Leng; Adam Frankish; Zhang Y; Suganthi Balasubramanian; Rachel A. Harte; Daifeng Wang; Michael Rutenberg-Schoenberg; Wyatt T. Clark; Mark Diekhans; Joel Rozowsky; Tim Hubbard; Jennifer Harrow; Mark Gerstein

Significance: Pseudogenes have long been considered nonfunctional elements. However, recent studies have shown they can potentially regulate the expression of protein-coding genes. Capitalizing on available functional-genomics data and the finished annotation of human, worm, and fly, we compared the pseudogene complements across the three phyla. We found that in contrast to protein-coding genes, pseudogenes are highly lineage specific, reflecting genome history more so than the conservation of essential biological functions. Specifically, the human pseudogene complement reflects a massive burst of retrotranspositional activity at the dawn of the primates, whereas the worm’s and fly’s repertoire reflects a history of deactivated duplications. However, we also observe that pseudogenes across the three phyla have a consistent level of partial activity, with ∼15% being transcribed.

Pseudogenes are degraded fossil copies of genes. Here, we report a comparison of pseudogenes spanning three phyla, leveraging the completed annotations of the human, worm, and fly genomes, which we make available as an online resource. We find that pseudogenes are lineage specific, much more so than protein-coding genes, reflecting the different remodeling processes marking each organism’s genome evolution. The majority of human pseudogenes are processed, resulting from a retrotranspositional burst at the dawn of the primate lineage. This burst can be seen in the largely uniform distribution of pseudogenes across the genome, their preservation in areas with low recombination rates, and their preponderance in highly expressed gene families. In contrast, worm and fly pseudogenes tell a story of numerous duplication events. In worm, these duplications have been preserved through selective sweeps, so we see a large number of pseudogenes associated with highly duplicated families such as chemoreceptors. However, in fly, the large effective population size and high deletion rate resulted in a depletion of the pseudogene complement. Despite large variations between these species, we also find notable similarities. Overall, we identify a broad spectrum of biochemical activity for pseudogenes, with the majority in each organism exhibiting varying degrees of partial activity. In particular, we identify a consistent amount of transcription (∼15%) across all species, suggesting a uniform degradation process. Also, we see a uniform decay of pseudogene promoter activity relative to their coding counterparts and identify a number of pseudogenes with conserved upstream sequences and activity, hinting at potential regulatory roles.


Frontiers in Bioscience | 2008

From protein-disease associations to disease informatics.

Dalkilic Mm; Costello Jc; Wyatt T. Clark; Predrag Radivojac

Advancements in high-throughput technology and computational power have brought about significant progress in our understanding of cellular processes, including an increased appreciation of the intricacies of disease. The computational biology community has made strides in characterizing human disease and implementing algorithms that will be used in translational medicine. Despite this progress, most of the identified biomarkers and proposed methodologies have still not achieved the sensitivity and specificity to be effectively used, for example, in population screening against various diseases. Here we review the current progress in computational methodology developed to exploit major high-throughput experimental platforms towards improved understanding of disease, and argue that an integrated model for biomarker discovery, predictive medicine and treatment is likely to be data-driven and personalized. In such an approach, major data collection is yet to be done and comprehensive computational models are yet to be developed.


Pacific Symposium on Biocomputing | 2013

Vector quantization kernels for the classification of protein sequences and structures.

Wyatt T. Clark; Predrag Radivojac

We propose a new kernel-based method for the classification of protein sequences and structures. We first represent each protein as a set of time series data using several structural, physicochemical, and predicted properties such as a sequence of consecutive dihedral angles, hydrophobicity indices, or predictions of disordered regions. A kernel function is then computed for pairs of proteins, exploiting the principles of vector quantization and subsequently used with support vector machines for protein classification. Although our method requires a significant pre-processing step, it is fast in the training and prediction stages owing to the linear complexity of kernel computation with the length of protein sequences. We evaluate our approach on two protein classification tasks involving the prediction of SCOP structural classes and catalytic activity according to the Gene Ontology. We provide evidence that the method is competitive when compared to string kernels, and useful for a range of protein classification tasks. Furthermore, the applicability of our approach extends beyond computational biology to any classification of time series data.
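The general idea of a vector quantization kernel can be sketched as follows. This is an illustration under stated assumptions, not the authors' construction: it assumes a single one-dimensional property series per protein, a fixed codebook supplied in advance (in practice this would come from the pre-processing step), nearest-codeword assignment of sliding windows by squared Euclidean distance, and a kernel defined as the dot product of the resulting codeword histograms. The names `quantize`, `vq_kernel`, `toy_codebook`, and the example series are hypothetical.

```python
def quantize(series, codebook, width):
    """Slide a window over the series and count how often each
    codeword is the nearest match (squared Euclidean distance)."""
    counts = [0] * len(codebook)
    for start in range(len(series) - width + 1):
        window = series[start:start + width]
        best = min(
            range(len(codebook)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(window, codebook[i])),
        )
        counts[best] += 1
    return counts


def vq_kernel(series_a, series_b, codebook, width):
    """Dot product of the two codeword histograms. Being an inner product
    of explicit feature vectors, it is a valid (positive semi-definite) kernel."""
    hist_a = quantize(series_a, codebook, width)
    hist_b = quantize(series_b, codebook, width)
    return sum(x * y for x, y in zip(hist_a, hist_b))


# Made-up codebook of three length-2 patterns and two toy property series.
toy_codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
protein_a = [0.0, 0.0, 1.0, 1.0]
protein_b = [0.0, 1.0, 1.0, 1.0]
k_ab = vq_kernel(protein_a, protein_b, toy_codebook, width=2)
```

With a fixed codebook and window width, each histogram is built in a single pass over the series, which is consistent with the linear dependence on sequence length mentioned in the abstract.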

Collaboration


Dive into Wyatt T. Clark's collaborations.

Top Co-Authors

Predrag Radivojac

Indiana University Bloomington

Sean D. Mooney

University of Washington

Amrita Mohan

Indiana University Bloomington

Sean M. Boyle

Indiana University Bloomington

Charles Lee

Brigham and Women's Hospital