Anthony K. Yan
Duke University
Publication
Featured research published by Anthony K. Yan.
Journal of Computational Biology | 2004
Christopher James Langmead; Anthony K. Yan; Ryan H. Lilien; Lincong Wang; Bruce Randall Donald
High-throughput NMR structural biology can play an important role in structural genomics. We report an automated procedure for high-throughput NMR resonance assignment for a protein of known structure, or of a homologous structure. These assignments are a prerequisite for probing protein-protein interactions, protein-ligand binding, and dynamics by NMR. Assignments are also the starting point for structure determination and refinement. A new algorithm, called Nuclear Vector Replacement (NVR), is introduced to compute assignments that optimally correlate experimentally measured NH residual dipolar couplings (RDCs) to a given a priori whole-protein 3D structural model. The algorithm requires only uniform (15)N-labeling of the protein and processes unassigned H(N)-(15)N HSQC spectra, H(N)-(15)N RDCs, and sparse H(N)-H(N) NOEs (d(NN)s), all of which can be acquired in a fraction of the time needed to record the traditional suite of experiments used to perform resonance assignments. NVR runs in minutes and efficiently assigns the (H(N),(15)N) backbone resonances as well as the d(NN)s of the 3D (15)N-NOESY spectrum, in O(n^3) time. The algorithm is demonstrated on NMR data from a 76-residue protein, human ubiquitin, matched to four structures, including one mutant (homolog), determined either by X-ray crystallography or by different NMR experiments (without RDCs). NVR achieves an assignment accuracy of 92-100%. We further demonstrate the feasibility of our algorithm for different and larger proteins, using NMR data for hen lysozyme (129 residues, 97-100% accuracy) and streptococcal protein G (56 residues, 100% accuracy), matched to a variety of 3D structural models. Finally, we extend NVR to a second application, 3D structural homology detection, and demonstrate that NVR is able to identify structural homologies between proteins with remote amino acid sequences using a database of structural models.
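Correlating measured RDCs with a structural model depends on back-computing the RDC expected for each bond vector. The following is a minimal sketch of that single step, not the NVR algorithm itself. It assumes the standard relation D = D_max * v^T S v for a unit bond vector v and a symmetric, traceless 3x3 Saupe matrix S. The function names, the `d_max` scaling parameter, and the example tensor values are illustrative assumptions, not taken from the paper.

```python
import math

def is_saupe_matrix(S, tol=1e-12):
    """Return True if S is a 3x3 symmetric, traceless matrix."""
    symmetric = all(abs(S[i][j] - S[j][i]) <= tol
                    for i in range(3) for j in range(3))
    traceless = abs(S[0][0] + S[1][1] + S[2][2]) <= tol
    return symmetric and traceless

def normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def back_compute_rdc(bond_vector, S, d_max=1.0):
    """Back-compute an RDC as d_max * v^T S v for the normalized bond vector."""
    v = normalize(bond_vector)
    return d_max * sum(v[i] * S[i][j] * v[j]
                       for i in range(3) for j in range(3))

# Hypothetical axially symmetric tensor, chosen only for illustration.
S_EXAMPLE = [[-0.5e-3, 0.0, 0.0],
             [0.0, -0.5e-3, 0.0],
             [0.0, 0.0, 1.0e-3]]

rdc_along_z = back_compute_rdc((0.0, 0.0, 2.0), S_EXAMPLE)
rdc_along_x = back_compute_rdc((1.0, 0.0, 0.0), S_EXAMPLE)
```

A bond vector along the tensor's principal z axis picks out the largest tensor element, while a vector in the xy plane picks out the negative one. This orientation dependence is what lets RDCs discriminate between candidate assignments.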
Journal of the American Chemical Society | 2012
Brian E. Coggins; Jonathan W. Werner-Allen; Anthony K. Yan; Pei Zhou
In structural studies of large proteins by NMR, global fold determination plays an increasingly important role in providing a first look at a target's topology and reducing assignment ambiguity in NOESY spectra of fully protonated samples. In this work, we demonstrate the use of ultrasparse sampling, a new data processing algorithm, and a 4-D time-shared NOESY experiment (1) to collect all NOEs in (2)H/(13)C/(15)N-labeled protein samples with selectively protonated amide and ILV methyl groups at high resolution in only four days, and (2) to calculate global folds from this data using fully automated resonance assignment. The new algorithm, SCRUB, incorporates the CLEAN method for iterative artifact removal but applies an additional level of iteration, permitting real signals to be distinguished from noise and allowing nearly all artifacts generated by real signals to be eliminated. In simulations with 1.2% of the data required by Nyquist sampling, SCRUB achieves a dynamic range over 10000:1 (250× better artifact suppression than CLEAN) and completely quantitative reproduction of signal intensities, volumes, and line shapes. Applied to 4-D time-shared NOESY data, SCRUB processing dramatically reduces aliasing noise from strong diagonal signals, enabling the identification of weak NOE crosspeaks with intensities 100× less than those of diagonal signals. Nearly all of the expected peaks for interproton distances under 5 Å were observed. The practical benefit of this method is demonstrated with structure calculations for 23 kDa and 29 kDa test proteins using the automated assignment protocol of CYANA, in which unassigned 4-D time-shared NOESY peak lists produce accurate and well-converged global fold ensembles, whereas 3-D peak lists either fail to converge or produce significantly less accurate folds. The approach presented here succeeds with an order of magnitude less sampling than required by alternative methods for processing sparse 4-D data.
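The iterative artifact removal that SCRUB builds on can be illustrated with the classical CLEAN loop. The sketch below is a plain one-dimensional CLEAN on a circular signal, not SCRUB: it has no additional level of iteration and no handling of sparse 4-D sampling. The function names, the `gain` and `threshold` defaults, and the convention that the point-spread function peaks at index 0 are assumptions made for illustration.

```python
def circular_convolve(signal, psf):
    """Circular convolution; models how sampling artifacts spread each peak."""
    n = len(signal)
    return [sum(signal[j] * psf[(i - j) % n] for j in range(n))
            for i in range(n)]

def clean_1d(dirty, psf, gain=0.1, threshold=1e-3, max_iter=10000):
    """Classical CLEAN: repeatedly subtract a scaled, shifted PSF at the
    strongest residual point. Assumes psf has its main peak at index 0."""
    n = len(dirty)
    residual = list(dirty)
    components = [0.0] * n
    for _ in range(max_iter):
        # Locate the strongest remaining point in the residual.
        k = max(range(n), key=lambda i: abs(residual[i]))
        if abs(residual[k]) < threshold:
            break
        # Move a fraction of that peak into the component list.
        amp = gain * residual[k] / psf[0]
        components[k] += amp
        # Remove the peak together with the artifacts it generates.
        for i in range(n):
            residual[i] -= amp * psf[(i - k) % n]
    return components, residual

# Hypothetical test signal: two peaks, each with sidelobe artifacts at +/-3.
n = 16
psf = [0.0] * n
psf[0], psf[3], psf[n - 3] = 1.0, 0.3, 0.3
true_signal = [0.0] * n
true_signal[5], true_signal[12] = 1.0, 0.5
dirty = circular_convolve(true_signal, psf)
components, residual = clean_1d(dirty, psf)
```

With a small gain, each pass removes only part of the strongest peak, so the sidelobes of a strong signal are reduced before weaker signals are examined.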
Proteins | 2006
Shobha Potluri; Anthony K. Yan; James J. Chou; Bruce Randall Donald; Chris Bailey-Kellogg
Structural studies of symmetric homo‐oligomers provide mechanistic insights into their roles in essential biological processes, including cell signaling and cellular regulation. This paper presents a novel algorithm for homo‐oligomeric structure determination, given the subunit structure, that is both complete, in that it evaluates all possible conformations, and data‐driven, in that it evaluates conformations separately for consistency with experimental data and for quality of packing. Completeness ensures that the algorithm does not miss the native conformation, and being data‐driven enables it to assess the structural precision possible from data alone. Our algorithm performs a branch‐and‐bound search in the symmetry configuration space, the space of symmetry axis parameters (positions and orientations) defining all possible Cn homo‐oligomeric complexes for a given subunit structure. It eliminates those symmetry axes inconsistent with intersubunit nuclear Overhauser effect (NOE) distance restraints and then identifies conformations representing any consistent, well‐packed structure to within a user‐defined similarity level.
Research in Computational Molecular Biology | 2002
Christopher James Langmead; Anthony K. Yan; C. Robertson McClung; Bruce Randall Donald
We introduce a model-based analysis technique for extracting and characterizing rhythmic expression profiles from genome-wide DNA microarray hybridization data. These patterns are clues to discovering rhythmic genes implicated in cell-cycle, circadian, and other biological processes. The algorithm, implemented in a program called RAGE (Rhythmic Analysis of Gene Expression), decouples the problems of estimating a pattern's periodicity and phase. Our algorithm is linear-time in frequency and phase resolution, an improvement over previous quadratic-time approaches. Unlike previous approaches, RAGE uses a true distance metric for measuring expression profile similarity, based on the Hausdorff distance. This results in better clustering of expression profiles for rhythmic analysis. The confidence of each frequency estimate is computed using Z-scores. We demonstrate that RAGE is superior to other techniques on synthetic and actual DNA microarray hybridization data. We also show how to replace the discretized phase search in our method with an exact (combinatorially precise) phase search, resulting in a faster algorithm with no complexity dependence on phase resolution.
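To make the Hausdorff distance mentioned above concrete, here is a minimal sketch of the symmetric Hausdorff distance between two finite point sets. Treating each expression profile as a set of (time, expression) points is an assumption made for illustration; the abstract does not say how RAGE maps profiles to point sets. The function names are hypothetical.

```python
import math

def directed_hausdorff(A, B):
    """Largest distance from any point of A to its nearest point in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two finite, non-empty point sets."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Hypothetical profiles sampled as (time, expression) points.
profile_a = [(0.0, 0.0), (1.0, 0.0)]
profile_b = [(0.0, 0.0), (1.0, 0.0), (1.0, 3.0)]
distance = hausdorff(profile_a, profile_b)
```

Taking the maximum of both directed distances is what makes the measure symmetric. On compact sets the result also satisfies the triangle inequality, which is why it qualifies as a true metric.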
Journal of Biomolecular NMR | 2009
Jianyang Zeng; Jeffrey Boyles; Chittaranjan Tripathy; Lincong Wang; Anthony K. Yan; Pei Zhou; Bruce Randall Donald
We present a novel structure determination approach that exploits the global orientational restraints from RDCs to resolve ambiguous NOE assignments. Unlike traditional approaches that bootstrap the initial fold from ambiguous NOE assignments, we start by using RDCs to compute accurate secondary structure element (SSE) backbones at the beginning of structure calculation. Our structure determination package, called rdc-Panda (RDC-based SSE PAcking with NOEs for Structure Determination and NOE Assignment), consists of three modules: (1) rdc-exact; (2) Packer; and (3) hana (HAusdorff-based NOE Assignment). rdc-exact computes the global optimal solution of backbone dihedral angles for each secondary structure element by exactly solving a system of quartic RDC equations derived by Wang and Donald (Proceedings of the IEEE computational systems bioinformatics conference (CSB), Stanford, CA, 2004a; J Biomol NMR 29(3):223–242, 2004b), and systematically searching over the roots, each of which is a backbone dihedral ϕ- or ψ-angle consistent with the RDC data. Using a small number of unambiguous inter-SSE NOEs extracted using only chemical shift information, Packer performs a systematic search for the core structure, including all SSE backbone conformations. hana uses a Hausdorff-based scoring function to measure the similarity between the experimental spectra and the back-computed NOE pattern for each side-chain from a statistically-diverse rotamer library, and drives the selection of optimal position-specific rotamers for filtering ambiguous NOE assignments. Finally, a local minimization approach is used to compute the loops and refine side-chain conformations by fixing the core structure as a rigid body while allowing movement of loops and side-chains. 
rdc-Panda was applied to NMR data for the FF Domain 2 of human transcription elongation factor CA150 (RNA polymerase II C-terminal domain interacting protein), human ubiquitin, the ubiquitin-binding zinc finger domain of the human Y-family DNA polymerase Eta (pol η UBZ), and the human Set2-Rpb1 interacting domain (hSRI). These results demonstrate the efficiency and accuracy of our algorithm, and show that rdc-Panda can be successfully applied for high-resolution protein structure determination using only a limited set of NMR data by first computing RDC-defined backbones.
Protein Science | 2006
Shobha Potluri; Anthony K. Yan; Bruce Randall Donald; Chris Bailey-Kellogg
Assignment of nuclear Overhauser effect (NOE) data is a key bottleneck in structure determination by NMR. NOE assignment resolves the ambiguity as to which pair of protons generated the observed NOE peaks, and thus should be restrained in structure determination. In the case of intersubunit NOEs in symmetric homo‐oligomers, the ambiguity includes both the identities of the protons within a subunit, and the identities of the subunits to which they belong. This paper develops an algorithm for simultaneous intersubunit NOE assignment and Cn symmetric homo‐oligomeric structure determination, given the subunit structure. By using a configuration space framework, our algorithm guarantees completeness, in that it identifies structures representing, to within a user‐defined similarity level, every structure consistent with the available data (ambiguous or not). However, while our approach is complete in considering all conformations and assignments, it avoids explicit enumeration of the exponential number of combinations of possible assignments. Our algorithm can draw two types of conclusions not possible under previous methods: (1) that different assignments for an NOE would lead to different structural classes, or (2) that it is not necessary to uniquely assign an NOE, since it would have little impact on structural precision. We demonstrate on two test proteins that our method reduces the average number of possible assignments per NOE by a factor of 2.6 for MinE and 4.2 for CCMP. It results in high structural precision, reducing the average variance in atomic positions by factors of 1.5 and 3.6, respectively.
Research in Computational Molecular Biology | 2003
Christopher James Langmead; Anthony K. Yan; Ryan H. Lilien; Lincong Wang; Bruce Randall Donald
High-throughput NMR structural biology can play an important role in structural genomics. We report an automated procedure for high-throughput NMR resonance assignment for a protein of known structure, or of a homologous structure. These assignments are a prerequisite for probing protein-protein interactions, protein-ligand binding, and dynamics by NMR. Assignments are also the starting point for structure determination and refinement. A new algorithm, called Nuclear Vector Replacement (NVR), is introduced to compute assignments that optimally correlate experimentally-measured NH residual dipolar couplings (RDCs) to a given a priori whole-protein 3D structural model. The algorithm requires only uniform 15N-labelling of the protein, and processes unassigned HN-15N HSQC spectra, HN-15N RDCs, and sparse HN-HN NOEs (dNNs), all of which can be acquired in a fraction of the time needed to record the traditional suite of experiments used to perform resonance assignments. NVR runs in minutes and efficiently assigns the (HN,15N) backbone resonances as well as the dNNs of the 3D 15N-NOESY spectrum, in O(n^3) time. The algorithm is demonstrated on NMR data from a 76-residue protein, human ubiquitin, matched to four structures, including one mutant (homolog), determined either by X-ray crystallography or by different NMR experiments (without RDCs). NVR achieves an average assignment accuracy of over 90%. We further demonstrate the feasibility of our algorithm for different and larger proteins, using NMR data for hen lysozyme (129 residues, 98% accuracy) and streptococcal protein G (56 residues, 95% accuracy), matched to a variety of 3D structural models. Finally, we extend NVR to a second application, 3D structural homology detection, and demonstrate that NVR is able to identify structural homologies between proteins with remote amino acid sequences using a database of structural models.
Journal of Computational Biology | 2003
Christopher James Langmead; Anthony K. Yan; C. Robertson McClung; Bruce Randall Donald
We introduce a model-based analysis technique for extracting and characterizing rhythmic expression profiles from genome-wide DNA microarray hybridization data. These patterns are clues to discovering rhythmic genes implicated in cell-cycle, circadian, or other biological processes. The algorithm, implemented in a program called RAGE (Rhythmic Analysis of Gene Expression), decouples the problems of estimating a pattern's wavelength and phase. Our algorithm is linear-time in frequency and phase resolution, an improvement over previous quadratic-time approaches. Unlike previous approaches, RAGE uses a true distance metric for measuring expression profile similarity, based on the Hausdorff distance. This results in better clustering of expression profiles for rhythmic analysis. The confidence of each frequency estimate is computed using Z-scores. We demonstrate that RAGE is superior to other techniques on synthetic and actual DNA microarray hybridization data. We also show how to replace the discretized phase search in our method with an exact (combinatorially precise) phase search, resulting in a faster algorithm with no complexity dependence on phase resolution.
Protein Science | 2011
Jeffrey W. Martin; Anthony K. Yan; Chris Bailey-Kellogg; Pei Zhou; Bruce Randall Donald
High‐resolution structure determination of homo‐oligomeric protein complexes remains a daunting task for NMR spectroscopists. Although isotope‐filtered experiments allow separation of intermolecular NOEs from intramolecular NOEs and determination of the structure of each subunit within the oligomeric state, degenerate chemical shifts of equivalent nuclei from different subunits make it difficult to assign intermolecular NOEs to nuclei from specific pairs of subunits with certainty, hindering structural analysis of the oligomeric state. Here, we introduce a graphical method, DISCO, for the analysis of intermolecular distance restraints and structure determination of symmetric homo‐oligomers using residual dipolar couplings. Based on knowledge that the symmetry axis of an oligomeric complex must be parallel to an eigenvector of the alignment tensor of residual dipolar couplings, we can represent distance restraints as annuli in a plane encoding the parameters of the symmetry axis. Oligomeric protein structures with the best restraint satisfaction correspond to regions of this plane with the greatest number of overlapping annuli. This graphical analysis yields a technique to characterize the complete set of oligomeric structures satisfying the distance restraints and to quantitatively evaluate the contribution of each distance restraint. We demonstrate our method for the trimeric E. coli diacylglycerol kinase, addressing the challenges in obtaining subunit assignments for distance restraints. We also demonstrate our method on a dimeric mutant of the immunoglobulin‐binding domain B1 of streptococcal protein G to show the resilience of our method to ambiguous atom assignments. In both studies, DISCO computed oligomer structures with high accuracy despite using ambiguously assigned distance restraints.
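The idea of scoring symmetry-axis positions by the number of overlapping annuli can be sketched with a simple grid scan. This is an illustrative approximation only: the abstract does not say how DISCO computes the overlap regions, and a grid scan can miss regions smaller than its spacing. The annulus representation (center, inner radius, outer radius), the function names, and the example values are assumptions.

```python
import math

def annulus_contains(annulus, point):
    """True if point lies between the inner and outer radii of the annulus."""
    (cx, cy), r_inner, r_outer = annulus
    d = math.hypot(point[0] - cx, point[1] - cy)
    return r_inner <= d <= r_outer

def overlap_count(annuli, point):
    """Number of annuli (distance restraints) satisfied at this point."""
    return sum(annulus_contains(a, point) for a in annuli)

def best_grid_points(annuli, x_range, y_range, steps=50):
    """Scan a regular grid; return the top overlap count and where it occurs."""
    (x0, x1), (y0, y1) = x_range, y_range
    best, best_points = -1, []
    for i in range(steps + 1):
        x = x0 + (x1 - x0) * i / steps
        for j in range(steps + 1):
            y = y0 + (y1 - y0) * j / steps
            count = overlap_count(annuli, (x, y))
            if count > best:
                best, best_points = count, [(x, y)]
            elif count == best:
                best_points.append((x, y))
    return best, best_points

# Two hypothetical restraints whose annuli intersect.
annuli = [((0.0, 0.0), 1.0, 2.0), ((3.0, 0.0), 1.0, 2.0)]
best, points = best_grid_points(annuli, (-3.0, 6.0), (-3.0, 3.0), steps=30)
```

Counting satisfied restraints per point also shows how much each restraint contributes: removing one annulus and rescanning reveals whether the best region changes.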
The International Journal of Robotics Research | 2005
Anthony K. Yan; Christopher James Langmead; Bruce Randall Donald
High-throughput nuclear magnetic resonance (NMR) structural biology and NMR structural genomics pose a fascinating set of geometric challenges. A key bottleneck in NMR structural biology is the resonance assignment problem. We seek to accelerate protein NMR resonance assignment and structure determination by exploiting a priori structural information. In particular, a method known as nuclear vector replacement (NVR) has been proposed as a method for solving the assignment problem given a priori structural information. Among several different types of input data, NVR uses a particular type of NMR data known as residual dipolar couplings (RDCs). The basic physics of RDCs tells us that the data should be explainable by a structural model and set of parameters contained within the “Saupe alignment tensor”. In the NVR algorithm, one estimates the Saupe alignment tensors and then proceeds to refine those estimates. We would like to quantify the accuracy of such estimates, where we compare the estimated Saupe matrix to the correct Saupe matrix. In this work, we propose a way to quantify this comparison. Given a correct Saupe matrix and an estimated Saupe matrix, we compute an upper bound on the probability that a randomly rotated Saupe tensor would have an error smaller than the estimated Saupe matrix. This has the advantage of being a quantified upper bound, which also has a clear interpretation in terms of geometry and probability. While the specific application of our rotation probability results is given to NVR, our novel methods can be used for any RDC-based algorithm to bound the accuracy of the estimated alignment tensors. Furthermore, they could also be used in X-ray crystallography or molecular docking to quantitate the accuracy of calculated rotations of proteins, protein domains, nucleic acids, or small molecules.
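The probability described above can be estimated empirically with a Monte Carlo sketch. This is not the paper's method, which derives an upper bound; the sketch only samples random rotations and counts how often a rotated copy of the correct Saupe matrix lies closer to the original than the estimate does. Measuring error with the Frobenius norm, the function names, and the trial count are assumptions made for illustration.

```python
import math
import random

def matmul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def frobenius_distance(A, B):
    """Frobenius norm of A - B."""
    return math.sqrt(sum((A[i][j] - B[i][j]) ** 2
                         for i in range(3) for j in range(3)))

def random_rotation(rng):
    """Uniformly random rotation matrix from a normalized Gaussian quaternion."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [[1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
            [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
            [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)]]

def fraction_with_smaller_error(S_true, S_est, trials=2000, seed=0):
    """Fraction of random rotations R for which R S_true R^T is closer to
    S_true than S_est is (strict inequality)."""
    rng = random.Random(seed)
    err_est = frobenius_distance(S_est, S_true)
    hits = 0
    for _ in range(trials):
        R = random_rotation(rng)
        rotated = matmul(matmul(R, S_true), transpose(R))
        if frobenius_distance(rotated, S_true) < err_est:
            hits += 1
    return hits / trials
```

A small fraction means that few random orientations would do as well as the estimate, so the estimate is unlikely to be good by chance. A fraction near 1 means the estimate is no better than a random rotation.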