Publication


Featured research published by Nathaniel Echols.


Proteins | 2002

Normal mode analysis of macromolecular motions in a database framework: Developing mode concentration as a useful classifying statistic

Werner G. Krebs; Vadim Alexandrov; Cyrus A. Wilson; Nathaniel Echols; Haiyuan Yu; Mark Gerstein

We investigated protein motions using normal modes within a database framework, determining on a large sample the degree to which normal modes anticipate the direction of the observed motion and are useful for motion classification. As a starting point for our analysis, we identified a large number of examples of protein flexibility from a comprehensive set of structural alignments of the proteins in the PDB. Each example consisted of a pair of proteins that were considerably different in structure given their sequence similarity. On each pair, we performed geometric comparisons and adiabatic‐mapping interpolations in a high‐throughput pipeline, arriving at a final list of 3,814 putative motions and standardized statistics for each. We then computed the normal modes of each motion in this list, determining the linear combination of modes that best approximated the direction of the observed motion. We integrated our new motions and normal mode calculations in the Macromolecular Motions Database, through a new ranking interface at http://molmovdb.org. Based on the normal mode calculations and the interpolations, we identified a new statistic, mode concentration, related to the mathematical concept of information content, which describes the degree to which the direction of the observed motion can be summarized by a few modes. Using this statistic, we were able to determine the fraction of the 3,814 motions where one could anticipate the direction of the actual motion from only a few modes. We also investigated mode concentration in comparison to related statistics on combinations of normal modes and correlated it with quantities characterizing protein flexibility (e.g., maximum backbone displacement or number of mobile atoms). Finally, we evaluated the ability of mode concentration to automatically classify motions into a variety of simple categories (e.g., whether or not they are “fragment‐like”), in comparison to motion statistics.
This involved the application of decision trees and feature selection (particular machine‐learning techniques) to training and testing sets derived from merging the “list” of motions with manually classified ones. Proteins 2002;48:682–695.
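The step of finding the linear combination of normal modes that best approximates an observed motion can be illustrated with a short Python sketch. It is not the authors' code: it assumes the mode eigenvectors are orthonormal columns of a (3N, M) array for the same atoms, in the same order, as the flattened displacement vector, so the least-squares coefficients reduce to dot products. The cumulative overlap of the k most heavily weighted modes is one simple way to ask how much of a motion "a few modes" capture; the paper's own mode-concentration statistic is not reproduced here because its formula is not given in this abstract.

```python
import numpy as np


def mode_coefficients(displacement: np.ndarray, modes: np.ndarray) -> np.ndarray:
    """Least-squares coefficients of the unit motion direction in the span of the modes.

    displacement: (3N,) flattened coordinate difference between two conformations.
    modes: (3N, M) array whose columns are assumed to be orthonormal mode vectors.
    """
    norm = np.linalg.norm(displacement)
    if norm == 0:
        raise ValueError("displacement vector is zero")
    direction = displacement / norm
    # With orthonormal columns, the least-squares solution is simply modes.T @ direction.
    return modes.T @ direction


def cumulative_overlap(coeffs: np.ndarray, k: int) -> float:
    """Fraction (0..1) of the motion direction captured by the k largest-weight modes."""
    squared = np.sort(coeffs ** 2)[::-1]  # largest contributions first
    return float(np.sqrt(squared[:k].sum()))
```

For a toy displacement of (3, 4, 0, 0, 0, 0) with the identity matrix as the mode set, the coefficients are (0.6, 0.8, 0, 0, 0, 0), so the single best mode captures an overlap of 0.8 and two modes capture the whole direction.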


Nucleic Acids Research | 2003

MolMovDB: analysis and visualization of conformational change and structural flexibility

Nathaniel Echols; Duncan Milburn; Mark Gerstein

The Database of Macromolecular Movements (http://MolMovDB.org) is a collection of data and software pertaining to flexibility in protein and RNA structures. The database is organized into two parts. Firstly, a collection of morphs of solved structures representing different states of a molecule provides quantitative data for flexibility and a number of graphical representations. Secondly, a classification of known motions according to type of conformational change (e.g. hinged domain or allosteric) incorporates textual annotation and information from the literature relating to the motion, linking together many of the morphs. A variety of subsets of the morphs are being developed for use in statistical analyses. In particular, for each subset it is possible to derive distributions of various motional quantities (e.g. maximum rotation) that can be used to place a specific motion in context as being typical or atypical for a given population. Over the past year, the database has been greatly expanded and enhanced to incorporate new structures and to improve the quality of data. The morph server, which enables users of the database to add new morphs either from their own research or the PDB, has also been enhanced to handle nucleic acid structures and multi-chain complexes.
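Placing one motion in the context of a subset, as described above, amounts to asking where its value falls in that subset's distribution. A minimal Python sketch, assuming the motional quantity (for example maximum rotation, in degrees) has already been computed for every morph in the subset; the 5th/95th percentile cutoffs used to call a motion atypical are hypothetical choices, not MolMovDB's.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(population: list[float], value: float) -> float:
    """Percentage of the population below value, counting ties as half."""
    if not population:
        raise ValueError("population is empty")
    ordered = sorted(population)
    below = bisect_left(ordered, value)
    ties = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def describe(population: list[float], value: float,
             low: float = 5.0, high: float = 95.0) -> str:
    """Label a motion 'typical' or 'atypical' using hypothetical percentile cutoffs."""
    rank = percentile_rank(population, value)
    return "typical" if low <= rank <= high else "atypical"
```

With a subset whose maximum rotations are 10, 20, 30 and 40 degrees, a 35-degree motion sits at the 75th percentile and would be reported as typical under these cutoffs.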


Nucleic Acids Research | 2006

The Database of Macromolecular Motions: new features added at the decade mark

Samuel C. Flores; Nathaniel Echols; Duncan Milburn; Brandon M. Hespenheide; Kevin S. Keating; Jason Lu; Stephen A. Wells; Eric Z. Yu; M. F. Thorpe; Mark Gerstein

The database of molecular motions, MolMovDB, has been in existence for the past decade. It classifies macromolecular motions and provides tools to interpolate between two conformations (the Morph Server) and predict possible motions in a single structure. In 2005, we expanded the services offered on MolMovDB. In particular, we further developed the Morph Server to produce improved interpolations between two submitted structures. We added support for multiple chains to the original adiabatic mapping interpolation, allowing the analysis of subunit motions. We also added the option of using FRODA interpolation, which allows for more complex pathways, potentially overcoming steric barriers. We added an interface to a hinge prediction service, which acts on single structures and predicts likely residue points for flexibility. We developed tools to relate such points of flexibility in a structure to particular key residue positions, i.e. active sites or highly conserved positions. Lastly, we began relating our motion classification scheme to function using descriptions from the Gene Ontology Consortium.
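The simplest way to interpolate between two conformations is straight-line interpolation of atomic coordinates, sketched below in Python. This is a baseline for illustration only, not the Morph Server's adiabatic-mapping or FRODA methods: it assumes both structures contain the same atoms in the same order and have already been superimposed. Straight-line paths can push atoms into one another, which is the kind of steric barrier that more elaborate pathways such as FRODA aim to avoid, so the sketch also reports the closest approach between any two atoms in a frame.

```python
import numpy as np


def linear_morph(start: np.ndarray, end: np.ndarray, n_frames: int) -> np.ndarray:
    """Straight-line interpolation of Cartesian coordinates.

    start, end: (N, 3) arrays of corresponding, already superimposed atoms.
    Returns an (n_frames, N, 3) array whose first and last frames equal start and end.
    """
    if start.shape != end.shape:
        raise ValueError("conformations must list the same atoms in the same order")
    if n_frames < 2:
        raise ValueError("need at least two frames")
    t = np.linspace(0.0, 1.0, n_frames)[:, None, None]  # interpolation parameter per frame
    return (1.0 - t) * start + t * end


def min_interatomic_distance(frame: np.ndarray) -> float:
    """Smallest distance between any two atoms in one frame (O(N^2) memory)."""
    diffs = frame[:, None, :] - frame[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)  # ignore each atom's distance to itself
    return float(dists.min())
```

Scanning min_interatomic_distance over every frame is a cheap way to flag intermediates where a straight-line path brings atoms implausibly close.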


Protein Science | 2005

Normal modes for predicting protein motions: a comprehensive database assessment and associated Web tool.

Vadim Alexandrov; Ursula Lehnert; Nathaniel Echols; Duncan Milburn; Donald M. Engelman; Mark Gerstein

We carry out an extensive statistical study of the applicability of normal modes to the prediction of mobile regions in proteins. In particular, we assess the degree to which the observed motions found in a comprehensive data set of 377 nonredundant motions can be modeled by a single normal‐mode vibration. We describe each motion in our data set by vectors connecting corresponding atoms in two crystallographically known conformations. We then measure the geometric overlap of these motion vectors with the displacement vectors of the lowest‐frequency mode, for one of the conformations. Our study suggests that the lowest mode contains useful information about the parts of a protein that move most (i.e., have the largest amplitudes) and about the direction of this movement. Based on our findings, we developed a Web tool for motion prediction (available from http://molmovdb.org/nma) and apply it here to four representative motions—from bacteriorhodopsin, calmodulin, insulin, and T7 RNA polymerase.
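The overlap between an observed motion and a single normal mode is commonly computed as the absolute cosine of the angle between the two 3N-dimensional vectors. A minimal Python sketch, assuming the two conformations are already superimposed, list the same atoms in the same order, and that the mode vector belongs to the first conformation; the absolute value is taken because the sign of an eigenvector is arbitrary. It is an illustration, not the code behind the Web tool.

```python
import numpy as np


def overlap(conf_a, conf_b, mode) -> float:
    """|cos(angle)| between the observed motion (conf_b - conf_a) and a mode vector.

    conf_a, conf_b, mode: array-likes of shape (N, 3), flattened to 3N vectors.
    Returns a value in [0, 1]; 1 means the mode points exactly along the motion.
    """
    motion = (np.asarray(conf_b, float) - np.asarray(conf_a, float)).ravel()
    mode = np.asarray(mode, float).ravel()
    denom = np.linalg.norm(motion) * np.linalg.norm(mode)
    if denom == 0:
        raise ValueError("motion or mode vector is zero")
    return float(abs(motion @ mode) / denom)
```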


Nature Biotechnology | 2002

An integrated approach for finding overlooked genes in yeast

Anuj Kumar; Paul M. Harrison; Kei-Hoi Cheung; Ning Lan; Nathaniel Echols; Paul Bertone; Perry L. Miller; Mark Gerstein; Michael Snyder

We report here the discovery of 137 previously unappreciated genes in yeast through a widely applicable and highly scalable approach integrating methods of gene-trapping, microarray-based expression analysis, and genome-wide homology searching. Our approach is a multistep process in which expressed sequences are first trapped using a modified transposon that produces protein fusions to β-galactosidase (β-gal); non-annotated open reading frames (ORFs) translated as β-gal chimeras are selected as a candidate pool of potential genes. To verify expression of these sequences, labeled RNA is hybridized against a microarray of oligonucleotides designed to detect gene transcripts in a strand-specific manner. To complement this experimental method, novel genes are also identified in silico by homology to previously annotated proteins. As these methods are capable of identifying both short ORFs and antisense ORFs, our approach provides an effective supplement to current gene-finding schemes. In total, the genes discovered using this approach constitute 2% of the yeast genome and represent a wealth of overlooked biology.
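What "short ORFs and antisense ORFs" means computationally can be illustrated with a six-frame ORF scan. The Python sketch below is not part of the method in this abstract, which relied on gene trapping, strand-specific microarrays and homology searches; it only shows how ORFs on the opposite (antisense) strand are found by scanning the reverse complement. The min_codons threshold is a hypothetical parameter, and the scan reports only the first ATG of each ORF, ignoring nested start codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 3):
    """Yield (strand, start, end) for ATG...stop ORFs in all six reading frames.

    Coordinates are 0-based and half-open on the strand being scanned ('+' is the
    input, '-' is its reverse complement); end includes the stop codon.
    min_codons (hypothetical) counts codons from ATG up to, not including, the stop.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon since the last stop
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        yield strand, start, i + 3
                    start = None
```

For example, the sequence ATGAAATTTTAA contains one sense ORF, while its reverse complement TTAAAATTTCAT yields the same ORF reported on the '-' strand, i.e. as an antisense ORF.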


Nucleic Acids Research | 2003

ExpressYourself: a modular platform for processing and visualizing microarray data

Nicholas M. Luscombe; Thomas E. Royce; Paul Bertone; Nathaniel Echols; Christine E. Horak; Joseph T. Chang; Michael Snyder; Mark Gerstein

DNA microarrays are widely used in biological research; by analyzing differential hybridization on a single microarray slide, one can detect changes in mRNA expression levels, increases in DNA copy numbers and the location of transcription factor binding sites on a genomic scale. Once the experiments have been performed, the major challenge is to process large, noisy datasets in order to identify the specific array elements that are significantly differentially hybridized. This normally requires aggregating different, often incompatible programs into a multi-step pipeline. Here we present ExpressYourself, a fully integrated platform for processing microarray data. In a completely automated fashion, it corrects the background array signal, normalizes the Cy5 and Cy3 signals, scores levels of differential hybridization, combines the results of replicate experiments, filters problematic regions of the array and assesses the quality of individual and replicate experiments. ExpressYourself is designed with a highly modular architecture so that various types of microarray analysis algorithms can readily be incorporated as they are developed; for example, the system currently implements several normalization methods, including those that simultaneously consider signal intensity and slide location. The processed data are presented using a web-based graphical interface to facilitate comparison with the original images of the array slides. In particular, ExpressYourself is able to regenerate images of the original microarray after applying various steps of processing, which greatly facilitates identification of position-specific artifacts. The program is freely available for use at http://bioinfo.mbb.yale.edu/expressyourself.
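The first processing steps listed above, background correction and Cy5/Cy3 normalization, can be sketched in Python using the simplest global approach: subtract each spot's local background, take the log2 ratio of the two channels, and shift all ratios so their median is zero. This is not ExpressYourself's code, and it omits the intensity- and location-dependent normalizations the abstract mentions; the signal floor used to avoid taking the logarithm of zero or negative values is a hypothetical choice.

```python
import numpy as np


def normalized_log_ratios(cy5, cy5_bg, cy3, cy3_bg, floor: float = 1.0) -> np.ndarray:
    """Background-subtract each channel, take log2(Cy5/Cy3), and median-center.

    floor (hypothetical) clamps background-corrected signals from below.
    Global median centering assumes most spots are not differentially hybridized.
    """
    red = np.maximum(np.asarray(cy5, float) - np.asarray(cy5_bg, float), floor)
    green = np.maximum(np.asarray(cy3, float) - np.asarray(cy3_bg, float), floor)
    ratios = np.log2(red / green)
    return ratios - np.median(ratios)
```

Three spots with Cy5 = 210, 410, 110, Cy3 = 110 each and a background of 10 in both channels give raw log2 ratios of 1, 2 and 0, which become 0, 1 and -1 after median centering.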


Proteins | 2004

The protein target list of the Northeast Structural Genomics Consortium

Zeba Wunderlich; Thomas B. Acton; Jinfeng Liu; Gregory J. Kornhaber; John K. Everett; Phil Carter; Ning Lan; Nathaniel Echols; Mark Gerstein; Burkhard Rost; Gaetano T. Montelione

The U.S. NIH Protein Structure Initiative (PSI) is a joint government, university, and industry effort, organized and supported by the National Institute of General Medical Sciences, and aimed at reducing the costs and increasing the speed of protein structure determination. Its long-range goal is to make the 3D atomic-level structures of most proteins in nature easily obtainable from knowledge of their corresponding DNA sequences (http://www.nigms.gov/psi). It is the primary U.S. component of a broad international effort in structural genomics, involving at least 20 projects throughout the world. In order to minimize overlap of their efforts, most of these structural genomics pilot projects make their protein target lists and progress reports publicly available. These protein target lists provide dynamic summaries of progress on the production and structure determination of each target protein. These Web-accessible data represent a tremendously valuable new resource to the biological science community, which is only beginning to be widely recognized. As illustrated in the article by Liu et al. in this issue of Proteins, much thought and effort, often involving advanced bioinformatics analysis, has gone into developing these protein target lists. The article by O’Toole et al. in this issue of Proteins describes some of the features of these protein target lists, the overlap between these worldwide efforts, and a first pass at the data mining that becomes possible by analyzing success and failure at various points along the structure production pipeline across thousands of protein targets. Such retrospective analysis of structural genomics data has the potential to greatly improve methods for protein expression, sample preparation, functional characterization, and structure determination.
In addition, the target lists themselves provide inventories of protein expression vectors, protein samples, and many other biochemical reagents that are generally freely available to the broader biological community. The Northeast Structural Genomics Consortium (NESG) is one of several pilot projects of the PSI. Its primary goals are to develop and refine new technologies for high-throughput protein production and structure determination by both NMR and X-ray crystallography, and to apply these technologies in determining representative structures of the domain sequence families that constitute eukaryotic proteomes. The project (http://www.nesg.org) is developing technology aimed at optimizing each stage of the structure determination pipeline, including intelligent protein target selection, high-throughput and cost-effective protein sample production, robotics-aided protein crystallization screening, rapid NMR data collection, automated NMR and X-ray diffraction data analysis, and integrated databases for laboratory information management and structure–function annotations. The key short-term goal of the project is to construct a technology platform capable of experimentally determining 100–200 sequence-unique NMR or X-ray crystal structures of proteins per year. Most structural genomics projects involve collaborative interactions between multiple research groups, coordinated through laboratory information management systems (LIMS). The development and integration of these LIMS are significant challenges that are being addressed both individually and collectively by the structural genomics research community. SPiNE (http://spine.nesg.org) is a data warehouse and integrated data tracking tool that holds detailed records about the cloning, expression, purification, biophysical characterization, crystallization, and structure determination by NMR and/or X-ray crystallography of each target under study by the NESG Consortium.
The NESG also aims at correlating the structural data produced by the project with the extensive biological data emerging from large-scale functional genomics efforts (e.g., see Goh et al. and Carter et al.).


Pharmacogenomics | 2002

SNPs on human chromosomes 21 and 22 - analysis in terms of protein features and pseudogenes

Suganthi Balasubramanian; Paul M. Harrison; Hedi Hegyi; Paul Bertone; Nicholas M. Luscombe; Nathaniel Echols; Patrick McGarvey; Zhaolei Zhang; Mark Gerstein

SNPs are useful for genome-wide mapping and the study of disease genes. Previous studies have focused on SNPs in specific genes or SNPs pooled from a variety of different sources. Here, a systematic approach to the analysis of SNPs in relation to various features on a genome-wide scale, with emphasis on protein features and pseudogenes, is presented. We have performed a comprehensive analysis of 39,408 SNPs on human chromosomes 21 and 22 from The SNP Consortium (TSC) database, where SNPs are obtained by random sequencing using consistent and uniform methods. Our study indicates that the occurrence of SNPs is lowest in exons and higher in repeats, introns and pseudogenes. Moreover, in comparing genes and pseudogenes, we find that the SNP density is higher in pseudogenes and the ratio of nonsynonymous to synonymous changes is also much higher. These observations may be explained by the increased rate of SNP accumulation in pseudogenes, which presumably are not under selective pressure. We have also performed secondary structure prediction on all coding regions and found that there is no preferential distribution of SNPs in α-helices, β-sheets or coils. This could imply that protein structures, in general, can tolerate a wide degree of substitutions. Tables relating to our results are available from http://genecensus.org/pseudogene.
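Classifying a coding SNP as synonymous or nonsynonymous, and expressing SNP counts as a density, can be sketched in Python as follows. The sketch assumes the reference codon and the position of the change within it are already known and uses the standard genetic code; stop-gain changes are simply counted as nonsynonymous, and the ratio is a raw count ratio rather than a per-site rate such as dN/dS. It is an illustration, not the pipeline used in this study.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snp(codon: str, position: int, alt_base: str) -> str:
    """Return 'synonymous' or 'nonsynonymous' for a single-base change in a codon."""
    if position not in (0, 1, 2):
        raise ValueError("position must be 0, 1 or 2")
    codon = codon.upper()
    mutant = codon[:position] + alt_base.upper() + codon[position + 1:]
    same = CODON_TABLE[codon] == CODON_TABLE[mutant]
    return "synonymous" if same else "nonsynonymous"


def snp_density_per_kb(n_snps: int, region_length_bp: int) -> float:
    """SNPs per kilobase of sequence."""
    return 1000.0 * n_snps / region_length_bp


def ns_to_s_ratio(labels: list[str]) -> float:
    """Raw count ratio of nonsynonymous to synonymous SNPs (not dN/dS)."""
    ns = labels.count("nonsynonymous")
    s = labels.count("synonymous")
    return ns / s if s else float("inf")
```

For example, CTT to CTC (both leucine) is synonymous, whereas GAA to GAT changes glutamate to aspartate and is nonsynonymous.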


Genome Research | 2002

Molecular Fossils in the Human Genome: Identification and Analysis of the Pseudogenes in Chromosomes 21 and 22

Paul M. Harrison; Hedi Hegyi; Suganthi Balasubramanian; Nicholas M. Luscombe; Paul Bertone; Nathaniel Echols; Ted Johnson; Mark Gerstein


Nucleic Acids Research | 2001

Digging for dead genes: an analysis of the characteristics of the pseudogene population in the Caenorhabditis elegans genome

Paul M. Harrison; Nathaniel Echols; Mark Gerstein

Collaboration


Top Co-Authors

Paul Bertone

Medical Research Council

Anuj Kumar

University of Michigan