David G. George
Georgetown University Medical Center
Publication
Featured research published by David G. George.
Methods in Enzymology | 1990
David G. George; Winona C. Barker; Lois T. Hunt
Publisher Summary This chapter describes the mutation data matrix (MDM) and its application for comparing protein sequences. Basic to all sequence comparison is the concept of an alignment that defines the relationship between sequences on a residue-by-residue basis. Sequence comparison methods use a scoring matrix that assigns a value to each possible pair of aligned amino acids. One of the most widely used similarity measures is the mutation data matrix (MDM) developed by Dayhoff and colleagues. The first MDM, published in 1968, was derived from over 400 accepted point mutations between present-day sequences and inferred ancestral sequences. Within the Markovian model, the MDM is derived from a transition probability matrix in which each matrix element gives the probability that amino acid A will be replaced by amino acid B in one unit of evolutionary change. The diagonal elements give the probabilities that the amino acids will remain unchanged. The probability of an amino acid being replaced is estimated as its relative mutability, which is calculated as the ratio of the number of observed changes of an amino acid to its total exposure to change.
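The construction described in this summary can be sketched in a few lines of Python. This is a minimal illustration on a made-up three-letter alphabet, not the chapter's actual procedure or data: the function names, the toy counts, exposures, and frequencies are all hypothetical. Two details are assumptions taken from the usual description of Dayhoff's model rather than from the summary itself: the "one unit of evolutionary change" is scaled to an expected 1% of residues changing, and the scoring matrix is taken as ten times the base-10 log-odds of the transition probabilities.

```python
import math


def relative_mutability(pair_counts, exposure):
    """Observed changes of each residue divided by its total exposure to change."""
    changes = {a: 0.0 for a in exposure}
    for (a, b), n in pair_counts.items():
        # An accepted point mutation between a and b counts as a change for both.
        changes[a] += n
        changes[b] += n
    mutability = {a: changes[a] / exposure[a] for a in exposure}
    return mutability, changes


def transition_matrix(pair_counts, exposure, freqs, unit=0.01):
    """M[a][b] = probability that residue a is replaced by b in one unit of change.

    `unit` is the expected fraction of residues that change (assumed 1%).
    """
    m, changes = relative_mutability(pair_counts, exposure)
    # Scale factor chosen so that sum_a freqs[a] * (1 - M[a][a]) == unit.
    lam = unit / sum(freqs[a] * m[a] for a in freqs)
    M = {a: {} for a in freqs}
    for a in freqs:
        for b in freqs:
            if a == b:
                # Diagonal: probability that the residue remains unchanged.
                M[a][b] = 1.0 - lam * m[a]
            else:
                n = pair_counts.get((a, b), 0) + pair_counts.get((b, a), 0)
                # Share the replacement probability among partners by observed counts.
                M[a][b] = lam * m[a] * n / changes[a]
    return M


def log_odds(M, freqs):
    """Scoring matrix as 10 * log10(M[a][b] / freqs[b]) (assumed convention)."""
    return {
        a: {b: 10.0 * math.log10(M[a][b] / freqs[b]) for b in freqs if M[a][b] > 0}
        for a in freqs
    }


# Toy input, invented purely for illustration.
toy_counts = {("A", "B"): 30, ("A", "C"): 10, ("B", "C"): 20}
toy_exposure = {"A": 400.0, "B": 1000.0, "C": 300.0}
toy_freqs = {"A": 0.3, "B": 0.5, "C": 0.2}

M = transition_matrix(toy_counts, toy_exposure, toy_freqs)
scores = log_odds(M, toy_freqs)
```

Each row of `M` sums to one, and the diagonal entries stay close to one because only about 1% of residues are expected to change per unit. Matrices for larger evolutionary distances would be obtained by multiplying `M` by itself before taking log-odds.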
Journal of Molecular Evolution | 1985
David G. George; Lois T. Hunt; Lai-Su L. Yeh; Winona C. Barker
Summary Recent evidence indicates that a gene transposition event occurred during the evolution of the bacterial ferredoxins subsequent to the ancestral intrasequence gene duplication. In light of this new information, the relationships among the bacterial ferredoxins were reexamined and an evolutionary tree consistent with this new understanding was derived. The bacterial ferredoxins can be divided into several groups based on their sequence properties; these include the clostridial-type ferredoxins, the Azotobacter-type ferredoxins, and a group containing the ferredoxins from the anaerobic, green, and purple sulfur bacteria. Based on sequence comparison, it was concluded that the amino-terminal domain of the Azotobacter-type ferredoxins, which contains the novel 3Fe∶3S cluster binding site, is homologous with the carboxyl-terminal domain of the ferredoxins from the anaerobic photosynthetic bacteria. A number of ferredoxin sequences do not fit into any of the groups described above. Based on sequence properties, these sequences can be separated into three groups: a group containing Methanosarcina barkeri ferredoxin and Desulfovibrio desulfuricans ferredoxin II, a group containing Desulfovibrio gigas ferredoxin and Clostridium thermoaceticum ferredoxin, and a group containing Desulfovibrio africanus ferredoxin I and Bacillus stearothermophilus ferredoxin. The last two groups differ from all of the other bacterial ferredoxins in that they bind only one Fe∶S cluster per polypeptide, whereas the others bind two. Sequence examination indicates that the second binding site has been either partially or completely lost from these ferredoxins. Methanosarcina barkeri ferredoxin and Desulfovibrio desulfuricans ferredoxin II are of interest because, of all the ferredoxins whose sequences are presently known, they show the strongest evidence of internal gene duplication.
However, the derived evolutionary tree indicates that they diverged from the Azotobacter-type ferredoxins well after the ancestral internal gene duplication. This apparent discrepancy is explained by postulating a duplication of one half-chain sequence and a deletion of the other half-chain. The Clostridium thermoaceticum and Bacillus stearothermophilus groups diverged from this line and subsequently lost one of the Fe∶S binding sites. It has recently become apparent that gene duplication is ubiquitous among the ferredoxins. Several organisms are now known to have a variety of ferredoxins with widely divergent properties. Unfortunately, in only one case are the sequences of more than one ferredoxin from the same organism known. Thus, although the major features of the bacterial ferredoxin tree are now understood, a complete bacterial phylogeny cannot be inferred until more sequence information is available.
Methods in Enzymology | 1996
Winona C. Barker; Friedhelm Pfeiffer; David G. George
Publisher Summary This chapter discusses the superfamily classification in the Protein Information Resource (PIR)-International protein sequence database and the development of the model of the protein superfamily concept that encompasses the most common usages and integrates both homology at the domain level and homology at the level of complete proteins. This model preserves the ability to fully partition the Protein Sequence Database, permits the organization of the database in a structured way, and introduces a more precise and unambiguous definition for the term “protein superfamily.” This classification provides a systematic scheme for the verification of the information in the database and for inferring additional information by homology in a controlled way. Information generated from large-scale sequencing projects is incomplete and not well understood. The major task of computational biology is to assign biological meaning to these data. Homology is the major operating principle employed in these analyses. The superfamily classification provides a useful architecture for the self-consistent and objective examination of sequence data by homology.
Nucleic Acids Research | 1993
Winona C. Barker; David G. George; Hans-Werner Mewes; Friedhelm Pfeiffer; Akira Tsugita
PIR-International is an association of macromolecular sequence data collection centers dedicated to fostering international cooperation as an essential element in the development of scientific databases. PIR-International is most noted for the Protein Sequence Database. This database originated in the early 1960s with the pioneering work of the late Margaret Dayhoff as a research tool for the study of protein evolution and intersequence relationships; it is maintained as a scientific resource, organized by biological concepts, using sequence homology as a guiding principle. PIR-International also maintains a number of other genomic, protein sequence, and sequence-related databases. The databases of PIR-International are made widely available. This paper briefly describes the architecture of the Protein Sequence Database, a number of other PIR-International databases, and mechanisms for providing access to and for distribution of these databases.
Nucleic Acids Research | 1997
David G. George; Robert J. Dodson; John S. Garavelli; Daniel H. Haft; Lois T. Hunt; Christopher R. Marzec; Bruce C. Orcutt; Kathryn E. Sidman; Geetha Y. Srinivasarao; Lai-Su L. Yeh; Leslie Arminski; Robert S. Ledley; Akira Tsugita; Winona C. Barker
From its origin, the PIR has aspired to support research in computational biology and genomics through the compilation of a comprehensive, quality-controlled and well-organized protein sequence information resource. The resource originated with the pioneering work of the late Margaret O. Dayhoff in the early 1960s. Since 1988, the Protein Sequence Database has been maintained collaboratively by PIR-International, an association of macromolecular sequence data collection centers dedicated to fostering international cooperation as an essential element in the development of scientific databases. The work of the resource is widely distributed and is available on the World Wide Web, via FTP, E-mail server, CD-ROM and magnetic media. It is widely redistributed and incorporated into many other protein sequence data compilations including SWISS-PROT and the Entrez system of the NCBI.
Nucleic Acids Research | 1996
Neela Swaminathan; David A. Mead; Karolyn McMaster; David G. George; James L. Van Etten; Piotr M. Skowron
R.CviJI is unique among site-specific restriction endonucleases in that its activity can be modulated to recognize either a two- or three-base sequence. Normally R.CviJI cleaves RGCY sites between the G and C to leave blunt ends. In the presence of ATP, R.CviJI* cleaves RGCN and YGCY sites, but not YGCR sites. The gene encoding R.CviJI was cloned from the eukaryotic Chlorella virus IL-3A and expressed in Escherichia coli. The primary E. coli cviJIR gene product is a 278 amino acid protein initiated from a GTG codon, rather than the expected 358 amino acid protein initiated from an in-frame upstream ATG codon. Interestingly, the 278 amino acid protein displays the normal restriction activity but not the R.CviJI* activity of the native enzyme. Nine restriction and modification proteins which recognize a central GC or CG sequence share short regions of identity with R.CviJI amino acids 144-235, suggesting that this region is the recognition and/or catalytic domain.
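The two recognition modes in this abstract can be illustrated with a short Python sketch that scans a DNA string for the stated sites using the standard IUPAC ambiguity codes (R = A or G, Y = C or T, N = any base). The function names and the demo sequence are hypothetical. The sketch also assumes that the ATP-dependent R.CviJI* activity cuts between the G and C just as the normal activity does, which the abstract states only for RGCY sites.

```python
import re

# IUPAC nucleotide codes used in the abstract, expressed as regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

NORMAL_SITES = ["RGCY"]        # normal R.CviJI activity
STAR_SITES = ["RGCN", "YGCY"]  # R.CviJI* activity; YGCR is not covered by either pattern


def site_regex(pattern):
    """Compile an IUPAC pattern into a lookahead regex so overlapping sites are found."""
    body = "".join(IUPAC[code] for code in pattern)
    return re.compile("(?=(" + body + "))")


def cut_positions(seq, patterns):
    """Return sorted 0-based indices of the C in each matched site.

    The cut is placed between the G and the C, so the returned index is the
    first base after the cut (an assumption for the R.CviJI* sites).
    """
    seq = seq.upper()
    cuts = set()
    for pattern in patterns:
        for match in site_regex(pattern).finditer(seq):
            cuts.add(match.start() + 2)
    return sorted(cuts)


# Hypothetical demo sequence containing AGCT, TGCA, AGCA and TGCT in that order.
demo = "AGCTTGCAAGCATGCT"
normal_cuts = cut_positions(demo, NORMAL_SITES)
star_cuts = cut_positions(demo, STAR_SITES)
```

On the demo sequence the normal mode finds only the AGCT site, the R.CviJI* mode additionally finds AGCA and TGCT, and the TGCA site (a YGCR site) is skipped in both modes.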
Methods in Enzymology | 1990
Winona C. Barker; David G. George; Lois T. Hunt
Publisher Summary Researchers at the National Biomedical Research Foundation (NBRF) have maintained the Protein Sequence Database since the early 1960s. The database has become truly international with the recent establishment of PIR-International, an association of protein sequence data collection centers including NBRF, the Martinsried Institute for Protein Sequences (MIPS), and the International Protein Information Database in Japan (JIPID). All three centers are working cooperatively to produce a single protein sequence database. There are three types of sequence databases that can be useful to biological researchers. When a researcher is primarily interested in establishing the identity and function of a newly determined sequence, the availability of as many data as possible may be more critical than the accuracy of the data. After a possible relationship is found, the accuracy of the sequences identified can be checked and conflicts resolved. The shift of emphasis toward sequence alignments as opposed to individual sequences also provides a mechanism for alleviating the burden of keeping pace with an extremely rapid influx of new sequence data.
Biochemical and Biophysical Research Communications | 1983
David G. George; Lai-Su L. Yeh; Winona C. Barker
Hypothetical lambda protein ORF314 shows significant homology with the carboxyl end of phage T4 tail-fiber protein gp37. Homology can also be demonstrated between hypothetical lambda protein ORF194 and a fragment of bacteriophage T4 protein gp38. This sequence homology is also reflected in the genomic sequences of these two phages.
Journal of Molecular Evolution | 1985
Lois T. Hunt; David G. George; Lai-Su L. Yeh
Summary We have found ragweed allergen Ra3 to be related to the type 1 copper proteins; it is most closely related to stellacyanin and basic blue protein. The type 1 copper proteins form a diverse group of proteins, most of which are involved in electron transport. However, key amino acids believed to be involved in copper binding are absent from the allergen sequence; thus, the allergen is not likely to be functionally related to the type 1 copper proteins. We have grouped these proteins into one superfamily and we depict the relationships among them by an evolutionary tree. As indicated by this tree, an ancient gene duplication resulted in the divergence of plastocyanin from the line leading to basic blue protein, stellacyanin, and allergen Ra3.
BioSystems | 1985
Lois T. Hunt; David G. George; Winona C. Barker
Over the past 30 years the study of the sequences of proteins and nucleic acids has produced almost incredible amounts of information, new concepts, and new avenues of research. The beginning was slow: the first peptide hormones sequenced in the early 1950s, the first cytochrome c (horse) in 1961, the first bacterial ferredoxin in 1964, and the first transfer RNA (yeast alanine tRNA) in 1965. In the past 6 years, the rate of data accumulation has accelerated tremendously, primarily due to technological advances in nucleic acid sequencing techniques. For investigators of biological evolution, the sequence data and the new information on genetic mechanisms would prove to be the best evidence for elucidating relationships among the genomes of living organisms and for deducing phylogenetic history. In particular, they needed evidence to decide between the two hypotheses for the origin of eukaryotic cells. Now, less than 20 years since Margulis renewed the investigation of this problem, comparisons of protein and nucleic acid sequences, especially of the small subunit ribosomal RNAs, have answered this question in favor of the endosymbiotic origin of eukaryotic cells. After briefly discussing some of the concepts that helped resolve this controversy and the problems involved in using sequence data for evolutionary studies, we describe a few examples of useful evolutionary trees.