Da-Fei Feng
University of California, San Diego
Publication
Featured research published by Da-Fei Feng.
Journal of Molecular Evolution | 1987
Da-Fei Feng; Russell F. Doolittle
Summary. A progressive alignment method is described that utilizes the Needleman and Wunsch pairwise alignment algorithm iteratively to achieve the multiple alignment of a set of protein sequences and to construct an evolutionary tree depicting their relationship. The sequences are assumed a priori to share a common ancestor, and the trees are constructed from difference matrices derived directly from the multiple alignment. The thrust of the method involves putting more trust in the comparison of recently diverged sequences than in those evolved in the distant past. In particular, this rule is followed: “once a gap, always a gap”. The method has been applied to three sets of protein sequences: 7 superoxide dismutases, 11 globins, and 9 tyrosine kinase-like sequences. Multiple alignments and phylogenetic trees for these sets of sequences were determined and compared with trees derived by conventional pairwise treatments. In several instances, the progressive method led to trees that appeared to be more in line with biological expectations than were trees obtained by more commonly used methods.
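The pairwise step that the progressive method builds on can be illustrated with a minimal Needleman and Wunsch global alignment. This is only a sketch: the match, mismatch, and gap values are illustrative assumptions, not the scoring used in the paper, and the progressive stage (aligning clusters and keeping existing gaps fixed) is not shown.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences.

    Returns (score, aligned_a, aligned_b). Scoring values are
    illustrative assumptions, not those of the published method.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

In a progressive scheme, a routine like this would be applied first to the most similar pair, and gaps placed at that stage would be carried forward unchanged as more distant sequences are added.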
Science | 1996
Russell F. Doolittle; Da-Fei Feng; Simon Tsang; Glen Cho; Elizabeth Little
Amino acid sequence data from 57 different enzymes were used to determine the divergence times of the major biological groupings. Deuterostomes and protostomes split about 670 million years ago and plants, animals, and fungi last shared a common ancestor about a billion years ago. With regard to these protein sequences, plants are slightly more similar to animals than are the fungi. In contrast, phylogenetic analysis of the same sequences indicates that fungi and animals shared a common ancestor more recently than either did with plants, the greater difference resulting from the fungal lineage changing faster than the animal and plant lines over the last 965 million years. The major protist lineages have been changing at a somewhat faster rate than other eukaryotes and split off about 1230 million years ago. If the rate of change has been approximately constant, then prokaryotes and eukaryotes last shared a common ancestor about 2 billion years ago, archaebacterial sequences being measurably more similar to eukaryotic ones than are eubacterial ones.
Journal of Molecular Evolution | 1985
Da-Fei Feng; Mark S. Johnson; Russell F. Doolittle
Summary. We examined two extensive families of protein sequences using four different alignment schemes that employ various degrees of “weighting” in order to determine which approach is most sensitive in establishing relationships. All alignments used a similarity approach based on a general algorithm devised by Needleman and Wunsch. The approaches included a simple program, UM (unitary matrix), whereby only identities are scored; a scheme in which the genetic code is used as a basis for weighting (GC); another that employs a matrix based on structural similarity of amino acids taken together with the genetic basis of mutation (SG); and a fourth that uses the empirical log-odds matrix (LOM) developed by Dayhoff on the basis of observed amino acid replacements. The two sequence families examined were (a) nine different globins and (b) nine different tyrosine kinase-like proteins. It was assumed a priori that all members of a family share common ancestry. In cases where two sequences were more than 30% identical, alignments by all four methods were almost always the same. In cases where the percentage identity was less than 20%, however, there were often significant differences in the alignments. On the average, the Dayhoff LOM approach was the most effective in verifying distant relationships, as judged by an empirical “jumbling test.” This was not universally the case, however, and in some instances the simple UM was actually as good or better. Trees constructed on the basis of the various alignments differed with regard to their limb lengths, but had essentially the same branching orders. We suggest some reasons for the different effectivenesses of the four approaches in the two different sequence settings, and offer some rules of thumb for assessing the significance of sequence relationships.
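A jumbling test of the kind mentioned above can be sketched as follows: the real alignment score is compared with scores obtained after shuffling one sequence, and the difference is expressed in standard deviations. The unitary-matrix scorer follows the UM description (only identities score), but the gap penalty, the number of trials, and the choice to shuffle only the second sequence are assumptions made for illustration.

```python
import random
import statistics


def um_score(a, b, gap=-1):
    """Needleman-Wunsch global score with a unitary matrix.

    Identities score 1 and all other pairs score 0; the gap penalty
    is an assumed value.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            sub = 1 if a[i - 1] == b[j - 1] else 0
            cur.append(max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def jumbling_z(a, b, trials=100, seed=0):
    """Standard deviations separating the real score from jumbled scores."""
    rng = random.Random(seed)
    real = um_score(a, b)
    jumbled = []
    for _ in range(trials):
        shuffled = list(b)
        rng.shuffle(shuffled)  # same composition and length, random order
        jumbled.append(um_score(a, "".join(shuffled)))
    mean = statistics.mean(jumbled)
    sd = statistics.pstdev(jumbled)
    return (real - mean) / sd if sd > 0 else float("inf")


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical sequence for illustration
    print(jumbling_z(seq, seq))
```

A large value suggests that the observed similarity is unlikely to arise from composition alone; swapping in a different substitution table would only require changing how `sub` is computed.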
Methods in Enzymology | 1990
Da-Fei Feng; Russell F. Doolittle
Publisher Summary The relationship of a set of related protein sequences can be expressed quantitatively in terms of a phylogenetic tree. The accuracy of the tree naturally depends on the alignment of the sequences. When there are more than two sequences, the problem lies in the gaps which must be introduced to align sequences optimally. The preliminary set of pairwise measurements also reveals any subclusters that may exist in the set. These subclusters can be treated as units during the alignment process, ensuring that the relative positions of the residues within the cluster will not be altered. The matrix is then reduced by one, and all distance values related to this pair are averaged in a new matrix. This procedure is repeated until the matrix is reduced to a single dimension. The end result is a branching order with the associated branch lengths. Aside from the fact that fewer programs are now involved, the single most important improvement is in the TREE program. Formerly, to get the branch lengths of a phylogenetic tree, one had to manually construct a connectivity table, which is not only cumbersome but also error-prone, especially when working with a large number of sequences.
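The matrix reduction described in this summary can be sketched roughly as below. The summary does not say how the pair to be merged is chosen or how branch lengths are computed, so this example assumes the closest pair is merged at each step and returns only the branching order; the function name and input layout are hypothetical.

```python
def branching_order(names, matrix):
    """Repeatedly merge a pair and average its distances to the rest.

    names:  list of sequence labels
    matrix: symmetric list of lists of pairwise distances
    Returns the branching order as nested tuples.
    Assumption: the pair with the smallest distance is merged each round.
    """
    nodes = list(names)
    d = [row[:] for row in matrix]
    while len(nodes) > 1:
        # Find the closest pair (i < j) in the current matrix
        i, j = min(
            ((p, q) for p in range(len(nodes)) for q in range(p + 1, len(nodes))),
            key=lambda pair: d[pair[0]][pair[1]],
        )
        merged = (nodes[i], nodes[j])
        keep = [k for k in range(len(nodes)) if k not in (i, j)]
        # Distance from the merged pair to each remaining node is the average
        new_row = [(d[i][k] + d[j][k]) / 2 for k in keep]
        # Build the matrix reduced by one row and one column
        d = [[d[r][c] for c in keep] for r in keep]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [merged]
    return nodes[0]


if __name__ == "__main__":
    labels = ["A", "B", "C", "D"]
    dist = [
        [0, 2, 10, 10],
        [2, 0, 10, 10],
        [10, 10, 0, 4],
        [10, 10, 4, 0],
    ]
    print(branching_order(labels, dist))
```

With the example matrix, A and B are joined first, then C and D, and finally the two pairs are joined to each other.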
Trends in Biochemical Sciences | 1992
Michael W. Smith; Da-Fei Feng; Russell F. Doolittle
One of the most debated questions in the field of molecular evolution is the possible role of horizontal transfer in evolution. Of all the claims that have been made over the years, those reporting transfers between eukaryotes and prokaryotes are the most controversial. Here we present the cases for and against several such possible gene acquisitions.
Journal of Molecular Evolution | 1990
Russell F. Doolittle; Da-Fei Feng; K. L. Anderson; M. R. Alberro
Summary. Naturally occurring horizontal gene transfers between nonviral organisms are difficult to prove. Only with the availability of sequence data from a wide variety of organisms can a convincing case be made. In the case of putative gene transfers between prokaryotes and eukaryotes, the minimum requirements for inferring such an event include (1) sequences of the transferred gene or its product from several appropriately divergent eukaryotes and several prokaryotes, and (2) a similar set of sequences from the same (or closely related organisms) for another gene or genes. Given these criteria, we believe that a strong case can be made for Escherichia coli having acquired a second glyceraldehyde-3-phosphate dehydrogenase gene from some eukaryotic host. Ancillary observations on the general rate of change and the time of the prokaryote-eukaryote divergence support the notion.
Methods in Enzymology | 1996
Da-Fei Feng; Russell F. Doolittle
Publisher Summary This chapter discusses the progressive alignment of amino acid sequences and construction of phylogenetic trees from them and a much improved, easier-to-use set of programs, which is referred to as ProPack, a packet of programs centering on progressive alignment. The changes pertain mainly to speeding up the calculations and reducing the number of manual operations; the basic idea of progressive alignment remains the same. The principal new features of ProPack are introduction of an option for choosing an alternative amino acid substitution matrix, automatic elimination of negative branch lengths in the calculation of phylogenetic trees, addition of a bootstrap analysis option, and inclusion of a simple program for converting output to a form that can be used to draw trees on a microcomputer with standard software. The bootstrapping is restricted to solutions with no negative branch lengths, a necessary condition if the likelihoods are to have any meaning.
Methods in Enzymology | 1990
Russell F. Doolittle; Da-Fei Feng
Publisher Summary The PAPA approach to finding a phylogeny by a parsimony, or common ancestor, procedure begins with the same progressive alignment employed with a matrix method. A listing of the five segments for each four-taxon tree is saved and used for the construction of the overall tree. In this regard, because the topology is already known from the nearest-neighbor analysis, segments can be assigned directly, beginning with nearest neighbors and then adding nested interior segments progressively. The starting file for the PAPA program contains all the sequences to be considered, consecutively, but with neutral elements (Xs) in all places where gaps were inserted by the progressive alignment. The PAPA programs have been applied to a wide variety of phylogenies that had previously been determined by a matrix method after progressive alignment. For the most part, the agreement has been excellent.
Journal of Molecular Evolution | 1997
Da-Fei Feng; Russell F. Doolittle
Abstract. Amino acid substitution tables are essential for the proper alignment of protein sequences, and alignment scores based on them can be transformed into distance measures by various means. In the simplest case, the negative log of the score is used. This Poisson relationship assumes that all sites are equally likely to change, however. A more accurate relationship would correct for different rates of change at each residue position. Recently, Grishin (J. Mol. Evol. 41:675–679, 1995) published a set of simple equations that correct for various circumstances, including different rates of change at different sites. We have used these equations in conjunction with similarity scores that take into account constraints on amino acid interchange. Simulation studies show a linear relationship between these calculated distances and the numbers of allowed mutations based on the observed variation of rate at all sites in various proteins.
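The simplest transformation mentioned in this abstract, the negative log of the score, can be written as a short function. This sketch assumes the similarity score has already been normalized to the interval (0, 1]; the normalization itself and the site-rate corrections attributed to Grishin are not reproduced here.

```python
import math


def poisson_distance(similarity):
    """Distance as the negative natural log of a normalized similarity.

    Assumes 0 < similarity <= 1 and that all sites change at the same
    rate, which is the limitation the abstract points out.
    """
    if not 0.0 < similarity <= 1.0:
        raise ValueError("similarity must be in the interval (0, 1]")
    return -math.log(similarity)


if __name__ == "__main__":
    for s in (1.0, 0.5, 0.2):
        print(s, round(poisson_distance(s), 3))
```

Identical sequences (similarity 1) give a distance of zero, and the distance grows without bound as the similarity approaches zero.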
The Biological Bulletin | 1999
Russell F. Doolittle; Da-Fei Feng; Glen Cho
ported was a divergence time for prokaryotes and eukaryotes of only slightly more than two billion years ago. In line with this result, the average resemblance of these enzymes between Bacteria (a.k.a. eubacteria) and Eukarya (a.k.a. eukaryotes) was 37% identity. The recency of this divergence time was criticized by others on several grounds, particularly with regard to the way we converted sequence-based distances into evolutionary time. We had used a modified Poisson relationship and applied a post-hoc correction factor. The distance data were themselves calibrated on the basis of divergence times drawn from the fossil record and mostly involved sequences from vertebrate animals. The study has now been expanded and the data reanalyzed with a distance measure that corrects for both constraints on amino acid interchange and variation in substitution rate at different sites (2). The validity of the method was checked by an in-depth simulation study (3). Beyond that, the number of enzyme sets compared was increased to 64 and the total number of sequences used was 823 (4). Interestingly, the average sequence resemblance between Bacteria and Eukarya held steady at 37% identity. The study was greatly enriched by the availability of complete genome sequences for several eubacteria and an archaebacterium. The latter not only expanded the data set but also had a great impact on the interpretation of certain aspects of the data. As it happens, the majority of the