R. D. Blake
University of Maine
Publication
Featured research published by R. D. Blake.
Bioinformatics | 1999
R. D. Blake; Jeff W. Bizzaro; Jonathan D. Blake; G. R. Day; S. G. Delcourt; J. Knowles; Kenneth A. Marx; J. SantaLucia
MOTIVATION: MELTSIM is a Windows-based statistical mechanical program for simulating, with great accuracy, the melting curves of DNAs of known sequence and genomic dimensions under different conditions of ionic strength. The program is useful for mapping variations in the base composition of sequences, conducting studies of denaturation, establishing appropriate conditions for hybridization and renaturation, and determining sequence complexity and sequence divergence. RESULTS: Good agreement is achieved between experimental and calculated melting curves of plasmid, bacterial, yeast and human DNAs. Denaturation maps that accompany the calculated curves indicate that non-coding regions have a significantly lower (G+C) composition than coding regions in all species examined. Curves of partially sequenced human DNA suggest the current database may be heavily biased toward coding regions and may exclude large (A+T)-rich elements. AVAILABILITY: MELTSIM 1.0 is available at: //www.uml.edu/Dept/Chem/UMLBIC/Apps/MELTSIM/MELTSIM-1.0-Win/meltsim.zip. Melting curve plots in this paper were made with GNUPLOT 3.5, available at: http://www.cs.dartmouth.edu/gnuplot_info.html. CONTACT: [email protected]
Journal of Molecular Evolution | 1992
R. D. Blake; Samuel T. Hess; Janice Nicholson-Tuell
Summary. The numbers and local sequence environments of the two types of substitution mutation plus additions and deletions have been obtained directly in this study from differences between a large number of extant primate gene and pseudogene sequences. A total of 3786 mutations were scored in regions where the similarity between pseudogene and corresponding gene sequences is ≥ 85%, comprising ∼30% of the pseudogene database of 80,584 bp. The pattern of mutations obtained in this fashion is almost identical to that obtained by Li et al. (1984) using a slightly different, more direct approach and with a smaller database. When mutations were scored, the neighbor pairs on the 5′ and 3′ sides were also noted, leading to a large 16 × 12 matrix of transitions and transversions. Biases of varying magnitude are found in the rates of substitution of the same base pair in different local sequence environments. The overall order for the effect of the 5′ neighbor on the rates of substitution mutation of a pyrimidine is A > C ≫ T > G, and G > A > T > C for the 3′ neighbor, where these results represent the average of substitution rates for the complement purine with complement neighbors of bases ordered above. The order for the 3′ neighbor is essentially the same for the two transitions and most of the four transversions as well; however, the order for the 5′ neighbor is more variable. The overall rate for the C · G → T · A transition is not unusual; however, the presence of a 3′ neighboring G · C pair boosts the rate substantially, presumably due to specific cytosine methylation of the CG doublet in primate DNAs. The rate of the T · A → C · G transition is also well above average when the 3′ neighbor is an A · T, and to a lesser extent a G · C, pair. The latter bias is typical in that it reflects the association of alternating pyrimidine-purine sequences with increasing mutation rates.
The substitution of the pyrimidine in a 5′ purine-pyrimidine-purine 3′ sequence generally occurs much faster than in a pyrimidine tract and points to the local conformation as a major determining factor of the substitution rate. An apparent inverse relationship is found between starting and product doublet frequencies of base pairs undergoing mutations with specific 3′ neighbors, indicating that differences in intrinsic substitution rates of base pairs with specific neighbors are a key factor in producing the familiar biases of nearest-neighbor frequencies.
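The scoring of substitutions together with their 5′ and 3′ neighbors, as described in the summary above, can be illustrated with a minimal sketch. It is not the authors' program: it assumes the two sequences are already aligned to equal length, treats the gene as the starting sequence, and skips sites whose neighbors also differ. The function names `score_substitutions` and `is_transition` are hypothetical.

```python
from collections import Counter

BASES = set("ACGT")
PURINES = {"A", "G"}


def is_transition(old: str, new: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (old in PURINES) == (new in PURINES)


def score_substitutions(gene: str, pseudogene: str) -> Counter:
    """Count substitutions between two pre-aligned sequences.

    Keys are (5' neighbor, original base, new base, 3' neighbor).
    Assumption: the gene sequence is taken as the starting state.
    """
    if len(gene) != len(pseudogene):
        raise ValueError("sequences must be aligned to equal length")
    gene, pseudogene = gene.upper(), pseudogene.upper()
    counts = Counter()
    for i in range(1, len(gene) - 1):
        old, new = gene[i], pseudogene[i]
        if old == new or old not in BASES or new not in BASES:
            continue  # no change, or a gap/ambiguous symbol
        left, right = gene[i - 1], gene[i + 1]
        if left not in BASES or right not in BASES:
            continue  # neighbor is a gap or ambiguous
        if pseudogene[i - 1] != left or pseudogene[i + 1] != right:
            continue  # neighbor changed too, so the context is ambiguous
        counts[(left, old, new, right)] += 1
    return counts


if __name__ == "__main__":
    result = score_substitutions("ACGTTACGT", "ATGTTACAT")
    for (left, old, new, right), n in sorted(result.items()):
        kind = "transition" if is_transition(old, new) else "transversion"
        print(f"5'-{left}[{old}->{new}]{right}-3'  {n}  {kind}")
```

Tallying these keys over many aligned pairs yields a table of neighbor context by substitution type of the kind the summary refers to.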
Journal of Molecular Evolution | 1995
Sohail A. Qureshi; R. D. Blake
The (G + C) distribution and the presence and amounts of repetitive sequence families in the white-tailed deer (Odocoileus virginianus) have been examined. The distribution ranges from 20 to 70% (G + C) and shows four distinct repeat families. A 0.7-kb family, DII, corresponds to satellite II in domestic bovids—ox, sheep, and goat—and was singled out for detailed characterization. DII has a prototypic repeat of 67% (G + C), consists of 25,000 tandem copies, and contributes 1.7% to the genomic DNA. Sequencing and electrophoretic analysis indicate a repeat length of 691 bp. These characteristics are similar to those of the bovid satellite II families as well as to those of other cervids that we have examined. The intraspecific sequence divergence within this family has a variance of only 2.5 ± 0.3%.
Journal of Biomolecular Structure & Dynamics | 1986
R. D. Blake; S. Earley
The mean (G + C) composition (51.0%) and standard deviation (± 3.8%) of published DNA sequences accounting for 10% of the E. coli genome are in excellent agreement with the principal overall distribution determined by high resolution melting. While differences in base and neighbor characteristics are small and uniform throughout all regions of the genome, it is found that the (G + C) content of sequences varies in segmented fashion within boundaries corresponding to coding (53% G + C) and noncoding (46% G + C) regions, with variances in the latter being six-fold greater than in coding regions. The variance in different regions shows a strong negative dependence on (G + C) content of the region, reflecting the condition that A-T and G-C base pairs are preferred neighbors of A-T and C-G pairs, respectively, with the bias increasing with decreasing (G + C) content. Neighbor analysis indicates the most extreme positive biases occur in AA, TT, GC and CG throughout all regions, but particularly in noncoding regions. Extraordinary numbers of oligomeric strings of (A)n, etc., are the further consequence of this bias. These and other characteristics point to the existence of inherent biases in neighbor frequencies levied during replication or repair, and which reflect, in turn, neighbor influences during mutation. The bias in codon usage noted by Grantham and others is seen here as due, in part, to the adaptation of coding sequences to this microenvironment through selection among synonymous codons so as to preserve inherent neighbor biases.
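One common way to express a neighbor bias of the kind discussed above is the ratio of the observed doublet count to the count expected from base composition alone. The sketch below uses that ratio as an assumption; the abstract does not state which statistic the authors used, and the function names are hypothetical.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    total = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / total


def neighbor_bias(seq: str) -> dict:
    """Observed/expected ratio for each doublet present in seq.

    Expected counts assume bases are placed independently with the
    sequence's own base frequencies (an illustrative null model).
    """
    seq = seq.upper()
    base_counts = Counter(b for b in seq if b in "ACGT")
    total = sum(base_counts.values())
    pair_counts = Counter(
        a + b for a, b in zip(seq, seq[1:]) if a in "ACGT" and b in "ACGT"
    )
    n_pairs = sum(pair_counts.values())
    bias = {}
    for pair, observed in pair_counts.items():
        p_first = base_counts[pair[0]] / total
        p_second = base_counts[pair[1]] / total
        bias[pair] = observed / (p_first * p_second * n_pairs)
    return bias


if __name__ == "__main__":
    toy = "AAAATTTT"
    print(f"G+C fraction: {gc_content(toy):.2f}")
    for pair, ratio in sorted(neighbor_bias(toy).items()):
        print(pair, round(ratio, 3))
```

A ratio above 1 marks a doublet that occurs more often than composition predicts, and a ratio below 1 marks one that is under-represented.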
Journal of Biomolecular Structure & Dynamics | 1984
R. D. Blake; Philip W. Hinds
Fifty-three gene sequences from E. coli containing 18,288 reading frame triplets have been characterized according to the nature and level of average codon preference. The distribution of average preferences is bimodal, with approximately half the genes using an average of only 36 codons, and the remainder just 42 codons. There is a high correlation between the level of codon bias, the tRNA population and the abundance of protein product, indicating biased patterns are exploited by the cell for the production of widely different levels of gene product. This relationship is especially striking in genes involved in the production of components for transcription and translation. Overall, the genes for these processes generate some five-fold more protein than the average in the genome, and use about five fewer codons. The very high codon bias found in the RNA polymerase gene thus provides a simple, autogenous mechanism for the coordinate synthesis of these components and RNA polymerase. A surprisingly high level of codon probability is also found in triplets of the complement of coding sequences. This is apparently due to the evolutionary dispersion of coding sequences and/or the requirement for increased levels of secondary structure in messenger RNAs.
Journal of Biomolecular Structure & Dynamics | 1993
Kenneth A. Marx; Samuel T. Hess; R. D. Blake
D. discoideum, the slime mold, has one of the most AT-rich eukaryotic genomes known. In this paper we examine this organism's database for overlapping N-tuples of high frequency and find that A and T tracts have among the highest frequencies in flanking sequences but not in coding sequences. We examined both overlapping and non-overlapping frequencies of the A, T, G and C homopolymer tracts of 2 < N < 6. Overlapping (dG).(dC) and (dA).(dT) tracts occur at greater frequencies than expected, based on random occurrence. Long (dA).(dT) tracts of N > 10 occur at well above expected frequencies in flanking and intron regions, while (dG).(dC) tracts above N = 5 are rarely found. Some of the implications of these findings for tract origins in slip-strand replication and for chromatin structure are discussed.
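The distinction between overlapping and non-overlapping tract counts can be made concrete with a short sketch. The definitions here are assumptions, since the abstract does not spell them out: an overlapping count includes every window of length N inside a run, while a non-overlapping count tiles each run with disjoint windows. The function names are hypothetical.

```python
import re


def maximal_tracts(seq: str, base: str) -> list:
    """Lengths of all maximal runs of `base` in seq, in order."""
    pattern = re.escape(base.upper()) + "+"
    return [len(m.group()) for m in re.finditer(pattern, seq.upper())]


def tract_counts(seq: str, base: str, n: int) -> tuple:
    """Return (overlapping, non_overlapping) counts of length-n tracts.

    Assumed definitions: a run of length r holds r - n + 1 overlapping
    windows and r // n disjoint windows of length n.
    """
    runs = maximal_tracts(seq, base)
    overlapping = sum(r - n + 1 for r in runs if r >= n)
    non_overlapping = sum(r // n for r in runs)
    return overlapping, non_overlapping


if __name__ == "__main__":
    toy = "AAAAATTAAA"
    print(maximal_tracts(toy, "A"))
    print(tract_counts(toy, "A", 3))
```

Comparing either count with the number expected for a random sequence of the same composition would give the over-representation the abstract reports.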
Journal of Biomolecular Structure & Dynamics | 1984
Philip W. Hinds; R. D. Blake
Oligonucleotide and codon frequencies have been determined in published sequences of E. coli DNA totaling 103,100 bp with 18,459 reading frame trinucleotides, corresponding to 2.5% of the total genome. Dinucleotide frequencies are in excellent agreement with those determined by nearest neighbor chemical analysis, indicating the computer count of a limited sampling to be a good representation of the overall frequencies in total genomic DNA. The distinctive nonrandom codon pattern is found to be uniformly distributed and contributes to a distinctive nonrandom oligonucleotide pattern, enabling correlations between frequency levels to be extended beyond reading frame sequences. Correlation analysis indicates a surprisingly high degree of correlation everywhere in the genome. Coefficients of correlation between oligonucleotide frequencies overall and those in specific segments vary as follows: primary strands of individual coding sequences > 0.9 > lambda DNA > noncoding, non-RNA > phi X174 DNA > complementary strands > RNA genes ≅ 0.6 > transposon-insertion elements > T7 DNA ≫ eukaryotic sequences ≅ 0. It is concluded that this high degree of oligonucleotide and codon correspondence in E. coli reflects the widespread distribution of remnants of an early and slowly changing codon pattern that has been continually dispersed by duplication-divergence processes, leading to the present genome.
Journal of Biomolecular Structure & Dynamics | 1994
Kenneth A. Marx; Samuel T. Hess; R. D. Blake
It has been shown that the frequency versus size distribution of A and T overlapping and non-overlapping homopolymer tracts of N > 5 in D. discoideum gene flanking and intron regions is significantly greater than in coding regions (1). In the present report, we demonstrate that a spatial periodicity exists in long A and T tracts (N > 10) in long flanking sequences by scored alignments of those tracts with the nucleosomal repeat. A tract spacing was found at 185-190 bp that corresponds to a maximum alignment score. This is exactly the average spacing of D. discoideum nucleosomes determined experimentally. A majority of A and T tracts in flanking sequences are spaced by short DNA stretches, and the total length of adjacent A and T tracts plus the interrupting short DNA stretch corresponds closely to the average experimentally measured nucleosomal linker DNA size in D. discoideum, 42 bp. These data suggest a model in which A and T runs of N > 10 bp in flanking DNA of D. discoideum are organized in a regular phase with nonhomopolymer sequences along the DNA. This model has functional implications for A and T tracts, suggesting that they are found in nucleosomal linker DNA regions of chromatin during some necessary portion(s) of the life of the cell.
Journal of Molecular Evolution | 1997
R. D. Blake; Jackie Zhehong Wang; Laurent Beauregard
Abstract. High-resolution derivative melting was used to obtain detailed distributions of local (G + C) contents in a number of ruminant DNAs. Profiles over low (G + C) regions [20–36% (G + C)] are congruent for all ruminants. This region represents 45–50% of the nuclear DNA content and primarily contains intergenic and intron sequences. The high (G + C) region, where most coding sequences are found [38–68% (G + C)], is marked by satellite bands denoting the presence of transcriptionally inert, tandemly repetitive sequence families. These bands can be analyzed for the abundance, base composition, and sequence divergence of satellite families with relatively high precision. Band patterns are unique to each species; even closely related species can be readily distinguished by their base distribution profiles. Variations in nuclear DNA contents in ruminants, determined by flow cytometry, are primarily due to variations in abundances of these repetitive sequence families. Thus, A. alces (moose) is found to have 8.85 ± 0.2 pg DNA/cell, 25% more than the average in ruminants, while the base distribution curve indicates the presence of an unusually abundant satellite of 52.6% (G + C). The size (1 kb) and sequence of this satellite correspond to satellite-I of other cervids, and in consequence it is designated Alces-I. The sequence of a cloned repeat of Alces-I has a length of 968 bp, a (G + C) content of 52.6%, and contributes 35%, or almost 3 million copies, to the nuclear DNA, exceeding by ∼300% the average array size of this repeat family in related cervids. In situ hybridization indicates the repeat is distributed throughout centromeric regions of all 62 acrocentric autosomes. Alces-I has much greater-than-expected numbers of GG, GA, and AG and far fewer TA and CG duplets, characteristics of all tandem repeats.
The sequence is judged to be orthologous with satellite-I sequences from Rangifer tarandus (caribou), Capreolus capreolus (roe deer), Muntiacus muntjac (Indian muntjac) and Muntiacus reevesi (Chinese muntjac), as well as Antilocapra americana (pronghorn), and the bovids Bos taurus and Ovis aries. A tentative tree for the five cervids is in excellent agreement with one proposed on the basis of morphological characteristics. Differences from a consensus sequence indicate transversions exceed transitions by almost twofold, suggesting that substitutions occur randomly, or nearly so.
Computational Biology and Chemistry | 1983
Gary R. Day; R. D. Blake
Abstract. Programs are described for the efficient analysis of large numbers of DNA sequences for the recurrence of particular short segments. These programs include routines for the determination of oligonucleotide frequencies, and for the analysis of perfect and imperfect palindromic sequences with either mirror or two-fold symmetry, as well as of inverted repeat sequences at remote loci. Other routines are included for locating perfect or imperfect matching sequences of any length and of either specified or unspecified sequences in the same or in different molecules. The latter routines are especially useful for locating repeated sequences obscured by random divergence. A number of additional programs are provided for more basic analysis or for cosmetic management of sequences. A routine is included for generating a numbered listing of primary and complementary strands, in selectable order, from an unformatted database. Another routine plots, against sequence length, the variation in percent G-C or the variation of a particular nearest neighbor or collection of nearest neighbors. Still other routines allow for the determination of translation reading frames and restriction maps.
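Three of the routines named in the abstract above (oligonucleotide frequencies, palindromes with two-fold symmetry, and percent G-C along the sequence) can be sketched in a few lines of modern Python. This is an illustration only, not a reconstruction of the original 1983 programs; all function names and the fixed-window approach to percent G-C are assumptions.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complementary strand read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def oligo_frequencies(seq: str, k: int) -> Counter:
    """Counts of every overlapping k-mer in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def twofold_palindromes(seq: str, k: int) -> list:
    """(position, site) for each k-mer equal to its reverse complement."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - k + 1):
        site = seq[i:i + k]
        if site == reverse_complement(site):
            hits.append((i, site))
    return hits


def windowed_gc(seq: str, window: int, step: int) -> list:
    """(start, percent G+C) for fixed-size windows along seq."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = sum(base in "GC" for base in chunk)
        profile.append((start, 100.0 * gc / window))
    return profile


if __name__ == "__main__":
    toy = "CCGAATTCGG"
    print(oligo_frequencies(toy, 2).most_common(3))
    print(twofold_palindromes(toy, 6))
    print(windowed_gc(toy, 5, 5))
```

The `windowed_gc` output is a list of points that could be passed to any plotting tool to show the variation in percent G-C with sequence position.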