Publications


Featured research published by Oliver Clay.


Gene | 1996

The gene distribution of the human genome

Serguei Zoubak; Oliver Clay; Giorgio Bernardi

Linear correlations exist between the GC levels of third codon positions (GC3) of individual human genes and the GC levels of long genomic sequences and DNA molecules (50-100 kb in size) embedding the genes. These linear relationships allow the positioning of the GC3 histogram of cDNA sequences from the databases relative to the CsCl profile of human DNA. In turn, this allows an estimate of the relative concentrations of genes in genomic regions of different GC content. An estimate obtained by using current sequence data and Gaussian decompositions of the GC3 histogram and of the CsCl profile indicates that the GC-richest (non-ribosomal) component of the human genome is at least 17 times as gene-rich as the GC-poor regions. Moreover, our results suggest that the most recent physical maps of the human genome consisting of overlapping YACs cover less than 50% of the genes.
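The abstract states that the GC3 of individual genes correlates linearly with the GC level of the 50-100 kb genomic sequences embedding them. As a minimal sketch, assuming an ordinary least-squares fit (the abstract does not say how the lines were fitted), such a relation can be estimated and then inverted to place a GC3 value on the genomic-GC scale. The function names are hypothetical, and the inputs would be paired (genomic GC, GC3) values for a set of genes.

```python
import math
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y = slope * x + intercept; also returns Pearson r."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def genomic_gc_for_gc3(gc3: float, slope: float, intercept: float) -> float:
    """Invert a fitted GC3-vs-genomic-GC line to map a GC3 value onto genomic GC."""
    return (gc3 - intercept) / slope
```

Here x is the genomic GC level and y is GC3; mapping every bin of a GC3 histogram through `genomic_gc_for_gc3` is one way to line that histogram up against a genomic GC (CsCl-derived) profile.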


FEBS Letters | 2002

A compact view of isochores in the draft human genome sequence

Adam Pavlicek; Jan Pačes; Oliver Clay; Giorgio Bernardi

Prior to genome sequencing, information on base composition (GC level) and its variation in mammalian genomes could be obtained using density gradient ultracentrifugation. Analyses using this approach led to the conclusion that mammalian genomes are organized into mosaics of fairly homogeneous regions, called isochores. We present an initial compositional overview of the chromosomes of the recently available draft human genome sequence, in the form of color‐coded moving window plots and corresponding GC level histograms. Results obtained from the draft human genome sequence agree well with those obtained or deduced earlier from CsCl experiments. The draft sequence now permits the visualization of the mosaic organization of the human genome at the DNA sequence level.
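Moving-window GC plots and GC histograms of the kind described above can be sketched as follows. Window size, step, and histogram bin width are free parameters here; the values used for the published plots are not given in this abstract. Excluding N (gap) characters from the denominator is an assumption made for draft-quality sequence.

```python
import math
from typing import Dict, List, Optional, Sequence


def window_gc(seq: str, window: int, step: int) -> List[Optional[float]]:
    """GC level, (G+C)/(A+C+G+T), in each window; None if a window has no A/C/G/T."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    levels: List[Optional[float]] = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        acgt = gc + chunk.count("A") + chunk.count("T")
        # Ns (unsequenced gaps in a draft assembly) are left out of the denominator.
        levels.append(gc / acgt if acgt else None)
    return levels


def gc_histogram(levels: Sequence[Optional[float]], bin_width: float = 0.01) -> Dict[float, int]:
    """Count windows per GC bin, keyed by the lower bin edge; empty windows are skipped."""
    counts: Dict[float, int] = {}
    for level in levels:
        if level is None:
            continue
        # Small epsilon guards against float round-off pushing a value into the bin below.
        edge = round(math.floor(level / bin_width + 1e-9) * bin_width, 6)
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))
```

Plotting `window_gc` output along a chromosome, with each window colored by its GC bin, gives a color-coded moving-window view; `gc_histogram` on the same values gives the corresponding histogram.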


Gene | 2002

Compositional patterns in reptilian genomes.

Sandrine Hughes; Oliver Clay; Giorgio Bernardi

Sauropsids form a complex group of vertebrates including squamates (lizards and snakes), turtles, crocodiles, sphenodon and birds (which are often considered as a separate class). Although avian genomes have been relatively well studied, the genomes of the other groups have remained only sparsely characterized. Moreover, the nuclear sequences available in databanks are still very limited. In the present study, we have analysed the compositional patterns, i.e. the GC (molar fraction of guanine and cytosine in DNA) distributions, of 31 reptilian (particularly snake) genomes by analytical ultracentrifugation of DNAs in CsCl gradients. The profiles were characterized by their modal buoyant density ρ₀, mean buoyant density ⟨ρ⟩, asymmetry ⟨ρ⟩ − ρ₀, and heterogeneity H. The modal buoyant density distribution of reptilian DNAs clearly distinguishes two groups. The snakes fall in the same range of modal densities as most mammals, whereas crocodiles, turtles and lizards show higher values (>1.700 g/cm³). As far as the more important compositional properties of asymmetry and heterogeneity are concerned, previous studies showed that amphibians and fishes share relatively low values, whereas birds and mammals are characterized by highly heterogeneous and asymmetric patterns (with the exception of Muridae, which have a lower heterogeneity). The present results show that the snake genomes cover a broad range of asymmetry and heterogeneity values, whereas the genomes of crocodiles and turtles cover a narrow range that is intermediate between those of fishes/amphibians and those of mammals/birds.
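The profile descriptors named above (modal density ρ₀, mean density ⟨ρ⟩, asymmetry ⟨ρ⟩ − ρ₀) can be computed from a sampled CsCl profile as sketched below. The density-to-GC conversion uses the widely cited Schildkraut-Marmur-Doty relation ρ = 1.660 + 0.098 × GC (GC as molar fraction), which is an assumption here rather than something stated in the abstract. Heterogeneity H is not computed because its definition is not given in this text.

```python
from typing import Sequence, Tuple

# Assumed linear relation between CsCl buoyant density (g/cm^3) and GC molar
# fraction: rho = 1.660 + 0.098 * GC (Schildkraut, Marmur & Doty, 1962).
RHO_AT_ZERO_GC = 1.660
RHO_PER_UNIT_GC = 0.098


def gc_from_density(rho: float) -> float:
    """Convert a buoyant density (g/cm^3) to a GC molar fraction."""
    return (rho - RHO_AT_ZERO_GC) / RHO_PER_UNIT_GC


def profile_stats(densities: Sequence[float],
                  absorbances: Sequence[float]) -> Tuple[float, float, float]:
    """Return (modal density, mean density, asymmetry = mean - modal) of a sampled profile."""
    if len(densities) != len(absorbances) or not densities:
        raise ValueError("need equal-length, non-empty density and absorbance lists")
    total = sum(absorbances)
    if total <= 0:
        raise ValueError("absorbances must sum to a positive value")
    # Mean density: absorbance-weighted average over the profile.
    mean = sum(d * a for d, a in zip(densities, absorbances)) / total
    # Modal density: the sampled density with the highest absorbance.
    modal = densities[max(range(len(absorbances)), key=lambda i: absorbances[i])]
    return modal, mean, mean - modal
```

Under the assumed relation, the 1.700 g/cm³ boundary mentioned above corresponds to a GC level of about 40.8% (`gc_from_density(1.700)`). A positive asymmetry means the profile has a heavier GC-rich tail than its mode.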


Gene | 2001

Standard deviations and correlations of GC levels in DNA sequences

Oliver Clay

In a DNA sequence that exhibits long-range correlations, standard deviations among the GC levels of its segments can be up to an order of magnitude higher than in a sequence consisting of independent, identically distributed nucleotides. Conversely, plots of inter-segment standard deviations vs. segment length reveal quantitative information about the correlations present in a sequence. We present and discuss formulae that relate long-range (power-law) correlations between the nucleotides of a sequence to the expected standard deviations of the GC levels of its segments, and to the correlations between them.
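The comparison the abstract describes, observed inter-segment standard deviations versus what independent, identically distributed nucleotides would give, can be sketched as below. The iid baseline σ = √(p(1 − p)/n) is the standard binomial result for segments of length n with overall GC level p; it is not the paper's long-range-correlation formulae, which this abstract does not reproduce. Using the population SD and non-overlapping segments are choices made for this sketch.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def _clean(seq: str) -> str:
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("this sketch expects only A/C/G/T")
    return seq


def segment_gc_levels(seq: str, n: int) -> List[float]:
    """GC levels of consecutive non-overlapping segments of length n (remainder dropped)."""
    seq = _clean(seq)
    return [
        (seq[i:i + n].count("G") + seq[i:i + n].count("C")) / n
        for i in range(0, len(seq) - n + 1, n)
    ]


def inter_segment_sd(seq: str, n: int) -> float:
    """Population standard deviation of segment GC levels."""
    levels = segment_gc_levels(seq, n)
    if len(levels) < 2:
        raise ValueError("need at least two segments")
    return statistics.pstdev(levels)


def iid_expected_sd(p: float, n: int) -> float:
    """Expected SD of segment GC levels for iid nucleotides with GC probability p."""
    return math.sqrt(p * (1 - p) / n)


def sd_vs_length(seq: str, lengths: Sequence[int]) -> List[Tuple[int, float, float]]:
    """(n, observed SD, iid expectation) for each segment length n."""
    seq = _clean(seq)
    p = (seq.count("G") + seq.count("C")) / len(seq)
    return [(n, inter_segment_sd(seq, n), iid_expected_sd(p, n)) for n in lengths]
```

For iid sequences the observed/expected ratio stays near 1 and the SD falls as n^(-1/2) on a log-log plot; a ratio well above 1 and a shallower slope are the usual signatures of long-range correlations.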


Biochemical and Biophysical Research Communications | 2008

GC level and expression of human coding sequences

Stilianos Arhondakis; Oliver Clay; Giorgio Bernardi

Several groups have addressed the issue of the influence of GC on expression levels in mammalian genes. In general, GC-rich genes appeared to be more expressed than GC-poor ones. Recently, expression levels of GC3-rich and GC3-poor versions of genes (GC3 is the GC level of third codon positions), inserted in vector plasmids, were compared in order to eliminate differences associated with their genomic context. Transfection experiments showed that GC3-rich genes were expressed more efficiently than their GC3-poor counterparts, indicating that GC3 dramatically and intrinsically boosts expression efficiency. Here we show that, while the protocols used eliminated the original genomic context, they replaced it with the plasmid contexts whose compositional properties affected the results.
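GC3, the GC level of third codon positions, is the quantity compared across gene versions above. A minimal way to compute it for an in-frame coding sequence is sketched below; ignoring an incomplete trailing codon, skipping non-ACGT bases, and keeping stop codons are choices of this sketch, not details from the paper.

```python
def gc3(cds: str) -> float:
    """GC level of third codon positions of an in-frame coding sequence.

    Incomplete trailing codons are ignored, and third positions that are not
    A/C/G/T (e.g. N) are left out of the denominator. Stop codons are kept.
    """
    cds = cds.upper()
    full = len(cds) - len(cds) % 3
    thirds = [cds[i + 2] for i in range(0, full, 3)]
    counted = [b for b in thirds if b in "ACGT"]
    if not counted:
        raise ValueError("no A/C/G/T at third codon positions")
    return sum(b in "GC" for b in counted) / len(counted)
```

For example, `gc3("ATGGCCAAA")` looks at G, C and A and returns 2/3.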


FEBS Letters | 2002

Transposable elements encoding functional proteins: pitfalls in unprocessed genomic data?

Adam Pavlicek; Oliver Clay; Giorgio Bernardi

The contribution of transposable elements (TEs), including Alus, to human coding sequences has recently been reported to be high, 4% (1.3% Alus) out of 13 799 sequences [1,2]. This is surprising, because previous examinations had revealed only very few repeats, and almost no Alus, in coding sequences [3,4,25]. Since extreme caution about input data has been suggested [5–7], we examined the database of [1] and found that many (~30%) of its TE-containing sequences or their protein products are defined as ‘hypothetical’, and 63% (421/669 sequences) are annotated as ‘predicted, without experimental evidence or records without final NCBI revision’. Such a dataset is likely to contain several sequences that remain untranscribed, and more that remain untranslated. Not even experimental validation [8], let alone computer prediction of functional genes, is foolproof: the errors in coding sequence databases such as those used in [1] may well amount to 1–2% or more. Essentially all reported coding regions derived from Alus, or containing alternatively spliced Alus, have been detected at the RNA (cDNA) level, instead of at the protein level [3,9]. In eukaryotic cells, there is a significant turnover of RNA, and several steps of quality control exist for the synthesized RNA in both nucleus and cytoplasm [10–14]. mRNAs with an aberrant 3′ end are generally retained and/or degraded at their site of transcription [15], and the majority of stable RNA polymerase II transcripts remain in the nucleus as ‘junk’ RNA, so they never reach the cytoplasm [10]. The minority of transcripts that are successfully exported from the nucleus undergo additional check(s) during their translation. For example, there are specialized degradation mechanisms for transcripts having premature stop codons or lacking termination codons, which prevent the creation of aberrant, potentially pathogenic proteins [11,13,16].
Thus, even detection of a transcript at the mRNA (cDNA) level cannot guarantee that these mRNAs are ever translated into stable proteins. As has been summarized in the light of growing evidence [17], ‘mRNA abundance is a poor indicator of the levels of the corresponding protein’, yet ‘it is the proteome that determines cell phenotype’: the transcriptome does not faithfully represent the proteome. Furthermore, to become a viable protein, a transcript must (after its accurate translation and possible post-translational modification) resist degradation until it can serve its functional role at the site of its required action. These facts underline the importance of detection at the protein level for elucidating whether SINEs or other repeats contribute to true coding sequences in humans or mice. The most accurate sources of proteins are 3D structure databases and direct amino acid sequencing. Out of 781 non-redundant human proteins from a 3D database or determined at the amino acid level that we extracted from [18] (mean length 404 aa; including some fragments, but neglecting all peptides shorter than 50 aa or having ≥70% identity) and compared to human repeats in RepBase [19] using TFASTX [20], we found no Alu-related protein domain (the best hit has an E-value of 0.5). Twenty-eight apparently significant hits with E-values under 0.01 were detected, but mainly from protein-coding elements (DNA transposons and LINE1). When cDNAs encoding these 28 proteins were extracted and searched by RepeatMasker [21], no interspersed repeats were detected. In addition, the similarity regions that had been reported by TFASTX were also found in other vertebrate orthologs. In summary, we did not detect any repeat sequence in our dataset of 781 protein sequences.
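The two numeric screening criteria stated above (neglect peptides shorter than 50 aa; treat E-values under 0.01 as apparently significant) can be sketched as a simple filter. `ProteinHit` is a hypothetical record type; the redundancy filtering by sequence identity and the actual TFASTX and RepeatMasker runs are not reproduced, and parsing their output is left out.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Thresholds taken from the text: peptides shorter than 50 aa are neglected,
# and hits with E-values under 0.01 are treated as apparently significant.
MIN_LENGTH_AA = 50
EVALUE_CUTOFF = 0.01


@dataclass(frozen=True)
class ProteinHit:
    protein_id: str     # hypothetical identifier
    length_aa: int      # protein length in amino acids
    best_evalue: float  # best E-value against the repeat library (parsing not shown)


def screen(hits: Iterable[ProteinHit]) -> List[ProteinHit]:
    """Drop short peptides, then keep only proteins with an apparently significant repeat hit."""
    return [
        h for h in hits
        if h.length_aa >= MIN_LENGTH_AA and h.best_evalue < EVALUE_CUTOFF
    ]
```

As the text stresses, passing this filter only flags a candidate; each flagged protein still has to be checked (for example by searching its cDNA for interspersed repeats and comparing orthologs) before any repeat-derived coding sequence can be claimed.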
In 1994, it was pointed out [5] that a discovery of a translated Alu element(s) in a functional part of a functional human protein ‘would represent the first report of its kind and would have important evolutionary implications’. Despite the 7 years since this challenge, confirmed cases of Alu-containing sequences that encode a functional protein still remain extremely elusive. The paucity of documented examples is a good indication that proteins are unlikely to utilize domains encoded by Alus for functional ends. The reluctance to accept this view is understandable, given the huge proportion of interspersed repeats in the human genome (around 45% [4]): in principle, at least some of them might have been recruited for functional purposes at the protein level. The great majority of previously detected repeat-derived coding sequences comes, however, from protein-coding repeats, and particularly from DNA transposons [4,25]. LINEs are less common in coding sequences and only a few Alus had been identified prior to the analysis of [1,2]. Since SINEs are derived from RNA genes without protein-coding capacity, the lack of Alu-encoded proteins is consistent with the notion that new domains arise from existing sequences encoding functional proteins (for example, by exon shuffling) and that the de novo creation of coding sequences from non-coding DNA is rare. Indeed, in the words of Graur and Li [22], ‘True novelty is almost unheard of during evolution; rather, preexisting genes and parts of genes [presumably encoding functional proteins or their domains] are transformed to produce new functions, and molecular systems are combined to give rise to new, often more complex systems. … We may … deduce that [such] molecular tinkering is most probably the paradigm of molecular evolution.’ Such a notion appears to contrast with the recent view of coding Alus presented by one of these authors [1].
The relative frequencies for the TE classes found by Nekrutenko and Li [1] are similar to genome-wide repeat proportions, i.e. to expectations under random sampling of sequences or random errors in predicting exons. In contrast, our findings are in good agreement with previous reports [4,25] and the above arguments that repeat-derived protein-coding sequences, especially those corresponding to Alus and other SINEs, should be rare. Indeed, Alus are derived from 7SL RNA, part of the signal recognition particle on ribosomes [23], and the strong selection for such 7SL-like secondary


FEBS Letters | 2006

Compositional properties of human cDNA libraries: practical implications.

Stilianos Arhondakis; Oliver Clay; Giorgio Bernardi

The strikingly wide and bimodal gene distribution exhibited by the human genome has prompted us to study the correlations between EST‐counts (expression levels) and base composition of genes, especially since existing data are contradictory. Here we investigate how cDNA library preparation affects the GC distributions of ESTs and/or genes found in the library, and address consequences for expression studies. We observe that strongly anomalous GC distributions often indicate experimental biases or deficits during their preparation. We propose the use of compositional distributions of raw ESTs from a cDNA library, and/or of the genes they represent, as a simple and effective tool for quality control.
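The abstract proposes the GC distribution of a library's raw ESTs (or of the genes they represent) as a quality-control signal, without naming a specific test. One possible way to quantify how anomalous a library's distribution is, assumed here and not taken from the paper, is the two-sample Kolmogorov-Smirnov statistic against a reference set of GC levels; the reference set and any alarm threshold are left to the user.

```python
from bisect import bisect_right
from typing import Sequence


def gc_level(seq: str) -> float:
    """GC level (G+C)/(A+C+G+T) of one EST; other characters are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T")
    return gc / acgt


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in sorted(set(sa) | set(sb)):
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d
```

A statistic near 0 means the library's EST GC distribution closely matches the reference; values approaching 1 indicate a strongly shifted or distorted distribution of the kind the authors associate with preparation biases.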


Archive | 1999

Compositional Correlations and Gene Distribution of the Human Genome

Oliver Clay; Giuseppe D’Onofrio; Kamel Jabbari; Serguei Zoubak; Salvatore Saccone; Giorgio Bernardi

This review briefly describes the compositional approach to the analysis of vertebrate genomes. This approach involves the study of distributions of, and correlations among, the base compositions (GC levels) of different parts of these genomes, such as exons, introns, third codon positions, flanking regions of genes, and long genomic sequences or fragments spanning genic and intergenic DNA. Properties of the human genome that were inferred using the compositional approach include its organization into isochores, the presence of much higher gene densities in GC-rich than in GC-poor regions, and the non-uniform concentration of genes in the chromosomal bands.


Genome Research | 2006

An isochore map of human chromosomes

Maria Costantini; Oliver Clay; Fabio Auletta; Giorgio Bernardi


Molecular Phylogenetics and Evolution | 1996

Human Coding and Noncoding DNA: Compositional Correlations

Oliver Clay; Simone Cacciò; Serguei Zoubak; Dominique Mouchiroud; Giorgio Bernardi

Collaboration



Top Co-Authors


Giorgio Bernardi

Stazione Zoologica Anton Dohrn


Nicolas Carels

Stazione Zoologica Anton Dohrn


Stéphane Cruveiller

Centre national de la recherche scientifique


Christophe J. Douady

Institut Universitaire de France


Stilianos Arhondakis

Stazione Zoologica Anton Dohrn


Adam Pavlicek

Genetic Information Research Institute


Fabio Auletta

Stazione Zoologica Anton Dohrn


Maria Costantini

Stazione Zoologica Anton Dohrn
