Christian J. Michel
University of Strasbourg
Featured research published by Christian J. Michel.
Computational Biology and Chemistry | 2012
Christian J. Michel
In 1996, a common trinucleotide circular code, called X, was identified in genes of eukaryotes and prokaryotes (Arquès and Michel, 1996). This circular code X is a set of 20 trinucleotides that allows the reading frames in genes to be retrieved locally, i.e. anywhere in genes and in particular without start codons. This reading frame retrieval requires a window of at least 12 nucleotides (l ≥ 12). With a window length strictly less than 12 nucleotides (l < 12), some words of X, called ambiguous words, are found in the shifted frames (the reading frame shifted by one or two nucleotides), preventing the reading frame in genes from being retrieved. Since 1996, these ambiguous words of X had never been studied. In the first part of this paper, we identify all the ambiguous words of the common trinucleotide circular code X. For window lengths l from 1 to 11 nucleotides, the type and the occurrence number (multiplicity) of ambiguous words of X are given in each shifted frame. Maximal ambiguous words of X, i.e. ambiguous words that are not factors of other ambiguous words, are also determined. Two probability definitions based on these results show that the common trinucleotide circular code X retrieves the reading frame in genes with a probability of about 90% with a window length of 6 nucleotides, and a probability of 99.9% with a window length of 9 nucleotides (100% with a window length of 12 nucleotides, by definition of a circular code). In the second part of this paper, we identify X circular code motifs (X motifs for short) in transfer RNA and 16S ribosomal RNA: a tRNA X motif of 26 nucleotides including the anticodon stem-loop, and seven 16S rRNA X motifs of length greater than or equal to 15 nucleotides. Window lengths of reading frame retrieval with each trinucleotide of these X motifs are also determined.
Using the crystal structure 3I8G (Jenner et al., 2010), a 3D visualization of X motifs in the ribosome shows several spatial configurations involving mRNA X motifs, A-tRNA and E-tRNA X motifs, and four 16S rRNA X motifs. Another identified 16S rRNA X motif is involved in the decoding center, which recognizes the codon-anticodon helix in A-tRNA. From a code theory point of view, these identified X circular code motifs and their mathematical properties may constitute a translation code involved in the retrieval, maintenance and synchronization of reading frames in genes.
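Window-based reading frame retrieval can be illustrated with a short Python sketch. It is not the method of the paper: it assumes that a phase is "compatible" with a window when the window can occur inside some concatenation of words of X with its first nucleotide at that position (0, 1 or 2) of a word, and that the frame is retrieved when exactly one phase remains. The helper names `phase_is_compatible` and `compatible_phases` are hypothetical; the 20 trinucleotides of X are those listed by Arquès and Michel (1996).

```python
# Hypothetical sketch: which phases of a window are compatible with the code X?
# Assumed definition: phase p is compatible if the window can occur inside a
# concatenation of X words with its first nucleotide at position p of a word.

# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})


def phase_is_compatible(window, code, phase):
    """True if every (possibly partial) word implied by `phase` matches a code word."""
    pieces = {}  # word index -> {position inside the word: nucleotide}
    for i, base in enumerate(window):
        word_index, pos = divmod(phase + i, 3)
        pieces.setdefault(word_index, {})[pos] = base
    return all(
        any(all(word[pos] == base for pos, base in piece.items()) for word in code)
        for piece in pieces.values()
    )


def compatible_phases(window, code=X):
    """Phases (0, 1, 2) consistent with the window; a single phase = frame retrieved."""
    window = window.upper().replace("U", "T")  # accept RNA as well as DNA
    return [p for p in range(3) if phase_is_compatible(window, code, p)]


print(compatible_phases("A"))             # -> [0, 1, 2]
print(compatible_phases("GAACTGGTCTAC"))  # -> [0]  (GAA CTG GTC TAC)
```

A single nucleotide fits all three phases, whereas the 12-nucleotide window built from GAA CTG GTC TAC leaves only phase 0. Under this assumed definition, listing the windows that keep more than one phase would be one way to explore the ambiguous words discussed above.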
BioSystems | 2014
Christian J. Michel; Hervé Seligmann
The C(3) self-complementary circular code X identified in genes of prokaryotes and eukaryotes is a set of 20 trinucleotides enabling reading frame retrieval and maintenance, i.e. a framing code (Arquès and Michel, 1996; Michel, 2012, 2013). Some mitochondrial RNAs correspond to DNA sequences when RNA transcription systematically exchanges nucleotides (Seligmann, 2013a,b). We study here the 23 bijective transformation codes ΠX of X, which may code nucleotide-exchanging RNA transcription, as suggested by this mitochondrial observation. The 23 bijective transformation codes ΠX are C(3) trinucleotide circular codes; seven of them are also self-complementary. Furthermore, several correlations are observed between the Reading Frame Retrieval (RFR) probability of the bijective transformation codes ΠX and different biological properties of ΠX: their numbers of RNAs in GenBank's EST database, their polymerization rate, their number of amino acids and the chirality of the amino acids they code. The results suggest that the circular code X, with its functions of reading frame retrieval and maintenance in regular RNA transcription, may also have, through its bijective transformation codes ΠX, the same functions in nucleotide-exchanging RNA transcription. Associations with properties such as amino acid chirality suggest that the RFR of X and its bijective transformations molded the origins of the genetic code machinery.
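Assuming that a bijective transformation code ΠX is obtained by applying one of the 23 non-identity permutations of {A, C, G, T} to every nucleotide of X (24 permutations minus the identity), the transformed codes and their self-complementarity can be enumerated directly. This sketch takes "self-complementary" to mean that a code equals the set of reverse complements of its words; the function names are illustrative only.

```python
from itertools import permutations

# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})

BASES = "ACGT"
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(word):
    return "".join(COMPLEMENT[b] for b in reversed(word))


def is_self_complementary(code):
    """Assumed meaning: the code equals the set of reverse complements of its words."""
    return {reverse_complement(w) for w in code} == set(code)


def bijective_transformations(code):
    """Yield (mapping, transformed code) for each non-identity permutation of ACGT."""
    for image in permutations(BASES):
        mapping = dict(zip(BASES, image))
        if all(mapping[b] == b for b in BASES):
            continue  # the identity would just return X itself
        yield mapping, frozenset("".join(mapping[b] for b in w) for w in code)


PI_X = list(bijective_transformations(X))
self_comp = [m for m, c in PI_X if is_self_complementary(c)]
print(len(PI_X), len(self_comp))  # -> 23 7
```

Under these assumptions the count of seven self-complementary codes matches the abstract. For X, these seven are exactly the non-identity permutations that commute with the complement map (A↔T, C↔G).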
Computational Biology and Chemistry | 2013
Christian J. Michel
In 1996, a trinucleotide circular code X was identified in genes of prokaryotes and eukaryotes (Arquès and Michel, 1996). In 2012, X motifs were identified in the transfer RNA (tRNA) Phe and 16S ribosomal RNA (Michel, 2012). A statistical analysis of X motifs in all available tRNAs of prokaryotes and eukaryotes in the genomic tRNA database (September 2012, http://lowelab.ucsc.edu/GtRNAdb/, Lowe and Eddy, 1997) is carried out here. For this purpose, a search algorithm for X motifs in a DNA sequence is developed. Two definitions are used to determine the occurrence probabilities of X motifs and of the circular codes X, X1 = P(X) and X2 = P²(X) (P being a circular permutation map applied to X) in a population of tRNAs. This approach identifies X motifs in the 5′ and/or 3′ regions of 16 isoaccepting tRNAs (all except the tRNAs Arg, His, Ser and Trp). The statistical analyses are performed on different, large tRNA populations according to taxonomy (prokaryotes and eukaryotes), tRNA length and tRNA score. Finally, a circular code property observed in genes of prokaryotes and eukaryotes is identified in the 3′ regions of 19 isoaccepting tRNAs of prokaryotes and eukaryotes (all except the tRNA Leu). The identification of X motifs and of a gene circular code property in tRNAs strengthens the concept, proposed in Michel (2012), of a possible translation (framing) code based on a circular code.
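The abstract mentions a search algorithm for X motifs without spelling it out. As a hypothetical illustration only, the sketch below treats an X motif as a maximal run of at least `min_codons` consecutive in-frame trinucleotides of X, scanned in all three frames; the threshold of 4 trinucleotides is an assumed default, not a value taken from the paper.

```python
# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})


def find_x_motifs(seq, code=X, min_codons=4):
    """Return sorted (start, end, motif) tuples for maximal runs of at least
    `min_codons` consecutive in-frame trinucleotides of `code`, in all 3 frames.
    `min_codons=4` is a hypothetical default, not taken from the paper."""
    seq = seq.upper().replace("U", "T")  # tRNA sequences may be given as RNA
    motifs = []

    def close_run(start, end):
        # Keep the run only if it is long enough.
        if start is not None and (end - start) // 3 >= min_codons:
            motifs.append((start, end, seq[start:end]))

    for frame in range(3):
        start = None  # first position of the current run, if any
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in code:
                if start is None:
                    start = pos
            else:
                close_run(start, pos)
                start = None
            pos += 3
        close_run(start, pos)  # a run may reach the end of the sequence
    return sorted(motifs)


print(find_x_motifs("TGAACTGGTCTAC"))  # -> [(1, 13, 'GAACTGGTCTAC')]
```

Coordinates are 0-based and half-open, and the reported motif is given in DNA letters even when the input is RNA.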
Computational Biology and Chemistry | 2014
Karim El Soufi; Christian J. Michel
A translation (framing) code based on the circular code was proposed in Michel (2012) with the identification of X circular code motifs (X motifs for short) in the bacterial rRNA of Thermus thermophilus, in particular in the ribosome decoding center. Three classes of X motifs are now identified in the rRNAs of the bacteria Escherichia coli and Thermus thermophilus, the archaeon Pyrococcus furiosus, the nuclear eukaryotes Saccharomyces cerevisiae, Triticum aestivum and Homo sapiens, and the chloroplast of Spinacia oleracea. The universally conserved nucleotides A1492 and A1493 in all studied rRNAs (bacteria, archaea, nuclear eukaryotes and chloroplasts) belong to X motifs (called mAA). The conserved nucleotide G530 in rRNAs of bacteria and archaea belongs to X motifs (called mG). Furthermore, the X motif mG is also found in rRNAs of nuclear eukaryotes and chloroplasts. Finally, a potentially important X motif, called m, is identified in all studied rRNAs. Using the available crystallographic structures of the Protein Data Bank (PDB), we also show that these X motifs mAA, mG and m belong to the ribosome decoding center of all studied rRNAs, with possible interactions with the mRNA X motifs and the tRNA X motifs. The three classes of X motifs identified here in rRNAs of several different organisms strengthen the concept of a translation code based on the circular code.
Journal of Theoretical Biology | 2015
Christian J. Michel
In 1996, a set X of 20 trinucleotides that has, on average, the highest occurrence in the reading frame compared to the two shifted frames was identified in genes of both prokaryotes and eukaryotes (Arquès and Michel, 1996). Furthermore, this set X has an interesting mathematical property, as X is a maximal C(3) self-complementary trinucleotide circular code (Arquès and Michel, 1996). By 2014, the number of trinucleotides available in prokaryotic genes had increased by a factor of 527. Furthermore, two new gene kingdoms, plasmids and viruses, contain enough trinucleotide data to be analysed. The approach used in 1996 for identifying a preferential frame for a trinucleotide is quantified here with a new definition analysing the occurrence probability of a complementary/permutation (CP) trinucleotide set in a gene kingdom. Furthermore, in order to increase the statistical significance of the results compared to those of 1996, the circular code X is studied on several gene taxonomic groups in a kingdom. Based on this new statistical approach, the circular code X is strengthened in genes of prokaryotes and eukaryotes, and is now also identified in genes of plasmids. A subset of X with 18 or 16 trinucleotides is identified in genes of viruses. Furthermore, a simple probabilistic model based on the independent occurrence of trinucleotides in the reading frame of genes explains the circular code frequencies and asymmetries observed in the shifted frames in all studied gene kingdoms. Finally, the developed approach allows the identification of variant X codes in genes, i.e. trinucleotide codes that differ from X. In genes of bacteria, eukaryotes and plasmids, 14 of the 47 studied gene taxonomic groups (about 30%) have variant X codes. Seven variant X codes are identified with at least 16 trinucleotides of X. Two variant X codes, XA in cyanobacteria and plasmids of cyanobacteria, and XD in birds, are self-complementary, without permuted trinucleotides, but non-circular.
Five variant X codes are C(3) self-complementary circular codes: XB in deinococcus, plasmids of chloroflexi and deinococcus, mammals and kinetoplasts; XC in elusimicrobia and apicomplexans; XE in fishes; XF in insects; and XG in basidiomycetes and plasmids of spirochaetes. In genes of viruses, no variant X code is found.
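Two ingredients of this abstract can be made concrete: the circular permutation map P (N1N2N3 ↦ N2N3N1) that yields X1 = P(X) and X2 = P²(X), and the counting of trinucleotides of X in the reading frame versus the two shifted frames. Because X contains exactly one trinucleotide from each of the 20 circular permutation classes, X, X1 and X2 are disjoint and together cover the 60 trinucleotides other than AAA, CCC, GGG and TTT, and the reverse complements of X1 form X2. The sketch below checks these properties; `frame_counts` is a hypothetical helper that labels the frame shifted by one nucleotide as frame 1.

```python
# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def P(word):
    """Circular permutation map: N1N2N3 -> N2N3N1."""
    return word[1:] + word[0]


def reverse_complement(word):
    return "".join(COMPLEMENT[b] for b in reversed(word))


X1 = frozenset(P(w) for w in X)     # X1 = P(X)
X2 = frozenset(P(P(w)) for w in X)  # X2 = P^2(X)


def frame_counts(gene, code=X):
    """Hypothetical helper: occurrences of code words in frames 0, 1 and 2,
    where frame k starts k nucleotides after the first base of the gene."""
    counts = [0, 0, 0]
    for frame in range(3):
        for pos in range(frame, len(gene) - 2, 3):
            if gene[pos:pos + 3] in code:
                counts[frame] += 1
    return tuple(counts)


print(len(X | X1 | X2), len(X & X1), len(X & X2), len(X1 & X2))  # -> 60 0 0 0
print({reverse_complement(w) for w in X1} == X2)                  # -> True
print(frame_counts("GAACTGGTCTAC"))                               # -> (4, 1, 1)
```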
Journal of Theoretical Biology | 2010
Ahmed Ahmed; Gabriel Frey; Christian J. Michel
A circular code is a set of trinucleotides allowing the reading frames in genes to be retrieved locally, i.e. anywhere in genes and in particular without start codons, and automatically with a window of a few nucleotides. In 1996, a common circular code, called X, was identified in large populations of eukaryotic and prokaryotic genes. It is therefore believed to be an ancestral structural property of genes. A new computational approach based on comparative genomics is developed to identify essential molecular functions associated with circular codes. It is based on a quantitative and sensitive statistical method (FPTF) to identify three permuted trinucleotide sets in the three frames of genes, a flower automaton algorithm to determine whether a trinucleotide set is a circular code, and an integrated Gene Ontology and Taxonomy (iGOT) database. By carrying out automatic circular code analyses on a huge number of gene populations, where each population is associated with a particular molecular function, it identifies 266 gene populations having circular codes close to X. Surprisingly, their molecular functions include 98% of those covered by the essential genes of the DEG database (Database of Essential Genes). Furthermore, three trinucleotides GTG, AAG and GCG, which replace three trinucleotides of the code X and are called evolutionary trinucleotides, occur significantly in these 266 gene populations. Finally, a new method developed to analyse and quantify the stability of a set of trinucleotides demonstrates that these evolutionary trinucleotides are associated with a significant increase in the stability of the common circular code X. Indeed, among 9920 possible trinucleotide replacement sets, its stability rises from the 1502nd rank to the 16th rank after the replacement of the three evolutionary trinucleotides.
Computational Biology and Chemistry | 2010
Christian J. Michel; Giuseppe Pirillo
A new trinucleotide proposition is proved here; it allows all the trinucleotide circular codes over the genetic alphabet to be identified (their numbers and their sets of words). This new class of genetic motifs, i.e. circular codes (or synchronizing genetic motifs), may be involved in the structure and the origin of the genetic code, and in the reading frames of genes.
Philosophical Transactions of the Royal Society A | 2016
Elena Fimmel; Christian J. Michel; Lutz Strüngmann
The circular code theory proposes that genes are constituted of two trinucleotide codes: the classical genetic code, with 61 trinucleotides coding the 20 amino acids (excluding the three stop codons {TAA,TAG,TGA}), and a circular code based on 20 trinucleotides for retrieving, maintaining and synchronizing the reading frame. It relies on two main results: the identification of a maximal C3 self-complementary trinucleotide circular code X in genes of bacteria, eukaryotes, plasmids and viruses (Michel 2015 J. Theor. Biol. 380, 156–177. (doi:10.1016/j.jtbi.2015.04.009); Arquès & Michel 1996 J. Theor. Biol. 182, 45–58. (doi:10.1006/jtbi.1996.0142)) and the finding of X circular code motifs in tRNAs and rRNAs, in particular in the ribosome decoding centre (Michel 2012 Comput. Biol. Chem. 37, 24–37. (doi:10.1016/j.compbiolchem.2011.10.002); El Soufi & Michel 2014 Comput. Biol. Chem. 52, 9–17. (doi:10.1016/j.compbiolchem.2014.08.001)). The universally conserved nucleotides A1492 and A1493 and the conserved nucleotide G530 are included in X circular code motifs. Recently, dinucleotide circular codes were also investigated (Michel & Pirillo 2013 ISRN Biomath. 2013, 538631. (doi:10.1155/2013/538631); Fimmel et al. 2015 J. Theor. Biol. 386, 159–165. (doi:10.1016/j.jtbi.2015.08.034)). As genetic motifs of different lengths are ubiquitous in genes and genomes, we introduce a new approach based on graph theory to study, in full generality, n-nucleotide circular codes X, i.e. of length 2 (dinucleotide), 3 (trinucleotide), 4 (tetranucleotide), etc. Indeed, we prove that an n-nucleotide code X is circular if and only if the corresponding graph is acyclic. Moreover, the maximal length of a path in this graph corresponds to the window of nucleotides in a sequence needed to detect the correct reading frame. Finally, the graph theory of tournaments is applied to the study of dinucleotide circular codes. This approach is fully equivalent to both the combinatorial theory (Michel & Pirillo 2013 ISRN Biomath. 2013, 538631. (doi:10.1155/2013/538631)) and the group theory (Fimmel et al. 2015 J. Theor. Biol. 386, 159–165. (doi:10.1016/j.jtbi.2015.08.034)) of dinucleotide circular codes, while being mathematically simpler.
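The graph criterion can be sketched in Python. The construction used here is our reading of the paper and an assumption, since the abstract does not define the graph: each word N1...Nn contributes the edges N1...Ni → Ni+1...Nn for i = 1, ..., n−1, and the code is treated as circular exactly when this graph has no cycle. The helper `longest_path` returns the number of edges of a longest path; how that number converts into a nucleotide window is not reproduced here.

```python
# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})


def code_graph(code):
    """Assumed construction: each word w adds edges w[:i] -> w[i:] for 0 < i < len(w)."""
    graph = {}
    for word in code:
        for i in range(1, len(word)):
            graph.setdefault(word[:i], set()).add(word[i:])
    return graph


def is_circular(code):
    """Circular iff the graph is acyclic (cycle search by depth-first search)."""
    graph = code_graph(code)
    state = {}  # vertex -> 1 while on the DFS stack, 2 once fully explored

    def visit(v):
        state[v] = 1
        for w in graph.get(v, ()):
            if state.get(w) == 1:
                return False  # back edge: the graph has a cycle
            if w not in state and not visit(w):
                return False
        state[v] = 2
        return True

    return all(v in state or visit(v) for v in list(graph))


def longest_path(code):
    """Number of edges on a longest path; only defined for acyclic graphs."""
    if not is_circular(code):
        raise ValueError("graph has a cycle, so the code is not circular")
    graph = code_graph(code)
    memo = {}

    def depth(v):
        if v not in memo:
            memo[v] = max((1 + depth(w) for w in graph.get(v, ())), default=0)
        return memo[v]

    return max((depth(v) for v in graph), default=0)


print(is_circular(X), is_circular({"AAC", "ACA"}), is_circular({"AC", "CA"}))  # -> True False False
print(longest_path(X))
```

The same functions accept dinucleotide or tetranucleotide sets, since the edges are built from every split of each word regardless of its length.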
Journal of Theoretical Biology | 2014
Christian J. Michel
The reading frame coding (RFC) of codes (sets) of trinucleotides is a genetic concept that has been largely ignored during the last 50 years. A first objective is the definition of a new and simple statistical parameter PrRFC for analysing the probability (efficiency) of reading frame coding (RFC) of any trinucleotide code. A second objective is to reveal different classes and subclasses of trinucleotide codes involved in reading frame coding: the circular codes of 20 trinucleotides and the bijective genetic codes of 20 trinucleotides coding the 20 amino acids. This approach allows us to propose a genetic scale of reading frame coding which ranges from 1/3 for the random codes (RFC probability identical in the three frames) to 1 for the comma-free circular codes (RFC probability maximal in the reading frame and null in the two shifted frames). This genetic scale shows, in particular, the reading frame coding probabilities of the 12,964,440 circular codes (PrRFC = 83.2% on average), the 216 C(3) self-complementary circular codes (PrRFC = 84.1% on average), including the code X identified in eukaryotic and prokaryotic genes (PrRFC = 81.3%), and the 339,738,624 bijective genetic codes (PrRFC = 61.5% on average), including the 52 codes without permuted trinucleotides (PrRFC = 66.0% on average). In addition, the reading frame coding probability of each trinucleotide code that codes an amino acid in the universal genetic code is determined. The four amino acids Gly, Lys, Phe and Pro are coded by non-circular codes with RFC probabilities equal to 2/3, 1/2, 1/2 and 2/3, respectively. The amino acid Leu is coded by a circular code (not comma-free) with an RFC probability equal to 18/19. The 15 other amino acids are coded by comma-free circular codes, i.e. with RFC probabilities equal to 1. The identification of coding properties in some classes of trinucleotide codes studied here may bring new insights into the origin and evolution of the genetic code.
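The abstract does not give the formula for PrRFC. The sketch below assumes one definition that reproduces every value quoted above for Gly, Lys, Phe, Pro and Leu, as well as 81.3% for X: draw two code words u and v independently and uniformly, let Pk be the probability that the trinucleotide starting k nucleotides into uv (k = 1, 2) belongs to the code, and set PrRFC = 1 / (1 + P1 + P2), the reading frame contributing probability 1. It is a reconstruction for illustration, not necessarily the paper's definition.

```python
from fractions import Fraction

# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})


def shifted_probability(code, shift):
    """P_shift: chance that the trinucleotide starting `shift` nucleotides into
    uv lies in the code, for u, v drawn independently and uniformly from it."""
    hits = sum((u + v)[shift:shift + 3] in code for u in code for v in code)
    return Fraction(hits, len(code) ** 2)


def pr_rfc(code):
    """Assumed PrRFC: reading frame weight 1 against the two shifted frames."""
    code = frozenset(code)
    return 1 / (1 + shifted_probability(code, 1) + shifted_probability(code, 2))


LEU = {"CTA", "CTC", "CTG", "CTT", "TTA", "TTG"}
print(pr_rfc(LEU))                            # -> 18/19
print(pr_rfc({"AAA", "AAG"}))                 # Lys -> 1/2
print(pr_rfc(X), round(float(pr_rfc(X)), 3))  # -> 100/123 0.813
```

Exact fractions keep the small amino-acid values readable: for Leu the only shifted hit is CTT (the last C of CTC followed by TTA or TTG), giving 1 / (1 + 1/18) = 18/19. A comma-free code such as {ATG} has P1 = P2 = 0 and therefore scores 1.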
Information & Computation | 2012
Christian J. Michel; Giuseppe Pirillo; Mario A. Pirillo
Trinucleotide comma-free codes and trinucleotide circular codes are two important classes of codes in code theory and theoretical biology. A trinucleotide circular code containing exactly 20 elements is called here a 20-trinucleotide circular code. In this paper, by solving a combinatorial problem of high computational complexity, we extend and improve our results of C.J. Michel, G. Pirillo, and M.A. Pirillo (2008) [14], concerning the small class of 528 self-complementary 20-trinucleotide circular codes, to the complete class of 20-trinucleotide circular codes, which contains 12,964,440 elements. A surprising relation with the symmetric group S4 appears, which remains unexplained so far.
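The count of 528 self-complementary 20-trinucleotide circular codes, and of the 216 C(3) codes among them, can be approached by brute force under two assumptions: circularity is tested with the graph criterion of Fimmel, Michel and Strüngmann (2016) (a code is circular if and only if its prefix/suffix graph is acyclic), and a 20-trinucleotide circular code takes exactly one word from each of the 20 circular permutation classes (it cannot contain a periodic word such as AAA or two rotations of the same word). Reverse complementation pairs the 20 classes into 10 pairs, so a self-complementary candidate is fixed by 10 choices among 3 words, i.e. 3^10 = 59,049 candidates. If these assumptions hold, the counts should match those quoted in the abstracts. This sketch is not the authors' algorithm, which handles the far larger complete class.

```python
from functools import lru_cache
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
# The 20 trinucleotides of the circular code X (Arquès and Michel, 1996).
X = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})


def reverse_complement(word):
    return "".join(COMPLEMENT[b] for b in reversed(word))


def rotate(word):
    return word[1:] + word[0]  # circular permutation P


def is_circular(code):
    """Assumed graph criterion: edges w[:i] -> w[i:]; circular iff acyclic."""
    graph = {}
    for w in code:
        for i in range(1, len(w)):
            graph.setdefault(w[:i], []).append(w[i:])
    state = {}  # 1 = on the DFS stack, 2 = finished

    def visit(v):
        state[v] = 1
        for u in graph.get(v, ()):
            if state.get(u) == 1 or (u not in state and not visit(u)):
                return False  # cycle found
        state[v] = 2
        return True

    return all(v in state or visit(v) for v in list(graph))


@lru_cache(maxsize=None)
def self_complementary_circular_codes():
    """All 20-word self-complementary circular codes, by brute force over 3**10 candidates."""
    words = ("".join(t) for t in product(BASES, repeat=3))
    classes = {frozenset({w, rotate(w), rotate(rotate(w))}) for w in words if len(set(w)) > 1}
    choices, seen = [], set()
    for cls in sorted(classes, key=sorted):
        if cls in seen:
            continue
        partner = frozenset(reverse_complement(w) for w in cls)
        seen |= {cls, partner}
        choices.append(sorted(cls))  # choosing w here forces rc(w) in the partner class
    codes = []
    for pick in product(*choices):
        code = frozenset(pick) | frozenset(reverse_complement(w) for w in pick)
        if is_circular(code):
            codes.append(code)
    return tuple(codes)


def is_c3(code):
    """C(3): X, P(X) and P^2(X) are all circular."""
    x1 = {rotate(w) for w in code}
    return is_circular(code) and is_circular(x1) and is_circular({rotate(w) for w in x1})


sc_codes = self_complementary_circular_codes()
print(len(sc_codes), sum(is_c3(c) for c in sc_codes), X in sc_codes)
```

The result is cached with `functools.lru_cache`, so repeated calls do not redo the enumeration.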