Gregory Nuel | Researchain

Publication

Featured research published by Gregory Nuel.

Nucleic Acids Research | 2003

AMIGene: Annotation of MIcrobial Genes

Stéphanie Bocs; Stéphane Cruveiller; David Vallenet; Gregory Nuel; Claudine Médigue

AMIGene (Annotation of MIcrobial Genes) is an application for automatically identifying the most likely coding sequences (CDSs) in a large contig or a complete bacterial genome sequence. The first step in AMIGene is the construction of Markov models that fit the input genomic data (i.e. the gene model), followed by the combination of well-known gene-finding methods and a heuristic approach for selecting the most likely CDSs. The web interface allows the user to select one or several gene models to be applied by the AMIGene program to the input sequence, and to visualize the list of predicted CDSs graphically and in a downloadable text format. The AMIGene web site is accessible at the following address: http://www.genoscope.cns.fr/agc/tools/amigene/index.html.
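As a rough illustration of what "Markov models that fit the input genomic data" can mean, here is a minimal Python sketch under stated assumptions: a single homogeneous order-k chain with additive smoothing, and a log-likelihood ratio of a candidate region under a "coding" model versus a background model. AMIGene's actual gene models and the gene-finding methods it combines are not described in the abstract, so the function names and toy training sequences below are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"


def train_markov(seqs, k, pseudocount=1.0):
    """Estimate P(next base | previous k bases) with additive (Laplace) smoothing.

    Sequences are assumed to be uppercase strings over ACGT only.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(k, len(s)):
            counts[s[i - k:i]][s[i]] += 1
    model = {}
    for ctx in map("".join, product(ALPHABET, repeat=k)):
        total = sum(counts[ctx].values()) + pseudocount * len(ALPHABET)
        model[ctx] = {b: (counts[ctx][b] + pseudocount) / total for b in ALPHABET}
    return model


def log_likelihood(seq, model, k):
    """Log-probability of seq[k:] given its first k bases under an order-k chain."""
    return sum(math.log(model[seq[i - k:i]][seq[i]]) for i in range(k, len(seq)))


def coding_score(seq, coding_model, background_model, k):
    """Log-likelihood ratio; positive values favour the coding model."""
    return log_likelihood(seq, coding_model, k) - log_likelihood(seq, background_model, k)


if __name__ == "__main__":
    k = 2
    coding = train_markov(["GCTGCTGCTGCTGCTGCT"], k)      # toy "coding" training set
    background = train_markov(["ATATATATATATATATAT"], k)  # toy background training set
    print(coding_score("GCTGCTGCT", coding, background, k))  # positive
    print(coding_score("ATATATAT", coding, background, k))   # negative
```

Real gene finders typically use phase-specific (three-periodic) models for coding DNA; a single homogeneous chain is used here only to keep the sketch short.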

PLOS ONE | 2010

Should We Abandon the t-Test in the Analysis of Gene Expression Microarray Data: A Comparison of Variance Modeling Strategies

Marine Jeanmougin; Aurélien de Reyniès; Laetitia Marisa; Caroline Paccard; Gregory Nuel; Mickael Guedj

High-throughput post-genomic studies are now routinely carried out in biological and biomedical research and hold considerable promise. The main statistical approach for selecting genes differentially expressed between two groups is the t-test, which has been criticized in the literature. Numerous alternatives have been developed based on different and innovative variance modeling strategies. However, a critical issue is that selecting a different test usually leads to a different gene list. In this context, and given the current tendency to apply the t-test, identifying the most efficient approach in practice remains crucial. To help answer this question, we compare eight tests representative of variance modeling strategies in gene expression data: Welch's t-test, ANOVA [1], Wilcoxon's test, SAM [2], RVM [3], limma [4], VarMixt [5] and SMVar [6]. Our comparison process relies on four steps (gene list analysis, simulations, spike-in data and re-sampling) to formulate comprehensive and robust conclusions about test performance in terms of statistical power, false-positive rate, execution time and ease of use. Our results raise concerns about the ability of some methods to control the expected number of false positives at a desirable level. Moreover, two tests (limma and VarMixt) show significant improvement over the t-test, in particular for small sample sizes. In addition, limma presents several practical advantages, so we advocate its use for analyzing gene expression data.
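Welch's t-test, the baseline in this comparison, can be sketched with the Python standard library alone; in a microarray setting it would be applied gene by gene. The function name is hypothetical, and the moderated variance estimators used by limma or VarMixt are not shown.

```python
import math
import statistics


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    vx = statistics.variance(x) / len(x)  # squared standard error of mean(x)
    vy = statistics.variance(y) / len(y)  # squared standard error of mean(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


if __name__ == "__main__":
    # Two toy expression profiles for one gene (hypothetical values).
    print(welch_t([1, 2, 3, 4], [2, 4, 6, 8]))
```

For the toy input above, t equals -sqrt(3) and the degrees of freedom equal 75/17; unlike Student's t-test, no equal-variance assumption is made, which is why the degrees of freedom are not an integer.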

PLOS Genetics | 2010

Deciphering Normal Blood Gene Expression Variation-The NOWAC Postgenome Study

Vanessa Dumeaux; Karina Standahl Olsen; Gregory Nuel; Ruth H. Paulssen; Anne Lise Børresen-Dale; Eiliv Lund

There is growing evidence that gene expression profiling of peripheral blood cells is a valuable tool for assessing gene signatures related to exposure, drug response, or disease. However, the true promise of this approach cannot be estimated until the scientific community has robust baseline data describing variation in gene expression patterns in normal individuals. Using a large representative sample set of postmenopausal women (N = 286) in the Norwegian Women and Cancer (NOWAC) postgenome study, we investigated the variability of whole blood gene expression in the general population. In particular, we examined changes in blood gene expression caused by technical variability, normal inter-individual differences, and exposure variables at proportions and levels relevant to real-life situations. We observe that the overall changes in gene expression are subtle, implying the need for careful analytic approaches. In particular, technical variability cannot be ignored, and subsequent adjustments must be considered in any analysis. Many new candidate genes were identified that are differentially expressed according to inter-individual (e.g. fasting, BMI) and exposure (e.g. smoking) factors, thus establishing that these effects are mirrored in blood. By focusing on the biological implications instead of directly comparing gene lists from several related studies in the literature, our analytic approach was able to identify significant similarities and effects consistent across these reports. This establishes the feasibility of blood gene expression profiling, provided that studies are predicated upon careful experimental design and analysis in order to minimize confounding signals, artifacts of sample preparation and processing, and inter-individual differences.

Annals of Human Genetics | 2008

A Note on Allelic Tests in Case-Control Association Studies

Mickael Guedj; Gregory Nuel; Bernard Prum

This paper reconsiders Sasieni's relevant contribution on the validity of allele‐based tests in case‐control genetic association studies. In particular, the author clearly demonstrates that the classical chi‐square test applied to allelic contingency tables is biased when the combined case‐control population is not in Hardy‐Weinberg equilibrium. As an alternative, he suggests using the Cochran‐Armitage test for trend, basing his argument on the fact that these two tests are asymptotically equivalent at Hardy‐Weinberg equilibrium. However, he only demonstrates the equality of the statistics when the observed genotypic proportions are strictly in equilibrium, which does not formally imply the suggested, and often accepted, asymptotic behavior.
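The two statistics discussed here can be computed from genotype counts with a minimal Python sketch. The function names are hypothetical; genotype counts are ordered by the number of copies of allele A, and the trend statistic uses scores 0, 1, 2 in its N·r² form (an assumption about which variant is meant). When the pooled genotype counts are exactly in Hardy-Weinberg proportions the two values coincide, which is the equality the note refers to.

```python
def allelic_chi2(cases, controls):
    """Pearson chi-square on the 2x2 allele table.

    cases, controls: genotype counts (n_aa, n_Aa, n_AA), indexed by copies of A.
    """
    a_case = cases[1] + 2 * cases[2]            # A alleles among cases
    o_case = 2 * sum(cases) - a_case            # other allele among cases
    a_ctrl = controls[1] + 2 * controls[2]      # A alleles among controls
    o_ctrl = 2 * sum(controls) - a_ctrl         # other allele among controls
    total = a_case + o_case + a_ctrl + o_ctrl   # 2N alleles
    num = total * (a_case * o_ctrl - o_case * a_ctrl) ** 2
    den = (a_case + o_case) * (a_ctrl + o_ctrl) * (a_case + a_ctrl) * (o_case + o_ctrl)
    return num / den


def trend_chi2(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend statistic, N * r^2 form, on the 2x3 genotype table."""
    r, s = sum(cases), sum(controls)
    n_tot = r + s
    pooled = [c + d for c, d in zip(cases, controls)]
    sx_case = sum(x * c for x, c in zip(scores, cases))
    sx = sum(x * m for x, m in zip(scores, pooled))
    sxx = sum(x * x * m for x, m in zip(scores, pooled))
    num = n_tot * (n_tot * sx_case - r * sx) ** 2
    den = r * s * (n_tot * sxx - sx ** 2)
    return num / den


if __name__ == "__main__":
    # Pooled counts (36, 48, 16) are exactly in HWE proportions for p = 0.4.
    print(allelic_chi2((14, 26, 10), (22, 22, 6)), trend_chi2((14, 26, 10), (22, 22, 6)))  # 3.0 3.0
```

With cases (20, 10, 20) and controls (30, 10, 10), whose pooled counts show an excess of homozygotes, the allelic statistic (about 8.33) exceeds the trend statistic (about 5.26), so the two tests no longer agree.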

PLOS ONE | 2012

ISL1 directly regulates FGF10 transcription during human cardiac outflow formation.

Christelle Golzio; Emmanuelle Havis; Philippe Daubas; Gregory Nuel; Candice Babarit; Arnold Munnich; Michel Vekemans; Stéphane Zaffran; Stanislas Lyonnet; Heather Etchevers

The LIM homeodomain gene Islet-1 (ISL1) encodes a transcription factor that has been associated with the multipotency of human cardiac progenitors, and in mice enables the correct deployment of second heart field (SHF) cells to become the myocardium of the atria, right ventricle and outflow tract. Other markers have been identified that characterize subdomains of the SHF, such as the fibroblast growth factor Fgf10 in its anterior region. While functional evidence of its essential contribution has been demonstrated in many vertebrate species, SHF expression of Isl1 has been shown in only some models. We examined the relationship between human ISL1 and FGF10 within the embryonic time window during which the linear heart tube remodels into four chambers. ISL1 transcription demarcated an anatomical region supporting the conserved existence of a SHF in humans, and transcription factors of the GATA family were co-expressed therein. In addition, we identified a novel enhancer containing a highly conserved ISL1 consensus binding site within the first intron of FGF10. ChIP and EMSA demonstrated its direct occupation by ISL1. Transcription mediated by ISL1 from this FGF10 intronic element was enhanced by the presence of the cardiac transcription factors GATA4 and TBX20. Finally, transgenic mice confirmed that endogenous factors bound the human FGF10 intronic enhancer to drive reporter expression in the developing cardiac outflow tract. These findings highlight the value of examining developmental regulatory networks directly in human tissues, when possible, to assess candidate non-coding regions that may be responsible for congenital malformations.

Journal of Medical Genetics | 2008

A PCSK9 variant and familial combined hyperlipidaemia

Marianne Abifadel; L. Bernier; G. Dubuc; Gregory Nuel; Jean-Pierre Rabès; J. Bonneau; A. Marques; M. Marduel; Martine Devillers; Arnold Munnich; Danièle Erlich; Mathilde Varret; M. Roy; J. Davignon; Catherine Boileau

Background: Our discovery in 2003 of the first mutations of the PCSK9 gene causing autosomal dominant hypercholesterolaemia (ADH) shed light on a previously unknown factor that strongly influences the level of circulating low density lipoprotein cholesterol (LDL-C). PCSK9 gain-of-function mutations cause hypercholesterolaemia by reducing LDL receptor levels, while PCSK9 loss-of-function variants are associated with reduced LDL-C values and a decreased risk of coronary heart disease. Methods and results: We report an insertion of two leucines (p.L21tri, also designated p.L15_L16ins2L) in the leucine stretch of the signal peptide of PCSK9, found in two of 25 families with familial combined hyperlipidaemia (FCHL). This mutant is associated with high total cholesterol and LDL-C values in these families and is also found in a patient with familial hypercholesterolaemia and her father. Conclusion: PCSK9 variants might contribute to the FCHL phenotype and should be taken into consideration, together with other genes implicated in dyslipidaemia, in the study of this complex and multigenic disease.

Human Heredity | 2006

A fast, unbiased and exact allelic test for case-control association studies.

Mickael Guedj; J. Wojcik; E. Della-Chiesa; Gregory Nuel; Karl Forner

Association studies are traditionally performed in the case-control framework. As a first step in the analysis process, allele frequencies are often compared using Pearson’s chi-square statistic. However, such an approach assumes the independence of alleles under the hypothesis of no association, which may not always hold. Consequently, this method introduces a bias that shifts the type I error rate away from its expected level. In this article, we first propose an unbiased and exact test as an alternative to the biased allelic test. Because available data require performing thousands of such tests, we focused on fast execution. Since the biased allelic test is still widely used in the community, we illustrate its pitfalls in the context of genome-wide association studies, particularly in the case of low-level tests. Finally, we compare the unbiased and exact test with the Cochran-Armitage test for trend and show that it performs similarly in terms of power. The fast, unbiased and exact allelic test code is available in R, C++ and Perl at: http://stat.genopole.cnrs.fr/software/fueatest.
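Where the bias comes from can be made concrete with a standard variance calculation, given here as a sketch of the general argument rather than the derivation used in the article. Let $X \in \{0, 1, 2\}$ be the number of copies of allele A carried by one individual, $p = 1 - q$ the frequency of A, and $F$ a coefficient measuring departure from Hardy-Weinberg equilibrium ($F = 0$ at equilibrium):

```latex
\begin{aligned}
P(X=2) &= p^2 + Fpq, \qquad P(X=1) = 2pq(1-F), \qquad P(X=0) = q^2 + Fpq,\\
\mathbb{E}[X] &= 2(p^2 + Fpq) + 2pq(1-F) = 2p,\\
\mathbb{E}[X^2] &= 4(p^2 + Fpq) + 2pq(1-F) = 4p^2 + 2pq(1+F),\\
\operatorname{Var}(X) &= \mathbb{E}[X^2] - (2p)^2 = 2pq(1+F).
\end{aligned}
```

Summed over $n$ independent individuals, the allele count therefore has variance $2npq(1+F)$, whereas the allelic chi-square treats the $2n$ alleles as independent draws with variance $2npq$. For $F > 0$ (excess homozygotes) the allelic test underestimates the variance and inflates the type I error rate; for $F < 0$ it is conservative.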

BMC Bioinformatics | 2011

Conotoxin protein classification using free scores of words and support vector machines

Nazar Zaki; Stefan Wolfsheimer; Gregory Nuel; Sawsan Khuri

Background: Conotoxin has been proven to be effective in drug design and could be used to treat various disorders such as schizophrenia, neuromuscular disorders and chronic pain. With the rapidly growing interest in conotoxin, accurate conotoxin superfamily classification tools are desirable to systematize the increasing number of newly discovered sequences and structures. However, despite the significance of conotoxin and the extensive experimental investigations of it, such tools have not been intensively explored. Results: In this paper, we propose to consider suboptimal alignments of words with restricted length. We developed a scoring system based on local alignment partition functions, called free score. The scoring system plays the key role in the feature extraction step of support vector machine classification. In the classification of conotoxin proteins, our method, SVM-Freescore, improves sensitivity and specificity by approximately 5.864% and 3.76%, respectively, over previously reported methods. To assess generalization, SVM-Freescore was also applied to classify superfamilies from curated, high-quality databases such as ConoServer. The average sensitivity and specificity for the superfamily classification were 0.9742 and 0.9917, respectively. Conclusions: The SVM-Freescore method is shown to be a useful sequence-based analysis tool for functional and structural characterization of conotoxin proteins. The datasets and the software are available at http://faculty.uaeu.ac.ae/nzaki/SVM-Freescore.htm.
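A local alignment partition function can be sketched in Python as a Smith-Waterman-like recursion in which the maximum is replaced by a sum of Boltzmann weights. Assumptions, none of which come from the paper: a toy match/mismatch score, a linear gap cost, an inverse temperature `beta`, Z summing exp(beta·S) over all non-empty local alignment paths, and a free score F = (1/beta)·ln Z (the usual free-energy convention). The paper's substitution matrix, gap model and word-length restriction are not reproduced, and all names are hypothetical. Since Z ≥ exp(beta·max S), F is never below the best local score and approaches it as beta grows, so a Smith-Waterman score is included for comparison.

```python
import math

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    values = list(values)
    top = max(values)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(v - top) for v in values))


def log1p_exp(x):
    """log(1 + exp(x)); equals 0.0 when x is -inf (no path ends there yet)."""
    return logsumexp([0.0, x])


def toy_score(x, y, match=2.0, mismatch=-1.0):
    """Hypothetical substitution score standing in for a real matrix."""
    return match if x == y else mismatch


def free_score(a, b, beta=1.0, gap=2.0, score=toy_score):
    """(1/beta) * ln Z, where Z sums exp(beta * S) over every non-empty local
    alignment path (match/mismatch and linear gap steps) between a and b."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    n, m = len(a), len(b)
    # logv[i][j] = log of summed weights of non-empty paths ending at cell (i, j)
    logv = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # A step either starts a new path (weight 1) or extends a path ending
            # at the predecessor cell, hence the log(1 + V) factor.
            logv[i][j] = logsumexp([
                beta * score(a[i - 1], b[j - 1]) + log1p_exp(logv[i - 1][j - 1]),
                -beta * gap + log1p_exp(logv[i - 1][j]),
                -beta * gap + log1p_exp(logv[i][j - 1]),
            ])
    log_z = logsumexp(logv[i][j] for i in range(1, n + 1) for j in range(1, m + 1))
    return log_z / beta


def smith_waterman(a, b, gap=2.0, score=toy_score):
    """Best local alignment score with the same scoring, for comparison."""
    n, m = len(a), len(b)
    h = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h[i][j] = max(0.0,
                          h[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          h[i - 1][j] - gap,
                          h[i][j - 1] - gap)
            best = max(best, h[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman("ACDEF", "ACEF"))         # 6.0
    print(free_score("ACDEF", "ACEF", beta=50.0))  # approximately 6.0
    print(free_score("ACDEF", "ACEF", beta=1.0))   # above 6.0: suboptimal paths contribute
```

Lower values of `beta` give more weight to suboptimal alignments, which is the property a free score exploits compared with a single best-alignment score.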

Algorithms for Molecular Biology | 2006

Effective p-value computations using Finite Markov Chain Imbedding (FMCI): application to local score and to pattern statistics.

Gregory Nuel

The technique of Finite Markov Chain Imbedding (FMCI) is a classical approach to complex combinatorial problems related to sequences. It is known that, to obtain efficient algorithms, such approaches must first be rewritten as recursive relations. Here we propose general recursive algorithms for computing, in a numerically stable manner, the exact Cumulative Distribution Function (CDF) or complementary CDF (CCDF). These algorithms are then applied to two particular cases: the local score of one sequence and pattern statistics. In both cases, asymptotic developments are derived. For the local score, our new approach makes it possible, for the very first time, to compute exact p-values for a practical study (finding hydrophobic segments in a protein database) where only approximations were previously available. In this study, the asymptotic approximations appear to be completely unreliable for 99.5% of the considered sequences. Concerning pattern statistics, the new FMCI algorithms dramatically outperform the previous ones, as they are more reliable, easier to implement, faster and less memory-intensive.
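The FMCI idea for pattern statistics can be illustrated with a minimal Python sketch: the number of (possibly overlapping) occurrences of a word in an i.i.d. random sequence is imbedded in a finite Markov chain whose states pair a pattern-matching automaton state with the occurrence count, capped at k so the chain stays finite. A forward recursion over the sequence then gives the exact complementary CDF P(N ≥ k). This is a plain forward recursion, not the numerically stabilised recursive algorithms of the article; the i.i.d. letter model and the function names are assumptions for illustration.

```python
from collections import defaultdict


def build_automaton(word, alphabet):
    """KMP-style automaton: state q is the length of the longest prefix of `word`
    that is a suffix of the text read so far (0 <= q <= len(word))."""
    m = len(word)
    delta = {}
    for q in range(m + 1):
        for c in alphabet:
            s = word[:q] + c
            k = min(m, len(s))
            while not s.endswith(word[:k]):  # word[:0] == "" always matches
                k -= 1
            delta[q, c] = k
    return delta


def prob_at_least(word, probs, n, k):
    """Exact P(N >= k), N = number of overlapping occurrences of a non-empty `word`
    in an i.i.d. sequence of length n with letter probabilities `probs`."""
    delta = build_automaton(word, probs)
    m = len(word)
    # Imbedded chain on (automaton state, occurrence count capped at k).
    dist = {(0, 0): 1.0}
    for _ in range(n):
        nxt = defaultdict(float)
        for (q, cnt), p in dist.items():
            for c, pc in probs.items():
                q2 = delta[q, c]
                # Reaching state m means one more occurrence has just ended.
                nxt[q2, min(k, cnt + (q2 == m))] += p * pc
        dist = nxt
    return sum(p for (_, cnt), p in dist.items() if cnt >= k)


if __name__ == "__main__":
    print(prob_at_least("AA", {"A": 0.5, "B": 0.5}, 3, 1))  # 0.375
```

The value 0.375 can be checked by hand: of the eight equally likely sequences of length 3, exactly three (AAA, AAB, BAA) contain AA. The chain has at most (len(word) + 1)(k + 1) states, so the cost grows linearly with the sequence length n.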

BMC Bioinformatics | 2010

Mining protein loops using a structural alphabet and statistical exceptionality

Leslie Regad; Juliette Martin; Gregory Nuel; Anne-Claude Camproux

Background: Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g. binding sites and catalytic pockets. However, describing protein loops with conventional tools is a difficult task. Regular secondary structures, helices and strands, have been widely studied, whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied. Results: We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA allows the simplification of a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment, called a structural letter. The difficult task of structurally grouping huge data sets is thus easily accomplished by handling structural letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to their structural-letter sequence, named a structural word. This approach permits a systematic analysis of loops of all sizes, since we consider structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop lengths are covered by only 3310 highly recurrent structural words (out of 28274 observed words). These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference but, interestingly, two thirds are shared by short (less than 12 residues) and long loops.
Moreover, half of the recurrent motifs exhibit a significant level of amino-acid conservation, with at least four significant positions, and 87% of long loops contain at least one such word. We complement our analysis with the detection of statistically over-represented patterns of structural letters, as in conventional DNA sequence analysis. About 30% (930) of structural words are over-represented and cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequential specificity, suggesting structural or functional constraints. Conclusions: We developed a method to systematically decompose and study protein loops using recurrent structural motifs. This method is based on the structural alphabet HMM-SA rather than on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, this is the first time that pattern mining has helped increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and might help reduce the complexity of long-loop analysis. Detailed results are available at http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/.
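The word-extraction and recurrence-counting step can be sketched in Python under an assumption not stated in the abstract: consecutive structural letters overlap by three residues, so that a seven-residue fragment corresponds to a word of four consecutive letters. Coverage is measured here over letter positions rather than residues, the toy letter strings are hypothetical, and the over-representation statistics of the study are not reproduced.

```python
from collections import Counter


def structural_words(letters, word_len=4):
    """All overlapping runs of `word_len` consecutive structural letters."""
    return [letters[i:i + word_len] for i in range(len(letters) - word_len + 1)]


def recurrent_words(loops, word_len=4, min_count=30):
    """Count words over all loops; keep those seen more than `min_count` times."""
    counts = Counter()
    for loop in loops:
        counts.update(structural_words(loop, word_len))
    return {w: c for w, c in counts.items() if c > min_count}


def coverage(loops, words, word_len=4):
    """Fraction of letter positions covered by at least one word in `words`."""
    covered = total = 0
    for loop in loops:
        mask = [False] * len(loop)
        for i, w in enumerate(structural_words(loop, word_len)):
            if w in words:
                mask[i:i + word_len] = [True] * word_len
        covered += sum(mask)
        total += len(loop)
    return covered / total if total else 0.0


if __name__ == "__main__":
    loops = ["ABCDEF", "XBCDEY"]  # toy structural-letter strings
    frequent = recurrent_words(loops, min_count=1)
    print(frequent, coverage(loops, frequent))
```

With these toy loops and `min_count=1`, only "BCDE" is kept (two occurrences), and it covers 8 of the 12 letter positions.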
