Serge Léger
National Research Council
Publications
Featured research published by Serge Léger.
BMC Bioinformatics | 2011
Dan Tulpan; Serge Léger; Luc Belliveau; Adrian S. Culf; Miroslava Cuperlovic-Culf
Background: One-dimensional 1H-NMR spectroscopy is widely used for high-throughput characterization of metabolites in complex biological mixtures. However, the accurate identification of individual compounds is still a challenging task, particularly in spectral regions with higher peak densities. The need for automatic tools that facilitate and further improve the accuracy of such tasks, while using increasingly larger reference spectral libraries, has become a priority of current metabolomics research.
Results: We introduce a web server application, called MetaboHunter, for automatic assignment of 1H-NMR spectra of metabolites. MetaboHunter identifies metabolites automatically from spectra or peak lists, using three different search methods and allowing for peak drift within a user-defined spectral range. The assignment uses as reference libraries manually curated data from two major publicly available databases of NMR metabolite standard measurements (HMDB and MMCD). Tests on a variety of synthetic and experimental spectra of single- and multi-metabolite mixtures show that MetaboHunter identifies, on average, more than 80% of detectable metabolites from spectra of synthetic mixtures and more than 50% from spectra of experimental mixtures. This work also suggests that better scoring functions improve the performance of MetaboHunter's metabolite identification methods by more than 30%.
Conclusions: MetaboHunter is a freely accessible, easy-to-use 1H-NMR-based web server application that provides efficient data input and pre-processing, flexible parameter settings, fast and automatic metabolite fingerprinting, and results visualization via intuitive plotting and compound peak hit maps. Compared to other published and freely accessible metabolomics tools, MetaboHunter implements three efficient methods to search for metabolites in manually curated data from two reference libraries.
Availability: http://www.nrcbioinformatics.ca/metabohunter/
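The idea of matching a peak list against reference libraries while tolerating peak drift can be illustrated with a minimal sketch. This is not MetaboHunter's actual scoring method, which the abstract does not specify; the function names, the tolerance value, and the two-entry library with its ppm values are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_left

def match_score(query_peaks, reference_peaks, tolerance=0.02):
    """Fraction of reference peaks with a query peak within +/- tolerance (ppm)."""
    if not reference_peaks:
        return 0.0
    query = sorted(query_peaks)
    hits = 0
    for ref in reference_peaks:
        # First query peak that is not below the lower edge of the drift window.
        i = bisect_left(query, ref - tolerance)
        if i < len(query) and query[i] <= ref + tolerance:
            hits += 1
    return hits / len(reference_peaks)

def rank_metabolites(query_peaks, library, tolerance=0.02):
    """Score every library entry and return (name, score) pairs, best first."""
    scores = {
        name: match_score(query_peaks, peaks, tolerance)
        for name, peaks in library.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical library: compound names and ppm positions are made up.
library = {
    "compound_A": [1.00, 2.50, 3.75],
    "compound_B": [1.20, 4.10],
}
query = [1.01, 2.49, 3.90, 4.11]

for name, score in rank_metabolites(query, library):
    print(f"{name}: {score:.2f}")
```

Widening `tolerance` admits more drift at the cost of more false matches in dense spectral regions, which is the trade-off the abstract alludes to.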
BioMed Research International | 2013
Dan Tulpan; Chaouki Regoui; Guillaume Durand; Luc Belliveau; Serge Léger
This paper presents a novel hybrid DNA encryption (HyDEn) approach that uses randomized assignments of unique error-correcting DNA Hamming code words for single characters in the extended ASCII set. HyDEn relies on custom-built quaternary codes and a private key used in the randomized assignment of code words and in the cyclic permutations applied to the encoded message. Along with its ability to detect and correct errors, HyDEn equals or outperforms existing cryptographic methods and represents a promising in silico DNA steganographic approach.
Chemical Science | 2011
Miroslava Cuperlovic-Culf; Ian C. Chute; Adrian S. Culf; Mohamed Touaibia; Anirban Ghosh; Steve Griffiths; Dan Tulpan; Serge Léger; Anissa Belkaid; Marc E. Surette; Rodney J. Ouellette
1H NMR analysis was performed on metabolic extracts from a selection of six breast cell lines, including normal-immortalized, invasive ductal carcinomas and adenocarcinomas. Metabolites with significant concentration differences between normal and cancerous cells, as well as between ER+ and ER− (estrogen receptor) cells, were determined and their relation to the differentially expressed genes was explored. Major differences were found for many amino acids, and these were linked to expression level changes of related genes. Observed changes in choline concentration were connected to expression level changes of the SLC44A1 transporter gene.
international conference on computational linguistics | 2014
Cyril Goutte; Serge Léger; Marine Carpuat
We describe the system built by the National Research Council Canada for the "Discriminating between similar languages" (DSL) shared task. Our system uses various statistical classifiers and makes predictions based on a two-stage process: we first predict the language group, then discriminate between languages or variants within the group. Language groups are predicted using a generative classifier with 99.99% accuracy on the five target groups. Within each group (except English), we use a voting combination of discriminative classifiers trained on a variety of feature spaces, achieving an average accuracy of 95.71%, with per-group accuracy between 90.95% and 100% depending on the group. This approach achieved the best performance among all systems submitted to the open and closed tasks.
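The two-stage process described above (predict the group first, then vote among group-specific classifiers) might be structured roughly as follows. This is only a sketch of the control flow: the toy rule-based classifiers, the group names, and the variant labels are hypothetical stand-ins for the trained statistical models, whose details the abstract does not give.

```python
from collections import Counter

def majority_vote(voters, text):
    """Return the label predicted by the largest number of voters."""
    votes = Counter(voter(text) for voter in voters)
    return votes.most_common(1)[0][0]

def two_stage_predict(text, predict_group, voters_by_group):
    """Stage 1 picks the group; stage 2 votes among that group's classifiers."""
    group = predict_group(text)
    voters = voters_by_group.get(group)
    if not voters:  # no second-stage classifiers registered for this group
        return group, None
    return group, majority_vote(voters, text)

# Hypothetical stage-1 classifier: a toy rule, not a generative model.
def predict_group(text):
    return "group_1" if "x" in text else "group_2"

# Hypothetical stage-2 voters for one group, each using a different toy "feature".
voters_by_group = {
    "group_1": [
        lambda t: "variant_a" if "a" in t else "variant_b",
        lambda t: "variant_a" if len(t) < 5 else "variant_b",
        lambda t: "variant_a" if t.startswith("x") else "variant_b",
    ],
}

print(two_stage_predict("xa", predict_group, voters_by_group))
print(two_stage_predict("hello", predict_group, voters_by_group))
```

Using an odd number of voters per group avoids ties in this simple scheme; a real system would more likely weight votes or combine classifier scores.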
BMC Bioinformatics | 2010
Dan Tulpan; Mirela Andronescu; Serge Léger
Background: Estimation of DNA duplex hybridization free energy is widely used for predicting cross-hybridizations in DNA computing and microarray experiments. A number of software programs based on different methods and parametrizations are available for the theoretical estimation of duplex free energies. However, significant differences in free energy values are sometimes observed among estimations obtained with various methods, making it difficult to decide which value is accurate.
Results: We present a quantitative comparison of the similarities and differences among four published DNA/DNA duplex free energy calculation methods and an extended Nearest-Neighbour Model for perfect matches based on triplet interactions. The comparison was performed on a benchmark data set of 695 pairs of short oligos that we collected and manually curated from 29 publications. Sequence lengths range from 4 to 30 nucleotides and span a large GC-content percentage range. For perfect matches, we propose an extension of the Nearest-Neighbour Model that matches or exceeds the performance of the existing ones, both in terms of correlations and root mean squared errors. The proposed model was trained on experimental data whose temperature, sodium and sequence concentration characteristics span a wide range of values, which gives the model greater power of generalization when used for free energy estimations of DNA duplexes under non-standard experimental conditions.
Conclusions: Based on our preliminary results, we conclude that no statistically significant differences exist among free energy approximations obtained with 4 publicly available and widely used programs, when benchmarked against a collection of 695 pairs of short oligos collected and curated by the authors of this work from 29 publications. The extended Nearest-Neighbour Model based on triplet interactions presented in this work is capable of performing accurate estimations of free energies for perfect match duplexes under both standard and non-standard experimental conditions and may serve as a baseline for further developments in this area of research.
BMC Genomics | 2015
Dan Tulpan; Serge Léger; Alain B. Tchagang; Youlian Pan
Background: While the gargantuan multi-nation effort of sequencing T. aestivum gets close to completion, the annotation process for the vast number of wheat genes and proteins is in its infancy. Previous experimental studies carried out on model plant organisms such as A. thaliana and O. sativa provide a plethora of gene annotations that can be used as potential starting points for wheat gene annotations, provided that solid cross-species gene-to-gene and protein-to-protein correspondences are available.
Results: DNA and protein sequences and corresponding annotations for T. aestivum and 9 other plant species were collected from Ensembl Plants release 22 and curated. Cliques of predicted 1-to-1 orthologs were identified and an annotation enrichment model was defined based on existing gene-GO term associations and phylogenetic relationships among wheat and 9 other plant species. A total of 13 cliques of size 10 were identified, which represent putative functionally equivalent genes and proteins in the 10 plant species. Eighty-five new and more specific GO terms were associated with wheat genes in the 13 cliques of size 10, which represents a 65% increase over the 130 previously known GO terms. Similar expression patterns for 4 genes from Arabidopsis, barley, maize and rice in cliques of size 10 provide experimental evidence to support our model. Overall, based on cliques of size 3 or larger, our model enriched the existing gene-GO term associations for 7,838 (8%) wheat genes, of which 2,139 had no previous annotation.
Conclusions: Our novel comparative genomics approach enriches existing T. aestivum gene annotations based on cliques of predicted 1-to-1 orthologs, phylogenetic relationships and existing gene ontologies from 9 other plant species.
workshop on innovative use of nlp for building educational applications | 2015
Cyril Goutte; Guillaume Durand; Serge Léger
A key aspect of cognitive diagnostic models is the specification of the Q-matrix associating the items and some underlying student attributes. In many data-driven approaches, test items are mapped to the underlying, latent knowledge components (KC) based on observed student performance, and with little or no input from human experts. As a result, these latent skills typically focus on modeling the data accurately, but may be hard to describe and interpret. In this paper, we focus on the problem of describing these knowledge components. Using a simple probabilistic model, we extract, from the text of the test items, some keywords that are most relevant to each KC. On a small dataset from the PSLC datashop, we show that this is surprisingly effective, retrieving unknown skill labels in close to 50% of cases. We also show that our method clearly outperforms typical baselines in specificity and diversity.
international conference on agents and artificial intelligence | 2018
Nabil Belacel; Guillaume Durand; Serge Léger; Cajetan Bouchard
Collaborative filtering (CF) is a well-known and successful filtering technique that has its own limits, especially in dealing with highly sparse and large-scale data. To address this scalability issue, some researchers propose clustering methods such as K-means, whose performance depends heavily on the manually defined number of clusters and on the selection of the initial centroids; ill-defined values lead to inaccurate recommendations and increased computation time. In this paper, we show how the Merging and Splitting clustering algorithm can improve recommendation performance with reasonable computation time by comparing it with a K-means-based approach. Our experimental results demonstrate that the performance of our system is independent of the initial partition, by considering the statistical nature of the data. More specifically, the results in this paper provide significant evidence that the proposed splitting-merging clustering-based CF is more scalable than the well-known K-means clustering.
learning analytics and knowledge | 2018
Guillaume Durand; Cyril Goutte; Nabil Belacel; Yassine Bouslimani; Serge Léger
Competency-based education (CBE) is seen by many as a way to optimize learning in terms of cost, efficiency and flexibility. However, defining the required competencies, assigning them to specific courses and building the assessments that evaluate students' proficiency can be tedious. More precisely, making sure that the assessments evaluate what they are supposed to evaluate requires a fair amount of psychometrics knowledge and time, which can be difficult for teachers to acquire, maintain and use. Addressing assessment validity, and more specifically the adequacy of competency framework mappings, we propose a rule-based tool to ease the building and refinement of CBE courses and curricula. After introducing the context and briefly reviewing related work, we present our set of rules before illustrating the capacity of the proposed diagnostic tool on an engineering curriculum. Experiments show that this tool can improve mapping adequacy in terms of predictive accuracy, and that more effort is needed on measuring the reliability of competency parameters.
artificial intelligence in education | 2018
Cyril Goutte; Guillaume Durand; Serge Léger
Learning curves are a crucial tool to accurately measure learners' skills and give meaningful feedback in intelligent tutoring systems. Here we discuss various ways of building learning curves from empirical data for the Additive Factor Model (AFM) and highlight their limitations. We focus on the impact of student attrition, a.k.a. attrition bias. We propose a new way to build learning curves, by combining empirical observations and AFM predictions. We validate this proposition on simulated data, and test it on real datasets.
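To make the two ingredients concrete, the sketch below contrasts an AFM-predicted learning curve with an empirical one. It assumes the standard single-skill AFM form, in which the log-odds of a correct answer equal student proficiency plus skill easiness plus a learning rate times the number of prior opportunities. It does not implement the combination method proposed in the paper, which the abstract does not detail; the function names and sample data are hypothetical.

```python
import math

def afm_p_correct(theta, beta, gamma, opportunity):
    """AFM success probability for an item involving a single skill.

    theta: student proficiency, beta: skill easiness,
    gamma: learning rate, opportunity: number of prior practice opportunities.
    """
    return 1.0 / (1.0 + math.exp(-(theta + beta + gamma * opportunity)))

def predicted_curve(thetas, beta, gamma, n_opportunities):
    """Mean predicted error rate per opportunity, averaged over ALL students."""
    return [
        sum(1.0 - afm_p_correct(th, beta, gamma, t) for th in thetas) / len(thetas)
        for t in range(n_opportunities)
    ]

def empirical_curve(outcomes):
    """Observed error rate per opportunity, with the number of students observed.

    outcomes: one list of 0/1 results per student; shorter lists mean the
    student stopped practising (attrition), so later points rest on fewer students.
    """
    longest = max(len(seq) for seq in outcomes)
    curve = []
    for t in range(longest):
        present = [seq[t] for seq in outcomes if len(seq) > t]
        curve.append((1.0 - sum(present) / len(present), len(present)))
    return curve

# Hypothetical data: three students, two of whom drop out early.
outcomes = [[0, 1, 1], [0, 1], [1]]
print(empirical_curve(outcomes))
print(predicted_curve([-1.0, 0.0, 1.0], beta=0.0, gamma=0.5, n_opportunities=4))
```

The shrinking student counts in the empirical curve show where attrition bias can enter: later points describe only the students who kept practising, whereas the model-based curve averages over everyone.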