Colin Molter
Université libre de Bruxelles
Publication
Featured research published by Colin Molter.
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2012
Cosmin Lazar; Jonatan Taminau; Stijn Meganck; D. Steenhoff; Alain Coletta; Colin Molter; V. de Schaetzen; Robin Duque; Hugues Bersini; Ann Nowé
A plenitude of feature selection (FS) methods is available in the literature, most of them arising from the need to analyze data of very high dimension, usually hundreds or thousands of variables. Such data sets are now available in various application areas like combinatorial chemistry, text mining, multivariate imaging, or bioinformatics. As a generally accepted rule, these methods are grouped into filters, wrappers, and embedded methods. More recently, a new group of methods has been added to the general framework of FS: ensemble techniques. The focus in this survey is on filter feature selection methods for informative feature discovery in gene expression microarray (GEM) analysis, which is also known as differentially expressed genes (DEGs) discovery, gene prioritization, or biomarker discovery. We present them in a unified framework, using standardized notations in order to reveal their technical details and to highlight their common characteristics as well as their particularities.
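To make the idea of a filter method concrete, the following is a minimal sketch of one common univariate filter: ranking genes by the absolute value of Welch's t-statistic between two sample classes. It is an illustration only, not a method taken from the survey; the function names (`welch_t`, `rank_genes`) and the toy expression values are assumptions introduced here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_genes(expr, labels):
    """Rank genes by |t| between class 0 and class 1 (a univariate filter).

    expr:   dict mapping gene name -> list of expression values, one per sample
    labels: list of 0/1 class labels aligned with the samples
    """
    scores = {}
    for gene, values in expr.items():
        group0 = [v for v, y in zip(values, labels) if y == 0]
        group1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(welch_t(group0, group1))
    # Highest score first: the most differentially expressed gene leads.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: 3 genes measured on 6 samples (3 per class).
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "geneA": [1.0, 1.2, 0.9, 3.1, 2.9, 3.3],  # strongly differential
    "geneB": [2.0, 2.5, 1.5, 2.1, 2.4, 1.6],  # no clear difference
    "geneC": [5.0, 5.5, 4.5, 6.0, 6.6, 5.4],  # weakly differential
}
ranking = rank_genes(expr, labels)
print(ranking)
```

Because each gene is scored independently of any classifier, the ranking is cheap to compute, which is the usual argument for filters over wrappers on data with thousands of variables.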
BMC Bioinformatics | 2012
Jonatan Taminau; Stijn Meganck; Cosmin Lazar; David Steenhoff; Alain Coletta; Colin Molter; Robin Duque; Virginie de Schaetzen; David Weiss Solís; Hugues Bersini; Ann Nowé
Background: With an abundant amount of microarray gene expression data sets available through public repositories, new possibilities lie in combining multiple existing data sets. In this new context, analysis itself is no longer the problem, but retrieving and consistently integrating all this data before delivering it to the wide variety of existing analysis tools becomes the new bottleneck. Results: We present the newly released inSilicoMerging R/Bioconductor package which, together with the earlier released inSilicoDb R/Bioconductor package, allows consistent retrieval, integration and analysis of publicly available microarray gene expression data sets. Inside the inSilicoMerging package a set of five visual and six quantitative validation measures are available as well. Conclusions: By providing (i) access to uniformly curated and preprocessed data, (ii) a collection of techniques to remove the batch effects between data sets from different sources, and (iii) several validation tools enabling the inspection of the integration process, these packages enable researchers to fully explore the potential of combining gene expression data for downstream analysis. The power of using both packages is demonstrated by programmatically retrieving and integrating gene expression studies from the InSilico DB repository [https://insilicodb.org/app/].
Genome Biology | 2012
Alain Coletta; Colin Molter; Robin Duque; David Steenhoff; Jonatan Taminau; Virginie de Schaetzen; Stijn Meganck; Cosmin Lazar; David Venet; Vincent Detours; Ann Nowé; Hugues Bersini; David Weiss Solís
Genomics datasets are increasingly useful for gaining biomedical insights, with adoption in the clinic underway. However, multiple hurdles related to data management stand in the way of their efficient large-scale utilization. The solution proposed is a web-based data storage hub. Having clear focus, flexibility and adaptability, InSilico DB seamlessly connects genomics dataset repositories to state-of-the-art and free GUI and command-line data analysis tools. The InSilico DB platform is a powerful collaborative environment, with advanced capabilities for biocuration, dataset sharing, and dataset subsetting and combination. InSilico DB is available from https://insilicodb.org.
Bioinformatics | 2011
Jonatan Taminau; David Steenhoff; Alain Coletta; Stijn Meganck; Cosmin Lazar; Virginie de Schaetzen; Robin Duque; Colin Molter; Hugues Bersini; Ann Nowé; David Weiss Solís
Microarray technology has become an integral part of biomedical research, and an increasing number of datasets is becoming available through public repositories. However, re-use of these datasets is severely hindered by unstructured, missing or incorrect biological sample information, as well as by the wide variety of preprocessing methods in use. The inSilicoDb R/Bioconductor package is a command-line front-end to the InSilico DB, a web-based database currently containing 86 104 expert-curated human Affymetrix expression profiles compiled from 1937 GEO repository series. The use of this package builds on the Bioconductor project's focus on reproducibility by enabling a clear workflow in which not only analysis, but also the retrieval of verified data is supported.
International Symposium on Neural Networks | 2003
Colin Molter; Hugues Bersini
This paper aims at experimentally confirming in very small Hopfield networks that chaos becomes the spontaneous dynamics of a network when the number of inputs to be learned and stored in its dynamical attractors exceeds the size of this network. The spontaneous dynamics of the network are shown to increase in complexity as the size of the learning set increases. The type of chaos exploited to code these inputs is related to the frustrated chaos described in previous papers. When, following a brute-force but robust learning of the inputs, the network is presented with a series of unlearned ambiguous inputs, experimental results show how chaotic regimes become the natural response of the network, reflecting in its frustrated dynamics the inherent ambiguity of the input. A live demonstration can be shown where the rhythms associated with the dynamics can be heard. This experimental work might give additional support to Skarda and Freeman's strong intuition that chaos should play an important role in the storage and the search capacities of our brains.
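For readers unfamiliar with Hopfield networks, the sketch below shows the classical textbook version: Hebbian storage of a few ±1 patterns in a symmetric weight matrix and recall by asynchronous updates. It is only background for the abstract above. It does not reproduce the paper's learning procedure or its chaotic dynamics, since a symmetric network updated asynchronously always settles into a fixed point. The function names and the two stored patterns are assumptions made for illustration.

```python
def train_hebbian(patterns):
    """Build a symmetric weight matrix from +/-1 patterns (Hebbian rule)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w, state, max_sweeps=20):
    """Update units one at a time until a full sweep changes nothing."""
    state = list(state)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(w[i][j] * state[j] for j in range(n))
            new = 1 if field >= 0 else -1
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state


# Two hypothetical, mutually orthogonal patterns stored in 8 units.
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
w = train_hebbian(stored)

# Probe: the first pattern with its first bit flipped.
probe = [-1, 1, 1, 1, -1, -1, -1, -1]
result = recall(w, probe)
print(result == stored[0])
```

With only two patterns in eight units the network is well below capacity, so the corrupted probe falls back into the stored attractor. The abstract concerns the opposite regime, where the learning set exceeds the network size.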
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2013
Cosmin Lazar; Jonatan Taminau; Stijn Meganck; David Steenhoff; Alain Coletta; David Weiss Solís; Colin Molter; Robin Duque; Hugues Bersini; Ann Nowé
The potential of microarray gene expression (MAGE) data is only partially explored due to the limited number of samples in individual studies. This limitation can be surmounted by merging or integrating data sets originating from independent MAGE experiments, which are designed to study the same biological problem. However, this process is hindered by batch effects that are study-dependent and result in random data distortion; therefore numerical transformations are needed to render the integration of different data sets accurate and meaningful. Our contribution in this paper is two-fold. First, we propose GENESHIFT, a new nonparametric batch effect removal method based on two key elements from statistics: empirical density estimation and the inner product as a distance measure between two probability density functions. Second, we introduce a new validation index of batch effect removal methods based on the observation that samples from two independent studies drawn from the same population should exhibit similar probability density functions. We evaluated and compared the GENESHIFT method with four other state-of-the-art methods for batch effect removal: batch-mean centering, empirical Bayes or COMBAT, distance-weighted discrimination, and cross-platform normalization. Several validation indices providing complementary information about the efficiency of batch effect removal methods have been employed in our validation framework. The results show that none of the methods clearly outperforms the others. Moreover, most of the methods used for comparison perform very well with respect to some validation indices while performing very poorly with respect to others. GENESHIFT exhibits robust performance and its average rank is the highest among the average ranks of all methods used for comparison.
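Of the baseline methods named above, batch-mean centering is the simplest to illustrate: within each study, every gene is shifted so that its mean is zero, and the centered studies are then concatenated. The sketch below assumes this standard definition. It is not GENESHIFT and not the paper's code, and the function names and toy values are hypothetical.

```python
from statistics import mean


def batch_mean_center(batch):
    """Center each gene within one batch so its mean becomes zero.

    batch: dict mapping gene name -> list of expression values (one per sample)
    """
    centered = {}
    for gene, values in batch.items():
        mu = mean(values)
        centered[gene] = [v - mu for v in values]
    return centered


def merge_batches(batches):
    """Center every batch separately, then concatenate samples gene by gene."""
    centered = [batch_mean_center(b) for b in batches]
    # Keep only genes measured in every batch.
    common = set.intersection(*(set(b) for b in centered))
    return {g: [v for b in centered for v in b[g]] for g in sorted(common)}


# Hypothetical toy data: study2 carries an additive offset of +10 per gene.
study1 = {"geneA": [1.0, 2.0, 3.0], "geneB": [4.0, 5.0, 6.0]}
study2 = {"geneA": [11.0, 12.0, 13.0], "geneB": [14.0, 15.0, 16.0]}

merged = merge_batches([study1, study2])
print(merged["geneA"])
```

Centering removes a purely additive, gene-wise offset such as the one in this toy data. It cannot correct differences in scale or distribution shape, which is one reason the abstract compares several methods against multiple validation indices.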
European Conference on Artificial Life | 2003
Colin Molter; Hugues Bersini
This paper aims at experimentally confirming in very small Hopfield networks that chaos becomes the spontaneous dynamics of a network when the number of inputs to be learned and stored in its dynamical attractors exceeds the size of this network. The spontaneous dynamics of the network are shown to increase in complexity as the size of the learning set increases. The type of chaos exploited to code these inputs is related to the frustrated chaos described in previous papers. When, following a brute-force but robust learning of the inputs, the network is presented with a series of unlearned ambiguous inputs, experimental results show how chaotic regimes become the natural response of the network, reflecting in its frustrated dynamics the inherent ambiguity of the input. A live demonstration can be shown where the rhythms associated with the dynamics can be heard. This experimental work might give additional support to Skarda and Freeman's strong intuition that chaos should play an important role in the storage and the search capacities of our brains.
Genome Biology | 2011
Alain Coletta; Colin Molter; Robin Duque; David Steenhoff; Jonatan Taminau; V de Schaetzen; Cosmin Lazar; Stijn Meganck; Ann Nowé; Hugues Bersini; D Weiss
There are more than 20,000 genomic studies comprising 500,000 samples freely available in the Gene Expression Omnibus (GEO) database [1]. However, accessing these data requires complex computational steps, including structuring and formatting the clinical vocabulary used to annotate the samples. These complex steps hinder the accessibility of genomic datasets through visualization and analysis software platforms, such as GenePattern and R/Bioconductor, therefore hampering the pace of research. InSilico DB [2] is an online platform that provides a complete collaborative solution for structuring and formatting clinical annotations from GEO, making GenePattern and R datasets one click away for researchers. InSilico DB has made available powerful and intuitive online curation tools to structure the metadata of GEO datasets. The database is automatically updated daily, through GEO import pipelines. Datasets can have multiple annotations given by different users, and one user can have multiple versions of an annotation to suit different experimental questions. The InSilico DB platform supports datasets from Affymetrix human gene expression platforms, which account for 2,900 studies comprising 110,000 samples, making InSilico DB the largest public database of manually curated human gene expression samples. In addition to the web interface, InSilico DB offers programmatic access through an R/Bioconductor package [3]. Future releases of InSilico DB will include Illumina RNA-Seq platform data and Affymetrix mouse gene expression data.
Briefings in Bioinformatics | 2013
Cosmin Lazar; Stijn Meganck; Jonatan Taminau; David Steenhoff; Alain Coletta; Colin Molter; David Y. Weiss-Solís; Robin Duque; Hugues Bersini; Ann Nowé
International Symposium on Neural Networks | 2005
Colin Molter; Utku Salihoglu; Hugues Bersini