Cosmin Lazar
Vrije Universiteit Brussel
Publication
Featured research published by Cosmin Lazar.
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2012
Cosmin Lazar; Jonatan Taminau; Stijn Meganck; D. Steenhoff; Alain Coletta; Colin Molter; V. de Schaetzen; Robin Duque; Hugues Bersini; Ann Nowé
A plenitude of feature selection (FS) methods is available in the literature, most of them arising from the need to analyze data of very high dimension, usually hundreds or thousands of variables. Such data sets are now available in various application areas such as combinatorial chemistry, text mining, multivariate imaging, and bioinformatics. As a generally accepted rule, these methods are grouped into filters, wrappers, and embedded methods. More recently, a new group of methods has been added to the general framework of FS: ensemble techniques. The focus of this survey is on filter feature selection methods for informative feature discovery in gene expression microarray (GEM) analysis, which is also known as differentially expressed genes (DEGs) discovery, gene prioritization, or biomarker discovery. We present them in a unified framework, using standardized notations in order to reveal their technical details and to highlight their common characteristics as well as their particularities.
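As a minimal illustration of what a filter method does (not a method taken from the survey itself), the sketch below scores each gene independently with Welch's t-statistic between two sample classes and keeps the top-ranked genes. The function names `welch_t` and `rank_features` and the toy data are assumptions made for this example only.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples (unequal variances allowed)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_features(X, y, k):
    """Return the indices of the k features with the largest |t| score.

    X: list of samples, each a list of feature (gene expression) values.
    y: list of binary class labels (0 or 1), one per sample.
    Each feature is scored on its own, independent of any classifier,
    which is what makes this a filter rather than a wrapper.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        group0 = [row[j] for row, label in zip(X, y) if label == 0]
        group1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(abs(welch_t(group0, group1)))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:k]


# Hypothetical toy data: 6 samples, 3 "genes"; only gene 0 differs by class.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 5.1, 0.9],
    [0.9, 4.9, 0.4],
    [3.0, 5.0, 0.3],
    [3.1, 5.2, 0.8],
    [2.9, 4.8, 0.5],
]
y = [0, 0, 0, 1, 1, 1]
top = rank_features(X, y, k=1)
print(top)
```

On this toy data the first gene is the only one whose class means differ clearly, so it is ranked first. A real analysis would also need multiple-testing correction, which this sketch omits.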
BMC Bioinformatics | 2012
Jonatan Taminau; Stijn Meganck; Cosmin Lazar; David Steenhoff; Alain Coletta; Colin Molter; Robin Duque; Virginie de Schaetzen; David Weiss Solís; Hugues Bersini; Ann Nowé
Background: With an abundant amount of microarray gene expression data sets available through public repositories, new possibilities lie in combining multiple existing data sets. In this new context, analysis itself is no longer the problem, but retrieving and consistently integrating all this data before delivering it to the wide variety of existing analysis tools becomes the new bottleneck. Results: We present the newly released inSilicoMerging R/Bioconductor package which, together with the earlier released inSilicoDb R/Bioconductor package, allows consistent retrieval, integration and analysis of publicly available microarray gene expression data sets. Inside the inSilicoMerging package a set of five visual and six quantitative validation measures are available as well. Conclusions: By providing (i) access to uniformly curated and preprocessed data, (ii) a collection of techniques to remove the batch effects between data sets from different sources, and (iii) several validation tools enabling the inspection of the integration process, these packages enable researchers to fully explore the potential of combining gene expression data for downstream analysis. The power of using both packages is demonstrated by programmatically retrieving and integrating gene expression studies from the InSilico DB repository [https://insilicodb.org/app/].
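To make the idea of batch effect removal concrete, here is a minimal sketch of one simple technique of this kind, per-batch mean-centering of each gene. It is written in Python for illustration and does not use or reproduce the inSilicoMerging API; the function name `batch_mean_center` and the toy values are assumptions.

```python
from statistics import mean


def batch_mean_center(batches):
    """Center every gene to zero mean within each batch, then merge.

    batches: list of data sets; each data set is a list of samples, and
             each sample is a list of expression values (same genes, in
             the same order, across all data sets).
    Returns one merged list of centered samples.
    """
    merged = []
    for samples in batches:
        n_genes = len(samples[0])
        # Per-gene mean computed inside this batch only.
        gene_means = [mean(s[g] for s in samples) for g in range(n_genes)]
        for s in samples:
            merged.append([s[g] - gene_means[g] for g in range(n_genes)])
    return merged


# Hypothetical toy data: study_b is study_a shifted by a constant offset,
# mimicking a systematic difference between two sources.
study_a = [[5.0, 2.0], [7.0, 4.0]]
study_b = [[15.0, 12.0], [17.0, 14.0]]
merged = batch_mean_center([study_a, study_b])
print(merged)
```

After centering, the constant offset between the two toy studies disappears and their samples become directly comparable. Mean-centering only removes additive shifts; it does not address differences in scale or distribution shape.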
Genome Biology | 2012
Alain Coletta; Colin Molter; Robin Duque; David Steenhoff; Jonatan Taminau; Virginie de Schaetzen; Stijn Meganck; Cosmin Lazar; David Venet; Vincent Detours; Ann Nowé; Hugues Bersini; David Weiss Solís
Genomics datasets are increasingly useful for gaining biomedical insights, with adoption in the clinic underway. However, multiple hurdles related to data management stand in the way of their efficient large-scale utilization. The solution proposed is a web-based data storage hub. Having clear focus, flexibility and adaptability, InSilico DB seamlessly connects genomics dataset repositories to state-of-the-art and free GUI and command-line data analysis tools. The InSilico DB platform is a powerful collaborative environment, with advanced capabilities for biocuration, dataset sharing, and dataset subsetting and combination. InSilico DB is available from https://insilicodb.org.
Bioinformatics | 2011
Jonatan Taminau; David Steenhoff; Alain Coletta; Stijn Meganck; Cosmin Lazar; Virginie de Schaetzen; Robin Duque; Colin Molter; Hugues Bersini; Ann Nowé; David Weiss Solís
Microarray technology has become an integral part of biomedical research and an increasing number of datasets is becoming available through public repositories. However, re-use of these datasets is severely hindered by unstructured, missing or incorrect biological sample information, as well as by the wide variety of preprocessing methods in use. The inSilicoDb R/Bioconductor package is a command-line front-end to the InSilico DB, a web-based database currently containing 86 104 expert-curated human Affymetrix expression profiles compiled from 1937 GEO repository series. The use of this package builds on the Bioconductor project's focus on reproducibility by enabling a clear workflow in which not only analysis, but also the retrieval of verified data is supported.
International Scholarly Research Notices | 2014
Jonatan Taminau; Cosmin Lazar; Stijn Meganck; Ann Nowé
An increasing number of microarray gene expression data sets are available through public repositories. Their huge potential for new findings is yet to be unlocked by making them available for large-scale analysis. To do so, it is essential that independent studies designed for similar biological problems can be integrated, so that new insights can be obtained. These insights would remain undiscovered when analyzing the individual data sets, because it is well known that the small number of biological samples used per experiment is a bottleneck in genomic analysis. By increasing the number of samples, the statistical power is increased and more general and reliable conclusions can be drawn. In this work, two different approaches for conducting large-scale analysis of microarray gene expression data, meta-analysis and data merging, are compared in the context of the identification of cancer-related biomarkers, by analyzing six independent lung cancer studies. Within this study, we investigate the hypothesis that analyzing large cohorts of samples resulting from merging independent data sets designed to study the same biological problem results in lower false discovery rates than analyzing the same data sets within a more conservative meta-analysis approach.
Journal of Chemical Information and Modeling | 2012
Yunierkis Pérez-Castillo; Cosmin Lazar; Jonatan Taminau; Matheus Froeyen; Miguel Ángel Cabrera-Pérez; Ann Nowé
Computer-aided drug design has become an important component of the drug discovery process. Despite the advances in this field, there is no single modeling approach that can be successfully applied to solve the whole range of problems faced during QSAR modeling. Feature selection and ensemble modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR algorithm that combines the search and optimization capabilities of Genetic Algorithms with the simplicity of the Adaboost ensemble-based classification algorithm to solve binary classification problems. We also explore the usefulness of Meta-Ensembles trained with Adaboost and Voting schemes to further improve the accuracy, generalization, and robustness of the optimal Adaboost Single Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our algorithm using five data sets from the literature and found that it is capable of yielding classification results similar to or better than those reported for these data sets, with a higher enrichment of active compounds relative to the whole actives subset when only the most active chemicals are considered. More importantly, we compared our methodology with state-of-the-art feature selection and classification approaches and found that it can provide highly accurate, robust, and generalizable models. In the case of the Adaboost Ensembles derived from the Genetic Algorithm search, the final models are quite simple since they consist of a weighted sum of the output of single feature classifiers. Furthermore, the Adaboost scores can be used as a ranking criterion to prioritize chemicals for synthesis and biological evaluation after virtual screening experiments.
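The "weighted sum of the output of single feature classifiers" can be illustrated with a plain discrete AdaBoost over one-feature threshold stumps. This is a generic sketch, not the GA(M)E-QSAR algorithm: the Genetic Algorithm search is left out, the stumps are found by exhaustive search instead, and all function names and data are assumptions.

```python
from math import exp, log


def stump_predict(x, feature, threshold, sign):
    """Single-feature classifier: returns sign if x[feature] > threshold, else -sign."""
    return sign if x[feature] > threshold else -sign


def best_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for sign in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, feature, threshold, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, sign)
    return best


def adaboost(X, y, rounds):
    """Train discrete AdaBoost; labels must be +1 (active) or -1 (inactive)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, feature, threshold, sign = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * log((1 - err) / err)
        model.append((alpha, feature, threshold, sign))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * exp(-alpha * yi * stump_predict(xi, feature, threshold, sign))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model, x):
    """Weighted sum of stump outputs; usable as a ranking criterion."""
    return sum(a * stump_predict(x, f, t, s) for a, f, t, s in model)


def predict(model, x):
    return 1 if score(model, x) > 0 else -1


# Hypothetical descriptors: the second feature separates the two classes.
X = [[0.5, 1.0], [0.1, 1.5], [0.9, 2.0], [0.3, 3.0], [0.7, 3.5], [0.2, 4.0]]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost(X, y, rounds=3)
print([predict(model, x) for x in X])
```

On this toy data a single stump on the second feature already separates the classes, so every round selects it. The `score` function returns the real-valued ensemble output, which is the kind of quantity that could be used to rank compounds rather than only classify them.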
Molecular Diversity | 2014
Yunierkis Pérez-Castillo; Maykel Cruz-Monteagudo; Cosmin Lazar; Jonatan Taminau; Mathy Froeyen; Miguel Ángel Cabrera-Pérez; Ann Nowé
Antibiotic resistance has increased over the past two decades. New approaches for the discovery of novel antibacterials are required and innovative strategies will be necessary to identify novel and effective candidates. Related to this problem, the exploration of bacterial targets that remain unexploited by the current antibiotics in clinical use is required. One of such targets is the
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2013
Cosmin Lazar; Jonatan Taminau; Stijn Meganck; David Steenhoff; Alain Coletta; David Weiss Solís; Colin Molter; Robin Duque; Hugues Bersini; Ann Nowé
International Conference on Latent Variable Analysis and Signal Separation | 2010
Cosmin Lazar; Danielle Nuzillard; Ann Nowé
International Conference on Computational Systems-Biology and Bioinformatics | 2010
Jonatan Taminau; Stijn Meganck; Cosmin Lazar; David Y. Weiss-Solís; Alain Coletta; Nic Walker; Hugues Bersini; Ann Nowé