Cosmin Lazar | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Cosmin Lazar is active.

Explore More

Publication

Featured researches published by Cosmin Lazar.

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2012

A Survey on Filter Techniques for Feature Selection in Gene Expression Microarray Analysis

Cosmin Lazar; Jonatan Taminau; Stijn Meganck; D. Steenhoff; Alain Coletta; Colin Molter; V. de Schaetzen; Robin Duque; Hugues Bersini; Ann Nowé

A plenitude of feature selection (FS) methods is available in the literature, most of them rising as a need to analyze data of very high dimension, usually hundreds or thousands of variables. Such data sets are now available in various application areas like combinatorial chemistry, text mining, multivariate imaging, or bioinformatics. As a general accepted rule, these methods are grouped in filters, wrappers, and embedded methods. More recently, a new group of methods has been added in the general framework of FS: ensemble techniques. The focus in this survey is on filter feature selection methods for informative feature discovery in gene expression microarray (GEM) analysis, which is also known as differentially expressed genes (DEGs) discovery, gene prioritization, or biomarker discovery. We present them in a unified framework, using standardized notations in order to reveal their technical details and to highlight their common characteristics as well as their particularities.

BMC Bioinformatics | 2012

Unlocking the potential of publicly available microarray data using inSilicoDb and inSilicoMerging R/Bioconductor packages

Jonatan Taminau; Stijn Meganck; Cosmin Lazar; David Steenhoff; Alain Coletta; Colin Molter; Robin Duque; Virginie de Schaetzen; David Weiss Solís; Hugues Bersini; Ann Nowé

BackgroundWith an abundant amount of microarray gene expression data sets available through public repositories, new possibilities lie in combining multiple existing data sets. In this new context, analysis itself is no longer the problem, but retrieving and consistently integrating all this data before delivering it to the wide variety of existing analysis tools becomes the new bottleneck.ResultsWe present the newly released inSilicoMerging R/Bioconductor package which, together with the earlier released inSilicoDb R/Bioconductor package, allows consistent retrieval, integration and analysis of publicly available microarray gene expression data sets. Inside the inSilicoMerging package a set of five visual and six quantitative validation measures are available as well.ConclusionsBy providing (i) access to uniformly curated and preprocessed data, (ii) a collection of techniques to remove the batch effects between data sets from different sources, and (iii) several validation tools enabling the inspection of the integration process, these packages enable researchers to fully explore the potential of combining gene expression data for downstream analysis. The power of using both packages is demonstrated by programmatically retrieving and integrating gene expression studies from the InSilico DB repository [https://insilicodb.org/app/].

Genome Biology | 2012

InSilico DB genomic datasets hub: an efficient starting point for analyzing genome-wide studies in GenePattern, Integrative Genomics Viewer, and R/Bioconductor

Alain Coletta; Colin Molter; Robin Duque; David Steenhoff; Jonatan Taminau; Virginie de Schaetzen; Stijn Meganck; Cosmin Lazar; David Venet; Vincent Detours; Ann Nowé; Hugues Bersini; David Weiss Solís

Genomics datasets are increasingly useful for gaining biomedical insights, with adoption in the clinic underway. However, multiple hurdles related to data management stand in the way of their efficient large-scale utilization. The solution proposed is a web-based data storage hub. Having clear focus, flexibility and adaptability, InSilico DB seamlessly connects genomics dataset repositories to state-of-the-art and free GUI and command-line data analysis tools. The InSilico DB platform is a powerful collaborative environment, with advanced capabilities for biocuration, dataset sharing, and dataset subsetting and combination. InSilico DB is available from https://insilicodb.org.

Bioinformatics | 2011

inSilicoDb: an R/Bioconductor package for accessing human Affymetrix expert-curated datasets from GEO

Jonatan Taminau; David Steenhoff; Alain Coletta; Stijn Meganck; Cosmin Lazar; Virginie de Schaetzen; Robin Duque; Colin Molter; Hugues Bersini; Ann Nowé; David Weiss Solís

Microarray technology has become an integral part of biomedical research and increasing amounts of datasets become available through public repositories. However, re-use of these datasets is severely hindered by unstructured, missing or incorrect biological samples information; as well as the wide variety of preprocessing methods in use. The inSilicoDb R/Bioconductor package is a command-line front-end to the InSilico DB, a web-based database currently containing 86 104 expert-curated human Affymetrix expression profiles compiled from 1937 GEO repository series. The use of this package builds on the Bioconductor projects focus on reproducibility by enabling a clear workflow in which not only analysis, but also the retrieval of verified data is supported.

International Scholarly Research Notices | 2014

Comparison of merging and meta-analysis as alternative approaches for integrative gene expression analysis.

Jonatan Taminau; Cosmin Lazar; Stijn Meganck; Ann Nowé

An increasing amount of microarray gene expression data sets is available through public repositories. Their huge potential in making new findings is yet to be unlocked by making them available for large-scale analysis. In order to do so it is essential that independent studies designed for similar biological problems can be integrated, so that new insights can be obtained. These insights would remain undiscovered when analyzing the individual data sets because it is well known that the small number of biological samples used per experiment is a bottleneck in genomic analysis. By increasing the number of samples the statistical power is increased and more general and reliable conclusions can be drawn. In this work, two different approaches for conducting large-scale analysis of microarray gene expression data—meta-analysis and data merging—are compared in the context of the identification of cancer-related biomarkers, by analyzing six independent lung cancer studies. Within this study, we investigate the hypothesis that analyzing large cohorts of samples resulting in merging independent data sets designed to study the same biological problem results in lower false discovery rates than analyzing the same data sets within a more conservative meta-analysis approach.

Journal of Chemical Information and Modeling | 2012

GA(M)E-QSAR: A Novel, Fully Automatic Genetic-Algorithm-(Meta)-Ensembles Approach for Binary Classification in Ligand-Based Drug Design

Yunierkis Pérez-Castillo; Cosmin Lazar; Jonatan Taminau; Matheus Froeyen; Miguel Ángel Cabrera-Pérez; Ann Nowé

Computer-aided drug design has become an important component of the drug discovery process. Despite the advances in this field, there is not a unique modeling approach that can be successfully applied to solve the whole range of problems faced during QSAR modeling. Feature selection and ensemble modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR algorithm that combines the search and optimization capabilities of Genetic Algorithms with the simplicity of the Adaboost ensemble-based classification algorithm to solve binary classification problems. We also explore the usefulness of Meta-Ensembles trained with Adaboost and Voting schemes to further improve the accuracy, generalization, and robustness of the optimal Adaboost Single Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our algorithm using five data sets from the literature and found that it is capable of yielding similar or better classification results to what has been reported for these data sets with a higher enrichment of active compounds relative to the whole actives subset when only the most active chemicals are considered. More important, we compared our methodology with state of the art feature selection and classification approaches and found that it can provide highly accurate, robust, and generalizable models. In the case of the Adaboost Ensembles derived from the Genetic Algorithm search, the final models are quite simple since they consist of a weighted sum of the output of single feature classifiers. Furthermore, the Adaboost scores can be used as ranking criterion to prioritize chemicals for synthesis and biological evaluation after virtual screening experiments.

Molecular Diversity | 2014

Toward the computer-aided discovery of FabH inhibitors. Do predictive QSAR models ensure high quality virtual screening performance?

Yunierkis Pérez-Castillo; Maykel Cruz-Monteagudo; Cosmin Lazar; Jonatan Taminau; Mathy Froeyen; Miguel Ángel Cabrera-Pérez; Ann Nowé

Antibiotic resistance has increased over the past two decades. New approaches for the discovery of novel antibacterials are required and innovative strategies will be necessary to identify novel and effective candidates. Related to this problem, the exploration of bacterial targets that remain unexploited by the current antibiotics in clinical use is required. One of such targets is the

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2013