Céline Brouard
Helsinki Institute for Information Technology
Publication
Featured research published by Céline Brouard.
Bioinformatics | 2016
Céline Brouard; Huibin Shen; Kai Dührkop; Florence d'Alché-Buc; Sebastian Böcker; Juho Rousu
Motivation: An important problem in metabolomics is the identification of metabolites from tandem mass spectrometry data. Machine learning methods have recently been proposed to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular structure databases. In this work we propose to address the metabolite identification problem using a structured output prediction approach. This type of approach is not limited to vector output spaces and can handle structured output spaces such as the space of molecules. Results: We use the Input Output Kernel Regression method to learn the mapping between tandem mass spectra and molecular structures. The principle of this method is to encode the similarities in the input (spectra) space and the similarities in the output (molecule) space using two kernel functions. This method approximates the spectra-molecule mapping in two phases. The first phase corresponds to a regression problem from the input space to the feature space associated with the output kernel. The second phase is a preimage problem, which consists of mapping the predicted output feature vectors back to the molecule space. We show that our approach achieves state-of-the-art accuracy in metabolite identification. Moreover, our method has the advantage of decreasing the running times for the training step and the test step by several orders of magnitude over the preceding methods. Availability and implementation: Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
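The two phases described in the abstract above can be illustrated with a minimal sketch. It assumes kernel ridge regression for the first phase and an exhaustive search over a finite candidate list for the preimage phase. The Gaussian input kernel, the Tanimoto output kernel on fingerprint bit sets, the toy data, and every function name are illustrative assumptions, not the authors' implementation.

```python
import math

def gaussian_kernel(a, b, gamma=1.0):
    """Input kernel on numeric vectors (an assumed choice)."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))

def tanimoto_kernel(a, b):
    """Output kernel on sets of fingerprint bits (an assumed choice)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def iokr_predict(x_new, X_train, Y_train, candidates, lam=0.1):
    n = len(X_train)
    # Phase 1: kernel ridge regression into the output feature space.
    # alpha = (K_x + lam * I)^{-1} k_x(x_new) weights the training outputs.
    K = [[gaussian_kernel(X_train[i], X_train[j]) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_new = [gaussian_kernel(x, x_new) for x in X_train]
    alpha = solve(K, k_new)

    # Phase 2: preimage step. With a normalized output kernel, the closest
    # candidate maximizes the inner product with the predicted feature vector.
    def score(y):
        return sum(a * tanimoto_kernel(y_i, y) for a, y_i in zip(alpha, Y_train))

    return max(candidates, key=score)

# Toy data: 2-D "spectra" and fingerprint bit sets standing in for molecules.
X_train = [(0.0, 0.0), (1.0, 1.0), (3.0, 3.0)]
Y_train = [frozenset({1, 2}), frozenset({2, 3}), frozenset({7, 8})]
candidates = [frozenset({1, 2}), frozenset({7, 8, 9}), frozenset({7, 8})]
predicted = iokr_predict((2.9, 3.1), X_train, Y_train, candidates)
```

In this sketch the output structures are never embedded explicitly: only kernel values between candidates and training outputs are needed, which is what lets the approach work with non-vector outputs.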
BMC Bioinformatics | 2013
Céline Brouard; Christel Vrain; Julie Dubois; David Castel; Marie-Anne Debily; Florence d'Alché-Buc
Background: Gene regulatory network inference remains a challenging problem in systems biology despite the numerous approaches that have been proposed. When substantial knowledge on a gene regulatory network is already available, supervised network inference is appropriate. Such a method builds a binary classifier able to assign a class (Regulation/No regulation) to an ordered pair of genes. Once learnt, the pairwise classifier can be used to predict new regulations. In this work, we explore the framework of Markov Logic Networks (MLN) that combine features of probabilistic graphical models with the expressivity of first-order logic rules. Results: We propose to learn a Markov Logic Network, i.e. a set of weighted rules that conclude on the predicate "regulates", starting from a known gene regulatory network involved in the switch proliferation/differentiation of keratinocyte cells, a set of experimental transcriptomic data and various descriptions of genes all encoded into first-order logic. As training data are unbalanced, we use asymmetric bagging to learn a set of MLNs. The prediction of a new regulation can then be obtained by averaging predictions of individual MLNs. As a side contribution, we propose three in silico tests to assess the performance of any pairwise classifier in various network inference tasks on real datasets. A first test consists of measuring the average performance on a balanced edge prediction problem; a second one deals with the ability of the classifier, once enhanced by asymmetric bagging, to update a given network. Finally, our main result concerns a third test that measures the ability of the method to predict regulations with a new set of genes. As expected, MLN, when provided with only numerical discretized gene expression data, does not perform as well as a pairwise SVM in terms of AUPR. However, when a more complete description of gene properties is provided by heterogeneous sources, MLN achieves the same performance as a black-box model such as a pairwise SVM while providing relevant insights on the predictions. Conclusions: The numerical studies show that MLN achieves very good predictive performance while opening the door to some interpretability of the decisions. Besides the ability to suggest new regulations, such an approach allows experimental data to be cross-validated against existing knowledge.
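The asymmetric bagging step mentioned in the abstract above can be sketched as follows, under the common reading that every bag keeps all examples of the minority class and draws a bootstrap sample of equal size from the majority class, with predictions averaged over bags. The base learner here is a hypothetical one-dimensional nearest-centroid rule standing in for a Markov Logic Network; the data and all names are illustrative assumptions.

```python
import random

def train_centroid(pos, neg):
    """Hypothetical stand-in base learner (not an MLN): nearest centroid in 1-D."""
    c_pos = sum(pos) / len(pos)
    c_neg = sum(neg) / len(neg)

    def predict(x):
        return 1.0 if abs(x - c_pos) < abs(x - c_neg) else 0.0

    return predict

def asymmetric_bagging(pos, neg, n_bags=25, seed=0):
    """Train one model per bag: all positives plus a bootstrap of negatives."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_bags):
        # Resample only the majority (negative) class, down to the minority size.
        neg_sample = [rng.choice(neg) for _ in range(len(pos))]
        models.append(train_centroid(pos, neg_sample))

    def predict(x):
        # Average the individual predictions to obtain a score in [0, 1].
        return sum(m(x) for m in models) / len(models)

    return predict

# Toy unbalanced data: 3 positives ("Regulation") and 50 negatives.
positives = [0.9, 1.0, 1.1]
negatives = [-1.0 + 0.02 * i for i in range(50)]
ensemble = asymmetric_bagging(positives, negatives)
score_pos = ensemble(0.95)
score_neg = ensemble(-0.5)
```

Each bag is balanced, so no single model is dominated by the majority class, while averaging over bags still uses most of the negative examples.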
PLOS ONE | 2016
Jana Kludas; Mikko Arvas; Sandra Castillo; Tiina Pakula; Merja Oja; Céline Brouard; Jussi Jäntti; Merja Penttilä; Juho Rousu
In this paper we apply machine learning methods for predicting protein interactions in fungal secretion pathways. We assume an inter-species transfer setting, where training data is obtained from a single species and the objective is to predict protein interactions in other, related species. In our methodology, we combine several state-of-the-art machine learning approaches, namely, multiple kernel learning (MKL), pairwise kernels and kernelized structured output prediction in the supervised graph inference framework. For MKL, we apply recently proposed centered kernel alignment and p-norm path following approaches to integrate several feature sets describing the proteins, demonstrating improved performance. For graph inference, we apply input-output kernel regression (IOKR) in supervised and semi-supervised modes as well as output kernel trees (OK3). In our experiments simulating increasing genetic distance, IOKR proved to be the most robust prediction approach. We also show that the MKL approaches improve the predictions compared to a uniform combination of the kernels. We evaluate the methods on the task of predicting protein-protein interactions in the secretion pathways of fungi, with S. cerevisiae (baker's yeast) as the source and T. reesei as the target of the inter-species transfer learning. We identify completely novel candidate secretion proteins conserved in filamentous fungi. These proteins could contribute to their unique secretion capabilities.
Bioinformatics | 2018
Eric Bach; Sandor Szedmak; Céline Brouard; Sebastian Böcker; Juho Rousu
Motivation: Liquid Chromatography (LC) followed by tandem Mass Spectrometry (MS/MS) is one of the predominant methods for metabolite identification. In recent years, machine learning has started to transform the analysis of tandem mass spectra and the identification of small molecules. In contrast, LC data is rarely used to improve metabolite identification, despite numerous published methods for retention time prediction using machine learning. Results: We present a machine learning method for predicting the retention order of molecules; that is, the order in which molecules elute from the LC column. Our method has important advantages over previous approaches: We show that retention order is much better conserved between instruments than retention time. To this end, our method can be trained using retention time measurements from different LC systems and configurations without tedious pre-processing, significantly increasing the amount of available training data. Our experiments demonstrate that retention order prediction is an effective way to learn retention behaviour of molecules from heterogeneous retention time data. Finally, we demonstrate how retention order prediction and MS/MS-based scores can be combined for more accurate metabolite identifications when analyzing a complete LC-MS/MS run. Availability and implementation: Implementation of the method is available at https://version.aalto.fi/gitlab/bache1/retention_order_prediction.git.
Discovery Science | 2016
Huibin Shen; Sandor Szedmak; Céline Brouard; Juho Rousu
Two-stage multiple kernel learning (MKL) algorithms have gained popularity due to their simplicity and modularity. In this paper, we focus on two recently proposed two-stage MKL algorithms: ALIGNF and TSMKL. We first show through a simple vectorization of the input and target kernels that ALIGNF corresponds to a non-negative least squares problem and TSMKL to a non-negative SVM in the transformed space. Then we propose ALIGNF+, a soft version of ALIGNF, based on the observation that the dual problem of ALIGNF is essentially a one-class SVM problem. It turns out that ALIGNF+ just requires an upper bound on the kernel weights of the original ALIGNF. This upper bound makes ALIGNF+ interpolate between ALIGNF and the uniform combination of kernels. Our experiments demonstrate favorable performance and improved robustness of ALIGNF+ compared to ALIGNF. Experimental data and code written in Python are freely available on GitHub (https://github.com/aalto-ics-kepaco/softALIGNF).
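The vectorization argument in the abstract above can be made concrete with a small sketch: each centered input kernel is flattened into one column, the target kernel into a vector, and the kernel weights are obtained from a non-negative least squares fit. The solver below is plain projected gradient descent, chosen only to keep the example dependency-free; it is not the solver from the paper or its repository. The optional `upper` argument imitates the upper bound that the abstract attributes to ALIGNF+, and all names and data are illustrative assumptions.

```python
def center(K):
    """Center a kernel matrix: K - row means - column means + grand mean."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)]
            for i in range(n)]

def frob(A, B):
    """Frobenius inner product, i.e. the dot product of vectorized matrices."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def linear_kernel(f):
    """Linear kernel of a 1-D feature."""
    return [[u * v for v in f] for u in f]

def nnls_kernel_weights(kernels, target, upper=None, n_iter=500):
    """Minimize ||sum_k mu_k vec(Kc_k) - vec(target)||^2 subject to mu >= 0."""
    Kc = [center(K) for K in kernels]
    M = [[frob(A, B) for B in Kc] for A in Kc]   # Gram matrix of the columns
    a = [frob(A, target) for A in Kc]            # columns against the target
    step = 1.0 / sum(M[k][k] for k in range(len(Kc)))  # 1 / trace(M) is a safe step
    mu = [0.0] * len(Kc)
    for _ in range(n_iter):
        grad = [sum(M[k][l] * mu[l] for l in range(len(mu))) - a[k]
                for k in range(len(mu))]
        mu = [max(0.0, m - step * g) for m, g in zip(mu, grad)]  # project on mu >= 0
        if upper is not None:
            mu = [min(upper, m) for m in mu]                     # optional box bound
    return mu

# Toy problem: one feature aligned with the labels, one unrelated to them.
labels = [1.0, 1.0, -1.0, -1.0]
informative = [1.0, 0.9, -1.0, -1.1]
noise = [1.0, -1.0, 1.0, -1.0]
kernels = [linear_kernel(informative), linear_kernel(noise)]
target = linear_kernel(labels)
weights = nnls_kernel_weights(kernels, target)
```

In this toy case the informative kernel receives nearly all the weight and the unrelated kernel is driven to zero by the non-negativity constraint.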
Journal of Cheminformatics | 2017
Emma L. Schymanski; Christoph Ruttkies; Martin Krauss; Céline Brouard; Tobias Kind; Kai Dührkop; Felicity Allen; Arpana Vaniya; Dries Verdegem; Sebastian Böcker; Juho Rousu; Huibin Shen; Hiroshi Tsugawa; Tanvir Sajed; Oliver Fiehn; Bart Ghesquière; Steffen Neumann
Journal of Machine Learning Research | 2016
Céline Brouard; Marie Szafranski; Florence d'Alché-Buc
Archive | 2015
Céline Brouard; Florence d'Alché-Buc; Marie Szafranski
Asian Conference on Machine Learning | 2017
Céline Brouard; Eric Bach; Sebastian Böcker; Juho Rousu
JOBIM | 2012
Céline Brouard; Marie Szafranski; Florence d'Alché-Buc