Olivier Collier
ENSAE ParisTech
Publications
Featured research published by Olivier Collier.
Annals of Statistics | 2017
Olivier Collier; Laëtitia Comminges; Alexandre B. Tsybakov
For the Gaussian sequence model, we obtain non-asymptotic minimax rates of estimation for the linear, quadratic and L2-norm functionals on classes of sparse vectors, and we construct optimal estimators that attain these rates. The main object of interest is the class of s-sparse vectors, for which we also provide completely adaptive estimators (independent of s and of the noise variance) whose rates are only logarithmically slower than the minimax ones. Furthermore, we obtain the minimax rates on the Lq-balls with 0 < q < 2. This analysis shows that there are, in general, three zones in the rates of convergence, which we call the sparse zone, the dense zone and the degenerate zone, while a fourth zone appears for estimation of the quadratic functional. We show that, as opposed to estimation of the vector itself, the correct logarithmic terms in the optimal rates for the sparse zone scale as log(d/s^2) and not as log(d/s). For the sparse class, the rates of estimation of the linear functional and of the L2-norm have a simple elbow at s = sqrt(d) (the boundary between the sparse and the dense zones) and exhibit similar behavior, whereas estimation of the quadratic functional reveals more complex effects and is not possible on the basis of the sparsity condition on the vector alone. Finally, we apply our results on estimation of the L2-norm to the problem of testing against sparse alternatives. In particular, we obtain a non-asymptotic analog of the Ingster-Donoho-Jin theory revealing some effects that were not captured by the previous asymptotic analysis.
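To make the log(d/s^2) calibration concrete, here is a minimal Python sketch of a thresholding estimator of the linear functional on s-sparse vectors. It is our own illustration, not the paper's exact construction: the function name and the threshold constant are assumptions chosen for the demo.

```python
import numpy as np

def linear_functional_estimate(y, s, sigma):
    """Illustrative sketch: estimate sum(theta) from y = theta + sigma * xi,
    with xi standard Gaussian and theta in R^d assumed s-sparse.
    Observations are summed only if they exceed a threshold of order
    sigma * sqrt(2 * log(1 + d / s**2)), echoing the log(d/s^2) scaling
    discussed in the abstract; the paper's optimal estimator differs."""
    d = y.shape[0]
    tau = sigma * np.sqrt(2.0 * np.log(1.0 + d / s**2))
    return y[np.abs(y) > tau].sum()

# Toy usage on a synthetic s-sparse vector.
rng = np.random.default_rng(0)
d, s, sigma = 10_000, 20, 1.0
theta = np.zeros(d)
theta[:s] = 5.0                          # s nonzero coordinates
y = theta + sigma * rng.standard_normal(d)
print(linear_functional_estimate(y, s, sigma), theta.sum())
```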
bioRxiv | 2018
Olivier Collier; Véronique Stoven; Jean-Philippe Vert
Cancer driver genes, i.e., oncogenes and tumor suppressor genes, are involved in the acquisition of important functions in tumors, providing a selective growth advantage, allowing uncontrolled proliferation and avoiding apoptosis. It is therefore important to identify these driver genes, both for the fundamental understanding of cancer and to help find new therapeutic targets. Although the most frequently mutated driver genes have been identified, it is believed that many more remain to be discovered, particularly driver genes specific to some cancer types. In this paper we propose a new computational method called LOTUS to predict new driver genes. LOTUS is a machine-learning-based approach that can integrate various types of data in a versatile manner, including information about gene mutations and protein-protein interactions. In addition, LOTUS can predict cancer driver genes in a pan-cancer setting as well as for specific cancer types, using a multitask learning strategy to share information across cancer types. We empirically show that LOTUS outperforms three other state-of-the-art driver gene prediction methods, both in terms of intrinsic consistency and prediction accuracy, and we provide predictions of new cancer genes across many cancer types.

Author summary: Cancer development is driven by mutations and dysfunction of important, so-called cancer driver genes, which are potential targets for therapy. While a number of such cancer genes have already been identified, it is believed that many more remain to be discovered. To help prioritize experimental investigations of candidate genes, several computational methods have been proposed to rank promising candidates based on their mutations in large cohorts of cancer cases, or on their interactions with known driver genes in biological networks. We propose LOTUS, a new computational approach to identify genes with high oncogenic potential. LOTUS implements a machine learning approach to learn an oncogenic potential score from known driver genes, and brings two novelties compared to existing methods. First, it makes it easy to combine heterogeneous information in the scoring function, which we illustrate by learning a scoring function from both known mutations in large cancer cohorts and interactions in biological networks. Second, using a multitask learning strategy, it can predict different driver genes for different cancer types, while sharing information between them to improve the prediction for every type. We provide experimental results showing that LOTUS significantly outperforms several state-of-the-art cancer gene prediction methods.
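To illustrate the multitask learning strategy described above, here is a hedged Python sketch of one standard way to share information across cancer types with a product kernel. The function multitask_kernel and the interpolation parameter alpha are illustrative names of ours; the kernels actually used by LOTUS may differ.

```python
import numpy as np

def multitask_kernel(K_gene, T, alpha=0.5):
    """Illustrative multitask kernel: the similarity between (gene i, type s)
    and (gene j, type t) is K_gene[i, j] * ((1 - alpha) * [s == t] + alpha).
    With alpha = 0 the T cancer types are treated independently; with
    alpha = 1 they are pooled into a single pan-cancer task."""
    K_task = (1.0 - alpha) * np.eye(T) + alpha * np.ones((T, T))
    return np.kron(K_task, K_gene)       # (T*n) x (T*n) joint Gram matrix

# Toy usage: 4 genes with random features, 3 cancer types.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 10))         # stand-in for mutation/network features
K_gene = X @ X.T                         # linear gene kernel
print(multitask_kernel(K_gene, T=3).shape)   # (12, 12)
```

Any kernel machine trained on this joint Gram matrix then scores each (gene, cancer type) pair while borrowing strength across types, which is the effect the multitask strategy is after.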
Statistical Inference for Stochastic Processes | 2018
Olivier Collier; Arnak S. Dalalyan
Assume that we observe a sample of size n composed of p-dimensional signals, each signal having independent entries drawn from a scaled Poisson distribution with an unknown intensity. We are interested in estimating the sum of the n unknown intensity vectors, under the assumption that most of them coincide with a given “background” signal. The number s of p-dimensional signals different from the background signal plays the role of sparsity, and the goal is to leverage this sparsity assumption to improve the quality of estimation compared to the naive estimator that computes the sum of the observed signals. We first introduce the group hard thresholding estimator and analyze its mean squared error, as measured by the squared Euclidean norm. We establish a nonasymptotic upper bound showing that the risk is at most of the order of $\sigma^2(sp + s^2\sqrt{p}\log^{3/2}(np))$. We then establish lower bounds on the minimax risk over a properly defined class of collections of s-sparse signals. These lower bounds match the upper bound, up to logarithmic terms, when the dimension p is fixed or of larger order than $s^2$.
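As a concrete illustration of the estimator analyzed above, here is a minimal Python sketch of a group hard thresholding rule for the sum of intensities. The threshold tau is an ad hoc choice for the demo, not the calibration derived in the paper.

```python
import numpy as np

def group_hard_threshold_sum(Y, background, tau):
    """Illustrative sketch: each observed signal Y[i] is replaced by the
    background unless its Euclidean deviation from the background exceeds
    tau, and the resulting signals are summed coordinatewise."""
    deviations = np.linalg.norm(Y - background, axis=1)
    keep = deviations > tau                       # flagged as non-background
    est = np.where(keep[:, None], Y, background)
    return est.sum(axis=0)

# Toy usage: n = 50 Poisson signals in dimension p = 10, s = 3 shifted.
rng = np.random.default_rng(0)
n, p, s = 50, 10, 3
background = np.full(p, 5.0)
intensity = np.tile(background, (n, 1))
intensity[:s] += 10.0                             # s signals differ from background
Y = rng.poisson(intensity).astype(float)
print(group_hard_threshold_sum(Y, background, tau=10.0))
print(intensity.sum(axis=0))                      # target of the estimation
```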
Annals of Statistics | 2018
Olivier Collier; Laëtitia Comminges; Alexandre B. Tsybakov; Nicolas Verzelen
Journal of Statistical Planning and Inference | 2015
Olivier Collier; Arnak S. Dalalyan
International Conference on Artificial Intelligence and Statistics | 2012
Arnak S. Dalalyan; Olivier Collier
Journal of Machine Learning Research | 2016
Olivier Collier; Arnak S. Dalalyan
Archive | 2018
Olivier Collier; Arnak S. Dalalyan
arXiv: Statistics Theory | 2018
Olivier Collier; Laëtitia Comminges; Alexandre B. Tsybakov
arXiv: Statistics Theory | 2018
Alexandra Carpentier; Olivier Collier; Laëtitia Comminges; Alexandre B. Tsybakov; Yuhao Wang