Peggy Cénac
University of Burgundy
Publication
Featured research published by Peggy Cénac.
Bernoulli | 2013
Hervé Cardot; Peggy Cénac; Pierre-André Zitt
With the progress of measurement apparatus and the development of automatic sensors, it is no longer unusual to get thousands of samples of observations taking values in high-dimensional spaces such as functional spaces. In such large samples of high-dimensional data, outlying curves may not be uncommon, and even a few individuals may corrupt simple statistical indicators such as the mean trajectory. We focus here on the estimation of the geometric median, which is a direct generalization of the real median and has nice robustness properties. Since the geometric median is defined as the minimizer of a simple convex functional that is differentiable everywhere when the distribution has no atoms, it is possible to estimate it with online gradient algorithms. Such algorithms are very fast and can deal with large samples. Furthermore, they can be simply updated when the data arrive sequentially. We state the almost sure consistency and the L2 rates of convergence of the stochastic gradient estimator as well as the asymptotic normality of its averaged version. We get that the asymptotic distribution of the averaged version of the algorithm is the same as that of the classic estimators, which are based on the minimization of the empirical loss function. The performance of our averaged sequential estimator, both in terms of computation speed and accuracy of the estimations, is evaluated with a small simulation study. Our approach is also illustrated on a sample of more than 5000 individual television audiences measured every second over a period of 24 hours.
Computational Statistics & Data Analysis | 2012
Hervé Cardot; Peggy Cénac; Jean-Marie Monnez
Clustering large samples of high-dimensional data with fast algorithms is an important challenge in computational statistics. A new class of recursive stochastic gradient algorithms designed for the k-medians loss criterion is proposed. By their recursive nature, these algorithms are very fast and are well adapted to deal with large samples of data that are allowed to arrive sequentially. It is proved that the stochastic gradient algorithm converges almost surely to the set of stationary points of the underlying loss criterion. Particular attention is paid to the averaged versions, which are known to have better performance. A data-driven procedure that permits a fully automatic selection of the value of the descent step is also proposed. The performance of the averaged sequential estimator is compared in a simulation study, both in terms of computation speed and accuracy of the estimations, with more classical partitioning techniques such as k-means, trimmed k-means and PAM (partitioning around medoids). Finally, this new online clustering technique is illustrated on determining television audience profiles with a sample of more than 5000 individual television audiences measured every minute over a period of 24 hours.
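The recursive idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `online_k_medians`, the step rule `c_gamma / count ** alpha`, the default values, and the requirement that initial centers are supplied by the caller are all assumptions made for illustration, and the data-driven step selection mentioned in the abstract is not reproduced.

```python
import math

def online_k_medians(samples, init_centers, c_gamma=1.0, alpha=0.75):
    """Single pass of a stochastic gradient update for the k-medians criterion.

    Hypothetical sketch: the step rule and defaults are assumptions.
    Returns (centers, averaged): the last iterates and, for each center,
    the running average of its iterates.
    """
    centers = [[float(v) for v in c] for c in init_centers]
    averaged = [list(c) for c in centers]
    counts = [0] * len(centers)  # number of observations assigned to each center
    for x in samples:
        # Assign the observation to its nearest current center.
        dists = [math.dist(x, c) for c in centers]
        r = min(range(len(centers)), key=dists.__getitem__)
        counts[r] += 1
        if dists[r] > 0.0:
            # Move only that center, by a step of fixed length toward x.
            step = c_gamma / counts[r] ** alpha
            centers[r] = [ci + step * (xi - ci) / dists[r]
                          for ci, xi in zip(centers[r], x)]
        # Running average over the initial center and all iterates so far.
        averaged[r] = [ai + (ci - ai) / (counts[r] + 1)
                       for ai, ci in zip(averaged[r], centers[r])]
    return centers, averaged
```

Each observation touches a single center and is then discarded, so memory use does not grow with the sample size. As with other k-medians and k-means procedures, the result of this sketch depends on the initial centers.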
Statistics and Risk Modeling | 2012
Peggy Cénac; Véronique Maume-Deschamps; Clémentine Prieur
We consider some risk indicators of vectorial risk processes. These indicators take into account the dependencies between business lines as well as some temporal dependencies. By using stochastic algorithms, we may estimate the minimum of these risk indicators, under a fixed total capital constraint. This minimization may apply to capital reserve allocation.
Archive | 2012
Peggy Cénac; Brigitte Chauvin; Frédéric Paccaut; Nicolas Pouyanne
Infinite random sequences of letters can be viewed as stochastic chains or as strings produced by a source, in the sense of information theory. The relationship between Variable Length Markov Chains (VLMC) and probabilistic dynamical sources is studied. We establish a probabilistic frame for context trees and VLMC and we prove that any VLMC is a dynamical source for which we explicitly build the mapping. On two examples, the “comb” and the “bamboo blossom”, we find a necessary and sufficient condition for the existence and the uniqueness of a stationary probability measure for the VLMC. These two examples are detailed in order to provide the associated Dirichlet series as well as the generating functions of word occurrences.
Electronic Journal of Statistics | 2012
Hervé Cardot; Peggy Cénac; Pierre-André Zitt
A recursive estimator of the conditional geometric median in Hilbert spaces is studied. It is based on a stochastic gradient algorithm whose aim is to minimize a weighted L1 criterion and is consequently well adapted for robust online estimation. The weights are controlled by a kernel function and an associated bandwidth. Almost sure convergence and L2 rates of convergence are proved under general conditions on the conditional distribution as well as on the sequence of descent steps of the algorithm and the sequence of bandwidths. Asymptotic normality is also proved for the averaged version of the algorithm with an optimal rate of convergence. A simulation study confirms the value of this new and fast algorithm when the sample sizes are large. Finally, the ability of these recursive algorithms to deal with very high-dimensional data is illustrated on the robust estimation of television audience profiles conditional on the total time spent watching television over a period of 24 hours.
Archive | 2010
Hervé Cardot; Peggy Cénac; Mohamed Chaouch
We propose a very simple algorithm to estimate the geometric median, also called the spatial median, of multivariate (Small (1990)) or functional data (Gervini (2008)) when the sample size is large. A simple and fast iterative approach based on the Robbins-Monro algorithm (Duflo (1997)) as well as its averaged version (Polyak and Juditsky (1992)) are shown to be effective for large samples of high-dimensional data. They are very fast and only require O(Nd) elementary operations, where N is the sample size and d is the dimension of the data. The averaged approach is shown to be more effective and less sensitive to the tuning parameter. The ability of this new estimator to estimate the geometric median accurately and rapidly (about thirty times faster than the classical estimator) is illustrated on a large sample of 18902 electricity consumption curves measured every half hour during one week.
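A Robbins-Monro iteration of this kind, together with its averaged version, can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the authors' code: the function name `averaged_median`, the step sequence `c_gamma / n ** alpha`, the default values, and the choice of the first observation as starting point are all hypothetical.

```python
import math

def averaged_median(samples, c_gamma=1.0, alpha=0.75):
    """One pass over `samples`, a sequence of equal-length sequences of floats.

    Hypothetical sketch: the step rule and defaults are assumptions.
    Returns (z, z_bar): the last stochastic gradient iterate and the
    average of all iterates.
    """
    it = iter(samples)
    z = [float(v) for v in next(it)]  # start at the first observation
    z_bar = list(z)
    for n, x in enumerate(it, start=1):
        diff = [xi - zi for xi, zi in zip(x, z)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0.0:  # the gradient is undefined when x coincides with z
            step = c_gamma / n ** alpha
            # Move a fixed-length step toward x, whatever the distance to x.
            z = [zi + step * d / norm for zi, d in zip(z, diff)]
        # Running average of the n + 1 iterates seen so far.
        z_bar = [bi + (zi - bi) / (n + 1) for bi, zi in zip(z_bar, z)]
    return z, z_bar
```

Each observation costs a number of operations proportional to d, which gives O(Nd) operations for a single pass, and only two vectors of length d are kept in memory. Because every update has length `step` regardless of how far the observation lies, a distant outlier moves the estimate no more than a nearby point does.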
Annals of Statistics | 2017
Hervé Cardot; Peggy Cénac; Antoine Godichon-Baggioni
Journal of Applied Probability | 2009
Bernard Bercu; Peggy Cénac; Guy Fayolle
Markov Processes and Related Fields | 2012
Peggy Cénac; Brigitte Chauvin; Samuel Herrmann; Pierre Vallois
Archive | 2003
Peggy Cénac; Guy Fayolle; Jean-Marc Lasgouttes