Aurélien Garivier
Télécom ParisTech
Publications
Featured research published by Aurélien Garivier.
Annals of Applied Probability | 2011
Randal Douc; Aurélien Garivier; Eric Moulines; Jimmy Olsson
Computing smoothing distributions, that is, the distributions of one or more states conditional on past, present, and future observations, is a recurring problem when operating on general hidden Markov models. The aim of this paper is to provide a foundation for the particle-based approximation of such distributions and to analyze, in a common unifying framework, different schemes producing such approximations. In this setting, general convergence results, including exponential deviation inequalities and central limit theorems, are established. In particular, time-uniform bounds on the marginal smoothing error are obtained under appropriate mixing conditions on the transition kernel of the latent chain. In addition, we propose an algorithm approximating the joint smoothing distribution at a cost that grows only linearly with the number of particles.
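The paper's linear-cost joint smoothing algorithm is more refined, but the mechanism by which a particle filter can approximate a joint smoothing distribution at cost linear in the number of particles can be illustrated by genealogy tracing in a bootstrap filter. The sketch below uses a linear-Gaussian AR(1) model purely as a convenient test case; the model, parameter names, and defaults are illustrative, not taken from the paper.

```python
import math
import random

def bootstrap_filter_smoother(obs, n_particles=500, phi=0.9,
                              sig_x=1.0, sig_y=1.0, seed=0):
    """Bootstrap particle filter with ancestor tracing for the AR(1)
    state-space model X_t = phi * X_{t-1} + U_t, Y_t = X_t + V_t
    (Gaussian noise). Tracing each surviving particle's genealogy gives
    a joint smoothing approximation at cost linear in n_particles, at
    the price of path degeneracy for early time steps."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, sig_x) for _ in range(n_particles)]
    paths = [[x] for x in xs]
    for y in obs:
        # propagate every particle through the state transition
        xs = [phi * x + rng.gauss(0.0, sig_x) for x in xs]
        # importance weights from the Gaussian observation density
        ws = [math.exp(-0.5 * ((y - x) / sig_y) ** 2) for x in xs]
        total = sum(ws)
        ws = [w / total for w in ws]
        # multinomial resampling; each offspring inherits its ancestor's path
        idx = rng.choices(range(n_particles), weights=ws, k=n_particles)
        paths = [paths[i] + [xs[i]] for i in idx]
        xs = [xs[i] for i in idx]
    return paths
```

The returned paths are samples from a (degenerate) approximation of the joint smoothing distribution; the abstract's point is that more careful schemes avoid the degeneracy while keeping the linear cost.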
Annals of Statistics | 2013
Olivier Cappé; Aurélien Garivier; Odalric-Ambrym Maillard; Rémi Munos; Gilles Stoltz
We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins (1979), based on upper confidence bounds of the arm payoffs computed using the Kullback-Leibler divergence. We consider two classes of distributions for which instances of this general idea are analyzed: The kl-UCB algorithm is designed for one-parameter exponential families and the empirical KL-UCB algorithm for bounded and finitely supported distributions. Our main contribution is a unified finite-time analysis of the regret of these algorithms that asymptotically matches the lower bounds of Lai and Robbins (1985) and Burnetas and Katehakis (1996), respectively. We also investigate the behavior of these algorithms when used with general bounded rewards, showing in particular that they provide significant improvements over the state-of-the-art.
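For Bernoulli rewards, the kl-UCB index described above can be computed by inverting the Bernoulli KL divergence with a bisection search. The sketch below follows the general recipe of the abstract; the exploration level log t + c log log t matches the paper's form, but the constant c = 3 and the variable names are illustrative.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, c=3.0):
    """Index of one arm: the largest q in [mean, 1] such that
    pulls * kl(mean, q) <= log(t) + c * log(log(t)); assumes pulls > 0."""
    level = (math.log(t) + c * math.log(max(math.log(t), 1.0))) / pulls
    lo, hi = mean, 1.0
    for _ in range(50):  # bisection on the increasing map q -> kl(mean, q)
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo
```

At each round, the algorithm plays the arm with the largest index; the index shrinks toward the empirical mean as the arm accumulates pulls.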
Algorithmic Learning Theory | 2011
Aurélien Garivier; Eric Moulines
Many problems, such as cognitive radio, parameter control of a scanning tunnelling microscope or internet advertisement, can be modelled as non-stationary bandit problems where the distributions of rewards change abruptly at unknown time instants. In this paper, we analyze two algorithms designed for solving this issue: discounted UCB (D-UCB) and sliding-window UCB (SW-UCB). We establish an upper bound for the expected regret by upper-bounding the expectation of the number of times suboptimal arms are played. The proof relies on an interesting Hoeffding-type inequality for self-normalized deviations with a random number of summands. We establish a lower bound for the regret in the presence of abrupt changes in the arms' reward distributions. We show that discounted UCB and sliding-window UCB both match the lower bound up to a logarithmic factor. Numerical simulations show that D-UCB and SW-UCB perform significantly better than existing soft-max methods like EXP3.S.
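A minimal sliding-window UCB sketch: statistics are computed over the last `window` plays only, so the algorithm forgets rewards observed before an abrupt change. The window size, exploration constant `xi`, and the reward-function interface are illustrative, not the paper's exact tuning.

```python
import math
from collections import deque

def sw_ucb(reward_fns, horizon, window=200, xi=0.6):
    """Sliding-window UCB sketch: play each arm once, then the arm
    maximizing (windowed mean) + sqrt(xi * log(min(t, window)) / count)."""
    k = len(reward_fns)
    history = deque()  # (arm, reward) pairs inside the current window
    for t in range(1, horizon + 1):
        counts = [0] * k
        sums = [0.0] * k
        for arm, r in history:
            counts[arm] += 1
            sums[arm] += r
        if t <= k:
            arm = t - 1  # initialization: play each arm once
        else:
            def index(a):
                if counts[a] == 0:
                    return float("inf")  # forgotten arms get re-explored
                bonus = math.sqrt(xi * math.log(min(t, window)) / counts[a])
                return sums[a] / counts[a] + bonus
            arm = max(range(k), key=index)
        history.append((arm, reward_fns[arm](t)))
        if len(history) > window:
            history.popleft()  # discard observations older than the window
    return history
```

On a stationary instance the windowed statistics concentrate and the best arm dominates the window; after a change point, the stale observations leave the window within `window` rounds.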
Symposium on Asynchronous Circuits and Systems | 2002
Anthony Winstanley; Aurélien Garivier; Mark R. Greenstreet
Events in self-timed rings can propagate evenly spaced or as bursts. By studying these phenomena, we obtain a better understanding of the underlying dynamics of self-timed pipelines, which is a necessary precursor to utilizing these dynamics to obtain higher performance. We show that standard bounded delay models are inadequate to discriminate between bursting and evenly spaced behaviours and show that an extension of the Charlie Diagrams provides a framework for understanding these phenomena. This paper describes our novel analytical approaches and the design and fabrication of a chip to test our theoretical models.
IEEE Transactions on Information Theory | 2006
Aurélien Garivier
The Bayesian information criterion (BIC) and the Krichevsky-Trofimov (KT) version of the minimum description length (MDL) principle are popular in the study of model selection. For order estimation of Markov chains, both are known to be strongly consistent when there is an upper bound on the order. In the unbounded case, the BIC is also known to be consistent, but the KT estimator is consistent only with a bound o(log n) on the order. For context trees, a flexible generalization of Markov models widely used in data processing, the problem is more complicated both in theory and practice, given the substantially higher number of possible candidate models. Imre Csiszár and Zsolt Talata proved the consistency of BIC and KT when the hypothetical tree depths are allowed to grow as o(log n). This correspondence proves that such a restriction is not necessary for finite context sources: the BIC context tree estimator is strongly consistent even if there is no constraint at all on the size of the chosen tree. Moreover, an algorithm computing the tree minimizing the BIC criterion among all context trees in linear time is provided.
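The Markov-order case mentioned in the abstract is easy to sketch: BIC selects the order maximizing the log-likelihood minus half the number of free parameters times log n. The function below is an illustrative implementation of that criterion (names and the `max_order` cap are ours), not the paper's linear-time context-tree algorithm.

```python
import math
from collections import Counter

def bic_markov_order(seq, alphabet_size, max_order):
    """BIC order estimate for a Markov chain: argmax over k of
    log-likelihood(k) - (alphabet^k * (alphabet - 1) / 2) * log n."""
    n = len(seq)
    best_k, best_score = 0, -float("inf")
    for k in range(max_order + 1):
        counts = Counter()      # (context, next symbol) occurrences
        ctx_counts = Counter()  # context occurrences
        for i in range(k, n):
            ctx = tuple(seq[i - k:i])
            counts[(ctx, seq[i])] += 1
            ctx_counts[ctx] += 1
        # maximized log-likelihood with empirical transition probabilities
        ll = sum(c * math.log(c / ctx_counts[ctx])
                 for (ctx, sym), c in counts.items())
        penalty = (alphabet_size ** k) * (alphabet_size - 1) / 2 * math.log(n)
        score = ll - penalty
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```

On a perfectly alternating binary sequence, an order-0 model pays a large likelihood cost while order 1 fits exactly, so the criterion returns 1; on a constant sequence it returns 0.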
IEEE Transactions on Information Theory | 2009
Stéphane Boucheron; Aurélien Garivier; Elisabeth Gassiat
This paper describes universal lossless coding strategies for compressing sources on countably infinite alphabets. Classes of memoryless sources defined by an envelope condition on the marginal distribution provide benchmarks for coding techniques originating from the theory of universal coding over finite alphabets. We prove general upper bounds on minimax regret and lower bounds on minimax redundancy for such source classes. The general upper bounds emphasize the role of the normalized maximum likelihood (NML) codes with respect to minimax regret in the infinite alphabet context. Lower bounds are derived by tailoring sharp bounds on the redundancy of Krichevsky-Trofimov coders for sources over finite alphabets. Up to logarithmic (resp., constant) factors the bounds are matching for source classes defined by algebraically declining (resp., exponentially vanishing) envelopes. Effective and (almost) adaptive coding techniques are described for the collection of source classes defined by algebraically vanishing envelopes. Those results extend our knowledge concerning universal coding to contexts where the key tools from parametric inference are known to fail.
Allerton Conference on Communication, Control, and Computing | 2010
Sarah Filippi; Olivier Cappé; Aurélien Garivier
We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focusing on so-called optimistic strategies. In MDPs, optimism can be implemented by carrying out extended value iterations under a constraint of consistency with the estimated model transition probabilities. The UCRL2 algorithm by Auer, Jaksch and Ortner (2009), which follows this strategy, has recently been shown to guarantee near-optimal regret bounds. In this paper, we strongly argue in favor of using the Kullback-Leibler (KL) divergence for this purpose. By studying the linear maximization problem under KL constraints, we provide an efficient algorithm, termed KL-UCRL, for solving KL-optimistic extended value iteration. Using recent deviation bounds on the KL divergence, we prove that KL-UCRL provides the same guarantees as UCRL2 in terms of regret. However, numerical experiments on classical benchmarks show a significantly improved behavior, particularly when the MDP has reduced connectivity. To support this observation, we provide elements of comparison between the two algorithms based on geometric considerations.
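The key subroutine, maximizing a linear function of the transition probabilities over a KL ball around the empirical distribution, reduces to a one-dimensional search: the KKT conditions give q_i proportional to p_i / (mu - v_i), with the multiplier mu found by bisection so the KL constraint is tight. The sketch below assumes every component of p is positive (the paper also handles the zero-mass case, omitted here) and uses our own naming.

```python
import math

def kl_opt_transition(p, v, eps):
    """Maximize sum_i q_i * v_i over distributions q with KL(p, q) <= eps,
    assuming all p[i] > 0. KKT: q_i ∝ p_i / (mu - v_i) with mu > max(v);
    mu is located by bisection so that the KL constraint binds."""
    if eps <= 0:
        return list(p)
    vmax = max(v)

    def q_of(mu):
        w = [pi / (mu - vi) for pi, vi in zip(p, v)]
        s = sum(w)
        return [wi / s for wi in w]

    def kl(q):
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

    # KL(p, q(mu)) decreases from +inf to 0 as mu grows above max(v)
    lo, hi = vmax + 1e-9, vmax + 1e6
    for _ in range(200):
        mu = (lo + hi) / 2
        if kl(q_of(mu)) > eps:
            lo = mu
        else:
            hi = mu
    return q_of(hi)
```

Inside extended value iteration, v is the current value-function vector and eps the confidence radius; the optimistic transition shifts probability mass toward high-value states while staying within the KL ball.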
Stochastic Processes and their Applications | 2011
Aurélien Garivier; Florencia Leonardi
We study a problem of model selection for data produced by two different context tree sources. Motivated by linguistic questions, we consider the case where the probabilistic context trees corresponding to the two sources are finite and share many of their contexts. In order to understand the differences between the two sources, it is important to identify which contexts and which transition probabilities are specific to each source. We consider a class of probabilistic context tree models with three types of contexts: those which appear in one, the other, or both sources. We use a BIC penalized maximum likelihood procedure that jointly estimates the two sources. We propose a new algorithm which efficiently computes the estimated context trees. We prove that the procedure is strongly consistent. We also present a simulation study showing the practical advantage of our procedure over a procedure that works separately on each dataset.
IEEE Transactions on Information Theory | 2006
Aurélien Garivier
The Context Tree Weighting method (CTW) is shown to be almost adaptive on the classes of renewal and Markov renewal processes. Up to a logarithmic factor, CTW achieves the minimax pointwise redundancy described by I. Csiszár and P. Shields in IEEE Trans. Inf. Theory, vol. 42, no. 6, pp. 2065-2072, Nov. 1996. This result not only complements previous results on the adaptivity of the Context Tree Weighting method on the relatively small class of all finite context-tree sources (which encompasses the class of all finite-order Markov sources), it shows that almost minimax redundancy can be achieved on massive classes of sources (classes that cannot be smoothly parameterized by subsets of finite-dimensional spaces). Moreover, it shows that (almost) adaptive compression can be achieved in a computationally efficient way on those massive classes. While previous adaptivity results for CTW could rely on the fact that any Markov source is a finite-context-tree source, this is no longer the case for renewal sources. In order to prove almost adaptivity of CTW over renewal sources, it is necessary to establish that CTW carefully balances estimation error and approximation error.
Mathematics of Operations Research | 2018
Aurélien Garivier; Pierre Ménard; Gilles Stoltz
We revisit lower bounds on the regret in the case of multi-armed bandit problems. We obtain non-asymptotic, distribution-dependent bounds and provide straightforward proofs based only on well-known properties of Kullback-Leibler divergences. These bounds show in particular that in an initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they are deprived of all unnecessary complications.
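For reference, the classical asymptotic bound of Lai and Robbins (1985) that these non-asymptotic results refine can be stated as follows, in the one-parameter case, with standard (here illustrative) notation: N_a(T) is the number of draws of a suboptimal arm a up to time T, and KL(ν_a, ν_{a*}) the divergence between its reward distribution and that of an optimal arm.

```latex
% For any uniformly efficient strategy and any suboptimal arm a:
\liminf_{T \to \infty} \frac{\mathbb{E}[N_a(T)]}{\log T}
  \;\ge\; \frac{1}{\mathrm{KL}(\nu_a, \nu_{a^\star})}
```

The paper's contribution is to show how the regret behaves before this logarithmic regime sets in, with an initial near-linear phase.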