Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where András Antos is active.

Publication


Featured research published by András Antos.


Machine Learning | 2008

Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path

András Antos; Csaba Szepesvári; Rémi Munos

In this paper we consider the problem of finding a near-optimal policy in a continuous space, discounted Markovian Decision Problem (MDP) by employing value-function-based methods when only a single trajectory of a fixed policy is available as the input. We study a policy-iteration algorithm where the iterates are obtained via empirical risk minimization with a risk function that penalizes high magnitudes of the Bellman-residual. Our main result is a finite-sample, high-probability bound on the performance of the computed policy that depends on the mixing rate of the trajectory, the capacity of the function set as measured by a novel capacity concept (the VC-crossing dimension), the approximation power of the function set and the controllability properties of the MDP. Moreover, we prove that when a linear parameterization is used the new algorithm is equivalent to Least-Squares Policy Iteration. To the best of our knowledge this is the first theoretical result for off-policy control learning over continuous state-spaces using a single trajectory.
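For intuition, the quantity the abstract calls the Bellman residual can be written down directly. The sketch below (with hypothetical names) computes the naive empirical squared Bellman residual of an action-value function along a single trajectory; note the paper actually minimizes a modified loss that corrects the bias of this plain estimate.

```python
def bellman_residual_loss(q, trajectory, gamma, policy):
    """Naive empirical squared Bellman residual of an action-value function q
    along a single trajectory of (state, action, reward, next_state) tuples,
    for evaluating the given policy."""
    total = 0.0
    for s, a, r, s_next in trajectory:
        # One-step bootstrapped target under the evaluated policy.
        target = r + gamma * q(s_next, policy(s_next))
        total += (q(s, a) - target) ** 2
    return total / len(trajectory)

# Toy check: with q identically 0 and unit rewards, every residual is (0 - 1)^2 = 1.
loss = bellman_residual_loss(lambda s, a: 0.0,
                             [(0, 0, 1.0, 1), (1, 0, 1.0, 0)],
                             gamma=0.9, policy=lambda s: 0)
```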


IEEE Transactions on Pattern Analysis and Machine Intelligence | 1999

Lower bounds for Bayes error estimation

András Antos; Luc Devroye; László Györfi

We give a short proof of the following result. Let (X, Y) be any distribution on N × {0, 1}, and let (X_1, Y_1), ..., (X_n, Y_n) be an i.i.d. sample drawn from this distribution. In discrimination, the Bayes error L* = inf_g P{g(X) ≠ Y} is of crucial importance. Here we show that without further conditions on the distribution of (X, Y), no rate-of-convergence results can be obtained. Let φ_n(X_1, Y_1, ..., X_n, Y_n) be an estimate of the Bayes error, and let {φ_n(·)} be a sequence of such estimates. For any sequence {a_n} of positive numbers converging to zero, a distribution of (X, Y) may be found such that E{|L* − φ_n(X_1, Y_1, ..., X_n, Y_n)|} ≥ a_n infinitely often.


IEEE Transactions on Automatic Control | 2014

Online Markov Decision Processes Under Bandit Feedback

Gergely Neu; András György; Csaba Szepesvári; András Antos

We consider online learning in finite stochastic Markovian environments where in each time step a new reward function is chosen by an oblivious adversary. The goal of the learning agent is to compete with the best stationary policy in hindsight in terms of the total reward received. Specifically, in each time step the agent observes the current state and the reward associated with the last transition; however, the agent does not observe the rewards associated with other state-action pairs. The agent is assumed to know the transition probabilities. The state-of-the-art result for this setting is an algorithm with an expected regret of O(T^{2/3} ln T). In this paper, assuming that stationary policies mix uniformly fast, we show that after T time steps, the expected regret of this algorithm (more precisely, a slightly modified version thereof) is O(T^{1/2} ln T), giving the first rigorously proven, essentially tight regret bound for the problem.


Journal of Machine Learning Research | 2003

Data-dependent margin-based generalization bounds for classification

András Antos; Balázs Kégl; Tamás Linder; Gábor Lugosi

We derive new margin-based inequalities for the probability of error of classifiers. The main feature of these bounds is that they can be calculated using the training data and therefore may be effectively used for model selection purposes. In particular, the bounds involve empirical complexities measured on the training data (such as the empirical fat-shattering dimension) as opposed to their worst-case counterparts traditionally used in such analyses. Also, our bounds appear to be sharper and more general than recent results involving empirical complexity measures. In addition, we develop an alternative data-based bound for the generalization error of classes of convex combinations of classifiers involving an empirical complexity measure that is easier to compute than the empirical covering number or fat-shattering dimension. We also show examples of efficient computation of the new bounds.


Algorithmic Learning Theory | 2008

Active Learning in Multi-armed Bandits

András Antos; Varun Grover; Csaba Szepesvári

In this paper we consider the problem of actively learning the mean values of distributions associated with a finite number of options (arms). The algorithms can select which option to generate the next sample from in order to produce estimates with equally good precision for all the distributions. When an algorithm uses sample means to estimate the unknown values then the optimal solution, assuming full knowledge of the distributions, is to sample each option proportional to its variance. In this paper we propose an incremental algorithm that asymptotically achieves the same loss as an optimal rule. We prove that the excess loss suffered by this algorithm, apart from logarithmic factors, scales as n^{-3/2}, which we conjecture to be the optimal rate. The performance of the algorithm is illustrated in a simple problem.
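The variance-proportional allocation the abstract describes can be sketched incrementally: after a few pilot pulls per arm, always pull the arm whose pull count is smallest relative to its empirical variance. The names (`active_allocation`), the `n0` pilot-pull parameter, and the variance floor below are illustrative assumptions, not the paper's algorithm.

```python
import random
import statistics

def active_allocation(arms, budget, n0=5):
    """arms: list of zero-argument samplers. After n0 pilot pulls per arm,
    repeatedly pull the arm with the smallest pulls-to-empirical-variance
    ratio, so pull counts track a variance-proportional allocation."""
    samples = [[arm() for _ in range(n0)] for arm in arms]
    for _ in range(budget - n0 * len(arms)):
        # Empirical variance of each arm, floored to avoid division by zero.
        var = [max(statistics.pvariance(s), 1e-12) for s in samples]
        k = min(range(len(arms)), key=lambda i: len(samples[i]) / var[i])
        samples[k].append(arms[k]())
    return [statistics.fmean(s) for s in samples], [len(s) for s in samples]

random.seed(0)
means, counts = active_allocation(
    [lambda: random.gauss(0.0, 1.0),   # high-variance arm
     lambda: random.gauss(0.0, 0.1)],  # low-variance arm
    budget=500)
```

With a 100:1 variance ratio, nearly all of the budget beyond the pilot pulls goes to the first arm.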


IEEE Transactions on Information Theory | 2005

Individual convergence rates in empirical vector quantizer design

András Antos; László Györfi; András György

We consider the rate of convergence of the expected distortion redundancy of empirically optimal vector quantizers. Earlier results show that the mean-squared distortion of an empirically optimal quantizer designed from n independent and identically distributed (i.i.d.) source samples converges uniformly to the optimum at a rate of O(1/√n), and that this rate is sharp in the minimax sense. We prove that for any fixed distribution supported on a given finite set the convergence rate is O(1/n) (faster than the minimax lower bound), where the corresponding constant depends on the source distribution. For more general source distributions we provide conditions implying a slightly worse O(log n/n) rate of convergence. Although these conditions, in general, are hard to verify, we show that sources with continuous densities satisfying certain regularity properties (similar to the ones of Pollard that were used to prove a central limit theorem for the code points of the empirically optimal quantizers) are included in the scope of this result. In particular, scalar distributions with strictly log-concave densities with bounded support (such as the truncated Gaussian distribution) satisfy these conditions.
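An empirically optimal quantizer minimizes the mean-squared distortion over the sample; in practice it is approximated with Lloyd's algorithm. A minimal scalar sketch follows (names and parameters are hypothetical, and Lloyd's iteration finds only a local optimum, not the empirically optimal codebook the paper analyzes).

```python
import random

def distortion(codebook, points):
    """Empirical mean-squared distortion of a scalar codebook on a sample."""
    return sum(min((x - c) ** 2 for c in codebook) for x in points) / len(points)

def lloyd(points, k, iters=50):
    """Lloyd's algorithm: alternate nearest-code assignment and centroid
    updates to approximately minimize the empirical distortion."""
    codebook = random.sample(points, k)
    for _ in range(iters):
        cells = [[] for _ in range(k)]
        for x in points:
            j = min(range(k), key=lambda i: (x - codebook[i]) ** 2)
            cells[j].append(x)
        # Move each code point to the centroid of its cell (keep it if empty).
        codebook = [sum(c) / len(c) if c else codebook[i]
                    for i, c in enumerate(cells)]
    return codebook

random.seed(1)
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]
codebook = lloyd(sample, k=4)
```

A 4-point codebook should beat the trivial single code point at the mean by a wide margin on this sample.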


2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning | 2007

Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory

András Antos; Csaba Szepesvári; Rémi Munos

We consider batch reinforcement learning problems in continuous space, expected total discounted-reward Markovian decision problems when the training data is composed of the trajectory of some fixed behaviour policy. The algorithm studied is policy iteration where in successive iterations the action-value functions of the intermediate policies are obtained by means of approximate value iteration. PAC-style polynomial bounds are derived on the number of samples needed to guarantee near-optimal performance. The bounds depend on the mixing rate of the trajectory, the smoothness properties of the underlying Markovian decision problem, the approximation power and capacity of the function set used. One of the main novelties of the paper is that new smoothness constraints are introduced thereby significantly extending the scope of previous results.


Journal of Statistical Planning and Inference | 2000

Lower bounds on the rate of convergence of nonparametric regression estimates

András Antos; László Györfi; Michael Kohler

We show that there exist individual lower bounds on the rate of convergence of nonparametric regression estimates, which are arbitrarily close to Stone's minimax lower bounds.


International Symposium on Information Theory | 2001

Estimating the entropy of discrete distributions

András Antos; Ioannis Kontoyiannis

Given an i.i.d. sample (X_1, ..., X_n) drawn from an unknown discrete distribution P on a countably infinite set, we consider the problem of estimating the entropy of P. We show that the plug-in estimate is universally consistent and that, without further assumptions, no rate of convergence results can be obtained for any sequence of entropy estimates. Under additional conditions we get convergence rates for the plug-in estimate and for an estimate based on match-lengths. The behavior of the expected error of the plug-in estimate is shown to be in sharp contrast to the finite-alphabet case.
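The plug-in estimate discussed above simply substitutes the empirical distribution for P and takes its Shannon entropy; a minimal sketch (the function name is illustrative):

```python
import math
from collections import Counter

def plugin_entropy(sample):
    """Plug-in entropy estimate in nats: the Shannon entropy of the
    empirical distribution of the sample."""
    n = len(sample)
    return -sum((c / n) * math.log(c / n) for c in Counter(sample).values())

# A balanced two-symbol sample: the estimate equals log 2 ≈ 0.693 nats.
h = plugin_entropy(["H", "T", "H", "T"])
```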


European Conference on Computational Learning Theory | 1999

Lower Bounds on the Rate of Convergence of Nonparametric Pattern Recognition

András Antos

We show that there exist individual lower bounds corresponding to the upper bounds on the rate of convergence of nonparametric pattern recognition which are arbitrarily close to Yang's minimax lower bounds, provided the a posteriori probability function is in the classes used by Stone and others. The rates equal those of the corresponding regression estimation problem. Thus, for these classes, classification is not easier than regression estimation in the individual sense either.

Collaboration


Dive into András Antos's collaborations.

Top Co-Authors

László Györfi

Budapest University of Technology and Economics


Ioannis Kontoyiannis

Athens University of Economics and Business


Michael Kohler

Technische Universität Darmstadt
