B. Van Roy
Stanford University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by B. Van Roy.
IEEE Transactions on Automatic Control | 1997
John N. Tsitsiklis; B. Van Roy
We present new results about the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of a Markov chain using linear function approximators. The algorithm we analyze performs on-line updating of a parameter vector during a single endless trajectory of an aperiodic irreducible finite state Markov chain. Results include convergence (with probability 1), a characterization of the limit of convergence, and a bound on the resulting approximation error. In addition to establishing new and stronger results than those previously available, our analysis is based on a new line of reasoning that provides new intuition about the dynamics of temporal-difference learning. Furthermore, we discuss the implications of two counter-examples with regards to the Significance of on-line updating and linearly parameterized function approximators.
IEEE Transactions on Automatic Control | 1999
John N. Tsitsiklis; B. Van Roy
The authors develop a theory characterizing optimal stopping times for discrete-time ergodic Markov processes with discounted rewards. The theory differs from prior work by its view of per-stage and terminal reward functions as elements of a certain Hilbert space. In addition to a streamlined analysis establishing existence and uniqueness of a solution to Bellmans equation, this approach provides an elegant framework for the study of approximate solutions. In particular, the authors propose a stochastic approximation algorithm that tunes weights of a linear combination of basis functions in order to approximate a value function. They prove that this algorithm converges (almost surely) and that the limit of convergence has some desirable properties. The utility of the approximation method is illustrated via a computational case study involving the pricing of a path dependent financial derivative security that gives rise to an optimal stopping problem with a 100-dimensional state space.
conference on decision and control | 1995
John N. Tsitsiklis; B. Van Roy
Summary form only given. We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. As an example, one of these algorithms is used to generate a strategy for the game of Tetris. Furthermore, we provide a counter-example illustrating the difficulties of integrating compact representations with dynamic programming, which exemplifies the shortcomings of certain simple approaches.
IEEE Transactions on Information Theory | 2006
Ciamac C. Moallemi; B. Van Roy
We propose consensus propagation, an asynchronous distributed protocol for averaging numbers across a network. We establish convergence, characterize the convergence rate for regular graphs, and demonstrate that the protocol exhibits better scaling properties than pairwise averaging, an alternative that has received much recent attention. Consensus propagation can be viewed as a special case of belief propagation, and our results contribute to the belief propagation literature. In particular, beyond singly-connected graphs, there are very few classes of relevant problems for which belief propagation is known to converge
IEEE Transactions on Information Theory | 2008
Haim H. Permuter; Paul Cuff; B. Van Roy; Tsachy Weissman
We establish that the feedback capacity of the trapdoor channel is the logarithm of the golden ratio and provide a simple communication scheme that achieves capacity. As part of the analysis, we formulate a class of dynamic programs that characterize capacities of unifilar finite-state channels. The trapdoor channel is an instance that admits a simple closed-form solution.
IEEE Transactions on Information Theory | 2001
Paat Rusmevichientong; B. Van Roy
Motivated by its success in decoding turbo codes, we provide an analysis of the belief propagation algorithm on the turbo decoding graph with Gaussian densities. In this context, we are able to show that, under certain conditions, the algorithm converges and that-somewhat surprisingly-though the density generated by belief propagation may differ significantly from the desired posterior density, the means of these two densities coincide. Since computation of posterior distributions is tractable when densities are Gaussian, use of belief propagation in such a setting may appear unwarranted. Indeed, our primary motivation for studying belief propagation in this context stems from a desire to enhance our understanding of the algorithms dynamics in a non-Gaussian setting, and to gain insights into its excellent performance in turbo codes. Nevertheless, even when the densities are Gaussian, belief propagation may sometimes provide a more efficient alternative to traditional inference methods.
Journal of Optimization Theory and Applications | 2000
Daniela Pucci de Farias; B. Van Roy
Approximate value iteration is a simple algorithm that combats the curse of dimensionality in dynamic programs by approximating iterates of the classical value iteration algorithm in a spirit reminiscent of statistical regression. Each iteration of this algorithm can be viewed as an application of a modified dynamic programming operator to the current iterate. The hope is that the iterates converge to a fixed point of this operator, which will then serve as a useful approximation of the optimal value function. In this paper, we show that, in general, the modified dynamic programming operator need not possess a fixed point; therefore, approximate value iteration should not be expected to converge. We then propose a variant of approximate value iteration for which the associated operator is guaranteed to possess at least one fixed point. This variant is motivated by studies of temporal-difference (TD) learning, and existence of fixed points implies here existence of stationary points for the ordinary differential equation approximated by a version of TD that incorporates exploration.
IEEE Transactions on Information Theory | 2009
Ciamac C. Moallemi; B. Van Roy
We establish the convergence of the min-sum message passing algorithm for minimization of a quadratic objective function given a convex decomposition. Our results also apply to the equivalent problem of the convergence of Gaussian belief propagation.
IEEE Transactions on Information Theory | 2010
Ciamac C. Moallemi; B. Van Roy
We establish that the min-sum message-passing algorithm and its asynchronous variants converge for a large class of unconstrained convex optimization problems, generalizing existing results for pairwise quadratic optimization problems. The main sufficient condition is that of scaled diagonal dominance. This condition is similar to known sufficient conditions for asynchronous convergence of other decentralized optimization algorithms, such as coordinate descent and gradient descent.
conference on decision and control | 2003
Daniela Pucci de Farias; B. Van Roy
In the linear programming approach to approximate dynamic programming, one tries to solve a certain linear program - the ALP -, which has a relatively small number K of variables but an intractable number M of constraints. In this paper, we study a scheme that samples and imposes a subset of m /spl Lt/ M constraints. A natural question that arises in this context is: How must m scale with respect to K and M in order to ensure that the resulting approximation is almost as good as one given by exact solution of the ALP? We show that, under certain idealized conditions, m can be chosen independently of M and need grow only as a polynomial in K.In the linear programming approach to approximate dynamic programming, one tries to solve a certain linear program - the ALP -, which has a relatively small number K of variables but an intractable number M of constraints. In this paper, we study a scheme that samples and imposes a subset of m /spl Lt/ M constraints. A natural question that arises in this context is: How must m scale with respect to K and M in order to ensure that the resulting approximation is almost as good as one given by exact solution of the ALP? We show that, under certain idealized conditions, m can be chosen independently of M and need grow only as a polynomial in K.