B. Van Roy | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where B. Van Roy is active.

Explore More

Publication

Featured researches published by B. Van Roy.

IEEE Transactions on Automatic Control | 1997

An analysis of temporal-difference learning with function approximation

John N. Tsitsiklis; B. Van Roy

We present new results about the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of a Markov chain using linear function approximators. The algorithm we analyze performs on-line updating of a parameter vector during a single endless trajectory of an aperiodic irreducible finite state Markov chain. Results include convergence (with probability 1), a characterization of the limit of convergence, and a bound on the resulting approximation error. In addition to establishing new and stronger results than those previously available, our analysis is based on a new line of reasoning that provides new intuition about the dynamics of temporal-difference learning. Furthermore, we discuss the implications of two counter-examples with regards to the Significance of on-line updating and linearly parameterized function approximators.

IEEE Transactions on Automatic Control | 1999

Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives

John N. Tsitsiklis; B. Van Roy

The authors develop a theory characterizing optimal stopping times for discrete-time ergodic Markov processes with discounted rewards. The theory differs from prior work by its view of per-stage and terminal reward functions as elements of a certain Hilbert space. In addition to a streamlined analysis establishing existence and uniqueness of a solution to Bellmans equation, this approach provides an elegant framework for the study of approximate solutions. In particular, the authors propose a stochastic approximation algorithm that tunes weights of a linear combination of basis functions in order to approximate a value function. They prove that this algorithm converges (almost surely) and that the limit of convergence has some desirable properties. The utility of the approximation method is illustrated via a computational case study involving the pricing of a path dependent financial derivative security that gives rise to an optimal stopping problem with a 100-dimensional state space.

conference on decision and control | 1995

Feature-based methods for large scale dynamic programming

John N. Tsitsiklis; B. Van Roy

Summary form only given. We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. As an example, one of these algorithms is used to generate a strategy for the game of Tetris. Furthermore, we provide a counter-example illustrating the difficulties of integrating compact representations with dynamic programming, which exemplifies the shortcomings of certain simple approaches.

IEEE Transactions on Information Theory | 2006

Consensus Propagation

Ciamac C. Moallemi; B. Van Roy

We propose consensus propagation, an asynchronous distributed protocol for averaging numbers across a network. We establish convergence, characterize the convergence rate for regular graphs, and demonstrate that the protocol exhibits better scaling properties than pairwise averaging, an alternative that has received much recent attention. Consensus propagation can be viewed as a special case of belief propagation, and our results contribute to the belief propagation literature. In particular, beyond singly-connected graphs, there are very few classes of relevant problems for which belief propagation is known to converge

IEEE Transactions on Information Theory | 2008

Capacity of the Trapdoor Channel With Feedback

Haim H. Permuter; Paul Cuff; B. Van Roy; Tsachy Weissman

We establish that the feedback capacity of the trapdoor channel is the logarithm of the golden ratio and provide a simple communication scheme that achieves capacity. As part of the analysis, we formulate a class of dynamic programs that characterize capacities of unifilar finite-state channels. The trapdoor channel is an instance that admits a simple closed-form solution.

IEEE Transactions on Information Theory | 2001

An analysis of belief propagation on the turbo decoding graph with Gaussian densities

Paat Rusmevichientong; B. Van Roy

Motivated by its success in decoding turbo codes, we provide an analysis of the belief propagation algorithm on the turbo decoding graph with Gaussian densities. In this context, we are able to show that, under certain conditions, the algorithm converges and that-somewhat surprisingly-though the density generated by belief propagation may differ significantly from the desired posterior density, the means of these two densities coincide. Since computation of posterior distributions is tractable when densities are Gaussian, use of belief propagation in such a setting may appear unwarranted. Indeed, our primary motivation for studying belief propagation in this context stems from a desire to enhance our understanding of the algorithms dynamics in a non-Gaussian setting, and to gain insights into its excellent performance in turbo codes. Nevertheless, even when the densities are Gaussian, belief propagation may sometimes provide a more efficient alternative to traditional inference methods.

Journal of Optimization Theory and Applications | 2000

On the existence of fixed points for approximate value iteration and temporal-difference learning

Daniela Pucci de Farias; B. Van Roy

Approximate value iteration is a simple algorithm that combats the curse of dimensionality in dynamic programs by approximating iterates of the classical value iteration algorithm in a spirit reminiscent of statistical regression. Each iteration of this algorithm can be viewed as an application of a modified dynamic programming operator to the current iterate. The hope is that the iterates converge to a fixed point of this operator, which will then serve as a useful approximation of the optimal value function. In this paper, we show that, in general, the modified dynamic programming operator need not possess a fixed point; therefore, approximate value iteration should not be expected to converge. We then propose a variant of approximate value iteration for which the associated operator is guaranteed to possess at least one fixed point. This variant is motivated by studies of temporal-difference (TD) learning, and existence of fixed points implies here existence of stationary points for the ordinary differential equation approximated by a version of TD that incorporates exploration.

IEEE Transactions on Information Theory | 2009

Convergence of Min-Sum Message Passing for Quadratic Optimization

Ciamac C. Moallemi; B. Van Roy

We establish the convergence of the min-sum message passing algorithm for minimization of a quadratic objective function given a convex decomposition. Our results also apply to the equivalent problem of the convergence of Gaussian belief propagation.

IEEE Transactions on Information Theory | 2010

Convergence of Min-Sum Message-Passing for Convex Optimization

Ciamac C. Moallemi; B. Van Roy

We establish that the min-sum message-passing algorithm and its asynchronous variants converge for a large class of unconstrained convex optimization problems, generalizing existing results for pairwise quadratic optimization problems. The main sufficient condition is that of scaled diagonal dominance. This condition is similar to known sufficient conditions for asynchronous convergence of other decentralized optimization algorithms, such as coordinate descent and gradient descent.

conference on decision and control | 2003

On constraint sampling in the linear programming approach to approximate linear programming

Daniela Pucci de Farias; B. Van Roy

In the linear programming approach to approximate dynamic programming, one tries to solve a certain linear program - the ALP -, which has a relatively small number K of variables but an intractable number M of constraints. In this paper, we study a scheme that samples and imposes a subset of m /spl Lt/ M constraints. A natural question that arises in this context is: How must m scale with respect to K and M in order to ensure that the resulting approximation is almost as good as one given by exact solution of the ALP? We show that, under certain idealized conditions, m can be chosen independently of M and need grow only as a polynomial in K.In the linear programming approach to approximate dynamic programming, one tries to solve a certain linear program - the ALP -, which has a relatively small number K of variables but an intractable number M of constraints. In this paper, we study a scheme that samples and imposes a subset of m /spl Lt/ M constraints. A natural question that arises in this context is: How must m scale with respect to K and M in order to ensure that the resulting approximation is almost as good as one given by exact solution of the ALP? We show that, under certain idealized conditions, m can be chosen independently of M and need grow only as a polynomial in K.

Explore More