Publication


Featured research published by Benjamin Van Roy.


Machine Learning | 1996

Feature-based methods for large scale dynamic programming

John N. Tsitsiklis; Benjamin Van Roy

We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. As an example, one of these algorithms is used to generate a strategy for the game of Tetris. Furthermore, we provide a counter-example illustrating the difficulties of integrating compact representations with dynamic programming, which exemplifies the shortcomings of certain simple approaches.


Mathematics of Operations Research | 2004

On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming

Daniela Pucci de Farias; Benjamin Van Roy

In the linear programming approach to approximate dynamic programming, one tries to solve a certain linear program--the ALP--that has a relatively small number K of variables but an intractable number M of constraints. In this paper, we study a scheme that samples and imposes a subset of m <


Mathematics of Operations Research | 2014

Learning to Optimize via Posterior Sampling

Daniel Russo; Benjamin Van Roy

This paper considers the use of a simple posterior sampling algorithm to balance between exploration and exploitation when learning to optimize actions such as in multiarmed bandit problems. The algorithm, also known as Thompson Sampling and as probability matching, offers significant advantages over the popular upper confidence bound (UCB) approach, and can be applied to problems with finite or infinite action spaces and complicated relationships among action rewards. We make two theoretical contributions. The first establishes a connection between posterior sampling and UCB algorithms. This result lets us convert regret bounds developed for UCB algorithms into Bayesian regret bounds for posterior sampling. Our second theoretical contribution is a Bayesian regret bound for posterior sampling that applies broadly and can be specialized to many model classes. This bound depends on a new notion we refer to as the eluder dimension, which measures the degree of dependence among action rewards. Compared to UCB...
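
As a rough illustration of the posterior sampling idea described in this abstract, the following is a minimal sketch for a Bernoulli multi-armed bandit. The Beta(1, 1) prior, the arm means, the horizon, and the function name `thompson_sampling` are all assumptions chosen for the example; none of them come from the paper.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli posterior sampling (Thompson Sampling) on a bandit.

    true_means: hypothetical success probability of each arm (unknown to the learner).
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    alpha = [1] * n_arms  # Beta posterior parameter: 1 + observed successes
    beta = [1] * n_arms   # Beta posterior parameter: 1 + observed failures
    pulls = [0] * n_arms
    for _ in range(horizon):
        # Sample one plausible mean reward per arm from its current posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Act greedily with respect to the sampled means.
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate posterior update for the chosen arm only.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
```

Exploration arises because arms with little data have wide posteriors and are occasionally sampled as best; as evidence accumulates, pulls concentrate on the arm with the highest mean.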


Workshop on Algorithms and Models for the Web Graph | 2004

Making Eigenvector-Based Reputation Systems Robust to Collusion

Hui Zhang; Ashish Goel; Ramesh Govindan; Kahn Mason; Benjamin Van Roy

Eigenvector-based methods in general, and Google’s PageRank algorithm for rating web pages in particular, have become an important component of information retrieval on the Web. In this paper, we study the efficacy of, and countermeasures for, collusions designed to improve user rating in such systems.
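
To make the setting concrete, here is a small sketch of a PageRank-style power iteration and of how a colluding pair can inflate its own rating. The toy graph, the damping factor of 0.85, and the function name `pagerank` are assumptions made for illustration; this is not the countermeasure studied in the paper.

```python
def pagerank(links, damping=0.85, iters=100):
    """Power iteration on a directed graph given as {node: [out-neighbors]}."""
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Every node receives the uniform "teleport" share first.
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            outs = links.get(u, [])
            if outs:
                # Split this node's damped rank evenly among its out-links.
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # Dangling node: spread its damped rank uniformly.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank


# Hypothetical graph: x and y initially link back to a.
honest = {"a": ["b", "c", "x"], "b": ["a", "c"], "c": ["a", "y"], "x": ["a"], "y": ["a"]}
# Collusion: x and y redirect their links to each other only.
colluding = dict(honest, x=["y"], y=["x"])

before = pagerank(honest)
after = pagerank(colluding)
```

In this toy graph, comparing `before["x"]` with `after["x"]` shows the colluders' ranks rising, because rank entering the pair can leave only through the teleport term.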


Operations Research | 2010

Dynamic Pricing with a Prior on Market Response

Vivek F. Farias; Benjamin Van Roy

We study a problem of dynamic pricing faced by a vendor with limited inventory, uncertain about demand, and aiming to maximize expected discounted revenue over an infinite time horizon. The vendor learns from purchase data, so his strategy must take into account the impact of price on both revenue and future observations. We focus on a model in which customers arrive according to a Poisson process of uncertain rate, each with an independent, identically distributed reservation price. Upon arrival, a customer purchases a unit of inventory if and only if his reservation price equals or exceeds the vendor's prevailing price. We propose a simple heuristic approach to pricing in this context, which we refer to as decay balancing. Computational results demonstrate that decay balancing offers significant revenue gains over recently studied certainty equivalent and greedy heuristics. We also establish that changes in inventory and uncertainty in the arrival rate bear appropriate directional impacts on decay balancing prices in contrast to these alternatives, and we derive worst-case bounds on performance loss. We extend the three aforementioned heuristics to address a model involving multiple customer segments and stores, and provide experimental results demonstrating similar relative merits in this context.


Operations Research | 2006

A Nonparametric Approach to Multiproduct Pricing

Paat Rusmevichientong; Benjamin Van Roy; Peter W. Glynn

Developed by General Motors (GM), the Auto Choice Advisor website (http://www.autochoiceadvisor.com) recommends vehicles to consumers based on their requirements and budget constraints. Through the website, GM has access to large quantities of data that reflect consumer preferences. Motivated by the availability of such data, we formulate a nonparametric approach to multiproduct pricing. We consider a class of models of consumer purchasing behavior, each of which relates observed data on a consumer's requirements and budget constraint to subsequent purchasing tendencies. To price products, we aim at optimizing prices with respect to a sample of consumer data. We offer a bound on the sample size required for the resulting prices to be near-optimal with respect to the true distribution of consumers. The bound exhibits a dependence of O(n log n) on the number n of products being priced, showing that, in terms of sample complexity, the approach is scalable to large numbers of products. With regard to computational complexity, we establish that computing optimal prices with respect to a sample of consumer data is NP-complete in the strong sense. However, when prices are constrained by a price ladder, an ordering of prices defined prior to price determination, the problem becomes one of maximizing a supermodular function with real-valued variables. It is not yet known whether this problem is NP-hard. We provide a heuristic for our price-ladder-constrained problem, together with encouraging computational results. Finally, we apply our approach to a data set from the Auto Choice Advisor website. Our analysis provides insights into the current pricing policy at GM and suggests enhancements that may lead to a more effective pricing strategy.


Operations Research | 2010

Investment and Market Structure in Industries with Congestion

Ramesh Johari; Gabriel Y. Weintraub; Benjamin Van Roy

We analyze investment incentives and market structure under oligopoly competition in industries with congestion effects. Our results are particularly focused on models inspired by modern technology-based services such as telecommunications and computing services. We consider situations where firms compete by simultaneously choosing prices and investments; increasing investment reduces the congestion disutility experienced by consumers. We define a notion of returns to investment, according to which congestion models inspired by delay exhibit increasing returns, whereas loss models exhibit nonincreasing returns. For a broad range of models with nonincreasing returns to investment, we characterize and establish uniqueness of pure-strategy Nash equilibrium. We also provide conditions for existence of pure-strategy Nash equilibrium. We extend our analysis to a model in which firms must additionally decide whether to enter the industry. Our theoretical results contribute to the basic understanding of competition in service industries and yield insight into business and policy considerations.


Mathematics of Operations Research | 2006

Performance Loss Bounds for Approximate Value Iteration with State Aggregation

Benjamin Van Roy

We consider approximate value iteration with a parameterized approximator in which the state space is partitioned and the optimal cost-to-go function over each partition is approximated by a constant. We establish performance loss bounds for policies derived from approximations associated with fixed points. These bounds identify benefits to using invariant distributions of appropriate policies as projection weights. Such projection weighting relates to what is done by temporal-difference learning. Our analysis also leads to the first performance loss bound for approximate value iteration with an average-cost objective.
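
The scheme in this abstract can be sketched roughly as follows: apply a Bellman backup to a piecewise-constant approximation, then project back by taking a weighted average within each partition cell. The 4-state MDP, the uniform projection weights, the discount factor, and the name `aggregated_value_iteration` are assumptions for illustration. The abstract points to invariant distributions of appropriate policies as beneficial weights, which this sketch does not compute.

```python
def aggregated_value_iteration(P, g, partition, weights, alpha=0.9, iters=500):
    """Approximate value iteration with state aggregation (discounted cost).

    P[a][x][y]: transition probability from x to y under action a.
    g[a][x]:    one-stage cost of action a in state x.
    partition[x]: index of the cell containing state x.
    weights[x]:   positive projection weight of state x.
    Returns one constant cost-to-go estimate per cell.
    """
    n_states = len(partition)
    n_cells = max(partition) + 1
    r = [0.0] * n_cells
    for _ in range(iters):
        # Bellman backup applied to the piecewise-constant approximation.
        backed_up = [
            min(
                g[a][x] + alpha * sum(P[a][x][y] * r[partition[y]] for y in range(n_states))
                for a in range(len(P))
            )
            for x in range(n_states)
        ]
        # Weighted projection: average the backed-up values within each cell.
        num = [0.0] * n_cells
        den = [0.0] * n_cells
        for x in range(n_states):
            num[partition[x]] += weights[x] * backed_up[x]
            den[partition[x]] += weights[x]
        r = [num[c] / den[c] for c in range(n_cells)]
    return r


# Hypothetical 4-state, 2-action problem, aggregated into two cells.
P = [
    [[0.9, 0.1, 0.0, 0.0], [0.1, 0.8, 0.1, 0.0], [0.0, 0.1, 0.8, 0.1], [0.0, 0.0, 0.1, 0.9]],
    [[1.0, 0.0, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0], [0.0, 0.8, 0.2, 0.0], [0.0, 0.0, 0.8, 0.2]],
]
g = [[0.0, 1.0, 2.0, 3.0], [0.5, 1.5, 2.5, 3.5]]
r = aggregated_value_iteration(P, g, partition=[0, 0, 1, 1], weights=[0.25] * 4)
```

Because averaging within cells is a non-expansion in the maximum norm, the combined backup-and-project step remains a contraction for a discount factor below one, so the iteration settles on a fixed point.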


Archive | 2002

Neuro-Dynamic Programming: Overview and Recent Trends

Benjamin Van Roy

Neuro-dynamic programming comprises algorithms for solving large-scale stochastic control problems. Many ideas underlying these algorithms originated in the field of artificial intelligence and were motivated to some extent by descriptive models of animal behavior. This chapter provides an overview of the history and state of the art in neuro-dynamic programming, as well as a review of recent results involving two classes of algorithms that have been the subject of much recent research activity: temporal-difference learning and actor-critic methods.


Archive | 2006

Tetris: A Study of Randomized Constraint Sampling

Vivek F. Farias; Benjamin Van Roy

Approximate Dynamic Programming is a means of synthesizing near-optimal policies for large-scale stochastic control problems. We examine here the LP approach to approximate Dynamic Programming [98], which requires the solution of a linear program with a tractable number of variables but a potentially large number of constraints. Randomized constraint sampling is one means of dealing with such a program, and results from [99] suggest that such a scheme is in fact capable of producing good solutions to the linear program that arises in the context of approximate Dynamic Programming. We present here a summary of those results, and a case study wherein the technique is used to produce a controller for the game of Tetris. The case study highlights several practical issues concerning the applicability of the constraint sampling approach. We also demonstrate a controller that matches, and in some ways outperforms, controllers produced by other state-of-the-art techniques for large-scale stochastic control.

Collaboration


Dive into Benjamin Van Roy's collaborations.

Top Co-Authors

John N. Tsitsiklis

Massachusetts Institute of Technology

Daniela Pucci de Farias

Massachusetts Institute of Technology

Paat Rusmevichientong

University of Southern California
