Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Michael L. Littman is active.

Publication


Featured research published by Michael L. Littman.


Artificial Intelligence | 1998

Planning and acting in partially observable stochastic domains

Leslie Pack Kaelbling; Michael L. Littman; Anthony R. Cassandra

In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (MDPs) and partially observable MDPs (POMDPs). We then outline a novel algorithm for solving POMDPs offline and show how, in some cases, a finite-memory controller can be extracted from the solution to a POMDP. We conclude with a discussion of how our approach relates to previous work, the complexity of finding exact solutions to POMDPs, and some possibilities for finding approximate solutions.
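
A minimal sketch of the belief-state update underlying this formalism, in Python with NumPy. The tabular models T and O below are hypothetical stand-ins; this illustrates how a POMDP agent tracks state uncertainty, not the paper's offline solution algorithm, which operates on value functions over beliefs.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes-filter update of a POMDP belief state.

    b: current belief over states, shape (S,)
    a: action taken (index); o: observation received (index)
    T: T[a, s, s2] = P(s2 | s, a), shape (A, S, S)
    O: O[a, s2, o] = P(o | s2, a), shape (A, S, num_obs)
    """
    predicted = b @ T[a]               # P(s2 | b, a): push belief through dynamics
    unnorm = O[a, :, o] * predicted    # weight by likelihood of the observation
    return unnorm / unnorm.sum()       # renormalize to a distribution

# Tiny two-state, one-action, two-observation example (made-up numbers).
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])
O = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])
print(belief_update(np.array([0.5, 0.5]), a=0, o=0, T=T, O=O))
```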


Machine Learning | 2000

Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms

Satinder P. Singh; Tommi S. Jaakkola; Michael L. Littman; Csaba Szepesvári

An important application of reinforcement learning (RL) is to finite-state control problems and one of the most difficult problems in learning for control is balancing the exploration/exploitation tradeoff. Existing theoretical results for RL give very little guidance on reasonable ways to perform exploration. In this paper, we examine the convergence of single-step on-policy RL algorithms for control. On-policy algorithms cannot separate exploration from learning and therefore must confront the exploration problem directly. We prove convergence results for several related on-policy algorithms with both decaying exploration and persistent exploration. We also provide examples of exploration strategies that can be followed during learning that result in convergence to both optimal values and optimal policies.
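
One concrete member of the family analyzed here is single-step SARSA(0). A minimal tabular sketch follows; the step size, discount, and exploration schedule are illustrative placeholders, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(Q, s, eps):
    # Explore with probability eps, otherwise act greedily.
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.95):
    # On-policy target: uses the action a2 actually taken in s2,
    # so exploration cannot be separated from learning.
    Q[s, a] += alpha * (r + gamma * Q[s2, a2] - Q[s, a])
```

A schedule that decays eps toward zero matches the paper's decaying-exploration setting; a constant eps matches persistent exploration.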


Cognitive Systems Research | 2001

Value-function reinforcement learning in Markov games

Michael L. Littman

Markov games are a model of multiagent environments that are convenient for studying multiagent reinforcement learning. This paper describes a set of reinforcement-learning algorithms based on estimating value functions and presents convergence theorems for these algorithms. The main contribution of this paper is that it presents the convergence theorems in a way that makes it easy to reason about the behavior of simultaneous learners in a shared environment.
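
One algorithm in this family, minimax-Q, replaces the max in Q-learning's backup with the minimax value of the zero-sum stage game defined by the current Q-values. That inner step is a small linear program; a sketch using SciPy's linprog (the example payoff matrix, matching pennies, is illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value and maximin strategy of a zero-sum matrix game.

    M[i, j] is the payoff to the row player for row i against column j.
    Solves max_x min_j sum_i x[i] * M[i, j] as a linear program.
    """
    m, n = M.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                 # maximize v by minimizing -v
    A_ub = np.hstack([-M.T, np.ones((n, 1))])    # v - x @ M[:, j] <= 0 for all j
    b_ub = np.zeros(n)
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0                            # strategy probabilities sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * m + [(None, None)])
    return res.x[-1], res.x[:m]

value, strategy = matrix_game_value(np.array([[1.0, -1.0], [-1.0, 1.0]]))
print(value, strategy)   # ~0.0 and the uniform strategy [0.5, 0.5]
```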


International Conference on Machine Learning | 2006

PAC model-free reinforcement learning

Alexander L. Strehl; Lihong Li; Eric Wiewiora; John Langford; Michael L. Littman

For a Markov Decision Process with finite state (size S) and action spaces (size A per state), we propose a new algorithm, Delayed Q-Learning. We prove it is PAC, achieving near-optimal performance except for Õ(SA) timesteps using O(SA) space, improving on the Õ(S²A) bounds of the best previous algorithms. This result proves efficient reinforcement learning is possible without learning a model of the MDP from experience. Learning takes place from a single continuous thread of experience; no resets or parallel sampling are used. Beyond its smaller storage and experience requirements, Delayed Q-learning's per-experience computation cost is much less than that of previous PAC algorithms.
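
The update rule is simple to sketch: Q-values start optimistically high and are lowered only in batches of m samples, and only when the batch indicates a drop of at least 2*eps1. The skeleton below omits the algorithm's LEARN flags and timestamp bookkeeping, and the parameter values are placeholders rather than the PAC-derived settings.

```python
import numpy as np

class DelayedQ:
    """Simplified sketch of Delayed Q-Learning's batched optimistic updates."""

    def __init__(self, S, A, gamma=0.95, m=5, eps1=0.1, rmax=1.0):
        self.Q = np.full((S, A), rmax / (1 - gamma))   # optimistic initialization
        self.gamma, self.m, self.eps1 = gamma, m, eps1
        self.acc = np.zeros((S, A))                    # accumulated update targets
        self.cnt = np.zeros((S, A), dtype=int)         # samples since last attempt

    def observe(self, s, a, r, s2):
        self.acc[s, a] += r + self.gamma * self.Q[s2].max()
        self.cnt[s, a] += 1
        if self.cnt[s, a] == self.m:                   # attempted update
            target = self.acc[s, a] / self.m + self.eps1
            if self.Q[s, a] - target >= 2 * self.eps1:
                self.Q[s, a] = target                  # Q only ever decreases
            self.acc[s, a] = 0.0
            self.cnt[s, a] = 0

agent = DelayedQ(S=3, A=2)
agent.observe(0, 1, r=0.5, s2=2)
```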


Archive | 1998

Automatic Cross-Language Information Retrieval Using Latent Semantic Indexing

Michael L. Littman; Susan T. Dumais; Thomas K. Landauer

We describe a method for fully automated cross-language document retrieval in which no query translation is required. Queries in one language can retrieve documents in other languages (as well as the original language). This is accomplished by a method that automatically constructs a multi-lingual semantic space using Latent Semantic Indexing (LSI). We present strong preliminary test results for our cross-language LSI (CL-LSI) method for a French-English collection. We also provide some evidence that this automatic method performs comparably to a retrieval method based on machine translation (MT-LSI).
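
The construction reduces to an SVD of a term-document matrix whose rows stack terms from both languages and whose columns are dual-language training documents; monolingual queries are then folded into the shared space. The matrix below is a random stand-in for real counts, so the sketch shows the mechanics only.

```python
import numpy as np

# Stand-in for a term-document matrix: rows are English and French terms
# stacked together, columns are dual-language training documents.
X = np.random.default_rng(0).random((500, 40))

k = 10                                       # dimensionality of the semantic space
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Uk, sk = U[:, :k], s[:k]

def fold_in(term_vector):
    # Standard LSI fold-in: project a document or query (which may use
    # terms from only one language) into the shared k-dimensional space.
    return term_vector @ Uk / sk

# Cosine similarity in this space retrieves across languages, since
# English-only and French-only texts land in the same coordinates.
```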


Journal of Artificial Intelligence Research | 1998

The computational complexity of probabilistic planning

Michael L. Littman; Judy Goldsmith; Martin Mundhenk

We examine the computational complexity of testing and finding small plans in probabilistic planning domains with both flat and propositional representations. The complexity of plan evaluation and existence varies with the plan type sought; we examine totally ordered plans, acyclic plans, and looping plans, and partially ordered plans under three natural definitions of plan value. We show that problems of interest are complete for a variety of complexity classes: PL, P, NP, co-NP, PP, NP^PP, co-NP^PP, and PSPACE. In the process of proving that certain planning problems are complete for NP^PP, we introduce a new basic NP^PP-complete problem, E-MAJSAT, which generalizes the standard Boolean satisfiability problem to computations involving probabilistic quantities; our results suggest that the development of good heuristics for E-MAJSAT could be important for the creation of efficient algorithms for a wide variety of problems.
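
For intuition, E-MAJSAT can be stated in a few lines of Python, though only as an exponential-time enumerator, not a practical solver. The encoding below (a predicate over full assignments, fair-coin chance variables, majority threshold 1/2) is an illustrative interface.

```python
from itertools import product

def e_majsat(formula, n_choice, n_chance, theta=0.5):
    """Is there a setting of the choice variables under which the formula
    holds with probability > theta, with chance variables as fair coins?
    Brute force: exponential in both variable counts."""
    for choice in product([False, True], repeat=n_choice):
        sat = sum(formula(choice + chance)
                  for chance in product([False, True], repeat=n_chance))
        if sat / 2 ** n_chance > theta:
            return True, choice
    return False, None

# v[0] is a choice variable; v[1], v[2] are chance variables.
print(e_majsat(lambda v: v[0] and (v[1] or v[2]), 1, 2))
# -> (True, (True,)): with v[0] = True, P(v[1] or v[2]) = 3/4 > 1/2
```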


Machine Learning | 2005

Corpus-based Learning of Analogies and Semantic Relations

Peter D. Turney; Michael L. Littman

We present an algorithm for learning from unlabeled text, based on the Vector Space Model (VSM) of information retrieval, that can solve verbal analogy questions of the kind found in the SAT college entrance exam. A verbal analogy has the form A:B::C:D, meaning “A is to B as C is to D”; for example, mason:stone::carpenter:wood. SAT analogy questions provide a word pair, A:B, and the problem is to select the most analogous word pair, C:D, from a set of five choices. The VSM algorithm correctly answers 47% of a collection of 374 college-level analogy questions (random guessing would yield 20% correct; the average college-bound senior high school student answers about 57% correctly). We motivate this research by applying it to a difficult problem in natural language processing, determining semantic relations in noun-modifier pairs. The problem is to classify a noun-modifier pair, such as “laser printer”, according to the semantic relation between the noun (printer) and the modifier (laser). We use a supervised nearest-neighbour algorithm that assigns a class to a given noun-modifier pair by finding the most analogous noun-modifier pair in the training data. With 30 classes of semantic relations, on a collection of 600 labeled noun-modifier pairs, the learning algorithm attains an F value of 26.5% (random guessing: 3.3%). With 5 classes of semantic relations, the F value is 43.2% (random: 20%). The performance is state-of-the-art for both verbal analogies and noun-modifier relations.
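
The scoring step at the heart of the VSM approach is plain cosine similarity between relation vectors. In the paper those vectors are built from corpus frequencies of joining patterns between the two words of a pair; the vectors below are made-up stand-ins for that step.

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical pattern-frequency vectors for word pairs (4 patterns here;
# the paper uses many more, derived from corpus queries).
stem = np.array([12.0, 3.0, 0.0, 7.0])        # e.g., mason:stone
choices = [np.array([11.0, 2.0, 1.0, 6.0]),   # e.g., carpenter:wood
           np.array([0.0, 9.0, 8.0, 1.0])]    # an unrelated pair

best = max(range(len(choices)), key=lambda i: cosine(stem, choices[i]))
print(best)   # 0: the pair with the most similar relation vector
```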


Journal of Computational and Graphical Statistics | 2008

Data Visualization With Multidimensional Scaling

Andreas Buja; Deborah F. Swayne; Michael L. Littman; Nathaniel Dean; Heike Hofmann; Lisha Chen

We discuss methodology for multidimensional scaling (MDS) and its implementation in two software systems, GGvis and XGvis. MDS is a visualization technique for proximity data, that is, data in the form of N × N dissimilarity matrices. MDS constructs maps (“configurations,” “embeddings”) in ℝ^k by interpreting the dissimilarities as distances. Two frequent sources of dissimilarities are high-dimensional data and graphs. When the dissimilarities are distances between high-dimensional objects, MDS acts as an (often nonlinear) dimension-reduction technique. When the dissimilarities are shortest-path distances in a graph, MDS acts as a graph layout technique. MDS has received recent attention in machine learning motivated by image databases (“Isomap”). MDS is also of interest in view of the popularity of “kernelizing” approaches inspired by Support Vector Machines (SVMs; “kernel PCA”). This article discusses the following general topics: (1) the stability and multiplicity of MDS solutions; (2) the analysis of structure within and between subsets of objects with missing value schemes in dissimilarity matrices; (3) gradient descent for optimizing general MDS loss functions (“Strain” and “Stress”); (4) a unification of classical (Strain-based) and distance (Stress-based) MDS. Particular topics include the following: (1) blending of automatic optimization with interactive displacement of configuration points to assist in the search for global optima; (2) forming groups of objects with interactive brushing to create patterned missing values in MDS loss functions; (3) optimizing MDS loss functions for large numbers of objects relative to a small set of anchor points (“external unfolding”); and (4) a non-metric version of classical MDS. We show applications to the mapping of computer usage data, to the dimension reduction of marketing segmentation data, to the layout of mathematical graphs and social networks, and finally to the spatial reconstruction of molecules.
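
Topic (3), gradient descent on an MDS loss, is easy to sketch for plain metric Stress; GGvis and XGvis layer weights, non-metric transforms, and interactive control on top of this. The step size and iteration count below are arbitrary placeholders.

```python
import numpy as np

def mds_stress(D, k=2, iters=500, lr=0.01, seed=0):
    """Metric MDS by gradient descent on raw Stress:
    sum over i < j of (D[i, j] - ||x_i - x_j||)^2."""
    n = D.shape[0]
    X = np.random.default_rng(seed).normal(size=(n, k))
    for _ in range(iters):
        diff = X[:, None, :] - X[None, :, :]   # pairwise coordinate differences
        dist = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(dist, 1.0)            # avoid dividing by zero
        coef = (dist - D) / dist
        np.fill_diagonal(coef, 0.0)            # no self-contribution
        grad = 2.0 * (coef[:, :, None] * diff).sum(axis=1)
        X -= lr * grad                         # move points downhill in Stress
    return X
```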


Journal of Automated Reasoning | 2001

Stochastic Boolean Satisfiability

Michael L. Littman; Stephen M. Majercik; Toniann Pitassi

Satisfiability problems and probabilistic models are core topics of artificial intelligence and computer science. This paper looks at the rich intersection between these two areas, opening the door for the use of satisfiability approaches in probabilistic domains. The paper examines a generic stochastic satisfiability problem, SSAT, which can function for probabilistic domains as SAT does for deterministic domains. It shows the connection between SSAT and well-studied problems in belief network inference and planning under uncertainty, and defines algorithms, both systematic and stochastic, for solving SSAT instances. These algorithms are validated on random SSAT formulae generated under the fixed-clause model. In spite of the large complexity gap between SSAT (PSPACE) and SAT (NP), the paper suggests that much of what we have learned about SAT transfers to the probabilistic domain.
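
The systematic solvers in the paper are DPLL-style refinements of a simple quantifier-ordered recursion: maximize over existential variables and take expectations over randomized ones. A stripped-down evaluator, with no unit propagation, pruning, or memoization, follows.

```python
def ssat_value(prefix, clauses):
    """Maximum satisfaction probability of an SSAT instance.

    prefix:  quantifiers in order, ('E', v) for existential or ('R', v, p)
             for randomized with P(v = True) = p.
    clauses: CNF as a list of lists of signed ints (+v for v, -v for not v).
    """
    def value(i, assign):
        # A clause is falsified once all of its literals are assigned false.
        for c in clauses:
            if all(abs(l) in assign and assign[abs(l)] != (l > 0) for l in c):
                return 0.0
        if i == len(prefix):
            return 1.0                        # fully assigned, nothing falsified
        q, v = prefix[i], prefix[i][1]
        hi = value(i + 1, {**assign, v: True})
        lo = value(i + 1, {**assign, v: False})
        if q[0] == 'E':
            return max(hi, lo)                # existential: pick the better branch
        return q[2] * hi + (1 - q[2]) * lo    # randomized: expected value

    return value(0, {})

# Exists x1, random x2 (p = 0.5): clause (x1 or x2) is satisfiable outright.
print(ssat_value([('E', 1), ('R', 2, 0.5)], [[1, 2]]))   # 1.0
```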


International Conference on Machine Learning | 2007

Analyzing feature generation for value-function approximation

Ronald Parr; Christopher Painter-Wakefield; Lihong Li; Michael L. Littman

We analyze a simple, Bellman-error-based approach to generating basis functions for value-function approximation. We show that it generates orthogonal basis functions that provably tighten approximation error bounds. We also illustrate the use of this approach in the presence of noise on some sample problems.
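
The analyzed scheme is easy to state for policy evaluation when the policy's transition matrix P and reward vector R are known (the paper also covers the sampled setting): fit a linear value function, take its Bellman error, orthogonalize against the current basis, and add the result as the next feature. A sketch using the linear fixed-point (LSTD) solution at each step:

```python
import numpy as np

def bebf(P, R, gamma, k):
    """Generate up to k Bellman-error basis functions for a fixed policy.

    P: (n, n) transition matrix of the policy; R: (n,) reward vector.
    Returns an (n, 1 + k') matrix of orthonormal features, k' <= k."""
    n = len(R)
    Phi = np.ones((n, 1)) / np.sqrt(n)          # start from a constant feature
    for _ in range(k):
        A = Phi.T @ (Phi - gamma * P @ Phi)     # linear fixed-point system
        w = np.linalg.solve(A, Phi.T @ R)
        V = Phi @ w                             # current approximate values
        residual = R + gamma * P @ V - V        # Bellman error of the fit
        residual -= Phi @ (Phi.T @ residual)    # orthogonalize against the basis
        norm = np.linalg.norm(residual)
        if norm < 1e-10:
            break                               # error already in the span
        Phi = np.hstack([Phi, (residual / norm)[:, None]])
    return Phi
```

Each added feature is orthogonal to the previous ones, matching the abstract's claim that the generated basis functions are orthogonal and provably tighten approximation error bounds.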

Collaboration


Dive into Michael L. Littman's collaborations.

Top Co-Authors

Leslie Pack Kaelbling

Massachusetts Institute of Technology


Peter Stone

University of Texas at Austin


David L. Roberts

North Carolina State University


Thomas K. Landauer

University of Colorado Boulder
