Steven D. Whitehead
University of Rochester
Publication
Featured research published by Steven D. Whitehead.
Machine Learning | 1991
Steven D. Whitehead; Dana H. Ballard
This article considers adaptive control architectures that integrate active sensory-motor systems with decision systems based on reinforcement learning. One unavoidable consequence of active perception is that the agent's internal representation often confounds external world states. We call this phenomenon perceptual aliasing and show that it destabilizes existing reinforcement learning algorithms with respect to the optimal decision policy. We then describe a new decision system that overcomes these difficulties for a restricted class of decision problems. The system incorporates a perceptual subcycle within the overall decision cycle and uses a modified learning algorithm to suppress the effects of perceptual aliasing. The result is a control architecture that learns not only how to solve a task but also where to focus its visual attention in order to collect necessary sensory information.
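The aliasing problem the abstract describes can be made concrete with a minimal sketch (all states, actions, and rewards below are illustrative, not from the paper): when two distinct world states emit the same observation but demand different actions, any policy defined over observations alone is forced to act identically in both, capping its achievable return below that of a state-based policy.

```python
# Hypothetical two-state world exhibiting perceptual aliasing:
# states s1 and s2 both produce observation "o", but their
# reward-maximizing actions differ.
WORLD_STATES = ["s1", "s2"]
OBSERVE = {"s1": "o", "s2": "o"}  # aliased: both states look alike
ACTIONS = ["left", "right"]
REWARD = {("s1", "left"): 1.0, ("s1", "right"): 0.0,
          ("s2", "left"): 0.0, ("s2", "right"): 1.0}

def best_observation_policy_value():
    # An observation-based agent must commit to one action for "o";
    # its value is the best average return over the aliased states.
    returns = {a: sum(REWARD[(s, a)] for s in WORLD_STATES) / len(WORLD_STATES)
               for a in ACTIONS}
    return max(returns.values())

def best_state_policy_value():
    # A state-based agent chooses the best action in each state.
    return sum(max(REWARD[(s, a)] for a in ACTIONS)
               for s in WORLD_STATES) / len(WORLD_STATES)

print(best_observation_policy_value())  # 0.5
print(best_state_policy_value())        # 1.0
```

Because the target Q-value for ("o", "left") oscillates between the two underlying states' returns, temporal-difference updates over observations never settle, which is the instability the paper attributes to aliasing.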
Robot Learning | 1993
Steven D. Whitehead; Jonas Karlsson; Josh D. Tenenberg
An ability to coordinate the pursuit of multiple, time-varying goals is important to an intelligent robot. In this chapter we consider the application of reinforcement learning to a simple class of dynamic multi-goal tasks. Not surprisingly, we find that the most straightforward, monolithic approach scales poorly, since the size of the state space is exponential in the number of goals. As an alternative, we propose a simple modular architecture which distributes the learning and control task amongst a set of separate control modules, one for each goal that the agent might encounter. Learning is facilitated since each module learns the optimal policy associated with its goal without regard for other current goals. This greatly simplifies the state representation and speeds learning time compared to a single monolithic controller. When the robot is faced with a single goal, the module associated with that goal is used to determine the overall control policy. When the robot is faced with multiple goals, information from each associated module is merged to determine the policy for the combined task. In general, these merged strategies yield good but suboptimal performance. Thus, the architecture trades poor initial performance, slow learning, and an optimal asymptotic policy in favor of good initial performance, fast learning, and a slightly sub-optimal asymptotic policy. We consider several merging strategies, from simple ones that compare and combine modular information about the current state only, to more sophisticated strategies that use lookahead search to construct more accurate utility estimates.
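One of the simple merging strategies the abstract mentions can be sketched as summing per-goal Q-values and acting greedily on the sum. The sketch below is illustrative only; the tables, states, and function names are assumptions, not the chapter's API.

```python
# Hypothetical modular merging: one Q-table per goal, trained
# independently; when several goals are active, sum each action's
# Q-values across the active modules and act greedily on the sum.
ACTIONS = ["north", "south", "east", "west"]

def merged_action(active_modules, state):
    """Pick the action whose summed Q-value across the active goal
    modules is largest. Each module is a dict: (state, action) -> Q."""
    scores = {a: sum(m.get((state, a), 0.0) for m in active_modules)
              for a in ACTIONS}
    return max(scores, key=scores.get)

# Two toy modules with different preferences at state "x":
food = {("x", "north"): 0.9, ("x", "east"): 0.4}
water = {("x", "south"): 0.3, ("x", "north"): 0.5}
print(merged_action([food], "x"))         # "north"
print(merged_action([food, water], "x"))  # "north" (0.9 + 0.5 = 1.4)
```

Summing local estimates ignores interactions between goals, which is why merged policies are good but generally sub-optimal, as the abstract notes; the lookahead-based strategies it mentions trade computation for more accurate combined utility estimates.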
international conference on machine learning | 1989
Steven D. Whitehead; Dana H. Ballard
Publisher Summary This chapter reviews the role of anticipation in reactive learning systems. Most work in intelligent robotics can be characterized as focusing either on rote behavior or thought behavior. During plan generation, the system reasons about the world to construct a plan that is followed during execution. Reactive systems, by contrast, focus on rote decisions and knowing what to do. Instead of relying on costly symbolic reasoning, reactive systems depend on precompiled knowledge about the way to behave in particular situations, and rely on efficiently organized built-in knowledge to obtain their performance. Plan-based controllers and reactive controllers represent endpoints on a spectrum, and for the most part, their respective architectures are quite dissimilar. More work is needed to explore the middle ground and develop models capable of both reactive behavior and reasoning activity. Incremental learning systems in general, and systems using temporal difference methods in particular, are characterized by slow learning rates. These systems also typically learn by participating in their environments and receiving reinforcement commensurate with the appropriateness of their behavior.
international symposium on intelligent control | 1990
Steven D. Whitehead; Richard S. Sutton; Dana H. Ballard
The focus of this work is on control architectures that are based on reinforcement learning. A number of recent advances that have contributed to the viability of reinforcement learning approaches to intelligent control are surveyed. These advances include the formalization of the relationship between reinforcement learning and dynamic programming, the use of internal predictive models to improve learning rate, and the integration of reinforcement learning with active perception. On the basis of these advances and other results, it is concluded that control architectures based on reinforcement learning are now in a position to satisfy many of the criteria associated with intelligent control.
visual communications and image processing | 1990
Randal C. Nelson; Dana H. Ballard; Steven D. Whitehead
Recent robotic models suggest that many complex representational problems in visual perception are actually simplified in systems with behavioral capabilities that permit them to move and interact with the environment. These studies have shown that some complex behaviors can be reduced to a collection of loosely coordinated agents, greatly reducing the complexity of control protocols. These and many other observations emphasize the importance of behavioral context in models of intelligence, and suggest that the natural coordinates for representing information are in terms of behaviors.
Neural networks for perception (Vol. 2) | 1992
Dana H. Ballard; Steven D. Whitehead
This chapter explains the structure and function of eye movements in the human visual system. The human eye is distinguished from current electronic cameras in that it has better resolution in a small region near the optical axis. This region is termed the fovea, and over it the resolution is better than that in the periphery. One feature of this design is the simultaneous representation of a large field of view and local high acuity. With the small fovea at a premium in a large visual field, the human visual system has special behaviors (saccades) for quickly moving the fovea to different spatial targets. The data on the fovea and saccades explain the dynamics of the visual behavior process. Most of the brain structures that represent visual information are retinally indexed, which means that their state changes with each eye movement.
Philosophical Transactions of the Royal Society B | 1992
Dana H. Ballard; Mary Hayhoe; Feng Li; Steven D. Whitehead
national conference on artificial intelligence | 1991
Steven D. Whitehead
international conference on machine learning | 1993
Richard S. Sutton; Steven D. Whitehead
international conference on machine learning | 1991
Steven D. Whitehead