W. Bradley Knox
Massachusetts Institute of Technology
Publications
Featured research published by W. Bradley Knox.
International Journal of Social Robotics | 2012
W. Bradley Knox; Brian D. Glass; Bradley C. Love; W. Todd Maddox; Peter Stone
Human beings are a largely untapped source of in-the-loop knowledge and guidance for computational learning agents, including robots. To effectively design agents that leverage available human expertise, we need to understand how people naturally teach. In this paper, we describe two experiments that ask how differing conditions affect a human teacher’s feedback frequency and the computational agent’s learned performance. The first experiment considers the impact of a self-perceived teaching role in contrast to believing one is merely critiquing a recording. The second considers whether a human trainer will give more frequent feedback if the agent acts less greedily (i.e., choosing actions believed to be worse) when the trainer’s recent feedback frequency decreases. From the results of these experiments, we draw three main conclusions that inform the design of agents. More broadly, these two studies stand as early examples of a nascent technique of using agents as highly specifiable social entities in experiments on human behavior.
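The second experiment's manipulation can be sketched as a simple action-selection rule: the agent's exploration rate rises as the trainer's recent feedback rate falls, so sparse feedback makes the agent act less greedily. The sketch below is illustrative only; the window length, exploration bounds, and Q-value interface are assumptions for the example, not details from the paper.

```python
import random
from collections import deque

class FeedbackSensitiveSelector:
    """Epsilon-greedy action selection whose exploration rate rises when the
    human trainer's recent feedback frequency falls (illustrative sketch)."""

    def __init__(self, window=20, min_eps=0.05, max_eps=0.5):
        # 1 if the trainer gave feedback on a step, else 0, over a sliding window.
        self.recent_feedback = deque(maxlen=window)
        self.min_eps = min_eps
        self.max_eps = max_eps

    def record_step(self, feedback_given: bool):
        self.recent_feedback.append(1 if feedback_given else 0)

    def epsilon(self) -> float:
        if not self.recent_feedback:
            return self.max_eps
        rate = sum(self.recent_feedback) / len(self.recent_feedback)
        # Sparse recent feedback -> larger epsilon -> less greedy behavior.
        return self.min_eps + (self.max_eps - self.min_eps) * (1.0 - rate)

    def choose(self, q_values: dict):
        if random.random() < self.epsilon():
            return random.choice(list(q_values))
        return max(q_values, key=q_values.get)
```

Under a rule like this, a trainer who stops giving feedback sees the agent drift toward actions it believes are worse, which is the lever the second experiment uses to probe whether feedback frequency can be elicited.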
Frontiers in Psychology | 2012
W. Bradley Knox; A. Ross Otto; Peter Stone; Bradley C. Love
In non-stationary environments, there is a conflict between exploiting currently favored options and gaining information by exploring lesser-known options that in the past have proven less rewarding. Optimal decision-making in such tasks requires considering future states of the environment (i.e., planning) and properly updating beliefs about the state of the environment after observing outcomes associated with choices. Optimal belief-updating is reflective in that beliefs can change without directly observing environmental change. For example, after 10 s elapse, one might correctly believe that a traffic light last observed to be red is now more likely to be green. To understand human decision-making when rewards associated with choice options change over time, we develop a variant of the classic “bandit” task that is both rich enough to encompass relevant phenomena and sufficiently tractable to allow for ideal actor analysis of sequential choice behavior. We evaluate whether people update beliefs about the state of the environment in a reflexive (i.e., only in response to observed changes in reward structure) or reflective manner. In contrast to purely “random” accounts of exploratory behavior, model-based analyses of the subjects’ choices and latencies indicate that people are reflective belief updaters. However, unlike the Ideal Actor model, our analyses indicate that people’s choice behavior does not reflect consideration of future environmental states. Thus, although people update beliefs in a reflective manner consistent with the Ideal Actor, they do not engage in optimal long-term planning, but instead myopically choose, on every trial, the option believed to have the highest immediate payoff.
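The reflective-versus-reflexive distinction can be made concrete with a minimal Bayesian update for a two-state switching bandit. The hazard rate and reward probabilities below are placeholders, not the task's actual parameters.

```python
def reflective_update(belief_state1, reward, chosen_arm, hazard=0.1,
                      p_reward_good=0.8, p_reward_bad=0.2):
    """One trial of belief updating in a two-state switching bandit (sketch).

    belief_state1: prior probability that arm 1 is currently the better arm.
    hazard: per-trial probability that the environment silently switches state.
    """
    # Likelihood of the observed outcome under each environment state.
    if chosen_arm == 1:
        p_r_s1, p_r_s2 = p_reward_good, p_reward_bad
    else:
        p_r_s1, p_r_s2 = p_reward_bad, p_reward_good
    if not reward:
        p_r_s1, p_r_s2 = 1 - p_r_s1, 1 - p_r_s2

    # Observation step (Bayes' rule).
    post = p_r_s1 * belief_state1 / (
        p_r_s1 * belief_state1 + p_r_s2 * (1 - belief_state1))

    # Reflective step: the environment may have changed even though no change
    # was observed, so the belief drifts toward uncertainty with elapsed time.
    return (1 - hazard) * post + hazard * (1 - post)
```

A purely reflexive updater would stop after the Bayes step; repeatedly applying only the final line, with no new observations, moves the belief toward 0.5, which is the behavior the traffic-light example describes.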
Frontiers in Psychology | 2013
Jin Joo Lee; W. Bradley Knox; Jolie B. Wormwood; Cynthia Breazeal; David DeSteno
We present a computational model capable of predicting—above human accuracy—the degree of trust a person has toward their novel partner by observing the trust-related nonverbal cues expressed in their social interaction. We summarize our prior work, in which we identify nonverbal cues that signal untrustworthy behavior and also demonstrate the human mind's readiness to interpret those cues to assess the trustworthiness of a social robot. We demonstrate that domain knowledge gained from our prior human-subjects experiments, when incorporated into the feature engineering process, permits a computational model to outperform both human predictions and a baseline model built without this domain knowledge. We then present the construction of hidden Markov models to investigate temporal relationships among the trust-related nonverbal cues. By interpreting the resulting learned structure, we observe that models built to emulate different levels of trust exhibit different sequences of nonverbal cues. From this observation, we derive sequence-based temporal features that further improve the accuracy of our computational model. The multi-step research process presented in this paper combines the strengths of experimental manipulation and machine learning not only to design a computational trust model but also to further our understanding of the dynamics of interpersonal trust.
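The temporal-modeling step can be illustrated with a minimal forward-algorithm sketch: score a sequence of discrete nonverbal-cue codes under one HMM per trust level and label it with whichever model explains it better. The cue coding and parameter shapes below are assumptions for the example; the paper's models were learned from interaction data.

```python
import numpy as np

def hmm_log_likelihood(obs, start_p, trans_p, emit_p):
    """Scaled forward-algorithm log-likelihood of a discrete cue sequence.

    obs:     list of integer cue codes (e.g., 0 = face touching, 1 = arms crossed, ...)
    start_p: (K,) initial state distribution
    trans_p: (K, K) state transition matrix
    emit_p:  (K, V) per-state cue emission probabilities
    """
    alpha = start_p * emit_p[:, obs[0]]
    scale = alpha.sum()
    log_l = np.log(scale)
    alpha = alpha / scale
    for o in obs[1:]:
        alpha = (alpha @ trans_p) * emit_p[:, o]
        scale = alpha.sum()
        log_l += np.log(scale)        # accumulate log-likelihood via scaling factors
        alpha = alpha / scale
    return log_l

def predict_trust_level(obs, high_trust_hmm, low_trust_hmm):
    """Label a cue sequence with whichever trust-level HMM explains it better."""
    return ("high" if hmm_log_likelihood(obs, *high_trust_hmm)
            >= hmm_log_likelihood(obs, *low_trust_hmm) else "low")
```

Comparing learned transition structure between the two models is also what motivates the sequence-based temporal features described in the abstract.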
Robot and Human Interactive Communication | 2012
W. Bradley Knox; Peter Stone
Several studies have demonstrated that teaching agents by human-generated reward can be a powerful technique. However, the algorithmic space for learning from human reward has hitherto not been explored systematically. Using model-based reinforcement learning from human reward in goal-based, episodic tasks, we investigate how anticipated future rewards should be discounted to create behavior that performs well on the task that the human trainer intends to teach. We identify a “positive circuits” problem with low discounting (i.e., high discount factors) that arises from an observed bias among humans towards giving positive reward. Empirical analyses indicate that high discounting (i.e., low discount factors) of human reward is necessary in goal-based, episodic tasks and lend credence to the existence of the positive circuits problem.
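A back-of-the-envelope sketch of the positive circuits problem, with invented numbers: because trainers give mostly positive reward, an endlessly repeatable, mildly positive loop can outvalue a short path to the goal that ends the episode when the discount factor is near 1, whereas a low discount factor makes the goal path preferable.

```python
# Illustrative arithmetic only; reward magnitudes and step counts are made up.
# Suppose circling in place earns +0.5 per step indefinitely, while heading to
# the goal earns +1.0 per step for 3 steps, after which the episode ends.

def discounted_return(reward_per_step, gamma, steps):
    return sum(reward_per_step * gamma**t for t in range(steps))

for gamma in (0.99, 0.7, 0.0):
    circuit = discounted_return(0.5, gamma, steps=10_000)  # ~ 0.5 / (1 - gamma)
    to_goal = discounted_return(1.0, gamma, steps=3)
    print(f"gamma={gamma}: prefers positive circuit? {circuit > to_goal}")
```

With gamma = 0.99 the loop wins; with gamma = 0.7 or 0.0 the goal path wins, mirroring the finding that high discounting (low discount factors) of human reward is needed in goal-based, episodic tasks.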
Cognitive, Affective, & Behavioral Neuroscience | 2014
A. Ross Otto; W. Bradley Knox; Arthur B. Markman; Bradley C. Love
Physiological arousal, a marker of emotional response, has been demonstrated to accompany human decision making under uncertainty. Anticipatory emotions have been portrayed as basic and rapid evaluations of chosen actions. Instead, could these arousal signals stem from a “cognitive” assessment of value that utilizes the full environment structure, as opposed to merely signaling a coarse, reflexive assessment of the possible consequences of choices? Combining an exploration–exploitation task, computational modeling, and skin conductance measurements, we find that physiological arousal manifests a reflective assessment of the benefit of the chosen action, mirroring observed behavior. Consistent with the level of computational sophistication evident in these signals, a follow-up experiment demonstrates that anticipatory arousal is modulated by current environment volatility, in accordance with the predictions of our computational account. Finally, we examine the cognitive costs of the exploratory choice behavior these arousal signals accompany by manipulating concurrent cognitive demand. Taken together, these results demonstrate that the arousal that accompanies choice under uncertainty arises from a more reflective and “cognitive” assessment of the chosen action’s consequences than has been revealed previously.
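One way to picture the kind of analysis described above, as a rough skeleton rather than the authors' pipeline: regress trial-wise anticipatory skin conductance on a model-derived relative value of the chosen option. All names and the specific regressors here are assumptions for the sketch.

```python
import numpy as np

def arousal_value_regression(arousal, value_chosen, value_best_alternative):
    """Ordinary least squares of arousal on the chosen option's relative value.

    A reliable slope would link anticipatory arousal to a reflective,
    model-derived assessment of the chosen action's benefit.
    """
    relative_value = (np.asarray(value_chosen, dtype=float)
                      - np.asarray(value_best_alternative, dtype=float))
    X = np.column_stack([np.ones_like(relative_value), relative_value])
    coef, *_ = np.linalg.lstsq(X, np.asarray(arousal, dtype=float), rcond=None)
    return coef  # [intercept, slope]
```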
Intelligent User Interfaces | 2013
Saleema Amershi; Maya Cakmak; W. Bradley Knox; Todd Kulesza; Tessa Lau
Many applications of Machine Learning (ML) involve interactions with humans. Humans may provide input to a learning algorithm (in the form of labels, demonstrations, corrections, rankings or evaluations) while observing its outputs (in the form of feedback, predictions or executions). Although humans are an integral part of the learning process, traditional ML systems used in these applications are agnostic to the fact that inputs/outputs are from/for humans. However, a growing community of researchers at the intersection of ML and human-computer interaction are making interaction with humans a central part of developing ML systems. These efforts include applying interaction design principles to ML systems, using human-subject testing to evaluate ML systems and inspire new methods, and changing the input and output channels of ML systems to better leverage human capabilities. With this Interactive Machine Learning (IML) workshop at IUI 2013 we aim to bring this community together to share ideas, get up-to-date on recent advances, progress towards a common framework and terminology for the field, and discuss the open questions and challenges of IML.
International Conference on Robotics and Automation | 2008
W. Bradley Knox; Juhyun Lee; Peter Stone
This video shows a Segway robot from the University of Texas at Austin competing in the finals of the 2007 RoboCup@Home competition, which featured home assistant robots performing various challenging tasks. This demonstration combines several tasks that a future home assistant robot will likely need to perform. The robot learns a human's appearance, follows the human with his back turned, distinguishes the human from a similarly clothed stranger, and adapts when it notices that the human has changed his clothing. For this task, we introduce a novel two-classifier architecture, using the subject's face as a primary identifying characteristic and his shirt as a secondary characteristic.
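The two-classifier idea can be sketched as a simple decision rule: the face is the primary identifier, and the shirt model is a secondary cue that is re-learned whenever the face confirms identity, so the robot adapts to clothing changes. The thresholds and classifier interfaces below are placeholders, not the system shown in the video.

```python
from typing import Callable, Optional

FACE_CONF_THRESHOLD = 0.8   # placeholder values for illustration
SHIRT_CONF_THRESHOLD = 0.6

def identify_target(frame,
                    face_match: Callable[[object], Optional[float]],
                    shirt_score: Callable[[object], float],
                    shirt_update: Callable[[object], None]) -> bool:
    """Two-classifier identification: face is primary, shirt is secondary."""
    face_conf = face_match(frame)          # None if no face is visible
    if face_conf is not None:
        if face_conf >= FACE_CONF_THRESHOLD:
            shirt_update(frame)            # refresh the shirt model on confirmed identity
            return True
        return False                       # a visible but non-matching face: a stranger
    # No face visible (e.g., the person's back is turned): rely on the shirt cue.
    return shirt_score(frame) >= SHIRT_CONF_THRESHOLD
```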
Autonomous Agents and Multi-Agent Systems | 2018
Guangliang Li; Shimon Whiteson; W. Bradley Knox; Hayley Hung
Learning from rewards generated by a human trainer observing an agent in action has been proven to be a powerful method for teaching autonomous agents to perform challenging tasks, especially for non-technical users. Since the efficacy of this approach depends critically on the reward the trainer provides, we consider how the interaction between the trainer and the agent should be designed so as to increase the efficiency of the training process. This article investigates the influence of the agent's socio-competitive feedback on the human trainer's training behavior and on the agent's learning. The results of our user study with 85 participants suggest that the agent's passive socio-competitive feedback—showing the performance and scores of agents trained by trainers on a leaderboard—substantially increases the participants' engagement in the game task and improves the agents' performance, even though the participants do not directly play the game but instead train the agent to do so. Moreover, making this feedback active—sending the trainer her agent's performance relative to others—further induces more participants to train agents longer and improves the agents' learning. Our further analysis shows that agents trained by trainers exposed to both the passive and active social feedback obtain higher performance under a score mechanism optimized from the trainer's perspective, and that the agent's additional active social feedback keeps participants training their agents longer, yielding policies that perform better under such a score mechanism.
International Conference on Knowledge Capture | 2009
W. Bradley Knox; Peter Stone
Adaptive Agents and Multi-Agent Systems | 2010
W. Bradley Knox; Peter Stone