Doina Precup | Researchain

Publication

Featured research published by Doina Precup.

Artificial Intelligence | 1999

Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning

Richard S. Sutton; Doina Precup; Satinder P. Singh

Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed within the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action in this framework to include options: closed-loop policies for taking action over a period of time. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Overall, we show that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way. In particular, we show that options may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning. Formally, a set of options defined over an MDP constitutes a semi-Markov decision process (SMDP), and the theory of SMDPs provides the foundation for the theory of options. However, the most interesting issues concern the interplay between the underlying MDP and the SMDP and are thus beyond SMDP theory. We present results for three such cases: 1) we show that the results of planning with options can be used during execution to interrupt options and thereby perform even better than planned, 2) we introduce new intra-option methods that are able to learn about an option from fragments of its execution, and 3) we propose a notion of subgoal that can be used to improve the options themselves. All of these results have precursors in the existing literature; the contribution of this paper is to establish them in a simpler and more general setting with fewer changes to the existing reinforcement learning framework.
In particular, we show that these results can be obtained without committing to (or ruling out) any particular approach to state abstraction, hierarchy, function approximation, or the macro-utility problem.
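Because a set of options over an MDP forms an SMDP, an option can be treated like a single action lasting k primitive steps. A minimal sketch of the SMDP Q-learning update, Q(s,o) ← Q(s,o) + α[r + γ^k max_{o'} Q(s',o') − Q(s,o)], where r is the discounted reward collected while the option ran, is shown here. Representing each option only by a name and an initiation set, and the default α and γ, are illustrative assumptions; option policies and termination conditions are omitted because the update only needs the observed outcome of running the option.

```python
from typing import Dict, FrozenSet, Hashable, List, Tuple

State = Hashable
QTable = Dict[Tuple[State, str], float]


def smdp_q_update(
    Q: QTable,
    s: State,
    option: str,
    rewards: List[float],
    s_next: State,
    initiation_sets: Dict[str, FrozenSet[State]],
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> float:
    """Apply one SMDP Q-learning update after `option` ran from s to s_next.

    `rewards` holds the one-step rewards received while the option executed,
    so k = len(rewards) is the option's duration in primitive steps.
    """
    k = len(rewards)
    # Discounted reward accumulated during the option: r_1 + g*r_2 + ... + g^(k-1)*r_k.
    r = sum(gamma ** i * rew for i, rew in enumerate(rewards))
    # Bootstrap only over options whose initiation set contains s_next
    # (0.0 if none is available, e.g. at a terminal state).
    available = [o for o, init in initiation_sets.items() if s_next in init]
    best_next = max((Q.get((s_next, o), 0.0) for o in available), default=0.0)
    target = r + gamma ** k * best_next
    old = Q.get((s, option), 0.0)
    Q[(s, option)] = old + alpha * (target - old)
    return Q[(s, option)]
```

Note the γ^k factor: longer options discount the bootstrapped value more, which is what lets options and one-step primitive actions (k = 1) share the same Q-table.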

IEEE Transactions on Medical Imaging | 2015

The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS)

Bjoern H. Menze; András Jakab; Stefan Bauer; Jayashree Kalpathy-Cramer; Keyvan Farahani; Justin S. Kirby; Yuliya Burren; Nicole Porz; Johannes Slotboom; Roland Wiest; Levente Lanczi; Elizabeth R. Gerstner; Marc-André Weber; Tal Arbel; Brian B. Avants; Nicholas Ayache; Patricia Buendia; D. Louis Collins; Nicolas Cordier; Jason J. Corso; Antonio Criminisi; Tilak Das; Hervé Delingette; Çağatay Demiralp; Christopher R. Durst; Michel Dojat; Senan Doyle; Joana Festa; Florence Forbes; Ezequiel Geremia

In this paper we report the set-up and results of the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) organized in conjunction with the MICCAI 2012 and 2013 conferences. Twenty state-of-the-art tumor segmentation algorithms were applied to a set of 65 multi-contrast MR scans of low- and high-grade glioma patients, manually annotated by up to four raters, and to 65 comparable scans generated using tumor image simulation software. Quantitative evaluations revealed considerable disagreement between the human raters in segmenting various tumor sub-regions (Dice scores in the range 74% to 85%), illustrating the difficulty of this task. We found that different algorithms worked best for different sub-regions (reaching performance comparable to human inter-rater variability), but that no single algorithm ranked at the top for all sub-regions simultaneously. Fusing several good algorithms using a hierarchical majority vote yielded segmentations that consistently ranked above all individual algorithms, indicating remaining opportunities for further methodological improvements. The BRATS image data and manual annotations continue to be publicly available through an online evaluation system as an ongoing benchmarking resource.
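The two quantities in this abstract, the Dice overlap 2|A∩B| / (|A| + |B|) and majority-vote fusion, can be sketched on flat per-voxel label lists. This is a simplified illustration: the vote is a plain per-voxel plurality, not the hierarchical variant used in the benchmark, and the tie-breaking rule and the "both empty means 1.0" convention are assumptions.

```python
from collections import Counter
from typing import List, Sequence


def dice(a: Sequence[int], b: Sequence[int], label: int) -> float:
    """Dice overlap 2|A∩B| / (|A| + |B|) for one label in two flat label maps."""
    A = {i for i, v in enumerate(a) if v == label}
    B = {i for i, v in enumerate(b) if v == label}
    if not A and not B:
        return 1.0  # both maps lack the label: treated as perfect agreement (a convention)
    return 2 * len(A & B) / (len(A) + len(B))


def majority_vote(segmentations: Sequence[Sequence[int]]) -> List[int]:
    """Per-voxel plurality vote across algorithms; ties go to the smallest label."""
    fused = []
    for votes in zip(*segmentations):  # one tuple of votes per voxel
        counts = Counter(votes)
        top = max(counts.values())
        fused.append(min(lbl for lbl, c in counts.items() if c == top))
    return fused
```

For example, fusing `[0, 1, 1, 2]`, `[0, 1, 2, 2]` and `[1, 1, 1, 0]` gives `[0, 1, 1, 2]`, and the Dice score for label 1 between the first two maps is 2/3.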

International Conference on Machine Learning | 2006

Automatic basis function construction for approximate dynamic programming and reinforcement learning

Philipp W. Keller; Shie Mannor; Doina Precup

We address the problem of automatically constructing basis functions for linear approximation of the value function of a Markov Decision Process (MDP). Our work builds on results by Bertsekas and Castañon (1989), who proposed a method for automatically aggregating states to speed up value iteration. We propose to use neighborhood component analysis (Goldberger et al., 2005), a dimensionality reduction technique created for supervised learning, to map a high-dimensional state space to a low-dimensional space, based on the Bellman error or on the temporal difference (TD) error. We then place basis functions in the lower-dimensional space. These are added as new features for the linear function approximator. This approach is applied to a high-dimensional inventory control problem.
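The last two steps (placing basis functions in the projected space and using them as linear features) can be sketched as follows. This assumes the projection matrix `A` has already been learned; the paper learns it with neighborhood component analysis driven by the Bellman or TD error, which is not reproduced here. Gaussian radial basis functions, their centers, and the width `sigma` are illustrative assumptions.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def project(A: Matrix, s: Sequence[float]) -> List[float]:
    """Map a D-dimensional state s to the d-dimensional point A s (A is d x D)."""
    return [sum(a * x for a, x in zip(row, s)) for row in A]


def rbf_features(A: Matrix, centers: Sequence[Sequence[float]], sigma: float,
                 s: Sequence[float]) -> List[float]:
    """Gaussian basis functions placed at `centers` in the projected space."""
    z = project(A, s)
    feats = []
    for c in centers:
        sq_dist = sum((zi - ci) ** 2 for zi, ci in zip(z, c))
        feats.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    return feats


def linear_value(weights: Sequence[float], feats: Sequence[float]) -> float:
    """Linear value estimate V(s) = w . phi(s)."""
    return sum(w * f for w, f in zip(weights, feats))
```

Because the basis functions live in the d-dimensional projected space, the number of centers needed to cover it grows with d rather than with the original state dimension D.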

Symposium on Abstraction, Reformulation and Approximation | 2002

Learning Options in Reinforcement Learning

Martin Stolle; Doina Precup

Temporally extended actions (e.g., macro actions) have proven very useful for speeding up learning, ensuring robustness and building prior knowledge into AI systems. The options framework (Precup, 2000; Sutton, Precup & Singh, 1999) provides a natural way of incorporating such actions into reinforcement learning systems, but leaves open the issue of how good options might be identified. In this paper, we empirically explore a simple approach to creating options. The underlying assumption is that the agent will be asked to perform different goal-achievement tasks in an environment that is otherwise the same over time. Our approach is based on the intuition that states that are frequently visited on system trajectories could prove to be useful subgoals (e.g., McGovern & Barto, 2001; Iba, 1989). We propose a greedy algorithm for identifying subgoals based on state visitation counts. We present empirical studies of this approach in two gridworld navigation tasks. One of the environments we explored contains bottleneck states, and the algorithm indeed finds these states, as expected. The second environment is an empty gridworld with no obstacles. Although the environment does not contain any obvious subgoals, our approach still finds useful options, which essentially allow the agent to explore the environment more quickly.
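The visitation-count intuition can be sketched as a greedy ranking of states seen on trajectories. This is a simplified illustration, not the paper's exact algorithm: excluding trajectory start and end states, counting each state at most once per trajectory, and the tie-breaking rule are all assumptions.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def candidate_subgoals(trajectories: Sequence[Sequence[Hashable]], k: int = 1) -> List[Hashable]:
    """Greedily return the k most frequently visited states as candidate subgoals."""
    counts: Counter = Counter()
    endpoints = set()
    for traj in trajectories:
        if not traj:
            continue
        # Start and goal states are excluded: they are task-specific, not shared subgoals.
        endpoints.update((traj[0], traj[-1]))
        # Count each state once per trajectory so that loops do not inflate counts.
        counts.update(set(traj))
    for e in endpoints:
        counts.pop(e, None)
    # Highest count first; ties broken by repr for deterministic output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], repr(kv[0])))
    return [s for s, _ in ranked[:k]]
```

In a two-room gridworld, a doorway state appears on most trajectories between rooms, so it rises to the top of this ranking, which matches the bottleneck behavior the abstract reports.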

IEEE Transactions on Biomedical Engineering | 2010

Classification of Normal and Hypoxic Fetuses From Systems Modeling of Intrapartum Cardiotocography

Philip A. Warrick; Emily F. Hamilton; Doina Precup; Robert E. Kearney

Recording of maternal uterine pressure (UP) and fetal heart rate (FHR) during labor and delivery is a procedure referred to as cardiotocography. We modeled this signal pair as an input-output system using a system identification approach to estimate their dynamic relation in terms of an impulse response function. We also modeled the FHR baseline with a linear fit and FHR variability unrelated to UP using the power spectral density, computed from an auto-regressive model. Using a perinatal database of normal and pathological cases, we trained support vector machine (SVM) classifiers with feature sets from these models. We then used the classifier outputs in a detection process. We obtained the best results with a detector that combined the decisions of classifiers using both feature sets. It detected half of the pathological cases, with very few false positives (7.5%), 1 h and 40 min before delivery. This would leave sufficient time for an appropriate clinical response. These results clearly demonstrate the utility of our method for the early detection of cases needing clinical intervention.
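The abstract does not say which identification method was used. As a hedged illustration of estimating an impulse response function, a finite impulse response (FIR) model y[t] = Σ_k h[k] u[t−k] can be fit by ordinary least squares, treating UP as the input u and FHR as the output y (an assumption the abstract implies but does not state). The model order, the lack of noise handling and detrending, and the small pure-Python solver are all simplifications.

```python
from typing import List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_fir(u: Sequence[float], y: Sequence[float], order: int) -> List[float]:
    """Least-squares estimate of h[0..order-1] in y[t] = sum_k h[k] * u[t-k]."""
    rows, targets = [], []
    for t in range(order - 1, len(u)):
        rows.append([u[t - k] for k in range(order)])  # lagged inputs
        targets.append(y[t])
    # Normal equations (X^T X) h = X^T y.
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(order)] for i in range(order)]
    Xty = [sum(r[i] * yt for r, yt in zip(rows, targets)) for i in range(order)]
    return solve(XtX, Xty)
```

The estimated h is the impulse response: h[k] describes how a unit change in UP at time t − k is reflected in FHR at time t, and its shape can then serve as a feature set for a classifier.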

European Conference on Machine Learning | 2005

Active learning in partially observable Markov decision processes

Robin Jaulmes; Joelle Pineau; Doina Precup

This paper examines the problem of finding an optimal policy for a Partially Observable Markov Decision Process (POMDP) when the model is not known or is only poorly specified. We propose two approaches to this problem. The first relies on a model of the uncertainty that is added directly into the POMDP planning problem. This has theoretical guarantees, but is impractical when many of the parameters are uncertain. The second, called MEDUSA, incrementally improves the POMDP model using selected queries, while still optimizing reward. Results show good performance of the algorithm even in large problems: the most useful parameters of the model are learned quickly and the agent still accumulates high reward throughout the process.
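One common way to represent uncertainty over POMDP model parameters is a Dirichlet distribution over each transition distribution T(· | s, a). The sketch here assumes that representation and shows only the bookkeeping: updating counts after a query reveals a transition, computing the posterior mean, and sampling one plausible model. It is not the MEDUSA algorithm itself; query selection, observation models, and planning are omitted, and the class name and uniform prior are hypothetical.

```python
import random
from typing import Dict, Hashable, Iterable


class DirichletTransitions:
    """Dirichlet belief over T(s' | s, a) for a small discrete POMDP (illustrative)."""

    def __init__(self, states: Iterable[Hashable], actions: Iterable[Hashable], prior: float = 1.0):
        self.states = list(states)
        # One Dirichlet (a vector of pseudo-counts) per (state, action) pair.
        self.counts: Dict = {
            (s, a): {s2: prior for s2 in self.states}
            for s in self.states for a in actions
        }

    def observe(self, s, a, s_next, weight: float = 1.0) -> None:
        """Add evidence after a query revealed that (s, a) led to s_next."""
        self.counts[(s, a)][s_next] += weight

    def mean(self, s, a) -> Dict:
        """Posterior mean transition distribution for (s, a)."""
        c = self.counts[(s, a)]
        total = sum(c.values())
        return {s2: v / total for s2, v in c.items()}

    def sample(self, s, a, rng=random) -> Dict:
        """Draw one transition distribution from the Dirichlet via normalized Gamma draws."""
        g = {s2: rng.gammavariate(v, 1.0) for s2, v in self.counts[(s, a)].items()}
        total = sum(g.values())
        return {s2: x / total for s2, x in g.items()}
```

Sampling several models and checking how much their optimal actions disagree is one way such a belief could guide which parameters are worth querying.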

Archive | 1998

Constructive Function Approximation

Paul E. Utgoff; Doina Precup

The problem of automatically constructing features for use in a learned evaluation function is visited. Issues of feature overlap, independence, and coverage are addressed. Three algorithms are applied to two tasks, measuring the error in the approximated function as learning proceeds. The issues are discussed in the context of their apparent effects on the function approximation process.

Theory in Biosciences | 2012

An information-theoretic approach to curiosity-driven reinforcement learning

Susanne Still; Doina Precup

We provide a fresh look at the problem of exploration in reinforcement learning, drawing on ideas from information theory. First, we show that Boltzmann-style exploration, one of the main exploration methods used in reinforcement learning, is optimal from an information-theoretic point of view, in that it optimally trades expected return for the coding cost of the policy. Second, we address the problem of curiosity-driven learning. We propose that, in addition to maximizing the expected return, a learner should choose a policy that also maximizes the learner’s predictive power. This makes the world both interesting and exploitable. Optimal policies then have the form of Boltzmann-style exploration with a bonus, containing a novel exploration–exploitation trade-off which emerges naturally from the proposed optimization principle. Importantly, this exploration–exploitation trade-off persists in the optimal deterministic policy, i.e., when there is no exploration due to randomness. As a result, exploration is understood as an emerging behavior that optimizes information gain, rather than being modeled as pure randomization of action choices.
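Boltzmann-style (softmax) exploration chooses action a with probability proportional to exp(Q(a)/T), where the temperature T controls how random the choice is. A minimal sketch follows; the optional additive `bonus` argument is a hypothetical placeholder that only shows where a bonus enters the exponent, not the information-theoretic bonus derived in the paper.

```python
import math
import random
from typing import List, Optional, Sequence


def boltzmann_policy(q_values: Sequence[float], temperature: float,
                     bonus: Optional[Sequence[float]] = None) -> List[float]:
    """pi(a) proportional to exp((Q(a) + bonus(a)) / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    bonus = bonus or [0.0] * len(q_values)
    logits = [(q + b) / temperature for q, b in zip(q_values, bonus)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def sample_action(probs: Sequence[float], rng=random) -> int:
    """Draw an action index according to the policy probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

As the temperature falls toward zero the policy approaches greedy action selection; as it grows, the policy approaches uniform random choice.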

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2013

Time Series Analysis Using Geometric Template Matching

Jordan Frank; Shie Mannor; Joelle Pineau; Doina Precup

We present a novel framework for analyzing univariate time series data. At the heart of the approach is a versatile algorithm for measuring the similarity of two segments of time series called geometric template matching (GeTeM). First, we use GeTeM to compute a similarity measure for clustering and nearest-neighbor classification. Next, we present a semi-supervised learning algorithm that uses the similarity measure with hierarchical clustering in order to improve classification performance when unlabeled training data are available. Finally, we present a boosting framework called TDEBOOST, which uses an ensemble of GeTeM classifiers. TDEBOOST augments the traditional boosting approach with an additional step in which the features used as inputs to the classifier are adapted at each step to improve the training error. We empirically evaluate the proposed approaches on several datasets, such as accelerometer data collected from wearable sensors and ECG data.
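Nearest-neighbor classification, one of the uses of the similarity measure described here, only needs a function that scores how alike two segments are. The sketch keeps that function pluggable; the default negative Euclidean distance is a placeholder and is not GeTeM, whose details are not given in the abstract. Equal-length segments are assumed.

```python
import math
from typing import Callable, Hashable, Optional, Sequence, Tuple

Series = Sequence[float]


def neg_euclidean(a: Series, b: Series) -> float:
    """Placeholder similarity (higher means more similar); NOT GeTeM."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nn_classify(query: Series,
                labelled: Sequence[Tuple[Series, Hashable]],
                similarity: Callable[[Series, Series], float] = neg_euclidean) -> Optional[Hashable]:
    """Return the label of the most similar training segment (1-nearest neighbor)."""
    best_label, best_sim = None, -math.inf
    for series, label in labelled:
        sim = similarity(query, series)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label
```

Swapping in a different `similarity` (for example, a template-matching score) changes the classifier without touching the nearest-neighbor logic.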

European Conference on Machine Learning | 2011

Activity recognition with mobile phones

Jordan Frank; Shie Mannor; Doina Precup

Our demonstration consists of a working activity and gait recognition system, implemented on a commercial smartphone. The activity recognition feature allows participants to train various activities, such as running, walking, or jumping, on the phone; the system can then identify when those activities are performed. The gait recognition feature learns particular characteristics of how participants walk, allowing the phone to identify the person carrying it.
