A Review on Learning Planning Action Models for Socio-Communicative HRI
Ankuj Arora, Humbert Fiorino, Damien Pellier and Sylvie Pesty
Laboratoire LIG, Université Grenoble-Alpes, Grenoble, France
June 2016
Abstract
For social robots to be brought into widespread use in the fields of companionship, care taking and domestic help, they must be capable of demonstrating social intelligence. In order to be acceptable, they must exhibit socio-communicative skills. Classic approaches to programming HRI from observed human-human interactions fail to capture the subtlety of multimodal interactions as well as the key structural differences between robots and humans. The former arises from the difficulty of quantifying and coding multimodal behaviours, the latter from the difference in degrees of freedom between a robot and a human. However, the notion of reverse engineering from multimodal HRI traces to learn the underlying behavioral blueprint of the robot is an option worth exploring. In this spirit, the entire HRI can be seen as a sequence of speech acts exchanged between the robot and the human, each act treated as an action, bearing in mind that the entire sequence is goal-driven. This interaction can thus be treated as a sequence of actions propelling it from its initial state to a goal state, also known as a plan in the domain of AI planning. In the same domain, the action sequence that stems from plan execution can be represented as a trace. AI techniques, such as machine learning, can be used to learn behavioral models (also known as symbolic action models in AI planning) from these multimodal traces, intended to be reusable for AI planning. This article reviews recent machine learning techniques for learning planning action models which can be applied to the field of HRI with the intent of rendering robots socio-communicative.
1 Introduction

With the near simultaneous advances in mechatronics on the engineering side and ergonomics on the human factors side, the field of social robotics has seen a significant spike in interest in recent years. Driven by the objective of rendering robots socio-communicative, there has been an equally heightened interest in techniques to endow robots with cognitive, emotional and social skills. The strategy for doing so draws inspiration from the study of human behaviors. For robots, social and emotive qualities not only lubricate the interface between humans and robots, but also promote learning, decision making and so on. These qualities strengthen the possibility of acceptability of, and emotional attachment to, the robot [5, 6]. This acceptance is only likely if the robot fulfils a fundamental expectation that one being has of the other: not only to do the right thing, but also at the right time and in the right manner [5]. This social intelligence or 'commonsense' of the robot is what eventually determines its social acceptability in the long run.

Commonsense, however, is not that common. Robots can thus only learn to be acceptable with experience. However, teaching a humanoid the subtleties of a social interaction is far from trivial. Even a standard dialogue exchange integrates the widest possible panel of signs which intervene in the communication and are difficult to codify (synchronization between the expression of the body, the face, the tone of the voice, etc.). In such a scenario, learning the behavioral model of the robot is a promising approach.

Thus, another way of approaching this problem is, given a set of HRI traces, to learn the interaction script or the behavioral model of the robot which governs the interaction. This learning can be conducted with the help of fairly recent and ongoing advances in the field of AI. In the field of AI planning, for instance, the entire interaction can be viewed as a series of actions (where each speech act is treated as an action) which take the system from the initial state to the goal state, the goal being the successful termination of the interaction [18]. In the literature, AI (or Automated) planning has been used in HRI for robot action planning and reasoning [1]. Little work has been done, however, in using AI planning approaches to empower robots with socio-communicative abilities.

In such domains, planners now leverage recent advancements in machine learning (ML) to recreate the blueprint of the actions applicable to the domain (actions which cannot be easily identified or programmed by hand), along with their signatures, preconditions and effects. Several classical and recent ML techniques can be leveraged to reproduce the underlying behavioral model. This model, once learnt, can render the humanoid autonomous and pioneer future HRI interactions. This is a conscious and directed effort to decrease laborious manual coding and increase quality. This article briefly reviews recent machine learning techniques for learning planning action models for the field of HRI.

This article is organized as follows: we start by briefly introducing and explaining the interplay between Automated Planning (AP), Machine Learning (ML) and HRI to solve a common problem. We then present a classification of various techniques for learning action models, citing examples of each. These examples are then detailed in the following section. We then briefly discuss the persisting issues in the field despite the advances, and terminate with a conclusion.
2 Problem Statement
Consider the case where Automated Planning (AP) is to be used to construct the core behavioral model of a robot governing its multimodal interactions with humans. It is very difficult, if not impossible, for a domain expert to fine-tune such a model so as to inculcate and account for the subtle human behaviours which arise and interplay even in the simplest dialogue exchange. However, using ML techniques, it is possible to learn this model from an actual HRI. As depicted in Figure 1, the HRI can be viewed as a planning problem: in an initial run of the interaction, the robot speech acts are governed by observation, imitation or demonstration techniques [2, 21]. One particularly promising approach is that of 'beaming' by human pilots [3]. Thanks to this technique, a human operator can perceive, analyze, act and interact with a remote person through robotic embodiment; the operator solves both the scaling and the social problems by optimally exploiting the robot's affordances. The ensuing speech act exchange sequence between the robot and the human is treated as a sequence of actions which constitutes the trace set (also called the execution experience). This speech act exchange is what drives the interaction from its initial state to its goal state. An initial state or a goal is composed of a set of predicates. A predicate is applied to constant or variable symbols; it is grounded when applied only to constant symbols, in which case it evaluates to true or false. Thus, a single trace is constituted of the initial state predicates, the speech act sequence, and the final state predicates.

Figure 1: Step 1 - Initial run of the HRI experiment by beaming [3]

These traces are then fed to the learner (see Figure 2), whose role is to learn the behavioral (action) model m that serves as the 'blueprint' of the actions. An action model is defined by (a, Pre, Add, Del), where a is an action name with zero or more variables (as parameters), and Pre, Add and Del are the precondition list, add list and delete list, respectively. The precondition list is the set of conditions (eventually a conjunction of predicates) which need to be satisfied for the action to be triggered in a particular state. The add and delete lists are the sets of grounded predicates which are respectively added to or deleted from the current state upon the application of the action. This action application produces the next state which, upon successive action applications, leads to the goal state. In the field of AI planning, the model is represented in a standard language called the Planning Domain Definition Language (PDDL). It has been the official language for the representation of problems and solutions in all of the International Planning Competitions (IPC) held from 1998 onwards [14].

Figure 2: Step 2 - Learning the behavioral model from collected traces

A sample action called 'inform' is represented in PDDL in Figure 3. The objective of this action is for the agent (in this case the robot) to inform a human about the presence of a telephone in the room. The preconditions of this action (both the robot and the human are in the room, the robot believes in the presence of a telephone in the room, and the robot has seen the human) are represented in the form of predicates. As an effect of this action, the robot believes that the human believes in the presence of a telephone in the room, signifying the successful execution of 'inform'. The challenge lies in scripting such an action model with more complex speech acts while relying solely on the expertise of a domain expert, who, in his own right, is likely to commit errors while scripting.

Figure 3: Domain description and schema for the operator 'inform' in an HRI domain

The effort required by the domain expert to script this subtle and delicate humanoid behavioral model can be diminished by ML. The work done in ML goes hand in hand with the long history of planning, as ML is viewed as a potentially powerful means of endowing an agent with greater autonomy and flexibility. It often compensates for the designer's incomplete knowledge of the world that the agent will face. The developed ML techniques could be applied across a wide variety of domains to speed up planning. This is done by learning the underlying behavioral model from the experience accumulated during the planning and execution phases (i.e., the speech act exchanges). The employed learning techniques vary widely in terms of context of application, technique of application, adopted learning methodology and information learned.

Once the model m is learnt, it is fed to a planner (see Figure 4) along with an initial state s and a goal g. Together, all three constitute a planning problem, defined by (s, g, m). A solution to a planning problem is a plan composed of an action sequence (a_1, a_2, ..., a_n), where the actions guide the transition of the system from the initial state to the goal state [28]. The learnt model can thus be reused to plan future dialogue sequences between the robot and the human, in such a way that the need for a 'teacher' to govern the robot's behavior is suppressed and the robot can interact autonomously.

Figure 4: Step 3 - Autonomous re-run of the HRI

In summary, Machine Learning (ML) is increasingly being used to resolve the aforementioned planning problem. This article classifies the various approaches based on several criteria.
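To make the preceding definitions concrete, here is a minimal Python sketch of an action model (a, Pre, Add, Del) and of state progression, using the 'inform' action described above. The predicate spellings are our own guesses from the prose; the paper's exact PDDL (Figure 3) is not reproduced here.

```python
from typing import FrozenSet, NamedTuple

Predicate = str  # a ground predicate, e.g. "(at robot room)"

class ActionModel(NamedTuple):
    """An action model (a, Pre, Add, Del) as defined above."""
    name: str
    pre: FrozenSet[Predicate]     # must hold for the action to trigger
    add: FrozenSet[Predicate]     # predicates added to the current state
    delete: FrozenSet[Predicate]  # predicates removed from the current state

# The 'inform' action of Figure 3, reconstructed from the prose; the
# predicate spellings are illustrative, not the paper's exact PDDL.
inform = ActionModel(
    name="inform",
    pre=frozenset({"(at robot room)", "(at human room)",
                   "(believes robot (at telephone room))",
                   "(seen robot human)"}),
    add=frozenset({"(believes robot (believes human (at telephone room)))"}),
    delete=frozenset(),
)

def apply_action(state: FrozenSet[Predicate], act: ActionModel) -> FrozenSet[Predicate]:
    """State progression: check Pre, then remove Del and insert Add."""
    if not act.pre <= state:
        raise ValueError(f"preconditions of {act.name} not satisfied")
    return (state - act.delete) | act.add

s0 = inform.pre  # a minimal state in which 'inform' is applicable
s1 = apply_action(s0, inform)
print("(believes robot (believes human (at telephone room)))" in s1)  # True
```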
3 Classification of Learning Techniques

The techniques for learning planning action models can be classified as depicted in Figure 5.
3.1 Action Effects and State Observability

The determination of the current state of the system after an action execution may be flawed because of faulty sensor calibration. Thus, in the case of partial observability of a system, it may be assumed to be in one of a set of 'belief states'.

Figure 5: Learning Planning Action Models (D=Deterministic, P=Probabilistic, FO=Fully Observable, PO=Partially Observable)

Similarly, action effects may be probabilistic, which means that in a real world scenario, it is not necessary that a unitary action be applicable to a state. On the contrary, multiple actions may be applied, each with a different execution probability.

Keeping these variations of action effects and state observability in mind, we define four categories of implementations (both notions are sketched just after this list):

• Deterministic effects, full state observability: for example, the EXPO system [23, 12, 9].

• Deterministic effects, partial state observability: in this family, the system may be in one of a set of 'belief states' after the execution of each action. For example, ARMS (Action-Relation Modelling System) [25].

• Probabilistic effects, full state observability: for example, PELA (Planning, Execution and Learning Architecture) [11].

• Probabilistic effects, partial state observability: barring a few initial works in this area, this category remains the most understudied to date, with no general approach in sight (for example, Yoon et al. [26]).
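The two dimensions of this taxonomy can be made concrete in a few lines of Python. This is a toy sketch under our own assumptions (hypothetical predicates, a single 'grasp' action); none of the surveyed systems prescribes this representation.

```python
import random
from typing import FrozenSet, Set

State = FrozenSet[str]

def grasp(state: State):
    """A probabilistic action: two outcomes with execution probabilities."""
    return [
        (0.8, (state - {"(on cup table)"}) | {"(holding cup)"}),  # success
        (0.2, state),                                             # failure
    ]

def sample_successor(state: State) -> State:
    """Probabilistic effects under full observability: sample one outcome."""
    r, acc = random.random(), 0.0
    for p, nxt in grasp(state):
        acc += p
        if r <= acc:
            return nxt
    return state

def belief_update(belief: Set[State], observation: FrozenSet[str]) -> Set[State]:
    """Partial observability: the agent keeps every successor state that is
    consistent with what its (possibly miscalibrated) sensors reported."""
    successors = {nxt for s in belief for _, nxt in grasp(s)}
    return {s for s in successors if observation <= s}

b0 = {frozenset({"(on cup table)"})}
print(sample_successor(next(iter(b0))))  # one sampled outcome
print(belief_update(b0, frozenset()))    # both outcomes remain possible
```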
3.2 Learning Techniques

This section introduces some classic as well as recently prominent learning techniques that have been successfully used in learning action models. The following subsections have not been conceptualized as learning families, but as orthogonal (and sometimes overlapping) techniques.

• Inductive learning: the learning system is confronted with a hypothesis space H and a set of training examples D. The desired output is a hypothesis h from H that is consistent with these training examples. Inductive methods generate statistically justified hypotheses [32]. The heart of the learning problem is generalizing successfully from examples. In these cases, inductive techniques that can identify patterns over many examples in the absence of a domain model come in handy. One prominent inductive learning technique is decision tree and regression tree learning. Regression trees offer the advantages of being able to predict a continuous variable and the ability to model noise in the data. A regression tree predicts a value along the dependent dimension for all environmental observations, in contrast to a decision tree, which enables a prediction along a categorical variable (i.e., class); a short sketch contrasting the two appears at the end of this overview.

• Analytic learning: the learning system is confronted with the same hypothesis space and training examples as for inductive learning. However, the learner has an additional input: background knowledge B that can explain the observed training examples. The desired output is a hypothesis h from H that is consistent with both the training examples D and the background knowledge B [32]. Analytic learning leans on the learner's background knowledge to analyze a given training instance and identify the relevant features.

More details about classical techniques which have been comprehensively used in operator learning can be found in [32]. The current article sheds light on certain interesting techniques which have more recently come to light and offer interesting possibilities with respect to the task at hand, which is that of learning operators.
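The contrast drawn above between decision trees and regression trees can be illustrated with scikit-learn; the library choice, features and labels here are our own and not prescribed by the survey.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Toy environmental observations: [distance_to_human, ambient_noise].
# Both the features and the labels are invented for illustration.
X = [[0.5, 0.1], [1.5, 0.3], [3.0, 0.8], [4.0, 0.9]]

# A decision tree predicts along a categorical variable (a class).
clf = DecisionTreeClassifier().fit(X, ["greet", "greet", "wait", "wait"])
print(clf.predict([[1.0, 0.2]]))  # -> ['greet']

# A regression tree predicts a value along a continuous dependent
# dimension, which also lets it model noise in the data.
reg = DecisionTreeRegressor().fit(X, [0.9, 0.8, 0.3, 0.1])
print(reg.predict([[1.0, 0.2]]))  # -> an interpolated engagement score
```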
3.2.1 Transfer Learning

Many machine learning methods work well only under a common assumption: the training and test data are drawn from the same feature space and the same distribution. When the distribution changes, most statistical models need to be rebuilt from scratch using newly collected training data. In many real world applications, it is expensive or impossible to re-collect the needed training data and rebuild the models; it would be desirable to reduce this effort. In such cases, knowledge transfer, or transfer learning, between task domains comes into play. Transfer learning [16] allows the domains, tasks, and distributions used in training and testing to be different. We observe many examples of transfer learning in the real world: learning to recognize apples, for instance, might help in recognizing oranges. Transfer learning aims to extract the knowledge from one or more source tasks and apply it to a target task for which fewer high-quality training data are available [16].

The advantage of transfer learning is precisely that a change of features, domains, tasks or distributions between the training and testing phases does not require the statistical model to be rebuilt. The disadvantages, however, are the following:

• Many proposed transfer learning algorithms assume that the source and target domains are related to each other in some sense. If the assumption does not hold, negative transfer may happen, which is worse than no transfer at all (for example, an American tourist learning to drive on the left side of the road in the UK for the first time). In order to avoid negative transfer, we need to first study the transferability between source and target domains. Based on suitable transferability measures, we can then select relevant source domains/tasks from which to extract knowledge for learning the target task.

• Most existing transfer learning algorithms have focused on improving generalization across different distributions between source and target domains or tasks, assuming that the feature spaces of the source and target domains are the same. However, in many applications, we may wish to transfer knowledge across domains or tasks that have different feature spaces, and to transfer from multiple such source domains. This type of transfer learning, referred to as heterogeneous transfer learning, remains a persisting challenge.

• Transfer learning has mainly been applied to small scale applications [16].

One particular implementation is an algorithm called LAWS (Learn Action models with transferring knowledge from a related source domain via Web Search) [31].
3.2.2 Reinforcement Learning

Reinforcement learning (RL) [4] is a specific case of inductive learning, and is defined more clearly by characterizing a learning problem instead of a learning technique. A general reinforcement learning problem can be seen as composed of just three elements: (1) goals an agent must achieve, (2) an observable environment, and (3) actions an agent can take to affect the environment. Through trial-and-error online visitation of states in its environment, a reinforcement learning system seeks to find an optimal policy for achieving the problem goals. The strength of reinforcement learning lies in its ability to handle stochastic environments in which the domain theory is either unknown or incomplete. With respect to the planning-learning goal dimension, reinforcement learning can be viewed as both 'improving plan quality' (the process moves toward the optimal policy) and 'learning the domain theory' (it begins without a model of the transition probability between states) [32]. However, one of the major drawbacks of RL stems from the fact that in its bid to achieve particular goals, it does not gather general knowledge of the system dynamics, leading to a problem of generalization. RL is nonetheless particularly interesting for robotics, precisely because it involves learning to achieve particular goals without requiring any general knowledge of the world dynamics: robots can learn to perform particular tasks directly, albeit with trouble generalizing to new ones [17]. One example is LOPE (Learning by Observation in Planning Environments) [8].
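As an illustration of the trial-and-error formulation above, here is a minimal tabular Q-learning sketch on a toy chain environment. The environment, hyperparameters and reward are invented for illustration; the survey does not commit to any particular RL algorithm, although LOPE (Section 4.10) does rely on a generalized Q table.

```python
import random
from collections import defaultdict

# Tabular Q-learning on a toy chain: states 0..4, goal state 4,
# actions 0 (left) and 1 (right). Reaching the goal yields reward 1.
ALPHA, GAMMA, EPSILON, GOAL = 0.5, 0.9, 0.2, 4
Q = defaultdict(float)  # Q[(state, action)], initially 0

def step(s, a):
    """Environment dynamics: move along the chain, reward at the goal."""
    s2 = max(0, min(GOAL, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0)

for episode in range(300):
    s = 0
    while s != GOAL:
        # Epsilon-greedy action choice: trial-and-error state visitation.
        a = random.choice([0, 1]) if random.random() < EPSILON \
            else max([0, 1], key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # TD update toward reward plus discounted best successor value.
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, 0)], Q[(s2, 1)]) - Q[(s, a)])
        s = s2

# The greedy policy now chooses 'right' (1) from every non-goal state.
print([max([0, 1], key=lambda act: Q[(s, act)]) for s in range(GOAL)])
```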
3.2.3 Surprise-Based Learning

In Surprise-Based Learning (SBL) [20], a surprise is produced if the latest prediction is noticeably different from the latest observation. After performing an action, the world is sensed via the perceptor module, which extracts feature information from one or more sensors. If the algorithm had made a prediction, the surprise analyzer validates it. If the prediction was incorrect, the model modifier adjusts the world model accordingly. Based on the updated model, the action selector performs the next action, so as to repeat the learning cycle (see Figure 6).

Figure 6: Overview of surprise based learning [20]

A series of approaches based on SBL have used Goal Driven Autonomy (GDA). GDA is a conceptual model for creating an autonomous agent that monitors a set of expectations during plan execution, detects when discrepancies occur, builds explanations for the cause of failures, and formulates new goals to pursue when planning failures arise. In order to identify when planning failures occur, a GDA agent requires the planning component to generate an expectation of the world state after executing each action in the execution environment. The GDA model thus provides a framework for creating agents capable of responding to unanticipated failures during plan execution in complex, dynamic environments [24]. An example is the system FOOLMETWICE [15].
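Below is a compressed rendering of the SBL cycle of Figure 6 (act, predict, observe, repair on surprise) under our own assumptions: the world model is reduced to an action-to-predicted-effect table, and the 'push' action and its predicates are hypothetical.

```python
# World model: a table mapping each action to its predicted effects.
world_model = {"push": {"(door open)"}}

def sbl_cycle(action, observed, model):
    """One pass of the SBL loop: predict, compare, repair on surprise."""
    predicted = model.get(action)
    if predicted is not None and predicted != observed:
        # Surprise: the latest prediction noticeably differs from the latest
        # observation, so the model modifier adjusts the world model.
        print(f"surprise on {action!r}: expected {predicted}, saw {observed}")
        model[action] = set(observed)
    return model

# The perceptor reports that pushing left the door stuck, not open.
world_model = sbl_cycle("push", {"(door stuck)"}, world_model)
print(world_model)  # the model now predicts the observed effect
```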
3.3 Quality of Traces

The execution traces may be classified as pure or adulterated as follows:

• Noisy: the traces can be adulterated because of sensor miscalibration or faulty annotation by a domain expert. For instance, AMAN (Action-Model Acquisition from Noisy plan traces) [28] belongs to this category.

• Ideal: there is no discrepancy between the ideal action and the recorded action. For example, the system OBSERVER [23] falls into this category.
3.4 Contents of Traces

This dimension refers to the elements (state information, action information) which constitute the traces. These can be divided into the following:

• Action sequences: the case where the executed plan traces are represented as a sequence of action executions. For example, Opmaker [13].

• State-action interleavings: the case where the executed plan traces are represented as a sequence of alternating state and action representations. For example, LAMP (Learning Action Models from Plan traces) [30]. Both representations are sketched below.
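As data structures, the two representations might look as follows; the action names and predicates are illustrative only.

```python
# An action-sequence trace records only the executed speech acts.
action_sequence_trace = ["greet", "inform", "farewell"]

# A state-action interleaving alternates (possibly partial) states with
# the actions executed between them.
state_action_trace = [
    {"(at robot room)", "(at human room)"},                        # s0
    "greet",
    {"(at robot room)", "(at human room)", "(seen robot human)"},  # s1
    "inform",
    {"(believes robot (believes human (at telephone room)))"},     # s2
]
```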
3.5 Availability of a Prior Model

Before the learning phase begins, the action model may exist in one of the following capacities:

• No model: no information on the actions that constitute the model is available at the beginning, and the entire model must be learnt from scratch. For example, OBSERVER [23].

• Partial model: some elements of the model are available to the learner at the beginning, and the model is enriched with more knowledge at the end of the learning phase. For example, RIM (Refining Incomplete planning domain Models through plan traces) [29].
3.6 Representation Language

The ideal language would be able to compactly model every action effect the agent might encounter, and no others. Choosing a good representation language provides a strong bias for any algorithm that will learn models in that language. Some languages and their features are summarized in Table 1.

Table 1: Representation Languages

• PDDL (Planning Domain Definition Language): machine-readable, standardized syntax for representing STRIPS and other languages; has types, constants, predicates and actions [14].

• STRIPS (Stanford Research Institute Problem Solver): a sublanguage of PDDL [7].

• OCL (Object Centered Language): a high-level language with its representation centered around objects instead of states [13].

• STRIPS+WS: STRIPS plus functional terms, leading to higher expressiveness [22].
4 Algorithms

Brief descriptions of some key algorithms corresponding to the aforementioned classification can be found in the following subsections. These algorithms are also summarized in Table 2.
4.1 OBSERVER

OBSERVER [23] is a system that learns operator preconditions by creating and updating both a most specific and a most general representation of the preconditions, based on operator executions while solving practice problems. It also learns operator effects by generalizing the delta-state (the difference between the post-state and the pre-state) from multiple observations.
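A toy rendering of the two generalizations OBSERVER performs: intersecting pre-states approximates the most specific precondition hypothesis, and the delta-state (post-state minus pre-state, and vice versa) yields candidate add and delete lists. The traces, and the unioning of deltas across observations, are our own simplifications of OBSERVER's actual procedure.

```python
# (pre_state, post_state) pairs observed for one action, e.g. 'greet'.
observations = [
    ({"(seen robot human)", "(at human room)", "(quiet room)"},
     {"(seen robot human)", "(at human room)", "(greeted human)"}),
    ({"(seen robot human)", "(at human room)", "(day time)"},
     {"(seen robot human)", "(at human room)", "(greeted human)", "(day time)"}),
]

# Most specific precondition hypothesis: predicates true in every pre-state.
most_specific_pre = set.intersection(*(set(pre) for pre, _ in observations))

# Effects generalized from the delta-states across observations.
add_list = set.union(*(post - pre for pre, post in observations))
del_list = set.union(*(pre - post for pre, post in observations))

print(most_specific_pre)   # {'(seen robot human)', '(at human room)'}
print(add_list, del_list)  # {'(greeted human)'} {'(quiet room)'}
```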
4.2 RIM

RIM (Refining Incomplete planning domain Models through plan traces) [29] constructs sets of soft and hard constraints which are solved using a weighted MAX-SAT solver to obtain sets of macro-operators and (refined) action models.
4.3 Opmaker

Opmaker [13] is a mixed-initiative (both the human and the machine take initiative), graphical knowledge acquisition tool for inducing parametrized, hierarchical (each object may have relations and attributes inherited from different levels [13]) operator descriptions from example action sequences and declarative domain knowledge, with a minimum of user interaction. It is implemented inside a graphical tool called GIPO (Graphical Interface for Planning with Objects), which facilitates domain knowledge capture and domain modelling [13, 10], making it well suited to novice users creating plans and learning models with minimum effort.
4.4 LAMP

LAMP (Learning Action Models from Plan traces) [30] learns action models with quantifiers and logical implications. First, the input plan traces (including observed states and actions) are encoded into propositional formulas, each a conjunction of ground literals, stored in a database as a collection of facts. Second, candidate formulas are generated according to the predicate lists and domain constraints. Third, a Markov Logic Network (MLN) uses the formulas generated in the above two steps to select the most likely subset from the set of candidate formulas. Finally, this subset is converted into the final action models.
4.5 AMAN

AMAN (Action-Model Acquisition from Noisy plan traces) [28] finds the domain model that best explains the observed noisy plan traces. First, a set of candidate domain models is built by scanning each action in the plan traces and substituting its instantiated parameters with their corresponding variables. A graphical model then captures the relationship between the current state, the correct action, the observed action and the domain model. Afterwards, the parameters of the graphical model are learnt, following which AMAN generates a set of action models according to the learnt parameters.
4.6 EXPO

The EXPO system [23, 12, 9] refines incomplete planning operators by the operator refinement method (ORM). EXPO generates plans and monitors their execution to detect differences between the state predicted according to the internal action model and the observed state. EXPO then constructs a set of specific hypotheses to fix the detected differences. After being heuristically filtered, each hypothesis is tested in turn with an experiment, and a plan is constructed to achieve the situation required to carry out the experiment.
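The monitoring step at the heart of EXPO can be sketched as a symmetric comparison of the predicted and observed states, each difference spawning a candidate repair. This is a toy illustration, not EXPO's actual hypothesis language.

```python
def detect_differences(predicted: set, observed: set):
    """Split the divergence into predicted-but-unseen and seen-but-unpredicted."""
    missing = predicted - observed  # predicted but not observed
    extra = observed - predicted    # observed but not predicted
    return missing, extra

def repair_hypotheses(action: str, missing: set, extra: set):
    """Candidate fixes, to be heuristically filtered and then tested by a
    planned experiment (as EXPO does)."""
    hyps = [f"remove '{p}' from add-list of {action}" for p in missing]
    hyps += [f"add '{p}' to add-list of {action}" for p in extra]
    return hyps

missing, extra = detect_differences({"(door open)"}, {"(door stuck)"})
print(repair_hypotheses("push", missing, extra))
```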
4.7 ARMS

The ARMS (Action-Relation Modelling System) [25] learns an action model in two phases. In phase one of the algorithm, ARMS finds frequent action sets from plans that share a common set of parameters. In addition, ARMS finds frequent relation-action pairs with the help of the initial and goal states. These relation-action pairs give an initial guess on the preconditions, add lists and delete lists of the actions in this subset. The action subsets and pairs are used to obtain a set of constraints that must hold in order to make the plans correct. The constraints extracted from the plans are then transformed into a weighted MAX-SAT representation, the solution of which produces the action models. The process iterates until all actions are modelled.
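The final step, solving a weighted MAX-SAT instance, can be illustrated with a brute-force solver over a tiny invented instance; real ARMS instances are far larger and use a dedicated solver. The variable names (encoding statements such as "predicate p is in the preconditions of inform") and the weights below are fabricated.

```python
from itertools import product

# Weighted clauses: (weight, {variable: required_truth_value}).
# A clause is a disjunction; it is satisfied if any literal matches.
clauses = [
    (10, {"inform_pre_has_seen": True}),   # strong evidence (frequent pair)
    (3,  {"inform_add_has_seen": True}),   # weaker evidence
    (8,  {"inform_pre_has_seen": True, "inform_add_has_seen": False}),
]
variables = sorted({v for _, clause in clauses for v in clause})

def model_weight(assignment):
    """Total weight of the clauses satisfied by this assignment."""
    return sum(w for w, clause in clauses
               if any(assignment[var] == val for var, val in clause.items()))

# Brute force over all 2^n assignments; pick the heaviest one.
best = max((dict(zip(variables, vals))
            for vals in product([False, True], repeat=len(variables))),
           key=model_weight)
print(best, model_weight(best))
```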
4.8 PELA

PELA (Planning, Execution and Learning Architecture) [11] performs the three functions suggested by its name. The learning component allows PELA to generate probabilistic rules about the execution of actions. PELA generates these rules from the execution of plans and compiles them to upgrade its deterministic planning model. This is done by performing multiclass classification, which consists of finding the smallest decision tree that fits a given data set, following a Top-Down Induction of Decision Trees (TDIDT) algorithm [19].
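The classification step can be sketched with scikit-learn's decision tree learner (a TDIDT-style inducer); the state features and execution outcomes below are fabricated, and PELA's actual compilation of the induced rules back into the planning model is omitted.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Execution records for one action ('greet'), as [human_visible, room_noisy].
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = ["success", "success", "failure", "failure"]  # observed outcomes

# Induce a decision tree over the outcomes, top-down.
tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=["human_visible", "room_noisy"]))
# The induced rule ('greet succeeds when the human is visible') would then
# be compiled into a probabilistic refinement of the deterministic model.
```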
4.9 LAWS

LAWS (Learn Action models with transferring knowledge from a related source domain via Web Search) [31] makes use of action models already created beforehand in other, related domains, called source domains, to help learn actions in a target domain. The target domain and a related source domain are bridged by searching Web pages related to each of them, and then building a mapping between them by means of a similarity function computed over their corresponding Web pages. The similarity is calculated using the Kullback-Leibler (KL) divergence. Based on the calculated similarity, a set of weighted constraints, called web constraints, is built. Based on any available example plan traces in the target domain, other constraints, such as state constraints, action constraints and plan constraints, are also built. All the above constraints are solved using a weighted MAX-SAT solver, and target-domain action models are generated from the solution to the constraint satisfaction problem.
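The similarity computation can be sketched as follows: estimate a smoothed word distribution from the Web text retrieved for each domain and compute the KL divergence between them. The two 'pages' are stand-ins for real search results, and add-one smoothing is our own choice to keep the divergence finite; LAWS's exact preprocessing is not specified here.

```python
import math
from collections import Counter

def kl_divergence(text_p: str, text_q: str) -> float:
    """KL(P || Q) between smoothed word distributions of two texts."""
    p_counts, q_counts = Counter(text_p.split()), Counter(text_q.split())
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + len(vocab)  # add-one smoothing
    q_total = sum(q_counts.values()) + len(vocab)
    return sum(
        ((p_counts[w] + 1) / p_total)
        * math.log(((p_counts[w] + 1) / p_total) / ((q_counts[w] + 1) / q_total))
        for w in vocab
    )

source_page = "robot greets the human and informs the human about the room"
target_page = "agent greets the user and tells the user about the telephone"
print(kl_divergence(source_page, target_page))  # lower = more similar domains
```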
4.10 LOPE

LOPE (Learning by Observation in Planning Environments) [8] learns by sharing among multi-agent systems. Learning is performed by three integrated techniques: rote learning of an experience (observation) by creating an operator directly from it; heuristic generalization of incorrectly learned operators; and a global reinforcement strategy over operators, rewarding and punishing them based on their success in predicting the behavior of the environment. Reinforcement of an operator implies punishment of similar ones, so that there is a global reinforcement of the same action. This global reinforcement is done by means of a virtual generalized Q table [8].
4.11 FOOLMETWICE

FOOLMETWICE [15] is a goal-driven autonomy (GDA) algorithm which learns from surprises. It tries to find inaccuracies in an environment model M by attempting to explain all observations received. When a consistent explanation cannot be found, it infers that some unknown event E happened that is not represented in M. After determining when unknown events occur, creating a model of their preconditions requires generalizing over the states that trigger them.

5 Discussion

Despite the bright prospects that the aforementioned approaches offer, some open issues persist, discussed as follows:

• Learning with the time dimension: time plays an imperative role in most real life domains. For example, each dialogue in an HRI is composed of an utterance accompanied by gestural, body and eye movements, all of them interleaved in a narrow time frame. These interactions may thus be represented by a time sequence, with the intent of learning the underlying action model. Barring some initial works in this area, time remains an interesting aspect to explore [27].

• Direct re-applicability of the learned model for dialogue exchange: the re-use of a learned model by a planner continues to remain a concern. A model that has been learned by applying ML techniques is more often than not incomplete, or, more concretely, inadequate to be fed to a planner to directly generate plans, or, in the case of HRI, to reproduce a multimodal dialogue which respects social rules. It needs to be retouched and fine-tuned by a domain expert in order to be reusable. This marks a stark incapability of the prominently used machine learning techniques to be complete and comprehensive, and leaves scope for much more research.

• Extension of classical planning to a full-scope domain: the applicability of the aforementioned approaches, most of which have been tested on highly simplified toy domains and not in real scenarios, remains an issue to be addressed. Classical planning refers to a world model in which predicates are propositional: they do not change unless the planning agent acts to change them, all relevant attributes can be observed at any time, and the impact of executing an action on the environment is known and deterministic, with the effects of an action occurring instantly, and so on. However, the real world is laced with unpredictability: a predicate might switch its value spontaneously, the world may have hidden variables, the exact impact of actions may be unpredictable, actions may have durations, and so on [32].
6 Conclusion
This article argues for the usage of AI planning techniques with the intent of endowing robots with socio-communicative skills, thus augmenting their acceptability. It justifies the notion of learning the underlying behavioral blueprint of the robot from a set of multimodal HRI traces. This learning can be achieved with several state-of-the-art and classical Machine Learning (ML) techniques. The article classifies various ML approaches based on several criteria and conditions, along with the merits and demerits of each approach. It then broadly highlights some persisting open issues with the discussed approaches, concluding that a significant number of prominent and interesting techniques have been applied only to highly controlled experimental setups, and that their application to a real world HRI scenario is a topic for further research.
References

[1] R. Alami, A. Clodic, V. Montreuil, E. A. Sisbot, and R. Chatila. Task planning for human-robot interaction. In Proceedings of the Joint Conference on Smart Objects and Ambient Intelligence: Innovative Context-Aware Services: Usages and Technologies, pages 81-85. ACM, 2005.

[2] B. D. Argall, S. Chernova, M. Veloso, and B. Browning. A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57(5):469-483, 2009.

[3] G. Bailly, F. Elisei, and M. Sauze. Beaming the gaze of a humanoid robot. In Human-Robot Interaction, 2015.

[4] A. G. Barto. Reinforcement learning and adaptive critic methods. Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, pages 469-491, 1992.

[5] C. Breazeal. Emotion and sociable humanoid robots. International Journal of Human-Computer Studies, 59(1):119-155, 2003.

[6] C. Breazeal. Social interactions in HRI: the robot view. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 34(2):181-186, 2004.

[7] R. E. Fikes and N. J. Nilsson. STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence, 2(3-4):189-208, 1971.

[8] R. García-Martínez and D. Borrajo. An integrated approach of learning, planning, and execution. Journal of Intelligent and Robotic Systems, 29(1):47-78, 2000.

[9] Y. Gil. Acquiring domain knowledge for planning by experimentation. Technical report, DTIC Document, 1992.

[10] R. Jilani, A. Crampton, D. E. Kitchin, and M. Vallati. Automated knowledge engineering tools in planning: state-of-the-art and future challenges. 2014.

[11] S. Jiménez, F. Fernández, and D. Borrajo. The PELA architecture: integrating planning and learning to improve execution. In National Conference on Artificial Intelligence, pages 1294-1299, 2008.

[12] S. Jiménez, F. Fernández, and D. Borrajo. Integrating planning, execution, and learning to improve plan execution. Computational Intelligence, 29(1):1-36, 2013.

[13] T. L. McCluskey, N. E. Richardson, and R. M. Simpson. An interactive method for inducing operator descriptions. In Artificial Intelligence Planning Systems, pages 121-130, 2002.

[14] D. McDermott, M. Ghallab, A. Howe, C. Knoblock, A. Ram, M. Veloso, D. Weld, and D. Wilkins. PDDL - the Planning Domain Definition Language. 1998.

[15] M. Molineaux and D. W. Aha. Learning unknown event models. Technical report, DTIC Document, 2014.

[16] S. J. Pan and Q. Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345-1359, 2010.

[17] H. Pasula, L. S. Zettlemoyer, and L. P. Kaelbling. Learning probabilistic relational planning rules. In International Conference on Automated Planning and Scheduling, pages 73-82, 2004.

[18] C. R. Perrault and J. F. Allen. A plan-based analysis of indirect speech acts. Computational Linguistics, 6(3-4):167-182, 1980.

[19] J. R. Quinlan. Induction of decision trees. Machine Learning, 1(1):81-106, 1986.

[20] N. Ranasinghe and W. Shen. Surprise-based learning for developmental robotics. In Learning and Adaptive Behaviors for Robotic Systems (LAB-RS'08), ECSIS Symposium on, pages 65-70. IEEE, 2008.

[21] N. Verstaevel, C. Régis, M.-P. Gleizes, and F. Robert. Principles and experimentations of self-organizing embedded agents allowing learning from demonstration in ambient robotic. Procedia Computer Science, 52:194-201, 2015.

[22] T. J. Walsh and M. L. Littman. Efficient learning of action schemas and web-service descriptions. In Association for the Advancement of Artificial Intelligence, pages 714-719, 2008.

[23] X. Wang. Learning Planning Operators by Observation and Practice. PhD thesis, Carnegie Mellon University, 1996.

[24] B. G. Weber, M. Mateas, and A. Jhala. Learning from demonstration for goal-driven autonomy. In Association for the Advancement of Artificial Intelligence, 2012.

[25] Q. Yang, K. Wu, and Y. Jiang. Learning action models from plan examples using weighted MAX-SAT. Artificial Intelligence, 171(2):107-143, 2007.

[26] S. Yoon and S. Kambhampati. Towards model-lite planning: A proposal for learning & planning with incomplete domain models. In ICAPS Workshop on AI Planning and Learning, 2007.

[27] Y. Zhang, S. Sreedharan, and S. Kambhampati. Capability models and their applications in planning. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, pages 1151-1159, 2015.

[28] H. H. Zhuo and S. Kambhampati. Action-model acquisition from noisy plan traces. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 2444-2450, 2013.

[29] H. H. Zhuo, T. Nguyen, and S. Kambhampati. Refining incomplete planning domain models through plan traces. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, pages 2451-2457. AAAI Press, 2013.

[30] H. H. Zhuo, Q. Yang, D. H. Hu, and L. Li. Learning complex action models with quantifiers and logical implications. Artificial Intelligence, 174(18):1540-1569, 2010.

[31] H. H. Zhuo, Q. Yang, R. Pan, and L. Li. Cross-domain action-model acquisition for planning via web search. In International Conference on Automated Planning and Scheduling, 2011.

[32] T. Zimmerman and S. Kambhampati. Learning-assisted automated planning: looking back, taking stock, going forward. AI Magazine, 24(2):73-96, 2003.