Beneficial and Harmful Explanatory Machine Learning
Lun Ai, Stephen H. Muggleton, Céline Hocquette, Mark Gromowski, Ute Schmid
Abstract
Given the recent successes of Deep Learning in AI there has been increased interest in the role and need for explanations in machine learned theories. A distinct notion in this context is Michie's definition of Ultra-Strong Machine Learning (USML). USML is demonstrated by a measurable increase in human performance of a task following provision to the human of a symbolic machine learned theory for task performance. A recent paper demonstrates the beneficial effect of a machine learned logic theory for a classification task, yet no existing work has examined the potential harmfulness of the machine's involvement in human learning. This paper investigates the explanatory effects of a machine learned theory in the context of simple two-person games and proposes a framework for identifying the harmfulness of machine explanations based on the Cognitive Science literature. The approach involves a cognitive window consisting of two quantifiable bounds and it is supported by empirical evidence collected from human trials. Our quantitative and qualitative results indicate that human learning aided by a symbolic machine learned theory which satisfies a cognitive window achieves significantly higher performance than human self-learning. Results also demonstrate that human learning aided by a symbolic machine learned theory that fails to satisfy this window leads to significantly worse performance than unaided human learning.
Lun Ai, Department of Computing, Imperial College London, London, UK. E-mail: [email protected]
Stephen H. Muggleton, Department of Computing, Imperial College London, London, UK. E-mail: [email protected]
Céline Hocquette, Department of Computing, Imperial College London, London, UK. E-mail: [email protected]
Mark Gromowski, Cognitive Systems Group, University of Bamberg, Bamberg, Germany. E-mail: [email protected]
Ute Schmid, Cognitive Systems Group, University of Bamberg, Bamberg, Germany. E-mail: [email protected]

1 Introduction
In a recent paper [34] the authors provided an operational definition for comprehensibility of logic programs and used this, in experiments with humans, to provide the first demonstration of Michie's
Ultra-Strong Machine Learning (USML). The authors demonstrated USML via empirical evidence that humans improve out-of-sample performance in concept learning from a training set E when presented with a first-order logic theory which has been machine learned from E. The improvement of human performance indicates a beneficial effect of comprehensible machine learned models on human skill acquisition. The present paper investigates the explanatory effects of the machine's involvement in human skill acquisition for simple games. Our results indicate that when a machine learned theory is used to teach strategies to humans, in some cases the human's out-of-sample performance is reduced. This degradation of human performance is taken to indicate the existence of harmful explanations. In the current paper, which extends our previous work on the phenomenon of USML, both beneficial and harmful effects of a machine learned theory are explored in the context of simple games. Our definition of explanatory effects is based on human out-of-sample performance in the presence of natural language explanations generated from a machine learned theory (Figure 1). The analogy between understanding a logic program via declarative reading and understanding a piece of natural language text allows the explanatory effects of a machine learned theory to be investigated.

Fig. 1: Textual and visual explanations are shown to treated participants along with a training example for winning a two-player game isomorphic to Noughts and Crosses. Textual explanations were generated from the rules learned by our Meta-Interpretive exPlainable game learner MIPlain.
The results of relevant Cognitive Science literature allow the properties of a logic theory which are harmful to human comprehension to be characterised. Our approach is based on developing a framework describing a cognitive window which involves bounds with regard to 1) the descriptive complexity of a theory and 2) the execution stack requirements for knowledge application. We hypothesise that a machine learned theory provides a harmful explanation to humans when theory complexity is high and execution is cognitively challenging. Our proposed cognitive window model is confirmed by empirical evidence collected from multiple experiments involving human participants of various backgrounds. We summarise our main contributions as follows:

– We define a measure to evaluate beneficial/harmful explanatory effects of a machine learned theory on human comprehension.
– We develop a framework to assess a cognitive window of a machine learned theory. The approach encompasses theory complexity and the required execution stack.
– Our quantitative and qualitative analyses of the experimental results demonstrate that a machine learned theory has a harmful effect on human comprehension when its search space is too large for human knowledge acquisition and it fails to incorporate executional shortcuts.

This paper is arranged as follows. In Section 2, we discuss existing work relevant to the paper. The theoretical framework with relevant definitions is presented in Section 3. We describe our experimental framework and the experimental hypotheses in Section 4. Section 5 describes several experiments involving human participants on two simple games. We examine the impact of a cognitive window on the explanatory effects of a machine learned theory based on human performance and verbal input. In Section 6, we conclude our work and comment on our analytical results – only a short and simple-to-execute theory can have a beneficial effect on human comprehension. We discuss potential extensions to the current framework, curriculum learning and behavioural cloning, for enhancing the explanatory effects of a machine learned theory.
2 Related work

This section summarises related research on game learning and familiarises the reader with the core motivations for our work. We first present a short overview of related investigations in explanatory machine learning of games. Subsequently, we cover various approaches for teaching and learning between humans and machines.

2.1 Explanatory machine learning of games

Early approaches to learning game strategies [47,41] used the decision tree learner ID3 to classify minimax depth-of-win for positions in chess end games. These approaches used carefully selected board attributes as features. However, chess experts had difficulty understanding the learned decision tree due to its high complexity [26]. Methods for simplifying decision trees without compromising their accuracy have been investigated [42] on the basis that simpler models are more comprehensible to humans. An early Inductive Logic Programming (ILP) [35] approach learned optimal chess endgame strategies at depth 0 or 1 [5]. An informal complexity constraint was applied which limits the number of clauses used in any predicate definition to 7 ± 2 [29]. MENACE (Matchbox Educable Noughts And Crosses Engine) [25] was specifically designed to learn an optimal agent policy for Noughts and Crosses. Later, Q-Learning [54] and Deep Reinforcement Learning emerged and have led to a variety of applications including the Atari 2600 games [33] and the game of Go [50]. While these systems defeated the strongest human players, they are not human-like since they lack the ability to explain the encoded knowledge to humans. Recent approaches such as [55] have aimed to explain the policies learned by these models, but the learned strategy is
implicitly encoded in the continuous parameters of the policy function, which makes their operation opaque to humans. Relational Reinforcement Learning [14] and Deep Relational Reinforcement Learning [56] have attempted to address these drawbacks by incorporating relational biases to ensure human understandability. In [30,31], the author provided a survey of the most relevant work in explainable AI and argued that explanatory functionalities were mostly subjective to the developer's view. While there is a general lack of demonstrations of explanatory effect, which should be examined by empirical trials, no existing framework accounts for the explanatory harmfulness of machine learned models.

2.2 Two-way learning between human and machine

As an emerging sub-field of AI, Machine Teaching [16] provides an algorithmic model for quantifying the teaching effort and a framework for identifying an optimized teaching set of examples to allow maximum learning efficiency for the learner. The learner is usually a machine learning model of a human in a hypothesised setting. In education, machine teaching has been applied to devise intelligent tutoring systems to select examples for teaching [59,43]. On the other hand, rule-based logic theories are important mechanisms of explanation. Rule-based knowledge representations are generalised means of concept encoding and have a structure analogous to human conception. Mechanisms of logical reasoning, induction and abduction, have long been shown to be highly related to human concept attainment and information processing [23,19]. Additionally, humans' ability to apply recursion plays a key role in the understanding of relational concepts and the semantics of language [17], which are important for communication. The process of reconstructing implicit target knowledge which is easy to operate but difficult to describe via machine learning has been explored under the topic of Behavioural Cloning. The cloning of human operation sequences has been applied in various domains such as piloting [28] and crane operation [53]. The cloned human knowledge and experience are more dependable and less error-prone due to perceptual and executional inconsistency being averaged across the original behavioural traces. To our knowledge, no existing work has attempted to estimate human errors and target these mistakes in interactive teaching sessions for achieving a measurable "clean up" effect [27] from machine explanations.

3 Theoretical framework

3.1 Meta-interpretive learning of game strategies

In the Meta-Interpretive Learning (MIL) setting [37,38], given a tuple (B, M, E+, E−) where the background knowledge B is a first-order logic program, meta-rules M are second-order clauses, and positive examples E+ and negative examples E− are ground atoms, a MIL algorithm returns a logic program hypothesis H such that M ∪ H ∪ B ⊨ E+ and M ∪ H ∪ B ⊭ E−. The meta-rules (for examples see Figure 3) contain existentially quantified second-order variables and universally quantified first-order variables. They clarify the declarative bias employed for substitutions of second-order Skolem constants. The resulting first-order theories are thus strictly logical generalisations of the meta-rules.

Table 1: A set of win rules is learned by
MIGO. MIGO's background knowledge contains a general move generator move/2 and a won classifier won/1 to encode the minimum rules of the game. The program is dyadic and win_1/2 can be simplified to win_1(A,B) :- move(A,B), won(B) by removing literals after unfolding.

Depth 1:
win_1(A,B) :- win_1_1_1(A,B), won(B).
win_1_1_1(A,B) :- move(A,B), won(B).

Depth 2:
win_2(A,B) :- win_2_1_1(A,B), not(win_2_1_1(B,C)).
win_2_1_1(A,B) :- move(A,B), not(win_1(B,C)).

Depth 3:
win_3(A,B) :- win_3_1_1(A,B), not(win_3_1_1(B,C)).
win_3_1_1(A,B) :- win_2_1_1(A,B), not(win_2(B,C)).

Table 2: The logic program learned by
MIPlain represents a strategy for the first player to win at different depths of the game. The predicate win_3_4/1 can be simplified to win_3_4(A) :- win_2(A,B) by removing literals after unfolding.
Depth 1:
win_1(A,B) :- move(A,B), won(B).

Depth 2:
win_2(A,B) :- move(A,B), win_2_1(B).
win_2_1(A) :- number_of_pairs(A,x,2), number_of_pairs(A,o,0).

Depth 3:
win_3(A,B) :- move(A,B), win_3_1(B).
win_3_1(A) :- number_of_pairs(A,x,1), win_3_2(A).
win_3_2(A) :- move(A,B), win_3_3(B).
win_3_3(A) :- number_of_pairs(A,x,0), win_3_4(A).
win_3_4(A) :- win_2(A,B), win_2_1(B).
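To make the declarative reading of Table 2 concrete, the following usage sketch traces the depth-2 rule on a hypothetical position. The board (in a row-major list representation with e for empty cells, as in Example 1 below) and the implementations of the move/2 and number_of_pairs/3 primitives are our own assumptions for illustration:

% Position: x on cells 1 and 5, o on cells 2 and 9 (row-major list).
% x cannot win in one move (line 1-5-9 is blocked by o), but moving to
% cell 4 completes two pairs (lines 1-4-7 and 4-5-6) while o has none,
% so o can block only one of them and x wins on the following turn.
?- win_2([x,o,e, e,x,e, e,e,o], B).
%  e.g. B = [x,o,e, x,x,e, e,e,o]   (x at cell 7 also works)
% via  win_2(A,B) :- move(A,B), win_2_1(B).
%      win_2_1(B) :- number_of_pairs(B,x,2), number_of_pairs(B,o,0).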
The MIL game learning framework MIGO [36] is a purely symbolic system based on the adapted Prolog meta-interpreter Metagol [12].
MIGO learns exclusively from positive examples by playing against the optimal opponent. MIGO is provided with a set of three relational primitives, move/2, won/1 and drawn/1, which are a move generator, a won classifier and a drawn classifier respectively. These primitives represent the minimal information a human would expect to know before playing a two-person game. For Noughts and Crosses and Hexapawn,
MIGO learns a rule-like symbolic game strategy (Table 1) that supports human understanding and was demonstrated to converge using less training data compared to Deep and classical Q-Learning. For successive values of k, MIGO learns a series of inter-related definitions for predicates win_k/2. MIPlain, a variant of MIGO, focuses on learning the task of winning for the game of Noughts and Crosses. In addition to learning from positive examples,
MIPlain identifies moves which are negative examples for the task of winning. When a game is drawn or lost for the learner, the corresponding path in the game tree is saved for later backtracking following the most updated strategy.
MIPlain performs a selection of hypotheses based on the efficiency of the hypothesised programs using
Metaopt [13]. An additional primitive number_of_pairs/3 is provided to MIPlain, which depicts the number of pairs for a player (x or o) on a given board. A pair is the alignment of two marks of one player, the third square of this line being empty. An example of pairs is shown in Figure 2. This additional primitive serves as an executional shortcut that reduces the depth of the search when executing the learned strategy. (MIPlain's source is available at https://github.com/LAi1997/MIPlain.)
Fig. 2: O has two pairs, represented in green, and X has no pairs.
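Since number_of_pairs/3 is central in what follows, we give a minimal sketch of how such a primitive could be implemented. This is our own illustrative reconstruction for a row-major board representation, not MIPlain's actual code:

:- use_module(library(lists)).   % nth1/3, member/2 (SWI-Prolog)

% The eight winning lines of the 3x3 board (row-major cell indices).
line([1,2,3]). line([4,5,6]). line([7,8,9]).   % rows
line([1,4,7]). line([2,5,8]). line([3,6,9]).   % columns
line([1,5,9]). line([3,5,7]).                  % diagonals

% A pair: exactly two marks of Player on a line whose third cell is empty.
pair(Board, Player, Line) :-
    line(Line),
    findall(C, (member(C, Line), nth1(C, Board, Player)), [_,_]),
    member(C, Line), nth1(C, Board, e).

% number_of_pairs(+Board, +Player, -N): count the player's pairs.
number_of_pairs(Board, Player, N) :-
    findall(L, pair(Board, Player, L), Ls),
    length(Ls, N).

% ?- number_of_pairs([e,x,o, e,e,x, o,e,o], o, N).   % the board of Figure 2
% N = 2.   On the same board x has no pairs, matching the caption.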
Meta-rules:
P(A,B) ← Q(A,B), R(B).
P(A) ← Q(A,B), R(B).
P(A) ← Q(A,S,T), R(A).
P(A) ← Q(A,S,T), R(A,U,V).

Fig. 3: Letters P, Q, R, S, T, U, V denote existentially quantified second-order variables and A, B, C are universally quantified first-order variables.
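For instance, the first (dyadic postcon) meta-rule in Figure 3 yields the depth-1 rule of Table 2 under the second-order substitution P/win_1, Q/move, R/won:

% P(A,B) <- Q(A,B), R(B)   with   P = win_1, Q = move, R = won:
win_1(A,B) :- move(A,B), won(B).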
Furthermore, MIPlain is given the meta-rules described in Figure 3, which are two variants of the postcon meta-rule with monadic or dyadic head, and two variants of the conjunction meta-rule with currying in either the first or both body literals. Currying allows the learning of programs with higher-arity predicates where existentially quantified argument variables are bound to constants. The learned strategy presented in Table 2 describes, in a rule-like manner, conditions that the player's optimal move has to satisfy.

3.2 Explanatory effectiveness of a machine learned theory

We extend the machine-aided human comprehension of examples in [34], where C(D,H,E) denotes the unaided human comprehension of examples, D is a target definition, H is a group of humans and E is a set of examples. Based on the analogy between declarative understanding of a logic program and understanding of a natural language explanation, we describe measures for estimating the degree to which the output of a symbolic machine learning algorithm, as an explanation, can aid human comprehension.

Definition 1 (Machine-explained human comprehension of examples, Cex(D,H,M(E))): Given a definition D, a group of humans H, a theory M(E) learned using machine learning algorithm M and examples E, the machine-explained human comprehension of examples E is the mean accuracy with which a human h ∈ H, after brief study of an explanation based on M(E), can classify new material selected from the domain of D.

Definition 2 (Explanatory effect of a machine learned theory, Eex(D,H,M(E))): Given a definition D, a group of humans H and a symbolic machine learning algorithm M, the explanatory effect of the theory M(E) learned from examples E is

Eex(D,H,M(E)) = Cex(D,H,M(E)) − C(D,H,E)

Definition 3 (Beneficial/harmful effect of a machine learned theory): Given a definition D, a group of humans H and a symbolic machine learning algorithm M:
– M(E) learned from examples E is beneficial to H if Eex(D,H,M(E)) > 0
– M(E) learned from examples E is harmful to H if Eex(D,H,M(E)) < 0
– Otherwise, M(E) learned from examples E does not have an observable effect on H

In the scope of this work, we relate the explanatory effectiveness of a theory to performance, which means that a harmful explanation provided by the machine degrades comprehension of the task and therefore reduces performance.
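As a purely numerical illustration of Definitions 2 and 3 (the accuracies are invented for exposition, not experimental values): if a treated group classifies new material with mean accuracy 0.8 while the unaided group attains 0.6, then

$$E_{ex}(D,H,M(E)) = C_{ex}(D,H,M(E)) - C(D,H,E) = 0.8 - 0.6 = 0.2 > 0,$$

so M(E) is beneficial to H; a negative difference would mark it as harmful.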
Definition 4 (Cognitive bound on the hypothesis space size, B(P,H)): Consider a symbolic machine learned datalog program P using p predicate symbols and m meta-rules each having at most j body literals. For a group of humans H, B(P,H) is a bound on the size of the hypothesis space such that at most n clauses in P can be comprehended by H, and

B(P,H) = m^n p^((1+j)n)
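For illustration, instantiating the bound with the values discussed later in Section 5.4 (m = 4 meta-rules, at most j = 2 body literals, n = 4 comprehensible clauses), and assuming p = 3 predicate symbols (the predicate count is our assumption), gives

$$B(P,H) = m^{n}\, p^{(1+j)n} = 4^{4} \cdot 3^{(1+2)\cdot 4} = 256 \cdot 531441 \approx 1.4 \times 10^{8}.$$

By Definition 10 below, a program class whose hypothesis space exceeds such a bound is predicted to have a harmful explanatory effect.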
When learned knowledge is cognitively challenging, execution overflows human working memory and the instruction stack. We then expect decision making to be more error-prone and the task performance of human learners to be less dependable. To account for the cognitive complexity of applying a machine learned theory, we define the cognitive resource of a logic term and atom.
Definition 5 (Cognitive cost of a logic term and atom, C(T)): Given a logic term or atom T, the cost C(T) can be computed as follows:
– C(⊤) = C(⊥) = 1
– A variable V has cost C(V) = 1
– A constant c has cost C(c), which is the number of digits and characters in c
– A list [T1, T2, ...], as a data structure used by MIGO and MIPlain, has cost C([T1, T2, ...]) = C(T1) + C(T2) + ...
– An atom Q(T1, T2, ...) has cost C(Q(T1, T2, ...)) = 1 + C(T1) + C(T2) + ...

Example 1
The Noughts and Crosses position in Figure 2 is represented by an array [e,x,o,e,e,x,o,e,o], where e is an empty field and o and x are marks on the board. It has cognitive cost C([e,x,o,e,e,x,o,e,o]) = 9. Note that we compute cognitive costs of programs without redundancy, since repeated literals in programs learned by MIGO and MIPlain were removed after unfolding when generating the explanations presented to human participants. Also, a game position can be represented by different data types. We ignore costs due to implementation and only count digits and marks.
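Definition 5 is mechanical enough to compute; the following SWI-Prolog sketch (our own illustrative helper, not part of MIGO or MIPlain) reproduces the cost of 9 for the board of Example 1:

% cost(+T, -C): cognitive cost of a term or atom, following Definition 5.
% Here the truth values ⊤ and ⊥ are represented as true and fail.
cost(T, 1) :- var(T), !.                         % variables cost 1
cost(true, 1) :- !.                              % stands for ⊤
cost(fail, 1) :- !.                              % stands for ⊥
cost([], 0) :- !.
cost([H|T], C) :- !,                             % lists: sum of element costs
    cost(H, CH), cost(T, CT), C is CH + CT.
cost(A, C) :- atom(A), !, atom_length(A, C).     % constants: character count
cost(N, C) :- number(N), !,
    number_codes(N, Cs), length(Cs, C).          % numbers: digit count
cost(Q, C) :- compound(Q), Q =.. [_|Args],       % atoms: 1 + argument costs
    foldl([T,A0,A]>>(cost(T,CT), A is A0 + CT), Args, 1, C).

% ?- cost([e,x,o,e,e,x,o,e,o], C).   % Example 1
% C = 9.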
Example 2
An atom win_2([e,x,o,e,e,x,o,e,o], X) with variable X has a cognitive cost C(win_2([e,x,o,e,e,x,o,e,o], X)) = 11.

We model the inferential process of evaluating training and testing examples by the run-time execution stack of a datalog program. The resolution of a query represents a mental application of a piece of knowledge given a training or testing example. In this work, we neglect the cost of computing the sub-goals of a primitive and compute its cost as if it were a normal predicate, for simplicity.

Example 3
A primitive atom move(S1, S2) with variables S1 and S2 has cognitive cost C(move(S1, S2)) = 1 + C(S1) + C(S2) = 3.

Definition 6 (Execution stack of a datalog program, S(P,q)): Given a query q, the execution stack S(P,q) of a datalog program P is a set of atoms or terms evaluated during the execution of P to compute q. Each exit point of the execution is replaced with the value ⊤, and each backtrack point has the value ⊥.

Definition 7 (Cognitive cost of a datalog program,
Cog(P,q)): Given a query q, and letting St represent S(P,q), the cognitive cost of a datalog program P is

Cog(P,q) = min_St Σ_{t ∈ St} C(t)

Example 4
The primitive move/2 is queried with move(s1, B). The execution stack contains move(s1, B), move(s1, s2) and the exit point ⊤, so Cog(P, move(s1, B)) is 10.

S(move(A,B), move(s1,B)) | C(t)
move(s1, B)              | 4
move(s1, s2)             | 5
⊤                        | 1
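One way to obtain such an execution stack in practice is a vanilla Prolog meta-interpreter that records each evaluated atom. The sketch below is our own approximation of Definition 6 (ignoring backtrack points), and pairs with the cost/2 helper above to estimate Cog(P, q); the program's clauses are assumed to be declared dynamic so that clause/2 can inspect them:

% solve(+Goal, -Stack): prove Goal, recording each call as posed (with
% its variables) and as resolved; exit points are recorded as true (⊤).
solve(true, [true]) :- !.
solve((A,B), St) :- !,
    solve(A, SA), solve(B, SB), append(SA, SB, St).
solve(G, [G0,G|St]) :-
    copy_term(G, G0),          % snapshot of the call before unification
    clause(G, Body),
    solve(Body, St).

% cog(+Goal, -Cost): sum the cognitive costs of the recorded stack.
cog(G, Cost) :-
    solve(G, St),
    foldl([T,A0,A]>>(cost(T,CT), A is A0 + CT), St, 0, Cost).

On the query of Example 4 this yields the stack move(s1,B), move(s1,s2), ⊤ and a cost of 10, matching the table above.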
The maintenance cost of task goals in working memory affects the performance of problem solving [10]. Background knowledge provides key mappings from solutions obtained in other domains or past experience [4,40] and grants shortcuts for the construction of the current solution process. We expect that when knowledge that provides executional shortcuts is comprehended, the efficiency of human problem solving could be improved due to a lower demand for cognitive resources. Conversely, in the absence of informative knowledge, performance would be limited by human operational error and would not be better than solving the problem directly. To account for the latter case, we define the cognitive cost of a problem solution that involves the minimum amount of information from the task.
Definition 8 (Minimum primitive solution program, M̄_φ1(E)): Given a set of primitives φ and examples E, a datalog program learned from examples E using a symbolic machine learning algorithm M̄ and a set of primitives φ1 ⊆ φ is a minimum primitive solution program M̄_φ1(E) if and only if for all sets of primitives φ2 ⊆ φ where |φ2| < |φ1| and for all symbolic machine learning algorithms M using φ2, there exists no machine learned program M(E) that is consistent with examples E.

Given a machine learning algorithm M using primitives φ and examples E, a minimum primitive solution program M̄_φ1(E) is learned by using the smallest subset φ1 of φ such that M̄_φ1(E) is consistent with E. A minimum primitive solution program is defined to not use more auxiliary knowledge than necessary, but does not necessarily have the minimum cognitive cost over all programs learned with examples E.

Remark 1
Given that the training examples of Noughts and Crosses are winnable and
MIPlain uses the set of primitives φ = {move/2, won/1, number_of_pairs/3}, a minimum primitive solution program is produced by MIGO. This is because MIGO uses the primitives {move/2, won/1}, which form a strict subset of φ sufficient for making a move and deciding a win when the input is winnable, and neither move/2 nor won/1 can be dropped, since no consistent program can be learned from a subset of φ with cardinality one.

Definition 9 (Cognitive cost of a problem solution,
CogP(E, φ, q)): Given examples E, a primitive set φ and a query q, the cognitive cost of a problem solution is

CogP(E, φ, q) = min_M̄ Cog(M̄_φ1(E), q)

where M̄_φ1(E) is a minimum primitive solution program.

Remark 2
The program P learned by MIPlain has a lower cognitive cost than the one learned by MIGO, except for queries concerning win_1/2. Given sufficient examples E and the primitive set used by MIPlain, φ = {move/2, won/1, number_of_pairs/3}, based on Definitions 6 to 9 we have Cog(P, x1) = CogP(E, φ, x1), Cog(P, x2) < CogP(E, φ, x2) and Cog(P, x3) < CogP(E, φ, x3), where xi = win_i(si, V) in which si represents a position winnable in i moves and V is a variable.

We give a definition of a human cognitive window based on theory complexity during knowledge acquisition and theory execution cost during knowledge application. A machine learned theory has 1) a harmful explanatory effect when its hypothesis space size exceeds the cognitive bound and 2) no beneficial explanatory effect if its cognitive cost is not sufficiently lower than the cognitive cost of the problem solution.
Table 3: Criteria for evaluating verbal responses, with examples for category win_2/2.

Q(r)    | Criteria                                                                 | Exemplary r
Level 0 | r does not fit into any of the categories below                          | "Follow the instructions."
Level 1 | One or more primitives in the machine learned theory, directly or by synonyms, are described correctly in r | "This move gives me a pair."
Level 2 | All primitives in the machine learned theory, directly or by synonyms, are described correctly in r | "I should have picked this move to prevent the opponent and get two attacks."
Level 3 | r is unambiguous and follows a matching executional order as the machine learned theory | "This move gives me two attacks and prevents the opponent from getting a pair."
Level 4 | r explains one or more primitives in the machine learned theory in correct causal relations | "This is a good move because by making two pairs and blocking the opponent, the opponent cannot win in one turn and can only block one of my pairs."

Definition 10 (Cognitive window of a machine learned theory): Given a definition D, a symbolic machine learning algorithm M and examples E, M(E) is a machine learned theory using the primitive set φ and belongs to a program class with hypothesis space S. For a group of humans H, Eex satisfies
– Eex(D,H,M(E)) < 0 if |S| > B(M(E), H), and
– Eex(D,H,M(E)) ≤ 0 if Cog(M(E), x) ≥ CogP(E, φ, x) for queries x that h ∈ H have to perform after study.

In the following section, we describe an experimental framework for assessing the impact of a cognitive window on the explanatory effects of a machine learned theory.

4 Experimental framework

Our experimental framework involves 1) a set of criteria for evaluating the participants' learning quality from their own verbal descriptions of learned strategies and 2) an outline of experimental hypotheses. For game playing, we assume humans are able to explain actions by verbalising procedural rules of strategy. We expect verbal responses to provide insights about human decision making and knowledge acquisition. The quality of verbal responses can be affected by multiple factors such as motivation, familiarity with the introduced concepts and understanding of the game rules. We take these factors into account in the evaluation criteria.
Definition 11 (Primitive coverage of a verbal response): A verbal response correctly describes a primitive if the semantic meaning of the primitive is unambiguously stated in the response. The primitive coverage is the number of primitives in a symbolic machine learned theory that are described correctly in a verbal response.
Definition 12 (Quality of a verbal response, Q(r)): A verbal response r is checked against the specifications from Table 3 in increasing order from criteria level 1 to level 4. Q(r) is the highest level i that r can satisfy. When a response does not satisfy any of the higher levels, the quality of this response is the lowest level, 0.

To illustrate, we consider the predicate win_2/2 learned by MIPlain (Table 2). Its primitive predicates are move/2 and number_of_pairs/3. We present in Table 3 a number of examples of verbal responses. A high quality response reflects high motivation and a good understanding of game concepts and strategy. On the other hand, a poor quality response demonstrates a lack of motivation or poor understanding.
Definition 13 (High (HQ) / low (LQ) quality verbal response): A HQ response r_h has Q(r_h) ≥ 3. A LQ response r_l has Q(r_l) < 3.

Let M denote a symbolic machine learning algorithm. E stands for examples, D is a target definition, and H is a group of participants sampled from a human population. M(E) denotes a machine learned theory which belongs to a definite clause program class with hypothesis space S. First, we are interested in demonstrating whether 1) the verbal response quality of learned knowledge reflects comprehension, 2) there exist cognitive bounds for humans to provide verbal responses of higher quality and 3) the machine learned theory helps improve the quality of verbal responses.

H1: Unaided human comprehension C(D,H,E) and machine-explained human comprehension Cex(D,H,M(E)) manifest in verbal response quality Q(r). We examine if high post-test accuracy correlates with high response quality and high primitive coverage in each question category.

H2: Difficulty for human participants to provide verbal responses increases with quality Q(r). We examine if the proportion of verbal responses reduces with respect to high response quality and high primitive coverage in each question category.

H3: Machine learned theory M(E) improves verbal response quality Q(r). We examine if machine-aided learning results in more HQ responses.

The impact of a cognitive window on explanatory effects is tested via the following hypotheses. φ is a set of primitives introduced to H. Let x denote the set of questions that human h ∈ H answers after learning.

H4: Learning a complex theory (|S| > B(M(E), H)) exceeding the cognitive bound leads to a harmful explanatory effect (Eex(D,H,M(E)) < 0). We examine if the post-test accuracy, after studying a machine learned theory that participants cannot recall fully, is worse than the accuracy following self-learning.

H5: Applying a theory without a low cognitive cost (Cog(M(E), x) ≥ CogP(E, φ, x)) does not lead to a beneficial explanatory effect (Eex(D,H,M(E)) ≤ 0). We examine if the post-test accuracy, after studying a machine learned theory that is cognitively costly, is equal to or worse than the accuracy following self-learning.

5 Experiments

This section introduces the materials and experimental procedure which we designed to examine the explanatory effects of a machine learned theory on human learners.
Afterwards, we describe the experiment interface and present the experimental results.

5.1 Materials

We assume that Noughts and Crosses is a widely known game with which many participants of the experiments are familiar. This might result in many participants already playing optimally before receiving explanations, leaving no room for potential performance increase.
Table 4: Summary of experiment parts. Participants played one mock game against a random computer player for the more difficult Island Game. After selecting a move in training, and regardless of its correctness, participants received the labels of the two moves presented; treated participants additionally received explanations generated from MIPlain's learned program. We introduced the primitive set used by MIPlain.

Part           | Participant's assignment                                                            | No. | Question format
Intro          | Understand rules to move and win                                                    | 1   | practice
Pre-test       | Choose the optimal move                                                             | 15  | five canonical positions each for win_1, win_2 & win_3
Training       | Understand the concept of pairs; choose the optimal move and reflect on the choice  | 9   | two choices each for win_1, win_2 & win_3; presentation of the labels
Post-test      | Choose the optimal move                                                             | 15  | five canonical positions each for win_1, win_2 & win_3; rotated and flipped from pre-test questions
Open questions | Describe the strategy of a previously made move                                     | 6   | questions requiring a verbal response
Survey         | Provide gender, age group & education level                                         | 3   | multiple choice

In order to address this issue, the
Island Game was designed as a problem isomorphic to Noughts and Crosses. Simon and Hayes [51] define isomorphic problems as "problems whose solutions and moves can be placed in one-to-one relation with the solutions and moves of the given problem". This changes the superficial presentation of a problem without modifying the underlying structure. Several findings imply that this does not impede solving the problem via analogical inference if the original problem is consciously recognised as an analogy; on the other hand, the prior step of initially identifying a helpful analogy via analogical access is highly influenced by superficial similarity [15,20,44]. Given that the Island Game presents a major re-design of the game surface, we expect that participants will be less likely to recall prior experience of Noughts and Crosses that would facilitate problem solving, leading to less optimal play initially and more potential for performance increase.

The Island Game (Figure 4) contains three islands, each with three territories on which one or more resources are marked. The winning condition is met when a player controls either all territories on one island or three instances of the same resource. The nine territories resemble the nine fields in Noughts and Crosses and the structure of the original game is maintained with regard to players' turns, possible moves, board states and win conditions. This isomorphism masks a number of spatial relations that represent the membership of a field in a win condition. In this way, the fields can be rearranged in an arbitrary order without changing the structure of the game.
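The isomorphism can be made explicit as a bijection between territories and board fields. The following sketch is purely hypothetical (the concrete territory-to-field assignment used in the experiments is not reproduced here), but shows how Island Game positions transfer one-to-one to Noughts and Crosses positions:

% Hypothetical bijection: territory identifiers t1..t9 onto fields 1..9.
% Any permutation works, since win lines are defined over territories.
territory_field(t1, 1). territory_field(t2, 2). territory_field(t3, 3).
territory_field(t4, 4). territory_field(t5, 5). territory_field(t6, 6).
territory_field(t7, 7). territory_field(t8, 8). territory_field(t9, 9).

% Translate a list of occupied territories into board cell indices,
% so that positions and moves map between the two games.
island_to_oxo(Territories, Fields) :-
    maplist(territory_field, Territories, Fields).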
5.2 Methods and design

Fig. 4: Example of a pre- and post-test question for the Island Game. A board is presented to the participant, who has to select the move that he or she thinks is optimal.

We use two experiment interfaces, one for Noughts and Crosses and another for the Island Game. For both, we adopt a two-group pre-test post-test design (Table 4). In the pre-test, the performance of participants in both the self-learning and machine-aided learning groups is measured in an identical way. During training, we introduce to participants the concept of pairs and they are able to see correct answers for some game positions. In the post-test, the performance of both self-learning and machine-aided groups is evaluated in exactly the same way as in the pre-test. This experiment setting allows us to evaluate the degree of change in performance as the result of explanations. Each question in the pre- and post-test is the presentation of a board for which it is the participant's turn to play. They are asked to select what they consider to be the optimal move. A question category of win_i denotes a game position winnable in i moves by the human player. An exemplary question is shown in Figure 4. The post-test questions are rotated and flipped from the pre-test questions. In each test, only 15 questions are given to limit the experiment duration to one hour. The response time of participants was recorded for each pre-test and post-test question.

The treatment was applied to the machine-aided group. In the interest of experimentation, during treatment we present both visual and textual explanations to avoid unnecessary effort by participants to associate textual explanations with game positions and concepts. This is based on the consideration that direct association between textual explanations and game states can be difficult for participants who are not familiar with the designed game domain. Learned first-order theories have been translated with manual adjustments based on the primitives provided to all participants and to MIPlain. An exemplary explanation is shown in Figure 1. Both visual and textual explanations preserve the structure of the hypotheses to account for the reasons that make one move right and the other move wrong. Conversely, during training, the self-learning group was presented with similar game positions without the corresponding visual and textual explanations. For the Island Game experiments, we recorded an English description of the strategy participants used for each of the selected post-test questions. Participants are presented with previously submitted answers, one at a time, along with a text input box for written answers. Moves for these open questions are selected from the post-test with a preference order from wrong and hesitant moves to consistently correct moves. We associate hesitant answers with higher response times. A total of six questions are selected based on individual performance during the post-test.

5.3 Experiment results

We conducted three experiments using the interface with Noughts and Crosses questions and explanations. These experiments were carried out on three samples: an undergraduate student group from Imperial College London, a junior student group from a German middle school and a mixed background group from Amazon Mechanical Turk (AMT). No consistent explanatory effects could be observed for any of the mentioned samples. The problem solving strategy that humans apply can be affected by factors such as task familiarity, problem difficulty, and motivation.
For instance, [45] suggested that a rather superficial analogical transfer of a strategy is applied when a problem is too difficult or when there is no reason to gain a more general understanding of a problem. Given that the majority of subjects achieved reasonable initial performance, we ascribe these results to experience with the game and the complexity of explanations. The game familiarity of the adult groups left less potential for performance improvement. Early middle school students had limited attention and were overwhelmed by the information intake. We therefore focused on specially designed experiment materials in the following experiments. (Raw data are available upon request from the authors. AMT is an online crowdsourcing platform which we used to recruit experiment samples.)

Fig. 5: Number of correct answers in pre- and post-test with respect to question categories. (a) Mixed background self learning and machine-aided learning. (b) Student self learning and machine-aided learning.
A sample from Amazon Mechanical Turk and a student sample from the University of Bamberg participated in experiments that used the interface with Island Game questions and explanations. To test hypotheses H1 to H5, we employed a quantitative analysis of test performance and a qualitative analysis of verbal responses. A sub-sample with a mediocre initial performance, within one standard deviation of the mean, was selected for the performance analysis. This aims to discount the ceiling effect (initial performance too high) and outliers (e.g. struggling to use the interface).

From the AMT sample, we had 90 participants who were 18 to above 65 years old. A sub-sample of 58 participants with a mediocre initial performance was randomly partitioned into two groups, MS (Mixed background Self learning, n = 29) and MM (Mixed background Machine-aided learning, n = 29). A different sub-sample of 30 participants completed open questions and was randomly split into two groups, MSR (Mixed background Self learning and strategy Recall, n = 15) and MMR (Mixed background Machine-aided learning and strategy Recall, n = 15). As shown in Figure 5a, in category win_2, MM had a better post-test comprehension (p = 0.028) than MS, while MM and MS had similar pre-test performance (p > 0.1) in this category. Results in category win_2 indicate that explanations have a beneficial effect on MM. However, MM did not have a better comprehension of win_1 than MS given the same initial performance (p > 0.1). In addition, MM had the same initial performance as MS on win_3 (p > 0.1) but MM's performance reduced after receiving explanations for win_3 (p = 0.005).

From a group of students enrolled in a Cognitive Systems course at the University of Bamberg, we had 13 participants who were 18 to 24 years old and a few outliers between 25 and 54 years. All participants were asked to complete open questions and were randomly split into two groups, SSR (Student Self learning and strategy Recall, n = 4) and SMR (Student Machine-aided learning and strategy Recall, n = 9). A sub-sample of 9 with a mediocre initial performance was randomly divided into SS (Student Self learning, n = 2) and SM (Student Machine-aided learning, n = 7). The imbalance in the student sample was caused by a number of participants leaving during the experiment. The machine-aided learning results show large performance variances in the post-test, as evidence for insignificant levels of performance degradation.

In Table 5, we identified that participants who were able to provide high quality responses for their test answers scored higher on these questions. This is not the case for win_3, however, due to the high difficulty of providing a good description of strategy for the win_3 category. Additionally, in the win_2 category, both machine-aided groups (MMR: 2/(2+35),
SMR: 9/(9+14)) have greater proportions of high quality responses than the self-learning groups (MSR: 1/(1+32), SSR: 1/(1+8)). Also, we observed a pattern in which there are fewer HQ responses than LQ responses in the win_1 and win_2 categories. This pattern is more pronounced in the win_2 category. Figure 6 illustrates the difficulty of providing a good quality verbal response for the non-trivial category win_3. Since win_1 contains only two predicates, we examined the primitive coverage of the non-trivial categories win_2 and win_3. However, for clarity of presentation, we only show category win_3, which has more remarkable trends. When counting primitives based on Definition 11, we only consider the constraint number_of_pairs/3 and not move/2. In Figure 6a, we observed a monotonically increasing trend in accuracy with respect to primitive coverage. This indicates that high matching between verbal responses and the machine learned theory correlates with high performance. In Figure 6b, we observed downward curves for MSR and MMR in the number of verbal responses from lower to higher primitive coverage. More responses covering one primitive were provided by SSR and SMR than by MSR and
MMR. Participants gave very few responses that cover more than two primitives. Based on the learned theory in Table 2, the results suggest an increasing difficulty to provide more complete strategy descriptions beyond two (mixed background groups) and four (student groups) clauses of win_3. Mixed background groups (MSR and MMR) had lower proportions of responses covering one predicate than the student groups (SSR and SMR). Mixed background and student groups could not provide a significant proportion of responses covering more than one and two primitives respectively (Figure 6a).

Table 5: The number and accuracy of HQ and LQ responses for groups MSR, MMR, SSR, SMR and each question category. For win_3, the most mentally challenging category of all three, no HQ response was given.

Group | Responses                    | win_1      | win_2      | win_3
MSR   | No. HQ / post-train accuracy | 9 / 0.889  | 1 / 1.00   | -
      | No. LQ / post-train accuracy | 19 / 0.421 | 32 / 0.406 | 29 / 0.517
MMR   | No. HQ / post-train accuracy | 8 / 1.00   | 2 / 1.00   | -
      | No. LQ / post-train accuracy | 16 / 0.250 | 35 / 0.486 | 29 / 0.483
SSR   | No. HQ / post-train accuracy | 6 / 1.00   | 1 / 1.00   | -
      | No. LQ / post-train accuracy | 0 / 0.00   | 8 / 0.750  | 9 / 0.667
SMR   | No. HQ / post-train accuracy | 9 / 1.00   | 9 / 0.778  | -
      | No. LQ / post-train accuracy | 3 / 0.00   | 14 / 0.571 | 19 / 0.737

Fig. 6: (a) The accuracy of verbal responses increases with respect to the number of primitives covered. (b) The proportion of quality verbal responses decreases with respect to the number of primitives covered. win_3 reuses win_2 and uses four number_of_pairs/3 literals.

Table 6: Hypotheses concerning quality of verbal responses and comprehension. C stands for confirmed, N denotes not confirmed, H stands for hypothesis. Test outcomes are presented for the win_1, win_2 and win_3 categories.

H                                                                                                        | win_1 | win_2 | win_3
H1 Human comprehension manifests in verbal response quality                                              | C     | C     | C
H2 Difficulty for human participants to provide verbal responses increases with verbal response quality | C     | C     | C
H3 Machine learned theory improves verbal response quality                                              | N     | C     | N

5.4 Discussion

Results concerning the null hypotheses of H1 to H5 are summarised in Tables 6 and 7. First, we assume that (H1 Null) comprehension does not correlate with verbal response quality. Results for HQ responses in two categories (Table 5) suggest that being able to provide better verbal responses about strategy corresponds to a high comprehension. We also examined the coverage of primitives (specifically for LQ responses of win_3) in verbal responses (Figure 6a). Evidence in all categories shows a correlation between comprehension and the degree of verbal response matching with explanations. We reject the null hypothesis in all categories, which implies the confirmation of H1.

In addition, we assume that (H2 Null) the difficulty for human participants to provide a verbal response is not affected by verbal response quality. Since high response quality is difficult to achieve (Table 5) and it is challenging to correctly describe all primitives (Figure 6b), we reject this null hypothesis for all categories and confirm H2, as it is increasingly difficult for participants to provide higher quality verbal responses.

Table 7: Hypotheses concerning the cognitive window and explanatory effects. C stands for confirmed, H stands for hypothesis, T stands for test outcome.

H                                                                                                          | T
H4 Learning a complex theory exceeding the cognitive bound leads to a harmful explanatory effect           | C
H5 Applying a learned theory without a low cognitive cost does not lead to a beneficial explanatory effect | C

Hence, the two additional trends we observed from the same figure suggest two mental barriers to learning. As we assume a human sample is a collection of version space learners, the search space of participants is limited to programs of size two (mixed background groups) and four (student groups). When H is taken as the student sample and P as the machine learned theory for winning the Island Game, the cognitive bound B(P,H) = m^n p^((1+j)n) with n = 4 corresponds to the hypothesis space size for programs with four clauses (four meta-rules are used with at most two body literals in each clause, and primitives move/2 and number_of_pairs/3). Next, we assume that (H3 Null) the machine learned theory does not improve verbal response quality. More HQ responses were provided by the machine-aided groups in category win_2. Thus, for win_2, we reject this null hypothesis, which means H3 is confirmed in category win_2, where the machine explanations result in more high quality verbal responses being provided.

We assume that (H4 Null) learning a descriptively complex theory does not affect comprehension harmfully. When P is the program learned by MIPlain, B(P,H) for the two samples corresponds to a program class with size no larger than 4. Only win_3, which has a larger size of seven after unfolding, exceeds these cognitive bounds.
As harmful effects (Figures 5a and 5b) have been observed in category win_3, this null hypothesis is rejected and H4 is confirmed: learning a complex machine learned theory has a harmful effect on comprehension. We also assume that (H5 Null) applying a theory without a sufficiently low cognitive cost has a beneficial effect on comprehension. Given that the predicate win_1 in
MIPlain's learned theory does not have a low cognitive cost, and that no significant beneficial effect has been observed, this null hypothesis is rejected and we confirm H5 – knowledge application requiring much cognitive resource does not result in better comprehension.

The performance analysis (Figure 5a) demonstrates a comprehension difference between self-learning and machine-aided learning in category win_2. An explanatory effect has not been observed for the student sample. While the conflicting results suggest that a larger sample size would likely ensure consistency of the statistical evidence, the patterns in the results suggest more significant results in category win_2 than in win_1 and win_3. The predicate win_2 in the program learned by
MIPlain satisfies both constraints: the hypothesis space bound for knowledge acquisition and the cognitive cost for knowledge application. In addition, the cognitive window explains the lack of beneficial effects of the predicates win_1 and win_3. The former does not have a lower cognitive cost for execution, so operational errors cannot be reduced and no effect is observable. The latter is a complex rule with a larger hypothesis space for human participants to search, and harmful effects have been observed due to partial knowledge being learned.

win_1(A,B) :- move(A,B), win_1_1(B).
win_1_1(A) :- number_of_pairs(A,x,1), number_of_pairs(A,x,1).
Fig. 7: Left: participant’s chosen move from the initial position in Figure 4. Right:
Metagol one-shot learns, from the participant's move, a program representing his strategy. The learned program denotes a strategy of finding a pair rather than going for a direct win, which is a mismatch between taught and learned knowledge.
6 Conclusions and further work

While the focus of explainable AI approaches has been on explanations of classifications [1], we have investigated explanations in the context of game strategy learning. In addition, we have explored both the beneficial and harmful sides of the AI's explanatory effect on human comprehension. Our theoretical framework involves a cognitive window to account for the properties of a machine learned theory that lead to improvement or degradation of human performance. The presented empirical studies have shown that explanations are not helpful in general but only if they are of appropriate complexity – being neither informatively overwhelming nor more cognitively expensive than the solution to the problem itself. It would appear that complex machine learning models, and models which cannot provide abstract descriptions of internal decisions, are difficult to explain effectively. However, we acknowledge the limitation of our empirical studies in terms of the consistency of statistical evidence, as the groups vary greatly in sample size; this might be addressed with further experimentation.

To explain a strategy, typically goals or sub-goals must be related to actions which can fulfil these goals. If the strategy involves keeping in mind a stack of open sub-goals – as, for example, in the Tower of Hanoi [2,46] – explanations might become more complex than figuring out the action sequence. Based on [8], knowledge is learned by humans in an incremental way, which was recently emphasised by [58] for human category learning. A potential approach to improve the explanatory effectiveness of a machine learned theory is to process complex concepts into smaller chunks by initially providing simple-to-execute and short sub-goal explanations. Mapping input to another sub-goal output thus consumes fewer cognitive resources and improvement in performance is more likely. It is worth investigating in future work a teaching procedure involving a sequence of teaching sessions that issues increasingly difficult tasks and explanations. Abstract descriptions might be generated in the form of invented predicates, as has been shown in previous work on ILP as an approach to USML [34]. An example of such an abstract description for the investigated game is the predicate number_of_pairs/3.
Therefore, learning might be organised incrementally, guided by a curriculum [6,52].
In addition, the current teaching procedure, which only casts humans as learners, could be augmented to enable two-way learning between human and machine. Human decisions might be machine learned, and explanations would be provided based on an estimation of human errors during the course of training. A simple demonstration of this idea is presented in Figure 7. We would like to explore, in the future, an interactive procedure in which a machine iteratively re-teaches human learners by targeting human learning errors via specially tailored explanations. [7] suggested that it is crucial for machine-produced clones to be able to represent goal-oriented knowledge in a form that is similar to human conceptual structure. Hence, MIL is an appropriate candidate for cloning since it is able to iteratively learn complex concepts by inventing sub-goal predicates. We hope to incorporate cloning to predict and target mistakes in human learned knowledge from answers in a sequence of re-training sessions. We expect a "clean up" of operational errors in human behaviour, as observed in empirical experiments, by presenting appropriate explanations during re-training. Such corrections and improvements guided by identified errors in a human strategy are also helpful in the context of intelligent tutoring [57], where classic strategies such as algorithmic debugging [48] can be applied to make humans and machines learn from each other.
Acknowledgements
The contribution of the authors from the University of Bamberg is part of a project funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – 405630557 (PainFaceReader). The second author acknowledges support from the UK's EPSRC Human-Like Computing Network, for which he acts as director.
References
1. A. Adadi and M. Berrada. Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access, 6:52138–52160, 2018.
2. E. Altmann and J. G. Trafton. Memory for goals: An activation-based model. Cognitive Science, 26:39–83, 2002.
3. J. R. Anderson, N. Kushmerick, and C. Lebiere. Rules of the Mind, chapter The Tower of Hanoi and goal structures, pages 121–142. Hillsdale, NJ: L. Erlbaum, 1993.
4. J. R. Anderson and R. Thompson. Use of Analogy in a Production System Architecture, pages 267–297. Cambridge University Press, USA, 1989.
5. M. Bain and S. H. Muggleton. Learning Optimal Chess Strategies, pages 291–309. Oxford University Press, Inc., New York, NY, USA, 1995.
6. Y. Bengio, J. Louradour, R. Collobert, and J. Weston. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 41–48, 2009.
7. I. Bratko, T. Urbančič, and C. Sammut. Behavioural cloning: Phenomena, results and problems. IFAC Proceedings Volumes, 28(21):143–149, 1995.
8. J. S. Bruner, J. J. Goodnow, and G. A. Austin. A Study of Thinking: With an Appendix on Language by Roger W. Brown. New York, NY: Wiley, 1956.
9. J. Carbonell. Derivational analogy: A theory of reconstructive problem solving and expertise acquisition. Machine Learning, 11:26, 1985.
10. P. Carpenter, M. Just, and P. Shell. What one intelligence test measures: A theoretical account of the processing in the Raven progressive matrices test. Psychological Review, 97:404–431, 1990.
11. N. Chomsky. Aspects of the Theory of Syntax. Cambridge: M.I.T. Press, 1965.
12. A. Cropper and S. H. Muggleton. Metagol system. https://github.com/metagol/metagol, 2016.
13. A. Cropper and S. H. Muggleton. Learning efficient logic programs. Machine Learning, 108:1063–1083, 2019.
14. S. Džeroski, L. De Raedt, and K. Driessens. Relational reinforcement learning. Machine Learning, 43:7–52, 2001.
15. D. Gentner and R. Landers. Analogical reminding: A good match is hard to find. Proceedings of the International Conference on Systems, Man and Cybernetics, 1985.
16. S. A. Goldman and M. J. Kearns. On the complexity of teaching. Journal of Computer and System Sciences, 50:20–31, 1995.
17. M. D. Hauser, N. Chomsky, and W. T. Fitch. The faculty of language: what is it, who has it, and how did it evolve? Science, 298:1569–1579, 2002.
18. M. Hind, D. Wei, M. Campbell, N. Codella, A. Dhurandhar, A. Mojsilovic, et al. TED: Teaching AI to explain its decisions. Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 2019.
19. J. R. Hobbs. Abduction in natural language understanding. In The Handbook of Pragmatics (eds L. R. Horn and G. Ward), 2008.
20. K. J. Holyoak and K. Koh. Surface and structural similarity in analogical transfer. Memory & Cognition, 15(4):332–340, 1987.
21. P. N. Johnson-Laird. Mental Models: Towards a Cognitive Science of Language, Inference, and Consciousness. Harvard University Press, USA, 1986.
22. A. N. Kolmogorov. On tables of random numbers. Sankhya: The Indian Journal of Statistics, Series A, 207(25):369–375, 1963.
23. E. Lemke, H. Klausmeier, and C. Harris. Relationship of selected cognitive abilities to concept attainment and information processing. Journal of Educational Psychology, 58:27–35, 1967.
24. D. Lin, E. Dechter, K. Ellis, J. Tenenbaum, and S. H. Muggleton. Bias reformulation for one-shot function induction. In Proceedings of the 23rd European Conference on Artificial Intelligence (ECAI 2014), pages 525–530, 2014.
25. D. Michie. Experiments on the mechanization of game-learning part I: Characterization of the model and its parameters. The Computer Journal, 6(3):232–236, 1963.
26. D. Michie. Inductive rule generation in the context of the fifth generation. Machine Learning Workshop, page 65, 1983.
27. D. Michie, M. Bain, and J. Hayes-Michie. Cognitive models from subcognitive skills. Knowledge-Based Systems in Industrial Control, pages 71–99, 1990.
28. D. Michie and R. Camacho. Building symbolic representations of intuitive real-time skills from performance data. In Machine Intelligence, 1992.
29. G. A. Miller. The magical number seven, plus or minus two: Some limits on our capacity for processing information. The Psychological Review, 63:81–97, 1956.
30. T. Miller. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267:1–38, 2019.
31. T. Miller, P. Howe, and L. Sonenberg. Explainable AI: Beware of inmates running the asylum or: How I learnt to stop worrying and love the social and behavioural sciences. Proc. IJCAI Workshop on Explainable Artificial Intelligence, Melbourne, Australia, 2017.
32. T. M. Mitchell. Generalization as search. Artificial Intelligence, 18:203–226, 1982.
33. V. Mnih, K. Kavukcuoglu, D. Silver, et al. Human-level control through deep reinforcement learning. Nature, 518:529–533, 2015.
34. S. Muggleton, U. Schmid, C. Zeller, A. Tamaddoni-Nezhad, and T. Besold. Ultra-strong machine learning: comprehensibility of programs learned with ILP. Machine Learning, 2018.
35. S. H. Muggleton. Inductive logic programming. New Generation Computing, 8:295–318, 1991.
36. S. H. Muggleton and C. Hocquette. Machine discovery of comprehensible strategies for simple games using meta-interpretive learning. New Generation Computing, 37:203–217, 2019.
37. S. H. Muggleton and D. Lin. Meta-interpretive learning of higher-order dyadic datalog: Predicate invention revisited. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence, pages 1551–1557, 2013.
38. S. H. Muggleton, D. Lin, N. Pahlavi, and A. Tamaddoni-Nezhad. Meta-interpretive learning: application to grammatical inference. Machine Learning, pages 25–49, 2014.
39. A. Newell. Unified Theories of Cognition. Harvard University Press, USA, 1990.
40. L. Novick and K. Holyoak. Mathematical problem solving by analogy. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17:398–415, 1991.
41. J. Quinlan. Learning Efficient Classification Procedures and Their Application to Chess End Games, pages 463–482. Springer Berlin Heidelberg, Berlin, Heidelberg, 1983.
42. J. R. Quinlan. Simplifying decision trees. International Journal of Man-Machine Studies, 27:221–234, 1987.
43. A. N. Rafferty, E. Brunskill, T. L. Griffiths, and P. Shafto. Faster teaching via POMDP planning. Cognitive Science, pages 1290–1332, 2016.
44. S. K. Reed, C. C. Ackinclose, and A. A. Voss. Selecting analogous problems: Similarity versus inclusiveness. Memory & Cognition, 18(1):83–98, 1990.
45. U. Schmid and J. Carbonell. Empirical evidence for derivational analogy. Proceedings of the 21st Annual Conference of the Cognitive Science Society, 2000.
46. U. Schmid and E. Kitzelmann. Inductive rule learning on the knowledge level. Cognitive Systems Research, 12:237–248, 2011.
47. A. Shapiro and T. Niblett. Automatic induction of classification rules for a chess endgame. In M. Clarke, editor, Advances in Computer Chess, volume 3, pages 73–91. Pergammon, Oxford, 1982.
48. E. Y. Shapiro. Algorithmic Program Debugging. ACM Distinguished Dissertation, 1982.
49. E. Shohamy. Performance and competence in second language acquisition. Competence and Performance in Language Testing, pages 138–151, 1996.
50. D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
51. H. A. Simon and J. R. Hayes. The understanding process: Problem isomorphs. Cognitive Psychology, 8:165–190, 1976.
52. J. A. Telle, J. Hernández-Orallo, and C. Ferri. The teaching size: computable teachers and learners for universal languages. Machine Learning, 108:1653–1675, 2019.
53. T. Urbančič and I. Bratko. Reconstructing human skill with machine learning. Proceedings of the 11th European Conference on Artificial Intelligence, pages 498–502, 1994.
54. C. Watkins. Learning from Delayed Rewards. PhD thesis, 1989.
55. T. Zahavy, N. B. Zrihem, and S. Mannor. Graying the black box: Understanding DQNs. Proceedings of the 33rd International Conference on Machine Learning, 2016.
56. V. F. Zambaldi, D. C. Raposo, A. Santoro, V. Bapst, Y. Li, I. Babuschkin, et al. Deep reinforcement learning with relational inductive biases. In ICLR, 2019.
57. C. Zeller and U. Schmid. Automatic generation of analogous problems to help resolving misconceptions in an intelligent tutor system for written subtraction. In Workshops Proceedings for the Twenty-fourth International Conference on Case-Based Reasoning, volume 1815, pages 108–117, 2016.
58. C. Zeller and U. Schmid. A human like incremental decision tree algorithm: Combining rule learning, pattern induction, and storing examples. In LWDA, 2017.
59. X. Zhu. Machine teaching: An inverse problem to machine learning and an approach toward optimal education. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.