Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games
Yunqiu Xu, Meng Fang, Ling Chen, Yali Du, Joey Tianyi Zhou, Chengqi Zhang
Yunqiu Xu∗, University of Technology Sydney
Meng Fang∗, Tencent Robotics X, [email protected]
Ling Chen, University of Technology Sydney
Yali Du, University College London, [email protected]
Joey Tianyi Zhou, IHPC, A*STAR, [email protected]
Chengqi Zhang, University of Technology Sydney
Abstract
We study reinforcement learning (RL) for text-based games, which are interactive simulations in the context of natural language. While different methods have been developed to represent the environment information and language actions, existing RL agents are not empowered with any reasoning capabilities to deal with textual games. In this work, we aim to conduct explicit reasoning with knowledge graphs for decision making, so that the actions of an agent are generated and supported by an interpretable inference procedure. We propose a stacked hierarchical attention mechanism to construct an explicit representation of the reasoning process by exploiting the structure of the knowledge graph. We extensively evaluate our method on a number of man-made benchmark games, and the experimental results demonstrate that our method performs better than existing text-based agents.
Introduction

Language plays a core role in human intelligence and cognition [14, 43]. Text-based games [13, 20], where both the states and actions are described by textual descriptions, are suitable simulation environments for studying the language-informed decision making process. These games can be regarded as an intersection of natural language processing (NLP) and reinforcement learning (RL) tasks [35]. To solve text-based games via RL, the agent has to tackle many challenges, such as learning representations from text [42], making decisions based on partial observations [4], and handling a combinatorial action space [57] and sparse rewards [56]. Generally, existing agents for text-based games can be classified into rule-based agents and learning-based agents. Rule-based agents, such as NAIL [21], solve the games based on pre-defined rules, engineering tricks, and pre-trained language models. By heavily relying on prior knowledge of the games, these agents lack flexibility and adaptability. With the progress of deep reinforcement learning [38, 39], learning-based agents such as LSTM-DRQN [42] have become increasingly popular, since they learn purely from interaction without requiring expensive human knowledge as a prior. Recently, considering the rich information that can be maintained by their structural memory, knowledge graphs (KGs) have been incorporated into RL agents to facilitate solving text-based games [1, 4, 3].

∗ Equal contribution.
34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.

While a lot of studies have been conducted on representing useful information from text observations [3, 4, 42] and on reducing action spaces [20, 57], few RL agents address the reasoning process for text-based games.
Going beyond mapping a question to an answer, human beings have the ability of reasoning: they can reuse knowledge [50], or compose the supporting facts (e.g., the relations between objects in a scene) from the question and the knowledge base to interpret the answer [10, 30]. We believe that RL agents empowered with reasoning capabilities will better mimic human decisions in solving text-based games and achieve enhanced performance. For RL agents, we consider enhancing the reasoning capability of the agent by exploiting KGs. While existing studies [3, 4, 58] treat KGs as part of the observation to handle partial observability, they ignore the potential of KGs for reasoning [12, 27]. Furthermore, the effectiveness of reasoning is constrained by two problems. Firstly, existing KG-based agents construct one single KG, so fine-grained information (e.g., the types of object relationships, or the newness/oldness of information) is hard to maintain. Secondly, multi-modal inputs, such as textual observations and KGs, are aggregated via simple concatenation, so their respective benefits cannot be sufficiently exploited. We believe that an intelligent agent should have the ability to conduct explicit reasoning, with relational and temporal awareness taken into consideration, to make decisions.

In this paper, our goal is to design an enhanced RL agent with a reasoning process for text-based games. We propose a new method, named Stacked Hierarchical Attention with Knowledge Graphs (SHA-KG), to enable the agent to perform multi-step reasoning via a hierarchical architecture when playing games. Briefly, to leverage the structural information of a KG that maintains the agent's knowledge about the game environment, we first consider sub-graphs of the KG with different semantic meanings, so that relational and temporal awareness are taken into account.
Secondly, a stacked hierarchical attention module is devised to build an effective state representation from multi-modal inputs, so that their respective importance is considered.

Our contributions include four aspects. Firstly, our work is a first step in pursuing reasoning in solving text-based games. Secondly, we propose to incorporate sub-graphs of the KG into decision making to introduce the reasoning process. Thirdly, we propose a new stacked hierarchical attention mechanism for RL, featuring multi-level and multi-modal reasoning. Fourthly, we extensively evaluate our method on a wide range of text-based benchmark games, achieving favorable results compared with state-of-the-art methods.

Related Work

Agents for text-based games.
Existing agents either act according to predefined rules or learn to make responses by interacting with the environment. Rule-based agents [8, 16, 21, 31] attempt to solve text-based games by injecting heuristics. They are thus not flexible, since a huge amount of prior knowledge is required to design the rules [20]. Learning-based agents [2, 20, 22, 26, 42, 55, 56, 57] usually employ deep reinforcement learning algorithms to deliver adaptive game-solving strategies. However, the performance of these agents is still not up to par on complex man-made games, even though efforts have been made to reduce the difficulty (e.g., DRRN [22] assumed that an action can only be selected from a valid action set for each state). KG-based agents have been developed to enhance the performance of learning-based agents with the assistance of KGs. KGs can be constructed by simple rules, which substantially reduces the amount of prior knowledge required by rule-based agents. While KGs have been leveraged to handle partial observability [3, 4, 58], reduce the action space [3, 4], and improve generalizability [1, 5], few of the existing works address their potential for reasoning. Recently, Murugesan et al. [41] introduced commonsense reasoning for playing synthetic games; they extracted sub-graphs from ConceptNet [44], a large-scale external knowledge base with millions of edges and nodes. In contrast, we aim to construct the KG based on domain information with minimal external knowledge. Besides, we focus on man-made games, which are more complex than synthetic games in terms of logic, so that the reasoning ability becomes especially crucial and desirable for the agents.
Attention mechanism.
Attention mechanisms have been widely studied in machine learning, psychology, and neuroscience [33]. For text-based games, self-attention [45] has been applied to encode textual observations [1, 58], and Graph Attention Networks (GATs) [46] have been employed to encode KGs [3]. Regarding model explainability, the attention mechanism helps to interpret the model's outcomes.

Our code is available at https://github.com/YunqiuXu/SHA-KG.

Reasoning via knowledge graph.
A lot of existing studies have used KGs as the knowledge base to facilitate learning and interpretation, including incorporating KG-based commonsense reasoning for question answering [9, 15, 32, 61] and recommendation [47, 48, 52, 60]. We are the first to exploit KGs to induce reasoning for RL-based agents playing text-based games.
POMDP

A partially observable Markov decision process (POMDP) can be defined as a 7-tuple ⟨S, A, T, R, Ω, O, γ⟩: the state set S, the action set A, the state transition probabilities T, the reward function R, the observation set Ω, the conditional observation probabilities O, and the discount factor γ ∈ (0, 1]. At each time step, the agent receives an observation o_t ∈ Ω, which depends on the current state and the previous action via the conditional observation probability O(o_t | s_t, a_{t-1}). By executing an action a_t ∈ A, the environment transits into a new state according to the state transition probability T(s_{t+1} | s_t, a_t), and the agent receives the reward r_{t+1} = R(s_t, a_t). As in a Markov decision process (MDP), the goal of the agent is to learn an optimal policy π* that maximizes the expected discounted sum of future rewards from each time step:

R_t = E[ Σ_{k=0}^{∞} γ^k r_{t+k+1} ]

KG

A knowledge graph (KG) for a text-based game can be built from a set of triplets ⟨Subject, Relation, Object⟩, denoting that the Subject has the given Relation with the Object; for example, ⟨Kitchen, Has, Food⟩. The KG is denoted as G = (V, E), where V and E are the node set and the edge set, respectively. Both Subject and Object belong to the node set V; Relation, which corresponds to the edge connecting them, belongs to E.

In this work, we focus on man-made games, which are initially designed for human players [20]. These games are devised with more complex logic and a much larger action space than synthetic games [13]. Text-based games require an agent to make automatic responses to achieve specific goals (e.g., escaping from the dungeon) based on received textual information. The raw textual observation contains only the feedback of taking an action (e.g., "Taken" is the textual observation after executing the action "take egg"). As the underlying states cannot be directly observed by the agent, text-based games can be formulated as POMDPs. Similar to [3], at every step we construct an input s_t as the combination of three components: a textual observation o_{t,text}, a collected raw score o_{t,score}, and a KG o_{t,KG} (note that s_t should not be regarded as a true game state, as the games are not fully observable). o_{t,text} further includes the current room description o_{t,desc} (describing the environment), the inventory o_{t,inv} (describing items collected by the player), the game feedback o_{t,feed}, and the previously taken action a_{t-1}. Fig. 1 (a) shows an example of o_{t,text}.

While o_{t,text} and o_{t,score} mainly reflect the current observation, o_{t,KG} records the game history. Therefore, the KG can help the agent handle partial observability. At each time step, the triplets extracted from the current textual observation o_{t,text} are used to update the KG:

o_{t,KG} = GraphUpdate(o_{t-1,KG}, o_{t,text})    (1)

Fig. 1 (b) shows an example of o_{t,KG} and how it is updated. We provide details of constructing and updating the KG in Sec. 5.2. As discussed, existing KG-based agents build only one knowledge graph [1, 3, 4, 58]. To introduce relational awareness and temporal awareness, in this work we divide the KG into multiple sub-graphs.
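As a rough sketch of Eq. (1), the KG can be kept as a set of (subject, relation, object) tuples. Here `extract_triplets` is a hypothetical stand-in for the OpenIE-based extraction described in Sec. 5.2 (it just parses a toy `subject|relation|object` format), and unlike the real update this sketch only adds triplets without retracting stale ones:

```python
def extract_triplets(observation):
    """Hypothetical stand-in for OpenIE-based extraction: parse lines of
    the form 'subject|relation|object' into a set of triplets."""
    triplets = set()
    for line in observation.strip().splitlines():
        subj, rel, obj = line.split("|")
        triplets.add((subj.strip(), rel.strip(), obj.strip()))
    return triplets

def graph_update(prev_kg, observation):
    """Eq. (1): merge triplets extracted from the current textual
    observation into the previous knowledge graph."""
    return prev_kg | extract_triplets(observation)

kg0 = {("you", "in", "kitchen")}
kg1 = graph_update(kg0, "kitchen|has|egg\nyou|have|lamp")
```

The set representation makes the update idempotent: re-observing a known fact leaves the graph unchanged.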
Figure 1: (a) Textual observation o_{t,text}. (b) Knowledge graph o_{t,KG}. The yellow region in o_{t,KG} contains information extracted from the observation o_{t,text}.
(c) Sub-graphs obtained from o_{t,KG}.

Inspired by heterogeneous graphs [49, 59], where a graph contains different types of nodes and edges, we first classify edges by their types (e.g., "Has" and "East of" can be regarded as different types), and then build relational-aware sub-graphs based on the edges. In addition, as the KG cannot distinguish between current and past information, we introduce temporal awareness by building different sub-graphs based on whether historical information is included (e.g., sub-graphs built from o_{t,text} only versus sub-graphs built from o_{t,text} and o_{t-1,KG}). The union of all sub-graphs is the full KG:

o_{t,KG} = o_{t,KG,1} ∪ o_{t,KG,2} ∪ ... ∪ o_{t,KG,m-1} ∪ o_{t,KG,m}    (2)

where m denotes the number of sub-graphs. From the perspective of hierarchical learning [54, 62], the sub-graph division allows observations to be considered at two levels: at the high level, the full KG captures the overall node connectivity; at the low level, the sub-graphs reflect different relational and temporal relations. Fig. 1 (c) shows an example of the sub-graphs obtained from o_{t,KG}.

Before action selection, we first represent the input s_t as a vector v_t. We omit the subscript "t" for simplicity, and denote the full KG as o_{KG,full} to distinguish it from the sub-graphs. Since the textual observation, score, and knowledge graph are multi-modal inputs, inspired by VQA techniques [29, 34] we aggregate the inputs by constructing a query representation from one modality (or two) to obtain the attention over another modality, through a stacked hierarchical attention mechanism. Fig. 2 shows an overview of our encoder, which consists of two levels. In the high level, we build a query vector from the KG and the score, then compute multiple groups of attention values across the components of the textual observation. In the low level, we treat the output of the high level as a query, and compute attention values across the sub-graphs.
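As a rough illustration of the sub-graph division and the union in Eq. (2), the following sketch partitions a triplet set by simple relation- and node-based rules. The predicates here are illustrative assumptions; the actual rules are given in Sec. 5.2:

```python
def split_subgraphs(kg):
    """Partition a KG (a set of (subj, rel, obj) triplets) into illustrative
    sub-graphs: room connectivity, facts involving 'you', and the rest."""
    rooms = {t for t in kg if t[1].endswith("of")}     # e.g. "north of"
    yours = {t for t in kg if "you" in (t[0], t[2])}   # inventory / location
    rest = kg - rooms - yours                          # remaining facts
    return rooms, yours, rest

kg = {
    ("kitchen", "north of", "hall"),
    ("you", "have", "lamp"),
    ("kitchen", "has", "egg"),
}
subgraphs = split_subgraphs(kg)
```

With disjoint rules as above, the union of the returned sub-graphs recovers the full KG, matching Eq. (2).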
High-level attention
Figure 2: Overview of our stacked hierarchical attention network. In the high level (left), the query vector q_{high} is the combination of the KG representation v_{KG,full} and the score representation v_{score}. Multiple groups of attention values α_{high} are then computed across the components of the textual observation v_{text}. In the low level (right), the query vector q_{low} is the output of the high-level encoding, and multiple groups of attention values α_{low} are computed across the sub-graphs. The final output v_t serves as the state representation for action selection.

Similar to KG-A2C [3], the full KG o_{KG,full} is processed via GATs [46] followed by a linear layer to get the graph representation v_{KG,full} ∈ R^{d_KG}. The score representation v_{score} ∈ R^{d_score} is obtained via binary encoding. While previous works [3, 20] concatenated all observational vectors to form the state representation, we build the query vector q_{high} ∈ R^{d_high} by concatenating v_{KG,full} and v_{score}, followed by a linear layer:

q_{high} = W_{Init} concat(v_{KG,full}, v_{score}) + b_{Init}    (3)

where W_{Init} ∈ R^{d_high × (d_KG + d_score)} and b_{Init} ∈ R^{d_high} are weights and biases.

Suppose the textual observation consists of c components. We first encode them separately by c GRUs. Instead of concatenating them, we consider them individually to build the textual representation v_{text} ∈ R^{d_high × c}; v_{text} can thus be treated as multiple image regions, or as an image representation with multiple channels. Inspired by SCA-CNN [11], we compute attention values channel-wise. However, instead of computing one attention value for each channel, we compute multiple groups of attention values to capture more fine-grained information. Specifically, one group of attention values is computed for each position along the channel:

α_{high} = softmax(W_{A,high} h_{high} + b_{A,high})    (4)

where

h_{high} = tanh(W_{I,high} v_{text} ⊕ (W_{Q,high} q_{high} + b_{Q,high}))    (5)

denotes the intermediate representation and ⊕ denotes the addition of a matrix and a vector. W_{I,high} ∈ R^{d_high × d_high}, W_{Q,high} ∈ R^{d_high × d_high} and W_{A,high} ∈ R^{d_high × d_high} are weight matrices; b_{Q,high} ∈ R^{d_high} and b_{A,high} ∈ R^{d_high} are biases. This operation is equivalent to dividing v_{text} into d_high sub-vectors v_{text,sub} ∈ R^{1 × c}, then computing channel-wise attention for each of them.
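Under toy dimensions and random stand-ins for the learned parameters, the channel-wise attention of Eqs. (4)-(5) and the query update of Eq. (6) can be sketched in NumPy as follows. d_high = 8 is illustrative (the paper uses d_high = 100, c = 4), and normalizing the softmax across the c channels is an assumption of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
d_high, c = 8, 4  # illustrative sizes

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Random stand-ins for learned parameters and encoded inputs.
W_I, W_Q, W_A = (rng.standard_normal((d_high, d_high)) for _ in range(3))
b_Q, b_A = rng.standard_normal(d_high), rng.standard_normal(d_high)
v_text = rng.standard_normal((d_high, c))   # c encoded textual components
q_high = rng.standard_normal(d_high)        # query from KG + score, Eq. (3)

# Eq. (5): matrix-plus-vector addition broadcasts the query over channels.
h_high = np.tanh(W_I @ v_text + (W_Q @ q_high + b_Q)[:, None])
# Eq. (4): one group of attention values per position, softmax over channels.
alpha_high = softmax(W_A @ h_high + b_A[:, None], axis=1)
# Eq. (6): attentive aggregation updates the query for the low level.
q_low = q_high + (alpha_high * v_text).sum(axis=1)
```

Each of the d_high rows of alpha_high is one "group" of attention values over the c channels, which is what distinguishes this from single-value channel attention.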
The obtained attention values α_{high} ∈ R^{d_high × c} reflect the multi-positional attentive focus on the textual components. The final step of the high-level encoding is to attentively aggregate the query vector with the textual vector. In order to enable multi-level reasoning, we leverage recent advances in attention techniques [17, 28, 53] to learn multi-step reasoning by iteratively updating the query. We first multiply v_{text} with α_{high} via dot-product, then sum over all channels and add the result to q_{high} to obtain the updated query vector q_{low} ∈ R^{d_high}:

q_{low} = q_{high} + Σ_{i=1}^{c} α_{high,i} ⊙ v_{text,i}    (6)

(As discussed in 4.1, c is 4 in this work.)

Low-level attention
The low-level encoding process is similar to the high level, except that the attention values are computed across the different sub-graphs. We encode the sub-graphs with different GATs and combine them into the graph representation v_{KG} ∈ R^{d_low × m}, where d_low denotes the dimensionality. We treat the output vector of the high-level encoding as the query vector, and perform a linear transformation to ensure q_{low} ∈ R^{d_low}. We then apply the same attention mechanism as in the high level:

α_{low} = softmax(W_{A,low} h_{low} + b_{A,low})    (7)

where

h_{low} = tanh(W_{I,low} v_{KG} ⊕ (W_{Q,low} q_{low} + b_{Q,low}))    (8)

denotes the intermediate representation. W_{I,low} ∈ R^{d_low × d_low}, W_{Q,low} ∈ R^{d_low × d_low} and W_{A,low} ∈ R^{d_low × d_low} are weight matrices; b_{Q,low} ∈ R^{d_low} and b_{A,low} ∈ R^{d_low} are biases. Finally, we aggregate q_{low} with v_{KG} based on the low-level attention values α_{low} ∈ R^{d_low × m} to get the state representation v_t ∈ R^{d_low}:

v_t = q_{low} + Σ_{i=1}^{m} α_{low,i} ⊙ v_{KG,i}    (9)

Action selection
Given the state representation v_t, action selection can be performed via methods such as template-based scoring [20] and recurrent decoding [3]. In this work, we use the recurrent decoding method, which selects actions via two GRUs. We first use a template-GRU to predict a template u ∈ T based on v_t, where T denotes the template set.
Next, we recurrently execute an object-GRU for k steps to decode objects {p_i, i ∈ [1, ..., k]} from the object set P, which is the intersection of the vocabulary set V and the set of objects that have appeared in o_{KG,full}. The probability of an object p_i is conditioned on both v_t and the prediction of the previous step (i.e., u or p_{i-1}). Finally, the template and objects are combined into the action a_t.

Training.
Our model SHA-KG is trained via the Advantage Actor-Critic (A2C) method [37] with a supervised auxiliary task, "valid action prediction" [3]. We provide details in the supplementary material.
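The exact objective is deferred to the supplementary material; a minimal per-step sketch of A2C with an auxiliary valid-action term, under the assumptions that the auxiliary task is independent binary classification over candidate actions and that all loss coefficients are omitted, might look like:

```python
import numpy as np

def a2c_losses(log_prob_a, value, ret, valid_logits, valid_labels):
    """Illustrative per-step A2C losses (scalar log-prob and value), with the
    auxiliary valid-action prediction treated as sigmoid cross-entropy.
    The exact form and coefficients are assumptions of this sketch."""
    advantage = ret - value
    policy_loss = -log_prob_a * advantage      # actor: policy-gradient term
    value_loss = advantage ** 2                # critic: squared return error
    # Auxiliary task: binary cross-entropy over candidate actions.
    probs = 1.0 / (1.0 + np.exp(-np.asarray(valid_logits)))
    labels = np.asarray(valid_labels, dtype=float)
    aux_loss = -np.mean(labels * np.log(probs)
                        + (1 - labels) * np.log(1 - probs))
    return policy_loss, value_loss, aux_loss
```

In practice the three terms are combined with weighting coefficients (and an entropy bonus) before backpropagation.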
Experiments

We evaluate our method on a set of man-made games in the Jericho game suite [20]. We conduct experiments to validate the effectiveness of our sub-graph division and stacked hierarchical attention, and to interpret the reasoning and decision making processes.
We use KG-A2C [3] as the backbone of SHA-KG. The following baselines are considered:
• NAIL [21]: a general agent with hand-crafted rules and pre-trained language models.
• DRRN [20, 22]: a choice-based agent that selects actions from a valid action set.
• TDQN [20]: a parser-based agent with a template-based action space.
• KG-A2C [3]: an extension of TDQN with KGs and valid action prediction.
KG construction. The triples for constructing the KG are extracted via Stanford's Open Information Extraction (OpenIE) [6] plus two additional rules used in [3]: 1) for interactive objects detected in the current observation, those within the inventory are linked to the node "you", while the others are linked to the current room; 2) the room connectivity is inferred from navigational actions. Regarding the sub-graph division, we define four graph types: o_{KG,1} records the connectivity of visited rooms, o_{KG,2} represents the objects within the current room, o_{KG,3} represents the objects within the inventory, and o_{KG,4} is the graph without any connection to "you". o_{KG,2} and o_{KG,3} contain only present information, while o_{KG,1} and o_{KG,4} contain both current and historical information. Besides pre-defined rules, the graph partitioning process could be implemented via automatic methods, which we leave for future work.

Table 1: Raw scores of SHA-KG and the baselines. For the baselines, we use the results reported in their original papers [3, 20], except "reverb" and "tryst205" for KG-A2C, which were not reported there. |T| and |V| denote the sizes of the template set and the vocabulary set. MaxR denotes the maximum possible score (collected from a walkthrough without step limits). All results are averaged over five independent runs.
Game  |T|  |V|  NAIL  DRRN  TDQN  KG-A2C  SHA-KG (Ours)  MaxR

Training implementation.
We follow the hyper-parameter settings of KG-A2C [3], except that we reduce the node embedding dimension in the GATs from 50 to 25 to reduce GPU cost. We set d_high to 100 and d_low to 50. For both SHA-KG and KG-A2C, the graph mask for action selection is constructed from o_{KG,full}. We denote an action as "valid" if it does not lead to meaningless feedback in a state (e.g., "Nothing happens"). An episode is terminated after 100 valid steps or on game over / victory. For each game, an individual agent is trained for a fixed budget of interaction steps, with training data collected from 32 environments in parallel. An optimization step is performed every 8 interaction steps via the Adam optimizer with learning rate 0.003. All baselines follow their original implementations [3, 20, 21]. To compare with the baselines, we report the average raw score over the last 100 finished episodes during training. We also report learning curves in the ablation study. All quantitative results are averaged over five independent runs.

Results. Table 1 shows the performance of SHA-KG and the baselines on 20 games. The proposed SHA-KG achieves new state-of-the-art results in 8 games and matches the current best baselines in another 7 games. Comparing the three agents that use a template-based action space (i.e., SHA-KG, TDQN, KG-A2C), SHA-KG obtains equal or better performance than TDQN and KG-A2C in all of the games, showing the effectiveness of introducing reasoning. The best results on the other 5 games are still achieved by NAIL and DRRN, which apply additional rules and assumptions. NAIL follows nearly fixed rules (e.g., interacting with all the objects, then exploring another room); though this design principle may be useful for some particular games (e.g., "dragon" and "zork3"), it lacks flexibility, so it performs worse than all learning-based agents in most of the games. DRRN largely reduces the difficulty by selecting actions from the set of admissible actions only.
For example, to obtain the first reward "+10" in "acorncourt", the agent has to select a series of complex actions, in which case the assumption of an admissible action set is a large advantage. KG-A2C and our SHA-KG relax this assumption, but the action set is then as large as O(|T||P|), making it infeasible for them to achieve this first reward stably. However, the reasoning ability still brings improvement over the backbone model: while KG-A2C shows worse performance than DRRN in 9 games, SHA-KG outperforms DRRN in 6 of these games and shows close scores in the other 3.

Ablation study. We perform ablation studies to validate the contributions of the different components. We first compare SHA-KG with three variants with different attention modules:
• "w/o GroupAttn": applies a single attention value for each channel (i.e., α_{high} ∈ R^{1 × c}, α_{low} ∈ R^{1 × m}).
• "w/o high-level": constructs the initial query vector from v_{text} and v_{score}, then computes attention values across the sub-graphs. The full KG is not used.
• "w/o low-level": constructs the initial query vector from v_{KG,full} and v_{score}, then computes attention values across the textual components. The sub-graph division is not used.

Figure 3: Learning curves of models with different attention modules ("inhumane", "reverb", "zork1", "zork3", "ztuu"). The dashed line in "ztuu" denotes KG-A2C's result reported in [3]. The shaded regions indicate standard deviations.

Fig. 3 shows the learning curves on 5 games, from which the following observations can be made. The full model SHA-KG shows similar or better performance than all variants in all cases. Our two-level attention mechanism provides an effective and explainable way to refine information: the first level of the hierarchy tells the agent which part of the textual information should be focused on, and, based on the output of this level, the second level informs the agent which part of the knowledge graph it should be targeting. The variant "w/o high-level", which considers the sub-graphs only but not the full graph, works for some games (e.g., "reverb" and "zork1") but fails for the others. For the games where this variant achieves the best score, the sample efficiency is also improved, meaning that it requires fewer interaction steps to reach that score. For the variant "w/o low-level", which considers the full KG only but not the sub-graphs, the sample efficiency is somewhat low in some games (e.g., "zork1"). The variant "w/o GroupAttn" generally shows a performance curve similar to SHA-KG by considering both the full KG and the sub-graphs; however, the performance gap between "w/o GroupAttn" and SHA-KG demonstrates the effectiveness of computing multiple groups of attention values along the channels, which allows SHA-KG to capture more fine-grained information from the textual components (in the high level) and the sub-graphs (in the low level). Overall, the learning curves demonstrate that the sub-graph division and the stacked hierarchical attention provide complementary contributions to the performance improvement of SHA-KG.

Regarding the contributions of the different types of sub-graphs, we further design three variants with different graph partitioning strategies:
• "w/o relational-awareness": combines o_{KG,2} (room objects) and o_{KG,3} (collected objects).
• "w/o temporal-awareness": combines o_{KG,4} with o_{KG,2} and o_{KG,3}, respectively.
• "w/o history": removes all historical information.

Figure 4: Learning curves of SHA-KG with different types of sub-graphs.

Fig. 4 shows the results and indicates that the effect of the different types of awareness varies with the games. No simple conclusion can be drawn about which type of awareness contributes most to the final performance (e.g., "w/o relational-awareness" and "w/o temporal-awareness" behave differently in "zork1" and "zork3"). However, considering them collectively and learning to balance their importance leads to the improved performance of our method.
Figure 5: Illustration of the reasoning and decision making processes in the game "ztuu". Left: α_{high,top25sum} denotes the high-level attention for o_{text,desc}, o_{text,inv}, o_{text,feed} and a_{t-1}; α_{low,top25sum} denotes the low-level attention for o_{KG,1}, o_{KG,2}, o_{KG,3} and o_{KG,4}. The subscript "top25sum" denotes the sum of the 25 largest attention values within a channel (textual component or sub-graph). a_t is the selected action and r_t is the reward. Right: the extracted sub-graphs (o_{KG,4} is omitted due to space limits). The sub-graph with the highest graph-level attention (by SHA) is in the red dashed box. In each sub-graph, the top 3 nodes with the highest attention (by GATs) are highlighted in yellow.
We interpret the reasoning and decision making process by examining the attentive focus. Although there are controversial points about whether attention values are explainable when applied to text [25, 51], we believe our attention mechanism is beneficial for interpretability: compared to word-level attention, the stacked hierarchical attention operates in a manner closer to region/channel-level attention, which has been shown to provide interpretability in vision tasks, especially vision-based RL tasks [18, 40]. As multiple groups of attention values are computed (e.g., each sub-graph is associated with d_low attention values), we aggregate them to facilitate the explanation. Specifically, for each channel (i.e., textual component or sub-graph), we sum its 25 largest values as its attention value, then perform a softmax operation across the channels. We denote the obtained attention values α_{high,top25sum} for the high level, which correspond to ⟨o_{text,desc}, o_{text,inv}, o_{text,feed}, a_{t-1}⟩, and α_{low,top25sum} for the low level, which correspond to ⟨o_{KG,1}, o_{KG,2}, o_{KG,3}, o_{KG,4}⟩. Fig. 5 (left) shows a decision making example from the game "ztuu". In the high level (textual components), the agent focuses mostly on the description o_{text,desc}, then on the feedback o_{text,feed}, followed by the last action a_{t-1}; all three of these textual components contain "implement". In the low level (sub-graphs), the sub-graph of objects in the current room (o_{KG,2}) has the highest attention. Combining both levels of attention, the agent finally selects the action "lower implement", which receives a positive reward. This shows that the reasoning process enables the agent to select actions leading to positive rewards.
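The "top25, sum" aggregation described above can be sketched directly. Treating channels as columns, and clamping k to the number of positions, are assumptions of this sketch:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def topk_sum_attention(alpha, k=25):
    """Aggregate per-channel attention for interpretation: sum the k largest
    values within each channel (column), then softmax across channels."""
    d, _ = alpha.shape
    k = min(k, d)
    # Sort each column ascending and sum its k largest entries.
    per_channel = np.sort(alpha, axis=0)[-k:, :].sum(axis=0)
    return softmax(per_channel)

rng = np.random.default_rng(0)
alpha_high = rng.random((100, 4))  # d_high = 100 positions, c = 4 components
agg = topk_sum_attention(alpha_high, k=25)
```

The result is a single normalized attention value per channel, which is what Fig. 5 visualizes for the textual components and sub-graphs.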
Although the GATs in our work are mainly used for obtaining initial graph embeddings instead of assigning attention values, we also visualize the node-level attention within sub-graphs to help understand the reasoning process. Fig. 5 (right) shows three extracted sub-graphs. The digit under each sub-graph denotes its graph-level attention. Since o_KG,2 has the highest attention, the agent focuses more on the objects it contains. In each sub-graph, the nodes with the top-3 highest attention (by GATs) are highlighted in yellow. Such node-level attention helps to further (softly) constrain the objects in o_KG,2 from which actions are derived. We conclude that our SHA helps the agent to use information efficiently for taking actions.
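Selecting the highlighted top-3 nodes per sub-graph amounts to a simple argsort over the node-level attention scores; a sketch with hypothetical scores (names and values are ours):

```python
import numpy as np

def top_nodes(node_attention, k=3):
    # Indices of the k nodes with the highest attention in a sub-graph,
    # highest first (the nodes highlighted in yellow in Fig. 5, right).
    return np.argsort(np.asarray(node_attention))[::-1][:k].tolist()

# Hypothetical attention scores over five nodes of one sub-graph.
highlighted = top_nodes([0.10, 0.45, 0.05, 0.30, 0.10])
```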
In this paper, we have studied empowering RL for text-based games with reasoning by exploiting knowledge graphs. We conducted sub-graph division to explicitly introduce relational-awareness and temporal-awareness. Then we designed a stacked hierarchical attention mechanism to obtain effective state representations from multi-modal inputs. Besides obtaining favorable experimental results on a wide range of man-made games, the sub-graph division and attention mechanism enable us to better interpret the reasoning and decision making processes of RL agents.

We also tried other aggregation methods, such as "top10, sum", "top25, mean" and "all, sum", among which we found that "top25, sum" best interprets the processes. See the supplementary material.

Broader Impact

The high-level goal of this work is to bridge artificial intelligence with human intelligence, cognition and language learning. Researchers in reinforcement learning will benefit from this work through the appropriate use of knowledge graphs. By recording and organizing information in a structural way, the difficulty of learning can be largely reduced. Besides, KGs can be used to conduct reasoning to interpret the decision making process. Researchers in multi-modal learning will also benefit from the attention mechanism proposed in this work. Although the stacked hierarchical attention is used to aggregate text representation and graph representation, it can also be extended to other forms of inputs such as visual and audio signals. Our work can also serve as an initial study before conducting experiments in real life and with animal / human participants, since it is performed in simulated systems and there is no safety consideration.

Regarding the ethical implications, although currently our work is conducted in games, where language commands are constrained to limited action spaces, in more practical applications this system would be deployed with a richer corpus.
From the perspective of language generation, the inappropriate use of generated language commands should be taken seriously into consideration. Another concern lies in unintended behavior and decision making, which may lead to dangerous conditions in real-world applications. Although we try to improve interpretability through reasoning, there is still a long way to go to make human-AI interaction safe and reasonable.
Acknowledgements
We would like to thank the anonymous reviewers for helpful feedback. This research was partly supported by ARC Discovery Project DP180100966.
References

[1] Ashutosh Adhikari, Xingdi Yuan, Marc-Alexandre Côté, Mikuláš Zelinka, Marc-Antoine Rondeau, Romain Laroche, Pascal Poupart, Jian Tang, Adam Trischler, and William L Hamilton. Learning dynamic knowledge graphs to generalize on text-based games. arXiv preprint arXiv:2002.09127, 2020.
[2] Leonard Adolphs and Thomas Hofmann. Ledeepchef: Deep reinforcement learning agent for families of text-based games. arXiv preprint arXiv:1909.01646, 2019.
[3] Prithviraj Ammanabrolu and Matthew Hausknecht. Graph constrained reinforcement learning for natural language action spaces. In International Conference on Learning Representations, 2020.
[4] Prithviraj Ammanabrolu and Mark Riedl. Playing text-adventure games with graph-based deep reinforcement learning. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3557–3565, 2019.
[5] Prithviraj Ammanabrolu and Mark O Riedl. Transfer in deep reinforcement learning using knowledge graphs. arXiv preprint arXiv:1908.06556, 2019.
[6] Gabor Angeli, Melvin Jose Johnson Premkumar, and Christopher D Manning. Leveraging linguistic structure for open domain information extraction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 344–354, 2015.
[7] Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C Lawrence Zitnick, and Devi Parikh. Vqa: Visual question answering. In Proceedings of the IEEE international conference on computer vision, pages 2425–2433, 2015.
[8] Timothy Atkinson, Hendrik Baier, Tara Copplestone, Sam Devlin, and Jerry Swan. The text-based adventure ai competition. IEEE Transactions on Games, 2019.
[9] Lisa Bauer, Yicheng Wang, and Mohit Bansal. Commonsense for generative multi-hop question answering tasks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4220–4230, 2018.
[10] Léon Bottou. From machine learning to machine reasoning. Machine learning, 94(2):133–149, 2014.
[11] Long Chen, Hanwang Zhang, Jun Xiao, Liqiang Nie, Jian Shao, Wei Liu, and Tat-Seng Chua. Sca-cnn: Spatial and channel-wise attention in convolutional networks for image captioning. In , pages 6298–6306. IEEE, 2017.
[12] Xiaojun Chen, Shengbin Jia, and Yang Xiang. A review: Knowledge reasoning over knowledge graph. Expert Systems with Applications, 141:112948, 2020.
[13] Marc-Alexandre Côté, Ákos Kádár, Xingdi Yuan, Ben Kybartas, Tavian Barnes, Emery Fine, James Moore, Matthew Hausknecht, Layla El Asri, Mahmoud Adada, et al. Textworld: A learning environment for text-based games. 2018.
[14] Daniel C Dennett. The role of language in intelligence. In What is Intelligence? The Darwin College Lectures. Cambridge Univ. Press, 1994.
[15] Ming Ding, Chang Zhou, Qibin Chen, Hongxia Yang, and Jie Tang. Cognitive graph for multi-hop reading comprehension at scale. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2694–2703, 2019.
[16] Nancy Fulda, Daniel Ricks, Ben Murdoch, and David Wingate. What can you do with a rock? affordance extraction via word embeddings. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pages 1039–1045, 2017.
[17] Zhe Gan, Yu Cheng, Ahmed Kholy, Linjie Li, Jingjing Liu, and Jianfeng Gao. Multi-step reasoning via recurrent dual attention for visual dialog. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6463–6474, 2019.
[18] Sam Greydanus, Anurag Koul, Jonathan Dodge, and Alan Fern. Visualizing and understanding atari agents. arXiv preprint arXiv:1711.00138, 2017.
[19] Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. A survey of methods for explaining black box models. ACM computing surveys (CSUR), 51(5):1–42, 2018.
[20] Matthew Hausknecht, Prithviraj Ammanabrolu, Marc-Alexandre Côté, and Xingdi Yuan. Interactive fiction games: A colossal adventure. arXiv preprint arXiv:1909.05398, 2019.
[21] Matthew Hausknecht, Ricky Loynd, Greg Yang, Adith Swaminathan, and Jason D Williams. Nail: A general interactive fiction agent. arXiv preprint arXiv:1902.04259, 2019.
[22] Ji He, Jianshu Chen, Xiaodong He, Jianfeng Gao, Lihong Li, Li Deng, and Mari Ostendorf. Deep reinforcement learning with a natural language action space. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1621–1630, 2016.
[23] Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Kate Saenko. Learning to reason: End-to-end module networks for visual question answering. In Proceedings of the IEEE International Conference on Computer Vision, pages 804–813, 2017.
[24] Drew Arad Hudson and Christopher D. Manning. Compositional attention networks for machine reasoning. In International Conference on Learning Representations, 2018.
[25] Sarthak Jain and Byron C Wallace. Attention is not explanation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3543–3556, 2019.
[26] Vishal Jain, William Fedus, Hugo Larochelle, Doina Precup, and Marc G Bellemare. Algorithmic improvements for deep reinforcement learning applied to interactive fiction. arXiv preprint arXiv:1911.12511, 2019.
[27] Shaoxiong Ji, Shirui Pan, Erik Cambria, Pekka Marttinen, and Philip S Yu. A survey on knowledge graphs: Representation, acquisition and applications. arXiv preprint arXiv:2002.00388, 2020.
[28] Zhong Ji, Yanwei Fu, Jichang Guo, Yanwei Pang, Zhongfei Mark Zhang, et al. Stacked semantics-guided attention model for fine-grained zero-shot learning. In Advances in Neural Information Processing Systems, pages 5995–6004, 2018.
[29] Jin-Hwa Kim, Jaehyun Jun, and Byoung-Tak Zhang. Bilinear attention networks. In Advances in Neural Information Processing Systems, pages 1564–1574, 2018.
[30] Wonjae Kim and Yoonho Lee. Learning dynamics of attention: Human prior for interpretable machine reasoning. In Advances in Neural Information Processing Systems, pages 6019–6030, 2019.
[31] Bartosz Kostka, Jaroslaw Kwiecieli, Jakub Kowalski, and Pawel Rychlikowski. Text-based adventures of the golovin ai agent. In , pages 181–188. IEEE, 2017.
[32] Bill Yuchen Lin, Xinyue Chen, Jamin Chen, and Xiang Ren. Kagnet: Knowledge-aware graph networks for commonsense reasoning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2822–2832, 2019.
[33] Grace W Lindsay. Attention in psychology, neuroscience, and machine learning. Frontiers in Computational Neuroscience, 14:29, 2020.
[34] Jiasen Lu, Jianwei Yang, Dhruv Batra, and Devi Parikh. Hierarchical question-image co-attention for visual question answering. In Advances in neural information processing systems, pages 289–297, 2016.
[35] Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson, and Tim Rocktäschel. A survey of reinforcement learning informed by natural language. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pages 6309–6317. International Joint Conferences on Artificial Intelligence Organization, 7 2019.
[36] David Mascharka, Philip Tran, Ryan Soklaski, and Arjun Majumdar. Transparency by design: Closing the gap between performance and interpretability in visual reasoning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4942–4950, 2018.
[37] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International conference on machine learning, pages 1928–1937, 2016.
[38] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing atari with deep reinforcement learning. In NIPS Deep Learning Workshop, 2013.
[39] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 02 2015.
[40] Alexander Mott, Daniel Zoran, Mike Chrzanowski, Daan Wierstra, and Danilo Jimenez Rezende. Towards interpretable reinforcement learning using attention augmented agents. In Advances in Neural Information Processing Systems, pages 12329–12338, 2019.
[41] Keerthiram Murugesan, Mattia Atzeni, Pushkar Shukla, Mrinmaya Sachan, Pavan Kapanipathi, and Kartik Talamadupula. Enhancing text-based reinforcement learning agents with commonsense knowledge. arXiv preprint arXiv:2005.00811, 2020.
[42] Karthik Narasimhan, Tejas D Kulkarni, and Regina Barzilay. Language understanding for text-based games using deep reinforcement learning. In Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, pages 1–11. Association for Computational Linguistics (ACL), 2015.
[43] Steven Pinker. The language instinct: How the mind creates language. Penguin UK, 2003.
[44] Robyn Speer, Joshua Chin, and Catherine Havasi. Conceptnet 5.5: An open multilingual graph of general knowledge. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[45] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008, 2017.
[46] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In International Conference on Learning Representations, 2018.
[47] Hongwei Wang, Fuzheng Zhang, Xing Xie, and Minyi Guo. Dkn: Deep knowledge-aware network for news recommendation. In Proceedings of the 2018 world wide web conference, pages 1835–1844, 2018.
[48] Xiang Wang, Dingxian Wang, Canran Xu, Xiangnan He, Yixin Cao, and Tat-Seng Chua. Explainable reasoning over knowledge graphs for recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 5329–5336, 2019.
[49] Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui, and Philip S Yu. Heterogeneous graph attention network. In The World Wide Web Conference, pages 2022–2032, 2019.
[50] Sam Wenke, Dan Saunders, Mike Qiu, and Jim Fleming. Reasoning and generalization in rl: A tool use perspective. arXiv preprint arXiv:1907.02050, 2019.
[51] Sarah Wiegreffe and Yuval Pinter. Attention is not not explanation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 11–20, 2019.
[52] Yikun Xian, Zuohui Fu, S Muthukrishnan, Gerard De Melo, and Yongfeng Zhang. Reinforcement knowledge graph reasoning for explainable recommendation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 285–294, 2019.
[53] Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, and Alex Smola. Stacked attention networks for image question answering. In , pages 21–29. IEEE, 2016.
[54] Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. Hierarchical attention networks for document classification. In Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies, pages 1480–1489, 2016.
[55] Xusen Yin and Jonathan May. Comprehensible context-driven text game playing. , pages 1–8, 2019.
[56] Xingdi (Eric) Yuan, Marc-Alexandre Côté, Alessandro Sordoni, Romain Laroche, Remi Tachet des Combes, Matthew Hausknecht, and Adam Trischler. Counting to explore and generalize in text-based games. In European Workshop on Reinforcement Learning (EWRL), October 2018.
[57] Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J Mankowitz, and Shie Mannor. Learn what not to learn: Action elimination with deep reinforcement learning. In Advances in Neural Information Processing Systems, pages 3562–3573, 2018.
[58] Mikulas Zelinka, Xingdi Yuan, Marc-Alexandre Côté, Romain Laroche, and Adam Trischler. Building dynamic knowledge graphs from text-based games. arXiv preprint arXiv:1910.09532, 2019.
[59] Chuxu Zhang, Dongjin Song, Chao Huang, Ananthram Swami, and Nitesh V Chawla. Heterogeneous graph neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 793–803, 2019.
[60] Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie, and Wei-Ying Ma. Collaborative knowledge base embedding for recommender systems. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 353–362, 2016.
[61] Yuyu Zhang, Hanjun Dai, Zornitsa Kozareva, Alexander J Smola, and Le Song. Variational reasoning for question answering with knowledge graph. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[62] Zhao Zhang, Fuzhen Zhuang, Meng Qu, Fen Lin, and Qing He. Knowledge graph embedding with hierarchical relation structure. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3198–3207, 2018.

Supplementary Material: Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games
In the supplementary material, we describe the training details, examples of the game interface and interactions used in the paper.
A Training details
We train our model using the Advantage Actor Critic (A2C) method [37] across valid actions. Given s_t, the valid action set Valid(s_t) consists of actions not leading to meaningless feedback (e.g., "Nothing happens"). The function to obtain the valid action set is provided by Jericho [20]. We denote all learnable parameters of our model as θ_t. After obtaining the state representation vector v_t, we use a critic network to estimate the value V(v_t) and treat the decoders for templates and objects as different policies: π_T and π_O. We first compute the advantage A(v_t, a_t), then compute the policy loss L_π(v_t, a_t; θ_t) and the critic loss L_critic(v_t, a_t; θ_t) accordingly. We also add an entropy loss L_E(v_t, a_t; θ_t) to encourage diversity.

Q-value function:
    Q(v_t, a_t) = r_t + γ V(v_{t+1})    (10)

Advantage function:
    A(v_t, a_t) = Q(v_t, a_t) − V(v_t)    (11)

Actor loss:
    L_π(v_t, a_t; θ_t) = E[ ( −log π_T(u | v_t; θ_t) − Σ_{i=1}^{n} log π_{O_i}(p_i | v_t, u, ..., p_{i−1}; θ_t) ) A(v_t, a_t) ]    (12)

Critic loss:
    L_critic(v_t, a_t; θ_t) = E[ r_t + γ V(v_{t+1}) − V(v_t; θ_t) ]    (13)

Entropy loss:
    L_E(v_t, a_t; θ_t) = Σ_{a ∈ Valid(s_t)} P(a | v_t) log P(a | v_t)    (14)

Similar to KG-A2C [3], a supervised auxiliary task, "valid action prediction", is introduced to assist RL training.
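For concreteness, the A2C losses in Eqs. (10)–(14) could look roughly like the following plain-Python sketch. Variable names are our own, and we use a squared TD error for the critic (a common A2C choice); this is not the authors' released implementation.

```python
import math

def a2c_losses(value, next_value, reward, logp_template, logp_objects,
               valid_logps, gamma=0.9):
    # Eq. (10): one-step Q estimate bootstrapped from the critic.
    q = reward + gamma * next_value
    # Eq. (11): advantage.
    advantage = q - value
    # Eq. (12): actor loss over the template decoder plus object decoders.
    logp_action = logp_template + sum(logp_objects)
    actor_loss = -logp_action * advantage
    # Eq. (13): critic loss (squared TD error here).
    critic_loss = (q - value) ** 2
    # Eq. (14): entropy term over valid actions (sum of p * log p).
    entropy_loss = sum(math.exp(lp) * lp for lp in valid_logps)
    return actor_loss, critic_loss, entropy_loss

actor, critic, entropy = a2c_losses(
    value=0.5, next_value=1.0, reward=1.0,
    logp_template=math.log(0.5), logp_objects=[math.log(0.5)],
    valid_logps=[math.log(0.5), math.log(0.5)])
```

In training, the critic and entropy terms would be weighted by their coefficients and combined with the auxiliary losses, as in Eq. (17).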
We first extract the valid template set T_valid(s_t) = {τ_1, τ_2, ..., τ_N} from the valid action set Valid(s_t), and the valid object set O_valid(s_t) = {o_1, o_2, ..., o_M} from the full knowledge graph o_{t,KG}. Then we treat "whether the predicted template / objects are valid" as binary classification, thus introducing two cross entropy loss functions: L_T(v_t, a_t; θ_t) for the template and L_O(v_t, a_t; θ_t) for the objects.

Template loss:
    L_T(v_t, a_t; θ_t) = (1/N) Σ_{i=1}^{N} ( y_{τ_i} log π_T(τ_i | v_t) + (1 − y_{τ_i})(1 − log π_T(τ_i | v_t)) )    (15)

Object loss:
    L_O(v_t, a_t; θ_t) = Σ_{j=1}^{k} (1/M) Σ_{i=1}^{M} ( y_{o_i} log π_{O_j}(o_i | v_t) + (1 − y_{o_i})(1 − log π_{O_j}(o_i | v_t)) )    (16)

where k denotes the number of object decoding steps (there are at most k objects in an action). We have y_{τ_i} = 1 if the template τ_i ∈ T_valid(s_t), else 0. Similarly, y_{o_i} = 1 if the object o_i ∈ O_valid(s_t), else 0. θ_t is optimized jointly with the total loss L_total(v_t, a_t; θ_t), which is the weighted sum of the above five losses:

    L_total(v_t, a_t; θ_t) = L_π + λ_critic L_critic + λ_E L_E + λ_T L_T + λ_O L_O    (17)

where λ_critic, λ_E, λ_T and λ_O are coefficients.

Observation: Convention Hall. You are in attendance at the annual Grue Convention, this year a rather somber affair due to the "adventurer famine" that has gripped gruedom in this isolated corner of the empire. All around you, grues are standing; some in conversation, some drinking, some even paying attention to the speaker. There is a trash chute in one of the walls, and a reader board hangs nearby. You can see some grues, a hat, some clothes and some chewed-up shoes here.

> Action: take glasses

Observation: You take the pair of glasses and place them under your costume. You had better hurry; the steady gurgling is becoming harder and harder for you to maintain. [Your score has just gone up by five points.]
> Action: remove mask
Observation: You take off the fish-mouthed mask.
Figure 6: Interface of the game "ztuu". An action (blue) is a textual command, and the raw observation (black) is the textual feedback of previous actions.
B Interface
Fig. 6 shows an example of the raw interface of the game "ztuu", where raw textual observations contain only the feedback of taking an action.
C Interaction examples
In this section, we show the first 15 interaction steps of two games: "zork1" and "ztuu". Each interaction step consists of the current textual observation, triplets extracted from the current textual observation, high-level and low-level attention values obtained via different aggregation methods, the chosen action, and the reward.
C.1 zork1

===== Step 1 =====
1. Textual obs:
o_desc: West of House. You are standing in an open field west of a white house, with a boarded front door. There is a small mailbox here.
o_inv: You are empty-handed.
o_feed: Copyright (c) 1981, 1982, 1983 Infocom, Inc. All rights reserved. ZORK is a registered trademark of Infocom, Inc. Revision 88 / Serial number 840726 West of House You are standing in an open field west of a white house, with a boarded front door. There is a small mailbox here.
a_past: look
2. Newly extracted triplets:
[('West of House', 'has', 'exit to west'), ('all', 'in', 'west of white house with boarded front door'), ('door', 'in', 'west of white house with boarded front door'), ('house', 'in', 'west of white house with boarded front door'), ('mailbox', 'in', 'west of white house with boarded front door'), ('you', 'in', 'field'), ('you', 'in', 'open field'), ('you', 'in', 'west'), ('you', 'in', 'west of house'), ('you', 'in', 'west of house with boarded front door'), ('you', 'in', 'west of house with front door'), ('you', 'in', 'west of white house'), ('you', 'in', 'west of white house with boarded front door'), ('you', 'in', 'west of white house with front door'), ('you', 'in', 'west with boarded front door'), ('you', 'in', 'west with front door')]
3. Attention values:
attH: o_desc, o_inv, o_feed, a_past
attH_max:        [0.262, 0.262, 0.262, 0.213]
attH_mean:       [0.320, 0.205, 0.267, 0.208]
attH_sum:        [1.000, 0.000, 0.000, 0.000]
attH_top10_mean: [0.308, 0.202, 0.308, 0.182]
attH_top10_sum:  [0.497, 0.007, 0.493, 0.003]
attH_top25_mean: [0.342, 0.169, 0.316, 0.172]
attH_top25_sum:  [0.881, 0.000, 0.119, 0.000]
attH_top50_mean: [0.361, 0.174, 0.286, 0.179]
attH_top50_sum:  [1.000, 0.000, 0.000, 0.000]
attL: connectivity, item_in_room, item_in_inv, history
attL_max:        [0.250, 0.250, 0.250, 0.250]
attL_mean:       [0.250, 0.250, 0.250, 0.250]
attL_sum:        [0.251, 0.250, 0.251, 0.248]
attL_top10_mean: [0.250, 0.250, 0.250, 0.250]
attL_top10_sum:  [0.249, 0.251, 0.248, 0.252]
attL_top25_mean: [0.250, 0.250, 0.250, 0.250]
attL_top25_sum:  [0.249, 0.251, 0.248, 0.252]
attL_top50_mean: [0.250, 0.250, 0.250, 0.250]
attL_top50_sum:  [0.251, 0.250, 0.251, 0.248]
4. Chosen action and reward:
Action: west
Reward: 0 | Score: 0

===== Step 2 =====
1. Textual obs:
o_desc: Forest This is a forest, with trees in all directions. To the east, there appears to be sunlight.
o_inv: You are empty-handed.
o_feed: Forest This is a forest, with trees in all directions. To the east, there appears to be sunlight.
a_past: west
2. Newly extracted triplets:
[('Forest', 'has', 'exit to east'), ('all', 'in', 'Forest'), ('forest', 'is with', 'trees in directions'), ('forest this', 'is', 'forest'), ('forest this', 'is forest with', 'trees in directions'), ('trees', 'in', 'Forest'), ('trees', 'is in', 'directions'), ('west of white house with boarded front door', 'west of', 'Forest')]
3. Attention values:
attH: o_desc, o_inv, o_feed, a_past
attH_max:        [0.264, 0.265, 0.265, 0.206]
attH_mean:       [0.317, 0.207, 0.264, 0.213]
attH_sum:        [1.000, 0.000, 0.000, 0.000]
attH_top10_mean: [0.304, 0.216, 0.304, 0.176]
attH_top10_sum:  [0.491, 0.016, 0.491, 0.002]
attH_top25_mean: [0.342, 0.174, 0.308, 0.175]
attH_top25_sum:  [0.930, 0.000, 0.070, 0.000]
attH_top50_mean: [0.361, 0.176, 0.279, 0.184]
attH_top50_sum:  [1.000, 0.000, 0.000, 0.000]
attL: connectivity, item_in_room, item_in_inv, history
attL_max:        [0.250, 0.250, 0.250, 0.250]
attL_mean:       [0.250, 0.250, 0.250, 0.250]
attL_sum:        [0.252, 0.250, 0.249, 0.249]
attL_top10_mean: [0.250, 0.250, 0.250, 0.250]
attL_top10_sum:  [0.250, 0.249, 0.248, 0.253]
attL_top25_mean: [0.250, 0.250, 0.250, 0.250]
attL_top25_sum:  [0.250, 0.249, 0.247, 0.253]
attL_top50_mean: [0.250, 0.250, 0.250, 0.250]
attL_top50_sum:  [0.252, 0.250, 0.249, 0.249]
4. Chosen action and reward:
Action: south
Reward: 0 | Score: 0

===== Step 3 =====
1. Textual obs:
o_desc: Forest This is a dimly lit forest, with large trees all around.
o_inv: You are empty-handed.
o_feed: Forest This is a dimly lit forest, with large trees all around. You hear in the distance the chirping of a song bird.
a_past: south
2. Newly extracted triplets:
[('all', 'in', 'Forest'), ('forest this', 'is', 'dimly lit forest'), ('forest this', 'is', 'forest'), ('forest this', 'is', 'lit forest'), ('forest this', 'is dimly lit forest with', 'large trees'), ('forest this', 'is dimly lit forest with', 'large trees around'), ('forest this', 'is dimly lit forest with', 'trees'), ('forest this', 'is dimly lit forest with', 'trees around'), ('forest this', 'is forest with', 'large trees'), ('forest this', 'is forest with', 'large trees around'), ('forest this', 'is forest with', 'trees'), ('forest this', 'is forest with', 'trees around'), ('forest this', 'is lit forest with', 'large trees'), ('forest this', 'is lit forest with', 'large trees around'), ('forest this', 'is lit forest with', 'trees'), ('forest this', 'is lit forest with', 'trees around'), ('large', 'in', 'Forest'), ('lit forest', 'is with', 'large trees around'), ('this', 'is', 'lit'), ('trees', 'in', 'Forest'), ('west of white house with boarded front door', 'south of', 'Forest'), ('you', 'hear', 'chirping'), ('you', 'hear', 'chirping of song bird'), ('you', 'hear chirping in', 'distance')]
3. Attention values:
attH: o_desc, o_inv, o_feed, a_past
attH_max:        [0.261, 0.261, 0.260, 0.218]
attH_mean:       [0.309, 0.208, 0.255, 0.228]
attH_sum:        [1.000, 0.000, 0.000, 0.000]
attH_top10_mean: [0.294, 0.215, 0.292, 0.199]
attH_top10_sum:  [0.502, 0.022, 0.465, 0.010]
attH_top25_mean: [0.334, 0.176, 0.290, 0.199]
attH_top25_sum:  [0.973, 0.000, 0.027, 0.000]
attH_top50_mean: [0.350, 0.179, 0.263, 0.208]
attH_top50_sum:  [1.000, 0.000, 0.000, 0.000]
attL: connectivity, item_in_room, item_in_inv, history
attL_max:        [0.250, 0.250, 0.250, 0.251]
attL_mean:       [0.250, 0.250, 0.250, 0.250]
attL_sum:        [0.252, 0.251, 0.250, 0.248]
attL_top10_mean: [0.250, 0.250, 0.250, 0.250]
attL_top10_sum:  [0.249, 0.249, 0.247, 0.255]
attL_top25_mean: [0.250, 0.250, 0.250, 0.250]
attL_top25_sum:  [0.250, 0.249, 0.246, 0.255]
attL_top50_mean: [0.250, 0.250, 0.250, 0.250]
attL_top50_sum:  [0.252, 0.251, 0.250, 0.248]
4. Chosen action and reward:
Action: south
Reward: 0 | Score: 0

===== Step 4 =====
1. Textual obs:
o_desc: Forest This is a dimly lit forest, with large trees all around.
o_inv: You are empty-handed.
o_feed: Storm-tossed trees block your way.
a_past: south
2. Newly extracted triplets:
[('all', 'in', 'Forest'), ('large', 'in', 'Forest'), ('lit forest', 'is with', 'large trees around'), ('this', 'is', 'lit'), ('trees', 'block', 'your way'), ('trees', 'in', 'Forest'), ('west of white house with boarded front door', 'south of', 'Forest')]
3. Attention values:
attH: o_desc, o_inv, o_feed, a_past
attH_max:        [0.263, 0.264, 0.263, 0.210]
attH_mean:       [0.315, 0.207, 0.253, 0.224]
attH_sum:        [1.000, 0.000, 0.000, 0.000]
attH_top10_mean: [0.297, 0.212, 0.297, 0.194]
attH_top10_sum:  [0.488, 0.017, 0.488, 0.007]
attH_top25_mean: [0.338, 0.174, 0.294, 0.194]
attH_top25_sum:  [0.971, 0.000, 0.029, 0.000]
attH_top50_mean: [0.357, 0.178, 0.262, 0.203]
attH_top50_sum:  [1.000, 0.000, 0.000, 0.000]
attL: connectivity, item_in_room, item_in_inv, history
attL_max:        [0.250, 0.250, 0.250, 0.251]
attL_mean:       [0.250, 0.250, 0.250, 0.250]
attL_sum:        [0.252, 0.
2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 4 7 ’ ]a t t L _ t o p 1 0 _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ t o p 1 0 _ s u m : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 4 8 ’ , ’ 0 . 2 4 7 ’ , ’ 0 . 2 5 5 ’ ]a t t L _ t o p 2 5 _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ t o p 2 5 _ s u m : [ ’ 0 . 2 5 2 ’ , ’ 0 . 2 4 8 ’ , ’ 0 . 2 4 6 ’ , ’ 0 . 2 5 4 ’ ]a t t L _ t o p 5 0 _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ t o p 5 0 _ s u m : [ ’ 0 . 2 5 2 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 4 7 ’ ]===== 4 . Chosen a c t i o n and r e w a r dA c t i o n : w e s tReward : 0 | S c o r e : 0−−−−− ===== S t e p 5 ===== −−−−−===== 1 . T e x t u a l o b s :o _ d e s c : F o r e s t T h i s i s a f o r e s t , w i t h t r e e s i n a l l d i r e c t i o n s . Tot h e e a s t , t h e r e a p p e a r s t o be s u n l i g h t .o _ i n v : You a r e empty − h a n d e d .o _ f e e d : F o r e s ta _ p a s t : w e s t===== 2 . Newly e x t r a c t e d t r i p l e t s[ ( ’ F o r e s t ’ , ’ h a s ’ , ’ e x i t t o e a s t ’ ) , ( ’ a l l ’ , ’ i n ’ , ’ F o r e s t ’ ) , ( ’f o r e s t ’ , ’ i s w i t h ’ , ’ t r e e s i n d i r e c t i o n s ’ ) , ( ’ t r e e s ’ , ’ i n ’ , ’F o r e s t ’ ) , ( ’ t r e e s ’ , ’ i s i n ’ , ’ d i r e c t i o n s ’ ) , ( ’ w e s t o f w h i t eh o u s e w i t h b o a r d e d f r o n t d o o r ’ , ’ w e s t o f ’ , ’ F o r e s t ’ ) ]===== 3 . A t t e n t i o n v a l u e s :−−−−− a t t H : o _ d e s c , o _ i n v , o _ f e e d , a _ p a s tattH_max : [ ’ 0 . 2 6 8 ’ , ’ 0 . 2 6 7 ’ , ’ 0 . 2 6 7 ’ , ’ 0 . 1 9 8 ’ ]a t t H _ m e a n : [ ’ 0 . 3 2 5 ’ , ’ 0 . 2 0 6 ’ , ’ 0 . 2 5 9 ’ , ’ 0 . 2 1 0 ’ ]a t t H _ s u m : [ ’ 1 . 0 0 0 ’ , ’ 0 . 0 0 0 ’ , ’ 0 . 0 0 0 ’ , ’ 0 . 0 0 0 ’ ]a t t H _ t o p 1 0 _ m e a n : [ ’ 0 . 3 0 8 ’ , ’ 0 . 2 1 3 ’ , ’ 0 . 3 0 4 ’ , ’ 0 . 1 7 5 ’ ]a t t H _ t o p 1 0 _ s u m : [ ’ 0 . 5 2 3 ’ , ’ 0 . 
0 1 3 ’ , ’ 0 . 4 6 2 ’ , ’ 0 . 0 0 2 ’ ]a t t H _ t o p 2 5 _ m e a n : [ ’ 0 . 3 4 9 ’ , ’ 0 . 1 7 4 ’ , ’ 0 . 3 0 2 ’ , ’ 0 . 1 7 5 ’ ]a t t H _ t o p 2 5 _ s u m : [ ’ 0 . 9 7 3 ’ , ’ 0 . 0 0 0 ’ , ’ 0 . 0 2 7 ’ , ’ 0 . 0 0 0 ’ ]a t t H _ t o p 5 0 _ m e a n : [ ’ 0 . 3 7 2 ’ , ’ 0 . 1 7 6 ’ , ’ 0 . 2 7 1 ’ , ’ 0 . 1 8 1 ’ ]a t t H _ t o p 5 0 _ s u m : [ ’ 1 . 0 0 0 ’ , ’ 0 . 0 0 0 ’ , ’ 0 . 0 0 0 ’ , ’ 0 . 0 0 0 ’ ]−−−−− a t t L : c o n n e c t i v i t y , i t e m _ i n _ r o o m , i t e m _ i n _ i n v , h i s t o r ya t t L _ m a x : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ s u m : [ ’ 0 . 2 5 1 ’ , ’ 0 . 2 4 9 ’ , ’ 0 . 2 4 9 ’ , ’ 0 . 2 5 1 ’ ]a t t L _ t o p 1 0 _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]18 t t L _ t o p 1 0 _ s u m : [ ’ 0 . 2 4 9 ’ , ’ 0 . 2 4 9 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 2 ’ ]a t t L _ t o p 2 5 _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ t o p 2 5 _ s u m : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 4 9 ’ , ’ 0 . 2 4 9 ’ , ’ 0 . 2 5 2 ’ ]a t t L _ t o p 5 0 _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ t o p 5 0 _ s u m : [ ’ 0 . 2 5 1 ’ , ’ 0 . 2 4 9 ’ , ’ 0 . 2 4 9 ’ , ’ 0 . 2 5 1 ’ ]===== 4 . Chosen a c t i o n and r e w a r dA c t i o n : e a s tReward : 0 | S c o r e : 0−−−−− ===== S t e p 6 ===== −−−−−===== 1 . T e x t u a l o b s :o _ d e s c : F o r e s t P a t h T h i s i s a p a t h w i n d i n g t h r o u g h a d i m l y l i tf o r e s t . The p a t h h e a d s n o r t h − s o u t h h e r e . One p a r t i c u l a r l yl a r g e t r e e w i t h some low b r a n c h e s s t a n d s a t t h e e d g e o f t h ep a t h .o _ i n v : You a r e empty − h a n d e d . 
You h e a r i n t h e d i s t a n c e t h e c h i r p i n go f a s o n g b i r d .o _ f e e d : F o r e s t P a t h T h i s i s a p a t h w i n d i n g t h r o u g h a d i m l y l i tf o r e s t . The p a t h h e a d s n o r t h − s o u t h h e r e . One p a r t i c u l a r l yl a r g e t r e e w i t h some low b r a n c h e s s t a n d s a t t h e e d g e o f t h ep a t h .a _ p a s t : e a s t===== 2 . Newly e x t r a c t e d t r i p l e t s[ ( ’ F o r e s t P a t h ’ , ’ h a s ’ , ’ e x i t t o n o r t h ’ ) , ( ’ F o r e s t P a t h ’ , ’ h a s ’ , ’e x i t t o s o u t h ’ ) , ( ’ a l l ’ , ’ i n ’ , ’ F o r e s t P a t h ’ ) , ( ’ f o r e s t ’ , ’ i n ’, ’ F o r e s t P a t h ’ ) , ( ’ one l a r g e t r e e ’ , ’ i s w i t h ’ , ’ low b r a n c h e s ’) , ( ’ p a t h ’ , ’ i n ’ , ’ F o r e s t P a t h ’ ) , ( ’ p a t h ’ , ’ w i n d i n g t h r o u g h ’ ,’ d i m l y l i t f o r e s t ’ ) , ( ’ p a t h ’ , ’ w i n d i n g t h r o u g h ’ , ’ f o r e s t ’ ) , ( ’p a t h ’ , ’ w i n d i n g t h r o u g h ’ , ’ l i t f o r e s t ’ ) , ( ’ t r e e ’ , ’ i n ’ , ’F o r e s t P a t h ’ ) , ( ’ w e s t o f w h i t e h o u s e w i t h b o a r d e d f r o n t d o o r ’ ,’ e a s t o f ’ , ’ F o r e s t P a t h ’ ) ]===== 3 . A t t e n t i o n v a l u e s :−−−−− a t t H : o _ d e s c , o _ i n v , o _ f e e d , a _ p a s tattH_max : [ ’ 0 . 2 5 4 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 5 ’ , ’ 0 . 2 4 1 ’ ]a t t H _ m e a n : [ ’ 0 . 2 8 4 ’ , ’ 0 . 2 2 3 ’ , ’ 0 . 2 7 6 ’ , ’ 0 . 2 1 7 ’ ]a t t H _ s u m : [ ’ 0 . 9 3 2 ’ , ’ 0 . 0 0 0 ’ , ’ 0 . 0 6 8 ’ , ’ 0 . 0 0 0 ’ ]a t t H _ t o p 1 0 _ m e a n : [ ’ 0 . 2 8 1 ’ , ’ 0 . 2 3 1 ’ , ’ 0 . 2 8 7 ’ , ’ 0 . 2 0 1 ’ ]a t t H _ t o p 1 0 _ s u m : [ ’ 0 . 4 1 5 ’ , ’ 0 . 0 5 9 ’ , ’ 0 . 5 1 1 ’ , ’ 0 . 0 1 5 ’ ]a t t H _ t o p 2 5 _ m e a n : [ ’ 0 . 3 0 2 ’ , ’ 0 . 2 0 3 ’ , ’ 0 . 3 1 3 ’ , ’ 0 . 1 8 2 ’ ]a t t H _ t o p 2 5 _ s u m : [ ’ 0 . 2 9 4 ’ , ’ 0 . 0 0 0 ’ , ’ 0 . 7 0 6 ’ , ’ 0 . 
0 0 0 ’ ]a t t H _ t o p 5 0 _ m e a n : [ ’ 0 . 3 0 7 ’ , ’ 0 . 2 0 3 ’ , ’ 0 . 2 9 9 ’ , ’ 0 . 1 9 1 ’ ]a t t H _ t o p 5 0 _ s u m : [ ’ 0 . 7 7 5 ’ , ’ 0 . 0 0 0 ’ , ’ 0 . 2 2 5 ’ , ’ 0 . 0 0 0 ’ ]−−−−− a t t L : c o n n e c t i v i t y , i t e m _ i n _ r o o m , i t e m _ i n _ i n v , h i s t o r ya t t L _ m a x : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ s u m : [ ’ 0 . 2 5 1 ’ , ’ 0 . 2 5 1 ’ , ’ 0 . 2 5 1 ’ , ’ 0 . 2 4 7 ’ ]a t t L _ t o p 1 0 _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ t o p 1 0 _ s u m : [ ’ 0 . 2 4 9 ’ , ’ 0 . 2 5 1 ’ , ’ 0 . 2 4 8 ’ , ’ 0 . 2 5 2 ’ ]a t t L _ t o p 2 5 _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ t o p 2 5 _ s u m : [ ’ 0 . 2 4 9 ’ , ’ 0 . 2 5 2 ’ , ’ 0 . 2 4 7 ’ , ’ 0 . 2 5 3 ’ ]a t t L _ t o p 5 0 _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ t o p 5 0 _ s u m : [ ’ 0 . 2 5 1 ’ , ’ 0 . 2 5 1 ’ , ’ 0 . 2 5 1 ’ , ’ 0 . 2 4 7 ’ ]===== 4 . Chosen a c t i o n and r e w a r dA c t i o n : upReward : 0 | S c o r e : 0−−−−− ===== S t e p 7 ===== −−−−−===== 1 . T e x t u a l o b s : 19 _ d e s c : Up a T r e e You a r e a b o u t 10 f e e t a b o v e t h e g r o u n d n e s t l e damong some l a r g e b r a n c h e s . The n e a r e s t b r a n c h a b o v e you i sa b o v e y o u r r e a c h . B e s i d e you on t h e b r a n c h i s a s m a l l b i r d ’ sn e s t . I n t h e b i r d ’ s n e s t i s a l a r g e egg e n c r u s t e d w i t hp r e c i o u s j e w e l s , a p p a r e n t l y s c a v e n g e d by a c h i l d l e s s s o n g b i r d .The egg i s c o v e r e d w i t h f i n e g o l d i n l a y , and o r n a m e n t e d i nl a p i s l a z u l i and mother − of − p e a r l . 
U n l i k e most eggs , t h i s onei s h i n g e d and c l o s e d w i t h a d e l i c a t e l o o k i n g c l a s p . The egga p p e a r s e x t r e m e l y f r a g i l e .o _ i n v : You a r e empty − h a n d e d .o _ f e e d : Up a T r e e You a r e a b o u t 10 f e e t a b o v e t h e g r o u n d n e s t l e damong some l a r g e b r a n c h e s . The n e a r e s t b r a n c h a b o v e you i sa b o v e y o u r r e a c h . B e s i d e you on t h e b r a n c h i s a s m a l l b i r d ’ sn e s t . I n t h e b i r d ’ s n e s t i s a l a r g e egg e n c r u s t e d w i t hp r e c i o u s j e w e l s , a p p a r e n t l y s c a v e n g e d by a c h i l d l e s s s o n g b i r d .The egg i s c o v e r e d w i t h f i n e g o l d i n l a y , and o r n a m e n t e d i nl a p i s l a z u l i and mother − of − p e a r l . U n l i k e most eggs , t h i s onei s h i n g e d and c l o s e d w i t h a d e l i c a t e l o o k i n g c l a s p . The egga p p e a r s e x t r e m e l y f r a g i l e .a _ p a s t : up===== 2 . 
Newly extracted triplets
[('all', 'in', 'up tree about 10 feet above ground nestled among large branches'), ('branch', 'in', 'up tree about 10 feet above ground nestled among large branches'), ('egg', 'in', 'up tree about 10 feet above ground nestled among large branches'), ('ground', 'in', 'up tree about 10 feet above ground nestled among large branches'), ('nest', 'in', 'up tree about 10 feet above ground nestled among large branches'), ('west of white house with boarded front door', 'up of', 'up tree about 10 feet above ground nestled among large branches'), ('you', 'in', 'about 10 feet'), ('you', 'in', 'about 10 feet above ground'), ('you', 'in', 'about 10 feet above ground nestled'), ('you', 'in', 'about 10 feet above ground nestled among branches'), ('you', 'in', 'about 10 feet above ground nestled among large branches'), ('you', 'in', 'about 10 feet nestled'), ('you', 'in', 'about 10 feet nestled among branches'), ('you', 'in', 'about 10 feet nestled among large branches'), ('you', 'in', 'up tree about 10 feet'), ('you', 'in', 'up tree about 10 feet above ground'), ('you', 'in', 'up tree about 10 feet above ground nestled'), ('you', 'in', 'up tree about 10 feet above ground nestled among branches'), ('you', 'in', 'up tree about 10 feet above ground nestled among large branches'), ('you', 'in', 'up tree about 10 feet nestled'), ('you', 'in', 'up tree about 10 feet nestled among branches'), ('you', 'in', 'up tree about 10 feet nestled among large branches')]
===== 3. Attention values:
----- attH: o_desc, o_inv, o_feed, a_past
attH_max: ['0.258', '0.253', '0.259', '0.231']
attH_mean: ['0.263', '0.202', '0.331', '0.204']
attH_sum: ['0.000', '0.000', '1.000', '0.000']
attH_top10_mean: ['0.311', '0.181', '0.320', '0.187']
attH_top10_sum: ['0.422', '0.002', '0.574', '0.003']
attH_top25_mean: ['0.317', '0.162', '0.354', '0.168']
attH_top25_sum: ['0.061', '0.000', '0.939', '0.000']
attH_top50_mean: ['0.280', '0.167', '0.381', '0.171']
attH_top50_sum: ['0.000', '0.000', '1.000', '0.000']
----- attL: connectivity, item_in_room, item_in_inv, history
attL_max: ['0.250', '0.250', '0.250', '0.250']
attL_mean: ['0.250', '0.250', '0.250', '0.250']
attL_sum: ['0.250', '0.251', '0.250', '0.248']
attL_top10_mean: ['0.250', '0.250', '0.250', '0.250']
attL_top10_sum: ['0.249', '0.251', '0.249', '0.251']
attL_top25_mean: ['0.250', '0.250', '0.250', '0.250']
attL_top25_sum: ['0.250', '0.251', '0.248', '0.251']
attL_top50_mean: ['0.250', '0.250', '0.250', '0.250']
attL_top50_sum: ['0.250', '0.251', '0.250', '0.248']
===== 4. Chosen action and reward
Action: take egg
Reward: 5 | Score: 5
----- ===== Step 8 ===== -----
===== 1. Textual obs:
o_desc: Up a Tree You are about 10 feet above the ground nestled among some large branches. The nearest branch above you is above your reach. Beside you on the branch is a small bird's nest.
o_inv: You are carrying: A jewel-encrusted egg
o_feed: Taken.
a_past: take egg
===== 2. Newly extracted triplets
[('all', 'in', 'about 10 feet'), ('branch', 'in', 'about 10 feet'), ('egg', 'in', 'about 10 feet'), ('ground', 'in', 'about 10 feet'), ('nest', 'in', 'about 10 feet'), ('you', 'have', ''), ('you', 'have', 'jewel-encrusted egg'), ('you', 'in', 'about 10 feet')]
===== 3. Attention values:
----- attH: o_desc, o_inv, o_feed, a_past
attH_max: ['0.263', '0.211', '0.263', '0.262']
attH_mean: ['0.348', '0.201', '0.248', '0.203']
attH_sum: ['1.000', '0.000', '0.000', '0.000']
attH_top10_mean: ['0.318', '0.184', '0.310', '0.188']
attH_top10_sum: ['0.557', '0.002', '0.438', '0.003']
attH_top25_mean: ['0.362', '0.169', '0.296', '0.173']
attH_top25_sum: ['0.993', '0.000', '0.007', '0.000']
attH_top50_mean: ['0.395', '0.173', '0.257', '0.175']
attH_top50_sum: ['1.000', '0.000', '0.000', '0.000']
----- attL: connectivity, item_in_room, item_in_inv, history
attL_max: ['0.250', '0.249', '0.248', '0.253']
attL_mean: ['0.250', '0.250', '0.250', '0.250']
attL_sum: ['0.255', '0.250', '0.251', '0.244']
attL_top10_mean: ['0.250', '0.249', '0.249', '0.251']
attL_top10_sum: ['0.252', '0.245', '0.240', '0.263']
attL_top25_mean: ['0.250', '0.250', '0.249', '0.250']
attL_top25_sum: ['0.256', '0.245', '0.237', '0.262']
attL_top50_mean: ['0.250', '0.250', '0.250', '0.250']
attL_top50_sum: ['0.255', '0.250', '0.251', '0.244']
===== 4. Chosen action and reward
Action: put egg in egg
Reward: 0 | Score: 5
----- ===== Step 9 ===== -----
===== 1. Textual obs:
o_desc: Up a Tree You are about 10 feet above the ground nestled among some large branches. The nearest branch above you is above your reach. Beside you on the branch is a small bird's nest.
o_inv: You are carrying: A jewel-encrusted egg You hear in the distance the chirping of a songbird.
o_feed: The jewel-encrusted egg isn't open.
a_past: put egg in egg
===== 2.
Newly extracted triplets
[('all', 'in', 'about 10 feet'), ('branch', 'in', 'about 10 feet'), ('egg', 'in', 'about 10 feet'), ('ground', 'in', 'about 10 feet'), ('nest', 'in', 'about 10 feet'), ('you', 'have', ''), ('you', 'have', 'hear in the distance the chirping of a songbird.'), ('you', 'have', 'jewel-encrusted egg'), ('you', 'in', 'about 10 feet')]
===== 3. Attention values:
----- attH: o_desc, o_inv, o_feed, a_past
attH_max: ['0.252', '0.244', '0.253', '0.251']
attH_mean: ['0.331', '0.213', '0.250', '0.206']
attH_sum: ['1.000', '0.000', '0.000', '0.000']
attH_top10_mean: ['0.297', '0.208', '0.295', '0.200']
attH_top10_sum: ['0.503', '0.014', '0.474', '0.009']
attH_top25_mean: ['0.346', '0.189', '0.291', '0.174']
attH_top25_sum: ['0.987', '0.000', '0.013', '0.000']
attH_top50_mean: ['0.373', '0.191', '0.258', '0.178']
attH_top50_sum: ['1.000', '0.000', '0.000', '0.000']
----- attL: connectivity, item_in_room, item_in_inv, history
attL_max: ['0.250', '0.249', '0.249', '0.252']
attL_mean: ['0.250', '0.250', '0.250', '0.250']
attL_sum: ['0.252', '0.251', '0.250', '0.247']
attL_top10_mean: ['0.250', '0.250', '0.249', '0.251']
attL_top10_sum: ['0.250', '0.246', '0.244', '0.260']
attL_top25_mean: ['0.250', '0.250', '0.250', '0.250']
attL_top25_sum: ['0.253', '0.246', '0.242', '0.259']
attL_top50_mean: ['0.250', '0.250', '0.250', '0.250']
attL_top50_sum: ['0.252', '0.251', '0.250', '0.247']
===== 4. Chosen action and reward
Action: down
Reward: 0 | Score: 5
----- ===== Step 10 ===== -----
===== 1. Textual obs:
o_desc: Forest Path This is a path winding through a dimly lit forest. The path heads north-south here. One particularly large tree with some low branches stands at the edge of the path.
o_inv: You are carrying: A jewel-encrusted egg You hear in the distance the chirping of a songbird.
o_feed: Forest Path
a_past: down
===== 2.
Newly extracted triplets
[('Forest Path', 'has', 'exit to north'), ('Forest Path', 'has', 'exit to south'), ('about 10 feet', 'down of', 'Forest Path'), ('all', 'in', 'Forest Path'), ('egg', 'in', 'Forest Path'), ('forest', 'in', 'Forest Path'), ('one large tree', 'is with', 'low branches'), ('path', 'in', 'Forest Path'), ('path', 'winding through', 'dimly lit forest'), ('path', 'winding through', 'forest'), ('path', 'winding through', 'lit forest'), ('tree', 'in', 'Forest Path'), ('you', 'have', ''), ('you', 'have', 'hear in the distance the chirping of a songbird.'), ('you', 'have', 'jewel-encrusted egg')]
===== 3. Attention values:
----- attH: o_desc, o_inv, o_feed, a_past
attH_max: ['0.254', '0.253', '0.255', '0.238']
attH_mean: ['0.322', '0.224', '0.248', '0.206']
attH_sum: ['1.000', '0.000', '0.000', '0.000']
attH_top10_mean: ['0.297', '0.222', '0.296', '0.185']
attH_top10_sum: ['0.493', '0.027', '0.475', '0.004']
attH_top25_mean: ['0.337', '0.215', '0.275', '0.173']
attH_top25_sum: ['0.994', '0.000', '0.006', '0.000']
attH_top50_mean: ['0.356', '0.212', '0.252', '0.181']
attH_top50_sum: ['1.000', '0.000', '0.000', '0.000']
----- attL: connectivity, item_in_room, item_in_inv, history
attL_max: ['0.250', '0.250', '0.249', '0.251']
attL_mean: ['0.250', '0.250', '0.250', '0.250']
attL_sum: ['0.251', '0.250', '0.250', '0.249']
attL_top10_mean: ['0.250', '0.250', '0.250', '0.251']
attL_top10_sum: ['0.250', '0.247', '0.247', '0.255']
attL_top25_mean: ['0.250', '0.250', '0.250', '0.250']
attL_top25_sum: ['0.252', '0.247', '0.246', '0.254']
attL_top50_mean: ['0.250', '0.250', '0.250', '0.250']
attL_top50_sum: ['0.251', '0.250', '0.250', '0.249']
===== 4. Chosen action and reward
Action: blow out trees
Reward: 0 | Score: 5
----- ===== Step 11 ===== -----
===== 1. Textual obs:
o_desc: Forest Path This is a path winding through a dimly lit forest. The path heads north-south here. One particularly large tree with some low branches stands at the edge of the path. You hear in the distance the chirping of a songbird.
o_inv: You are carrying: A jewel-encrusted egg You hear in the distance the chirping of a songbird.
o_feed: You can't turn that off.
a_past: blow out trees
===== 2.
Newly extracted triplets
[('Forest Path', 'has', 'exit to north'), ('Forest Path', 'has', 'exit to south'), ('all', 'in', 'Forest Path'), ('egg', 'in', 'Forest Path'), ('forest', 'in', 'Forest Path'), ('one large tree', 'is with', 'low branches'), ('path', 'in', 'Forest Path'), ('path', 'winding through', 'dimly lit forest'), ('path', 'winding through', 'forest'), ('path', 'winding through', 'lit forest'), ('tree', 'in', 'Forest Path'), ('you', 'have', ''), ('you', 'have', 'hear in the distance the chirping of a songbird.'), ('you', 'have', 'jewel-encrusted egg'), ('you', 'hear', 'chirping'), ('you', 'hear', 'chirping of songbird'), ('you', 'hear chirping in', 'distance')]
===== 3. Attention values:
----- attH: o_desc, o_inv, o_feed, a_past
attH_max: ['0.254', '0.248', '0.254', '0.244']
attH_mean: ['0.325', '0.215', '0.253', '0.207']
attH_sum: ['1.000', '0.000', '0.000', '0.000']
attH_top10_mean: ['0.300', '0.203', '0.300', '0.197']
attH_top10_sum: ['0.491', '0.010', '0.492', '0.007']
attH_top25_mean: ['0.341', '0.192', '0.292', '0.175']
attH_top25_sum: ['0.980', '0.000', '0.020', '0.000']
attH_top50_mean: ['0.361', '0.195', '0.263', '0.181']
attH_top50_sum: ['1.000', '0.000', '0.000', '0.000']
----- attL: connectivity, item_in_room, item_in_inv, history
attL_max: ['0.250', '0.249', '0.249', '0.252']
attL_mean: ['0.250', '0.250', '0.250', '0.250']
attL_sum: ['0.252', '0.250', '0.250', '0.247']
attL_top10_mean: ['0.250', '0.250', '0.250', '0.251']
attL_top10_sum: ['0.250', '0.246', '0.245', '0.258']
attL_top25_mean: ['0.250', '0.250', '0.250', '0.250']
attL_top25_sum: ['0.253', '0.246', '0.244', '0.257']
attL_top50_mean: ['0.250', '0.250', '0.250', '0.250']
attL_top50_sum: ['0.252', '0.250', '0.250', '0.247']
===== 4. Chosen action and reward
Action: south
Reward: 0 | Score: 5
----- ===== Step 12 ===== -----
===== 1. Textual obs:
o_desc: North of House You are facing the north side of a white house. There is no door here, and all the windows are boarded up. To the north a narrow path winds through the trees.
o_inv: You are carrying: A jewel-encrusted egg
o_feed: North of House You are facing the north side of a white house. There is no door here, and all the windows are boarded up. To the north a narrow path winds through the trees.
a_past: south
===== 2.
Newly extracted triplets
[('North of House', 'has', 'exit to north'), ('North of House', 'has', 'exit to up'), ('about 10 feet', 'south of', 'north side of white house'), ('all', 'in', 'north side of white house'), ('egg', 'in', 'north side of white house'), ('house', 'in', 'north side of white house'), ('narrow', 'in', 'north side of white house'), ('windows', 'in', 'north side of white house'), ('you', 'have', ''), ('you', 'have', 'jewel-encrusted egg'), ('you', 'in', 'north side'), ('you', 'in', 'north side of house'), ('you', 'in', 'north side of white house'), ('you', 'in', 'side'), ('you', 'in', 'side of house'), ('you', 'in', 'side of white house')]
===== 3. Attention values:
----- attH: o_desc, o_inv, o_feed, a_past
attH_max: ['0.249', '0.254', '0.257', '0.241']
attH_mean: ['0.265', '0.209', '0.275', '0.250']
attH_sum: ['0.028', '0.000', '0.972', '0.000']
attH_top10_mean: ['0.267', '0.196', '0.287', '0.249']
attH_top10_sum: ['0.278', '0.013', '0.571', '0.138']
attH_top25_mean: ['0.271', '0.173', '0.307', '0.249']
attH_top25_sum: ['0.041', '0.000', '0.954', '0.005']
attH_top50_mean: ['0.273', '0.183', '0.296', '0.248']
attH_top50_sum: ['0.017', '0.000', '0.983', '0.000']
----- attL: connectivity, item_in_room, item_in_inv, history
attL_max: ['0.250', '0.250', '0.250', '0.250']
attL_mean: ['0.250', '0.250', '0.250', '0.250']
attL_sum: ['0.252', '0.250', '0.250', '0.248']
attL_top10_mean: ['0.250', '0.250', '0.250', '0.250']
attL_top10_sum: ['0.250', '0.250', '0.249', '0.252']
attL_top25_mean: ['0.250', '0.250', '0.250', '0.250']
attL_top25_sum: ['0.251', '0.250', '0.248', '0.252']
attL_top50_mean: ['0.250', '0.250', '0.250', '0.250']
attL_top50_sum: ['0.252', '0.250', '0.250', '0.248']
===== 4. Chosen action and reward
Action: east
Reward: 0 | Score: 5
----- ===== Step 13 ===== -----
===== 1. Textual obs:
o_desc: Behind House You are behind the white house. A path leads into the forest to the east. In one corner of the house there is a small window which is slightly ajar.
o_inv: You are carrying: A jewel-encrusted egg
o_feed: Behind House You are behind the white house. A path leads into the forest to the east. In one corner of the house there is a small window which is slightly ajar.
a_past: east
===== 2.
Newly e x t r a c t e d t r i p l e t s[ ( ’ B e h i n d House ’ , ’ h a s ’ , ’ e x i t t o e a s t ’ ) , ( ’ a l l ’ , ’ i n ’ , ’ b e h i n dh o u s e w h i t e h o u s e ’ ) , ( ’ egg ’ , ’ i n ’ , ’ b e h i n d h o u s e w h i t e h o u s e ’ ), ( ’ h o u s e ’ , ’ i n ’ , ’ b e h i n d h o u s e w h i t e h o u s e ’ ) , ( ’ n o r t h s i d e o fw h i t e h o u s e ’ , ’ e a s t o f ’ , ’ b e h i n d h o u s e w h i t e h o u s e ’ ) , ( ’ p a t h ’, ’ i n ’ , ’ b e h i n d h o u s e w h i t e h o u s e ’ ) , ( ’ window ’ , ’ i n ’ , ’ b e h i n dh o u s e w h i t e h o u s e ’ ) , ( ’ you ’ , ’ h a v e ’ , ’ ’ ) , ( ’ you ’ , ’ h a v e ’ , ’j e w e l − e n c r u s t e d egg ’ ) , ( ’ you ’ , ’ i n ’ , ’ b e h i n d h o u s e h o u s e ’ ) , ( ’you ’ , ’ i n ’ , ’ b e h i n d h o u s e w h i t e h o u s e ’ ) , ( ’ you ’ , ’ i n ’ , ’ h o u s e ’) , ( ’ you ’ , ’ i n ’ , ’ w h i t e h o u s e ’ ) ]===== 3 . A t t e n t i o n v a l u e s :−−−−− a t t H : o _ d e s c , o _ i n v , o _ f e e d , a _ p a s tattH_max : [ ’ 0 . 2 6 1 ’ , ’ 0 . 2 4 8 ’ , ’ 0 . 2 6 0 ’ , ’ 0 . 2 3 1 ’ ]a t t H _ m e a n : [ ’ 0 . 2 8 8 ’ , ’ 0 . 2 0 9 ’ , ’ 0 . 2 7 7 ’ , ’ 0 . 2 2 6 ’ ]a t t H _ s u m : [ ’ 0 . 9 8 4 ’ , ’ 0 . 0 0 0 ’ , ’ 0 . 0 1 6 ’ , ’ 0 . 0 0 0 ’ ]a t t H _ t o p 1 0 _ m e a n : [ ’ 0 . 2 9 2 ’ , ’ 0 . 1 8 8 ’ , ’ 0 . 2 9 1 ’ , ’ 0 . 2 2 9 ’ ]a t t H _ t o p 1 0 _ s u m : [ ’ 0 . 4 8 3 ’ , ’ 0 . 0 0 6 ’ , ’ 0 . 4 6 9 ’ , ’ 0 . 0 4 2 ’ ]a t t H _ t o p 2 5 _ m e a n : [ ’ 0 . 3 0 6 ’ , ’ 0 . 1 7 2 ’ , ’ 0 . 3 0 5 ’ , ’ 0 . 2 1 7 ’ ]a t t H _ t o p 2 5 _ s u m : [ ’ 0 . 5 2 3 ’ , ’ 0 . 0 0 0 ’ , ’ 0 . 4 7 7 ’ , ’ 0 . 0 0 0 ’ ]a t t H _ t o p 5 0 _ m e a n : [ ’ 0 . 3 0 9 ’ , ’ 0 . 1 8 2 ’ , ’ 0 . 2 9 7 ’ , ’ 0 . 2 1 2 ’ ]a t t H _ t o p 5 0 _ s u m : [ ’ 0 . 8 7 4 ’ , ’ 0 . 0 0 0 ’ , ’ 0 . 1 2 6 ’ , ’ 0 . 0 0 0 ’ ]−−−−− a t t L : c o n n e c t i v i t y , i t e m _ i n _ r o o m , i t e m _ i n _ i n v , h i s t o r ya t t L _ m a x : [ ’ 0 . 2 5 0 ’ , ’ 0 . 
2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 1 ’ ]a t t L _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ s u m : [ ’ 0 . 2 5 2 ’ , ’ 0 . 2 4 9 ’ , ’ 0 . 2 5 1 ’ , ’ 0 . 2 4 9 ’ ]a t t L _ t o p 1 0 _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ t o p 1 0 _ s u m : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 4 9 ’ , ’ 0 . 2 4 8 ’ , ’ 0 . 2 5 2 ’ ]a t t L _ t o p 2 5 _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ t o p 2 5 _ s u m : [ ’ 0 . 2 5 1 ’ , ’ 0 . 2 4 8 ’ , ’ 0 . 2 4 8 ’ , ’ 0 . 2 5 3 ’ ]a t t L _ t o p 5 0 _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ t o p 5 0 _ s u m : [ ’ 0 . 2 5 2 ’ , ’ 0 . 2 4 9 ’ , ’ 0 . 2 5 1 ’ , ’ 0 . 2 4 9 ’ ]===== 4 . Chosen a c t i o n and r e w a r dA c t i o n : open windowReward : 0 | S c o r e : 5−−−−− ===== S t e p 14 ===== −−−−−===== 1 . T e x t u a l o b s :o _ d e s c : B e h i n d House You a r e b e h i n d t h e w h i t e h o u s e . A p a t h l e a d si n t o t h e f o r e s t t o t h e e a s t . I n one c o r n e r o f t h e h o u s e t h e r ei s a s m a l l window which i s open .o _ i n v : You a r e c a r r y i n g : A j e w e l − e n c r u s t e d eggo _ f e e d : With g r e a t e f f o r t , you open t h e window f a r enough t o a l l o we n t r y .a _ p a s t : open window===== 2 . Newly e x t r a c t e d t r i p l e t s[ ( ’ B e h i n d House ’ , ’ h a s ’ , ’ e x i t t o e a s t ’ ) , ( ’ a l l ’ , ’ i n ’ , ’ w h i t eh o u s e ’ ) , ( ’ egg ’ , ’ i n ’ , ’ w h i t e h o u s e ’ ) , ( ’ h o u s e ’ , ’ i n ’ , ’ w h i t eh o u s e ’ ) , ( ’ p a t h ’ , ’ i n ’ , ’ w h i t e h o u s e ’ ) , ( ’ s m a l l ’ , ’ i n ’ , ’ w h i t eh o u s e ’ ) , ( ’ you ’ , ’ h a v e ’ , ’ ’ ) , ( ’ you ’ , ’ h a v e ’ , ’ j e w e l −e n c r u s t e d egg ’ ) , ( ’ you ’ , ’ i n ’ , ’ h o u s e ’ ) , ( ’ you ’ , ’ i n ’ , ’ w h i t eh o u s e ’ ) ] 25==== 3 . 
A t t e n t i o n v a l u e s :−−−−− a t t H : o _ d e s c , o _ i n v , o _ f e e d , a _ p a s tattH_max : [ ’ 0 . 2 7 6 ’ , ’ 0 . 1 8 6 ’ , ’ 0 . 2 7 7 ’ , ’ 0 . 2 6 1 ’ ]a t t H _ m e a n : [ ’ 0 . 2 5 6 ’ , ’ 0 . 2 0 7 ’ , ’ 0 . 3 2 3 ’ , ’ 0 . 2 1 4 ’ ]a t t H _ s u m : [ ’ 0 . 0 0 0 ’ , ’ 0 . 0 0 0 ’ , ’ 1 . 0 0 0 ’ , ’ 0 . 0 0 0 ’ ]a t t H _ t o p 1 0 _ m e a n : [ ’ 0 . 3 1 0 ’ , ’ 0 . 1 7 6 ’ , ’ 0 . 3 1 2 ’ , ’ 0 . 2 0 1 ’ ]a t t H _ t o p 1 0 _ s u m : [ ’ 0 . 4 7 8 ’ , ’ 0 . 0 0 2 ’ , ’ 0 . 5 1 4 ’ , ’ 0 . 0 0 6 ’ ]a t t H _ t o p 2 5 _ m e a n : [ ’ 0 . 3 0 2 ’ , ’ 0 . 1 6 7 ’ , ’ 0 . 3 4 3 ’ , ’ 0 . 1 8 7 ’ ]a t t H _ t o p 2 5 _ s u m : [ ’ 0 . 0 3 8 ’ , ’ 0 . 0 0 0 ’ , ’ 0 . 9 6 2 ’ , ’ 0 . 0 0 0 ’ ]a t t H _ t o p 5 0 _ m e a n : [ ’ 0 . 2 6 4 ’ , ’ 0 . 1 7 4 ’ , ’ 0 . 3 7 6 ’ , ’ 0 . 1 8 6 ’ ]a t t H _ t o p 5 0 _ s u m : [ ’ 0 . 0 0 0 ’ , ’ 0 . 0 0 0 ’ , ’ 1 . 0 0 0 ’ , ’ 0 . 0 0 0 ’ ]−−−−− a t t L : c o n n e c t i v i t y , i t e m _ i n _ r o o m , i t e m _ i n _ i n v , h i s t o r ya t t L _ m a x : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ s u m : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ t o p 1 0 _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ t o p 1 0 _ s u m : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ t o p 2 5 _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ t o p 2 5 _ s u m : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ t o p 5 0 _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ t o p 5 0 _ s u m : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]===== 4 . Chosen a c t i o n and r e w a r dA c t i o n : w e s tReward : 1 0 | S c o r e : 15−−−−− ===== S t e p 15 ===== −−−−−===== 1 . 
T e x t u a l o b s :o _ d e s c : K i t c h e n You a r e i n t h e k i t c h e n o f t h e w h i t e h o u s e . A t a b l eseems t o h a v e b e e n u s e d r e c e n t l y f o r t h e p r e p a r a t i o n o f f o o d .A p a s s a g e l e a d s t o t h e w e s t and a d a r k s t a i r c a s e c a n be s e e nl e a d i n g upward . A d a r k chimney l e a d s down and t o t h e e a s t i s as m a l l window which i s open . On t h e t a b l e i s an e l o n g a t e dbrown s a c k , s m e l l i n g o f h o t p e p p e r s . A b o t t l e i s s i t t i n g ont h e t a b l e . The g l a s s b o t t l e c o n t a i n s : A q u a n t i t y o f w a t e ro _ i n v : You a r e c a r r y i n g : A j e w e l − e n c r u s t e d eggo _ f e e d : K i t c h e n You a r e i n t h e k i t c h e n o f t h e w h i t e h o u s e . A t a b l eseems t o h a v e b e e n u s e d r e c e n t l y f o r t h e p r e p a r a t i o n o f f o o d .A p a s s a g e l e a d s t o t h e w e s t and a d a r k s t a i r c a s e c a n be s e e nl e a d i n g upward . A d a r k chimney l e a d s down and t o t h e e a s t i s as m a l l window which i s open . On t h e t a b l e i s an e l o n g a t e dbrown s a c k , s m e l l i n g o f h o t p e p p e r s . A b o t t l e i s s i t t i n g ont h e t a b l e . The g l a s s b o t t l e c o n t a i n s : A q u a n t i t y o f w a t e ra _ p a s t : w e s t===== 2 . 
Newly e x t r a c t e d t r i p l e t s[ ( ’ K i t c h e n ’ , ’ h a s ’ , ’ e x i t t o down ’ ) , ( ’ K i t c h e n ’ , ’ h a s ’ , ’ e x i t t oe a s t ’ ) , ( ’ K i t c h e n ’ , ’ h a s ’ , ’ e x i t t o up ’ ) , ( ’ K i t c h e n ’ , ’ h a s ’ , ’e x i t t o w e s t ’ ) , ( ’ a l l ’ , ’ i n ’ , ’ k i t c h e n o f w h i t e h o u s e ’ ) , ( ’brown ’ , ’ i n ’ , ’ k i t c h e n o f w h i t e h o u s e ’ ) , ( ’ chimney ’ , ’ i n ’ , ’k i t c h e n o f w h i t e h o u s e ’ ) , ( ’ egg ’ , ’ i n ’ , ’ k i t c h e n o f w h i t eh o u s e ’ ) , ( ’ g l a s s ’ , ’ i n ’ , ’ k i t c h e n o f w h i t e h o u s e ’ ) , ( ’ p a s s a g e ’, ’ i n ’ , ’ k i t c h e n o f w h i t e h o u s e ’ ) , ( ’ s t a i r c a s e ’ , ’ i n ’ , ’k i t c h e n o f w h i t e h o u s e ’ ) , ( ’ t a b l e ’ , ’ i n ’ , ’ k i t c h e n o f w h i t eh o u s e ’ ) , ( ’ w a t e r ’ , ’ i n ’ , ’ k i t c h e n o f w h i t e h o u s e ’ ) , ( ’ w h i t eh o u s e ’ , ’ w e s t o f ’ , ’ k i t c h e n o f w h i t e h o u s e ’ ) , ( ’ window ’ , ’ i n ’ ,’ k i t c h e n o f w h i t e h o u s e ’ ) , ( ’ you ’ , ’ h a v e ’ , ’ ’ ) , ( ’ you ’ , ’ h a v e’ , ’ j e w e l − e n c r u s t e d egg ’ ) , ( ’ you ’ , ’ i n ’ , ’ k i t c h e n ’ ) , ( ’ you ’ , ’i n ’ , ’ k i t c h e n o f h o u s e ’ ) , ( ’ you ’ , ’ i n ’ , ’ k i t c h e n o f w h i t eh o u s e ’ ) ]===== 3 . A t t e n t i o n v a l u e s : 26−−−− a t t H : o _ d e s c , o _ i n v , o _ f e e d , a _ p a s tattH_max : [ ’ 0 . 2 1 1 ’ , ’ 0 . 2 1 8 ’ , ’ 0 . 3 0 4 ’ , ’ 0 . 2 6 7 ’ ]a t t H _ m e a n : [ ’ 0 . 2 4 6 ’ , ’ 0 . 2 1 7 ’ , ’ 0 . 2 7 5 ’ , ’ 0 . 2 6 2 ’ ]a t t H _ s u m : [ ’ 0 . 0 0 0 ’ , ’ 0 . 0 0 0 ’ , ’ 0 . 9 9 0 ’ , ’ 0 . 0 1 0 ’ ]a t t H _ t o p 1 0 _ m e a n : [ ’ 0 . 2 0 8 ’ , ’ 0 . 1 9 6 ’ , ’ 0 . 3 3 8 ’ , ’ 0 . 2 5 9 ’ ]a t t H _ t o p 1 0 _ s u m : [ ’ 0 . 0 0 7 ’ , ’ 0 . 0 0 4 ’ , ’ 0 . 9 2 5 ’ , ’ 0 . 0 6 4 ’ ]a t t H _ t o p 2 5 _ m e a n : [ ’ 0 . 2 2 1 ’ , ’ 0 . 1 9 8 ’ , ’ 0 . 3 1 9 ’ , ’ 0 . 
2 6 3 ’ ]a t t H _ t o p 2 5 _ s u m : [ ’ 0 . 0 0 0 ’ , ’ 0 . 0 0 0 ’ , ’ 0 . 9 9 2 ’ , ’ 0 . 0 0 8 ’ ]a t t H _ t o p 5 0 _ m e a n : [ ’ 0 . 2 3 5 ’ , ’ 0 . 2 0 4 ’ , ’ 0 . 2 9 7 ’ , ’ 0 . 2 6 5 ’ ]a t t H _ t o p 5 0 _ s u m : [ ’ 0 . 0 0 0 ’ , ’ 0 . 0 0 0 ’ , ’ 0 . 9 9 7 ’ , ’ 0 . 0 0 3 ’ ]−−−−− a t t L : c o n n e c t i v i t y , i t e m _ i n _ r o o m , i t e m _ i n _ i n v , h i s t o r ya t t L _ m a x : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ s u m : [ ’ 0 . 2 5 2 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 4 9 ’ , ’ 0 . 2 4 8 ’ ]a t t L _ t o p 1 0 _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ t o p 1 0 _ s u m : [ ’ 0 . 2 5 1 ’ , ’ 0 . 2 4 8 ’ , ’ 0 . 2 4 8 ’ , ’ 0 . 2 5 3 ’ ]a t t L _ t o p 2 5 _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ t o p 2 5 _ s u m : [ ’ 0 . 2 5 2 ’ , ’ 0 . 2 4 8 ’ , ’ 0 . 2 4 7 ’ , ’ 0 . 2 5 3 ’ ]a t t L _ t o p 5 0 _ m e a n : [ ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 5 0 ’ ]a t t L _ t o p 5 0 _ s u m : [ ’ 0 . 2 5 2 ’ , ’ 0 . 2 5 0 ’ , ’ 0 . 2 4 9 ’ , ’ 0 . 2 4 8 ’ ]===== 4 . Chosen a c t i o n and r e w a r dA c t i o n : w e s tReward : 0 | S c o r e : 15
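The "Newly extracted triplets" lists above follow a few recurring patterns: rooms gain ('&lt;room&gt;', 'has', 'exit to &lt;dir&gt;') edges, nouns visible in the description are linked to the current room with 'in', and the player is linked via 'have'/'in'. A minimal rule-based sketch of such an extractor is below; the function name, direction patterns, and noun list are illustrative assumptions, and the agent's actual extraction pipeline (e.g. OpenIE plus game-specific rules) may differ.

```python
import re

# Illustrative vocabularies; a real extractor would be far more general.
DIRECTIONS = ("north", "south", "east", "west", "up", "down")
NOUNS = r"\b(window|path|table|egg|house|staircase|chimney|sack|bottle)\b"

def extract_triplets(room, desc, inventory):
    """Return (subject, relation, object) triplets from one observation."""
    triplets = set()
    # Exits: match phrasings like "to the east", "leads down", "upward".
    for d in DIRECTIONS:
        if re.search(rf"\b(to the {d}|leads? {d}|{d}ward)\b", desc, re.I):
            triplets.add((room, "has", f"exit to {d}"))
    # Objects mentioned in the description are placed in the current room.
    for noun in re.findall(NOUNS, desc, re.I):
        triplets.add((noun.lower(), "in", room.lower()))
    # Player state: inventory items and current location.
    for item in inventory:
        triplets.add(("you", "have", item))
    triplets.add(("you", "in", room.lower()))
    return sorted(triplets)
```

Applying this to the Step 13 observation yields triplets of the same shape as the logged ones, e.g. ('Behind House', 'has', 'exit to east') and ('window', 'in', 'behind house').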
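In the attention logs, the `_mean` rows stay close to uniform while the `_sum` rows are nearly one-hot over the four sources, which is consistent with aggregating raw attention scores across steps and then normalizing with a softmax (summing amplifies differences before the softmax; averaging does not). The sketch below computes max/mean/sum and top-k summaries under that assumption; the function names are illustrative, not from the paper's code, and the exact aggregation used for the logs may differ.

```python
import numpy as np

def softmax(x):
    """Softmax over a 1-D score vector, shifted for numerical stability."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def summarize_attention(scores, ks=(10, 25, 50)):
    """Summarize per-source attention over an episode.

    scores: (n_steps, 4) array of raw attention scores, one row per
    step, columns ordered as in the logs (e.g. o_desc, o_inv,
    o_feed, a_past). Each summary is normalized across the 4 columns.
    """
    s = np.asarray(scores, dtype=float)
    out = {
        "max": softmax(s.max(axis=0)),
        "mean": softmax(s.mean(axis=0)),
        "sum": softmax(s.sum(axis=0)),   # sharpens toward one-hot
    }
    for k in ks:
        top = np.sort(s, axis=0)[-k:]    # k largest scores per column
        out[f"top{k}_mean"] = softmax(top.mean(axis=0))
        out[f"top{k}_sum"] = softmax(top.sum(axis=0))
    return out
```

With scores that consistently favor one source, the "sum" summary concentrates almost all mass on that source while the "mean" summary remains smooth, mirroring the attH rows above; near-identical scores for all sources reproduce the uniform attL rows.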