A Simple Language Model for Task-Oriented Dialogue
Ehsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz, Richard Socher
Ehsan Hosseini-Asl [email protected] Research
Bryan McCann [email protected] Research
Chien-Sheng Wu [email protected] Research
Semih Yavuz [email protected] Research
Richard Socher [email protected] Research
Abstract
Task-oriented dialogue is often decomposed into three tasks: understanding user input, deciding actions, and generating a response. While such decomposition might suggest a dedicated model for each sub-task, we find a simple, unified approach leads to state-of-the-art performance on the MultiWOZ dataset. SimpleTOD is a simple approach to task-oriented dialogue that uses a single, causal language model trained on all sub-tasks recast as a single sequence prediction problem. This allows SimpleTOD to fully leverage transfer learning from pre-trained, open domain, causal language models such as GPT-2. SimpleTOD improves over the prior state-of-the-art in joint goal accuracy for dialogue state tracking, and our analysis reveals robustness to noisy annotations in this setting. SimpleTOD also improves the main metrics used to evaluate action decisions and response generation in an end-to-end setting: inform rate by 8.1 points, success rate by 9.7 points, and combined score by 7.2 points.
Preprint. Under review.

Conversational AI has been a long-standing area of exploration in computer science, and has gained more attention recently in both academia and industry with the current advances of neural approaches [15]. There are broadly two categories of dialogue. Open-domain dialogue systems focus on making chit-chat, open-ended conversations with humans more natural and engaging. They are usually trained end-to-end using large-scale data from social media [1]. Task-oriented dialogue (TOD) systems accomplish a goal described by a user in natural language. They often use a pipeline approach [44, 56]. The pipeline requires natural language understanding (NLU) for belief state tracking, dialogue management (DM) for deciding which actions to take based on those beliefs, and natural language generation (NLG) for generating responses [48].

Traditionally, each component of task-oriented dialogue systems is trained independently with different supervision. The NLU module is trained on domain and intent labels. The DM module employs dialogue belief and dialogue act labels. The NLG module accesses templatized or natural responses. The modular dependencies of these components can lead to error propagation when information is not provided to subsequent modules in the pipeline [27]. For example, many systems do not consider the entire dialogue history at every turn, but rather rely on the NLU module to pass belief states reliably to following module components [58].

We propose recasting task-oriented dialogue as a simple, causal (unidirectional) language modeling task. We show that such an approach can solve all the sub-tasks in a unified way using multi-task maximum likelihood training. The proposed Simple Task-Oriented Dialogue (SimpleTOD) approach enables modeling of the inherent dependencies between the sub-tasks of task-oriented dialogue, by optimizing for all tasks in an end-to-end manner. SimpleTOD also opens the path towards fully leveraging large language models such as GPT-2 [38] for task-oriented dialogue. The success of SimpleTOD demonstrates a strong connection between the implicit language understanding in the open domain required of high-quality causal language models and the kind of understanding required for a full task-oriented dialogue system.

Figure 1: SimpleTOD is a simple approach to task-oriented dialogue that uses a single causal language model to generate all outputs given the dialogue context and retrieved database search results. The delexicalized response can then be lexicalized into a human-readable response by using information from the belief state and DB search results.

Evaluation results demonstrate the advantages of SimpleTOD. It achieves 55.76 joint goal accuracy on MultiWOZ, which surpasses all prior work for the dialogue state tracking (i.e. belief state tracking) sub-task. In the setting closest to testing a full task-oriented dialogue system, in which belief states and action decisions are generated rather than retrieved from an oracle, SimpleTOD performance surpasses prior work on each individual action and response generation metric (+8.1 inform rate, +9.7 success rate).

The contributions of this work are summarized as follows:
• SimpleTOD: a state-of-the-art generative model for dialogue state tracking.
• SimpleTOD is also the first model to achieve state-of-the-art performance for dialogue state tracking, action decisions, and response generation metrics together in an end-to-end setting.
• Analysis showing SimpleTOD is a robust dialogue state tracker in the presence of noisy-labeled annotations.
• Ablations showing the importance of user/system and endof(segment) tokens.
• Ablations showing the importance of pre-training that also show larger versions of SimpleTOD are not always better for end-to-end MultiWOZ.
• A list of discovered noisy annotations in MultiWOZ 2.1 alongside a cleaned version of the test set, code for training and evaluation, are provided at https://github.com/salesforce/simpletod

Related Work
Task-Oriented Dialogue
Much work on task-oriented dialogue focuses on a specific module and evaluates only for that module. These components include understanding user intent via intent detection [26], tracking the constraints imposed by the user via dialogue state tracking [20, 32, 40, 34, 51, 57, 62, 8, 19], determining system actions via dialogue policy [49], and using dedicated response generation components [47].

Some recent works have started to bridge multiple sub-tasks by connecting modules together and evaluating in settings that hand off generated results from one module to another. Chen et al. [9] proposed a joint action-response generation using oracle dialogue states. Peng et al. [36] used GPT-2 to learn a response generator conditioned on oracle dialogue acts and did not evaluate on dialogue state tracking.
Towards End-to-End Task-Oriented Dialogue
Dependencies between these independent modules make pipeline approaches vulnerable to error propagation across components [28]. Recent approaches have increasingly shifted towards end-to-end solutions, which aim to reduce human effort and task-specific design. Several works used both dialogue history and knowledge bases as input and optimized neural encoder-decoder models to generate or retrieve system responses without modular supervision [13, 60, 29, 52, 54]. Some systems are mostly end-to-end, but still call out to additional APIs or skip intermediate tasks like dialogue state tracking [5].

Others have incorporated additional supervision and trained in multi-task settings. Lei et al. [24] and Shu et al. [43] incorporated dialogue state tracking and jointly trained with response generation using a sequence-to-sequence approach. Liu et al. [28] proposed a hybrid imitation and reinforcement learning method, jointly learning a policy for dialogue management with response generation. Wen et al. [48] and Liang et al. [25] trained language understanding, dialogue state tracking, and dialogue policy modules with a shared encoder.

Many other works fall somewhere in between by jointly training some tasks. Neelakantan et al. [33] modeled dialogue management and response generation jointly, incorporating latent knowledge reasoning through attention without using belief states. Zhao et al. [61] proposed to model system actions as latent variables, inducing a latent action space with variational inference methods. Zhang et al. [58] proposed a domain-aware multi-decoder model and augmented dialogue data, which achieved state-of-the-art combined score for dialogue management and response generation on the MultiWOZ dataset.

Although all these approaches have come closer to unifying the stack, none are as simple as SimpleTOD: treating all of task-oriented dialogue as a single sequence prediction problem, using a single model, trained with a single, joint, multi-task loss.
Unsupervised pre-training for natural language processing
Pre-training approaches for natural language processing focus on transferable representations for contextualized word vectors [30, 37], generative models [38, 23], or a combination of both [12, 55]. Variants of pre-trained, bidirectional Transformers like BERT [11] are often evaluated on classification tasks such as those in the GLUE benchmark [46] or span-based question answering tasks [39]. Unidirectional (causal) pre-trained language models such as GPT-2 [38] or CTRL [23] resemble the decoder from the original Transformer architecture [45]. They aim to learn a distribution for next-word prediction, which makes them particularly useful for tasks that require text generation. In dialogue, Zhang et al. [59] built on GPT-2 by further pre-training it on Reddit data for open-domain response generation. Henderson et al. [21] also pre-trained on Reddit data with a dual Transformer encoder for response selection. Bao et al. [3] used both Twitter and Reddit data to pre-train a Transformer model with discrete latent variables. Wu et al. [53] proposed a response selection model by pre-training a BERT model on multiple task-oriented corpora. Budzianowski and Vulić [6] employed GPT-2 to leverage the pre-trained language model for dialogue response generation. Ham et al. [17] fine-tuned GPT-2 on the MultiWOZ dataset and achieved lower performance on DST and end-to-end evaluation compared to the previous single-task and modularized models.
Methods
This section describes task-oriented dialogue, how we frame it for SimpleTOD, the model architecture, training details, dataset details, and evaluation metrics.
Task-oriented dialogue (TOD) is evaluated on three sub-tasks: dialogue state (belief state) tracking, dialogue management (action/decision prediction), and response generation. This decomposition has made it possible to create dedicated models for each sub-task, which is the dominant approach. By contrast, we explore the possibility of using a single-model, end-to-end approach, SimpleTOD.

Dialogues consist of multiple turns. In a turn $t$, the user provides input $U_t$ and the system generates a response $S_t$. To generate a response during inference, SimpleTOD reads all previous turns as context, $C_t = [U_0, S_0, \ldots, U_t]$. It generates a belief state $B_t$,

$$B_t = \text{SimpleTOD}(C_t) \tag{1}$$

which is a list of triplets recording values for slots in a particular domain: (domain, slot_name, value). This belief state is used to query a database for information. The database search returns rows from the database that satisfy the conditions of the belief state. The rows returned can later be used to lexicalize the response (filling in generated placeholders), but SimpleTOD only takes as input the aggregated database search results, $D_t$. $D_t$ includes how many rows were returned and, depending on the experimental setting, booking status information. SimpleTOD then conditions on $C_t$, $B_t$, and $D_t$ concatenated together as a single sequence to decide actions, $A_t$:

$$A_t = \text{SimpleTOD}([C_t, B_t, D_t]) \tag{2}$$

These actions are generated as another list of triplets: (domain, action_type, slot_name). A delexicalized response $S_t$ is generated conditioned on all prior information concatenated as a single sequence:

$$S_t = \text{SimpleTOD}([C_t, B_t, D_t, A_t]) \tag{3}$$

When combined with information from the belief state and database search results, the response can be lexicalized to recover human-readable response text.

A single training sequence consists of the concatenation $x^t = [C_t; B_t; D_t; A_t; S_t]$, allowing us to model the joint probability over the sequence $x^t$.
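The single-sequence framing can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' preprocessing code: the delimiter strings (`<|user|>`, `<|context|>`, `<|endofbelief|>`, and so on), the helper names `segment` and `build_sequence`, and the way triplets are joined into text are all hypothetical choices made for this example.

```python
from typing import List, Tuple

Triplet = Tuple[str, str, str]

# Assumed delimiter strings, chosen for illustration only.
USER, SYSTEM = "<|user|>", "<|system|>"


def segment(name: str, body: str) -> str:
    """Wrap one segment in an (assumed) start token and end-of-segment token."""
    return f"<|{name}|> {body} <|endof{name}|>"


def build_sequence(
    turns: List[Tuple[str, str]],  # (speaker, text); speaker is "user" or "system"
    belief: List[Triplet],         # (domain, slot_name, value)
    db_result: str,                # aggregated DB search result, e.g. a match count
    actions: List[Triplet],        # (domain, action_type, slot_name)
    response: str,                 # delexicalized system response
) -> str:
    """Concatenate context, belief, DB result, actions, and response into one string."""
    context = " ".join(
        f"{USER if speaker == 'user' else SYSTEM} {text}" for speaker, text in turns
    )
    belief_text = " , ".join(" ".join(t) for t in belief)
    action_text = " , ".join(" ".join(t) for t in actions)
    return " ".join(
        [
            segment("context", context),
            segment("belief", belief_text),
            segment("dbsearch", db_result),
            segment("action", action_text),
            segment("response", response),
        ]
    )
```

At inference time the same layout would be filled in left to right: the context segment is given, and the remaining segments are generated one after another, each conditioned on everything before it.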
Given example sequences of the form $x = (x_1, \ldots, x_n)$ where each $x_i$ comes from a fixed set of symbols, the goal of language modeling is to learn $p(x)$. It is natural to factorize this distribution using the chain rule of probability [4] and train a neural network with parameters $\theta$ to minimize the negative log-likelihood over a dataset $D = \{x^1, \ldots, x^{|D|}\}$ where sequence $x^t$ has length $n_t$:

$$p(x) = \prod_{i=1}^{n} p(x_i \mid x_{<i}), \qquad \mathcal{L}(D) = -\sum_{t=1}^{|D|} \sum_{i=1}^{n_t} \log p_\theta(x_i^t \mid x_{<i}^t)$$
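The chain-rule factorization and the negative log-likelihood objective described here can be made concrete with a small sketch. It assumes a conditional model exposed as a plain function `cond_prob(prefix, token)`; that interface and the `uniform_model` stand-in are hypothetical and only serve to show how per-token log-probabilities add up.

```python
import math
from typing import Callable, Sequence

# Assumed interface: returns p(token | prefix) for a given prefix of tokens.
CondProb = Callable[[Sequence[str], str], float]


def sequence_nll(tokens: Sequence[str], cond_prob: CondProb) -> float:
    """Negative log-likelihood of one sequence under the chain-rule factorization."""
    return -sum(
        math.log(cond_prob(tokens[:i], tokens[i])) for i in range(len(tokens))
    )


def dataset_nll(dataset: Sequence[Sequence[str]], cond_prob: CondProb) -> float:
    """Sum of per-sequence negative log-likelihoods over the whole dataset."""
    return sum(sequence_nll(x, cond_prob) for x in dataset)


def uniform_model(prefix: Sequence[str], token: str, vocab_size: int = 4) -> float:
    """Toy stand-in for a neural model: every token is equally likely."""
    return 1.0 / vocab_size
```

With the uniform toy model, each token contributes `log(vocab_size)` to the loss, so a dataset of five tokens in total has a loss of `5 * log(4)`. A trained model lowers this value by assigning higher probability to the tokens that actually occur.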
The input to the model is tokenized with pretrained BPE codes [42] associated with DistilGPT2 [41], a distilled version of GPT-2 [38]. Experiments for SimpleTOD use default hyperparameters for GPT-2 and DistilGPT2 in Huggingface Transformers [50]. Sequences longer than tokens are truncated.
We evaluate on the Multi-domain Wizard-of-Oz (MultiWOZ) [7], a large-scale, multi-domain dialogue dataset of human-human conversations. It contains 10,438 multi-turn dialogues averaging 13.68 turns, spanning seven domains (restaurant, train, attraction, hotel, taxi, hospital, police). Police and hospital domains are excluded from evaluation, since they do not have valid/test splits. This leaves 30 domain-slot pairs for the remaining five domains with 4,500 possible values. SimpleTOD is trained on delexicalized system responses according to the pre-processing explained in [7]. Recently, Eric et al. [14] released MultiWOZ 2.1, which removes some noisy state values from dialogue state (belief state) tracking annotations. For dialogue state tracking evaluation, we use the 2.1 version in order to compare to recent state-of-the-art methods. To the best of our knowledge, all prior work on action and response generation has evaluated on 2.0, so we include those results for direct comparison. We also include results for 2.1 so future work can compare to SimpleTOD on the improved version as well.
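Delexicalization, mentioned above, replaces concrete values in a response with placeholders, and lexicalization reverses the step once the belief state and database results are known. The sketch below is a simplified illustration and not the pre-processing of [7], which is more involved: the bracketed placeholder format (`[restaurant_name]`) and the function names are assumptions made for this example.

```python
from typing import Dict


def delexicalize(response: str, values: Dict[str, str]) -> str:
    """Replace each known surface value with an (assumed) [placeholder]."""
    # Longest values first, so a short value never clobbers part of a longer one.
    for slot, value in sorted(values.items(), key=lambda kv: -len(kv[1])):
        response = response.replace(value, f"[{slot}]")
    return response


def lexicalize(template: str, values: Dict[str, str]) -> str:
    """Fill [placeholder] tokens back in with concrete values."""
    for slot, value in values.items():
        template = template.replace(f"[{slot}]", value)
    return template
```

Training on the delexicalized form means the model only has to learn the shape of a response; the concrete entity names come from the database at the end.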
We follow the original MultiWOZ [7] guidance for all individual metrics and follow Mehri et al. [31] for the combined score. Joint goal accuracy is used to evaluate the performance of dialogue state tracking (i.e. belief state tracking). It measures the accuracy of the generated belief states as they compare to oracle belief states. Model outputs are only counted as correct when all the predicted values exactly match the oracle values. Action and response generation uses three metrics. The first two are inform and success rates. They are designed to capture how well the task was completed. Inform rate measures how often the entities provided by the system are correct. Success rate refers to how often the system is able to answer all the attributes requested by the user. BLEU score [35] is used to measure the fluency of the generated responses. The combined score for action and response generation is computed as $\text{BLEU} + 0.5 \times (\text{Inform} + \text{Success})$.

| Model | Decoder | Context Encoder | Extra Supervision | Joint Accuracy |
|---|---|---|---|---|
| TRADE ∗ | Generative + Classifier | Bidirectional | - | 45.6 |
| DSTQA ∗∗ | Classifier | Bidirectional | knowledge graph | 51.17 |
| DST-Picklist ∗ | Classifier | Bidirectional | - | 53.3 |
| SST ∗ | Generative | Bidirectional | schema graph | 55.23 |
| TripPy † | Classifier | Bidirectional | action decision | 55.3 |
| SimpleTOD o | Generative | Unidirectional | - | 55.72 |
| SimpleTOD ∗ | Generative | Unidirectional | - | 55.76 |
| SimpleTOD + | Generative | Unidirectional | - | 57.47 |
Table 1: Evaluation of Dialogue State Tracking (DST) on MultiWOZ 2.1 using joint accuracy metric. ∗ uses test label cleaning proposed by Wu et al. [51] and recommended by MultiWOZ authors. † uses label normalization and equivalent matching proposed in Heck et al. [19]. ∗∗ uses the cleaning of ∗ models plus additional accounting for label variants. + performs cleaning of Type 2 and partial cleaning of Type 4 noisy annotations as outlined in Section 5, which is currently non-standard and so left unbolded. o no label-cleaning.

SimpleTOD is a Unified System for Task-Oriented Dialogue
SimpleTOD is, to the best of our knowledge, the first system that generates state-of-the-art results judged according to dialogue state tracking as well as end-to-end metrics for action and response generation for MultiWOZ.
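Both families of metrics referred to here are simple to compute once predictions are available. The sketch below shows joint goal accuracy and the combined score as described in the evaluation details; it is an illustration, not the official MultiWOZ evaluation script. Treating each belief state as an unordered set of triplets is an assumption of this sketch, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Triplet = Tuple[str, str, str]  # (domain, slot_name, value)


def joint_goal_accuracy(
    predicted: Sequence[List[Triplet]], oracle: Sequence[List[Triplet]]
) -> float:
    """Fraction of turns whose predicted belief state matches the oracle exactly."""
    if len(predicted) != len(oracle) or not oracle:
        raise ValueError("need the same non-zero number of turns in both inputs")
    # A turn counts only if every triplet matches (order is ignored here).
    correct = sum(set(p) == set(o) for p, o in zip(predicted, oracle))
    return correct / len(oracle)


def combined_score(inform: float, success: float, bleu: float) -> float:
    """Combined score: BLEU + 0.5 * (Inform + Success)."""
    return bleu + 0.5 * (inform + success)
```

As a check against the reported numbers, inform 76.3, success 60.4, and BLEU 16.6 give 84.95, which agrees with the combined score of 85 listed for DAMD after rounding.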
Table 1 compares the joint goal accuracy to previous methods. We compare to TRADE [51], DSTQA [62], DST-Picklist [57], SST [8], and TripPy [19]. All previous models propose a bidirectional encoder to learn a better representation of the dialogue context, but SimpleTOD uses a unidirectional (causal) decoder and no additional bidirectional encoder. It also makes no use of extra supervision. It nonetheless achieves state-of-the-art.

Many models use some form of test-label cleaning. TRADE, DSTQA, DST-Picklist, and SST use the script proposed by Wu et al. [51]. DSTQA also accounts for label variations that would have originally been considered incorrect. TripPy applies its own format normalization, typo corrections, and process for accounting for label variations. SimpleTOD achieves the best performance without any cleaning or normalization, simply on the raw, original annotations. Applying the script from Wu et al. [51] improves the result to 55.76. Analysis of further noisy annotation is presented in Section 5. Further cleaning those annotations more accurately reflects performance at 57.47. We will release the list of noisy annotations that need to be fixed along with their corrections, but we reiterate that SimpleTOD does not need this cleaning to surpass prior methods.

Table 2 and Table 3 demonstrate the effectiveness of SimpleTOD for action and response generation in the most realistic, fully end-to-end setting, in which models must generate belief states, actions, and responses. SimpleTOD targets replacing modularized and pipelined methods that evaluate different components with oracle information.
For reference, comparisons in a variety of oracle settings against HDSA [9], ARDM [54], LaRL [61], and PARG [16] can be found in the Supplementary Materials, but these comparisons are not essential for end-to-end contributions. In fact, SimpleTOD is state-of-the-art in the end-to-end setting compared to the only prior work, DAMD [58], without achieving state-of-the-art in settings that partially utilize oracle information. This highlights that partial, oracle evaluation does not reliably transfer to the end-to-end evaluation of full systems: only end-to-end evaluation accurately describes the performance of a full system.

Footnote (label-cleaning script of Wu et al. [51]): https://github.com/jasonwu0731/trade-dst/blob/master/utils/fix_label.py
Footnote: The term "end-to-end" is overloaded in the literature. Evaluation that does not use oracle belief states, actions, or response is considered end-to-end even when the system itself is not trained end-to-end. SimpleTOD is trained end-to-end and achieves state-of-the-art in end-to-end evaluation.

| Model | Belief State | DB Search | Action | Inform | Success | BLEU | Combined |
|---|---|---|---|---|---|---|---|
| DAMD+augmentation | generated | oracle | generated | 76.3 | 60.4 | 16.6 | 85 |
| SimpleTOD (ours) | generated | oracle | generated | 78.1 | 63.4 | 16.91 | 87.66 |
| SimpleTOD (ours) | generated | dynamic | generated | 81.4 | 69.7 | 16.11 | 91.66 |
| SimpleTOD (ours) | generated | - | generated | 84.4 | 70.1 | 15.01 | 92.26 |

Table 2: Action and response generation on MultiWOZ 2.0 reveals that SimpleTOD, a single, causal language model, is sufficient to surpass prior work.
| Belief State | DB Search | Action | Inform | Success | BLEU | Combined |
|---|---|---|---|---|---|---|
| generated | oracle | generated | 79.3 | 65.4 | 16.01 | 87.36 |
| generated | dynamic | generated | 83.4 | 67.1 | 14.99 | 90.24 |
| generated | - | generated | 85 | 70.5 | 15.23 | 92.98 |

Table 3: Action and response generation on MultiWOZ 2.1 for SimpleTOD.

Prior work uses oracle DB Search results as supervision during training and as input during inference. We include directly comparable experiments using oracle DB Search results. We also include experiments that completely ignore the DB Search results to show the surprising effectiveness of SimpleTOD without DB Search information. We also show a setting with dynamic DB Search results. In this setting, we train with the number of matched DB entries and compute this dynamically at inference from generated belief states. In all variations, SimpleTOD outperforms prior work.

DAMD [58] is the only prior work that has evaluated with generated belief states from dialogue state tracking during inference. We found in additional ablation experiments that we could increase scores for individual metrics like inform rate and success rate by training three separate SimpleTOD language models: one for dialogue state tracking, one for action generation, and one for response generation. However, the combined scores remained nearly identical to the full end-to-end, single model approach. For example, separating the models might improve inform rate, but hurt response generation measured by BLEU. Regardless, in this most realistic setting SimpleTOD achieves state-of-the-art on inform and success metrics. SimpleTOD performs lower only on BLEU by 1.59 points, perhaps due to lack of action/response augmentation employed by DAMD.
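The dynamic DB Search setting described above, in which the number of matched entries is computed from a generated belief state, can be sketched as follows. The database layout (a list of dictionaries keyed by slot name) and the function name `count_matches` are assumptions for illustration; the real MultiWOZ database query logic may normalize values differently.

```python
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]  # (domain, slot_name, value)


def count_matches(
    db_rows: List[Dict[str, str]], belief: List[Triplet], domain: str
) -> int:
    """Count rows of one domain's table that satisfy every belief-state constraint."""
    # Keep only the constraints that belong to the queried domain.
    constraints = {slot: value for d, slot, value in belief if d == domain}
    return sum(
        all(row.get(slot) == value for slot, value in constraints.items())
        for row in db_rows
    )
```

Because matching is exact, a generated value that differs from the database spelling, even slightly, yields zero matches. This is one way the conflicts between belief states and database entries discussed in the next section can affect the dynamic setting.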
Regarding Oracle DB Search Results
In the case where we dynamically compute partial DB Search results (number of entries matched only), the results are actually lower than ignoring them entirely. Using oracle DB information likewise leads to lower performance. The best result ignores DB Search results entirely. We have found that in some cases, the generated belief states conflict in some way with the information in the database. For example, there can be discrepancies between the two in the name of restaurants: ‘pizza hut fenditton’ in the target belief states but ‘pizza hut fenditton’ in the database. We have consulted with the authors of the dataset, but there is currently no course of action planned to remedy this.
The Role of Special Tokens
Table 4 evaluates SimpleTOD with different special tokens used to identify components of the input corresponding to different sub-tasks. Analysis revealed that without end tokens, SimpleTOD tended to generate much longer belief state, action, and response generations. Even more important is clearly differentiating user and system text for SimpleTOD.
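One practical reading of the end-token result is that the end token tells downstream code where a generated segment stops. The sketch below extracts a single segment from generated text; the token strings (`<|belief|>`, `<|endofbelief|>`) and the helper name `extract_segment` are hypothetical, chosen only to illustrate the idea.

```python
from typing import Optional


def extract_segment(text: str, name: str) -> Optional[str]:
    """Return the text between an (assumed) start token and its end token."""
    start, end = f"<|{name}|>", f"<|endof{name}|>"
    begin = text.find(start)
    if begin == -1:
        return None  # the segment was never generated
    begin += len(start)
    stop = text.find(end, begin)
    # Without an end token there is no boundary, so everything after the
    # start token is returned, including text that belongs to later segments.
    body = text[begin:] if stop == -1 else text[begin:stop]
    return body.strip()
```

When the end token is missing, the fallback branch returns the whole remainder of the generation, which mirrors the overly long outputs observed in the ablation without end tokens.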
Pre-training
Table 5 highlights the importance of initializing SimpleTOD with pre-trained weights. A major advantage of recasting as single sequence prediction is the ability to leverage the understanding learned by these pre-trained models in the open-domain setting.
Robustness to Noisy Annotations
To understand the source of dialogue state tracking errors, we investigated MultiWOZ 2.1 annotations in depth. In the process, we have defined four primary types of noisy labels that could be considered mis-annotations:
1. User provided multiple options, but context does not provide sufficient information to determine the true belief state.
2. Belief state is not labeled, but context provides sufficient information.
3. Belief state is labeled, but context lacks necessary information.
4. Belief state value is misspelled according to the context information.

Together, experimental results and this analysis indicate that SimpleTOD can track dialogue state and generate the correct output even in the presence of noisy labels. Concrete examples of noisy-labeled annotation in MultiWOZ can be found in the Supplementary Materials. All mis-annotated examples along with all code for replication are provided.

| End token | User/System token | Joint Acc | Inform | Success | BLEU | Combined |
|---|---|---|---|---|---|---|
| No | No | 16.79 | 33.8 | 10.6 | 4.53 | 26.73 |
| Yes | No | 21.5 | 54.5 | 41.2 | 9.48 | 57.33 |
| No | Yes | 22.22 | 61.9 | 52.7 | 9.57 | 66.87 |
| Yes | Yes | 55.76 | 85 | 70.5 | 15.23 | 92.98 |

Table 4: Ablations on MultiWOZ 2.1 comparing the presence and absence of different special tokens when representing TOD as a single sequence. Performance on all metrics drops without end tokens and user/system tokens.

| Layers | Pretrained | Joint Acc | Inform | Success | BLEU | Combined |
|---|---|---|---|---|---|---|
| 6 | Random | 16.45 | 63.5 | 49.6 | 6.34 | 62.89 |
| 6 | DistilGPT2 | 54.54 | 85 | 70.5 | 15.23 | 92.98 |
| 12 | Random | 20.17 | 58.7 | 37.4 | 8.9 | 59.65 |
| 12 | GPT2 | 55.76 | 88 | 61.7 | 15.9 | 90.75 |

Table 5: Ablations on MultiWOZ 2.1 comparing the importance of pretraining. Recasting as single sequence prediction enables fully leveraging pre-trained models for the language understanding they have gathered in an open-domain setting.

Decoding
Initialized from pre-trained weights, SimpleTOD does not need to employ an advanced, more costly decoding strategy such as beam search, diverse beam search, or top-k sampling, unlike HDSA [9] and DAMD [58]. Our results are reported with simple greedy decoding. In initial experiments, we also tried nucleus sampling [22], but we found it degraded performance. This relates to the observations in Keskar et al. [23] around controllable generation: when precision is required, sampling from the distribution is inherently less reliable than greedy decoding.
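Greedy decoding, as used for the reported results, picks the single most probable next token at every step. The sketch below shows the loop on a toy next-token table; the `next_token_probs` interface, the toy probabilities, and the token strings are all assumptions for illustration and stand in for a real language model.

```python
from typing import Callable, Dict, List, Sequence

# Assumed interface: maps the tokens so far to a next-token distribution.
NextTokenProbs = Callable[[Sequence[str]], Dict[str, float]]


def greedy_decode(
    next_token_probs: NextTokenProbs,
    prefix: Sequence[str],
    end_token: str,
    max_new_tokens: int = 50,
) -> List[str]:
    """Repeatedly append the most probable next token until the end token appears."""
    tokens = list(prefix)
    generated: List[str] = []
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        best = max(probs, key=probs.get)  # argmax, no randomness involved
        tokens.append(best)
        generated.append(best)
        if best == end_token:
            break
    return generated


# Toy stand-in for a model: the distribution depends only on the last token.
TOY_TABLE = {
    "<|belief|>": {"hotel": 0.6, "train": 0.4},
    "hotel": {"area": 0.7, "<|endofbelief|>": 0.3},
    "area": {"north": 0.55, "south": 0.45},
    "north": {"<|endofbelief|>": 0.9, "hotel": 0.1},
}


def toy_model(tokens: Sequence[str]) -> Dict[str, float]:
    return TOY_TABLE[tokens[-1]]
```

Because the argmax is deterministic, the same prefix always produces the same belief state. Sampling from the `area` distribution in the toy table would instead return `south` 45% of the time, which illustrates why sampling is less reliable when an exact value is required.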
Full Dialogues, Multiple Turns, and Long Contexts
In further analysis, we found that SimpleTOD accurately tracks dialogue state over multiple turns and long contexts. In some cases, earlier belief state errors are remedied later on when additional turns provide increased context. Examples of full dialogues and those with many turns or especially long context can be found in Supplementary Materials, but we do not consider this further analysis as a primary contribution listed for the work.
We explored a simple approach to task-oriented dialogue (SimpleTOD) that uses a single, causal language model. To do this, during training we treat all inputs for dialogue state tracking, action and response generation as a single sequence to the model. SimpleTOD can then directly leverage pre-trained models like GPT-2 to transfer language understanding from open-domain settings where data is more readily available. Empirical results on the multi-domain dialogue dataset (MultiWOZ) showed that the proposed approach outperformed all prior methods in dialogue state tracking as well as in action and response generation in the end-to-end setting. We found that the pre-trained weights were essential, but to leverage these weights fully we had to guide the system with special tokens that mark user and system responses as well as different portions of the sequence related to different sub-tasks. We found that SimpleTOD was effective at tracking dialogue state over long context with many turns and required no more than greedy decoding to achieve new state-of-the-art results despite noisy annotations. We hope that these results and the code, models, and discovered noisy annotations (https://github.com/salesforce/simpletod) will encourage further exploration of simple, unified approaches for dialogue systems.

Broader Impact
This work may have implications for the simplification of conversational agents. In the narrow sense, this work addresses task-oriented dialogue, but similar results might also hold for open-domain conversational systems. If so, the improvement of these systems and easier deployment would amplify both the positive and negative aspects of conversational AI. Positively, conversational agents might play a role in automating predictable communications, thereby increasing efficiency in areas of society that currently lose time navigating the multitude of APIs, webpages, and telephonic systems that are used to achieve goals. Negatively, putting conversational agents at the forefront might dehumanize communication that can be automated and might lead to frustration where human agents could provide more efficient solutions, for example when predicted solutions do not apply. These consequences are not specific to this work, but should be considered by the field of conversational AI more broadly.
References [1] D. Adiwardana, M.-T. Luong, D. R. So, J. Hall, N. Fiedel, R. Thoppilan, Z. Yang, A. Kulshreshtha,G. Nemade, Y. Lu, et al. Towards a human-like open-domain chatbot. arXiv preprint arXiv:2001.09977 ,2020.[2] J. Ba, R. Kiros, and G. E. Hinton. Layer normalization.
CoRR , abs/1607.06450, 2016.[3] S. Bao, H. He, F. Wang, and H. Wu. Plato: Pre-trained dialogue generation model with discrete latentvariable. arXiv preprint arXiv:1910.07931 , 2019.[4] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin. A neural probabilistic language model.
Journal ofmachine learning research , 3(Feb):1137–1155, 2003.[5] A. Bordes, Y.-L. Boureau, and J. Weston. Learning end-to-end goal-oriented dialog. In
InternationalConference on Learning Representations , 2017.[6] P. Budzianowski and I. Vuli´c. Hello, it’s gpt-2–how can i help you? towards the use of pretrained languagemodels for task-oriented dialogue systems. arXiv preprint arXiv:1907.05774 , 2019.[7] P. Budzianowski, I. Casanueva, B.-H. Tseng, and M. Gasic. Towards end-to-end multi-domain dialoguemodelling. 2018.[8] L. Chen, B. Lv, C. Wang, S. Zhu, B. Tan, and K. Yu. Schema-guided multi-domain dialogue state trackingwith graph attention neural networks. 2020.[9] W. Chen, J. Chen, P. Qin, X. Yan, and W. Y. Wang. Semantically conditioned dialog response generationvia hierarchical disentangled self-attention. arXiv preprint arXiv:1905.12866 , 2019.[10] R. Child, S. Gray, A. Radford, and I. Sutskever. Generating long sequences with sparse transformers. arXivpreprint arXiv:1904.10509 , 2019.[11] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. Bert: Pre-training of deep bidirectional transformersfor language understanding. arXiv preprint arXiv:1810.04805 , 2018.[12] L. Dong, N. Yang, W. Wang, F. Wei, X. Liu, Y. Wang, J. Gao, M. Zhou, and H.-W. Hon. Unified languagemodel pre-training for natural language understanding and generation. In
Advances in Neural InformationProcessing Systems , pages 13042–13054, 2019.[13] M. Eric and C. D. Manning. Key-value retrieval networks for task-oriented dialogue. arXiv preprintarXiv:1705.05414 , 2017.[14] M. Eric, R. Goel, S. Paul, A. Sethi, S. Agarwal, S. Gao, and D. Hakkani-Tur. Multiwoz 2.1: Multi-domaindialogue state corrections and state tracking baselines. arXiv preprint arXiv:1907.01669 , 2019.[15] J. Gao, M. Galley, L. Li, et al. Neural approaches to conversational ai.
Foundations and Trends R (cid:13) inInformation Retrieval , 13(2-3):127–298, 2019.[16] S. Gao, Y. Zhang, Z. Ou, and Z. Yu. Paraphrase augmented task-oriented dialog generation. arXiv preprintarXiv:2004.07462 , 2020.[17] D. Ham, J.-G. Lee, Y. Jang, and K.-E. Kim. End-to-end neural pipeline for goal-oriented dialogue systemusing gpt-2. ACL, 2020.
18] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In
Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
[19] M. Heck, C. van Niekerk, N. Lubis, C. Geishauser, H.-C. Lin, M. Moresi, and M. Gašić. Trippy: A triple copy strategy for value independent neural dialog state tracking. arXiv preprint arXiv:2005.02877, 2020.
[20] M. Henderson, B. Thomson, and S. Young. Deep neural network approach for the dialog state tracking challenge. In Proceedings of the SIGDIAL 2013 Conference, 2013.
[21] M. Henderson, I. Casanueva, N. Mrkšić, P.-H. Su, I. Vulić, et al. Convert: Efficient and accurate conversational representations from transformers. arXiv preprint arXiv:1911.03688, 2019.
[22] A. Holtzman, J. Buys, L. Du, M. Forbes, and Y. Choi. The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751, 2019.
[23] N. S. Keskar, B. McCann, L. R. Varshney, C. Xiong, and R. Socher. Ctrl: A conditional transformer language model for controllable generation. arXiv preprint arXiv:1909.05858, 2019.
[24] W. Lei, X. Jin, M.-Y. Kan, Z. Ren, X. He, and D. Yin. Sequicity: Simplifying task-oriented dialogue systems with single sequence-to-sequence architectures. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018.
[25] W. Liang, Y. Tian, C. Chen, and Z. Yu. Moss: End-to-end dialog system framework with modular supervision. arXiv preprint arXiv:1909.05528, 2019.
[26] B. Liu and I. Lane. Attention-based recurrent neural network models for joint intent detection and slot filling. In INTERSPEECH, 2016.
[27] B. Liu and I. Lane. End-to-end learning of task-oriented dialogs. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 67–73, 2018.
[28] B. Liu, G. Tür, D. Hakkani-Tür, P. Shah, and L. Heck. Dialogue learning with human teaching and feedback in end-to-end trainable task-oriented dialogue systems. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018.
[29] A. Madotto, C.-S. Wu, and P. Fung. Mem2seq: Effectively incorporating knowledge bases into end-to-end task-oriented dialog systems. arXiv preprint arXiv:1804.08217, 2018.
[30] B. McCann, J. Bradbury, C. Xiong, and R. Socher. Learned in translation: Contextualized word vectors. In Advances in Neural Information Processing Systems, pages 6294–6305, 2017.
[31] S. Mehri, T. Srinivasan, and M. Eskenazi. Structured fusion networks for dialog. arXiv preprint arXiv:1907.10016, 2019.
[32] N. Mrkšić, D. Ó Séaghdha, T.-H. Wen, B. Thomson, and S. Young. Neural belief tracker: Data-driven dialogue state tracking. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017.
[33] A. Neelakantan, S. Yavuz, S. Narang, V. Prasad, B. Goodrich, D. Duckworth, C. Sankar, and X. Yan. Neural assistant: Joint action prediction, response generation, and latent knowledge reasoning. In NeurIPS 2019 Conversational AI Workshop, 2019.
[34] E. Nouri and E. Hosseini-Asl. Toward scalable neural dialogue state tracking model. In NeurIPS 2018 Conversational AI Workshop, 2018.
[35] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. Bleu: A method for automatic evaluation of machine translation. In ACL, 2002.
[36] B. Peng, C. Zhu, C. Li, X. Li, J. Li, M. Zeng, and J. Gao. Few-shot natural language generation for task-oriented dialog, 2020.
[37] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer. Deep contextualized word representations. arXiv preprint arXiv:1802.05365, 2018.
[38] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9, 2019.
[39] P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang. Squad: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250, 2016.
[40] A. Rastogi, D. Hakkani-Tur, and L. Heck. Scalable multi-domain dialogue state tracking. In Proceedings of IEEE ASRU, 2017.
[41] V. Sanh, L. Debut, J. Chaumond, and T. Wolf. Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108, 2019.
[42] R. Sennrich, B. Haddow, and A. Birch. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany, Aug. 2016. Association for Computational Linguistics. doi: 10.18653/v1/P16-1162.
[43] L. Shu, P. Molino, M. Namazifar, B. Liu, H. Xu, H. Zheng, and G. Tur. Incorporating the structure of the belief state in end-to-end task-oriented dialogue systems. In NeurIPS 2018 Conversational AI Workshop, 2018.
[44] R. W. Smith and D. R. Hipp. Spoken natural language dialog systems: A practical approach. Oxford University Press on Demand, 1994.
[45] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008, 2017.
[46] A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman. Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461, 2018.
[47] T.-H. Wen, M. Gašić, N. Mrkšić, P.-H. Su, D. Vandyke, and S. Young. Semantically conditioned LSTM-based natural language generation for spoken dialogue systems. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015.
[48] T.-H. Wen, D. Vandyke, N. Mrksic, M. Gasic, L. M. Rojas-Barahona, P.-H. Su, S. Ultes, and S. Young. A network-based end-to-end trainable task-oriented dialogue system. arXiv preprint arXiv:1604.04562, 2016.
[49] T.-H. Wen, Y. Miao, P. Blunsom, and S. Young. Latent intention dialogue models. In Proceedings of the 34th International Conference on Machine Learning, 2017.
[50] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, et al. Transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771, 2019.
[51] C.-S. Wu, A. Madotto, E. Hosseini-Asl, C. Xiong, R. Socher, and P. Fung. Transferable multi-domain state generator for task-oriented dialogue systems. arXiv preprint arXiv:1905.08743, 2019.
[52] C.-S. Wu, R. Socher, and C. Xiong. Global-to-local memory pointer networks for task-oriented dialogue. arXiv preprint arXiv:1901.04713, 2019.
[53] C.-S. Wu, S. Hoi, R. Socher, and C. Xiong. Tod-bert: Pre-trained natural language understanding for task-oriented dialogues. arXiv preprint arXiv:2004.06871, 2020.
[54] Q. Wu, Y. Zhang, Y. Li, and Z. Yu. Alternating recurrent dialog model with large-scale pre-trained language models. arXiv preprint arXiv:1910.03756, 2019.
[55] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le. Xlnet: Generalized autoregressive pretraining for language understanding. In Advances in neural information processing systems, pages 5754–5764, 2019.
[56] S. Young, M. Gašić, B. Thomson, and J. D. Williams. Pomdp-based statistical spoken dialog systems: A review. Proceedings of the IEEE, 101(5):1160–1179, 2013.
[57] J.-G. Zhang, K. Hashimoto, C.-S. Wu, Y. Wan, P. S. Yu, R. Socher, and C. Xiong. Find or classify? dual strategy for slot-value predictions on multi-domain dialog state tracking. arXiv preprint arXiv:1910.03544, 2019.
[58] Y. Zhang, Z. Ou, and Z. Yu. Task-oriented dialog systems that consider multiple appropriate responses under the same context. arXiv preprint arXiv:1911.10484, 2019.
[59] Y. Zhang, S. Sun, M. Galley, Y.-C. Chen, C. Brockett, X. Gao, J. Gao, J. Liu, and B. Dolan. Dialogpt: Large-scale generative pre-training for conversational response generation. arXiv preprint arXiv:1911.00536, 2019.
[60] T. Zhao, A. Lu, K. Lee, and M. Eskenazi. Generative encoder-decoder models for task-oriented spoken dialog systems with chatting capability. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, 2017.
[61] T. Zhao, K. Xie, and M. Eskenazi. Rethinking action spaces for reinforcement learning in end-to-end dialog agents with latent variable models. arXiv preprint arXiv:1902.08858, 2019.
[62] L. Zhou and K. Small. Multi-domain dialogue state tracking as dynamic knowledge graph enhanced question answering. arXiv preprint arXiv:1911.06192, 2019.

Input Representation and Method Overview
As described in Section 3, a single training sequence consists of the concatenation of context C_t, belief states B_t, database search results D_t, action decisions A_t, and system response S_t. A schematic overview of each segment is shown in Table 6, together with the special tokens marking transition points. SimpleTOD is optimized by minimizing the negative log-likelihood over the joint sequence x_t = [C_t; B_t; D_t; A_t; S_t]. The output state associated with each input token is used to predict the next token; see Figure 2a.

During inference, SimpleTOD generates this sequence token by token, but generation pauses once the belief state has been generated so that it can be used to query a database. The outputs of the database are summarized and concatenated to the end of the input sequence, and generation resumes token by token. This results in a delexicalized response; see Figure 2b. This response can then be lexicalized by replacing slots and values with information from the database results. This process is described more formally in the equations of Section 6.

Context       [context] [user] user input [system] system response . . . [user] user input [endofcontext]
Belief State  [belief] domain slot_name value, domain slot_name value, . . . [endofbelief]
DB Search     [db] [endofdb]
Action        [action] domain action_type slot_name, domain action_type slot_name, . . . [endofaction]
Response      [response] system delexicalized response [endofresponse]

Table 6: A schematic representation of the different components of inputs/outputs in task-oriented dialogue. When training SimpleTOD, these are concatenated together into a single sequence.

[Figure 2 panels. (a) Training: the input is a single sequence, and the output state for each token predicts the next token. (b) Inference: generate one token at a time, adding generated tokens to the input sequence for the next step of generation; query the database with the generated belief state; continue generating until the response is finished.]

Figure 2: SimpleTOD is a simple approach to task-oriented dialogue that approaches all of task-oriented dialogue as a single sequence generation problem, querying a database for necessary information.
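The concatenation of segments and the two-stage generation described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`build_context`, `build_training_sequence`, `two_stage_inference`), the `generate(prefix, stop_token)` callable standing in for language-model decoding, the `query_db` callable, and the database summary string are all hypothetical; only the bracketed segment markers are taken from Table 6.

```python
from typing import Callable, List, Tuple


def build_context(turns: List[Tuple[str, str]]) -> str:
    """Wrap (speaker, utterance) turns in the context markers of Table 6."""
    body = " ".join(f"[{speaker}] {text}" for speaker, text in turns)
    return f"[context] {body} [endofcontext]"


def build_training_sequence(context: str, belief: str, db: str,
                            action: str, response: str) -> str:
    """Concatenate x_t = [C_t; B_t; D_t; A_t; S_t] into one training string."""
    return " ".join([
        context,
        f"[belief] {belief} [endofbelief]",
        f"[db] {db} [endofdb]",
        f"[action] {action} [endofaction]",
        f"[response] {response} [endofresponse]",
    ])


def two_stage_inference(context: str,
                        generate: Callable[[str, str], str],
                        query_db: Callable[[str], str]) -> str:
    """Generate the belief state, query the database, then resume generation.

    `generate(prefix, stop_token)` is a hypothetical decoding function that
    returns `prefix` extended with generated text ending in `stop_token`.
    """
    # Stage 1: decode until the belief state is complete.
    prefix = generate(context, "[endofbelief]")
    belief = prefix.split("[belief]", 1)[1].split("[endofbelief]", 1)[0].strip()
    # Summarize the database results and append them to the input sequence.
    prefix = f"{prefix} [db] {query_db(belief)} [endofdb]"
    # Stage 2: decode the actions and the delexicalized response.
    return generate(prefix, "[endofresponse]")


def toy_generate(prefix: str, stop_token: str) -> str:
    """Scripted stand-in for a language model, used only for this demo."""
    if stop_token == "[endofbelief]":
        return prefix + " [belief] hotel pricerange cheap [endofbelief]"
    return (prefix + " [action] hotel request area [endofaction]"
            " [response] which area would you like ? [endofresponse]")


context = build_context([("user", "i am looking for a cheap hotel .")])
# The summary string returned by the lambda is a made-up placeholder.
output = two_stage_inference(context, toy_generate, lambda belief: "hotel 3 matches")
print(output)
```

With a real model, `generate` would wrap token-by-token decoding with a stop condition on the given special token; the scripted version here only shows where the database call sits between the two decoding stages.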
SimpleTOD with Oracle information
This section reports the performance of SimpleTOD for action and response generation in the presence of different oracle information, i.e. oracle belief and oracle action. These settings are not end-to-end as in the main text, and SimpleTOD is designed to be end-to-end. We report results in these settings for a complete understanding of SimpleTOD, but we note that only the end-to-end settings in the main text evaluate the full system together. In these oracle settings, other methods can outperform SimpleTOD, but this simply highlights the importance of end-to-end evaluation: there is a disconnect between performance with oracle information and performance without it. In practical use, oracle information is not available, and that is where SimpleTOD excels.

We report results in Table 7 for two different settings regularly employed in the literature. These settings are determined by how much oracle information is used. The first setting uses oracle belief states and oracle actions. The second uses oracle belief states, but requires the system to generate its own actions.

Note that all prior works use oracle DB Search results as supervision during training and as input during inference in all these settings. We include directly comparable experiments using oracle DB Search results for all settings. We also include experiments that completely ignore the DB Search results in all settings to show the surprising effectiveness of SimpleTOD without DB Search results.

The evaluation results on MultiWOZ 2.1, shown in Table 8, follow the same patterns as discussed in section 4.2. We provide these results for future comparisons on the improved version of the dataset.

Model              Belief State  DB Search  Action  Inform  Success  BLEU  Combined
DAMD+augmentation  oracle        oracle     oracle

Table 7: SimpleTOD results on MultiWOZ 2.0 using oracle information.

Belief State  DB Search  Action     Inform  Success  BLEU   Combined
oracle        oracle     oracle     92.8    84.5     18.9   107.55
oracle        -          oracle     92.6    86.1     17.67  107.2
oracle        oracle     generated  85.1    73.5     16.22  95.52
oracle        -          generated  89.6    68.6     15.46  94.56

Table 8: SimpleTOD results on MultiWOZ 2.1 using oracle information.
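The Combined column can be related to the other three metrics with a short calculation. The sketch below assumes the definition commonly used for MultiWOZ, Combined = (Inform + Success) × 0.5 + BLEU; this formula is not stated in this section, so it should be read as an assumption, and the function name `combined_score` is hypothetical. Under this assumption the first, third, and fourth rows of Table 8 are reproduced, while the second row evaluates to 107.02 rather than the printed 107.2.

```python
def combined_score(inform: float, success: float, bleu: float) -> float:
    """Assumed MultiWOZ combined score: mean of Inform and Success, plus BLEU."""
    return 0.5 * (inform + success) + bleu


# Rows of Table 8: (belief, db, action, inform, success, bleu, reported combined).
table8 = [
    ("oracle", "oracle", "oracle", 92.8, 84.5, 18.9, 107.55),
    ("oracle", "-", "oracle", 92.6, 86.1, 17.67, 107.2),
    ("oracle", "oracle", "generated", 85.1, 73.5, 16.22, 95.52),
    ("oracle", "-", "generated", 89.6, 68.6, 15.46, 94.56),
]

for belief, db, action, inform, success, bleu, reported in table8:
    computed = combined_score(inform, success, bleu)
    print(f"{belief:7s} {db:7s} {action:10s} computed={computed:.2f} reported={reported}")
```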
Dialogue State Tracking Analysis
This section provides more detailed analysis of Dialogue State Tracking (DST) task performance by SimpleTOD,as mentioned in section 5.
Understanding Long, Multi-domain Context
Table 9 illustrates the DST performance of SimpleTOD on a long, multi-domain context. The example also shows that SimpleTOD tracks dialogue states in which slots across domains are related. For example, taxi departure and taxi destination should be inferred from attraction name and hotel name.

Context (’MUL1015’, turn 10) <|context|> <|user|> i am looking for a pool somewhere in the south of cambridge . <|system|> i am sorry , but it does not look like we have a pool that matches your criteria . <|user|> how about some type of theater ? <|system|> the junction theatre is in the south area . would you like the address ? <|user|> yes , please provide the address and postcode as well . <|system|> the junction theatre is located on clifton way at postcode cb17gx . is there anything else i can assist you with ? <|user|> thanks . yes , i also need to find a guesthouse for my stay there . <|system|> are you needing the guesthouse in the south , also ? do you have a preference in price range ? <|user|> the area should be the centre . i need it to be cheap and have free parking . <|system|> i have 2 guesthouses that fit your criteria . 1 is the alexander bed and breakfast and the other is el shaddai . <|user|> what is the star rating -s for them ? <|system|> the alexander bed and breakfast has a 4 star rating while the el shaddai has 0 stars . would you like to book a room with either of these ? <|user|> ok great , thank you . i also need to book a taxi to get back and forth . <|system|> where did you wan to leave from and go to ? <|user|> i’ll be leaving the theatre at 14:45 and going to the guesthouse . <|system|> let s back up a moment . which lodgings did you prefer , so we know where the taxi will be taking you ? <|user|> let s go with the alexander bed and breakfast . <|system|> your booking is complete , a grey volkswagen will be picking you up , contact number 07753429052 . can i help with anything else ? <|user|> cool beans ! sorry i was so much trouble . thanks so much for all of your help , you need a raise dealing with customers like me ! <|endofcontext|>
Target <|belief|> attraction area south, attraction name the junction, attraction type theatre, hotel area centre, hotel name alexander bed and breakfast, hotel parking yes, hotel pricerange cheap, hotel type guesthouse, taxi departure the junction, taxi destination alexander bed and breakfast, taxi leaveat 14:45 <|endofbelief|> <|action|> general bye none <|endofaction|> <|response|> it was no trouble at all ! i hope you have a fantastic trip and enjoy your stay ! <|endofresponse|>
SimpleTOD <|belief|> attraction area south, attraction name the junction, attraction type theatre, hotel area centre, hotel name alexander bed and breakfast, hotel parking yes, hotel pricerange cheap, hotel type guesthouse, taxi departure the junction, taxi destination alexander bed and breakfast, taxi leaveat 14:45 <|endofbelief|> <|action|> general bye none <|endofaction|> <|response|> thank you for using our service <|endofresponse|>

Table 9: Example of multi-domain dialogue state tracking with a long context spanning 3 domains, i.e. attraction, hotel, and taxi. The belief states for taxi departure and destination (highlighted) should be inferred from the hotel and attraction domains.
Regarding Noisy Annotations
Concrete examples of the four types of noisy-labeled annotation defined in section 5 are shown in Tables 10, 11, 12, 13 and 14. The results indicate that SimpleTOD is robust to noisy annotation and can often generate the correct belief state in situations where the annotation is incorrect. As mentioned in section 4.1, the list of noisy annotations is released with this paper.
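Comparing a target belief state with a generated one, as the tables in this section do by eye, can be sketched in a few lines. The code below is an illustration only: the function names `parse_belief` and `diff_beliefs` are hypothetical, and the parsing rule (first token is the domain, a `book` token is treated as part of the slot name, the remainder is the value) is an assumption inferred from the examples rather than a documented format. The two example pairs are copied from Table 11 (’MUL0088’, turn 2) and Table 10 (’MUL1575’, turn 3).

```python
import re
from typing import Dict, Tuple

Slot = Tuple[str, str]  # (domain, slot name)


def parse_belief(segment: str) -> Dict[Slot, str]:
    """Parse '<|belief|> domain slot value, ... <|endofbelief|>' into a dict."""
    match = re.search(r"<\|belief\|>(.*?)<\|endofbelief\|>", segment, flags=re.S)
    body = match.group(1) if match else segment
    state: Dict[Slot, str] = {}
    for item in body.split(","):
        tokens = item.split()
        if len(tokens) < 3:
            continue  # skip empty or malformed entries
        domain = tokens[0]
        # Assumption: booking slots are written as 'book <name>'.
        if tokens[1] == "book" and len(tokens) >= 4:
            slot, value = "book " + tokens[2], " ".join(tokens[3:])
        else:
            slot, value = tokens[1], " ".join(tokens[2:])
        state[(domain, slot)] = value
    return state


def diff_beliefs(target: str, predicted: str) -> dict:
    """Report slots missing from, added to, or changed in the prediction."""
    t, p = parse_belief(target), parse_belief(predicted)
    return {
        "missing": {k: t[k] for k in t.keys() - p.keys()},
        "extra": {k: p[k] for k in p.keys() - t.keys()},
        "changed": {k: (t[k], p[k]) for k in t.keys() & p.keys() if t[k] != p[k]},
    }


unlabeled = diff_beliefs(
    "<|belief|> hotel parking yes, hotel pricerange cheap <|endofbelief|>",
    "<|belief|> hotel area centre, hotel parking yes, hotel pricerange cheap, "
    "hotel stars 3 <|endofbelief|>",
)
ambiguous = diff_beliefs(
    "<|belief|> restaurant book day friday, restaurant book people 8, "
    "restaurant book time 10:15, restaurant name restaurant 2 two <|endofbelief|>",
    "<|belief|> restaurant book day friday, restaurant book people 8, "
    "restaurant book time 11:15, restaurant name restaurant 2 two <|endofbelief|>",
)
print(unlabeled)
print(ambiguous)
```

A turn counts as a joint-goal match only when all three dictionaries are empty, which is why a single unlabeled or ambiguous slot is enough to mark an otherwise reasonable prediction as wrong.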
Full Dialogue Example
Table 15 and Table 16 include several turns as part of a full dialogue for delexicalized and lexicalized outputs. These are from our strongest SimpleTOD model in the setting that requires it to generate all outputs, which means that it ignores all DB Search results.

Context (’SNG01608’, turn 2) <|context|> <|user|> are there any portuguese restaurant -s in cambridge ? <|system|> yes there is a portuguese restaurant in cambridge with 2 different location -s , would you like the addresses ? <|user|> if 1 of them has a moderate price range please give me that address . if not tell me about turkish restaurant -s instead . <|endofcontext|>
Target <|belief|> restaurant food turkish, restaurant pricerange moderate <|endofbelief|>
SimpleTOD <|belief|> restaurant food portugese, restaurant pricerange moderate<|endofbelief|>
Context (’MUL1575’, turn 3) <|context|> <|user|> i would like to find a particular restaurant in cambridge . the name of the restaurant is restaurant 2 two . could you give me the location ? <|system|> restaurant 2 two is nice french restaurant located at 22 chesterton road chesterton . would like me to book you a table ? <|user|> that would be great . i need it for 8 on friday . <|system|> do you have a time preference ? <|user|> yes at 11:15 if that s not available i can do 10:15 <|endofcontext|>
Target <|belief|> restaurant book day friday, restaurant book people 8, restaurant book time 10:15, restaurant name restaurant 2 two <|endofbelief|>
SimpleTOD <|belief|> restaurant book day friday, restaurant book people 8, restaurant book time 11:15, restaurant name restaurant 2 two <|endofbelief|>
Context (’PMUL1194’, turn 6) <|context|> <|user|> hello , i need information about old schools . <|system|> there are 5 local colleges , do you have a particular area you are interested in ? <|user|> i do not have an area preference . <|system|> old schools is an architectural attraction in the center of town . they are located on trinity lane , in postcode cb21tt . <|user|> what is the phone number ? <|system|> the phone number is 01223332320 . do you need any more information ? <|user|> yes , i am looking for a train that leaves on wednesday for cambridge <|system|> i have over 200 entries . let s try to narrow that down . what time would you like to arrive ? <|user|> monday at 11:00 for 5 people please <|system|> just to confirm , you want to leave on monday , or wednesday ? and where are you departing from ? <|user|> leaving ely anytime after 15:15 and i need to know how long the ride is and the specific arrival time please . <|endofcontext|>
Target <|belief|> attraction name old schools, train book people 5, train day monday, train departure ely, train destination cambridge, train leaveat 15:15 <|endofbelief|>
SimpleTOD <|belief|> attraction name old schools, train book people 5, train day wednesday, train departure ely, train destination cambridge, train leaveat 15:15 <|endofbelief|>
Context (’MUL1024’, turn 7) <|context|> <|user|> i am looking for a theatre to visit in the centre of cambridge . can you help me with this ? <|system|> i can recommend the adc theatre on park street or the cambridge arts theatre located at 6 st edward s passage <|user|> are either in the city center ? <|system|> both are located in the city centre . <|user|> can you give me the address of the adc theatre please ? <|system|> the address is park street . <|user|> what films are playing today ? <|system|> i am sorry . i do not have that information . can i help you with anything else ? <|user|> i am looking for a particular hotel called avalon . <|system|> avalon is located at 62 gilbert road . would you like me to book a reservation ? <|user|> great , can i book that for 6 people for 3 nights starting on tuesday . <|system|> i am sorry , the booking was unsuccessful . would you like to try for another date or a shorter stay ? <|user|> would you be interested in staying at the avalon for 1 night ? if not we can find another hotel available for a 3 night stay for 6 guests . <|endofcontext|>
Target <|belief|> attraction area centre, attraction name adc theatre, attraction type theatre, hotel book day tuesday, hotel book people 6, hotel book stay 3, hotel name avalon <|endofbelief|>
SimpleTOD <|belief|> attraction area centre, attraction name adc theatre, attraction type theatre, hotel book day tuesday, hotel book people 6, hotel book stay 1, hotel name avalon <|endofbelief|>

Table 10: Examples of Type 1 noisy-labeled annotation: the context lacks enough information to infer the true belief state.

Context (’MUL0088’, turn 2) <|context|> <|user|> i am looking for a cheap hotel with free parking near cambridge . <|system|> i have multiple cheap hotel -s with free parking . what part of town are you interested in staying in ? <|user|> i would like to stay close to the center area , but the hotel should be 3 star . <|endofcontext|>
Target <|belief|> hotel parking yes, hotel pricerange cheap <|endofbelief|>
SimpleTOD <|belief|> hotel area centre, hotel parking yes, hotel pricerange cheap, hotel stars 3 <|endofbelief|>
Context (’PMUL2437’, turn 3) <|context|> <|user|> i want to find a moderate -ly priced restaurant . <|system|> i have many options available for you ! is there a certain area or cuisine that interests you ? <|user|> yes i would like the restaurant to be located in the center of the attractions . <|system|> there are 21 restaurant -s available in the centre of town . how about a specific type of cuisine ? <|user|> i need to know the food type and postcode and it should also have mutliple sports <|endofcontext|>
Target <|belief|> restaurant area centre, restaurant pricerange moderate <|endofbelief|>
SimpleTOD <|belief|> attraction type multiple sports, restaurant area centre, restaurant pricerange moderate <|endofbelief|>
Context (’MUL1060’, turn 4) <|context|> <|user|> hello , i would like to find a hotel that include -s free parking . <|system|> most of the hotel -s in town offer free parking . is there a certain area you would like to stay in , or do you have a price range in mind ? <|user|> yes . the centre would be nice and also free wifi . <|system|> the university arms is an expensive , 4 star hotel with free wifi . comparatively , the alexander bed and breakfast is a cheap -ly priced guesthouse , also 4 stars . <|user|> please book me some rooms for the university arms to accommodate 8 people for 3 nights starting on wednesday . can you also provide me the reference number after you book ? <|system|> your reference number is x5ny66zv . <|user|> thank you . can you please help me find a place to go in town in the same area as the hotel ? preferably a college . <|endofcontext|>
Target <|belief|> attraction area centre, attraction name college, hotel area centre, hotel book day wednesday, hotel book people 8, hotel book stay 3, hotel name university arms hotel, hotel parking yes <|endofbelief|>
SimpleTOD <|belief|> attraction area centre, attraction type college, hotel area centre, hotel book day wednesday, hotel book people 8, hotel book stay 3, hotel internet yes, hotel name university arms hotel, hotel parking yes <|endofbelief|>
Context (’MUL1642’, turn 5) <|context|> <|user|> hello , i am trying to find a train that goes from cambridge to london kings cross . can you help me book a ticket ? <|system|> i can help with that . can you tell me what day you will be traveling ? <|user|> i need to leave on saturday after 18:45 . <|system|> the soonest departure time would be at 19:00 on saturday , is that okay ? <|user|> yes , that s perfect . can you book that for 8 people ? <|system|> you are all booked with reference number 144vdbrm . the cost of 151.04 gbp will be payable at the station . can i be of further assistance today ? <|user|> i am looking for an expensive place to eat in the centre , what is there that fits that criteria ? <|system|> there 33 place -s that fit your criteria . do you have a particular cuisine type in mind so that i can narrow the results down ? <|user|> it does not matter what kind of food . what would you recommend for a large group of 8 people ? <|endofcontext|>
Target <|belief|> restaurant area centre, restaurant pricerange expensive, train book people 8, train day saturday, train departure cambridge, train destination london kings cross, train leaveat 18:45 <|endofbelief|>
SimpleTOD <|belief|> restaurant area centre, restaurant book people 8, restaurant pricerange expensive, train book people 8, train day saturday, train departure cambridge, train destination london kings cross, train leaveat 18:45 <|endofbelief|>
Context (’MUL0088’, turn 7) <|context|> <|user|> i am looking for a cheap hotel with free parking near cambridge . <|system|> i have multiple cheap hotel -s with free parking . what part of town are you interested in staying in ? <|user|> i would like to stay close to the center area , but the hotel should be 3 star . <|system|> we do not have any hotel -s that match your search . do you want to try something else ? <|user|> are there any moderate 3 star hotel -s with free parking ? <|system|> we do not have any hotel -s that match your search . do you want to try something else ? <|user|> how about 1 in the moderate price range ? <|system|> we have 3 entries that match your preferences . would you prefer north , south , or west ? <|user|> i do not have a preference but i would also like ot find a restaurant called the cow pizza kitchen and bar . <|system|> before i book your restaurant would you like to book your lodging ? i think you will like hamilton lodge . it meets your needs . <|user|> does it have internet ? <|system|> yes , the hamilton lodge has internet . <|user|> sounds great . what is the address and contact information ? <|endofcontext|>
Target <|belief|> hotel internet yes, hotel parking yes, hotel pricerange moderate, hotel stars 3, restaurant name cow pizza kitchen and bar <|endofbelief|>
SimpleTOD <|belief|> hotel internet yes, hotel name hamilton lodge, hotel parking yes, hotel pricerange moderate, hotel stars 3, restaurant name cow pizza kitchen and bar <|endofbelief|>

Table 11: Examples of Type 2 noisy-labeled annotation: the belief state is not labeled, while the context contains the information.

Context (’MUL1926’, turn 1) <|context|> <|user|> yes hello ! i am looking for a train leaving on tuesday from norwich to cambridge . <|endofcontext|>
Target <|belief|> train departure norwich, train destination cambridge <|endofbelief|>
SimpleTOD <|belief|> train day tuesday, train departure norwich, train destination cambridge <|endofbelief|>
Context (’MUL1650’, turn 3) <|context|> <|user|> i am looking for a chinese restaurant in the centre of cambridge . <|system|> there are quite a few chinese restaurant -s in the area ! do you have a price range you would like to stay within ? <|user|> no , anything is fine . can you suggest a great 1 ? <|system|> jinling noodle bar looks like a good place . <|user|> can i have their phone number and address ? also , are there any trains that go from peterborough to cambridge ? <|endofcontext|>
Target <|belief|> restaurant area centre, restaurant food chinese, restaurant name jinling noodle bar <|endofbelief|>
SimpleTOD <|belief|> restaurant area centre, restaurant food chinese, restaurant name jinling noodle bar, train departure peterborough, train destination cambridge <|endofbelief|>
Context (’MUL2376’, turn 3) <|context|> <|user|> where would i find the museum of archaeology and anthropology ? <|system|> the museum of archaelogy and anthropology is located at university of cambridge , downing street . their phone number is 01223333516 . do you need anything else ? <|user|> and the attraction type is a museum , right ? <|system|> yes , it is a museum . <|user|> sounds great . thank you for the info ! <|endofcontext|>
Target <|belief|> attraction name museum of archaelogy and anthropology <|endofbelief|>
SimpleTOD <|belief|> attraction name museum of archaelogy and anthropology, attraction type museum <|endofbelief|>
Context (’MUL1076’, turn 9) <|context|> <|user|> yes i am looking for some hotel -s that include free parking and cheap rates in the cambridge area . <|system|> there are 10 hotel -s , do you have any other preferences to narrow it down ? <|user|> sure . i would like a guesthouse . <|system|> there are 9 options available . do you have a preference for how many stars or what area specifically ? <|user|> i do not care . please book 1 that you recommend . <|system|> ok , how about the allenbell ? to book a stay i’ll need to know the first day you want to stay , how many days , and how many guests . <|user|> i need it for 8 people , and for 4 nights starting saturday . <|system|> great , i have booked your stay for 8 people for 4 nights . your reference number is pb41wuot . may i help with anything else ? <|user|> i also would like to find a museum in the west . <|system|> there are several museums there , any preferences ? <|user|> i would like to find the museum of science , please . <|system|> i am sorry , but i am having trouble locating your requested attraction . can you please restate your request ? <|user|> is there a museum of science ? <|system|> i am not finding 1 , sorry . <|user|> are there any other museums in that area ? <|system|> yes there are 7 museums in the west area of town for your pleasure . is there anything else you would like to narrow them down with ? <|user|> the museum i want should be in the west side of town and can i have the postcode for the museum . <|endofcontext|>
Target <|belief|> attraction area west, attraction type museum, hotel book day saturday, hotel book people 8, hotel book stay 4, hotel parking yes, hotel pricerange cheap, hotel type guesthouse <|endofbelief|>
SimpleTOD <|belief|> attraction area west, attraction type museum, hotel book day saturday, hotel book people 8, hotel book stay 4, hotel name allenbell, hotel parking yes, hotel pricerange cheap, hotel type guesthouse <|endofbelief|>

Table 12: Examples of Type 2 noisy-labeled annotation: the belief state is not labeled, while the context contains the information.

Context (’SNG1036’, turn 1) <|context|> <|user|> i was not planning to stay in town tonight , but it looks like i am going to have to . can you help me find a pretty cheap room ? <|endofcontext|>
Target <|belief|> hotel pricerange cheap, hotel type hotel <|endofbelief|>
SimpleTOD <|belief|> hotel pricerange cheap <|endofbelief|>
Context (’PMUL1194’, turn 1) <|context|> <|user|> hello , i need information about old schools . <|endofcontext|>
Target <|belief|> attraction type college <|endofbelief|>
SimpleTOD <|belief|> attraction name old schools <|endofbelief|>
Context (’SNG0284’, turn 2) <|context|> <|user|> i want a train from london liverpool street arriving at 9:30 <|system|> which day will you be traveling ? <|user|> i want to leave on thursday .
Target <|belief|> train arriveby 09:30, train book people 2, train day thursday, train departure london liverpool street <|endofbelief|>
SimpleTOD <|belief|> train arriveby 09:30, train day thursday, train departure london liverpool street
Context (’PMUL0069’, turn 2) <|context|> <|user|> i am looking for some hungarian food restaurant -s near the centre , please . <|system|> i am sorry there are no hungarian restaurant -s near centre . <|user|> what kind of expensive restaurant -s are in the center of town ?
Target <|belief|> restaurant area centre, restaurant food hungarian, restaurant pricerange expensive <|endofbelief|>
SimpleTOD <|belief|> restaurant area centre, restaurant pricerange expensive <|endofbelief|>
Context (’PMUL3688’, turn 3) <|context|> <|user|> i am looking for a place to go in the centre of town . <|system|> is there any type of attraction you would like to see ? <|user|> any of your choice . get me the address and entrance fee <|system|> the holy trinity church is located on market street . the entrance fee is free . is there anything else i can help with ? <|user|> i also need a train out to the airport on friday . <|endofcontext|>
Target <|belief|> attraction area centre, train day friday, train destination stansted airport <|endofbelief|>
SimpleTOD <|belief|> attraction area centre, train day friday <|endofbelief|>

Table 13: Examples of Type 3 noisy-labeled annotation: the belief state is labeled, while the context lacks the information.

Context (’SNG02207’, turn 1) <|context|> <|user|> i need to book a tax departing from gandhi . <|endofcontext|>
Target <|belief|> taxi departure the gandhi <|endofbelief|>
SimpleTOD <|belief|> taxi departure gandhi <|endofbelief|>
Context (’MUL1811’, turn 2) <|context|> <|user|> i am looking for something to do in town . are there any interesting attraction in the north part of town ? <|system|> yes , there are several places: i see a boat , a park , and 2 swimming pools . would 1 of these work for you ? <|user|> ooh , a pool would be fun . can you give me their contact info ? <|endofcontext|>
Target <|belief|> attraction area north, attraction type pool <|endofbelief|>
SimpleTOD <|belief|> attraction area north, attraction type swimming pool <|endofbelief|>
Context (’PMUL0548’, turn 4) <|context|> <|user|> hello , i would like some information on a restaurant . <|system|> do you have a certain cuisine or area in mind ? <|user|> i want british food . in the east , please . <|system|> the graffton hotel restaurant is in the east and serves british food . it s location is 619 newmarket rd . fen ditton . <|user|> i will need it for saturday , 7 people at 17:45 please . <|system|> i have successfully booked a table for 7 on saturday at 17:45 for you . the table will be reserved for 15 minutes and your reference number is r8cv0hu9 . <|user|> i am also looking for a train on sunday . <|endofcontext|>
Target <|belief|> restaurant area east, restaurant book day saturday, restaurant book people 7, restaurant book time 17:45, restaurant food british, restaurant name grafton hotel restaurant, train day sunday <|endofbelief|>
SimpleTOD <|belief|> restaurant area east, restaurant book day saturday, restaurant book people 7, restaurant book time 17:45, restaurant food british, restaurant name graffton hotel restaurant, train day sunday <|endofbelief|>
Context (’PMUL0550’, turn 7) <|context|> <|user|> hey i am looking for a train from cambridge to bishops stortford . mind helping out ? <|system|> many trains are leaving on friday . mind choosing the time ? <|user|> i want to leave on monday and arrive by 18:45 . <|system|> i have train tr4283 that leaves cambridge at 5:29 and arrives in bishops stortford at 6:07 . would you like to make reservations ? <|user|> could you give me the travel time and price of that train please ? <|system|> the cost is 10.10 pounds . the duration of the trip is 38 minutes , leaving cambridge at 5:29 and arriving at 6:07 in bishops stortford . <|user|> i also want a cheap chinese restaurant . <|system|> there are 4 great restaurant -s to choose from . do you prefer 1 on the south side of town or in the centre ? <|user|> i would like to be in the centre please . <|system|> i recommend charlie chan . would you like to reserve a table ? <|user|> yes . monday , 8 people , 10:30 . <|system|> booking was successful . the table will be reserved for 15 minutes . reference number is : ifjmbkli . <|user|> thank you very much . that will be all for today . i appreciate all your help . <|endofcontext|>
Target <|belief|> restaurant area centre, restaurant book day monday, restaurant book people 8, restaurant book time 10:30, restaurant food chinese, restaurant name charlie, restaurant pricerange cheap, train arriveby 18:45, train day monday, train departure cambridge, train destination bishops stortford <|endofbelief|>
SimpleTOD <|belief|> restaurant area centre, restaurant book day monday, restaurant book people 8, restaurant book time 10:30, restaurant food chinese, restaurant name charlie chan, restaurant pricerange cheap, train arriveby 18:45, train day monday, train departure cambridge, train destination bishops stortford <|endofbelief|>
Table 14: Examples of Type 4 noisy-labeled annotation: the belief state value is misspelled.
Turn 1
Context <|context|> <|user|> hello , i need a train to ely on sunday please <|endofcontext|>
Target <|belief|> train destination ely <|endofbelief|> <|action|> train inform choice, train inform day, train inform dest, train request arrive <|endofaction|> <|response|> we have [value_count] trains traveling to [value_place] on [value_day] . what time would you like to arrive ? <|endofresponse|>
SimpleTOD <|belief|> train day sunday, train destination ely <|endofbelief|> <|action|> train inform choice, train inform day, train inform leave, train inform depart, train inform dest, train inform arrive, train inform id, train offerbook none <|endofaction|> <|response|> there are [value_count] trains leaving [value_place] on [value_day]. [train_id] leaves [value_place] at [value_time] and arrives in [value_place] at [value_time]. would you like to book a ticket? <|endofresponse|>
Turn 2
Context <|context|> <|user|> hello , i need a train to ely on sunday please <|system|> we have 10 trains traveling to ely on sunday . what time would you like to arrive ? <|user|> how about 1 arriving by 13:00 departing from cambridge . <|endofcontext|>
Target <|belief|> train destination ely, train day sunday, train arriveby 13:00, train departure cambridge <|endofbelief|> <|action|> train inform arrive, train inform day, train inform dest, train inform id, train offerbook none <|endofaction|> <|response|> [train_id] will get you to [value_place] by [value_time] on [value_day] . should i book you a ticket ? <|endofresponse|>
SimpleTOD <|belief|> train day sunday, train destination ely, train departure cambridge, train arriveby 13:00 <|endofbelief|> <|action|> train inform arrive, train inform leave, train inform id, train offerbook none <|endofaction|> <|response|> [train_id] leaves at [value_time] and arrives at [value_time]. would you like me to book it for you? <|endofresponse|>
Turn 3
Context <|context|> <|user|> hello , i need a train to ely on sunday please <|system|> we have 10 trains traveling to ely on sunday . what time would you like to arrive ? <|user|> how about 1 arriving by 13:00 departing from cambridge . <|system|> tr1159 will get you to ely by 12:07 on sunday . should i book you a ticket ? <|user|> yes , that would be great . i’ll need tickets for 8 people . do you have a reference number ? <|endofcontext|>
Target <|belief|> train destination ely, train day sunday, train arriveby 13:00, train departure cambridge, train book people 8 <|endofbelief|> <|action|> train offerbooked ref, train offerbooked ticket <|endofaction|> <|response|> booking was successful , the total fee is [value_price] gbp payable at the station . reference number is [train_reference] . <|endofresponse|>
SimpleTOD <|belief|> train book people 8, train arriveby 13:00, train day sunday, train departure cambridge, train destination ely <|endofbelief|> <|action|> general reqmore none, train offerbooked ref, train offerbooked ticket <|endofaction|> <|response|> booking was successful, the total fee is [value_price] gbp payable at the station. reference number is [train_reference]. is there anything else i can help you with? <|endofresponse|>
Turn 4
Context <|context|> <|user|> hello , i need a train to ely on sunday please <|system|> we have 10 trains traveling to ely on sunday . what time would you like to arrive ? <|user|> how about 1 arriving by 13:00 departing from cambridge . <|system|> tr1159 will get you to ely by 12:07 on sunday . should i book you a ticket ? <|user|> yes , that would be great . i’ll need tickets for 8 people . do you have a reference number ? <|system|> booking was successful , the total fee is 28.16 gbp payable at the station . reference number is : hdx8ix63 . <|user|> i am also looking for a museum . can you recommend 1 ? <|endofcontext|>
Target <|belief|> train book people 8, train arriveby 13:00, attraction type museum, train day sunday, train departure cambridge, train destination ely <|endofbelief|> <|action|> attraction recommend name, general greet none <|endofaction|> <|response|> how about [attraction_name] ? it looks lovely <|endofresponse|>
SimpleTOD <|belief|> train book people 8, train arriveby 13:00, attraction type museum, train day sunday, train departure cambridge, train destination ely <|endofbelief|> <|action|> attraction recommend name, attraction recommend area, attraction recommend fee, attraction recommend addr, attraction recommend phone <|endofaction|> <|response|> i would recommend the [attraction_name] in the [value_area]. it is located at [attraction_address]. the entrance fee is [value_count] pounds. the phone number is [attraction_phone] <|endofresponse|>
Turn 5
Context <|context|> <|user|> hello , i need a train to ely on sunday please <|system|> we have 10 trains traveling to ely on sunday . what time would you like to arrive ? <|user|> how about 1 arriving by 13:00 departing from cambridge . <|system|> tr1159 will get you to ely by 12:07 on sunday . should i book you a ticket ? <|user|> yes , that would be great . i’ll need tickets for 8 people . do you have a reference number ? <|system|> booking was successful , the total fee is 28.16 gbp payable at the station . reference number is : hdx8ix63 . <|user|> i am also looking for a museum . can you recommend 1 ? <|system|> how about cafe jello gallery ? it looks lovely <|user|> that will work . could you get me the phone number and area its in ? <|endofcontext|>
Target <|belief|> train book people 8, train arriveby 13:00, attraction type museum, train day sunday, train departure cambridge, train destination ely <|endofbelief|> <|action|> general greet none, general reqmore none, attraction inform phone, attraction inform area <|endofaction|> <|response|> sure ! it s in the [value_area] area of town , and the phone number is [attraction_phone] . can i help you with anything else ? <|endofresponse|>
SimpleTOD <|belief|> train book people 8, train arriveby 13:00, attraction type museum, train day sunday, train departure cambridge, train destination ely <|endofbelief|> <|action|> attraction inform phone, attraction inform area, attraction inform name <|endofaction|> <|response|> [attraction_name] is in the [value_area] area and the phone number is [attraction_phone] <|endofresponse|>
Turn 6
Context <|context|> <|user|> hello , i need a train to ely on sunday please <|system|> we have 10 trains traveling to ely on sunday . what time would you like to arrive ? <|user|> how about 1 arriving by 13:00 departing from cambridge . <|system|> tr1159 will get you to ely by 12:07 on sunday . should i book you a ticket ? <|user|> yes , that would be great . i’ll need tickets for 8 people . do you have a reference number ? <|system|> booking was successful , the total fee is 28.16 gbp payable at the station . reference number is : hdx8ix63 . <|user|> i am also looking for a museum . can you recommend 1 ? <|system|> how about cafe jello gallery ? it looks lovely <|user|> that will work . could you get me the phone number and area its in ? <|system|> sure ! it s in the west area of town , and the phone number is 01223312112 . can i help you with anything else ? <|user|> you were great today . thanks so much for all the help . that is all . goodbye . <|endofcontext|>
Target <|belief|> train book people 8, train arriveby 13:00, attraction type museum, train day sunday, train departure cambridge, train destination ely <|endofbelief|> <|action|> general bye none, general welcome none, general greet none <|endofaction|> <|response|> it was a pleasure to help . have a good day . goodbye . <|endofresponse|>
SimpleTOD <|belief|> train book people 8, train arriveby 13:00, attraction type museum, train day sunday, train departure cambridge, train destination ely <|endofbelief|> <|action|> general bye none <|endofaction|> <|response|> thank you for using our service <|endofresponse|>
Table 15: SimpleTOD end-to-end generation (delexicalized) on MultiWOZ (dialogue ’PMUL3663’).
Turn 1
Context <|context|> <|user|> hello , i need a train to ely on sunday please <|endofcontext|>
Target <|belief|> train destination ely <|endofbelief|> <|action|> train inform choice, train inform day, train inform dest, train request arrive <|endofaction|> <|response|> we have 10 trains traveling to ely on sunday . what time would you like to arrive ? <|endofresponse|>
SimpleTOD <|belief|> train day sunday, train destination ely <|endofbelief|> <|action|> train inform choice, train inform day, train inform leave, train inform depart, train inform dest, train inform arrive, train inform id, train offerbook none <|endofaction|> <|response|> there are 10 trains leaving cambridge on sunday. tr3177 leaves cambridge at 9:50 and arrives in ely at 10:07. would you like to book a ticket? <|endofresponse|>
Turn 2
Context <|context|> <|user|> hello , i need a train to ely on sunday please <|system|> we have 10 trains traveling to ely on sunday . what time would you like to arrive ? <|user|> how about 1 arriving by 13:00 departing from cambridge . <|endofcontext|>
Target <|belief|> train destination ely, train day sunday, train arriveby 13:00, train departure cambridge <|endofbelief|> <|action|> train inform arrive, train inform day, train inform dest, train inform id, train offerbook none <|endofaction|> <|response|> tr1159 will get you to ely by 12:07 on sunday . should i book you a ticket ? <|endofresponse|>
SimpleTOD <|belief|> train day sunday, train destination ely, train departure cambridge, train arriveby 13:00 <|endofbelief|> <|action|> train inform arrive, train inform leave, train inform id, train offerbook none <|endofaction|> <|response|> tr1159 leaves at 11:50 and arrives at 12:07. would you like me to book it for you? <|endofresponse|>
Turn 3
Context <|context|> <|user|> hello , i need a train to ely on sunday please <|system|> we have 10 trains traveling to ely on sunday . what time would you like to arrive ? <|user|> how about 1 arriving by 13:00 departing from cambridge . <|system|> tr1159 will get you to ely by 12:07 on sunday . should i book you a ticket ? <|user|> yes , that would be great . i’ll need tickets for 8 people . do you have a reference number ? <|endofcontext|>
Target <|belief|> train destination ely, train day sunday, train arriveby 13:00, train departure cambridge, train book people 8 <|endofbelief|> <|action|> train offerbooked ref, train offerbooked ticket <|endofaction|> <|response|> booking was successful , the total fee is 28.16 gbp payable at the station . reference number is hdx8ix63. <|endofresponse|>
SimpleTOD <|belief|> train book people 8, train arriveby 13:00, train day sunday, train departure cambridge, train destination ely <|endofbelief|> <|action|> general reqmore none, train offerbooked ref, train offerbooked ticket <|endofaction|> <|response|> booking was successful, the total fee is 28.16 gbp payable at the station. reference number is hdx8ix63. is there anything else i can help you with? <|endofresponse|>
Turn 4
Context <|context|> <|user|> hello , i need a train to ely on sunday please <|system|> we have 10 trains traveling to ely on sunday . what time would you like to arrive ? <|user|> how about 1 arriving by 13:00 departing from cambridge . <|system|> tr1159 will get you to ely by 12:07 on sunday . should i book you a ticket ? <|user|> yes , that would be great . i’ll need tickets for 8 people . do you have a reference number ? <|system|> booking was successful , the total fee is 28.16 gbp payable at the station . reference number is : hdx8ix63 . <|user|> i am also looking for a museum . can you recommend 1 ? <|endofcontext|>
Target <|belief|> train book people 8, train arriveby 13:00, attraction type museum, train day sunday, train departure cambridge, train destination ely <|endofbelief|> <|action|> attraction recommend name, general greet none <|endofaction|> <|response|> how about cafe jello gallery ? it looks lovely <|endofresponse|>
SimpleTOD <|belief|> train book people 8, train arriveby 13:00, attraction type museum, train day sunday, train departure cambridge, train destination ely <|endofbelief|> <|action|> attraction recommend name, attraction recommend area, attraction recommend fee, attraction recommend addr, attraction recommend phone <|endofaction|> <|response|> i would recommend the cafe jello gallery in the west. it is located at 13 magdalene street. the entrance fee is 0 pounds. the phone number is 01223464646 <|endofresponse|>
Turn 5
Context <|context|> <|user|> hello , i need a train to ely on sunday please <|system|> we have 10 trains traveling to ely on sunday . what time would you like to arrive ? <|user|> how about 1 arriving by 13:00 departing from cambridge . <|system|> tr1159 will get you to ely by 12:07 on sunday . should i book you a ticket ? <|user|> yes , that would be great . i’ll need tickets for 8 people . do you have a reference number ? <|system|> booking was successful , the total fee is 28.16 gbp payable at the station . reference number is : hdx8ix63 . <|user|> i am also looking for a museum . can you recommend 1 ? <|system|> how about cafe jello gallery ? it looks lovely <|user|> that will work . could you get me the phone number and area its in ? <|endofcontext|>
Target <|belief|> train book people 8, train arriveby 13:00, attraction type museum, train day sunday, train departure cambridge, train destination ely <|endofbelief|> <|action|> general greet none, general reqmore none, attraction inform phone, attraction inform area <|endofaction|> <|response|> sure ! it s in the west area of town , and the phone number is 01223464646 . can i help you with anything else ? <|endofresponse|>
SimpleTOD <|belief|> train book people 8, train arriveby 13:00, attraction type museum, train day sunday, train departure cambridge, train destination ely <|endofbelief|> <|action|> attraction inform phone, attraction inform area, attraction inform name <|endofaction|> <|response|> cafe jello gallery is in the west area and the phone number is 01223464646 <|endofresponse|>
Turn 6
Context <|context|> <|user|> hello , i need a train to ely on sunday please <|system|> we have 10 trains traveling to ely on sunday . what time would you like to arrive ? <|user|> how about 1 arriving by 13:00 departing from cambridge . <|system|> tr1159 will get you to ely by 12:07 on sunday . should i book you a ticket ? <|user|> yes , that would be great . i’ll need tickets for 8 people . do you have a reference number ? <|system|> booking was successful , the total fee is 28.16 gbp payable at the station . reference number is : hdx8ix63 . <|user|> i am also looking for a museum . can you recommend 1 ? <|system|> how about cafe jello gallery ? it looks lovely <|user|> that will work . could you get me the phone number and area its in ? <|system|> sure ! it s in the west area of town , and the phone number is 01223312112 . can i help you with anything else ? <|user|> you were great today . thanks so much for all the help . that is all . goodbye . <|endofcontext|>
Target <|belief|> train book people 8, train arriveby 13:00, attraction type museum, train day sunday, train departure cambridge, train destination ely <|endofbelief|> <|action|> general bye none, general welcome none, general greet none <|endofaction|> <|response|> it was a pleasure to help . have a good day . goodbye . <|endofresponse|>
SimpleTOD <|belief|> train book people 8, train arriveby 13:00, attraction type museum, train day sunday, train departure cambridge, train destination ely <|endofbelief|> <|action|> general bye none <|endofaction|> <|response|> thank you for using our service <|endofresponse|>
Table 16: