Learning Content Selection Rules for Generating Object Descriptions in Dialogue
Journal of Artificial Intelligence Research 24 (2005) 157-194. Submitted 09/04; published 07/05.

Pamela W. Jordan (pjordan@pitt.edu)
Learning Research and Development Center & Intelligent Systems Program
University of Pittsburgh, LRDC Rm 744
Pittsburgh, PA 15260

Marilyn A. Walker (M.A.Walker@sheffield.ac.uk)
Department of Computer Science, University of Sheffield
Regent Court, 211 Portobello Street
Sheffield S1 4DP, U.K.

Abstract

A fundamental requirement of any task-oriented dialogue system is the ability to generate object descriptions that refer to objects in the task domain. The subproblem of content selection for object descriptions in task-oriented dialogue has been the focus of much previous work, and a large number of models have been proposed. In this paper, we use the annotated coconut corpus of task-oriented design dialogues to develop feature sets based on Dale and Reiter's (1995) incremental model, Brennan and Clark's (1996) conceptual pact model, and Jordan's (2000b) intentional influences model, and use these feature sets in a machine learning experiment to automatically learn a model of content selection for object descriptions. Since Dale and Reiter's model requires a representation of discourse structure, the corpus annotations are used to derive a representation based on Grosz and Sidner's (1986) theory of the intentional structure of discourse, as well as two very simple representations of discourse structure based purely on recency. We then apply the rule-induction program ripper to train and test the content selection component of an object description generator on a set of 393 object descriptions from the corpus. To our knowledge, this is the first reported experiment on a trainable content selection component for object description generation in dialogue. Three content selection models, each based on one of the three theoretical models, independently achieve accuracies significantly above the majority class baseline (17%) on unseen test data, with the intentional influences model (42.4%) performing significantly better than either the incremental model (30.4%) or the conceptual pact model (28.9%). But the best performing models combine all the feature sets, achieving accuracies near 60%. Surprisingly, a simple recency-based representation of discourse structure does as well as one based on intentional structure. To our knowledge, this is also the first empirical comparison of a representation of Grosz and Sidner's model of discourse structure with a simpler model for any generation task.

1. Introduction

A fundamental requirement of any task-oriented dialogue system is the ability to generate object descriptions that refer to objects in the task domain. For example, consider the excerpt of a task-oriented dialogue from the coconut corpus in Figure 1 (Di Eugenio, Jordan, Thomason, & Moore, 2000). The conversants in this dialogue are attempting to collaboratively construct a solution for furnishing a two-room house. Each conversant starts the task with a separate set of furniture items that can be used in the solution. In the process of negotiating the solution, they generate object descriptions (shown in italics) for items of furniture.

(Partial solution to the problem already agreed upon in prior dialogue: 3 green chairs for 50, 1 green chair for 100 and a green table in the dining room, and a blue sofa in the living room. The numbers in brackets indicate the utterance breaks for analysis of the dialogue.)

G: [36] That leaves us with 250 dollars. [37] I have a yellow rug for 150 dollars. [38] Do you have any other furniture left that matches for 100 dollars?
S: [39] No, I have no furniture left that costs $100. [40] I guess you can buy the yellow rug for $150.
G: [41] Okay. [42] I'll buy the rug for 150 dollars. [43] I have a green chair [44] that I can buy for 100 dollars [45] that should leave us with no money.
S: [46] That sounds good. [47] Go ahead and buy the yellow rug and the green chair.
G: [48] I'll buy the green 100 dollar chair. [49] Design Complete?
S: [50] Sounds good, [51] do you want the green chair in the dining room with the other chairs? [52] I put the yellow rug in the living room. [53] Then the design is complete.
G: [54] Sounds good. [55] Hit the design complete.

Figure 1: Excerpt of a coconut dialogue illustrating variable selection of attributes for object descriptions that refer to furniture items.

Each furniture type in the coconut task domain has four associated attributes: color, price, owner and quantity. As a first step, an object description generator must decide which of these four attributes to include in an utterance, while subsequent surface generation steps decide where in the utterance the attributes will be expressed. For example, the task domain objects under discussion in the dialogue in Figure 1 are a $150 yellow rug owned by Garrett (G) and a $100 green chair owned by Steve (S). In the dialogue excerpt in Figure 1, the yellow rug is first referenced in utterance 37 as a yellow rug for 150 dollars and then subsequently as the yellow rug for $150, the rug for 150 dollars, and the yellow rug, where the owner attribute is sometimes realized in a separate noun phrase within the same utterance. It could also have been described by any of the following: the rug, my rug, my yellow rug, my $150 yellow rug, the $150 rug. The content of these object descriptions varies depending on which attributes are included. How does the speaker decide which attributes to include?

The problem of content selection for subsequent reference has been the focus of much previous work, and a large number of overlapping models have been proposed that seek to explain different aspects of referring expression content selection (Clark & Wilkes-Gibbs, 1986; Brennan & Clark, 1996; Dale & Reiter, 1995; Passonneau, 1995; Jordan, 2000b), inter alia. The factors that these models use include the discourse structure, the attributes and attribute values used in the previous mention, the recency of last mention, the frequency of mention, the task structure, the inferential complexity of the task, and ways of determining salient objects and the salient attributes of an object. In this paper, we use a set of factors considered important for three of these models, and empirically compare the utility of these factors as predictors in a machine learning experiment, in order to first establish whether the selected factors, as we represent them, can make effective contributions to the larger task of content selection for initial as well as subsequent reference.
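To make the size of this decision space concrete, the following minimal sketch (our illustration in Python, not code from the study; the rug's attribute values are taken from Figure 1) enumerates the sixteen candidate attribute subsets among which a generator must choose for a single mention:

    # Sketch of the content selection decision space for a coconut
    # furniture item: one of the 2^4 = 16 subsets of the four attributes.
    from itertools import combinations

    ATTRIBUTES = ("color", "price", "owner", "quantity")

    def candidate_attribute_sets():
        """Enumerate the power set of the four attributes."""
        for r in range(len(ATTRIBUTES) + 1):
            for subset in combinations(ATTRIBUTES, r):
                yield frozenset(subset)

    # The yellow rug of Figure 1, with all four attributes known.
    rug = {"color": "yellow", "price": 150, "owner": "G", "quantity": 1}

    for attrs in candidate_attribute_sets():
        # Each subset corresponds to one possible description, e.g.
        # {"color"} ~ "the yellow rug", {"color", "price"} ~ "the yellow
        # rug for $150", and the empty set ~ "the rug".
        print(sorted(attrs), {a: rug[a] for a in attrs})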
The factor sets we utilize are:

• contrast set factors, inspired by the incremental model of Dale and Reiter (1995);
• conceptual pact factors, inspired by the models of Clark and colleagues (Clark & Wilkes-Gibbs, 1986; Brennan & Clark, 1996);
• intentional influences factors, inspired by the model of Jordan (2000b).

We develop features representing these factors, then use the features to represent examples of object descriptions and the context in which they occur, for the purpose of learning a model of content selection for object descriptions.

Dale and Reiter's incremental model focuses on the production of near-minimal subsequent references that allow the hearer to reliably distinguish the task object from similar task objects. Following Grosz and Sidner (1986), Dale and Reiter's algorithm utilizes discourse structure as an important factor in determining which objects the current object must be distinguished from. The model of Clark, Brennan and Wilkes-Gibbs is based on the notion of a conceptual pact, i.e. the conversants attempt to coordinate with one another by establishing a conceptual pact for describing an object. Jordan's intentional influences model is based on the assumption that the underlying communicative and task-related inferences are important factors in accounting for non-minimal descriptions. We describe these models in more detail in Section 3 and explain why we expect these models to work well in combination.

Many aspects of the underlying content selection models are not well-defined from an implementation point of view, so it may be necessary to experiment with different definitions and related parameter settings to determine which will produce the best performance for a model, as was done with the parameter setting experiments carried out by Jordan (2000b).[1] However, in the experiments we describe in this paper, we strive for feature representations that will allow the machine learner to take on more of the task of finding optimal settings, and otherwise use the results reported by Jordan (2000b) for guidance. The only variation we test here is the representation of discourse structure for those models that require it. Otherwise, explicit tests of different interpretations of the models are left to future work.

[1] Determining optimal parameter settings for a machine learning algorithm is a similar issue (Daelemans & Hoste, 2002), but at a different level. We use the same machine learner and parameter settings for all our experiments, although searching for optimal machine learner parameter settings may be of value in further improving performance.

We report on a set of experiments designed to establish the predictive power of the factors emphasized in the three models, by using machine learning to train and test the content selection component of an object description generator on a set of 393 object descriptions from the corpus of coconut dialogues. The generator goes beyond each of the models' accounts of anaphoric expressions to address the more general problem of generating both initial and subsequent expressions. We provide the machine learner with distinct sets of features motivated by these models, in addition to discourse features motivated by assumed familiarity distinctions (Prince, 1981) (i.e. new vs. evoked vs. inferable discourse entities), and dialogue-specific features such as the speaker of the object description, its absolute location in the discourse, and the problem that the conversants are currently trying to solve.
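To give a feel for the shape of the resulting learning problem, here is a sketch of how such examples might be encoded for a learner. The feature names below (speaker, initial_mention, attrs_in_prev_mention, recency_of_mention) are simplified, hypothetical stand-ins for the annotation-derived features, and scikit-learn's decision-tree learner is substituted for the rule-induction program ripper that our experiments actually use:

    # Illustrative sketch only: simplified features, and a decision tree
    # standing in for ripper's rule induction.
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.tree import DecisionTreeClassifier

    # Each example pairs the context of a description with the attribute
    # combination the human speaker used (one of the 16 subsets).
    examples = [
        ({"speaker": "G", "utterance_no": 37, "initial_mention": True,
          "attrs_in_prev_mention": "none", "recency_of_mention": 0},
         "color+price"),
        ({"speaker": "S", "utterance_no": 47, "initial_mention": False,
          "attrs_in_prev_mention": "color+price", "recency_of_mention": 7},
         "color"),
        # ... one example per object description in the corpus (393 total)
    ]

    contexts, labels = zip(*examples)
    vectorizer = DictVectorizer()
    X = vectorizer.fit_transform(contexts)

    learner = DecisionTreeClassifier()
    learner.fit(X, labels)

    # Predict the attribute combination for a new description context.
    new_context = {"speaker": "G", "utterance_no": 42,
                   "initial_mention": False,
                   "attrs_in_prev_mention": "color+price",
                   "recency_of_mention": 5}
    print(learner.predict(vectorizer.transform([new_context]))[0])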
We evaluate the object description generator by comparing its predictions against what humans said at the same point in the dialogue, counting as correct only those predictions that exactly match the content of the human-generated object descriptions (Oberlander, 1998).[2] This provides a rigorous test of the object description generator, since in all likelihood there are other object descriptions that would have achieved the speaker's communicative goals. We also quantify the contribution of each feature set to the performance of the object description generator. The results indicate that the intentional influences features, the incremental features and the conceptual pact features are all independently significantly better than the majority class baseline for this task, with the intentional influences model (42.4%) performing significantly better than either the incremental model (30.4%) or the conceptual pact model (28.9%). However, the best performing models combine features from all the models, achieving accuracies at matching human performance near 60.0%, a large improvement over the majority class baseline of 17%, in which the generator simply guesses the most frequent attribute combination. Surprisingly, our experiments with different discourse structure parameter settings show that features derived from a simple recency-based model of discourse structure contribute as much to this particular task as features based on intentional structure.

[2] Note that the more attributes a discourse entity has, the harder it is to achieve an exact match to a human description; i.e. for this problem the object description generator must correctly choose among the 16 possibilities represented by the power set of the four attributes.

The coconut dataset is small compared to those used in most machine learning experiments. Smaller datasets run a higher risk of overfitting, and thus specific performance results should be interpreted with caution. In addition, the coconut corpus represents only one type of dialogue: typed, collaborative, problem-solving dialogues about constraint satisfaction problems. While the models and suggested features focus on general communicative issues, we expect variations in the task involved and in the communication setting to impact the predictive power of the feature sets. For example, the conceptual pact model was developed using dialogues that focus on identifying novel, abstract figures. Because the figures are abstract, it is not clear at the start of a series of exercises what description will best help the dialogue partner identify the target figure. Thus the need to negotiate a description for the figures is more prominent than in other tasks. Likewise, we expect constraint satisfaction problems and the need for joint agreement on a solution to make the intentional influences model more prominent for the coconut dialogues. But the fact that the conceptual pact features show predictive power significantly better than the baseline suggests that, while the prominence of each model-inspired feature set may vary across tasks and communication settings, each has a significant contribution to make to a content selection model.

Clearly, for those of us whose ultimate goal is a general model of content selection for dialogue, we need to carry out experiments on a wide range of dialogue types. But for those of us whose ultimate goal is a dialogue application, one smaller corpus that is representative of the anticipated dialogues is probably preferable. Despite these two notes of caution, we expect our feature representations to suggest a starting point for both larger endeavors.
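The stringent exact-match criterion and the majority class baseline can both be stated in a few lines; the sketch below uses invented toy data rather than the corpus (in the experiments, predictions are scored against the 393 human descriptions):

    # Exact-match scoring: a prediction is correct only if it names
    # exactly the attribute set the human speaker used.
    from collections import Counter

    def exact_match_accuracy(predicted, human):
        """Fraction of predictions identical to the human attribute set."""
        return sum(p == h for p, h in zip(predicted, human)) / len(human)

    human = [frozenset({"color", "price"}), frozenset({"color"}),
             frozenset({"color"}), frozenset({"price"})]
    predicted = [frozenset({"color", "price"}), frozenset({"color"}),
                 frozenset({"color", "owner"}), frozenset({"price"})]

    print(exact_match_accuracy(predicted, human))  # 0.75 on this toy data

    # The majority class baseline always guesses the most frequent
    # attribute combination (17% on the coconut test data).
    majority = Counter(human).most_common(1)[0][0]
    print(exact_match_accuracy([majority] * len(human), human))  # 0.5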
Previous research has applied machine learning to several problems in natural language generation, such as cue word selection (Di Eugenio, Moore, & Paolucci, 1997), accent placement (Hirschberg, 1993), determining the form of an object description (Poesio, 2000), content ordering (Malouf, 2000; Mellish, Knott, Oberlander, & O'Donnell, 1998; Duboue & McKeown, 2001; Ratnaparkhi, 2002), sentence planning (Walker, Rambow, & Rogati, 2002), re-use of textual descriptions in automatic summarization (Radev, 1998), and surface realization (Langkilde & Knight, 1998; Bangalore & Rambow, 2000; Varges & Mellish, 2001).

The only other machine learning approaches to content selection are those of Oh and Rudnicky (2002) and of Roy (2002). Oh and Rudnicky report results for automatically training a module for the CMU Communicator system that selects the attributes the system should express when implicitly confirming flight information in an ongoing dialogue. For example, if the caller said "I want to go to Denver on Sunday", the implicit confirmation by the system might be "Flying to Denver on Sunday". They experimentally compared a statistical approach based on bigram models with a strategy that only confirms information the system has just heard for the first time, and found that the two systems performed equally well. Roy reports results for a spoken language generator that is trained to generate visual descriptions of geometric objects when provided with features of visual scenes. Roy's results show that the understandability of the automatically generated descriptions is only 8.5% lower than that of human-generated descriptions. Unlike our approach, neither of these considers the effects of ongoing dialogue with a dialogue partner, or the effect of the dialogue context on the generated descriptions. Our work, and the theoretical models it is based on, explicitly focuses on the processes involved in generating descriptions and redescriptions of objects in interactive dialogue that allow the dialogue partners to remain aligned as the dialogue progresses (Pickering & Garrod, 2004).

The most relevant prior work is that of Jordan (2000b). Jordan implemented Dale and Reiter's incremental model, developed and implemented the intentional influences model, which incorporates the incremental model, and tested them both against the coconut corpus. Jordan also experimented with different parameter settings for vague parts of the models. The results of this work are not directly comparable to ours, because Jordan only tested rules for subsequent reference, while here we attempt to learn rules for generating both initial and subsequent references. However, using a purely rule-based approach, the best accuracy that Jordan reported was 69.6% using a non-stringent scoring criterion (not an exact match) and 24.7% using the same stringent exact-match scoring used here. In this paper, using features derived from Jordan's corpus annotations, and applying rule induction to induce rules from training data, we achieve an exact-match accuracy of nearly 47% when comparing to the most similar model and an accuracy of nearly 60% when comparing to the best overall model.
These results appear to be an improvement over those reported by Jordan (2000b), given both the increased accuracy and the ability to generate initial as well as subsequent references.

Section 2 describes the coconut corpus, definitions of discourse entities and object descriptions for the coconut domain, and the annotations on the corpus that we use to derive the feature sets. Section 3 presents the theoretical models of content selection.
[Figure: screenshot of the coconut interface. Each participant sees the two-room floor plan (living room and dining room), their own inventory listing quantity, furniture type, color, and price for each item (e.g., "1 TABLE-HIGH YELLOW $400"), a summary view of the partner's inventory by furniture type, the remaining budget ("Your budget is: $400"), the dialogue history, and "End of Turn" and "Design Complete" buttons.]