Knowledge-driven Natural Language Understanding of English Text and its Applications
Kinjal Basu, Sarat Varanasi, Farhad Shakerin, Joaquin Arias, Gopal Gupta
Department of Computer Science, Artificial Intelligence Research Group, The University of Texas at Dallas, USA
Universidad Rey Juan Carlos, Madrid, Spain
Abstract
Understanding the meaning of a text is a fundamental challenge of natural language understanding (NLU) research. An ideal NLU system should process a language in a way that is not exclusive to a single task or a dataset. Keeping this in mind, we have introduced a novel knowledge-driven semantic representation approach for English text. By leveraging the VerbNet lexicon, we are able to map the syntax tree of the text to its commonsense meaning represented using basic knowledge primitives. The general-purpose knowledge represented by our approach can be used to build any reasoning-based NLU system that can also provide justification. We applied this approach to construct two NLU applications that we present here: SQuARE (Semantic-based Question Answering and Reasoning Engine) and StaCACK (Stateful Conversational Agent using Commonsense Knowledge). Both these systems work by "truly understanding" the natural language text they process, and both provide natural language explanations for their responses while maintaining high accuracy.
Introduction
The long-term goal of natural language understanding (NLU) research is to build applications, e.g., chatbots and question answering (QA) systems, that act exactly like a human assistant. A human assistant will understand the user's intent and fulfill the task. The task can be answering questions about a story, giving directions to a place, or reserving a table in a restaurant knowing the user's preferences. Human-level understanding of natural language is needed for an NLU application that aspires to act exactly like a human. To understand the meaning of a natural language sentence, humans first process the syntactic structure of the sentence and then infer its meaning. Also, humans use commonsense knowledge to understand the often complex and ambiguous meaning of natural language sentences. Humans interpret a passage as a sequence of sentences and will normally process the events in the story in the same order as the sentences. Once humans understand the meaning of a passage, they can answer questions posed about it, along with an explanation for the answer. Moreover, by using commonsense, a human assistant understands the user's intended task and asks the user questions about the information required to successfully carry out the task. Also, to hold a goal-oriented conversation, a human remembers all the details given in the past and most of the time performs non-monotonic reasoning to accomplish the assigned task. We believe that an automated QA system or a goal-oriented closed-domain chatbot should work in a similar way.

If we want to build AI systems that emulate humans, then understanding natural language sentences is the foremost priority for any NLU application. In an ideal scenario, an NLU application should map a sentence to the knowledge (semantics) it represents, augment it with commonsense knowledge related to the concepts involved (just as humans do), and then use the combined knowledge to do the required reasoning. In this paper, we introduce our novel algorithm for automatically generating the semantics corresponding to each English sentence using VerbNet (Kipper et al. 2008), the comprehensive verb lexicon for English. For each English verb, VerbNet gives its syntactic and semantic patterns. The algorithm employs partial syntactic matching between the parse tree of a sentence and a verb's frame syntax from VerbNet to obtain the meaning of the sentence in terms of VerbNet's primitive predicates. This matching is motivated by the denotational semantics of programming languages and can be thought of as mapping parse trees of sentences to knowledge that is constructed out of the semantics provided by VerbNet. The VerbNet semantics is expressed using a set of primitive predicates that can be thought of as the semantic algebra of the denotational semantics.

We also show two applications of our approach. SQuARE, a question answering system for reading comprehension, is capable of answering various types of reasoning questions asked about a passage. SQuARE uses knowledge in the passage augmented with commonsense knowledge. Going a step further, we leverage SQuARE to build a general-purpose, closed-domain, goal-oriented chatbot framework, StaCACK (pronounced "stack"). Our work reported here builds upon our prior work in natural language QA as well as visual QA (Pendharkar and Gupta 2019; Basu, Shakerin, and Gupta 2020; Basu et al. 2020).

Background and Contribution
We next describe some of the key technologies we employ. Our effort is based on answer set programming (ASP) technology (Gelfond and Kahl 2014), specifically, its goal-directed implementation in the s(CASP) system. ASP supports nonmonotonic reasoning through negation as failure, which is crucial for modeling commonsense reasoning (via defaults, exceptions, and preferences) (Gelfond and Kahl 2014). We assume that the reader is familiar with ASP (Gelfond and Kahl 2014), denotational semantics (Schmidt 1986), and English language parsers (Stanford CoreNLP (Manning et al. 2014) and spaCy (Honnibal and Montani 2017)). We give a brief overview of ASP and denotational semantics.
ASP:
Answer Set Programming (ASP) is a declarative logic programming paradigm that extends logic programming with negation as failure. ASP is a highly expressive paradigm that can elegantly express complex reasoning methods, including those used by humans, such as default reasoning, deductive and abductive reasoning, counterfactual reasoning, and constraint satisfaction (Baral 2003; Gelfond and Kahl 2014). ASP supports better semantics for negation (negation as failure) than does standard logic programming and Prolog. An ASP program consists of rules that look like Prolog rules. The semantics of an ASP program Π is given in terms of the answer sets of the program ground(Π), where ground(Π) is the program obtained from the substitution of elements of the Herbrand universe for variables in Π (Baral 2003). The rules in an ASP program are of the form:

p :- q1, ..., qm, not r1, ..., not rn.

where m ≥ 0 and n ≥ 0. Each of p and qi (∀i ≤ m) is a literal, and each not rj (∀j ≤ n) is a naf-literal (not is a logical connective called negation as failure or default negation). The literal not rj is true if the proof of rj fails. Negation as failure allows us to take actions that are predicated on failure of a proof. Thus, the rule r :- not s. states that r can be inferred if we fail to prove s. Note that in the rule above, the head literal p is optional. A headless rule is called a constraint, which states that the conjunction of the qi's and not rj's should yield false.

The declarative semantics of an answer set program Π is given via the Gelfond-Lifschitz transform (Baral 2003; Gelfond and Kahl 2014) in terms of the answer sets of the program ground(Π). ASP also supports classical negation. A classically negated predicate (denoted -p) means that p is definitely false. Its definition is no different from that of a positive predicate, in that explicit rules have to be given to establish -p. More details on ASP can be found elsewhere (Baral 2003; Gelfond and Kahl 2014).

The goal in ASP is to compute an answer set given an answer set program, i.e., compute the set that contains all propositions that, if set to true, will serve as a model of the program (propositions that are not in the set are assumed to be false). Intuitively, the rule above says that p is in the answer set if q1, ..., qm are in the answer set and r1, ..., rn are not in the answer set. ASP can be thought of as Prolog extended with a sound semantics of negation as failure that is based on the stable model semantics (Gelfond and Lifschitz 1988).

Figure 1: VerbNet frame instance for the verb class grab. Example: "She grabbed the rail"; Syntax: Agent V Theme (NP V NP); Semantics: Continue(E, Theme), Cause(Agent, E), Contact(During(E), Agent, Theme).

s(CASP) System: s(CASP) (Arias et al. 2018) is a query-driven, goal-directed implementation of ASP that includes constraint solving over reals. Goal-directed execution of s(CASP) is indispensable for automating commonsense reasoning, as traditional grounding- and SAT-solver-based implementations of ASP may not be scalable. There are three major advantages of using the s(CASP) system: (i) s(CASP) does not ground the program, which makes our framework scalable, (ii) it only explores the parts of the knowledge base that are needed to answer a query, and (iii) it provides a natural language justification (proof tree) for an answer (Arias et al. 2020).
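To make the preceding discussion concrete, the following small program is a minimal sketch (not taken from the paper; the predicate names are illustrative only) of how defaults, exceptions, classical negation, and negation as failure interact under s(CASP):

% Default: birds normally fly, unless something is abnormal about them.
flies(X) :- bird(X), not ab(X).

% Exception: penguins are abnormal with respect to flying,
% and in fact definitely do not fly (classical negation).
ab(X) :- penguin(X).
-flies(X) :- penguin(X).

bird(tweety).
bird(pingu).
penguin(pingu).

Under s(CASP), the query ?- flies(tweety). succeeds, ?- flies(pingu). fails, and ?- -flies(pingu). succeeds, each accompanied by a justification tree.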
Denotational Semantics:

In programming language research, denotational semantics is a widely used approach to formalize the meaning of a programming language in terms of mathematical objects (called domains, such as integers, truth values, tuples of values, and mathematical functions) (Schmidt 1986). The denotational semantics of a programming language has three components (Schmidt 1986):

1. Syntax: specified as abstract syntax trees.
2. Semantic algebra: the basic domains along with the associated operations; the meaning of a program is expressed in terms of these basic operations applied to elements of the domains.
3. Valuation function: mappings from abstract syntax trees (and possibly the semantic algebra) to values in the semantic algebra.

Given a program P written in language L, P's denotation (meaning), expressed in terms of the semantic algebra, is obtained by applying the valuation function of L to P's syntax tree. Details can be found elsewhere (Schmidt 1986).
VerbNet:

Inspired by Beth Levin's classification of verbs and their syntactic alternations (Levin 1993), VerbNet (Kipper et al. 2008) is the largest online network of English verbs. A verb class in VerbNet is mainly expressed by syntactic frames, thematic roles, and a semantic representation. The VerbNet lexicon identifies the thematic roles and syntactic patterns of each verb class and infers the common syntactic structure and semantic relations for all the member verbs. Figure 1 shows an example of a VerbNet frame of the verb class grab.

This paper makes the following novel contributions: (i) it presents a domain-independent English text to answer set program generator, (ii) it demonstrates two robust, scalable, and interpretable applications of our semantics-driven approach, SQuARE and StaCACK, both of which show improved performance over machine learning (ML) based systems with regard to accuracy and explainability, and (iii) it shows how the s(CASP) query-driven ASP system is crucial for commonsense reasoning, as it guarantees a correct answer if the knowledge representation is accurate. Our work is based purely on reasoning and does not require any manual intervention other than providing (reusable) commonsense knowledge coded in ASP. It paves the way for developing advanced NLU systems based on "truly understanding" text or human dialog.
Semantics driven ASP Code Generation

Similar to the denotational approach to meaning representation of a programming language, an ideal NLU system should use denotational semantics to compositionally map text syntax to its meaning. Knowledge primitives should be represented using a semantic algebra (Schmidt 1986) of well-understood concepts. The semantics, along with commonsense knowledge represented using the same semantic algebra, can then be used to construct different NLU applications, such as QA systems, chatbots, information extraction systems, text summarization, etc. The ambiguous nature of natural language is the main hurdle in treating it as a programming language. English is no exception, and the meaning of an English word or sentence may depend on the context. The algorithm we present takes the syntactic parse tree of an English sentence and uses VerbNet to automatically map the parse tree to its denotation, i.e., the knowledge it represents.
Figure 2: English to ASP translation process. The sentence "John grabbed the apple there" is parsed with the Stanford CoreNLP parser; the semantic generator (valuation function) uses the VerbNet frames of the verb grab to map the parse tree to the sentence semantics represented in ASP, e.g.:

contact(during(grab),agent(john),theme(the_apple)).
continue(event(grab),theme(the_apple)).
transfer(during(grab),theme(the_apple)).
cause(agent(john),event(grab)).

An English sentence that contains an action verb (i.e., not a 'be' verb) always describes an event. The verb also constrains the relation among the event participants. VerbNet encapsulates all of this information using verb classes, each of which represents a set of verbs with similar meanings; each verb is thus a member of one or more classes. For each class, VerbNet provides the skeletal parse tree (frame syntax) for different usages of the verb class and the respective semantics (frame semantics). The semantic definition of each frame uses pre-defined VerbNet predicates that take thematic roles (AGENT, THEME, etc.) as arguments. Thus, we can view VerbNet as a very large valuation (semantic) function that maps syntax tree patterns to their respective meanings. As we use ASP to represent knowledge, the algorithm generates the sentence's semantic definition in ASP. Our goal is to find a partial match between the sentence parse tree and the VerbNet frame syntax and ground the thematic-role variables, so that we can obtain the semantics of the sentence from the frame semantics and represent it in ASP.
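As an illustration of how such frame information can be consulted programmatically, a frame like the one in Figure 1 could be stored as an ASP fact (a hypothetical encoding used only for exposition; our implementation reads the frames directly from VerbNet rather than using this exact representation):

vn_frame(grab,                                      % verb class
         [np(agent), v, np(theme)],                 % frame syntax: NP V NP
         [continue(event, theme),                   % frame semantics
          cause(agent, event),
          contact(during(event), agent, theme)]).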
The process of semantic knowledge generation from a sentence is illustrated in Figure 2. We use Stanford's CoreNLP parser (Manning et al. 2014) to generate the parse tree, pt, of an English sentence. The semantic generator component implements the valuation function that maps pt to its meaning. To accomplish this, we have introduced the Semantic Knowledge Generation algorithm (Algorithm 1). First, the algorithm collects the list of verbs mentioned in the sentence, and for each verb it accumulates all the syntactic (frame syntax) and corresponding semantic information (thematic roles and predicates) from VerbNet using the verb class of the verb. The algorithm finds the grounded thematic-role variables by doing a partial tree matching (described in Algorithm 2) between each gathered frame syntax and pt. From the verb node of pt, the partial tree matching algorithm performs a bottom-up search and, at each level, through a depth-first traversal, it tries to match the skeletal parse tree of the frame syntax. If the algorithm finds an exact match or a partial match (obtained by skipping words, e.g., prepositions), it returns the thematic roles to the parent Algorithm 1. Finally, Algorithm 1 grounds the pre-defined predicates with the values of the thematic roles and generates the ASP code.

The ASP code generated by the above approach represents the meaning of a sentence containing an action verb. Since VerbNet does not cover the semantics of 'be' verbs (i.e., am, is, are, have, etc.), for sentences containing 'be' verbs the semantic generator uses a pre-defined, handcrafted mapping of the parsed information (i.e., syntactic parse tree, dependency graph, etc.) to its semantics. This semantics is also represented as ASP code. The generated ASP code can now be used in various applications, such as natural language QA, summarization, information extraction, conversational agents (CAs), etc.
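For instance, a copular sentence such as "John is in the kitchen" might be mapped by this handcrafted component to facts of roughly the following shape (an assumed sketch, patterned on the property/4 facts used later in the SQuARE example; the exact predicate names are illustrative):

% "John is in the kitchen" (sentence holding at time point t1)
property(location, t1, john, kitchen).

% "Mary has an apple" (possession expressed with a 'have' verb)
property(possession, t1, mary, the_apple).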
Algorithm 1: Semantic Knowledge Generation

Input: pt: constituency parse tree of a sentence
Output: semantics: sentence semantics

 1: procedure GetSentenceSemantics(pt)
 2:   verbs ← getVerbs(pt)                ▷ returns list of verbs present in the sentence
 3:   semantics ← {}                      ▷ initialization
 4:   for each v ∈ verbs do
 5:     classes ← getVNClasses(v)         ▷ get the VerbNet classes of the verb
 6:     for each c ∈ classes do
 7:       frames ← getVNFrames(c)         ▷ get the VerbNet frames of the class
 8:       for each f ∈ frames do
 9:         thematicRoles ← getThematicRoles(pt, f.syntax, v)   ▷ see Algorithm 2
10:         semantics ← semantics ∪ getSemantics(thematicRoles, f.semantics)
                                           ▷ map the thematic roles into the frame semantics
11:       end for
12:     end for
13:   end for
14:   return semantics
15: end procedure

Algorithm 2: Partial Tree Matching

Input: pt: constituency parse tree of a sentence; s: frame syntax; v: verb
Output: tr: thematic-role set, or the empty set {}

 1: procedure GetThematicRoles(pt, s, v)
 2:   root ← getSubTree(node(v), pt)      ▷ returns the sub-tree rooted at the parent of the verb node
 3:   while root do
 4:     tr ← getMatching(root, s)         ▷ if s matches the tree, return thematic roles, else {}
 5:     if tr ≠ {} then return tr
 6:     end if
 7:     root ← getSubTree(root, pt)       ▷ returns false if root equals pt
 8:   end while
 9:   return {}
10: end procedure
The SQuARE System

Question answering for reading comprehension is a challenging task for the NLU research community. In recent times, with the advancement of ML applied to NLU, researchers have created more advanced QA systems that show outstanding performance on reading-comprehension QA tasks. However, for these high-performing neural-network based agents, the question arises whether they really "understand" the text or not. These systems are outstanding at learning data patterns and then predicting answers that require shallow or no reasoning capabilities. Moreover, for a QA task, if a system claims to perform equal to or better than a human in terms of accuracy, then the system must also show human-level intelligence in explaining its answers. Taking all this into account, we have created our SQuARE QA system, which uses an ML-based parser to generate the syntax tree and uses Algorithm 1 to translate a sentence into its knowledge in ASP. By using the ASP-coded knowledge along with pre-defined generic commonsense knowledge, SQuARE outperforms other ML-based systems by achieving 100% accuracy on 18 tasks (99.9% accuracy over all 20 tasks) of the bAbI QA dataset (note that the 0.1% inaccuracy is due to flaws in the dataset, not in our system). SQuARE is also capable of generating English justifications for its answers.
Figure 3: SQuARE Framework. The natural language processor (CoreNLP & spaCy) produces syntactic parse trees for the text and the question; the semantic generator and the ASP query generator (both implementing the valuation function) produce the semantic knowledge in ASP and the ASP query, respectively; the s(CASP) engine combines them with commonsense knowledge to compute the answer (in text form).
Architecture:
SQuARE is composed of two main subsystems: the semantic generator and the ASP query generator. Both subsystems in the SQuARE architecture (illustrated in Figure 3) share the common valuation function. To parse the passage and the question asked, the CoreNLP and spaCy parsers are used to generate the syntactic parse tree as well as the necessary parsing information such as NER and POS tags, lemmas, etc. Our semantic generator employs the process described earlier for ASP code generation from English sentences. The ASP query generator uses the same semantic generation algorithm, with minor changes to adapt to the fact that the sentence is a question. It identifies the query variable along with the question type and other necessary details (e.g., NER, POS, dependency graph, etc.) from the parsed question, and then formulates the ASP query. For a given task, a general query rule is defined that is a collection of sub-queries needed to answer the question. Such a rule can be considered as the process template that a human would follow to achieve the same task; such process templates are thus part of the commonsense knowledge. To get the answer, the s(CASP) engine executes the query against the knowledge generated by the semantic generator and the pre-defined, reusable commonsense knowledge.

SQuARE is also capable of reasoning over time using the order of the events in the passage. A passage or a story is a collection of sentences, and the meaning of the whole passage comprises the meanings of the individual sentences. Knowledge is represented using defaults, exceptions, and preferences, which permits non-monotonic reasoning; this is needed because conclusions drawn early in a story may have to be revised later. SQuARE assumes that the events in the story occur sequentially unless it encounters an exception.
Commonsense Concepts:
Just like humans, SQuARE uses predefined commonsense knowledge to understand the context. This dataset-independent commonsense knowledge is written using generic predicate names so that it can be reused without any changes. The knowledge is presented either as facts or as default rules with exceptions. The facts are concrete global truths about the world, such as property(color,white). Default rules, on the other hand, capture the normal relationship between two entities in the world or the general laws of the world, such as the law of inertia. Using negation as failure (NAF), these rules capture exceptions to defaults as well. NAF helps us to reason even when information is missing. The following default rule (used in Task 20) illustrates that a thirsty person normally drinks unless there is an exception (e.g., a medical test needs to be performed, which requires that the person not drink any fluid in the last hour).

action(X,drink) :- person(X), emotional_state(X,thirsty),
    not ab_action(X,drink).
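The law-of-inertia default mentioned above can be sketched in the same style: possession of an object persists from one time point to the next unless there is evidence to the contrary. This is a simplified sketch; the next_time/2 helper is an assumption, while property(possession,T,Person,Object) matches the predicate used in the counting example below.

property(possession,T2,Person,Object) :-
    next_time(T1,T2),
    property(possession,T1,Person,Object),
    not -property(possession,T2,Person,Object).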
Similarly, to have rudimentary human reasoning capabilities, SQuARE employs basic commonsense computational rules for counting objects, filtering, searching, etc. An example is given in the next section.
Example:
To demonstrate the power of the SQuARE system, we next discuss a full-fledged example showing the data flow and the intermediate results.
Story:
A customized segment of a story from the bAbI QA dataset about counting objects (Task 7) is taken.
Parsed Output:
The CoreNLP and spaCy parsers parse each sentence of the story and pass the parsed information to the semantic generator. Details are omitted due to lack of space; however, parsing can easily be tried at https://corenlp.run/.
Semantics:

From the parsed information, the semantic generator generates the semantic knowledge in ASP. We only give a snippet of the knowledge (due to the space constraint) generated from the third sentence of the story (the VerbNet details of the verb grab are given in Figure 1).
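The snippet itself is produced by the translation shown in Figure 2; for a story sentence such as "John grabbed the milk there" it would look roughly as follows (a reconstruction patterned on Figure 2, with the milk substituted for the apple; the exact facts for this particular story are an assumption):

contact(during(grab),agent(john),theme(the_milk)).
continue(event(grab),theme(the_milk)).
transfer(during(grab),theme(the_milk)).
cause(agent(john),event(grab)).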
Question and ASP Query:
For the question "How many objects is John carrying?", the ASP query generator generates a generic query rule and the specific ASP query (it uses the process template for counting).

count_object(T,Per,Count) :-
    findall(O, property(possession,T,Per,O), Os),
    set(Os,Objects), list_length(Objects,Count).

?- count_object(t6,john,Count).
Answer:
The s(CASP) engine finds the correct answer: 1.

Justification:
The justification generated by s(CASP) for this answer is shown below:
The total count of all the objects that john is possessing at time t6 is 1, because
  [the_milk] is the list of all the objects that are possessed by john at time t6, because
    the_milk is possessed by john at time t6, because
      time t6 comes after time t5, and
      the_milk is possessed by john at time t5, because
        time t5 comes after time t4, and
        the_milk is possessed by john at time t4, and
        there is no evidence that the_milk is not possessed by john at time t5.
      there is no evidence that the_milk is not possessed by john at time t6.
  The list [the_milk] is generated after removing duplicates from the list [the_milk], because
    The list [] is generated after removing duplicates from the list [].
  1 is the length of the list [the_milk], because
    0 is the length of the list [].
StaCACK Framework
Conversational AI has been an active area of research, starting from rule-based systems, such as ELIZA (Weizenbaum 1966) and PARRY (Colby, Weber, and Hilf 1971), to the recent open-domain, data-driven CAs such as Amazon's Alexa, Google Assistant, or Apple's Siri. Early rule-based bots were based on just syntax analysis, while the main challenge for modern ML-based chatbots is the lack of "understanding" of the conversation. A realistic socialbot should be able to understand and reason like a human. In human-to-human conversations, we do not always state every detail; we expect the listener to fill gaps through their commonsense knowledge. Also, our thinking process is flexible and non-monotonic in nature, which means "what we believe today may become false in the future with new knowledge". We can model this human thinking process with (i) default rules, (ii) exceptions to defaults, and (iii) preferences over multiple defaults (Gelfond and Kahl 2014).

Following the discussion above, we have created StaCACK, a general closed-domain chatbot framework. StaCACK is a stateful framework that maintains state by remembering every past dialog between the user and itself. The main difference between StaCACK and other stateful or stateless chatbot models is the use of commonsense knowledge for understanding user utterances and generating responses. Moreover, it is capable of doing non-monotonic reasoning by using defaults with exceptions and preferences in ASP. StaCACK achieves 100% accuracy on the Facebook bAbI dialog dataset suite (Bordes, Boureau, and Weston 2016) (including the OOV, out-of-vocabulary, datasets) of five tasks created for a restaurant reservation dialog system. In addition, StaCACK can answer questions that ML chatbots cannot answer without proper training (details are given in the following sections). We focus on agents that are designed for specific tasks (e.g., restaurant reservation).
Figure 4: FSM for the StaCACK framework. Its states include: understand user intent; ask preferences based on the intent; ask other details; verify and update the query; execute the query; provide result(s); notify if there are no results; and complete the task and give details. Transitions are driven by conditions such as complete/incomplete information, preference updates, results/no results, and user satisfied/unsatisfied.
Finite State Machine:
Task-specific CAs follow a certain scheme in their inquiry that can be modeled as a finite state machine (FSM). The FSM is illustrated in Figure 4. However, the tasks in each state transition are not simple, as every level requires different types of (commonsense) reasoning. We have tested our StaCACK framework on the bAbI dialog dataset, which deals with only one user intent: restaurant table reservation. The whole end-to-end conversation, starting from understanding the user intent to completing the reservation, is divided into four different tasks of the dataset: (a) issuing API calls (to actually make the reservation), (b) updating API calls, (c) displaying options, and (d) providing extra information. Therefore, if a system can complete these tasks, then it should be able to hold the whole conversation, which is given in the fifth dataset. Using StaCACK, the implementation of the agent for these tasks becomes straightforward: we only need to add commonsense knowledge about the domain (discussed later). Parsing of the sentences is done with the spaCy and CoreNLP parsers, and the parse tree is translated into knowledge using our Algorithm 1.
Commonsense knowledge:
Similar to the SQuARE system, StaCACK augments knowledge generated from user utterances with commonsense knowledge. For a task-based, general-purpose CA, the commonsense (or background) knowledge helps in understanding the user intent, missing information, and user preferences. For example, the following rules illustrate how an agent can find all the missing parameter values in the knowledge base, which can later be used to ask counter-questions to the user to facilitate reasoning.

missing_parameter(X) :- query_parameter(X),
    not query_parameter_value(X,_).
all_missing_parameter(ParaList) :-
    findall(P,missing_parameter(P),ParaList).
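For instance, if the user has so far specified only the location, a query such as the following would bind ParaList to the parameters still to be asked for, e.g., cuisine, price range, and party size (the specific parameter names in the reservation domain are an assumption here):

?- all_missing_parameter(ParaList).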
Because StaCACK is grounded in commonsense knowledge, its reasoning can be more sophisticated, and it can hold a conversation in a more natural manner (described in the next section) than ML-based systems. The following is an example. Rule-1 states that, normally, a person who has consumed a type of cuisine (e.g., Mexican) the day before would not like to eat it again today. However, there can be exceptions to this rule.
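Such a rule might be written as follows (a sketch of Rule-1 under assumed predicate names; the actual rule in our knowledge base may use a different vocabulary):

% Rule-1: by default, do not suggest a cuisine the user ate yesterday.
-suggest_cuisine(User,Cuisine) :-
    ate_yesterday(User,Cuisine),
    not ab_rule1(User,Cuisine).

% Exception: if the user explicitly asks for that cuisine, Rule-1 is overridden.
ab_rule1(User,Cuisine) :- explicitly_requested(User,Cuisine).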
Example:
As already mentioned, the StaCACK framework shows excellent performance on the bAbI dialog dataset based on restaurant reservation conversations. It is also able to hold the conversation in a more natural way by using commonsense knowledge, which may not be possible with a rule-based system based on a monotonic logic. The following example shows how StaCACK can understand the cuisine preference of a user just by performing reasoning over commonsense information about a cuisine (that curry is predominant in Indian and Thai cuisine).
User: Good morning
StaCACK: Hello what can i help you with today
User: Can you make a restaurant reservation in London in a cheap price range
StaCACK: I'm on it
User:
Please note that, due to the space constraint, we have shown only a conversation snippet here; what an end-to-end conversation looks like can be found elsewhere (Bordes, Boureau, and Weston 2016).
Experiments and Results
Datasets:
The SQuARE and StaCACK systems have been tested on the bAbI QA dataset (Weston and others 2015) and the bAbI dialog dataset (Bordes, Boureau, and Weston 2016), respectively. With the aim of advancing NLU research, Facebook researchers created the bAbI dataset suite, which comprises simple, task-based datasets oriented toward different NLU applications. The datasets are designed in such a way that it is easy for a human to reason and reach an answer with a proper justification, whereas it is difficult for machines due to their lack of understanding of the language. These datasets were mainly created to train and test deep-learning based NLU applications. The bAbI QA dataset not only expects ML models to train and test on the provided question-answer pairs for 20 reasoning-based tasks, but also presumes that the model will learn the pattern of supporting facts given with each answer. Taking this into account, we chose this dataset to test the SQuARE system and to show its "true understanding" by generating a natural language justification for each answer. Similarly, we have evaluated the StaCACK approach on the bAbI dialog dataset to exhibit how we can build a closed-domain, task-based chatbot with commonsense knowledge. Another reason to choose these datasets is their simplicity, which lets us concentrate more on knowledge representation and modeling than on pre-processing or parsing of English sentences.
Experiments:
For the SQuARE system, accuracy has been calculated by matching the generated answer with the actual answer given in the bAbI QA dataset, whereas StaCACK's accuracy is calculated on a per-response as well as a per-dialog basis. Table 1 summarizes the testing statistics and performance metrics of the SQuARE system (benchmarks are run on an Intel i9-9900 CPU with 16 GB RAM). Tables 2 and 3 compare our results in terms of accuracy with the existing state-of-the-art results for the SQuARE and StaCACK systems, respectively.
Error Analysis:
SQuARE is not able to reach 100% accuracy on two tasks: three argument relations (Task 5) and indefinite knowledge (Task 10). We have studied the erroneous stories and the particular questions where the system goes wrong. Thanks to the interpretable nature of the SQuARE system, we are able to identify the scenarios.
Task 5: This error occurs because multiple correct answers are present in the story with no indication of which one is the preferred one. The SQuARE system is capable of finding all the answers, and all are correct, though bAbI (erroneously) assumes there is only one correct answer. In these scenarios, clearly both milk and apple are answers, but bAbI only accepts milk.

Task 10: The erroneous stories have two identical places joined by an or as a person's location. So, for the question about that person's location, SQuARE answers correctly, whereas the actual (erroneous) expected answer is 'maybe'. SQuARE correctly infers that p ∨ p equals p.

These errors show another advantage of our explainable logic-based approach: SQuARE is able to identify errors in the dataset. On the contrary, ML systems mimic the dataset by learning the errors as well, showing their shallow understanding.
Comparison of Results and Related Works:
Using VerbNet to study the semantic relations of verbs and their semantic roles is not new. The Text2DRS system (Ling 2018) uses VerbNet to understand the discourse of a passage and represent it in Neo-Davidsonian form. Schmitz et al. have also used VerbNet to design an open information extraction system (Schmitz et al. 2012). Researchers have also used VerbNet to extend commonsense knowledge about verbs. For instance, McFate (McFate 2010) describes how VerbNet can be used to expand the CYC ontology by adding verb semantic frames. However, all these approaches are limited to a particular domain or application and are not scalable, whereas our semantics-driven English sentence to ASP code generation approach is a fully automatic process and independent of any application.

Table 2 compares our results in terms of accuracy with other models on all 20 bAbI tasks for the SQuARE system. Note that, due to the space constraint, we show a comparison with the two best-performing systems (the memory neural network (MemNN) with adaptive memory (AM), N-grams (NG), and non-linearity (NL); and the system by Mitra and Baral); the accuracies of other models (i.e., N-gram classifier, LSTM, SVM, DMN, etc.) can be found elsewhere (Weston and others 2015; Kumar and others 2016). The SQuARE system beats all these ML-based systems in terms of accuracy and explainability.

Mitra's system is the one closest to ours and motivated us to work on the bAbI QA dataset. Their work achieves high accuracy; however, it is still an ML-based inductive logic programming system that requires manual annotation of data, such as mode declarations, finding group size, etc. Conversely, the SQuARE system relies fully on automatic reasoning, with only manual encoding of reusable commonsense knowledge. An action language based QA methodology using VerbNet has been developed by Lierler et al. (Lierler, Inclezan, and Gelfond 2017). That project aims to extend frame semantics with ALM, an action language, to provide interpretable semantic annotations. Unlike SQuARE, it is not an end-to-end automated QA system.

Table 3 shows the accuracy of our proposal, StaCACK, against the best models on the bAbI dialog dataset in terms of per-response accuracy and, in parentheses, per-dialog accuracy: Mem2Seq (Madotto, Wu, and Fung 2018) and BossNet (Raghu, Gupta, and others 2018). Other results can be found elsewhere (Bordes, Boureau, and Weston 2016). Unsurprisingly, similar to a rule-based system, StaCACK surpasses all the ML-based models by showing 100% accuracy. Nevertheless, due to commonsense reasoning, StaCACK can hold a better natural conversation (shown in the Example section of StaCACK) than a standard rule-based system based on monotonic logic.
Table 1: Dataset Statistics and Performance Results. For each of the 20 bAbI QA tasks (Single Supporting Facts, Two Supporting Facts, Three Supporting Facts, Two Argument Relations, Three Argument Relations, Yes/No Questions, Counting, Lists/Sets, Simple Negation, Indefinite Knowledge, Basic Coreference, Conjunction, Compound Coreference, Time Reasoning, Basic Deduction, Basic Induction, Positional Reasoning, Size Reasoning, Path Finding, Agent's Motivations), the table reports the number of stories, total number of questions, average questions per story, total number of sentences, average and maximum story size (in sentences), accuracy (%), and average time per question (seconds). In total, the tasks comprise 68,928 stories, 220,000 questions, and 735,439 sentences.
Table 2: SQuARE accuracy (%) comparison

Task                      MemNN (AM+NG+NL)   Mitra et al.   SQuARE
Single Supporting Fact    100                100            100
Two Supporting Facts      98                 100            100
Three Supporting Facts    95                 100            100
Two Arg. Relation         100                100            100
Three Arg. Relation       99                 100            99.8
Yes/No Questions          100                100            100
Counting                  97                 100            100
Lists/Sets                97                 100            100
Simple Negation           100                100            100
Indefinite Knowledge      98                 100            98.2
Basic Coreference         100                100            100
Conjunction               100                100            100
Compound Coreference      100                100            100
Time Reasoning            100                100            100
Basic Deduction           100                100            100
Basic Induction           99                 93.6           100
Positional Reasoning      60                 100            100
Size Reasoning            95                 100            100
Path Finding              35                 100            100
Agent's Motivations       100                100            100
MEAN ACCURACY             94                 100            100
Table 3: Accuracy per response (per dialog) in %

Task           Mem2Seq      BossNet      StaCACK
Task 1         100 (100)    100 (100)    100 (100)
Task 2         100 (100)    100 (100)    100 (100)
Task 3                                   100 (100)
Task 4         100 (100)    100 (100)    100 (100)
Task 5                                   100 (100)
Task 1 (OOV)                             100 (100)
Task 2 (OOV)                             100 (100)
Task 3 (OOV)                             100 (100)
Task 4 (OOV)   100 (100)    100 (100)    100 (100)
Task 5 (OOV)                             100 (100)
Discussion
Our goal is to create NLU applications that mimic the way humans understand natural language. Humans understand a passage's meaning and use commonsense knowledge to reason logically in order to find the response to a question. We believe that this is the most effective process for creating an NLU application.
Learning and reasoning are both integral parts of human intelligence. Today, ML research dominates AI. Most state-of-the-art CA or QA systems have been developed using ML techniques (e.g., LSTM, GRU, attention networks, Transformers, etc.). Systems built with these techniques learn the patterns of the training text remarkably well and show promising results on test data. With the recent advancements in language model research, pre-trained models such as BERT (Devlin et al. 2018) and GPT-3 (Brown et al. 2020) have an outstanding capability for generating natural language. These rapid evolutions in the NLU world showcase extremely sophisticated text predictors. They are used to build chatbots or QA systems that can generate correct responses by exploiting the correlations among words, without properly understanding the content. These ML techniques are extremely powerful in tasks where learning of hidden data patterns is needed, such as machine translation, sentiment analysis, syntactic parsing, etc. However, they fail to generate proper responses where reasoning is required, and they mostly do not employ commonsense knowledge. Also, the black-box nature of these models makes their responses non-explainable. In other words, these models do not possess any internal meaning representation of a sentence or a word and have no semantically grounded model of the world. So it would be an injustice to say that they understand their inputs and outputs in any meaningful way. Our semantic knowledge generation approach and its two applications are a step toward mimicking a human assistant. We believe that, to obtain truly intelligent behavior, ML and commonsense reasoning should work in tandem.

Compared to ML-based QA systems and CAs, our approach has many advantages. It produces correct responses by truly understanding the text and reasoning about it, rather than by using patterns learned from training examples. ML-based systems are also more likely to produce incorrect responses if not trained appropriately, resulting in vulnerability to adversarial attacks (Chan et al. 2018). We believe that our commonsense reasoning based systems are more resilient. Scalability is also an issue for ML systems due to their dependence on training data. Explainability is a necessary feature that a truly intelligent system must possess; both SQuARE and StaCACK are capable of generating natural language justifications for the responses they produce.
Future Work and Conclusion
We presented our novel semantics-driven English text to answer set program generator. We also showed how commonsense reasoning coded in ASP can be leveraged to develop advanced NLU applications, such as SQuARE and StaCACK. We make use of the s(CASP) engine, a query-driven implementation of ASP, to perform reasoning while generating a natural language explanation for any computed answer. As part of future work, we plan to extend the SQuARE system to handle more complex sentences and eventually handle complex stories. Our goal is also to develop an open-domain conversational AI chatbot based on automated commonsense reasoning that can "converse" with a human based on "truly understanding" that person's dialog.
References

[Arias et al. 2018] Arias, J.; Carro, M.; Salazar, E.; Marple, K.; and Gupta, G. 2018. Constraint answer set programming without grounding. TPLP.
[Arias et al. 2020] Arias, J.; Carro, M.; Chen, Z.; and Gupta, G. 2020. Justifications for goal-directed constraint answer set programming. arXiv preprint arXiv:2009.10238.
[Baral 2003] Baral, C. 2003. Knowledge representation, reasoning and declarative problem solving. Cambridge University Press.
[Basu et al. 2020] Basu, K.; Varanasi, S. C.; Shakerin, F.; and Gupta, G. 2020. SQuARE: Semantics-based Question Answering and Reasoning Engine. In Proc. 36th ICLP (Technical Communications), volume 325 of EPTCS, 73–86.
[Basu, Shakerin, and Gupta 2020] Basu, K.; Shakerin, F.; and Gupta, G. 2020. AQuA: ASP-based visual question answering. In Practical Aspects of Declarative Languages, 57–72. Cham: Springer International Publishing.
[Bordes, Boureau, and Weston 2016] Bordes, A.; Boureau, Y.-L.; and Weston, J. 2016. Learning end-to-end goal-oriented dialog. arXiv preprint arXiv:1605.07683.
[Brown et al. 2020] Brown, T. B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
[Chan et al. 2018] Chan, A.; Ma, L.; Juefei-Xu, F.; Xie, X.; Liu, Y.; and Ong, Y. S. 2018. Metamorphic relation based adversarial attacks on differentiable neural computer. arXiv preprint arXiv:1809.02444.
[Colby, Weber, and Hilf 1971] Colby, K. M.; Weber, S.; and Hilf, F. D. 1971. Artificial paranoia. Artificial Intelligence.
[Devlin et al. 2018] Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
[Gelfond and Kahl 2014] Gelfond, M., and Kahl, Y. 2014. Knowledge representation, reasoning, and the design of intelligent agents: The answer-set programming approach. Cambridge University Press.
[Gelfond and Lifschitz 1988] Gelfond, M., and Lifschitz, V. 1988. The stable model semantics for logic programming. In ICLP/SLP, volume 88, 1070–1080.
[Honnibal and Montani 2017] Honnibal, M., and Montani, I. 2017. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To appear.
[Kipper et al. 2008] Kipper, K.; Korhonen, A.; Ryant, N.; and Palmer, M. 2008. A large-scale classification of English verbs. Language Resources and Evaluation.
[Kumar and others 2016] Kumar, A., et al. 2016. Ask me anything: Dynamic memory networks for natural language processing. In ICML, 1378–1387.
[Levin 1993] Levin, B. 1993. English verb classes and alternations: A preliminary investigation. University of Chicago Press.
[Lierler, Inclezan, and Gelfond 2017] Lierler, Y.; Inclezan, D.; and Gelfond, M. 2017. Action languages and question answering. In IWCS 2017, 12th International Conference on Computational Semantics, Short papers.
[Ling 2018] Ling, G. 2018. From narrative text to VerbNet-based DRSes: System Text2DRS.
[Madotto, Wu, and Fung 2018] Madotto, A.; Wu, C.-S.; and Fung, P. 2018. Mem2Seq: Effectively incorporating knowledge bases into end-to-end task-oriented dialog systems. arXiv preprint arXiv:1804.08217.
[Manning et al. 2014] Manning, C. D.; Surdeanu, M.; Bauer, J.; Finkel, J.; Bethard, S. J.; and McClosky, D. 2014. The Stanford CoreNLP natural language processing toolkit. In ACL System Demonstrations, 55–60.
[McFate 2010] McFate, C. 2010. Expanding verb coverage in Cyc with VerbNet. In Proceedings of the ACL 2010 Student Research Workshop, 61–66.
[Mitra and Baral 2016] Mitra, A., and Baral, C. 2016. Addressing a question answering challenge by combining statistical methods with inductive rule learning and reasoning. In AAAI.
[Pendharkar and Gupta 2019] Pendharkar, D., and Gupta, G. 2019. An ASP based approach to answering questions for natural language text. In Practical Aspects of Declarative Languages, volume 11372 of Lecture Notes in Computer Science, 46–63. Springer.
[Raghu, Gupta, and others 2018] Raghu, D.; Gupta, N.; et al. 2018. Disentangling language and knowledge in task-oriented dialogs. arXiv preprint arXiv:1805.01216.
[Schmidt 1986] Schmidt, D. A. 1986. Denotational semantics: A methodology for language development. William C. Brown Publishers, Dubuque, IA, USA.
[Schmitz et al. 2012] Schmitz, M.; Soderland, S.; Bart, R.; Etzioni, O.; et al. 2012. Open language learning for information extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 523–534.
[Weizenbaum 1966] Weizenbaum, J. 1966. ELIZA, a computer program for the study of natural language communication between man and machine. CACM.
[Weston and others 2015] Weston, J., et al. 2015. Towards AI-complete question answering: A set of prerequisite toy tasks. arXiv preprint arXiv:1502.05698.