Towards Structural Natural Language Formalization: Mapping Discourse to Controlled Natural Language
Nicholas H. Kirk
Computer Science Department, Technische Universität München
[email protected]
Abstract
The author describes a conceptual study towards mapping grounded natural language discourse representation structures to instances of controlled language statements. This can be achieved via a pipeline of preexisting state-of-the-art technologies, namely natural language syntax to semantic discourse mapping, and a reduction of the latter to controlled language discourse, given a set of previously learnt reduction rules. Finally, a description of the evaluation, potential and limitations for ontology-based reasoning is presented.
Work towards the formalization of natural language has been pursued on both syntactic and semantic levels. Controlled Natural Languages (CNL), for instance, provide an unambiguous set of syntactic rules and a controlled vocabulary (Wyner et al., 2010), while sharing human intelligibility with the original Natural Language (NL) from which they derive (Kuhn, 2013). Approaches to pure semantic formalization have relied on symbolic and distributional characterizations (Blackburn et al., 2001; Harris, 1981), with varying degrees of compositionality (Clarke, 2012).

An important structural approach towards the formalization of discourse is Discourse Representation Theory (DRT) (Kamp, 1981; Kamp and Reyle, 1993), which makes use of inter- and intra-sentence discourse referents for anaphoric referencing and meaning preservation, and of a set of semantic-level constraints over them. DRT supports transformations to and from logic formalisms (Kamp and Reyle, 1993), and has direct applications within the automated sentence construction domain (Guenthner and Lehmann, 1984; Fuchs et al., 2010). Given the logical and linguistic properties of CNL (e.g. reasoning, paraphrasability, human- and machine-readability), the author stresses that a successful mapping between NL and CNL can enable language-based cognition in simple autonomous software assistants, both for reasoning and as an interface to peers and humans.
Given this rationale, the community should formulate a methodology for reducing sentence-level natural language discourse to a discourse representation formulated in a target controlled natural language.

The author presents a possible pipeline abstraction of preexisting state-of-the-art means, as described in Figure 1. The pipeline comprises: source channel text normalization (C1), to regularize erroneous phonetic transcriptions and spelling; a text to grounded Discourse Representation Structures (DRS) parser (C2), which relies on Combinatory Categorial Grammar (CCG), i.e. a grammar formalism that allows a computationally efficient interface between syntax and structural semantics (Curran et al., 2007), and whose implemented form has already achieved optimal results and can produce Discourse Representation Structures as output (Bos, 2008); a previously trained sentence-level Support Vector Machine (SVM) rule classifier (C3), which identifies the types of NL to CNL reductions that should be operated, and for which a similarly implemented classifier is present in the literature (Naughton et al., 2010); and a syntactic manipulation engine (C4), which transforms the natural language input DRS into a set of compliant CNL DRS instances, subject to the previously obtained classification results. Such classification (C3) should account for, for instance:

• intrinsically ambiguous natural language syntactic constructs
• ambiguous anaphoric reference resolution
• conscious constraining decisions on the expressiveness of specific CNL constructs

[Figure 1 diagram: source text normalization (C1) → text to DRS NL (C2) → classification (C3; labels: colloquialism, jargon, workaround, ambiguous, ...) → DRS NL to DRS CNL Manipulation Engine (C4) → CNL statement (prove, paraphrase, reason on, store, ...)]
Figure 1: Representation of an abstract structure-level only NL to CNL manipulator

The full enumeration of reduction case reasons is application domain-dependent and requires an a priori study that can be performed online and in a supervised manner, for instance with active learning techniques. A possible target CNL with proven robustness and reliability is Attempto Controlled English (ACE) (Fuchs et al., 2006), which offers DRS to CNL verbalization functionalities, as well as paraphrasing, proving and inference reasoning capabilities. Figure 2 shows a simple instance of the presented pipeline, which requires manipulation via substitution of the unigram "linguistics" with the trigram "a linguistic class".
NL: "Harris can teach linguistics on Tuesdays."
⇓
ACE: "Harris can teach a linguistic class on Tuesday."
Figure 2: Example of an NL sentence instance and a possible semantics-preserving reduction to ACE
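The classification (C3) and manipulation (C4) steps can be illustrated on the Figure 2 example with a minimal Python sketch. Everything in it is an assumption made for illustration: the `DRS` class is a toy structure (not the output format of Boxer or ACE), the hand-written `NL_DRS` stands in for the C2 parser, the rule label `expressiveness_constraint` and the `REDUCTION_RULES` table are hypothetical, and `classify` is a simple lookup standing in for the trained SVM classifier.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DRS:
    """Toy discourse representation structure: referents plus conditions."""
    referents: Tuple[str, ...]
    # Each condition is (predicate, arg1, arg2, ...).
    conditions: Tuple[Tuple[str, ...], ...]


# Hypothetical, hand-written DRS for the NL sentence of Figure 2 (stands in for C2).
NL_DRS = DRS(
    referents=("x", "y", "e"),
    conditions=(
        ("named", "x", "Harris"),
        ("linguistics", "y"),
        ("teach", "e", "x", "y"),
        ("can", "e"),
        ("on", "e", "Tuesday"),
    ),
)

# Hypothetical reduction rules: label -> {source predicate: target predicates}.
REDUCTION_RULES = {
    "expressiveness_constraint": {"linguistics": ("linguistic", "class")},
}


def classify(drs: DRS) -> List[str]:
    """Stand-in for C3: return the labels of all rules that match the DRS."""
    predicates = {cond[0] for cond in drs.conditions}
    return [
        label
        for label, table in REDUCTION_RULES.items()
        if any(pred in table for pred in predicates)
    ]


def manipulate(drs: DRS, labels: List[str]) -> DRS:
    """Stand-in for C4: rewrite matching conditions, keeping their arguments."""
    rewritten = []
    for predicate, *args in drs.conditions:
        replacement = None
        for label in labels:
            replacement = REDUCTION_RULES[label].get(predicate, replacement)
        if replacement is None:
            rewritten.append((predicate, *args))
        else:
            rewritten.extend((new_pred, *args) for new_pred in replacement)
    return DRS(drs.referents, tuple(rewritten))


CNL_DRS = manipulate(NL_DRS, classify(NL_DRS))
```

The rewrite replaces the single condition on referent `y` with two conditions on the same referent and leaves the referents untouched, which mirrors the idea that the reduction operates on structure while keeping anaphoric links intact.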
Evaluation
Evaluation should mainly assess, by means of human evaluation, whether the meaning of an arbitrary sentence related to the application domain has been successfully conveyed to the target controlled sentence. For instance, a threshold of satisfactory quality in action-oriented tasking domains (Nyga and Beetz, 2012) can be whether the arguments of intransitive, monotransitive and ditransitive verbs have been preserved, together with correct anaphoric resolution. Evaluation will also assess domain-specific classification rates and computational efficiency.
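The argument-preservation threshold could be tallied as in the following Python sketch, once annotators have recorded verb frames for each source and target sentence. This is only an assumed bookkeeping aid for the human evaluation, not a replacement for it: the `Frames` representation (verb mapped to the discourse referents in its argument slots), the function names and the two sample pairs are all hypothetical.

```python
from typing import Dict, List, Tuple

# A verb frame maps a verb to the discourse referents filling its argument slots.
Frames = Dict[str, Tuple[str, ...]]


def arguments_preserved(source: Frames, target: Frames) -> bool:
    """True if every source verb keeps the same referents, in the same order."""
    return all(target.get(verb) == args for verb, args in source.items())


def preservation_rate(pairs: List[Tuple[Frames, Frames]]) -> float:
    """Fraction of (source, target) sentence pairs whose verb arguments survive."""
    if not pairs:
        return 0.0
    return sum(arguments_preserved(s, t) for s, t in pairs) / len(pairs)


# Hypothetical annotations: a monotransitive pair that is preserved,
# and a ditransitive pair that loses its third argument.
pairs = [
    ({"teach": ("x", "y")}, {"teach": ("x", "y")}),
    ({"give": ("x", "y", "z")}, {"give": ("x", "y")}),
]
rate = preservation_rate(pairs)
```

Comparing referents rather than surface words means that a reduction such as the one in Figure 2, which changes the wording of an argument but not its referent, still counts as preserved.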
Limitations
The presented architecture does not make assumptions on the content of the predicates that are represented by words, given that the manipulation is operated only at a structural level, i.e. within the boundaries of DRS expressiveness. For a deeper predicate-related alignment, further considerations regarding the lexicon should be made, to provide word sense and Part-Of-Speech (POS) mappings between the source vocabulary and the target controlled vocabulary.
Potential
Current statistics-based web search approaches that make use of word n-gram models can exploit a more structural, discourse-oriented approach. Formalization enables logic satisfiability checks of manipulated NL questions via reduction and reasoning on First Order Logic (FOL) clauses. The expressiveness of the latter would also allow reasoning as Constraint Satisfaction Problems (CSP), i.e. a widely adopted mathematical formalism that expresses real-world decision problems as unary and binary constraints over finite variable domains. To pursue the example in Figure 2, admitting other ontological knowledge of lecturers' availability and ability, we could formulate an NL question (that becomes a formal ACE question) to ask for solutions to a simple timetable scheduling CSP, where the domains are the possible lecture days and types, and the constraints are the required lecture types and time precedence relations between them.
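A minimal Python sketch of such a timetable CSP follows, solved by plain backtracking. All facts are assumptions for illustration: only "Harris can teach a linguistic class on Tuesday" comes from Figure 2, while the lecturer `Smith`, the `logic class` lecture type, the set of days and the precedence relation are hypothetical. Lecturer ability and availability act as unary constraints that shape each domain, and precedence is the binary constraint.

```python
from typing import Dict, Iterator, List, Optional, Tuple

DAYS = ["Monday", "Tuesday", "Wednesday"]

# Ontological facts (lecturer, lecture type, day); only the Harris fact is from Figure 2.
CAN_TEACH = {
    ("Harris", "linguistic class", "Tuesday"),
    ("Smith", "logic class", "Monday"),
    ("Smith", "logic class", "Wednesday"),
}

# Hypothetical requirements: both lecture types, with logic held before linguistics.
REQUIRED = ["logic class", "linguistic class"]
PRECEDES = [("logic class", "linguistic class")]

Assignment = Dict[str, Tuple[str, str]]  # lecture type -> (lecturer, day)


def domain(lecture_type: str) -> List[Tuple[str, str]]:
    """Unary constraint: (lecturer, day) pairs able to hold this lecture type."""
    options = {(lec, day) for lec, kind, day in CAN_TEACH if kind == lecture_type}
    return sorted(options, key=lambda pair: DAYS.index(pair[1]))


def consistent(assignment: Assignment) -> bool:
    """Binary constraint: each 'before' lecture falls on a strictly earlier day."""
    for first, second in PRECEDES:
        if first in assignment and second in assignment:
            if DAYS.index(assignment[first][1]) >= DAYS.index(assignment[second][1]):
                return False
    return True


def solve(assignment: Optional[Assignment] = None) -> Iterator[Assignment]:
    """Enumerate all consistent complete assignments by backtracking."""
    assignment = dict(assignment or {})
    unassigned = [v for v in REQUIRED if v not in assignment]
    if not unassigned:
        yield assignment
        return
    variable = unassigned[0]
    for value in domain(variable):
        candidate = {**assignment, variable: value}
        if consistent(candidate):
            yield from solve(candidate)


solutions = list(solve())
```

With these facts the only timetable is the logic class with Smith on Monday followed by the linguistic class with Harris on Tuesday; the Wednesday option is pruned by the precedence constraint.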
This concept-only presentation hopes to have briefly highlighted the potential that such an abstract CNL-based architecture can have, above all within the context of artificial assistants, as a means of interface, logic and combinatorial problem reasoning in ontology-based applications. If compliant with CNL rules, a specific set of syntactically reduced NL statements can seamlessly interface humans and machines while maintaining intelligibility and logical properties, such as entailment verification and inference. Future work should focus on implementation and efficiency verification of the stated architecture, and then investigate predicate-level (lexical) semantic alignment, to step towards (quasi-) complete sentence-level natural language formalization.
References

[Blackburn et al. 2001] Patrick Blackburn, Johan Bos, Michael Kohlhase, and Hans De Nivelle. 2001. Inference and computational semantics. In Computing Meaning, pages 11–28. Springer.

[Bos 2008] Johan Bos. 2008. Wide-coverage semantic analysis with Boxer. In Johan Bos and Rodolfo Delmonte, editors, Semantics in Text Processing. STEP 2008 Conference Proceedings, Research in Computational Semantics, pages 277–286. College Publications.

[Clarke 2012] Daoud Clarke. 2012. A context-theoretic framework for compositionality in distributional semantics. Computational Linguistics, 38(1):41–71.

[Curran et al. 2007] James R Curran, Stephen Clark, and Johan Bos. 2007. Linguistically motivated large-scale NLP with C&C and Boxer. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, pages 33–36. Association for Computational Linguistics.

[Fuchs et al. 2006] Norbert E. Fuchs, Kaarel Kaljurand, and Gerold Schneider. 2006. Attempto Controlled English Meets the Challenges of Knowledge Representation, Reasoning, Interoperability and User Interfaces. In FLAIRS 2006.

[Fuchs et al. 2010] Norbert E. Fuchs, Kaarel Kaljurand, and Tobias Kuhn. 2010. Discourse Representation Structures for ACE 6.6. Technical Report ifi-2010.0010, Department of Informatics, University of Zurich, Zurich, Switzerland.

[Guenthner and Lehmann 1984] Franz Guenthner and Hubert Lehmann. 1984. Automatic construction of discourse representation structures. In Proceedings of the 10th International Conference on Computational Linguistics, pages 398–401. Association for Computational Linguistics.

[Harris 1981] Zellig S Harris. 1981. Distributional structure. Springer.

[Kamp and Reyle 1993] Hans Kamp and Uwe Reyle. 1993. From discourse to logic: Introduction to modeltheoretic semantics of natural language, formal logic and discourse representation theory. Number 42. Springer.

[Kamp 1981] Hans Kamp. 1981. A theory of truth and semantic representation. Formal Semantics: The Essential Readings, pages 189–222.

[Kuhn 2013] Tobias Kuhn. 2013. The understandability of OWL statements in controlled English. Semantic Web, 4(1):101–115.

[Naughton et al. 2010] Martina Naughton, Nicola Stokes, and Joe Carthy. 2010. Sentence-level event classification in unstructured texts. Information Retrieval, 13(2):132–156.

[Nyga and Beetz 2012] Daniel Nyga and Michael Beetz. 2012. Everything robots always wanted to know about housework (but were afraid to ask). In , Vilamoura, Portugal, October, 7–12.

[Wyner et al. 2010] Adam Wyner, Krasimir Angelov, Guntis Barzdins, Danica Damljanovic, Brian Davis, Norbert Fuchs, Stefan Hoefler, Ken Jones, Kaarel Kaljurand, Tobias Kuhn, et al. 2010. On controlled natural languages: Properties and prospects. In