Logician: A Unified End-to-End Neural Approach for Open-Domain Information Extraction
Mingming Sun ([email protected]), Cognitive Computing Lab (CCL), Baidu Research, Beijing, China
Xu Li ([email protected]), Cognitive Computing Lab (CCL), Baidu Research, Beijing, China
Xin Wang ([email protected]), Cognitive Computing Lab (CCL), Baidu Research, Beijing, China
Miao Fan ([email protected]), Cognitive Computing Lab (CCL), Baidu Research, Beijing, China
Yue Feng ([email protected]), Cognitive Computing Lab (CCL), Baidu Research, Beijing, China
Ping Li ([email protected]), Cognitive Computing Lab (CCL), Baidu Research, Bellevue, WA, USA
Abstract

In this paper, we consider the problem of open information extraction (OIE): extracting entity- and relation-level intermediate structures from sentences in open domains. We focus on four types of valuable intermediate structures (Relation, Attribute, Description, and Concept), and propose a unified knowledge expression form, SAOKE, to express them. We publicly release a data set which contains more than forty thousand sentences and the corresponding facts in the SAOKE format, labeled by crowd-sourcing. To our knowledge, this is the largest publicly available human-labeled data set for open information extraction tasks. Using this labeled SAOKE data set, we train an end-to-end neural model under the sequence-to-sequence paradigm, called Logician, to transform sentences into facts. For each sentence, unlike existing algorithms, which generally focus on extracting each single fact without considering other possible facts, Logician performs a global optimization over all the facts involved, in which facts not only compete with each other to attract the attention of words, but also cooperate to share words. An experimental study on various types of open-domain relation extraction tasks reveals the consistent superiority of Logician over state-of-the-art algorithms. The experiments verify the reasonableness of the SAOKE format, the value of the SAOKE data set, the effectiveness of the proposed Logician model, and the feasibility of applying the end-to-end learning paradigm on supervised data sets to the challenging tasks of open information extraction.

SAOKE data set: https://ai.baidu.com/broad/subordinate?dataset=saoke
Keywords
Knowledge expression; open information extraction; end-to-end learning; sequence-to-sequence learning; deep learning
1 Introduction

Semantic applications typically work on the basis of intermediate structures derived from sentences. Traditional word-level intermediate structures, such as POS-tags, dependency trees and semantic role labels, have been widely applied. Recently, entity- and relation-level intermediate structures have attracted increasingly more attention.

(This paper was initially submitted to EMNLP 2017. The authors sincerely thank the Program Committee for their helpful comments.)
In general, knowledge-based applications require entity- and relation-level information. For instance, in [36], the lexicalized dependency path between two entity mentions was taken as the surface-pattern fact. In distant supervision [31], the word sequence and dependency path between two entity mentions were taken as evidence of a certain relation. In Probase [50], candidate taxonomies were extracted by Hearst patterns [21]. The surface patterns of relations extracted by Open Information Extraction (OIE) systems [3, 14, 15, 38, 49] served as the source of question answering systems [16, 25]. In addition, entity- and relation-level intermediate structures have been proven effective in many other tasks such as text summarization [10, 11, 30], text comprehension, word similarity, word analogy [41], and more.

The task of entity/relation-level intermediate structure extraction studies how facts about entities and relations are expressed by natural language in sentences, and then expresses these facts in an intermediate (and convenient) format. Although entity/relation-level intermediate structures have been utilized in many applications, the study of learning these structures is still at an early stage.

Firstly, the problem of extracting different types of entity/relation-level intermediate structures has not been considered in a unified fashion. Applications generally need to construct their own hand-crafted heuristics to extract the required entity/relation-level intermediate structures, rather than consulting a commonly available NLP component, as they do for word-level intermediate structures. The Open IE-v4 system (http://knowitall.github.io/openie/) attempted to build such a component by developing two sub-systems, each extracting one type of intermediate structure: SRLIE [9] for verb-based relations, and ReNoun [33, 52] for nominal attributes. However, important information such as descriptive tags for entities and concept-instance relations between entities was not considered.

Secondly, existing solutions to the task either use pattern-matching techniques [3, 38, 49, 50], or are trained in a self-supervised manner on data sets automatically generated by heuristic patterns or info-box matching [3, 15, 49]. It is well understood that pattern matching typically does not generalize well, and that automatically generated samples may contain a lot of noise.

This paper aims at tackling some of these well-known challenging problems of OIE systems in a supervised, end-to-end deep learning paradigm. Our contribution can be summarized as three major components: the SAOKE format, the SAOKE data set, and Logician.
Symbol Aided Open Knowledge Expression (SAOKE) is a knowledge expression form with several desirable properties: (i) SAOKE is literally honest and open-domain: following the philosophy of OIE systems, SAOKE uses words in the original sentence to express knowledge. (ii) SAOKE provides a unified view over four common types of knowledge: relation, attribute, description and concept. (iii) SAOKE is an accurate expression: with the aid of its symbolic system, SAOKE is able to accurately express facts with separated relation phrases, missing information, hidden information, etc.

The SAOKE data set is a human-annotated data set containing 48,248 Chinese sentences and the corresponding facts in SAOKE form. We publish the data set for research purposes. To the best of our knowledge, this is the largest publicly available human-annotated data set for open-domain information extraction tasks.

Logician is a supervised end-to-end neural learning algorithm which transforms natural language sentences into facts in SAOKE form. Logician is trained under the attention-based sequence-to-sequence paradigm with three mechanisms: a restricted copy mechanism to ensure literal honesty, a coverage mechanism to alleviate under-extraction and over-extraction, and a gated dependency attention mechanism to incorporate dependency information. Experimental results on four types of open information extraction tasks reveal the superiority of the Logician algorithm.

Our work demonstrates that the SAOKE format is suitable for expressing various types of knowledge and is friendly to end-to-end learning algorithms. In particular, we focus on showing that supervised end-to-end learning is promising for OIE tasks of extracting entity- and relation-level intermediate structures.

The rest of this paper is organized as follows. Section 2 presents the details of SAOKE. Section 3 describes the human-labeled SAOKE data set. Section 4 describes the Logician algorithm, and Section 5 evaluates the Logician algorithm and compares its performance with state-of-the-art algorithms on four OIE tasks. Section 6 discusses related work and Section 7 concludes the paper.
2 SAOKE Format

When reading a sentence in natural language, humans are able to recognize the facts involved in the sentence and accurately express them. In this paper, Symbol Aided Open Knowledge Expression (SAOKE) is proposed as the form for honestly recording these facts. SAOKE expresses the primary information of sentences in n-ary tuples (subject, predicate, object_1, ..., object_N), and (in this paper) neglects some auxiliary information. In the design of SAOKE, we take four requirements into consideration: completeness, accurateness, atomicity and compactness.

After having analyzed a large number of sentences, we observe that the majority of facts can be classified into the following classes:
(1) Relation: verb/preposition-based n-ary relations between entity mentions [9, 38];
(2) Attribute: nominal attributes of entity mentions [33, 52];
(3) Description: descriptive phrases of entity mentions [5];
(4) Concept: hyponymy and synonym relations among concepts and instances [44].

SAOKE is designed to express all four types of facts. Table 1 presents an example sentence and the involved facts of these four classes in SAOKE form. We should mention that the sentences and facts in English are directly translated from the corresponding Chinese sentences and facts, and the facts in English may not be the desired outputs of OIE algorithms for those English sentences, due to the differences between the Chinese and English languages.
SAOKE adopts the ideology of being "literally honest": as much as possible, it uses the words in the original sentence to express the facts. SAOKE follows the philosophy of OIE systems to express various relations without relying on any predefined schema system. There are, however, exceptional situations which are beyond the expressive ability of this format. Extra symbols are introduced to handle these situations, as explained below.
Separated relation phrase: In some languages such as Chinese, a relation phrase may be divided into several parts residing at discontinuous locations of the sentence. To accurately express such relation phrases, we add placeholders (X, Y, Z, etc.) to build continuous and complete expressions. "深受X影响" ("deeply influenced by X" in English) in the example of Table 1 is an instance of a relation phrase after such processing.

Abbreviated expression: We explicitly express the information in abbreviated expressions by introducing symbolic predicates. For example, the expression "Person (birth date - death date)" is transformed into the facts (Person, BIRTH, birth date) (Person, DEATH, death date), and the synonym fact involved in "NBA (National Basketball Association)" is expressed in the form (NBA, =, National Basketball Association).

Hidden information: The description of an entity and the hyponymy relation between entities are in general expressed implicitly in sentences; they are expressed by the symbolic predicates "DESC" and "ISA" respectively, as in Table 1. Another source of hidden information is the address expression. For example, "法国巴黎" ("Paris, France" in English) implies the fact (巴黎, LOC, 法国) ((Paris, LOC, France) in English), where the symbol "LOC" means "location".

Missing information: A sentence may not tell us the exact relation between two entities, or the exact subject/objects of a relation, which must be inferred from the context. We use placeholders like "X", "Y", "Z" to denote missing subjects/objects, and "P" to denote missing predicates.

Atomicity is introduced to eliminate the ambiguity of knowledge expressions. In the SAOKE format, each fact is required to be atomic, which means that: (i) it is self-contained, for an accurate expression; and (ii) it cannot be decomposed into multiple valid facts. We provide examples in Table 2 to help understand these two criteria. Note that the second criterion implies that any logical connections (including nested expressions) between facts are neglected (e.g., the third case in Table 2). The problem of expressing relations between facts will be considered in a future version of SAOKE.

Table 1: Expected facts of an example sentence.
Sentence (Chinese): 李白(701年-762年)，深受庄子思想影响，爽朗大方，爱饮酒作诗，喜交友，代表作有《望庐山瀑布》等著名诗歌。
Sentence (English): Li Bai (701-762), with masterpieces of famous poetry such as "Watching the Lushan Waterfall", was deeply influenced by Zhuangzi's thought, was hearty and generous, loved to drink and write poetry, and liked to make friends.
Relations: (李白, 深受X影响, 庄子思想) (李白, 爱, [饮酒|作诗]) (李白, 喜, 交友)
Relations (English): (Li Bai, deeply influenced by X, Zhuangzi's thought) (Li Bai, loved to, [drink|write poetry]) (Li Bai, liked to, make friends)
Attributes: (李白, BIRTH, 701年) (李白, DEATH, 762年) (李白, 代表作, 《望庐山瀑布》)
Attributes (English): (Li Bai, BIRTH, 701) (Li Bai, DEATH, 762) (Li Bai, masterpiece, "Watching the Lushan Waterfall")
Description: (李白, DESC, 爽朗大方); (English): (Li Bai, DESC, hearty and generous)
Concept: (《望庐山瀑布》, ISA, 著名诗歌); (English): ("Watching the Lushan Waterfall", ISA, famous poetry)
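To make the notation concrete, here is a minimal Python sketch (ours, not part of the data set tooling) that parses a fact string written as in Table 1 and expands the compact "[a|b]" notation into atomic facts; the naive comma split assumes that elements themselves contain no commas:

```python
import re

def parse_fact(fact_str):
    """Parse a SAOKE fact string such as "(李白, 爱, [饮酒|作诗])" into a tuple.
    Simplification: elements are assumed not to contain commas themselves."""
    inner = fact_str.strip().lstrip("(").rstrip(")")
    return tuple(part.strip() for part in inner.split(","))

def expand_compact(fact):
    """Expand the compact [a|b] notation into one atomic fact per alternative."""
    for i, elem in enumerate(fact):
        m = re.fullmatch(r"\[(.+)\]", elem)
        if m:
            for alt in m.group(1).split("|"):
                # Recurse in case several elements are compact.
                yield from expand_compact(fact[:i] + (alt.strip(),) + fact[i + 1:])
            return
    yield fact

fact = parse_fact("(李白, 爱, [饮酒|作诗])")
print(list(expand_compact(fact)))
# [('李白', '爱', '饮酒'), ('李白', '爱', '作诗')]
```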
Table 2: Example sentences and corresponding wrong/correct facts.
Case 1.
Sentence (Chinese): 山东的GDP高于所有西部的省份。 (English): Shandong's GDP is higher than all western provinces.
Wrong facts: (山东的GDP, 高于, 所有省份) (所有省份, DESC, 西部的); (English): (Shandong's GDP, is higher than, all provinces) (all provinces, DESC, western)
Correct facts: (山东的GDP, 高于, 西部的所有省份); (English): (Shandong's GDP, is higher than, all western provinces)
Case 2.
Sentence (Chinese): 李白游览了雄奇灵秀的泰山。 (English): Li Bai visited the magnificent Mount Tai.
Wrong facts: (李白, 游览, 雄奇灵秀的泰山); (English): (Li Bai, visited, the magnificent Mount Tai)
Correct facts: (李白, 游览, 泰山) (泰山, DESC, 雄奇灵秀); (English): (Li Bai, visited, Mount Tai) (Mount Tai, DESC, magnificent)
Case 3.
Sentence (Chinese): 在美国的帮助下，英国抵挡住了德国的进攻。 (English): With the help of the US, the British resisted the attack from Germany.
Wrong facts: (英国, 在X的帮助下抵挡住了Y的进攻, 美国, 德国); (English): (the British, resisted the attack from Y with the help of X, the US, Germany)
Correct facts: (美国, 帮助, 英国) (英国, 抵挡住了X的进攻, 德国); (English): (the US, helped, the British) (the British, resisted the attack from X, Germany)

Natural language may express several facts in a compact form. For example, the sentence "李白爱饮酒作诗" ("Li Bai loved to drink and write poetry" in English) contains, according to atomicity, two facts: (李白, 爱, 饮酒) (李白, 爱, 作诗) ((Li Bai, loved to, drink) (Li Bai, loved to, write poetry) in English). In this situation, SAOKE adopts a compact expression that merges the two facts into one: (李白, 爱, [饮酒|作诗]) ((Li Bai, loved to, [drink|write poetry]) in English).

The compactness of expressions is introduced to fulfill, not to violate, the rule of being "literally honest": SAOKE does not allow merging facts if they are not expressed compactly in the original sentence. By this means, the differences between sentences and the corresponding knowledge expressions are reduced, which may help reduce the complexity of learning from data in SAOKE form.

With the above designs, SAOKE is able to express various kinds of facts, each historically considered by a different open information extraction algorithm: for example, verb-based relations in SRLIE [9], nominal attributes in ReNoun [33, 52], descriptive phrases for entities in EntityTagger [5], and hypernyms in HypeNet [44]. SAOKE introduces atomicity to eliminate the ambiguity of knowledge expressions, and achieves better accuracy and compactness with the aid of its symbolic expressions.

3 SAOKE Data Set

We randomly collect sentences from Baidu Baike (http://baike.baidu.com) and send them to a crowd-sourcing company to label the involved facts. The workers are trained with labeling examples and tested with exams; the workers with high exam scores are then asked to read and understand the facts in the sentences and to express those facts in the SAOKE format. During this procedure, each sentence is labeled by only one worker. Finally, more than forty thousand sentences with about one hundred thousand facts were returned to us. Manual evaluation on 100 randomly selected sentences shows that the fact-level precision and recall are 89.5% and 92.2%, respectively. Table 3 shows the proportions of the four types of facts (described in Section 2) contained in the data set. Note that facts with missing predicates, represented by "P", are classified as "Unknown". We publicize the data set at https://ai.baidu.com/broad/subordinate?dataset=saoke.
Table 3: Ratios of facts of each type in the SAOKE data set.
Relation: 76.02%; Attribute: 7.25%; Description: 9.89%; Concept: 3.64%; Unknown: 3.20%.
Prior to the SAOKE data set, an annotated data set for OIE tasks with 3,200 sentences in 2 domains was released in [40] to evaluate OIE algorithms; that data set was said [40] to be "13 times larger than the previous largest annotated Open IE corpus". The SAOKE data set is 16 times larger than the data set in [40]. To the best of our knowledge, the SAOKE data set is the largest publicly available human-labeled data set for OIE tasks. Furthermore, the data set released in [40] was generated from a QA-SRL data set [20], which indicates that it only contains facts that can be discovered by SRL (Semantic Role Labeling) algorithms and is thus biased, whereas the SAOKE data set is not biased toward any algorithm. Finally, the SAOKE data set contains sentences and facts from a large number of domains.
4 Logician

Given a sentence $S$ and a set of expected facts (covering all possible fact types) $\mathbb{F} = \{F_1, \cdots, F_n\}$ in SAOKE form, we join all the facts, in the order in which the annotators wrote them, into a character sequence $F$ as the expected output. We build Logician under the attention-based sequence-to-sequence learning paradigm to transform $S$ into $F$, together with the restricted copy mechanism, the coverage mechanism and the gated dependency attention mechanism.

Attention-based sequence-to-sequence learning [2] has been successfully applied to the task of generating text and patterns. Given an input sentence $S = [w_1^S, \cdots, w_{N_S}^S]$, the target sequence $F = [w_1^F, \cdots, w_{N_F}^F]$ and a vocabulary $V$ (including the symbols introduced in Section 2 and the OOV (out-of-vocabulary) tag) of size $N_v$, the words $w_i^S$ and $w_j^F$ can be represented as one-hot vectors $v_i^S$ and $v_j^F$ of dimension $N_v$, and transformed into $N_e$-dimensional distributed representation vectors by an embedding transform $x_i = E v_i^S$ and $y_j = E v_j^F$ respectively, where $E \in \mathbb{R}^{N_e \times N_v}$. The sequence $\{x_i\}_{i=1}^{N_S}$ is then transformed into a sequence of $N_h$-dimensional hidden states $H^S = [h_1^S, \cdots, h_{N_S}^S]$ using a bi-directional GRU (Gated Recurrent Units) network [8], and the sequence $\{y_j\}_{j=1}^{N_F}$ is transformed into a sequence of $N_h$-dimensional hidden states $H^F = [h_1^F, \cdots, h_{N_F}^F]$ using a GRU network.

For each position $t$ in the target sequence, the decoder learns a dynamic context vector $c_t$ to focus attention on specific locations in the input hidden states $H^S$, and then computes the probability of the generated word by $p(w_t^F \mid \{w_1^F, \cdots, w_{t-1}^F\}, c_t) = g(h_{t-1}^F, s_t, c_t)$, where $s_t$ is the hidden state of the GRU decoder, $g$ is the word selection model (details can be found in [2]), and $c_t$ is computed as

$c_t = \sum_{j=1}^{N_S} \alpha_{tj} h_j^S, \quad \alpha_{tj} = \frac{\exp(e_{tj})}{\sum_{k=1}^{N_S} \exp(e_{tk})},$

where $e_{tj} = a(s_{t-1}, h_j^S) = v_a^T \tanh(W_a s_{t-1} + U_a h_j^S)$ is the alignment model that measures the strength of focus on the $j$-th location. $W_a \in \mathbb{R}^{N_h \times N_h}$, $U_a \in \mathbb{R}^{N_h \times N_h}$, and $v_a \in \mathbb{R}^{N_h}$ are weight matrices.
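For concreteness, the following is a minimal NumPy sketch (our illustration, not the authors' released code) of the alignment and context computation defined by the equations above:

```python
import numpy as np

def attention_context(s_prev, H_S, W_a, U_a, v_a):
    """Bahdanau-style attention: scores e_tj, weights alpha_tj, context c_t.

    s_prev: (N_h,) previous decoder state; H_S: (N_S, N_h) encoder states;
    W_a, U_a: (N_h, N_h) weight matrices; v_a: (N_h,) weight vector.
    """
    e = np.tanh(s_prev @ W_a.T + H_S @ U_a.T) @ v_a  # e_tj for every source position j
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                             # softmax over source positions
    c_t = alpha @ H_S                                # context vector c_t
    return c_t, alpha
```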
The word selection model employed in [2] selects words from the whole vocabulary $V$, which evidently violates the "literally honest" requirement of SAOKE. We therefore propose a restricted version of the copy mechanism [17] as the word selection model for Logician. We collect the symbols introduced in Section 2 into a keyword set K = { "ISA", "DESC", "LOC", "BIRTH", "DEATH", "=", "(", ")", "$", "[", "]", "|", "X", "Y", "Z", "P" }, where "$" is the separator of the elements of fact tuples, and "X", "Y", "Z", "P" are placeholders. When the decoder is considering generating a word $w_t^F$, it can choose $w_t^F$ from either $S$ or $K$:

$p(w_t^F \mid w_{t-1}^F, s_t, c_t) = p_X(w_t^F \mid w_{t-1}^F, s_t, c_t) + p_K(w_t^F \mid w_{t-1}^F, s_t, c_t), \quad (1)$

where $p_X$ is the probability of copying from $S$ and $p_K$ is the probability of selecting from $K$. Since $S \cap K = \emptyset$ and there are no unknown words in this problem setting, we compute $p_X$ and $p_K$ in a simpler way than in [17]:

$p_X(w_t^F = w_j^S) = \frac{1}{Z} \exp\!\left(\sigma\!\left((h_j^S)^T W_c\right) s_t\right), \quad p_K(w_t^F = k_i) = \frac{1}{Z} \exp\!\left(v_i^T W_o s_t\right),$

where $Z$ is the shared normalization term, $k_i$ is one of the keywords, $v_i$ is the one-hot indicator vector for $k_i$, $W_o \in \mathbb{R}^{|K| \times N_h}$, $W_c \in \mathbb{R}^{N_h \times N_h}$, and $\sigma$ is a nonlinear activation function.
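The two score functions and the shared normalizer $Z$ can be sketched in a few lines of NumPy (our illustration; `tanh` stands in for the unspecified activation $\sigma$):

```python
import numpy as np

def restricted_copy_distribution(h_S, s_t, W_c, W_o):
    """Compute p_X (copy a source word) and p_K (emit a keyword) under one
    shared normalizer Z, as in Equation (1) and the two score formulas above.

    h_S: (N_S, N_h) encoder states; s_t: (N_h,) decoder state;
    W_c: (N_h, N_h) copy parameters; W_o: (|K|, N_h) keyword parameters.
    """
    copy_scores = np.tanh(h_S @ W_c) @ s_t      # sigma((h_j^S)^T W_c) s_t for each j
    keyword_scores = W_o @ s_t                  # v_i^T W_o s_t for each keyword k_i
    logits = np.concatenate([copy_scores, keyword_scores])
    probs = np.exp(logits - logits.max())       # subtract max for numerical stability
    probs /= probs.sum()                        # the shared normalizer Z
    return probs[: len(h_S)], probs[len(h_S):]  # (p_X over S, p_K over K)
```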
In practice, Logician may forget to extract some facts (under-extraction) or extract the same fact many times (over-extraction). We incorporate the coverage mechanism [43] into Logician to alleviate these problems. Formally, when the decoder considers generating a word $w_t^F$, a coverage vector $m_j^t$ is introduced for each word $w_j^S$ and updated as follows:

$m_j^t = \mu(m_j^{t-1}, \alpha_{tj}, h_j^S, s_{t-1}) = (1 - z_j) \circ m_j^{t-1} + z_j \circ \tilde{m}_j^t,$
$\tilde{m}_j^t = \tanh(W_h h_j^S + u_\alpha \alpha_{tj} + W_s s_{t-1} + U_m [r_j \circ m_j^{t-1}]),$

where $\circ$ is the element-wise multiplication operator. The update gate $z_j$ and the reset gate $r_j$ are defined, respectively, as

$z_j = \sigma(W_{zh} h_j^S + u_{z\alpha} \alpha_{tj} + W_{zs} s_{t-1} + U_{zm} m_j^{t-1}),$
$r_j = \sigma(W_{rh} h_j^S + u_{r\alpha} \alpha_{tj} + W_{rs} s_{t-1} + U_{rm} m_j^{t-1}),$

where $\sigma$ is the logistic sigmoid function. The coverage vector $m_j^t$ contains the information about the historical attention focused on $w_j^S$, and is helpful for deciding whether $w_j^S$ should be extracted or not. The alignment model is updated accordingly [43]:

$e_{tj} = a(s_{t-1}, h_j^S, m_j^{t-1}) = v_a^T \tanh(W_a s_{t-1} + U_a h_j^S + V_a m_j^{t-1}),$

where $V_a \in \mathbb{R}^{N_h \times N_h}$.
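A minimal NumPy sketch of one coverage update (our illustration; the parameters are packed into a dict `P` whose keys mirror the matrices above):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coverage_update(m_prev, alpha_tj, h_j, s_prev, P):
    """GRU-style update of the coverage vector m_j^t for one source word.

    m_prev, h_j, s_prev: (N_h,) vectors; alpha_tj: scalar attention weight;
    P holds W_h, u_a, W_s, U_m and the gate parameters (W_zh, ..., U_rm).
    """
    z = sigmoid(P["W_zh"] @ h_j + P["u_za"] * alpha_tj +
                P["W_zs"] @ s_prev + P["U_zm"] @ m_prev)      # update gate z_j
    r = sigmoid(P["W_rh"] @ h_j + P["u_ra"] * alpha_tj +
                P["W_rs"] @ s_prev + P["U_rm"] @ m_prev)      # reset gate r_j
    m_tilde = np.tanh(P["W_h"] @ h_j + P["u_a"] * alpha_tj +
                      P["W_s"] @ s_prev + P["U_m"] @ (r * m_prev))
    return (1.0 - z) * m_prev + z * m_tilde                   # interpolate old and new
```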
The semantic relationship between a candidate word and the previously decoded word is valuable for guiding the decoder to select the correct word. We introduce the gated dependency attention mechanism to utilize such guidance.

For a sentence $S$, we extract the dependency tree using NLP tools such as CoreNLP [29] for English and LTP [6] for Chinese, and convert the tree into a graph by adding reversed edges with revised labels (for example, adding $w_j^S \xrightarrow{-\mathrm{SBV}} w_i^S$ for an edge $w_i^S \xrightarrow{\mathrm{SBV}} w_j^S$ in the dependency tree). For each pair of words $(w_i^S, w_j^S)$, the shortest path with labels $L = [w_1^L, \cdots, w_{N_L}^L]$ in the graph is then computed and mapped into a sequence of $N_e$-dimensional distributed representation vectors $[l_1, \cdots, l_{N_L}]$ by the embedding operation. One could employ an RNN network to convert this sequence of vectors into a feature vector, but the RNN operation is time-consuming. Instead, we simply concatenate the vectors of short paths ($N_L \le 3$) into a $3N_e$-dimensional vector and feed it into a two-layer feed-forward neural network to generate an $N_h$-dimensional feature vector $n_{ij}$. For long paths with $N_L > 3$, $n_{ij}$ is set to the zero vector.

We define the dependency attention vector $\tilde{u}_j^t = \sum_i p^*(w_t^F = w_i^S)\, n_{ij}$, where $p^*$ is the sharpened probability $p$ defined in Equation (1). If $w_t^F \in S$, $\tilde{u}_j^t$ represents the semantic relationship between $w_t^F$ and $w_j^S$; if $w_t^F \in K$, then $\tilde{u}_j^t$ is close to zero. To correctly guide the decoder, we need to gate $\tilde{u}_j^t$ so that it remembers the previous attention vector in some cases (for example, when "$" is selected) and forgets it in others (for example, when a new fact is started). We therefore define $u_j^t = g(\tilde{u}_j^t)$ as the gated dependency attention vector, where $g$ is the GRU gating function, and update the alignment model as follows:

$e_{tj} = a(s_{t-1}, h_j^S, m_j^{t-1}, u_j^{t-1}) = v_a^T \tanh(W_a s_{t-1} + U_a h_j^S + V_a m_j^{t-1} + D_a u_j^{t-1}),$

where $D_a \in \mathbb{R}^{N_h \times N_h}$.

Finally, each sequence generated by Logician is parsed into a set of facts; tuples with an illegal format, as well as duplicated tuples, are removed. The resulting set is taken as the output of Logician.
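A minimal sketch of this post-processing step (ours; treating a tuple as legal only if it has at least a subject, a predicate and one object, all non-empty, is our assumption about what counts as an illegal format):

```python
import re

def postprocess(seq):
    """Parse a generated SAOKE sequence into facts; drop malformed or duplicate tuples."""
    facts, seen = [], set()
    for chunk in re.findall(r"\(([^()]*)\)", seq):         # each "(...)" is one candidate fact
        fact = tuple(p.strip() for p in chunk.split("$"))  # "$" separates tuple elements
        if len(fact) >= 3 and all(fact) and fact not in seen:
            seen.add(fact)
            facts.append(fact)
    return facts

print(postprocess("(李白$爱$饮酒)(李白$爱$饮酒)(李白$爱$)"))
# [('李白', '爱', '饮酒')]  (the duplicate and the incomplete tuple are dropped)
```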
5 Experiments

We first measure the utility of the various components of Logician in order to select the optimal model, and then compare this model to state-of-the-art methods on four types of information extraction tasks: verb/preposition-based relation, nominal attribute, descriptive phrase, and hyponymy relation. The SAOKE data set is split into training, validation and test sets with ratios of 80%, 10% and 10%, respectively. For all algorithms involved in the experiments, the training set is used to train the model, the validation set to select an optimal model, and the test set to evaluate performance.
For each instance pair $(S, F)$ in the test set, where $S$ is the input sentence and $F$ is the formatted string of the ground-truth facts, we parse $F$ into a set of tuples $\mathbb{F} = \{F_j\}_{j=1}^{M}$. Given an open information extraction algorithm, it reads $S$ and produces a set of tuples $\mathbb{G} = \{G_i\}_{i=1}^{N}$. To evaluate how well $\mathbb{G}$ approximates $\mathbb{F}$, we need to match each $G_i$ to a ground-truth fact $F_j$ and check whether $G_i$ tells the same fact as $F_j$. To conduct the matching, we compute the similarity between each predicted fact in $\mathbb{G}$ and each ground-truth fact in $\mathbb{F}$, and then find the optimal matching that maximizes the sum of matched similarities by solving a linear assignment problem [47]. In this procedure, the similarity between two facts is defined as

$\mathrm{Sim}(G_i, F_j) = \frac{\sum_{l=1}^{\min(n(G_i), n(F_j))} g(G_i(l), F_j(l))}{\max(n(G_i), n(F_j))},$

where $G_i(l)$ and $F_j(l)$ denote the $l$-th element of tuples $G_i$ and $F_j$ respectively, $g(\cdot, \cdot)$ denotes the gestalt pattern matching [35] measure for two strings, and $n(\cdot)$ returns the length of a tuple.

Given a matched pair $G_i$ and $F_j$, we propose an automatic approach to judge whether they tell the same fact. They are judged as telling the same fact if one of the following two conditions is satisfied:
• $n(G_i) = n(F_j)$, and $g(G_i(l), F_j(l)) \ge \theta$ for $l = 1, \cdots, n(G_i)$;
• $n(G_i) \ne n(F_j)$, and $g(\mathcal{S}(G_i), \mathcal{S}(F_j)) \ge \theta$,
where $\theta$ is a fixed similarity threshold and $\mathcal{S}$ is a function formatting a fact into a string by filling the arguments into the placeholders of the predicate.

With the automatic judgment, the precision (P), recall (R) and F-score over a test set can be computed. By defining a confidence measure and ordering the facts by their confidences, a precision-recall curve can be drawn to illustrate the overall performance of an algorithm. For Logician, the confidence of a fact is computed as the average of the log probabilities over all words in that fact.

Beyond the automatic judgment, human evaluation is also employed. Given an algorithm and the corresponding fact confidence measure, we find a threshold that produces approximately 10% recall (measured by automatic judgment) on the validation set of the SAOKE data set. A certain number of sentences (200 for the verb/preposition-based relation extraction task, and 1,000 for the other three tasks) are randomly chosen from the test set of the SAOKE data set, and the facts extracted from these sentences are filtered with that threshold. We then invite three volunteers to manually refine the labeled set of facts for each sentence and to vote on whether each filtered fact is correctly involved in the sentence. The standard precision, recall and F-score are reported as the human evaluation results.

For each instance pair $(S, F)$ in the training set of the SAOKE data set, we split $S$ and $F$ into words using the LTP toolset [6], and words appearing in more than two sentences are added to the vocabulary. By adding the OOV (out-of-vocabulary) tag, we finally obtain a vocabulary $V$ of size $N_V$. We use a bi-directional GRU to encode the sequence $\{x_i\}_{i=1}^{N_S}$ into the hidden states $\{h_i^S\}_{i=1}^{N_S}$, and a two-layer GRU with hidden dimension 256 to encode the sequence $\{y_j\}_{j=1}^{N_F}$ into the hidden states $\{h_j^F\}_{j=1}^{N_F}$. The Logician network is then constructed as stated in Section 4, and trained using stochastic gradient descent (SGD) with the RMSProp [22] strategy for 20 epochs with batch size 10 on the training set of the SAOKE data set. The model with the best F-score by automatic judgment on the validation set is selected as the trained model.
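For reference, the automatic matching and judgment described in this section can be sketched with standard tools (our illustration): `difflib.SequenceMatcher.ratio()` implements exactly the Ratcliff-Obershelp gestalt measure of [35], and `scipy.optimize.linear_sum_assignment` solves the linear assignment problem. The threshold `theta` and the simple placeholder filling in `fact_to_string` are our assumptions:

```python
import difflib
import numpy as np
from scipy.optimize import linear_sum_assignment

def gestalt(a, b):
    """g(., .): Ratcliff-Obershelp (gestalt) string similarity [35]."""
    return difflib.SequenceMatcher(None, a, b).ratio()

def fact_similarity(g_fact, f_fact):
    """Sim(G_i, F_j): element-wise gestalt similarity, penalizing length mismatch."""
    m = min(len(g_fact), len(f_fact))
    total = sum(gestalt(g_fact[l], f_fact[l]) for l in range(m))
    return total / max(len(g_fact), len(f_fact))

def match_facts(predicted, gold):
    """Optimal one-to-one matching maximizing total similarity (linear assignment)."""
    sim = np.array([[fact_similarity(g, f) for f in gold] for g in predicted])
    rows, cols = linear_sum_assignment(-sim)   # SciPy minimizes, so negate
    return list(zip(rows, cols))

def fact_to_string(fact, placeholders="XYZ"):
    """S(.): splice the objects into the predicate's placeholders (simplified)."""
    pred, objs = fact[1], list(fact[2:])
    for ph in placeholders:
        if ph in pred and objs:
            pred = pred.replace(ph, objs.pop(0), 1)
    return fact[0] + pred + "".join(objs)

def same_fact(g_fact, f_fact, theta):
    """The two-condition automatic judgment described above."""
    if len(g_fact) == len(f_fact):
        return all(gestalt(a, b) >= theta for a, b in zip(g_fact, f_fact))
    return gestalt(fact_to_string(g_fact), fact_to_string(f_fact)) >= theta
```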
Once the model is trained, given a sentence, we employ a greedy search procedure to produce the fact sequence.
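A sketch of this decoding loop (ours): `step` is a stand-in for one application of the decoder of Section 4, and the `<bos>`/`<eos>` markers are assumptions:

```python
def greedy_decode(step, init_state, max_len=200, bos="<bos>", eos="<eos>"):
    """Greedy search: repeatedly emit the most probable next word.

    `step(state, prev_word)` wraps one decoder step and returns (probs, new_state),
    where probs maps each candidate word in S or K to its probability
    under Equation (1).
    """
    words, state, prev = [], init_state, bos
    for _ in range(max_len):
        probs, state = step(state, prev)
        prev = max(probs, key=probs.get)   # greedy: take the argmax word
        if prev == eos:
            break
        words.append(prev)
    return words
```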
In this section, we analyze the effects of the components involved in Logician: restricted copy, coverage, and gated dependency. Since the restricted copy mechanism is an essential requirement of Logician for achieving the goal of literal honesty, we take the Logician with only the copy mechanism (denoted by Copy) as the baseline, and analyze the effectiveness of the coverage mechanism (denoted by Copy + Coverage), the gated dependency mechanism (denoted by Copy + GatedDep), and both (denoted by All). Furthermore, there is another option of whether or not to incorporate shallow semantic information such as POS-tags and NER-tags into the model. For models involving such information, the POS-tag and NER-tag of each word in sentence S are annotated using LTP. For each word in F that is not a keyword in K, the POS-tag and NER-tag are inherited from the corresponding word in S; for each keyword in K, a unique POS-tag and a unique NER-tag are assigned to it. Finally, for each word in S or F, the POS-tag and NER-tag are mapped into N_e-dimensional distributed representation vectors and concatenated into x_i or y_j for training.

All models are trained using the same settings described above, and the default output facts (without any confidence filtering) are evaluated by the automatic judgment. The results are reported in Table 4. From the results, we can see that the model involving all the components and the shallow tag information achieves the best performance. We use that model in the comparisons with existing approaches.

Table 4: Analysis of the components involved in Logician (precision, recall and F-score, with and without shallow tags, for the Copy, Copy + Coverage, Copy + GatedDep and All models).

In the task of extracting verb/preposition-based facts, we compare Logician with the following state-of-the-art Chinese OIE algorithms:
SRLIE: our implementation of SRLIE [9] for the Chinese language, which first uses the LTP toolset to extract semantic role labels, and converts the results into fact tuples using heuristic rules. The confidence of each fact is computed as the ratio of the number of words in the fact to the number of words in the shortest fragment of the source sentence that contains all the words of the fact.
ZORE: the Chinese Open Relation Extraction system [34], which builds a set of patterns by bootstrapping based on dependency parsing results, and uses the patterns to extract relations. We used the program provided by the authors of ZORE [34] to generate the extraction results in XML format, and developed an algorithm to transform the facts into n-ary tuples, with the auxiliary information extracted by ZORE removed. The confidence measure for ZORE is the same as that for SRLIE.
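Both baselines above share this span-based confidence heuristic; a minimal brute-force sketch (ours) follows:

```python
def span_confidence(fact_words, sentence_words):
    """Confidence for SRLIE/ZORE: the number of words in the fact divided by the
    length of the shortest sentence fragment containing all of them."""
    best = len(sentence_words)                 # fall back to the whole sentence
    for start in range(len(sentence_words)):
        remaining = list(fact_words)
        for end in range(start, len(sentence_words)):
            if sentence_words[end] in remaining:
                remaining.remove(sentence_words[end])
            if not remaining:                  # this window covers the fact
                best = min(best, end - start + 1)
                break
    return len(fact_words) / best

print(span_confidence(["李白", "游览", "泰山"],
                      ["李白", "游览", "了", "雄奇灵秀", "的", "泰山"]))  # 3/6 = 0.5
```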
SRL_SAOKE: our implementation of the state-of-the-art SRL algorithm proposed in [19], with modifications to fit OIE tasks. SRL_SAOKE extracts facts in two steps. (i) Predicate head word detection: detect the head word of the predicate of each possible fact, where the head word of a predicate is the last word in the predicate that depends on words outside the predicate in the dependency tree. (ii) Element phrase detection: for each detected head word, detect the subject phrase, predicate phrase and object phrases by tagging the sentence with an extended BIOE tagging scheme, which tags the word neighboring the separation point of a phrase with "M" to cope with separated phrases. We modified the code provided by the authors of [19] to implement the above strategy, and then trained a model with the same parameter settings as in [19] on the training set of the SAOKE data set. The confidence measure for SRL_SAOKE is computed as the average of the log probabilities over all tags of the words in a fact. Note that SRL_SAOKE can extract both verb/preposition-based relations and nominal attributes, but in this section we only evaluate the former type of facts.
The precision-recall curves of Logician and the above three comparison algorithms are shown in Figure 1a, and the human evaluation results are shown in the first section of Table 5.

Figure 1: Performance comparison (precision-recall curves) on four types of information extraction tasks: (a) verb/preposition-based relation (Logician, SRL_SAOKE, ZORE, SRLIE); (b) nominal attribute (Logician, SRL_SAOKE); (c) descriptive phrase (Logician, SDDE); (d) hyponymy (Logician, HypeNet_Phrase, HypeNet_ExtraPhrase).
The state-of-the-art nominal attribute extraction method is ReNoun [33, 52]. However, it relies on a pre-constructed English attribute schema system [18] which is not available for Chinese, so it is not an applicable baseline for Chinese. Since SRL_SAOKE can extract nominal attributes, we compare Logician with SRL_SAOKE on this task. The precision-recall curves of Logician and SRL_SAOKE on the nominal attribute extraction task are shown in Figure 1b, and the human evaluation results are shown in the second section of Table 5.
Descriptive phrase extraction has been considered in [5], where domain names are required to develop patterns for extracting candidate descriptive phrases, so that method is not applicable to open-domain tasks. We therefore develop a baseline algorithm (called Semantic Dependency Description Extractor, SDDE) to extract descriptive phrases. It extracts semantic dependency relations between words using the LTP toolset, and for each noun w_n which is the parent of some semantic "Desc" relations, it identifies a noun phrase N with w_n as its head word, assembles a descriptive phrase D containing all words with a "Desc" relation to w_n, and finally outputs the fact "(N, DESC, D)". The confidence of a fact in SDDE is computed as the ratio of the number of adverbs and adjectives in D to the number of words in D. The precision-recall curves of Logician and SDDE on the descriptive phrase extraction task are shown in Figure 1c, and the human evaluation results are shown in the third section of Table 5.

HypeNet [44] is the state-of-the-art algorithm recommended for hyponymy extraction [46]; it judges whether a hyponymy relation exists between two given words. To make it capable of judging hyponymy relations between two phrases, we replace the word-embedding component in HypeNet with an LSTM network. Two modified HypeNet models are built using different training data sets: (i) HypeNet_Phrase, which uses the pairs of phrases with the ISA relation in the training set of the SAOKE data set (9,407 pairs after compact expression expansion); and (ii) HypeNet_ExtraPhrase, which, besides the training set for HypeNet_Phrase, adds two Chinese hyponymy data sets (1.4 million pairs of words in hyponymy relation in total): Tongyici Cilin (Extended) (CilinE for short) [6] and cleaned Wikipedia Category data [27]. In both cases, the sentences from Chinese Wikipedia pages and the training set of the SAOKE data set are taken as the background corpus for the HypeNet algorithm. In the testing phase, the trained models are used to predict whether the hyponymy relation exists for each pair of noun phrases/words in the sentences of the test set of the SAOKE data set. The confidence of a judgment is the predicted probability of the existence of the hyponymy relation. The precision-recall curves of Logician, HypeNet_Phrase and HypeNet_ExtraPhrase are shown in Figure 1d, and the human evaluation results in the fourth section of Table 5.
Table 5: Human evaluation results (P / R / F) on four types of information extraction tasks.
1. Relation: SRLIE 0.166 / 0.119 / 0.139; ZORE 0.300 / 0.136 / 0.187; SRL_SAOKE – / – / –; Logician – / – / –
2. Attribute: SRL_SAOKE – / – / –; Logician – / – / –
3. Description: SDDE 0.135 / 0.146 / 0.140; Logician 0.392 / 0.109 / –
4. Hyponymy: HypeNet_Phrase – / – / –; HypeNet_ExtraPhrase – / – / –; Logician – / – / –
The experimental results reveal that Logician outperforms the comparison methods by a large margin on the first three tasks. For the hyponymy detection task, Logician clearly outperforms HypeNet_Phrase using the same training data, and produces comparable results to HypeNet_ExtraPhrase with much less training data. Table 6 exhibits several example sentences and the facts extracted by these algorithms.

The poor performance of the pattern-based methods is plausibly due to the noise in the SAOKE data set. The sentences in the SAOKE data set are randomly selected from a web encyclopedia, with a free and casual writing style, and are thus noisier than the training data of the NLP toolset used by these methods. In this situation, the NLP toolset may produce poor results, and so do the pattern-based methods built on it.

Models learned from the SAOKE data set achieve much better performance. Nevertheless, SRL_SAOKE extracts each fact without knowing whether a candidate word has been used in other facts, which results in the misleading overlap of the word "学" ("learn" in English) between two facts in the first case of Table 6. Similarly, HypeNet_Phrase and HypeNet_ExtraPhrase focus on the semantic vectors of pairs of phrases and their dependency paths in the background corpus. They extract each fact independently of other facts and hence do not know whether any other relations have already been extracted about the two phrases. In other words, for those comparison methods, an important source of information is neglected and a global optimization over all facts involved in a sentence is absent.

On the contrary, Logician performs global optimization over the facts involved in each sentence through the sequence-to-sequence learning paradigm with the help of the coverage mechanism, in which facts not only compete with each other to attract the attention of words, but also cooperate to share words. Valuable information is shared across these multiple tasks, which makes Logician consistently superior to the other algorithms on these tasks.

Furthermore, SRL_SAOKE and the HypeNet methods suffer from the OOV problem, e.g., unfamiliar words/phrases such as the person name and school name in the last case of Table 6. In this situation they may fail to produce a reasonable result. Logician is able to cope with unfamiliar words/phrases by exploiting context information with its deep RNN network and the copy mechanism.
We performed a preliminary analysis of the results produced by the Logician model. The most notable problem is that it is unable to recall some facts for long or complex sentences. The last case in Table 6 exhibits this situation, where the fact (蔡竞, ISA, 经济学博士) ((Cai Jing, ISA, Ph.D. in economics) in English) is not recalled. This phenomenon indicates that the coverage mechanism may lose effectiveness in such situations. The second class of error is incomplete extraction, as exhibited in the third case in Table 6. Due to incomplete extraction, the leftover parts may interfere with the generation of other facts and result in nonsense outputs, which constitute the third class of error. We believe it would be helpful to introduce extra rewards into the learning procedure of Logician to overcome these problems. For example, the reward could be the amount of remaining information left after fact extraction, or the completeness of extracted facts. Developing such rewards, and reinforcement learning algorithms that use them to refine Logician, belongs to our future work.

Table 6: Examples of extraction from one sentence for each task.
1. Relation
Sentence: 学道访仙，晚年修道于石笋山 (Learn the Tao, visit the immortals, and practice Taoism in the Stalagmite Hill in old age)
SRLIE: (X, 学, 道) (X, 访, 仙), i.e. (X, learn, the Tao) (X, visit, the immortal). ✓
ZORE: (学道, 访, 仙) (学道, 修道于, 石笋山), i.e. (Learn the Tao, visit, the immortal) (Learn the Tao, practice Taoism in, the Stalagmite Hill). ✗✗
SRL_SAOKE: (X, 学, 道访) (学, 修道于, 石笋山), i.e. (X, learn, Tao visit) (Learn, practice Taoism in, the Stalagmite Hill). ✗
Logician: (X, 学, 道) (X, 访, 仙) (X, 修道于, 石笋山), i.e. (X, learn, the Tao) (X, visit, the immortal) (X, practice Taoism in, the Stalagmite Hill). ✓✓
2. Attribute
Sentence: 全村辖区总面积约10平方公里，其中耕地132公顷。 (The whole village covers an area of about 10 square kilometers, of which 132 hectares are arable land.)
SRL_SAOKE: (全村辖区, 总面积, 约10平方公里), i.e. (The village, whole area, about 10 square kilometers). ✓
Logician: (全村辖区, 总面积, 约10平方公里) (全村, 耕地, 132公顷), i.e. (The village, whole area, about 10 square kilometers) (The village, arable land, 132 hectares). ✓✓
3. Description
Sentence: 硫酸钙较为常用，其性质稳定，无嗅无味，微溶于水 (Calcium sulfate is commonly used; its properties are stable, odorless and tasteless, slightly soluble in water)
SDDE: no recall.
Logician: (硫酸钙, DESC, [性质稳定|无嗅无味]) (硫酸钙, DESC, 稳定), i.e. (Calcium sulfate, DESC, [properties stable | odorless and tasteless]) (Calcium sulfate, DESC, stable). ✓✗
4. Hyponymy
Sentence: 蔡竞，男，汉族，四川射洪人，西南财经大学经济学院毕业，经济学博士。 (Cai Jing, male, Han Chinese, a Sichuan Shehong native, graduated from the economics school of Southwestern University of Finance and Economics (SUFE), and a Ph.D. in economics.)
HypeNet_Phrase: (经济学博士, ISA, 蔡), i.e. (Ph.D. in economics, ISA, Cai). ✗
HypeNet_ExtraPhrase: (西南财经大学经济学院, ISA, 经济学博士) (西南财经大学经济学院, ISA, 四川) (经济学博士, ISA, 西南财经大学经济学院), i.e. (the economics school of SUFE, ISA, Ph.D. in economics) (the economics school of SUFE, ISA, Sichuan) (Ph.D. in economics, ISA, the economics school of SUFE). ✗✗✗
Logician: (蔡竞, ISA, [男|汉族|四川射洪人]), i.e. (Cai Jing, ISA, [male | Han Chinese | Sichuan Shehong native]). ✓

6 Related Work

Tuple is the most common knowledge expression format for OIE systems to express n-ary relations between a subject and objects. Beyond such information, ClausIE [12] extracts extra information in the tuples (a complement, and one or more adverbials), and OLLIE [38] extracts additional context information. SAOKE is able to express n-ary relations, and can be easily extended to support the knowledge extracted by ClausIE, but it needs to be redesigned to support context information, which belongs to our future work.

However, there is a fundamental difference between SAOKE and the tuples of traditional OIE systems. In traditional OIE systems, the knowledge expression is generally not directly related to the extraction algorithm; it is a tool to reorganize the extracted knowledge into a form that is easy to read, store or compute on. SAOKE, in contrast, is proposed to act as the direct learning target of the end-to-end Logician model. In such an end-to-end framework, the knowledge representation is the core of the system, which decides what information will be extracted and how complex the learning algorithm will be. To our knowledge, SAOKE is the first attempt to design a knowledge expression friendly to end-to-end learning algorithms for OIE tasks. Efforts are still needed to make SAOKE more powerful, in order to express more complex knowledge such as events.

Relation extraction is the task of identifying semantic connections between entities. Most existing relation extraction algorithms fall into two classes: closed-domain and open-domain. Closed-domain algorithms are learned to identify a fixed and finite set of relations, using supervised methods [23, 32, 54, 56] or weakly supervised methods [28, 31], while open-domain algorithms, represented by the aforementioned OIE systems, discover open-domain relations without a predefined schema. Beyond these two classes, methods like universal schema [37] are able to learn both from data with a fixed and finite set of relations, such as relations in Freebase, and from data with open-domain surface relations produced by heuristic patterns or OIE systems.

Logician can be used as an OIE system to extract open-domain relations between entities, and can act as a sub-system for knowledge base construction/completion with the help of schema mapping [39]. Compared with existing OIE systems, which are pattern-based or self-supervised by labeling samples using patterns [30], to our knowledge Logician is the first model trained in a supervised end-to-end manner for the OIE task, and it has exhibited powerful ability in our experiments. There are some neural end-to-end systems [28, 32, 56] proposed for relation extraction, but they all aim to solve the closed-domain problem.

Logician, however, is not limited to the relation extraction task. First, Logician extracts information beyond relations. Second, Logician focuses on examining how natural language expresses facts [14] and on producing helpful intermediate structures for high-level tasks.
Efforts have been made to map natural language sentences into logical forms. Approaches such as [13, 24, 53, 55] learn the mapping under the supervision of manually labeled logical forms, while others [4, 26] are indirectly supervised by distant information, system rewards, etc. However, all previous works rely on a pre-defined, domain-specific logical system, which limits their ability to learn facts outside that logical system. Logician can be viewed as a system that maps language to a natural logic, in which the majority of information is expressed by natural phrases. Unlike the systems mentioned above, which aim at execution using the logical form, Logician focuses on understanding how facts and logic are expressed by natural language. Further mappings to a domain-specific logical system, or even to an executor, can be built on the basis of Logician's output, and we believe that, with the help of Logician, such work would be easier and the overall performance of the system could be improved.
The problem of generating sentences from a set of facts has attracted a lot of attention [1, 7, 45, 48]. These models focus on facts with a predefined schema from a specific problem domain, such as people's biographies and basketball game records, and cannot work on open domains. The SAOKE data set provides an opportunity to extend the ability of these models to open domains.
As mentioned in the sections above, the SAOKE data set provides examples of a dual mapping between facts and sentences. Duality has been verified to be useful for promoting the performance of agents in many NLP tasks, such as back-and-forth translation [51] and question answering [42]. It is a promising direction to exploit the duality between knowledge and language to improve the performance of Logician.
7 Conclusion

In this paper, we consider the open information extraction (OIE) problem for a variety of types of facts in a unified view. Our solution consists of three components: the SAOKE format, the SAOKE data set, and Logician. The SAOKE format is designed to express different types of facts in a unified manner, and we publicly release the largest manually labeled data set for OIE tasks in SAOKE form. Using the labeled SAOKE data set, we train an end-to-end neural sequence-to-sequence model, called Logician, to transform natural language sentences into facts. The experiments reveal the superiority of Logician over state-of-the-art algorithms in various open-domain information extraction tasks.

Regarding future work, there are at least three promising directions. Firstly, one can investigate knowledge expression methods to extend SAOKE to express more complex knowledge, for tasks such as event extraction. Secondly, one can develop novel learning strategies to improve the performance of Logician and adapt the algorithm to extended future versions of SAOKE. Thirdly, one can extend the SAOKE format and the Logician algorithm to other languages.
References

[1] Shubham Agarwal and Marc Dymetman. A Surprisingly Effective Out-of-the-Box Char2char Model on the E2E NLG Challenge Dataset. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 158-163, 2017.
[2] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural Machine Translation by Jointly Learning to Align and Translate. In Proceedings of ICLR, 2015.
[3] Michele Banko, Michael J. Cafarella, and Stephen Soderland. Open information extraction for the web. In IJCAI, pages 2670-2676, 2007.
[4] Qingqing Cai and Alexander Yates. Large-scale Semantic Parsing via Schema Matching and Lexicon Extension. In Proceedings of the 51st Annual Meeting of ACL, pages 423-433, 2013.
[5] Kaushik Chakrabarti, Surajit Chaudhuri, Tao Cheng, and Dong Xin. EntityTagger: Automatically tagging entities with descriptive phrases. In Proceedings of the 20th International Conference Companion on WWW, pages 19-20, 2011.
[6] Wanxiang Che, Zhenghua Li, and Ting Liu. LTP: A Chinese Language Technology Platform. In Proceedings of COLING, pages 13-16, 2010.
[7] Andrew Chisholm, Will Radford, and Ben Hachey. Learning to Generate One-sentence Biographies from Wikidata. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, volume 1, pages 633-642, 2017.
[8] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on EMNLP, pages 1724-1734, 2014.
[9] Janara Christensen, Mausam, Stephen Soderland, and Oren Etzioni. An analysis of open information extraction based on semantic role labeling. In Proceedings of the Sixth International Conference on Knowledge Capture, pages 113-120, 2011.
[10] Janara Christensen, Mausam, Stephen Soderland, and Oren Etzioni. Towards Coherent Multi-Document Summarization. In Proceedings of the 2013 Conference of NAACL: HLT, pages 1163-1173, 2013.
[11] Janara Christensen, Stephen Soderland, and Gagan Bansal. Hierarchical Summarization: Scaling Up Multi-Document Summarization. In Proceedings of the 52nd Annual Meeting of ACL, pages 902-912, 2014.
[12] Luciano Del Corro and Rainer Gemulla. ClausIE: Clause-Based Open Information Extraction. In Proceedings of the 22nd International Conference on WWW, pages 355-366, 2013.
[13] Li Dong and Mirella Lapata. Language to Logical Form with Neural Attention. In Proceedings of the Annual Meeting of ACL, pages 33-43, 2016.
[14] Oren Etzioni, Anthony Fader, Janara Christensen, Stephen Soderland, and Mausam. Open information extraction: The second generation. In Proceedings of IJCAI, pages 3-10, 2011.
[15] Anthony Fader, Stephen Soderland, and Oren Etzioni. Identifying Relations for Open Information Extraction. In Proceedings of the Conference on EMNLP, pages 1535-1545, 2011.
[16] Anthony Fader, Luke S. Zettlemoyer, and Oren Etzioni. Open Question Answering Over Curated and Extracted Knowledge Bases. In Proceedings of the 20th ACM SIGKDD, pages 1156-1165, 2014.
[17] Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O. K. Li. Incorporating Copying Mechanism in Sequence-to-Sequence Learning. In Proceedings of the 54th Annual Meeting of ACL, pages 1631-1640, 2016.
[18] Rahul Gupta and Alon Halevy. Biperpedia: An Ontology for Search Applications. In Proceedings of the VLDB Endowment, pages 505-516, 2014.
[19] Luheng He, Kenton Lee, Mike Lewis, and Luke Zettlemoyer. Deep Semantic Role Labeling: What Works and What's Next. In Proceedings of the 55th Annual Meeting of the ACL, pages 473-483, 2017.
[20] Luheng He, Mike Lewis, and Luke Zettlemoyer. Question-Answer Driven Semantic Role Labeling: Using Natural Language to Annotate Natural Language. In Proceedings of the 2015 Conference on EMNLP, pages 643-653, 2015.
[21] Marti A. Hearst. Automatic Acquisition of Hyponyms from Large Text Corpora. In Proceedings of the 14th Conference on Computational Linguistics, pages 23-28, 1992.
[22] Geoffrey Hinton, Nitish Srivastava, and Kevin Swersky. Overview of mini-batch gradient descent. Technical report, 2012.
[23] Nanda Kambhatla. Combining lexical, syntactic, and semantic features with maximum entropy models for extracting relations. In Proceedings of the ACL: Interactive Poster and Demonstration Sessions, 2004.
[24] Rohit J. Kate, Yuk Wah Wong, and Raymond J. Mooney. Learning to Transform Natural to Formal Languages. In Proceedings of the 20th AAAI, pages 1062-1068, 2005.
[25] Tushar Khot, Ashish Sabharwal, and Peter Clark. Answering Complex Questions Using Open Information Extraction. In Proceedings of the 55th Annual Meeting of the ACL, pages 311-316, 2017.
[26] Tom Kwiatkowski, Eunsol Choi, Yoav Artzi, and Luke Zettlemoyer. Scaling Semantic Parsers with On-the-fly Ontology Matching. In Proceedings of the 2013 Conference on EMNLP, pages 1545-1556, 2013.
[27] Jinyang Li, Chengyu Wang, Xiaofeng He, Rong Zhang, and Ming Gao. User Generated Content Oriented Chinese Taxonomy Construction. In Lecture Notes in Computer Science, volume 9313, pages 623-634, 2015.
[28] Yankai Lin, Shiqi Shen, Zhiyuan Liu, Huanbo Luan, and Maosong Sun. Neural Relation Extraction with Selective Attention over Instances. In Proceedings of the 54th Annual Meeting of ACL, pages 2124-2133, 2016.
[29] Christopher D. Manning, John Bauer, Jenny Finkel, Steven J. Bethard, Mihai Surdeanu, and David McClosky. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of the 52nd Annual Meeting of ACL: System Demonstrations, pages 55-60, 2014.
[30] Mausam. Open Information Extraction Systems and Downstream Applications. In Proceedings of the 25th IJCAI, pages 4074-4077, 2016.
[31] Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th IJCNLP, volume 2, page 1003, 2009.
[32] Makoto Miwa and Mohit Bansal. End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures. In Proceedings of the 54th Annual Meeting of ACL, pages 1105-1116, 2016.
[33] Harinder Pal and Mausam. Demonyms and Compound Relational Nouns in Nominal Open IE. In Proceedings of the 5th Workshop on AKBC, pages 35-39, 2016.
[34] Likun Qiu and Yue Zhang. ZORE: A Syntax-based System for Chinese Open Relation Extraction. In Proceedings of the 2014 Conference on EMNLP, pages 1870-1880, 2014.
[35] John W. Ratcliff and David E. Metzener. Pattern Matching: The Gestalt Approach. Dr. Dobb's Journal, 13(7), 1988.
[36] Sebastian Riedel, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. Relation Extraction with Matrix Factorization and Universal Schemas. In Proceedings of the 2013 Conference of NAACL: HLT, pages 74-84, 2013.
[37] Sebastian Riedel, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. Relation Extraction with Matrix Factorization and Universal Schemas. In Proceedings of the 2013 Conference of NAACL: HLT, pages 74-84, 2013.
[38] Michael Schmitz, Robert Bart, Stephen Soderland, and Oren Etzioni. Open language learning for information extraction. In Proceedings of the 2012 Joint Conference on EMNLP and CoNLL, pages 523-534, 2012.
[39] Stephen Soderland, Brendan Roof, Bo Qin, and Shi Xu. Adapting Open Information Extraction to Domain-Specific Relations. AI Magazine, 31(3):93-102, 2010.
[40] Gabriel Stanovsky and Ido Dagan. Creating a Large Benchmark for Open Information Extraction. In Proceedings of the 2016 Conference on EMNLP, pages 2300-2305, 2016.
[41] Gabriel Stanovsky, Ido Dagan, and Mausam. Open IE as an Intermediate Structure for Semantic Tasks. In Proceedings of the 53rd Annual Meeting of ACL and the 7th IJCNLP, pages 303-308, 2015.
[42] Duyu Tang, Nan Duan, Tao Qin, Zhao Yan, and Ming Zhou. Question answering and question generation as dual tasks. arXiv preprint arXiv:1706.02027, 2017.
[43] Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, and Hang Li. Modeling Coverage for Neural Machine Translation. In Proceedings of the Annual Meeting of ACL, pages 76-85, 2016.
[44] Vered Shwartz, Yoav Goldberg, and Ido Dagan. Improving hypernymy detection with an integrated path-based and distributional method. In Proceedings of the 54th Annual Meeting of ACL, pages 2389-2398, 2016.
[45] Pavlos Vougiouklis, Hady Elsahar, Lucie-Aimée Kaffee, Christoph Gravier, Frederique Laforest, Jonathon Hare, and Elena Simperl. Neural Wikipedian: Generating Textual Summaries from Knowledge Base Triples. Journal of Web Semantics, 2017.
[46] Chengyu Wang and Xiaofeng He. A Short Survey on Taxonomy Learning from Text Corpora: Issues, Resources and Recent Advances. In Proceedings of the Conference on EMNLP, 2017.
[47] Wikipedia. Assignment problem. Wikipedia, The Free Encyclopedia, 2017.
[48] Sam Wiseman, Stuart M. Shieber, and Alexander M. Rush. Challenges in Data-to-Document Generation. In Proceedings of the 2017 Conference on EMNLP, pages 2243-2253, 2017.
[49] Fei Wu and Daniel S. Weld. Open Information Extraction using Wikipedia. In Proceedings of the 48th Annual Meeting of ACL, pages 118-127, 2010.
[50] Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Q. Zhu. Probase: A probabilistic taxonomy for text understanding. In Proceedings of the 2012 ACM SIGMOD, pages 481-492, 2012.
[51] Yingce Xia, Di He, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, and Wei-Ying Ma. Dual Learning for Machine Translation. In Advances in Neural Information Processing Systems 29, pages 1-9, 2016.
[52] Mohamed Yahya, Steven Euijong Whang, Rahul Gupta, and Alon Halevy. ReNoun: Fact Extraction for Nominal Attributes. In Proceedings of the 2014 Conference on EMNLP, pages 325-335, 2014.
[53] Pengcheng Yin, Zhengdong Lu, Hang Li, and Ben Kao. Neural Enquirer: Learning to Query Tables. In Proceedings of the Annual Meeting of ACL, pages 29-35, 2016.
[54] Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella. Kernel Methods for Relation Extraction. Journal of Machine Learning Research, 3:1083-1106, 2003.
[55] Luke S. Zettlemoyer and Michael Collins. Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars. In Proceedings of the 21st Conference on UAI, pages 658-666, 2005.
[56] Suncong Zheng, Feng Wang, Hongyun Bao, Yuexing Hao, Peng Zhou, and Bo Xu. Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme. In Proceedings of the 55th Annual Meeting of the ACL, 2017.