How to Train Your Agent to Read and Write
Li Liu,* Mengge He,* Guanghui Xu, Mingkui Tan,† Qi Wu
School of Software Engineering, South China University of Technology; Pazhou Laboratory; University of Adelaide; Key Laboratory of Big Data and Intelligent Robot, Ministry of Education
{seliushiya, semenggehe, sexuguanghui}@mail.scut.edu.cn, [email protected], [email protected]
Abstract
Reading and writing research papers is one of the most privileged abilities that a qualified researcher should master. However, it is difficult for new researchers (e.g., students) to fully grasp this ability. It would be fascinating if we could train an intelligent agent to help people read and summarize papers, and perhaps even discover and exploit potential knowledge clues to write novel papers. Although existing works focus on summarizing (i.e., reading) the knowledge in a given text or generating (i.e., writing) a text based on given knowledge, the ability to simultaneously read and write is still under development. Typically, this requires an agent to fully understand the knowledge in the given text materials and generate correct and fluent novel paragraphs, which is very challenging in practice. In this paper, we propose a Deep ReAder-Writer (DRAW) network, which consists of a Reader that extracts knowledge graphs (KGs) from input paragraphs and discovers potential knowledge, a graph-to-text Writer that generates a novel paragraph, and a Reviewer that reviews the generated paragraph from three different aspects. Extensive experiments show that our DRAW network outperforms the considered baselines and several state-of-the-art methods on the AGENDA and M-AGENDA datasets. Our code and supplementary materials are released at https://github.com/menggehe/DRAW.
Introduction

Currently, hundreds of papers are published online every day, even on small topics. However, a study (Wang et al. 2019) shows that US scientists can only read 264 papers per year on average. Thus, researchers are exhausted by following the sharply increasing number of papers, let alone understanding the research and coming up with new ideas to write novel papers (Gopen and Ja 1990; Buenz 2019). In practice, writing novel papers requires not only the abilities of reading and reasoning but also the ability of creative thinking, which is nontrivial for most fresh researchers (Xiao et al. 2020). It would be fantastic if an agent could help people, especially new researchers, to read and write. However, building such an agent encounters several challenges.

*Authors contributed equally.

First, to understand multiple related works, the agent needs to capture the complex logic in them, which is nontrivial. Several knowledge extraction methods (Min et al. 2006; Gerber and Chai 2010; Yoshikawa et al. 2010) achieve this by identifying entities in the texts, extracting the relationships between these entities, and representing them as a knowledge graph (KG). However, they have trouble discovering potential connections among these entities, which hampers a comprehensive understanding of related works. Second, after generating a KG, the agent is required to decode a fluent novel paragraph from it. In practice, however, how to accurately evaluate the quality of the generated texts is still an open problem. Existing methods (Koncel-Kedziorski et al. 2019; Wang et al. 2019) adopt the teacher-forcing scheme, which aims to match the tokens in the generated texts to the tokens in the target texts. However, these methods only focus on token-level matching while ignoring sentence-level and graph-level evaluation of the generated texts.

Figure 1: An intuitive understanding of our DRAW network. First, the DRAW network reads multiple related works and discovers potential knowledge among them. Then, it writes a new paragraph based on the knowledge graph. Last, it reviews the output and uses feedback rewards to improve the quality of writing.

In this paper, we propose a method named Deep ReAder-Writer (DRAW). Our DRAW network is able to read multiple texts, discover potential knowledge, and then write a novel paragraph. As shown in Figure 1, the DRAW network consists of three modules: a Reader, a Writer and a Reviewer. Specifically, the Reader first extracts KGs from the research texts and discovers potential knowledge to enrich the KGs; it considers the multi-hop neighborhood to predict new links among conceptual nodes. Then, the Writer writes a novel paragraph to describe the main idea of the enriched KGs using a graph attention network, which aggregates global and local graph information. Inspired by the review process of research papers, we further propose a Reviewer module to evaluate the quality of the generated paragraphs and return rewards as feedback signals to refine the Writer. To be specific, given a generated paragraph, the Reviewer outputs three feedback signals: (1) a quality reward, which reflects the metric scores of the generated paragraph; (2) an adversarial reward, which denotes the probability of the generated paragraph passing the Turing test; and (3) an alignment reward, which represents the matching score between the generated paragraph and the enriched KGs. In this way, the Writer is able to write better paragraphs that clearly represent the key idea of the enriched KGs.

In summary, our main contributions are threefold:
• We propose a Deep ReAder-Writer (DRAW) network that reads multiple research texts and then discovers potential knowledge to write a novel paragraph covering the key idea of the source inputs.
• We propose a feedback mechanism to review whether the generated paragraph is consistent with the enriched KG and whether the generated paragraph reads as human-written, thereby greatly improving the quality of paragraph generation.
• Extensive experiments show that our Writer-Reviewer leads to significant improvements on the KGs-to-text generation task and outperforms the state-of-the-art methods.

Related Works
Automatic writing.
PaperRobot (Wang et al. 2019) performs as an automatic research assistant that incrementally writes papers on chemical-related research datasets. It enriches KGs by predicting links in the KGs of input papers. Given a title, it then selects several entities related to the title from the enriched KGs to generate texts. However, PaperRobot neglects the multi-hop neighborhood when predicting links, which is very important for capturing potential relationships. In addition, its generated texts do not closely align with the KGs. To address this, we use a graph attention network that considers the multi-hop neighborhood, capturing the complex, hidden information that is inherently implicit in the neighborhood. Moreover, we design a Reviewer to measure the quality of the generated text along different dimensions so that it effectively aligns with the KGs. In particular, our DRAW network differs from multi-document summarization (Ling and Hui 2013), which compresses lengthy document content into several relatively short paragraphs: we not only extract important knowledge but also discover potential knowledge from multiple paragraphs by predicting links and writing a novel paragraph.
Link prediction.
Some translation-based approaches (Bordes et al. 2013; Zhen et al. 2014; Lin et al. 2015) are widely used in link prediction but have poor representation ability. Recently, CNN-based models (Dettmers et al. 2018; Nguyen et al. 2018) have been proposed for relation prediction. These methods focus only on an entity and its neighborhood without considering the relationships among these nodes. Other methods (Kipf and Welling 2017; Schlichtkrull et al. 2018) take the relationships among entities and their 1-hop neighbors into consideration. However, they still omit information from the multi-hop neighborhood. Instead, we propose a Reader module to capture the semantic information of the multi-hop neighborhood in the KG.
Graph-to-Text task.
Graph-to-Text generation is an active research area. Some works generate texts based on structured knowledge (Trisedya et al. 2018; Xu et al. 2018a; Feng et al. 2018), while several neural graph-to-text models use different encoders based on GNN (Ribeiro, Gardent, and Gurevych 2019; Zhijiang et al. 2019; Huang et al. 2020) and Transformer (Vaswani et al. 2017) architectures to learn graph representations. Koncel-Kedziorski et al. propose a novel graph transformer encoder that leverages the topological structure of KGs to generate texts. However, it ignores global graph information, which is important for text generation. To solve this, Ribeiro et al. introduce a novel architecture that aggregates both global and local graph information to generate texts. However, such an encoder-decoder framework still exhibits problems such as word repetition and lack of diversity. To address these issues, we propose a Reviewer module that reviews the generated paragraphs and refines their quality using feedback rewards. Our Reviewer consists of three modules that evaluate whether the generated paragraphs are realistic and aligned with the given KGs, in order to improve the text generation ability.
In this paper, we focus on generating novel paragraphs by reading multiple AI-related paragraphs. To this end, we propose a Deep ReAder-Writer (DRAW) network that consists of three modules, namely Reader, Writer, and Reviewer, as shown in Figure 2. To understand and sort out the textual logic of the given paragraphs, the Reader first 'reads' them and extracts knowledge graphs (KGs). Then, considering the multi-hop neighborhood, the Reader predicts new links between conceptual nodes, namely potential knowledge, to enrich the KGs. The Writer adopts a graph encoder to encode the rich semantic information in the KGs and delivers it to a text decoder to generate a novel paragraph. Inspired by adversarial learning (Cao et al. 2019; Wang et al. 2018; Cao et al. 2020; Chen et al. 2020), we also devise a Reviewer to evaluate the quality of the generated paragraph, which serves as a feedback signal to refine the Writer. We detail these modules in the following sections.
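Before the details, the overall control flow can be sketched as plain code. Every name below is our own illustrative stand-in, not the authors' implementation: in the real system the four callables are neural modules, and the loss combines the Writer's supervised loss with the Reviewer's reinforcement signal.

```python
# A minimal, runnable sketch of the DRAW control flow:
# read -> enrich -> write -> review -> combined loss.
# All names are illustrative stand-ins for neural modules.

def draw_step(paragraphs, reader_extract, reader_enrich, writer, reviewer,
              lambda_rl=1.0):
    """One conceptual training step of a DRAW-style pipeline."""
    kg = reader_extract(paragraphs)       # initial KG from an IE system
    kg = reader_enrich(kg)                # link prediction adds potential knowledge
    text, log_prob, sl_loss = writer(kg)  # graph encoder + text decoder
    reward = reviewer(text, kg)           # quality/adversarial/alignment rewards
    rl_loss = -reward * log_prob          # single-sample REINFORCE estimate
    return sl_loss + lambda_rl * rl_loss  # overall loss: L_SL + lambda_RL * L_RL

# Toy usage with stub modules:
loss = draw_step(
    ["related work A ...", "related work B ..."],
    reader_extract=lambda ps: [("A", "USED-FOR", "B")],
    reader_enrich=lambda kg: kg + [("A", "PART-OF", "C")],
    writer=lambda kg: ("a new paragraph ...", -2.0, 1.5),
    reviewer=lambda text, kg: 0.8,
)
```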
Figure 2: An overview of our Deep ReAder-Writer (DRAW) network. The DRAW network consists of three modules, namely Reader, Writer and Reviewer. Given multiple related works, the Reader first extracts knowledge to construct initial knowledge graphs (KGs) and performs link prediction to enrich the KGs. Based on the enriched KGs, the Writer captures global and local topology information using a graph encoder and generates a novel paragraph with a text decoder. In particular, the Reviewer employs three feedback modules to measure the quality of the generated paragraph.
Reader

To extract the textual logic from the given related paragraphs, we use the standard SciIE system (Luan et al. 2018), a science-domain information extraction system, to construct knowledge graphs. Specifically, the output of the SciIE system is a list of triplets, where each triplet consists of two entities and the corresponding relation. The knowledge graph is denoted as $\mathcal{G}_I = \{\mathcal{V}, \mathcal{R}\}$, where $\mathcal{V} = \{\bar{v}_i\}_{i=1}^N$ is the node set, $\mathcal{R} = \{\bar{r}_{ij}\}_{i,j=1}^N$ is the edge set, and $N$ is the number of nodes. $\mathcal{V}$ and $\mathcal{R}$ represent the extracted entities and relations, respectively. However, the initial knowledge graph $\mathcal{G}_I$ does not exploit potential knowledge. To address this, we perform link prediction to predict new links between entities based on the initial KGs.

Link prediction.
Given the KGs $\mathcal{G}_I$, we obtain the entity embedding $\bar{v}_i \in \mathbb{R}^d$ and the relation embedding $\bar{r}_{ij} \in \mathbb{R}^d$ with two separate embedding layers, where $d$ is the feature dimension. Formally, given entity embeddings $\bar{v}_i$, $\bar{v}_j$ and the relation embedding $\bar{r}_{ij}$ between them, the triplet is represented by $(\bar{v}_i, \bar{r}_{ij}, \bar{v}_j)$. To aggregate more information, we introduce auxiliary edges between an entity and its n-hop neighborhood. For an entity and an n-hop neighbor, we sum the embeddings of all relations on the path between them as the auxiliary relation embedding. We apply a linear transformation to update the entity representation $\widetilde{v}_i \in \mathbb{R}^d$ by
$$\widetilde{v}_i = W [\bar{v}_i, \bar{r}_{ij}, \bar{v}_j], \quad (1)$$
where $W$ is a trainable parameter and $[\cdot,\cdot]$ denotes the concatenation operation. A particular entity $v_i$ may be involved in multiple triplets, and its neighborhood can be denoted as $\{\widetilde{v}_i^k\}$, where $\widetilde{v}_i^k$ denotes the $k$-th neighbor of the $i$-th entity. To learn the importance of each triplet for the entity, we apply self-attention to calculate attention weights as follows:
$$\widehat{a}_k = \frac{\exp(\widetilde{v}_i^k)}{\sum_k \exp(\widetilde{v}_i^k)}. \quad (2)$$
With the help of the attention weights, we update the feature $v_i \in \mathbb{R}^d$ by fusing the information from the neighborhood, i.e.,
$$v_i = W \widetilde{v}_i + \sigma\Big(\sum_k \widehat{a}_k \widetilde{v}_i^k\Big), \quad (3)$$
where $W$ is a trainable parameter and $\sigma$ is the Sigmoid function. Based on the original relation feature $\bar{r}_{ij}$, we apply a linear transformation to obtain the updated relation embedding $r_{ij} \in \mathbb{R}^d$. After updating the node and relation embeddings, we need to determine whether there is a relationship between two given entities. An intuitive way is to calculate a probability for each triplet.
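The neighborhood aggregation of Eqns (1)-(3) can be sketched in NumPy as below. The split into two matrices W1 and W2 and the use of the mean fused vector in the first term of Eqn (3) are our interpretation (the extracted text writes a single W for both), so treat this as one plausible reading rather than the authors' exact computation.

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_entity(v_i, triplets, W1, W2):
    """Sketch of Eqns (1)-(3): update an entity from its neighborhood.

    v_i:      (d,) entity embedding.
    triplets: list of (r_ij, v_j) embedding pairs, each of shape (d,).
    W1:       (d, 3d) projection for the concatenated triplet (Eqn. 1).
    W2:       (d, d) projection used in the update (Eqn. 3).
    """
    # Eqn (1): one fused vector per triplet the entity participates in.
    fused = np.stack([W1 @ np.concatenate([v_i, r, v_j]) for r, v_j in triplets])
    # Eqn (2): self-attention weights over the k triplets.
    attn = softmax(fused, axis=0)
    # Eqn (3): linear term plus a sigmoid over the attention-weighted sum.
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    return W2 @ fused.mean(axis=0) + sigmoid((attn * fused).sum(axis=0))

rng = np.random.default_rng(0)
d = 4
v_i = rng.normal(size=d)
triplets = [(rng.normal(size=d), rng.normal(size=d)) for _ in range(3)]
out = aggregate_entity(v_i, triplets,
                       rng.normal(size=(d, 3 * d)), rng.normal(size=(d, d)))
```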
Following ConvKB (Nguyen et al. 2018), we train a scoring function to perform relation prediction as follows:
$$s_m = \mathrm{FC}([v_i, r_m, v_j] * \Omega), \quad (4)$$
where $*$ denotes the convolution operation, $\Omega$ denotes a set of convolution filters, and $\mathrm{FC}(\cdot)$ is a linear transformation layer. Following (Nathani et al. 2019), we assign a score $s_m$ to the triplet $(v_i, r_m, v_j)$ in Eqn. (4), which indicates the probability that the triplet holds. For each entity, we first traverse all entities and relations to construct candidate triplets, and then select the triplet with the highest score as the new link. In this way, the Reader can capture potential relations between different nodes and derive a new graph $\mathcal{G}_P$. Finally, we denote the enriched knowledge graph as $\mathcal{G} = \mathcal{G}_I \cup \mathcal{G}_P$.

Writer

Based on the enriched graph $\mathcal{G}$ with $N$ entities, we propose a Writer to generate novel paragraphs, which consists of a graph encoder and a text decoder. Specifically, the Writer first uses the graph encoder to extract the knowledge representations and then writes a new paragraph with the text decoder (Vaswani et al. 2017).
Graph encoder.
A comprehensive understanding of the KG $\mathcal{G}$ is the first step toward generating the desired paragraph. However, it is difficult to directly capture the rich semantic information in the knowledge graph $\mathcal{G}$. To address this, we extract the knowledge representations with two sub-encoders, i.e., a global-graph encoder and a local-graph encoder. Following CGE-LW (Ribeiro et al. 2020), we integrate global context information and local topology information to generate new paragraphs.

To aggregate global context information, we first concatenate all of the node features $v$ and feed them into the global-graph encoder $\Psi$ as follows:
$$[\widehat{v}_1, \ldots, \widehat{v}_N] = \Psi([v_1, \ldots, v_N]), \quad (5)$$
where $\Psi$ is a standard Transformer encoder (Vaswani et al. 2017), which contains multi-head self-attention layers and feed-forward networks. In the global-graph encoder, we treat the knowledge graph $\mathcal{G}$ as a fully connected graph without labeled edges. Based on the self-attention mechanism, the global-graph encoder is suitable for discovering global correlations between nodes: each node $\widehat{v}_i \in \mathbb{R}^d$ is able to capture the information of all nodes.

To better represent the interactions between nodes, we need to build local relations between each node and its neighborhood. However, the global-graph encoder does not explicitly consider such graph topology information. To address this, we use the local-graph encoder to model the local relations. For each node, we first calculate attention weights for its adjacent nodes, since different types of relationships have considerable discrepancies in impact when fusing information. Based on the attention weights, we obtain the hidden node features $\widehat{h}$ by
$$\widehat{h}_i = \sum_{j \in \mathcal{N}_i} a_{ij} r_{ij} \widehat{v}_j, \quad \text{where} \quad \widehat{r}_{ij} = \mathrm{ReLU}(r_{ij} W [\widehat{v}_i, \widehat{v}_j]), \quad a_{ij} = \frac{\exp(\widehat{r}_{ij})}{\sum_{j \in \mathcal{N}_i} \exp(\widehat{r}_{ij})}. \quad (6)$$
Here, $W$ denotes the model parameters, $\mathcal{N}_i$ denotes the neighbourhood of the $i$-th node, and $\widehat{h}_i$ denotes the hidden features encoding the local interaction between the $i$-th node and its neighborhood. We also perform the multi-head attention operation to learn structural information from different perspectives. We employ a GRU (Cho et al. 2014) to merge local information between different layers as follows:
$$h_i = \mathrm{GRU}(\widehat{v}_i, \widehat{h}_i), \quad (7)$$
where $h_i \in \mathbb{R}^d$ is the final node representation.

Text decoder.
Based on the node representations $H = \{h_i\}_{i=1}^N$, we use the standard Transformer decoder (Vaswani et al. 2017) to generate a novel paragraph $\tau$ with $T$ words in an auto-regressive manner. At each step $t$, the text decoder consumes the previously generated tokens as additional input and outputs a probability $p_t$ over the candidate vocabulary. We train the Writer with supervised learning as follows:
$$\mathcal{L}_{SL} = -\sum_{t=1}^{T} y_t \log(p_t), \quad (8)$$
where $y_t$ is the ground-truth one-hot vector at step $t$; words are generated by selecting the element with the highest score at each step. In practice, the text decoder can also be replaced by other sequence generation models, such as an LSTM (Hochreiter and Schmidhuber 1997).

Reviewer

The encoder-decoder framework has made great progress in many sequence generation tasks, including text summarization and image captioning. Nevertheless, it suffers from some problems. For each training sample, such a framework tends to use only one word as the ground truth at each generation step, even if other candidate words are also reasonable. This leads to a lack of diversity in the generated text. Moreover, language is so complex that we need to evaluate the quality of a generated paragraph along different dimensions, such as grammatical correctness, topic relevance, and language and logic coherence. Inspired by the review process of a research paper, we propose a
Reviewer module to review the generated paragraph along different dimensions. The output of the Reviewer can be used as an auxiliary training signal to optimize the Writer, similar to researchers further polishing a paper based on reviews.

Specifically, we design three feedback rewards in the Reviewer. First, we use the metric scores of the generated paragraph as a reward so that it meets the rules of these metrics. Second, we train a Turing-test discriminator to determine whether a paragraph is generated by an agent or written by a human, which draws on the idea of adversarial training and requires the paragraph to conform to natural language conventions. Third, we design an alignment module to align the generated paragraphs with the corresponding enriched knowledge graphs, which ensures the correctness and completeness of the generated texts. Different from teacher-forcing methods, the Reviewer focuses on sentence-level and graph-level alignment.

Given a generated paragraph, however, the above evaluation processes are non-differentiable. As discussed in SeqGAN (Yu et al. 2017), such discrete signals cannot pass gradient updates from the Reviewer to the Writer. To address this, we treat the outputs of the Reviewer as rewards $R$ and maximize the expected reward $\mathbb{E}[\cdot]$ via reinforcement learning. Formally, the goal of the Reviewer can be represented by $\max \mathbb{E}_{P(\tau;\theta)}[R(\tau)]$, where $\theta$ denotes the trainable parameters of our model and $\tau$ is the paragraph generated by the Writer based on the generation probability $P$ w.r.t. $\theta$. Specifically, the reward function is
$$R(\tau) = R_Q + \lambda_{AR} R_A + \lambda_{MR} R_M, \quad (9)$$
where $R_Q$, $R_A$, and $R_M$ correspond to the three modules of the Reviewer, and $\lambda_{AR}$ and $\lambda_{MR}$ control the contributions of the corresponding rewards. Following policy gradient methods (Williams 1992; Schulman et al. 2017), we can solve the above problem in batch training as follows:
$$\mathcal{L}_{RL} = -\mathbb{E}_{P(\tau;\theta)}[R(\tau)\log P(\tau;\theta)] \approx -\frac{1}{B}\sum_{b=1}^{B} R(\tau^{(b)})\log P\big(\tau^{(b)};\theta\big), \quad (10)$$
where $B$ is the training batch size. We now introduce these reward modules in detail.

Quality reward.
Given a generated paragraph $\tau$, we can calculate quantitative metrics for it, such as BLEU (Papineni et al. 2002), METEOR (Denkowski and Lavie 2014), and CIDEr (Vedantam, Zitnick, and Parikh 2015). Directly using these metrics as the training reward can boost the quality of sentence generation. In this paper, we simply adopt the BLEU score, $R_Q = \mathrm{BLEU}(\tau)$, as the reward, since BLEU is one of the most popular automated and inexpensive metrics. In practice, BLEU can be replaced with any metric that needs to be optimized.

Adversarial reward.
Given a paragraph $\tau$, this module acts as a discriminator to determine whether $\tau$ is manually annotated (real) or generated by the machine (fake). Following (Yu et al. 2017), we use a Convolutional Neural Network (CNN) to extract text features, since it can capture sequence information and has exhibited high performance on complicated sequence classification tasks (Zhang and LeCun 2015). Specifically, given a generated paragraph $\tau$, we first concatenate the token embeddings as the text representation. We then use different numbers of kernels with different window sizes to extract features over the text representation and produce a new feature map. After applying a max-pooling operation, we use a fully connected layer with Sigmoid activation to output a probability that the input text is real. The calculation can be formulated as $R_A = \mathrm{CNN}(\tau)$. Inspired by adversarial training (Cao et al. 2018), this module aims to minimize the gap between human writing and the Writer.

Alignment reward.
A paragraph $\tau$ is supposed to align with its enriched KG $\mathcal{G}$, since $\tau$ is generated by the Writer according to $\mathcal{G}$. In this sense, we propose to compute the similarity between $\tau$ and $\mathcal{G}$ based on the attention mechanism. Given an abstract $\tau$ with $T$ words, we first use a Long Short-Term Memory (LSTM) network to extract the text representation $C = \{c_t\}$, where $c_t \in \mathbb{R}^d$, $t \in \{1, \ldots, T\}$. Following AttnGAN (Xu et al. 2018b), we obtain the hidden representation as follows:
$$q_t = \mathrm{Softmax}\left(\frac{(W_Q c_t)(W_K H)^\top}{\sqrt{d}}\right) W_V H, \quad (11)$$
where $W_Q$, $W_K$, and $W_V$ are trainable parameters, $\sqrt{d}$ is a scaling factor, and $H \in \mathbb{R}^{d \times N}$ are the node features obtained from the Writer. With the help of the self-attention mechanism (Vaswani et al. 2017), the hidden feature $q_t \in \mathbb{R}^d$ not only fuses the text representations but also merges graph information. Then, we calculate the cosine similarity as the matching score $R_M$:
$$R_M = \sum_{t=1}^{T} \frac{q_t^\top c_t}{\|q_t\| \|c_t\|}. \quad (12)$$

Thus far, we can obtain the rewards $R_Q$, $R_A$, and $R_M$ from the above Reviewer modules. Finally, to train our DRAW network, we define the overall training loss as
$$\mathcal{L} = \mathcal{L}_{SL} + \lambda_{RL} \mathcal{L}_{RL}, \quad (13)$$
where $\lambda_{RL}$ is a trade-off parameter. $\mathcal{L}_{SL}$ trains the DRAW network with supervised learning, while $\mathcal{L}_{RL}$ allows the DRAW network to explore diverse generations via reinforcement learning and evaluate them from multiple orientations.

Experiments

AGENDA dataset.
AGENDA is one of the most popular KGs-to-text datasets, containing 40,000 paired samples collected from the proceedings of 12 top AI conferences. Each sample consists of a title, an abstract, and the corresponding KG, which is extracted by the SciIE system. The KG is composed of recognized scientific terms and their relationships. In particular, the types of scientific terms include Task, Metric, Method, Material, and Other; the types of relationships include Used-for, Conjunction, Feature-of, Part-of, Compare, Evaluate-for, and Hyponym-of.
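Concretely, each sample's KG is just a list of typed triplets. A minimal container might look as follows (the class and field names are ours, not the dataset's schema):

```python
from dataclasses import dataclass

# Relation types listed for AGENDA; the container itself is our own sketch.
RELATIONS = {"Used-for", "Conjunction", "Feature-of", "Part-of",
             "Compare", "Evaluate-for", "Hyponym-of"}

@dataclass(frozen=True)
class Triplet:
    head: str       # a scientific term, e.g. a Method or Task
    relation: str   # one of RELATIONS
    tail: str

def entities(kg):
    """The node set V: every entity mentioned in any triplet."""
    return {t.head for t in kg} | {t.tail for t in kg}

# Toy KG with two triplets and four distinct entities:
kg = [Triplet("graph attention network", "Used-for", "link prediction"),
      Triplet("BLEU", "Evaluate-for", "text generation")]
```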
M-AGENDA dataset.
To further demonstrate the effectiveness of our DRAW network, we create a new dataset called M-AGENDA. Specifically, we first calculate the cosine similarity between each abstract and all the others in the AGENDA dataset. We then select the two most related instances for each abstract and combine the three as a new data example in the M-AGENDA dataset.
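This grouping step can be sketched with a plain bag-of-words cosine similarity. The paper does not specify the text representation used for the similarity, so the whitespace tokenization and raw term counts below are assumptions:

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(cnt * b[w] for w, cnt in a.items())
    na = sqrt(sum(c * c for c in a.values()))
    nb = sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def most_related_groups(abstracts, k=2):
    """For each abstract, pick the k most similar other abstracts and group
    them together, mirroring the M-AGENDA construction described above."""
    bows = [Counter(a.lower().split()) for a in abstracts]
    groups = []
    for i, bi in enumerate(bows):
        sims = sorted(((cosine(bi, bj), j) for j, bj in enumerate(bows) if j != i),
                      reverse=True)
        groups.append([i] + [j for _, j in sims[:k]])
    return groups
```

Each resulting group holds one anchor abstract plus its two nearest neighbors, i.e., one M-AGENDA example.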
Implementation details.
Our DRAW network consists of three well-designed modules, i.e., the Reader, Writer and Reviewer. We first train the Reader, Writer and Reviewer on the AGENDA dataset. Then, we use the trained Reader and Writer models on M-AGENDA to generate novel paragraphs. To speed up convergence early in training, we adopt different pretraining strategies for each module. For the Reader, we first use TransE (Bordes et al. 2013) to train the entity and relation embeddings. We then aggregate information passed from the 2-hop neighborhood to update the embedding of each node. Following (Nathani et al. 2019), we use the Adam optimizer with an initial learning rate of 0.1. For the Writer, we pre-train for 30 epochs with early stopping. Following (Ribeiro et al. 2020), we use the Adam optimizer with an initial learning rate of 0.5. To ensure the generation effect, we set the maximum generation length to 430. For the Reviewer, we pre-train the adversarial module with SGD and an initial learning rate of 0.001. When pre-training the graph encoder of the alignment module, we use the same model and parameters as the Writer. In addition, we systematically adjust the values of $\lambda_{AR}$ and $\lambda_{MR}$ in several ablation studies. We find that the results of different coefficient combinations fluctuate only by around 0.1, which has little effect on the conclusions. Writer-Reviewer obtains the best results with $\lambda_{AR} = \lambda_{MR} = 2$. We set the trade-off parameter $\lambda_{RL} = 1$. We implement our method with PyTorch.

Model                    BLEU    METEOR   CIDEr
GraphWriter              14.44   18.80    28.30
GraphWriter+RBS          15.17   19.59    -
Graformer                17.33   21.43    -
CGE-LW                   18.01   22.34    33.06
Writer-Reviewer (Ours)   19.60   24.03    45.21

Table 1: Quantitative evaluations of generation systems on the AGENDA dataset (higher is better).

Paragraph            Human   Machine
Written by Human     68%     32%
Written by DRAW      48%     52%

Table 2: Quantitative results of the Turing test.
Evaluation metrics.
To demonstrate the quality of the generated paragraphs, we report both quantitative results and human study results. We divide our evaluation into two parts: KGs-to-text evaluation and overall performance evaluation. For KGs-to-text evaluation, we adopt three general quantitative metrics, i.e., BLEU (Papineni et al. 2002), METEOR (Denkowski and Lavie 2014) and CIDEr (Vedantam, Zitnick, and Parikh 2015), to evaluate our Writer-Reviewer. In addition, to demonstrate the realness of the paragraphs generated by our model, we also set up a Turing test. Specifically, we randomly select 100 abstracts and shuffle them to form an evaluation set, where half of the abstracts are written by authors and the rest are generated by our Writer-Reviewer. We then ask turkers on Amazon Mechanical Turk (AMT) to determine whether the paragraphs in the evaluation set are written by humans.

For overall performance evaluation, we set up a human study to rate the abstracts generated by the DRAW network, CGE-LW and PaperRobot. For each model, we randomly select 50 generated paragraphs and score them in terms of 'grammar', 'informativeness', and 'coherence' on AMT. Specifically, 'grammar' measures whether the paragraphs are written in well-formed English. 'Informativeness' denotes whether the paragraphs make use of appropriate scientific terms. 'Coherence' denotes whether the generated text conforms to general conventions; for example, a complete abstract should briefly introduce a task, describe the solution, and analyze and discuss the results. Each metric contains 10 levels, ranked from 1 to 10 (from bad to good).

Following the relation prediction task (Nathani et al. 2019), we evaluate the link prediction method of our Reader by the proportion of correct entities in the top N ranks (Hits@N) for N = 1, 3, and 10.
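Hits@N is simply the fraction of test triplets whose correct entity appears within the top N ranked candidates; the ranks below are toy values for illustration:

```python
def hits_at_n(ranks, n):
    """Fraction of test cases whose correct entity is ranked within the top n.
    `ranks` holds the 1-based rank of the correct entity per test triplet."""
    return sum(1 for r in ranks if r <= n) / len(ranks)

ranks = [1, 2, 4, 7, 15]  # toy ranks, one per test triplet
scores = {n: hits_at_n(ranks, n) for n in (1, 3, 10)}
```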
To verify our model on the KGs-to-text task, we compare our Writer-Reviewer against several state-of-the-art models, including GraphWriter (Koncel-Kedziorski et al. 2019), GraphWriter+RBS (An 2019), Graformer (Schmitt et al. 2020) and CGE-LW (Ribeiro et al. 2020), on the AGENDA dataset.

Model                    BLEU    METEOR   CIDEr
Writer
Writer+Adversarial       19.37   23.87    39.30
Writer+Alignment         19.33   24.00    43.49
Writer+Quality           19.50   24.03    44.40
Writer-Reviewer (Ours)   19.60   24.03    45.21

Table 3: Ablation study for the modules used in the Reviewer on the AGENDA dataset.

Model         Grammar   Coherence   Informativeness
PaperRobot    5.11      4.95        5.01
CGE-LW        6.77      6.29        6.57
DRAW (Ours)   7.63      6.83        7.10

Table 4: Human evaluation results (higher is better).
Results.
We report the results of our method and the compared models with respect to three quantitative evaluation metrics in Table 1. As shown in Table 1, our Writer-Reviewer achieves better performance than all the compared models on all three metrics. Specifically, our Writer-Reviewer outperforms the state-of-the-art method CGE-LW by 1.6 points in BLEU, 1.7 points in METEOR and 12.2 points in CIDEr. These results demonstrate the superiority of our Writer-Reviewer on the KGs-to-text task.

In addition, we carry out a human evaluation to demonstrate the effectiveness of our Writer-Reviewer. To be specific, for each paragraph in the evaluation set, we ask humans to judge whether the paragraph was written by a human author. As the results in Table 2 show, nearly half of the paragraphs generated by our Writer-Reviewer are judged to be written by humans. More critically, 32% of the paragraphs written by humans are judged to be written by the AI system. These results demonstrate that our Writer-Reviewer can generate realistic paragraphs similar to those written by humans.
Ablation studies of the Reviewer. To investigate the effect of the different modules in the Reviewer, we conduct an ablation study. As shown in Table 3, the Writer combined with any one of the modules in the Reviewer obtains better performance than the Writer alone, which demonstrates the effectiveness of the modules in the Reviewer. The Writer combined with all the modules in the Reviewer, namely Writer-Reviewer, achieves the best performance.
To show the effectiveness of our DRAW network, we conduct experiments on the M-AGENDA dataset. Since the M-AGENDA dataset does not provide ground truth, we conduct a human study instead of quantitative evaluations. Specifically, for each metric in the human study, we average the scores of the paragraphs rated by the humans as the final score.
Results of DRAW.
We report the experimental results of our DRAW network and the compared methods in Table 4. From these results, our DRAW network achieves the best performance in terms of 'grammar', 'coherence', and 'informativeness'. Specifically, PaperRobot (Wang et al. 2019) obtains poor performance due to its neglect of the topological structure between entities. CGE-LW (Ribeiro et al. 2020) exploits the graph information effectively and achieves 6.77, 6.29, and 6.57 points on the three metrics, but it ignores the fact that the generated paragraphs are supposed to match the KGs. Different from the methods above, our DRAW network not only performs link prediction with multi-hop information in the Reader but also matches the graphs with the generated paragraphs, and thus achieves the best performance. More ablation experiments on the Reader can be found in the supplementary material.

Initial KGs: entities and relations: (global scene-level contextual information, PART-OF, spatial context recurrent convnet model); (wikipedia, USED-FOR, multilingual ner systems); (local image descriptors, CONJUNCTION, spatial configurations) . . .

PaperRobot: In this paper we propose a novel approach for multilingual named entity recognition tasks. The proposed method is based on semantic similarity measure that can be used to improve word retrieval performance by using wikipedia type of words from text documents and then build an efficient query language model which allows users with similar information between entities as clusters across different domains: part-of-speech tags are generated through each user's document representation; our knowledge base system was evaluated over state-of-the-art approaches trained object . . . [covering 6 entities.]

CGE-LW: in this paper, . . . we propose a spatial context recurrent convnet model to incorporate global scene-level contextual information into a spatial context recurrent convnet model for object retrieval . . . , and the contextual information from candidate boxes is used for object retrieval. a positional language model that captures contextual information from candidate boxes for object retrieval. the proposed system is evaluated on the tac-kbp 2010 data, and the experimental results show that the proposed system can significantly improve the entity linking performance . . . [covering 21 entities.]

DRAW: in this paper, we propose a novel approach to entity linking based on statistical language model-based information retrieval, which exploits both local contexts and global world knowledge to improve the entity linking performance . . . , we propose a spatial context recurrent convnet model to integrate global context features with local image descriptors, spatial configurations, and global scene-level contextual information into a spatial context recurrent convnet model . . . , and a recurrent network with local and global information to guide the search for candidate boxes for object retrieval . . . [covering 26 entities.]

Table 5: Example outputs of various models. To better visualize the generated text, we omit information irrelevant to the comparisons. Repetitive words are shown in red and entities included in the KGs are shown in orange. Potential knowledge is shown in blue with the corresponding superscript.

Method       Hits@1   Hits@3   Hits@10
PaperRobot   11.9     19.5     42.4
Ours         36.8     46.0     56.1

Table 6: Accuracy of link prediction on the M-AGENDA dataset. Hits@N values are percentages.
Results of Reader. As shown in Table 6, we report the link prediction results of our Reader and of PaperRobot. Our method achieves Hits@1, Hits@3, and Hits@10 scores of 36.8, 46.0, and 56.1, outperforming PaperRobot by 24.9, 26.5, and 13.7 points, respectively. This demonstrates the effectiveness of our link prediction method.
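The Hits@N metric reported in Table 6 can be sketched as follows. This is the standard formulation (the fraction of test triples whose ground-truth entity is ranked within the top N candidates), not the authors' exact evaluation code; the function name and the example ranks are illustrative.

```python
def hits_at_n(ranks, ns=(1, 3, 10)):
    """Hits@N: fraction of test triples whose ground-truth
    entity appears within the top-N ranked candidates."""
    return {n: sum(r <= n for r in ranks) / len(ranks) for n in ns}

# Hypothetical ranks of the ground-truth entity for five test triples
ranks = [1, 4, 2, 12, 7]
print(hits_at_n(ranks))  # {1: 0.2, 3: 0.4, 10: 0.8}
```

Multiplying each value by 100 gives the percentages used in Table 6.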
Visualization analysis. As shown in Table 5, we visualize a paragraph generated by our DRAW network. More visualization results can be found in the supplementary material. We see that our DRAW network is able to cover more entities (represented in orange), while PaperRobot mentions fewer of the entities in the given KG. In addition, CGE-LW tends to repeat unrelated entities and sentences (represented in red). With the help of the Reviewer, the generated text of the DRAW network is fluent and grammatically correct. Moreover, our DRAW network is able to discover potential relationships between entities (represented in blue with superscripts).
Conclusion
In this paper, we propose a Deep ReAder-Writer (DRAW) network that reads multiple AI-related abstracts and then writes a new paragraph representing enriched knowledge, combining the potential knowledge covering the topics mentioned in the source abstracts. Inspired by the review process, we propose a Reviewer to rate the quality of the generated texts along different dimensions; these ratings serve as feedback signals to refine our DRAW network. Ablation experiments demonstrate the effectiveness of our method. Moreover, our Writer-Reviewer achieves state-of-the-art results on the KG-to-text generation task. In the human study, some generations of our DRAW network successfully pass the Turing test and confuse the Turkers. In future work, we will extend the DRAW network to write a complete paper in an iterative manner and develop more techniques to discover novel ideas, such as creating new entities.

Acknowledgments
This work was partially supported by the Key-Area Research and Development Program of Guangdong Province (2018B010107001), the National Natural Science Foundation of China (NSFC) 61836003 (key project), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (2017ZT07X183), the International Cooperation Open Project of the State Key Laboratory of Subtropical Building Science, South China University of Technology (2019ZA01), and the Fundamental Research Funds for the Central Universities (D2191240).
References
An, B. 2019. Repulsive Bayesian Sampling for Diversified Attention Modeling. In Workshop of NeurIPS.
Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; and Yakhnenko, O. 2013. Translating Embeddings for Modeling Multi-relational Data. In NeurIPS.
Buenz, E. J. 2019. Essential elements for high-impact scientific writing. Nature doi: 10.1038/d41586-019-00546-7.
Cao, J.; Guo, Y.; Wu, Q.; Shen, C.; Huang, J.; and Tan, M. 2018. Adversarial Learning with Local Coordinate Coding. In ICML.
Cao, J.; Guo, Y.; Wu, Q.; Shen, C.; Huang, J.; and Tan, M. 2020. Improving Generative Adversarial Networks with Local Coordinate Coding. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Cao, J.; Mo, L.; Zhang, Y.; Jia, K.; Shen, C.; and Tan, M. 2019. Multi-marginal Wasserstein GAN. In NeurIPS.
Chen, P.; Zhang, Y.; Tan, M.; Xiao, H.; Huang, D.; and Gan, C. 2020. Generating Visually Aligned Sound From Videos. IEEE Trans. Image Process. 29: 8292–8302.
Cho, K.; Merrienboer, B. V.; Çaglar Gülçehre; Bahdanau, D.; Bougares, F.; Schwenk, H.; and Bengio, Y. 2014. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. In EMNLP.
Denkowski, M. J.; and Lavie, A. 2014. Meteor Universal: Language Specific Translation Evaluation for Any Target Language. In ACL.
Dettmers, T.; Minervini, P.; Stenetorp, P.; and Riedel, S. 2018. Convolutional 2D Knowledge Graph Embeddings. In AAAI.
Feng, N.; Jinpeng, W.; Jin-Ge, Y.; Rong, P.; and Chin-Yew, L. 2018. Operation-guided Neural Networks for High Fidelity Data-To-Text Generation. In EMNLP.
Gerber, M.; and Chai, J. 2010. Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates. In ACL.
Gopen, G. D.; and Ja, S. 1990. The Science of Scientific Writing. American Scientist.
Hochreiter, S.; and Schmidhuber, J. 1997. Long Short-term Memory. Neural Computation.
Huang, D.; Chen, P.; Zeng, R.; Du, Q.; Tan, M.; and Gan, C. 2020. Location-Aware Graph Convolutional Networks for Video Question Answering. In AAAI, 11021–11028.
Kipf, T.; and Welling, M. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR.
Koncel-Kedziorski, R.; Bekal, D.; Luan, Y.; Lapata, M.; and Hajishirzi, H. 2019. Text Generation from Knowledge Graphs with Graph Transformers. In NAACL-HLT.
Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; and Zhu, X. 2015. Learning Entity and Relation Embeddings for Knowledge Graph Completion. In AAAI.
Ling, F. U.; and Hui, Z. 2013. Multi-document summary using LDA and spectral clustering. Computer Engineering & Applications.
Luan, Y.; He, L.; Ostendorf, M.; and Hajishirzi, H. 2018. Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction. In EMNLP.
Min, Z.; Jie, Z.; Jian, S.; and GuoDong, Z. 2006. A Composite Kernel to Extract Relations between Entities with Both Flat and Structured Features. In ACL.
Nathani, D.; Chauhan, J.; Sharma, C.; and Kaul, M. 2019. Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs. In ACL.
Nguyen, D. Q.; Nguyen, T.; Nguyen, D. Q.; and Phung, D. Q. 2018. A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network. In NAACL-HLT.
Papineni, K.; Roukos, S.; Ward, T.; and Zhu, W.-J. 2002. BLEU: A Method for Automatic Evaluation of Machine Translation. In ACL.
Ribeiro, L.; Gardent, C.; and Gurevych, I. 2019. Enhancing AMR-to-Text Generation with Dual Graph Representations. In EMNLP-IJCNLP.
Ribeiro, L. F. R.; Zhang, Y.; Gardent, C.; and Gurevych, I. 2020. Modeling Global and Local Node Contexts for Text Generation from Knowledge Graphs. Transactions of the Association for Computational Linguistics.
Schlichtkrull, M.; Kipf, T.; Bloem, P.; Berg, R.; Titov, I.; and Welling, M. 2018. Modeling Relational Data with Graph Convolutional Networks. In ESWC.
Schmitt, M.; Ribeiro, L. F. R.; Dufter, P.; Gurevych, I.; and Schütze, H. 2020. Modeling Graph Structure via Relative Position for Better Text Generation from Knowledge Graphs. ArXiv abs/2006.09242.
Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; and Klimov, O. 2017. Proximal Policy Optimization Algorithms. ArXiv abs/1707.06347.
Trisedya, B.; Jianzhong, Q.; Rui, Z.; and Wei, W. 2018. GTR-LSTM: A Triple Encoder for Sentence Generation from RDF Data. In ACL.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, L.; and Polosukhin, I. 2017. Attention is All you Need. In NeurIPS.
Vedantam, R.; Zitnick, C. L.; and Parikh, D. 2015. CIDEr: Consensus-based image description evaluation. In CVPR.
Wang, H.; Wang, J.; Wang, J.; Zhao, M.; Zhang, W.; Zhang, F.; Xie, X.; and Guo, M. 2018. GraphGAN: Graph representation learning with generative adversarial nets. In AAAI.
Wang, Q.; Huang, L.; Jiang, Z.; Knight, K.; Ji, H.; Bansal, M.; and Luan, Y. 2019. PaperRobot: Incremental Draft Generation of Scientific Ideas. In ACL.
Williams, R. J. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning.
Xiao, L.; Wang, L.; He, H.; and Jin, Y. 2020. Copy or Rewrite: Hybrid Summarization with Hierarchical Reinforcement Learning. In AAAI.
Xu, K.; Wu, L.; guo Wang, Z.; Yu, M.; Chen, L.; and Sheinin, V. 2018a. SQL-to-Text Generation with Graph-to-Sequence Model. In EMNLP.
Xu, T.; Zhang, P.; Huang, Q.; Zhang, H.; Gan, Z.; Huang, X.; and He, X. 2018b. AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks. In CVPR.
Yoshikawa, K.; Riedel, S.; Hirao, T.; Asahara, M.; and Matsumoto, Y. 2010. Coreference based event-argument relation extraction on biomedical text. Journal of Biomedical Semantics.
Yu, L.; Zhang, W.; Wang, J.; and Yu, Y. 2017. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. In AAAI.
Zhang, X.; and LeCun, Y. 2015. Text Understanding from Scratch. ArXiv abs/1502.01710.
Zhen, W.; Jianwen, Z.; Jianlin, F.; and Zheng, C. 2014. Knowledge Graph Embedding by Translating on Hyperplanes. In AAAI.
Zhijiang, G.; Yan, Z.; Zhiyang, T.; and Wei, L. 2019. Densely Connected Graph Convolutional Networks for Graph-to-Sequence Learning. Transactions of the Association for Computational Linguistics.