Bootstrapping Multilingual AMR with Contextual Word Alignments
Janaki Sheth, Young-Suk Lee, Ramon Fernandez Astudillo, Tahira Naseem, Radu Florian, Salim Roukos, Todd Ward
Janaki Sheth: University of Pennsylvania, Philadelphia, PA, USA. All other authors: IBM Research, Yorktown Heights, NY, USA.
[email protected], [email protected], tnaseem, raduf, roukos, [email protected]
Abstract
We develop high performance multilingual Abstract Meaning Representation (AMR) systems by projecting English AMR annotations to other languages with weak supervision. We achieve this goal by bootstrapping transformer-based multilingual word embeddings, in particular those from cross-lingual RoBERTa (XLM-R large). We develop a novel technique for foreign-text-to-English AMR alignment, using the contextual word alignment between English and foreign language tokens. This word alignment is weakly supervised and relies on the contextualized XLM-R word embeddings. We achieve a highly competitive performance that surpasses the best published results for German, Italian, Spanish and Chinese.
1 Introduction

Abstract Meaning Representation (AMR) graphs are rooted, labeled, directed, acyclic graphs representing sentence-level semantics (Banarescu et al., 2013). In the example shown in Figure 1, the sentence The boy wants to go is parsed into an AMR graph. The nodes of the AMR graph represent the AMR concepts, which may include normalized surface symbols, e.g. boy, PropBank frames (Kingsbury and Palmer, 2002), e.g. want-01, go-02, as well as other AMR-specific constructs. Edges in an AMR graph represent the relations between concepts. In this example, :arg0 and :arg1 correspond to standard roles of PropBank.

Figure 1: AMR graph for The boy wants to go and its German translation Der Junge will gehen. Implicit alignments between the English text and AMR concepts are denoted by dotted arrows. Explicit alignments between English and German texts are denoted by solid arrows.

One distinctive aspect of AMR annotation is the lack of explicit alignments between nodes in the graph and words in the sentence. Since such alignments are essential for training many present-day AMR parsers, there have been various efforts to link the AMR concepts to their corresponding spans of words (Flanigan et al., 2014; Pourdamghani et al., 2014; Lyu and Titov, 2018; Chen and Palmer, 2017). A significant emphasis of this paper is on deriving these alignments for multilingual AMR parsers.

(This research was done during an internship at IBM Research AI.)

Even though by nature AMR is biased towards English, recent work has evaluated the potential of AMR to work as an interlingua. Hajič et al. (2014) and Xue et al. (2014) categorize and propose refinements for divergences in the annotation between English and Chinese as well as Czech AMRs. Anchiêta and Pardo (2018) import the corresponding AMR annotation for each sentence from the English annotated corpus and revise the annotation to adapt it to Portuguese. However, Damonte and Cohen (2018) show that it may be possible to use the original AMR annotations devised for English as representations for equivalent sentences in other languages without any modification, despite translation divergence. This defines the problem of multilingual AMR parsing that we seek to address in this paper: given a sentence in a foreign language, recover the AMR graph originally designed for its English translation. We implement multilingual AMR parsers for German, Spanish, Italian and Chinese.

In this paper we propose that transformer-based multilingual word embeddings can be a useful tool for addressing the problem of multilingual AMR parsing. Besides using contextual word embeddings as input token embeddings, we leverage them for annotation projection, where existing AMR annotations for English are projected to a target language by using contextual word alignments. In our experiments, we employ XLM-RoBERTa large (Conneau et al., 2019) as the multilingual pre-trained transformer model.
We show that our proposed procedure achieves results competitive with some of the classical methods for text-to-AMR alignment. Furthermore, such a procedure is easily scalable to the 100 languages that XLM-R is trained on.

We also combine different techniques for concept alignment and AMR parser training which significantly improve performance over the base models. For concept alignment, we combine the proposed contextual word alignments with previously established alignment techniques utilizing matching rules tailored to AMR as well as machine translation aligners (Flanigan et al., 2014; Pourdamghani et al., 2014). For AMR parser training, we pre-train an AMR parser on the treebanks of different languages simultaneously and subsequently finetune on each language. This is analogous to the techniques used for silver data pre-training (Konstas et al., 2017; van Noord and Bos, 2017) in AMR parsing and multilingual pre-training (Aharoni et al., 2019) in machine translation.

Finally, we conduct a detailed error analysis of the multilingual AMR parsing. One of the major errors we have found involves synonymous concepts, which share the same meaning as the original concepts in English but differ in spelling. While this error is mainly caused by the fact that the multilingual word embeddings bridge non-English input tokens to English concepts, it also highlights the highly lexical nature of Smatch scoring (Cai and Knight, 2013), which does not take synonymous concepts into consideration. We also elaborate on an error analysis directly comparing our proposed annotation projection method using contextual word alignment with a previous baseline using fast align.

The rest of the paper is organized as follows: In Section 2, we discuss related work. In Section 3, we present our main proposal on annotation projection based on contextual word alignments. In Section 4, we describe various combination approaches that improve multilingual parser performance significantly. These include combining word-to-concept alignments, using multilingual treebanks, and combining human-annotated and synthetic treebanks. In Section 5, we discuss experimental results. In Sections 6 and 7, we present detailed error analyses. We conclude the paper in Section 8.

2 Related Work

Multilingual AMR.
There have been significant advances in AMR parsing for languages other than English. Previous studies (Hajič et al., 2014; Xue et al., 2014; Migueles-Abraira et al., 2018; Sobrevilla Cabezudo and Pardo, 2019) investigated AMR annotations for a variety of different languages such as Chinese, Czech, Spanish and Brazilian Portuguese. Vanderwende et al. (2015) automatically parse the logical representation for sentences in Spanish, Italian, German and Japanese, which is then converted to AMR using a small set of rules.

While much of this work, along with studies such as Li et al. (2016) and Anchiêta and Pardo (2018), produces AMR graphs whose nodes are labeled with words from the target language, Damonte and Cohen (2018) developed AMR parsers for English and used parallel corpora for annotation projection to train Italian, Spanish, German, and Chinese parsers that recover the AMR graph originally designed for the English translation. Their main results showed that the new parsers can overcome certain structural differences between languages. Similar to Damonte and Cohen (2018), we also train multilingual AMR parsers by projecting English AMR annotation to target foreign languages (German, Spanish, Italian and Chinese), but we depart from their approach in the specifics of the annotation projection by exploring contextual word alignments directly derived from multilingual contextualized word embeddings. While both procedures utilize parallel corpora, the annotation projection of Damonte and Cohen (2018) requires additional supervised training of their statistical word aligner. Our proposed contextualized word alignment is, however, unsupervised in nature. Alternatively, a recent study by Blloshmi et al. (2020) showed that one may in fact not need alignment-based parsers for cross-lingual AMR, instead modelling concept identification as a seq2seq problem. In this paper, we will compare our results to both Damonte and Cohen (2018) and Blloshmi et al. (2020).
Word vector alignment techniques.
Traditional word alignment methods often use parallel corpora and IBM alignment models (Brown et al., 1990, 1993) as well as improved versions (Och and Ney, 2003; Dyer et al., 2013). More recently, techniques have emerged that align vector representations of words using varying levels of supervision (Ruder et al., 2019). Often word vectors are learned independently for each language, and then a mapping from source language vectors to target language vectors is developed with a bilingual dictionary (Mikolov et al., 2013; Smith et al., 2017; Artetxe et al., 2017). To reduce the need for bilingual supervision, a recent body of work employs an iterative method that starts from a minimal seed dictionary and alternates with learning the linear map (Conneau et al., 2018; Schuster et al., 2019; Artetxe et al., 2018).

The work most similar to ours is Cao et al. (2020), where the authors obtain contextual embedding alignments from multilingual BERT (Devlin et al., 2018; Pires et al., 2019) and subsequently improve the alignments via finetuning using supervised parallel corpora. Our contextual word alignment between two parallel sentences may be thought of as an adaptation of their contextual word retrieval task. However, we refrain from any finetuning of the contextual embeddings and show that the contextual word alignments from the off-the-shelf XLM-R model achieve results competitive with the word alignments of fast align (see Damonte and Cohen (2018)). This suggests the potential for inexpensive, massive scaling of AMR parsing up to the 100 languages on which XLM-R is trained.
3 Annotation Projection Using Contextual Word Alignment

We adopt a transition-based parsing approach for AMR parsing, following Ballesteros and Al-Onaizan (2017); Naseem et al. (2019); Fernandez Astudillo et al. (2020). These parsers produce an AMR graph g from an input sentence s by instead predicting an action sequence a from s as a sequence-to-sequence problem. This action sequence applied to a state machine M then produces the desired target graph as g = M(a, s). Transition-based parsers require the action sequence for each graph in the training data. This is determined by a rule-based oracle a = O(g, s), which relies on external word-to-node alignments. In all the subsequent experiments we use the oracle and action set from Fernandez Astudillo et al. (2020).

In order to train AMR parsers in a non-English language, we use the annotation projection method to leverage existing English AMR annotation and overcome the resource shortage in the target language. First, the English text is aligned to the corresponding AMR concepts using both the rule-based JAMR aligner (Flanigan et al., 2014) and an IBM-model-type aligner (Pourdamghani et al., 2014). The latter will henceforth be referred to as the EM aligner. Given the English text-to-AMR concept alignments, we then project these to the target language using word alignment. In the following subsection we describe the proposed word alignment method, called contextual word alignment, which is trained in a weakly supervised manner.
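The relation g = M(a, s) can be illustrated with a toy state machine. The action names (SHIFT, PRED, LA, RA) and the graph encoding below are simplified placeholders for illustration, not the actual oracle or action set of Fernandez Astudillo et al. (2020):

```python
# Toy illustration of g = M(a, s): the machine M consumes an action sequence a
# over sentence s and emits an AMR-like graph of concept nodes and labeled edges.
def run_state_machine(actions, sentence):
    cursor = 0            # index of the token currently in focus
    nodes, edges = [], [] # concepts and (head, label, dependent) triples
    for act in actions:
        if act == "SHIFT":             # move focus to the next token
            cursor += 1
        elif act.startswith("PRED("):  # predict a concept for the token in focus
            nodes.append(act[5:-1])
        elif act.startswith("LA("):    # arc from the newest node to the previous one
            edges.append((nodes[-1], act[3:-1], nodes[-2]))
        elif act.startswith("RA("):    # arc from the previous node to the newest one
            edges.append((nodes[-2], act[3:-1], nodes[-1]))
    return {"nodes": nodes, "edges": edges}

# "The boy wants to go" -> want-01 with :arg0 boy and :arg1 go-02 (Figure 1)
actions = ["SHIFT", "PRED(boy)", "SHIFT", "PRED(want-01)", "LA(:arg0)",
           "SHIFT", "SHIFT", "PRED(go-02)", "RA(:arg1)"]
graph = run_state_machine(actions, "The boy wants to go")
```

A rule-based oracle plays the inverse role: given the gold graph, word-to-node alignments tell it which token each PRED action should fire on.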
3.1 Contextual word alignment

Given two languages, we align word pairs within parallel sentences if their vector representations derived from the underlying multilingual pre-trained model are similar according to cosine distance. As vector representation we use the average of all 24 layers of the XLM-R large contextual embeddings. We will refer to this average as the word's contextual embedding henceforth for simplicity.

More precisely, suppose we have two parallel sentences, E = e_1, e_2, ..., e_M in English and F = f_1, f_2, ..., f_N in the target language. We will use r to represent the pre-trained multilingual model, such that r(S)_i is the contextual embedding for the i-th word in sentence S. Then a word e_i ∈ E is contextually word aligned to f_j if and only if the cosine similarity score between their word embeddings is the highest. Thus we define the corresponding contextual alignment function χ(f_j | e_i) as

    χ(f_j | e_i) = argmax_{1 ≤ j ≤ |F|} cos(r(E)_i, r(F)_j).    (1)

Similarly, performing the same procedure in the reverse direction, we have

    χ(e_i | f_j) = argmax_{1 ≤ i ≤ |E|} cos(r(F)_j, r(E)_i).    (2)

While these methods can be noisy, by only keeping word pairs in their intersection, i.e. χ(E|F) ∩ χ(F|E), one can derive the intersection cosine alignment approach, which gives us a word-aligned dataset with low coverage but high accuracy.

Figure 2: Annotation projection is achieved using JAMR and EM aligners for English text-to-AMR concept alignment and contextual word alignment between tokens of the source (English) and target languages.
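The two directional alignments and their intersection can be sketched as follows; the toy embedding vectors stand in for the layer-averaged XLM-R embeddings r(E)_i and r(F)_j, so only the argmax-and-intersect logic of Eqs. (1)-(2) is illustrated:

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def directional_alignment(src_emb, tgt_emb):
    """For each source position i, the target index j maximizing cosine similarity."""
    return [max(range(len(tgt_emb)), key=lambda j: cosine(src_emb[i], tgt_emb[j]))
            for i in range(len(src_emb))]

def intersection_alignment(en_emb, fg_emb):
    """Keep only (i, j) pairs on which both directions agree: chi(F|E) and chi(E|F)."""
    en_to_fg = directional_alignment(en_emb, fg_emb)   # chi(F|E)
    fg_to_en = directional_alignment(fg_emb, en_emb)   # chi(E|F)
    return [(i, j) for i, j in enumerate(en_to_fg) if fg_to_en[j] == i]

# Toy vectors: 3 English words, 4 target words; the filler word (index 2)
# has no clear English counterpart and drops out of the intersection.
en_emb = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
fg_emb = np.array([[0.9, 0.1, 0.0], [0.0, 1.0, 0.1],
                   [0.5, 0.5, 0.0], [0.0, 0.1, 1.0]])
pairs = intersection_alignment(en_emb, fg_emb)
```

The dropped filler word mirrors the German von/der tokens in the example below: each direction aligns every token somewhere, but only mutually consistent pairs survive the intersection.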
As an example, the following are sentences from our German and English training datasets:

E: Establishing models in industrial Innovation
F: Etablierung von Modellen in der industriellen Innovation

Their contextual word alignments are χ(F|E), which aligns each of the five English tokens to a German token, and χ(E|F), which aligns each of the seven German tokens to an English token. Their intersection χ(F|E) ∩ χ(E|F) keeps only the four word pairs on which both directions agree.

Figure 2 pictorially illustrates our complete annotation projection method using the contextual word alignment χ(F|E). English tokens and AMR concepts are aligned using the JAMR and EM aligners. The resulting AMR annotation, augmented with English word-to-concept alignments, is then projected onto the given target language using contextual word embeddings. Henceforth, for brevity, we will at times refer to this approach as A.P.

4 Combination Techniques

We apply three types of combination techniques to the multilingual AMR parsers, trained by projecting English annotations using contextual word alignments derived from the multilingual contextual word embeddings, each of which improves parser performance significantly.

4.1 Combining word-to-concept alignments
One such technique is to combine the contextual word alignment based A.P. with the baseline word-to-concept alignment, which aligns the target tokens directly to AMR concepts using the JAMR and EM aligners. Since the EM aligner is an unsupervised method, it can be directly applied to the target language tokens and English AMR concepts.

Figure 3: Illustration of the EM, JAMR + A.P. combination alignment: first align target tokens to AMR concepts using the JAMR+EM aligners, with any remaining concepts then aligned using the annotation projection method proposed in Figure 2.

However, we note that this baseline alignment approach gives incomplete coverage (87% of concepts aligned to German, 88% to Italian and 91% to Spanish tokens). Thus, we supplement it by aligning the remaining concepts using the A.P. of Figure 2.

For example, suppose we have as before two parallel sentences, E = e_1, ..., e_M in English and F = f_1, ..., f_N in the target language, as well as AMR concepts N = n_1, ..., n_L. Then one of our proposed foreign text-to-AMR concept combination alignment procedures, EA(f_i | n_j) (see Figure 3), is defined as

    EA(f_i | n_j) = AP(BA(f_i | n_j)),    (3)

where BA(f_i | n_j) denotes that the j-th concept is aligned to the i-th token in F using the baseline aligner BA. If for any concept n_j ∈ N, BA(f_i | n_j) = None, we use annotation projection to align it, where AP(f_i | n_j) is given by

    χ(f_i | e_k) ∧ BA(e_k | n_j) ⇒ AP(f_i | n_j).    (4)

We also experiment with other such alignments, in particular by using the intersection cosine alignment (χ(F|E) ∩ χ(E|F)) as the contextual word alignment. In this case,

    EA(f_i | n_j) = maxAP(BA(iAP(f_i | n_j))),    (5)

wherein

    (χ(f_i | e_k) ∩ χ(e_k | f_i)) ∧ BA(e_k | n_j) ⇒ iAP(f_i | n_j).    (6)

As before, for all n_j ∈ N where iAP(f_i | n_j) = None, we align it using the baseline aligner BA(f_i | n_j). For any further remaining unaligned concepts, we employ maxAP(f_i | n_j), which can be described as

    max(χ(f_i | e_k), χ(e_k | f_i)) ∧ BA(e_k | n_j) ⇒ maxAP(f_i | n_j).    (7)

That is, we pick the uni-directional contextual word alignment with the higher score and project the AMR annotation accordingly.
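A minimal sketch of the baseline-then-project fallback of Eqs. (3)-(4); the dictionaries standing in for the JAMR/EM baseline outputs and for χ(F|E) are hypothetical toy data:

```python
def combine_alignments(concepts, ba_target, ba_english, chi_f_given_e):
    """Eq. (3)-(4): align each AMR concept to a target-language token index.
    ba_target[n]     -- baseline (JAMR/EM) alignment of concept n to a target token, or None
    ba_english[n]    -- baseline alignment of concept n to an English token
    chi_f_given_e[i] -- contextual alignment chi(F|E) of English token i to a target token
    """
    aligned = {}
    for n in concepts:
        if ba_target.get(n) is not None:      # BA(f_i | n_j): baseline covers this concept
            aligned[n] = ba_target[n]
        elif ba_english.get(n) is not None:   # AP(f_i | n_j): project through chi(F|E)
            aligned[n] = chi_f_given_e[ba_english[n]]
        else:
            aligned[n] = None                 # still unaligned
    return aligned

# Toy example: "boy" is covered by the baseline aligner; "want-01" is not,
# so it is projected from its English token (index 2) through chi(F|E).
out = combine_alignments(["boy", "want-01"],
                         ba_target={"boy": 1, "want-01": None},
                         ba_english={"boy": 1, "want-01": 2},
                         chi_f_given_e={1: 1, 2: 2})
```

The intersection variant of Eqs. (5)-(7) follows the same fallback pattern with one more tier: intersection alignment first, then the baseline aligner, then the higher-scoring uni-directional alignment.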
4.2 Multilingual treebank combination

In addition to training the parser on the treebank of each language, derived from the English treebank via annotation projection, we also experiment with combining all the target language treebanks to create a single multilingual treebank. We notice that pre-training an AMR parser on this multilingual treebank, with subsequent finetuning on the treebank of each language, improves performance over a parser trained only on each individual treebank.
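The pre-train-then-finetune schedule can be sketched as follows; train_steps is a stand-in that merely records what each model was trained on, not actual parser optimization:

```python
def train_steps(params, treebank, epochs):
    """Placeholder for gradient updates: append (language, epochs) markers."""
    return params + [(lang, epochs) for lang in sorted({l for l, _ in treebank})]

def pretrain_then_finetune(treebanks, pretrain_epochs=2, finetune_epochs=1):
    """Pre-train one parser on the concatenated multilingual treebank,
    then finetune a copy of it on each individual language."""
    combined = [(lang, sent) for lang, sents in treebanks.items() for sent in sents]
    shared = train_steps([], combined, pretrain_epochs)          # multilingual pre-training
    return {lang: train_steps(list(shared), [(lang, s) for s in sents], finetune_epochs)
            for lang, sents in treebanks.items()}                # per-language finetuning

models = pretrain_then_finetune({"de": ["s1"], "es": ["s2"]})
```

Each finetuned model thus starts from the same multilingual parameters, mirroring the silver-data pre-training analogy drawn above.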
4.3 Combining human-annotated and synthetic treebanks

We create a synthetic AMR corpus by parsing 85k unlabeled sentences from the context portion of SQuAD 2.0. The resulting synthetic AMR graphs are filtered as per the procedure in Lee et al. (2020) and combined with the AMR-2.0 training set (LDC2017T10) to produce an expanded AMR-2.0 + SQuAD training dataset of 94k sentences. We then project annotations of this expanded English treebank onto each of the target languages, and train the corresponding target language parser. We observe that despite the lower quality of the synthetic AMRs as compared to their human-annotated counterparts, their inclusion in the training set significantly improves parser performance.
5 Experiments

For our experiments, we use the stack-Transformer model (Fernandez Astudillo et al., 2020) as our AMR parser (https://github.com/IBM/transition-amr-parser). The stack-Transformer is a transition-based parser with a modified Transformer architecture to encode the parser state. It uses a cross-entropy loss function and has hyper-parameters similar to those for machine translation described in Vaswani et al. (2017). We decode our models with beam search and evaluate them using Smatch scores (Cai and Knight, 2013). Model performance values in this manuscript are an average over the best performing models across random seeds. Lastly, the input to the parser, the vector representation of each word, is obtained by averaging over not only all 24 layers of the pre-trained XLM-R large contextual embeddings but also over the constituent wordpieces within each word.

For all four languages, German, Spanish, Italian and Chinese, we experiment on AMR-1.0 (LDC2015E86). For the first three we also experiment on AMR-2.0 (LDC2017T10). Results from the former are compared to Damonte and Cohen (2018) and from the latter to Blloshmi et al. (2020). Details of our training, dev and test sets are given in Table 1. To train each target language parser, we first translate the input sentences of AMR-2.0 and AMR-1.0 with Watson Language Translator. This creates the supervised parallel corpus which we then use for our unsupervised annotation projection via contextual word alignment. We also align target language tokens directly to AMR concepts using the JAMR and EM aligners for baseline system evaluation and for combination alignments. We select the best performing models using the dev set. Finally, for our best models, we report results using the machine as well as human translations (LDC2020T07) of the test sets.
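The construction of the parser's input word vectors described above, averaging over layers and then over wordpieces, can be sketched as follows, with a toy array standing in for the 24 per-layer, per-wordpiece XLM-R hidden states:

```python
import numpy as np

def word_vectors(hidden, word_spans):
    """hidden: (num_layers, num_wordpieces, dim) array of contextual states.
    word_spans: list of (start, end) wordpiece index ranges, one per word.
    Returns one vector per word: mean over layers, then over wordpieces."""
    layer_avg = hidden.mean(axis=0)                              # (num_wordpieces, dim)
    return np.stack([layer_avg[s:e].mean(axis=0) for s, e in word_spans])

# Toy states: 2 layers, 4 wordpieces, dim 3; word 0 spans wordpieces 0-1,
# word 1 spans wordpieces 2-3.
hidden = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)
vecs = word_vectors(hidden, [(0, 2), (2, 4)])
```

In practice the hidden states would come from the pre-trained XLM-R large model and the spans from its wordpiece tokenizer.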
Our first baseline is zero-shot learning, where we train on the English dataset but test on a foreign language dev set (Baseline I). The reason behind this experiment is to test the ability of the XLM-R contextual word embeddings to capture the meaning of a given token irrespective of the underlying language. Note that it is only for this experiment that the languages of the train and dev sets differ. In another set of experiments, we align the target language tokens directly to the AMR concepts using only the JAMR and EM aligners (Baseline II). Lastly, we also test the annotation projection procedure of Damonte and Cohen (2018). Note that while those authors use fast align (Dyer et al., 2013) for word alignment between the parallel data and only the JAMR aligner for the English text-to-AMR alignment, in Baseline III we utilize fast align in conjunction with both the JAMR and EM aligners (for English text-to-AMR alignment) for improved performance.
(Word segmentation is applied to the Chinese raw texts for model training and testing.)

Data set    Experiment            Sentences    DE     ES     IT     ZH
Train set   AMR2.0 LDC            36k          677k   694k   654k
            AMR2.0 LDC + synAMR   94k          2.1m   2.2m   2.1m
            AMR1.0 LDC            10k          222k   240k   227k   195k
Dev set     All experiments       1368         30k    32k    31k    26k
Test set    All experiments       1371         31k    33k    32k    27k

Table 1: Details of our datasets (token counts per language).

                                      AMR2.0               AMR1.0
Model                                 DE    ES    IT       DE    ES    IT    ZH
Baseline I (zero-shot)                39.0  39.6  41.0     37.4  38.8  39.3  33.4
Baseline II                           61.4  66.2  68.3     57.2  60.3  60.7  55.4
Baseline III                          63.8  68.7  68.6     56.3  60.8  61.0  54.7
Annotation Projection (A.P)           61.9  67.7  66.8     55.7  60.7  60.5  46.5
EM,JAMR+A.P                           63.9  68.7  69.8     57.7  62.3  62.5  55.8
Intersect A.P+EM,JAMR+max(A.P)        64.2  69.1  68.7
EM,JAMR+A.P (Multilingual)            64.6  69.2  70.4
EM,JAMR+A.P (synAMR)

Table 2: Dev set Smatch for AMR2.0 and AMR1.0.

                              Machine translation        Human translation
Model                         DE    ES     IT    ZH      DE    ES    IT    ZH
Damonte and Cohen (2018)                                 39    42    43    35
Baseline I (zero-shot)        37.1  37.99  38.5  31.8    36.3  37.6  37.4  30.2
Baseline II                   56.1  58.94  59.7  53.3    53.6  57.8  56.8  48.3
Baseline III                  55.1  59.24  59.0  53.1    52.7  57.9  57.3  48.1
Annotation Projection (A.P)   54.9  58.9   59.4  44.6    52.7  57.7  57.0  41.4
EM,JAMR + A.P                 56.4  60.6   61.3  54.0    53.6  59.2  58.6  48.3
EM,JAMR + A.P (Multilingual)

Table 3: Test set Smatch for AMR1.0.

                               Machine translation    Human translation
Model                          DE    ES    IT         DE    ES    IT    ZH
Blloshmi et al. (2020)                                53    58    58.1  43.1
EM, JAMR + A.P (Multilingual)  63.8  67.7  69.0       59.9  66.0  65.7
EM, JAMR + A.P (synAMR)

Table 4: Test set Smatch for AMR2.0.

Table 2 compares our different proposed approaches to the three baseline methods using the AMR2.0 and AMR1.0 datasets. We see that our proposed approach, annotation projection with contextual word alignment, in this case using χ(F|E), shows fairly competitive results with those of Baseline III for the target languages of German, Italian and Spanish, especially when applied to the smaller corpus of AMR1.0. This is remarkable considering our method requires no additional training and can be easily generalized for zero-shot learning on all the different languages that XLM-R was pretrained on.

We then train several parsers using our suggested combination approaches. The first such method comprises both the EM, JAMR + A.P aligners (see Eq. 3). In a different approach, we use the intersection cosine word alignment based annotation projection (i.e. χ(F|E) ∩ χ(E|F)). Since this leaves many AMR concepts unaligned, we follow it by aligning concepts using the baseline JAMR and EM aligners. Any leftover unaligned concepts are then aligned using max(χ(E|F), χ(F|E)) (Eq. 5). In another set of experiments, we pre-train a parser on a multilingual treebank, where the train set is a combination of the LDC treebanks in all target languages. The parser is then finetuned on each individual language. We surmise that such an experiment will give us a truly multilingual parser capable of successfully decoding all the target languages. Its strength is evident in its performance: it outperforms all our baseline approaches, in the case of the AMR1.0 dev set by at least 1.4 points. Finally, in the last two experiments on AMR2.0, we train on the language-specific LDC + SQuAD train set. We see that this gives us our best performing parsers, where the training data is aligned using the combination (EM, JAMR + A.P) alignment.

We test a subset of the AMR2.0 and all of the AMR1.0 models on the corresponding test sets. The results are shown in Tables 3 and 4.
For AMR1.0, while all of our models including the baselines outperform previously published results, the best performing model is the parser which was trained on multilingual data and whose training input text was aligned to its AMR concepts using the combination of the EM, JAMR and A.P aligners. For AMR2.0, models trained on the LDC + SQuAD dataset outperform those trained on multilingual data. Both of these outperform the recently published work of Blloshmi et al. (2020). (We did not run experiments with the LDC + SQuAD dataset on AMR1.0, since our primary reason for running experiments on AMR1.0 was to more directly compare our results to Damonte and Cohen (2018).)

We note that the parser performs better on the machine translated test data than on the human translated data. This should be attributed to the
training and testing condition mismatch of the human translated test data, since all models are trained on machine translated training data. For instance, the out-of-vocabulary (OOV) ratio of the human translated test data is consistently higher than that of the machine translated test data: for AMR1.0, the OOV ratio of human translated vs. machine translated test data is 10.2% vs. 9% for German, 7.3% vs. 6.8% for Spanish, 8.1% vs. 7.6% for Italian and 7.6% vs. 5.5% for Chinese.

Figure 4: Histogram of different kinds of errors.
6 Error Analysis

We carried out an error analysis of 56 German sentences parsed by the best performing model trained on the combination of the AMR2.0 and SQuAD training data. Statistics of the various errors are depicted in Figure 4. The five most frequent errors are (i) introduction of synonymous concepts, (ii) missing concepts, (iii) incorrect roles, (iv) target tokens in AMR concepts, and (v) incorrect parsing of multi-sentence as an instance of conjunction.
The most common error we encounter is synonymous AMR concepts, as shown in Figure 5. Comparing the expected graph (top) to the parsed version (bottom), we note that the concept previous is synonymized to past. While this error is mainly caused by the fact that the multilingual word embeddings bridge non-English input tokens to English concepts, it also highlights the highly lexical nature of Smatch scoring (Cai and Knight, 2013), which does not take synonymous concepts into consideration. Given that AMR is supposed to represent the core meaning of a sentence regardless of its syntactic and morphological variations, Smatch scoring should be able to capture lexical variations such as synonymous concepts.

In this environment, what's wrong if they criticize the previous stupefying propaganda a bit?

(w / wrong-02
  :ARG1 (a2 / amr-unknown)
  :ARG2 (c / criticize-01
    :ARG0 (t / they)
    :ARG1 (p / propaganda
      :time (p2 / previous)
      :ARG1-of (s / stupefy-01))
    :degree (b / bit))
  :location (e / environment
    :mod (t2 / this)))

Was ist in dieser Umgebung falsch, wenn sie die bisherige stupeftende Propaganda ein bisschen kritisieren?

(w / wrong-02
  :ARG1 (c / criticize-01
    :ARG0 (t2 / they)
    :ARG1 (p2 / propaganda
      :time (p / past))
    :degree (b / bit))
  :ARG2 (a / amr-unknown)
  :location (e / environment
    :mod (t / this)))
Figure 5: The gold AMR (top) and the parsed AMR (bottom) for a German sentence exemplifying errors: synonymous concept (previous vs. past), missing concept (stupefy-01 is missing in the parsed AMR), incorrect roles (the two arguments :ARG1 and :ARG2 of wrong-02 are swapped in the parsed AMR).
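The synonym penalty can be seen with a simplified Smatch-style computation over a fragment of Figure 5; unlike real Smatch, this sketch scores exact triple overlap without searching over variable mappings:

```python
def triple_f1(gold, parsed):
    """Simplified Smatch-style score: F1 over exact-match triples.
    (Real Smatch additionally optimizes the mapping between variable names.)"""
    overlap = len(gold & parsed)
    p = overlap / len(parsed)
    r = overlap / len(gold)
    return 2 * p * r / (p + r) if p + r else 0.0

# Fragment of Figure 5: the graphs differ only in the synonymous concept.
gold   = {("p", "instance", "propaganda"), ("p2", "instance", "previous"),
          ("p", "time", "p2")}
parsed = {("p", "instance", "propaganda"), ("p2", "instance", "past"),
          ("p", "time", "p2")}
score = triple_f1(gold, parsed)   # 2 of 3 triples match in each direction
```

Because previous and past are compared as strings, the semantically equivalent parse still loses a full triple of credit.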
In critical moments, we are all descendants of Yan emperor and Huang emperor.

(d / descend-01
  :ARG0 (w / we
    :mod (a / all))
  :source (a2 / and
    :op1 (p / person
      :name (n / name :op1 "Yan")
      :ARG0-of (h / have-org-role-91
        :ARG2 (e / emperor)))
    :op2 (p2 / person
      :name (n2 / name :op1 "Huang")
      :ARG0-of h))
  :time (m / moment
    :ARG1-of (c / critical-02)))

In kritischen Momenten sind wir alle Nachfahren des Yan Kaisers und Huang Kaisers.

(d / descend-01
  :ARG0 (w / we
    :mod (a / all))
  :ARG1 (a2 / and
    :op1 (p / person
      :name (n / name :op1 "Yan" :op2 "Kaisers"))
    :op2 (p2 / person
      :name (n2 / name :op1 "Huang" :op2 "Kaisers")))
  :time (m / moment
    :ARG1-of (c / critical-02)))
Figure 6: The gold AMR (top) and the parsed AMR (bottom) for a German sentence illustrating incorrect roles (:source is replaced by :ARG1 in the parsed AMR) and incorrect identification of the target token Kaisers as a named entity.
Some concepts are missing in the parsed AMR, such as stupefy-01 in Figure 5. The parser also incorrectly identifies relations between concepts. In Figure 5, the arguments :ARG1 and :ARG2 of the concept wrong-02 are swapped. In Figure 6, the relation :source is replaced by the frame argument :ARG1. Another frequent error is the incorrect parsing of multi-sentence as an instance of conjunction, especially when sentences are demarcated by commas. Note that the multi-sentence errors are not specific to multilingual parsing and occur frequently when parsing English input sentences as well. This multi-sentence error is mostly caused by the ambiguity of commas, which can subsume various semantics depending on the context across languages.
Some target tokens may legitimately be realized in the gold AMR, especially when the target tokens are named entities, e.g. Frankfurt, Anna, Noah, etc. This often leads to errors in the parsed AMR when a target token is incorrectly recognized as a named entity. In Figure 6, the German token Kaisers is incorrectly parsed as part of the named entities Yan Kaisers and Huang Kaisers. The failure to capture the correct concept emperor for the German token Kaisers leads to a subsequent error of not reifying the role to have-org-role-91, evident in the comparison of the parsed AMR with the gold AMR.

Other errors include lack of stemming in the target language, such as Kaisers in Figure 6. Stemming errors are mostly caused by the fact that we have not incorporated target language stemmers, whereas we have incorporated spaCy (https://spacy.io/) for English. Some errors are caused by machine translation: the English fragmentary input taking a look is translated to Sehen Sie sich, which is then incorrectly parsed as an imperative sentence. Nominal target language tokens often fail to invoke predicates. Given the input in English "cultural tyranny in the cloak of nationalism", tyranny invokes the predicate tyrannize-01. Its German counterpart Tyrannei, however, fails to invoke the predicate in "kulturellen Tyrannei im Mantel des Nationalismus".

               Contextual Alignment   Fast Align
    German     23.47                  20.52
    Italian    29.40                  29.30
    Spanish    28.81                  26.69

Table 5: AMR1.0 parser performance on negations in terms of Smatch. Fast align is compared with the proposed contextual alignment for different languages.
7 Contextual Alignment vs. Fast Align

We compared annotation projection for AMR1.0 between fast align and the contextual alignment. As noted in Table 3, they perform comparably for German, Italian and Spanish. However, on detailed analysis we notice that annotation projection using contextualized alignments has greater coverage in terms of foreign text-to-AMR alignments compared to fast align (e.g. for German, contextual alignment A.P. gives 99.95% coverage in comparison to 97.47%). This is likely due to the fact that fast align is based on an IBM alignment model, which relies on expected counts of alignment pairs and uses additional alignment constraints. Contextualized alignment relies on the unrestricted pairing by cosine distance of the XLM-R contextual word embeddings of the input tokens. Given an English token, the contextualized alignment necessarily aligns it to a foreign language word. Furthermore, since the embeddings are contextual and pre-trained with large amounts of data, they are robust to infrequent alignment pairs.

The difference in coverage between contextualized alignment and fast align is most noticeable for compounds. A German counterpart of the English non-tariff is nichttarifäre. While contextualized alignment aligns nichttarifäre to non, which is subsequently aligned to the concept "-" for polarity, fast align leaves nichttarifäre unaligned. Such a difference is evidenced in the parser performance on negations realized in diverse morphologies. Comparing the AMR1.0 parser performance on negations between fast align (Baseline III in Table 3) and the contextualized alignment (A.P in Table 3), we find that contextualized alignment consistently outperforms fast align across the three European target languages, as shown in Table 5.
Conclusion

In this paper we propose to use transformer-based multilingual word embeddings for annotation projection of AMR annotations. We show that our proposed procedure achieves results competitive with some of the classical methods for text-to-AMR alignment. We apply combination techniques to concept alignments and AMR parser training, which significantly improve performance over the base models. We also provide a detailed error analysis of multilingual AMR parsing.

Given pre-trained transformer-based multilingual word embeddings, contextual word alignment proves to be a useful avenue for overcoming differences amongst languages and addressing the multilingual AMR problem with weak supervision. Moreover, our annotation projection procedure not only achieves highly competitive performance for German, Spanish, Italian and Chinese, but also permits zero-shot learning for other languages included in the training set of the underlying XLM-R multilingual transformer.

Future work may include diversifying input texts using AMR2text generation (Mager et al., 2020), which can address the difference in results between machine-translated and human-translated test data. The potential of the AMR parser to overcome translation divergence also points to its utility in an end-to-end multilingual translation system, bypassing the need for supervised parallel corpora for machine translation system training.
Acknowledgement
We thank the anonymous reviewers for helpful suggestions. We also thank Revanth Reddy and Jason Furmanek for their varied inputs.
References
Roee Aharoni, Melvin Johnson, and Orhan Firat. 2019. Massively multilingual neural machine translation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3874–3884, Minneapolis, Minnesota. Association for Computational Linguistics.

Rafael Anchiêta and Thiago Pardo. 2018. Towards AMR-BR: A SemBank for Brazilian Portuguese language. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA).

Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2017. Learning bilingual word embeddings with (almost) no bilingual data. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 451–462, Vancouver, Canada. Association for Computational Linguistics.

Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2018. A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 789–798, Melbourne, Australia. Association for Computational Linguistics.

Miguel Ballesteros and Yaser Al-Onaizan. 2017. AMR parsing using stack-LSTMs. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1269–1275, Copenhagen, Denmark. Association for Computational Linguistics.

Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract Meaning Representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pages 178–186, Sofia, Bulgaria. Association for Computational Linguistics.

Rexhina Blloshmi, Rocco Tripodi, and Roberto Navigli. 2020. XL-AMR: Enabling cross-lingual AMR parsing with transfer learning techniques. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2487–2500, Online. Association for Computational Linguistics.

Peter Brown, John Cocke, Stephen Della Pietra, Vincent Della Pietra, Frederick Jelinek, John Lafferty, Robert Mercer, and Paul Roossin. 1990. A statistical approach to machine translation. Computational Linguistics, 16(2).

Peter Brown, Stephen Della Pietra, Vincent Della Pietra, and Robert Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2).

Shu Cai and Kevin Knight. 2013. Smatch: an evaluation metric for semantic feature structures. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 748–752, Sofia, Bulgaria. Association for Computational Linguistics.

Steven Cao, Nikita Kitaev, and Dan Klein. 2020. Multilingual alignment of contextual word representations. In Proceedings of the 8th International Conference on Learning Representations.

Wei-Te Chen and Martha Palmer. 2017. Unsupervised AMR-dependency parse alignment. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 558–567, Valencia, Spain. Association for Computational Linguistics.

Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116.

Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. 2018. Word translation without parallel data. In Proceedings of the 6th International Conference on Learning Representations.

Marco Damonte and Shay B. Cohen. 2018. Cross-lingual Abstract Meaning Representation parsing. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1146–1155, New Orleans, Louisiana. Association for Computational Linguistics.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Chris Dyer, Victor Chahuneau, and Noah A. Smith. 2013. A simple, fast, and effective reparameterization of IBM model 2. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 644–648, Atlanta, Georgia. Association for Computational Linguistics.

Ramon Fernandez Astudillo, Miguel Ballesteros, Tahira Naseem, Austin Blodgett, and Radu Florian. 2020. Transition-based parsing with stack-transformers. In Findings of EMNLP 2020 (to appear).

Jeffrey Flanigan, Sam Thomson, Jaime Carbonell, Chris Dyer, and Noah A. Smith. 2014. A discriminative graph-based parser for the Abstract Meaning Representation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1426–1436, Baltimore, Maryland. Association for Computational Linguistics.

Jan Hajič, Ondřej Bojar, and Zdeňka Urešová. 2014. Comparing Czech and English AMRs. In Proceedings of the Workshop on Lexical and Grammatical Resources for Language Processing, pages 55–64, Dublin, Ireland. Association for Computational Linguistics and Dublin City University.

Paul R. Kingsbury and Martha Palmer. 2002. From treebank to PropBank. In LREC, pages 1989–1993. Citeseer.

Ioannis Konstas, Srinivasan Iyer, Mark Yatskar, Yejin Choi, and Luke Zettlemoyer. 2017. Neural AMR: Sequence-to-sequence models for parsing and generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 146–157, Vancouver, Canada. Association for Computational Linguistics.

Young-Suk Lee, Ramon Fernandez Astudillo, Tahira Naseem, Revanth Gangi Reddy, Radu Florian, and Salim Roukos. 2020. Pushing the limits of AMR parsing with self-learning. In Findings of EMNLP 2020 (to appear).

Bin Li, Yuan Wen, Weiguang Qu, Lijun Bu, and Nianwen Xue. 2016. Annotating the Little Prince with Chinese AMRs. In Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016), pages 7–15, Berlin, Germany. Association for Computational Linguistics.

Chunchuan Lyu and Ivan Titov. 2018. AMR parsing as graph prediction with latent alignment. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 397–407, Melbourne, Australia. Association for Computational Linguistics.

Manuel Mager, Ramón Fernandez Astudillo, Tahira Naseem, Md Arafat Sultan, Young-Suk Lee, Radu Florian, and Salim Roukos. 2020. GPT-too: A language-model-first approach for AMR-to-text generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Seattle, USA. Association for Computational Linguistics.

Noelia Migueles-Abraira, Rodrigo Agerri, and Arantza Diaz de Ilarraza. 2018. Annotating Abstract Meaning Representations for Spanish. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA).

Tomas Mikolov, Quoc V. Le, and Ilya Sutskever. 2013. Exploiting similarities among languages for machine translation. arXiv preprint arXiv:1309.4168.

Tahira Naseem, Abhishek Shah, Hui Wan, Radu Florian, Salim Roukos, and Miguel Ballesteros. 2019. Rewarding Smatch: Transition-based AMR parsing with reinforcement learning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4586–4592, Florence, Italy. Association for Computational Linguistics.

Rik van Noord and Johan Bos. 2017. Neural semantic parsing by character-based translation: Experiments with abstract meaning representations. arXiv preprint arXiv:1705.09980.

Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–51.

Telmo Pires, Eva Schlinger, and Dan Garrette. 2019. How multilingual is multilingual BERT? arXiv preprint arXiv:1906.01502.

Nima Pourdamghani, Yang Gao, Ulf Hermjakob, and Kevin Knight. 2014. Aligning English strings with Abstract Meaning Representation graphs. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 425–429, Doha, Qatar. Association for Computational Linguistics.

Sebastian Ruder, Ivan Vulić, and Anders Søgaard. 2019. A survey of cross-lingual word embedding models. Journal of Artificial Intelligence Research, pages 569–630.

Tal Schuster, Ori Ram, Regina Barzilay, and Amir Globerson. 2019. Cross-lingual alignment of contextual word embeddings, with applications to zero-shot dependency parsing. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1599–1613, Minneapolis, Minnesota. Association for Computational Linguistics.

Samuel Smith, David Turban, Steven Hamblin, and Nils Hammerla. 2017. Offline bilingual word vectors, orthogonal transformations and the inverted softmax. In Proceedings of the 5th International Conference on Learning Representations.

Marco Antonio Sobrevilla Cabezudo and Thiago Pardo. 2019. Towards a general Abstract Meaning Representation corpus for Brazilian Portuguese. In Proceedings of the 13th Linguistic Annotation Workshop, pages 236–244, Florence, Italy. Association for Computational Linguistics.

Lucy Vanderwende, Arul Menezes, and Chris Quirk. 2015. An AMR parser for English, French, German, Spanish and Japanese and a new AMR-annotated corpus. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, pages 26–30, Denver, Colorado. Association for Computational Linguistics.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. arXiv preprint arXiv:1706.03762.

Nianwen Xue, Ondřej Bojar, Jan Hajič, Martha Palmer, Zdeňka Urešová, and Xiuhong Zhang. 2014. Not an interlingua, but close: Comparison of English AMRs to Chinese and Czech. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik, Iceland. European Language Resources Association (ELRA).