UniParse: A universal graph-based parsing toolkit
Daniel Varab and Natalie Schluter
IT University of Copenhagen, Denmark
{djam,natschluter}@itu.dk

Abstract
This paper describes the design and use of the graph-based parsing framework and toolkit UniParse, released as an open-source Python software package. As a framework, UniParse novelly streamlines the research prototyping, development, and evaluation of graph-based dependency parsing architectures. UniParse does this by enabling highly efficient, sufficiently independent, easily readable, and easily extensible implementations of all dependency parser components. We distribute the toolkit with ready-made configurations that re-implement all current state-of-the-art first-order graph-based parsers, including even more efficient Cython implementations of both encoders and decoders, as well as the required specialised loss functions.
Motivation.
While graph-based dependency parsers are simple interfaces, extensible and modular implementations for sustainable parser research and development have to date been severely lacking in the research community. Parsing research generally centres on particular parser components in isolation: for example, a novel decoding algorithm, the encoding of new features, or a new learning algorithm. However, due to perceived gains in performance, or due to a lack of foresight in writing sustainable code, these components are rarely implemented modularly or with a view to extensibility. Both in previous sparse-feature graph-based dependency parsers (such as McDonald and Pereira (2006)'s MST parser) and in recent state-of-the-art neural parsers (specifically Kiperwasser and Goldberg (2016)'s and Dozat and Manning (2017)'s neural parsers), implementations of parser components are generally hard-coupled with each other.

Additionally, dependency parsers are often evaluated using subtly differently interpreted metrics, different data preprocessing choices, and different target hardware. This persistently inadequate setting for comparing parser architectures means that the effect of different design choices is often impossible to gauge properly.

With UniParse, we provide a flexible, highly expressive scientific framework for easy, low-barrier-of-entry, highly modular, and highly efficient development and fair benchmarking of graph-based dependency parsing architectures. Additionally, the framework is pre-configured with current state-of-the-art first-order sparse-feature and neural graph-based parser implementations.
Novel contributions.

• We align sparse-feature and neural research in graph-based dependency parsing to a common terminology. With this shared terminology we develop a unified framework for the UniParse toolkit, with which new parsers can be rapidly prototyped and their performance easily compared to previous work.

• Prototyping is now rapid due to modularity: parser components can be developed in isolation, with no resulting loss in efficiency. For example, measuring the empirical performance of a new decoder no longer requires implementing an encoder too, and investigating the synergy between a learning strategy and a decoder no longer requires more than a flag or a library function call.

• Preprocessing is now made explicit within its own component and is thereby adequately isolated and portable.

• The evaluation module is now easy to read and fully specified.
We specify the subtle differences in computing UAS and LAS in previous literature and have implemented these in UniParse in an explicit way.

• To the best of our knowledge, UniParse is the first attempt at unifying existing dependency parsers in the same code base. Moreover, UniParse appears to be the first attempt to enable state-of-the-art first-order sparse-feature dependency parsing within a Python environment.

We make the parser freely available under a GNU General Public License. A demonstration of how to easily integrate two recent, more complex embedding components, ELMo embeddings (Peters et al., 2018) and TCN representations (Bai et al., 2018), is also made available.

Unified terminology. Traditionally, a graph-based dependency parser consists of three components: an encoder Γ, a set of parameters λ, and a decoder h. The possible dependency relations between all words of a sentence S can be modelled as a complete directed graph G_S in which the words are nodes and the arcs are the relations. To sub-sets of arcs from G_S (called factors), Γ associates a d-dimensional feature vector, its encoding. The set of parameters λ is then used to produce scores from the constructed feature vectors according to some learning architecture. These parameters are optimised over treebanks. Lastly, the decoder h is some maximum spanning tree algorithm whose input is G_S together with the scores that λ assigns to the factors of G_S; it outputs a well-formed dependency tree, which is the raw output of a dependency model.

Recent work on neural dependency parsers learns factor embeddings discriminatively alongside the parameters used for scoring. The result is that the Γ and λ of dependency parsers fuse into a union of parameters. Thus, in this work we fold the notion of encoding into the parameter space.
Now, for the neural models all parameters are trainable, whereas for sparse-feature models the encodings of sub-sets of arcs are non-trainable. So the unified terminology addresses only the parameters λ and a decoder h.

The toolkit is available at https://github.com/ITUnlp/UniParse, with extensions at https://github.com/danielvarab/UniParse-extensions.

We provide two abstractions for implementing graph-based dependency parsers. First, our descriptive high-level approach focuses on expressiveness, enabling models to be described in just a few lines of code by providing an interface where the required code is minimal, only a means to configure design choices. Second, as an alternative to the high-level abstraction, we emphasise that parser definition is nothing more than a composition of pre-configured low-level modular implementations. With this we invite cherry-picking from the included implementations of optimised decoders, data preprocessors, the evaluation module, and more. We now briefly overview the basic use of the unified API and list the central low-level module implementations included with the UniParse toolkit.
Elementary usage.
For ease of use we provide a high-level class that encapsulates all components of a parser. Its use results in a significant reduction in the amount of code required to implement a parser and counters unwanted boilerplate code. It provides default best-practice configurations for all included components, while enabling custom implementations whenever needed, so long as the component is callable and adheres to the framework's function definition for that component. The minimum requirements for this interface are: a decoder, a loss function, an optimiser, and a batch strategy. Figure 1 shows an implementation of Kiperwasser and Goldberg (2016)'s neural parser in only a few lines. The full list of possible arguments along with their interfaces can be found in the toolkit documentation.
Vocabulary.
This module preprocesses a CoNLL-U formatted dataset and provides a mapping and lookup from tokens to identifiers, with support for alignment to pre-trained word embeddings. Text preprocessing strategies have a significant impact on NLP model performance. Despite this, little effort has been put into describing such techniques in recent literature, which obfuscates where a model's contribution actually lies. In the UniParse toolkit, we have included implementations of techniques recently employed within parsing for cleaning and preprocessing during the tokenisation stage.
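To illustrate the token-to-identifier mapping this module provides, here is a minimal vocabulary sketch. The class and method names (other than fit, which appears in Figure 1) are our own and do not reproduce UniParse's actual Vocabulary API; lowercasing is shown as one common normalisation choice, not necessarily the toolkit's.

```python
# Illustrative sketch of a token-to-id vocabulary; not UniParse's API.
class SimpleVocabulary:
    def __init__(self, oov_token="<unk>"):
        # Id 0 is reserved for out-of-vocabulary tokens.
        self.token2id = {oov_token: 0}
        self.oov_id = 0

    def fit(self, sentences):
        # Assign a unique id to every token seen in the training data.
        for sentence in sentences:
            for token in sentence:
                token = token.lower()  # one common normalisation choice
                if token not in self.token2id:
                    self.token2id[token] = len(self.token2id)
        return self

    def transform(self, sentence):
        # Unseen tokens fall back to the OOV id.
        return [self.token2id.get(t.lower(), self.oov_id) for t in sentence]


vocab = SimpleVocabulary().fit([["The", "cat", "sat"]])
ids = vocab.transform(["the", "dog", "sat"])
```

A real vocabulary component would additionally align ids to rows of a pre-trained embedding matrix, as described above.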
Data Provider.
Figure 1 (code listing): implementation of Kiperwasser and Goldberg (2016)'s neural parser in only a few lines using UniParse:

    vocab = Vocabulary()
    vocab = vocab.fit(train)
    parser = CustomParser()
    model = Model(parser,
                  decoder="eisner",
                  loss="hinge",
                  optimizer="adam",
                  strategy="bucket",
                  vocab=vocab)
    model.train(train, dev, epochs=30, batch_size=32)
    trees = model.run(test)

Table 1 and the accompanying figure report the seconds each decoder takes to decode an entire dataset, given a set of scores (columns: Algorithm, en ud, en ptb, sents/s; only the generic Eisner row, 96.35 and 479.1, survives the extraction). Score matrix entries are generated uniformly on [0, ). The randomly generated data has an impact on CLE, since worst-case performance depends on the sorting bottleneck; the figure demonstrates this through the increasingly broad standard-deviation band. Experiments are run on an Ubuntu machine with an Intel Xeon E5-2660, 2.60GHz CPU.

This module organises the tokenised data into batches for efficient learning according to user-specified arguments. We provide several implementations of different batching strategies: (1) batching by sentence length (bucketing), (2) fixed-size batching with padding, and (3) scaled padded batching through approximate clustering (Dozat and Manning, 2017).
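Strategy (1), bucketing by sentence length, can be sketched as follows. This is an illustrative sketch under our own naming, not UniParse's data-provider code; its point is that grouping equal-length sentences removes the need for padding within a batch.

```python
from collections import defaultdict


def bucket_batches(sentences, batch_size):
    """Group sentences of equal length into batches (bucketing).

    Within a batch all sentences share one length, so no padding is
    needed. Illustrative sketch, not UniParse's implementation.
    """
    buckets = defaultdict(list)
    for sent in sentences:
        buckets[len(sent)].append(sent)
    batches = []
    for length in sorted(buckets):          # deterministic bucket order
        bucket = buckets[length]
        for i in range(0, len(bucket), batch_size):
            batches.append(bucket[i:i + batch_size])
    return batches


batches = bucket_batches([["a"], ["b", "c"], ["d", "e"], ["f"]], batch_size=2)
```

Fixed-size batching with padding (strategy (2)) would instead cut the corpus into equal-count batches and pad every sentence to the batch maximum length.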
Decoders. We include optimised Cython implementations of first-order decoders with the toolkit (as well as Python versions of these for comparison), including both our own and "generic" implementations: Eisner's algorithm (Eisner, 1996) and Chu-Liu-Edmonds (CLE) (Chu and Liu, 1965; Edmonds, 1967; Zwick, 2013). We compare the Cython implementations in Table 1 and Figure 1 over randomised score input. Note that our implementations are significantly faster.
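To make concrete what such a decoder computes, here is a compact pure-Python sketch of projective decoding in the style of Eisner's algorithm, taking an arc-score matrix and returning the highest-scoring projective tree. It is our own expository implementation, far slower than the toolkit's Cython decoders.

```python
def eisner(scores):
    """Projective first-order decoding in the style of Eisner (1996).

    scores[h][m] is the score of an arc from head h to modifier m, with
    token 0 acting as the artificial root. Returns a list of heads, with
    heads[0] = -1 for the root position.
    """
    n = len(scores)
    # C[d][s][t]: best complete span; I[d][s][t]: best incomplete span.
    # d = 0: head on the right (left-pointing arcs); d = 1: head on the
    # left. Cb / Ib store the split-point backpointers.
    C = [[[0.0] * n for _ in range(n)] for _ in range(2)]
    I = [[[0.0] * n for _ in range(n)] for _ in range(2)]
    Cb = [[[0] * n for _ in range(n)] for _ in range(2)]
    Ib = [[[0] * n for _ in range(n)] for _ in range(2)]
    for k in range(1, n):
        for s in range(n - k):
            t = s + k
            # Incomplete spans: pick a split, then add arc t->s or s->t.
            r = max(range(s, t), key=lambda q: C[1][s][q] + C[0][q + 1][t])
            base = C[1][s][r] + C[0][r + 1][t]
            I[0][s][t] = base + scores[t][s]
            I[1][s][t] = base + scores[s][t]
            Ib[0][s][t] = Ib[1][s][t] = r
            # Complete spans: merge an incomplete with a complete span.
            r = max(range(s, t), key=lambda q: C[0][s][q] + I[0][q][t])
            C[0][s][t] = C[0][s][r] + I[0][r][t]
            Cb[0][s][t] = r
            r = max(range(s + 1, t + 1),
                    key=lambda q: I[1][s][q] + C[1][q][t])
            C[1][s][t] = I[1][s][r] + C[1][r][t]
            Cb[1][s][t] = r

    heads = [-1] * n

    def backtrack(s, t, d, complete):
        if s == t:
            return
        if complete:
            r = Cb[d][s][t]
            if d == 0:
                backtrack(s, r, 0, True)
                backtrack(r, t, 0, False)
            else:
                backtrack(s, r, 1, False)
                backtrack(r, t, 1, True)
        else:
            # Each incomplete span contributes exactly one arc.
            heads[s if d == 0 else t] = t if d == 0 else s
            r = Ib[d][s][t]
            backtrack(s, r, 1, True)
            backtrack(r + 1, t, 0, True)

    backtrack(0, n - 1, 1, True)
    return heads


# Toy example: root (token 0) -> token 1 -> token 2.
heads = eisner([[0, 10, 0], [0, 0, 10], [0, 0, 0]])
```

CLE solves the non-projective variant of the same problem, finding a maximum spanning arborescence over the full directed graph.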
Evaluation. Unlabeled attachment score (UAS) and labeled attachment score (LAS) are central dependency parser performance metrics, measuring unlabeled and labeled arc accuracy respectively:

    UAS = (# tokens with the correct head) / (# tokens)
    LAS = (# tokens with the correct head and label) / (# tokens)

Unfortunately, there are also a number of unreported preprocessing choices preceding the application of these metrics, which renders direct comparison of parser performance in the literature futile, regardless of how well-motivated these preprocessing choices are. These choices are generally discovered by manually screening the code implementations, when these implementations are made available to the research community. Two important variations found in state-of-the-art parser evaluation are the following.

[Footnote: The approximate-clustering batching strategy is not explained in the paper, but may be observed from the published TensorFlow implementation; for a description, we refer to the toolkit's README.]
[Footnote: The "generic" decoder implementations are available from the Lisbon Machine Learning Summer School's public github repository: https://github.com/LxMLS/lxmls-toolkit/blob/1bdc382e509d24b24f581c1e1d78728c9e739169/lxmls/parsing/dependency_decoder.py]

1. Punctuation removal.
Arcs incoming to any punctuation token are sometimes removed. Moreover, the definition of punctuation is not universally shared. We provide a clear Python implementation of these metrics with and without punctuation-arc deletion before application, where the definition of punctuation is clear: punctuation refers to tokens consisting of characters complying with the Unicode punctuation standard. This is the strategy employed by the widely used Perl evaluation script, which, to our knowledge, originates from the CoNLL 2006 and 2007 shared tasks; we infer this from references in (Buchholz and Marsi, 2006).

    Parser configuration             Dataset  UAS (orig.)  LAS (orig.)  UAS n.p.  LAS n.p.  UAS w.p.  LAS w.p.
    Kiperwasser and Goldberg (2016)  en ud    —            —            87.71     84.83     86.80     85.12
                                     en ptb   93.32        91.2         93.14     91.57     92.56     91.17
                                     da       —            —            83.72     79.49     83.24     79.62
    Dozat and Manning (2017)         en ud    —            —            91.47     89.38     90.74     89.01
                                     en ptb   95.74        95.74        95.43     94.06     94.91     93.70
                                     da       —            —            87.84     84.99     87.42     84.98
    MSTparser (2006) + extensions    en ud    —            —            75.55     66.25     73.47     65.20
                                     en ptb   —            —            76.07     64.67     74.00     63.60
                                     da       —            —            68.80     55.30     67.17     55.52

Table 2: UAS/LAS for the included parser configurations. We provide results with (w.p.) and without (n.p.) punctuation, alongside the originally reported scores. For the English Universal Dependencies (UD) dataset we exclude the github repository suffix EWT. Regarding Dozat and Manning (2017), despite having access to the published TensorFlow code, we never observed scores exceeding 95.58. Scores for the neural parsers are averages of 10 runs for the Kiperwasser and Goldberg (2016) reimplementation and 3 runs for the Dozat and Manning (2017) reimplementation; this difference in the number of runs reflects the running time of the corresponding parsers.

2.
Label prefixing.
Some arc labels are "composite", with their components separated by a colon; an example from the English Universal Dependencies data set is the label obl:tmod. The official CoNLL 2017 shared-task evaluation script allows partial matching of labels, for example matching the non-language-specific label prefix obl within the language-specific label obl:tmod for full points. We include this variant in UniParse's evaluation module.
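The metrics and evaluation variants described above can be sketched as follows. These are illustrative reimplementations under our own function names, not the toolkit's evaluation module itself.

```python
import unicodedata


def attachment_scores(gold_heads, gold_labels, pred_heads, pred_labels):
    """UAS and LAS: the fraction of tokens whose head (and, for LAS,
    also whose label) is predicted correctly."""
    n = len(gold_heads)
    uas = sum(g == p for g, p in zip(gold_heads, pred_heads)) / n
    las = sum(
        gh == ph and gl == pl
        for gh, gl, ph, pl in zip(gold_heads, gold_labels,
                                  pred_heads, pred_labels)
    ) / n
    return uas, las


def is_punctuation(token):
    """True when every character falls in a Unicode punctuation
    category; the definition used for punctuation-arc deletion."""
    return all(unicodedata.category(ch).startswith("P") for ch in token)


def label_matches(gold_label, pred_label):
    """Partial matching of composite labels: only the prefix before the
    first colon is compared, so "obl" matches "obl:tmod"."""
    return gold_label.split(":", 1)[0] == pred_label.split(":", 1)[0]


# Head of token 2 and its label differ between gold and prediction.
uas, las = attachment_scores([2, 0, 2], ["nsubj", "root", "obj"],
                             [2, 0, 2], ["nsubj", "root", "iobj"])
```

Punctuation-free variants of UAS/LAS would simply drop tokens for which is_punctuation is true before computing the two ratios.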
Callbacks. We include a number of useful callback utilities, such as a Tensorboard logger and a patience mechanism for early stopping, together with a model saver over iterations.
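A patience mechanism of the kind mentioned above can be sketched as follows; the class name and interface are our own illustration, not UniParse's callback API.

```python
class Patience:
    """Early-stopping sketch: signal a stop when the development score
    has not improved for `patience` consecutive evaluations."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def should_stop(self, dev_score):
        # Reset the counter on improvement, otherwise accumulate.
        if dev_score > self.best:
            self.best = dev_score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = Patience(patience=2)
```

A training loop would call should_stop once per epoch with the current development UAS/LAS and break when it returns True, typically restoring the last saved best model.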
Loss Functions. State-of-the-art sparse-feature parsers (and UniParse's included specification of one) evade direct loss computation (for example, using insights by Crammer and Singer (2003)), instead computing parameter adjustments directly from feature vectors. To train neural parser models, we formalise a function interface and provide a set of common loss functions. The possibilities for loss in graph-based neural dependency models have not been greatly explored: rather than a typical loss calculated over all predictions, in parsing the loss has been computed over only a subset of the predicted score matrix. We require a loss function to adhere to the type definition loss = f(scores, y_p, y_g), where scores is the score tensor produced by the neural model, y_p an optional predicted tree, and y_g the gold tree.

[Footnote: https://depparse.uvt.nl/SoftwarePage.html]
[Footnote: http://universaldependencies.org/conll17/baseline.html]
[Footnote: https://github.com/tensorflow/tensorboard]
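One possible instantiation of the loss = f(scores, y_p, y_g) interface is a structured hinge over arc scores. The sketch below is our own illustration of the interface shape and does not reproduce any of the toolkit's included loss functions; it assumes trees are given as head lists with -1 marking the root position.

```python
def hinge_loss(scores, y_pred, y_gold):
    """Structured hinge over arc scores, one instance of the interface
    loss = f(scores, y_p, y_g).

    scores[h][m] is the arc-score matrix; y_pred is a (typically
    cost-augmented) predicted head list and y_gold the gold head list,
    both using -1 for the artificial root token.
    """
    # Sum the scores of the arcs in each tree, skipping the root slot.
    pred = sum(scores[h][m] for m, h in enumerate(y_pred) if h >= 0)
    gold = sum(scores[h][m] for m, h in enumerate(y_gold) if h >= 0)
    # Zero loss once the gold tree outscores the predicted one.
    return max(0.0, pred - gold)


scores = [[0, 1, 4], [0, 0, 1], [0, 2, 0]]
loss = hinge_loss(scores, y_pred=[-1, 2, 0], y_gold=[-1, 0, 1])
```

Note that, as described above, this loss touches only the score-matrix entries belonging to the two trees, not the full matrix.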
Included parser configurations. We include three state-of-the-art first-order dependency parser implementations as example configurations of UniParse: a reimplementation of McDonald and Pereira (2006)'s MST sparse-feature parser, and Kiperwasser and Goldberg (2016)'s and Dozat and Manning (2017)'s respective graph-based neural parsers. Experiments are carried out on English and Danish: the Penn Treebank (Marcus et al., 1994) (en ptb; training on sections 2-21, development on section 22, and testing on section 23), converted to dependency format following the default configuration of the Stanford Dependency converter (version >= ).

[Footnote: This MST parser implementation consists of a restricted feature set and is only a first-order parser, as a proof of concept.]

Conclusion. In this paper, we have described the design and usage of UniParse, a high-level, un-opinionated framework and toolkit that supports both feature-based models with online learning techniques and recent neural architectures trained through backpropagation. We have presented the framework as an answer to a long-standing need for highly efficient, easily extensible, and, most of all, directly comparable graph-based dependency parsing research.

References
S. Bai, J. Zico Kolter, and V. Koltun. 2018. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. ArXiv e-prints.

Sabine Buchholz and Erwin Marsi. 2006. CoNLL-X shared task on multilingual dependency parsing. In Proceedings of CoNLL, pages 149-164. Association for Computational Linguistics.

Y.J. Chu and T.H. Liu. 1965. On the shortest arborescence of a directed graph. Sci. Sinica, 14:1396-1400.

Koby Crammer and Yoram Singer. 2003. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3:951-991.

Timothy Dozat and Christopher M. Manning. 2017. Deep biaffine attention for neural dependency parsing. In Proceedings of ICLR.

J. Edmonds. 1967. Optimum branchings. J. Res. Nat. Bur. Standards, 71B:233-240.

Jason M. Eisner. 1996. Three new probabilistic models for dependency parsing: An exploration. In Proceedings of the 16th Conference on Computational Linguistics - Volume 1, COLING '96, pages 340-345, Stroudsburg, PA, USA. Association for Computational Linguistics.

Eliyahu Kiperwasser and Yoav Goldberg. 2016. Simple and accurate dependency parsing using bidirectional LSTM feature representations. Transactions of the ACL, 4:313-327.

Mitchell Marcus, Grace Kim, Mary Ann Marcinkiewicz, Robert MacIntyre, Ann Bies, Mark Ferguson, Karen Katz, and Britta Schasberger. 1994. The Penn Treebank: Annotating predicate argument structure. In Proceedings of the Workshop on Human Language Technology, HLT '94, pages 114-119, Stroudsburg, PA, USA. Association for Computational Linguistics.

Ryan McDonald and Fernando Pereira. 2006. Online learning of approximate dependency parsing algorithms. In Proceedings of EACL. Association for Computational Linguistics.

Joakim Nivre et al. 2017. Universal Dependencies 2.1. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In
Proceedings of EACL . Association forComputational Linguistics.Joakim Nivre et al. 2017. Universal dependencies 2.1.LINDAT/CLARIN digital library at the Institute ofFormal and Applied Linguistics ( ´UFAL), Faculty ofMathematics and Physics, Charles University.Matthew E. Peters, Mark Neumann, Mohit Iyyer, MattGardner, Christopher Clark, Kenton Lee, and LukeZettlemoyer. 2018. Deep contextualized word rep-resentations. In