Andreas van Cranenburgh
Royal Netherlands Academy of Arts and Sciences
Publication
Featured research published by Andreas van Cranenburgh.
north american chapter of the association for computational linguistics | 2015
Federico Sangati; Andreas van Cranenburgh
We present a novel approach for the identification of multiword expressions (MWEs). The methodology extracts a large set of recurring syntactic fragments from a given treebank using a Tree-Kernel method. Unlike in previous studies, the expressions underlying these fragments are arbitrarily long and can include intervening gaps. In the initial study we use these fragments to identify MWEs as a parsing task (in a supervised manner) as proposed by Green et al. (2011). Here we obtain a small improvement over previous results. In the second part, we compare various association measures in reranking the expressions underlying these fragments in an unsupervised fashion. We show how a newly defined measure (Log Inside Ratio) based on statistical parsing techniques is able to outperform classical association measures on the French data.
north american chapter of the association for computational linguistics | 2015
Andreas van Cranenburgh; Corina Koolen
We study perceptions of literariness in a set of contemporary Dutch novels. Experiments with machine learning models show that it is possible to automatically distinguish novels that are seen as highly literary from those that are seen as less literary, using surprisingly simple textual features. The most discriminating features of our classification model indicate that genre might be a confounding factor, but a regression model shows that we can also explain variation between highly literary and less literary novels within a genre.
Journal of Language Modelling | 2016
Andreas van Cranenburgh; Remko Scha; Rens Bod
Statistical parsers are effective but are typically limited to producing projective dependencies or constituents. On the other hand, linguistically rich parsers recognize non-local relations and analyze both form and function phenomena but rely on extensive manual grammar development. We combine advantages of the two by building a statistical parser that produces richer analyses. We investigate new techniques to implement treebank-based parsers that allow for discontinuous constituents. We present two systems. One system is based on a string-rewriting Linear Context-Free Rewriting System (LCFRS), while using a Probabilistic Discontinuous Tree Substitution Grammar (PDTSG) to improve disambiguation performance. Another system encodes the discontinuities in the labels of phrase structure trees, allowing for efficient context-free grammar parsing. The two systems demonstrate that tree fragments as used in tree-substitution grammar improve disambiguation performance while capturing non-local relations on an as-needed basis. Additionally, we present results of models that produce function tags, resulting in a more linguistically adequate model of the data. We report substantial accuracy improvements in discontinuous parsing for German, English, and Dutch, including results on spoken Dutch.
meeting of the association for computational linguistics | 2017
Corina Koolen; Andreas van Cranenburgh
Stylometric and text categorization results show that author gender can be discerned in texts with relatively high accuracy. However, it is difficult to explain what gives rise to these results and there are many possible confounding factors, such as the domain, genre, and target audience of a text. More fundamentally, such classification efforts risk invoking stereotyping and essentialism. We explore this issue in two datasets of Dutch literary novels, using common descriptive (LIWC, topic modeling) and predictive (machine learning) methods. Our results show the importance of controlling for variables in the corpus and we argue for taking care not to overgeneralize from the results.
conference of the european chapter of the association for computational linguistics | 2012
Andreas van Cranenburgh
International Conference on Parsing Technologies | 2013
Andreas van Cranenburgh; Rens Bod
north american chapter of the association for computational linguistics | 2012
Andreas van Cranenburgh
Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages | 2011
Andreas van Cranenburgh; Remko Scha; Federico Sangati
computational linguistics in the netherlands | 2014
Andreas van Cranenburgh
north american chapter of the association for computational linguistics | 2013
Kim Jautze; Corina Koolen; Andreas van Cranenburgh; Hayco de Jong