Publication


Featured research published by Andreas van Cranenburgh.


North American Chapter of the Association for Computational Linguistics | 2015

Multiword Expression Identification with Recurring Tree Fragments and Association Measures

Federico Sangati; Andreas van Cranenburgh

We present a novel approach for the identification of multiword expressions (MWEs). The methodology extracts a large set of recurring syntactic fragments from a given treebank using a Tree-Kernel method. Differently from previous studies, the expressions underlying these fragments are arbitrarily long and can include intervening gaps. In the initial study we use these fragments to identify MWEs as a parsing task (in a supervised manner) as proposed by Green et al. (2011). Here we obtain a small improvement over previous results. In the second part, we compare various association measures in reranking the expressions underlying these fragments in an unsupervised fashion. We show how a newly defined measure (Log Inside Ratio) based on statistical parsing techniques is able to outperform classical association measures in the French data.


North American Chapter of the Association for Computational Linguistics | 2015

Identifying Literary Texts with Bigrams

Andreas van Cranenburgh; Corina Koolen

We study perceptions of literariness in a set of contemporary Dutch novels. Experiments with machine learning models show that it is possible to automatically distinguish novels that are seen as highly literary from those that are seen as less literary, using surprisingly simple textual features. The most discriminating features of our classification model indicate that genre might be a confounding factor, but a regression model shows that we can also explain variation between highly literary novels from less literary ones within genre.


Journal of Language Modelling | 2016

Data-Oriented Parsing with Discontinuous Constituents and Function Tags

Andreas van Cranenburgh; Remko Scha; Rens Bod

Statistical parsers are effective but are typically limited to producing projective dependencies or constituents. On the other hand, linguistically rich parsers recognize non-local relations and analyze both form and function phenomena but rely on extensive manual grammar development. We combine advantages of the two by building a statistical parser that produces richer analyses. We investigate new techniques to implement treebank-based parsers that allow for discontinuous constituents. We present two systems. One system is based on a string-rewriting Linear Context-Free Rewriting System (LCFRS), while using a Probabilistic Discontinuous Tree Substitution Grammar (PDTSG) to improve disambiguation performance. Another system encodes the discontinuities in the labels of phrase structure trees, allowing for efficient context-free grammar parsing. The two systems demonstrate that tree fragments as used in tree-substitution grammar improve disambiguation performance while capturing non-local relations on an as-needed basis. Additionally, we present results of models that produce function tags, resulting in a more linguistically adequate model of the data. We report substantial accuracy improvements in discontinuous parsing for German, English, and Dutch, including results on spoken Dutch.


Meeting of the Association for Computational Linguistics | 2017

These are not the Stereotypes You are Looking For: Bias and Fairness in Authorial Gender Attribution

Corina Koolen; Andreas van Cranenburgh

Stylometric and text categorization results show that author gender can be discerned in texts with relatively high accuracy. However, it is difficult to explain what gives rise to these results and there are many possible confounding factors, such as the domain, genre, and target audience of a text. More fundamentally, such classification efforts risk invoking stereotyping and essentialism. We explore this issue in two datasets of Dutch literary novels, using commonly used descriptive (LIWC, topic modeling) and predictive (machine learning) methods. Our results show the importance of controlling for variables in the corpus and we argue for taking care not to overgeneralize from the results.


Conference of the European Chapter of the Association for Computational Linguistics | 2012

Efficient parsing with Linear Context-Free Rewriting Systems

Andreas van Cranenburgh


International Conference on Parsing Technologies | 2013

Discontinuous Parsing with an Efficient and Accurate DOP Model

Andreas van Cranenburgh; Rens Bod


North American Chapter of the Association for Computational Linguistics | 2012

Literary authorship attribution with phrase-structure fragments

Andreas van Cranenburgh


Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages | 2011

Discontinuous Data-Oriented Parsing: A mildly context-sensitive all-fragments grammar

Andreas van Cranenburgh; Remko Scha; Federico Sangati


Computational Linguistics in the Netherlands | 2014

Extraction of Phrase-Structure Fragments with a Linear Average Time Tree-Kernel

Andreas van Cranenburgh


North American Chapter of the Association for Computational Linguistics | 2013

From high heels to weed attics: a syntactic investigation of chick lit and literature

Kim Jautze; Corina Koolen; Andreas van Cranenburgh; Hayco de Jong

Collaboration


Dive into Andreas van Cranenburgh's collaborations.

Top Co-Authors

Corina Koolen

Royal Netherlands Academy of Arts and Sciences

Remko Scha

University of Amsterdam

Rens Bod

University of Amsterdam

Kim Jautze

Royal Netherlands Academy of Arts and Sciences

Dirk Roorda

Royal Netherlands Academy of Arts and Sciences

Gino Kalkman

VU University Amsterdam
