Johannes Bjerva
University of Groningen
Publications
Featured research published by Johannes Bjerva.
International Conference on Computational Linguistics | 2014
Johannes Bjerva; Johan Bos; Rob van der Goot; Malvina Nissim
Shared Task 1 of SemEval-2014 comprised two subtasks on the same dataset of sentence pairs: recognizing textual entailment and determining textual similarity. We used an existing system based on formal semantics and logical inference to participate in the first subtask, reaching an accuracy of 82% and ranking in the top 5 of more than twenty participating systems. For determining semantic similarity we took a supervised approach using a variety of features, most of which were produced by our system for recognizing textual entailment. In this subtask our system achieved a mean squared error of 0.322, the best of all participating systems.
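The similarity subtask described above fits a standard pattern: extract pairwise features from each sentence pair, then fit a regressor against gold similarity scores. The sketch below is a minimal, hypothetical illustration of that pattern, not the authors' actual system; the token-overlap feature and the single-feature closed-form regression are assumptions chosen for simplicity.

```python
def overlap_feature(s1, s2):
    """Jaccard overlap of the two sentences' token sets (hypothetical feature)."""
    t1, t2 = set(s1.lower().split()), set(s2.lower().split())
    return len(t1 & t2) / len(t1 | t2)

def fit_simple_regression(xs, ys):
    """Closed-form least squares for one feature: y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Train on feature values and gold similarity scores, then predict for a new pair.
xs = [overlap_feature("a man is walking", "a man walks"),
      overlap_feature("a man is walking", "stocks fell sharply")]
a, b = fit_simple_regression(xs, [4.5, 0.5])
predict = lambda s1, s2: a * overlap_feature(s1, s2) + b
```

A real system would use many such features and a stronger regressor, but the train/predict split is the same.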
Handbook of Linguistic Annotation | 2017
Johan Bos; Valerio Basile; Kilian Evang; Noortje Venhuizen; Johannes Bjerva
The goal of the Groningen Meaning Bank (GMB) is to obtain a large corpus of English texts annotated with formal meaning representations. Since manually annotating a comprehensive corpus with deep semantic representations is a hard and time-consuming task, we employ a sophisticated bootstrapping approach. This method employs existing language technology tools (for segmentation, part-of-speech tagging, named entity tagging, animacy labelling, syntactic parsing, and semantic processing) to get a reasonable approximation of the target annotations as a starting point. The machine-generated annotations are then refined by information obtained from both expert linguists (using a wiki-like platform) and crowd-sourcing methods (in the form of a ‘Game with a Purpose’) which help us in deciding how to resolve syntactic and semantic ambiguities. The result is a semantic resource that integrates various linguistic phenomena, including predicate-argument structure, scope, tense, thematic roles, rhetorical relations and presuppositions. The semantic formalism that brings all levels of annotation together in one meaning representation is Discourse Representation Theory, which supports meaning representations that can be translated to first-order logic. In contrast to ordinary treebanks, the units of annotation in the GMB are texts, rather than isolated sentences. The current version of the GMB contains more than 10,000 public domain texts aligned with Discourse Representation Structures, and is freely available for research purposes.
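The bootstrapping approach above layers human corrections over machine-generated annotations. As a rough sketch of that idea (the merge policy and data shapes here are assumptions, not the GMB's actual implementation), per-token tags can be resolved with a simple precedence rule:

```python
def merge_annotations(machine, crowd, expert):
    """Resolve per-token tags, preferring expert > crowd > machine (assumed policy).

    machine: list of automatically produced tags, one per token.
    crowd, expert: sparse corrections as {token_index: tag}.
    """
    return [expert.get(i, crowd.get(i, tag)) for i, tag in enumerate(machine)]

tags = merge_annotations(["DT", "NN", "VBZ"], crowd={1: "NNS"}, expert={2: "VBD"})
```

The machine output serves as the default, so the corpus stays fully annotated even where no human judgment is available yet.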
North American Chapter of the Association for Computational Linguistics | 2016
Johannes Bjerva; Johan Bos; Hessel Haagsma
We participated in the shared task on meaning representation parsing (Task 8 at SemEval-2016) with the aim of investigating whether we could use Boxer, an existing open-domain semantic parser, for this task. However, the meaning representations produced by Boxer, Discourse Representation Structures, are considerably different from Abstract Meaning Representations, AMRs, the target meaning representations of the shared task. Our hybrid conversion method (involving lexical adaptation as well as post-processing of the output) failed to produce state-of-the-art results. Nonetheless, F-scores of 53% on development and 47% on test data (50% unofficially) were obtained.
Proceedings of the Fourth Workshop on Metaphor in NLP | 2016
Hessel Haagsma; Johannes Bjerva
Recent work on metaphor processing often employs selectional preference information. We present a comparison of different approaches to the modelling of selectional preferences, based on various ways of generalizing over corpus frequencies. We evaluate on the VU Amsterdam Metaphor corpus, a broad corpus of metaphor. We find that using only selectional preference information is enough to outperform an all-metaphor baseline classification, but that generalization through prediction or clustering is not beneficial. A possible explanation for this lies in the nature of the evaluation data, and lack of power of selectional preference information on its own for non-novel metaphor detection. To better investigate the role of metaphor type in metaphor detection, we suggest a resource with annotation of novel metaphor should be created.
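Selectional preference modelling of the kind compared above starts from corpus frequencies: how often a verb takes a given argument in a given slot. The snippet below is a minimal, hypothetical sketch of the raw-frequency baseline (estimating P(argument | verb) from co-occurrence counts); the generalization steps the paper evaluates (prediction, clustering) would replace these raw counts.

```python
from collections import Counter, defaultdict

def selectional_preference(pairs):
    """Estimate P(argument | verb) from observed (verb, argument) pairs."""
    counts = defaultdict(Counter)
    for verb, arg in pairs:
        counts[verb][arg] += 1

    def prob(verb, arg):
        total = sum(counts[verb].values())
        return counts[verb][arg] / total if total else 0.0

    return prob

# A low conditional probability for an observed pair is one possible
# signal of non-literal (metaphorical) usage.
prob = selectional_preference([("eat", "apple"), ("eat", "bread"), ("eat", "idea")])
```

With this baseline, `prob("eat", "idea")` is low relative to literal arguments, which is the intuition behind using selectional preferences for metaphor detection.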
SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities | 2015
Johannes Bjerva; Raf Praet
Continuous space representations of words are currently at the core of many state-of-the-art approaches to problems in natural language processing. In spite of several advantages of using such methods, they have seen little use within the digital humanities. In this paper, we show a case study of how such models can be used to find interesting relationships within the field of late antiquity. We use a word2vec model trained on over one billion words of Latin to investigate the relationships between persons and concepts of interest in the works of the 6th-century scholar Cassiodorus. The results show that the method has high potential to aid the humanities scholar, but caution must be taken, as the analysis still requires assessment by the traditional historian.
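The core query in such a study is nearest-neighbour search over word vectors: rank all words by cosine similarity to a target word. The sketch below shows that operation in plain Python over toy two-dimensional vectors; the vocabulary and vectors are invented for illustration (in practice one would load a trained model, e.g. via a library such as gensim, and use hundreds of dimensions).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def most_similar(target, vectors, topn=3):
    """Rank all other words by cosine similarity to the target word's vector."""
    sims = [(w, cosine(vectors[target], vec))
            for w, vec in vectors.items() if w != target]
    return sorted(sims, key=lambda pair: -pair[1])[:topn]

# Toy vectors (illustrative only): two scholars close together, one distant word.
toy = {"cassiodorus": [1.0, 0.0], "boethius": [0.9, 0.1], "apple": [0.0, 1.0]}
```

Queries like `most_similar("cassiodorus", toy)` surface the candidate relationships that the historian then has to assess.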
Conference of the European Chapter of the Association for Computational Linguistics | 2014
Johannes Bjerva
Animacy is the semantic property of nouns denoting whether an entity can act, or is perceived as acting, of its own will. This property is marked grammatically in various languages, albeit rarely in English. It has recently been highlighted as a relevant property for NLP applications such as parsing and anaphora resolution. In order for animacy to be used in conjunction with other semantic features for such applications, appropriate data is necessary. However, the few corpora which do contain animacy annotation rarely contain much other semantic information. The addition of such an annotation layer to a corpus already containing deep semantic annotation should therefore be of particular interest. The work presented in this paper contains three main contributions. Firstly, we improve upon the state of the art in multiclass animacy classification. Secondly, we use this classifier to contribute to the annotation of an openly available corpus containing deep semantic annotation. Finally, we provide source code, as well as trained models and scripts needed to reproduce the results presented in this paper, or aid in annotation of other texts.
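Multiclass animacy classification assigns each noun a label such as HUMAN, ANIMAL, or INANIMATE. The toy classifier below is a hypothetical baseline for illustration only (lexicon lookup plus a crude suffix heuristic), not the trained model the paper describes; the label set and fallback rule are assumptions.

```python
def classify_animacy(noun, lexicon):
    """Toy animacy baseline: lexicon lookup with a morphological fallback.

    lexicon: {noun: label}, e.g. built from annotated training data.
    """
    noun = noun.lower()
    if noun in lexicon:
        return lexicon[noun]
    # Agentive suffixes (-er, -or, -ist) often mark human referents in English.
    if noun.endswith(("er", "or", "ist")):
        return "HUMAN"
    return "INANIMATE"

lexicon = {"dog": "ANIMAL", "committee": "ORGANIZATION"}
```

A baseline like this makes the task concrete; the paper's contribution is a statistical classifier that improves over such heuristics.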
Conference on Computational Natural Language Learning | 2017
Robert Östling; Johannes Bjerva
This paper describes the Stockholm University/University of Groningen (SU-RUG) system for the SIGMORPHON 2017 shared task on morphological inflection. Our system is based on an attentional sequence-to-sequence neural network model using Long Short-Term Memory (LSTM) cells, with joint training of morphological inflection and the inverse transformation, i.e. lemmatization and morphological analysis. Our system outperforms the baseline by a large margin, and our submission ranks as the 4th-best team for the track we participated in (Task 1, high-resource).
International Conference on Computational Linguistics | 2016
Johannes Bjerva; Barbara Plank; Johan Bos
Conference of the European Chapter of the Association for Computational Linguistics | 2017
Lasha Abzianidze; Johannes Bjerva; Kilian Evang; Hessel Haagsma; Rik van Noord; Pierre Ludmann; Duc-Duy Nguyen; Johan Bos
arXiv: Computation and Language | 2017
Johannes Bjerva