Tanja Samardzic
University of Geneva
Publication
Featured research published by Tanja Samardzic.
international conference on computational linguistics | 2014
Noëmi Aepli; Ruprecht von Waldenfels; Tanja Samardzic
In this paper, we present an approach to developing resources for a low-resource language, taking advantage of the fact that it is closely related to languages with more resources. In particular, we test our approach on Macedonian, which lacks tools for natural language processing as well as the data needed to build such tools. We improve the Macedonian training set for supervised part-of-speech tagging by transferring available manual annotations from a number of similar languages. Our approach is based on multilingual parallel corpora, automatic word alignment, and a set of rules (majority vote). A tagger trained on the improved data set reaches 88% accuracy, significantly better than the baseline of 76%. It can serve as a stepping stone for further improvement of resources for Macedonian. The proposed approach is entirely automatic and can be easily adapted to other languages in similar circumstances.
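The majority-vote step mentioned in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `project_tags`, the input layout (one list of source-language tags per target token, as produced by some word aligner), the tie-breaking rule, and the fallback tag are all assumptions made for illustration.

```python
from collections import Counter

def project_tags(aligned_tags, fallback=None):
    """Choose one tag per target token by majority vote.

    aligned_tags: list with one entry per target token; each entry is the
    list of tags carried by the source-language tokens aligned to it
    (hypothetical input format).
    """
    projected = []
    for votes in aligned_tags:
        if not votes:
            # No aligned source token: use the fallback tag (assumption).
            projected.append(fallback)
            continue
        counts = Counter(votes)
        top = max(counts.values())
        # Ties go to the tag that appears first in the vote list (assumption).
        winner = next(tag for tag in votes if counts[tag] == top)
        projected.append(winner)
    return projected

# Hypothetical example: three target tokens with tags from aligned tokens
# in up to three related source languages.
aligned = [["NOUN", "NOUN", "ADJ"], ["VERB", "VERB"], []]
result = project_tags(aligned, fallback="X")
print(result)
```

The order of the source languages in each vote list decides ties here; a real system might instead weight languages by closeness to the target or by alignment confidence.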
conference on computational natural language learning | 2017
Tatyana Ruzsics; Tanja Samardzic
Learning internal word structure has recently been recognized as an important step in various multilingual processing tasks and in theoretical language comparison. In this paper, we present a neural encoder-decoder model for learning canonical morphological segmentation. Our model combines character-level sequence-to-sequence transformation with a language model over canonical segments. We obtain up to 4% improvement over a strong character-level encoder-decoder baseline for three languages. Our model outperforms the previous state-of-the-art for two languages, while eliminating the need for external resources such as large dictionaries. Finally, by comparing the performance of encoder-decoder and classical statistical machine translation systems trained with and without corpus counts, we show that including corpus counts is beneficial to both approaches.
sighum workshop on language technology for cultural heritage social sciences and humanities | 2015
Tanja Samardzic; Robert Schikowski; Sabine Stoll
Interlinear glossing is a type of annotation of morphosyntactic categories and crosslinguistic lexical correspondences that allows linguists to analyse sentences in languages that they do not necessarily speak. Automatising this annotation is necessary in order to provide glossed corpora big enough to be used for quantitative studies. In this paper, we present experiments on the automatic glossing of Chintang. We decompose the task of glossing into steps suitable for statistical processing. We first perform grammatical glossing as standard supervised part-of-speech tagging. We then add lexical glosses from a stand-off dictionary applying context disambiguation in a similar way to word lemmatisation. We obtain the highest accuracy score of 96% for grammatical and 94% for lexi-
conference of the european chapter of the association for computational linguistics | 2014
Tanja Samardzic; Paola Merlo
This article addresses the causal structure of events described by verbs: whether an event happens spontaneously or is caused by an external causer. We automatically estimate the likelihood of external causation of events based on the distribution of causative and anticausative uses of verbs in the causative alternation. We train a Bayesian model and test it on monolingual and bilingual input. The performance is evaluated against an independent scale of likelihood of external causation based on typological data. The accuracy of a two-way classification is 85% in both the monolingual and the bilingual setting. On the task of a three-way classification, the score is 61% in the monolingual setting and 69% in the bilingual setting.
linguistic annotation workshop | 2010
Lonneke van der Plas; Tanja Samardzic; Paola Merlo
meeting of the association for computational linguistics | 2012
Andrea Gesmundo; Tanja Samardzic
Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground | 2010
Tanja Samardzic; Paola Merlo
Archive | 2015
Tanja Samardzic; Yves Scherrer; Elvira Glaser
language resources and evaluation | 2016
Tanja Samardzic; Yves Scherrer; Elvira Glaser
Archive | 2013
Tanja Samardzic