Publication


Featured research published by Tanja Samardzic.


International Conference on Computational Linguistics | 2014

Part-of-Speech Tag Disambiguation by Cross-Linguistic Majority Vote

Noëmi Aepli; Ruprecht von Waldenfels; Tanja Samardzic

In this paper, we present an approach to developing resources for a low-resource language, taking advantage of the fact that it is closely related to languages with more resources. In particular, we test our approach on Macedonian, which lacks tools for natural language processing as well as the data needed to build such tools. We improve the Macedonian training set for supervised part-of-speech tagging by transferring available manual annotations from a number of similar languages. Our approach is based on multilingual parallel corpora, automatic word alignment, and a set of rules (majority vote). A tagger trained on the improved data set reaches 88% accuracy, which is significantly better than the baseline of 76%. It can serve as a stepping stone for further improvement of resources for Macedonian. The proposed approach is entirely automatic and can be easily adapted to other languages in similar circumstances.
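The majority-vote step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the paper's actual rule set, so everything here is an assumption made for illustration: the function name `majority_vote`, the tag labels, the use of `None` for a source language with no aligned word, and the tie-handling rule (fall back to an existing tag when the projected tags disagree evenly).

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(
    projected_tags: Sequence[Optional[str]],
    fallback: Optional[str] = None,
) -> Optional[str]:
    """Choose one tag for a target token from tags projected via word alignment.

    projected_tags holds one entry per source language; None means that the
    language had no aligned word for this token (hypothetical convention).
    fallback is the tag to keep when there is no vote or the vote is tied.
    """
    votes = Counter(tag for tag in projected_tags if tag is not None)
    if not votes:
        return fallback  # nothing was projected onto this token

    ranked = votes.most_common()
    top_tag, top_count = ranked[0]

    # Assumed tie rule: if two tags share the highest count, prefer the
    # fallback when one is given; otherwise keep the first tag seen.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return fallback if fallback is not None else top_tag
    return top_tag


# Usage: three source languages vote on the tag of one target token.
print(majority_vote(["NOUN", "NOUN", "VERB"], fallback="ADJ"))
```

In this sketch, a tag is overwritten only when the aligned languages clearly agree, which is one plausible way to keep projection noise out of the training set.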


Conference on Computational Natural Language Learning | 2017

Neural Sequence-to-sequence Learning of Internal Word Structure

Tatyana Ruzsics; Tanja Samardzic

Learning internal word structure has recently been recognized as an important step in various multilingual processing tasks and in theoretical language comparison. In this paper, we present a neural encoder-decoder model for learning canonical morphological segmentation. Our model combines character-level sequence-to-sequence transformation with a language model over canonical segments. We obtain up to 4% improvement over a strong character-level encoder-decoder baseline for three languages. Our model outperforms the previous state-of-the-art for two languages, while eliminating the need for external resources such as large dictionaries. Finally, by comparing the performance of encoder-decoder and classical statistical machine translation systems trained with and without corpus counts, we show that including corpus counts is beneficial to both approaches.


SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities | 2015

Automatic interlinear glossing as two-level sequence classification

Tanja Samardzic; Robert Schikowski; Sabine Stoll

Interlinear glossing is a type of annotation of morphosyntactic categories and crosslinguistic lexical correspondences that allows linguists to analyse sentences in languages that they do not necessarily speak. Automatising this annotation is necessary in order to provide glossed corpora big enough to be used for quantitative studies. In this paper, we present experiments on the automatic glossing of Chintang. We decompose the task of glossing into steps suitable for statistical processing. We first perform grammatical glossing as standard supervised part-of-speech tagging. We then add lexical glosses from a stand-off dictionary applying context disambiguation in a similar way to word lemmatisation. We obtain the highest accuracy score of 96% for grammatical and 94% for lexi-


Conference of the European Chapter of the Association for Computational Linguistics | 2014

Likelihood of External Causation in the Structure of Events

Tanja Samardzic; Paola Merlo

This article addresses the causal structure of events described by verbs: whether an event happens spontaneously or is caused by an external causer. We automatically estimate the likelihood of external causation of events based on the distribution of causative and anticausative uses of verbs in the causative alternation. We train a Bayesian model and test it on monolingual and bilingual input. The performance is evaluated against an independent scale of likelihood of external causation based on typological data. The accuracy of a two-way classification is 85% in both the monolingual and the bilingual setting. On the task of a three-way classification, the score is 61% in the monolingual setting and 69% in the bilingual setting.


Linguistic Annotation Workshop | 2010

Cross-Lingual Validity of PropBank in the Manual Annotation of French

Lonneke van der Plas; Tanja Samardzic; Paola Merlo


Meeting of the Association for Computational Linguistics | 2012

Lemmatisation as a Tagging Task

Andrea Gesmundo; Tanja Samardzic


Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground | 2010

Cross-Lingual Variation of Light Verb Constructions: Using Parallel Corpora and Automatic Alignment for Linguistic Research

Tanja Samardzic; Paola Merlo


Archive | 2015

Normalising orthographic and dialectal variants for the automatic processing of Swiss German

Tanja Samardzic; Yves Scherrer; Elvira Glaser


Language Resources and Evaluation | 2016

ArchiMob - A Corpus of Spoken Swiss German

Tanja Samardzic; Yves Scherrer; Elvira Glaser


Archive | 2013

Dynamics, causation, duration in the predicate-argument structure of verbs: a computational approach based on parallel corpora

Tanja Samardzic
