Marcos Zampieri
Saarland University
Publication
Featured publications by Marcos Zampieri.
meeting of the association for computational linguistics | 2016
Ondřej Bojar; Rajen Chatterjee; Christian Federmann; Yvette Graham; Barry Haddow; Matthias Huck; Antonio Jimeno Yepes; Philipp Koehn; Varvara Logacheva; Christof Monz; Matteo Negri; Aurélie Névéol; Mariana L. Neves; Martin Popel; Matt Post; Raphael Rubino; Carolina Scarton; Lucia Specia; Marco Turchi; Karin Verspoor; Marcos Zampieri
This paper presents the results of the WMT16 shared tasks, which included five machine translation (MT) tasks (standard news, IT-domain, biomedical, multimodal, pronoun), three evaluation tasks (metrics, tuning, run-time estimation of MT quality), an automatic post-editing task, and a bilingual document alignment task. This year, 102 MT systems from 24 institutions (plus 36 anonymized online systems) were submitted to the 12 translation directions in the news translation task. The IT-domain task received 31 submissions from 12 institutions in 7 directions, and the biomedical task received 15 system submissions from 5 institutions. Evaluation was both automatic and manual (relative ranking and 100-point scale assessments). The quality estimation task had three subtasks, with a total of 14 teams submitting 39 entries. The automatic post-editing task had a total of 6 teams submitting 11 entries.
international conference on computational linguistics | 2014
Marcos Zampieri; Liling Tan; Nikola Ljubešić; Jörg Tiedemann
This paper summarizes the methods, results and findings of the Discriminating between Similar Languages (DSL) shared task 2014. The shared task provided data from 13 different languages and varieties divided into 6 groups. Participants were required to train their systems to discriminate between languages on a training and development set containing 20,000 sentences from each language (closed submission) and/or on any other dataset (open submission). One month later, a test set containing 1,000 unidentified instances per language was released for evaluation. The DSL shared task received 22 registrations and 8 final submissions. The best system obtained 95.7% average accuracy.
processing of the portuguese language | 2010
Jorge Baptista; Neuza Costa; Joaquim Guerra; Marcos Zampieri; Maria Cabral; Nuno J. Mamede
This paper presents and discusses the methodology for the construction of an Academic Word List for Portuguese, PAWL, inspired by its English equivalent. The aim of this linguistic resource is to provide a solid base for future studies and applications in Computer Assisted Language Learning, while maintaining comparability with similar resources.
conference of the european chapter of the association for computational linguistics | 2014
Vlad Niculae; Marcos Zampieri; Liviu P. Dinu; Alina Maria Ciobanu
This paper presents a novel approach to the task of temporal text classification combining text ranking and probability for the automatic dating of historical texts. The method was applied to three historical corpora: an English, a Portuguese and a Romanian corpus. It obtained performance ranging from 83% to 93% accuracy, using a fully automated approach with very basic features.
north american chapter of the association for computational linguistics | 2016
Shervin Malmasi; Marcos Zampieri; Mark Dras
We present our approach to predicting the severity of user posts in a mental health forum. This system was developed to compete in the 2016 Computational Linguistics and Clinical Psychology (CLPsych) Shared Task. Our entry employs a meta-classifier which uses a set of base classifiers constructed from lexical, syntactic and metadata features. These classifiers were generated for both the target posts and their contexts, which included both preceding and subsequent posts. The output from these classifiers was used to train a meta-classifier, which outperformed all individual classifiers as well as an ensemble classifier. This meta-classifier was then extended to a Random Forest of meta-classifiers, yielding further improvements in classification accuracy. We achieved competitive results, ranking first among a total of 60 submitted entries in the competition.
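The core idea of a meta-classifier (stacking) is that base classifiers trained on separate feature groups emit predictions, and a second-level model learns how to combine those predictions. The sketch below is a hypothetical illustration, not the authors' system: the nearest-centroid base learners, the lookup-table meta-classifier, the toy numeric "views" and the labels are all assumptions. For brevity it trains the meta-classifier on in-sample base predictions; a careful implementation would use cross-validated predictions instead, and the Random Forest of meta-classifiers is not shown.

```python
from collections import Counter, defaultdict


def train_centroid(X, y):
    """Hypothetical base learner: nearest class centroid (squared Euclidean)."""
    dims = len(X[0])
    sums = defaultdict(lambda: [0.0] * dims)
    counts = Counter(y)
    for x, label in zip(X, y):
        for i, v in enumerate(x):
            sums[label][i] += v
    centroids = {c: [s / counts[c] for s in sums[c]] for c in counts}

    def predict(x):
        # Pick the class whose centroid is closest to x.
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return predict


def train_meta(base_outputs, y):
    """Meta-classifier: maps each tuple of base predictions to its most
    frequent gold label; unseen tuples fall back to a majority vote."""
    table = defaultdict(Counter)
    for outs, label in zip(base_outputs, y):
        table[outs][label] += 1

    def predict(outs):
        if outs in table:
            return table[outs].most_common(1)[0][0]
        return Counter(outs).most_common(1)[0][0]
    return predict


def train_stack(views, y):
    """views: one feature matrix per feature group (e.g. lexical, metadata)."""
    bases = [train_centroid(X, y) for X in views]
    # Level-1 training data: the base classifiers' predictions per sample.
    base_outputs = [tuple(b(X[i]) for b, X in zip(bases, views))
                    for i in range(len(y))]
    meta = train_meta(base_outputs, y)

    def predict(sample_views):
        return meta(tuple(b(x) for b, x in zip(bases, sample_views)))
    return predict


# Hypothetical toy data: two feature views over four posts.
LEXICAL = [[0.0], [0.1], [1.0], [0.9]]
METADATA = [[5.0], [4.0], [0.0], [1.0]]
LABELS = ["green", "green", "red", "red"]
```

For example, `train_stack([LEXICAL, METADATA], LABELS)` returns a predictor that takes one feature vector per view, such as `([0.05], [4.5])`.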
international symposium on computational intelligence and informatics | 2013
Marcos Zampieri
This paper presents a number of experiments on the use of machine learning algorithms and bag-of-words features for the task of automatic language identification. The paper focuses on the identification of language varieties, which is a known weakness of general-purpose language identification methods. This question has been addressed by a number of studies in recent years, most of them relying on character n-gram language models. In this paper, I experiment with simple bag-of-words features and compare the results with previously proposed n-gram-based approaches. To perform these classification experiments, three algorithms were used: Multinomial Naive Bayes (MNB), Support Vector Machines (SVM) and the J48 classifier.
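To make the bag-of-words MNB setup concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written with the standard library only. It is an illustration under assumptions, not the paper's implementation: the whitespace tokenizer, the `MultinomialNB` class written here, and the toy Brazilian/European Portuguese sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class MultinomialNB:
    """Bag-of-words multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the smoothed log likelihood of each known token.
            score = math.log(count / n_docs)
            for tok in tokenize(doc):
                if tok not in self.vocab:
                    continue  # words never seen in training are ignored
                num = self.word_counts[label][tok] + 1
                den = self.total_words[label] + v
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data for two Portuguese varieties.
TOY_DOCS = [
    "o ônibus chegou atrasado",
    "peguei o trem para o trabalho",
    "o autocarro chegou atrasado",
    "apanhei o comboio para o trabalho",
]
TOY_LABELS = ["pt-BR", "pt-BR", "pt-PT", "pt-PT"]
```

With this toy data, words such as "ônibus" and "trem" only occur in the pt-BR class, so they pull the score toward that variety, while shared words like "o" contribute equally to both classes.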
text speech and dialogue | 2013
Sanja Štajner; Marcos Zampieri
This paper investigates stylistic changes in a set of Portuguese historical texts ranging from the 17th to the early 20th century and presents a supervised method to classify them per century. Four stylistic features – average sentence length (ASL), average word length (AWL), lexical density (LD), and lexical richness (LR) – were automatically extracted for each sub-corpus. The initial analysis of diachronic changes in these four features revealed that the texts written in the 17th and 18th centuries have similar AWL, LD and LR, which differ significantly from those in the texts written in the 19th and 20th centuries. This information was later used in automatic classification of texts per century, leading to an F-Measure of 0.92.
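The four stylistic features can be sketched in a few lines. The definitions below are common textbook ones and are assumptions, since the abstract does not define them: ASL as words per sentence, AWL as characters per word, LR as the type-token ratio, and LD as the share of content words. LD is approximated here with a tiny, hypothetical list of Portuguese function words; a real study would identify content words with a part-of-speech tagger.

```python
import re

# Hypothetical, tiny list of Portuguese function words, used only to
# approximate content words for lexical density.
FUNCTION_WORDS = {"o", "a", "os", "as", "de", "do", "da", "e", "que", "em",
                  "um", "uma", "para", "com", "não", "se", "por"}


def stylistic_features(text):
    """Compute ASL, AWL, LD and LR for a non-empty text (assumed definitions)."""
    # Naive sentence split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    n_words = len(words)
    asl = n_words / len(sentences)                 # average sentence length
    awl = sum(len(w) for w in words) / n_words     # average word length
    content = [w for w in words if w not in FUNCTION_WORDS]
    ld = len(content) / n_words                    # lexical density (approx.)
    lr = len(set(words)) / n_words                 # lexical richness as TTR
    return {"ASL": asl, "AWL": awl, "LD": ld, "LR": lr}
```

For instance, "O rei chegou. O rei partiu cedo." has 2 sentences and 7 words, giving an ASL of 3.5. Per-century values computed this way could then serve as features for a supervised classifier.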
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial) | 2017
Marcos Zampieri; Shervin Malmasi; Nikola Ljubešić; Preslav Nakov; Ahmed M. Ali; Jörg Tiedemann; Yves Scherrer; Noëmi Aepli
We present the results of the VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part of the fourth edition of the VarDial workshop at EACL’2017. This year, we included four shared tasks: Discriminating between Similar Languages (DSL), Arabic Dialect Identification (ADI), German Dialect Identification (GDI), and Cross-lingual Dependency Parsing (CLP). A total of 19 teams submitted runs across the four tasks, and 15 of them wrote system description papers.
conference of the european chapter of the association for computational linguistics | 2014
Marcos Zampieri; Mihaela Vela
This paper presents experiments on the use of machine translation output for technical translation. MT output was used to produce translation memories (TMs) that were used with a commercial CAT tool. Our experiments investigate the impact of using different translation memories containing MT output on translation quality and speed, compared to the same task without a translation memory. We evaluated the performance of 15 novice translators translating technical English texts into German. Results suggest that translators are on average over 28% faster when using a TM.
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers | 2016
Santanu Pal; Marcos Zampieri; Josef van Genabith
This paper presents an automatic post-editing (APE) method to improve the translation quality produced by an English–German (EN–DE) statistical machine translation (SMT) system. Our system is based on an Operation Sequential Model (OSM) combined with a phrase-based statistical MT (PB-SMT) system. The system is trained in a monolingual setting, between the MT outputs (TL_MT) produced by a black-box MT system and their corresponding post-edited versions (TL_PE). Our system achieves a considerable improvement over TL_MT on a held-out development set. On the official test set, the reported system achieves 64.10 BLEU (1.99 absolute points and 3.2% relative improvement in BLEU over the raw MT output) and a TER score of 24.14 (0.66 absolute points and 0.25% relative improvement in TER over the raw MT output).
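As a quick arithmetic check of the BLEU figures, assuming the 1.99-point gain is measured against the raw MT output on the same official test set, the implied baseline score and the relative improvement are:

```latex
\mathrm{BLEU}_{\text{raw}} = 64.10 - 1.99 = 62.11,
\qquad
\frac{1.99}{62.11} \approx 0.032 = 3.2\%
```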