Mihaela Vela
Saarland University
Publications
Featured research published by Mihaela Vela.
NLPXML '06 Proceedings of the 5th Workshop on NLP and XML: Multi-Dimensional Markup in Natural Language Processing | 2006
Silvia Hansen-Schirra; Stella Neumann; Mihaela Vela
This paper presents the compilation of the CroCo Corpus, an English-German translation corpus. Corpus design, annotation and alignment are described in detail. In order to guarantee the searchability and exchangeability of the corpus, XML stand-off mark-up is used as representation format for the multi-layer annotation. On this basis it is shown how the corpus can be queried using XQuery. Furthermore, the generalisation of results in terms of linguistic and translational research questions is briefly discussed.
Workshop on Statistical Machine Translation | 2015
Santanu Pal; Mihaela Vela; Sudip Kumar Naskar; Josef van Genabith
We describe the USAAR-SAPE English-Spanish Automatic Post-Editing (APE) system submitted to the APE Task organized in the Workshop on Statistical Machine Translation (WMT) in 2015. Our system was able to improve upon the baseline MT system output by incorporating a Phrase-Based Statistical MT (PBSMT) technique into the monolingual Statistical APE task (SAPE). The reported final submission crucially involves hybrid word alignment. The SAPE system takes raw Spanish Machine Translation (MT) output provided by the shared task organizers and produces post-edited Spanish text. The parallel data consist of English text, raw machine-translated Spanish output, and the corresponding manually post-edited versions. The major goal of the task is to reduce the post-editing effort by improving the quality of the MT output in terms of fluency and adequacy.
Workshop on Statistical Machine Translation | 2015
Mihaela Vela; Liling Tan
This paper describes USAAR’s submission to the metrics shared task of the Workshop on Statistical Machine Translation (WMT) in 2015. The goal of our submission is to take advantage of the semantic overlap between hypothesis and reference translation for predicting MT output adequacy using language-independent document embeddings. The approach presented here learns a Bayesian Ridge Regressor using document skip-gram embeddings in order to automatically evaluate Machine Translation (MT) output by predicting semantic adequacy scores. The evaluation of our submission, measured by the correlation with human judgements, shows promising results on system-level scores.
Conference of the European Chapter of the Association for Computational Linguistics | 2014
Marcos Zampieri; Mihaela Vela
This paper presents experiments on the use of machine translation output for technical translation. MT output was used to produce translation memories that were then used with a commercial CAT tool. Our experiments investigate the impact of different translation memories containing MT output on translation quality and speed, compared to the same task without a translation memory. We evaluated the performance of 15 novice translators translating technical English texts into German. Results suggest that translators are on average over 28% faster when using TM.
Information and Communication Technologies in Tourism | 2012
Walter Kasper; Mihaela Vela
User reviews and comments on hotels on the Web are an important information source in travel planning. Knowledge of such comments can therefore be important to hotel management for quality control. However, it is often difficult to find and follow such information on the Web. We present a system that automatically monitors user comments from various sites on the Web and provides classified summaries of positive and negative features of a hotel.
Proceedings of the Eighth International Conference on Computational Semantics | 2009
Mihaela Vela; Thierry Declerck
In this paper, we describe the state of our work on the possible derivation of ontological structures from textual analysis. We propose an approach to semi-automatic generation of domain ontologies from scratch, on the basis of heuristic rules applied to the result of a multi-layered processing of textual documents.
Meeting of the Association for Computational Linguistics | 2016
Santanu Pal; Sudip Kumar Naskar; Mihaela Vela; Josef van Genabith
We present a neural network-based automatic post-editing (APE) system to improve raw machine translation (MT) output. Our neural model of APE (NNAPE) is based on a bidirectional recurrent neural network (RNN) model and consists of an encoder that encodes an MT output into a fixed-length vector from which a decoder provides a post-edited (PE) translation. APE translations produced by NNAPE show statistically significant improvements of 3.96, 2.68 and 1.35 BLEU points absolute over the original MT, phrase-based APE and hierarchical APE outputs, respectively. Furthermore, human evaluation shows that the NNAPE-generated PE translations are much better than the original MT output.
Machine Translation | 2016
Rohit Gupta; Constantin Orăsan; Marcos Zampieri; Mihaela Vela; Josef van Genabith; Ruslan Mitkov
Most current translation memory (TM) systems work on the string level (character or word level) and lack semantic knowledge while matching. They use simple edit-distance (ED) calculated on the surface form or some variation on it (stem, lemma), which does not take into consideration any semantic aspects in matching. This paper presents a novel and efficient approach to incorporating semantic information in the form of paraphrasing (PP) in the ED metric. The approach computes ED while efficiently considering paraphrases using dynamic programming and greedy approximation. In addition to using automatic evaluation metrics like BLEU and METEOR, we have carried out an extensive human evaluation in which we measured post-editing time, keystrokes, HTER, HMETEOR, and carried out three rounds of subjective evaluations. Our results show that PP substantially improves TM matching and retrieval, resulting in translation performance increases when translators use paraphrase-enhanced TMs.
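As a rough illustration of the matching idea in the abstract above, the following minimal sketch computes a word-level edit distance by dynamic programming and treats listed paraphrase pairs as matches. It is a simplification under stated assumptions, not the paper's algorithm: it handles only single-word paraphrases, omits the greedy approximation, and the function name `edit_distance` and the `paraphrases` set are hypothetical.

```python
def edit_distance(source, target, paraphrases=frozenset()):
    """Word-level Levenshtein distance.

    `paraphrases` is a hypothetical set of (word, word) pairs that are
    treated as equivalent, so substituting one for the other costs nothing.
    """
    # prev[j] holds the distance between the processed source prefix
    # and the first j tokens of the target.
    prev = list(range(len(target) + 1))
    for i, s_tok in enumerate(source, start=1):
        curr = [i]
        for j, t_tok in enumerate(target, start=1):
            same = (
                s_tok == t_tok
                or (s_tok, t_tok) in paraphrases
                or (t_tok, s_tok) in paraphrases
            )
            curr.append(min(
                prev[j] + 1,                       # delete s_tok
                curr[j - 1] + 1,                   # insert t_tok
                prev[j - 1] + (0 if same else 1),  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    segment = "the car is quick".split()
    tm_entry = "the automobile is fast".split()
    pairs = {("car", "automobile"), ("quick", "fast")}
    print(edit_distance(segment, tm_entry))         # surface-form distance
    print(edit_distance(segment, tm_entry, pairs))  # paraphrase-aware distance
```

With the toy paraphrase pairs, the translation memory entry becomes an exact match for the segment, whereas the plain surface-form distance counts two substitutions.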
Recent Advances in Natural Language Processing | 2017
Octavia-Maria Sulea; Marcos Zampieri; Mihaela Vela; Josef van Genabith
In this paper, we investigate the application of text classification methods to predict the law area and the decision of cases judged by the French Supreme Court. We also investigate the influence of the time period in which a ruling was made on the textual form of the case description, and the extent to which it is necessary to mask the judges' motivation for a ruling to emulate a real-world test scenario. We report results of 96% F1 score in predicting a case ruling, 90% F1 score in predicting the law area of a case, and 75.9% F1 score in estimating the time span when a ruling was issued, using a linear Support Vector Machine (SVM) classifier trained on lexical features.
The Prague Bulletin of Mathematical Linguistics | 2016
Artuur Leeuwenberg; Mihaela Vela; Jon Dehdari; Josef van Genabith
In this paper we present a novel approach to minimally supervised synonym extraction. The approach is based on word embeddings and aims at presenting a method for synonym extraction that is extensible to various languages. We report experiments with word vectors trained using both the continuous bag-of-words model (CBoW) and the skip-gram model (SG), investigating the effects of different settings with respect to the contextual window size, the number of dimensions and the type of word vectors. We analyze the word categories that are (cosine) similar in the vector space, showing that cosine similarity on its own is a poor indicator of whether two words are synonymous. In this context, we propose a new measure, relative cosine similarity, for calculating similarity relative to other cosine-similar words in the corpus. We show that calculating similarity relative to other words boosts the precision of the extraction. We also experiment with combining similarity scores from differently trained vectors and explore the advantages of using a part-of-speech tagger as a way of introducing some light supervision, thus aiding extraction. We perform both intrinsic and extrinsic evaluation on our final system: intrinsic evaluation is carried out manually by two human evaluators, and we use the output of our system in a machine translation task for extrinsic evaluation, showing that the extracted synonyms improve the evaluation metric.
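The abstract does not spell out the formula for relative cosine similarity. One plausible reading, sketched below, divides the cosine similarity of a word pair by the summed cosine similarity of the word's n most similar neighbours. The function names, the toy vectors and the choice of n are assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relative_cosine_similarity(word, candidate, vectors, n=10):
    """Cosine similarity of (word, candidate), normalised by the summed
    similarity of the n words most similar to `word` (assumed definition).
    `candidate` must be a different word from `word`.
    """
    sims = {
        other: cosine(vectors[word], vec)
        for other, vec in vectors.items()
        if other != word
    }
    top_n_total = sum(sorted(sims.values(), reverse=True)[:n])
    return sims[candidate] / top_n_total if top_n_total else 0.0


if __name__ == "__main__":
    # Toy two-dimensional vectors, invented for this example.
    toy = {
        "big": [1.0, 0.0],
        "large": [1.0, 0.1],
        "huge": [0.9, 0.2],
        "small": [0.0, 1.0],
    }
    for cand in ("large", "huge", "small"):
        print(cand, relative_cosine_similarity("big", cand, toy, n=3))
```

Under this reading, the scores of a word's top n neighbours sum to one, so a candidate that stands out from the other neighbours scores above 1/n.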