Lieve Macken
Hogeschool Gent
Publication
Featured research published by Lieve Macken.
Meeting of the Association for Computational Linguistics | 2009
Els Lefever; Lieve Macken; Veronique Hoste
We present a language-pair independent terminology extraction module that is based on a sub-sentential alignment system that links linguistically motivated phrases in parallel texts. Statistical filters are applied to the bilingual list of candidate terms that is extracted from the alignment output. We compare the performance of both the alignment and terminology extraction module for three different language pairs (French-English, French-Italian and French-Dutch) and highlight language-pair specific problems (e.g. different compounding strategies in French and Dutch). Comparisons with standard terminology extraction programs show an improvement of up to 20% for bilingual terminology extraction and competitive results (85% to 90% accuracy) for monolingual terminology extraction, and reveal that the linguistically based alignment module is particularly well suited for the extraction of complex multiword terms.
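As a minimal illustration of the idea described above, the sketch below extracts bilingual candidate terms from aligned phrase pairs and applies a simple frequency-threshold filter. The phrase pairs and the threshold are hypothetical stand-ins; the actual module uses a richer set of statistical filters.

```python
from collections import Counter

# Hypothetical aligned phrase pairs, as might be extracted from the
# output of a sub-sentential alignment system on a parallel corpus.
aligned_phrases = [
    ("effet de serre", "greenhouse effect"),
    ("effet de serre", "greenhouse effect"),
    ("effet de serre", "greenhouse gas"),
    ("la", "the"),
]

def candidate_terms(pairs, min_freq=2):
    """Keep source-target pairs seen at least `min_freq` times
    (a toy stand-in for the statistical filters described above)."""
    counts = Counter(pairs)
    return {pair: c for pair, c in counts.items() if c >= min_freq}

print(candidate_terms(aligned_phrases))
# {('effet de serre', 'greenhouse effect'): 2}
```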
ACM Transactions on Intelligent Systems and Technology | 2016
Sarah Schulz; Guy De Pauw; Orphée De Clercq; Bart Desmet; Veronique Hoste; Walter Daelemans; Lieve Macken
As social media constitutes a valuable source for data analysis for a wide range of applications, the need for handling such data arises. However, the nonstandard language used on social media poses problems for natural language processing (NLP) tools, as these are typically trained on standard language material. We propose a text normalization approach to tackle this problem. More specifically, we investigate the usefulness of a multimodular approach to account for the diversity of normalization issues encountered in user-generated content (UGC). We consider three different types of UGC written in Dutch (SNS, SMS, and tweets) and provide a detailed analysis of the performance of the different modules and the overall system. We also apply an extrinsic evaluation by evaluating the performance of a part-of-speech tagger, lemmatizer, and named-entity recognizer before and after normalization.
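One module of such a multimodular normalization pipeline can be as simple as a dictionary lookup for common texting abbreviations. The sketch below shows this for a few Dutch UGC abbreviations (toy dictionary entries; the actual system combines several modules, including ones for spelling correction).

```python
# Toy abbreviation dictionary for Dutch user-generated content.
ABBREV = {"ff": "even", "idd": "inderdaad", "wrm": "waarom"}

def normalize(tokens):
    """Replace known abbreviations with their standard-language form,
    leaving all other tokens untouched."""
    return [ABBREV.get(t.lower(), t) for t in tokens]

print(normalize("wrm doe je dat ff".split()))
# ['waarom', 'doe', 'je', 'dat', 'even']
```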
International Conference on Computational Linguistics | 2008
Lieve Macken; Els Lefever; Veronique Hoste
We present a sub-sentential alignment system that links linguistically motivated phrases in parallel texts based on lexical correspondences and syntactic similarity. We compare the performance of our sub-sentential alignment system with different symmetrization heuristics that combine the GIZA++ alignments of both translation directions. We demonstrate that the aligned linguistically motivated phrases are a useful means to extract bilingual terminology and more specifically complex multiword terms.
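The two simplest symmetrization heuristics mentioned above, intersection and union of the two translation directions, can be sketched as set operations on alignment links. The example data below is hypothetical (GIZA++ itself writes alignments in its own file format).

```python
def intersect(src2tgt, tgt2src):
    """Keep only links found in both translation directions
    (high precision, lower recall)."""
    return src2tgt & tgt2src

def union(src2tgt, tgt2src):
    """Keep links found in either translation direction
    (high recall, lower precision)."""
    return src2tgt | tgt2src

# Alignments as sets of (source_index, target_index) pairs.
s2t = {(0, 0), (1, 2), (2, 1)}
t2s = {(0, 0), (1, 2), (3, 3)}

print(intersect(s2t, t2s))  # {(0, 0), (1, 2)}
print(union(s2t, t2s))      # {(0, 0), (1, 2), (2, 1), (3, 3)}
```

The grow-diag-final heuristic starts from the intersection and incrementally adds neighboring links from the union, trading some precision for recall.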
Essential Speech and Language Technology for Dutch: Results by the STEVIN Programme | 2013
Hans Paulussen; Lieve Macken; Willy Vandeweghe; Piet Desmet
Parallel corpora are a valuable resource for researchers across a wide range of disciplines, e.g. machine translation, computer-assisted translation, terminology extraction, computer-assisted language learning, contrastive linguistics and translation studies. Since the development of a high-quality parallel corpus is a time-consuming and costly process, the DPC project aimed at the creation of a multifunctional resource that satisfies the needs of this diverse group of disciplines.
New Directions in Empirical Translation Process Research: Exploring the CRITT TPR-DB | 2016
Joke Daems; Michael Carl; Sonia Vandepitte; Robert J. Hartsuiker; Lieve Macken
Consulting external resources is an important aspect of the translation process. Whereas most previous studies were limited to screen capture software to analyze the usage of external resources, we present a more convenient way to capture this data, by combining the functionalities of CASMACAT with those of Inputlog, two state-of-the-art logging tools. We used this data to compare the types of resources used and the time spent in external resources for 40 from-scratch translation sessions (HT) and 40 post-editing (PE) sessions of 10 master’s students of translation (from English into Dutch). We took a closer look at the effect of the usage of external resources on productivity and quality of the final product. The types of resources consulted were comparable for HT and PE, but more time was spent in external resources when translating. Though search strategies seemed to be more successful when translating than when post-editing, the quality of the final product was comparable, and post-editing was faster than regular translation.
Workshop on Statistical Machine Translation | 2015
Arda Tezcan; Veronique Hoste; Bart Desmet; Lieve Macken
This paper describes the submission of the UGENT-LT3 SCATE system to the WMT15 Shared Task on Quality Estimation (QE), viz. English-Spanish word- and sentence-level QE. We conceived of QE as a supervised Machine Learning (ML) problem, designed additional features, and combined these with the baseline feature set to estimate quality. The sentence-level QE system re-uses the predictions of the word-level QE system. We experimented with different learning methods and observed improvements over the baseline system for word-level QE with the use of the new features and by combining learning methods into ensembles. For sentence-level QE, we show that a single feature based on word-level predictions can outperform the baseline system, and that using it in combination with additional features leads to further improvements in performance.
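One simple way to derive a sentence-level feature from word-level predictions, as hinted at above, is the fraction of words labeled BAD by the word-level system. This is an illustrative assumption, not necessarily the exact feature used in the submission.

```python
def sentence_qe_feature(word_labels):
    """Fraction of words predicted 'BAD' by a word-level QE system,
    used as a single sentence-level quality feature (illustrative)."""
    if not word_labels:
        return 0.0
    return sum(1 for label in word_labels if label == "BAD") / len(word_labels)

print(sentence_qe_feature(["OK", "BAD", "OK", "BAD"]))  # 0.5
```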
International Conference on Computational Linguistics | 2010
Lieve Macken; Walter Daelemans
We present a linguistically-motivated sub-sentential alignment system that extends the intersected IBM Model 4 word alignments. The alignment system is chunk-driven and requires only shallow linguistic processing tools for the source and the target languages, i.e. part-of-speech taggers and chunkers. We conceive of the sub-sentential aligner as a cascaded model consisting of two phases. In the first phase, anchor chunks are linked based on the intersected word alignments and syntactic similarity. In the second phase, we use a bootstrapping approach to extract more complex translation patterns. The results show an overall AER reduction and competitive F-measures in comparison to the commonly used symmetrized IBM Model 4 predictions (intersection, union and grow-diag-final) on six different text types for English-Dutch. In particular, compared with the intersected word alignments, the proposed method improves recall without sacrificing precision. Moreover, the system is able to align discontiguous chunks, which frequently occur in Dutch.
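The AER metric used above (Och and Ney's Alignment Error Rate) compares predicted links A against gold-standard sure links S and possible links P (with S a subset of P): AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). A minimal sketch with hypothetical alignment data:

```python
def aer(predicted, sure, possible):
    """Alignment Error Rate: lower is better, 0.0 is perfect.
    `possible` is taken to include `sure` (S is a subset of P)."""
    a, s = set(predicted), set(sure)
    p = set(possible) | s
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))

# Links are (source_index, target_index) pairs.
A = {(0, 0), (1, 1), (2, 3)}
S = {(0, 0), (1, 1)}
P = S | {(2, 3), (2, 2)}
print(aer(A, S, P))  # 0.0: all sure links found, extras are possible
```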
Language Resources and Evaluation | 2014
Maribel Montero Perez; Hans Paulussen; Lieve Macken; Piet Desmet
The aim of this paper is to illustrate the potential of a parallel corpus in the context of (computer-assisted) language learning. In order to do so, we propose to answer two main questions: (1) what corpus (data) to use and (2) how to use the corpus (data). We provide an answer to the what-question by describing the importance and particularities of compiling and processing a corpus for pedagogical purposes. In order to answer the how-question, we first investigate the central concepts of the interactionist theory of second language acquisition: comprehensible input, input enhancement, comprehensible output and output enhancement. By means of two case studies, we illustrate how the above-mentioned concepts can be realized in concrete corpus-based language learning activities. We propose a design for a receptive and a productive language task and describe how a parallel corpus can be at the basis of powerful language learning activities. The Dutch Parallel Corpus, a ten-million-word, sentence-aligned and annotated parallel corpus, is used to develop these language tasks.
Frontiers in Psychology | 2017
Joke Daems; Sonia Vandepitte; Robert J. Hartsuiker; Lieve Macken
Translation Environment Tools make translators’ work easier by providing them with term lists, translation memories and machine translation output. Ideally, such tools automatically predict whether it is more effortful to post-edit than to translate from scratch, and determine whether or not to provide translators with machine translation output. Current machine translation quality estimation systems heavily rely on automatic metrics, even though they do not accurately capture actual post-editing effort. In addition, these systems do not take translator experience into account, even though novices’ translation processes are different from those of professional translators. In this paper, we report on the impact of machine translation errors on various types of post-editing effort indicators, for professional translators as well as student translators. We compare the impact of MT quality on a product effort indicator (HTER) with that on various process effort indicators. The translation and post-editing process of student translators and professional translators was logged with a combination of keystroke logging and eye-tracking, and the MT output was analyzed with a fine-grained translation quality assessment approach. We find that most post-editing effort indicators (product as well as process) are influenced by machine translation quality, but that different error types affect different post-editing effort indicators, confirming that a more fine-grained MT quality analysis is needed to correctly estimate actual post-editing effort. Coherence, meaning shifts, and structural issues are shown to be good indicators of post-editing effort. The additional impact of experience on these interactions between MT quality and post-editing effort is smaller than expected.
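The product effort indicator mentioned above, HTER, is the number of edits needed to turn the MT output into its post-edited version, divided by the length of the post-edited version. The sketch below approximates it with word-level Levenshtein distance (full TER additionally counts block shifts as single edits, which this sketch omits); the sentences are made-up examples.

```python
def hter_approx(mt_tokens, postedit_tokens):
    """Approximate HTER: word-level Levenshtein distance between the
    MT output and its post-edited version, divided by the number of
    words in the post-edited version."""
    m, n = len(mt_tokens), len(postedit_tokens)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # deleting i MT words
    for j in range(n + 1):
        d[0][j] = j  # inserting j post-edit words
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if mt_tokens[i - 1] == postedit_tokens[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return d[m][n] / n

mt = "the house green is big".split()
pe = "the green house is big".split()
print(hter_approx(mt, pe))  # 0.4: two substitutions over five words
```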
Proceedings of the First Conference on Machine Translation, Volume 2: Shared Task Papers | 2016
Arda Tezcan; Veronique Hoste; Lieve Macken
This paper describes the submission of the UGENT-LT3 SCATE system to the WMT16 Shared Task on Quality Estimation (QE), viz. English-German word and sentence-level QE. Based on the observation that the data set is homogeneous (all sentences belong to the IT domain), we performed bilingual terminology extraction and added features derived from the resulting term list to the well-performing features of the word-level QE task of last year. For sentence-level QE, we analyzed the importance of the features and based on those insights extended the feature set of last year. We also experimented with different learning methods and ensembles. We present our observations from the different experiments we conducted and our submissions for both tasks.