Reinhard Rapp
University of Paderborn
Publications
Featured research published by Reinhard Rapp.
Meeting of the Association for Computational Linguistics | 1995
Reinhard Rapp
Common algorithms for sentence and word alignment allow the automatic identification of word translations from parallel texts. This study suggests that the identification of word translations should also be possible with non-parallel and even unrelated texts. The method proposed is based on the assumption that there is a correlation between the patterns of word co-occurrences in texts of different languages.
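The co-occurrence assumption can be illustrated with a minimal sketch. It assumes a small seed lexicon of known translations, counts context words within a fixed window in each language, maps the source word's context vector into the target language through the seed lexicon, and ranks target words by cosine similarity. The window size, the cosine measure, and all function names (cooccurrence_vectors, translate_vector, rank_translations) are illustrative assumptions, not the paper's exact procedure.

```python
from collections import Counter, defaultdict
import math

def cooccurrence_vectors(tokens, window=1):
    """For each word, count how often every other word occurs within +/- window positions."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors

def translate_vector(vec, seed_lexicon):
    """Map context words into the target language; words missing from the seed lexicon are dropped."""
    out = Counter()
    for word, count in vec.items():
        if word in seed_lexicon:
            out[seed_lexicon[word]] += count
    return out

def cosine(a, b):
    """Cosine similarity of two sparse count vectors (an assumed similarity measure)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_translations(word, src_vectors, tgt_vectors, seed_lexicon):
    """Rank all target-language words by similarity of their co-occurrence pattern to that of `word`."""
    query = translate_vector(src_vectors[word], seed_lexicon)
    scores = {cand: cosine(query, vec) for cand, vec in tgt_vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy corpora with different word order, so they are not sentence-aligned.
en = "dog barks cat meows dog barks".split()
de = "katze miaut hund bellt hund bellt".split()
seed = {"barks": "bellt", "meows": "miaut"}  # hypothetical seed lexicon
print(rank_translations("dog", cooccurrence_vectors(en), cooccurrence_vectors(de), seed)[0])  # hund
```

Only the seed words need to be known in advance; "dog" is never looked up in a dictionary, its translation is inferred from the shared context pattern.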
Meeting of the Association for Computational Linguistics | 2009
Reinhard Rapp
Automatic tools for machine translation (MT) evaluation such as BLEU are well established, but they have two drawbacks: they do not perform well at the sentence level, and they presuppose manually translated reference texts. Assuming that the MT system to be evaluated can handle both directions of a language pair, in this research we propose conducting automatic MT evaluation by determining the orthographic similarity between a back-translation and the original source text. This eliminates the need for human-translated reference texts. By correlating BLEU and back-translation scores with human judgments, we show that the back-translation score performs better at the sentence level.
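A minimal sketch of the back-translation idea, assuming the MT system is exposed as two callables (one per direction). Orthographic similarity is approximated here with Python's difflib.SequenceMatcher character ratio; that measure, the function name back_translation_score, and the toy dictionaries standing in for an MT system are all assumptions, not the paper's exact metric.

```python
from difflib import SequenceMatcher
from typing import Callable

def back_translation_score(source: str,
                           forward: Callable[[str], str],
                           backward: Callable[[str], str]) -> float:
    """Translate source -> target -> source and compare the round trip with the original.

    Uses difflib's character-level ratio as a stand-in for orthographic similarity:
    1.0 means the round trip reproduced the source exactly, 0.0 means nothing matched.
    No human reference translation is needed.
    """
    round_trip = backward(forward(source))
    return SequenceMatcher(None, source.lower(), round_trip.lower()).ratio()

# Hypothetical stand-ins for the two directions of an MT system.
en_de = {"the house is small": "das Haus ist klein"}
de_en = {"das Haus ist klein": "the house is little"}
score = back_translation_score("the house is small",
                               lambda s: en_de.get(s, s),
                               lambda s: de_en.get(s, s))
print(round(score, 2))
```

Because the score is computed per sentence, it can be correlated directly with sentence-level human judgments.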
Archive | 2014
Serge Sharoff; Reinhard Rapp; Pierre Zweigenbaum; Pascale Fung
The 1990s saw a paradigm change in the use of corpus-driven methods in NLP. In the field of multilingual NLP (such as machine translation and terminology mining) this implied the use of parallel corpora. However, parallel resources are relatively scarce: many more texts are produced daily by native speakers of any given language than are translated. This situation resulted in a natural drive towards the use of comparable corpora, i.e. non-parallel texts in the same domain or genre. Nevertheless, this research direction has not produced a single authoritative source suitable for researchers and students coming to the field. The proposed volume provides a reference source, identifying the state of the art in the field as well as future trends. The book is intended for specialists and students in natural language processing, machine translation and computer-assisted translation.
International Conference on Computational Linguistics | 2008
Reinhard Rapp
It is shown that the behaviour of test subjects in association experiments can be simulated statistically on the basis of the co-occurrences of words in large text corpora, thereby applying the law of association by contiguity, which is well known from psychological learning theory. In particular, this work focuses on predicting the associations that subjects produce when presented with multiword stimuli. Results are presented for applications as diverse as crossword puzzle solving and the identification of word translations based on non-parallel texts.
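A minimal sketch of predicting a response to a multiword stimulus from corpus co-occurrences. It assumes contiguity is captured by a fixed co-occurrence window and that per-stimulus association strengths are combined multiplicatively (summed in log space with add-one smoothing), so the winning response must be associated with every stimulus. The combination rule, the smoothing, and the names window_counts and predict_response are illustrative assumptions, not necessarily the paper's method.

```python
from collections import Counter
import math

def window_counts(tokens, window=1):
    """Count ordered word pairs that occur within +/- window positions (association by contiguity)."""
    counts = Counter()
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[(w, tokens[j])] += 1
    return counts

def predict_response(stimuli, tokens, window=1):
    """Return the corpus word most strongly associated with all stimuli together."""
    counts = window_counts(tokens, window)
    candidates = sorted(set(tokens) - set(stimuli))

    def score(candidate):
        # Sum of logs = log of the product of (smoothed) co-occurrence counts;
        # add-one smoothing keeps a single unseen pair from zeroing the product.
        return sum(math.log(counts[(s, candidate)] + 1) for s in stimuli)

    return max(candidates, key=score)

corpus = "money bank loan river water fish money bank loan".split()
print(predict_response(["money", "loan"], corpus))  # bank
```

Multiplying rather than adding favours responses linked to every stimulus, which is the behaviour a crossword-style multiword cue calls for.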
International Symposium on Neural Networks | 1991
Reinhard Rapp; Manfred Wettler
An associative lexical net has been built whose weights are computed from the co-occurrences of words using Hebb's rule. The co-occurrences of word pairs are determined by shifting a window over a large body of text. To estimate the associative response to a given stimulus word, the corresponding node is activated and its activity is propagated through the net. The proposed model assumes that the words with the highest activities after propagation correspond to the associative responses of human subjects. These predictions have been tested and confirmed using the association norms collected by Russell and Jenkins.
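A minimal sketch of the net described above, assuming binary word activity inside a sliding window, so Hebb's rule reduces to adding a constant learning rate eta to the connection between every co-occurring pair. One propagation step then passes the stimulus activity along the weighted connections. The window size, eta, the number of steps, and the names hebbian_net and associate are assumptions for illustration.

```python
from collections import defaultdict

def hebbian_net(tokens, window=1, eta=1.0):
    """Build symmetric weights with Hebb's rule: co-active words (same window) get eta added."""
    w = defaultdict(lambda: defaultdict(float))
    for i, a in enumerate(tokens):
        for j in range(i + 1, min(len(tokens), i + window + 1)):
            b = tokens[j]
            if a != b:
                w[a][b] += eta
                w[b][a] += eta
    return w

def associate(stimulus, w, steps=1):
    """Activate the stimulus node, propagate activity, and rank the other nodes by activity."""
    activity = {stimulus: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, act in activity.items():
            for neighbour, weight in w[node].items():
                nxt[neighbour] += act * weight
        activity = nxt
    activity.pop(stimulus, None)  # the stimulus itself is not a response
    return sorted(activity, key=activity.get, reverse=True)

text = "black white black white black night".split()
print(associate("black", hebbian_net(text)))  # ['white', 'night']
```

The top-ranked node plays the role of the predicted primary response, which is what would be compared against association norms.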
International Conference on Computational Linguistics | 2014
Reinhard Rapp; Michael Zock
The shared task of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex-IV) was devoted to a subtask of the lexical access problem, namely multi-stimulus association. In this task, participants had to automatically determine the expected response to a given set of stimulus words. We describe the task definition, the theoretical background, the training and test data sets, and the evaluation procedure used for ranking the participating systems. We also summarize the approaches used and present the results of the evaluation. The competition produced a number of systems that provide very good solutions to the problem.
Journal of the Association for Information Science and Technology | 1995
Reginald Ferber; Manfred Wettler; Reinhard Rapp
To generate a search query based on an end-user request, a database searcher has to select appropriate search terms. These terms can either be taken from the request, or they can be added by the searcher. This selection process is simulated by an associative lexical net; the nodes of the net are the terms used in 94 records of written requests to a psychological information agency and the respective online searches. The weights connecting the nodes are calculated from the co-occurrences of these terms in the abstracts of the database PsycLIT. To simulate the term selection process for a query, the nodes of all terms used in the written request are activated, and one or more spreading activation cycles are performed. The result of the simulation is a ranking of the terms according to the activities of their nodes. Simulations for all 94 records show a low mean activity rank for the terms selected from the request; the mean activity rank for new terms added by the searcher is lower than the mean activity rank for those terms of the request that were not used in the query.
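The simulation loop can be sketched as follows, assuming a dictionary of weighted connections (in the study these come from PsycLIT co-occurrences; here they are made-up numbers), activation 1.0 for every request term, and each cycle adding weighted neighbour activity to the previous activity. Rank 1 is the most active term, so a low mean rank means a set of terms is strongly activated. The update rule and the names spread, rank_terms and mean_rank are assumptions.

```python
def spread(weights, seeds, cycles=1):
    """Activate the seed terms, then run spreading-activation cycles over the weighted net."""
    activity = {term: 0.0 for term in weights}
    for term in seeds:
        activity[term] = 1.0
    for _ in range(cycles):
        new = dict(activity)  # assumption: nodes keep their previous activity
        for src, act in activity.items():
            for dst, weight in weights.get(src, {}).items():
                new[dst] = new.get(dst, 0.0) + act * weight
        activity = new
    return activity

def rank_terms(activity):
    """Rank 1 = highest activity; ties are broken alphabetically."""
    ordered = sorted(activity, key=lambda t: (-activity[t], t))
    return {term: rank for rank, term in enumerate(ordered, start=1)}

def mean_rank(ranks, terms):
    """Mean activity rank of a group of terms (lower = more strongly activated)."""
    return sum(ranks[t] for t in terms) / len(terms)

# Hypothetical term net with made-up weights.
net = {
    "anxiety": {"stress": 0.8, "therapy": 0.3},
    "stress": {"anxiety": 0.8, "coping": 0.6},
    "therapy": {"anxiety": 0.3},
    "coping": {"stress": 0.6},
}
ranks = rank_terms(spread(net, ["anxiety"], cycles=1))
print(ranks, mean_rank(ranks, ["stress", "therapy"]))
```

Comparing mean ranks of different term groups (request terms used vs. not used, terms added by the searcher) mirrors the comparison reported in the abstract.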
Meeting of the Association for Computational Linguistics | 2017
Pierre Zweigenbaum; Serge Sharoff; Reinhard Rapp
This paper presents the BUCC 2017 shared task on parallel sentence extraction from comparable corpora. It recalls the design of the datasets, presents their final construction and statistics, and describes the methods used to evaluate system results. 13 runs were submitted to the shared task by 4 teams, covering three of the four proposed language pairs: French-English (7 runs), German-English (3 runs), and Chinese-English (3 runs). The best F-scores as measured against the gold standard were 0.84 (German-English), 0.80 (French-English), and 0.43 (Chinese-English). Because of the design of the dataset, in which not all gold parallel sentence pairs are known, these are only minimum values. We manually examined a small sample of the false negative sentence pairs for the most precise French-English runs and estimated the number of parallel sentence pairs not yet in the provided gold standard. Adding them to the gold standard leads to revised estimates for the French-English F-scores of at most +1.5pt. This suggests that the BUCC 2017 datasets provide a reasonable approximate evaluation of the parallel sentence spotting task.
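A minimal sketch of scoring extracted sentence pairs against a gold standard with precision, recall and F1, treating each pair as a (source ID, target ID) tuple. The IDs below are hypothetical. The second call illustrates why scores against an incomplete gold standard are only minimum values: a correct pair missing from the gold standard counts as a false positive until it is added.

```python
def evaluate(predicted, gold):
    """Precision, recall and F1 of predicted sentence pairs against gold pairs."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical pair IDs.
system = {("fr-001", "en-001"), ("fr-002", "en-005")}
gold = {("fr-001", "en-001"), ("fr-003", "en-003")}
print(evaluate(system, gold))  # (0.5, 0.5, 0.5)

# Suppose manual inspection shows fr-002/en-005 is genuinely parallel but was missing from the gold standard.
print(evaluate(system, gold | {("fr-002", "en-005")}))
```

Adding verified pairs can only raise precision for the runs that found them, which is why the original scores act as lower bounds.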
Natural Language Engineering | 2016
Reinhard Rapp; Serge Sharoff; Pierre Zweigenbaum
This paper highlights some of the recent developments in the field of machine translation using comparable corpora. We start by updating previous definitions of comparable corpora and then look at bilingual versions of continuous vector space models. Recently, neural networks have been used to obtain low-dimensional latent context representations, often called word embeddings. These promising new techniques can be applied not only to parallel but also to comparable corpora. Subsequent sections of the paper discuss work specifically targeting machine translation using comparable corpora, as well as work dealing with the extraction of parallel segments from comparable corpora. Finally, we give an overview of the design and the results of a recent shared task on measuring document comparability across languages.
Meeting of the Association for Computational Linguistics | 2015
Serge Sharoff; Pierre Zweigenbaum; Reinhard Rapp
We summarise the organisation and results of the first shared task aimed at detecting the most similar texts in a large multilingual collection. The shared task dataset was based on Wikipedia dumps with interlanguage links, further filtered to ensure the comparability of the paired articles. The eleven system runs we received were evaluated using the TREC evaluation metrics.
1 Task description
Parallel corpora of original texts with their translations have provided the basis for multilingual NLP applications since the beginning of the 1990s. The relative scarcity of such resources led to greater attention to comparable (i.e., less parallel) resources as a way to mine information about possible translations. Many studies have been produced within the paradigm of comparable corpora, including publications in