Magnus Merkel
Linköping University
Publication
Featured research published by Magnus Merkel.
Meeting of the Association for Computational Linguistics | 1998
Lars Ahrenberg; Mikael Andersson; Magnus Merkel
We present an algorithm for bilingual word alignment that extends previous work by treating multi-word candidates on a par with single words and by combining some simple assumptions about the translation process to capture alignments for low-frequency words. Like most other alignment algorithms, it uses cooccurrence statistics as a basis, but it differs in the assumptions it makes about the translation process. The algorithm has been implemented in a modular system that allows the user to experiment with different combinations and variants of these assumptions. We give performance results from two evaluations, which compare well with results reported in the literature.
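The abstract notes that the algorithm, like most aligners, takes cooccurrence statistics as its basis. As a rough illustration of that starting point only (not the authors' algorithm, which adds multi-word candidates and further assumptions about the translation process), the Python sketch below scores source/target word pairs with the Dice coefficient over sentence-aligned text and then links the best-scoring pairs greedily, using each word at most once. The function names, whitespace tokenisation, the 0.5 threshold and the toy English-French bitext are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def dice_scores(bitext):
    """Dice coefficient for every source/target word pair that cooccurs in a sentence pair.

    bitext is a list of (source_sentence, target_sentence) strings; tokens are
    split on whitespace (a simplifying assumption).
    """
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src_sent, tgt_sent in bitext:
        src_words = set(src_sent.split())
        tgt_words = set(tgt_sent.split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        pair_freq.update(product(src_words, tgt_words))
    # Dice = 2 * joint count / (source count + target count)
    return {
        (s, t): 2 * c / (src_freq[s] + tgt_freq[t])
        for (s, t), c in pair_freq.items()
    }


def greedy_links(scores, threshold=0.5):
    """Link the highest-scoring pairs first; each word may take part in one link only."""
    used_src, used_tgt, links = set(), set(), []
    # Sort by descending score, breaking ties alphabetically so the result is deterministic.
    for (s, t), score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        if score < threshold:
            break
        if s not in used_src and t not in used_tgt:
            links.append((s, t))
            used_src.add(s)
            used_tgt.add(t)
    return links


if __name__ == "__main__":
    toy = [("the house", "la maison"), ("the car", "la voiture"), ("a house", "une maison")]
    print(greedy_links(dice_scores(toy)))
```

Plain cooccurrence scores like these are weakest for low-frequency words, where a single shared sentence yields a perfect score; that is the gap the paper's additional translation assumptions aim to close.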
Archive | 2000
Lars Ahrenberg; Magnus Merkel
Addresses themes such as techniques for the alignment of parallel texts at various levels such as sentence, clause, and word; the use of parallel texts in fields as diverse as translation, lexicogr ...
Journal of Biomedical Informatics | 2009
Louise Deléger; Magnus Merkel; Pierre Zweigenbaum
Developing international multilingual terminologies is a time-consuming process. We present a methodology that aims to ease this process by automatically acquiring new translations of medical terms based on word alignment in parallel text corpora, and we test it on English and French. After collecting a parallel English-French corpus, we detected French translations of English terms from three terminologies: MeSH, SNOMED CT and the MedlinePlus Health Topics. For these three terminologies we obtained, respectively, 74.8%, 77.8% and 76.3% linguistically correct new translations. A sample of the MeSH translations was submitted to expert review, and 61.5% were deemed desirable additions to the French MeSH. In conclusion, we obtained good-quality new translations, which underlines the suitability of alignment in text corpora for helping to translate terminologies. Our method may be applied to different European languages and provides a methodological framework that may be used with different processing tools.
Conference of the European Chapter of the Association for Computational Linguistics | 2003
Lars Ahrenberg; Magnus Merkel; Michael Petterstedt
In this paper we report on ongoing work on developing an interactive word alignment environment that will help a user quickly produce accurate, full-coverage word alignments in bitexts for different language engineering tasks, such as MT lexicons and gold standards for evaluation. The system uses a graphical interface, static and dynamic resources, and machine learning techniques. We also sketch how the system is being integrated with an automatic word aligner.
BMC Medical Informatics and Decision Making | 2006
Mikael Nyström; Magnus Merkel; Lars Ahrenberg; Pierre Zweigenbaum; Håkan Petersson; Hans Åhlfeldt
Background: This paper reports on a parallel collection of rubrics from the medical terminology systems ICD-10, ICF, MeSH, NCSP and KSH97-P and its use for the semi-automatic creation of an English-Swedish dictionary of medical terminology. The methods presented are relevant for many West European language pairs other than English-Swedish. Methods: The medical terminology systems were collected in electronic format in both English and Swedish, and the rubrics were extracted as parallel language pairs. Initially, interactive word alignment was used to create training data from a sample. The training data were then used in automatic word alignment to generate candidate term pairs. The last step was manual verification of the term pair candidates. Results: A dictionary of 31,000 verified entries was created in less than three man-weeks, with considerably less time and effort than a manual approach would require, and without compromising quality. As a side effect of our work, we found 40 different translation problems in the terminology systems; these results indicate the power of the method for finding inconsistencies in terminology translations. We also report on some factors that may make the process of dictionary creation with similar tools even more expedient. Finally, the contribution is discussed in relation to other ongoing efforts to construct medical lexicons for non-English languages. Conclusion: In three man-weeks we were able to produce a medical English-Swedish dictionary of 31,000 entries and also found hidden translation errors in the medical terminology systems used.
BMC Medical Informatics and Decision Making | 2007
Mikael Nyström; Magnus Merkel; Håkan Petersson; Hans Åhlfeldt
Background: Automatic word alignment of parallel texts with the same content in different languages is used, among other things, to generate dictionaries for new translations. The quality of the generated word alignment depends on the quality of the input resources. In this paper we report on automatic word alignment of the English and Swedish versions of the medical terminology systems ICD-10, ICF, NCSP, KSH97-P and parts of MeSH, and on how the terminology systems and types of resources influence the quality. Methods: We automatically word aligned the terminology systems using static resources, such as dictionaries; statistical resources, such as statistically derived dictionaries; and training resources, which were generated from manual word alignment. We varied which parts of the terminology systems we used to generate the resources, which parts we word aligned, and which types of resources we used in the alignment process, in order to explore how the different terminology systems and resources influence recall and precision. After the analysis, we used the best configuration of the automatic word alignment to generate candidate term pairs. We then manually verified the candidate term pairs and included the correct pairs in an English-Swedish dictionary. Results: The results indicate that more resources and resource types give better results, but the size of the parts used to generate the resources only partly affects the quality. The most generally useful resources were generated from ICD-10, and resources generated from MeSH were not as general as other resources. Systematic inter-language differences in the structure of the terminology system rubrics make the rubrics harder to align. Manually created training resources give nearly as good results as a union of static resources, statistical resources and training resources, and noticeably better results than a union of static resources and statistical resources. The verified English-Swedish dictionary contains 24,000 term pairs in base forms. Conclusion: More resources give better results in the automatic word alignment, but some resources give only small improvements. The most important type of resource is training, and the most general resources were generated from ICD-10.
Archive | 2002
Magnus Merkel; Mikael Andersson; Lars Ahrenberg
In this paper an approach to using gold standards to evaluate word alignment systems is described. To make the process of creating gold standards easier, an interactive tool called the PLUG Link Annotator is presented, along with the Link Scorer, which automatically evaluates the output of a word alignment system against the gold standard. It is argued that using reference data in this manner has several advantages, the most important being consistency in evaluation criteria and savings in time, because the reference data need to be constructed only once but can be applied many times.
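One common way to compare a system's output with a gold standard is precision, recall and F-measure over alignment links. The Python sketch below assumes each link is a (source token index, target token index) pair and that all links count equally; the actual PLUG Link Scorer may distinguish link types or use other measures, so the function name and link representation here are hypothetical.

```python
def score_links(predicted, gold):
    """Precision, recall and F1 of predicted alignment links against a gold standard.

    Links are assumed to be hashable (source_index, target_index) pairs.
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)  # links found in both system output and gold standard
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    gold = {(0, 0), (1, 1), (2, 2), (2, 3)}  # built once, reused for every system
    system_a = [(0, 0), (1, 1), (1, 2)]
    system_b = [(0, 0), (2, 2), (2, 3)]
    for name, output in [("A", system_a), ("B", system_b)]:
        print(name, score_links(output, gold))
```

The loop reflects the argument in the abstract: once the gold set exists, scoring another system or another configuration costs nothing beyond one function call.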
International Conference on Computational Linguistics | 1988
Magnus Merkel
In this paper interpretation principles for simple and complex frame-adverbial expressions are presented. Central to these principles is a distinction between phases and periods together with the temporal hierarchy, where multiple scales of time and relations can be expressed. A system, CLOCKWISE, has been implemented which interprets Swedish temporal expressions according to the principles outlined in the paper.
Mathematics in Computer Science | 2013
Lars Eldén; Magnus Merkel; Lars Ahrenberg; Martin Fagerlund
Using the technique of semantic mirroring, a graph is obtained that represents words and their translations from a parallel corpus or a bilingual lexicon. The connectedness of the graph holds information about the semantic relations of words that occur in the translations. Spectral graph theory is used to partition the graph, which leads to a grouping of the words in different clusters. We illustrate the method using a small sample of seed words from a lexicon of Swedish and English adjectives and discuss its application to computational lexical semantics and lexicography.
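The spectral partitioning step can be illustrated on a small word graph: the signs of the Fiedler vector (the eigenvector belonging to the second-smallest eigenvalue of the graph Laplacian L = D - A) split the graph into two groups. This Python sketch covers only that step, not semantic mirroring itself, and the paper may use a different Laplacian or clustering procedure. The toy graph of English adjectives, with one invented bridging edge, is hypothetical, and plain power iteration stands in for a proper eigensolver so the example needs only the standard library.

```python
import math
import random


def laplacian(n, edges):
    """Unnormalised graph Laplacian L = D - A for an undirected, unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def fiedler_vector(L, iters=500, seed=0):
    """Approximate the eigenvector of the second-smallest eigenvalue of L.

    Power iteration on M = cI - L, where c = 2 * max degree bounds L's largest
    eigenvalue, so M's eigenvalues are non-negative and ordered inversely to L's.
    The constant vector (L's eigenvalue 0) is projected out at every step.
    """
    n = len(L)
    c = 2.0 * max(L[i][i] for i in range(n))
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        w = [c * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def spectral_bipartition(words, edges):
    """Split words into two clusters by the sign of the Fiedler vector."""
    f = fiedler_vector(laplacian(len(words), edges))
    left = {w for w, x in zip(words, f) if x < 0}
    return left, set(words) - left


if __name__ == "__main__":
    words = ["big", "large", "huge", "small", "little", "tiny"]
    # Two tightly connected groups plus one hypothetical bridging edge (huge-small).
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    print(spectral_bipartition(words, edges))
```

Which cluster ends up as "left" depends on the arbitrary sign of the eigenvector; only the grouping itself is meaningful.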
Archive | 1999
Magnus Merkel