Hrafn Loftsson
Reykjavík University
Publication
Featured research published by Hrafn Loftsson.
Nordic Journal of Linguistics | 2008
Hrafn Loftsson
The Icelandic language is a morphologically complex language, for which a large tagset has been created. This paper describes the design of a linguistic rule-based system for part-of-speech tagging Icelandic text. The system contains two main components: a disambiguator, IceTagger, and an unknown word guesser, IceMorphy. IceTagger uses a small number of local elimination rules along with a global heuristics component. The heuristics guess the functional roles of the words in a sentence, mark prepositional phrases, and use the acquired knowledge to force feature agreement where appropriate. IceMorphy is used for guessing the tag profile for unknown words and for automatically filling tag profile gaps in the lexicon. Evaluation shows that IceTagger achieves 91.54% accuracy, a substantial improvement on the highest accuracy, 90.44%, obtained using three state-of-the-art data-driven taggers. Furthermore, the accuracy increases to 92.95% by using IceTagger along with two data-driven taggers in a simple voting scheme. The development time of the tagging system was only 7 man-months, which can be considered a short development period for a linguistic rule-based system.
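The simple voting scheme mentioned in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `vote`, the placeholder tags, and the tie-breaking rule (prefer the earliest tagger in the list) are assumptions made for illustration only.

```python
from collections import Counter


def vote(tag_sequences):
    """Combine per-token tags from several taggers by simple majority voting.

    tag_sequences: list of equal-length tag lists, one list per tagger.
    Assumption: ties are broken in favour of the earliest tagger in the list.
    """
    combined = []
    for token_tags in zip(*tag_sequences):
        counts = Counter(token_tags)
        best = max(counts.values())
        # Pick the first proposal (in tagger order) that has the top count.
        combined.append(next(t for t in token_tags if counts[t] == best))
    return combined


# Hypothetical output of three taggers for a three-token sentence.
tagger_a = ["N", "V", "A"]
tagger_b = ["N", "P", "D"]
tagger_c = ["X", "P", "E"]
result = vote([tagger_a, tagger_b, tagger_c])
```

Listing the tagger with the highest standalone accuracy first means its proposal wins whenever all taggers disagree, which is one common way to resolve ties.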
international conference natural language processing | 2008
Anton Karl Ingason; Sigrún Helgadóttir; Hrafn Loftsson; Eiríkur Rögnvaldsson
We present a new mixed method lemmatizer for Icelandic, Lemmald, which achieves good performance by relying on IceTagger [1] for tagging and The Icelandic Frequency Dictionary [2] corpus for training. We combine the advantages of data-driven machine learning with linguistic insights to maximize performance. To achieve this, we make use of a novel approach: Hierarchy of Linguistic Identities (HOLI), which involves organizing features and feature structures for the machine learning based on linguistic knowledge. Accuracy of the lemmatization is further improved using an add-on which connects to the Database of Modern Icelandic Inflections [3]. Given correct tagging, our system lemmatizes Icelandic text with an accuracy of 99.55%. We believe our method can be fruitfully adapted to other morphologically rich languages.
meeting of the association for computational linguistics | 2009
Hrafn Loftsson
The quality of the part-of-speech (PoS) annotation in a corpus is crucial for the development of PoS taggers. In this paper, we experiment with three complementary methods for automatically detecting errors in the PoS annotation for the Icelandic Frequency Dictionary corpus. The first two methods are language independent and we argue that the third method can be adapted to other morphologically complex languages. Once possible errors have been detected, we examine each error candidate and hand-correct the corresponding PoS tag if necessary. Overall, based on the three methods, we hand-correct the PoS tagging of 1,334 tokens (0.23% of the tokens) in the corpus. Furthermore, we re-evaluate existing state-of-the-art PoS taggers on Icelandic text using the corrected corpus.
language resources and evaluation | 2007
Hrafn Loftsson
We use integrations and combinations of taggers to improve the tagging accuracy of Icelandic text. The accuracy of the best performing integrated tagger, which consists of our linguistic rule-based tagger for initial disambiguation and a trigram tagger for full disambiguation, is 91.80%. Combining five different taggers, using simple voting, results in 93.34% accuracy. By adding two linguistically motivated rules to the combined tagger, we obtain an accuracy of 93.48%. This method reduces the error rate by 20.5%, with respect to the best performing tagger in the combination pool.
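The 20.5% error rate reduction quoted above can be reproduced with a short calculation. The sketch assumes that the "best performing tagger in the combination pool" is the integrated tagger with 91.80% accuracy; the abstract does not state this explicitly, but the arithmetic is consistent with it. The function name `error_rate_reduction` is hypothetical.

```python
def error_rate_reduction(baseline_acc, improved_acc):
    """Relative error rate reduction, in percent, between two accuracies (in percent)."""
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return 100.0 * (baseline_err - improved_err) / baseline_err


# Assumed baseline: 91.80% (error 8.20%); combined tagger with rules: 93.48% (error 6.52%).
reduction = error_rate_reduction(91.80, 93.48)
print(round(reduction, 1))
```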
north american chapter of the association for computational linguistics | 2007
Hrafn Loftsson
We describe our linguistic rule-based tagger IceTagger, and compare its tagging accuracy to the TnT tagger, a state-of-the-art statistical tagger, when tagging Icelandic, a morphologically complex language. Evaluation shows that the average tagging accuracy is 91.54% and 90.44%, obtained by IceTagger and TnT, respectively. When tag profile gaps in the lexicon, used by the TnT tagger, are filled with tags produced by our morphological analyser IceMorphy, TnT's tagging accuracy increases to 91.18%.
technical symposium on computer science education | 2003
Elizabeth S. Adams; Orit Hazzan; Hrafn Loftsson; Alison Young
The topic of women in computer science has recently been getting more and more attention. The special issue of the SIGCSE Bulletin inroads (Volume 34, Number 2, published in June 2002) is one of the milestones that indicate this trend. However, a review of this special issue reveals that though the topic is examined from different angles, only one paper addresses it from an international point of view. By presenting statements derived from four countries on four different continents, our panel aims to highlight the topic from this multinational perspective. Throughout the discussion with the audience we hope to identify common interests and to check whether an international agenda with respect to the topic can be formulated.
language resources and evaluation | 2014
Nicoletta Calzolari; Khalid Choukri; Thierry Declerck; Hrafn Loftsson; Bente Maegaard; Joseph Mariani; Asunción Moreno; J.E.J.M. Odijk; Stelios Piperidis
conference of the international speech communication association | 2007
Hrafn Loftsson; Eiríkur Rögnvaldsson
NODALIDA | 2007
Hrafn Loftsson; Eiríkur Rögnvaldsson
NODALIDA | 2009
Hrafn Loftsson; Ida Kramarczyk; Sigrún Helgadóttir; Eiríkur Rögnvaldsson