Publication


Featured research published by Hrafn Loftsson.


Nordic Journal of Linguistics | 2008

Tagging Icelandic text: A linguistic rule-based approach

Hrafn Loftsson

The Icelandic language is a morphologically complex language, for which a large tagset has been created. This paper describes the design of a linguistic rule-based system for part-of-speech tagging Icelandic text. The system contains two main components: a disambiguator, IceTagger, and an unknown word guesser, IceMorphy. IceTagger uses a small number of local elimination rules along with a global heuristics component. The heuristics guess the functional roles of the words in a sentence, mark prepositional phrases, and use the acquired knowledge to force feature agreement where appropriate. IceMorphy is used for guessing the tag profile for unknown words and for automatically filling tag profile gaps in the lexicon. Evaluation shows that IceTagger achieves 91.54% accuracy, a substantial improvement on the highest accuracy, 90.44%, obtained using three state-of-the-art data-driven taggers. Furthermore, the accuracy increases to 92.95% by using IceTagger along with two data-driven taggers in a simple voting scheme. The development time of the tagging system was only 7 man-months, which can be considered a short development period for a linguistic rule-based system.
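
The "simple voting scheme" mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the toy tags, and the tie-breaking rule (prefer the tagger listed first, assumed to be the most trusted) are all assumptions made for illustration.

```python
from collections import Counter

def vote(tags):
    """Pick one tag for a single token from several taggers' proposals.

    `tags` is ordered by tagger priority (most trusted first). Each tagger
    gets one vote; ties go to the highest-priority tagger among the tied
    tags (an assumed tie-breaking rule).
    """
    counts = Counter(tags)
    best = max(counts.values())
    for tag in tags:
        if counts[tag] == best:
            return tag

def combine(sequences):
    """Combine per-token tag sequences from several taggers by voting."""
    return [vote(list(column)) for column in zip(*sequences)]

# Toy example with three hypothetical taggers and made-up tags.
tagger_a = ["N", "V", "ADJ"]
tagger_b = ["N", "N", "ADV"]
tagger_c = ["V", "N", "DET"]
combined = combine([tagger_a, tagger_b, tagger_c])
```

In the toy example the first two tokens are decided by majority, and the third, where all three taggers disagree, falls back to the first tagger's proposal.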


international conference natural language processing | 2008

A Mixed Method Lemmatization Algorithm Using a Hierarchy of Linguistic Identities (HOLI)

Anton Karl Ingason; Sigrún Helgadóttir; Hrafn Loftsson; Eiríkur Rögnvaldsson

We present a new mixed method lemmatizer for Icelandic, Lemmald, which achieves good performance by relying on IceTagger [1] for tagging and The Icelandic Frequency Dictionary [2] corpus for training. We combine the advantages of data-driven machine learning with linguistic insights to maximize performance. To achieve this, we make use of a novel approach: Hierarchy of Linguistic Identities (HOLI), which involves organizing features and feature structures for the machine learning based on linguistic knowledge. Accuracy of the lemmatization is further improved using an add-on which connects to the Database of Modern Icelandic Inflections [3]. Given correct tagging, our system lemmatizes Icelandic text with an accuracy of 99.55%. We believe our method can be fruitfully adapted to other morphologically rich languages.


meeting of the association for computational linguistics | 2009

Correcting a POS-Tagged Corpus Using Three Complementary Methods

Hrafn Loftsson

The quality of the part-of-speech (PoS) annotation in a corpus is crucial for the development of PoS taggers. In this paper, we experiment with three complementary methods for automatically detecting errors in the PoS annotation for the Icelandic Frequency Dictionary corpus. The first two methods are language independent and we argue that the third method can be adapted to other morphologically complex languages. Once possible errors have been detected, we examine each error candidate and hand-correct the corresponding PoS tag if necessary. Overall, based on the three methods, we hand-correct the PoS tagging of 1,334 tokens (0.23% of the tokens) in the corpus. Furthermore, we re-evaluate existing state-of-the-art PoS taggers on Icelandic text using the corrected corpus.


language resources and evaluation | 2007

Tagging Icelandic text: an experiment with integrations and combinations of taggers

Hrafn Loftsson

We use integrations and combinations of taggers to improve the tagging accuracy of Icelandic text. The accuracy of the best performing integrated tagger, which consists of our linguistic rule-based tagger for initial disambiguation and a trigram tagger for full disambiguation, is 91.80%. Combining five different taggers, using simple voting, results in 93.34% accuracy. By adding two linguistically motivated rules to the combined tagger, we obtain an accuracy of 93.48%. This method reduces the error rate by 20.5%, with respect to the best performing tagger in the combination pool.
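
The reported error-rate reduction can be checked with a short calculation. The sketch below assumes that the "best performing tagger in the combination pool" is the integrated tagger with 91.80% accuracy; the abstract does not state this explicitly, and the function name is hypothetical.

```python
def error_rate_reduction(baseline_acc, improved_acc):
    """Relative error-rate reduction in percent, given accuracies in percent."""
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return 100.0 * (baseline_err - improved_err) / baseline_err

# Assumed baseline: 91.80% (best integrated tagger); improved: 93.48%.
reduction = error_rate_reduction(91.80, 93.48)
print(round(reduction, 1))  # 20.5
```

Under that assumption the error rate falls from 8.20% to 6.52%, which matches the 20.5% reduction reported in the abstract.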


north american chapter of the association for computational linguistics | 2007

Tagging Icelandic Text using a Linguistic and a Statistical Tagger

Hrafn Loftsson

We describe our linguistic rule-based tagger IceTagger, and compare its tagging accuracy to the TnT tagger, a state-of-the-art statistical tagger, when tagging Icelandic, a morphologically complex language. Evaluation shows that the average tagging accuracy is 91.54% and 90.44%, obtained by IceTagger and TnT, respectively. When tag profile gaps in the lexicon, used by the TnT tagger, are filled with tags produced by our morphological analyser IceMorphy, TnT's tagging accuracy increases to 91.18%.


technical symposium on computer science education | 2003

International perspective of women and computer science

Elizabeth S. Adams; Orit Hazzan; Hrafn Loftsson; Alison Young

The topic of women in computer science has recently been getting more and more attention. The special issue of the SIGCSE Bulletin inroads (Volume 34, Number 2, published in June 2002) is one of the milestones that indicate this trend. However, a review of this special issue reveals that though the topic is examined from different angles, only one paper addresses it from an international point of view. By presenting statements derived from four countries on four different continents, our panel aims to highlight the topic from this multinational perspective. Throughout the discussion with the audience we hope to identify common interests and to check whether an international agenda with respect to the topic can be formulated.


language resources and evaluation | 2014

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014)

Nicoletta Calzolari; Khalid Choukri; Thierry Declerck; Hrafn Loftsson; Bente Maegaard; Joseph Mariani; Asunción Moreno; J.E.J.M. Odijk; Stelios Piperidis


conference of the international speech communication association | 2007

IceNLP: A Natural Language Processing Toolkit for Icelandic

Hrafn Loftsson; Eiríkur Rögnvaldsson


NODALIDA | 2007

IceParser: An Incremental Finite-State Parser for Icelandic

Hrafn Loftsson; Eiríkur Rögnvaldsson


NODALIDA | 2009

Improving the PoS tagging accuracy of Icelandic text

Hrafn Loftsson; Ida Kramarczyk; Sigrún Helgadóttir; Eiríkur Rögnvaldsson
