Stig-Arne Grönroos
Helsinki University of Technology
Publication
Featured research published by Stig-Arne Grönroos.
conference of the european chapter of the association for computational linguistics | 2014
Peter Smit; Sami Virpioja; Stig-Arne Grönroos; Mikko Kurimo
Morfessor is a family of probabilistic machine learning methods for finding the morphological segmentation from raw text data. Recent developments include semi-supervised methods for utilizing annotated data. Morfessor 2.0 is a rewrite of the original, widely used Morfessor 1.0 software, with well-documented command-line tools and library interface. It includes new features such as semi-supervised learning, online training, and integrated evaluation code.
Computational Linguistics | 2016
Teemu Ruokolainen; Oskar Kohonen; Kairit Sirts; Stig-Arne Grönroos; Mikko Kurimo; Sami Virpioja
This article presents a comparative study of a subfield of morphology learning referred to as minimally supervised morphological segmentation. In morphological segmentation, word forms are segmented into morphs, the surface forms of morphemes. In the minimally supervised data-driven learning setting, segmentation models are learned from a small number of manually annotated word forms and a large set of unannotated word forms. In addition to providing a literature survey on published methods, we present an in-depth empirical comparison on three diverse model families, including a detailed error analysis. Based on the literature survey, we conclude that the existing methodology contains substantial work on generative morph lexicon-based approaches and methods based on discriminative boundary detection. As for which approach has been more successful, both the previous work and the empirical evaluation presented here strongly imply that the current state of the art is yielded by the discriminative boundary detection methodology.
workshop on statistical machine translation | 2015
Sami Virpioja; Stig-Arne Grönroos
This paper describes the LeBLEU evaluation score for machine translation, submitted to the WMT15 Metrics Shared Task. LeBLEU extends the popular BLEU score to consider fuzzy matches between word n-grams. While there are several variants of BLEU that allow non-exact matches between words, either by character-based distance measures or morphological preprocessing, none of them use fuzzy comparison between longer chunks of text. The results on WMT data sets show that fuzzy n-gram matching improves correlations to human evaluation, especially for highly compounding languages.
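The general idea of fuzzy n-gram matching can be illustrated with a minimal sketch. This is not the LeBLEU definition from the paper: the function names, the use of normalized Levenshtein distance as the similarity, and the best-match averaging are all assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 minus edit distance over the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def ngrams(tokens, n):
    """Word n-grams, each joined into one string so spaces count as characters."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def fuzzy_ngram_precision(hypothesis: str, reference: str, n: int = 2) -> float:
    """Hypothetical score: mean best similarity of each hypothesis n-gram
    against all reference n-grams (assumption, not the published metric)."""
    hyp = ngrams(hypothesis.split(), n)
    ref = ngrams(reference.split(), n)
    if not hyp or not ref:
        return 0.0
    return sum(max(similarity(h, r) for r in ref) for h in hyp) / len(hyp)
```

With exact n-gram matching, the hypothesis "the dish washer" shares no bigram with the reference "the dishwasher"; under this sketch it receives partial credit, which is the kind of case that matters for compounding languages.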
workshop on statistical machine translation | 2015
Stig-Arne Grönroos; Sami Virpioja; Mikko Kurimo
This article describes the Aalto University entry to the English-to-Finnish shared translation task in WMT 2015. The system participates in the constrained condition, but in addition we impose some further constraints, using no language-specific resources beyond those provided in the task. We use a morphological segmenter, Morfessor FlatCat, but train and tune it in an unsupervised manner. The system could thus be used for another language pair with a morphologically complex target language, without needing modification or additional resources.
Septentrio Conference Series | 2015
Stig-Arne Grönroos; Kristiina Jokinen; Katri Hiovain; Mikko Kurimo; Sami Virpioja
Many Uralic languages have a rich morphological structure, but lack tools of morphological analysis needed for efficient language processing. While creating a high-quality morphological analyzer requires a significant amount of expert labor, data-driven approaches may provide sufficient quality for many applications. We study how to create a statistical model for morphological segmentation of the North Sami language with a large unannotated corpus and a small amount of human-annotated word forms selected using an active learning approach. For statistical learning, we use the semi-supervised Morfessor Baseline and FlatCat methods. After annotating 237 words with our active learning setup, we improve morph boundary recall by over 20% with no loss of precision.
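Morph boundary precision and recall, the measures mentioned above, can be sketched as follows. This is a simplified illustration that assumes a single gold segmentation per word; the function names are hypothetical and the code is not taken from the paper or from the Morfessor evaluation tools.

```python
def boundaries(morphs):
    """Character offsets inside the word where one morph ends and the next begins."""
    positions, offset = set(), 0
    for morph in morphs[:-1]:
        offset += len(morph)
        positions.add(offset)
    return positions


def boundary_precision_recall(predicted, gold):
    """Micro-averaged boundary precision and recall over aligned word lists.

    predicted, gold: lists of segmentations, each a list of morph strings.
    Assumes one gold analysis per word (a simplification).
    """
    true_pos = n_pred = n_gold = 0
    for pred_morphs, gold_morphs in zip(predicted, gold):
        p, g = boundaries(pred_morphs), boundaries(gold_morphs)
        true_pos += len(p & g)
        n_pred += len(p)
        n_gold += len(g)
    precision = true_pos / n_pred if n_pred else 1.0
    recall = true_pos / n_gold if n_gold else 1.0
    return precision, recall
```

For example, predicting `["unlock", "ed"]` against the gold segmentation `["un", "lock", "ed"]` finds one of the two gold boundaries and proposes no wrong ones, so precision is perfect while recall is halved.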
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers | 2016
Stig-Arne Grönroos; Sami Virpioja; Mikko Kurimo
This article describes the Aalto University entry to the English-to-Finnish news translation shared task in WMT 2016. Our segmentation method combines the strengths of rule-based and unsupervised morphology. We also attempt to correct errors in the boundary markings by post-processing with a neural morph boundary predictor.
Archive | 2013
Sami Virpioja; Peter Smit; Stig-Arne Grönroos; Mikko Kurimo
international conference on computational linguistics | 2014
Stig-Arne Grönroos; Sami Virpioja; Peter Smit; Mikko Kurimo
Proceedings of the Second Conference on Machine Translation | 2017
Stig-Arne Grönroos; Sami Virpioja; Mikko Kurimo
arXiv: Computation and Language | 2018
Stig-Arne Grönroos; Sami Virpioja; Mikko Kurimo