Publication


Featured research published by Stig-Arne Grönroos.


Conference of the European Chapter of the Association for Computational Linguistics | 2014

Morfessor 2.0: Toolkit for statistical morphological segmentation

Peter Smit; Sami Virpioja; Stig-Arne Grönroos; Mikko Kurimo

Morfessor is a family of probabilistic machine learning methods for finding the morphological segmentation from raw text data. Recent developments include semi-supervised methods for utilizing annotated data. Morfessor 2.0 is a rewrite of the original, widely used Morfessor 1.0 software, with well-documented command-line tools and a library interface. It includes new features such as semi-supervised learning, online training, and integrated evaluation code.


Computational Linguistics | 2016

A comparative study of minimally supervised morphological segmentation

Teemu Ruokolainen; Oskar Kohonen; Kairit Sirts; Stig-Arne Grönroos; Mikko Kurimo; Sami Virpioja

This article presents a comparative study of a subfield of morphology learning referred to as minimally supervised morphological segmentation. In morphological segmentation, word forms are segmented into morphs, the surface forms of morphemes. In the minimally supervised data-driven learning setting, segmentation models are learned from a small number of manually annotated word forms and a large set of unannotated word forms. In addition to providing a literature survey on published methods, we present an in-depth empirical comparison on three diverse model families, including a detailed error analysis. Based on the literature survey, we conclude that the existing methodology contains substantial work on generative morph lexicon-based approaches and methods based on discriminative boundary detection. As for which approach has been more successful, both the previous work and the empirical evaluation presented here strongly imply that the current state of the art is yielded by the discriminative boundary detection methodology.
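The boundary detection view described in the abstract above can be illustrated by recasting a segmentation as one label per character. The following is a minimal sketch, assuming a binary label scheme (1 where a morph starts, 0 elsewhere); the function names are hypothetical, and the methods compared in the article may use richer label sets and learned classifiers that are not shown here.

```python
def to_boundary_labels(morphs):
    """Encode a segmented word as per-character labels.

    Assumed scheme (illustrative only): 1 marks the first character
    of a morph, 0 marks every other character. Morphs must be non-empty.
    """
    labels = []
    for morph in morphs:
        labels.extend([1] + [0] * (len(morph) - 1))
    return labels


def from_boundary_labels(word, labels):
    """Decode per-character labels back into a list of morphs."""
    morphs = []
    for char, label in zip(word, labels):
        if label == 1 or not morphs:
            # Start a new morph at a labeled boundary (or at the word start).
            morphs.append(char)
        else:
            morphs[-1] += char
    return morphs


if __name__ == "__main__":
    labels = to_boundary_labels(["un", "help", "ful"])
    print(labels)
    print(from_boundary_labels("unhelpful", labels))
```

With this encoding, a discriminative model only has to predict one label per character, and the decoded labels give the segmentation back.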


Workshop on Statistical Machine Translation | 2015

LeBLEU: N-gram-based Translation Evaluation Score for Morphologically Complex Languages

Sami Virpioja; Stig-Arne Grönroos

This paper describes the LeBLEU evaluation score for machine translation, submitted to the WMT15 Metrics Shared Task. LeBLEU extends the popular BLEU score to consider fuzzy matches between word n-grams. While there are several variants of BLEU that allow non-exact matches between words, either by character-based distance measures or by morphological preprocessing, none of them use fuzzy comparison between longer chunks of text. The results on WMT data sets show that fuzzy n-gram matching improves correlation with human evaluation, especially for highly compounding languages.
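The idea of fuzzy matching between word n-grams can be sketched as follows. This is an illustration only, not the published LeBLEU formula: it assumes a similarity of one minus the normalized Levenshtein distance, a hypothetical threshold of 0.4, and it omits clipping, the brevity penalty, and the combination of n-gram orders. All function names are made up for the example.

```python
def levenshtein(a, b):
    """Character-level edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]: one minus the length-normalized edit distance."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def ngrams(tokens, n):
    """Word n-grams, joined into strings so they can be compared by characters."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def fuzzy_ngram_precision(hypothesis, reference, n, threshold=0.4):
    """Average best-match similarity of hypothesis n-grams (assumed scoring)."""
    hyp = ngrams(hypothesis.split(), n)
    ref = ngrams(reference.split(), n)
    if not hyp or not ref:
        return 0.0
    total = 0.0
    for h in hyp:
        best = max(similarity(h, r) for r in ref)
        if best >= threshold:  # matches below the threshold count as misses
            total += best
    return total / len(hyp)


if __name__ == "__main__":
    print(similarity("railway", "railwaystation"))
    print(fuzzy_ngram_precision("the railway station", "the railwaystation", 2))
```

Under exact matching, a compound written as one word shares no n-grams with its split form; here the pair still receives partial credit, which is the effect the abstract attributes to fuzzy matching.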


Workshop on Statistical Machine Translation | 2015

Tuning Phrase-Based Segmented Translation for a Morphologically Complex Target Language

Stig-Arne Grönroos; Sami Virpioja; Mikko Kurimo

This article describes the Aalto University entry to the English-to-Finnish shared translation task in WMT 2015. The system participates in the constrained condition, but in addition we impose some further constraints, using no language-specific resources beyond those provided in the task. We use a morphological segmenter, Morfessor FlatCat, but train and tune it in an unsupervised manner. The system could thus be used for another language pair with a morphologically complex target language, without needing modification or additional resources.


Septentrio Conference Series | 2015

Low-Resource Active Learning of North Sámi Morphological Segmentation

Stig-Arne Grönroos; Kristiina Jokinen; Katri Hiovain; Mikko Kurimo; Sami Virpioja

Many Uralic languages have a rich morphological structure, but lack the tools for morphological analysis needed for efficient language processing. While creating a high-quality morphological analyzer requires a significant amount of expert labor, data-driven approaches may provide sufficient quality for many applications. We study how to create a statistical model for morphological segmentation of the North Sámi language with a large unannotated corpus and a small number of human-annotated word forms selected using an active learning approach. For statistical learning, we use the semi-supervised Morfessor Baseline and FlatCat methods. After annotating 237 words with our active learning setup, we improve morph boundary recall by over 20% with no loss of precision.
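Morph boundary precision and recall, the measures reported in the abstract above, can be computed by comparing the boundary positions of predicted and gold segmentations. The sketch below assumes micro-averaging over all words and a single gold analysis per word; the evaluation used in the paper may differ (for example, per-word averaging or alternative gold analyses), and the function names are hypothetical.

```python
def boundary_positions(morphs):
    """Return the set of internal boundary offsets of a segmented word."""
    positions = set()
    offset = 0
    for morph in morphs[:-1]:  # no boundary after the final morph
        offset += len(morph)
        positions.add(offset)
    return positions


def boundary_precision_recall(predicted, gold):
    """Micro-averaged boundary precision and recall over aligned word lists."""
    true_pos = n_pred = n_gold = 0
    for pred_morphs, gold_morphs in zip(predicted, gold):
        pred_b = boundary_positions(pred_morphs)
        gold_b = boundary_positions(gold_morphs)
        true_pos += len(pred_b & gold_b)
        n_pred += len(pred_b)
        n_gold += len(gold_b)
    # By convention here, an empty denominator yields a perfect score.
    precision = true_pos / n_pred if n_pred else 1.0
    recall = true_pos / n_gold if n_gold else 1.0
    return precision, recall


if __name__ == "__main__":
    predicted = [["un", "helpful"], ["cat", "s"]]
    gold = [["un", "help", "ful"], ["cat", "s"]]
    print(boundary_precision_recall(predicted, gold))
```

In the toy example, every predicted boundary is correct but one gold boundary is missed, so precision is perfect while recall is not. This is the kind of gap that additional annotated words aim to close.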


Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers | 2016

Hybrid Morphological Segmentation for Phrase-Based Machine Translation

Stig-Arne Grönroos; Sami Virpioja; Mikko Kurimo

This article describes the Aalto University entry to the English-to-Finnish news translation shared task in WMT 2016. Our segmentation method combines the strengths of rule-based and unsupervised morphology. We also attempt to correct errors in the boundary markings by post-processing with a neural morph boundary predictor.


Archive | 2013

Morfessor 2.0: Python Implementation and Extensions for Morfessor Baseline

Sami Virpioja; Peter Smit; Stig-Arne Grönroos; Mikko Kurimo


International Conference on Computational Linguistics | 2014

Morfessor FlatCat: An HMM-Based Method for Unsupervised and Semi-Supervised Learning of Morphology

Stig-Arne Grönroos; Sami Virpioja; Peter Smit; Mikko Kurimo


Proceedings of the Second Conference on Machine Translation | 2017

Extending hybrid word-character neural machine translation with multi-task learning of morphological analysis

Stig-Arne Grönroos; Sami Virpioja; Mikko Kurimo


arXiv: Computation and Language | 2018

Cognate-aware morphological segmentation for multilingual neural translation

Stig-Arne Grönroos; Sami Virpioja; Mikko Kurimo

Collaboration


Stig-Arne Grönroos's collaborations.

Top Co-Authors


Kairit Sirts

Tallinn University of Technology
