Publication


Featured research published by Mikko Kurimo.


Computer Speech & Language | 2006

Unlimited Vocabulary Speech Recognition with Morph Language Models Applied to Finnish

Teemu Hirsimäki; Mathias Creutz; Vesa Siivola; Mikko Kurimo; Sami Virpioja; Janne Pylkkönen

In the speech recognition of highly inflecting or compounding languages, traditional word-based language modeling is problematic. As the number of distinct word forms can grow very large, it becomes difficult to train language models that are both effective and cover the words of the language well. In the literature, several methods have been proposed for basing language modeling on sub-word units instead of whole words. However, to our knowledge, considerable improvements in speech recognition performance have not been reported. In this article, we present a language-independent algorithm for discovering word fragments in an unsupervised manner from text. The algorithm uses the Minimum Description Length principle to find an inventory of word fragments that is compact but models the training text effectively. Language modeling and speech recognition experiments show that n-gram models built over these fragments perform better than n-gram models based on words. In two Finnish recognition tasks, relative error rate reductions between 12% and 31% are obtained. In addition, our experiments suggest that word fragments obtained using grammatical rules do not outperform the fragments discovered from text. We also present our recognition system and discuss how utilizing fragments instead of words affects the decoding process.
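
As a rough illustration of the Minimum Description Length idea used here, the sketch below computes a two-part cost for a candidate segmentation: the bits needed to write the fragment lexicon plus the bits needed to code the corpus as fragment tokens. The toy corpus, the fixed per-character cost, and the two candidate segmentations are assumptions for demonstration; this is not the actual Morfessor cost function or search.

```python
import math
from collections import Counter

def description_length(segmented_corpus, bits_per_char=5.0):
    """Two-part MDL cost of a segmentation: lexicon cost + corpus coding cost.

    segmented_corpus: list of word segmentations, each a list of fragments.
    Simplified illustration only, not the Morfessor cost function.
    """
    tokens = [frag for word in segmented_corpus for frag in word]
    counts = Counter(tokens)
    total = sum(counts.values())

    # Cost of writing each distinct fragment once in the lexicon.
    lexicon_cost = sum(len(frag) * bits_per_char for frag in counts)

    # Cost of coding the corpus as fragment tokens with their ML probabilities.
    corpus_cost = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_cost + corpus_cost

# Two candidate segmentations of the same tiny "corpus".
whole_words = [["talossa"], ["talossakin"], ["autossa"], ["autossakin"]]
fragments   = [["talo", "ssa"], ["talo", "ssa", "kin"],
               ["auto", "ssa"], ["auto", "ssa", "kin"]]

print("whole words:", round(description_length(whole_words), 1), "bits")
print("fragments  :", round(description_length(fragments), 1), "bits")
```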


IEEE Transactions on Speech and Audio Processing | 2005

SpeechFind: advances in spoken document retrieval for a National Gallery of the Spoken Word

John H. L. Hansen; Rongqing Huang; Bowen Zhou; Michael Seadle; John R. Deller; Aparna Gurijala; Mikko Kurimo; Pongtep Angkititrakul

In this study, we discuss a number of issues in audio stream phrase recognition for information retrieval for a new National Gallery of the Spoken Word (NGSW). NGSW is the first large-scale repository of its kind, consisting of speeches, news broadcasts, and recordings of historical content from the 20th century. We propose a system diagram and discuss critical tasks associated with effective audio information retrieval, including advanced audio segmentation, speech recognition model adaptation for acoustic background noise and speaker variability, and natural language processing for text query requests. A number of questions regarding copyright assessment, metadata construction, and digital watermarking must also be addressed for a sustainable audio collection of this magnitude. Our experimental online system, "SpeechFind", is presented, which allows for audio retrieval from a portion of the NGSW corpus. We discuss a number of research challenges in addressing the overall task of robust phrase searching in unrestricted audio corpora.


ACM Transactions on Speech and Language Processing | 2007

Morph-based speech recognition and modeling of out-of-vocabulary words across languages

Mathias Creutz; Teemu Hirsimäki; Mikko Kurimo; Antti Puurula; Janne Pylkkönen; Vesa Siivola; Matti Varjokallio; Ebru Arisoy; Murat Saraclar; Andreas Stolcke

We explore the use of morph-based language models in large-vocabulary continuous-speech recognition systems across four so-called morphologically rich languages: Finnish, Estonian, Turkish, and Egyptian Colloquial Arabic. The morphs are subword units discovered in an unsupervised, data-driven way using the Morfessor algorithm. By estimating n-gram language models over sequences of morphs instead of words, the quality of the language model is improved through better vocabulary coverage and reduced data sparsity. Standard word models suffer from high out-of-vocabulary (OOV) rates, whereas the morph models can recognize previously unseen word forms by concatenating morphs. It is shown that the morph models do perform fairly well on OOVs without compromising the recognition accuracy on in-vocabulary words. The Arabic experiment constitutes the only exception since here the standard word model outperforms the morph model. Differences in the datasets and the amount of data are discussed as a plausible explanation.
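
The way morph models cover previously unseen word forms can be illustrated with a small sketch: an inflected word absent from the word vocabulary may still decompose entirely into known morphs. The greedy longest-match segmentation and the toy Finnish-like vocabularies below are assumptions for illustration, not the segmentation procedure used in the actual systems.

```python
def greedy_segment(word, morph_lexicon):
    """Greedy longest-match segmentation of a word into known morphs.

    Returns None if some part of the word cannot be covered.
    (Toy stand-in for the segmentation used in the paper's systems.)
    """
    morphs, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):       # longest candidate first
            if word[i:j] in morph_lexicon:
                morphs.append(word[i:j])
                i = j
                break
        else:
            return None                         # uncoverable substring -> OOV
    return morphs

word_vocab  = {"talo", "talossa", "auto", "autossa"}
morph_vocab = {"talo", "auto", "ssa", "kin", "ni"}

for w in ["talossakin", "autossani"]:
    print(w, "in word vocab:", w in word_vocab,
          "| morph segmentation:", greedy_segment(w, morph_vocab))
```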


IEEE Transactions on Audio, Speech, and Language Processing | 2009

Importance of High-Order N-Gram Models in Morph-Based Speech Recognition

Teemu Hirsimäki; Janne Pylkkönen; Mikko Kurimo

Speech recognition systems trained for morphologically rich languages face the problem of vocabulary growth caused by prefixes, suffixes, inflections, and compound words. Solutions proposed in the literature include increasing the size of the vocabulary and segmenting words into morphs. However, in many cases, the methods have been tested only with low-order n-gram models or compared to word-based models that do not have very large vocabularies. In this paper, we study the importance of using high-order variable-length n-gram models when the language models are trained over morphs instead of whole words. Language models trained on a very large vocabulary are compared with models based on different morph segmentations. Speech recognition experiments are carried out on two highly inflecting and agglutinative languages, Finnish and Estonian. The results suggest that high-order models can be essential in morph-based speech recognition, even when lattices are generated for two-pass recognition. The analysis of recognition errors reveals that the high-order morph language models especially improve the recognition of previously unseen words.
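
A minimal sketch of the effect studied here: train morph n-gram models of increasing order on a toy corpus and compare held-out perplexity. Add-one smoothing and the synthetic morph sequences are simplifying assumptions; the paper's systems use variable-length models with more sophisticated smoothing.

```python
import math
from collections import Counter

def ngram_perplexity(train, test, n, vocab_size):
    """Per-token perplexity of an add-one-smoothed n-gram model over morphs.

    train/test: lists of morph sequences. A toy stand-in for the
    variable-length n-gram models used in the paper.
    """
    pad = ["<s>"] * (n - 1)
    ngrams, contexts = Counter(), Counter()
    for seq in train:
        s = pad + seq
        for i in range(n - 1, len(s)):
            ngrams[tuple(s[i - n + 1:i + 1])] += 1
            contexts[tuple(s[i - n + 1:i])] += 1

    logprob, tokens = 0.0, 0
    for seq in test:
        s = pad + seq
        for i in range(n - 1, len(s)):
            num = ngrams[tuple(s[i - n + 1:i + 1])] + 1
            den = contexts[tuple(s[i - n + 1:i])] + vocab_size
            logprob += math.log2(num / den)
            tokens += 1
    return 2 ** (-logprob / tokens)

train = [["talo", "ssa", "kin"], ["auto", "ssa"], ["talo", "i", "ssa"]] * 20
test  = [["auto", "ssa", "kin"], ["talo", "ssa"]]
vocab = {m for seq in train for m in seq}
for n in (2, 3, 4):
    print(f"{n}-gram perplexity:", round(ngram_perplexity(train, test, n, len(vocab)), 2))
```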


international conference on acoustics, speech, and signal processing | 2000

Fast latent semantic indexing of spoken documents by using self-organizing maps

Mikko Kurimo

This paper describes a new latent semantic indexing (LSI) method for spoken audio documents. The framework indexes broadcast news from radio and TV by combining large vocabulary continuous speech recognition (LVCSR), natural language processing (NLP), and information retrieval (IR). For indexing, the documents are represented as vectors of word counts, whose dimensionality is rapidly reduced by random mapping (RM). The obtained vectors are projected into the latent semantic subspace determined by SVD, where the vectors are then smoothed by a self-organizing map (SOM). Smoothing by the closest document clusters is important here, because the documents are often short and have a high word error rate (WER). As the clusters in the semantic subspace reflect the news topics, the SOMs provide an easy way to visualize the index and query results and to explore the database. Test results are reported for TREC's spoken document retrieval databases (www.idiap.ch/kurimo/thisl.html).
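
A hedged sketch of the indexing pipeline described above, on synthetic data: term-count vectors are reduced by a random mapping, projected into a latent semantic subspace with a truncated SVD, and then smoothed towards their nearest cluster centroid. Plain k-means centroids stand in here for the trained SOM units, and all dimensions and data are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy document-by-term count matrix: rows = documents, columns = vocabulary.
n_docs, vocab_size = 200, 5000
docs = rng.poisson(0.02, size=(n_docs, vocab_size)).astype(float)

# 1) Random mapping: project sparse count vectors to a lower dimension with a
#    fixed random matrix (a cheap, similarity-preserving approximation).
rm_dim = 300
R = rng.standard_normal((vocab_size, rm_dim)) / np.sqrt(rm_dim)
docs_rm = docs @ R

# 2) Latent semantic subspace via truncated SVD of the reduced matrix.
lsa_dim = 50
U, S, Vt = np.linalg.svd(docs_rm, full_matrices=False)
docs_lsa = docs_rm @ Vt[:lsa_dim].T

# 3) "SOM-like" smoothing: replace each document vector by the mean of its
#    cluster (k-means centroids stand in for trained SOM units here).
n_units = 16
centroids = docs_lsa[rng.choice(n_docs, n_units, replace=False)]
for _ in range(10):
    assign = np.argmin(((docs_lsa[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    for k in range(n_units):
        if np.any(assign == k):
            centroids[k] = docs_lsa[assign == k].mean(axis=0)

docs_smoothed = centroids[assign]
print(docs_smoothed.shape)   # (200, 50): smoothed document index vectors
```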


language and technology conference | 2006

Unlimited vocabulary speech recognition for agglutinative languages

Mikko Kurimo; Antti Puurula; Ebru Arisoy; Vesa Siivola; Teemu Hirsimäki; Janne Pylkkönen; Tanel Alumäe; Murat Saraclar

It is practically impossible to build a word-based lexicon for speech recognition in agglutinative languages that would cover all the relevant words. The problem is that words are generally built by concatenating several prefixes and suffixes to the word roots. Together with compounding and inflection, this leads to millions of different, but still frequent, word forms. Due to inflections, ambiguity, and other phenomena, it is also not trivial to automatically split words into meaningful parts. Rule-based morphological analyzers can perform this splitting, but because their rules are handcrafted, they also suffer from an out-of-vocabulary problem. In this paper, we apply a recently proposed, fully automatic, and largely language- and vocabulary-independent method for building sub-word lexica for three different agglutinative languages. We also demonstrate language portability by building a successful large-vocabulary speech recognizer for each language and showing superior recognition performance compared to the corresponding word-based reference systems.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Thousands of Voices for HMM-Based Speech Synthesis–Analysis and Application of TTS Systems Built on Various ASR Corpora

Junichi Yamagishi; Bela Usabaev; Simon King; Oliver Watts; John Dines; Jilei Tian; Yong Guan; Rile Hu; Keiichiro Oura; Yi-Jian Wu; Keiichi Tokuda; Reima Karhila; Mikko Kurimo

In conventional speech synthesis, large amounts of phonetically balanced speech data recorded in highly controlled recording studio environments are typically required to build a voice. Although using such data is a straightforward solution for high quality synthesis, the number of voices available will always be limited, because recording costs are high. On the other hand, our recent experiments with HMM-based speech synthesis systems have demonstrated that speaker-adaptive HMM-based speech synthesis (which uses an “average voice model” plus model adaptation) is robust to non-ideal speech data that are recorded under various conditions and with varying microphones, that are not perfectly clean, and/or that lack phonetic balance. This enables us to consider building high-quality voices on “non-TTS” corpora such as ASR corpora. Since ASR corpora generally include a large number of speakers, this leads to the possibility of producing an enormous number of voices automatically. In this paper, we demonstrate the thousands of voices for HMM-based speech synthesis that we have made from several popular ASR corpora such as the Wall Street Journal (WSJ0, WSJ1, and WSJCAM0), Resource Management, Globalphone, and SPEECON databases. We also present the results of associated analysis based on perceptual evaluation, and discuss remaining issues.
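
The "average voice plus adaptation" idea can be illustrated with a toy linear mean transform: given a small amount of target-speaker data aligned to components of the average-voice model, a single global affine transform of the component means is estimated in closed form. With identity covariances this MLLR-style estimate reduces to ordinary least squares; the synthetic data and the single-global-transform setup are assumptions for illustration, not the structured adaptation used in the actual systems.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n_components, frames_per_comp = 4, 10, 30

# Average-voice model: one Gaussian mean per component (toy, identity covariances).
avg_means = rng.standard_normal((n_components, dim))

# Synthetic adaptation data: target speaker = affine transform of the average voice.
true_A, true_b = np.diag([1.2, 0.9, 1.1, 0.8]), np.array([0.5, -0.3, 0.1, 0.2])
obs, comp = [], []
for c in range(n_components):
    x = true_A @ avg_means[c] + true_b
    obs.append(x + 0.05 * rng.standard_normal((frames_per_comp, dim)))
    comp += [c] * frames_per_comp
obs, comp = np.vstack(obs), np.array(comp)

# Global MLLR-style mean transform: find W minimizing
# sum_t || o_t - W xi_{c(t)} ||^2, where xi_c = [mu_c; 1] (extended mean).
xi = np.hstack([avg_means[comp], np.ones((len(comp), 1))])   # (T, dim+1)
W, *_ = np.linalg.lstsq(xi, obs, rcond=None)                 # (dim+1, dim)
adapted_means = np.hstack([avg_means, np.ones((n_components, 1))]) @ W

print("mean abs error of adapted means:",
      round(float(np.abs(adapted_means - (avg_means @ true_A.T + true_b)).mean()), 4))
```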


cross-language evaluation forum | 2009

Overview and results of Morpho Challenge 2009

Mikko Kurimo; Sami Virpioja; Ville T. Turunen; Graeme W. Blackwood; William Byrne

The goal of Morpho Challenge 2009 was to evaluate unsupervised algorithms that provide morpheme analyses for words in different languages and in various practical applications. Morpheme analysis is particularly useful in speech recognition, information retrieval, and machine translation for morphologically rich languages, where the number of different word forms is very large. The evaluations consisted of: 1. a comparison to grammatical morphemes, 2. using morphemes instead of words in information retrieval tasks, and 3. combining morpheme- and word-based systems in statistical machine translation tasks. The evaluation languages were Finnish, Turkish, German, English, and Arabic. This paper describes the tasks, evaluation methods, and obtained results. The Morpho Challenge was part of the EU Network of Excellence PASCAL Challenge Program and was organized in collaboration with CLEF.
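
Part 1 of the evaluation, comparison to grammatical (gold-standard) morphemes, can be approximated with a simple boundary-based precision/recall/F-measure, sketched below on toy data. The actual Challenge protocol samples word pairs sharing morphemes rather than scoring split positions, so this is a simplified stand-in with hypothetical segmentations.

```python
def boundaries(segmentation):
    """Set of split positions implied by a segmentation, e.g. ['talo','ssa'] -> {4}."""
    cuts, pos = set(), 0
    for morph in segmentation[:-1]:
        pos += len(morph)
        cuts.add(pos)
    return cuts

def boundary_scores(predicted, gold):
    """Boundary precision, recall and F-measure over a dictionary of words."""
    tp = fp = fn = 0
    for word in gold:
        p, g = boundaries(predicted[word]), boundaries(gold[word])
        tp += len(p & g)
        fp += len(p - g)
        fn += len(g - p)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

gold = {"talossakin": ["talo", "ssa", "kin"], "autoilla": ["auto", "i", "lla"]}
pred = {"talossakin": ["talo", "ssakin"],     "autoilla": ["auto", "i", "lla"]}
print(boundary_scores(pred, gold))
```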


Virtual Reality | 2011

An augmented reality interface to contextual information

Antti Ajanki; Mark Billinghurst; Hannes Gamper; Toni Järvenpää; Melih Kandemir; Samuel Kaski; Markus Koskela; Mikko Kurimo; Jorma Laaksonen; Kai Puolamäki; Teemu Ruokolainen; Timo Tossavainen

In this paper, we report on a prototype augmented reality (AR) platform for accessing abstract information in real-world pervasive computing environments. Using this platform, objects, people, and the environment serve as contextual channels to more information. The user’s interest with respect to the environment is inferred from eye movement patterns, speech, and other implicit feedback signals, and these data are used for information filtering. The results of proactive context-sensitive information retrieval are augmented onto the view of a handheld or head-mounted display or uttered as synthetic speech. The augmented information becomes part of the user’s context, and if the user shows interest in the AR content, the system detects this and provides progressively more information. In this paper, we describe the first use of the platform to develop a pilot application, Virtual Laboratory Guide, and early evaluation results of this application.


Kohonen Maps | 1999

Indexing Audio Documents by using Latent Semantic Analysis and SOM

Mikko Kurimo

This paper describes an important application for state-of-the-art automatic speech recognition, natural language processing, and information retrieval systems. Methods for enhancing the indexing of spoken documents by using latent semantic analysis and self-organizing maps are presented, motivated, and tested. The idea is to extract extra information from the structure of the document collection and use it for more accurate indexing by generating new index terms and stochastic index weights. Indexing methods are evaluated for two broadcast news databases (one French and one English) using the average document perplexity defined in this paper and test queries analyzed by human experts.

Collaboration


Dive into Mikko Kurimo's collaborations.

Top Co-Authors

Ulpu Remes (Helsinki University of Technology)
Ville T. Turunen (Helsinki University of Technology)
Mathias Creutz (Helsinki University of Technology)
Teemu Hirsimäki (Helsinki University of Technology)