Publications


Featured research published by Tanel Alumäe.


Language and Technology Conference | 2006

Unlimited vocabulary speech recognition for agglutinative languages

Mikko Kurimo; Antti Puurula; Ebru Arisoy; Vesa Siivola; Teemu Hirsimäki; Janne Pylkkönen; Tanel Alumäe; Murat Saraclar

It is practically impossible to build a word-based lexicon for speech recognition in agglutinative languages that would cover all the relevant words. The problem is that words are generally built by concatenating several prefixes and suffixes to the word roots. Together with compounding and inflections, this leads to millions of different, but still frequent, word forms. Due to inflections, ambiguity and other phenomena, it is also not trivial to automatically split the words into meaningful parts. Rule-based morphological analyzers can perform this splitting, but due to their handcrafted rules, they also suffer from an out-of-vocabulary problem. In this paper we apply a recently proposed, fully automatic and largely language- and vocabulary-independent way to build sub-word lexica for three different agglutinative languages. We also demonstrate language portability by building a successful large vocabulary speech recognizer for each language and show superior recognition performance compared to the corresponding word-based reference systems.
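The subword-lexicon idea can be illustrated with a toy sketch. This is not the authors' unsupervised method (their unit inventory is learned automatically from data); it is just a greedy longest-match segmenter over a small, made-up unit inventory, showing how an unseen word form can still be covered by known sub-word units:

```python
# Illustrative sketch only: greedy longest-match segmentation of a word
# into sub-word units from a given inventory. The inventory below is a
# hypothetical Finnish-like example, not from the paper.

def segment(word, units):
    """Split a word into the longest matching units, left to right."""
    parts, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in units:
                parts.append(word[i:j])
                i = j
                break
        else:
            parts.append(word[i])  # fall back to a single character
            i += 1
    return parts

units = {"talo", "i", "ssa", "kin"}           # hypothetical unit inventory
print(segment("taloissakin", units))          # ['talo', 'i', 'ssa', 'kin']
```

Because every word decomposes into inventory units (or single characters as a last resort), the effective lexicon covers arbitrary word forms.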


Conference of the International Speech Communication Association | 2016

Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration.

Ottokar Tilk; Tanel Alumäe

Automatic speech recognition systems generally produce unpunctuated text, which is difficult for humans to read and degrades the performance of many downstream machine processing tasks. This paper introduces a bidirectional recurrent neural network model with an attention mechanism for punctuation restoration in unsegmented text. The model can utilize long contexts in both directions and direct attention where necessary, enabling it to outperform the previous state of the art on English (IWSLT2011) and Estonian datasets by a large margin.
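The attention step the abstract refers to can be sketched in a few lines. This is a toy illustration with tiny made-up vectors, not the paper's trained model: a query is scored against each bidirectional context state, the scores are softmax-normalized, and the states are summed with those weights:

```python
# Toy dot-product attention sketch (illustrative, not the paper's model).
import math

def softmax(scores):
    m = max(scores)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attend(query, states):
    # Score the query against each context state by dot product
    scores = [sum(q * s for q, s in zip(query, st)) for st in states]
    weights = softmax(scores)
    # Weighted sum over context states -> attention context vector
    return [sum(w * st[i] for w, st in zip(weights, states))
            for i in range(len(states[0]))]

states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # invented context states
ctx = attend([1.0, 0.0], states)
```

Positions whose states align with the query receive larger weights, so the context vector leans toward the relevant parts of the input.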


Text, Speech and Dialogue | 2004

Large Vocabulary Continuous Speech Recognition for Estonian Using Morphemes and Classes

Tanel Alumäe

This paper describes the development of a large vocabulary continuous speaker-independent speech recognition system for Estonian. Estonian is an agglutinative language: the number of different word forms is very large and, in addition, the word order is relatively unconstrained. To achieve good language coverage, we use pseudo-morphemes as basic units in a statistical trigram language model. To improve language model robustness, we automatically find morpheme classes and interpolate the morpheme model with the class-based model. The language model is trained on a newspaper corpus of 15 million word forms. Clustered triphones with multiple Gaussian mixture components are used for acoustic modeling. The system with the interpolated morpheme language model is found to perform significantly better than the baseline word-form trigram system in all respects. The word error rate of the best system is 27.7%, a 10.0% absolute improvement over the baseline system.
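The interpolation of the morpheme model with the class-based model is, in essence, a linear mixture of two probability estimates. A minimal sketch, with invented probabilities and an assumed mixture weight (the paper tunes its own weights):

```python
# Minimal sketch of linear language-model interpolation (illustrative).

def interpolate(p_morpheme, p_class, lam=0.7):
    """Mix a morpheme-model probability with a class-model probability."""
    return lam * p_morpheme + (1.0 - lam) * p_class

# Toy trigram probabilities for one morpheme in one context:
p_morph = 0.012   # P(m3 | m1, m2) from the morpheme trigram model
p_cls   = 0.020   # P(c3 | c1, c2) * P(m3 | c3) from the class-based model

p = interpolate(p_morph, p_cls, lam=0.7)
print(round(p, 4))  # 0.7*0.012 + 0.3*0.020 = 0.0144
```

The class model generalizes to rare morphemes; the mixture keeps the sharper morpheme-level estimates where they are reliable.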


Conference of the International Speech Communication Association | 2016

Sage: The New BBN Speech Processing Platform.

Roger Hsiao; Ralf Meermeier; Tim Ng; Zhongqiang Huang; Maxwell Jordan; Enoch Kan; Tanel Alumäe; Jan Silovsky; William Hartmann; Francis Keith; Omer Lang; Man-Hung Siu; Owen Kimball

To capitalize on the rapid development of Speech-to-Text (STT) technologies and the proliferation of open source machine learning toolkits, BBN has developed Sage, a new speech processing platform that integrates technologies from multiple sources, each of which has particular strengths. In this paper, we describe the design of Sage, which allows the easy interchange of STT components from different sources. We also describe our approach for fast prototyping with new machine learning toolkits, and a framework for sharing STT components across different applications. Finally, we report Sage’s state-of-the-art performance on different STT tasks.


International Conference on Acoustics, Speech, and Signal Processing | 2006

Sentence-Adapted Factored Language Model for Transcribing Estonian Speech

Tanel Alumäe

This work presents a two-pass recognition method for highly inflected agglutinative languages, based on an Estonian large vocabulary recognition task. Morphemes are used as basic recognition units in a standard trigram language model in the first pass. The recognized morphemes are reconstructed back into words using a hidden event language model for compound word detection. In the second pass, the vocabulary from the N-best sentence candidates of the first pass is used to create an adaptive, sentence-specific word-based language model, which is applied to rescore the N-best hypotheses. The sentence-specific language model is based on the factored language model paradigm and estimates word probabilities from the preceding two words and their part-of-speech tags. The method achieves a 7.3% relative word error rate improvement over the baseline system used in the first pass.
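Second-pass N-best rescoring amounts to re-ranking hypotheses under a combined score. A simplified sketch with invented scores and an assumed interpolation weight (not the system's actual scoring details):

```python
# Simplified N-best rescoring sketch (illustrative, scores are invented).

def rescore(nbest, lam=0.5):
    """Pick the hypothesis maximizing a mix of first-pass and adapted-LM scores.

    nbest: list of (hypothesis, first_pass_log_score, adapted_lm_log_score)
    """
    return max(nbest, key=lambda h: lam * h[1] + (1.0 - lam) * h[2])

nbest = [
    ("ta elab tallinnas", -12.0, -9.0),
    ("ta elas tallinnas", -12.5, -6.0),   # preferred by the adapted LM
]
best = rescore(nbest, lam=0.5)
print(best[0])
```

With the adapted model weighted in, a hypothesis that the first pass ranked second can win the combined score.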


Language Resources and Evaluation | 2017

Modeling under-resourced languages for speech recognition

Mikko Kurimo; Seppo Enarvi; Ottokar Tilk; Matti Varjokallio; André Mansikkaniemi; Tanel Alumäe

One particular problem in large vocabulary continuous speech recognition for low-resourced languages is finding relevant training data for the statistical language models. A large amount of data is required, because the models should estimate the probability of all possible word sequences. For Finnish, Estonian and the other Finno-Ugric languages, a special problem with the data is the huge number of different word forms that are common in normal speech. The same problem also exists in other language technology applications, such as machine translation and information retrieval, and to some extent in other morphologically rich languages. In this paper we present methods and evaluations for four recent language modeling topics: selecting conversational data from the Internet, adapting models for foreign words, multi-domain and adapted neural network language modeling, and decoding with subword units. Our evaluations show that the same methods work in more than one language and that they scale down to smaller data resources.


Conference of the International Speech Communication Association | 2016

Improved Multilingual Training of Stacked Neural Network Acoustic Models for Low Resource Languages.

Tanel Alumäe; Stavros Tsakalidis; Richard M. Schwartz

This paper proposes several improvements to multilingual training of neural network acoustic models for speech recognition and keyword spotting in the context of low-resource languages. We concentrate on the stacked architecture where the first network is used as a bottleneck feature extractor and the second network as the acoustic model. We propose to improve multilingual training when the amount of data from different languages is very different by applying balancing scalers to the training examples. We also explore how to exploit multilingual data to train the second neural network of the stacked architecture. An ensemble training method that can take advantage of both unsupervised pretraining as well as multilingual training is found to give the best speech recognition performance across a wide variety of languages, while system combination of differently trained multilingual models results in further improvements in keyword search performance.
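One plausible form of the balancing scalers (an assumption for illustration; the paper's exact scheme and data amounts are not reproduced here) weights each language's training examples inversely to its share of the data, so low-resource languages are not drowned out:

```python
# Hypothetical balancing-scaler sketch: weights inversely proportional to
# each language's data amount, normalized so the weighted totals match.

hours = {"tagalog": 80.0, "zulu": 40.0, "cebuano": 10.0}  # invented amounts

total = sum(hours.values())
n_langs = len(hours)
scalers = {lang: total / (n_langs * h) for lang, h in hours.items()}

for lang, s in sorted(scalers.items()):
    print(f"{lang}: {s:.3f}")
# The language with the least data gets the largest scaler.
```

After scaling, every language contributes the same effective amount (total / n_langs) to each training epoch.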


Archive | 2008

Statistical Language Modeling for Automatic Speech Recognition of Agglutinative Languages

Mikko Kurimo; Murat Saraclar; Teemu Hirsimäki; Tanel Alumäe

Automatic Speech Recognition (ASR) systems utilize statistical acoustic and language models to find the most probable word sequence given the speech signal. Hidden Markov Models (HMMs) are used as acoustic models, and language model probabilities are approximated using n-grams, where the probability of a word is conditioned on the n-1 previous words. The n-gram probabilities are estimated by Maximum Likelihood Estimation. One of the problems in n-gram language modeling is data sparseness, which results in non-robust probability estimates, especially for rare and unseen n-grams. Therefore, smoothing is applied to produce better estimates for these n-grams. Traditional n-gram word language models are commonly used in state-of-the-art Large Vocabulary Continuous Speech Recognition (LVCSR) systems. These systems achieve reasonable recognition performance for languages such as English and French. For instance, broadcast news (BN) in English can now be recognized with about ten percent word error rate (WER) (NIST, 2000), which results in mostly quite understandable text. Some rare and new words may be missing from the vocabulary, but the result has proven sufficient for many important applications, such as browsing and retrieval of recorded speech and information retrieval from speech (Garofolo et al., 2000).

However, LVCSR attempts with similar systems in agglutinative languages, such as Finnish, Estonian, Hungarian and Turkish, have so far not resulted in performance comparable to the English systems. The main reason for this performance deterioration in those languages is their rich morphological structure. In agglutinative languages, words are formed mainly by concatenating several suffixes to the roots; together with compounding and inflections, this leads to millions of different, but still frequent, word forms. Therefore, it is practically impossible to build a word-based vocabulary for speech recognition in agglutinative languages that would cover all the relevant words. If words are used as language modeling units, there will be many out-of-vocabulary (OOV) words due to the limited vocabulary sizes used in ASR systems. It was shown that with an optimized 60K lexicon…
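The n-gram estimation and smoothing the chapter describes can be sketched on a toy corpus. Add-one (Laplace) smoothing stands in here for the more refined smoothing methods used in practice; the corpus is invented:

```python
# Toy bigram language model: maximum likelihood estimates plus add-one
# (Laplace) smoothing so unseen n-grams get non-zero probability mass.
from collections import Counter

corpus = "the cat sat on the mat".split()
vocab = set(corpus)
V = len(vocab)

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_mle(w, h):
    """Maximum likelihood P(w | h): zero for any unseen bigram."""
    return bigrams[(h, w)] / unigrams[h] if unigrams[h] else 0.0

def p_laplace(w, h):
    """Add-one smoothed P(w | h): reserves mass for unseen bigrams."""
    return (bigrams[(h, w)] + 1) / (unigrams[h] + V)

print(p_mle("cat", "the"))      # 1/2 = 0.5
print(p_mle("mat", "cat"))      # unseen bigram -> 0.0
print(p_laplace("mat", "cat"))  # (0+1)/(1+5) ≈ 0.167
```

The unsmoothed estimate assigns zero probability to any word sequence containing an unseen bigram, which is exactly the non-robustness the smoothing step addresses.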


Archive | 2015

Evaluation of Automatic Speech Recognition Prototype for Estonian Language in Radiology Domain: A Pilot Study

Andrus Paats; Tanel Alumäe; Einar Meister; Ivo Fridolin

The aim of this study was to determine the dictation error rates in finalized radiology reports generated with a new automatic speech recognition (ASR) technology prototype for the Estonian language.


Controlled Natural Language | 2012

Controlled Natural Language in Speech Recognition Based User Interfaces

Kaarel Kaljurand; Tanel Alumäe

In this paper we discuss how controlled natural language can be used in speech recognition based user interfaces. We have implemented a set of Estonian speech recognition grammars, a speech recognition server with support for grammar-based speech recognition, an Android app that mediates the communication between end-user Android apps and the speech recognition server, and an end-user Android app that lets the user execute various commands and queries via Estonian speech. The overall architecture is open and modular, offers high precision speech recognition, and greatly simplifies the building of mobile apps with a speech-based user interface. Although our system and resources were developed with the Estonian speaker in mind and currently target a small number of domains, our results are largely language and domain independent.

Collaboration


Top co-authors of Tanel Alumäe:

- Ottokar Tilk (Tallinn University of Technology)
- Einar Meister (Tallinn University of Technology)
- Andrus Paats (Tallinn University of Technology)
- Ivo Fridolin (Tallinn University of Technology)
- Teemu Hirsimäki (Helsinki University of Technology)
- Asadullah (Tallinn University of Technology)
- Kairit Sirts (Tallinn University of Technology)