Juha Häkkinen | Researchain

Publication

Featured research published by Juha Häkkinen.

IEEE Automatic Speech Recognition and Understanding Workshop | 2001

n-gram and decision tree based language identification for written words

Juha Häkkinen; Jilei Tian

As the demand for multilingual speech recognizers increases, the development of systems that combine automatic language identification, language-specific pronunciation modeling and language-independent acoustic models becomes increasingly important. When the recognition grammar is dynamic and obtained directly from written text, the language associated with each grammar item has to be identified from that text. Many methods proposed in the literature require fairly large amounts of text, which may not always be available. This paper describes a text-based language identification system developed for identifying the language of short words, e.g., proper names. Two different approaches are compared. The n-gram method commonly used in the literature is first reviewed and further enhanced. We also propose a simple method for language identification that is based on decision trees. The methods are first evaluated in a text-based language identification task. Both methods are also tested as preprocessors for a multilingual speech recognition task, where the language of each text item must be determined in order to choose the correct text-to-pronunciation mapping. The experimental results show that the proposed methods perform very well and merit further development.
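The abstract does not say how the n-gram models are built or smoothed, so the following is only a minimal sketch of a common variant: each language is represented by counts of character n-grams (with `#` as an assumed word-boundary marker), and a word is assigned to the language under which the sum of its add-one-smoothed n-gram log-frequencies is highest. The class name, the smoothing scheme and the tiny training lists are hypothetical illustrations, not the authors' enhanced method or data.

```python
import math
from collections import Counter


def char_ngrams(word, n):
    """Character n-grams of a word, padded with '#' boundary markers."""
    padded = "#" * (n - 1) + word.lower() + "#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageId:
    """Toy bag-of-character-n-grams language identifier with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}    # language -> Counter of n-grams
        self.totals = {}    # language -> total number of n-gram tokens
        self.vocab = set()  # n-gram types seen in any language

    def train(self, language, words):
        counter = self.counts.setdefault(language, Counter())
        for word in words:
            grams = char_ngrams(word, self.n)
            counter.update(grams)
            self.vocab.update(grams)
        self.totals[language] = sum(counter.values())

    def score(self, language, word):
        """Smoothed log-likelihood of the word's n-grams under one language."""
        counter, total = self.counts[language], self.totals[language]
        v = len(self.vocab) + 1  # +1 leaves probability mass for unseen n-grams
        return sum(math.log((counter[g] + 1) / (total + v))
                   for g in char_ngrams(word, self.n))

    def identify(self, word):
        """Return the language whose model gives the word the highest score."""
        return max(self.counts, key=lambda lang: self.score(lang, word))


# Hypothetical training lists of surnames, for illustration only.
lid = NgramLanguageId(n=3)
lid.train("fi", ["häkkinen", "järvinen", "virtanen", "korhonen", "mäkinen"])
lid.train("en", ["smith", "johnson", "williams", "brown", "taylor"])
print(lid.identify("nieminen"))  # → fi
```

Even with five words per language, the shared Finnish endings (`ine`, `nen`, `en#`) dominate the score for an unseen name; with short inputs like proper names, the choice of smoothing matters far more than it would for long documents.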

International Conference on Acoustics, Speech, and Signal Processing | 2001

Robust end-of-utterance detection for real-time speech recognition applications

Ramalingam Hariharan; Juha Häkkinen; Kari Laurila

We propose a sub-band energy-based end-of-utterance algorithm that is capable of detecting the time instant when the user has stopped speaking. The proposed algorithm finds the time instant at which a sufficient number of sub-band spectral energy trajectories fall below adaptive thresholds and stay there for a pre-defined fixed time, i.e., a non-speech period is detected after the end of the utterance. With the proposed algorithm, a practical speech recognition system can give timely feedback to the user, thereby making the behaviour of the speech recognition system more predictable and more consistent across different usage environments and noise conditions. The proposed algorithm is shown to be more accurate and more robust to noise than previously proposed approaches. Experiments with both isolated command word recognition and continuous digit recognition in various noise conditions verify the viability of the proposed approach, with an average correct end-of-utterance detection rate of around 94% in both cases, representing a 43% error rate reduction over the most competitive previously published method.
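The detection rule described above can be sketched as follows. This is a minimal illustration, not the published algorithm: the number of sub-bands, the FFT-based band split, the noise-floor tracker (`alpha`), the threshold `margin`, the value used for "a sufficient number" of bands (`min_bands`) and the hold time (`hold_frames`) are all assumptions chosen for the example. It also assumes the first frame contains only background noise and reports an end only after speech has been seen.

```python
import numpy as np


def subband_energies(frame, n_bands):
    """Energy of one frame in n_bands contiguous, equal-width FFT sub-bands."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    return np.array([band.sum() for band in np.array_split(power, n_bands)])


class EndOfUtteranceDetector:
    """Sketch: report the end of an utterance once at least `min_bands`
    sub-band energies have stayed below adaptive thresholds for
    `hold_frames` consecutive frames (all parameter values are assumed)."""

    def __init__(self, n_bands=4, min_bands=3, hold_frames=30,
                 alpha=0.95, margin=3.0):
        self.n_bands = n_bands
        self.min_bands = min_bands      # assumed "sufficient number" of bands
        self.hold_frames = hold_frames  # the pre-defined non-speech period
        self.alpha = alpha              # smoothing factor of the noise-floor tracker
        self.margin = margin            # threshold = margin * tracked noise floor
        self.noise_floor = None
        self.quiet_run = np.zeros(n_bands, dtype=int)
        self.speech_seen = False

    def process(self, frame):
        """Consume one frame; return True when the end of the utterance is detected."""
        energy = subband_energies(frame, self.n_bands)
        if self.noise_floor is None:
            # Assumption: the first frame contains only background noise.
            self.noise_floor = energy.copy()
        below = energy < self.margin * self.noise_floor
        # Adapt the thresholds only in bands currently judged to be non-speech.
        self.noise_floor[below] = (self.alpha * self.noise_floor[below]
                                   + (1 - self.alpha) * energy[below])
        if np.count_nonzero(~below) >= self.min_bands:
            self.speech_seen = True
        # Per-band count of consecutive below-threshold frames; reset by energy.
        self.quiet_run = np.where(below, self.quiet_run + 1, 0)
        enough = np.count_nonzero(self.quiet_run >= self.hold_frames) >= self.min_bands
        return self.speech_seen and enough


# Synthetic test signal: silence, broadband "speech", silence (256-sample frames).
rng = np.random.default_rng(0)
frames = ([rng.normal(0, 0.01, 256) for _ in range(20)]
          + [rng.normal(0, 1.0, 256) for _ in range(50)]
          + [rng.normal(0, 0.01, 256) for _ in range(60)])
detector = EndOfUtteranceDetector()
first = next((i for i, f in enumerate(frames) if detector.process(f)), None)
print(first)
```

Requiring only `min_bands` of the bands to stay quiet, rather than all of them, is one way to tolerate a single band that is disturbed by narrowband noise; the hold time trades detection latency against premature cut-offs during short pauses.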

Speech Communication | 2003

Assessing text-to-phoneme mapping strategies in speaker independent isolated word recognition

Juha Häkkinen; Janne Suontausta; Søren Riis; Kåre Jean Jensen

A phonetic transcription of the vocabulary, i.e., a lexicon, is needed in sub-word-based speech recognition and text-to-speech systems. Decision trees and neural networks have successfully been used for creating lexicons on-line from an open vocabulary. We briefly review these methods and compare them in detail in the text-to-phoneme mapping task as part of a phoneme-based, speaker-independent speech recognizer. The decision tree and neural network based methods were first evaluated in terms of phoneme accuracy and then in extensive speech recognition tests. American English dictionaries and speech databases were used in all experiments. The decision tree based method achieved high phoneme accuracies when the training material covered the test vocabulary well. In typical speech recognition tests, the recognition rates obtained using the decision tree based lexicons were close to the baseline obtained using accurate transcriptions. Although the lexicons obtained using neural networks resulted in somewhat lower baseline recognition rates, they provided slightly better results in generalization tests. Moreover, when the neural network based mappings were supplemented with a look-up table comprising the most likely vocabulary items, which would be the practical setup, their performance increased significantly. The main advantage of neural networks over decision trees is their low memory consumption.
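The practical setup mentioned above, a look-up table for the most likely vocabulary items backed by a learned letter-to-phoneme mapping for everything else, can be sketched as below. The trained decision tree or neural network is replaced here by a hypothetical toy function (`toy_letter_to_phoneme`) that emits one phoneme per letter and asks a single right-context question, which loosely mimics the kind of contextual question a decision tree node asks. The ARPAbet-like symbols, the table contents and all function names are assumptions for illustration only.

```python
from typing import Callable, Dict, List

# Toy letter-to-phoneme table (ARPAbet-like symbols, chosen for illustration only).
LETTER_TO_PHONEME: Dict[str, str] = {
    "a": "ae", "c": "k", "e": "eh", "i": "ih", "o": "aa", "u": "ah", "y": "iy",
}


def toy_letter_to_phoneme(word: str) -> List[str]:
    """Hypothetical stand-in for a trained decision-tree or neural mapping:
    one phoneme per letter, plus a single right-context question for 'c'."""
    phonemes = []
    for i, letter in enumerate(word):
        next_letter = word[i + 1] if i + 1 < len(word) else ""
        if letter == "c" and next_letter in ("e", "i", "y"):
            phonemes.append("s")  # context question: is the next letter e, i or y?
        else:
            # Letters missing from the table map to a phoneme of the same name.
            phonemes.append(LETTER_TO_PHONEME.get(letter, letter))
    return phonemes


def make_transcriber(lookup: Dict[str, List[str]],
                     fallback: Callable[[str], List[str]]) -> Callable[[str], List[str]]:
    """Consult the look-up table first; use the learned mapping only on a miss."""
    def transcribe(word: str) -> List[str]:
        key = word.lower()
        if key in lookup:
            return lookup[key]
        return fallback(key)
    return transcribe


# Hypothetical look-up table holding exact transcriptions of frequent words.
lookup_table = {"one": ["w", "ah", "n"], "two": ["t", "uw"]}
transcribe = make_transcriber(lookup_table, toy_letter_to_phoneme)
print(transcribe("two"))   # ['t', 'uw'] from the table
print(transcribe("city"))  # ['s', 'ih', 't', 'iy'] from the fallback mapping
```

The table costs memory per stored word, so it pays off only for the items most likely to be spoken; the compact fallback mapping keeps the vocabulary open for everything else.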

International Conference on Acoustics, Speech, and Signal Processing | 2000

Fast decoding in large vocabulary name dialing

Janne Suontausta; Juha Häkkinen; Olli Viikki

Fast decoding is a key challenge in virtually all practical real-time speech recognition systems, since model decoding is still by far the most time-consuming operation in automatic speech recognition (ASR). In current speech recognizers, there is typically a trade-off between the desired vocabulary size, the processing power available for speech recognition, and the recognition accuracy. Fast decoding methods are often needed in order to meet the real-time requirements set for a system, but they should not, of course, degrade the recognition accuracy. In this paper, we investigate the performance of efficient decoding methods in large vocabulary name dialing. A tree-structured lexicon, fast observation probability evaluation, and adaptive Viterbi beam search are developed and integrated into a name dialing system. The system is tested with lexicons ranging from 100 to 3000 entries. With a lexicon of 1000 words, the fast decoding methods speed up the system by 282%, while degrading the recognition accuracy by only 0.95%.
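One of the techniques listed above, the tree-structured lexicon, can be illustrated with a small prefix tree over phoneme sequences. The usual motivation is that words sharing a pronunciation prefix share the corresponding nodes, so the decoder evaluates each shared phone model once instead of once per word. The sketch below only builds the tree and compares its size with a flat lexicon; the names, their toy pronunciations and the class and function names are hypothetical, and the paper's actual data structures are not described in the abstract.

```python
from typing import Dict, List


class LexiconNode:
    """One phoneme position in a tree-structured (prefix-shared) lexicon."""

    def __init__(self) -> None:
        self.children: Dict[str, "LexiconNode"] = {}
        self.words: List[str] = []  # words whose pronunciation ends at this node


def build_tree_lexicon(lexicon: Dict[str, List[str]]) -> LexiconNode:
    """Insert every pronunciation into a prefix tree keyed by phoneme."""
    root = LexiconNode()
    for word, phonemes in lexicon.items():
        node = root
        for phoneme in phonemes:
            node = node.children.setdefault(phoneme, LexiconNode())
        node.words.append(word)  # a list, so homophones can share an end node
    return root


def count_nodes(node: LexiconNode) -> int:
    """Number of phoneme nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Hypothetical name list with toy pronunciations.
names = {
    "anna": ["ae", "n", "ah"],
    "anne": ["ae", "n"],
    "andy": ["ae", "n", "d", "iy"],
    "bob": ["b", "aa", "b"],
}
root = build_tree_lexicon(names)
flat_size = sum(len(p) for p in names.values())  # phone models in a flat lexicon
tree_size = count_nodes(root)                    # nodes once prefixes are shared
print(flat_size, tree_size)  # 12 8
```

The saving grows with vocabulary size, because large name lists contain many shared word onsets; in a full decoder, beam pruning then removes unpromising branches of this tree early.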

Archive | 2001