Publication


Featured research published by Olli Viikki.


Speech Communication | 1998

Cepstral domain segmental feature vector normalization for noise robust speech recognition

Olli Viikki; Kari Laurila

To date, speech recognition systems have been applied in real-world applications in which they must be able to provide a satisfactory recognition performance under various noise conditions. However, a mismatch between the training and testing conditions often causes a drastic decrease in the performance of the systems. In this paper, we propose a segmental feature vector normalization technique which makes an automatic speech recognition system more robust to environmental changes by normalizing the output of the signal-processing front-end to have similar segmental parameter statistics in all noise conditions. The viability of the suggested technique was verified in various experiments using different background noises and microphones. In an isolated word recognition task, the proposed normalization technique reduced the error rates by over 70% in noisy conditions with respect to the baseline tests, and in a microphone mismatch case, over 75% error rate reduction was achieved. In a multi-environment speaker-independent connected digit recognition task, the proposed method reduced the error rates by over 16%.
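
As an illustration of the idea in this abstract, the following is a minimal sketch that assumes "similar segmental parameter statistics" means zero mean and unit variance per coefficient over a sliding window of frames. The function name `segmental_normalize`, the `window` length, and the `eps` guard are hypothetical choices for illustration and are not taken from the paper.

```python
import math


def segmental_normalize(frames, window=5, eps=1e-8):
    """Normalize each coefficient to zero mean and unit variance over a
    sliding window centred on the current frame.

    frames: list of feature vectors (lists of floats), one per frame.
    window: assumed segment length in frames (truncated at the edges).
    eps:    small constant that avoids division by zero.
    """
    n = len(frames)
    dim = len(frames[0]) if n else 0
    half = window // 2
    out = []
    for t in range(n):
        # Frames that fall inside the segment around frame t.
        lo, hi = max(0, t - half), min(n, t + half + 1)
        segment = frames[lo:hi]
        normed = []
        for d in range(dim):
            vals = [f[d] for f in segment]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            normed.append((frames[t][d] - mean) / math.sqrt(var + eps))
        out.append(normed)
    return out
```

Under these assumptions, a constant offset added to every frame (for example, a fixed channel bias in the cepstral domain) cancels out, which is one way such a normalization can reduce mismatch between environments.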


International Conference on Acoustics, Speech, and Signal Processing | 1998

A recursive feature vector normalization approach for robust speech recognition in noise

Olli Viikki; David Bye; Kari Laurila

The acoustic mismatch between testing and training conditions is known to severely degrade the performance of speech recognition systems. Segmental feature vector normalization was found to improve the noise robustness of mel-frequency cepstral coefficient (MFCC) feature vectors and to outperform other state-of-the-art noise compensation techniques in speaker-dependent recognition. The objective of feature vector normalization is to provide environment-independent parameter statistics in all noise conditions. We propose a more efficient implementation approach for feature vector normalization where the normalization coefficients are computed in a recursive way. Speaker-dependent recognition experiments show that the recursive normalization approach achieves an overall error rate reduction of over 60%, the segmental method approximately 50%, and parallel model combination 14%. Moreover, in the recursive case, this performance gain is obtained with the smallest implementation costs. In speaker-independent connected digit recognition, too, the proposed feature vector normalization approach yields an error rate reduction of over 16%.
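
The abstract does not give the recursion itself. The sketch below shows one plausible form, assuming the mean and variance of each coefficient are tracked with exponential forgetting. The name `recursive_normalize`, the forgetting factor `alpha`, and the initial values are hypothetical and not taken from the paper.

```python
import math


def recursive_normalize(frames, alpha=0.95, eps=1e-8):
    """Normalize frames using running estimates of mean and variance.

    frames: non-empty list of feature vectors (lists of floats).
    alpha:  assumed forgetting factor; values closer to 1 adapt more slowly.
    eps:    small constant that avoids division by zero.
    """
    dim = len(frames[0])
    mean = list(frames[0])   # assumed initialisation: first frame
    var = [1.0] * dim        # assumed initialisation: unit variance
    out = []
    for x in frames:
        normed = []
        for d in range(dim):
            # Update the running statistics with the current frame.
            mean[d] = alpha * mean[d] + (1 - alpha) * x[d]
            var[d] = alpha * var[d] + (1 - alpha) * (x[d] - mean[d]) ** 2
            normed.append((x[d] - mean[d]) / math.sqrt(var[d] + eps))
        out.append(normed)
    return out
```

Compared with a windowed computation, this form stores only two values per coefficient and needs no frame buffer, which is consistent with the lower implementation cost the abstract mentions.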


International Conference on Acoustics, Speech, and Signal Processing | 2006

Multi-Lingual Speaker-Independent Voice User Interface For Mobile Devices

Juha Iso-Sipilä; Marko Moberg; Olli Viikki

This paper presents a multi-lingual speaker-independent voice user interface (UI) that has been implemented for Nokia S60 mobile phones. The paper concentrates on the specific approach used to achieve a multi-lingual and configurable speech recognition and speech synthesis system. The main applications are speaker-independent name dialing and voice commands. The novelty of the applications is that the user does not need to train the voice dialing system; instead, the application reads the user's phonebook and generates the required voice tags automatically. Speaker-independent voice dialing has already been introduced in regions where the language diversity is not so great. The system presented in this paper is the first of its kind to support both speech recognition and speech synthesis in more than 40 languages in embedded devices with strict memory and performance requirements.


IEEE Automatic Speech Recognition and Understanding Workshop | 2001

ASR in portable wireless devices

Olli Viikki

This paper discusses the applicability and role of automatic speech recognition in portable wireless devices. Due to the author's background, the viewpoints are somewhat biased towards mobile telephones, but many of the aspects are nevertheless common to other portable devices as well. Although wireless devices are still dominated by speaker-dependent technology, there are now signs of a trend towards speaker-independent ASR in these devices as well. As these modern communication devices are usually intended for mass markets, the paper reviews the ASR areas that are relevant for speech recognition on low-cost embedded systems. In particular, multilingual ASR, low complexity ASR algorithms and their implementation, and acoustic model adaptation techniques play a key role in enabling cost-effective realization of ASR systems. Low complexity and advanced noise robust ASR algorithms are sometimes conflicting concepts. The paper also briefly reviews some of the most important noise robust ASR techniques that are well suited for embedded systems.


International Conference on Acoustics, Speech, and Signal Processing | 1998

A combination of discriminative and maximum likelihood techniques for noise robust speech recognition

Kari Laurila; Marcel Vasilache; Olli Viikki

We study how discriminative and maximum likelihood (ML) techniques should be combined in order to maximize the recognition accuracy of a speaker-independent automatic speech recognition (ASR) system that includes speaker adaptation. We compare two training approaches for the speaker-independent case and examine how well they perform together with four different speaker adaptation schemes. In a noise robust connected digit recognition task we show that the minimum classification error (MCE) training approach for speaker-independent modelling together with the Bayesian speaker adaptation scheme provide the highest classification accuracy over the whole lifespan of an ASR system. With the MCE training we are capable of reducing the recognition errors by 30% over the ML approach in the speaker-independent case. With the Bayesian speaker adaptation scheme we can further reduce the error rates by 62% using only as few as five adaptation utterances.


International Conference on Acoustics, Speech, and Signal Processing | 2004

On a practical design of a low complexity speech recognition engine

Marcel Vasilache; Juha Iso-Sipilä; Olli Viikki

We outline the main design features of a low complexity speech recognition engine targeted for mobile devices. Although the major parts have been presented previously, we now describe new features and important refinements of the original ideas that were omitted earlier. We also show how these techniques can be successfully combined in order to achieve various design targets with minimized impact on the recognition performance.


International Conference on Acoustics, Speech, and Signal Processing | 2000

Low complexity speaker independent command word recognition in car environments

Søren Riis; Olli Viikki

In this paper we compare a standard HMM-based recognizer to a highly parameter-efficient hybrid denoted hidden neural network (HNN). The comparison was done on a speaker-independent command word recognition task aimed at car hands-free applications. Monophone-based HMM and HNN recognizers were initially trained on clean Wall Street Journal British English data. Evaluation of these baseline models on noisy car speech data indicated superior performance of the HMMs. After smoothing to the car environment, however, an HNN with 28k parameters provided a relative error rate reduction of 23-53% over HMMs containing 21k-168k parameters. Due to the low number of parameters in the HNNs, they have a real-time decoding complexity 2-4 times below that of comparable HMMs. The low memory and computational requirements of the HNN make it particularly attractive for implementation on portable commercial hardware like mobile phones and personal digital assistants.


International Conference on Acoustics, Speech, and Signal Processing | 2000

Fast decoding in large vocabulary name dialing

Janne Suontausta; Juha Häkkinen; Olli Viikki

The fast decoding problem is a key challenge in virtually all practical real-time speech recognition systems, since model decoding is still by far the most time-consuming operation in automatic speech recognition (ASR) systems. In current speech recognizers, there is typically a trade-off between the desired vocabulary size, the processing power available for speech recognition, and the recognition accuracy. Fast decoding methods are often needed in order to meet the real-time requirements set for a system. The use of these methods must not, of course, degrade the recognition accuracy. In this paper, we investigate the performance of efficient decoding methods in large vocabulary name dialing. Tree-structured lexicon, fast observation probability evaluation, and adaptive Viterbi beam search are developed and integrated in a name dialing system. The system is tested with lexicons ranging from 100 to 3000 entries. With the lexicon of 1000 words, the fast decoding methods speed up the system by 282%. The speed-up degrades the recognition accuracy by only 0.95%.
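
To make the beam search idea concrete, here is a minimal sketch of log-domain Viterbi decoding with a fixed pruning beam. The abstract does not describe how the paper adapts the beam, so this sketch uses a constant `beam` width. The function name `viterbi_beam` and its parameters are hypothetical and for illustration only.

```python
NEG_INF = float("-inf")


def viterbi_beam(log_init, log_trans, log_emit, beam):
    """Return (best final log score, best state path).

    log_init[s]     : log P(state s at t = 0)
    log_trans[p][s] : log P(s | p)
    log_emit[t][s]  : log P(observation at frame t | s)
    beam            : states scoring more than `beam` below the best
                      score of the current frame are not expanded.
    """
    n_states = len(log_init)
    scores = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptrs = []
    for t in range(1, len(log_emit)):
        best = max(scores)
        # Pruning step: keep only states within the beam of the best one.
        active = [s for s in range(n_states) if scores[s] >= best - beam]
        new_scores = [NEG_INF] * n_states
        ptr = [-1] * n_states
        for p in active:
            for s in range(n_states):
                cand = scores[p] + log_trans[p][s]
                if cand > new_scores[s]:
                    new_scores[s] = cand
                    ptr[s] = p
        scores = [new_scores[s] + log_emit[t][s] for s in range(n_states)]
        backptrs.append(ptr)
    # Backtrace from the best final state.
    state = max(range(n_states), key=lambda s: scores[s])
    best_score = scores[state]
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return best_score, path
```

A narrower beam expands fewer states per frame and so runs faster, at the risk of pruning the state that would have led to the best path. That is the speed versus accuracy trade-off the abstract refers to.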


International Conference on Signal Processing | 2000

Noise robust Chinese speech recognition using feature vector normalization and higher-order cepstral coefficients

Xia Wang; Yuan Dong; Juha Häkkinen; Olli Viikki

Speaker-dependent, or speaker-trained, isolated word recognition is a key technology behind automatic name dialling systems. In this paper, we investigate how the feature extraction process should be modified so that a maximum recognition rate could be achieved in Chinese name dialling under clean and noisy operating conditions. Our experimental results indicate that the use of higher-order cepstral coefficients improved the recognition rate by 30%. This performance gain is due to the fact that the higher-order cepstral coefficients are expected to carry tonal information. Noise robustness of a system could be improved by integrating the second-order time derivatives in the final feature vector.


Speech Communication | 2002

An integrated study of speaker normalisation and HMM adaptation for noise robust speaker-independent speech recognition

Ramalingam Hariharan; Olli Viikki

Inter-speaker variability and sensitivity to background noise are two major problems in modern speech recognition systems. In this paper, we investigate different techniques that have been developed to overcome these issues. These methods include vocal tract length normalisation (VTLN), on-line HMM adaptation and gender-dependent acoustic modelling. Our objective in this paper is to combine these techniques so that the system recognition performance is maximised. Moreover, we propose a vocal tract length normalisation technique, which is more implementation-friendly than the previously published utterance-specific VTLN (u-VTLN). In order to ensure the wide applicability of the methods to be studied, the performance evaluation is done both in connected digit recognition and monophone-based isolated word recognition. The recognition results obtained indicate the importance of the combined use of these techniques. The integrated use of VTLN and on-line adaptation always provided the highest performance in both types of recognition experiments using gender-independent models. As expected, on-line HMM adaptation provided the major performance improvement with respect to a gender- and speaker-independent baseline system. The combination of speaker-specific VTLN (s-VTLN) or gender-dependent acoustic modelling further improved the system accuracy. However, while the joint use of s-VTLN and gender-dependent HMMs improved the recognition rate with original unadapted models, a minor performance degradation was observed when s-VTLN was applied to on-line adapted gender-dependent HMMs.

Collaboration


Top Co-Authors

Jukka Saarinen

Tampere University of Technology

Mikko Harju

Tampere University of Technology

Petri Salmela

Tampere University of Technology
