Publication


Featured research published by Kari Laurila.


Speech Communication | 1998

Cepstral domain segmental feature vector normalization for noise robust speech recognition

Olli Viikki; Kari Laurila

To date, speech recognition systems have been applied in real-world applications in which they must be able to provide satisfactory recognition performance under various noise conditions. However, a mismatch between the training and testing conditions often causes a drastic decrease in the performance of the systems. In this paper, we propose a segmental feature vector normalization technique which makes an automatic speech recognition system more robust to environmental changes by normalizing the output of the signal-processing front-end to have similar segmental parameter statistics in all noise conditions. The viability of the suggested technique was verified in various experiments using different background noises and microphones. In an isolated word recognition task, the proposed normalization technique reduced the error rates by over 70% in noisy conditions with respect to the baseline tests, and in a microphone mismatch case, over 75% error rate reduction was achieved. In a multi-environment speaker-independent connected digit recognition task, the proposed method reduced the error rates by over 16%.
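The normalization idea in this abstract can be illustrated with a minimal sketch. The abstract does not spell out the exact statistics or window handling, so the code below assumes per-coefficient mean and variance normalization over a short window centered on each frame; the function name `segmental_normalize` and the parameters `window` and `eps` are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence


def segmental_normalize(
    frames: Sequence[Sequence[float]], window: int = 5, eps: float = 1e-8
) -> List[List[float]]:
    """Normalize each coefficient to zero mean and unit variance over a
    window centered on the current frame (truncated at the edges).

    Assumption: mean/variance normalization stands in for the paper's
    "similar segmental parameter statistics".
    """
    n = len(frames)
    half = window // 2
    normalized = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        segment = frames[lo:hi]
        row = []
        for d in range(len(frames[t])):
            values = [f[d] for f in segment]
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            # eps guards against division by zero on constant segments
            row.append((frames[t][d] - mean) / math.sqrt(var + eps))
        normalized.append(row)
    return normalized
```

Because the segment mean is subtracted, a constant offset added to every frame (a crude stand-in for a channel or microphone change in the cepstral domain) leaves the normalized features unchanged.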


international conference on acoustics, speech, and signal processing | 1998

A recursive feature vector normalization approach for robust speech recognition in noise

Olli Viikki; David Bye; Kari Laurila

The acoustic mismatch between testing and training conditions is known to severely degrade the performance of speech recognition systems. Segmental feature vector normalization was found to improve the noise robustness of mel-frequency cepstral coefficients (MFCC) feature vectors and to outperform other state-of-the-art noise compensation techniques in speaker-dependent recognition. The objective of feature vector normalization is to provide environment-independent parameter statistics in all noise conditions. We propose a more efficient implementation approach for feature vector normalization where the normalization coefficients are computed in a recursive way. Speaker-dependent recognition experiments show that the recursive normalization approach obtains over 60%, the segmental method approximately 50%, and parallel model combination a 14% overall error rate reduction, respectively. Moreover, in the recursive case, this performance gain is obtained with the smallest implementation costs. Also in speaker-independent connected digit recognition, over a 16% error rate reduction is obtained with the proposed feature vector normalization approach.
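A recursive computation of the normalization coefficients might look like the following sketch. The abstract does not give the update equations, so exponential forgetting with a factor `alpha` is an assumption here, and the class name `RecursiveNormalizer` and its attributes are hypothetical.

```python
import math
from typing import List, Sequence


class RecursiveNormalizer:
    """Running mean/variance normalization with exponential forgetting.

    Assumption: a first-order recursive update; the paper's exact
    recursion and initialization are not stated in the abstract.
    """

    def __init__(self, dim: int, alpha: float = 0.98, eps: float = 1e-8):
        self.alpha = alpha
        self.eps = eps
        self.mean = [0.0] * dim  # running mean per coefficient
        self.var = [1.0] * dim   # running variance per coefficient

    def update(self, frame: Sequence[float]) -> List[float]:
        """Update the statistics with one frame and return it normalized."""
        out = []
        for d, x in enumerate(frame):
            self.mean[d] = self.alpha * self.mean[d] + (1.0 - self.alpha) * x
            diff = x - self.mean[d]
            self.var[d] = self.alpha * self.var[d] + (1.0 - self.alpha) * diff * diff
            out.append(diff / math.sqrt(self.var[d] + self.eps))
        return out
```

Only one mean and one variance per coefficient are stored, so no buffer of past frames is needed, which is consistent with the low implementation cost the abstract emphasizes.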


Multimedia Tools and Applications | 2009

Semantic ambient media--an introduction

Artur Lugmayr; Thomas Risse; Bjoern Stockleben; Kari Laurila; Juha Kaario

The medium is the message! And the message was literacy, media democracy and music charts. Mostly one single distinguishable medium such as TV, the Web, the radio, or books transmitted the message. Now in the age of ubiquitous and pervasive computing, where information flows through a plethora of distributed interlinked media—what is the message ambient media will tell us? What does semantic mean in this context? Which experiences will it open to us? What is content in the age of ambient media? Ambient media are embedded throughout the natural environment of the consumer—in his home, in his car, in restaurants, and on his mobile device. Predominant sample services are smart wallpapers in homes, location based services, RFID based entertainment services for children, or intelligent homes. The goal of this article is to define semantic ambient media and discuss the contributions to the Semantic Ambient Media Experience (SAME) workshop, which was held in conjunction with the ACM Multimedia conference in Vancouver in 2008. The results of the workshop can be found on: www.ambientmediaassociation.org.


international conference on acoustics, speech, and signal processing | 1997

Noise robust speech recognition with state duration constraints

Kari Laurila

In this paper, we present a method to incorporate and re-estimate state duration constraints within the maximum likelihood training of hidden Markov models. In the recognition phase we find the optimal state sequence fulfilling the state duration constraints obtained in the training phase. Our target is to get speaker-dependent training and recognition to perform well with a very small amount of training data in the case of mismatch between the training and testing environments. We take advantage of the fact that speakers tend to preserve their speaking style in similar situations (e.g. when speaking to a machine) and our main means to reach the target is to force similar state segmentations in the training and recognition phases. We show that with the proposed method we can substantially improve the robustness of a speech recognizer and decrease the error rates by over 93% when compared with a standard approach.
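The recognition-phase search for a state sequence that satisfies duration constraints can be sketched as a dynamic program over a left-to-right state chain. This is an illustration under stated assumptions, not the paper's method: the constraints are modelled as per-state minimum and maximum durations, the re-estimation during training is not reproduced, and the names `align_with_duration_constraints`, `loglik`, `min_dur` and `max_dur` are hypothetical.

```python
from typing import List, Optional, Sequence


def align_with_duration_constraints(
    loglik: Sequence[Sequence[float]],
    min_dur: Sequence[int],
    max_dur: Sequence[int],
) -> Optional[List[int]]:
    """Return per-state durations maximizing the total log-likelihood.

    loglik[s][t] is the log-likelihood of frame t under state s. States are
    visited once each, in order. Returns None if no segmentation satisfies
    the duration bounds.
    """
    n_states, n_frames = len(loglik), len(loglik[0])
    neg_inf = float("-inf")
    # prefix[s][t] = sum of loglik[s][0:t], for fast segment scores
    prefix = [[0.0] * (n_frames + 1) for _ in range(n_states)]
    for s in range(n_states):
        for t in range(n_frames):
            prefix[s][t + 1] = prefix[s][t] + loglik[s][t]
    # best[s][t]: best score with the first s states covering frames 0..t-1
    best = [[neg_inf] * (n_frames + 1) for _ in range(n_states + 1)]
    back = [[0] * (n_frames + 1) for _ in range(n_states + 1)]
    best[0][0] = 0.0
    for s in range(1, n_states + 1):
        for t in range(1, n_frames + 1):
            for d in range(min_dur[s - 1], min(max_dur[s - 1], t) + 1):
                prev = best[s - 1][t - d]
                if prev == neg_inf:
                    continue
                score = prev + prefix[s - 1][t] - prefix[s - 1][t - d]
                if score > best[s][t]:
                    best[s][t], back[s][t] = score, d
    if best[n_states][n_frames] == neg_inf:
        return None
    # Trace back the chosen duration of each state, last state first
    durations, t = [], n_frames
    for s in range(n_states, 0, -1):
        durations.append(back[s][t])
        t -= back[s][t]
    return durations[::-1]
```

Tightening `min_dur` and `max_dur` around the durations observed in training forces the recognition-time segmentation to resemble the training-time one, which is the effect the abstract describes.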


international conference on acoustics, speech, and signal processing | 2001

Robust end-of-utterance detection for real-time speech recognition applications

Ramalingam Hariharan; Juha Häkkinen; Kari Laurila

We propose a sub-band energy based end-of-utterance algorithm that is capable of detecting the time instant when the user has stopped speaking. The proposed algorithm finds the time instant at which sufficiently many sub-band spectral energy trajectories fall below adaptive thresholds and stay there for a pre-defined fixed time, i.e. a non-speech period is detected after the end of the utterance. With the proposed algorithm a practical speech recognition system can give timely feedback to the user, thereby making the behaviour of the speech recognition system more predictable and similar across different usage environments and noise conditions. The proposed algorithm is shown to be more accurate and noise robust than the previously proposed approaches. Experiments with both isolated command word recognition and continuous digit recognition in various noise conditions verify the viability of the proposed approach with an average proper end-of-utterance detection rate of around 94% in both cases, representing a 43% error rate reduction over the most competitive previously published method.
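The detection rule described in this abstract can be sketched as follows. The abstract does not say how the thresholds adapt, so this sketch assumes each band's threshold sits a fixed fraction `ratio` of the way between the lowest and highest energy seen so far in that band; the function name `detect_end_of_utterance` and all parameter names are hypothetical.

```python
from typing import Optional, Sequence


def detect_end_of_utterance(
    band_energies: Sequence[Sequence[float]],
    min_bands: int,
    hold_frames: int,
    ratio: float = 0.2,
) -> Optional[int]:
    """Return the index of the first frame of the trailing quiet period,
    or None if no end of utterance is detected.

    band_energies[t][b] is the energy of sub-band b at frame t.
    Assumption: the threshold rule (floor + ratio * range) is illustrative.
    """
    if not band_energies:
        return None
    floor = list(band_energies[0])  # lowest energy seen per band
    peak = list(band_energies[0])   # highest energy seen per band
    quiet_run = 0
    speech_seen = False
    for t, frame in enumerate(band_energies):
        below = 0
        for b, energy in enumerate(frame):
            floor[b] = min(floor[b], energy)
            peak[b] = max(peak[b], energy)
            threshold = floor[b] + ratio * (peak[b] - floor[b])
            if energy <= threshold:
                below += 1
        if below < min_bands:
            # Too few bands are quiet: treat the frame as speech
            speech_seen = True
            quiet_run = 0
        elif speech_seen:
            quiet_run += 1
            if quiet_run >= hold_frames:
                return t - hold_frames + 1
    return None
```

Requiring `min_bands` bands to stay quiet for `hold_frames` consecutive frames trades detection latency against the risk of cutting the utterance off during a short pause.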


Multimedia Tools and Applications | 2009

Special issue on semantic ambient media experience

Artur Lugmayr; Thomas Risse; Bjoern Stockleben; Juha Kaario; Kari Laurila

It is our great pleasure to welcome you to this special issue, which collects the best papers from the 1st ACM International Workshop on Semantic Ambient Media Experience (NAMU Series), SAME’08. The first article in the special issue contains a full analysis of the contributions to the workshop as well as the results of the workshop, which was an actual ‘workshop’ in the form of a creative think-tank rather than a series of simple paper presentations.


international conference on acoustics, speech, and signal processing | 1998

A combination of discriminative and maximum likelihood techniques for noise robust speech recognition

Kari Laurila; Marcel Vasilache; Olli Viikki

We study how discriminative and maximum likelihood (ML) techniques should be combined in order to maximize the recognition accuracy of a speaker-independent automatic speech recognition (ASR) system that includes speaker adaptation. We compare two training approaches for the speaker-independent case and examine how well they perform together with four different speaker adaptation schemes. In a noise robust connected digit recognition task we show that the minimum classification error (MCE) training approach for speaker-independent modelling together with the Bayesian speaker adaptation scheme provide the highest classification accuracy over the whole lifespan of an ASR system. With the MCE training we are capable of reducing the recognition errors by 30% over the ML approach in the speaker-independent case. With the Bayesian speaker adaptation scheme we can further reduce the error rates by 62% using only as few as five adaptation utterances.


acm multimedia | 2008

ACM multimedia 2008: 1st workshop on semantic ambient media experiences (SAME2008) namu series

Artur Lugmayr; Bjoern Stockleben; Thomas Risse; Juha Kaario; Kari Laurila

The Semantic Ambient Media Experiences (SAME) workshop series aims at the development of semantic ambient media as a new form of media. SAME provides a forum for scientists, practitioners, artists, content producers, industry, and researchers to discuss results in the field of ambient media. The multidisciplinary workshop shall raise awareness and promote collaboration between the leaders in the field of ambient media. SAME 2008 was the premiere of a series of workshops and was held in conjunction with ACM Multimedia 2008 in Vancouver, Canada. The result of the workshop shall be an Internet platform for people interested in the field of ambient media. The workshop aims at the creation of a think-tank of creative thinkers with an interest in glimpsing the future of semantic ambient media. To join our future activities or our mailing list, please refer to our website at http://namu.cs.tut.fi/acmmm2008/same2008/.


semantic ambient media experiences | 2008

Semantic ambient media experiences same 2008 pre-workshop review (NAMU series)

Artur Lugmayr; Thomas Risse; Björn Stockleben; Juha Kaario; Kari Laurila

The medium is the message! And the message was literacy, media democracy and music charts. Mostly one single distinguishable medium such as TV, the Web, the radio, or books transmitted the message. Now in the age of ubiquitous and pervasive computing, where information flows through a plethora of distributed interlinked media - what is the message ambient media will tell us? What does semantic mean in this context? Which experiences will it open to us? What is content in the age of ambient media? Ambient media are embedded throughout the natural environment of the consumer - in his home, in his car, in restaurants, and on his mobile device. Predominant sample services are smart wallpapers in homes, location based services, RFID based entertainment services for children, or intelligent homes. The goal of this paper is to define semantic ambient media and discuss the contributions to the SAME 2008 workshop.


international conference on acoustics, speech, and signal processing | 2000

Name dialing: how useful is it?

Kari Laurila; Petri Haavisto

Progress in automatic speech recognition technology has resulted in an increasing number of deployed applications. Typically, the measure of success has been the number of deployed or sold units and it has been much more difficult to evaluate the real user benefit from the technology. Actually, the usability, or usefulness, has largely remained an open issue. In this paper we focus on name dialing and discuss its usefulness from different angles, with a strong emphasis on mass market use and inexperienced users. As a concept, name dialing brings us back to where telephony started: an operator assisted way of making calls without a need to remember numbers. In essence, name dialing offers a solution to a minor inconvenience: using the directory. Even the arguably biggest advantage of name dialing, simplified car usage, is still less significant than the various concerns of users. Until time and further technical improvements alleviate the main concerns, name dialing will remain an occasional, rather than a primary, way of making calls.