Publications


Featured research published by Jesper Olsen.


International Conference on Multimodal Interfaces | 2006

Short message dictation on Symbian series 60 mobile phones

E. Karpov; Imre Kiss; Jussi Leppänen; Jesper Olsen; Daniela Oria; S. Sivadas; Jilei Tian

Dictation of natural language text on embedded mobile devices is a challenging task. First, it involves memory- and CPU-efficient implementation of robust speech recognition algorithms that are generally resource demanding. Second, the acoustic and language models employed in the recognizer require the availability of suitable text and speech language resources, typically for a wide set of languages. Third, a proper design of the UI is also essential. The UI has to provide intuitive and easy means for dictation and error correction, and must be suitable for a mobile usage scenario. In this demonstrator, an embedded speech recognition system for short message (SMS) dictation in US English is presented. The system runs on Nokia Series 60 mobile phones (e.g., N70, E60). The system's vocabulary is 23 thousand words. Its Flash and RAM memory footprints are small: 2 and 2.5 megabytes, respectively. After a short enrollment session, most native speakers can achieve a word accuracy of over 90% when dictating short messages in quiet or moderately noisy environments.


International Conference on Audio, Language and Image Processing | 2010

Multi-layered features with SVM for Chinese accent identification

Jue Hou; Yi Liu; Thomas Fang Zheng; Jesper Olsen; Jilei Tian

In this paper, we propose an approach of multi-layered feature combination associated with a support vector machine (SVM) for Chinese accent identification. The multi-layered features include both segmental and suprasegmental information, such as MFCC and pitch contour, to capture the diversity of variations in Chinese accented speech. The pitch contour is estimated using a cubic polynomial method to model the varying characteristics of different Chinese accents. We train two GMM acoustic models in order to express the features of a certain accent. As the original criterion of the GMM model cannot deal with such multi-layered features, an SVM is used to make the decision. The effectiveness of the proposed approach was evaluated on the 863 Chinese accent corpus. Our approach yields a significant 10% relative error rate reduction compared with traditional approaches that use a single feature at a single level for Chinese accented speech identification.
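
The feature fusion the abstract describes can be illustrated in a few lines. Below is a minimal sketch assuming numpy and scikit-learn; the MFCC statistics, polynomial order, SVM kernel, and the `utterances`/`labels` inputs are illustrative stand-ins, not the paper's exact configuration.

```python
# Sketch of multi-layered (segmental + suprasegmental) feature fusion
# for accent identification with an SVM back-end.  MFCC extraction and
# pitch tracking are assumed to happen upstream.
import numpy as np
from sklearn.svm import SVC

def pitch_contour_coeffs(f0, order=3):
    """Fit a cubic polynomial to a pitch contour; the coefficients
    serve as a compact suprasegmental feature."""
    t = np.linspace(0.0, 1.0, len(f0))   # normalised time axis
    return np.polyfit(t, f0, order)       # 4 coefficients for a cubic

def utterance_features(mfcc, f0):
    """Combine segmental (MFCC statistics) and suprasegmental
    (pitch-contour polynomial) information into one vector."""
    segmental = np.concatenate([mfcc.mean(axis=0), mfcc.std(axis=0)])
    return np.concatenate([segmental, pitch_contour_coeffs(f0)])

def train_accent_svm(utterances, labels):
    """utterances: list of (mfcc, f0) pairs; labels: accent per utterance."""
    X = np.stack([utterance_features(m, f) for m, f in utterances])
    clf = SVC(kernel="rbf")               # the SVM makes the final decision
    clf.fit(X, labels)
    return clf
```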


International Conference on Acoustics, Speech, and Signal Processing | 2008

A decoder for large vocabulary continuous short message dictation on embedded devices

Jesper Olsen; Yang Cao; Guo-Hong Ding; Xinxing Yang

We present our recent progress towards implementing large vocabulary continuous SMS dictation on embedded devices. The dictation engine we describe here is based on the popular finite state transducer paradigm and is capable of handling large vocabularies and high-order n-gram language models in a small memory footprint - even relative to what is available in current high-end devices such as the Nokia N800 Internet tablet and the N95 Symbian phone. We illustrate the performance of the engine on a 20k-vocabulary Mandarin Chinese dictation task which requires less than 10 MB of RAM to run on the device. The accuracy of the continuous engine is similar to the accuracy of the isolated word dictation engine we have previously developed.
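
As a rough illustration of the finite state transducer paradigm the abstract refers to, here is a toy token-passing decoder with beam pruning. It is a sketch of the general idea only (no epsilon arcs, no composed HMM/lexicon/LM layers) and does not reflect the actual engine's implementation; all names are illustrative.

```python
# Toy token-passing Viterbi decoder over a weighted finite state
# transducer.  A real engine composes HMM, lexicon and n-gram
# transducers and uses far more compact data structures than this
# dict-of-lists graph.

def decode(arcs_from, start, finals, frame_scores, beam=10.0):
    """arcs_from: state -> list of (dst, in_label, out_label, weight)
    frame_scores: one dict per frame mapping in_label -> acoustic cost
    All costs are negative log probabilities, so lower is better."""
    tokens = {start: (0.0, ())}           # state -> (cost, output labels)
    for scores in frame_scores:
        new_tokens = {}
        for state, (cost, out) in tokens.items():
            for dst, ilab, olab, weight in arcs_from.get(state, ()):
                if ilab not in scores:
                    continue
                c = cost + weight + scores[ilab]
                o = out + (olab,) if olab is not None else out
                if dst not in new_tokens or c < new_tokens[dst][0]:
                    new_tokens[dst] = (c, o)
        if not new_tokens:
            return None                   # all hypotheses pruned
        best = min(c for c, _ in new_tokens.values())
        tokens = {s: t for s, t in new_tokens.items()
                  if t[0] <= best + beam}  # beam pruning bounds memory
    finished = [(c, o) for s, (c, o) in tokens.items() if s in finals]
    return min(finished, key=lambda t: t[0])[1] if finished else None
```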


International Conference on Mobile Technology, Applications and Systems | 2007

Mandarin short message dictation on Symbian series 60 mobile phones

Jari Alhonen; Yang Cao; Guo-Hong Ding; Ying Liu; Jesper Olsen; Xia Wang; Xinxing Yang

Despite having limited keypads, mobile phones are nevertheless widely used for composing text messages, and the need to process text on the mobile is growing all the time. Speech dictation is a possible solution, but has until recently not been a viable option due to the limited computational resources available on mobile phones. We have previously implemented mobile speech dictation for English and other European languages. Chinese differs in many ways from European languages - Chinese is a tonal, syllabic language, and written Chinese is not based on an alphabet. From a UI perspective, Chinese text input is arguably more challenging than text input in European languages. In this paper we describe how we have ported our UI and dictation engine to support Mandarin Chinese. The system has been implemented to run in real time on the MCU of Nokia S60 mobile phones (E50, E60, E62 and N73).


International Conference on Multimedia and Expo | 2011

Reliable accent specific unit generation with dynamic Gaussian mixture selection for multi-accent speech recognition

Chao Zhang; Yi Liu; Yunqing Xia; Thomas Fang Zheng; Jesper Olsen; Jilei Tian

Multiple accents are often present in Mandarin speech, as most Chinese speakers have learned Mandarin as a second language. We propose generating reliable accent-specific units together with dynamic Gaussian mixture selection for multi-accent speech recognition. Time-aligned phoneme recognition is used to generate such units and to model accent variations explicitly and accurately. The dynamic Gaussian mixture selection scheme builds a dynamic observation density for each frame during decoding, which uses Gaussian mixture components more efficiently. This method increases coverage of the diverse accent variations in multi-accent speech, and alleviates the performance degradation caused by pruned beam search without increasing the model size. The effectiveness of this approach is evaluated on three typical Chinese accents: Chuan, Yue and Wu. Our approach significantly outperforms the traditional acoustic model reconstruction approach, with relative Syllable Error Rate (SER) reductions of 6.30%, 4.93% and 5.53%, respectively, without degradation on standard speech.
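
The core idea of building a per-frame observation density from a subset of mixture components can be sketched in numpy. The top-K selection rule and diagonal-covariance assumption below are illustrative stand-ins for the paper's actual selection scheme.

```python
# Sketch of dynamic Gaussian mixture selection: per frame, score all
# components, then build the observation density from only the top-K.
import numpy as np
from scipy.special import logsumexp

def frame_log_likelihood(x, weights, means, variances, k=4):
    """x: (D,) frame; weights: (M,); means, variances: (M, D) diagonal GMM.
    Uses only the K best-scoring components for the density."""
    # per-component log w_m + log N(x | mu_m, diag(var_m))
    log_comp = (np.log(weights)
                - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
                - 0.5 * np.sum((x - means) ** 2 / variances, axis=1))
    top = np.sort(log_comp)[-k:]   # dynamic per-frame component selection
    return logsumexp(top)          # log of the summed selected components
```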


International Conference on Speech Database and Assessments (Oriental COCOSDA) | 2011

Development of Hindi mobile communication text and speech corpus

Shweta Sinha; S. S. Agrawal; Jesper Olsen

This paper describes the collection of a text and audio corpus for mobile personal communication in Hindi. Hindi is the largest of the Indian languages, and is the first language of more than 200 million people who use it not only for spoken mobile communication but also for sending text messages to each other. The main script for Hindi is Devanagari, but it is not well supported by the current generation of mobile devices. The Devanagari alphabet is twice as large as the English alphabet, which makes it difficult to fit onto the small keypad of a mobile device. The aim of this project is to collect text and speech resources which can be used for training spoken language systems that aid text messaging on mobile devices - i.e. train a speech recogniser for the mobile personal communication domain so that text can be input through dictation rather than by typing. In total we collected a text corpus of 2 million words of natural messages in 12 different domains, and a spoken corpus of 100 speakers who each spoke 630 phonetically rich sentences - about 4 hours of speech. The speech utterances were recorded at 16 kHz through 3 recording channels: a mobile phone, a headset and a desktop-mounted microphone. The data sets were annotated and made available for the development of speech recognition and synthesis systems in the mobile domain.


International Symposium on Chinese Spoken Language Processing | 2010

Using cepstral and prosodic features for Chinese accent identification

Jue Hou; Yi Liu; Thomas Fang Zheng; Jesper Olsen; Jilei Tian

In this paper, we propose an approach for Chinese accent identification using both cepstral and prosodic features with gender-dependent models. We exploit a combination of conventional Shifted Delta Cepstrum (SDC) features and pitch contour features, as an example of segmental and suprasegmental features, to capture the characteristics of Chinese accents. We use cubic polynomials to estimate the pitch contour segments in order to model the differences between accents. We train gender-dependent GMM acoustic models to express the features in order to deal with gender variation. Since the conventional GMM criterion cannot handle such multi-feature problems, we use a support vector machine (SVM) to make the decision. We evaluated the effectiveness of the proposed approach on the 863 Chinese accent database. The results show that our approach yields a 15.5% relative error rate reduction compared to conventional approaches that use only SDC features.
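
The Shifted Delta Cepstrum features mentioned here have a standard N-d-P-k construction, sketched below in numpy. The 7-1-3-7 parameter values are the common convention in language and accent identification, not necessarily the configuration used in the paper (N is taken from the input's cepstral dimension).

```python
# Sketch of Shifted Delta Cepstrum (SDC) computation: k delta blocks,
# each a difference of cepstra d frames apart, taken at shifts of P
# frames, stacked into one feature vector per frame.
import numpy as np

def sdc(cepstra, d=1, p=3, k=7):
    """cepstra: (T, N) frame-level cepstral features.
    Returns (T', k*N) stacked shifted-delta features."""
    T, _ = cepstra.shape
    frames = []
    for t in range(d, T - d - (k - 1) * p):
        blocks = [cepstra[t + i * p + d] - cepstra[t + i * p - d]
                  for i in range(k)]     # delta at each of the k shifts
        frames.append(np.concatenate(blocks))
    return np.array(frames)
```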


Computational Intelligence and Security | 2011

Voice-based Local Search Using a Language Model Look-ahead Structure for Efficient Pruning

Yao Lu; Gang Liu; Wei Chen; Jesper Olsen

On mobile terminals, voice-based local search services are quickly becoming an important new application. Voice search is essentially a large vocabulary speech recognition task with an open-ended vocabulary, and this is a problem because speed and accuracy are essential for a good user experience. Fortunately, when a user submits a local search query, contextual information such as the user's current position can be used for constraining the full search space. In this paper, we use local information and present a pruning algorithm based on an LMLA (Language Model Look-Ahead) tree, which can significantly improve both the speed and the accuracy of the voice search system.
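
Language model look-ahead over a lexical prefix tree can be sketched as follows: each tree node caches the best LM score of any word reachable below it, so a partial hypothesis can be pruned before the word identity is known. The unigram scoring below is a simplification (a real decoder conditions the look-ahead on LM context), and all names are illustrative.

```python
# Sketch of a language-model look-ahead (LMLA) structure over a
# lexical prefix tree.  Costs are negative log probabilities.
import math

class Node:
    def __init__(self):
        self.children = {}          # phone -> Node
        self.word = None            # word ending at this node, if any
        self.lookahead = math.inf   # best word cost reachable below

def build_tree(lexicon, lm_logprob):
    """lexicon: word -> phone sequence; lm_logprob: word -> log P(word)."""
    root = Node()
    for word, phones in lexicon.items():
        node, cost = root, -lm_logprob(word)
        node.lookahead = min(node.lookahead, cost)
        for ph in phones:
            node = node.children.setdefault(ph, Node())
            node.lookahead = min(node.lookahead, cost)
        node.word = word
    return root

# During decoding, a hypothesis at `node` with accumulated acoustic
# cost `a` can be pruned whenever a + node.lookahead already exceeds
# the beam threshold, well before a full word has been matched.
```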


Archive | 2010

Method and apparatus for determining user context

Jesper Olsen; Happia Cao; Jilei Tian


Archive | 2010

Method and apparatus for estimating user characteristics based on user interaction data

Jesper Olsen; Jilei Tian
