Publication


Featured research published by Reima Karhila.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Thousands of Voices for HMM-Based Speech Synthesis–Analysis and Application of TTS Systems Built on Various ASR Corpora

Junichi Yamagishi; Bela Usabaev; Simon King; Oliver Watts; John Dines; Jilei Tian; Yong Guan; Rile Hu; Keiichiro Oura; Yi-Jian Wu; Keiichi Tokuda; Reima Karhila; Mikko Kurimo

In conventional speech synthesis, large amounts of phonetically balanced speech data recorded in highly controlled recording studio environments are typically required to build a voice. Although using such data is a straightforward solution for high quality synthesis, the number of voices available will always be limited, because recording costs are high. On the other hand, our recent experiments with HMM-based speech synthesis systems have demonstrated that speaker-adaptive HMM-based speech synthesis (which uses an “average voice model” plus model adaptation) is robust to non-ideal speech data that are recorded under various conditions and with varying microphones, that are not perfectly clean, and/or that lack phonetic balance. This enables us to consider building high-quality voices on “non-TTS” corpora such as ASR corpora. Since ASR corpora generally include a large number of speakers, this leads to the possibility of producing an enormous number of voices automatically. In this paper, we demonstrate the thousands of voices for HMM-based speech synthesis that we have made from several popular ASR corpora such as the Wall Street Journal (WSJ0, WSJ1, and WSJCAM0), Resource Management, Globalphone, and SPEECON databases. We also present the results of associated analysis based on perceptual evaluation, and discuss remaining issues.
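
The robustness claim rests on the average-voice-plus-adaptation scheme, in which model parameters are moved toward a target speaker by affine transforms estimated from that speaker's data. As a rough illustration, here is a minimal numpy sketch of the affine mean update used by MLLR-family adaptation; the dimensions and transform values are toy assumptions, and real systems estimate per-regression-class transforms with tools such as HTS.

```python
import numpy as np

def adapt_means(means, A, b):
    """Apply an MLLR-style affine transform to Gaussian mean vectors.

    means : (N, D) average-voice mean vectors
    A     : (D, D) transform matrix estimated from adaptation data
    b     : (D,)  bias vector
    Returns the speaker-adapted means  mu' = A @ mu + b.
    """
    return means @ A.T + b

# Toy usage: three Gaussians with 5-dimensional means and a
# near-identity transform standing in for one estimated from data.
rng = np.random.default_rng(0)
avg_means = rng.normal(size=(3, 5))
A = np.eye(5) + 0.05 * rng.normal(size=(5, 5))
b = 0.1 * rng.normal(size=5)
adapted = adapt_means(avg_means, A, b)
print(adapted.shape)  # (3, 5)
```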


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2010

Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using two-pass decision tree construction

Matthew Gibson; Teemu Hirsimäki; Reima Karhila; Mikko Kurimo; William Byrne

This paper demonstrates how unsupervised cross-lingual adaptation of HMM-based speech synthesis models may be performed without explicit knowledge of the adaptation data language. A two-pass decision tree construction technique is deployed for this purpose. Using parallel translated datasets, cross-lingual and intralingual adaptation are compared in a controlled manner. Listener evaluations reveal that the proposed method delivers performance approaching that of unsupervised intralingual adaptation.
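
To make the two-pass construction concrete, here is a toy Python sketch under my own simplifying assumptions (not the paper's implementation): the first pass splits states only on questions that are meaningful in both languages, so those nodes, and any adaptation transforms tied to them, can be shared cross-lingually; the second pass then refines each branch with language-specific questions.

```python
def greedy_split(states, questions):
    """Return a one-level split on the first question that separates states."""
    for name, q in questions:
        yes = [s for s in states if q(s)]
        no = [s for s in states if not q(s)]
        if yes and no:
            return {"q": name, "yes": {"leaf": yes}, "no": {"leaf": no}}
    return None

def two_pass_tree(states, shared_questions, language_questions):
    # Pass 1: language-independent questions only; these nodes (and any
    # adaptation transforms tied to them) are shared across languages.
    tree = greedy_split(states, shared_questions) or {"leaf": states}
    # Pass 2: refine each first-pass leaf with language-specific questions.
    for branch in ("yes", "no"):
        if branch in tree:
            leaf = tree[branch]["leaf"]
            tree[branch] = greedy_split(leaf, language_questions) or {"leaf": leaf}
    return tree

# Toy states: phones described by articulatory features plus a language tag.
states = [{"phone": p, "vowel": p in "aeiouy", "lang": l}
          for p, l in [("a", "fi"), ("k", "fi"), ("e", "en"), ("t", "en")]]
shared = [("is_vowel", lambda s: s["vowel"])]
per_language = [("is_finnish", lambda s: s["lang"] == "fi")]
print(two_pass_tree(states, shared, per_language))
```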


Computer Speech & Language | 2013

Personalising speech-to-speech translation: Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis

John Dines; Hui Liang; Lakshmi Saheer; Matthew Gibson; William Byrne; Keiichiro Oura; Keiichi Tokuda; Junichi Yamagishi; Simon King; Mirjam Wester; Teemu Hirsimäki; Reima Karhila; Mikko Kurimo

In this paper we present results of unsupervised cross-lingual speaker adaptation applied to text-to-speech synthesis. The application of our research is the personalisation of speech-to-speech translation, in which we employ an HMM statistical framework for both speech recognition and synthesis. This framework provides a logical mechanism to adapt synthesised speech output to the voice of the user by way of speech recognition. In this work we present results of several different unsupervised and cross-lingual adaptation approaches, as well as an end-to-end speaker-adaptive speech-to-speech translation system. Our experiments show that we can successfully apply speaker adaptation in both unsupervised and cross-lingual scenarios, and that our proposed algorithms appear to generalise well across several language pairs. We also discuss important future directions, including the need for better evaluation metrics.
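
The personalisation loop this describes can be pictured as a four-stage pipeline in which the ASR pass doubles as the supervision source for speaker adaptation. Below is a schematic, runnable Python sketch; every component here is a stand-in of my own, not the EMIME code.

```python
# Stand-in components: the point is the data flow, in which the ASR
# hypothesis both feeds the translator and supplies the transcription
# needed to estimate the speaker transform without supervision.

def asr_decode(audio):
    # stand-in recogniser: pretend we recognised Finnish speech
    return "hyvää päivää"

def estimate_speaker_transform(audio, hypothesis):
    # stand-in for CMLLR/CSMAPLR estimation against the ASR hypothesis
    return {"A": "affine matrix", "b": "bias", "from": hypothesis}

def translate(text):
    return {"hyvää päivää": "good day"}.get(text, text)

def synthesise(text, speaker_transform):
    return f"<waveform: '{text}' in voice adapted from '{speaker_transform['from']}'>"

def personalised_s2st(audio):
    hyp = asr_decode(audio)                          # 1. recognise source speech
    xform = estimate_speaker_transform(audio, hyp)   # 2. unsupervised adaptation
    target = translate(hyp)                          # 3. machine translation
    return synthesise(target, xform)                 # 4. TTS in the user's voice

print(personalised_s2st(audio=b"..."))
```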


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2013

HMM-based speech synthesis adaptation using noisy data: Analysis and evaluation methods

Reima Karhila; Ulpu Remes; Mikko Kurimo

This paper investigates the role of noise in speaker adaptation of HMM-based text-to-speech (TTS) synthesis and presents a new evaluation procedure. Both a new listening test based on ITU-T Recommendation P.835 and a perceptually motivated objective measure, frequency-weighted segmental SNR, improve the evaluation of synthetic speech when noise is present. The evaluation of voices adapted with noisy data shows that noise plays a relatively small but noticeable role in the quality of synthetic speech: naturalness and speaker similarity are not significantly affected by the noise, but listeners prefer the voices trained from cleaner data. Noise removal, even when it degrades natural speech quality, improves the synthetic voice.
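
For reference, frequency-weighted segmental SNR averages per-frame, per-band SNRs between the clean and processed spectra, weighting each band by the reference magnitude so that perceptually prominent regions dominate. A rough numpy sketch follows; it uses raw FFT bins rather than the usual critical-band filters and a weight exponent of 0.2, both of which are assumptions that may differ from the paper's configuration.

```python
import numpy as np

def fw_seg_snr(clean, processed, n_fft=512, hop=256, gamma=0.2):
    """Frequency-weighted segmental SNR, a simplified sketch."""
    def frames(x):
        n = 1 + (len(x) - n_fft) // hop
        idx = np.arange(n_fft)[None, :] + hop * np.arange(n)[:, None]
        return np.abs(np.fft.rfft(x[idx] * np.hanning(n_fft), axis=1))

    X, Y = frames(clean), frames(processed)       # magnitude spectra
    err = (X - Y) ** 2 + 1e-10
    snr = 10.0 * np.log10(X ** 2 / err + 1e-10)
    snr = np.clip(snr, -10.0, 35.0)               # usual band-SNR limits
    W = X ** gamma                                 # perceptual weights
    return float(np.mean(np.sum(W * snr, axis=1) / np.sum(W, axis=1)))

# Toy usage: a sine tone versus a noisy copy of it.
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.05 * np.random.default_rng(0).normal(size=sr)
print(f"fwSegSNR = {fw_seg_snr(clean, noisy):.1f} dB")
```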


IEEE Journal of Selected Topics in Signal Processing | 2014

Noise in HMM-Based Speech Synthesis Adaptation: Analysis, Evaluation Methods and Experiments

Reima Karhila; Ulpu Remes; Mikko Kurimo

This work describes experiments on using noisy adaptation data to create personalized voices with HMM-based speech synthesis. We investigate how environmental noise affects feature extraction and CSMAPLR and EMLLR adaptation, examine the effects of regression trees and adaptation data quantity, and test noise-robust feature streams for alignment as well as NMF-based source separation as a preprocessing step. Adaptation performance is evaluated using a listening test developed for noisy synthesized speech. The evaluation shows that the speaker-adaptive HMM-TTS system is robust to moderate environmental noise.
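
As a concrete illustration of NMF-based source separation used as preprocessing, below is a minimal numpy sketch (my own simplified formulation, not the paper's system): the noisy magnitude spectrogram is factored against a fixed, pre-trained noise dictionary plus free speech bases, and the speech estimate is recovered with a soft mask. Basis counts, iteration count, and the toy data are all illustrative assumptions.

```python
import numpy as np

def nmf_separate(V, W_noise, n_speech=8, n_iter=200, eps=1e-9):
    """Factor V (freq x time) with fixed noise bases plus free speech bases."""
    rng = np.random.default_rng(0)
    F, T = V.shape
    W_s = rng.random((F, n_speech)) + eps           # speech bases (learned)
    W = np.hstack([W_s, W_noise])                    # noise bases stay fixed
    H = rng.random((W.shape[1], T)) + eps
    for _ in range(n_iter):                          # KL multiplicative updates
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + eps)
        WH = W @ H + eps
        grad = ((V / WH) @ H[:n_speech].T) / (np.ones_like(V) @ H[:n_speech].T + eps)
        W[:, :n_speech] *= grad                      # update only speech bases
    speech_mask = (W[:, :n_speech] @ H[:n_speech]) / (W @ H + eps)
    return speech_mask * V                           # masked speech estimate

# Toy usage: random "noisy spectrogram" and random noise dictionary.
rng = np.random.default_rng(1)
V = rng.random((257, 100))
W_noise = rng.random((257, 4))
speech_hat = nmf_separate(V, W_noise)
print(speech_hat.shape)  # (257, 100)
```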


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2012

Creating synthetic voices for children by adapting adult average voice using stacked transformations and VTLN

Reima Karhila; Doddipatla Rama Sanand; Mikko Kurimo; Peter Smit

This paper describes experiments in creating personalised children's voices for HMM-based synthesis by adapting either an adult or a child average voice. The adult average voice is trained from a large adult speech database, whereas the child average voice is trained using a small database of children's speech. Here we present the idea of using stacked transformations for creating synthetic child voices: the child average voice is first created from the adult average voice through speaker adaptation using the pooled speech data from multiple children, and child-specific speaker adaptation is then added on top of it. VTLN is applied to speech synthesis to see whether it helps speaker adaptation when only a small amount of adaptation data is available. The listening test results show that the stacked transformations significantly improve speaker adaptation for small amounts of data, but the additional benefit provided by VTLN is not yet clear.
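
VTLN is typically realised as a one-parameter warping of the frequency axis that compensates for vocal tract length differences, for example between children and adults. A common choice is the bilinear (all-pass) warp sketched below; the warp factor value is illustrative, and whether this exact warping function matches the paper's setup is an assumption.

```python
import numpy as np

def bilinear_warp(omega, alpha):
    """Warp normalised frequency omega in [0, pi] with a bilinear all-pass map.

    The endpoints 0 and pi stay fixed; alpha > 0 shifts intermediate
    frequencies upward, alpha < 0 shifts them downward.
    """
    return omega + 2.0 * np.arctan(alpha * np.sin(omega) /
                                   (1.0 - alpha * np.cos(omega)))

# Toy usage: warp a few normalised frequencies with an illustrative alpha.
omega = np.linspace(0.0, np.pi, 5)
print(bilinear_warp(omega, alpha=0.1))
```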


Spoken Language Technology Workshop (SLT) | 2010

Unsupervised cross-lingual speaker adaptation for accented speech recognition

Reima Karhila; Mikko Kurimo

In this paper we investigate how the acoustic models in automatic speech recognition can be adapted across languages in an unsupervised fashion to improve recognition of speech with a foreign accent. Recognition systems were trained on large Finnish and English corpora and tested on both monolingual and bilingual material. Adaptation with bilingual and monolingual recognisers was compared. We found that recognition of foreign-accented English was not significantly improved by Finnish adaptation data from the same speaker. However, recognition of native Finnish using foreign-accented English adaptation data improved significantly.
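
The unsupervised part of such experiments usually follows a two-pass decode, adapt, re-decode loop; the sketch below illustrates that control flow with stand-in components (all names here are hypothetical, not the paper's tooling). In the cross-lingual condition, the transform estimated on one language's data is applied when recognising the other.

```python
def unsupervised_adapt(audio_batch, decode, estimate_transform):
    """Two-pass unsupervised adaptation: hypotheses from the first decoding
    pass act as the supervision for estimating the speaker transform."""
    hyps = [decode(a, transform=None) for a in audio_batch]        # pass 1
    transform = estimate_transform(audio_batch, hyps)              # e.g. CMLLR
    return [decode(a, transform=transform) for a in audio_batch]   # pass 2

# Toy usage with stub components.
decode = lambda a, transform: f"hyp({a},{'adapted' if transform else 'base'})"
estimate_transform = lambda audio, hyps: "cmllr"
print(unsupervised_adapt(["utt1", "utt2"], decode, estimate_transform))
```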


Proceedings of the 7th ISCA Speech Synthesis Workshop | 2010

Speaker adaptation and the evaluation of speaker similarity in the EMIME speech-to-speech translation project

Mirjam Wester; John Dines; Matthew Gibson; Hui Liang; Yi-Jian Wu; Lakshmi Saheer; Simon King; Keiichiro Oura; Philip N. Garner; William Byrne; Yong Guan; Teemu Hirsimäki; Reima Karhila; Mikko Kurimo; Matt Shannon; Sayaka Shiota; Jilei Tian; Keiichi Tokuda; Junichi Yamagishi


Conference of the International Speech Communication Association (INTERSPEECH) | 2009

Thousands of Voices for HMM-based Speech Synthesis

Junichi Yamagishi; Bela Usabaev; Simon King; Oliver Watts; John Dines; Jilei Tian; Rile Hu; Yong Guan; Keiichiro Oura; Keiichi Tokuda; Reima Karhila; Mikko Kurimo


Meeting of the Association for Computational Linguistics (ACL) | 2010

Personalising Speech-To-Speech Translation in the EMIME Project

Mikko Kurimo; William Byrne; John Dines; Philip N. Garner; Matthew Gibson; Yong Guan; Teemu Hirsimäki; Reima Karhila; Simon King; Hui Liang; Keiichiro Oura; Lakshmi Saheer; Matt Shannon; Sayaka Shiota; Jilei Tian

Collaboration


Dive into Reima Karhila's collaboration.

Top Co-Authors

Simon King

University of Edinburgh

John Dines

Idiap Research Institute

Keiichiro Oura

Nagoya Institute of Technology

Teemu Hirsimäki

Helsinki University of Technology
