David Sündermann | Researchain

Publication

Featured research published by David Sündermann.

international conference on acoustics, speech, and signal processing | 2006

Text-Independent Voice Conversion Based on Unit Selection

David Sündermann; Harald Höge; Antonio Bonafonte; Hermann Ney; Alan W. Black; Shrikanth Narayanan

So far, most voice conversion training procedures are text-dependent, i.e., they are based on parallel training utterances of source and target speaker. Since several applications (e.g., speech-to-speech translation or dubbing) require text-independent training, training techniques that use non-parallel data have been proposed over the last two years. In this paper, we present a new approach that applies unit selection to find corresponding time frames in source and target speech. By means of a subjective experiment, it is shown that this technique achieves the same performance as conventional text-dependent training.
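To make the idea of pairing frames by unit selection more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each frame is a feature vector, uses squared Euclidean distance for both the target cost (source frame vs. candidate target frame) and the concatenation cost (consecutive selected target frames), and combines them with a hypothetical weight `alpha`. The function name `select_units` and all parameters are illustrative assumptions.

```python
import numpy as np

def select_units(source, target, alpha=0.5):
    """Pick one target frame index per source frame by dynamic programming.

    source: array of shape (T, D), target: array of shape (N, D).
    The cost definitions and the weight alpha are assumptions for illustration.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    # Target cost: squared distance between every source and target frame, shape (T, N).
    tc = ((source[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
    # Concatenation cost: squared distance between pairs of target frames, shape (N, N).
    cc = ((target[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
    n_frames, _ = tc.shape
    cost = alpha * tc[0]
    back = np.zeros(tc.shape, dtype=int)
    for t in range(1, n_frames):
        # trans[i, j]: best cost so far ending in frame i, then moving to frame j.
        trans = cost[:, None] + (1.0 - alpha) * cc
        back[t] = trans.argmin(axis=0)
        cost = trans.min(axis=0) + alpha * tc[t]
    # Trace the cheapest path backwards.
    path = [int(cost.argmin())]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With `alpha=1.0` the sketch reduces to a nearest-neighbour search per frame. Smaller values favour smooth sequences of target frames.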

ieee automatic speech recognition and understanding workshop | 2003

VTLN-based cross-language voice conversion

David Sündermann; Hermann Ney; Harald Höge

In speech recognition, vocal tract length normalization (VTLN) is a well-studied technique for speaker normalization. As cross-language voice conversion aims at the transformation of a source speaker's voice into that of a target speaker using a different language, we want to investigate whether VTLN is an appropriate method to adapt the voice characteristics. After applying several conventional VTLN warping functions, we extend the conventional piecewise linear function to several segments, allowing a more detailed warping of the source spectrum. Experiments on cross-language voice conversion are performed on three corpora of two languages and both speaker genders.
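A piecewise linear warping function with several segments can be sketched as follows. This is an illustrative assumption about the parameterization, not the paper's exact formulation: the warp is defined by hypothetical interior knots `src_knots` and `dst_knots` on the normalized frequency axis [0, π], with the end points 0 and π held fixed.

```python
import numpy as np

def piecewise_linear_warp(omega, src_knots, dst_knots):
    """Map frequencies in [0, pi] through a monotonic piecewise linear function.

    src_knots and dst_knots are hypothetical interior breakpoints; 0 and pi map to themselves.
    """
    src = np.concatenate(([0.0], np.asarray(src_knots, dtype=float), [np.pi]))
    dst = np.concatenate(([0.0], np.asarray(dst_knots, dtype=float), [np.pi]))
    if np.any(np.diff(src) <= 0) or np.any(np.diff(dst) <= 0):
        raise ValueError("knots must be strictly increasing inside (0, pi)")
    return np.interp(omega, src, dst)

def warp_spectrum(magnitude, src_knots, dst_knots):
    """Resample a magnitude spectrum sampled uniformly on [0, pi] along the warped axis."""
    magnitude = np.asarray(magnitude, dtype=float)
    omega = np.linspace(0.0, np.pi, len(magnitude))
    # For each output frequency, find the source frequency that is warped onto it.
    inverse = piecewise_linear_warp(omega, dst_knots, src_knots)
    return np.interp(inverse, omega, magnitude)
```

A single interior knot gives the conventional two-segment function. Adding knots gives the finer control over the spectrum that the abstract describes.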

international conference on acoustics, speech, and signal processing | 2005

A study on residual prediction techniques for voice conversion

David Sündermann; Antonio Bonafonte; Hermann Ney

Several well-studied voice conversion techniques use line spectral frequencies as features to represent the spectral envelopes of the processed speech frames. In order to return to the time domain, these features are converted to linear predictive coefficients that serve as coefficients of a filter applied to an unknown residual signal. We compare several residual prediction approaches that have already been proposed in the literature dealing with voice conversion. We also present a novel technique that outperforms the others in terms of voice conversion performance and sound quality.
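The role of the residual can be illustrated with a small sketch of linear predictive analysis and synthesis. It assumes the linear predictive coefficients are already available (the conversion from line spectral frequencies is omitted) and that the prediction polynomial is written as A(z) = 1 + a_1 z^-1 + ... + a_p z^-p. The function names are hypothetical, and this does not show any of the residual prediction techniques compared in the paper.

```python
import numpy as np

def lpc_residual(signal, lpc):
    """Inverse filtering with A(z): e[n] = s[n] + sum_k a_k * s[n-k]."""
    a = np.concatenate(([1.0], np.asarray(lpc, dtype=float)))
    return np.convolve(np.asarray(signal, dtype=float), a)[: len(signal)]

def lpc_synthesize(residual, lpc):
    """All-pole filtering with 1/A(z): s[n] = e[n] - sum_k a_k * s[n-k]."""
    order = len(lpc)
    out = np.zeros(len(residual))
    for n in range(len(residual)):
        acc = residual[n]
        for k in range(1, order + 1):
            if n - k >= 0:
                acc -= lpc[k - 1] * out[n - k]
        out[n] = acc
    return out
```

Filtering the true residual through the synthesis filter reproduces the original frame. In voice conversion the spectral envelope is replaced, so a suitable residual for the converted envelope has to be predicted instead.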

international symposium on signal processing and information technology | 2003

VTLN-based voice conversion

David Sündermann; Hermann Ney

In speech recognition, vocal tract length normalization (VTLN) is a well-studied technique for speaker normalization. As voice conversion aims at the transformation of a source speaker's voice into that of a target speaker, we want to investigate whether VTLN is an appropriate method to adapt the voice characteristics. After applying several conventional VTLN warping functions, we extend the piecewise linear function to several segments, allowing a more detailed warping of the source spectrum. Experiments on voice conversion are performed on three corpora of two languages and both speaker genders.

international symposium on signal processing and information technology | 2004

Time domain vocal tract length normalization

David Sündermann; Antonio Bonafonte; Hermann Ney; Harald Höge

Recently, the speaker normalization technique VTLN (vocal tract length normalization), known from speech recognition, was applied to voice conversion. So far, VTLN has been performed in the frequency domain. However, to accelerate the conversion process, it is helpful to apply VTLN directly to the time frames of a speech signal. In this paper, we propose a technique which directly manipulates the time signal. By means of subjective tests, it is shown that voice conversion techniques based on frequency domain and time domain VTLN perform equivalently in terms of speech quality, while the latter requires about 20 times less processing time.

ieee automatic speech recognition and understanding workshop | 2005

Residual prediction based on unit selection

David Sündermann; Harald Höge; Antonio Bonafonte; Hermann Ney; Alan W. Black

Recently, we presented a study on residual prediction techniques that can be applied to voice conversion based on linear transformation or hidden Markov model-based speech synthesis. Our voice conversion experiments showed that none of the six compared techniques was capable of successfully converting the voice while achieving a fair speech quality. In this paper, we suggest a novel residual prediction technique based on unit selection that outperforms the others in terms of speech quality (mean opinion score = 3) while keeping the conversion performance.

international conference natural language processing | 2003

Synther - a new m-gram POS tagger

David Sündermann; Hermann Ney

The part-of-speech (POS) tagger synther, based on m-gram statistics, is described. After explaining its basic architecture, three smoothing approaches and the strategy for handling unknown words are presented. Subsequently, synther's performance is evaluated in comparison with four state-of-the-art POS taggers. All of them are trained and tested on three corpora of different languages and domains. In the course of this evaluation, synther achieved the lowest or at least below-average error rates. Finally, it is shown that the linear interpolation smoothing strategy with coverage-dependent weights has better properties than the two other approaches.
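Linear interpolation smoothing for m-gram tag statistics can be sketched for the bigram case as follows. This is a simplified illustration with a fixed, hypothetical weight `lam`. The coverage-dependent weighting described in the abstract is not reproduced here, and the function names are assumptions rather than synther's interface.

```python
from collections import Counter

def train_tag_counts(tag_sequences):
    """Count tag unigrams and bigrams in a list of tag sequences."""
    unigrams = Counter()
    bigrams = Counter()
    for tags in tag_sequences:
        unigrams.update(tags)
        bigrams.update(zip(tags, tags[1:]))
    return unigrams, bigrams

def interpolated_prob(tag, prev_tag, unigrams, bigrams, lam=0.7):
    """P(tag | prev_tag) as a fixed-weight mix of bigram and unigram relative frequencies."""
    total = sum(unigrams.values())
    p_uni = unigrams[tag] / total if total else 0.0
    # Number of times prev_tag was observed as a history (followed by any tag).
    hist = sum(c for (p, _), c in bigrams.items() if p == prev_tag)
    if hist == 0:
        # Unseen history: fall back to the unigram estimate alone.
        return p_uni
    p_bi = bigrams[(prev_tag, tag)] / hist
    return lam * p_bi + (1.0 - lam) * p_uni
```

Because the unigram term is never zero for a tag seen in training, unseen tag bigrams still receive a small probability, which is the purpose of the smoothing.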

conference of the international speech communication association | 2004