David Sündermann
Siemens
Publication
Featured research published by David Sündermann.
international conference on acoustics, speech, and signal processing | 2006
David Sündermann; Harald Höge; Antonio Bonafonte; Hermann Ney; Alan W. Black; Shrikanth Narayanan
So far, most voice conversion training procedures are text-dependent, i.e., they are based on parallel training utterances of source and target speaker. Since several applications (e.g., speech-to-speech translation or dubbing) require text-independent training, training techniques that use non-parallel data have been proposed over the last two years. In this paper, we present a new approach that applies unit selection to find corresponding time frames in source and target speech. By means of a subjective experiment, it is shown that this technique achieves the same performance as the conventional text-dependent training.
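The frame-pairing idea in this abstract can be illustrated with a minimal sketch. It assumes a generic unit-selection formulation, which the abstract does not spell out: each source frame gets one target frame, chosen by dynamic programming so that a weighted sum of a target cost (source-to-target distance) and a concatenation cost (distance between consecutively chosen target frames) is minimal. The function name `select_units`, the Euclidean distances, and the weight `alpha` are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def select_units(source, target, alpha=0.5):
    """Pick one target frame per source frame (hypothetical sketch).

    source: (n_src, dim) feature vectors, target: (n_tgt, dim) feature vectors.
    alpha weights the target cost; (1 - alpha) weights the concatenation cost.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    # target_cost[t, j]: distance between source frame t and target frame j
    target_cost = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=2)
    # concat_cost[i, j]: distance between target frames i and j chosen in sequence
    concat_cost = np.linalg.norm(target[:, None, :] - target[None, :, :], axis=2)

    n_src, n_tgt = target_cost.shape
    best = alpha * target_cost[0]                 # best path cost ending in each frame
    back = np.zeros((n_src, n_tgt), dtype=int)    # backpointers for the best path
    for t in range(1, n_src):
        total = best[:, None] + (1.0 - alpha) * concat_cost
        back[t] = np.argmin(total, axis=0)
        best = total[back[t], np.arange(n_tgt)] + alpha * target_cost[t]

    # Trace the cheapest path back from the last source frame.
    path = [int(np.argmin(best))]
    for t in range(n_src - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With `alpha=1.0` the sketch reduces to a nearest-neighbour match per frame; lowering `alpha` trades closeness to the source for smoothness of the selected target sequence.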
ieee automatic speech recognition and understanding workshop | 2003
David Sündermann; Hermann Ney; Harald Höge
In speech recognition, vocal tract length normalization (VTLN) is a well-studied technique for speaker normalization. As cross-language voice conversion aims at the transformation of a source speaker's voice into that of a target speaker using a different language, we want to investigate whether VTLN is an appropriate method to adapt the voice characteristics. After applying several conventional VTLN warping functions, we extend the conventional piecewise linear function to several segments, allowing a more detailed warping of the source spectrum. Experiments on cross-language voice conversion are performed on three corpora of two languages and both speaker genders.
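A piecewise linear warping function with several segments can be sketched as follows. This is an illustration under stated assumptions, not the parameterization used in the paper: frequencies are normalized to [0, π], the warp is defined by matching knot lists `knots_in` and `knots_out` (hypothetical names), and the spectrum is resampled by linear interpolation.

```python
import numpy as np

def piecewise_linear_warp(omega, knots_in, knots_out):
    """Map normalized frequencies in [0, pi] through a monotone
    piecewise linear function given by matching knot positions."""
    knots_in = np.asarray(knots_in, dtype=float)
    knots_out = np.asarray(knots_out, dtype=float)
    if np.any(np.diff(knots_in) <= 0) or np.any(np.diff(knots_out) <= 0):
        raise ValueError("knots must be strictly increasing")
    return np.interp(omega, knots_in, knots_out)

def warp_magnitude_spectrum(magnitude, knots_in, knots_out):
    """Resample a magnitude spectrum so that content at source
    frequency w appears at frequency warp(w)."""
    magnitude = np.asarray(magnitude, dtype=float)
    omega = np.linspace(0.0, np.pi, len(magnitude))
    # For each output bin, find the source frequency that maps onto it
    # (the inverse warp, obtained by swapping the knot lists).
    source_omega = piecewise_linear_warp(omega, knots_out, knots_in)
    return np.interp(source_omega, omega, magnitude)

# Three segments; both endpoints stay fixed so the band [0, pi] is preserved.
knots_in = [0.0, 0.25 * np.pi, 0.6 * np.pi, np.pi]
knots_out = [0.0, 0.30 * np.pi, 0.7 * np.pi, np.pi]
```

Using more knots gives more segments and therefore finer control over which parts of the spectrum are stretched or compressed; a single interior knot corresponds to a two-segment piecewise linear warp.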
international conference on acoustics, speech, and signal processing | 2005
David Sündermann; Antonio Bonafonte; Hermann Ney
Several well-studied voice conversion techniques use line spectral frequencies as features to represent the spectral envelopes of the processed speech frames. In order to return to the time domain, these features are converted to linear predictive coefficients that serve as coefficients of a filter applied to an unknown residual signal. We compare several residual prediction approaches that have already been proposed in the literature dealing with voice conversion. We also present a novel technique that outperforms the others in terms of voice conversion performance and sound quality.
international symposium on signal processing and information technology | 2003
David Sündermann; Hermann Ney
In speech recognition, vocal tract length normalization (VTLN) is a well-studied technique for speaker normalization. As voice conversion aims at the transformation of a source speaker's voice into that of a target speaker, we want to investigate whether VTLN is an appropriate method to adapt the voice characteristics. After applying several conventional VTLN warping functions, we extend the piecewise linear function to several segments, allowing a more detailed warping of the source spectrum. Experiments on voice conversion are performed on three corpora of two languages and both speaker genders.
international symposium on signal processing and information technology | 2004
David Sündermann; Antonio Bonafonte; Hermann Ney; Harald Höge
Recently, the speaker normalization technique VTLN (vocal tract length normalization), known from speech recognition, was applied to voice conversion. So far, VTLN has been performed in the frequency domain. However, to accelerate the conversion process, it is helpful to apply VTLN directly to the time frames of a speech signal. In this paper, we propose a technique that directly manipulates the time signal. By means of subjective tests, it is shown that voice conversion techniques based on frequency domain and time domain VTLN perform equivalently in terms of speech quality, while the latter requires about 20 times less processing time.
ieee automatic speech recognition and understanding workshop | 2005
David Sündermann; Harald Höge; Antonio Bonafonte; Hermann Ney; Alan W. Black
Recently, we presented a study on residual prediction techniques that can be applied to voice conversion based on linear transformation or hidden Markov model-based speech synthesis. Our voice conversion experiments showed that none of the six compared techniques was capable of successfully converting the voice while achieving a fair speech quality. In this paper, we suggest a novel residual prediction technique based on unit selection that outperforms the others in terms of speech quality (mean opinion score = 3) while keeping the conversion performance.
international conference natural language processing | 2003
David Sündermann; Hermann Ney
The part-of-speech (POS) tagger synther, based on m-gram statistics, is described. After an explanation of its basic architecture, three smoothing approaches and the strategy for handling unknown words are presented. Subsequently, synther's performance is evaluated in comparison with four state-of-the-art POS taggers. All of them are trained and tested on three corpora of different languages and domains. In this evaluation, synther achieved the lowest or at least below-average error rates. Finally, it is shown that the linear interpolation smoothing strategy with coverage-dependent weights has better properties than the two other approaches.
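Linear interpolation smoothing of m-gram tag probabilities can be sketched as below. This is a generic illustration for m = 2 with a single fixed weight `lam`; the coverage-dependent weights mentioned in the abstract are not specified here and are not reproduced. The function names and the start symbol `<s>` are assumptions made for the example, not part of synther.

```python
from collections import Counter

START = "<s>"  # hypothetical sentence-start symbol

def train_counts(tag_sequences):
    """Count tag unigrams and bigrams over training tag sequences."""
    unigrams, bigrams = Counter(), Counter()
    for tags in tag_sequences:
        prev = START
        unigrams[prev] += 1
        for tag in tags:
            unigrams[tag] += 1
            bigrams[(prev, tag)] += 1
            prev = tag
    return unigrams, bigrams

def interpolated_prob(tag, prev, unigrams, bigrams, lam=0.7):
    """P(tag | prev) as lam * bigram estimate + (1 - lam) * unigram estimate."""
    total = sum(count for t, count in unigrams.items() if t != START)
    p_uni = unigrams[tag] / total if total else 0.0
    p_bi = bigrams[(prev, tag)] / unigrams[prev] if unigrams[prev] else 0.0
    return lam * p_bi + (1.0 - lam) * p_uni
```

The interpolation assigns a nonzero probability to tag bigrams never seen in training, as long as the tag itself occurred, which is the purpose of smoothing in an m-gram tagger.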
conference of the international speech communication association | 2004
Hermann Ney; David Sündermann; Antonio Bonafonte; Harald Höge
Procesamiento Del Lenguaje Natural | 2004
David Sündermann; Antonio Bonafonte; Harald Höge; Hermann Ney
international symposium on signal processing and information technology | 2005
David Sündermann; Harald Höge; Antonio Bonafonte; H. Duxans