
Publication


Featured research published by Juergen Schroeter.


IEEE Transactions on Speech and Audio Processing | 1994

Techniques for estimating vocal-tract shapes from the speech signal

Juergen Schroeter; Man Mohan Sondhi

This paper reviews methods for mapping from the acoustical properties of a speech signal to the geometry of the vocal tract that generated the signal. Such mapping techniques are studied for their potential application in speech synthesis, coding, and recognition. Mathematically, the estimation of the vocal tract shape from its output speech is a so-called inverse problem, where the direct problem is the synthesis of speech from a given time-varying geometry of the vocal tract and glottis. Different mappings are discussed: mapping via articulatory codebooks, mapping by nonlinear regression, mapping by basis functions, and mapping by neural networks. Besides being nonlinear, the acoustic-to-geometry mapping is also nonunique, i.e., more than one tract geometry might produce the same speech spectrum. The authors show how this nonuniqueness can be alleviated by imposing continuity constraints.
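To make the codebook-plus-continuity idea concrete, here is a small NumPy sketch (my own illustration, not code from the paper; the function name, the Euclidean distances, and the single smoothness weight are all simplifying assumptions). For each frame it keeps the k acoustically closest codebook entries, then resolves the nonuniqueness by dynamic programming over the candidate shapes so that consecutive vocal-tract shapes change as little as possible:

```python
import numpy as np

def invert_with_continuity(frames, codebook_acoustic, codebook_shapes,
                           k=5, smooth_cost=1.0):
    """Estimate a smoothly varying sequence of vocal-tract shapes.

    frames:            (T, A) acoustic features, one row per frame
    codebook_acoustic: (N, A) acoustic features of the codebook entries
    codebook_shapes:   (N, S) articulatory shapes of the same entries
    """
    T = len(frames)
    # Keep the k acoustically closest codebook entries per frame.
    cand_idx, cand_cost = [], []
    for f in frames:
        d = np.linalg.norm(codebook_acoustic - f, axis=1)
        idx = np.argsort(d)[:k]
        cand_idx.append(idx)
        cand_cost.append(d[idx])
    # Dynamic programming: acoustic match cost + shape-continuity cost.
    D = np.asarray(cand_cost[0], dtype=float)
    back = []
    for t in range(1, T):
        prev = codebook_shapes[cand_idx[t - 1]]     # (k, S) previous candidates
        cur = codebook_shapes[cand_idx[t]]          # (k, S) current candidates
        trans = np.linalg.norm(cur[:, None, :] - prev[None, :, :], axis=2)
        total = cand_cost[t][:, None] + smooth_cost * trans + D[None, :]
        back.append(np.argmin(total, axis=1))
        D = np.min(total, axis=1)
    # Backtrack the lowest-cost (smoothest) candidate path.
    path = [int(np.argmin(D))]
    for b in reversed(back):
        path.append(int(b[path[-1]]))
    path.reverse()
    return np.stack([codebook_shapes[cand_idx[t][j]]
                     for t, j in enumerate(path)])
```

With `smooth_cost=0` this degenerates to independent nearest-neighbor lookup per frame; increasing it trades acoustic fit for articulatory continuity, which is the mechanism the paper uses to pick among competing tract geometries.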


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1987

A hybrid time-frequency domain articulatory speech synthesizer

Man Mohan Sondhi; Juergen Schroeter

High-quality speech at low bit rates (e.g., 2400 bits/s) is one of the important objectives of current speech research. As part of a long-range activity on this problem, we have developed an efficient computer program that will serve as a tool for investigating whether articulatory speech synthesis may achieve this low bit rate. At a sampling frequency of 8 kHz, the most comprehensive version of the program, including nasality and frication, runs at about twice real time on a Cray-1 computer.


Journal of the Acoustical Society of America | 1999

The AT&T Next‐Gen TTS System

Mark C. Beutnagel; Alistair Conkie; Juergen Schroeter; Yannis Stylianou; Ann K. Syrdal

The new AT&T TTS system for general U.S. English text is based on best‐choice components picked from the AT&T Flextalk TTS, the Festival System from the University of Edinburgh, and ATR’s CHATR system. From Flextalk, it employs text normalization, letter‐to‐sound, and (optionally) baseline prosody generation. Festival provides general software‐engineering infrastructure (modularity) for easy experimentation and competitive evaluation of different algorithms or modules. Finally, CHATR’s unit selection was modified to guarantee the intelligibility of a good n‐phone (n=2 would be diphone) synthesizer while improving significantly on perceived naturalness relative to Flextalk. Each decision made during the research and development phase of this system was based on formal subjective evaluations. For example, the best voice found in a test that compared TTS systems built from several speakers gave a 0.3‐point head start (on a 5‐point rating scale) in quality over the mean of all speakers. Similarly, using our H...


Proceedings of the IEEE | 2000

Speech and language processing for next-millennium communications services

Richard V. Cox; Candace A. Kamm; Lawrence R. Rabiner; Juergen Schroeter; Jay G. Wilpon

In the future, the world of telecommunications will be vastly different than it is today. The driving force will be the seamless integration of real time communications (e.g. voice, video, music, etc.) and data into a single network, with ubiquitous access to that network anywhere, anytime, and by a wide range of devices. The only currently available ubiquitous access device to the network is the telephone, and the only ubiquitous user access technology mode is spoken voice commands and natural language dialogues with machines. In the future, new access devices and modes will augment speech in this role, but are unlikely to supplant the telephone and access by speech anytime soon. Speech technologies have progressed to the point where they are now viable for a broad range of communications services, including: compression of speech for use over wired and wireless networks; speech synthesis, recognition, and understanding for dialogue access to information, people, and messaging; and speaker verification for secure access to information and services. The paper provides brief overviews of these technologies, discusses some of the unique properties of wireless, plain old telephone service, and Internet protocol networks that make voice communication and control problematic, and describes the types of voice services available in the past and today, and those that we foresee becoming available over the next several years.


Journal of the Acoustical Society of America | 1993

On the use of neural networks in articulatory speech synthesis

Mazin G. Rahim; Colin C. Goodyear; W. Bastiaan Kleijn; Juergen Schroeter; Man Mohan Sondhi

A long‐standing problem in the analysis and synthesis of speech by articulatory description is the estimation of the vocal tract shape parameters from natural input speech. Methods to relate spectral parameters to articulatory positions are feasible if a sufficiently large amount of data is available. This, however, results in a high computational load and large memory requirements. Further, one needs to accommodate ambiguities in this mapping due to the nonuniqueness problem (i.e., several vocal tract shapes can result in identical spectral envelopes). This paper describes the use of artificial neural networks for acoustic to articulatory parameter mapping. Experimental results show that a single feed‐forward neural net is unable to perform this mapping sufficiently well when trained on a large data set. An alternative procedure is proposed, based on an assembly of neural networks. Each network is designated to a specific region in the articulatory space, and performs a mapping from cepstral values into ...


Journal of the Acoustical Society of America | 1996

The potential role of speech production models in automatic speech recognition

R. C. Rose; Juergen Schroeter; Man Mohan Sondhi

This paper investigates the issues that are associated with applying speech production models to automatic speech recognition (ASR). Here the applicability of articulatory representations to ASR is considered independently of the role of articulatory representations in speech perception. While the question of whether it is necessary or even possible for human listeners to recover the state of the articulators during the process of perceiving speech is an important one, it is not considered here. Hence, the authors refrain from posing completely new paradigms for ASR which more closely parallel the relationship between speech production and human speech understanding. Instead, work aimed at integrating speech production models into existing ASR formalisms is described.


Journal of the Acoustical Society of America | 1986

The use of acoustical test fixtures for the measurement of hearing protector attenuation. Part II: Modeling the external ear, simulating bone conduction, and comparing test fixture and real‐ear data

Juergen Schroeter; Christoph Poesselt

This paper investigates two main features of the human head which influence the measured attenuation of circumaural and intraaural hearing protection devices (HPDs): the external ear and the different pathways of bone conduction. A theoretical model for the external ear shows that its influence on the insertion loss of HPDs, on the sensitivity level of headphones or earphones, and on the insertion gain of hearing aids, all can be described by one equation. While it is not necessary to simulate the eardrum impedance in order to measure the insertion loss of earmuffs and the sensitivity level of headphones with acoustical test fixtures (ATFs), the required accuracy of an ear simulator is more stringent when the same measurements are performed on intraaural devices. For the evaluation of HPDs, bone conduction plays an important role. We have developed a model to estimate HPD-dependent bone conduction effects. The model includes two bone conduction sources: one in the external ear and one in the middle ear. The model explains, for example, the occlusion effect of HPDs and the masking error at low frequencies due to physiological noise that arises when real-ear attenuation at threshold (REAT) measurements are made. Consequently, objectively measured insertion loss can now be used to predict REAT with improved accuracy. ATF and REAT data are compared using nine earmuffs and nine earplugs. In the majority of cases, the two sets of data agree well. Discrepancies are discussed.


International Conference on Acoustics, Speech, and Signal Processing | 1998

TD-PSOLA versus harmonic plus noise model in diphone based speech synthesis

Ann K. Syrdal; Yannis Stylianou; Laurie Garrison; Alistair Conkie; Juergen Schroeter

In an effort to select a speech representation for our next generation concatenative text-to-speech synthesizer, the use of two candidates is investigated: TD-PSOLA and the harmonic plus noise model (HNM). A formal listening test has been conducted, and the two candidates have been rated regarding intelligibility, naturalness, and pleasantness. Suitability for database compression and computational load are also discussed. The results show that HNM consistently outperforms TD-PSOLA in all the above features except for computational load. HNM allows for high-quality speech synthesis without smoothing problems at the segmental boundaries and without buzziness or other oddities observed with TD-PSOLA.


Journal of Phonetics | 1995

Modeling a leaky glottis

Bert Cranen; Juergen Schroeter

The fact that the oral flow of both males and females normally contains an appreciable dc component during the "closed glottis" interval of vowel sounds produced at normal loudness levels indicates that glottal leakage is a very common phenomenon. In this paper the acoustic consequences of glottal leakage are studied by means of a computer simulation. The effects of two different types of leaks were studied: (a) a linked leak, an opening (at least partly) situated in the membranous glottis and caused by abduction, and (b) a parallel chink, a leak that can be viewed as an opening which is essentially separated from (parallel to) the time-varying part of the glottis. The results of our simulations show that a moderate leak may give rise to appreciable source-tract interaction, which becomes most apparent for a parallel chink. In the time domain it manifests itself as a ripple in the glottal flow waveform just after closure. In the frequency domain, the spectrum of the flow through a glottis with a leak (both linked and parallel) is characterized by zeros at the formant frequencies. The major difference in spectral effects of a linked leak and a parallel chink is the spectral slope. For a parallel chink it is of the same order of magnitude as in the no-leakage case or even slightly flatter. In the case of a linked leak, the spectral slope falls off much more rapidly. These findings suggest that the amount of dc flow alone is not a very good measure of voice efficiency.


International Conference on Acoustics, Speech, and Signal Processing | 1991

Acoustic to articulatory parameter mapping using an assembly of neural networks

Mazin G. Rahim; W. Bastiaan Kleijn; Juergen Schroeter; C. C. Goodyear

The authors describe an efficient procedure for acoustic-to-articulatory parameter mapping using neural networks. An assembly of multilayer perceptrons, each designated to a specific region in the articulatory space, is used to map acoustic parameters of the speech into tract areas. The training of this model is executed in two stages: in the first stage, a codebook of suitably normalized articulatory parameters is used, and in the second stage, real speech data are used to further improve the mapping. In general, acoustic-to-articulatory parameter mapping is nonunique; several vocal tract shapes can result in identical spectral envelopes. The model accommodates this ambiguity. During synthesis, neural networks are selected by dynamic programming using a criterion that ensures smoothly varying vocal tract shapes while maintaining a good spectral match.
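The assembly idea can be sketched in a few lines of NumPy (again my own toy illustration, not the paper's code: per-region linear maps stand in for the multilayer perceptrons, a plain k-means partitions the articulatory space, and a greedy previous-shape criterion stands in for the dynamic-programming network selection; all function names are hypothetical):

```python
import numpy as np

def train_assembly(cepstra, shapes, n_regions=4, iters=20, seed=0):
    """Partition the articulatory space with a small k-means, then fit one
    linear map per region from cepstral frames to tract shapes."""
    rng = np.random.default_rng(seed)
    centers = shapes[rng.choice(len(shapes), n_regions, replace=False)].copy()
    labels = np.zeros(len(shapes), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((shapes[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for r in range(n_regions):
            if np.any(labels == r):
                centers[r] = shapes[labels == r].mean(axis=0)
    X = np.hstack([cepstra, np.ones((len(cepstra), 1))])  # append a bias term
    maps = []
    for r in range(n_regions):
        m = labels == r
        if not np.any(m):                       # empty region: zero map
            maps.append(np.zeros((X.shape[1], shapes.shape[1])))
            continue
        W, *_ = np.linalg.lstsq(X[m], shapes[m], rcond=None)
        maps.append(W)
    return maps

def predict_shape(cepstrum, maps, prev_shape=None):
    """Run one frame through every regional model and keep the prediction
    closest to the previous shape: a greedy continuity criterion standing
    in for the paper's dynamic-programming selection of networks."""
    x = np.append(cepstrum, 1.0)
    preds = np.stack([x @ W for W in maps])
    if prev_shape is None:
        return preds[0]
    return preds[np.argmin(((preds - prev_shape) ** 2).sum(axis=-1))]
```

Partitioning lets each regional model stay simple while the selection step, like the dynamic programming in the paper, is what resolves the nonuniqueness of the mapping.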

Collaboration


Dive into Juergen Schroeter's collaboration.

Bert Cranen

Radboud University Nijmegen


Volker Strom

University of Edinburgh
