Publications


Featured research published by Bernd J. Kröger.


Non-Linear Speech Processing | 2009

Towards a neurocomputational model of speech production and perception

Bernd J. Kröger; Jim Kannampuzha; Christiane Neuschaefer-Rube

The limited performance of current speech synthesis and speech recognition systems may result from the fact that these systems are not designed with respect to the human neural processes of speech production and perception. A neurocomputational model of speech production and perception is introduced which is organized with respect to these human neural processes. The production-perception model comprises an artificial, computer-implemented vocal tract as a front-end module, capable of generating articulatory speech movements and acoustic speech signals. The structure of the model comprises motor and sensory processing pathways. Speech knowledge is collected during training stages which imitate early stages of speech acquisition; this knowledge is stored in artificial self-organizing maps. The current neurocomputational model is capable of producing and perceiving vowels, VC-syllables, and CV-syllables (V = vowel, C = voiced plosive). Basic features of natural speech production and perception are predicted by this model in a straightforward way: production of speech items is feedforward- and feedback-controlled, phoneme realizations vary within perceptually defined regions, and perception is less categorical for vowels than for consonants. Due to its human-like production-perception processing, the model can serve as a basic module for technically relevant approaches to high-quality speech synthesis and high-performance speech recognition.
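For readers unfamiliar with self-organizing maps, the following is a minimal Kohonen-style SOM update in Python. The map size, feature dimension, and training data are illustrative assumptions, not the actual training setup of the Kröger et al. model.

    import numpy as np

    # Hypothetical setup: map 12-dimensional sensory feature vectors onto a 10x10 grid.
    rng = np.random.default_rng(0)
    grid_h, grid_w, dim = 10, 10, 12
    weights = rng.normal(size=(grid_h, grid_w, dim))
    coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                                  indexing="ij"), axis=-1)

    def som_step(x, weights, lr=0.1, sigma=2.0):
        """One Kohonen update: find the best-matching unit (BMU) and pull its
        Gaussian neighborhood toward the input vector x."""
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        d2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))[..., None]   # neighborhood function
        weights += lr * h * (x - weights)
        return weights

    for x in rng.normal(size=(500, dim)):   # stand-in for babbling/imitation data
        weights = som_step(x, weights)

After training on data gathered during babbling-like stages, nearby map units come to represent similar speech items, which is what lets such maps serve as the stored speech knowledge described above.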


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Simulation of Losses Due to Turbulence in the Time-Varying Vocal System

Peter Birkholz; Dietmar Jackèl; Bernd J. Kröger

Flow separation in the vocal system at the outlet of a constriction causes turbulence and a fluid dynamic pressure loss. In articulatory synthesizers, the pressure drop associated with such a loss is usually assumed to be concentrated at one specific position near the constriction and is represented by a lumped nonlinear resistance to the flow. This paper highlights discontinuity problems of this simplified loss treatment when the constriction location changes during dynamic articulation. The discontinuities can manifest as undesirable acoustic artifacts in the synthetic speech signal that must be avoided for high-quality articulatory synthesis. We present a solution to this problem based on a more realistic, distributed treatment of fluid dynamic pressure changes. The proposed method was implemented in an articulatory synthesizer, where it proved to prevent such acoustic artifacts.
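As background, the lumped loss that this paper improves on is commonly modeled as a Bernoulli-type kinetic pressure drop concentrated at the constriction outlet. A sketch in standard notation (the paper's distributed formulation differs in detail):

    % Kinetic (Bernoulli) pressure drop for volume velocity U through a
    % constriction of area A_c, with air density \rho:
    \Delta p = \frac{\rho}{2} \, \frac{U\,|U|}{A_c^{2}}
    % equivalently, a lumped nonlinear flow resistance R(U), with \Delta p = R(U)\,U:
    R(U) = \frac{\rho\,|U|}{2\,A_c^{2}}

When the constriction migrates from one tube section to the next during dynamic articulation, relocating this single lumped element makes the simulated pressure jump, which is the discontinuity the paper identifies; spreading the kinetic pressure change over adjacent tube sections removes the jump.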


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Model-Based Reproduction of Articulatory Trajectories for Consonant–Vowel Sequences

Peter Birkholz; Bernd J. Kröger; Christiane Neuschaefer-Rube

We present a novel quantitative model for the generation of articulatory trajectories based on the concept of sequential target approximation. The model was applied for the detailed reproduction of movements in repeated consonant-vowel syllables measured by electromagnetic articulography (EMA). The trajectories for the constrictor (lower lip, tongue tip, or tongue dorsum) and the jaw were reproduced. Thereby, we tested the following hypotheses about invariant properties of articulatory commands: (1) The target of the primary articulator for a consonant is invariant with respect to phonetic context, stress, and speaking rate. (2) Vowel targets are invariant with respect to speaking rate and stress. (3) The onsets of articulatory commands for the jaw and the constrictor are synchronized. Our results in terms of high-quality matches between observed and model-generated trajectories support these hypotheses. The findings of this study can be applied to the development of control models for articulatory speech synthesis.
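A minimal sketch of sequential target approximation in Python follows, using a critically damped second-order variant (the paper's model may use a different system order); the targets, durations, and time constant are illustrative, not the values fitted to the EMA data.

    import numpy as np

    def target_approx(targets, durations, tau=0.02, dt=0.001, x0=0.0, v0=0.0):
        """Piecewise critically damped second-order approach to successive targets.
        Within each segment, x(t) = T + (c1 + c2*t) * exp(-t/tau), with c1 and c2
        chosen so that position and velocity are continuous at segment onsets."""
        xs = []
        x, v = x0, v0
        for T, dur in zip(targets, durations):
            t = np.arange(0.0, dur, dt)
            c1 = x - T
            c2 = v + c1 / tau
            seg = T + (c1 + c2 * t) * np.exp(-t / tau)
            vel = (c2 - (c1 + c2 * t) / tau) * np.exp(-t / tau)
            xs.append(seg)
            x, v = seg[-1], vel[-1]   # carry state into the next command
        return np.concatenate(xs)

    # Example: a constrictor-like trace alternating consonant and vowel targets (mm).
    traj = target_approx(targets=[-2.0, 8.0, -2.0, 8.0],
                         durations=[0.12, 0.15, 0.12, 0.15])

The property exploited by the paper's invariance hypotheses is visible here: the command (target plus activation interval) can stay fixed while the produced trajectory varies with context, because each segment starts from whatever state the previous one left behind.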


Proceedings of the COST Action 2102 International Conference on Verbal and Nonverbal Communication Behaviours | 2007

A gesture-based concept for speech movement control in articulatory speech synthesis

Bernd J. Kröger; Peter Birkholz

An articulatory speech synthesizer comprising a three-dimensional vocal tract model and a gesture-based concept for the control of articulatory movements is introduced and discussed in this paper. A modular learning concept based on speech perception is outlined for the creation of gestural control rules. The learning concept draws on sensory feedback information for articulatory states produced by the model itself, as well as on auditory and visual information for speech items produced by external speakers. The complete model (control module and synthesizer) is capable of producing high-quality synthetic speech signals and offers a scheme for modeling the natural processes of speech production and perception.


Journal of the Acoustical Society of America | 1995

A gesture-based dynamic model describing articulatory movement data

Bernd J. Kröger; Georg Schröder; Claudia Opgen-Rhein

A quantitative dynamic model for the description of speech movements using a critically damped linear second‐order system is proposed. This six‐parameter model is able to fit natural movement data with high accuracy. Since in this approach the actual location of gestural onset and gestural offset, i.e., the location and duration of gestural activation, results from the fitting procedure, no advance sectioning of movement traces is necessary. The model parameters are target position, eigenperiod, and four time parameters describing the temporal location of gestural onset and offset. The fitting algorithm is tested on simulated and natural data in order to evaluate the accuracy of the fits and the repeatability of the dynamic parameters extracted.
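To make the fitting idea concrete, here is a hedged sketch using off-the-shelf least squares in Python. It fits a reduced version with a single activation onset, whereas the actual model uses four time parameters for gestural onset and offset, and its fitting algorithm may differ.

    import numpy as np
    from scipy.optimize import curve_fit

    def gesture(t, T, tau, t_on, x0):
        """Critically damped second-order approach from rest at position x0
        toward target T, with gestural activation starting at time t_on;
        tau is the system time constant (related to the model's eigenperiod)."""
        s = np.clip(t - t_on, 0.0, None)   # time since activation (0 before onset)
        c1 = x0 - T
        return T + c1 * (1.0 + s / tau) * np.exp(-s / tau)

    # Synthetic "measured" movement trace (illustrative only).
    rng = np.random.default_rng(1)
    t = np.linspace(0.0, 0.4, 400)
    y = gesture(t, T=10.0, tau=0.03, t_on=0.05, x0=0.0) + rng.normal(0.0, 0.1, t.size)

    popt, _ = curve_fit(gesture, t, y, p0=[8.0, 0.05, 0.02, 0.5])
    # popt approximates [target, time constant, activation onset, start position]

Because the activation timing is itself a fitted parameter, no advance sectioning of the movement trace is required, as noted in the abstract.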


International Journal of Speech-Language Pathology | 2010

The effectiveness of traditional methods and altered auditory feedback in improving speech rate and intelligibility in speakers with Parkinson's disease

Anja Lowit; C. Dobinson; Claire Timmins; Peter Howell; Bernd J. Kröger

Communication problems are a frequent symptom for people with Parkinson's disease (PD) and can have a significant impact on their quality of life. Deciding on the right management approach can be problematic, though, as, with the exception of LSVT®, very few studies have been published demonstrating the effectiveness of treatment techniques. The aim of this study was to compare traditional rate reduction methods with altered auditory feedback (AAF) with respect to their effectiveness in reducing speech rate and improving intelligibility in speakers with PD. Ten participants underwent both types of treatment in once-weekly sessions for 6 weeks. Outcome measures were speech rate for passage reading as well as intelligibility on both a passage reading and a monologue task. The results showed that, as a group, there was no significant change in either speech rate or intelligibility resulting from either treatment type. However, individual speakers showed improvements in speech performance as a result of each therapy technique. In most cases, these benefits persisted for at least 6 months post-treatment. Possible reasons for the variable response to treatment, as well as issues to consider when planning to use AAF devices in treatment, are discussed.


Cognitive Processing | 2010

A model for production, perception, and acquisition of actions in face-to-face communication

Bernd J. Kröger; Stefan Kopp; Anja Lowit

The concept of action as the basic motor control unit for goal-directed movement behavior has been used primarily for private or non-communicative actions like walking, reaching, or grasping. In this paper, literature is reviewed indicating that this concept can also be used in all domains of face-to-face communication, such as speech, co-verbal facial expression, and co-verbal gesturing. Three domain-specific types of actions, i.e., speech actions, facial actions, and hand-arm actions, are defined, and a model is proposed that elucidates the underlying biological mechanisms of action production, action perception, and action acquisition in all domains of face-to-face communication. This model can be used as a theoretical framework for empirical analysis or for simulation with embodied conversational agents, and thus for advanced human-computer interaction technologies.


Phonetica | 1993

A gestural production model and its application to reduction in German.

Bernd J. Kröger

A quantitative speech production model that takes specifications of gestures as its input has been implemented on a computer. Articulatory gestures serve as the central phonological/phonetic units of speech organization. The present model is not based on task dynamics, but it still assumes a critically damped linear second-order system. The capability of the model is demonstrated in part by its application to reduction phenomena in conversational German speech: a variety of apparent segmental alterations can be explained by a few general principles governing the timing and magnitude of gestures in different articulators. Some salient examples were synthesized with the model, which also served to evaluate the parameter values of the sample gestures.


Joint IEEE International Conference on Development and Learning and Epigenetic Robotics | 2015

Seeing [u] aids vocal learning: Babbling and imitation of vowels using a 3D vocal tract model, reinforcement learning, and reservoir computing

Max Murakami; Bernd J. Kröger; Peter Birkholz; Jochen Triesch

We present a model of imitative vocal learning consisting of two stages. First, the infant is exposed to the ambient language and forms auditory knowledge of the speech items to be acquired. Second, the infant attempts to imitate these speech items and thereby learns to control the articulators for speech production. We model these processes using a recurrent neural network and a realistic vocal tract model. We show that vowel production can be successfully learnt by imitation. Moreover, we find that acquisition of [u] is impaired if visual information is discarded during imitation. This might give sighted infants an advantage over blind infants during vocal learning, which is in agreement with experimental evidence.
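To illustrate the reservoir computing ingredient, here is a minimal echo state network update in Python. The sizes, scalings, and linear readout are assumptions for illustration; the paper's network differs in detail, e.g. its readout is trained by reinforcement learning rather than fixed as shown here.

    import numpy as np

    rng = np.random.default_rng(0)
    n_res, n_in, n_out = 200, 12, 3                   # hypothetical sizes
    W_in = rng.normal(scale=0.5, size=(n_res, n_in))
    W = rng.normal(size=(n_res, n_res))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1
    W_out = np.zeros((n_out, n_res))                  # readout, to be learned

    def step(x, u):
        """One reservoir update: fixed random recurrent dynamics driven by input u."""
        return np.tanh(W @ x + W_in @ u)

    x = np.zeros(n_res)
    for u in rng.normal(size=(100, n_in)):            # stand-in auditory/visual input
        x = step(x, u)
        motor_command = W_out @ x                     # maps reservoir state to articulators

Only the readout weights are adapted during learning while the recurrent reservoir stays fixed, which is what makes reservoir computing well suited to trial-and-error (reinforcement) learning of articulation.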


HNO | 2004

MRT-Sequenzen als Datenbasis eines visuellen Artikulationsmodells (MRI sequences as the data basis of a visual articulation model)

Bernd J. Kröger; Philip Hoole; Robert Sader; Christian Geng; B. Pompino-Marschall; Christiane Neuschaefer-Rube

Articulatory models can be used in phoniatrics for the visualisation of speech disorders, and thus in teaching, in the counselling of patients and their relatives, and in speech therapy. The articulatory model developed here was based on static MRI data of sustained sounds. MRI sequences are now being used to further refine the model with respect to speech movements. For this corpus, midsagittal MRI sections were recorded for 12 consonants in the symmetrical context of the three point vowels [i:], [a:], and [u:], at a recording rate of eight images per second. The data show a strong influence of the vocalic context on the articulatory target positions of all consonants. A method for reducing the MRI data for subsequent qualitative and quantitative analyses is presented.

Collaboration


Dive into Bernd J. Kröger's collaborations.

Top Co-Authors

Peter Birkholz (Dresden University of Technology)

Anja Lowit (University of Strathclyde)

Mengxue Cao (Chinese Academy of Social Sciences)

Stefan Heim (RWTH Aachen University)