Silke Goronzy | Researchain

Publication

Featured research published by Silke Goronzy.

agent-directed simulation | 2004

Emotion Recognition Using Bio-sensors: First Steps towards an Automatic System

Andreas Haag; Silke Goronzy; Peter Schaich; Jason Williams

The detection of emotion is becoming an increasingly important field for human-computer interaction as the advantages emotion recognition offers become more apparent and realisable. Emotion recognition can be achieved by a number of methods, one of which is the use of bio-sensors. Bio-sensors have a number of advantages over other emotion recognition methods, as they can be made both unobtrusive and robust against a number of environmental conditions that other forms of emotion recognition have difficulty overcoming. In this paper, we describe a procedure to train computers to recognise emotions using multiple signals from many different bio-sensors. In particular, we describe the procedure we adopted to elicit emotions and to train our system to recognise them. We also present a set of preliminary results which indicate that our neural net classifier is able to obtain accuracy rates of 96.6% and 89.9% for recognition of emotion arousal and valence respectively.

Speech Communication | 2004

Generating non-native pronunciation variants for lexicon adaptation

Silke Goronzy; Stefan Rapp; Ralf Kompe

Handling non-native speech in automatic speech recognition (ASR) systems is an area of increasing interest. The majority of systems are tailored to native speech only, and as a consequence performance for non-native speakers is often unsatisfactory. One way to approach the problem is to adapt the acoustic models to the new speaker. Another important means to improve performance for non-native speakers is to consider non-native pronunciations in the dictionary. The difficulty here lies in the generation of the non-native variants, especially if various accents are to be considered. Traditional approaches to modelling pronunciation variation require either phonetic expertise or extensive speech databases. They are too costly, especially if flexible modelling of several accents is desired. We propose to use native speech databases exclusively to derive non-native pronunciation variants. We use an English phoneme recogniser to generate English pronunciations for German words and use these to train decision trees that are able to predict the respective English-accented variant from the German canonical transcription. Furthermore, we combine this approach with online, incremental weighted MLLR speaker adaptation. Using the enhanced dictionary and the speaker adaptation alone improved the word error rate of the baseline system by 5.2% and 16.8%, respectively. When both methods were combined, we achieved an improvement of 18.2%.

SmartKom | 2006

SmartKom-Home: The Interface to Home Entertainment

Thomas Portele; Silke Goronzy; Martin Emele; Andreas Kellner; Sunna Torge; Jürgen te Vrugt

SmartKom-Home demonstrates the use and benefit of an intelligent multimodal interface, combining speech and a handheld display with touch input, for controlling entertainment devices like a TV, a recorder, and a jukebox, and for accessing entertainment services like an electronic program guide. One important point is the emphasis on the functional aspect: the user's needs, conveyed to the system in a natural way by speech and gesture, are satisfied. The user does not need to know device-specific features or service idiosyncrasies. The function modeling component in SmartKom-Home has the necessary knowledge to transform the abstract user request into device commands and service queries.

SmartKom | 2006

The Dynamic Lexicon

Silke Goronzy; Stefan Rapp; Martin Emele

The dynamic lexicon is one of the central knowledge sources in SmartKom and provides the whole system with the capability to dynamically update the vocabulary. The corresponding multilingual pronunciations, which are needed by all speech-related components, are automatically generated.

SmartKom | 2006

Class-Based Language Model Adaptation

Martin Emele; Zica Valsan; Yin Hay Lam; Silke Goronzy

In this paper we introduce and evaluate two class-based language model adaptation techniques for adapting general n-gram-based background language models to a specific spoken dialogue task. The required background language models are derived from available newspaper corpora and Internet newsgroup collections. We followed a standard mixture-based approach for language model adaptation by generating several clusters of topic-specific language models and combined them into a specific target language model using different weights depending on the chosen application domain. In addition, we developed a novel word n-gram pruning technique for domain adaptation and proposed a new approach for thematic text clustering. This method relies on a new discriminative n-gram-based key term selection process for document clustering. These key terms are then used to automatically cluster the whole document collection. By selecting only relevant text clusters for language model training, we addressed the problem of generating task-specific language models. Different key term selection methods are investigated using perplexity as the evaluation measure. Automatically computed clusters are compared with manually labeled genre clusters, and the results provide a significant performance improvement depending on the chosen key term selection method.
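The mixture-based adaptation mentioned in this abstract can be illustrated with a minimal sketch: several component language models are combined by linear interpolation, and perplexity is used to compare the result against a background model. This is not the authors' implementation. The unigram models, the toy corpora, the add-one smoothing, and the weights below are all assumptions chosen for illustration; the paper works with n-gram models and automatically derived topic clusters.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(models, weights):
    """Linear mixture: P(w) = sum_i weights[i] * P_i(w)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    vocab = models[0].keys()
    return {w: sum(lam * m[w] for lam, m in zip(weights, models)) for w in vocab}


def perplexity(model, tokens):
    """Perplexity of a token sequence under a unigram model."""
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


# Toy data (hypothetical): a "news" background corpus and a "tv" topic corpus.
vocab = ["show", "the", "movie", "stock", "market"]
news_lm = unigram_model("the stock market the stock".split(), vocab)
tv_lm = unigram_model("show the movie show movie".split(), vocab)

# Weight the topic model more heavily for a TV-related dialogue task.
adapted_lm = interpolate([news_lm, tv_lm], [0.2, 0.8])

in_domain = "show the movie".split()
background_ppl = perplexity(news_lm, in_domain)
adapted_ppl = perplexity(adapted_lm, in_domain)
```

On this toy data the adapted model assigns higher probability to the in-domain text than the background model does, so its perplexity is lower. In practice the interpolation weights would be tuned on held-out data from the target domain rather than set by hand.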

Journal of the Acoustical Society of America | 2006

Recognizing speech by selectively canceling model function mixture components

Ralf Kompe; Silke Goronzy

A method for recognizing speech is proposed wherein the process of recognition is started using the starting acoustic model (SAM) and wherein the current acoustic model (CAM) is modified by removing or cancelling model function mixture components (MFMjk) which are negligible for the description of the speaking behavior and quality of the current speaker. Therefore, the size of the acoustic model (SAM, CAM) is reduced by adaptation to the current speaker enabling fast performance and increased recognition efficiency.
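The idea of shrinking an acoustic model by cancelling negligible mixture components can be sketched as follows. The abstract does not state the criterion used to decide that a component is negligible, so this sketch assumes a simple one: a component is dropped when its share of the current speaker's accumulated occupancy falls below a threshold. The function name `prune_mixture`, the `min_share` parameter, and the sample numbers are hypothetical.

```python
def prune_mixture(weights, occupancy, min_share=0.05):
    """Drop mixture components that the current speaker rarely uses.

    weights   -- mixture weights of one state in the current model
    occupancy -- per-component occupancy accumulated on the speaker's data
    min_share -- assumed threshold on a component's share of total occupancy

    Returns the indices of the kept components and their renormalised weights.
    """
    total = sum(occupancy)
    if total <= 0:
        raise ValueError("no adaptation data accumulated yet")
    keep = [i for i, occ in enumerate(occupancy) if occ / total >= min_share]
    if not keep:
        # Always keep at least the most-used component.
        keep = [max(range(len(occupancy)), key=occupancy.__getitem__)]
    kept_mass = sum(weights[i] for i in keep)
    return keep, [weights[i] / kept_mass for i in keep]


# Hypothetical state with four components; two are barely used by this speaker.
kept, new_weights = prune_mixture(
    weights=[0.4, 0.3, 0.2, 0.1],
    occupancy=[50.0, 1.0, 45.0, 4.0],
)
```

Here the second and fourth components each account for less than 5% of the occupancy, so only the first and third survive, and their weights are rescaled to sum to one. Fewer components per state means fewer likelihood evaluations per frame, which is the speed benefit the abstract refers to.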

Archive | 1998