Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Tanja Schultz is active.

Publication


Featured research published by Tanja Schultz.


Speech Communication | 2001

Language-independent and language-adaptive acoustic modeling for speech recognition

Tanja Schultz; Alex Waibel

With the distribution of speech technology products all over the world, portability to new target languages becomes a practical concern. As a consequence, our research focuses on the question of how to port large vocabulary continuous speech recognition (LVCSR) systems in a fast and efficient way. More specifically, we want to estimate acoustic models for a new target language using speech data from varied source languages, but only limited data from the target language itself. For this purpose, we introduce different methods for multilingual acoustic model combination and a polyphone decision tree specialization procedure. Recognition results using language-dependent, language-independent and language-adaptive acoustic models are presented and discussed in the framework of our GlobalPhone project, which investigates LVCSR systems in 15 languages.
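As a loose illustration of how acoustic models might be combined across languages, the sketch below pools training samples from several source languages through a shared phone inventory. The language codes, phone names, and mapping are hypothetical placeholders, not the GlobalPhone inventory or the paper's actual procedure.

from collections import defaultdict

# Illustrative sketch (not the paper's method): pool training samples from several
# source languages through a shared, IPA-like phone inventory so that one acoustic
# model per shared unit can be estimated from multilingual data.
PHONE_MAP = {
    ("de", "a:"): "a_long",
    ("es", "a"):  "a_long",
    ("de", "S"):  "sh",
    ("en", "sh"): "sh",
}

def pool_training_samples(samples):
    """samples: iterable of (language, phone, feature_vector) tuples."""
    pooled = defaultdict(list)
    for lang, phone, features in samples:
        shared = PHONE_MAP.get((lang, phone))
        if shared is not None:          # skip phones without a cross-language counterpart
            pooled[shared].append(features)
    return pooled

example = [("de", "a:", [0.1, 0.2]), ("es", "a", [0.3, 0.1]), ("en", "sh", [0.4, 0.5])]
pooled_data = pool_training_samples(example)   # {"a_long": [...], "sh": [...]}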


international conference on acoustics, speech, and signal processing | 2001

Advances in automatic meeting record creation and access

Alex Waibel; Michael Bett; Florian Metze; Klaus Ries; Thomas Schaaf; Tanja Schultz; Hagen Soltau; Hua Yu; Klaus Zechner

Oral communication is transient, but many important decisions, social contracts and fact findings are first carried out in an oral setting, documented in written form and later retrieved. At Carnegie Mellon University's Interactive Systems Laboratories we have been experimenting with the documentation of meetings. The paper summarizes part of the progress that we have made in this test bed, specifically on the questions of automatic transcription using large vocabulary continuous speech recognition, information access using non-keyword-based methods, summarization and user interfaces. The system is capable of automatically constructing a searchable and browsable audio-visual database of meetings and provides access to these records.


international conference on acoustics, speech, and signal processing | 2003

Comparison of acoustic model adaptation techniques on non-native speech

Zhirong Wang; Tanja Schultz; Alex Waibel

The performance of speech recognition systems is consistently poor on non-native speech. The challenge for non-native speech recognition is to maximize the recognition performance with a small amount of available non-native data. We report on acoustic modeling adaptation for the recognition of non-native speech. Using non-native data from German speakers, we investigate how bilingual models, speaker adaptation, acoustic model interpolation and polyphone decision tree specialization methods can help to improve the recognizer performance. Results obtained from the experiments demonstrate the feasibility of these methods.
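One of the adaptation methods named above, acoustic model interpolation, can be pictured as a weighted blend of model parameters. The sketch below assumes single-Gaussian means per HMM state and an illustrative interpolation weight; it is a minimal example, not the paper's implementation.

import numpy as np

# Minimal sketch of acoustic model interpolation, assuming single-Gaussian means
# per HMM state for brevity. The weight and model layout are illustrative.
def interpolate_means(native_means, nonnative_means, lam=0.7):
    """Blend well-trained native-speech means with means re-estimated on limited non-native data."""
    native_means = np.asarray(native_means, dtype=float)
    nonnative_means = np.asarray(nonnative_means, dtype=float)
    return lam * nonnative_means + (1.0 - lam) * native_means

# Example: one 3-dimensional mean vector per state.
adapted = interpolate_means([[0.1, 0.2, 0.3]], [[0.4, 0.1, 0.2]], lam=0.6)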


international conference on acoustics, speech, and signal processing | 1998

Recognition of music types

Hagen Soltau; Tanja Schultz; Martin Westphal; Alex Waibel

This paper describes a music type recognition system that can be used to index and search multimedia databases. A new approach to temporal structure modeling is proposed. The so-called ETM-NN (explicit time modelling with neural networks) method abstracts acoustical events into the hidden units of a neural network. This new set of abstract features representing temporal structures can then be learned by a traditional neural network to discriminate between different types of music. The experiments show that this method significantly outperforms HMMs.
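The two-stage idea, frame-level network activations summarized over time and passed to a second classifier, can be sketched roughly as follows. Layer sizes, the pooling choice, and the random weights are placeholders and not the architecture evaluated in the paper.

import numpy as np

# Rough sketch: frame-level hidden activations act as abstract "acoustic event"
# features, summarized over time and passed on to a second music-type classifier.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(64, 13))               # frame net: 13 acoustic features -> 64 hidden units

def temporal_features(frames):
    """frames: array of shape (num_frames, 13) holding acoustic features of one clip."""
    hidden = np.tanh(frames @ W1.T)          # per-frame hidden-unit activations
    return np.concatenate([hidden.mean(axis=0), hidden.std(axis=0)])  # temporal summary

clip = rng.normal(size=(200, 13))
x = temporal_features(clip)                  # fixed-size vector for the second classifier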


international conference on multimodal interfaces | 2004

Identifying the addressee in human-human-robot interactions based on head pose and speech

Michael Katzenmaier; Rainer Stiefelhagen; Tanja Schultz

In this work we investigate the power of acoustic and visual cues, and their combination, to identify the addressee in a human-human-robot interaction. Based on eighteen audio-visual recordings of two human beings and a (simulated) robot, we discriminate the interaction of the two humans from the interaction of one human with the robot. The paper compares the results of three approaches. The first approach uses purely acoustic cues to find the addressee. Low-level, feature-based cues as well as higher-level cues are examined. In the second approach we test whether the human's head pose is a suitable cue. Our results show that visually estimated head pose is a more reliable cue for the identification of the addressee in the human-human-robot interaction. In the third approach we combine the acoustic and visual cues, which results in significant improvements.
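A simple way to combine the two cue types is a weighted late fusion of per-modality scores, sketched below. The fusion weight, the score interpretation, and the function name are illustrative assumptions, not the combination scheme reported in the paper.

# Hedged sketch of weighted late fusion of acoustic and head-pose scores.
def fuse_addressee_scores(p_robot_audio, p_robot_headpose, w_visual=0.7):
    """Combined probability that the robot (rather than the other human) is addressed."""
    return w_visual * p_robot_headpose + (1.0 - w_visual) * p_robot_audio

addressed_to_robot = fuse_addressee_scores(p_robot_audio=0.4, p_robot_headpose=0.9) > 0.5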


Speech Communication | 2014

Automatic speech recognition for under-resourced languages: A survey

Laurent Besacier; Etienne Barnard; Alexey Karpov; Tanja Schultz

Speech processing for under-resourced languages is an active field of research, which has experienced significant progress during the past decade. We propose, in this paper, a survey that focuses on automatic speech recognition (ASR) for these languages. Under-resourced languages and the challenges associated with them are first defined. The main part of the paper is a literature review of the recent (last eight years) contributions made in ASR for under-resourced languages. Examples of past projects and future trends when dealing with under-resourced languages are also presented. We believe that this paper will be a good starting point for anyone interested in initiating research on (or operational development of) ASR for one or several under-resourced languages. It should be clear, however, that many of the issues and approaches presented here apply to speech technology in general (text-to-speech synthesis, for instance).


ieee automatic speech recognition and understanding workshop | 2005

Session independent non-audible speech recognition using surface electromyography

Lena Maier-Hein; Florian Metze; Tanja Schultz; Alex Waibel

In this paper we introduce a speech recognition system based on myoelectric signals. The system handles audible and non-audible speech. Major challenges in surface electromyography-based speech recognition ensue from repositioning electrodes between recording sessions, environmental temperature changes, and skin tissue properties of the speaker. In order to reduce the impact of these factors, we investigate a variety of signal normalization and model adaptation methods. An average word accuracy of 97.3% is achieved using seven EMG channels and the same electrode positions. The performance drops to 76.2% after repositioning the electrodes if no normalization or adaptation is performed. By applying our adaptation methods we manage to restore the recognition rates to 87.1%. Furthermore, we compare audibly and non-audibly spoken speech. The results suggest that large differences exist between the corresponding muscle movements. Still, our recognition system recognizes both speech manners accurately when trained on pooled data.
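As one plausible example of the kind of signal normalization the abstract refers to, the sketch below z-scores each EMG channel within a recording session. The channel layout and the choice of z-scoring are assumptions for illustration, not the specific methods evaluated in the paper.

import numpy as np

# Illustrative per-session normalization of multi-channel EMG (an assumption,
# not necessarily one of the paper's methods).
def normalize_session(emg):
    """emg: array of shape (num_samples, num_channels) with raw EMG of one session."""
    mean = emg.mean(axis=0, keepdims=True)
    std = emg.std(axis=0, keepdims=True) + 1e-8   # guard against near-constant channels
    return (emg - mean) / std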


Speech Communication | 2010

Modeling coarticulation in EMG-based continuous speech recognition

Tanja Schultz; Michael Wand

This paper discusses the use of surface electromyography for automatic speech recognition. Electromyographic signals captured at the facial muscles record the activity of the human articulatory apparatus and thus allow a speech signal to be traced even if it is spoken silently. Since speech is captured before it gets airborne, the resulting signal is not masked by ambient noise. The resulting Silent Speech Interface has the potential to overcome major limitations of conventional speech-driven interfaces: it is not prone to any environmental noise, allows confidential information to be transmitted silently, and does not disturb bystanders. We describe our new approach of phonetic feature bundling for modeling coarticulation in EMG-based speech recognition and report results on the EMG-PIT corpus, a multiple-speaker, large-vocabulary database of silent and audible EMG speech recordings which we recently collected. Our results on speaker-dependent and speaker-independent setups show that modeling the interdependence of phonetic features reduces the word error rate of the baseline system by over 33% relative. Our final system achieves 10% word error rate for the best-recognized speaker on a 101-word vocabulary task, bringing EMG-based speech recognition within a useful range for the application of Silent Speech Interfaces.
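The underlying idea of representing phones by articulatory features, so that phones sharing feature values can share model components, can be sketched as below. The feature assignments are simplified textbook values and the grouping is a loose illustration; the paper's phonetic feature bundling models the interdependence of features in a more elaborate way.

# Loose illustration (assumption, not the paper's method): represent phones by
# articulatory features so that phones agreeing on a feature can share models.
PHONE_FEATURES = {
    "p": {"place": "bilabial", "manner": "plosive", "voiced": False},
    "b": {"place": "bilabial", "manner": "plosive", "voiced": True},
    "m": {"place": "bilabial", "manner": "nasal",   "voiced": True},
    "t": {"place": "alveolar", "manner": "plosive", "voiced": False},
}

def shares_feature(phone_a, phone_b, feature):
    """True if two phones agree on a given articulatory feature and may share models."""
    return PHONE_FEATURES[phone_a][feature] == PHONE_FEATURES[phone_b][feature]

print(shares_feature("p", "b", "place"))   # True: both bilabial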


Proceedings of the IEEE | 2000

Multilinguality in speech and spoken language systems

Alex Waibel; Petra Geutner; Laura Mayfield Tomokiyo; Tanja Schultz; Monika Woszczyna

Building modern speech and language systems currently requires large data resources such as texts, voice recordings, pronunciation lexicons, morphological decomposition information and parsing grammars. Based on a study of the most important differences between language groups, we introduce approaches to efficiently deal with the enormous task of covering even a small percentage of the world's languages. For speech recognition, we have reduced the resource requirements by applying acoustic model combination, bootstrapping and adaptation techniques. Similar algorithms have been applied to improve the recognition of foreign accents. Segmenting language into appropriate units reduces the amount of data required to robustly estimate statistical models. The underlying morphological principles are also used to automatically adapt the coverage of our speech recognition dictionaries with the Hypothesis-Driven Lexical Adaptation (HDLA) algorithm. This reduces the out-of-vocabulary problems encountered in agglutinative languages. Speech recognition results are reported for the read GlobalPhone database and some broadcast news data. For speech translation, using a task-oriented Interlingua allows a system covering N languages to be built with linear, rather than quadratic, effort. We have introduced a modular grammar design to maximize reusability and portability. End-to-end translation results are reported on a travel-domain task in the framework of C-STAR.
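The "linear rather than quadratic" claim follows from simple counting, sketched below: direct translation between every ordered language pair needs N*(N-1) systems, whereas an interlingua needs only one analysis and one generation component per language.

# Component counts behind the linear-vs-quadratic argument for interlingua-based
# translation among N languages.
def direct_pair_systems(n):
    return n * (n - 1)        # one system per ordered language pair

def interlingua_components(n):
    return 2 * n              # one analyzer and one generator per language

for n in (3, 10, 15):
    print(n, direct_pair_systems(n), interlingua_components(n))
# e.g. for 10 languages: 90 pairwise systems vs 20 interlingua components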


affective computing and intelligent interaction | 2009

Towards emotion recognition from electroencephalographic signals

Kristina Schaaff; Tanja Schultz

During the last decades, information about the emotional state of users has become more and more important in human-computer interaction. Automatic emotion recognition enables the computer to recognize a user's emotional state and thus allows for an appropriate reaction, which may pave the way for computers to act emotionally in the future. In the current study, we investigate different feature sets to build an emotion recognition system from electroencephalographic signals. We used pictures from the International Affective Picture System to induce three emotional states: pleasant, neutral, and unpleasant. We designed a headband with four built-in electrodes at the forehead, which was used to record data from five subjects. Compared to standard EEG caps, the headband is comfortable to wear and easy to attach, which makes it more suitable for everyday-life conditions. To solve the recognition task we developed a system based on support vector machines. With this system we were able to achieve an average recognition rate of up to 66.7% for subject-dependent recognition, solely based on EEG signals.
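A minimal sketch of such a support-vector-machine pipeline is shown below, using synthetic stand-ins for the EEG features and labels. The feature dimensionality, the train/test split, and the SVM settings are assumptions for illustration and do not reproduce the paper's feature sets or results.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-ins for EEG features and emotion labels (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 16))           # e.g. band-power features from four electrodes
y = rng.integers(0, 3, size=120)         # 0 = pleasant, 1 = neutral, 2 = unpleasant

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X[:90], y[:90])
accuracy = clf.score(X[90:], y[90:])     # chance level is about 1/3 on random data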

Collaboration


Dive into Tanja Schultz's collaborations.

Top Co-Authors

Alex Waibel
Karlsruhe Institute of Technology

Felix Putze
Karlsruhe Institute of Technology

Ngoc Thang Vu
Karlsruhe Institute of Technology

Dominic Heger
Karlsruhe Institute of Technology

Tim Schlippe
Karlsruhe Institute of Technology

Qin Jin
Renmin University of China

Florian Metze
Carnegie Mellon University