Otto Schmidbauer
Siemens
Publications
Featured research published by Otto Schmidbauer.
international conference on acoustics, speech, and signal processing | 1991
J. Tebelskis; Alex Waibel; Bojan Petek; Otto Schmidbauer
The authors present a large vocabulary, continuous speech recognition system based on linked predictive neural networks (LPNNs). The system uses neural networks as predictors of speech frames, yielding distortion measures which can be used by the one-stage DTW algorithm to perform continuous speech recognition. The system currently achieves 95%, 58%, and 39% word accuracy on tasks with perplexity 7, 111, and 402, respectively, outperforming several simple HMMs that have been tested. It was also found that the accuracy and speed of the LPNN can be slightly improved by the judicious use of hidden control inputs. The strengths and weaknesses of the predictive approach are discussed.
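The abstract's central idea, using a network's prediction error as the local distortion for dynamic time warping, can be sketched with toy linear predictors (the actual LPNNs are multilayer networks; all data shapes and model parameters below are illustrative, not from the paper):

```python
import numpy as np

def predictive_distortion(frames, predictors):
    """Distortion D[t, m]: squared error of model m predicting frame t
    from frame t-1 (toy linear predictors stand in for the LPNNs)."""
    T, d = frames.shape
    D = np.zeros((T, len(predictors)))
    for m, W in enumerate(predictors):
        for t in range(1, T):
            D[t, m] = np.sum((frames[t] - W @ frames[t - 1]) ** 2)
    return D

def dtw_path_cost(D):
    """One-stage DP over a fixed left-to-right model sequence: accumulate
    the smallest total distortion aligning frames to models."""
    T, M = D.shape
    acc = np.full((T, M), np.inf)
    acc[0, 0] = D[0, 0]
    for t in range(1, T):
        for m in range(M):
            best_prev = min(acc[t - 1, m],
                            acc[t - 1, m - 1] if m > 0 else np.inf)
            acc[t, m] = D[t, m] + best_prev
    return acc[-1, -1]

rng = np.random.default_rng(0)
frames = rng.normal(size=(20, 8))                               # toy utterance: 20 frames, 8-dim
predictors = [0.1 * rng.normal(size=(8, 8)) for _ in range(3)]  # 3 toy phone models
D = predictive_distortion(frames, predictors)
cost = dtw_path_cost(D)                                         # total alignment distortion
```

The model that predicts a frame best incurs the lowest distortion there, so the DTW path implicitly selects which phone model explains each stretch of speech.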
international conference on acoustics, speech, and signal processing | 1992
Otto Schmidbauer; J. Tebelskis
A novel type of hierarchical phoneme model for speaker adaptation, based on both hidden Markov models (HMM) and learning vector quantization (LVQ) networks, is presented. Low-level tied LVQ phoneme models are trained speaker-dependently and independently, yielding a pool of speaker-biased phoneme models which can be mixed into high-level speaker-adaptive phoneme models. Rapid speaker adaptation is performed by finding an optimal mixture for these models at recognition time, given only a small amount of speech data; subsequently, the models are fine-tuned to the new speaker's voice by further parameter reestimation. In preliminary experiments with a continuous speech task using 40 context-free phoneme models at task perplexity 111, the authors achieved 82% word accuracy for speaker-dependent recognition and 73% in the speaker-adaptive mode.
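The rapid-adaptation step, mixing a pool of speaker-biased models to fit a new speaker from little data, can be illustrated with a deliberately simplified least-squares sketch over model mean vectors (the paper estimates mixture weights for HMM/LVQ phoneme models; the toy statistics and shapes below are illustrative only):

```python
import numpy as np

def adapt_mixture(pool_means, adaptation_frames):
    """Least-squares sketch of rapid adaptation: approximate the new
    speaker's mean frame as a nonnegative mixture of the mean vectors
    of speaker-biased models.
    pool_means: (S, d), one mean per speaker-biased model; frames: (N, d)."""
    target = adaptation_frames.mean(axis=0)                    # (d,)
    w, *_ = np.linalg.lstsq(pool_means.T, target, rcond=None)  # pool_means.T @ w ~ target
    w = np.clip(w, 0, None)                                    # keep weights nonnegative
    w = w / w.sum() if w.sum() > 0 else np.full(len(w), 1 / len(w))
    return w, w @ pool_means                                   # weights, adapted mean

rng = np.random.default_rng(1)
pool = rng.normal(size=(5, 12))                      # 5 speaker-biased models, 12-dim means
frames = pool[2] + 0.05 * rng.normal(size=(30, 12))  # small sample near model 2's speaker
w, adapted = adapt_mixture(pool, frames)
```

Because the adaptation sample lies close to one speaker-biased model, the estimated mixture concentrates its weight there; fine-tuning by further reestimation would then start from this adapted model.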
international conference on acoustics, speech, and signal processing | 1989
Otto Schmidbauer
A system is described that takes advantage of the combination of properties of feature- and rule-based systems (evaluating systematic acoustic-articulatory dependencies) with properties of statistics-based methods (automatic training, uniform scoring). The main sources of variability in the acoustic speech signal, which are undoubtedly coarticulation and assimilation, are studied. Experimental results show that, by exploiting systematic acoustic-articulatory relations, it is possible to improve the performance of common pattern recognition methods. This is accomplished by introducing an articulatory feature vector into the acoustic-phonetic decoding scheme, as a feature level lying between the acoustic and the phonemic level.
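The intermediate articulatory feature level can be illustrated as follows; the feature inventory, the target vectors, and the three phonemes are hypothetical, chosen only to show how phonemes might be scored via an estimated articulatory vector:

```python
import numpy as np

# Hypothetical articulatory feature targets per phoneme: (voiced, nasal, fricative).
# A real system would use a much richer inventory; these three are illustrative.
PHONEME_FEATURES = {
    "m": np.array([1.0, 1.0, 0.0]),   # voiced nasal
    "s": np.array([0.0, 0.0, 1.0]),   # voiceless fricative
    "a": np.array([1.0, 0.0, 0.0]),   # voiced vowel
}

def phoneme_scores(articulatory_vector):
    """Score each phoneme by (negative) squared distance between the
    estimated articulatory feature vector and the phoneme's target."""
    return {p: -float(np.sum((articulatory_vector - f) ** 2))
            for p, f in PHONEME_FEATURES.items()}

est = np.array([0.1, 0.0, 0.9])      # toy output of an acoustic-level classifier
scores = phoneme_scores(est)
best = max(scores, key=scores.get)   # phoneme whose features match best
```

The articulatory vector thus acts as the intermediate level the abstract describes: acoustic evidence is first mapped to articulatory features, and phoneme decisions are made in that feature space.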
international conference on acoustics, speech, and signal processing | 1987
Otto Schmidbauer
Syllable-based segmentation systems using the energy-contour approach are limited to carefully uttered speech. This paper describes a syllable-based acoustic-phonetic front end incorporating knowledge of articulatory aspects of syllable production, in order to detect syllabic nuclei in quickly uttered speech with high reliability. In an iterative procedure, hypotheses for syllabic nuclei and their phonetic context are established, using robust gross articulatory classes and features. Remaining intersyllabic consonant clusters are segmented into initial and final consonant clusters. Due to unavoidable acoustic-phonetic ambiguities, alternative segmentations are permitted and several weighted hypotheses for initial and final consonant clusters are produced.
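The baseline energy-contour approach that the paper improves upon can be sketched as follows (smoothing width, threshold, and the toy contour are illustrative; the paper's articulatory-class refinements are not shown):

```python
import numpy as np

def syllabic_nuclei(energy, min_rel_height=0.3, smooth=3):
    """Return frame indices of local maxima in the smoothed energy contour
    that exceed a fraction of the global peak: a sketch of the baseline
    energy-contour approach to syllabic-nucleus detection."""
    kernel = np.ones(smooth) / smooth
    e = np.convolve(energy, kernel, mode="same")   # moving-average smoothing
    thresh = min_rel_height * e.max()
    return [t for t in range(1, len(e) - 1)
            if e[t] > e[t - 1] and e[t] >= e[t + 1] and e[t] > thresh]

t = np.arange(100)
# toy contour with two syllable-like energy humps around frames 25 and 70
energy = np.exp(-((t - 25) ** 2) / 50) + 0.8 * np.exp(-((t - 70) ** 2) / 50)
peaks = syllabic_nuclei(energy)
```

In quickly uttered speech the energy humps blur together, which is precisely where a purely energy-based detector fails and the paper's articulatory knowledge becomes necessary.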
international conference on acoustics, speech, and signal processing | 1986
Harald Höge; Bernhard Dipl Ing Littel; Erwin Marschall; Otto Schmidbauer; R. Sommer
The acoustic module of the speech understanding system SPICOS, which is designed as a German-language man-machine dialogue interface to a database, is described. Continuous speech is recognized in a speaker-dependent mode using a syllabic approach. Each syllable is split acoustically into the phonetic units initial consonant cluster, syllabic nucleus, and final consonant cluster. The lexicon contains 900 words. It is represented by a graph structure modelling different pronunciations of word strings and ambiguous segmentations of the speech signal into phonetic units.
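A graph-structured lexicon with alternative pronunciations, as described above, might be sketched like this; the word, the phonetic units, and the class API are illustrative assumptions, not SPICOS internals:

```python
from collections import defaultdict

class PronunciationGraph:
    """Toy graph lexicon: edges carry phonetic units; alternative
    pronunciations of a word are parallel paths from start to end."""
    def __init__(self):
        self.edges = defaultdict(list)   # state -> [(unit, next_state)]
        self._paths = 0

    def add_path(self, word, units):
        self._paths += 1
        state = "start"
        for i, u in enumerate(units):
            nxt = "end" if i == len(units) - 1 else f"{word}/{self._paths}:{i}"
            self.edges[state].append((u, nxt))
            state = nxt

    def accepts(self, units):
        frontier = {"start"}
        for u in units:
            frontier = {nxt for s in frontier
                        for unit, nxt in self.edges.get(s, [])
                        if unit == u}
        return "end" in frontier

g = PronunciationGraph()
g.add_path("haben", ["h", "a:", "b", "@", "n"])   # canonical pronunciation
g.add_path("haben", ["h", "a:", "m"])             # reduced colloquial form
```

Ambiguous segmentations of the signal fit the same structure: each alternative segmentation simply becomes another path through the graph that the recognizer may follow.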
international conference on acoustics, speech, and signal processing | 1990
Abdulmesih Aktas; Otto Schmidbauer; K.H. Maier; W.H. Feix
A comparison of the temporal flow model (TFM), a connectionist approach, with statistical methods, namely the hidden Markov model (HMM) and the maximum-likelihood (ML) classifier, on the basis of frame and segment recognition experiments is presented. All three methods were applied to a coarse phonetic classification task in a speaker-dependent mode. The seven coarse phonetic categories (CPCs) used correspond to the categories of manner of articulation. The experiments were performed on manually labeled continuous-speech data comprising two versions of 50 phonetically balanced sentences. A short-time cepstral representation of the speech data was chosen as the basis for all classification experiments. The best results were achieved with a context-dependent HMM. Without the use of segment context, the TFM yields noticeably better overall results. Both are found to be superior to the ML classifier.
Grundlagen und Anwendungen der Künstlichen Intelligenz, 17. Fachtagung für Künstliche Intelligenz, Humboldt-Universität zu Berlin | 1993
Manfred Gehrke; Otto Schmidbauer
The CSTAR project focuses on research demonstrating the feasibility of speech-to-speech translation in telephone services. This paper describes the architecture and the main components of this system, which translates spoken German utterances into spoken Japanese utterances.
Mustererkennung 1985, DAGM-Symposium | 1985
Harald Höge; Ernst Marschall; Otto Schmidbauer; R. Sommer
The acoustic component, including word hypothesis generation, of a system for recognizing continuous speech is described. The speech signal is explicitly segmented into consonant clusters and vowel nuclei. In a bottom-up approach, the classifier delivers phoneme-cluster hypotheses, from which a list of word hypotheses is generated at the next stage. This is done by consulting a phonological network in which relevant phonological phenomena such as intra- and inter-word elisions, alternative segmentations, and others are taken into account. First tests with a lexicon of about 1000 full word forms show that the number of hypotheses generated at the various levels remains small enough to be processed by the subsequent modules.
Proceedings of the DAGM/ÖAGM Symposium | 1984
Otto Schmidbauer
Despite intensive efforts in the field of computer speech recognition, large performance differences still exist between humans and machines in recognizing spoken language: human speech recognition is robust, flexible, and largely speaker-independent. By comparison, machine speech recognition systems must be trained on individual speakers and deliver satisfactory results only when the vocabulary to be recognized exhibits sufficiently large phonetic differences. It should be added that, in contrast to humans, today's speech recognition systems still make hardly any use of higher-level knowledge sources such as syntax and semantics.
neural information processing systems | 1991
Alex Waibel; Ajay N. Jain; Arthur E. McNair; Joe Tebelskis; Louise Osterholtz; Hiroaki Saito; Otto Schmidbauer; Tilo Sloboda; Monika Woszczyna