Frank Seide | Researchain

Publication

Featured research published by Frank Seide.

Speech Communication | 1995

The Philips automatic train timetable information system

Harald Aust; Martin Oerder; Frank Seide; Volker Steinbiss

In this article, we describe an automatic system for train timetable information over the telephone that provides accurate connections between 1200 German cities. The caller can talk to it in unrestricted, natural, and fluent speech, much as he or she would communicate with a human operator, and is not given any instructions in advance. The system's four main components (speech recognition, speech understanding, dialogue control, and speech output) are separated into independent modules that are executed sequentially. Word graphs form the interface between recognition and understanding; an attributed stochastic context-free grammar is then used to determine the meaning of a spoken sentence. In an ongoing field trial, this system has been made available to the general public, both to gather speech data and to evaluate its performance. This field test was organized as a bootstrapping process: initially, the system was trained with just the developers' voices, then the telephone number was passed around within the department, the company and, finally, the outside world. After each step, the newly collected material was used for retraining, as well as for general improvements.

International Conference on Spoken Language Processing | 1996

Improving speech understanding by incorporating database constraints and dialogue history

Frank Seide; Bernhard Rueber; Andreas Kellner

In the course of a (man-machine) dialogue, the system's belief concerning the user's intention is continuously being built up. Moreover, restricting the discourse to a narrow application domain further constrains the variety of possible user reactions. We show how these knowledge sources may be utilized in a stochastic framework to improve speech understanding. On field test data collected with our automatic exchange board prototype PADIS, a relative reduction of attribute errors by 27% was obtained.

International Conference on Acoustics, Speech, and Signal Processing | 2000

Pitch tracking and tone features for Mandarin speech recognition

Hank C.-H. Huang; Frank Seide

Tone modeling is a critical component for Mandarin large-vocabulary continuous-speech recognition systems. This paper presents an efficient real-time pitch tracker and a set of tone features that achieve a vast 30% reduction of the character error rate (CER), compared to the non-tonal baseline. To our knowledge, this is the highest improvement from tones ever reported for Mandarin. The paper first discusses adapting a known pitch-tracking algorithm for real-time operation. Second, we study the derivation of tone features for Mandarin LVCSR. Compared to the baseline vector (F0, ΔF0), our best tone features lead to a 28% reduction of tone errors. Results are shown for three LVCSR databases, including the Chinese 1998 National Performance Assessment (Project 863) and the Taiwan telephony database MAT. Performance of Western-language systems is reached, and for the 863 System Performance Test, our system achieves 1.5% CER.
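As a rough illustration of the baseline vector (F0, ΔF0) mentioned in this abstract, the sketch below pairs each frame's pitch value with a local slope. The symmetric-difference formula, the function name `baseline_tone_features`, and the sample pitch values are assumptions made for illustration; the abstract does not say how ΔF0 is computed, and unvoiced frames are not handled here.

```python
def baseline_tone_features(f0):
    """Pair each frame's F0 with a delta-F0 estimate.

    Assumption: delta F0 is a symmetric difference over neighbouring
    frames, with the first and last frames clamped at the edges.
    """
    n = len(f0)
    features = []
    for t in range(n):
        prev_value = f0[max(t - 1, 0)]       # clamp at the first frame
        next_value = f0[min(t + 1, n - 1)]   # clamp at the last frame
        features.append((f0[t], (next_value - prev_value) / 2.0))
    return features


# Hypothetical pitch contour in Hz, one value per frame.
contour = [200.0, 210.0, 220.0, 215.0]
for frame in baseline_tone_features(contour):
    print(frame)
```

The improved tone features that the paper reports are not reproduced here; this only shows the kind of baseline they are compared against.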

Speech Communication | 1997

PADIS—an automatic telephone switchboard and directory information system

Andreas Kellner; Bernhard Rueber; Frank Seide; Bach-Hiep Tran

PADIS, the Philips automatic telephone switchboard and directory information system, offers a natural-language user interface for accessing a telephone database. Using speech recognition and language understanding technology, the system provides telephone numbers, fax numbers, e-mail addresses, and room numbers, and can connect the call directly to the desired number. In this article, we present the underlying probabilistic framework, the system architecture, and the individual modules for speech recognition, language understanding, dialogue control, and speech output. In addition, we report results on performance and user behaviour obtained from a field test carried out in our research laboratory with a database of 600 entries. We derive a new decision rule based on the maximum a posteriori criterion that incorporates knowledge of the database and of the dialogue history as constraints on speech recognition and language understanding. This improved speech understanding by 19% (in terms of error rate) and reduced attribute substitution errors (for example, recognition of a wrong name) by 38%. The decision rule is implemented in a multi-stage approach that combines state-of-the-art speech recognition, partial parsing with an attributed stochastic context-free grammar, and an N-best search algorithm, which is also described in this article. The system conducts a flexible mixed-initiative dialogue instead of following a rigid form-filling scheme. It incorporates knowledge of the database in order to optimize the dialogue flow.

Journal of the Acoustical Society of America | 2000

Method and device for recognizing speech in a spelling mode including word qualifiers

Andreas Kellner; Frank Seide

A method and device for recognizing speech that consists of a sequence of words, each including one or more letters. The words and letters form a recognition database. The method receives and recognizes the speech by preliminary modelling among various probable recognized sequences, and selects one or more model sequences as the result. In particular, the method allows various words to appear as a subset within a model sequence that otherwise consists exclusively of letters. Such words are used to qualify one or more neighbouring or included letters in the sequence. An applicable model is a mixed information unit model.

International Conference on Spoken Language Processing | 1996

A word graph based N-best search in continuous speech recognition

Bach-Hiep Tran; Frank Seide; Volker Steinbiss

The authors introduce an efficient algorithm for the exhaustive search of N-best sentence hypotheses in a word graph. The search procedure is based on a two-pass algorithm. In the first pass, a word graph is constructed with standard time-synchronous beam search. The actual extraction of N-best word sequences from the word graph takes place during the second pass. With the implementation of a tree-organized N-best list, the search is performed directly on the resulting word graph. Therefore, the parallel bookkeeping of N hypotheses at each processing step during the search is not necessary. It is important to point out that the proposed N-best search algorithm produces an exact N-best list as defined by the word graph structure. Possible errors can only result from pruning during the construction of the word graph. In a postprocessing step, the N candidates can be rescored with a more complex language model with highly reduced computational cost. This algorithm is also applied in speech understanding to select the most likely sentence hypothesis that satisfies some additional constraints.
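To make the two-pass idea concrete, here is a minimal sketch of exact N-best extraction from a word graph. It is not the tree-organized algorithm from the paper: it swaps in a plain best-first search guided by exact backward scores, which also returns paths in order of increasing cost. The edge format, the function name `n_best`, the toy graph, and the assumption that node ids increase with time are all illustrative.

```python
import heapq
from collections import defaultdict


def n_best(edges, start, end, n):
    """Return up to n cheapest word sequences from start to end.

    edges: (from_node, to_node, word, cost) tuples, where cost is a
    negative log score. Assumption: every edge goes from a lower node
    id to a higher one, so ids are already in topological order.
    """
    outgoing = defaultdict(list)
    for u, v, word, cost in edges:
        outgoing[u].append((v, word, cost))

    # Backward pass: exact cheapest cost from each node to the end node.
    inf = float("inf")
    best_to_end = defaultdict(lambda: inf)
    best_to_end[end] = 0.0
    for u in sorted(outgoing, reverse=True):
        for v, _, cost in outgoing[u]:
            best_to_end[u] = min(best_to_end[u], cost + best_to_end[v])

    # Forward pass: best-first expansion. Because the estimate is exact,
    # complete paths leave the heap in order of total cost.
    heap = [(best_to_end[start], 0.0, start, ())]
    results = []
    while heap and len(results) < n:
        _, cost_so_far, node, words = heapq.heappop(heap)
        if node == end:
            results.append((cost_so_far, list(words)))
            continue
        for v, word, cost in outgoing[node]:
            if best_to_end[v] == inf:
                continue  # dead end: the end node is unreachable from v
            new_cost = cost_so_far + cost
            heapq.heappush(
                heap, (new_cost + best_to_end[v], new_cost, v, words + (word,))
            )
    return results


# Hypothetical toy word graph with two competing words per time slot.
graph = [
    (0, 1, "to", 1.0), (0, 1, "two", 1.5),
    (1, 2, "bonn", 2.0), (1, 2, "born", 3.0),
]
for total_cost, sentence in n_best(graph, 0, 2, 3):
    print(total_cost, " ".join(sentence))
```

As in the abstract, the list is exact with respect to the graph itself; any errors would come from pruning when the graph was built, which this sketch does not model.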

Proceedings of 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications | 1994

Experience with the Philips automatic train timetable information system

Harald Aust; Martin Oerder; Frank Seide; Volker Steinbiss

Introduces an automatic system for train timetable information over the telephone that provides accurate connections between 1200 German cities. The caller can talk to it in unrestricted, natural, and fluent speech, much as he or she would communicate with a human operator, and is not given any instructions in advance. In an ongoing field trial, this system has been made available to the general public, both to gather speech data and to evaluate its performance. This field test was organized as a bootstrapping process: initially, the system was trained with just the developers' voices, then the telephone number was passed around within the department, the company, and finally, the outside world. After each step, the newly collected material was used for retraining and general improvements. The observations and results from this test are reported.

Journal of the Acoustical Society of America | 2000

Method of and apparatus for deriving a plurality of sequences of words from a speech signal

Bach-Hiep Tran; Frank Seide; Volker Steinbiss

A plurality of word sequences is determined from a speech signal in order of decreasing probability of correspondence. The best word sequence serves as the basis, and the only further word sequences determined are those that contain a part of the best word sequence, namely as their remainder. To this end, recognition first forms a word graph, and the best word sequence is stored separately as a tree that initially has only one branch. The word boundaries of this word sequence form nodes in this tree. Because only nodes of this tree have to be taken into account for the next-best word sequences, the calculation is substantially simpler than if the complete word graph were first expanded into a tree and searched again in full for each new word sequence.

Philips Journal of Research | 1995

A spoken language inquiry system for automatic train timetable information

Harald Aust; Martin Oerder; Frank Seide; Volker Steinbiss

This article describes the Philips automatic train timetable information system, which enables the user to call up accurate information about train connections between 1200 German cities over the telephone. In contrast to most of the inquiry systems available so far, the caller can talk to our system in unrestricted, natural and fluent speech, very much like talking to a human operator. No instructions are given beforehand. The system consists of four main components: speech recognition, speech understanding, dialogue control, and speech output. They are separated into independent modules and executed sequentially. The speech recogniser creates a word graph from the spoken input. This word graph is then passed to the understanding component, which computes the meaning using an attributed stochastic context-free grammar. A dialogue manager analyses the results and either accesses the database or comes up with another question if necessary. The system has been made available to the general public in an ongoing field test, both to gather speech data and to evaluate its performance.

IEEE Automatic Speech Recognition and Understanding Workshop | 1997

With a little help from the database - developing voice-controlled directory information systems

Andreas Kellner; Frank Seide; Bernhard Rueber

Automated directory information is amongst the most challenging applications of automatic speech recognition. We present some basic techniques that try to overcome the deficiencies of the speech recognizer by incorporating as much additional knowledge as possible, such as the telephone directory. We derive a maximum a posteriori decision rule which explicitly uses the telephone directory knowledge as well as the dialogue history to improve speech understanding accuracy. The rule allows us to take a combined decision on the joint probability over multiple dialogue turns, which yields good results in combination with spelling. Our spelling architecture permits continuous spelling of names and uses a context-free grammar to parse common spelling expressions. We review two different real-time prototypes on which we evaluated our decision rule. One (PADIS) operates on a small database and one (PADIS-XL) on a database with 130,000 entries.
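The core of a maximum a posteriori decision with directory knowledge can be sketched in a few lines: each recognizer hypothesis is rescored with a prior taken from the directory, and names that are absent from the directory are ruled out. Deriving the prior from entry counts, the function name `map_decide`, and the example scores are all assumptions made for illustration. The paper's rule additionally combines evidence over several dialogue turns, which this single-turn sketch leaves out.

```python
import math


def map_decide(hypotheses, directory_counts):
    """Pick the name maximizing log P(speech | name) + log P(name).

    hypotheses: (name, acoustic_log_likelihood) pairs from a recognizer.
    directory_counts: number of directory entries per name. Using these
    counts as the prior is an assumption of this sketch.
    """
    total = sum(directory_counts.values())
    best_name, best_score = None, -math.inf
    for name, log_likelihood in hypotheses:
        count = directory_counts.get(name, 0)
        if count == 0:
            continue  # not in the directory, so the prior is zero
        score = log_likelihood + math.log(count / total)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical recognizer output: "Mayr" scores best acoustically
# but does not occur in the directory.
hypotheses = [("Meier", -10.0), ("Maier", -10.5), ("Mayr", -9.5)]
directory = {"Meier": 1, "Maier": 8}
print(map_decide(hypotheses, directory)[0])
```

In this toy case the acoustically best hypothesis is rejected because it has no directory entry, and the more frequent of the two remaining names wins despite a slightly lower acoustic score.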
