Gábor Olaszy
Budapest University of Technology and Economics
Publication
Featured research published by Gábor Olaszy.
International Journal of Speech Technology | 2000
Gábor Olaszy; Géza Németh; Péter Olaszi; Géza Kiss; Csaba Zainkó; Géza Gordos
The latest Hungarian text-to-speech (TTS) system developed for telephone-based applications is described. Its main features are an intelligible, human-like voice; robust software designed for continuous running; fully automatic conversion of declarative sentences (short and very long) and questions; and real-time parallel operation on a minimum of 30 channels. The concept of prosody generation and sound duration processing is introduced. The development environment of Profivox is also presented. The market-leading Hungarian mobile service provider uses the TTS system in an automatic e-mail reading application.
Archive | 1999
Gábor Olaszy; Géza Németh
This paper describes the phonetic analysis of spoken numbers and a special approach used to achieve high quality number-to-speech (NTS) synthesis for IVR systems. The new solution provides the possibility of combining synthesized numbers with stored speech messages for professional teleinformatic applications where numbers have to be pronounced automatically (telebanking systems, ordering services, industrial information systems). Examples for English, German, Portuguese and Hungarian are given.
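The idea of mixing synthesized numbers with stored speech messages can be illustrated with a minimal Python sketch. It is not the method of the paper: the English number expansion, the `build_prompt` helper, and the file name `balance_intro.wav` are all hypothetical, chosen only to show how a prompt might be assembled from recorded and synthesized segments.

```python
# Illustrative only: a generic number-to-words front end, not the paper's NTS method.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def below_thousand(n):
    """Return the English words for 1 <= n <= 999 as a list."""
    parts = []
    if n >= 100:
        parts += [ONES[n // 100], "hundred"]
        n %= 100
    if n >= 20:
        parts.append(TENS[n // 10])
        n %= 10
    if n > 0:
        parts.append(ONES[n])
    return parts


def number_to_words(n):
    """Expand 0 <= n <= 999999 into space-separated English words."""
    if not 0 <= n <= 999_999:
        raise ValueError("only 0..999999 is supported in this sketch")
    if n == 0:
        return "zero"
    thousands, rest = divmod(n, 1000)
    words = []
    if thousands:
        words += below_thousand(thousands) + ["thousand"]
    if rest:
        words += below_thousand(rest)
    return " ".join(words)


def build_prompt(recorded_file, number):
    """Hypothetical helper: a stored message followed by a synthesized number."""
    return [("recorded", recorded_file), ("synth", number_to_words(number))]


prompt = build_prompt("balance_intro.wav", 1999)
```

Here `prompt` is a two-item playlist: the stored message first, then the text handed to the synthesizer. A production system would also have to match the voice and prosody of the synthesized number to the recorded message, which this sketch does not attempt.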
Speech Communication | 1997
Gábor Olaszy; Géza Németh
The work described in the paper was carried out in the SPEAK! project (Speech Generation in Multimodal Information Systems). The aim of the project was to improve the quality of synthesised speech output to be used in dialogue systems as an additional element of multimodal man-machine interfaces. German text and dialogue interaction analysis (theoretical research) has been carried out to predict the tone groups (TGs), the phrase boundaries in sentences and the place of the focus in the phrase. Tone groups represent the general intonation structure of the phrase not taking into account word level intonation. The results of this research are the intonation markers described in (Teich et al., 1997). The CTS synthesiser constructs the main intonation patterns from texts containing these additional markers. This paper describes the research results on German intonation, including the construction of intonation rules, combined with the study on timing adjustments, pause generation for rhythm (both for segmental and suprasegmental levels) for the MULTIVOX-SPEAK! system. Detailed rules and a new tone-group based prosody generation module are also introduced: these have been integrated into the MULTIVOX TTS system. Preliminary evaluation results are also given.
Text, Speech and Dialogue | 2006
Márk Fék; Péter Pesti; Géza Németh; Csaba Zainkó; Gábor Olaszy
This paper gives an overview of the design and development of an experimental restricted-domain, corpus-based unit selection text-to-speech (TTS) system for Hungarian. The experimental system generates weather forecasts in Hungarian. 5260 sentences were recorded, creating a speech corpus containing 11 hours of continuous speech. A Hungarian speech recognizer was applied to label speech sound boundaries. Word boundaries were also marked automatically. The unit selection follows a top-down hierarchical scheme using words and speech sounds as units. A simple prosody model is used, based on the relative position of words within a prosodic phrase. The quality of the system was compared to two earlier Hungarian TTS systems. A subjective listening test was performed by 221 listeners. The experimental system scored 3.92 on a five-point mean opinion score (MOS) scale. The earlier unit concatenation TTS system scored 2.63, the formant synthesizer scored 1.24, and natural speech scored 4.86.
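The top-down hierarchical scheme mentioned in the abstract can be pictured with a small Python sketch: try the largest unit (a whole word) first, and fall back to speech sounds only when the word is missing from the corpus. The function name `select_units` and the toy inventories and lexicon are assumptions made for illustration. The actual system selects among many candidate units using its prosody model, which is not reproduced here.

```python
# Illustrative sketch of top-down unit selection with word -> speech sound fallback.
# All inventories below are toy data, not taken from the described system.

def select_units(words, word_inventory, sound_inventory, lexicon):
    """Return a list of (level, unit, recording_id) tuples for the input words.

    word_inventory:  maps a word to the id of a recorded word unit
    sound_inventory: maps a speech sound to the id of a recorded sound unit
    lexicon:         maps a word to its sequence of speech sounds
    """
    plan = []
    for word in words:
        if word in word_inventory:
            # Preferred case: the whole word exists in the corpus.
            plan.append(("word", word, word_inventory[word]))
        else:
            # Fallback: build the word from smaller units.
            for sound in lexicon[word]:
                plan.append(("sound", sound, sound_inventory[sound]))
    return plan


word_inventory = {"holnap": "w_0017", "eső": "w_0342"}
sound_inventory = {"s": "p_01", "é": "p_02", "l": "p_03"}
lexicon = {"szél": ["s", "é", "l"]}

plan = select_units(["holnap", "szél"], word_inventory, sound_inventory, lexicon)
```

In this toy run the first word is covered by a single word-level unit, while the second is assembled from three sound-level units. Longer units mean fewer concatenation points, which is the usual motivation for trying the word level first.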
International Journal of Speech Technology | 2000
Géza Németh; Csaba Zainkó; László Fekete; Gábor Olaszy; Gábor Endrédi; Péter Olaszi; Géza Kiss; Péter Kis
The market-leading Hungarian Global System for Mobile Communications (GSM) operator, Westel, has recently introduced a Hungarian e-mail reading system as a regular service. It was implemented on the basis of an experimental system developed at the Department of Telecommunications and Telematics of the Budapest University of Technology and Economics (DTT BUTE). In this article, the considerations involved in the design and implementation decisions of both the experimental and the industrial systems are described. Results of the first 10 weeks of regular use of the industrial system are also given.
International Journal of Speech Technology | 2000
Gábor Olaszy
Prosody is the change of F0 and intensity in time and the speed of articulation. The presence or absence of the realization of word accent is also examined as an important feature in prosody generation. During verbal communication various prosody forms contribute to the expression of the textual content of the message on the one hand and of the personal intention of the speaker on the other. In many cases in dialogues the same text can be (must be) pronounced with different intentions. Our goal was to find what kind of prosody patterns and rules are characteristic of these utterance types and what the acoustic relationship among them is for Hungarian. In this article the prosody structures of the most important dialogue components are described, and invariant structures are derived and verified by speech synthesis. Rules are also stated as generalized function structures to show the acoustic relationship of the prosody of these expressions to the prosody of statements. Using these rules, it is possible to convert the prosody of a given utterance type to another one by preserving the naturalness of the speech. The rules can be used in text to speech (TTS) conversion to generate spoken dialogues.
Archive | 2008
Géza Németh; Géza Kiss; Csaba Zainkó; Gábor Olaszy; Bálint Tóth
Mobile phones have become indispensable friends for many people. They are used in all areas of life, including the car. The security risk of this situation has motivated strict regulation of use on the one hand and increased attention to built-in speech recognition on the other. Far less attention has been paid, however, to the possible advantages of automatic speech generation by phones, including text-to-speech (TTS). This chapter addresses this domain. It examines the general concepts and application areas of speaking mobile phones. In addition to the well-known advantages for visually impaired, blind or speech-impaired people, such functionalities may help in other hands-busy or eyes-busy situations (e.g., cooking in the kitchen). The advancement of this area is due to the appearance of mobile phone operating systems (Symbian, Palm OS, MS Smartphone and Linux Mobile) that can run applications created by developers independent of the phone manufacturers. A case study of a speaking aid mobile phone application and of the first automatic SMS-reading mobile phone application, introduced in Hungary in October 2003, is also presented. It is shown that the proper combination of careful user interface design and high-quality TTS should be supplemented by automatic language identification and other modules. An analysis of these supplementary modules is also presented.
International Conference Natural Language Processing | 2003
Géza Németh; Csaba Zainkó; Géza Kiss; Márk Fék; Géza Gordos; Gábor Olaszy
Name and address reading is an important combined application area of language processing and text-to-speech (TTS) systems. It is the cornerstone of both traditional reverse directory telephone services and new, location-based traffic and tour guide applications. The language processing aspects of a solution for Hungarian are described. The work was based on the analysis of a subscriber database containing about 3 million records (there are about 10 million Hungarian citizens). Categories of name and address elements were defined. A program for the automatic classification of database records was developed. Statistical parameters were derived about proper/legal names and addresses. Based on these results, text corpora for enriching the TTS acoustic database were designed. Reading strategies and related special algorithms and tables were developed for the description of complex name categories. Our results may be applied to similar tasks in other languages with comparable linguistic and statistical features.
International Journal of Speech Technology | 2000
Ilona Koutny; Gábor Olaszy; Péter Olaszi
Proper prosodic structure is crucial for natural-sounding synthesized speech. Because of the lack of other information on discourse structure, we have to rely on syntactic structure in order to predict the main prosodic items for normal speech. To meet this requirement, a dependency-based parser has been developed for Hungarian that assigns the boundaries of functional constituents in the sentence, in other words, the places where new intonation patterns start and breaks can be inserted. We determine stress distribution in the sentence, using four levels including focus. The practical realization of the prosodic predictor also relies on statistical and empirical data. The intonation units (tone groups) with proper melody (e.g., falling, slowly falling, level, rising, slowly rising, rising-falling, and falling-rising) are established on the basis of syntactic properties in declarative, interrogative, and imperative sentences. The results are embedded in an experimental Hungarian text-to-speech (TTS) system.
International Conference on Computers Helping People with Special Needs | 2008
Géza Németh; Gábor Olaszy; Mátyás Bartalis; Géza Kiss; Csaba Zainkó; Péter Mihajlik; Csaba Haraszti
Elderly and visually impaired people are among the groups who cannot access information about drugs as easily as others. Although Braille labels (giving the name of the medicament) have lately been placed on drug boxes in Hungary, getting detailed information about a drug, i.e. accessing the content of the written Patient Information Leaflets (PIL), remains complicated. The Medicine Line (MLN) service may help solve this problem. This automatic telephone information system was developed and put into operation in Hungary in December 2006. The computer system speaks and understands Hungarian, so elderly and visually impaired users can get information about a drug by voice. Adaptation to other languages is also possible. To our knowledge, no such system is available in the European Union.