Géza Németh
Budapest University of Technology and Economics
Publication
Featured research published by Géza Németh.
Procedia Computer Science | 2014
António J. S. Teixeira; Annika Hämäläinen; Jairo Avelar; Nuno Almeida; Géza Németh; Tibor Fegyó; Csaba Zainkó; Tamás Gábor Csapó; Bálint Tóth; André Oliveira; Miguel Sales Dias
The PaeLife project is a European industry-academia collaboration whose goal is to provide the elderly with easy access to online services that make their lives easier and encourage their continued participation in society. To reach this goal, the project partners are developing a multimodal virtual personal life assistant (PLA) offering a wide range of services from weather information to social networking. This paper presents the multimodal architecture of the PLA, the services provided by the PLA, and the work done in the area of speech input and output modalities, which play a key role in the application.
International Journal of Speech Technology | 2000
Gábor Olaszy; Géza Németh; Péter Olaszi; Géza Kiss; Csaba Zainkó; Géza Gordos
The latest Hungarian text-to-speech (TTS) system developed for telephone-based applications is described. The main features are an intelligible, human-like voice; robust software designed for continuous running; fully automatic conversion of declarative (short and very long) sentences and questions; and real-time parallel operation on a minimum of 30 channels. The concept of prosody generation and sound duration processing is introduced. Also, the development environment of Profivox is presented. The market-leading Hungarian mobile service provider uses the TTS system in an automatic e-mail reading application.
Archive | 1999
Gábor Olaszy; Géza Németh
This paper describes the phonetic analysis of spoken numbers and a special approach used to achieve high quality number-to-speech (NTS) synthesis for IVR systems. The new solution provides the possibility of combining synthesized numbers with stored speech messages for professional teleinformatic applications where numbers have to be pronounced automatically (telebanking systems, ordering services, industrial information systems). Examples for English, German, Portuguese and Hungarian are given.
Speech Communication | 1997
Gábor Olaszy; Géza Németh
The work described in the paper was carried out in the SPEAK! project (Speech Generation in Multimodal Information Systems). The aim of the project was to improve the quality of synthesised speech output to be used in dialogue systems as an additional element of multimodal man-machine interfaces. German text and dialogue interaction analysis (theoretical research) has been carried out to predict the tone groups (TGs), the phrase boundaries in sentences and the place of the focus in the phrase. Tone groups represent the general intonation structure of the phrase not taking into account word level intonation. The results of this research are the intonation markers described in (Teich et al., 1997). The CTS synthesiser constructs the main intonation patterns from texts containing these additional markers. This paper describes the research results on German intonation, including the construction of intonation rules, combined with the study on timing adjustments, pause generation for rhythm (both for segmental and suprasegmental levels) for the MULTIVOX-SPEAK! system. Detailed rules and a new tone-group based prosody generation module are also introduced: these have been integrated into the MULTIVOX TTS system. Preliminary evaluation results are also given.
Text, Speech and Dialogue | 2006
Márk Fék; Péter Pesti; Géza Németh; Csaba Zainkó; Gábor Olaszy
This paper gives an overview of the design and development of an experimental restricted domain corpus-based unit selection text-to-speech (TTS) system for Hungarian. The experimental system generates weather forecasts in Hungarian. 5260 sentences were recorded, creating a speech corpus containing 11 hours of continuous speech. A Hungarian speech recognizer was applied to label speech sound boundaries. Word boundaries were also marked automatically. The unit selection follows a top-down hierarchical scheme using words and speech sounds as units. A simple prosody model is used, based on the relative position of words within a prosodic phrase. The quality of the system was compared to two earlier Hungarian TTS systems. A subjective listening test was performed by 221 listeners. The experimental system scored 3.92 on a five-point mean opinion score (MOS) scale. The earlier unit concatenation TTS system scored 2.63, the formant synthesizer scored 1.24, and natural speech scored 4.86.
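As a rough illustration of how a five-point MOS figure like the ones quoted above is obtained, the sketch below averages listener ratings per system. The function name, the system labels and the ratings are all hypothetical and are not data from the paper; the listening test itself may have used additional processing that the abstract does not describe.

```python
def mean_opinion_score(ratings):
    """Average a list of 1-5 listener ratings into a single MOS value."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating outside the five-point scale: {rating}")
    return sum(ratings) / len(ratings)


# Hypothetical ratings for two hypothetical systems (NOT results from the paper).
ratings_by_system = {
    "system_a": [4, 4, 5, 3, 4],
    "system_b": [2, 3, 3, 2, 3],
}

# One MOS value per system.
mos = {name: mean_opinion_score(r) for name, r in ratings_by_system.items()}
```

Each system's score is simply the arithmetic mean of all ratings it received, so scores from different systems are comparable only when the same listeners rated them on the same scale.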
International Journal of Speech Technology | 2000
Géza Németh; Csaba Zainkó; László Fekete; Gábor Olaszy; Gábor Endrédi; Péter Olaszi; Géza Kiss; Péter Kis
The market-leading Hungarian Global System for Mobile Communications (GSM) operator, Westel, has recently introduced a Hungarian e-mail reading system as a regular service. It was implemented on the basis of an experimental system developed at the Department of Telecommunications and Telematics of the Budapest University of Technology and Economics (DTT BUTE). In this article, the considerations involved in the design and implementation decisions of both the experimental and the industrial systems will be described. Results of the first 10 weeks of regular use of the industrial system will also be given.
IEEE Journal of Selected Topics in Signal Processing | 2014
Tamás Gábor Csapó; Géza Németh
Statistical parametric text-to-speech synthesis is optimized for regular voices and may not create high-quality output with speakers producing irregular phonation frequently. A number of excitation models have been proposed recently in the hidden Markov-model speech synthesis framework, but few of them deal with the occurrence of this phenomenon. The baseline system of this study is our previous residual codebook based excitation model, which uses frames of pitch-synchronous residuals. To model the irregular voice typically occurring in phrase boundaries or sentence endings, two alternative extensions are proposed. The first, rule-based method applies pitch halving, amplitude scaling of residual periods with random factors and spectral distortion. The second, data-driven approach uses a corpus of residuals extracted from irregularly phonated vowels and unit selection is applied during synthesis. In perception tests of short speech segments, both methods have been found to improve the baseline excitation in preference and similarity to the original speaker. An acoustic experiment has shown that both methods can synthesize irregular voice that is close to original irregular phonation in terms of open quotient. The proposed methods may contribute to building natural, expressive and personalized speech synthesis systems.
Non-Linear Speech Processing | 2009
Tamás M. Bőhm; Zoltán Both; Géza Németh
Irregular phonation (also called creaky voice, glottalization and laryngealization) may have various communicative functions in speech. Thus the automatic classification of phonation type into regular and irregular can have a number of applications in speech technology. In this paper, we propose such a classifier that extracts six acoustic cues from vowels and then labels them as regular or irregular by means of a support vector machine. We integrated cues from earlier phonation type classifiers and improved their performance in five out of the six cases. The classifier with the improved cue set produced a 98.85% hit rate and a 3.47% false alarm rate on a subset of the TIMIT corpus.
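The two figures reported for the classifier can be read with the usual detection-theory definitions, which are assumed here because the abstract does not spell them out: hit rate is the share of truly irregular vowels labelled irregular, and false alarm rate is the share of truly regular vowels labelled irregular. The sketch below computes both from label lists; the function name and the toy labels are hypothetical.

```python
def hit_and_false_alarm_rates(true_labels, predicted_labels, positive="irregular"):
    """Return (hit_rate, false_alarm_rate), treating `positive` as the target class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    hits = misses = false_alarms = correct_rejections = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive:
            if predicted == positive:
                hits += 1
            else:
                misses += 1
        else:
            if predicted == positive:
                false_alarms += 1
            else:
                correct_rejections += 1
    if hits + misses == 0 or false_alarms + correct_rejections == 0:
        raise ValueError("both classes must occur in true_labels")
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_rejections)
    return hit_rate, false_alarm_rate


# Toy labels for six vowels (hypothetical, not taken from TIMIT).
truth = ["irregular", "irregular", "regular", "regular", "regular", "regular"]
guess = ["irregular", "regular", "regular", "irregular", "regular", "regular"]
hit_rate, false_alarm_rate = hit_and_false_alarm_rates(truth, guess)
```

Reporting the two rates separately matters when the classes are unbalanced, since a single accuracy figure would be dominated by the more frequent class.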
International Conference on Advanced Robotics | 2007
Jan Koch; Holger Jung; Jens Wettach; Géza Németh; Karsten Berns
Research in mobile service robotics aims at developing intuitive speech interfaces for human-robot interaction. We see a service robot as part of an intelligent environment and discuss a concept in which a robot not only offers its own features via natural speech interaction but also becomes a transactive agent exposing other services’ interfaces. The provided framework supports the dynamic registration of speech interfaces to allow a loosely coupled, flexible and scalable environment. An intelligent environment can evolve out of multimedia devices, home automation, communication, security, and emergency technology. These appliances offer typical wireless or stationary control interfaces. The number of different control paradigms and differently laid-out control devices limits usability. As speech interfaces offer a more natural way to interact intuitively with technology, we propose to centralize a general speech engine on a robotic unit. There are two reasons for this. First, people are expected to be more willing to talk to a mobile unit than to an ambient system where no communication partner is visible. Second, the devices or functionalities to be controlled in most cases do not provide a speech interface but offer only proprietary access.
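The idea of registering speech interfaces dynamically on a central engine can be pictured with the minimal sketch below. It is not the framework from the paper: the class, its methods, the exact-phrase matching and the example service are all hypothetical, chosen only to show services joining and leaving at run time.

```python
class SpeechInterfaceRegistry:
    """Hypothetical central registry mapping spoken phrases to service handlers."""

    def __init__(self):
        # phrase (lower-cased) -> (service name, callable taking no arguments)
        self._handlers = {}

    def register(self, service, phrases, handler):
        """Let a service announce the phrases it can handle."""
        for phrase in phrases:
            self._handlers[phrase.lower()] = (service, handler)

    def unregister(self, service):
        """Remove every phrase belonging to a service that has left."""
        self._handlers = {
            phrase: entry
            for phrase, entry in self._handlers.items()
            if entry[0] != service
        }

    def dispatch(self, utterance):
        """Call the handler for a recognized utterance, or return None if unknown."""
        entry = self._handlers.get(utterance.strip().lower())
        if entry is None:
            return None
        _service, handler = entry
        return handler()


# A hypothetical home-automation service joins, is used, then leaves.
registry = SpeechInterfaceRegistry()
registry.register("lamp", ["turn on the light"], lambda: "lamp: on")
reply = registry.dispatch("Turn on the light")
registry.unregister("lamp")
reply_after_leaving = registry.dispatch("turn on the light")
```

Keeping the phrase table in one place is what lets devices without their own speech interface be reached by voice; a real system would replace the exact-phrase lookup with a grammar or language-understanding component.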
Journal of the Acoustical Society of America | 2008
Tamás M. Bőhm; Nicolas Audibert; Stefanie Shattuck-Hufnagel; Géza Németh; Véronique Aubergé
Irregular phonation can serve as a cue to segmental contrasts and prosodic structure as well as to the affective state and identity of the speaker. Thus algorithms for transforming between voice qualities, such as regular and irregular phonation, may contribute to building more natural sounding, expressive and personalized speech synthesizers. We describe a semiautomatic transformation method that introduces irregular pitch periods into a modal speech signal by amplitude scaling of the individual cycles. First the periods are separated by windowing, then multiplied by scaling factors, and finally overlapped and added. Thus, amplitude irregularities are introduced via boosting or attenuating selected cycles. The abrupt, substantial changes in cycle lengths that are characteristic of naturally‐occurring irregular phonation can be achieved by removing (scaling to zero) one or more consecutive periods. A freely available graphical tool has been developed for copying stylized pulse patterns (glottal pulse spac...
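The windowing, scaling and overlap-add steps named in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' tool: pitch marks are assumed to be given as strictly increasing sample indices, the window is an assumed asymmetric triangle spanning the neighbouring marks, and the function name is hypothetical. Samples outside the first and last pitch mark are left at zero.

```python
def scale_pitch_periods(signal, pitch_marks, factors):
    """Window each pitch cycle, multiply it by its factor, then overlap-add.

    signal      -- list of float samples
    pitch_marks -- strictly increasing sample indices, one per cycle (assumed given)
    factors     -- one scaling factor per pitch mark; 0.0 removes that cycle
    """
    if len(factors) != len(pitch_marks):
        raise ValueError("need exactly one factor per pitch mark")
    out = [0.0] * len(signal)
    for i, center in enumerate(pitch_marks):
        left = pitch_marks[i - 1] if i > 0 else center
        right = pitch_marks[i + 1] if i + 1 < len(pitch_marks) else center
        for n in range(left, right + 1):
            # Triangular window: rises from the previous mark, falls to the next.
            if n < center:
                weight = (n - left) / (center - left)
            elif n > center:
                weight = (right - n) / (right - center)
            else:
                weight = 1.0
            out[n] += factors[i] * weight * signal[n]
    return out


# Toy constant signal with three cycles (hypothetical data).
toy_signal = [1.0] * 11
toy_marks = [0, 5, 10]
unchanged = scale_pitch_periods(toy_signal, toy_marks, [1.0, 1.0, 1.0])
middle_removed = scale_pitch_periods(toy_signal, toy_marks, [1.0, 0.0, 1.0])
```

Because adjacent triangular windows sum to one between pitch marks, factors of 1.0 reproduce the input there, while a factor of 0.0 drops a cycle and so lengthens the apparent gap between the surviving pulses.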