Csaba Zainkó
Budapest University of Technology and Economics
Publication
Featured research published by Csaba Zainkó.
Procedia Computer Science | 2014
António J. S. Teixeira; Annika Hämäläinen; Jairo Avelar; Nuno Almeida; Géza Németh; Tibor Fegyó; Csaba Zainkó; Tamás Gábor Csapó; Bálint Tóth; André Oliveira; Miguel Sales Dias
The PaeLife project is a European industry-academia collaboration whose goal is to provide the elderly with easy access to online services that make their lives easier and encourage their continued participation in society. To reach this goal, the project partners are developing a multimodal virtual personal life assistant (PLA) offering a wide range of services, from weather information to social networking. This paper presents the multimodal architecture of the PLA, the services provided by the PLA, and the work done in the area of speech input and output modalities, which play a key role in the application.
International Journal of Speech Technology | 2000
Gábor Olaszy; Géza Németh; Péter Olaszi; Géza Kiss; Csaba Zainkó; Géza Gordos
The latest Hungarian text-to-speech (TTS) system developed for telephone-based applications is described. Its main features are an intelligible, human-like voice; robust software designed for continuous running; fully automatic conversion of declarative (short and very long) sentences and questions; and real-time parallel operation on a minimum of 30 channels. The concept of prosody generation and sound duration processing is introduced. The development environment of Profivox is also presented. The market-leading Hungarian mobile service provider uses the TTS system in an automatic e-mail reading application.
Text, Speech and Dialogue | 2006
Márk Fék; Péter Pesti; Géza Németh; Csaba Zainkó; Gábor Olaszy
This paper gives an overview of the design and development of an experimental restricted domain corpus-based unit selection text-to-speech (TTS) system for Hungarian. The experimental system generates weather forecasts in Hungarian. 5260 sentences were recorded, creating a speech corpus containing 11 hours of continuous speech. A Hungarian speech recognizer was applied to label speech sound boundaries. Word boundaries were also marked automatically. The unit selection follows a top-down hierarchical scheme using words and speech sounds as units. A simple prosody model is used, based on the relative position of words within a prosodic phrase. The quality of the system was compared to two earlier Hungarian TTS systems. A subjective listening test was performed by 221 listeners. The experimental system scored 3.92 on a five-point mean opinion score (MOS) scale. The earlier unit concatenation TTS system scored 2.63, the formant synthesizer scored 1.24, and natural speech scored 4.86.
International Journal of Speech Technology | 2000
Géza Németh; Csaba Zainkó; László Fekete; Gábor Olaszy; Gábor Endrédi; Péter Olaszi; Géza Kiss; Péter Kis
The market-leading Hungarian Global System for Mobile Communications (GSM) operator, Westel, has recently introduced a Hungarian e-mail reading system as a regular service. It was implemented on the basis of an experimental system developed at the Department of Telecommunications and Telematics of the Budapest University of Technology and Economics (DTT BUTE). This article describes the considerations involved in the design and implementation decisions of both the experimental and the industrial systems. Results of the first 10 weeks of regular use of the industrial system are also given.
Verbal and Nonverbal Features of Human-Human and Human-Machine Interaction | 2008
Csaba Zainkó; Márk Fék; Géza Németh
In this paper we explore the use of emotion-specific speech inventories for expressive speech synthesis. We recorded a semantically neutral sentence and 26 logatoms containing all the diphones and CVC triphones necessary to synthesize the same sentence. The speech material was produced by a professional actress expressing all logatoms and the sentence with the six basic emotions and in neutral tone. 7 emotion-dependent inventories were constructed from the logatoms. The 7 inventories paired with the prosody extracted from the 7 natural sentences were used to synthesize 49 sentences. 194 listeners evaluated the emotions expressed in the logatoms and in the natural and synthetic sentences. The intended emotion was recognized above chance level for 99% of the logatoms and for all natural sentences. Recognition rates significantly above chance level were obtained for each emotion. The recognition rate for some synthetic sentences exceeded that of natural ones.
Archive | 2008
Géza Németh; Géza Kiss; Csaba Zainkó; Gábor Olaszy; Bálint Tóth
Mobile phones have become indispensable companions for many people. They are used in all areas of life, including the car. The security risk of this situation has motivated strict regulation of use on the one hand and increased attention to built-in speech recognition on the other. Far less attention has been paid, however, to the possible advantages of automatic speech generation by phones, including text-to-speech (TTS). This chapter addresses this domain. It examines the general concepts and application areas of speaking mobile phones. In addition to the well-known advantages for visually impaired, blind or speech impaired people, such functionalities may help in other hands-busy or eyes-busy situations (e.g., cooking in the kitchen). The advancement of this area is due to the appearance of mobile phone operating systems (Symbian, Palm OS, MS Smartphone and Linux Mobile) which can run applications created by developers independent of the phone manufacturers. A case study of a speaking aid mobile phone application and the first automatic SMS-reading mobile phone application, introduced in Hungary in October 2003, is also presented. It is shown that the proper combination of careful user interface design and high quality TTS should be supplemented by automatic language identification and other modules. An analysis of these supplementary modules is also presented.
International Conference Natural Language Processing | 2003
Géza Németh; Csaba Zainkó; Géza Kiss; Márk Fék; Géza Gordos; Gábor Olaszy
Name and address reading is an important combined application area of language processing and text-to-speech (TTS) systems. It is the cornerstone of both traditional reverse directory telephone services and new, location-based traffic and tour guide applications. The language processing aspects of a solution for Hungarian are described. The work was based on the analysis of a subscriber database containing about 3 million records (there are about 10 million Hungarian citizens). Categories of name and address elements were defined. A program for the automatic classification of database records was developed. Statistical parameters were derived about proper/legal names and addresses. Based on these results, text corpora for enriching the TTS acoustic database were designed. Reading strategies and related special algorithms and tables were developed for the description of complex name categories. Our results may be applied to similar tasks in other languages with comparable linguistic and statistical features.
International Conference on Computers Helping People with Special Needs | 2008
Géza Németh; Gábor Olaszy; Mátyás Bartalis; Géza Kiss; Csaba Zainkó; Péter Mihajlik; Csaba Haraszti
Aged and visually impaired persons cannot obtain information about drugs as easily as others. Although Braille prints (containing the name of the medicament) have lately been placed on drug boxes in Hungary, getting detailed information about the drug, i.e. accessing the content of the written Patient Information Leaflets (PIL), remains complicated. The Medicine Line (MLN) service may help in solving this problem. This automatic telephone information system was developed and put into operation in Hungary in December 2006. The computer system speaks and understands Hungarian, so the aged and visually impaired can get information about the drug by voice. Adaptation to other languages is also possible. As far as we know, no such system is available in the European Union.
2013 7th Conference on Speech Technology and Human - Computer Dialogue (SpeD) | 2013
Csaba Zainkó; Bálint Tóth; Mátyás Bartalis; Géza Németh; Tibor Fegyó
Senior citizens are the focus of current research in Europe. This paper investigates the usability aspects of synthetic voices intended for elderly people in Ambient Assisted Living (AAL) systems. The first topic of the study is the selection of an appropriate age for the Personal Life Assistant (PLA) voice intended for active seniors. The second topic is whether the user's own voice is feasible in personal messages. Third, the use of rather short speech corpora from elderly people for HMM speaker adaptation is studied. The question is whether the adapted voice is categorized into the same age group by listeners as the original. Corpus-based unit-selection TTS and adapted HMM-TTS voices were created from elderly speech samples, and these are compared to other middle-aged and elderly voices. In listening tests, the synthesized sentences were evaluated and compared to natural speech samples by elderly test subjects. The authors found that the TTS voices of more pleasant (younger) speakers are preferred, and that HMM-TTS adapted voices of elderly speakers retained the age identification features of the original recordings and are suitable for personal messages.
Acta Linguistica Hungarica | 2002
Géza Németh; Csaba Zainkó