Bálint Tóth

Budapest University of Technology and Economics

Publication

Featured researches published by Bálint Tóth.

Procedia Computer Science | 2014

Speech-centric Multimodal Interaction for Easy-to-access Online Services – A Personal Life Assistant for the Elderly

António J. S. Teixeira; Annika Hämäläinen; Jairo Avelar; Nuno Almeida; Géza Németh; Tibor Fegyó; Csaba Zainkó; Tamás Gábor Csapó; Bálint Tóth; André Oliveira; Miguel Sales Dias

The PaeLife project is a European industry-academia collaboration whose goal is to provide the elderly with easy access to online services that make their lives easier and encourage their continued participation in society. To reach this goal, the project partners are developing a multimodal virtual personal life assistant (PLA) offering a wide range of services from weather information to social networking. This paper presents the multimodal architecture of the PLA, the services provided by the PLA, and the work done in the area of speech input and output modalities, which play a key role in the application.

spoken language technology workshop | 2012

Synthesizing expressive speech from amateur audiobook recordings

Éva Székely; Tamás Gábor Csapó; Bálint Tóth; Péter Mihajlik; Julie Carson-Berndsen

Freely available audiobooks are a rich resource of expressive speech recordings that can be used for the purposes of speech synthesis. Natural sounding, expressive synthetic voices have previously been built from audiobooks that contained large amounts of highly expressive speech recorded from a professionally trained speaker. The majority of freely available audiobooks, however, are read by amateur speakers, are shorter and contain less expressive (less emphatic, less emotional, etc.) speech both in terms of quality and quantity. Synthesizing expressive speech from a typical online audiobook therefore poses many challenges. In this work we address these challenges by applying a method consisting of minimally supervised techniques to align the text with the recorded speech, select groups of expressive speech segments and build expressive voices for hidden Markov-model based synthesis using speaker adaptation. Subjective listening tests have shown that the expressive synthetic speech generated with this method is often able to produce utterances suited to an emotional message. We used a restricted amount of speech data in our experiment, in order to show that the method is generally applicable to most typical audiobooks widely available online.

Archive | 2008

Speech Generation in Mobile Phones

Géza Németh; Géza Kiss; Csaba Zainkó; Gábor Olaszy; Bálint Tóth

Mobile phones have become indispensable friends for many people. They are used in all areas of life, including the car. The security risk of this situation has motivated strict regulation of use on the one hand and, on the other hand, increased attention to built-in speech recognition. Far less attention has been paid, however, to the possible advantages of automatic speech generation by phones, including text-to-speech (TTS). This chapter addresses this domain. It will examine the general concepts and application areas of speaking mobile phones. In addition to the well-known advantages for visually impaired, blind, or speech-impaired people, such functionalities may help in other hands-busy or eyes-busy situations (e.g., cooking in the kitchen). The advancement of this area is due to the appearance of mobile phone operating systems (Symbian, Palm OS, MS Smartphone and Linux Mobile) which can run applications created by developers independent from the phone manufacturers. A case study of a speaking aid mobile phone application and the first automatic SMS-reading mobile phone application introduced in Hungary in October 2003 will also be presented. It is shown that the proper combination of careful user interface design and high-quality TTS should be supplemented by automatic language identification and other modules as well. Analysis of these supplementary modules is also presented.

Archive | 2007

Cross Platform Solution of Communication and Voice/Graphical User Interface for Mobile Devices in Vehicles

Géza Németh; Géza Kiss; Bálint Tóth

Two long-term goals of our study have been to develop a standardized communication interface between the mobile device and other onboard systems and to create a parametric, scalable user interface with both voice and graphical user input/output. This chapter describes the main requirements, principles, and aspects of a voice/graphical user interface and of a Bluetooth-based communication interface. Requirements and limitations for the implementation of speech synthesis on mobile devices will also be introduced. An SMS-reader application will be presented as a sample application of a mobile device in a vehicle.

international conference on computers for handicapped persons | 2004

Mobile devices converted into a speaking communication aid

Bálint Tóth; Géza Németh; Géza Kiss

The goal of the present study is to introduce a speaking interface of mobile devices for speech-impaired people. The latest devices (including PDAs with integrated telephone, Smartphones, Tablet PCs) possess numerous favorable features: small size, portability, considerably fast processor speed, increased storage size, telephony, large display and convenient development environment. The majority of vocally handicapped users are elderly people who are often not familiar with computers. Many of them have other disorder(s) (e.g. motor) and/or impaired vision. The paper reports the design and implementation aspects of converting standard devices into a mobile speaking aid for face-to-face and telephone conversations. The device is controlled and text is entered via the touch-screen, and the output is generated by a text-to-speech system. The interface is configurable (screen colors and text size, speaking options, etc.) according to the users’ personal preferences.

ELMAR 2007 | 2007

Challenges of creating multimodal interfaces on mobile devices

Bálint Tóth; Géza Németh

Smart mobile devices support web technologies, audio and video playback, integrated extra features and 3rd party development beyond telephony. Multimodal user interfaces are favorable in mobile environments, when the traditional standard modality (keyboard and stylus) is supported by other modalities, primarily by speech input and output. Multimodality increases the usability of applications and makes them accessible for impaired persons, although creating multimodal interfaces on mobile devices is a challenging task. This paper investigates the main problems, briefly examines the upcoming issues of the graphical user interface and introduces the most important challenges of creating speech user interfaces on mobile devices. A new approach to creating multimodal, scalable user interfaces for mobile devices is briefly discussed, and some application scenarios are given as well.

international conference on computers helping people with special needs | 2006

VoxAid 2006: telephone communication for hearing and/or vocally impaired people

Bálint Tóth; Géza Németh

Speech and/or hearing impaired people have difficulties with voice communication. In face-to-face conversation they can find a common communication channel (e.g. sign language, paper, etc.), but without an appropriate system they are unable to talk over the phone. The goal of the present study is to introduce the design and development steps of a system for vocally and/or hearing impaired people, which helps them to communicate via telephone with any person. Speech output is realized by text-to-speech (TTS) technology and speech input is provided by automatic speech recognition (ASR). The visual and the speech user interfaces enable users on both sides of the phone line (a speech and hearing impaired person at one end, a non-speech-and-hearing-disabled person at the other end) to communicate

international conference on computers helping people with special needs | 2012

New features in the VoxAid communication aid for speech impaired people

Bálint Tóth; Péter Nagy; Géza Németh

For speech impaired persons even daily communication may cause problems. In many common situations where the ability to speak would be necessary, they are unable to cope. An application that uses Text-To-Speech (TTS) conversion is usable not only in daily routine, but also as a therapeutic application in the treatment of speech impaired persons. The VoxAid framework from BME-TMIT gives solutions for these scenarios. This paper introduces the latest improvements of the VoxAid framework, including user tests and evaluation.

ist mobile and wireless communications summit | 2007

Creating XML Based Scalable Multimodal Interfaces for Mobile Devices

Bálint Tóth; Géza Németh

Portable smart devices are getting more and more popular. Many people prefer smart solutions over old devices or pen-and-paper. But developing applications for mobile devices is a challenging task. First, it is difficult to create an intuitive, easy-to-use interface; second, as current mobile platforms are not compatible, the application must be rewritten for each of them. Furthermore, it is not trivial to design multimodal interfaces (including speech and graphical I/O) which really improve software usability. For these reasons our goal is to create an XML-based scalable, cross-platform, multimodal user interface description format and the corresponding interpreter software for different platforms. This technology makes development for mobile devices much faster and easier.

2013 7th Conference on Speech Technology and Human - Computer Dialogue (SpeD) | 2013

Some aspects of synthetic elderly voices in ambient assisted living systems

Csaba Zainkó; Bálint Tóth; Mátyás Bartalis; Géza Németh; Tibor Fegyó

Senior citizens are in the focus of current research in Europe. This paper investigates the usability aspects of synthetic voices intended for elderly people in Ambient Assisted Living (AAL) systems. The first topic of the study is the selection of an appropriate age for a Personal Life Assistant (PLA) voice intended for active seniors. The second topic is whether the user's own voice is feasible in personal messages. Third, the use of rather short speech corpora from elderly people for HMM speaker adaptation is studied. The question is whether the adapted voice is categorized into the same age group by listeners as the original. Corpus-based unit-selection TTS and adapted HMM-TTS voices were created from elderly speech samples and these are compared to other middle-aged and elderly voices. In listening tests the synthesized sentences were evaluated and compared to natural speech samples by elderly test subjects. The authors found that the TTS voices of more pleasant (younger) speakers are preferred, and that HMM-TTS adapted voices of elderly speakers retained age identification features of the original recordings and are suitable for personal messages.
