Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Tibor Fegyó is active.

Publication


Featured research published by Tibor Fegyó.


International Journal of Speech Technology | 2000

Automatic Recognition of Hungarian: Theory And Practice

Máté Szarvas; Tibor Fegyó; Péter Mihajlik; Péter Tatai

This article describes the problems encountered during the design and implementation of automatic speech recognition systems for the Hungarian language, proposes practical solutions for treating them, and evaluates their practicality using publicly available databases. The article introduces a rule-based system for modeling the phonological rules within words as well as at word boundaries, and the notion of stochastic morphological analysis for treating the vocabulary-size problem. Finally, the implementation of the proposed methods in the FlexiVoice speech engine is described, and the results of the experimental evaluation on isolated and connected digit recognition, on a 2000-word Hungarian city-name recognition task, and on inflected-word recognition tasks are summarized.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Improved Recognition of Spontaneous Hungarian Speech—Morphological and Acoustic Modeling Techniques for a Less Resourced Task

Péter Mihajlik; Zoltán Tüske; Balázs Tarján; Bottyán Németh; Tibor Fegyó

Various morphological and acoustic modeling techniques are evaluated on a less-resourced, spontaneous Hungarian large-vocabulary continuous speech recognition (LVCSR) task. Among morphologically rich languages, Hungarian is known for its agglutinative, inflective nature, which increases the data sparseness caused by a relatively small training database. Although Hungarian spelling is considered phonologically simple, a large part of the corpus is covered by words pronounced in multiple, phonemically different ways. Data-driven and language-specific, knowledge-supported vocabulary decomposition methods are investigated in combination with phoneme- and grapheme-based acoustic modeling techniques on the given task. Word baseline and morph-based advanced baseline results are significantly outperformed by both statistical and grammatical vocabulary decomposition methods. Although the discussed morph-based techniques recognize a significant number of out-of-vocabulary words, the improvements are due not to this fact but to the reduction of insertion errors. Applying grapheme-based acoustic models instead of phoneme-based models causes no severe deterioration in recognition performance. Moreover, a fully data-driven acoustic modeling technique along with a statistical morphological modeling approach provides the best performance on the most difficult test set. The overall best speech recognition performance is obtained using a novel word-to-morph decomposition technique that combines grammatical and unsupervised statistical segmentation algorithms. The improvement achieved by the proposed technique is stable across acoustic modeling approaches and larger with speaker adaptation.
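To illustrate why vocabulary decomposition helps with agglutinative word forms, here is a toy sketch. This is not the paper's algorithm (which uses unsupervised statistical and grammatical segmentation); the hand-picked suffix list below is purely hypothetical and only demonstrates the effect of a smaller morph vocabulary covering unseen inflected forms.

```python
# Hypothetical suffix inventory for illustration only.
SUFFIXES = ["ban", "ben", "nak", "nek", "val", "vel", "ok", "ek", "k"]

def segment(word):
    """Greedily strip the longest matching suffix, yielding stem + bound morphs."""
    morphs = []
    while True:
        for suf in sorted(SUFFIXES, key=len, reverse=True):
            if word.endswith(suf) and len(word) > len(suf) + 2:
                morphs.insert(0, "+" + suf)  # '+' marks a bound morph
                word = word[: -len(suf)]
                break
        else:
            morphs.insert(0, word)  # remaining stem
            return morphs

# Toy "training vocabulary" of inflected Hungarian-like word forms.
words = ["hajó", "hajók", "hajóban", "hajóknak", "hajónak",
         "ember", "emberek", "embereknek", "emberben"]
word_vocab = set(words)
morph_vocab = {m for w in words for m in segment(w)}
# The morph vocabulary is smaller than the word vocabulary, and an unseen
# inflected form such as "emberekben" decomposes into known morphs.
```

The point of the sketch: the morph units recombine, so a morph-based lexicon covers inflected forms that never occurred in training, which is exactly the out-of-vocabulary effect the abstract discusses.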


Procedia Computer Science | 2014

Speech-centric Multimodal Interaction for Easy-to-access Online Services – A Personal Life Assistant for the Elderly

António J. S. Teixeira; Annika Hämäläinen; Jairo Avelar; Nuno Almeida; Géza Németh; Tibor Fegyó; Csaba Zainkó; Tamás Gábor Csapó; Bálint Tóth; André Oliveira; Miguel Sales Dias

The PaeLife project is a European industry-academia collaboration whose goal is to provide the elderly with easy access to online services that make their life easier and encourage their continued participation in society. To reach this goal, the project partners are developing a multimodal virtual personal life assistant (PLA) offering a wide range of services from weather information to social networking. This paper presents the multimodal architecture of the PLA, the services provided by the PLA, and the work done in the area of speech input and output modalities, which play a key role in the application.


2011 6th Conference on Speech Technology and Human-Computer Dialogue (SpeD) | 2011

Comparison of feature extraction methods for speech recognition in noise-free and in traffic noise environment

Gellért Sárosi; Mihály Mozsáry; Péter Mihajlik; Tibor Fegyó

A crucial part of a speech recognizer is acoustic feature extraction, especially when the application is intended for use in noisy environments. In this paper we investigate several novel front-end techniques and compare them to multiple baselines. Recognition tests were performed on studio-quality wide-band recordings in Hungarian, as well as on narrow-band telephone speech including real-life noises collected in six languages: English, German, French, Italian, Spanish and Hungarian. The following baseline feature types were used with several settings: Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Prediction (PLP) features, as implemented in HTK, SPHINX, or by ourselves. Novel methods include Perceptual Minimum Variance Distortionless Response (PMVDR) and multiple variations of the Power-Normalized Cepstral Coefficients (PNCC). Adaptive techniques are also applied to reduce convolutive distortions. We observed significant differences between the MFCC implementations, and the most useful PNCC variations differed across bandwidths and noise conditions.
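For readers unfamiliar with the baseline front-end, a minimal self-contained MFCC pipeline in NumPy is sketched below: framing, windowing, power spectrum, mel filterbank, log compression, and DCT. The parameter values (n_fft=512, hop=160, 26 mel bands, 13 cepstra) are common illustrative defaults, not the settings used in the paper.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Minimal MFCC front-end sketch (illustrative parameters)."""
    # Frame the signal and apply a Hamming window.
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
    frames = frames * np.hamming(n_fft)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular filterbank spaced evenly on the mel scale.
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log mel energies, then a type-II DCT to decorrelate the channels.
    log_mel = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_mels)))
    return log_mel @ dct.T
```

Perceptually motivated variants such as PLP, PMVDR and PNCC mainly differ in the spectral weighting, compression and normalization steps of this same pipeline.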


Text, Speech and Dialogue | 2007

Towards automatic transcription of large spoken archives in agglutinating languages - Hungarian ASR for the MALACH project

Péter Mihajlik; Tibor Fegyó; Bottyán Németh; Zoltán Tüske; Viktor Trón

The paper describes automatic speech recognition experiments and results on the spontaneous Hungarian MALACH speech corpus. A novel morph-based lexical modeling approach is compared, in terms of word and letter error rates, to the traditional word-based one and to another, previously best-performing morph-based one. The applied language and acoustic modeling techniques are also detailed. Using unsupervised speaker adaptation along with morph-based lexical models, absolute word error rate reductions of 14.4%–8.1% have been achieved on a two-speaker, two-hour test set, compared to the speaker-independent baseline results.
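The word error rates quoted above follow the standard edit-distance definition (substitutions + deletions + insertions, divided by the number of reference words). A generic sketch, not the paper's scoring tool:

```python
def wer(reference, hypothesis):
    """Word error rate via Levenshtein distance over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)
```

Letter error rate is computed the same way over character sequences, which is often reported for agglutinative languages because a single wrong inflection otherwise counts as a whole word error.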


Procedia Computer Science | 2015

Multilingual Speech Recognition for the Elderly: The AALFred Personal Life Assistant.

Annika Hämäläinen; António J. S. Teixeira; Nuno Almeida; Hugo Meinedo; Tibor Fegyó; Miguel Sales Dias

The PaeLife project is a European industry-academia collaboration in the framework of the Ambient Assisted Living Joint Programme (AAL JP), with the goal of developing a multimodal, multilingual virtual personal life assistant to help senior citizens remain active and socially integrated. Speech is one of the key interaction modalities of AALFred, the Windows application developed in the project; the application can be controlled using speech input in four European languages: French, Hungarian, Polish and Portuguese. This paper briefly presents the personal life assistant and then focuses on the speech-related achievements of the project. These include the collection, transcription and annotation of large corpora of elderly speech, the development of automatic speech recognisers optimised for elderly speakers, a speech modality component that can easily be reused in other applications, and an automatic grammar translation service that allows for fast expansion of the automatic speech recognition functionality to new languages.


International Conference on Speech and Computer | 2015

Automatic Close Captioning for Live Hungarian Television Broadcast Speech: A Fast and Resource-Efficient Approach

Ádám Varga; Balázs Tarján; Zoltán Tobler; György Szaszák; Tibor Fegyó; Csaba Bordás; Péter Mihajlik

In this paper, the application of LVCSR (Large Vocabulary Continuous Speech Recognition) technology is investigated for real-time, resource-limited broadcast close captioning. The work focuses on transcribing live broadcast conversation speech to make such programs accessible to deaf viewers. Due to computational limitations, real time factor (RTF) and memory requirements are kept low during decoding with various models tailored for Hungarian broadcast speech recognition. Two decoders are compared on the direct transcription task of broadcast conversation recordings, and setups employing re-speakers are also tested. Moreover, the models are evaluated on a broadcast news transcription task as well, and different language models (LMs) are tested in order to demonstrate the performance of our systems in settings when low memory consumption is a less crucial factor.
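The real time factor (RTF) constraint mentioned above is the ratio of decoding time to audio duration; a recognizer is usable for live captioning only if it stays below 1.0 with headroom. A trivial illustration:

```python
def real_time_factor(decode_seconds, audio_seconds):
    """RTF = processing time / audio duration.
    RTF < 1.0 means the decoder keeps pace with the live broadcast."""
    return decode_seconds / audio_seconds
```

For example, decoding a 60-second segment in 30 seconds gives an RTF of 0.5, leaving headroom for caption formatting and transmission delays.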


2013 7th Conference on Speech Technology and Human - Computer Dialogue (SpeD) | 2013

Improved recognition of Hungarian call center conversations

Balázs Tarján; Gellért Sárosi; Tibor Fegyó; Péter Mihajlik

This paper summarizes our recent efforts to automatically transcribe call center conversations in real time. The data sparseness issue arising from the small amount of transcribed training data is addressed: first, the potential of including additional, non-conventional training texts is investigated, and then morphological language models are introduced to handle data insufficiency. The baseline system is also extended with explicit models for non-verbal speech events such as hesitation or consent. In addition, all the above techniques are efficiently combined in the final system. The benefit of each approach is evaluated on real-life call center recordings. Results show that by utilizing morphological language models, a significant error rate reduction can be achieved over the word baseline system, and this gain is preserved across experimental setups. The results can be further improved if non-verbal events are also modeled.


2013 7th Conference on Speech Technology and Human - Computer Dialogue (SpeD) | 2013

Some aspects of synthetic elderly voices in ambient assisted living systems

Csaba Zainkó; Bálint Tóth; Mátyás Bartalis; Géza Németh; Tibor Fegyó

Senior citizens are in the focus of current research in Europe. This paper investigates the usability aspects of synthetic voices intended for elderly people in Ambient Assisted Living (AAL) systems. The first topic of the study is the selection of an appropriate voice age for a Personal Life Assistant (PLA) intended for active seniors. The second topic is whether the user's own voice is suitable for personal messages. Third, the use of rather short speech corpora from elderly people for HMM speaker adaptation is studied; the question is whether listeners assign the adapted voice to the same age group as the original. Corpus-based unit-selection TTS and adapted HMM-TTS voices were created from elderly speech samples and compared to other middle-aged and elderly voices. In listening tests, the synthesized sentences were evaluated and compared to natural speech samples by elderly test subjects. The authors found that the TTS voices of more pleasant (younger) speakers are preferred, and that adapted HMM-TTS voices of elderly speakers retained the age-identification features of the original recordings and are suitable for personal messages.


Text, Speech and Dialogue | 2010

Some aspects of ASR transcription based unsupervised speaker adaptation for HMM speech synthesis

Bálint Tóth; Tibor Fegyó; Géza Németh

Statistical parametric synthesis offers numerous techniques for creating new voices, speaker adaptation being one of the most exciting. However, it still requires high-quality audio data with a high signal-to-noise ratio and precise labeling. This paper presents an automatic speech recognition based unsupervised adaptation method for Hidden Markov Model (HMM) speech synthesis and its quality evaluation. The adaptation technique automatically controls the number of phone mismatches. The evaluation involves eight different HMM voices, including supervised and unsupervised speaker adaptation. The effects of segmentation and linguistic labeling errors in the adaptation data are also investigated. The results show that unsupervised adaptation can speed up the creation of new HMM voices with quality comparable to supervised adaptation.

Collaboration


Dive into Tibor Fegyó's collaborations.

Top Co-Authors

Péter Mihajlik (Budapest University of Technology and Economics)
Balázs Tarján (Budapest University of Technology and Economics)
Géza Németh (Budapest University of Technology and Economics)
Péter Tatai (Budapest University of Technology and Economics)
András Balog (Budapest University of Technology and Economics)
Gellért Sárosi (Budapest University of Technology and Economics)
Bálint Tóth (Budapest University of Technology and Economics)
Csaba Zainkó (Budapest University of Technology and Economics)
Géza Gordos (Budapest University of Technology and Economics)
Máté Szarvas (Budapest University of Technology and Economics)