Publications


Featured research published by Balázs Tarján.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Improved Recognition of Spontaneous Hungarian Speech—Morphological and Acoustic Modeling Techniques for a Less Resourced Task

Péter Mihajlik; Zoltán Tüske; Balázs Tarján; Bottyán Németh; Tibor Fegyó

Various morphological and acoustic modeling techniques are evaluated on a less-resourced, spontaneous Hungarian large-vocabulary continuous speech recognition (LVCSR) task. Among morphologically rich languages, Hungarian is known for its agglutinative, inflective nature, which aggravates the data sparseness caused by a relatively small training database. Although Hungarian spelling is considered simple and largely phonological, a large part of the corpus is covered by words pronounced in multiple, phonemically different ways. Data-driven and language-specific, knowledge-supported vocabulary decomposition methods are investigated in combination with phoneme- and grapheme-based acoustic modeling techniques on the given task. Word baseline and morph-based advanced baseline results are significantly outperformed by both statistical and grammatical vocabulary decomposition methods. Although the discussed morph-based techniques recognize a significant number of out-of-vocabulary words, the improvements are due not to this fact but to the reduction of insertion errors. Applying grapheme-based acoustic models instead of phoneme-based models causes no severe deterioration in recognition performance. Moreover, a fully data-driven acoustic modeling technique combined with a statistical morphological modeling approach provides the best performance on the most difficult test set. The overall best speech recognition performance is obtained with a novel word-to-morph decomposition technique that combines grammatical and unsupervised statistical segmentation algorithms. The improvement achieved by the proposed technique is stable across acoustic modeling approaches and larger with speaker adaptation.
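The word-to-morph decomposition at the heart of this paper can be illustrated with a toy sketch. The snippet below is not the authors' segmenter; the suffix list and greedy stripping rule merely stand in for the statistical and grammatical segmentation algorithms they combine, and show how a word-level LM training corpus is rewritten into morph units with a continuation marker so that full words can be reassembled after decoding.

```python
# Toy illustration (not the authors' system) of rewriting a word-level corpus
# into morph units for language modeling. TOY_SUFFIXES and the greedy
# stripping rule are stand-ins for a statistical (Morfessor-style) or
# grammatical segmenter; '+' marks a non-final morph.

TOY_SUFFIXES = ["okat", "ban", "ben", "nak", "nek", "ok", "ek", "ak", "t"]

def segment(word: str) -> list[str]:
    morphs = []
    stem = word
    changed = True
    while changed:
        changed = False
        for suffix in TOY_SUFFIXES:
            if stem.endswith(suffix) and len(stem) > len(suffix) + 2:
                stem = stem[: -len(suffix)]   # strip the suffix from the stem
                morphs.insert(0, suffix)      # keep suffixes in surface order
                changed = True
                break
    if not morphs:
        return [stem]
    units = [stem] + morphs
    return [u + "+" for u in units[:-1]] + units[-1:]

def to_morph_corpus(sentence: str) -> str:
    """Map a word-level training sentence to its morph-level counterpart."""
    return " ".join(" ".join(segment(word)) for word in sentence.split())

print(to_morph_corpus("házakban laktak"))   # ház+ ak+ ban lak+ t+ ak
```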


International Conference on Speech and Computer | 2015

Automatic Close Captioning for Live Hungarian Television Broadcast Speech: A Fast and Resource-Efficient Approach

Ádám Varga; Balázs Tarján; Zoltán Tobler; György Szaszák; Tibor Fegyó; Csaba Bordás; Péter Mihajlik

In this paper, the application of LVCSR (Large Vocabulary Continuous Speech Recognition) technology is investigated for real-time, resource-limited broadcast closed captioning. The work focuses on transcribing live broadcast conversational speech to make such programs accessible to deaf viewers. Due to computational limitations, the real-time factor (RTF) and memory requirements are kept low during decoding with various models tailored for Hungarian broadcast speech recognition. Two decoders are compared on the direct transcription task of broadcast conversation recordings, and setups employing re-speakers are also tested. Moreover, the models are evaluated on a broadcast news transcription task as well, and different language models (LMs) are tested in order to demonstrate the performance of our systems in settings where low memory consumption is less critical.
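For context on the real-time constraint: the real-time factor is conventionally computed as decoding time divided by audio duration, with RTF below 1.0 meaning faster than real time. A minimal sketch, with a placeholder decoder standing in for the actual recognizer:

```python
import time

def real_time_factor(decode_fn, audio_samples, sample_rate_hz=16000):
    """RTF = processing time / audio duration.
    `decode_fn` is a placeholder for whatever recognizer is being benchmarked."""
    audio_seconds = len(audio_samples) / sample_rate_hz
    start = time.perf_counter()
    decode_fn(audio_samples)                     # run the recognizer on the utterance
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds

if __name__ == "__main__":
    # dummy decoder standing in for a real LVCSR system: 0.5 s of work on 10 s of audio
    rtf = real_time_factor(lambda audio: time.sleep(0.5), [0.0] * 16000 * 10)
    print(f"RTF = {rtf:.2f}")   # about 0.05; live captioning needs RTF well below 1.0
```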


2013 7th Conference on Speech Technology and Human-Computer Dialogue (SpeD) | 2013

Improved recognition of Hungarian call center conversations

Balázs Tarján; Gellért Sárosi; Tibor Fegyó; Péter Mihajlik

This paper summarizes our recent efforts to automatically transcribe call center conversations in real time. The data sparseness issue arising from the small amount of transcribed training data is addressed: first, the potential of including additional, non-conventional training texts is investigated, and then morphological language models are introduced to handle the data insufficiency. The baseline system is also extended with explicit models for non-verbal speech events such as hesitation or consent. Finally, all of the above techniques are efficiently combined in the final system. The benefit of each approach is evaluated on real-life call center recordings. Results show that morphological language models achieve a significant error rate reduction over the word baseline system, and this gain is preserved across experimental setups. The results can be further improved if non-verbal events are also modeled.
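As an illustration of the non-verbal event modeling mentioned above, transcripts can map annotator markup to dedicated tokens so that hesitations and consent sounds receive their own language-model statistics instead of being deleted or absorbed into neighboring words. The markup and token names below are assumptions made for the example, not the paper's actual annotation scheme:

```python
import re

# Illustrative mapping from assumed annotator markup to dedicated non-verbal tokens.
NONVERBAL_MAP = {
    "(hes)": "[HESITATION]",   # filled pauses: "öö", "hm"
    "(agr)": "[CONSENT]",      # back-channel agreement: "aha", "igen-igen"
    "(noi)": "[NOISE]",        # line noise, keyboard clicks
}

def normalize_transcript(line: str) -> str:
    """Replace in-line markup with LM tokens so non-verbal events can be modeled explicitly."""
    for markup, token in NONVERBAL_MAP.items():
        line = line.replace(markup, token)
    return re.sub(r"\s+", " ", line).strip()   # collapse leftover whitespace

print(normalize_transcript("jó napot (hes) szeretnék érdeklődni (agr)"))
# -> "jó napot [HESITATION] szeretnék érdeklődni [CONSENT]"
```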


COST'10 Proceedings of the 2010 International Conference on Analysis of Verbal and Nonverbal Communication and Enactment | 2010

Recognition of multiple language voice navigation queries in traffic situations

Gellért Sárosi; Tamás Mozsolics; Balázs Tarján; András Balog; Péter Mihajlik; Tibor Fegyó

This paper introduces our work and results on a multiple-language continuous speech recognition task. The aim was to design a system that introduces a tolerable amount of recognition errors for point-of-interest words in voice navigation queries, even in the presence of real-life traffic noise. An additional challenge was that no task-specific training databases were available for language and acoustic modeling. Instead, general-purpose acoustic databases were obtained and (probabilistic) context-free grammars were constructed for the acoustic and language models, respectively. A public pronunciation lexicon was used for English, whereas rule- and exception-dictionary-based pronunciation modeling was applied for French, German, Italian, Spanish and Hungarian. For the last four languages, the classical phoneme-based pronunciation modeling approach was also compared to a grapheme-based pronunciation modeling technique. Noise robustness was addressed by applying various feature extraction methods. The results show that achieving high word recognition accuracy is feasible if cooperative speakers can be assumed.
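The contrast between rule-based and grapheme-based pronunciation modeling can be sketched briefly: in the grapheme-based case, a word's letter sequence is used directly as its pronunciation, so no language-specific rules are needed. The digraph handling below is a simplified assumption, not the exact grapheme inventory used in the paper:

```python
# Minimal sketch of grapheme-based pronunciation modeling: each word is mapped
# to its grapheme sequence and those graphemes serve directly as acoustic units.
# The Hungarian digraph list is a simplified assumption for illustration.

HU_DIGRAPHS = ["dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs"]

def grapheme_pronunciation(word: str) -> list[str]:
    units, i = [], 0
    w = word.lower()
    while i < len(w):
        for dg in HU_DIGRAPHS:            # prefer multi-letter graphemes
            if w.startswith(dg, i):
                units.append(dg)
                i += len(dg)
                break
        else:
            units.append(w[i])            # fall back to a single letter
            i += 1
    return units

for word in ["szálloda", "gyógyszertár", "parking"]:
    print(word, "->", " ".join(grapheme_pronunciation(word)))
# szálloda -> sz á l l o d a
# gyógyszertár -> gy ó gy sz e r t á r
# parking -> p a r k i n g
```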


International Conference on Statistical Language and Speech Processing | 2017

Low Latency MaxEnt- and RNN-Based Word Sequence Models for Punctuation Restoration of Closed Caption Data

Máté Ákos Tündik; Balázs Tarján; György Szaszák

Automatic Speech Recognition (ASR) rarely addresses the punctuation of the obtained transcriptions. Recently, Recurrent Neural Network (RNN) based models exploiting wide word contexts have been proposed for automatic punctuation. In real-time ASR tasks such as closed captioning of live TV streams, text-based punctuation poses two particular challenges: a requirement for low latency (limiting the future context), and the propagation of ASR errors, seen more often for informal or spontaneous speech. This paper investigates Maximum Entropy (MaxEnt) and RNN punctuation models in such real-time conditions, and also compares them to offline setups. As expected, the RNN outperforms the MaxEnt baseline system. Limiting the future context results in only a slight performance drop, whereas ASR errors affect punctuation performance considerably. A genre analysis of punctuation performance is also carried out. Our approach is also evaluated on TED talks from the IWSLT English dataset, providing results comparable to state-of-the-art systems.
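A minimal sketch of the low-latency setting described above, assuming a unidirectional RNN word-sequence tagger (not the paper's exact architecture) trained so that its output at the newest word labels the punctuation slot a fixed number of positions earlier; the punctuation decision for a word can then be emitted as soon as the lookahead words have arrived:

```python
# Assumed architecture for illustration only: a small unidirectional LSTM tagger
# with a bounded lookahead window, which is what keeps captioning latency low.
import torch
import torch.nn as nn

PUNCT = ["", ",", ".", "?"]          # class 0 = no punctuation after the word

class PunctTagger(nn.Module):
    def __init__(self, vocab_size, emb=64, hidden=128, n_classes=len(PUNCT)):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.rnn = nn.LSTM(emb, hidden, batch_first=True)   # unidirectional -> streamable
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, word_ids):                 # (batch, time)
        h, _ = self.rnn(self.emb(word_ids))
        return self.out(h)                       # (batch, time, n_classes)

def stream_punctuate(model, word_ids, lookahead=2):
    """Decide the punctuation slot after word t once `lookahead` future words
    have arrived, assuming the model was trained so that its output at the
    newest word labels the slot `lookahead` positions earlier."""
    labels = []
    with torch.no_grad():
        for t in range(len(word_ids) - lookahead):
            window = torch.tensor([word_ids[: t + 1 + lookahead]])
            logits = model(window)[0, -1]        # output at the newest word
            labels.append(PUNCT[int(logits.argmax())])
    return labels
```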


Intelligent Decision Technologies | 2014

Automated transcription of conversational Call Center speech – with respect to non-verbal acoustic events

Gellért Sárosi; Balázs Tarján; Tibor Fegyó; Péter Mihajlik

This paper summarizes our recent efforts to automatically transcribe real-life Call Center conversations, with attention to non-verbal acoustic events as well. Future Call Centers, as cognitive infocommunication systems, must respond automatically not only to well-formed utterances but also to spontaneous, non-word speaker manifestations, and must be robust against sudden noises. Conversational telephony speech transcription is itself a big challenge; we address it primarily on real-life (bank and insurance) tasks. In addition, we introduce several non-word acoustic modeling approaches and their integration into LVCSR (Large Vocabulary Continuous Speech Recognition). In the experiments, one- and two-channel transcription results (client and agent speech merged into one audio stream or kept in two separate streams), cross-task results and the handling of transcription data insufficiency are investigated, in parallel with non-verbal acoustic event modeling. On the agent side, a word error rate below 15% could be achieved, and the best relative error rate reduction is 20%, owing to the inclusion of various written corpora and to acoustic event handling.
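To make the two figures above concrete: word error rate is the word-level edit distance normalized by the reference length, and a relative reduction compares two such rates. A small illustrative helper (not the paper's scoring tool); the example numbers are invented purely to show what a 20% relative reduction means:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over words divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                       # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                       # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / len(ref)

def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Relative error rate reduction between a baseline and an improved system."""
    return (baseline_wer - improved_wer) / baseline_wer

# Illustrative arithmetic only: dropping from 18% to 14.4% WER is a 20% relative reduction.
print(relative_reduction(0.18, 0.144))   # 0.2
```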


SLTU | 2010

On morph-based LVCSR improvements.

Balázs Tarján; Péter Mihajlik


2011 2nd International Conference on Cognitive Infocommunications (CogInfoCom) | 2011

Evaluation of lexical models for Hungarian Broadcast speech transcription and spoken term detection

Balázs Tarján; Péter Mihajlik; András Balog; Tibor Fegyó


IEEE International Conference on Cognitive Infocommunications | 2012

On modeling non-word events in Large Vocabulary Continuous Speech Recognition

Gellért Sárosi; Balázs Tarján; András Balog; Tamás Mozsolics; Péter Mihajlik; Tibor Fegyó


Acta Cybernetica | 2010

Speech recognition experiments with audiobooks

László Tóth; Balázs Tarján; Gellért Sárosi; Péter Mihajlik

Collaboration


Dive into Balázs Tarján's collaborations.

Top Co-Authors

Péter Mihajlik, Budapest University of Technology and Economics
Tibor Fegyó, Budapest University of Technology and Economics
András Balog, Budapest University of Technology and Economics
Gellért Sárosi, Budapest University of Technology and Economics
György Szaszák, Budapest University of Technology and Economics
Lili Szabó, Budapest University of Technology and Economics
Máté Ákos Tündik, Budapest University of Technology and Economics
Bottyán Németh, Budapest University of Technology and Economics
Tamás Mozsolics, Budapest University of Technology and Economics