Gábor Gosztolya | Researchain


Publication

Featured research published by Gábor Gosztolya.

International Symposium on Applied Machine Intelligence and Informatics | 2011

Spoken term detection based on the most probable phoneme sequence

Gábor Gosztolya; László Tóth

The aim of the spoken term detection task is to find the occurrences of user-entered keywords in an archive of audio recordings. In this area, besides the accuracy of the hits returned, the speed of the search is also very important, for which an intermediate representation of the recordings is normally used. In this paper we evaluate a spoken term detection method which represents the speech signals by their most probable phoneme sequence, on which a dynamic search is then performed. As the accuracy of the phoneme recognizer used is vital, we test this method using several approaches to phoneme identification. We found that our method already achieves satisfactory accuracy, although its run time is still rather high. We also found that this approach is heavily dependent on the performance of the phoneme recognizer.
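The idea of searching a recognized phoneme sequence for a keyword can be illustrated with approximate substring matching by dynamic programming. This is only a minimal sketch under stated assumptions: the function name `spot_term`, the unit edit costs and the `max_errors` threshold are hypothetical and are not taken from the paper, whose actual search procedure and scoring may differ.

```python
def spot_term(recognized, keyword, max_errors=1):
    """Return (end_position, cost) pairs where `keyword` matches a
    substring of `recognized` with at most `max_errors` edit operations.

    Both arguments are sequences of phoneme symbols. Positions are
    1-based indices of the last phoneme of the match.
    """
    m = len(keyword)
    # prev[i]: cheapest alignment of keyword[:i] with a substring that
    # ends at the previous position of the recognized sequence.
    prev = list(range(m + 1))
    hits = []
    for pos, phoneme in enumerate(recognized, start=1):
        cur = [0]  # a match may start anywhere, so the empty prefix is free
        for i in range(1, m + 1):
            substitute = prev[i - 1] + (keyword[i - 1] != phoneme)
            skip_recognized = prev[i] + 1    # extra phoneme in the recording
            skip_keyword = cur[i - 1] + 1    # keyword phoneme not recognized
            cur.append(min(substitute, skip_recognized, skip_keyword))
        if cur[m] <= max_errors:
            hits.append((pos, cur[m]))
        prev = cur
    return hits


if __name__ == "__main__":
    phonemes = ["h", "e", "l", "o", "w"]
    print(spot_term(phonemes, ["l", "o"], max_errors=0))
```

Because the search runs over a single phoneme string per recording, its quality depends directly on how many recognition errors that string contains, which is consistent with the dependence on the phoneme recognizer noted in the abstract.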

International Symposium on Neural Networks | 2004

Replicator Neural Networks for Outlier Modeling in Segmental Speech Recognition

László Tóth; Gábor Gosztolya

This paper deals with outlier modeling within a very special framework: a segment-based speech recognizer. The recognizer is built on a neural net that, besides classifying speech segments, has to identify outliers as well. One possibility is to artificially generate outlier samples, but this is tedious, error-prone and significantly increases the training time. This study examines the alternative of applying a replicator neural net for this task, originally proposed for outlier modeling in data mining. Our findings show that with a replicator net the recognizer is capable of a very similar performance, but this time without the need for a large amount of outlier data.

Industrial and Engineering Applications of Artificial Intelligence and Expert Systems | 2003

Improving the multi-stack decoding algorithm in a segment-based speech recognizer

Gábor Gosztolya; András Kocsor

In automatic speech recognition, selecting the best hypothesis from a combinatorially huge hypothesis space is a very hard task, so using fast and efficient heuristics is a reasonable strategy. In this paper a general-purpose heuristic, the multi-stack decoding method, is refined in several ways. For comparison, these improved methods were tested along with the well-known Viterbi beam search algorithm on a Hungarian number recognition task, where the aim was to minimize the number of hypothesis elements scanned during the search process. The tests showed that our method runs 6 times faster than the basic multi-stack decoding method, and 9 times faster than the Viterbi beam search method.
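For orientation, the basic multi-stack idea can be sketched as one size-limited stack of hypotheses per time point. This is a simplified illustration of that principle, not the refined variants evaluated in the paper; the function name, the `seg_cost` callback and all parameters are hypothetical.

```python
import heapq


def multi_stack_decode(n_frames, max_seg_len, labels, seg_cost, stack_size):
    """Toy segment-based decoder with one bounded stack per time point.

    seg_cost(start, end, label) returns the cost of labelling frames
    [start, end) with `label`. Only the `stack_size` cheapest hypotheses
    ending at each time point are extended further.
    Returns (total_cost, label_sequence) or None if nothing reaches the end.
    """
    stacks = [[] for _ in range(n_frames + 1)]
    stacks[0].append((0.0, ()))
    for t in range(n_frames):
        # Pruning step: keep only the best few hypotheses ending at time t.
        for cost, seq in heapq.nsmallest(stack_size, stacks[t]):
            for length in range(1, max_seg_len + 1):
                end = t + length
                if end > n_frames:
                    break
                for label in labels:
                    new_cost = cost + seg_cost(t, end, label)
                    stacks[end].append((new_cost, seq + (label,)))
    return min(stacks[n_frames]) if stacks[n_frames] else None


if __name__ == "__main__":
    truth = {(0, 2, "a"), (2, 4, "b")}  # made-up reference segmentation
    cost = lambda s, e, lab: 0 if (s, e, lab) in truth else 1
    print(multi_stack_decode(4, 2, ["a", "b"], cost, stack_size=2))
```

A smaller `stack_size` means fewer hypotheses are scanned, at the risk of pruning the correct one; that trade-off is what the comparison in the abstract measures.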

Text, Speech and Dialogue | 2004

Aggregation Operators and Hypothesis Space Reductions in Speech Recognition

Gábor Gosztolya; András Kocsor

In this paper we deal with the heuristic exploration of general hypothesis spaces arising in both the HMM and segment-based approaches to speech recognition. The generated hypothesis space is a tree whose nodes are assigned costs. The tree and the costs are both generated in a top-down way, using node extension rules and aggregation operators for the cost calculation. We introduce a special set of mean aggregation operators suitable for speech recognition tasks. Then we discuss the efficiency of some heuristic search methods like the Viterbi beam search, the multi-stack decoding algorithm, and some improvements using these aggregation operators. The tests showed that this technique could significantly speed up the recognition process. The run-times we obtained were 2 times faster than the basic multi-stack decoding method, and 4 times faster than the Viterbi beam search method.
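To make the notion of a mean aggregation operator concrete, the sketch below uses the power (generalized) mean as one well-known family. The abstract does not state which operators the paper actually introduces, so the choice of this family and the name `power_mean` are assumptions for illustration only.

```python
import math


def power_mean(costs, p):
    """Aggregate positive per-segment costs into one hypothesis cost.

    p = 1 gives the arithmetic mean, p = 2 the quadratic mean, and
    p = 0 is treated as the limiting case, the geometric mean.
    """
    if not costs:
        raise ValueError("at least one cost is required")
    n = len(costs)
    if p == 0:
        return math.exp(sum(math.log(c) for c in costs) / n)
    return (sum(c ** p for c in costs) / n) ** (1.0 / p)


if __name__ == "__main__":
    segment_costs = [1.0, 4.0]  # made-up costs of two segments
    for p in (0, 1, 2):
        print(p, power_mean(segment_costs, p))
```

Unlike a plain sum, a mean does not grow just because a hypothesis contains more segments, which makes hypotheses of different lengths easier to compare during the search.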

Conference of the International Speech Communication Association | 2016

Detecting mild cognitive impairment from spontaneous speech by correlation-based phonetic feature selection

Gábor Gosztolya; László Tóth; Tamás Grósz; Veronika Vincze; Ildikó Hoffmann; Gréta Szatlóczki; Magdolna Pákáski; János Kálmán

Mild Cognitive Impairment (MCI), sometimes regarded as a prodromal stage of Alzheimer’s disease, is a mental disorder that is difficult to diagnose. Recent studies reported that MCI causes slight changes in the speech of the patient. Our previous studies showed that MCI can be efficiently classified by machine learning methods such as Support-Vector Machines and Random Forest, using features describing the amount of pause in the spontaneous speech of the subject. Furthermore, as hesitation is the most important indicator of MCI, we took special care when handling filled pauses, which usually correspond to hesitation. In contrast to our previous studies which employed manually constructed feature sets, we now employ (automatic) correlation-based feature selection methods to find the relevant feature subset for MCI classification. By analyzing the selected feature subsets we also show that features related to filled pauses are useful for MCI detection from speech samples.
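As a rough illustration of correlation-based selection, the sketch below ranks features by the absolute Pearson correlation between each feature and the class label and keeps the top `k`. This is a deliberately simplified stand-in: the abstract does not specify the selection criterion used in the paper, and the feature names and values in the example are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(columns, labels, k):
    """Return the names of the k features most correlated with the labels.

    `columns` maps a feature name to its list of per-subject values.
    """
    ranked = sorted(columns,
                    key=lambda name: abs(pearson(columns[name], labels)),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical per-subject values; 1 = MCI, 0 = control.
    features = {
        "pause_ratio": [0.1, 0.2, 0.8, 0.9],
        "noise": [0.5, 0.1, 0.4, 0.2],
        "filled_pauses": [1, 2, 2, 5],
    }
    print(select_features(features, [0, 0, 1, 1], k=2))
```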

International Journal of Speech Technology | 2006

The use of speed-up techniques for a speech recognizer system

András Kocsor; Gábor Gosztolya

In speech recognition, not just the accuracy of an automatic speech recognition application is important, but also its speed. However, if we want to create a real-time speech recognizer, this requirement limits the time that is spent on searching for the best hypothesis, which can even affect the recognition accuracy. Thus the applied search method plays an important role in the speech recognition task, and so does its efficiency, i.e. how quickly it finds the uttered words. To speed up this search process, various ideas are available in the literature: we can use search heuristics, multi-pass search, or apply a family of aggregation operators. In this paper we test all these methods in turn, and combine them with a set of other novel speed-up ideas. The test results confirm that all of these techniques are valuable: using combinations of them helped make the speech recognition process over 12 times faster than the basic multi-stack decoding algorithm, and almost 11 times faster than the Viterbi beam search method.

Conference of the International Speech Communication Association | 2016

Estimating the Sincerity of Apologies in Speech by DNN Rank Learning and Prosodic Analysis

Gábor Gosztolya; Tamás Grósz; György Szaszák; László Tóth

In the Sincerity Sub-Challenge of the Interspeech ComParE 2016 Challenge, the task is to estimate user-annotated sincerity scores for speech samples. We interpret this challenge as a rank-learning regression task, since the evaluation metric (Spearman’s correlation) is calculated from the rank of the instances. As a first approach, Deep Neural Networks are used by introducing a novel error criterion which maximizes the correlation metric directly. We obtained the best performance by combining the proposed error function with the conventional MSE error. This approach yielded results that outperform the baseline on the Challenge test set. Furthermore, we introduce a compact prosodic feature set based on a dynamic representation of F0, energy and sound duration. We extract syllable-based prosodic features which are used as the basis of another machine learning step. We show that a small set of prosodic features is capable of yielding a result very close to the baseline one and that by combining the predictions yielded by DNN and the prosodic feature set, further improvement can be reached, significantly outperforming the baseline SVR on the Challenge test set.
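Since the evaluation metric depends only on ranks, it may help to see how Spearman's correlation is computed: it is the Pearson correlation of the rank-transformed values. The sketch below is a plain standard-library implementation with average ranks for ties; it illustrates the metric only, not the DNN error criterion proposed in the paper, and the function names are our own.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rank correlation (0.0 if either input is constant)."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    print(spearman([0.1, 0.5, 0.9], [10, 20, 30]))
```

Any prediction that preserves the ordering of the reference scores reaches the maximum value, regardless of the absolute values predicted; this is why the task can be viewed as rank learning.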

Conference of the International Speech Communication Association | 2016

Determining Native Language and Deception Using Phonetic Features and Classifier Combination

Gábor Gosztolya; Tamás Grósz; Róbert Busa-Fekete; László Tóth

For several years, the Interspeech ComParE Challenge has focused on paralinguistic tasks of various kinds. In this paper we focus on the Native Language and the Deception Sub-Challenges of ComParE 2016, where the goal is to identify the native language of the speaker, and to recognize deceptive speech. As both tasks can be treated as classification ones, we experiment with several state-of-the-art machine learning methods (Support-Vector Machines, AdaBoost.MH and Deep Neural Networks), and also test a simple-yet-robust combination method. Furthermore, we assume that the native language of the speaker affects the pronunciation of specific phonemes in the language they are currently using. To exploit this, we extract phonetic features for the Native Language task. Moreover, for the Deception Sub-Challenge we compensate for the highly unbalanced class distribution by instance re-sampling. With these techniques we are able to significantly outperform the baseline SVM on the unpublished test set.
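One common form of instance re-sampling is random oversampling, where examples of the rarer classes are repeated until every class is as large as the biggest one. The abstract does not say which re-sampling scheme was used, so the sketch below is an assumption for illustration; the function name and the toy labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def oversample(instances, labels, seed=0):
    """Balance classes by repeating randomly chosen minority instances.

    Returns new (instances, labels) lists in which every class has as
    many examples as the largest class in the input.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    by_class = defaultdict(list)
    for x, y in zip(instances, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


if __name__ == "__main__":
    data = list(range(6))
    tags = ["truthful"] * 5 + ["deceptive"]  # made-up, unbalanced labels
    _, balanced = oversample(data, tags)
    print(Counter(balanced))
```

Re-sampling should be applied to the training set only, so that evaluation still reflects the original class distribution.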

International Symposium on Intelligent Systems and Informatics | 2010

Low-complexity audio compression methods for wireless sensors

Gábor Gosztolya; Dénes Paczolay; László Tóth

Wireless sensors are frequently used for recording surrounding speech and then sending it to a base station. Because they communicate via radio waves, some form of audio compression is important, while their limited RAM and low-capacity CPU restrict the range of methods that can be applied. In this paper we test a number of such methods and show that they can indeed be effective: a 30% bandwidth saving was achieved practically without information loss, and a 50% bandwidth reduction at the cost of negligible information loss.

Industrial and Engineering Applications of Artificial Intelligence and Expert Systems | 2005

Speeding up dynamic search methods in speech recognition

Gábor Gosztolya; András Kocsor

In speech recognition, huge hypothesis spaces are generated. To overcome this problem, dynamic programming can be used. In this paper we examine ways of speeding up this search process even more using heuristic search methods, multi-pass search and aggregation operators. The tests showed that these techniques can be applied together, and their combination could significantly speed up the recognition process. The run-times we obtained were 22 times faster than the basic dynamic search method, and 8 times faster than the multi-stack decoding method.
