Publications

Featured research published by Dávid Sztahó.


Verbal and Nonverbal Features of Human-Human and Human-Machine Interaction | 2008

Speech Emotion Perception by Human and Machine

Szabolcs Levente Tóth; Dávid Sztahó; Klára Vicsi

Human speech contains and reflects information about the emotional state of the speaker. Research into emotions is increasingly important in telematics, information technology, and even health services. Determining the mean acoustic parameters of emotions is a complicated task: emotions are mainly characterized by suprasegmental parameters, but segmental factors can also contribute to their perception, and these parameters vary within a single language, across speakers, etc. In the first part of our research, human emotion perception was examined. The steps of creating an emotional speech database are presented. The database contains recordings of 3 Hungarian sentences with 8 basic emotions, pronounced by non-professional speakers. Perception tests on this database yielded recognition results similar to those of an earlier perception test using professional actors and actresses. It also became clear that hearing a neutral sentence from the same speaker before the emotional expression does not help the perception of the emotion to any great extent. In the second part of our research, an automatic emotion recognition system was developed. Statistical methods (HMMs) were used to train models for the different emotions. Recognition was optimized by varying the acoustic preprocessing parameters and the number of states of the Markov models.
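
To make the structure of such a system concrete, here is a minimal sketch of per-emotion HMM classification, assuming the hmmlearn and librosa libraries; the MFCC preprocessing, the state count, and the file lists are illustrative placeholders rather than the authors' actual settings.

```python
# Minimal sketch: one Gaussian HMM per emotion, classification by maximum
# log-likelihood. Library calls follow hmmlearn and librosa; everything
# else (paths, labels, parameters) is hypothetical.
import numpy as np
import librosa
from hmmlearn import hmm

def mfcc_features(path, sr=16000, n_mfcc=13):
    """Frame-level MFCCs as a stand-in for the acoustic preprocessing."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, coeffs)

def train_emotion_models(training_files, n_states=5):
    """Train one HMM per emotion from {'anger': [paths], ...}."""
    models = {}
    for emotion, paths in training_files.items():
        feats = [mfcc_features(p) for p in paths]
        X, lengths = np.vstack(feats), [len(f) for f in feats]
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
        m.fit(X, lengths)
        models[emotion] = m
    return models

def classify(models, path):
    """Pick the emotion whose model scores the utterance highest."""
    X = mfcc_features(path)
    return max(models, key=lambda e: models[e].score(X))
```

Varying `n_mfcc` and `n_states` here corresponds to the optimization over preprocessing parameters and Markov state counts described in the abstract.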


Smart Innovation, Systems and Technologies | 2016

Language independent detection possibilities of depression by speech

Gábor Kiss; Miklos Gabriel Tulics; Dávid Sztahó; Anna Esposito; Klára Vicsi

In this study, acoustic-phonetic analysis of continuous speech and statistical analyses were performed in order to find parameters of depressed speech that show significant differences from a healthy reference group. Read speech material was gathered in Hungarian and Italian from both healthy people and patients diagnosed with different degrees of depression. Statistical examination showed that there are many parameters in the speech of depressed people that differ significantly from the healthy reference group, and that most of these parameters behave similarly in another language, Italian. For the classification of healthy and depressed speech, these parameters were used as input to the classifiers. Two classification methods were compared: a Support Vector Machine (SVM) and a two-layer feed-forward neural network (NN). No difference was found between the two methods when trained and tested on Hungarian (both the SVM and the NN reached a classification accuracy of 75%). When trained on Hungarian and tested on Italian healthy and depressed speech, both classifiers reached an accuracy of 77%.
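
As an illustration of the classifier comparison and the cross-language test, the sketch below trains an SVM and a two-layer feed-forward network on Hungarian data and tests both on Italian data, assuming scikit-learn; the feature files and the network size are hypothetical stand-ins for the paper's acoustic-phonetic parameters.

```python
# Hypothetical feature matrices: rows are recordings, columns are the
# acoustic-phonetic parameters; labels are healthy (0) / depressed (1).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

X_hu, y_hu = np.load("hu_X.npy"), np.load("hu_y.npy")   # Hungarian (hypothetical files)
X_it, y_it = np.load("it_X.npy"), np.load("it_y.npy")   # Italian

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
nn = make_pipeline(StandardScaler(),
                   MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000))

for name, clf in [("SVM", svm), ("NN", nn)]:
    clf.fit(X_hu, y_hu)                 # train on Hungarian speech
    acc = clf.score(X_it, y_it)         # test cross-language on Italian speech
    print(f"{name}: Italian test accuracy = {acc:.2f}")
```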


Proceedings of the Third COST 2102 International Training School Conference on Toward Autonomous, Adaptive, and Context-Aware Multimodal Interfaces: Theoretical and Practical Issues | 2010

Problems of the automatic emotion recognitions in spontaneous speech: an example for the recognition in a dispatcher center

Klára Vicsi; Dávid Sztahó

This paper discusses numerous difficulties in the examination of emotions occurring in continuous spontaneous speech, and then presents different emotion recognition experiments that use the clause as the recognition unit. In a testing experiment on a spontaneous speech database, we examined which acoustic features are most important for the characterization of emotions. An SVM classifier was built for the classification of the 4 most frequent emotions. It was found that fundamental frequency, energy, and their dynamics within a clause are the main characteristic parameters of the emotions, and that average spectral information, such as MFCCs and harmonicity, is also very important. In a real-life experiment, an automatic recognition system was prepared for a telecommunication call center. Summing up the results of these experiments, we can say that the clause can be an optimal unit for the recognition of emotions in continuous speech.
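
A minimal sketch of the clause-level feature set named above (fundamental frequency, energy and their dynamics, plus average spectral information), assuming librosa; clause boundaries are taken as given and the parameter values are illustrative.

```python
import numpy as np
import librosa

def clause_features(y, sr=16000):
    """Fixed-length feature vector for one clause of audio `y`."""
    f0, _, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)   # fundamental frequency
    f0 = f0[~np.isnan(f0)]                                 # voiced frames only
    rms = librosa.feature.rms(y=y)[0]                      # frame energies
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.hstack([
        [f0.mean(), f0.std(), np.ptp(f0)] if f0.size else [0.0, 0.0, 0.0],
        [rms.mean(), rms.std(), np.abs(np.diff(rms)).mean()],  # energy dynamics
        mfcc.mean(axis=1),                                     # average spectrum
    ])
```

Vectors like this would then be fed to an SVM classifier, e.g. scikit-learn's `SVC`, for the 4-class emotion decision.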


COST'10 Proceedings of the 2010 International Conference on Analysis of Verbal and Nonverbal Communication and Enactment | 2010

Automatic classification of emotions in spontaneous speech

Dávid Sztahó; Viktor Imre; Klára Vicsi

Numerous examinations related to automatic emotion recognition and speech detection are performed in the Laboratory of Speech Acoustics. This article reviews results achieved in automatic emotion recognition experiments on spontaneous speech databases on the basis of acoustic information only. Different acoustic parameters were compared for the acoustic preprocessing, and Support Vector Machines were used for the classification. In spontaneous speech, speech detection and speech segmentation are needed before automatic emotion recognition, to divide the audio material into the units of recognition. At present, the phrase was selected as the unit of segmentation. A special method was developed on the basis of Hidden Markov Models that performs speech detection and automatic phrase segmentation simultaneously. The developed method was tested on a noisy spontaneous telephone speech database. The emotion classification was performed on the detected and segmented speech.
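
The joint speech detection and segmentation can be illustrated with a two-state HMM (speech vs. non-speech) decoded over frame energies. This is a simplified stand-in for the paper's method, assuming hmmlearn and librosa; the input file is hypothetical.

```python
import numpy as np
import librosa
from hmmlearn import hmm

y, sr = librosa.load("call.wav", sr=8000)            # hypothetical telephone recording
logE = np.log(librosa.feature.rms(y=y)[0] + 1e-8).reshape(-1, 1)

vad = hmm.GaussianHMM(n_components=2, covariance_type="diag", n_iter=20)
vad.fit(logE)                                        # states settle on speech/silence
states = vad.predict(logE)                           # Viterbi path over frames

# Turn the frame-level path into (start, end) frame-index pairs; these
# segments would then go to the SVM emotion classifier.
speech = int(vad.means_.argmax())                    # the louder state is speech
edges = np.flatnonzero(np.diff(np.r_[0, states == speech, 0]))
segments = edges.reshape(-1, 2)
```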


COST'09 Proceedings of the Second International Conference on Development of Multimodal Interfaces: Active Listening and Synchrony | 2009

Subjective tests and automatic sentence modality recognition with recordings of speech impaired children

Dávid Sztahó; Katalin Nagy; Klára Vicsi

Prosody recognition experiments have been carried out in the Laboratory of Speech Acoustics in which, among other things, we investigated the possibilities of recognizing sentence modalities. Owing to our promising results in sentence modality recognition, we adapted the method to children's speech and explored how it could be used as automatic feedback in an audio-visual pronunciation teaching and training system. Our goal was to develop a sentence intonation teaching and training system for speech-impaired children, helping them learn the correct prosodic pronunciation of sentences. HMM models of the modality types were built by training the recognizer on a database of correctly speaking children. During the present work, a large database was collected from speech-impaired children. Subjective tests were carried out on this database in order to examine how well human listeners can categorize the recorded sentence modalities. Automatic sentence modality recognition experiments were then performed with the previously trained HMM models. Based on the results of the subjective tests, the acceptance probability of the sentence modality recognizer can be adjusted. Comparing the results of the subjective tests with the results of the automatic sentence modality recognition on the database of speech-impaired children shows that the automatic recognizer classified the recordings more strictly, but not worse. The introduced method could be implemented as part of a speech teaching system.
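
The adjustable acceptance mentioned above can be sketched as a simple threshold calibration: choose the score threshold whose automatic accept rate matches the accept rate observed in the subjective tests. The names and the calibration rule here are hypothetical, not the paper's procedure.

```python
import numpy as np

def calibrate_threshold(scores, human_accept_rate):
    """scores: per-utterance log-likelihoods under the target modality model.
    Returns the threshold whose accept rate equals the human rate."""
    return np.quantile(scores, 1.0 - human_accept_rate)

scores = np.load("modality_scores.npy")            # hypothetical HMM scores
thr = calibrate_threshold(scores, human_accept_rate=0.80)
accepted = scores >= thr                           # ~80% accepted by construction
```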


Computer Speech & Language | 2018

Computer based speech prosody teaching system

Dávid Sztahó; Gábor Kiss; Klára Vicsi

Children who are born with a profound hearing loss have no, or only a distorted, acoustic speech target to imitate and to compare their own production with. Computer-based visual feedback, the visual presentation of speech on screen, has been shown to be an effective supplement to incomplete or distorted auditory feedback for children with severe hearing impairment. In this paper, we introduce a novel prosody teaching system in which intensity (accent), intonation, and rhythm are presented visually to the students (in both separate and combined display modes) as visual feedback, and automatic assessment scores are given jointly and separately for the goodness of intonation and rhythm. The automatic assessment was evaluated in cooperation with experts in the treatment of hard-of-hearing children. The results showed that the automatic assessment scores correspond to the subjective evaluations given by the teachers. The whole system was evaluated in a school for hard-of-hearing children by comparing the development of a group of students using our prosody teaching system with that of a control group. The speaking ability of the students was compared in a subjective listening experiment after a 3-month teaching course. The students who used the computer-based prosody teaching software produced better prosody than the students in the control group.
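
A minimal sketch of the visual-feedback and scoring idea, assuming librosa and matplotlib: plot the student's intonation contour against a teacher reference and score their similarity after DTW alignment. The correlation-based score is illustrative, not the system's actual assessment rule.

```python
import numpy as np
import librosa
import matplotlib.pyplot as plt

def f0_contour(path, sr=16000):
    y, sr = librosa.load(path, sr=sr)
    f0, _, _ = librosa.pyin(y, fmin=60, fmax=500, sr=sr)
    return np.nan_to_num(f0)                       # unvoiced frames as 0

ref, stu = f0_contour("teacher.wav"), f0_contour("student.wav")  # hypothetical files

# Align the two contours with DTW, then correlate the aligned values.
_, wp = librosa.sequence.dtw(ref.reshape(1, -1), stu.reshape(1, -1))
wp = wp[::-1]                                      # the path comes back reversed
score = np.corrcoef(ref[wp[:, 0]], stu[wp[:, 1]])[0, 1]

plt.plot(ref, label="teacher"); plt.plot(stu, label="student")
plt.title(f"intonation similarity = {score:.2f}"); plt.legend(); plt.show()
```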


International Conference on Statistical Language and Speech Processing | 2016

Estimating the Severity of Parkinson’s Disease Using Voiced Ratio and Nonlinear Parameters

Dávid Sztahó; Klára Vicsi

A Parkinson’s disease severity estimation analysis was carried out using a speech database of Spanish speakers. Correlations were measured between acoustic features and UPDRS severity. The acoustic features applied were the following: the voicing ratio (VR), nonlinear recurrence, i.e. the normalized recurrence probability density entropy (\(H_{norm}\)), and fractal scaling, i.e. the scaling exponent (\(\alpha\)). High diversity was found according to the type of speech sound production, and hence according to the text and the gender of the speaker. Based on the results of the correlation calculations, the UPDRS values were predicted by regression using neural networks. The results showed that the applied features are capable of estimating the severity of PD. By assigning each speaker the mean predicted UPDRS over the best-correlated linguistic contents, the result of the Interspeech 2015 Sub-challenge winner was exceeded. Training NN models separately for males and females increased the accuracy further.
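
Assuming the three features have already been extracted per recording, the regression and per-speaker aggregation steps can be sketched with scikit-learn as follows; the file name, column names, and network size are hypothetical.

```python
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

# Hypothetical table: one row per recording with speaker id, VR, H_norm,
# alpha, and the clinical UPDRS score.
df = pd.read_csv("pd_features.csv")
X, y = df[["VR", "H_norm", "alpha"]].values, df["UPDRS"].values

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000))
model.fit(X, y)

# Assign each speaker the mean of the predicted UPDRS values, as in the paper.
df["pred"] = model.predict(X)
per_speaker = df.groupby("speaker")["pred"].mean()
```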


Intelligent Decision Technologies | 2014

Speech activity detection and automatic prosodic processing unit segmentation for emotion recognition

Dávid Sztahó; Klára Vicsi

In speech communication, emotions play a great role in conveying information. These emotions arise partly as reactions to our environment and to our partners during a conversation. Understanding these reactions and recognizing them automatically is highly important: through them, we can get a clearer picture of our conversation partner's response. In cognitive infocommunications, this kind of information helps us develop robots and devices that are more aware of the user's needs, making them easy and enjoyable to use. In our laboratory, we conducted automatic emotion classification and speech segmentation experiments. In order to develop an automatic emotion recognition system based on speech, an automatic speech segmenter is also needed to separate the speech segments required for the emotion analysis. In our earlier research, we found that the intonational phrase can be a proper unit of emotion analysis. In this paper, speech detection and segmentation methods are developed. For speech detection, Hidden Markov Models are used with various noise and speech acoustic models. The results show that the procedure is able to detect speech in the sound signal with more than 91% accuracy and segment it into intonational phrases.
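
Once frame-level speech/non-speech decisions are available, turning them into intonational-phrase units can be sketched as simple post-processing with minimum-duration constraints; the thresholds below are illustrative, not the paper's values.

```python
import numpy as np

def frames_to_phrases(labels, hop_s=0.01, min_speech_s=0.25, min_pause_s=0.20):
    """labels: 0/1 array of per-frame speech decisions (e.g. from an HMM).
    Returns (start, end) times of intonational-phrase candidates."""
    edges = np.flatnonzero(np.diff(np.r_[0, labels, 0]))
    phrases = []
    for s, e in edges.reshape(-1, 2):
        start, end = s * hop_s, e * hop_s
        if phrases and start - phrases[-1][1] < min_pause_s:
            phrases[-1] = (phrases[-1][0], end)    # bridge short pauses
        elif end - start >= min_speech_s:
            phrases.append((start, end))           # drop short blips
    return phrases
```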


Cross-Modal Analysis of Speech, Gestures, Gaze and Facial Expressions | 2009

Automatic Sentence Modality Recognition in Children's Speech, and Its Usage Potential in the Speech Therapy

Dávid Sztahó; Katalin Nagy; Klára Vicsi

In the Laboratory of Speech Acoustics, prosody recognition experiments have been carried out in which, among other things, we investigated the possibilities of recognizing sentence modalities. Owing to our promising results in sentence modality recognition, we adapted the method to children's speech and explored how it could be used as automatic feedback in an audio-visual pronunciation teaching and training system. Our goal was to develop a sentence intonation teaching and training system for speech-impaired children, helping them learn the correct prosodic pronunciation of sentences. In the experiment, basic sentence modality models were developed and used. For the training of these models, we recorded a speech prosody database with correctly speaking children, processed and segmented according to the modality types. For this database, 59 children read a text of one-word sentences and simple and complex sentences. HMM models of the modality types were built by training the recognizer on this database of correctly speaking children. The result of the children's sentence modality recognition was not adequate for the purpose of automatic feedback in pronunciation training, so another way of classification was prepared. This time, the children's recordings were sorted rigorously by the type of the intonation curves of the sentences, which in many cases differed from the sentence modality classes. Further tests were carried out with the new classes. The trained HMM models were used not for recognizing the modality of sentences, but for checking the correctness of the intonation of sentences pronounced by speech-impaired children. For this purpose, an initial database consisting of recordings of the voices of two speech-impaired children was prepared, similar to the database of healthy children.
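
The intonation-curve check can be caricatured by a much simpler rule than the paper's HMM-based approach: estimate the F0 slope over the sentence-final region and label the curve rising or falling. This is a hypothetical stand-in, assuming librosa.

```python
import numpy as np
import librosa

def final_f0_slope(path, sr=16000, tail=0.4):
    """Linear F0 slope over the last `tail` fraction of voiced frames."""
    y, sr = librosa.load(path, sr=sr)
    f0, _, _ = librosa.pyin(y, fmin=80, fmax=500, sr=sr)
    f0 = f0[~np.isnan(f0)]
    if f0.size < 2:
        return 0.0                                  # no usable voicing
    n = max(2, int(f0.size * tail))
    return np.polyfit(np.arange(n), f0[-n:], 1)[0]  # Hz per frame

curve = "rising" if final_f0_slope("child.wav") > 0 else "falling"
```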


IEEE International Conference on Cognitive Infocommunications | 2013

Language independent automatic speech segmentation into phoneme-like units on the base of acoustic distinctive features

Gábor Kiss; Dávid Sztahó; Klára Vicsi

Collaboration

Dive into Dávid Sztahó's collaboration.

Top Co-Authors

Klára Vicsi (Budapest University of Technology and Economics)
Gábor Kiss (Budapest University of Technology and Economics)
Katalin Nagy (Budapest University of Technology and Economics)
Miklos Gabriel Tulics (Budapest University of Technology and Economics)
György Szaszák (Budapest University of Technology and Economics)
Szabolcs Levente Tóth (Budapest University of Technology and Economics)
Máté Ákos Tündik (Budapest University of Technology and Economics)
Viktor Imre (Budapest University of Technology and Economics)
Anna Esposito (Seconda Università degli Studi di Napoli)