Publication


Featured research published by András Beke.


Journal of Language Modelling | 2012

Exploiting Prosody for Syntactic Analysis in Automatic Speech Understanding

György Szaszák; András Beke

The relation between syntax and prosody is evident, even if the prosodic structure cannot be directly mapped to the syntactic one and vice versa. Syntax-to-prosody mapping is widely used in text-to-speech applications, but prosody-to-syntax mapping is mostly missing from automatic speech recognition/understanding systems. This paper presents an experiment towards filling this gap and evaluating whether an HMM-based automatic prosodic segmentation tool can be used to support the reconstruction of the syntactic structure directly from speech. Results show that up to 85% of syntactic clause boundaries and up to about 70% of embedded syntactic phrase boundaries could be identified based on the detection of phonological phrases. Recall rates do not depend further on syntactic layering, in other words, on whether the phrase is multiply embedded or not. Clause boundaries can be well assigned to the intonational phrase level in read speech and can be well separated from lower-level syntactic phrases based on the type of the aligned phonological phrase(s). These findings can be exploited in speech understanding systems, allowing for the recovery of the skeleton of the syntactic structure, based purely on the speech signal.


Text, Speech and Dialogue | 2012

Unsupervised Clustering of Prosodic Patterns in Spontaneous Speech

András Beke; György Szaszák

Dealing with spontaneous speech poses a major challenge for both linguists and speech technology engineers. For read speech, prosody was assessed in earlier experiments through automatic decomposition into phonological phrases using a supervised method (HMM). However, when trying to adapt this automatic approach to spontaneous speech, the clustering of phonological phrase types becomes problematic: it is unknown which types are characteristic and hence worth modelling. The authors decided to apply a more flexible, unsupervised learning approach to cluster the data in order to evaluate and analyse whether some typical “spontaneous” patterns become selectable in spontaneous speech based on this automatic approach. This paper presents a method for clustering the typical prosody patterns of spontaneous speech based on k-means clustering.
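
As a rough illustration of the k-means step mentioned in this abstract, the following Python sketch clusters short pitch contours. The three-point normalized F0 vectors, the choice of k=2, and the function name kmeans are assumptions made for illustration only; the features and number of clusters actually used in the paper are not stated here.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on equal-length feature vectors."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical normalized F0 contours (start, middle, end of a phrase).
contours = [
    (0.0, 0.5, 1.0), (0.1, 0.5, 0.9), (0.0, 0.4, 1.0),   # rising shapes
    (1.0, 0.5, 0.0), (0.9, 0.5, 0.1), (1.0, 0.6, 0.0),   # falling shapes
]
labels, centroids = kmeans(contours, k=2)
print(labels)
```

In this toy setting the rising and falling contours end up in separate clusters; with real spontaneous speech the number of clusters would itself be a matter for investigation.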


Text, Speech and Dialogue | 2013

Automatic Laughter Detection in Spontaneous Speech Using GMM–SVM Method

Tilda Neuberger; András Beke

Spontaneous conversations frequently contain various non-verbal vocalizations (such as laughter). The accuracy of a speech recognizer may decrease in the case of spontaneous speech because of these non-verbal vocalization phenomena. The aim of the present research is to develop an accurate and efficient method in order to recognize laughter in spontaneous utterances. We used GMM in modeling the data and SVM for differentiating laughter from other speech events. The training and testing of the laughter detector were carried out using the BEA Hungarian spoken language database. The results show that the GMM–SVM system seems to be a particularly good method for solving this problem.


Text, Speech and Dialogue | 2014

Development of a Large Spontaneous Speech Database of Agglutinative Hungarian Language

Tilda Neuberger; Dorottya Gyarmathy; Tekla Etelka Gráczi; Viktória Horváth; Mária Gósy; András Beke

In this paper, a large Hungarian spoken language database is introduced. This phonetically based multi-purpose database contains various types of spontaneous and read speech from 333 monolingual speakers (about 50 minutes of speech per speaker). This study presents the background and motivation of the development of the BEA Hungarian database, describes its protocol and the transcription procedure, and also presents existing and proposed research using this database. Owing to its recording protocol and transcription, it provides challenging material for various comparisons of segmental structures of speech, also across languages.


International Conference on Speech and Computer | 2016

Automatic Summarization of Highly Spontaneous Speech

András Beke; György Szaszák

This paper addresses speech summarization of highly spontaneous speech. Speech is converted into text using an ASR, then segmented into tokens. Human-made and automatic, prosody-based tokenization are compared. The obtained sentence-like units are analysed by a syntactic parser to help automatic sentence selection for the summary. The preprocessed sentences are ranked based on thematic terms and sentence position. The thematic term is expressed in two ways: TF-IDF and Latent Semantic Indexing. The sentence score is calculated as a linear combination of the thematic term score and a sentence position score. To generate the summary, the top 10 candidates for the most informative/best summarizing sentences are selected. The system performance showed comparable results (recall: 0.62, precision: 0.79 and F-measure: 0.68) with the prosody-based tokenization approach. A subjective test is also carried out on a Likert scale.
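
The ranking step described in this abstract can be sketched in Python roughly as follows. Only the TF-IDF variant is shown. The weight alpha, the positional score (earlier sentences score higher), the normalization by the maximum thematic score, and the toy sentences are all assumptions made for illustration; the abstract does not give the paper's actual weights or formulas.

```python
import math
from collections import Counter

def tfidf_scores(sentences):
    """Sum of TF-IDF weights per tokenized sentence, treating each
    sentence as one document (an assumption of this sketch)."""
    n = len(sentences)
    doc_freq = Counter(term for sent in sentences for term in set(sent))
    scores = []
    for sent in sentences:
        tf = Counter(sent)
        scores.append(sum(
            (count / len(sent)) * math.log(n / doc_freq[term])
            for term, count in tf.items()
        ))
    return scores

def rank_sentences(sentences, alpha=0.7, top_n=10):
    """Return indices of the top_n sentences, in original order.

    score = alpha * thematic + (1 - alpha) * position
    """
    n = len(sentences)
    thematic = tfidf_scores(sentences)
    peak = max(thematic) or 1.0  # avoid division by zero if all scores are 0
    combined = [
        alpha * (t / peak) + (1 - alpha) * (1 - i / n)  # hypothetical position score
        for i, t in enumerate(thematic)
    ]
    best = sorted(range(n), key=lambda i: combined[i], reverse=True)[:top_n]
    return sorted(best)

# Hypothetical, already tokenized sentence-like units.
units = [
    ["the", "meeting", "covered", "the", "budget"],
    ["well", "you", "know"],
    ["the", "budget", "was", "cut", "by", "ten", "percent"],
    ["so", "yeah"],
]
print(rank_sentences(units, top_n=2))
```

Setting alpha to 0 reduces the ranking to sentence position alone, which is a convenient sanity check on the combination.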


Intelligent Decision Technologies | 2014

Phonetic analysis and automatic prediction of vowel duration in Hungarian spontaneous speech

András Beke; Mária Gósy

A large number of phonetics and phonology research papers have analyzed segmental durations, focusing on the factors and interactions that determine them. The results often play an important role in Language Technology applications, for example in TTS (text-to-speech synthesis) and ASR (automatic speech recognition), and are widely used in infocommunication. Speech sound duration depends on various factors such as phonetic quality, phonological context, phonological position in the word or in the utterance, speech style, etc. The multifactorial dependence of vowel duration is more complex in languages where vowel length is a distinctive feature, as in Hungarian. The main goal of the present research was to analyze the physical durations of pairs of vowels in spontaneous speech that exhibit a phonological length opposition. In addition, we intended to develop an algorithm for automatic classification of the short and long vowels occurring in spontaneous speech. On the basis of these findings we intended to predict the vowel durations automatically using three different methods. The measured data confirmed our hypothesis that phonologically short vs. long vowels would significantly differ in their physical durations in spontaneous speech. The results of the automatic vowel length classification also supported this finding. The third aspect of our investigations was to use different supervised learning methods in order to predict vowel duration, based on different feature vectors consisting of characteristic and spectral features. The best result was obtained when the combined features and an FFNN were used. The correlation between the target and the predicted vowel duration was 0.79 while RMSE was 25 ms. The results obtained support the complexity of features affecting vowel duration, on the one hand, and indicate the temporal complexity of segments in spontaneous speech, as has been reported for Lithuanian, Czech, Hindi, Telugu and Korean, on the other hand.
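
The two evaluation figures quoted in this abstract, the correlation between target and predicted durations and the RMSE, can be computed as in the Python sketch below. It assumes the correlation is Pearson's r, which the abstract does not state explicitly, and the duration values are hypothetical numbers chosen for illustration, not data from the paper.

```python
import math

def rmse(targets, predictions):
    """Root mean squared error, in the same unit as the inputs (here ms)."""
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical measured and predicted vowel durations in milliseconds.
measured = [62.0, 110.0, 75.0, 140.0, 90.0]
predicted = [70.0, 95.0, 80.0, 120.0, 100.0]

print(f"r = {pearson_r(measured, predicted):.2f}")
print(f"RMSE = {rmse(measured, predicted):.1f} ms")
```

The two measures are complementary: the correlation captures how well the predictions track relative differences, while RMSE reports the typical size of the error in milliseconds.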


Clinical Linguistics & Phonetics | 2018

Dichotic listening and sentence repetition performance in children with reading difficulties

Mária Gósy; Ruth Huntley Bahr; Dorottya Gyarmathy; András Beke

Numerous investigations have identified weaknesses in speech processing and language skills in children with dyslexia; however, little is known about these abilities in children with reading difficulties (RD). The primary objective of this investigation was to determine the utility of auditory speech processing tasks in differentiating children with RD from those with typical reading skills. It was hypothesized that children who perform below grade level in reading would also show poorer performance on both dichotic listening and sentence repetition tasks because of the reciprocal influences of deficient auditory speech processing and language abilities. A total of 180 Hungarian-speaking, monolingual 8-, 9- and 10-year-old children, with and without RD, participated in dichotic listening and sentence repetition (modified by noise and morphosyntactic complexity) tasks. Performances were compared across ability groups, age and gender. Children with RD evidenced significantly poorer performance than controls on both tasks. Effects for age and gender were more noticeable in students with RD. Our findings support the notion that reading deficiencies are also associated with poor auditory speech processing and language abilities in cases where dyslexia is not diagnosed. We suggest that these tasks may be used as easy and fast screening tests in the identification of RD.


International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management | 2016

Summarization of Spontaneous Speech using Automatic Speech Recognition and a Speech Prosody based Tokenizer

György Szaszák; Máté Ákos Tündik; András Beke

This paper addresses speech summarization of highly spontaneous speech. The audio signal is transcribed using an Automatic Speech Recognizer, which operates at relatively high word error rates due to the complexity of the recognition task and the high spontaneity of the speech. An analysis is carried out to assess the propagation of speech recognition errors into syntactic parsing. We also propose an automatic, speech-prosody-based audio tokenization approach and compare it to human performance. The sentence-like tokens so obtained are analysed by the syntactic parser to help ranking based on thematic terms and sentence position. The thematic term is expressed in two ways: TF-IDF and Latent Semantic Indexing. The sentence scores are calculated as a linear combination of the thematic term score and a positional score. The summary is generated from the top 10 candidates. Results show that prosody-based tokenization reaches average human performance and that speech recognition errors propagate moderately into syntactic parsing (POS tagging and dependency parsing). Nouns prove to be quite error resistant. Audio summarization shows 0.62 recall and 0.79 precision, with an F-measure of 0.68, compared to the human reference. A subjective test is also carried out on a Likert scale. All results apply to spontaneous Hungarian.


Text, Speech and Dialogue | 2015

Toward Exploring the Role of Disfluencies from an Acoustic Point of View: A New Aspect of Discontinuous Speech Prosody Modelling

György Szaszák; András Beke

Several studies use idealized, fluent utterances to comprehend spoken language. Disfluencies are often regarded as mere noise in the speech flow. Other works argue that fragmented structures, disfluencies, and silent and filled pauses are important and can aid understanding. By extending the original concept of speech disfluency, the current paper involves the acoustic level and places the discontinuity of F0 in parallel with speech disfluencies. An exhaustive analysis of the advantages and disadvantages of using a continuous F0 estimate in prosodic event detection tasks is performed for formal and informal speaking styles. Results suggest that, unlike in read formal speech, using a continuous, overall interpolated F0 curve is counterproductive in spontaneous informal speech. Comparing the behaviour of speech disfluencies and the effect of discontinuity of the F0 contour, the results raise more general considerations of modelling philosophy: they suggest that disfluencies in informal speech may themselves be informative entities, reflected also in the acoustic-level organization of speech, and hence that disfluencies in general are an important perceptual cue in human speech understanding.


Language and Technology Conference | 2013

Boundary Markers in Spontaneous Hungarian Speech

András Beke; Mária Gósy; Viktória Horváth

The aim of this paper is an objective presentation of temporal features of spontaneous Hungarian narratives, as well as a characterization of separable portions of spontaneous speech. Ten speakers’ spontaneous speech materials taken from the BEA Hungarian Spontaneous Speech Database were analyzed in terms of hierarchical units of narratives (durations, speakers’ rates of articulation, number of words produced, and the interrelationships of all these). We conclude that (i) the majority of speakers organize their narratives in similar temporal structures, (ii) thematic units can be identified in terms of certain prosodic criteria, (iii) there are statistically valid correlations between factors like the duration of phrases, the word count of phrases, the rate of articulation of phrases, and pausing characteristics, and (iv) these parameters exhibit extensive variability both across and within speakers.

Collaboration


Dive into András Beke's collaborations.

Top Co-Authors

György Szaszák
Budapest University of Technology and Economics

Mária Gósy
Hungarian Academy of Sciences

Viktória Horváth
Hungarian Academy of Sciences

Tilda Neuberger
Hungarian Academy of Sciences

Dorottya Gyarmathy
Hungarian Academy of Sciences

Tekla Etelka Gráczi
Hungarian Academy of Sciences

Máté Ákos Tündik
Budapest University of Technology and Economics

Tamás Gábor Csapó
Budapest University of Technology and Economics

Viola Váradi
Eötvös Loránd University