Network


Latest external collaborations at the country level.

Hotspot


Research topics in which Dmytro Prylipko is active.

Publication


Featured research published by Dmytro Prylipko.


Computer Speech & Language | 2014

Modeling phonetic pattern variability in favor of the creation of robust emotion classifiers for real-life applications

Bogdan Vlasenko; Dmytro Prylipko; Ronald Böck; Andreas Wendemuth

The role of automatic emotion recognition from speech is growing continuously because of the accepted importance of reacting to the emotional state of the user in human-computer interaction. Most state-of-the-art emotion recognition methods are based on turn- and frame-level analysis independent of phonetic transcription. Here, we are interested in a phoneme-based classification of the level of arousal in acted and spontaneous emotions. To start, we show that our previously published classification technique, which achieved high-level results in the Interspeech 2009 Emotion Challenge, cannot provide sufficiently good classification in cross-corpora evaluation (a condition close to real-life applications). To prove the robustness of our emotion classification techniques we use cross-corpora evaluation for a simplified two-class problem, namely high- and low-arousal emotions, and define emotion classes at the phoneme level. We build our speaker-independent emotion classifier with HMMs, using GMM-based production probabilities and MFCC features. This classifier performs equally well with the complete phoneme set as with a reduced set of indicative vowels (7 out of 39 phonemes in the German SAMPA list). Afterwards we compare the emotion classification performance of the technique used in the Emotion Challenge with phoneme-based classification within the same experimental setup. With phoneme-level emotion classes we increase cross-corpora classification performance by about 3.15% absolute (4.69% relative) for models trained on acted emotions (EMO-DB dataset) and evaluated on spontaneous emotions (VAM dataset); in the reverse conditions (trained on VAM, tested on EMO-DB) we obtain a 15.43% absolute (23.20% relative) improvement. We show that phoneme-level emotion classes can improve classification performance even with the comparably low speech recognition performance obtained with scant a priori knowledge about the language, implemented as a zero-gram for word-level modeling and a bigram for phoneme-level modeling. Finally, we compare our results with the state-of-the-art cross-corpora evaluations on the VAM database. For training our models, we use an almost 15 times smaller training set, consisting of 456 utterances (210 low- and 246 high-arousal emotions) instead of 6820 utterances (4685 high- and 2135 low-arousal emotions). We are nevertheless able to increase cross-corpora classification performance by about 2.25% absolute (3.22% relative), from UA = 69.7% obtained by Zhang et al. to UA = 71.95%.
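To make the pipeline concrete, the sketch below shows a phoneme-level high/low-arousal classifier built from GMM-HMMs on MFCC features, in the spirit of the approach described above. It uses hmmlearn and librosa rather than the authors' original toolchain, and the segment file names are hypothetical placeholders.

```python
# Minimal sketch of phoneme-level high/low-arousal classification with GMM-HMMs.
# Assumes hmmlearn and librosa; this is NOT the authors' original pipeline, and
# the segment lists below are hypothetical placeholders.
import numpy as np
import librosa
from hmmlearn.hmm import GMMHMM

def mfcc_features(wav_path, sr=16000, n_mfcc=13):
    """Frame-level MFCCs, shape (n_frames, n_mfcc)."""
    y, sr = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_arousal_model(segments, n_states=3, n_mix=8):
    """Fit one GMM-HMM per arousal class on concatenated phoneme segments."""
    feats = [mfcc_features(p) for p in segments]          # list of (T_i, 13)
    X = np.vstack(feats)
    lengths = [f.shape[0] for f in feats]
    model = GMMHMM(n_components=n_states, n_mix=n_mix,
                   covariance_type="diag", n_iter=20)
    model.fit(X, lengths)
    return model

# Hypothetical lists of phoneme-segment wav files per arousal class.
high_model = train_arousal_model(["seg_high_001.wav", "seg_high_002.wav"])
low_model  = train_arousal_model(["seg_low_001.wav", "seg_low_002.wav"])

def classify(segment_path):
    """Compare per-class log-likelihoods of a test phoneme segment."""
    x = mfcc_features(segment_path)
    return "high" if high_model.score(x) > low_model.score(x) else "low"
```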


International Conference on Multimedia and Expo | 2011

Vowels formants analysis allows straightforward detection of high arousal emotions

Bogdan Vlasenko; David Philippou-Hübner; Dmytro Prylipko; Ronald Böck; Ingo Siegert; Andreas Wendemuth

Recently, automatic emotion recognition from speech has attracted growing interest within the human-machine interaction research community. Most emotion recognition methods use context-independent frame-level or turn-level analysis. In this article, we introduce context-dependent vowel-level analysis applied to emotion classification. The average first-formant value extracted at the vowel level is used as a one-dimensional acoustic feature, and the Neyman-Pearson criterion is used for classification. Our classifier is able to detect high-arousal emotions with small error rates. Within our research we show that the smallest emotional unit should be the vowel rather than the word, and we find that vowel-level analysis can play an important role in developing a robust emotion classifier. Our results can also be useful for developing robust affective speech recognition methods and high-quality emotional speech synthesis systems.
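As a rough illustration of a Neyman-Pearson style detector on vowel-level mean first-formant (F1) values, consider the sketch below; the formant values are invented and the paper does not prescribe this exact implementation.

```python
# Sketch of a Neyman-Pearson style detector on vowel-level mean first-formant (F1)
# values. The data arrays are hypothetical; formant extraction is assumed to have
# happened upstream.
import numpy as np

def np_threshold(f1_low_arousal, false_alarm_rate=0.05):
    """Smallest threshold whose false-alarm probability on the low-arousal
    (null) class does not exceed the given rate."""
    return np.quantile(f1_low_arousal, 1.0 - false_alarm_rate)

def detect_high_arousal(f1_values, threshold):
    """Flag vowels whose average F1 exceeds the threshold as high arousal."""
    return np.asarray(f1_values) > threshold

# Hypothetical mean-F1 values (Hz) per vowel instance.
f1_low  = np.array([420., 450., 480., 510., 430.])
f1_high = np.array([610., 650., 700., 580.])
thr = np_threshold(f1_low, false_alarm_rate=0.1)
print(detect_high_arousal(f1_high, thr))
```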


Journal on Multimodal User Interfaces | 2014

Analysis of significant dialog events in realistic human–computer interaction

Dmytro Prylipko; Dietmar F. Rösner; Ingo Siegert; Stephan Günther; Rafael Friesen; Matthias Haase; Bogdan Vlasenko; Andreas Wendemuth

This paper addresses issues of automatically detecting significant dialog events (SDEs) in naturalistic HCI, and of deducing trait-specific conclusions relevant for the design of spoken dialog systems. We perform our investigations on the multimodal LAST MINUTE corpus with recordings from naturalistic interactions. First, we use textual transcripts to analyse interaction styles and discourse structures. We find indications that younger subjects prefer a more technical style in communication with dialog systems. Next, we model the subject's internal success state with a hidden Markov model trained on the observed sequences of system feedback. This reveals that younger subjects interact significantly more successfully with technical systems. Aiming at the automatic detection of specific subjects' reactions, we then semi-automatically annotate SDEs, i.e. phrases indicating irregular, not task-oriented subject behavior. We use both acoustic and linguistic features to build several trait-specific classifiers for dialog phases, which show pronouncedly different accuracies for different age and gender groups. The presented investigations coherently support the age-dependence of both expressiveness and problem-solving ability. This in turn induces design rules for future automatic designated "companion" systems.
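A minimal sketch of the success-state idea, assuming a discrete HMM over coded system feedback (hmmlearn's CategoricalHMM); the feedback alphabet and example sequences are hypothetical and not taken from the LAST MINUTE corpus.

```python
# Rough sketch of modelling a subject's latent success state with a discrete HMM
# over observed system feedback. Uses hmmlearn's CategoricalHMM (recent releases);
# the feedback coding and example sequences are made up for illustration.
import numpy as np
from hmmlearn.hmm import CategoricalHMM

# Hypothetical feedback alphabet: 0 = positive, 1 = neutral, 2 = negative.
sequences = [
    np.array([0, 0, 1, 0, 2, 0]),
    np.array([2, 2, 1, 2, 2]),
]
X = np.concatenate(sequences).reshape(-1, 1)
lengths = [len(s) for s in sequences]

# Two hidden states, intended to capture "successful" vs. "struggling".
hmm = CategoricalHMM(n_components=2, n_iter=50, random_state=0)
hmm.fit(X, lengths)

# Decode the most likely internal-state sequence for one interaction.
states = hmm.predict(sequences[0].reshape(-1, 1))
print(states)
```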


Text, Speech and Dialogue | 2011

Zanzibar OpenIVR: an open-source framework for development of spoken dialog systems

Dmytro Prylipko; Dirk Schnelle-Walka; Spencer Lord; Andreas Wendemuth

The maturity of standards and the availability of open-source components for all levels of the MRCP stack provide new opportunities for the development of spoken dialog technology. In this paper a standards-based, modular architecture for interactive voice response (IVR) systems is presented together with its implementation, Zanzibar OpenIVR. The architecture, described in terms of components and standards, is compared to other existing frameworks. The usage of our framework is discussed with regard to different aspects of spoken dialog technology such as speech recognition and synthesis, integration of the components, dialog management, and natural language understanding. It is designed to work over VoIP as well as over conventional telephony channels, and thus also provides web-based access. Zanzibar OpenIVR can serve as a starting point for building dialog systems and for research in voice-enabled technologies.


Italian Workshop on Neural Nets | 2014

Investigating the Form-Function-Relation of the Discourse Particle “hm” in a Naturalistic Human-Computer Interaction

Ingo Siegert; Dmytro Prylipko; Kim Hartmann; Ronald Böck; Andreas Wendemuth

For successful speech-controlled human-computer interaction (HCI), the pure textual information as well as the individual skills, preferences, and affective states of the user have to be known. However, verbal human interaction consists of several information layers. Apart from the pure textual information, further details regarding the speaker's feelings, beliefs, and social relations are transmitted. This additional information is encoded in the acoustics. In particular, the intonation reveals details about the speaker's communicative relation and their attitude towards the ongoing dialogue.


International Conference on Human-Computer Interaction | 2014

Discourse Particles and User Characteristics in Naturalistic Human-Computer Interaction

Ingo Siegert; Matthias Haase; Dmytro Prylipko; Andreas Wendemuth

In human-human interaction (HHI) the behaviour of the speaker is characterised, among other things, by semantic and prosodic cues. These short feedback signals minimally communicate certain dialogue functions such as attention, understanding, or other attitudinal reactions. Human-computer interaction (HCI) systems have so far failed to note and respond to these details, leaving users to cope with and adapt to the machine's behaviour. In order to enhance HCI, adaptation to the user's behaviour and individual skills, together with the integration of a general understanding of human behaviour, is indispensable. A further question is whether the usage of feedback signals is influenced by the user's individuality. In this paper, we investigate the relation of specific feedback signals, known as discourse particles (DPs), to communication style and psychological characteristics within a naturalistic HCI. This investigation shows a significant difference in DP usage between users with certain characteristics.
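As an illustration only, a chi-square test of the kind one might use to check for group differences in DP usage is sketched below; the counts are invented and the paper does not state that exactly this test was applied.

```python
# Illustrative significance test (chi-square) for differences in discourse-particle
# usage between user groups. The counts are made up; the paper's actual statistical
# procedure may differ.
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = user group (younger, older),
# columns = turns with / without a discourse particle.
table = [[35, 165],
         [78, 122]]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.4f}")
```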


International Conference on Acoustics, Speech, and Signal Processing | 2012

Fine-tuning HMMs for nonverbal vocalizations in spontaneous speech: A multicorpus perspective

Dmytro Prylipko; Björn W. Schuller; Andreas Wendemuth

Phenomena like filled pauses, laughter, breathing, and hesitation play a significant role in everyday human-to-human conversation and have a significant influence on speech recognition accuracy [1]. Because of their nature (e.g. long duration), they should be modeled with different numbers of emitting states and Gaussian mixtures. In this paper we address this question and try to determine the most suitable method for finding these parameters: we examine two methods for optimizing hidden Markov model (HMM) configurations for better classification and recognition of nonverbal vocalizations within speech. Experiments were conducted on three conversational databases: TUM AVIC, Verbmobil, and SmartKom. These experiments show that with HMM configurations tailored to a particular database we can achieve a 1-3% improvement in speech recognition accuracy compared to a baseline topology. An in-depth analysis of the discussed methods is provided.
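The kind of topology search discussed here can be sketched as a simple grid search over the number of emitting states and Gaussian mixtures, scored by held-out likelihood; hmmlearn stands in for the original toolkit and the data handling is assumed.

```python
# Sketch of a topology search: choosing the number of emitting states and Gaussian
# mixtures for a nonverbal-vocalization model by held-out likelihood. hmmlearn is
# used as a stand-in; feature loading is assumed to happen elsewhere.
import numpy as np
from hmmlearn.hmm import GMMHMM

def best_topology(train_feats, dev_feats, states=(1, 3, 5), mixtures=(1, 2, 4, 8)):
    """Return the (n_states, n_mix) pair with the highest dev-set log-likelihood."""
    X_train = np.vstack(train_feats)
    train_lengths = [f.shape[0] for f in train_feats]
    X_dev = np.vstack(dev_feats)
    dev_lengths = [f.shape[0] for f in dev_feats]

    best, best_ll = None, -np.inf
    for n_states in states:
        for n_mix in mixtures:
            model = GMMHMM(n_components=n_states, n_mix=n_mix,
                           covariance_type="diag", n_iter=15)
            model.fit(X_train, train_lengths)
            ll = model.score(X_dev, dev_lengths)
            if ll > best_ll:
                best, best_ll = (n_states, n_mix), ll
    return best
```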


Text, Speech and Dialogue | 2012

Language Modeling of Nonverbal Vocalizations in Spontaneous Speech

Dmytro Prylipko; Bogdan Vlasenko; Andreas Stolcke; Andreas Wendemuth

Nonverbal vocalizations are one of the characteristics of spontaneous speech distinguishing it from written text. These phenomena are sometimes regarded as a problem in language and acoustic modeling. However, vocalizations such as filled pauses enhance language models at the local level and serve additional functions (marking linguistic boundaries, signaling hesitation). In this paper we consider a wider range of nonverbals, investigate their potential for language modeling of conversational speech, and compare different modeling approaches. We find that all nonverbal sounds, with the exception of breath, have little effect on the overall results. Due to its specific nature, as well as its frequency in the data, modeling breath as a regular language model event leads to a substantial improvement in both perplexity and speech recognition accuracy.
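To illustrate the modelling choice of treating breath as a regular language-model event, the toy add-one-smoothed bigram model below includes a "<breath>" token alongside ordinary words; the corpus and token name are assumptions, not the paper's setup.

```python
# Minimal add-one-smoothed bigram LM in which the breath noise is a regular token.
# Toy corpus and "<breath>" token name are assumptions for illustration only.
from collections import Counter
import math

corpus = [
    ["i", "<breath>", "think", "so"],
    ["well", "<breath>", "maybe", "not"],
]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    toks = ["<s>"] + sent + ["</s>"]
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))
V = len(unigrams)

def bigram_prob(w1, w2):
    """Add-one smoothed P(w2 | w1)."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

def perplexity(sentence):
    """Per-token perplexity of a sentence under the bigram model."""
    toks = ["<s>"] + sentence + ["</s>"]
    log_p = sum(math.log(bigram_prob(a, b)) for a, b in zip(toks, toks[1:]))
    return math.exp(-log_p / (len(toks) - 1))

print(perplexity(["i", "<breath>", "think", "not"]))
```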


Archive | 2015

Emotion and Disposition Detection in Medical Machines: Chances and Challenges

Kim Hartmann; Ingo Siegert; Dmytro Prylipko

Machines designed for medical applications beyond the usual data acquisition and processing need to cooperate with and adapt to humans in order to fulfill their supportive tasks. Technically, such medical machines are therefore considered affective systems, capable of detecting, assessing and adapting to emotional states and dispositional changes in users. One of the upcoming applications of affective systems is their use as supportive machines in the diagnosis and therapy of psychiatric disorders. These machines have the additional requirement of being capable of controlling persuasive dialogues in order to obtain relevant patient data despite disadvantageous set-ups. These automated abilities of technical systems, combined with enhanced processing, storage and observational capabilities, raise both chances and challenges in medical applications. We focus on analyzing the objectivity, reliability and validity of current techniques used to determine the emotional states of speakers from speech, and the implications that arise. We discuss the underlying technical and psychological models and analyze recent machine assessment results of emotional states obtained through dialogues. Finally, we discuss the involvement of affective systems as medical machines in the psychiatric diagnostic process and in therapy sessions with respect to the technical and ethical circumstances.


Conference of the International Speech Communication Association | 2011

Vowels Formants Analysis Allows Straightforward Detection of High Arousal Acted and Spontaneous Emotions.

Bogdan Vlasenko; Dmytro Prylipko; David Philippou-Hübner; Andreas Wendemuth

Collaboration


An overview of Dmytro Prylipko's collaborations.

Top Co-Authors

Andreas Wendemuth (Otto-von-Guericke University Magdeburg)
Ingo Siegert (Otto-von-Guericke University Magdeburg)
Bogdan Vlasenko (Otto-von-Guericke University Magdeburg)
Ronald Böck (Otto-von-Guericke University Magdeburg)
David Philippou-Hübner (Otto-von-Guericke University Magdeburg)
Kim Hartmann (Otto-von-Guericke University Magdeburg)
Matthias Haase (Otto-von-Guericke University Magdeburg)
Dietmar F. Rösner (Otto-von-Guericke University Magdeburg)
Dirk Schnelle-Walka (Technische Universität Darmstadt)
Olga Egorow (Otto-von-Guericke University Magdeburg)