Publication


Featured research published by Tim Polzehl.


Speech Communication | 2011

Anger recognition in speech using acoustic and linguistic cues

Tim Polzehl; Alexander Schmitt; Florian Metze; Michael Wagner

The present study elaborates on the exploitation of both linguistic and acoustic feature modeling for anger classification. In terms of acoustic modeling we generate statistics from acoustic audio descriptors, e.g. pitch, loudness, spectral characteristics. Ranking our features we see that loudness and MFCC seem most promising for all databases. For the English database also pitch features are important. In terms of linguistic modeling we apply probabilistic and entropy-based models of words and phrases, e.g. Bag-of-Words (BOW), Term Frequency (TF), Term Frequency - Inverse Document Frequency (TF.IDF) and the Self-Referential Information (SRI). SRI clearly outperforms vector space models. Modeling phrases slightly improves the scores. After classification of both acoustic and linguistic information on separated levels we fuse information on decision level adding confidences. We compare the obtained scores on three different databases. Two databases are taken from the IVR customer care domain, another database accounts for a WoZ data collection. All corpora are of realistic speech condition. We observe promising results for the IVR databases while the WoZ database shows lower scores overall. In order to provide comparability between the results we evaluate classification success using the f1 measurement in addition to overall accuracy figures. As a result, acoustic modeling clearly outperforms linguistic modeling. Fusion slightly improves overall scores. With a baseline of approximately 60% accuracy and .40 f1-measurement by constant majority class voting we obtain an accuracy of 75% with respective .70 f1 for the WoZ database. For the IVR databases we obtain approximately 79% accuracy with respective .78 f1 over a baseline of 60% accuracy with respective .38 f1.
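
The baseline figures quoted above (about 60% accuracy but only around .38 to .40 f1 for constant majority-class voting) can be reproduced with a small worked example. The sketch below is illustrative only: it assumes that the reported f1 is the unweighted average over both classes, and the 60/40 class split and the label names are hypothetical values chosen to match the quoted accuracy, not data from the study.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score for one class, treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # no correct positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Hypothetical test set: 60 non-angry turns and 40 angry turns.
y_true = ["non-angry"] * 60 + ["angry"] * 40
# Constant majority-class voting: always predict "non-angry".
y_pred = ["non-angry"] * 100

baseline_accuracy = accuracy(y_true, y_pred)
baseline_f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {baseline_accuracy:.3f}, macro F1 = {baseline_f1:.3f}")
```

Under these assumptions the majority class reaches an F1 of 0.75 and the never-predicted class an F1 of 0, so the class average is 0.375 even though accuracy is 0.6. This is why the f1 measure makes results comparable across databases with different class balance.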


International Conference on Acoustics, Speech, and Signal Processing | 2009

Detecting real life anger

Felix Burkhardt; Tim Polzehl; Joachim Stegmann; Florian Metze; Richard Huber

Acoustic anger detection in voice portals can help to enhance human computer interaction. A comprehensive voice portal data collection has been carried out and gives new insight on the nature of real life data. Manual labeling revealed a high percentage of non-classifiable data. Experiments with a statistical classifier indicate that, in contrast to pitch and energy related features, duration measures do not play an important role for this data while cepstral information does. Also in a direct comparison between Gaussian Mixture Models and Support Vector Machines the latter gave better results.


International Conference on Acoustics, Speech, and Signal Processing | 2012

Articulatory features for expressive speech synthesis

Alan W. Black; H. Timothy Bunnell; Ying Dou; Prasanna Kumar Muthukumar; Florian Metze; Daniel J. Perry; Tim Polzehl; Kishore Prahallad; Stefan Steidl; Callie Vaughn

This paper describes some of the results from the project entitled “New Parameterization for Emotional Speech Synthesis” held at the Summer 2011 JHU CLSP workshop. We describe experiments on how to use articulatory features as a meaningful intermediate representation for speech synthesis. This parameterization not only allows us to reproduce natural sounding speech but also allows us to generate stylistically varying speech.


Spoken Dialogue Systems Technology and Design | 2011

Salient Features for Anger Recognition in German and English IVR Portals

Tim Polzehl; Alexander Schmitt; Florian Metze

Anger recognition in speech dialogue systems can help to enhance human computer interaction. In this chapter we report on the setup and performance optimization techniques for successful anger classification using acoustic cues. We evaluate the performance of a broad variety of features on both a German and an American English voice portal database which contain “real” (i.e. non-acted) continuous speech of narrow-band quality. Starting with a large-scale feature extraction, we determine optimal sets of feature combinations for each language by applying an Information-Gain based ranking scheme. Analyzing the ranking we notice that a large proportion of the most promising features for both databases are derived from MFCC and loudness. In contrast to this similarity, pitch features also proved important for the English database. We further calculate classification scores for our setups using discriminative training and Support-Vector Machine classification. The developed systems show that anger
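
As a rough illustration of an Information-Gain based feature ranking, the following sketch scores each feature by how much a single threshold split reduces the entropy of the class labels. It is a minimal example under stated assumptions, not the chapter's procedure: splitting at the feature mean is a simplification chosen here, and the feature names and values are invented toy data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting `values` at `threshold`."""
    low = [y for x, y in zip(values, labels) if x <= threshold]
    high = [y for x, y in zip(values, labels) if x > threshold]
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in (low, high) if part
    )
    return entropy(labels) - remainder


def rank_features(features, labels):
    """Return (name, gain) pairs sorted by gain, highest first.

    Assumption: each feature is split at its mean value.
    """
    scored = []
    for name, values in features.items():
        threshold = sum(values) / len(values)
        scored.append((name, information_gain(values, labels, threshold)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented toy data: six user turns with two hypothetical features.
labels = ["angry", "angry", "angry", "non-angry", "non-angry", "non-angry"]
features = {
    "loudness_mean": [0.9, 0.8, 0.7, 0.2, 0.3, 0.1],  # separates the classes
    "duration": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],       # carries no class information
}

ranking = rank_features(features, labels)
for name, gain in ranking:
    print(f"{name}: {gain:.3f} bits")
```

In this toy setting the loudness feature removes all label uncertainty (a gain of one bit) while the duration feature removes none, so loudness is ranked first.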


International Workshop on Spoken Dialogue Systems Technology | 2010

Facing reality: simulating deployment of anger recognition in IVR systems

Alexander Schmitt; Tim Polzehl; Wolfgang Minker

With the availability of real-life corpora, studies dealing with speech-based emotion recognition have turned towards recognition of angry users on turn level. Based on acoustic, linguistic and sometimes contextual features, classifiers yield performance values of 0.7-0.8 f-score when classifying angry vs. non-angry user turns. The effect of deploying anger classifiers in real systems remains an open point and has not been examined so far. Is the current performance of anger detection already adequate for a change in dialogue strategy or even an escalation to an operator? In this study we explore the impact of an anger classifier that has been published in a previous study on specific dialogues. We introduce a cost-sensitive classifier that reduces the number of misclassified non-angry user turns significantly.
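
One generic way to make a classifier cost-sensitive is to choose, for each turn, the label with the lowest expected misclassification cost. The sketch below shows this standard minimum-expected-cost rule. It is not necessarily the method used in the study, and the function name, cost values and classifier scores are hypothetical.

```python
def decide(p_angry, cost_false_alarm, cost_miss):
    """Pick the label with the lower expected cost.

    p_angry: classifier's estimated probability that the turn is angry.
    cost_false_alarm: cost of labeling a non-angry turn as angry.
    cost_miss: cost of labeling an angry turn as non-angry.
    """
    expected_cost_if_angry = (1.0 - p_angry) * cost_false_alarm
    expected_cost_if_non_angry = p_angry * cost_miss
    if expected_cost_if_angry < expected_cost_if_non_angry:
        return "angry"
    return "non-angry"


# Hypothetical classifier scores for four user turns.
scores = [0.55, 0.65, 0.85, 0.95]

# Equal costs: the decision threshold is effectively 0.5.
symmetric = [decide(p, cost_false_alarm=1.0, cost_miss=1.0) for p in scores]

# False alarms cost four times as much as misses: the threshold rises to
# cost_false_alarm / (cost_false_alarm + cost_miss) = 0.8.
cost_sensitive = [decide(p, cost_false_alarm=4.0, cost_miss=1.0) for p in scores]

print(symmetric)
print(cost_sensitive)
```

Raising the cost of a false alarm means fewer non-angry turns are flagged as angry, at the price of missing some borderline angry turns. That trade-off matters when a false alarm triggers an unnecessary escalation to an operator.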


5th ISCA/DEGA Workshop on Perceptual Quality of Systems (PQS 2016) | 2016

Non-intrusive Estimation of Noisiness as a Perceptual Quality Dimension of Transmitted Speech

Friedemann Köster; Gabriel Mittag; Tim Polzehl; Sebastian Möller

This article presents a new approach to the non-intrusive quality estimation of transmitted speech. Traditional estimation methods exhibit limitations to providing diagnostic information and for practical monitoring purposes. The new approach merges solutions to overcome the existing limitations and intends to provide a new user-friendly estimator. We present an overview and the planned structure of the proposed model. In order to provide diagnostic information, the method of assessing perceptual quality-relevant dimensions is applied. One of these quality dimensions is Noisiness, which describes degradations like background noise, circuit noise, or coding noise. As a fundamental component of the proposed model, a non-intrusive parametric Noisiness estimator is presented. The estimator is based on nine different features extracted from the output signal only. Using a linear regression, the features are mapped onto the Noisiness. The Noisiness estimator is trained on two and tested on three individual subjective databases. In addition, the performance of the resulting estimator is compared to the diagnostic intrusive estimator DIAL (Diagnostic Intrusive Assessment of Listening quality). The results prove that the presented estimator provides high reliability and indicate the applicability and value for non-intrusive diagnostic quality estimation.


Proceedings of the 2014 International ACM Workshop on Crowdsourcing for Multimedia | 2014

Development and Validation of Extrinsic Motivation Scale for Crowdsourcing Micro-task Platforms

Babak Naderi; Ina Wechsung; Tim Polzehl; Sebastian Möller

In this paper, we introduce a scale for measuring the extrinsic motivation of crowd workers. The new questionnaire is strongly based on the Work Extrinsic Intrinsic Motivation Scale (WEIMS) [17] and theoretically follows the Self-Determination Theory (SDT) of motivation. The questionnaire has been applied and validated in a crowdsourcing micro-task platform. This instrument can be used for studying the dynamics of extrinsic motivation by taking into account individual differences and provide meaningful insights which will help to design a proper incentives framework for each crowd worker that eventually leads to a better performance, an increased well-being, and higher overall quality.


Archive | 2010

“For Heaven’s Sake, Gimme a Live Person!” Designing Emotion-Detection Customer Care Voice Applications in Automated Call Centers

Alexander Schmitt; Roberto Pieraccini; Tim Polzehl

With increasing complexity of automated telephone-based applications, we require new means to detect problems occurring in the dialog between system and user in order to support task completion. Anger and frustration are important symptoms indicating that task completion and user satisfaction may be endangered. This chapter describes extensively a variety of aspects that are relevant for performing anger detection in interactive voice response (IVR) systems and describes an anger detection system that takes into account several knowledge sources to robustly detect angry user turns. We consider acoustic, linguistic, and interaction parameter-based information that can be collected and exploited for anger detection. Further, we introduce a subcomponent that is able to estimate the emotional state of the caller based on the caller’s previous emotional state. Based on a corpus of 1,911 calls from an IVR system, we demonstrate the various aspects of angry and frustrated callers.


Archive | 2015

Discussion of the Results

Tim Polzehl

After presenting detailed results from personality modeling across data sets with different inherent characteristics, this chapter summarizes and discusses general tendencies that hold across these differences in data structure. Results are now presented and discussed from a personality-centered perspective, covering both personality classification and the prediction of individual trait scores. Finally, the chapter adds a comprehensive analysis of influencing factors and unexpected observations during processing.


Archive | 2015

Analysis of Human Personality Perception

Tim Polzehl

This chapter provides exploratory insights into the personality-related expressions and interdependencies in the datasets. At the same time, the question of whether or not the chosen assessment scheme can be applied is analyzed. Finally, the overall high consistencies, the observation of normal distributions in the ratings, the comparable correlation patterns between the traits, the very congruent latent factor structure, and the significant differences between the target groups show that the induced personality expressions are perceived as intended by human listeners.

Collaboration


Dive into Tim Polzehl's collaborations.

Top Co-Authors

Florian Metze
Carnegie Mellon University

Sebastian Möller
Technical University of Berlin

Babak Naderi
Technical University of Berlin

Stefan Steidl
University of Erlangen-Nuremberg

Friedemann Köster
Technical University of Berlin

Alan W. Black
Carnegie Mellon University