
Publication


Featured research published by Korbinian Riedhammer.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

The CALO Meeting Assistant System

Gökhan Tür; Andreas Stolcke; L. Lynn Voss; Stanley Peters; Dilek Hakkani-Tür; John Dowding; Benoit Favre; Raquel Fernández; Matthew Frampton; Michael W. Frandsen; Clint Frederickson; Martin Graciarena; Donald Kintzing; Kyle Leveque; Shane Mason; John Niekrasz; Matthew Purver; Korbinian Riedhammer; Elizabeth Shriberg; Jing Tien; Dimitra Vergyri; Fan Yang

The CALO Meeting Assistant (MA) provides for distributed meeting capture, annotation, automatic transcription and semantic analysis of multiparty meetings, and is part of the larger CALO personal assistant system. This paper presents the CALO-MA architecture and its speech recognition and understanding components, which include real-time and offline speech transcription, dialog act segmentation and tagging, topic identification and segmentation, question-answer pair identification, action item recognition, decision extraction, and summarization.


IEEE Spoken Language Technology Workshop | 2008

The CALO meeting speech recognition and understanding system

Gökhan Tür; Andreas Stolcke; L. Lynn Voss; John Dowding; Benoit Favre; Raquel Fernández; Matthew Frampton; Michael W. Frandsen; Clint Frederickson; Martin Graciarena; Dilek Hakkani-Tür; Donald Kintzing; Kyle Leveque; Shane Mason; John Niekrasz; Stanley Peters; Matthew Purver; Korbinian Riedhammer; Elizabeth Shriberg; Jing Tien; Dimitra Vergyri; Fan Yang

The CALO meeting assistant provides for distributed meeting capture, annotation, automatic transcription and semantic analysis of multiparty meetings, and is part of the larger CALO personal assistant system. This paper summarizes the CALO-MA architecture and its speech recognition and understanding components, which include real-time and offline speech transcription, dialog act segmentation and tagging, question-answer pair identification, action item recognition, decision extraction, and summarization.


IEEE International Conference on Acoustics, Speech and Signal Processing | 2014

A pitch extraction algorithm tuned for automatic speech recognition

Pegah Ghahremani; Bagher BabaAli; Daniel Povey; Korbinian Riedhammer; Jan Trmal; Sanjeev Khudanpur

In this paper we present an algorithm that produces pitch and probability-of-voicing estimates for use as features in automatic speech recognition systems. These features give large performance improvements on tonal languages for ASR systems, and even substantial improvements for non-tonal languages. Our method, which we call the Kaldi pitch tracker (because we are adding it to the Kaldi ASR toolkit), is a highly modified version of the getf0 (RAPT) algorithm. Unlike the original getf0, we do not make a hard decision whether any given frame is voiced or unvoiced; instead, we assign a pitch even to unvoiced frames while constraining the pitch trajectory to be continuous. Our algorithm also produces a quantity that can be used as a probability-of-voicing measure; it is based on the normalized autocorrelation measure that our pitch extractor uses. We present results on data from various languages in the BABEL project, and show a large improvement over systems without tonal features and over systems where pitch and POV information was obtained from SAcC or getf0.
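
The normalized autocorrelation idea behind the pitch and voicing estimates can be sketched in a few lines. This is a hypothetical, heavily simplified illustration in plain Python, not the Kaldi implementation (which adds lag interpolation, cost-based Viterbi smoothing of the pitch trajectory, and online operation):

```python
import math

def nccf(frame, lag):
    """Normalized cross-correlation of a frame with a lagged copy of itself;
    values near 1 indicate strong periodicity (voiced speech)."""
    n = len(frame) - lag
    a, b = frame[:n], frame[lag:lag + n]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b)) or 1.0
    return num / den

def estimate_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Return (pitch_hz, voicing_score) for one frame: the lag with the
    highest NCCF inside a plausible pitch range. Even 'unvoiced' frames get
    a pitch value, mirroring the paper's continuous-trajectory idea."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_score = lag_min, -1.0
    for lag in range(lag_min, min(lag_max, len(frame) - 1)):
        score = nccf(frame, lag)
        # prefer shorter lags on near-ties, a crude guard against octave errors
        if score > best_score + 1e-9:
            best_lag, best_score = lag, score
    return sample_rate / best_lag, best_score
```

The shorter-lag preference is a crude stand-in for RAPT's lag cost; the voicing score here is just the winning NCCF value, which is high for periodic frames and low for noise-like ones.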


IEEE International Conference on Acoustics, Speech and Signal Processing | 2012

Generating exact lattices in the WFST framework

Daniel Povey; Mirko Hannemann; Gilles Boulianne; Lukas Burget; Arnab Ghoshal; Milos Janda; Martin Karafiát; Stefan Kombrink; Petr Motlicek; Yanmin Qian; Korbinian Riedhammer; Karel Vesely; Ngoc Thang Vu

We describe a lattice generation method that is exact, i.e. it satisfies all the natural properties we would want from a lattice of alternative transcriptions of an utterance. This method does not introduce substantial overhead above one-best decoding. Our method is most directly applicable when using WFST decoders where the WFST is “fully expanded”, i.e. where the arcs correspond to HMM transitions. It outputs lattices that include HMM-state-level alignments as well as word labels. The general idea is to create a state-level lattice during decoding, and to do a special form of determinization that retains only the best-scoring path for each word sequence. This special determinization algorithm is a solution to the following problem: Given a WFST A, compute a WFST B that, for each input-symbol-sequence of A, contains just the lowest-cost path through A.
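
The determinization property — one surviving path per word sequence — can be illustrated on an explicitly enumerated set of paths. The real algorithm operates on the WFST itself and never enumerates paths; this sketch (with invented names and data layout) only shows the invariant the output lattice satisfies:

```python
def determinize_paths(paths):
    """Keep only the lowest-cost path for each distinct word sequence.

    `paths` is an iterable of (words, cost, alignment) triples, where
    `words` is a tuple of word labels and `alignment` stands in for the
    HMM-state-level alignment the paper's lattices retain."""
    best = {}
    for words, cost, alignment in paths:
        if words not in best or cost < best[words][0]:
            best[words] = (cost, alignment)
    return best
```

Applied to a set of decoding paths, the result maps each word sequence to exactly one (cost, alignment) pair — the lowest-cost one — which is what "exact" means here.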


Speech Communication | 2010

Long story short - Global unsupervised models for keyphrase based meeting summarization

Korbinian Riedhammer; Benoit Favre; Dilek Hakkani-Tür

We analyze and compare two different methods for unsupervised extractive summarization of spontaneous speech in the meeting domain. Based on utterance comparison, we introduce an optimal formulation of the widely used greedy maximum marginal relevance (MMR) algorithm. Following the idea that information is spread over the utterances in the form of concepts, we describe a system which finds an optimal selection of utterances covering as many unique important concepts as possible. Both optimization problems are formulated as an integer linear program (ILP) and solved using public domain software. We analyze and discuss the performance of both approaches using various evaluation setups on two well-studied meeting corpora, conclude on the benefits and drawbacks of the presented models, and give an outlook on future directions for improving extractive meeting summarization.
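
The greedy MMR baseline that the paper reformulates can be sketched as follows, using bag-of-words cosine similarity. The similarity measure and the λ weight here are illustrative choices, not the paper's exact configuration:

```python
import math

def bag(text):
    """Bag-of-words counts for a whitespace-tokenized utterance."""
    counts = {}
    for w in text.lower().split():
        counts[w] = counts.get(w, 0) + 1
    return counts

def cosine(a, b):
    """Cosine similarity between two bag-of-words dicts."""
    num = sum(c * b.get(w, 0) for w, c in a.items())
    den = math.sqrt(sum(c * c for c in a.values())) * \
          math.sqrt(sum(c * c for c in b.values()))
    return num / den if den else 0.0

def mmr_select(utterances, query, k, lam=0.5):
    """Greedy MMR: repeatedly add the utterance that best trades off
    relevance to the query against redundancy with what is already selected."""
    bags = [bag(u) for u in utterances]
    q = bag(query)
    selected = []
    while len(selected) < min(k, len(bags)):
        def score(i):
            redundancy = max((cosine(bags[i], bags[j]) for j in selected),
                             default=0.0)
            return lam * cosine(bags[i], q) - (1 - lam) * redundancy
        best = max((i for i in range(len(bags)) if i not in selected),
                   key=score)
        selected.append(best)
    return selected
```

Because each pick depends on what was picked before, greedy MMR is not globally optimal — which is exactly the gap the paper's ILP reformulation closes.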


IEEE International Conference on Acoustics, Speech and Signal Processing | 2009

A global optimization framework for meeting summarization

Daniel Gillick; Korbinian Riedhammer; Benoit Favre; Dilek Hakkani-Tür

We introduce a model for extractive meeting summarization based on the hypothesis that utterances convey bits of information, or concepts. Using keyphrases as concepts weighted by frequency, and an integer linear program to determine the best set of utterances, that is, covering as many concepts as possible while satisfying a length constraint, we achieve ROUGE scores at least as good as a ROUGE-based oracle derived from human summaries. This brings us to a critical discussion of ROUGE and the future of extractive meeting summarization.
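
The selection objective — maximize the total weight of covered concepts subject to a length budget — can be sketched as follows. The authors solve it as an integer linear program; for a handful of utterances, brute force over subsets finds the same optimum (all ids, concepts, and weights below are made up):

```python
from itertools import combinations

def best_summary(utterances, weights, budget):
    """Exact maximum-weight concept coverage under a length budget.

    `utterances` maps an utterance id to (length, set_of_concepts);
    `weights` maps each concept to its weight (e.g. keyphrase frequency).
    A concept counts once no matter how many selected utterances contain it."""
    ids = list(utterances)
    best_ids, best_value = [], -1.0
    for r in range(len(ids) + 1):
        for subset in combinations(ids, r):
            if sum(utterances[i][0] for i in subset) > budget:
                continue
            covered = set().union(*(utterances[i][1] for i in subset))
            value = sum(weights[c] for c in covered)
            if value > best_value:
                best_ids, best_value = list(subset), value
    return best_ids, best_value
```

Counting each concept only once is what discourages redundancy here: two utterances sharing a concept earn its weight a single time, so the optimum favors complementary utterances.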


Journal of Voice | 2012

Automatic Intelligibility Assessment of Speakers After Laryngeal Cancer by Means of Acoustic Modeling

Tobias Bocklet; Korbinian Riedhammer; Elmar Nöth; Ulrich Eysholdt; Tino Haderlein

Objective: One aspect of voice and speech evaluation after laryngeal cancer is acoustic analysis. Perceptual evaluation by expert raters is the clinical standard for global criteria such as overall quality or intelligibility. So far, automatic approaches have evaluated acoustic properties of pathologic voices based on voiced/unvoiced distinction and fundamental frequency analysis of sustained vowels. Because of the large amount of noise components and the increasing aperiodicity of highly pathologic voices, a fully automatic analysis of fundamental frequency is difficult. We introduce a purely data-driven system for the acoustic analysis of pathologic voices based on recordings of a standard text. Methods: Short-time segments of the speech signal are analyzed in the spectral domain, and speaker models are built from this information. These speaker models act as a clustered representation of the acoustic properties of a person's voice and are thus characteristic of speakers with different kinds and degrees of pathologic conditions. The system is evaluated on two data sets of speakers reading standardized texts. One data set contains 77 speakers after laryngeal cancer treated with partial removal of the larynx. The other contains 54 totally laryngectomized patients, each equipped with a Provox shunt valve. Each speaker was rated by five expert listeners on three criteria: strain, voice quality, and speech intelligibility. Results/Conclusion: For each data set, we show correlations with r and ρ ≥ 0.8 between the automatic system and the mean value of the five raters. The interrater correlation of one rater to the mean of the remaining raters is in the same range. We thus assume that, for selected evaluation criteria, the system can serve as a validated objective support for acoustic voice and speech analysis.
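
The validation step compares system scores against the mean of the five expert ratings using Pearson's r and Spearman's ρ. A minimal stdlib sketch of both statistics (ignoring tie handling in the rank correlation):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0.0] * len(values)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson_r(ranks(xs), ranks(ys))
```

Reporting both is informative: r rewards a linear relationship between system score and mean rating, while ρ only requires a monotone one.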


IEEE Spoken Language Technology Workshop | 2008

A keyphrase based approach to interactive meeting summarization

Korbinian Riedhammer; Benoit Favre; Dilek Hakkani-Tür

Rooted in multi-document summarization, maximum marginal relevance (MMR) is a widely used algorithm for meeting summarization (MS). A major problem in extractive MS using MMR is finding a proper query: the centroid-based query, which is commonly used in the absence of a manually specified query, cannot significantly outperform a simple baseline system. We introduce a simple yet robust algorithm to automatically extract keyphrases (KPs) from a meeting, which can then be used as a query in the MMR algorithm. We show that the KP-based system significantly outperforms both the baseline and the centroid-based system. As human-refined KPs show even better summarization performance, we outline how to integrate the KP approach into a graphical user interface allowing interactive summarization to match the user's needs in terms of summary length and topic focus.
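
The keyphrase idea can be sketched as frequency-based ranking of content-word unigrams and bigrams, whose winners then serve as the MMR query. This toy version ignores the part-of-speech patterns a real extractor would use; the stopword list and scoring below are illustrative only:

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "we", "on"}

def extract_keyphrases(utterances, top_n=5):
    """Rank content-word unigrams and bigrams by frequency. Bigrams get a
    small bonus so multiword keyphrases can outrank their parts."""
    counts = Counter()
    for utterance in utterances:
        words = [w.strip(".,?!").lower() for w in utterance.split()]
        for i, w in enumerate(words):
            if w in STOPWORDS:
                continue
            counts[w] += 1
            if i + 1 < len(words) and words[i + 1] not in STOPWORDS:
                counts[w + " " + words[i + 1]] += 2  # bigram bonus
    return [phrase for phrase, _ in counts.most_common(top_n)]
```

Feeding the returned phrases into MMR as the query replaces the weak centroid query the abstract criticizes, and in an interactive setting a user could edit this list before summarization.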


IEEE International Conference on Acoustics, Speech and Signal Processing | 2012

Revisiting semi-continuous hidden Markov models

Korbinian Riedhammer; Tobias Bocklet; Arnab Ghoshal; Daniel Povey

In the past decade, semi-continuous hidden Markov models (SCHMMs) have not attracted much attention in the speech recognition community. Growing amounts of training data and increasing sophistication of model estimation have led to the impression that continuous HMMs are the best choice of acoustic model. However, recent work on recognition of under-resourced languages faces the same old problem of estimating a large number of parameters from limited amounts of transcribed speech. This has led to a renewed interest in methods of reducing the number of parameters while maintaining or extending the modeling capabilities of continuous models. In this work, we compare classic and multiple-codebook semi-continuous models using diagonal and full covariance matrices with continuous HMMs and subspace Gaussian mixture models. Experiments on the RM and WSJ corpora show that while a classic semi-continuous system does not perform as well as a continuous one, multiple-codebook semi-continuous systems can perform better, particularly when using full-covariance Gaussians.
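
What makes a model semi-continuous can be shown in one function: all states share a single codebook of Gaussians, and only the mixture weights are state-specific. A toy one-dimensional sketch (the paper's systems are of course multivariate, with diagonal or full covariances):

```python
import math

def gauss_logpdf(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def schmm_state_loglike(x, codebook, weights):
    """Semi-continuous state log-likelihood: the (mean, var) Gaussians in
    `codebook` are shared by every state; only the mixture `weights` are
    state-specific, which is what keeps the parameter count small."""
    log_terms = [math.log(w) + gauss_logpdf(x, m, v)
                 for w, (m, v) in zip(weights, codebook) if w > 0]
    mx = max(log_terms)                      # log-sum-exp for stability
    return mx + math.log(sum(math.exp(t - mx) for t in log_terms))
```

Each new state adds only a weight vector, not new Gaussians — the parameter saving that makes the model class attractive for under-resourced languages.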


Folia Phoniatrica et Logopaedica | 2009

Application of automatic speech recognition to quantitative assessment of tracheoesophageal speech with different signal quality.

Tino Haderlein; Korbinian Riedhammer; Elmar Nöth; Hikmet Toy; Maria Schuster; Ulrich Eysholdt; Joachim Hornegger; Frank Rosanowski

Objective: Tracheoesophageal voice is the state of the art in voice rehabilitation after laryngectomy. Intelligibility on the telephone is an important evaluation criterion, as telephone use is a crucial part of social life. An objective measure of intelligibility when talking on a telephone is desirable in the field of postlaryngectomy speech therapy and its evaluation. Patients and Methods: Based upon successful earlier studies with broadband speech, an automatic speech recognition (ASR) system was applied to 41 recordings of postlaryngectomy patients. Recordings were available in different signal qualities; quality was the crucial criterion for this study. Results: Compared to the intelligibility rating of 5 human experts, the ASR system had a correlation coefficient of r = –0.87 and Krippendorff's α of 0.65 when broadband speech was processed. The rater group alone achieved α = 0.66. With the test recordings in telephone quality, the system reached r = –0.79 and α = 0.67. Conclusion: For medical purposes, a comprehensive diagnostic approach to (substitute) voice has to cover both subjective and objective tests. An automatic recognition system such as the one proposed in this study can be used for objective intelligibility rating with results comparable to those of human experts. This holds for broadband speech as well as for automatic evaluation via telephone.

Collaboration


Dive into Korbinian Riedhammer's collaborations.

Top Co-Authors

Elmar Nöth | University of Erlangen-Nuremberg
Tobias Bocklet | University of Erlangen-Nuremberg
Tino Haderlein | University of Erlangen-Nuremberg
Andreas K. Maier | University of Erlangen-Nuremberg
Frank Rosanowski | University of Erlangen-Nuremberg
Ulrich Eysholdt | University of Erlangen-Nuremberg
Benoit Favre | University of California
Florian Hönig | University of Erlangen-Nuremberg
Martin Gropp | University of Erlangen-Nuremberg