Publication


Featured research published by Timo Baumann.


Annual Meeting of the Special Interest Group on Discourse and Dialogue | 2009

Incremental Reference Resolution: The Task, Metrics for Evaluation, and a Bayesian Filtering Model that is Sensitive to Disfluencies

David Schlangen; Timo Baumann; Michaela Atterer

In this paper we do two things: a) we discuss in general terms the task of incremental reference resolution (IRR), in particular resolution of exophoric reference, and specify metrics for measuring the performance of dialogue system components tackling this task, and b) we present a simple Bayesian filtering model of IRR that performs reasonably well just using words directly (no structure information and no hand-coded semantics): it picks the right referent out of 12 for around 50% of real-world dialogue utterances in our test corpus. It is also able to learn to interpret not only words but also hesitations, just as humans have been shown to do in similar situations, namely as markers of references to hard-to-describe entities.
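The word-by-word belief update behind a Bayesian filtering approach to reference resolution can be sketched as follows. This is only a minimal illustration, not the authors' model: the referent names, the word likelihoods, and the smoothing floor are all hypothetical values chosen for the example, and the abstract does not specify how the real model estimates them.

```python
def normalize(dist):
    """Scale a dict of non-negative scores so that they sum to 1."""
    total = sum(dist.values())
    return {key: value / total for key, value in dist.items()}


def update_belief(belief, word, likelihood, floor=1e-3):
    """One filtering step: P(r | w_1..t) is proportional to P(w_t | r) * P(r | w_1..t-1).

    `floor` is an assumed smoothing value for words never seen with a referent.
    """
    posterior = {
        referent: prior * likelihood.get(referent, {}).get(word, floor)
        for referent, prior in belief.items()
    }
    return normalize(posterior)


def resolve_incrementally(words, referents, likelihood):
    """Return the final belief and the best guess after each incoming word."""
    belief = {referent: 1.0 / len(referents) for referent in referents}
    trace = []
    for word in words:
        belief = update_belief(belief, word, likelihood)
        trace.append(max(belief, key=belief.get))
    return belief, trace


# Hypothetical likelihoods P(word | referent); the hesitation "uh" is made more
# likely for the hard-to-describe referent "blob".
likelihood = {
    "cross": {"the": 0.3, "cross": 0.5, "uh": 0.05},
    "bar": {"the": 0.3, "long": 0.4, "uh": 0.05},
    "blob": {"the": 0.3, "thing": 0.2, "uh": 0.3},
}

belief, trace = resolve_incrementally(["the", "uh", "cross"], list(likelihood), likelihood)
```

In this toy run the hesitation temporarily shifts the belief towards the hard-to-describe referent, and the content word then moves it to the referent it names.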


North American Chapter of the Association for Computational Linguistics | 2009

Assessing and Improving the Performance of Speech Recognition for Incremental Systems

Timo Baumann; Michaela Atterer; David Schlangen

In incremental spoken dialogue systems, partial hypotheses about what was said are required even while the utterance is still ongoing. We define measures for evaluating the quality of incremental ASR components with respect to the relative correctness of the partial hypotheses compared to hypotheses that can optimize over the complete input, the timing of hypothesis formation relative to the portion of the input they are about, and hypothesis stability, defined as the number of times they are revised. We show that simple incremental post-processing can improve stability dramatically, at the cost of timeliness (from 90 % of edits of hypotheses being spurious down to 10 % at a lag of 320 ms). The measures are not independent, and we show how system designers can find a desired operating point for their ASR. To our knowledge, we are the first to suggest and examine a variety of measures for assessing incremental ASR and improve performance on this basis.
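To make the notion of hypothesis stability more concrete, the share of spurious edits in a sequence of partial hypotheses can be sketched as below. This is a simplified reading of the idea and not necessarily the paper's exact definition: it assumes that changes between successive hypotheses are expressed as word additions and revocations after the common prefix, and that only the additions needed to build the final hypothesis count as necessary. The function names and the example hypotheses are hypothetical.

```python
def count_edits(prev, curr):
    """Number of word revocations plus additions needed to turn prev into curr.

    Words after the longest common prefix of prev are revoked, and the
    remaining words of curr are added.
    """
    common = 0
    for old_word, new_word in zip(prev, curr):
        if old_word != new_word:
            break
        common += 1
    return (len(prev) - common) + (len(curr) - common)


def spurious_edit_share(partials):
    """Fraction of all edits that were not needed to build the final hypothesis."""
    if not partials:
        return 0.0
    total = sum(
        count_edits(prev, curr) for prev, curr in zip([[]] + partials, partials)
    )
    necessary = len(partials[-1])  # each final word has to be added exactly once
    return (total - necessary) / total if total else 0.0


# Hypothetical partial hypotheses produced while the utterance is still ongoing.
partials = [["for"], ["four"], ["forty"], ["forty", "two"]]
share = spurious_edit_share(partials)
```

A post-processing step that delays output would lower this share at the price of later hypotheses, which is the trade-off between stability and timeliness described above.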


Speech Communication | 2012

Prosodic and temporal features for language modeling for dialog

Nigel Ward; Alejandro Vega; Timo Baumann

If we can model the cognitive and communicative processes underlying speech, we should be able to better predict what a speaker will do. With this idea as inspiration, we examine a number of prosodic and timing features as potential sources of information on what words the speaker is likely to say next. In spontaneous dialog we find that word probabilities do vary with such features. Using perplexity as the metric, the most informative of these included recent speaking rate, volume, and pitch, and time until end of utterance. Using simple combinations of such features to augment trigram language models gave up to an 8.4% perplexity benefit on the Switchboard corpus, and up to a 1.0% relative reduction in word error rate (0.3% absolute) on the Verbmobil II corpus.
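For readers unfamiliar with the metric, perplexity and a relative reduction of it can be computed as in the following sketch. The per-word probabilities are made-up numbers for illustration only and are not taken from the paper; the function names are likewise hypothetical.

```python
import math


def perplexity(word_probs):
    """Perplexity of a model that assigned these probabilities to the observed words."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


def relative_reduction(baseline, improved):
    """Relative improvement of a lower-is-better metric such as perplexity or WER."""
    return (baseline - improved) / baseline


# Made-up probabilities for the same four words under two language models.
baseline_probs = [0.10, 0.02, 0.25, 0.05]
augmented_probs = [0.12, 0.02, 0.30, 0.05]

gain = relative_reduction(perplexity(baseline_probs), perplexity(augmented_probs))
```

The same `relative_reduction` helper applies to word error rate, which is why a small absolute change can correspond to a different relative figure.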


Conference of the European Chapter of the Association for Computational Linguistics | 2014

Situationally Aware In-Car Information Presentation Using Incremental Speech Generation: Safer, and More Effective

Spyridon Kousidis; Casey Kennington; Timo Baumann; Hendrik Buschmeier; Stefan Kopp; David Schlangen

Holding non-co-located conversations while driving is dangerous (Horrey and Wickens, 2006; Strayer et al., 2006), much more so than conversations with physically present, “situated” interlocutors (Drews et al., 2004). In-car dialogue systems typically resemble non-co-located conversations more, and share their negative impact (Strayer et al., 2013). We implemented and tested a simple strategy for making in-car dialogue systems aware of the driving situation, by giving them the capability to interrupt themselves when a dangerous situation is detected, and to resume when it is over. We show that this improves both driving performance and recall of system-presented information, compared to a non-adaptive strategy.


Automotive User Interfaces and Interactive Vehicular Applications | 2014

Better Driving and Recall When In-car Information Presentation Uses Situationally-Aware Incremental Speech Output Generation

Casey Kennington; Spyridon Kousidis; Timo Baumann; Hendrik Buschmeier; Stefan Kopp; David Schlangen

It is established that driver distraction is the result of sharing cognitive resources between the primary task (driving) and any other secondary task. In the case of holding conversations, a human passenger who is aware of the driving conditions can choose to interrupt his speech in situations potentially requiring more attention from the driver, but in-car information systems typically do not exhibit such sensitivity. We have designed and tested such a system in a driving simulation environment. Our system delivers information via speech (calendar entries with scheduled meetings) but, unlike other systems, is able to react to signals from the environment: it interrupts its delivery when the driver needs to be fully attentive to the driving task and subsequently resumes. Distraction is measured by a secondary short-term memory task. In both tasks, drivers perform significantly worse when the system does not adapt its speech, while they perform as well as in control conditions (no concurrent task) when the system intelligently interrupts and resumes.


Annual Meeting of the Special Interest Group on Discourse and Dialogue | 2009

TELIDA: A Package for Manipulation and Visualization of Timed Linguistic Data

Titus von der Malsburg; Timo Baumann; David Schlangen

We present a toolkit for manipulating and visualising time-aligned linguistic data such as dialogue transcripts or language processing data. The package complements existing editing tools by allowing for conversion between their formats and information extraction from the raw files, and by adding sophisticated and easily extended methods for visualising the dynamics of dialogue processing. To illustrate the versatility of the package, we describe its use in three different projects at our site.


Proceedings of the International Workshop Series on Spoken Dialogue Systems Technology (IWSDS) 2016 | 2017

Recognising Conversational Speech: What an Incremental ASR Should Do for a Dialogue System and How to Get There

Timo Baumann; Casey Kennington; Julian Hough; David Schlangen

Automatic speech recognition (ASR) is not only becoming increasingly accurate, but also increasingly adapted for producing timely, incremental output. However, overall accuracy and timeliness alone are insufficient when it comes to interactive dialogue systems, which require stability in the output and responsiveness to the utterance as it is unfolding. Furthermore, for a dialogue system to deal with phenomena such as disfluencies and to achieve deep understanding of user utterances, these phenomena should be preserved or marked up for use by downstream components, such as language understanding, rather than be filtered out. Similarly, word timing can be informative for analyzing deictic expressions in a situated environment and should be available for analysis. Here we investigate the overall accuracy and incremental performance of three widely used systems and discuss their suitability for the aforementioned perspectives. From the differing performance along these measures we provide a picture of the requirements for incremental ASR in dialogue systems and describe freely available tools for using and evaluating incremental ASR.


SLPAT 2016 Workshop on Speech and Language Processing for Assistive Technologies | 2016

Navigating the Spoken Wikipedia

Marcel Rohde; Timo Baumann

The Spoken Wikipedia project unites volunteer readers of encyclopedic entries. Their recordings make encyclopedic knowledge accessible to persons who are unable to read (because of alexia or visual impairment, or because their sight is currently occupied, e.g. while driving). However, on Wikipedia, recordings are available as raw audio files that can only be consumed linearly, without the possibility for targeted navigation or search. We present a reading application which uses an alignment between the recording, text and article structure and which allows users to navigate spoken articles through a graphical or voice-based user interface (or a combination thereof). We present the results of a usability study in which we compare the two interaction modalities. We find that both types of interaction enable users to navigate articles and to find specific information much more quickly compared to a sequential presentation of the full article. In particular, when the voice user interface (VUI) is not restricted by speech recognition and understanding issues, this interface is on par with the graphical interface and thus a real option for browsing the Wikipedia without the need for vision or reading.


International Conference on Social Robotics | 2015

Incremental Speech Production for Polite and Natural Personal-Space Intrusion

Timo Baumann; Felix Lindner

We propose to use a model of personal space to initiate communication while passing a human, thereby acknowledging that humans are not just a special kind of obstacle to be avoided but potential interaction partners. As a simple form of interaction, our system communicates an apology while closely passing a human. To this end, we present a software architecture that integrates a social-spaces knowledge base and a component for incremental speech production. Incrementality ensures that the robot’s utterance can be adapted to fit the developing situation in a natural way. Observer ratings show that personal-space intrusion is perceived as both natural and polite if the robot has the capability to utter and adapt an apology in an incremental way, whereas it is perceived as unfriendly if the robot intrudes on personal space without saying anything. Moreover, the robot is perceived as less natural if it does not adapt.


TAL2018, Sixth International Symposium on Tonal Aspects of Languages | 2018

Tonality in Language: The Generative Theory of Tonal Music as a Framework for Prosodic Analysis of Poetry

Hussein Hussein; Burkhard Meyer-Sickendiek; Timo Baumann

This contribution focuses on structural similarities between tonality and cadences in music on the one hand, and rhythmical patterns in poetic language, or poetry, on the other hand. We investigate two exemplary rhythmical patterns in modern and postmodern poetry to detect these tonality-like features in poetic language: the Parlando and the Variable Foot. German poems read aloud by the original poets are collected from the webpage of our partner lyrikline. We compared these rhythmical features with tonality rules explained in two important theoretical volumes: The Generative Theory of Tonal Music and Rhythmic Phrasing in English Verse. Using both volumes, we focused on a certain combination of four different features (the grouping structure, the metrical structure, the time-span variation and the prolongation) in order to detect the two important rhythmical patterns which use tonality-like features in poetic language (Parlando and Variable Foot). Different features, including pause and parser information, are used in this classification process. The best classification result for Parlando and Variable Foot, measured by F-measure, is 0.69.

Collaboration


Top co-authors of Timo Baumann.

Hussein Hussein
Dresden University of Technology

Okko Buß
University of Potsdam