Louis ten Bosch
Radboud University Nijmegen
Publications
Featured research published by Louis ten Bosch.
Speech Communication | 2003
Louis ten Bosch
Automatic recognition and understanding of speech are crucial steps towards natural human-machine interaction. Apart from the recognition of the word sequence, the recognition of properties such as prosody, emotion tags or stress tags may be of particular importance in this communication process. This paper discusses the possibilities for recognizing emotion from the speech signal, primarily from the viewpoint of automatic speech recognition (ASR). The general focus is on the extraction of acoustic features from the speech signal that can be used for the detection of the emotional or stress state of the speaker. After the introduction, a short overview of the ASR framework is presented. Next, we discuss the relation between the recognition of emotion and ASR, and the different approaches found in the literature that deal with the correspondence between emotions and acoustic features. The conclusion is that automatic emotional tagging of the speech signal is difficult to perform with high accuracy, but prosodic information is nevertheless potentially useful for improving dialogue handling in ASR tasks on a limited domain.
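The feature-extraction step described in this abstract can be illustrated with a small, hedged sketch. The snippet below is not taken from the paper; the file name and the choice of the librosa library are assumptions made purely for illustration. It computes two prosodic contours, fundamental frequency and short-time energy, and summarises them into a crude utterance-level feature vector of the kind that emotion or stress detectors typically build on.

```python
# Minimal sketch of prosodic feature extraction (assumed setup, not the paper's method).
import numpy as np
import librosa

# Load a mono utterance (hypothetical file path).
y, sr = librosa.load("utterance.wav", sr=16000)

# Fundamental frequency (F0) contour via probabilistic YIN; NaN marks unvoiced frames.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

# Short-time energy (RMS) contour.
rms = librosa.feature.rms(y=y)[0]

# Utterance-level summary statistics as a crude prosodic feature vector.
voiced_f0 = f0[~np.isnan(f0)]
features = {
    "f0_mean": float(np.mean(voiced_f0)) if voiced_f0.size else 0.0,
    "f0_range": float(np.ptp(voiced_f0)) if voiced_f0.size else 0.0,
    "rms_mean": float(np.mean(rms)),
    "rms_std": float(np.std(rms)),
}
print(features)
```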
Speech Communication | 2005
Louis ten Bosch; Nelleke Oostdijk; Lou Boves
In this short communication we show how shallow annotations in large speech corpora can be used to derive data about the temporal aspects of turn taking. Within the limitations of such a speech corpus, we show that the average durations of between-turn pauses made by the speakers in a dyad are statistically related, and our data suggest the existence of gender effects in the temporal aspects of turn taking. Clear differences in turn-taking behaviour between face-to-face and telephone dialogues can also be detected using shallow analyses. We discuss the most important limitations imposed by the shallowness of the annotations in large corpora, and the possibility of enriching those annotations in a semi-automatic, iterative manner.
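As an illustration of the kind of shallow analysis this paper relies on, the sketch below derives between-turn gaps from ordered turn boundaries and averages them per speaker. This is a toy example: the tuple-based turn format is an assumption, not the corpus's actual annotation scheme.

```python
# Turns as (speaker, start_time, end_time) tuples in seconds, ordered by start time.
# A between-turn pause is the gap between the end of one turn and the start of the
# next turn by the other speaker; negative gaps correspond to speech overlaps.
from collections import defaultdict

turns = [
    ("A", 0.00, 2.31),
    ("B", 2.80, 5.10),
    ("A", 5.05, 7.90),   # starts before B ends: overlap
    ("B", 8.40, 9.20),
]

pauses = defaultdict(list)  # gaps attributed to the speaker who takes the turn
for prev, nxt in zip(turns, turns[1:]):
    if prev[0] != nxt[0]:   # only speaker changes count as turn transitions
        pauses[nxt[0]].append(nxt[1] - prev[2])

for speaker, gaps in pauses.items():
    mean_gap = sum(gaps) / len(gaps)
    print(f"speaker {speaker}: mean between-turn gap {mean_gap:.2f} s over {len(gaps)} transitions")
```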
Cognitive Science | 2005
Odette Scharenborg; Dennis Norris; Louis ten Bosch; James M. McQueen
Although researchers studying human speech recognition (HSR) and automatic speech recognition (ASR) share a common interest in how information processing systems (human or machine) recognize spoken language, there is little communication between the two disciplines. We suggest that this lack of communication follows largely from the fact that research in these related fields has focused on the mechanics of how speech can be recognized. In Marr's (1982) terms, emphasis has been on the algorithmic and implementational levels rather than on the computational level. In this article, we provide a computational-level analysis of the task of speech recognition, which reveals the close parallels between research concerned with HSR and ASR. We illustrate this relation by presenting a new computational model of human spoken-word recognition, built using techniques from the field of ASR, which, in contrast to existing models of HSR, recognizes words from real speech input.
Text, Speech and Dialogue | 2004
Louis ten Bosch; Nelleke Oostdijk; Jan de Ruiter
On the basis of two-speaker spontaneous conversations, it is shown that the distributions of both pauses and speech-overlaps of telephone and face-to-face dialogues have different statistical properties. Pauses in a face-to-face dialogue last up to 4 times longer than pauses in telephone conversations in functionally comparable conditions. There is a high correlation (0.88 or larger) between the average pause duration for the two speakers across face-to-face dialogues and telephone dialogues. The data provided form a first quantitative analysis of the complex turn-taking mechanism evidenced in the dialogues available in the 9-million-word Spoken Dutch Corpus.
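The correlation reported here can be made concrete with a small sketch; the numbers below are invented for illustration and merely show how one would compute the per-dialogue correlation between the two speakers' mean pause durations.

```python
# Minimal sketch (hypothetical numbers, not the Spoken Dutch Corpus data):
# one (mean_pause_speaker1, mean_pause_speaker2) pair per dialogue, in seconds.
from scipy.stats import pearsonr

dialogues = [
    (0.42, 0.39),
    (0.95, 1.02),
    (0.61, 0.55),
    (1.30, 1.18),
    (0.50, 0.47),
]

x = [d[0] for d in dialogues]
y = [d[1] for d in dialogues]
r, p = pearsonr(x, y)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```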
Speech Communication | 1996
L.C.W. Pols; X. Wang; Louis ten Bosch
As indicated by Bourlard et al. (1996), the best and simplest solution so far in standard ASR technology for implementing durational knowledge seems to consist of imposing a (trained) minimum segment duration, simply by duplicating or adding states that cannot be skipped. We argue that recognition performance can be further improved by incorporating “specific knowledge” (such as duration and pitch) into the recognizer. This can be achieved by optimising the probabilistic acoustic and language models, and probably also by a post-processing step that is fully based on this specific knowledge. We used the widely available, hand-segmented TIMIT database to extract duration regularities that persist despite the great speaker variability. Two main approaches were used. In the first approach, duration distributions are considered for single phones, as well as for various broader classes, such as those specified by long or short vowels, word stress, syllable position within the word and within an utterance, post-vocalic consonants, and utterance speaking rate. The other approach is a hierarchically structured analysis of variance that studies the numerical contributions of 11 different factors to the variation in duration. Several systematic effects have been found, but several other effects appeared to be obscured by the inherent variability in this speech material. Whether this specific use of knowledge about duration in a post-processor will actually improve recognition performance still has to be shown. However, in line with the prophetic message of Bourlard et al.'s paper, we here consider the improvement of performance as of secondary importance.
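The state-duplication idea attributed to Bourlard et al. can be sketched as follows. This is an illustrative toy construction (the function name and the single-state starting point are assumptions), not the recognizer used in the paper: a phone modelled by one emitting state is expanded into a left-to-right chain of non-skippable copies, so at least a minimum number of frames must be spent in the phone.

```python
# Minimal sketch of enforcing a minimum phone duration by duplicating HMM states.
import numpy as np

def expand_min_duration(p_self: float, min_dur: int) -> np.ndarray:
    """Return the (min_dur + 1) x (min_dur + 1) transition matrix of the expanded
    phone model; the last row/column is a non-emitting exit state."""
    n = min_dur + 1
    A = np.zeros((n, n))
    for i in range(min_dur - 1):
        A[i, i + 1] = 1.0                    # mandatory advance: state cannot be skipped
    A[min_dur - 1, min_dur - 1] = p_self     # only the final emitting state may loop
    A[min_dur - 1, min_dur] = 1.0 - p_self   # then leave the phone
    A[min_dur, min_dur] = 1.0                # absorbing exit state
    return A

print(expand_min_duration(p_self=0.7, min_dur=3))
```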
Folia Phoniatrica Et Logopaedica | 2009
Marieke J. de Bruijn; Louis ten Bosch; Dirk J. Kuik; Hugo Quené; Johannes A. Langendijk; C. René Leemans; Irma M. Verdonck-de Leeuw
Objective: Speech impairment often occurs in patients after treatment for head and neck cancer. New treatment modalities such as surgical reconstruction or (chemo)radiation techniques aim at sparing anatomical structures that are correlated with speech and swallowing. In randomized trials investigating the efficacy of various treatment modalities or speech rehabilitation, objective speech analysis techniques may help to improve speech outcome assessment. The goal of the present study is to investigate the role of objective acoustic-phonetic analyses in a multidimensional speech assessment protocol. Patients and Methods: Speech recordings of 51 patients (6 months after reconstructive surgery and postoperative radiotherapy for oral or oropharyngeal cancer) and of 18 control speakers were subjectively evaluated regarding intelligibility, nasal resonance, articulation, and patient-reported speech outcome (speech subscale of the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire-Head and Neck 35 module). Acoustic-phonetic analyses were performed to calculate formant values of the vowels /a, i, u/, vowel space, air pressure release of /k/ and spectral slope of /x/. Results: Intelligibility, articulation, and nasal resonance were best predicted by vowel space and /k/. Within patients, /k/ and /x/ differentiated tumor site and stage. Various objective speech parameters were related to speech problems as reported by patients. Conclusion: Objective acoustic-phonetic analysis of patients' speech is feasible and contributes to further development of a speech assessment protocol.
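One of the objective measures mentioned, vowel space, can be illustrated with a short sketch; the formant values below are rough textbook-style numbers, not data from the study.

```python
# Minimal sketch: vowel space area spanned by the corner vowels /a/, /i/, /u/,
# computed from their mean F1/F2 values with the shoelace formula for a triangle.
def vowel_space_area(formants: dict) -> float:
    """formants maps vowel -> (F1, F2) in Hz; returns the triangle area in Hz^2."""
    (x1, y1), (x2, y2), (x3, y3) = formants["a"], formants["i"], formants["u"]
    return 0.5 * abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))

# Hypothetical mean formant values for an adult speaker.
mean_formants = {"a": (750.0, 1300.0), "i": (300.0, 2300.0), "u": (320.0, 800.0)}
print(f"vowel space area: {vowel_space_area(mean_formants):.0f} Hz^2")
```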
Journal of the Acoustical Society of America | 2009
Annika Hämäläinen; Michele Gubian; Louis ten Bosch; Lou Boves
Articulatory and acoustic reduction can manifest itself in the temporal and spectral domains. This study introduces a measure of spectral reduction, which is based on the speech decoding techniques commonly used in automatic speech recognizers. Using data for four frequent Dutch affixes from a large corpus of spontaneous face-to-face conversations, it builds on an earlier study examining the effects of lexical frequency on durational reduction in spoken Dutch [Pluymaekers, M. et al. (2005). J. Acoust. Soc. Am. 118, 2561-2569], and compares the proposed measure of spectral reduction with duration as a measure of reduction. The results suggest that the spectral reduction scores capture aspects of reduction other than duration. While duration can, albeit to a moderate degree, be predicted by a number of linguistically motivated variables (such as word frequency, segmental context, and speech rate), the spectral reduction scores cannot. This suggests that the spectral reduction scores capture information that is not directly accounted for by the linguistically motivated variables. The results also show that the spectral reduction scores are able to predict a substantial amount of the variation in duration that the linguistically motivated variables do not account for.
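The relation between duration, the linguistic predictors, and the spectral reduction scores can be made concrete with a hedged sketch; the data below are synthetic and the variable names are assumptions, chosen only to mirror the two-step reasoning of the abstract.

```python
# Minimal sketch: (1) predict log duration from linguistically motivated variables,
# (2) check how much of the remaining variation a spectral reduction score explains.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500
log_word_freq = rng.normal(0, 1, n)
speech_rate = rng.normal(0, 1, n)
spectral_reduction = rng.normal(0, 1, n)

# Synthetic durations: partly linguistic, partly "reduction", partly noise.
log_duration = (-0.3 * log_word_freq - 0.4 * speech_rate
                - 0.5 * spectral_reduction + rng.normal(0, 0.5, n))

# Step 1: regression on linguistic predictors only; keep the residuals.
X_ling = np.column_stack([log_word_freq, speech_rate])
residuals = log_duration - LinearRegression().fit(X_ling, log_duration).predict(X_ling)

# Step 2: does the spectral reduction score explain the leftover variation?
X_spec = spectral_reduction.reshape(-1, 1)
r2 = LinearRegression().fit(X_spec, residuals).score(X_spec, residuals)
print(f"R^2 of spectral reduction on residual duration: {r2:.2f}")
```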
Lecture Notes in Computer Science | 2004
Lou Boves; A. Neumann; Louis Vuurpijl; Louis ten Bosch; Stéphane Rossignol; Ralf Engel; Norbert Pfleger
In this paper we report on ongoing experiments with an advanced multimodal system for applications in architectural design. The system supports uninformed users in entering the relevant data about a bathroom that must be refurbished, and has been tested with 28 subjects. First, we describe the IST project COMIC, which is the context of the research. We explain how the work in COMIC goes beyond previous research in multimodal interaction for eWork and eCommerce applications that combine speech and pen input with speech and graphics output: in design applications one cannot assume that uninformed users know what they must do to satisfy the system's expectations. Consequently, substantial system guidance is necessary, which in turn creates the need to design a system architecture and an interaction strategy that allow the system to control and guide the interaction. The results of the user tests show that the appreciation of the system is mainly determined by the accuracy of the pen and speech input recognisers. In addition, the turn-taking protocol needs to be improved.
Speech Communication | 2007
Louis ten Bosch; Katrin Kirchhoff
Although it seems to go effortlessly, speech recognition by humans is a computationally intensive cognitive task. During the last decades, psycholinguistic research has unravelled many aspects of the complex mechanisms underlying human speech processing by observing human behaviour in psycholinguistic experiments and designing theories to describe and explain this behaviour. A number of theories have been implemented in the form of models for the simulation and explanation of human speech recognition. Computational models of word recognition (such as TRACE, McClelland and Elman, 1986; Shortlist, Norris, 1994; the Neighborhood Activation Model, Luce and Pisoni, 1998) led to the in-depth investigation of the decoding of the speech signal in terms of words and the underlying processes of word activation and competition. It is tempting to compare at a high level the speech decoding process by humans with the processing that takes place in automatic speech recognition. Functionally, both processes have many aspects in common. For example, both transfer audio input into some subsymbolic representation, both require a lexicon, a mechanism for matching models of words against the input speech signal, and a word search process based on competition between hypotheses. At a conceptual level, humans and computers perform the same task when decoding a signal into a sequence of lexical items. It is therefore not surprising that, while there are substantial differences with respect to methods and aims between the research areas of human speech recognition (HSR) and automatic speech recognition (ASR), in each field there is a growing interest in possible cross-fertilisation, exploiting techniques and theories from the other field (e.g. Moore and Cutler, 2001). This special issue is in part a direct result of this increased mutual interest. The theme of this issue is Bridging the Gap. Here, the word gap has two interpretations. In its first interpretation it refers to the gap mentioned above, i.e. the gap between HSR and (conventional) ASR: while HSR focuses on the fundamental understanding of human speech processing, ASR aims at the automatic decoding of the speech signal using statistical models so as to minimise word error rates.
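Purely as an illustration of the functional commonalities listed here (a lexicon, matching word models against the input, and competition between hypotheses), the toy sketch below implements that loop in a few lines; the pseudo-phonetic symbols, the match scores, and the update rule are made up and do not correspond to any particular HSR or ASR model.

```python
# Toy sketch of lexical activation and competition (illustrative only).
lexicon = {"ship": "SIp", "sheep": "Sip", "chip": "tSIp"}

def match(symbol: str, observed: str) -> float:
    """Crude per-symbol acoustic match score (1.0 for a match, small floor otherwise)."""
    return 1.0 if symbol == observed else 0.1

observed_sequence = list("SIp")            # incoming subsymbolic input, symbol by symbol
activation = {word: 1.0 for word in lexicon}

for t, obs in enumerate(observed_sequence):
    # Each word hypothesis accumulates evidence for its t-th symbol (if it has one).
    for word, phones in lexicon.items():
        activation[word] *= match(phones[t], obs) if t < len(phones) else 0.1
    # Competition: normalise so hypotheses are evaluated relative to each other.
    total = sum(activation.values())
    activation = {w: a / total for w, a in activation.items()}

print(max(activation, key=activation.get), activation)
```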
Speech Communication | 2005
Els den Os; Lou Boves; Stéphane Rossignol; Louis ten Bosch; Louis Vuurpijl
In this paper we investigate the usability of speech-centric multimodal interaction by comparing two systems that support the same unfamiliar task, viz. bathroom design. One version implements a conversational agent (CA) metaphor, while the alternative one is based on direct manipulation (DM). Twenty subjects, 10 males and 10 females, none of whom had recent experience with bathroom (re-)design, completed the same task with both systems. After each task we collected objective measures (task completion time, task completion rate, number of actions performed, speech and pen recognition errors) and subjective measures in the form of Likert scale ratings. We found that the task completion rate for the CA system is higher than for the DM system. Nevertheless, subjects did not agree on their preference for one of the systems: those subjects who were able to use the DM system effectively preferred that system, mainly because it was faster for them and they felt more in control. We conclude that for multimodal CA systems to become widely accepted, substantial improvements in system architecture and in the performance of almost all individual modules are needed.