Publications


Featured research published by Roland Kuhn.


IEEE Transactions on Speech and Audio Processing | 2000

Rapid speaker adaptation in eigenvoice space

Roland Kuhn; Jean-Claude Junqua; Patrick Nguyen; Nancy Niedzielski

This paper describes a new model-based speaker adaptation algorithm called the eigenvoice approach. The approach constrains the adapted model to be a linear combination of a small number of basis vectors obtained offline from a set of reference speakers, and thus greatly reduces the number of free parameters to be estimated from adaptation data. These eigenvoice basis vectors are orthogonal to each other and guaranteed to represent the most important components of variation between the reference speakers. Experimental results for a small-vocabulary task (letter recognition) given in the paper show that the approach yields major improvements in performance for tiny amounts of adaptation data. For instance, we obtained 16% relative improvement in error rate with one letter of supervised adaptation data, and 26% relative improvement with four letters of supervised adaptation data. After a comparison of the eigenvoice approach with other speaker adaptation algorithms, the paper concludes with a discussion of future work.
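The abstract states the idea without equations. As a rough illustration only, here is a minimal NumPy sketch under three assumptions: each reference speaker's model is flattened into a supervector of Gaussian means, the eigenvoices are obtained by PCA (computed via SVD), and adaptation fits the eigenvoice weights by least squares to the few supervector dimensions covered by the adaptation data. The paper's own estimator may differ; `train_eigenvoices` and `adapt` are hypothetical names.

```python
import numpy as np


def train_eigenvoices(supervectors: np.ndarray, k: int):
    """Offline step: PCA over reference-speaker supervectors.

    supervectors: (T, D) array, one row per reference speaker (assumed to be
    the concatenated Gaussian means of that speaker's model).
    Returns the mean supervector (D,) and the top-k eigenvoices (k, D).
    """
    mean = supervectors.mean(axis=0)
    # Rows of vt are orthonormal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(supervectors - mean, full_matrices=False)
    return mean, vt[:k]


def adapt(mean, eigenvoices, observed_idx, observed_values):
    """Online step: fit k weights from a handful of observed dimensions.

    observed_idx: indices of supervector dimensions covered by adaptation data.
    observed_values: estimates of those dimensions for the new speaker.
    Returns the full adapted supervector (D,), including unobserved dimensions.
    """
    basis = eigenvoices[:, observed_idx].T         # (n_obs, k)
    target = observed_values - mean[observed_idx]  # (n_obs,)
    weights, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return mean + weights @ eigenvoices
```

Only k numbers are estimated from the adaptation data, yet every dimension of the supervector is updated, which is the sense in which the constraint "greatly reduces the number of free parameters."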


Journal of the Acoustical Society of America | 2003

Universal remote control allowing natural language modality for television and multimedia searches and requests

Roland Kuhn; Tony Davis; Jean-Claude Junqua; Yi Zhao; Weiying Li

The remote control unit supports multi-modal dialog with the user, through which the user can easily select programs for viewing or recording. The remote control houses a microphone into which the user can input natural language speech. The input speech is recognized and interpreted by a natural language parser that extracts the semantic content of the user's speech. The parser works in conjunction with an electronic program guide, through which the remote control system is able to ascertain what programs are available for viewing or recording and supply appropriate prompts to the user. In one embodiment, the remote control includes a touch screen display upon which the user may view prompts or make selections by pen input or tapping. Selections made on the touch screen automatically limit the context of the ongoing dialog between user and remote control, allowing the user to interact naturally with the unit. The remote control unit can control virtually any audio-video component, including those designed before the current technology. The remote control system can be packaged entirely within the remote control handheld unit, or components may be distributed in other systems attached to the user's multimedia equipment.


International Conference on Acoustics, Speech, and Signal Processing | 1999

Fast speaker adaptation using a priori knowledge

Roland Kuhn; Patrick Nguyen; Jean-Claude Junqua; Robert Boman; Nancy Niedzielski; Steven Fincke; Kenneth L. Field; Matteo Contolini

Previously, we presented a radically new class of fast adaptation techniques for speech recognition, based on prior knowledge of speaker variation. To obtain this prior knowledge, one applies a dimensionality reduction technique to T vectors of dimension D derived from T speaker-dependent (SD) models. This offline step yields T basis vectors, the eigenvoices. We constrain the model for new speaker S to be located in the space spanned by the first K eigenvoices. Speaker adaptation involves estimating K eigenvoice coefficients for the new speaker; typically, K is very small compared to the original dimension D. Here, we review how to find the eigenvoices, give a maximum-likelihood estimator for the new speaker's eigenvoice coefficients, and summarize mean adaptation experiments carried out on the Isolet database. We present new results which assess the impact on performance of changes in training of the SD models. Finally, we interpret the first few eigenvoices obtained.
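The estimator itself is not given in the abstract. Under the usual assumptions (each adapted Gaussian mean equals a mean-model mean plus a weighted sum of eigenvoice components, covariances are diagonal and held fixed, and frame-to-Gaussian occupation probabilities are already available, for example from a forced alignment), setting the derivative of the EM auxiliary function with respect to each weight to zero yields a K-by-K linear system. This NumPy sketch solves that system; the array layout and the names `mled_weights` and `adapted_means` are assumptions for illustration.

```python
import numpy as np


def mled_weights(eigenvoices, mean, var, obs, gamma):
    """Maximum-likelihood eigenvoice weights for one new speaker (sketch).

    eigenvoices: (K, M, d) eigenvoice components for M Gaussians of dim d.
    mean:        (M, d) mean-model Gaussian means.
    var:         (M, d) diagonal covariances (held fixed).
    obs:         (N, d) adaptation frames.
    gamma:       (N, M) occupation probability of Gaussian m at frame t.
    Solves, for each j:
      sum_k w_k sum_m occ_m e_j^T C_m^-1 e_k
        = sum_m e_j^T C_m^-1 sum_t gamma_mt (o_t - mean_m)
    """
    occ = gamma.sum(axis=0)                      # (M,) soft frame counts
    first = gamma.T @ obs - occ[:, None] * mean  # (M, d) centered first-order stats
    scaled = eigenvoices / var                   # (K, M, d): C_m^-1 e_m^(k)
    a = np.einsum("jmd,kmd,m->jk", scaled, eigenvoices, occ)
    b = np.einsum("jmd,md->j", scaled, first)
    return np.linalg.solve(a, b)


def adapted_means(eigenvoices, mean, weights):
    """Means of the adapted model: mean + sum_k w_k e^(k)."""
    return mean + np.einsum("k,kmd->md", weights, eigenvoices)
```

The system to solve is only K by K, so the cost of adaptation is governed by K rather than by the original dimension D.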


Multimedia Signal Processing | 1998

Eigenfaces and eigenvoices: dimensionality reduction for specialized pattern recognition

Roland Kuhn; Patrick Nguyen; Jean-Claude Junqua; Lloyd Goldwasser

There are hidden analogies between two dissimilar research areas: face recognition and speech recognition. The standard representations for faces and voices misleadingly suggest that they have a high number of degrees of freedom. However, human faces have two eyes, a nose, and a mouth in predictable locations; such constraints ensure that possible images of faces occupy a tiny portion of the space of possible 2D images. Similarly, physical and cultural constraints on acoustic realizations of words uttered by a particular speaker imply that the true number of degrees of freedom for speaker-dependent hidden Markov models (HMMs) is quite small. Face recognition researchers have adopted representations that make explicit the underlying low dimensionality of the task, greatly improving the performance of their systems while reducing computational costs. We argue that speech researchers should use similar techniques to represent variation between speakers, and discuss applications to speaker adaptation, speaker identification and speaker verification.


Journal of the Acoustical Society of America | 2004

Dialogue device for call screening and classification

Roland Kuhn; Matteo Contolini; Robert Boman

The call screener employs a telephone system interface connected between a telephone network and a telephone device of a user. The interface selectively routes calls (or refrains from routing them) based on the results from the dialogue system. The dialogue system elicits speech from an incoming caller and causes the telephone system interface to route calls from the incoming caller based on a comparison of the elicited speech with a set of stored speaker models. The stored speaker models may be maintained automatically by the system, using either a passive mode, in which callers whose calls exceed a predetermined duration are assumed to be “acceptable” callers, or a proactive mode, in which the system prompts the user at the end of the call to choose whether to save the speech models developed during that call in the acceptable user database. If desired, the user can attach other attributes or special tags to the stored models, indicating special handling or call routing rules to be applied when that caller calls again.
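The screening logic can be illustrated with a small Python sketch. Everything concrete in it is hypothetical: a caller's speaker model is reduced to a single feature vector, comparison uses cosine similarity against a fixed threshold, and the routing outcomes ("connect", "screen", "reject") and the "block" tag are invented names standing in for the routing rules the abstract mentions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class StoredModel:
    vector: list                            # stand-in for a real speaker model
    tags: set = field(default_factory=set)  # user-attached handling rules


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class CallScreener:
    def __init__(self, mode="passive", threshold=0.9, min_seconds=60.0):
        self.mode = mode                # "passive" or "proactive"
        self.threshold = threshold      # similarity needed to accept a match
        self.min_seconds = min_seconds  # passive-mode duration cutoff
        self.models = {}                # caller id -> StoredModel

    def route(self, elicited):
        """Compare elicited speech with stored models and pick a route."""
        best_id, best_score = None, -1.0
        for caller_id, model in self.models.items():
            score = cosine(elicited, model.vector)
            if score > best_score:
                best_id, best_score = caller_id, score
        if best_id is None or best_score < self.threshold:
            return "screen", None       # unknown caller: do not ring through
        if "block" in self.models[best_id].tags:
            return "reject", best_id
        return "connect", best_id

    def end_of_call(self, caller_id, vector, duration_s, user_accepts=False):
        """Decide whether to save this call's model as an acceptable caller."""
        if self.mode == "passive":
            keep = duration_s > self.min_seconds  # long call => acceptable
        else:
            keep = user_accepts                   # proactive: ask the user
        if keep:
            self.models[caller_id] = StoredModel(list(vector))
        return keep
```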


International Conference on Acoustics, Speech, and Signal Processing | 2001

Very fast adaptation with a compact context-dependent eigenvoice model

Roland Kuhn; Florent Perronnin; Patrick Nguyen; Jean-Claude Junqua; Luca Rigazio

The eigenvoice technique achieves rapid speaker adaptation by employing prior knowledge of speaker space obtained from reference speakers to place strong constraints on the initial model for each new speaker. It has previously been shown to yield very fast adaptation for a large-vocabulary system. In this paper, we describe a new way of applying the eigenvoice technique to context-dependent acoustic modeling, called the eigencentroid plus delta trees (EDT) model. Here, the context-dependent model is defined so that it consists of a speaker-dependent component with a small number of parameters linked to a speaker-independent component with far more parameters. The eigenvoice technique can then be applied to the speaker-dependent component alone to attain very fast adaptation of the entire context-dependent model (e.g., 10% relative reduction in error rate after 3 sentences). EDT requires only a small number of parameters to represent speaker space and works even if only a small amount of data is available per reference speaker.
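A minimal NumPy sketch of the parameter split described here, assuming that each context-dependent unit's mean is its phone's speaker-dependent centroid plus a fixed speaker-independent offset. In the paper the offsets come from delta trees; here they are simply a lookup array, and all names and shapes are illustrative assumptions. Only the centroids live in eigenvoice space, so adapting the whole context-dependent model reduces to choosing K weights.

```python
import numpy as np


def edt_means(centroid_mean, eigenvoices, weights, deltas, parent):
    """Context-dependent means under an eigencentroid-plus-delta split (sketch).

    centroid_mean: (P, d) mean centroid for each of P phones.
    eigenvoices:   (K, P, d) eigenvoices over the centroids only.
    weights:       (K,) speaker's eigenvoice weights (the only adapted values).
    deltas:        (A, d) speaker-independent offsets for A context-dependent units.
    parent:        (A,) index of the phone each unit belongs to.
    """
    # Speaker-dependent part: P * d centroid values driven by K weights.
    centroids = centroid_mean + np.tensordot(weights, eigenvoices, axes=1)
    # Speaker-independent part: every unit inherits its phone's centroid shift.
    return centroids[parent] + deltas
```

Typically A is much larger than P, so the many deltas stay shared across speakers while the few weights move every unit's mean.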


International Conference on Acoustics, Speech, and Signal Processing | 1997

Approaches to phoneme-based topic spotting: an experimental comparison

Roland Kuhn; Peter Nowell; Caroline Drouin

Topic spotting is often performed on the output of a large-vocabulary recognizer or a keyword spotter. However, this requires detailed knowledge about the vocabulary and transcribed training data. If portability to new topics and languages is important, then a topic spotter based on phoneme recognition is preferable. A phoneme recognizer is run on training data consisting of audio files labeled by topic alone; no word transcripts are required. Phoneme sub-sequences which help to predict the topic are then extracted automatically. The work described was carried out by two teams exploring three very different approaches to phoneme-based topic spotting: the DP-ngram, the decision tree, and the Euclidean approach. Results obtained by each team on the ARM (Airborne Reconnaissance Mission) and Switchboard data sets were compared by means of receiver operating characteristic (ROC) curves. The best performance for each team was obtained via a similar type of discriminative training.
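The abstract does not detail the three compared approaches, and the following is none of them. It is a deliberately simple Python sketch of the shared idea: take phoneme strings from recognizer output labeled only by topic, weight each phoneme n-gram by a smoothed log-likelihood ratio between on-topic and off-topic training data, and score a new utterance by summing the weights of its n-grams. The trigram length, add-one smoothing, and function names are assumptions.

```python
import math
from collections import Counter


def ngrams(phones, n):
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def train_llr(docs, n=3):
    """docs: (phone_list, is_on_topic) pairs -> {n-gram: log-likelihood ratio}."""
    on, off = Counter(), Counter()
    for phones, on_topic in docs:
        (on if on_topic else off).update(ngrams(phones, n))
    vocab = set(on) | set(off)
    # Add-one smoothing so n-grams unseen in one class keep a finite weight.
    on_total = sum(on.values()) + len(vocab)
    off_total = sum(off.values()) + len(vocab)
    return {g: math.log((on[g] + 1) / on_total) - math.log((off[g] + 1) / off_total)
            for g in vocab}


def top_ngrams(llr, k):
    """The k phoneme n-grams most indicative of the topic."""
    return sorted(llr, key=llr.get, reverse=True)[:k]


def score(llr, phones, n=3):
    """Positive scores favor the topic; n-grams never seen in training add 0."""
    return sum(llr.get(g, 0.0) for g in ngrams(phones, n))
```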


Annales des Télécommunications | 2000

Eigenvoices: A compact representation of speakers in model space

Patrick Nguyen; Roland Kuhn; Jean-Claude Junqua; Nancy Niedzielski; Christian Wellekens

In this article, we present a new approach to modeling speaker-dependent systems. The approach was inspired by the eigenfaces technique used in face recognition. We build a linear vector space of low dimensionality, called eigenspace, in which speakers are located. The basis vectors of this space are called eigenvoices. Each eigenvoice models a direction of inter-speaker variability. The eigenspace is built during the training phase. Then, any speaker model can be expressed as a linear combination of eigenvoices. The main benefit of this technique, as set forth in this article, is the reduction in the number of parameters that describe a model. As a result, we can reduce the number of parameters to estimate, as well as computation and/or storage costs. We apply the approach to speaker adaptation and speaker recognition. Some experimental results are supplied.

Résumé (translated from French): This article presents a new approach inspired by image recognition, adapted and applied to speech. A reduced-dimension vector space, called the eigenspace, is built, to which speakers are confined. The basis vectors of this space are called eigenvoices. Each eigenvoice models a component of inter-speaker variability. The eigenspace is built during the conventional training phase of speech-related systems. A speaker model is then associated with a linear combination of the vectors of this reduced speaker space. The advantage of this method, highlighted in the article, is the reduction in the number of parameters that characterize a model. As a result, the number of parameters to estimate is reduced, as are computation time and/or storage. The technique is applied here to speaker adaptation for an automatic speaker recognition system and to automatic speaker recognition. Some experimental results are presented.
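To illustrate the speaker-recognition use, here is a minimal NumPy sketch assuming orthonormal eigenvoices (as produced by PCA) and one supervector per speaker. Each enrolled speaker is stored as K coefficients instead of D supervector values, and a test supervector is assigned to the enrolled speaker with the nearest coefficients in Euclidean distance. The nearest-neighbor rule and the function names are assumptions, not necessarily the article's method.

```python
import numpy as np


def to_coefficients(supervector, mean, eigenvoices):
    # With orthonormal eigenvoice rows, projecting onto the eigenspace is a
    # single matrix-vector product: K numbers replace D supervector values.
    return eigenvoices @ (supervector - mean)


def enroll(speakers, mean, eigenvoices):
    """speakers: {name: supervector (D,)} -> {name: coefficients (K,)}."""
    return {name: to_coefficients(sv, mean, eigenvoices) for name, sv in speakers.items()}


def identify(supervector, enrolled, mean, eigenvoices):
    """Return the enrolled name whose coefficients are closest to the test's."""
    w = to_coefficients(supervector, mean, eigenvoices)
    return min(enrolled, key=lambda name: np.linalg.norm(enrolled[name] - w))
```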


Archive | 1999

System and method for assessing TV-related information over the internet

Roland Kuhn; Jean-Claude Junqua; Tony Davis; Weiying Li; Yi Zhao


Archive | 2004

Method and parental control and monitoring of usage of devices connected to home network

Roland Kuhn; Philippe Morin; Brian A. Hanson
