José Lopes
INESC-ID
Publications
Featured research published by José Lopes.
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2013
José Lopes; Maxine Eskenazi; Isabel Trancoso
This paper proposes an approach to the use of lexical entrainment in Spoken Dialog Systems. The approach aims to increase the dialog success rate by adapting the system's lexical choices to the user's. If the system finds that the user's lexical choice degrades performance, it tries to establish a new conceptual pact, proposing other words that the user may adopt in order to complete the task more successfully. The approach was implemented and tested in two different systems. Tests showed a relative reduction of 10% in the estimated dialog error rate and a relative reduction of 6% in the average number of turns per session.
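The adaptation loop described above can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's implementation: the class name, synonym table, and thresholds are all assumptions. The idea is that the system tracks how well each lexical form performs and, when a user's term keeps failing, proposes an alternative in its next prompt to establish a new conceptual pact.

```python
# Hypothetical sketch of lexical-entrainment adaptation; names and
# thresholds are illustrative, not from the paper.
from collections import defaultdict


class EntrainmentPolicy:
    def __init__(self, synonyms, min_uses=5, threshold=0.5):
        self.synonyms = synonyms                   # word -> list of alternatives
        self.stats = defaultdict(lambda: [0, 0])   # word -> [successes, uses]
        self.min_uses = min_uses
        self.threshold = threshold

    def observe(self, word, success):
        # Record one dialog turn in which `word` was used.
        s = self.stats[word]
        s[0] += int(success)
        s[1] += 1

    def choose_prompt_word(self, word):
        # If the user's word underperforms, propose a synonym instead.
        successes, uses = self.stats[word]
        if uses >= self.min_uses and successes / uses < self.threshold:
            for alt in self.synonyms.get(word, []):
                a_succ, a_uses = self.stats[alt]
                if a_uses < self.min_uses or a_succ / a_uses >= self.threshold:
                    return alt   # new conceptual pact
        return word
```

A policy like this would observe recognition or task outcomes per word and only intervene once a form has enough evidence of failure.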
Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL) | 2015
Raveesh Meena; José Lopes; Gabriel Skantze; Joakim Gustafson
In this paper, we present a data-driven approach for detecting instances of miscommunication in dialogue system interactions. A range of generic features that are both automatically extractable and manually annotated were used to train two models for online detection and one for offline analysis. Online detection could be used to raise the error awareness of the system, whereas offline detection could be used by a system designer to identify potential flaws in the dialogue design. In experimental evaluations on system logs from three different dialogue systems that vary in their dialogue strategy, the proposed models performed substantially better than the majority class baseline models.
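A data-driven detector of this kind can be sketched with a tiny logistic regression over generic turn features, compared against the majority-class baseline the evaluation uses. This is a toy illustration under stated assumptions: the feature names (ASR confidence, normalized turn length) and the model are not the paper's actual setup.

```python
# Toy sketch (not the paper's model): logistic regression on generic,
# automatically extractable turn features to flag miscommunication.
import math


def train_logreg(X, y, lr=0.5, epochs=200):
    w = [0.0] * (len(X[0]) + 1)            # feature weights + bias
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi                      # gradient of the log-loss
            for j, xj in enumerate(xi):
                w[j] -= lr * g * xj
            w[-1] -= lr * g
    return w


def predict(w, xi):
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
    return int(1.0 / (1.0 + math.exp(-z)) > 0.5)


# Illustrative features: [ASR confidence, normalized turn length];
# label 1 = miscommunication.
X = [[0.9, 0.2], [0.8, 0.3], [0.2, 0.9], [0.3, 0.8]]
y = [0, 0, 1, 1]
w = train_logreg(X, y)
majority_baseline = max(set(y), key=y.count)  # predicts the most frequent class
```

Online detection would apply `predict` turn by turn; offline analysis would aggregate its flags over whole system logs.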
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2011
José Lopes; Isabel Trancoso; Alberto Abad
This paper presents a nativeness classifier for English. The classifier was developed and tested with TED Talks collected from the web, where the major non-native cues are segmental aspects and prosody. The first experiments used acoustic features alone, with Gaussian supervectors training a classifier based on support vector machines, and resulted in an equal error rate of 13.11%. Experiments based on prosodic features alone did not yield good results. However, a fused system combining acoustic and prosodic cues achieved an equal error rate of 10.58%. A small human benchmark showed an inter-rater agreement of 0.88, a value very close to the agreement between humans and the best fused system.
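The two quantities at the heart of this evaluation, score-level fusion and the equal error rate (EER), can be sketched compactly. This is a simplified illustration: the scores, the fusion weight, and the linear fusion rule are assumptions, not the paper's configuration.

```python
# Simplified sketch of score fusion and EER; values are illustrative.
def eer(scores, labels):
    """Approximate EER: the operating point where the false-accept
    rate and false-reject rate are closest to equal."""
    pairs = sorted(zip(scores, labels))
    pos = sum(labels)
    neg = len(labels) - pos
    fa, fr = neg, 0            # at threshold -inf, everything is accepted
    best = 1.0
    for s, l in pairs:         # sweep the threshold upward past each score
        if l == 1:
            fr += 1            # a positive falls below the threshold
        else:
            fa -= 1            # a negative is now correctly rejected
        far, frr = fa / neg, fr / pos
        best = min(best, max(far, frr))
    return best


def fuse(acoustic, prosodic, w=0.7):
    # Linear score-level fusion of the two classifiers (weight assumed).
    return [w * a + (1 - w) * p for a, p in zip(acoustic, prosodic)]
```

In a setup like the paper's, `fuse` would combine per-talk acoustic and prosodic scores before the EER is measured on the fused list.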
IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | 2011
José Lopes; Maxine Eskenazi; Isabel Trancoso
When humans and computers use the same terms (primes), that is, when they entrain to one another, spoken dialogs proceed more smoothly. This paper describes initial steps toward eventually choosing better primes for spoken dialog system prompts automatically. Two different sets of prompts were used to understand what makes one prime more suitable than another, and the impact of the chosen primes on speech recognition was evaluated. The results reveal that users did adopt the new vocabulary introduced in the new system prompts. As a result, system performance improved, providing clues about the trade-off between choosing adequate primes for prompts and maintaining speech recognition performance.
Processing of the Portuguese Language (PROPOR) | 2008
José Lopes; Cláudio Neves; Arlindo Veiga; Alexandre M. A. Maciel; Carla Lopes; Fernando Perdigão; Luis A. S. V. de Sa
This paper describes the development of a robust speech recognition system using a database collected within the scope of the Tecnovoz project. The system is speaker-independent, robust to noise, and runs on a small-footprint embedded hardware platform. The paper addresses the database, the training of the acoustic models, the noise-suppression front-end, and the recognizer's confidence measure. Although the database was designed for specific small-vocabulary tasks, the best system performance was obtained using triphone models rather than whole-word models.
Spoken Language Technology Workshop (SLT) | 2010
José Lopes; Isabel Trancoso; Rui Correia; Thomas Pellegrini; Hugo Meinedo; Nuno J. Mamede; Maxine Eskenazi
This paper describes the integration of multimedia documents in the Portuguese version of REAP, a tutoring system for vocabulary learning. The documents result from the pipeline processing of Broadcast News videos that automatically segments the audio files, transcribes them, adds punctuation and capitalization, and breaks them into stories classified by topics. The integration of these materials in REAP was done in a way that tries to decrease the impact of potential errors of the automatic chain in the learning process.
International Conference on Multimodal Interfaces (ICMI) | 2016
Catharine Oertel; José Lopes; Yu Yu; Kenneth Alberto Funes Mora; Joakim Gustafson; Alan W. Black; Jean-Marc Odobez
Current dialogue systems typically lack variation in their audio-visual feedback tokens: either they do not produce feedback tokens at all, or they support only a limited set of stereotypical functions. This does not mirror the subtleties of spontaneous conversation. If we want to build an artificial listener, as a first step towards an empathetic artificial agent, we also need to be able to synthesize more subtle audio-visual feedback tokens. In this study, we devised an array of monomodal and multimodal binary-comparison perception tests and experiments to understand how different realisations of verbal and visual feedback tokens influence third-party perception of the degree of attentiveness. This allowed us to investigate: i) which features of the visual feedback (amplitude, frequency, duration, etc.) influence attentiveness perception; ii) whether visual or verbal backchannels are perceived as more attentive; iii) whether fusing unimodal tokens with low perceived attentiveness yields a higher degree of perceived attentiveness than unimodal tokens with high perceived attentiveness taken alone; and iv) the automatic ranking of audio-visual feedback tokens in terms of the degree of attentiveness they convey.
9th IFIP WG 5.5 International Summer Workshop on Multimodal Interfaces, eNTERFACE 2013, Lisbon, Portugal, July 15 – August 9, 2013 | 2014
Samer Al Moubayed; Jonas Beskow; Bajibabu Bollepalli; Ahmed Hussen-Abdelaziz; Martin Johansson; Maria Koutsombogera; José Lopes; Jekaterina Novikova; Catharine Oertel; Gabriel Skantze; Kalin Stefanov; Gül Varol
This project explores a novel experimental setup towards building a spoken, multi-modally rich, and human-like multiparty tutoring agent. A setup is developed and a corpus is collected that targets t ...
Processing of the Portuguese Language (PROPOR) | 2012
José Lopes; Maxine Eskenazi; Isabel Trancoso
The reliability of the confidence score is very important to Spoken Dialog System performance. This paper describes a set of experiments with previously collected offline data regarding the set of features that should be used to compute the confidence score. Three different regression methods were used to weight the features, and the results show that incorporating the confidence score given by the speech recognizer improves the confidence measure.
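Weighting a set of features into a single confidence score can be illustrated with ordinary least squares, one of the simplest regression methods. This is an illustrative sketch under assumptions: the feature layout and toy targets are invented, and the paper's three regression methods are not specified here.

```python
# Illustrative sketch: fitting feature weights for a confidence score
# with ordinary least squares (normal equations), in pure Python.
def fit_ols(X, y):
    n, d = len(X), len(X[0])
    # Build the normal equations: (X^T X) w = X^T y.
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(d)]
           for i in range(d)]
    xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(d)]
    # Solve by Gauss-Jordan elimination.
    for i in range(d):
        piv = xtx[i][i]
        for j in range(d):
            xtx[i][j] /= piv
        xty[i] /= piv
        for r in range(d):
            if r != i:
                f = xtx[r][i]
                for j in range(d):
                    xtx[r][j] -= f * xtx[i][j]
                xty[r] -= f * xty[i]
    return xty  # one weight per feature
```

In a setup like the paper's, one column of `X` would be the recognizer's own confidence score; comparing the fit with and without that column is one way to measure its contribution.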
Conference of the International Speech Communication Association (INTERSPEECH) | 2016
Spiros Georgiladakis; Georgia Athanasopoulou; Raveesh Meena; José Lopes; Arodami Chorianopoulou; Elisavet Palogiannidi; Elias Iosif; Gabriel Skantze; Alexandros Potamianos
A major challenge in Spoken Dialogue Systems (SDS) is the detection of problematic communication (hotspots), as well as the classification of these hotspots into different types (root cause analysi ...