M. Inés Torres | Researchain

Publication

Featured research published by M. Inés Torres.

Knowledge-Based Systems | 2014

Extracting relevant knowledge for the detection of sarcasm and nastiness in the social web

Raquel Justo; Thomas Chase Corcoran; Stephanie M. Lukin; Marilyn A. Walker; M. Inés Torres

Automatic detection of emotions like sarcasm or nastiness in online written conversation is a difficult task. It requires a system that can draw on some kind of knowledge to recognize that emotional language is being used. In this work, we try to provide this knowledge to the system by considering alternative sets of features obtained according to different criteria. We test a range of different feature sets using two different classifiers. Our results show that the sarcasm detection task benefits from the inclusion of linguistic and semantic information sources, while nasty language is more easily detected using only a set of surface patterns or indicators.

Pattern Analysis and Applications | 2009

Phrase classes in two-level language models for ASR

Raquel Justo; M. Inés Torres

In this work, we propose and compare two different approaches to a two-level language model. Both are based on phrase classes, but they differ in how phrases are handled within the classes. We provide a complete formulation consistent with the two approaches. The proposed language models were integrated into an Automatic Speech Recognition (ASR) system and evaluated in terms of Word Error Rate. Several series of experiments were carried out over a spontaneous human–machine dialogue corpus in Spanish, where users asked for information about long-distance trains by telephone. The results show that integrating phrases into classes with the proposed language models improves the performance of an ASR system. Moreover, the results seem to indicate that the history length at which the best performance is achieved is related to the features of the model itself. Thus, not all the models show their best results with the same history length.

Ambient Media and Systems | 2008

Improving dialogue systems in a home automation environment

Raquel Justo; Oscar Saz; Víctor G. Guijarrubia; Antonio Miguel; M. Inés Torres; Eduardo Lleida

This paper presents a speech-based human-machine interaction task. The task consists of using and controlling a set of home appliances through a turn-based dialogue system. This work focuses on the first part of the dialogue system, the Automatic Speech Recognition (ASR) system. Two lines of work are pursued to improve the performance of the ASR system. On the one hand, the acoustic modeling required for ASR is improved via Speaker Adaptation techniques. On the other hand, the Language Modeling in the system is improved by the use of class-based Language Models. The results show that both techniques improve ASR performance: the Word Error Rate (WER) drops from 5.81% to 0.99% using a close-talk microphone and from 14.53% to 1.52% using a lapel microphone. An important reduction is also achieved in the Category Error Rate (CER), which measures the ability of the ASR system to extract the semantic information of the uttered sentence: it drops from 6.13% and 15.32% to 1.29% and 1.32% for the two microphones used in the experiments.
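As an illustration of the metric quoted in this abstract, the following is a minimal sketch of how a Word Error Rate is commonly computed: the word-level edit distance between a reference transcript and the recognizer output, divided by the reference length. The function names and the example sentences are assumptions made for illustration; they are not taken from the paper, whose exact scoring setup is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original error rate that was removed."""
    return (before - after) / before


# Hypothetical utterance: one deletion ("the") and one substitution.
example_wer = word_error_rate("turn on the kitchen light",
                              "turn on kitchen lights")
```

With the hypothetical utterance above, `example_wer` is 0.4 (two errors over five reference words). Applying `relative_reduction(5.81, 0.99)` to the close-talk figures reported in the abstract gives a relative WER reduction of roughly 83%.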

Speech Communication | 2008

Joining linguistic and statistical methods for Spanish-to-Basque speech translation

Alicia Pérez; M. Inés Torres; Francisco Casacuberta

The goal of this work is to develop a text and speech translation system from Spanish to Basque. These two languages differ extraordinarily in both morphology and syntax, which poses attractive challenges for machine translation. Moreover, since both languages share official status in the Basque Country, the underlying motivation is not only academic but also practical. Finite-state transducers were adopted as basic translation models. The main contribution of this work is the study of several techniques to improve probabilistic finite-state transducers by means of additional linguistic knowledge. Two methods that combine linguistics and statistics were proposed. The first performed a morphological analysis in an attempt to benefit from atomic meaningful units when rendering the meaning from one language into the other. The second approach aimed at clustering words according to their syntactic role and used such phrases as translation units. From the latter approach, phrase-based finite-state transducers arose as a natural extension of classical ones. The models were assessed on a restricted-domain task, very repetitive and with a small vocabulary. Experimental results showed that both the morphological and the syntactic approaches outperformed the baseline under different test sets and architectures for speech translation.

Text, Speech and Dialogue | 2004

A Speaker Clustering Algorithm for Fast Speaker Adaptation in Continuous Speech Recognition

Luis Javier Rodríguez; M. Inés Torres

In this paper a speaker adaptation methodology is proposed, which first automatically determines a number of speaker clusters in the training material, then estimates the parameters of the corresponding models, and finally applies a fast match strategy, based on the so-called histogram models, to choose the optimal cluster for each test utterance. The fast match strategy is critical to make this methodology useful in real applications, since carrying out several recognition passes (one for each cluster of speakers) and then selecting the decoded string with the highest likelihood would be too costly. Preliminary experimentation over two speech databases in Spanish reveals that both the clustering algorithm and the fast match strategy are consistent and reliable. The histogram models, though suboptimal (they succeeded in guessing the right cluster for unseen test speakers in 85% of the cases with read speech, and in 63% of the cases with spontaneous speech), yielded around a 6% decrease in error rate in phonetic recognition experiments.

SSPR'12/SPR'12 Proceedings of the 2012 Joint IAPR international conference on Structural, Syntactic, and Statistical Pattern Recognition | 2012

Modeling spoken dialog systems under the interactive pattern recognition framework

M. Inés Torres; José-Miguel Benedí; Raquel Justo; Fabrizio Ghigi

The Interactive Pattern Recognition (IPR) framework has recently been proposed. It lets a human interact with a Pattern Recognition system, allowing the system to learn from the interaction as well as to adapt to human behavior. The aim of this paper is to apply the principles of IPR to the design of Spoken Dialog Systems (SDS). We propose a new formulation that presents SDS as an IPR problem. To this end, some extensions to the IPR approach are proposed. Additionally, a user model based on the IPR paradigm is defined. We applied the proposed formulation to build a preliminary graphical model that has been experimentally developed to deal with a Spanish dialog task. An initial maximum likelihood strategy for the dialog manager actions, along with a stochastic simulation of user behavior, made it possible to generate new dialogs. The preliminary evaluation of these results allowed us to consider this formulation a promising framework for dealing with SDS.

Pattern Recognition Letters | 2010

Text- and speech-based phonotactic models for spoken language identification of Basque and Spanish

Víctor G. Guijarrubia; M. Inés Torres

This paper presents a series of spoken language identification experiments involving Spanish and Basque. Spanish and Basque are both official languages in the Basque Country, a region located in northern Spain. We focused our research on the study of several phonotactic-based methodologies, analysing at the same time the performance of phonotactic models trained from text and speech samples and the use of phones and phone sequences as decoding units. Although we focus mainly on Spanish-Basque identification, the analysis is later extended to English, so that more generic conclusions can be drawn. From the bilingual results, we can conclude that the text-based phonotactic models can perform similarly to the audio-based ones when applied to read speech. Moreover, when using task-specific information it is also possible to achieve high accuracy. The use of phone sequences as decoding units results, in most cases, in a decrease in performance and appears to be useful when constraining the phone decoders to those sequences. Similar conclusions can be drawn from the trilingual experiments.
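To make the idea of a phonotactic model concrete, here is a minimal sketch of one common variant: a phone bigram model with add-one smoothing is trained per language, and an unknown phone sequence is assigned to the language whose model gives it the highest log-likelihood. This is a generic illustration, not the models evaluated in the paper. The class and function names, the phone inventory, the training sequences and the language labels `lang_a` and `lang_b` are all hypothetical toy values.

```python
import math
from collections import Counter


class PhoneBigramModel:
    """Add-one smoothed bigram model over phone symbols (toy sketch)."""

    def __init__(self, sequences, phone_inventory):
        # Possible next symbols: every phone plus the end marker.
        self.vocab = set(phone_inventory) | {"</s>"}
        self.bigrams = Counter()
        self.contexts = Counter()
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def log_likelihood(self, seq):
        """Log-probability of a phone sequence under the smoothed model."""
        padded = ["<s>"] + list(seq) + ["</s>"]
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.contexts[prev] + len(self.vocab)
            total += math.log(numerator / denominator)
        return total


def identify_language(models, seq):
    """Return the label of the model that scores the sequence highest."""
    return max(models, key=lambda label: models[label].log_likelihood(seq))


# Hypothetical toy data: made-up phone strings, not real language samples.
INVENTORY = ["a", "e", "k", "s", "t", "z"]
MODELS = {
    "lang_a": PhoneBigramModel([["k", "a", "s", "a"], ["s", "a", "k", "a"]],
                               INVENTORY),
    "lang_b": PhoneBigramModel([["e", "t", "z", "e"], ["z", "e", "t", "e"]],
                               INVENTORY),
}
```

In this sketch the training sequences could equally come from phone-decoded audio or from text converted to phones, which is the text-based versus speech-based distinction the abstract draws; the scoring step is the same in both cases.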

Iberian Conference on Pattern Recognition and Image Analysis | 2011

Impact of the approaches involved on word-graph derivation from the ASR system

Raquel Justo; Alicia Pérez; M. Inés Torres

Finding the most likely sequence of symbols given a sequence of observations is a classical pattern recognition problem. This problem is frequently approached by means of the Viterbi algorithm, which aims at finding the most likely sequence of states within a trellis given a sequence of observations. The Viterbi algorithm is widely used within the automatic speech recognition (ASR) framework to find the expected sequence of words given the acoustic utterance, even though it provides a suboptimal result. Word-graphs (WGs) are also frequently provided as the ASR output as a means of obtaining alternative hypotheses, hopefully more accurate than the one provided by the Viterbi algorithm. The trouble is that WGs can grow in a computationally very inefficient manner. The aim of this work is to fully describe a specific, computationally affordable method for obtaining a WG given the input utterance. The paper focuses specifically on the underlying approaches and their influence on both the spatial cost and the performance.
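The trellis search mentioned in this abstract can be sketched with the textbook Viterbi algorithm over a small hidden Markov model. This is only the generic single-best-path search, not the word-graph derivation method the paper describes. The function name, the two-state model and all of its probabilities are hypothetical toy values chosen for illustration.

```python
import math


def _log(p):
    """Logarithm that maps zero probability to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state path and its joint probability."""
    if not observations:
        raise ValueError("observations must not be empty")
    # best[t][s]: log-probability of the best path ending in state s at time t.
    best = [{s: _log(start_p[s]) + _log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev = max(states,
                       key=lambda p: best[t - 1][p] + _log(trans_p[p][s]))
            best[t][s] = (best[t - 1][prev] + _log(trans_p[prev][s])
                          + _log(emit_p[s][observations[t]]))
            back[t][s] = prev
    # Trace back from the best final state to recover the path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, math.exp(best[-1][last])


# Hypothetical two-state toy model.
STATES = ["Healthy", "Fever"]
START_P = {"Healthy": 0.6, "Fever": 0.4}
TRANS_P = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
EMIT_P = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

best_path, best_prob = viterbi(["normal", "cold", "dizzy"],
                               STATES, START_P, TRANS_P, EMIT_P)
```

For the toy model above, `best_path` is `["Healthy", "Healthy", "Fever"]` with a joint probability of 0.01512. The search keeps a single predecessor per state and time step, which is why it returns one hypothesis only; a word-graph, by contrast, retains alternative hypotheses.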

Natural Language Dialog Systems and Intelligent Assistants | 2015

Decision Making Strategies for Finite-State Bi-automaton in Dialog Management

Fabrizio Ghigi; M. Inés Torres

Stochastic regular bi-languages have recently been proposed to model the joint probability distributions appearing in some statistical approaches to spoken dialog systems. To this end, a deterministic and probabilistic finite-state bi-automaton was defined to model the probability distributions of the dialog model. In this work we propose and evaluate decision strategies over the defined probabilistic finite-state bi-automaton to select the best system action at each step of the interaction. To this end, the paper proposes some heuristic decision functions that consider both the action probabilities learned from a corpus and the number of known attributes at running time. We compare heuristics based either on a single next turn or on entire paths over the automaton. Experimental evaluation was carried out to test the model and the strategies over the Let’s Go Bus Information system. The results obtained show good system performance. They also show that local decisions can lead to better system performance than best path-based decisions, due to the unpredictability of user behavior.

Language and Speech | 2006

Spontaneous speech events in two speech databases of human-computer and human-human dialogs in Spanish.

Luis Javier Rodríguez; M. Inés Torres

Previous work in English has revealed that disfluencies follow regular patterns and that incorporating them into the language model of a speech recognizer leads to lower perplexities and sometimes to better performance. Although work on disfluency modeling has been applied outside the English community (e.g., in Japanese), as far as we know there is no specific work dealing with disfluencies in Spanish. In this paper, we follow a data-driven approach in exploring the potential benefit of modeling disfluencies in a speech recognizer in Spanish. Two databases of human-computer and human-human dialogs are considered, which allow the absolute and relative frequencies of disfluencies in the two situations to be compared. The rate of disfluencies in human-human dialogs is found to be very close to that found for similar databases in English. Due to setup factors, the rate of disfluencies found in human-computer dialogs was remarkably higher than that reported for similar databases in English. In any case, from the point of view of speech recognition, the high frequencies of disfluencies and the distinct features of the acoustic events related to them support the need for explicit acoustic models. The regularities observed in the distribution of filled pauses and speech repairs reveal that including them in the language model of the speech recognizer may also be helpful. The extent to which the number of events depends on utterance length and on the speaker is also explored. Statistics are shown that follow previous studies for English, and a sizeable space is devoted to comparing our results with them. Finally, various possible cues for the automatic detection of speech repairs (a key issue from the point of view of speech understanding) are explored: silent pauses, filled pauses, lengthenings, cut-off words and discourse markers. As previously observed for English, none of them was found to be reliable by itself. More information, especially at the acoustic-prosodic level, is no doubt needed to reliably detect speech repairs.
