José-Miguel Benedí
Polytechnic University of Valencia
Publication
Featured research published by José-Miguel Benedí.
international conference on computational linguistics | 2009
Alberto Barrón-Cedeño; Paolo Rosso; José-Miguel Benedí
Automatic plagiarism detection with a reference corpus compares a suspicious text to a set of original documents in order to relate the plagiarised fragments to their potential sources. Publications on this task often assume that the search space (the set of reference documents) is narrow enough that any search strategy will produce good output in a short time. However, this is not always true: reference corpora are often composed of a large set of original documents, over which a simple exhaustive search strategy becomes practically impossible. Before carrying out an exhaustive search, it is therefore necessary to reduce the search space, represented by the documents in the reference corpus, as much as possible. Our experiments with the METER corpus show that a preliminary search space reduction stage, based on the symmetric Kullback-Leibler distance, reduces search time dramatically. Additionally, it improves the precision and recall obtained by a search strategy based on the exhaustive comparison of word n-grams.
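A minimal sketch of such a reduction stage, assuming smoothed unigram word distributions per document (the function names, smoothing constant, and top-k selection are illustrative, not the paper's exact formulation):

```python
import math
from collections import Counter

def sym_kl(p_counts, q_counts, epsilon=1e-9):
    """Symmetric Kullback-Leibler distance D(P||Q) + D(Q||P) between two
    epsilon-smoothed unigram word distributions given as Counters."""
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + epsilon * len(vocab)
    q_total = sum(q_counts.values()) + epsilon * len(vocab)
    d = 0.0
    for w in vocab:
        p = (p_counts.get(w, 0) + epsilon) / p_total
        q = (q_counts.get(w, 0) + epsilon) / q_total
        # (p - q) * log(p / q) sums both directed divergences term-wise
        d += (p - q) * math.log(p / q)
    return d

def reduce_search_space(suspicious, references, k):
    """Keep only the k reference documents closest to the suspicious text,
    so the expensive exhaustive n-gram comparison runs on a small set."""
    s = Counter(suspicious.split())
    ranked = sorted(references, key=lambda doc: sym_kl(s, Counter(doc.split())))
    return ranked[:k]
```

The exhaustive word n-gram comparison would then run only over the surviving k documents.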
international conference on computational linguistics | 2009
David Pinto; José-Miguel Benedí; Paolo Rosso
Clustering short texts is a difficult task in itself, and the narrow-domain characteristic poses an additional challenge for current clustering methods. We address this problem with a new measure of distance between documents based on the symmetric Kullback-Leibler distance. Although this measure is commonly used to calculate the distance between two probability distributions, we have adapted it to obtain a distance value between two documents. We carried out experiments over two different narrow-domain corpora, and our findings indicate that this measure can be used for the addressed problem, obtaining results comparable to those achieved with the Jaccard similarity measure.
Pattern Recognition Letters | 2014
Francisco Álvaro; Joan-Andreu Sánchez; José-Miguel Benedí
This paper describes a formal model for the recognition of on-line handwritten mathematical expressions using 2D stochastic context-free grammars and hidden Markov models. Hidden Markov models are used to recognize mathematical symbols, and a stochastic context-free grammar models the relations between these symbols. This formal model makes it possible to use classic algorithms for parsing and stochastic estimation. In this way, first, the model is able to capture, during the training process, many of the variability phenomena that appear in on-line handwritten mathematical expressions. Second, the parsing process can make decisions using only stochastic information, avoiding heuristic decisions. The proposed model participated in a contest on mathematical expression recognition and obtained the best results at several levels.
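The "classic algorithms for parsing" the abstract refers to can be illustrated in one dimension by a Viterbi-style CYK parser for a stochastic CFG in Chomsky normal form; the paper's 2D grammars extend this idea to spatial relations between symbols. A hedged sketch (the grammar encoding and function name are illustrative, not the paper's formulation):

```python
from collections import defaultdict

def pcfg_viterbi_parse(words, lexical, binary, start="S"):
    """Viterbi CYK for a stochastic CFG in Chomsky normal form.
    lexical[(A, w)] = P(A -> w); binary[(A, B, C)] = P(A -> B C).
    Returns the probability of the most likely parse of `words`."""
    n = len(words)
    best = defaultdict(float)  # (i, j, A) -> max probability of A spanning words[i:j]
    # Fill in single-word spans from the lexical rules
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                best[(i, i + 1, A)] = max(best[(i, i + 1, A)], p)
    # Combine adjacent spans bottom-up with the binary rules
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    cand = p * best[(i, k, B)] * best[(k, j, C)]
                    if cand > best[(i, j, A)]:
                        best[(i, j, A)] = cand
    return best[(0, n, start)]
```

In the 2D case, the split point k becomes a spatial partition of the symbol set (horizontal, subscript, superscript, etc.), but the dynamic-programming structure is the same.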
Machine Translation | 2000
Juan Carlos Amengual; Asunción Castaño; Antonio Castellanos; Víctor M. Jiménez; David Llorens; Andrés Marzal; Federico Prat; Juan Miguel Vilar; José-Miguel Benedí; Francisco Casacuberta; Moisés Pastor; Enrique Vidal
The EuTrans project aims at using example-based approaches for the automatic development of machine translation systems accepting text and speech input for limited-domain applications. During the first phase of the project, a speech-translation system based on automatically learned subsequential transducers was built. This paper contains a detailed and mostly self-contained overview of the transducer-learning algorithms and system architecture, along with a new approach for using categories representing words or short phrases in both input and output languages. Experimental results using this approach are reported for a task involving the recognition and translation of sentences in the hotel-reception communication domain, with a vocabulary of 683 words in Spanish. A translation word-error rate of 1.97% is achieved at a real-time factor of 2.7 on a personal computer.
IEEE Transactions on Acoustics, Speech, and Signal Processing | 1988
Enrique Vidal; H. Rulot; Francisco Casacuberta; José-Miguel Benedí
The approximating and eliminating search algorithm (AESA) was recently introduced for finding nearest neighbours in metric spaces. Although the AESA was originally developed to reduce the time complexity of dynamic time-warping isolated word recognition (DTW-IWR), only rather limited experiments had previously been carried out to check its performance on this task. A set of experiments aimed at filling this gap is reported. The main results show that the important features reflected in previous simulation experiments also hold for real speech samples. With single-speaker dictionaries of up to 200 words, and for most of the speech parameterizations, local metrics, and DTW productions tested, the AESA consistently found the appropriate prototype while requiring only an average of 7-12 DTW computations (94-96% savings for 200 words), with a strong tendency to need fewer computations when the samples are close to their corresponding prototypes.
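The core of AESA — approximating by picking the candidate with the smallest accumulated lower bound, and eliminating via the triangle inequality |d(q,c) − d(c,i)| ≤ d(q,i) — can be sketched as follows (data layout and names are illustrative; the original operates on DTW distances between utterances, with all prototype-to-prototype distances precomputed offline):

```python
import math

def aesa_nearest(query, prototypes, dist, pair_d):
    """Find the nearest prototype with few calls to `dist`.
    `dist` must be a metric; pair_d[i][j] = dist(prototypes[i], prototypes[j]),
    precomputed once for the whole dictionary."""
    alive = set(range(len(prototypes)))
    lower = [0.0] * len(prototypes)   # accumulated lower bounds on d(query, i)
    best_i, best_d = None, math.inf
    n_computed = 0
    while alive:
        # Approximating step: most promising candidate = smallest lower bound
        c = min(alive, key=lambda i: lower[i])
        alive.discard(c)
        d_c = dist(query, prototypes[c])
        n_computed += 1
        if d_c < best_d:
            best_i, best_d = c, d_c
        # Eliminating step: triangle inequality gives |d_c - pair_d[c][i]| <= d(q, i)
        for i in list(alive):
            lower[i] = max(lower[i], abs(d_c - pair_d[c][i]))
            if lower[i] >= best_d:
                alive.discard(i)  # cannot beat the current best
    return best_i, best_d, n_computed
```

With well-separated prototypes, most candidates are eliminated after the first one or two distance computations, which is the behaviour the 94-96% savings figure reflects.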
Speech Communication | 1996
Antonio Castellanos; José-Miguel Benedí; Francisco Casacuberta
A noisy environment usually degrades the intelligibility of a human speaker or the performance of a speech recognizer. Due to this noise, a phenomenon appears which is caused by the articulatory changes made by speakers in order to be more intelligible in the noisy environment: the Lombard effect. Over the last few years, special emphasis has been placed on analyzing and dealing with the Lombard effect within the framework of Automatic Speech Recognition. Thus, the first purpose of the work presented in this paper was to study the possible common tendencies of some acoustic features in different phonetic units for Lombard speech. Another goal was to study the influence of gender in the characterization of the above tendencies. Extensive statistical tests were carried out for each feature and each phonetic unit, using a large Spanish continuous speech corpus. The results reported here confirm the changes produced in Lombard speech with regard to normal speech. Nevertheless, some new tendencies have been observed from the outcome of the statistical tests.
Sensors | 2013
Carlos Fernandez-Llatas; José-Miguel Benedí; Juan Miguel García-Gómez; Vicente Traver
The analysis of human behavior patterns is increasingly used in several research fields. The individualized modeling of behavior using classical techniques requires too much time and too many resources to be effective. A possible solution is the use of pattern recognition techniques to automatically infer models that allow experts to understand individual behavior. However, traditional pattern recognition algorithms infer models that are not readily understood by human experts, which limits the capacity to benefit from the inferred models. Process mining technologies can infer models as workflows, specifically designed to be understood by experts, enabling them to detect specific behavior patterns in users. In this paper, the eMotiva process mining algorithms are presented. These algorithms filter, infer and visualize workflows. The workflows are inferred from the samples produced by an indoor location system that stores the location of a resident in a nursing home. The visualization tool is able to compare and highlight behavior patterns in order to facilitate expert understanding of human behavior. This tool was tested with nine real users who were monitored over a 25-week period. The results suggest that user behavior is continuously evolving and changing, and that this change can be measured, allowing for behavioral change detection.
Speech Communication | 2008
Carlos D. Martínez-Hinarejos; José-Miguel Benedí; Ramón Granell
Dialogue systems are one of the most interesting applications of speech and language technologies. There have recently been some attempts to build dialogue systems in Spanish, and some corpora have been acquired and annotated. Using these corpora, statistical machine learning methods can be applied to try to solve problems in spoken dialogue systems. In this paper, two statistical models based on the maximum likelihood assumption are presented, and two main applications of these models on a Spanish dialogue corpus are shown: labelling and decoding. The labelling application is useful for annotating new dialogue corpora. The decoding application is useful for implementing dialogue strategies in dialogue systems. Both applications centre on unsegmented dialogue turns. The obtained results show that, although limited, the proposed statistical models are appropriate for these applications.
Computer Speech & Language | 2005
José-Miguel Benedí; Joan-Andreu Sánchez
This paper is devoted to the estimation of stochastic context-free grammars (SCFGs) and their use as language models. Classical estimation algorithms, together with new ones that consider a certain subset of derivations in the estimation process, are presented in a unified framework. This set of derivations is chosen according to both structural and statistical criteria. The estimated SCFGs have been used in a new hybrid language model to combine both a word-based n-gram, which is used to capture the local relations between words, and a category-based SCFG together with a word distribution into categories, which is defined to represent the long-term relations between these categories. We describe methods for learning these stochastic models for complex tasks, and we present an algorithm for computing the word transition probability using this hybrid language model. Finally, experiments on the UPenn Treebank corpus show significant improvements in the test set perplexity with regard to the classical word trigram models.
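The hybrid model's word transition probability can be sketched as a linear interpolation of the two components (the weight alpha and the flat category prediction are simplifications introduced here for illustration; the paper computes the category transition probability with the SCFG parsing machinery):

```python
def hybrid_word_prob(word, p_ngram, p_cat_given_hist, p_word_given_cat, alpha=0.7):
    """Interpolate a word n-gram probability with a category-based term:
    the SCFG predicts the next category given the history, and a word
    distribution into categories turns that into a word probability.
    p_ngram: n-gram probability of `word` given the history.
    p_cat_given_hist: dict category -> P(category | history).
    p_word_given_cat: dict category -> {word: P(word | category)}."""
    scfg_term = sum(p_cat_given_hist[c] * p_word_given_cat[c].get(word, 0.0)
                    for c in p_cat_given_hist)
    return alpha * p_ngram + (1.0 - alpha) * scfg_term
```

The n-gram term supplies the local word-to-word constraints; the category term contributes the longer-range structure captured by the grammar.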
international conference on document analysis and recognition | 2011
Francisco Álvaro; Joan-Andreu Sánchez; José-Miguel Benedí
In this work, a system for the recognition of printed mathematical expressions has been developed. To this end, a statistical framework based on two-dimensional stochastic context-free grammars has been defined. This formal framework makes it possible to jointly tackle the segmentation, symbol recognition and structural analysis of a mathematical expression by computing its most probable parse. In order to test this approach, a reproducible and comparable experiment has been carried out over a large, publicly available database (InftyCDB-1). Results are reported using a well-defined global dissimilarity measure. Experimental results show that this technique is able to properly recognize mathematical expressions, and that the structural information improves the symbol recognition step.