Carlos D. Martínez-Hinarejos
Polytechnic University of Valencia
Publication
Featured research published by Carlos D. Martínez-Hinarejos.
international conference on pattern recognition | 2000
Carlos D. Martínez-Hinarejos; Alfons Juan; Francisco Casacuberta
A string that minimizes the sum of distances to the strings of a given set is known as the (generalized) median string of the set. This concept is important in pattern recognition for modelling a (large) set of garbled strings or patterns. Finding such a string is an NP-hard problem, so no efficient exact algorithm for computing median strings is known. A greedy approach has been proposed to compute an approximate median string of a set of strings. In this work, an algorithm is proposed that iteratively improves the approximate solution produced by that greedy approach. Experiments have been carried out on synthetic and real data to compare the performance of the approximate median string with that of the conventional set median. These experiments showed that the proposed median string is a better representation of a given set than the corresponding set median.
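The definitions in this abstract can be illustrated with a minimal sketch. It assumes unit-cost Levenshtein distance as the string distance and uses a simple single-edit local search as the refinement step; this is a generic illustration, not the algorithm from the paper, and the function names (`levenshtein`, `sum_dist`, `set_median`, `refine_median`) are hypothetical.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (an assumed choice of string distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def sum_dist(candidate: str, strings: Iterable[str]) -> int:
    """Objective minimized by the median string."""
    return sum(levenshtein(candidate, s) for s in strings)


def set_median(strings: List[str]) -> str:
    """Best string chosen only among the members of the set."""
    return min(strings, key=lambda s: sum_dist(s, strings))


def refine_median(strings: List[str], start: str) -> str:
    """Hypothetical local search: apply single edits while the objective drops."""
    alphabet = sorted(set("".join(strings)))
    best, best_cost = start, sum_dist(start, strings)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) + 1):
            # Insertions before position i.
            candidates = [best[:i] + c + best[i:] for c in alphabet]
            if i < len(best):
                # Deletion and substitutions at position i.
                candidates.append(best[:i] + best[i + 1:])
                candidates += [best[:i] + c + best[i + 1:] for c in alphabet]
            for cand in candidates:
                cost = sum_dist(cand, strings)
                if cost < best_cost:
                    best, best_cost, improved = cand, cost, True
                    break
            if improved:
                break  # restart the scan from the improved string
    return best


strings = ["axc", "abx", "xbc"]  # made-up toy data
start = set_median(strings)
refined = refine_median(strings, start)
print(start, sum_dist(start, strings), refined, sum_dist(refined, strings))
```

On this toy set every member is at distance 2 from the other two, while the string "abc", which is not in the set, is at distance 1 from each, so a refined string can score lower than the set median. The search only accepts strict improvements, so it terminates, but it may stop at a local optimum.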
Pattern Recognition Letters | 2003
Carlos D. Martínez-Hinarejos; Alfons Juan; Francisco Casacuberta
Modelling a (large) set of garbled patterns with a prototype is an important issue in pattern recognition. When strings are used as object representations, the representative prototype can be a (generalized) median string. The median string of a set of strings can be defined as the string that minimizes the sum of distances to the strings of the set. Finding such a string is an NP-hard problem, so no efficient exact algorithm for computing median strings is known. Thus, the use of the set median string, which is the string in the set that minimizes the sum of distances to the strings of the set, is very common. Recently, a greedy approach was proposed to compute good approximations to the median string of a set of strings. In this work, the use of approximated median strings with k-nearest-neighbours classifiers is presented. Exhaustive experiments have been carried out on a corpus of chromosomes. These experiments showed that the proposed approximations to the median string are a better representation of a given set than the corresponding set median.
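As a rough illustration of how string prototypes can feed a k-nearest-neighbours classifier, the sketch below classifies a query string by majority vote among its k closest labelled prototypes. It assumes unit-cost Levenshtein distance; the function name `knn_classify`, the prototype strings, and the labels are all made up for the example and do not come from the chromosome corpus.

```python
from collections import Counter
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (an assumed choice of string distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def knn_classify(query: str, prototypes: List[Tuple[str, str]], k: int = 1) -> str:
    """Majority vote among the k prototypes closest to the query."""
    nearest = sorted(prototypes, key=lambda p: levenshtein(query, p[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical prototypes: one representative string per class,
# e.g. a set median or an approximate median string.
prototypes = [("aaaa", "A"), ("bbbb", "B")]
print(knn_classify("aaab", prototypes, k=1))
```

With one prototype per class and k = 1 this reduces to a nearest-prototype rule, which is where the quality of the chosen representative matters most.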
Speech Communication | 2008
Carlos D. Martínez-Hinarejos; José-Miguel Benedí; Ramón Granell
Dialogue systems are one of the most interesting applications of speech and language technologies. There have recently been some attempts to build dialogue systems in Spanish, and some corpora have been acquired and annotated. Using these corpora, statistical machine learning methods can be applied to try to solve problems in spoken dialogue systems. In this paper, two statistical models based on the maximum likelihood assumption are presented, and two main applications of these models on a Spanish dialogue corpus are shown: labelling and decoding. The labelling application is useful for annotating new dialogue corpora. The decoding application is useful for implementing dialogue strategies in dialogue systems. Both applications centre on unsegmented dialogue turns. The obtained results show that, although limited, the proposed statistical models are appropriate for these applications.
international conference on document analysis and recognition | 2015
Emilio Granell; Carlos D. Martínez-Hinarejos
Transcription of historical documents is an interesting task for libraries that want to make their collections available. In recent years, the use of Handwritten Text Recognition has allowed paleographers to speed up the manual transcription process, since they can work by correcting a draft transcription. Another alternative is obtaining the draft transcription by dictating the contents to an Automatic Speech Recognition system. When both sources (image and speech) are available, a multimodal combination is possible, and an iterative process can be used to refine the final hypothesis. In this work, a multimodal combination based on confusion networks is presented. Results on two different sets of data, with different levels of difficulty, show that the proposed technique provides similar or better draft transcriptions than a previously proposed approach, allowing for a faster transcription process.
Pattern Recognition Letters | 2014
Vicent Alabau; Carlos D. Martínez-Hinarejos; Verónica Romero; Antonio L. Lagarda
The transcription of historical documents is one of the most interesting tasks to which Handwritten Text Recognition can be applied, given its value for humanities research. One alternative for transcribing ancient manuscripts is speech dictation using Automatic Speech Recognition techniques. Both alternatives employ similar models (Hidden Markov Models and n-grams) and decoding processes (Viterbi decoding), which allows the two modalities to be combined with little difficulty. In this work, we explore the possibility of using the recognition results of one modality to restrict the decoding process of the other modality, and apply this process iteratively. Results of these multimodal iterative alternatives are significantly better than the baseline unimodal systems and better than the non-iterative alternatives.
document engineering | 2016
Emilio Granell; Carlos D. Martínez-Hinarejos
Transcription of handwritten historical documents is one of the main topics in document analysis systems, for cultural reasons. State-of-the-art handwritten text recognition systems allow the transcription task to be sped up. Currently, this automatic transcription is far from perfect, and human expert revision is required to obtain the actual transcription. In this context, crowdsourcing has emerged as a powerful tool for massive transcription at a relatively low cost, since the supervision effort of professional transcribers may be dramatically reduced. However, current transcription crowdsourcing platforms are mainly limited to non-mobile devices, since the keyboards of mobile devices are not friendly enough for most users. This work presents the alternative of using speech dictation of handwritten text lines as the transcription source in a crowdsourcing platform. The experiments explore how an initial handwritten text recognition hypothesis can be improved by using the contribution of speech recognition from several speakers, providing as a final result a better hypothesis that a professional transcriber can amend with less effort.
computer analysis of images and patterns | 2015
Emilio Granell; Carlos D. Martínez-Hinarejos
Transcription of digitised historical documents is an interesting task in the document analysis area. This transcription can be achieved by using Handwritten Text Recognition (HTR) on digitised pages or by using Automatic Speech Recognition (ASR) on the dictation of contents. Another option is using both systems in a multimodal combination to obtain a draft transcription, given that combining the outputs of different recognition systems will generally improve the recognition accuracy. In this work, we present a new combination method based on confusion networks. We check its effectiveness for transcribing a Spanish historical book. Results on both unimodal combination with different optical (for HTR) and acoustic (for ASR) models, and multimodal combination, show a relative reduction of Word and Character Error Rate of 14.3% and 16.6%, respectively, over the HTR baseline.
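For readers unfamiliar with the metric, the sketch below shows how a word error rate and a relative reduction over a baseline are commonly computed. It assumes the usual definition of WER as word-level edit distance divided by reference length; the function names and all sample values are made up for illustration and are not the paper's data.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Unit-cost edit distance over token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


# Made-up example: one substitution and one deletion against a 4-word reference.
print(wer("a b c d", "a x c"))
# Made-up error rates: going from 20% to 15% is a 25% relative reduction.
print(relative_reduction(20.0, 15.0))
```

Character error rate follows the same pattern with characters instead of words as tokens.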
Natural Language Engineering | 2012
Vicent Tamarit; Carlos D. Martínez-Hinarejos; José-Miguel Benedí
In dialogue systems it is important to label the dialogue turns with dialogue-related meaning. Each turn is usually divided into segments and these segments are labelled with dialogue acts (DAs). A DA is a representation of the functional role of the segment. Each segment is labelled with one DA, representing its role in the ongoing discourse. The sequence of DAs given a dialogue turn is used by the dialogue manager to understand the turn. Probabilistic models that perform DA labelling can be used on segmented or unsegmented turns. The latter option is more likely for a practical dialogue system, but it provides poorer results. In that case, a hypothesis for the number of segments can be provided to improve the results. We propose some methods to estimate the probability of the number of segments based on the transcription of the turn. The new labelling model includes the estimation of the probability of the number of segments in the turn. We tested this new approach with two different dialogue corpora: SwitchBoard and Dihana. The results show that this inclusion significantly improves the labelling accuracy.
Spoken Dialogue Systems Technology and Design | 2011
Vicent Tamarit; Carlos D. Martínez-Hinarejos; José-Miguel Benedí
The implementation of dialogue systems is one of the most interesting applications of language technologies. Statistical models can be used in this implementation, allowing for a more flexible approach than when using rules defined by a human expert. However, statistical models require large amounts of dialogues annotated with dialogue-function labels (usually Dialogue Acts), and the annotation process is hard and time-consuming. Consequently, the use of other statistical models to obtain faster annotations is of real interest for the development of dialogue systems. In this work we compare two statistical models for dialogue annotation: a more classical Hidden Markov Model (HMM) based model and the new N-gram Transducers (NGT) model. This comparison is performed on two corpora of different nature, the well-known SwitchBoard corpus and the DIHANA corpus. The results show that the NGT model produces a much more accurate annotation than the HMM-based model (as much as 11% less error on the SwitchBoard corpus).
IWCS-8 '09 Proceedings of the Eighth International Conference on Computational Semantics | 2009
Carlos D. Martínez-Hinarejos
A dialogue system is usually defined as a computer system that interacts with a human user to achieve a task using dialogue [5]. In these systems, the computer must know the meaning and intention of the user input in order to give the appropriate answer. The user turns must be interpreted by the system, taking into account only the essential information, i.e., their semantics for the dialogue process and the task to be accomplished. This information is usually represented by labels called Dialogue Acts (DA) [2], which label different segments of the turn known as utterances [8]. The DA labels usually take into account the semantics of the utterance with respect to the dialogue process, but they can include semantic information related to the task the dialogue is about.