Francisco Casacuberta

Publication

Featured research published by Francisco Casacuberta.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2005

Probabilistic finite-state machines - part II

Enrique Vidal; Franck Thollard; C. de la Higuera; Francisco Casacuberta; Rafael C. Carrasco

Probabilistic finite-state machines are used today in a variety of areas in pattern recognition or in fields to which pattern recognition is linked. In part I of this paper, we surveyed these objects and studied their properties. In this part, we study the relations between probabilistic finite-state automata and other well-known devices that generate strings, such as hidden Markov models and n-grams, and provide theorems, algorithms, and properties that represent the current state of the art of these objects.

Computational Linguistics | 2004

Machine Translation with Inferred Stochastic Finite-State Transducers

Francisco Casacuberta; Enrique Vidal

Finite-state transducers are models that are being used in different areas of pattern recognition and computational linguistics. One of these areas is machine translation, in which the approaches that are based on building models automatically from training examples are becoming more and more attractive. Finite-state transducers are very adequate for use in constrained tasks in which training samples of pairs of sentences are available. A technique for inferring finite-state transducers is proposed in this article. This technique is based on formal relations between finite-state transducers and rational grammars. Given a training corpus of source-target pairs of sentences, the proposed approach uses statistical alignment methods to produce a set of conventional strings from which a stochastic rational grammar (e.g., an n-gram) is inferred. This grammar is finally converted into a finite-state transducer. The proposed methods are assessed through a series of machine translation experiments within the framework of the EuTrans project.

Computational Linguistics | 2009

Statistical approaches to computer-assisted translation

Sergio Barrachina; Oliver Bender; Francisco Casacuberta; Jorge Civera; Elsa Cubel; Shahram Khadivi; Antonio L. Lagarda; Hermann Ney; Jesús Tomás; Enrique Vidal; Juan Miguel Vilar

Current machine translation (MT) systems are still not perfect. In practice, the output from these systems needs to be edited to correct errors. A way of increasing the productivity of the whole translation process (MT plus human work) is to incorporate the human correction activities within the translation process itself, thereby shifting the MT paradigm to that of computer-assisted translation. This model entails an iterative process in which the human translator activity is included in the loop: in each iteration, a prefix of the translation is validated (accepted or amended) by the human and the system computes its best (or n-best) translation suffix hypothesis to complete this prefix. A successful framework for MT is the so-called statistical (or pattern recognition) framework. Interestingly, within this framework, the adaptation of MT systems to the interactive scenario affects mainly the search process, allowing a great reuse of successful techniques and models. In this article, alignment templates, phrase-based models, and stochastic finite-state transducers are used to develop computer-assisted translation systems. These systems were assessed in a European project (TransType2) in two real tasks: the translation of printer manuals and the translation of the Bulletin of the European Union. In each task, the following three pairs of languages were involved (in both translation directions): English-Spanish, English-German, and English-French.
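
The prefix-validation loop described in this abstract can be illustrated with a small Python sketch. It is not the search procedure of the TransType2 systems: the fixed n-best list, the scores, the simulated user, and the names `best_suffix` and `interactive_session` are all assumptions made for illustration. A real system would run a new search constrained by the validated prefix instead of filtering a static list.

```python
from typing import List, Sequence, Tuple

# (score, words): a toy stand-in for the output of a real MT search.
Hypothesis = Tuple[float, List[str]]


def best_suffix(prefix: Sequence[str], nbest: Sequence[Hypothesis]) -> List[str]:
    """Suffix of the highest-scoring hypothesis that starts with `prefix`."""
    compatible = [(score, words) for score, words in nbest
                  if words[:len(prefix)] == list(prefix)]
    if not compatible:
        return []  # toy behaviour: no completion is offered
    _, words = max(compatible, key=lambda h: h[0])
    return words[len(prefix):]


def interactive_session(reference: Sequence[str],
                        nbest: Sequence[Hypothesis]) -> int:
    """Simulate a user who fixes the first wrong word; count the corrections."""
    reference = list(reference)
    prefix: List[str] = []
    corrections = 0
    while True:
        proposal = prefix + best_suffix(prefix, nbest)
        if proposal == reference:
            return corrections
        # Length of the part of the proposal that the user accepts as is.
        agree = 0
        while (agree < len(proposal) and agree < len(reference)
               and proposal[agree] == reference[agree]):
            agree += 1
        # The user amends one position, which extends the validated prefix.
        prefix = reference[:agree + 1]
        corrections += 1
        if prefix == reference:
            return corrections


reference = "click the print button".split()
nbest = [
    (0.5, "press the print key".split()),
    (0.3, "click the printing button".split()),
    (0.2, "click the print button".split()),
]
print(interactive_session(reference, nbest))
```

Each correction changes only the constraint on the search, which matches the abstract's remark that the interactive scenario mainly affects the search process while the models themselves can be reused.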

International Journal of Pattern Recognition and Artificial Intelligence | 2004

INTEGRATED HANDWRITING RECOGNITION AND INTERPRETATION USING FINITE-STATE MODELS

Alejandro Héctor Toselli; Alfons Juan; Jorge González; Ismael Salvador; Enrique Vidal; Francisco Casacuberta; Daniel Keysers; Hermann Ney

The interpretation of handwritten sentences is carried out using a holistic approach in which both text image recognition and the interpretation itself are tightly integrated. Conventional approaches follow a serial, first-recognition, then-interpretation scheme which cannot adequately use semantic–pragmatic knowledge to recover from recognition errors. Stochastic finite-state transducers are shown to be suitable models for this integration, permitting a full exploitation of the final interpretation constraints. Continuous-density hidden Markov models are embedded in the edges of the transducer to account for lexical and morphological constraints. Robustness with respect to stroke vertical variability is achieved by integrating tangent vectors into the emission densities of these models. Experimental results are reported on a syntax-constrained interpretation task which show the effectiveness of the proposed approaches. These results are also shown to be comparatively better than those achieved with other conventional, N-gram-based techniques which do not take advantage of full integration.

Theoretical Computer Science | 1999

Topology of strings: median string is NP-complete

C. de la Higuera; Francisco Casacuberta

Given a set of strings, the problem of finding a string that minimises its distance to the set is directly related to problems frequently encountered in areas involving pattern recognition or computational biology. Based on the Levenshtein (or edit) distance, different definitions of distances between a string and a set of strings can be adopted. In particular, if this definition is the sum of the distances to each string of the set, the string that minimises this distance is the (generalised) median string. Finding this string corresponds in speech recognition to giving a model for a set of acoustic sequences, and in computational biology to constructing an optimal evolutionary tree when the given phylogeny is a star. Efficient algorithms are known only for finding approximate solutions. The results in this paper are combinatorial and negative. We prove that computing the median string corresponds to an NP-complete decision problem, thus proving that this problem is NP-hard.
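
To make the definitions in this abstract concrete, here is a minimal Python sketch of the Levenshtein distance and of the sum-of-distances criterion. Since the generalised median is NP-hard to compute, the sketch only searches among the input strings themselves (often called the set median), which is a common approximation and not a method taken from the paper. Unit edit costs, the sample strings, and the function names are assumptions.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion and substitution."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        previous = current
    return previous[-1]


def sum_of_distances(candidate: str, strings: Sequence[str]) -> int:
    """Distance from a candidate to the set: sum of the edit distances."""
    return sum(levenshtein(candidate, s) for s in strings)


def set_median(strings: Sequence[str]) -> str:
    """Best candidate among the input strings only (an approximation)."""
    return min(strings, key=lambda c: sum_of_distances(c, strings))


samples = ["kitten", "sitten", "sittin", "sitting"]
best = set_median(samples)
print(best, sum_of_distances(best, samples))
```

The generalised median may be a string that does not occur in the set, so the value returned here is only an upper bound on the optimal sum of distances.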

International Colloquium on Grammatical Inference | 2000

Computational Complexity of Problems on Probabilistic Grammars and Transducers

Francisco Casacuberta; Colin de la Higuera

Determinism plays an important role in grammatical inference. However, in practice, ambiguous grammars (and non-deterministic grammars in particular) are used more often than deterministic grammars. Computing the probability of parsing a given string or its most probable parse with stochastic regular grammars can be performed in linear time. However, the problem of finding the most probable string has not yet been given a satisfactory answer. In this paper we prove that the problem is NP-hard and does not allow for a polynomial time approximation scheme. The result extends to stochastic regular syntax-directed translation schemes.
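
The difference between the probability of a string and the probability of its best parse can be seen on a tiny non-deterministic automaton. The Python sketch below uses a probabilistic finite-state automaton as a stand-in for a stochastic regular grammar; the automaton, its probabilities, and all names are hypothetical and chosen only for illustration. Both quantities are computed in a single left-to-right pass over the string, which is linear in its length for a fixed automaton.

```python
from collections import defaultdict

# Toy automaton (hypothetical): state 0 is initial, states 1 to 3 are final.
INITIAL = {0: 1.0}
FINAL = {1: 1.0, 2: 1.0, 3: 1.0}
TRANSITIONS = {
    (0, "a"): [(1, 0.4)],
    (0, "b"): [(2, 0.3), (3, 0.3)],  # two competing paths for the same symbol
}


def score(string, combine):
    """One pass over the string, merging path scores with `combine`."""
    scores = dict(INITIAL)
    for symbol in string:
        following = defaultdict(float)
        for state, mass in scores.items():
            for target, p in TRANSITIONS.get((state, symbol), []):
                following[target] = combine(following[target], mass * p)
        scores = following
    result = 0.0
    for state, mass in scores.items():
        result = combine(result, mass * FINAL.get(state, 0.0))
    return result


def string_probability(string):
    """Forward computation: sum over every accepting path."""
    return score(string, lambda x, y: x + y)


def best_path_probability(string):
    """Viterbi-style computation: keep only the best accepting path."""
    return score(string, max)


for s in ("a", "b"):
    print(s, string_probability(s), best_path_probability(s))
```

In this toy automaton the single best path spells "a", while "b" is the more probable string because its probability is spread over two paths. Scoring a given string is cheap; the hard problem addressed in the abstract is searching over all strings for the most probable one.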

Computer Speech & Language | 2004

Some approaches to statistical and finite-state speech-to-speech translation

Francisco Casacuberta; Hermann Ney; Franz Josef Och; Enrique Vidal; Juan Miguel Vilar; Sergio Barrachina; I. García-Varea; D. Llorens; César Martínez; Sirko Molau; Francisco Nevado; Moisés Pastor; David Picó; Alberto Sanchis; C. Tillmann

Speech-input translation can be properly approached as a pattern recognition problem by means of statistical alignment models and stochastic finite-state transducers. Under this general framework, some specific models are presented. One of the features of such models is their capability of automatically learning from training examples. Moreover, the stochastic finite-state transducers permit an integrated architecture similar to one used in speech recognition. In this case, the acoustic models (hidden Markov models) are embedded into the finite-state transducers, and the translation of a source utterance is the result of a (Viterbi) search on the integrated network. These approaches have been followed in the framework of the European project EuTrans. Translation experiments have been performed from Spanish to English and from Italian to English in an application involving the interaction of a customer with a receptionist at the front desk of a hotel.

Archive | 2011

Multimodal Interactive Pattern Recognition and Applications

Alejandro Héctor Toselli; Enrique Vidal; Francisco Casacuberta

This book presents a different approach to pattern recognition (PR) systems, in which users of a system are involved during the recognition process. This can help to avoid later errors and reduce the costs associated with post-processing. The book also examines a range of advanced multimodal interactions between the machine and the users, including handwriting, speech and gestures. Features: presents an introduction to the fundamental concepts and general PR approaches for multimodal interaction modeling and search (or inference); provides numerous examples and a helpful Glossary; discusses approaches for computer-assisted transcription of handwritten and spoken documents; examines systems for computer-assisted language translation, interactive text generation and parsing, relevance-based image retrieval, and interactive document layout analysis; reviews several full working prototypes of multimodal interactive PR applications, including live demonstrations that can be publicly accessed on the Internet.

International Journal of Pattern Recognition and Artificial Intelligence | 2002

CYCLIC SEQUENCE ALIGNMENTS: APPROXIMATE VERSUS OPTIMAL TECHNIQUES

Ramón Alberto Mollineda; Enrique Vidal; Francisco Casacuberta

The problem of cyclic sequence alignment is considered. Most existing optimal methods for comparing cyclic sequences are very time consuming. For applications where these alignments are intensively used, optimal methods are seldom a feasible choice. The alternative to an exact and costly solution is to use a close-to-optimal but cheaper approach. In previous work, we have presented three suboptimal techniques inspired by the quadratic-time suboptimal algorithm proposed by Bunke and Buhler. Do these approximate approaches come sufficiently close to the optimal solution, with a considerable reduction in computing time? Is it thus worthwhile investigating these approximate methods? This paper shows that approximate techniques are good alternatives to optimal methods.
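
The cost of an exact cyclic comparison can be seen in a brute-force Python sketch that tries every rotation of one sequence and keeps the smallest edit distance. This is not one of the techniques evaluated in the paper, neither the optimal nor the approximate ones. Unit edit costs and the function names are assumptions, as is the simplification of rotating only one of the two strings, which relies on edit costs that do not depend on position.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion and substitution."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        previous = current
    return previous[-1]


def cyclic_distance(a: str, b: str) -> int:
    """Smallest edit distance between b and any rotation of a (brute force)."""
    if not a:
        return len(b)
    return min(levenshtein(a[k:] + a[:k], b) for k in range(len(a)))


print(levenshtein("abcd", "cdab"), cyclic_distance("abcd", "cdab"))
```

With strings of lengths n and m, this runs n quadratic-time alignments, for a total cost on the order of n squared times m. That overhead is what makes cheaper, close-to-optimal techniques attractive when many alignments are needed.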

Machine Translation | 2000

The EuTrans Spoken Language Translation System

Juan Carlos Amengual; Asunción Castaño; Antonio Castellanos; Víctor M. Jiménez; David Llorens; Andrés Marzal; Federico Prat; Juan Miguel Vilar; José-Miguel Benedí; Francisco Casacuberta; Moisés Pastor; Enrique Vidal

The EuTrans project aims at using example-based approaches for the automatic development of Machine Translation systems accepting text and speech input for limited-domain applications. During the first phase of the project, a speech-translation system that is based on the use of automatically learned subsequential transducers has been built. This paper contains a detailed and mostly self-contained overview of the transducer-learning algorithms and system architecture, along with a new approach for using categories representing words or short phrases in both input and output languages. Experimental results using this approach are reported for a task involving the recognition and translation of sentences in the hotel-reception communication domain, with a vocabulary of 683 words in Spanish. A translation word-error rate of 1.97% is achieved in real-time factor 2.7 on a Personal Computer.
