Jorge Graña | Researchain

Publication

Featured research published by Jorge Graña.

international conference on computational linguistics | 2002

Formal Methods of Tokenization for Part-of-Speech Tagging

Jorge Graña; Francisco-Mario Barcala; Jesús Vilares Ferro

One of the most important preliminary tasks for robust part-of-speech tagging is the correct tokenization or segmentation of the texts. This task can involve processes that are much more complex than simply identifying the sentences in the text and each of their individual components, but it is often overlooked in many current applications. Nevertheless, this preprocessing step is indispensable in practice, and it is particularly difficult to tackle with scientific precision without repeatedly falling into a case-by-case analysis of every phenomenon detected. In this work, we have developed a preprocessing scheme oriented towards the disambiguation and robust tagging of Galician. Nevertheless, it is a proposal for a general architecture that can be applied to other languages, such as Spanish, with very slight modifications.

database and expert systems applications | 2002

Tokenization and proper noun recognition for information retrieval

F.M. Barcala; Jesús Vilares; Miguel A. Alonso; Jorge Graña; Manuel Vilares

In this paper we consider a set of natural language processing techniques that can be used to analyze large amounts of text, focusing on an advanced tokenizer which accounts for a number of complex linguistic phenomena, as well as for pre-tagging tasks such as proper noun recognition. We also show the results of several experiments performed in order to study the impact of the strategy chosen for the recognition of proper nouns.

international conference on implementation and application of automata | 2001

Compilation Methods of Minimal Acyclic Finite-State Automata for Large Dictionaries

Jorge Graña; Francisco-Mario Barcala; Miguel A. Alonso

We present a reflection on the evolution of the different methods for constructing minimal deterministic acyclic finite-state automata from a finite set of words. We outline the most important methods, including the traditional ones (which consist of the combination of two phases: insertion of words and minimization of the partial automaton) and the incremental algorithms (which add new words one by one and minimize the resulting automaton on-the-fly, being much faster and having significantly lower memory requirements). We analyze their main features in order to provide some improvements for incremental constructions, and a general architecture that is needed to implement large dictionaries in natural language processing (NLP) applications.
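
As an illustration of the incremental family described above, the following is a minimal sketch of one well-known variant: building a minimal acyclic automaton from a lexicographically sorted word list, minimizing the path of the previous word as soon as the next word diverges from it. It is not the authors' implementation; the names `State`, `build_minimal_automaton`, `accepts` and `count_states` are hypothetical, and the sorted-input restriction is an assumption made to keep the example short.

```python
class State:
    """One automaton state: a finality flag plus labelled outgoing edges."""

    def __init__(self):
        self.final = False
        self.edges = {}  # char -> State

    def signature(self):
        # Two states are equivalent when they agree on finality and on their
        # outgoing edges. Targets are compared by identity, which is safe
        # because they are already canonical when this method is called.
        return (self.final,
                tuple(sorted((c, id(t)) for c, t in self.edges.items())))


def build_minimal_automaton(sorted_words):
    """Insert words one by one, minimizing on the fly (sorted input assumed)."""
    root = State()
    register = {}   # signature -> canonical state
    unchecked = []  # (parent, char, child) along the path of the previous word
    previous = ""

    def minimize(down_to):
        # Replace each unchecked state by an equivalent registered one, if any.
        while len(unchecked) > down_to:
            parent, char, child = unchecked.pop()
            canonical = register.setdefault(child.signature(), child)
            parent.edges[char] = canonical

    for word in sorted_words:
        if word < previous:
            raise ValueError("input must be sorted")
        # Length of the prefix shared with the previous word.
        prefix = 0
        while (prefix < min(len(word), len(previous))
               and word[prefix] == previous[prefix]):
            prefix += 1
        minimize(prefix)
        node = unchecked[-1][2] if unchecked else root
        for char in word[prefix:]:
            child = State()
            node.edges[char] = child
            unchecked.append((node, char, child))
            node = child
        node.final = True
        previous = word

    minimize(0)
    return root


def accepts(root, word):
    node = root
    for char in word:
        node = node.edges.get(char)
        if node is None:
            return False
    return node.final


def count_states(root):
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node.edges.values())
    return len(seen)


automaton = build_minimal_automaton(["tap", "taps", "top", "tops"])
print(accepts(automaton, "tops"), accepts(automaton, "to"))
print(count_states(automaton))
```

Because only the path of the most recently added word is ever left unminimized, the structure held in memory stays close to the size of the final automaton, which is the memory advantage the abstract attributes to incremental methods.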

computer aided systems theory | 2007

Contextual spelling correction

Juan Otero; Jorge Graña; Manuel Vilares

Spelling correction is commonly a critical task for a variety of NLP tools. Some systems assist users by offering a set of possible corrections for a given misspelt word. An automatic spelling correction system would be able to choose only one or, at least, to rank them according to a certain criterion. We present a dynamic framework which allows us to combine spelling correction and Part-of-Speech tagging tasks in an efficient way. The result is a system capable of ranking the set of possible corrections taking the context of the erroneous words into account.
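
The idea of ranking corrections by context can be sketched as follows, assuming a toy tag-bigram model: each candidate correction is placed in the sentence, a Viterbi pass scores the best tag sequence, and candidates are sorted by that score. This is only an illustration of the general principle, not the dynamic framework of the paper; the function names, the smoothing constant and all probabilities are hypothetical.

```python
import math


def best_path_score(lattice, transition, start="<s>"):
    """Viterbi over a lattice; lattice[i] lists (word, tag, emission_prob)."""
    # scores maps a tag to the best log-probability of a path ending in it.
    scores = {start: 0.0}
    for alternatives in lattice:
        new_scores = {}
        for _word, tag, emission in alternatives:
            for prev_tag, prev_score in scores.items():
                # Unseen tag bigrams get a small assumed probability.
                p = transition.get((prev_tag, tag), 1e-6) * emission
                score = prev_score + math.log(p)
                if score > new_scores.get(tag, -math.inf):
                    new_scores[tag] = score
        scores = new_scores
    return max(scores.values())


def rank_corrections(left, candidates, right, transition):
    """Order candidate corrections by the score of their best tagging."""
    ranked = []
    for word, readings in candidates.items():
        slot = [(word, tag, p) for tag, p in readings]
        ranked.append((best_path_score(left + [slot] + right, transition), word))
    ranked.sort(reverse=True)
    return [word for _, word in ranked]


# Toy data for "the dgo barks" (all numbers are made up for illustration).
transition = {
    ("<s>", "DET"): 0.5, ("DET", "NOUN"): 0.7, ("DET", "VERB"): 0.01,
    ("NOUN", "VERB"): 0.5, ("NOUN", "NOUN"): 0.2,
    ("VERB", "VERB"): 0.05, ("VERB", "NOUN"): 0.2,
}
left = [[("the", "DET", 1.0)]]
right = [[("barks", "VERB", 0.6), ("barks", "NOUN", 0.4)]]
candidates = {"dog": [("NOUN", 0.01)], "dig": [("VERB", 0.01)]}
print(rank_corrections(left, candidates, right, transition))
```

With equal emission probabilities for both candidates, the ordering here is decided entirely by the tag context, which is the effect the abstract describes.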

string processing and information retrieval | 2004

On Asymptotic Finite-State Error Repair

Manuel Vilares; Juan Otero; Jorge Graña

A major issue when defining the efficiency of a spelling corrector is how far we need to examine the input string to validate the repairs. We claim that regional techniques provide a performance and quality comparable to that attained by global criteria, with a significant saving in time and space.

international conference on implementation and application of automata | 2004

Regional finite-state error repair

Manuel Vilares; Juan Otero; Jorge Graña

We describe an algorithm to deal with error repair over finite-state architectures. Such a technique is of interest in spelling correction as well as approximate string matching in a variety of applications related to natural language processing, such as information extraction/recovery or answer searching, where error-tolerant recognition allows misspelled input words to be integrated in the computational process. Our proposal relies on a regional least-cost repair strategy, dynamically gathering all relevant information in the context of the error location. The system guarantees asymptotic equivalence with global repair strategies.
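
For comparison, the sketch below shows a simple global least-cost repair over a dictionary stored as a trie: the whole input word is matched against every branch while tracking edit distance, and all entries within a cost bound are returned. It is deliberately the global baseline, not the regional algorithm of the paper, which restricts the search to the neighbourhood of the error. The `"$end"` marker and the function names are hypothetical.

```python
def build_trie(words):
    """Store a word list as nested dicts; "$end" marks a complete word."""
    root = {}
    for word in words:
        node = root
        for char in word:
            node = node.setdefault(char, {})
        node["$end"] = True
    return root


def repairs(trie, word, max_cost):
    """Return (cost, entry) pairs for entries within max_cost edits of word."""
    found = []

    def walk(node, prefix, prev_row):
        for char, child in node.items():
            if char == "$end":
                continue
            # One row of the edit-distance table for the path prefix + char.
            row = [prev_row[0] + 1]
            for i in range(1, len(word) + 1):
                substitute = prev_row[i - 1] + (word[i - 1] != char)
                row.append(min(row[i - 1] + 1, prev_row[i] + 1, substitute))
            if "$end" in child and row[-1] <= max_cost:
                found.append((row[-1], prefix + char))
            # Prune branches that can no longer stay within the bound.
            if min(row) <= max_cost:
                walk(child, prefix + char, row)

    walk(trie, "", list(range(len(word) + 1)))
    return sorted(found)


lexicon = build_trie(["cat", "cart", "car", "dog"])
print(repairs(lexicon, "cat", 1))
```

A global search of this kind revisits the full input for every branch explored; a regional strategy aims to avoid that cost while, according to the abstract, remaining asymptotically equivalent in the repairs it finds.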

international conference on implementation and application of automata | 2002

Compilation of constraint-based contextual rules for part-of-speech tagging into finite state transducers

Jorge Graña; Gloria Andrade; Jesús Vilares

With the aim of removing the residual errors made by pure stochastic disambiguation models, we put forward a hybrid system in which linguist users introduce high-level contextual rules to be applied in combination with a tagger based on a Hidden Markov Model. The design of these rules is inspired by the Constraint Grammars formalism. In the present work, we review this formalism in order to propose a more intuitive syntax and semantics for rules, and we develop a strategy to compile the rules in the form of Finite State Transducers, thus guaranteeing an efficient execution framework.

international conference on computational linguistics | 2001

Stochastic Parsing and Parallelism

Francisco-Mario Barcala; Oscar Sacristán; Jorge Graña

This work was partially supported by the European Union under FEDER project 1FD97-0047-C04-02, and by the Autonomous Government of Galicia under project PGIDT99XI1

international conference on computational linguistics | 2005

Regional versus global finite-state error repair

Manuel Vilares; Juan Otero; Jorge Graña

We focus on the domain of a regional least-cost strategy in order to illustrate the viability of non-global repair models over finite-state architectures. Our interest is justified by the difficulty, shared by all repair proposals, of determining how far to validate. A short validation may fail to gather sufficient information, and in a long one most of the effort can be wasted. The goal is to prove that our approach can provide, in practice, a performance and quality comparable to that attained by global criteria, with a significant saving in time and space. To the best of our knowledge, this is the first discussion of its kind.

computer aided systems theory | 2005

Spelling correction on technical documents

Manuel Vilares; Juan Otero; Jorge Graña

We describe a novel approach to spelling correction applied to technical documents, a task that requires a number of specific properties such as efficiency, safety and maintainability. In contrast to previous work, we explore the region close to the point at which the recognition halts, gathering all relevant information for the repair process in order to avoid the phenomenon of cascading errors. Our approach seems to reach the same quality provided by the best-performing classic techniques, but with a significant reduction in both time and space costs.