
Publications


Featured research published by Ferran Pla.


International Conference on Computational Linguistics | 2000

Tagging and chunking with bigrams

Ferran Pla; Antonio Molina; Natividad Prieto

In this paper we present an integrated system for tagging and chunking texts in a given language. The approach is based on stochastic finite-state models that are learnt automatically, including bigram models and finite-state automata learnt using grammatical inference techniques. Because all the models involved are learnt automatically, the system is very flexible and portable. To show the viability of our approach, we present results for tagging and chunking with bigram models on the Wall Street Journal corpus, achieving a tagging accuracy of 96.8% and, for NP chunks, a precision of 94.6% with a recall of 93.6%.
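
As a rough illustration of the kind of bigram decoding the paper builds on, the sketch below implements Viterbi search over a first-order HMM in Python. The probability tables are invented placeholders; in the paper they are estimated automatically from a tagged corpus.

```python
import math

def viterbi(words, tags, trans, emit, start):
    """Most likely tag sequence under a bigram HMM.

    start[t]         -- P(t | sentence start)
    trans[(t0, t1)]  -- P(t1 | t0)
    emit[(t, w)]     -- P(w | t)
    """
    eps = 1e-12  # smoothing floor for unseen events
    chart = [{t: (math.log(start.get(t, eps)) +
                  math.log(emit.get((t, words[0]), eps)), [t])
              for t in tags}]
    for w in words[1:]:
        row = {}
        for t in tags:
            score, path = max(
                (s + math.log(trans.get((p, t), eps)), ph)
                for p, (s, ph) in chart[-1].items())
            row[t] = (score + math.log(emit.get((t, w), eps)), path + [t])
        chart.append(row)
    return max(chart[-1].values())[1]

# Toy tables, purely illustrative.
tags = ["DT", "NN", "VB"]
start = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
trans = {("DT", "NN"): 0.9, ("NN", "VB"): 0.6, ("NN", "NN"): 0.3}
emit = {("DT", "the"): 0.7, ("NN", "dog"): 0.4, ("VB", "barks"): 0.3}
print(viterbi("the dog barks".split(), tags, trans, emit, start))
# -> ['DT', 'NN', 'VB']
```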


Natural Language Engineering | 2004

Improving part-of-speech tagging using lexicalized HMMs

Ferran Pla; Antonio Molina

We introduce a simple method for building Lexicalized Hidden Markov Models (L-HMMs) to improve the precision of part-of-speech tagging. The technique enriches the contextual language model by taking into account an empirically selected set of words. The evaluation was conducted with different lexicalization criteria on the Penn Treebank corpus using the TnT tagger. Lexicalization achieved about a 6% reduction in tagging error on unseen test data without reducing the efficiency of the system. We also studied how the use of linguistic resources, such as dictionaries and morphological analyzers, improves tagging performance. Furthermore, an exhaustive experimental comparison shows that Lexicalized HMMs yield results that are better than or similar to other state-of-the-art part-of-speech tagging approaches. Finally, we applied Lexicalized HMMs to the Spanish corpus LexEsp.
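
A minimal sketch of the lexicalization idea, assuming a hypothetical set of selected words: for those words the tag is specialized to a (tag, word) state, so an ordinary HMM trained on the rewritten corpus conditions its transitions on them. The word set and state notation below are illustrative, not the paper's.

```python
# Hypothetical set of frequent, tagging-ambiguous words chosen empirically.
SELECTED = {"that", "to", "as", "about"}

def lexicalize(tagged_sentence):
    """Specialize the tag of selected words into a (tag, word) state."""
    return [(w, f"{t}^{w.lower()}" if w.lower() in SELECTED else t)
            for w, t in tagged_sentence]

# Training a standard HMM tagger on the rewritten corpus then yields
# lexicalized transitions such as P(IN^that | VBD).
print(lexicalize([("He", "PRP"), ("said", "VBD"), ("that", "IN")]))
# -> [('He', 'PRP'), ('said', 'VBD'), ('that', 'IN^that')]
```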


Applications of Natural Language to Data Bases | 2007

Biomedical named entity recognition: a poor knowledge HMM-based approach

Natalia Ponomareva; Ferran Pla; Antonio Molina; Paolo Rosso

With the rapid recent development of the molecular biology domain, it has become indispensable to maintain resources such as databases and ontologies that represent the formal knowledge of the domain. Because these resources must be updated permanently as new data constantly appear, Information Extraction (IE) methods become very useful. Named Entity Recognition (NER), often considered the easiest IE task, remains very challenging in the molecular biology domain because of the special phenomena of biomedical entities. In this paper we present our Hidden Markov Model (HMM)-based biomedical NER system, which uses only parts of speech as an additional feature; they serve both to tackle the nonuniform distribution among biomedical entity classes and to provide the system with additional information about entity boundaries. Despite its poor knowledge, our system obtains better results than some state-of-the-art systems that employ a greater number of features.
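
A hedged sketch of a token representation such a "poor knowledge" system could use: each HMM observation pairs the word with its POS tag, and entities carry BIO labels. The biomedical example data and label names are invented for illustration.

```python
def to_observations(tokens, pos_tags):
    """Combine word and POS tag into a single HMM observation symbol."""
    return [f"{w}|{p}" for w, p in zip(tokens, pos_tags)]

# Invented example sentence with BIO-encoded biomedical entities.
tokens = ["IL-2", "gene", "expression", "in", "T", "cells"]
pos    = ["NN",   "NN",  "NN",        "IN", "NN", "NNS"]
bio    = ["B-DNA", "I-DNA", "O", "O", "B-cell_type", "I-cell_type"]

# An HMM over states `bio` and these observations can then be trained with
# maximum-likelihood counts, exactly as for POS tagging.
print(to_observations(tokens, pos))
```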


International Conference on Computational Linguistics | 2003

Automatic noun sense disambiguation

Paolo Rosso; Francesco Masulli; Davide Buscaldi; Ferran Pla; Antonio Molina

This paper explores a fully automatic knowledge-based method that performs noun sense disambiguation relying only on the WordNet ontology. The basis of the method is the idea of conceptual density, that is, the correlation between the sense of a given word and its context. A new formula for calculating conceptual density is proposed and evaluated on the SemCor corpus.
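
The paper's conceptual-density formula is not reproduced here; as a much-simplified stand-in, the sketch below uses NLTK's WordNet interface to score each sense of a noun by how many context nouns have a sense inside that sense's hyponym subtree.

```python
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def sense_scores(noun, context_nouns):
    """Score each sense of `noun` by how many context nouns have a sense
    inside that sense's hyponym subtree (the sense itself included).
    NOTE: a simplified stand-in, not the paper's conceptual-density formula."""
    scores = {}
    for sense in wn.synsets(noun, pos=wn.NOUN):
        subtree = set(sense.closure(lambda s: s.hyponyms())) | {sense}
        scores[sense] = sum(
            any(c in subtree for c in wn.synsets(w, pos=wn.NOUN))
            for w in context_nouns)
    return scores

# Illustrative call; the highest-scoring sense would be chosen.
print(sense_scores("bank", ["commercial_bank", "credit_union"]))
```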


Text, Speech and Dialogue | 2001

Language Understanding Using Two-Level Stochastic Models with POS and Semantic Units

Ferran Pla; Antonio Molina; Emilio Sanchis; Encarna Segarra; Fernando García

Over the last few years, stochastic models have been widely used in natural language understanding modeling. Almost all of these works define segments of words as the basic semantic units of the stochastic semantic models. In this work, we present a two-level stochastic model approach to building the natural language understanding component of a dialog system in the domain of database queries. The approach treats the problem much like the stochastic approach to detecting syntactic structures (shallow parsing or chunking) in natural language sentences; in this case, however, the stochastic semantic language models are based on detecting semantic units in the user turns of the dialog. We report the results of applying this approach to the understanding component of a dialog system that answers queries about railway timetables in Spanish.
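
A minimal sketch of the segment representation such a two-level model works with: a first level assigns each word of a user turn to a semantic unit (decoded much like chunking), and a second level models the word sequences inside each unit. The unit labels below are hypothetical for a railway-timetable domain.

```python
turn = "I want a ticket to Barcelona tomorrow morning".split()
# Hypothetical first-level output: one semantic-unit label per word.
units = ["query", "query", "query", "query",
         "destination", "destination", "time", "time"]

def segments(words, labels):
    """Group consecutive words that share a semantic-unit label."""
    out, cur = [], None
    for w, l in zip(words, labels):
        if cur and cur[0] == l:
            cur[1].append(w)
        else:
            cur = (l, [w])
            out.append(cur)
    return [(l, " ".join(ws)) for l, ws in out]

print(segments(turn, units))
# -> [('query', 'I want a ticket'), ('destination', 'to Barcelona'),
#     ('time', 'tomorrow morning')]
```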


Conference on Computational Natural Language Learning | 2000

Improving chunking by means of lexical-contextual information in statistical language models

Ferran Pla; Antonio Molina; Natividad Prieto

In this work, we present a stochastic approach to shallow parsing. Most current approaches to shallow parsing share a common characteristic: they take the sequence of lexical tags proposed by a POS tagger as the input to the chunking process. Our system performs tagging and chunking in a single process using an Integrated Language Model (ILM) formalized as Markov Models. This model integrates several knowledge sources: lexical probabilities, a contextual language model (LM) for every chunk, and a contextual LM for the sentences. We have extended the ILM by adding lexical information to the contextual LMs. Applying this approach to the CoNLL-2000 shared task improved the performance of the chunker.
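
A minimal sketch of the integrated state space, under the assumption that combined (chunk, tag) states are decoded in a single Viterbi pass; transition probabilities over these states can then factor in both the chunk-level and the tag-level language models. The label sets and factorization below are illustrative, not the paper's exact model.

```python
from itertools import product

CHUNKS = ["B-NP", "I-NP", "O"]       # illustrative chunk labels
TAGS   = ["DT", "JJ", "NN", "VBZ"]   # illustrative POS tags

# One decode runs over combined (chunk, tag) states instead of two passes.
STATES = list(product(CHUNKS, TAGS))

def trans_prob(prev, cur, chunk_lm, tag_lm):
    """Hypothetical factorized transition: chunk bigram times tag bigram."""
    (pc, pt), (cc, ct) = prev, cur
    return chunk_lm.get((pc, cc), 1e-12) * tag_lm.get((pt, ct), 1e-12)

print(len(STATES))  # 12 combined states
```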


Conference on Intelligent Text Processing and Computational Linguistics | 2004

Information Retrieval and Text Categorization with Semantic Indexing

Paolo Rosso; Antonio Molina; Ferran Pla; Daniel Jiménez; Vicente Vidal

In this paper, we study the effect of semantic indexing with WordNet senses on Information Retrieval (IR) and Text Categorization (TC) tasks. The documents were sense-tagged using a Word Sense Disambiguation (WSD) system based on Specialized Hidden Markov Models (SHMMs). Preliminary results showed a small performance improvement, and only on the TC task.
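
A minimal sketch of semantic indexing, with the SHMM-based WSD step stubbed out by a hypothetical rule-based `disambiguate`: documents are indexed by sense identifiers instead of surface words, so the two senses of "bank" become distinct index terms.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def disambiguate(token, context):
    # Stand-in for the SHMM-based WSD step (a purely illustrative rule).
    if token == "bank":
        return "bank.n.01" if "river" in context else "bank.n.02"
    return token

def sense_tag(doc):
    toks = doc.lower().split()
    return " ".join(disambiguate(t, toks) for t in toks)

docs = ["Bank interest rates rise", "The river bank flooded"]
# token_pattern keeps dotted sense tags intact as single index terms.
index = TfidfVectorizer(token_pattern=r"\S+").fit_transform(
    sense_tag(d) for d in docs)
print(index.shape)  # (2 documents, number of distinct sense/word terms)
```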


Applications of Natural Language to Data Bases | 2014

Sentiment Analysis in Twitter for Spanish

Ferran Pla; Lluís F. Hurtado

This paper describes an SVM-based approach to Sentiment Analysis (SA) in Twitter for Spanish. The task was part of the TASS2013 workshop, an SA evaluation framework focused on the Spanish language. We describe the approach used and present an experimental comparison of the approaches submitted by the teams that took part in the competition. We also describe the improvements added to our system after our participation in the competition. With these improvements, we obtained accuracies of 62.88% and 70.25% on the SA test set for the 5-level and 3-level tasks, respectively. To our knowledge, these are the best results published so far for the SA tasks of the TASS2013 workshop.
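
A minimal sketch of an SVM polarity classifier in the spirit of the paper, using scikit-learn rather than the authors' actual toolchain; the tweets and three-level labels are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented three-level (P/N/NEU) training data, for illustration only.
tweets = ["me encanta este móvil", "qué servicio tan malo", "llegó el paquete hoy"]
labels = ["P", "N", "NEU"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(tweets, labels)
print(clf.predict(["un servicio malo"]))
```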


Text, Speech and Dialogue | 2000

An Integrated Statistical Model for Tagging and Chunking Unrestricted Text

Ferran Pla; Antonio Molina; Natividad Prieto

In this paper, we present a corpus-based approach to tagging and chunking. The formalism used is based on stochastic finite-state automata and can therefore include n-gram models or any stochastic finite-state automaton learnt using grammatical inference techniques. Because the models involved are learnt automatically, the system is very flexible and portable across languages and chunk definitions. To show the viability of our approach, we present results for tagging and chunking using different combinations of bigrams and other, more complex automata learnt by means of the Error Correcting Grammatical Inference (ECGI) algorithm. The experiments were carried out on the Wall Street Journal corpus for English and on the LexEsp corpus for Spanish.
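
ECGI itself incrementally merges each new sample into the automaton along a minimum-cost error-correcting path; as a much simpler stand-in, the sketch below builds a prefix-tree acceptor from tag-sequence samples, the usual starting point of such grammatical-inference methods.

```python
def prefix_tree(samples):
    """Build a prefix-tree acceptor: states are prefixes, edges are symbols.
    NOTE: a simplified stand-in, not the ECGI algorithm itself."""
    trans, finals = {}, set()
    for seq in samples:
        state = ()
        for sym in seq:
            nxt = state + (sym,)
            trans.setdefault((state, sym), nxt)
            state = nxt
        finals.add(state)
    return trans, finals

trans, finals = prefix_tree([("DT", "NN"), ("DT", "JJ", "NN")])
print(len(trans), len(finals))  # 4 transitions, 2 accepting states
```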


International Conference on Computational Linguistics | 2006

Verb sense disambiguation using support vector machines: impact of WordNet-extracted features

Davide Buscaldi; Paolo Rosso; Ferran Pla; Encarna Segarra; Emilio Sanchis Arnal

The disambiguation of verbs is usually considered more difficult than that of other part-of-speech categories. This is due both to the high polysemy of verbs compared with the other categories and to the lack of lexical resources providing relations between verbs and nouns. One such resource is WordNet, which provides plenty of information and relationships for nouns but is less comprehensive with respect to verbs. In this paper we focus on the disambiguation of verbs by means of Support Vector Machines and WordNet-extracted features based on the hypernyms of context nouns.
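
A minimal sketch of the WordNet-extracted features, assuming NLTK's WordNet interface and a hypothetical one-level generalization: each context noun is replaced by a hypernym of its first sense, giving the SVM evidence shared across related nouns.

```python
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def hypernym_feature(noun, level=1):
    """Lemma of the `level`-th hypernym of the noun's first WordNet sense."""
    synsets = wn.synsets(noun, pos=wn.NOUN)
    if not synsets:
        return noun  # out-of-vocabulary nouns are kept as-is
    s = synsets[0]
    for _ in range(level):
        hypers = s.hypernyms()
        if not hypers:
            break
        s = hypers[0]
    return s.lemmas()[0].name()

# Generalize the context nouns of an occurrence of the verb 'drink'.
print([hypernym_feature(n) for n in ["beer", "glass", "bar"]])
```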

Collaboration


Dive into Ferran Pla's collaborations.

Top Co-Authors

Antonio Molina (Polytechnic University of Valencia)
Lluís F. Hurtado (Polytechnic University of Valencia)
Encarna Segarra (Polytechnic University of Valencia)
Paolo Rosso (Polytechnic University of Valencia)
Davide Buscaldi (Polytechnic University of Valencia)
Emilio Sanchis (Polytechnic University of Valencia)
Natividad Prieto (Polytechnic University of Valencia)