Publication


Featured research published by Aline Villavicencio.


International Conference on Computational Linguistics | 2002

Extracting the unextractable: a case study on verb-particles

Timothy Baldwin; Aline Villavicencio

This paper proposes a series of techniques for extracting English verb-particle constructions from raw text corpora. We initially propose three basic methods, based on tagger output, chunker output and a chunk grammar, respectively, with the chunk grammar method optionally combining with an attachment resolution module to determine the syntactic structure of verb-preposition pairs in ambiguous constructs. We then combine the three methods together into a single classifier, and add in a number of extra lexical and frequentistic features, producing a final F-score of 0.865 over the WSJ.


Computational Linguistics | 2009

Prepositions in applications: A survey and introduction to the special issue

Timothy Baldwin; Valia Kordoni; Aline Villavicencio

Prepositions, as well as prepositional phrases (PPs) and markers of various sorts, have a mixed history in computational linguistics (CL), as well as related fields such as artificial intelligence, information retrieval (IR), and computational psycholinguistics: On the one hand they have been championed as being vital to precise language understanding (e.g., in information extraction), and on the other they have been ignored on the grounds of being syntactically promiscuous and semantically vacuous, and relegated to the ignominious rank of “stop word” (e.g., in text classification and IR). Although NLP in general has benefitted from advances in those areas where prepositions have received attention, there are still many issues to be addressed. For example, in machine translation, generating a preposition (or “case marker” in languages such as Japanese) incorrectly in the target language can lead to critical semantic divergences over the source language string. Equivalently in information retrieval and information extraction, it would seem desirable to be able to predict that “book on NLP” and “book about NLP” mean largely the same thing, but “paranoid about drugs” and “paranoid on drugs” suggest very different things. Prepositions are often among the most frequent words in a language. For example, based on the British National Corpus (BNC; Burnard 2000), four out of the top-ten most-frequent words in English are prepositions (of, to, in, and for). In terms of both parsing and generation, therefore, accurate models of preposition usage are essential to avoid repeatedly making errors. Despite their frequency, however, they are notoriously difficult to master, even for humans (Chodorow, Tetreault, and Han 2007). For example, Lindstromberg (2001) estimates that less than 10% of upper-level English as a Second


Meeting of the Association for Computational Linguistics | 2003

Verb-Particle Constructions and Lexical Resources

Aline Villavicencio

In this paper we investigate the phenomenon of verb-particle constructions, discussing their characteristics and their availability for use with NLP systems. We concentrate in particular on the coverage provided by some electronic resources. Given the constantly growing number of verb-particle combinations, possible ways of extending the coverage of the available resources are investigated, taking into account regular patterns found in some productive combinations of verbs and particles. We discuss, in particular, the use of Levin's (1993) classes of verbs as a means to obtain productive verb-particle constructions, and discuss the issues involved in adopting such an approach.


Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties | 2006

Automated Multiword Expression Prediction for Grammar Engineering

Yi Zhang; Valia Kordoni; Aline Villavicencio; Marco Idiart

However large a hand-crafted wide-coverage grammar is, there are always going to be words and constructions that are not included in it and are going to cause parse failure. Due to their heterogeneous and flexible nature, Multiword Expressions (MWEs) provide an endless source of parse failures. As the number of such expressions in a speaker's lexicon is equiparable to the number of single word units (Jackendoff, 1997), one major challenge for robust natural language processing systems is to be able to deal with MWEs. In this paper we propose to semi-automatically detect MWE candidates in texts using some error mining techniques and validating them using a combination of the World Wide Web as a corpus and some statistical measures. For the remaining candidates possible lexico-syntactic types are predicted, and they are subsequently added to the grammar as new lexical entries. This approach provides a significant increase in the coverage of these expressions.


Computer Speech & Language | 2005

Editorial: Introduction to the special issue on multiword expressions: Having a crack at a hard nut

Aline Villavicencio; Francis Bond; Anna Korhonen; Diana McCarthy

Multiword expressions are an integral part of language. Their heterogeneous characteristics have proved a challenge to both linguistic and computational analysis. Their importance to language technology has long been recognised. In this special issue we include ten papers which propose a variety of approaches for finding and handling these expressions, both for building general purpose lexical resources and in the context of specific applications. In this introduction we give a brief summary of what multiword expressions are, the challenges that they pose and some open areas of research. We then highlight the contributions that the ten papers make to these areas.


Language Resources and Evaluation | 2010

Alignment-based extraction of multiword expressions

Helena de Medeiros Caseli; Carlos Ramisch; Maria das Graças Volpe Nunes; Aline Villavicencio

Due to idiosyncrasies in their syntax, semantics or frequency, Multiword Expressions (MWEs) have received special attention from the NLP community, as the methods and techniques developed for the treatment of simplex words are not necessarily suitable for them. This is certainly the case for the automatic acquisition of MWEs from corpora. A lot of effort has been directed to the task of automatically identifying them, with considerable success. In this paper, we propose an approach for the identification of MWEs in a multilingual context, as a by-product of a word alignment process, that not only deals with the identification of possible MWE candidates, but also associates some multiword expressions with semantics. The results obtained indicate the feasibility and low costs in terms of tools and resources demanded by this approach, which could, for example, facilitate and speed up lexicographic work.


MWE '04 Proceedings of the Workshop on Multiword Expressions: Integrating Processing | 2004

Lexical encoding of MWEs

Aline Villavicencio; Ann A. Copestake; Benjamin Waldron; Fabre Lambeau

Multiword Expressions present a challenge for language technology, given their flexible nature. Each type of multiword expression has its own characteristics, and providing a uniform lexical encoding for them is a difficult task to undertake. Nonetheless, in this paper we present an architecture for the lexical encoding of these expressions in a database, that takes into account their flexibility. This encoding extends in a straightforward manner the one required for simplex (single) words, and maximises the information contained for them in the description of multiwords.


Hippocampus | 2012

Alternating predictive and short-term memory modes of entorhinal grid cells

Licurgo de Almeida; Marco Idiart; Aline Villavicencio; John E. Lisman

Several lines of evidence indicate that the entorhinal cortex has memory functions, but such functions have not been previously found in grid cells, a cell type that provides major input to the hippocampus. We examined the firing of grid cells as rats crossed (runs) through grid cell vertices. We found that on some runs, firing tended to occur mostly inbound as the rat approached a vertex center while on other runs firing occurred mainly outbound. These results suggest that cells have a predictive mode (inbound firing) in which they represent a position ahead of the animal and a short term memory (STM) mode (outbound firing) in which they represent positions just passed through. Analysis of cell pairs showed that when vertex crossings were less than 1 second apart, the two cells tended to have the same mode. This indicates that modes are a network property. The tendency to have the same mode disappeared if crossings were separated by 2‐3 seconds, suggesting that modes alternate on the time scale of seconds. There was a small but statistically significant behavioral correlate of modes: velocity was slightly less in the STM mode. Both modes were organized by theta and gamma oscillations. The results suggest that the dual requirement for hippocampal storage and recall is met by rapidly alternating modes appropriate for predicting the future and storing the recent past.


Computer Speech & Language | 2005

The availability of verb-particle constructions in lexical resources: How much is enough?

Aline Villavicencio

In this paper, we investigate the phenomenon of verb-particle constructions, discussing their characteristics and availability in some lexical resources. Given the limited coverage provided by these resources and the constantly growing number of verb-particle combinations, possible ways of extending their coverage are investigated, taking into account regular patterns found in some productive combinations of verbs and particles. We propose, in particular, the use of a semantic classification of verbs (such as that defined by Levin (1993), English Verb Classes and Alternations: A Preliminary Investigation, The University of Chicago Press) as a means to obtain productive verb-particle constructions and the use of the World Wide Web to validate them, and discuss the issues involved in adopting such an approach.


Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications | 2009

Statistically-Driven Alignment-Based Multiword Expression Identification for Technical Domains

Helena de Medeiros Caseli; Aline Villavicencio; André Machado; Maria José Bocorny Finatto

Multiword Expressions (MWEs) are one of the stumbling blocks for more precise Natural Language Processing (NLP) systems. Particularly, the lack of coverage of MWEs in resources can impact negatively on the performance of tasks and applications, and can lead to loss of information or communication errors. This is especially problematic in technical domains, where a significant portion of the vocabulary is composed of MWEs. This paper investigates the use of a statistically-driven alignment-based approach to the identification of MWEs in technical corpora. We look at the use of several sources of data, including parallel corpora, using English and Portuguese data from a corpus of Pediatrics, and examining how a second language can provide relevant cues for this task. We report results obtained by a combination of statistical measures and linguistic information, and compare these to those reported in the literature. Such an approach to the (semi-)automatic identification of MWEs can considerably speed up lexicographic work, providing a more targeted list of MWE candidates.

Collaboration


Top Co-Authors

Carlos Ramisch

Universidade Federal do Rio Grande do Sul

Rodrigo Wilkens

Universidade Federal do Rio Grande do Sul

Marco Idiart

Universidade Federal do Rio Grande do Sul

Leonardo Zilio

Universidade Federal do Rio Grande do Sul

Maria José Bocorny Finatto

Universidade Federal do Rio Grande do Sul

Thierry Poibeau

École Normale Supérieure

Helena de Medeiros Caseli

Federal University of São Carlos
