Erick Rocha Fonseca | Researchain

Publication

Featured researches published by Erick Rocha Fonseca.

Journal of the Brazilian Computer Society | 2015

Evaluating word embeddings and a revised corpus for part-of-speech tagging in Portuguese

Erick Rocha Fonseca; João Luís Garcia Rosa; Sandra Maria Aluísio

Background: Part-of-speech tagging is an important preprocessing step in many natural language processing applications. Despite much work already carried out in this field, there is still room for improvement, especially in Portuguese. We experiment here with an architecture based on neural networks and word embeddings that has achieved promising results in English. Methods: We tested our classifier on different corpora: a new revision of the Mac-Morpho corpus, in which we merged some tags and performed corrections, and two previous versions of it. We evaluate the impact of using different types of word embeddings and explicit features as input. Results: We compare our tagger's performance with other systems and achieve state-of-the-art results on the new corpus. We show how different methods for generating word embeddings and additional features differ in accuracy. Conclusions: The work reported here contributes a new revision of the Mac-Morpho corpus and a new state-of-the-art tagger available for use out of the box.

International Symposium on Neural Networks | 2013

A two-step convolutional neural network approach for semantic role labeling

Erick Rocha Fonseca; João Luís Garcia Rosa

Semantic role labeling (SRL) is a well-known task in Natural Language Processing, consisting of identifying and labeling verbal arguments. It has been widely studied in English, but scarcely explored in other languages. In this paper, we employ a two-step convolutional neural architecture to label semantic arguments in Brazilian Portuguese texts, and avoid the use of external NLP tools. We achieve an F1 score of 62.2, which, although considerably lower than the state of the art for English, seems promising considering the available resources. Also, dividing the process into two easier subtasks makes it more feasible to further improve performance through semi-supervised learning. Our system is available online and ready to be used out of the box to label new texts.

North American Chapter of the Association for Computational Linguistics | 2015

A Deep Architecture for Non-Projective Dependency Parsing

Erick Rocha Fonseca; Sandra Maria Aluísio

Graph-based dependency parsing algorithms commonly employ features up to third order in an attempt to capture richer syntactic relations. However, each level and each feature combination must be defined manually. In addition, input features are usually represented as huge, sparse binary vectors, offering limited generalization. In this work, we present a deep architecture for dependency parsing based on a convolutional neural network. It can examine the whole sentence structure before scoring each head/modifier candidate pair, and uses dense embeddings as input. Our model is still a work in progress and currently achieves a 91.6% unlabeled attachment score on the Penn Treebank.
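
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an unlabeled attachment score is commonly computed: the fraction of tokens whose predicted head index matches the gold head. The function name and the toy sentence are illustrative assumptions and do not come from the paper; this sketch counts every token, whereas published evaluations often exclude punctuation.

```python
from typing import Sequence


def unlabeled_attachment_score(gold_heads: Sequence[int],
                               pred_heads: Sequence[int]) -> float:
    """Return the fraction of tokens whose predicted head equals the gold head.

    Heads are token indices (1-based), with 0 standing for the artificial root.
    Assumption: all tokens are scored, including punctuation.
    """
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted head lists must have equal length")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)


# Hypothetical five-token sentence: the parser gets one head wrong (token 4).
gold = [2, 0, 2, 5, 3]
pred = [2, 0, 2, 3, 3]
uas = unlabeled_attachment_score(gold, pred)
print(f"UAS = {uas:.1%}")
```

Because the score ignores dependency labels, a parser can reach a high value while still mislabeling relations; a labeled attachment score would additionally compare the relation label of each arc.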

Processing of the Portuguese Language | 2012

An architecture for semantic role labeling on Portuguese

Erick Rocha Fonseca; João Luís Garcia Rosa

We present an adaptation of the architecture of the SENNA system, which performs various NLP tasks, to Portuguese, taking into account the richly inflected morphology of the language. We propose to separate words into lemmas and their inflectional attributes. We point out the major problems that could arise from this approach as well as their potential solutions. This architecture can greatly benefit from the use of unlabeled data, which is especially valuable given the small amount of labeled resources in Portuguese.
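
The idea of separating a word into a lemma and its inflectional attributes can be illustrated with a small sketch. Everything below is a hypothetical illustration, not the authors' implementation: the `AnalyzedWord` type, the `TOY_LEXICON` table, and the attribute names are assumptions, and a real system would rely on a morphological analyzer rather than a hand-written lookup table.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class AnalyzedWord:
    """A word represented as a lemma plus a tuple of inflectional attributes."""
    lemma: str
    attributes: Tuple[str, ...]


# Hypothetical toy lexicon; the entries are chosen only for illustration.
TOY_LEXICON: Dict[str, AnalyzedWord] = {
    "cantamos": AnalyzedWord("cantar", ("verb", "1st person", "plural")),
    "cantou": AnalyzedWord("cantar", ("verb", "3rd person", "singular", "past")),
    "casas": AnalyzedWord("casa", ("noun", "feminine", "plural")),
}


def analyze(word: str) -> AnalyzedWord:
    """Look up a word; unknown words fall back to themselves with no attributes."""
    key = word.lower()
    return TOY_LEXICON.get(key, AnalyzedWord(key, ()))


for token in ["Cantamos", "cantou", "casas", "hoje"]:
    print(token, "->", analyze(token))
```

In this sketch the two inflected forms of "cantar" share a single lemma, so a model could learn one representation for the lemma and encode person, number, and tense as separate input features.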

Processing of the Portuguese Language | 2016

Improving POS Tagging Across Portuguese Variants with Word Embeddings

Erick Rocha Fonseca; Sandra Maria Aluísio

Brazilian Portuguese (BP) and European Portuguese (EP) have specific NLP resources and tools for many tasks. It is generally agreed upon that applying them to the variant other than their intended one results in a performance drop; however, very little research has measured it. We evaluated a POS tagger in a cross-variant setting under multiple combinations of word embeddings, train and test corpora, and found that (i) BP is easier than EP, (ii) word embeddings help increase tagger performance significantly, but not enough to close the accuracy gap in a cross-variant setting and (iii) embeddings generated from a corpus with both variants are useful in cross-variant scenarios. While we cannot generalize observations from POS tagging to any NLP task, this is an important first step for such evaluations.

Processing of the Portuguese Language | 2018

Syntactic Knowledge for Natural Language Inference in Portuguese

Erick Rocha Fonseca; Sandra Maria Aluísio

Natural Language Inference (NLI) is the task of detecting relations such as entailment, contradiction and paraphrase in pairs of sentences. With the recent release of the ASSIN corpus, NLI in Portuguese is now getting more attention. However, published results on ASSIN have neither explored syntactic structure nor combined word embedding metrics with other types of features. In this work, we sought to fill this gap, proposing a new model for NLI that achieves an F1 score of 0.72 on ASSIN, setting a new state of the art. Our feature analysis shows that word embeddings and syntactic knowledge are both important to achieve such results.

Processing of the Portuguese Language | 2016

Evaluating Phonetic Spellers for User-Generated Content in Brazilian Portuguese

Gustavo Augusto de Mendonça Almeida; Lucas Avanço; Magali Sanches Duran; Erick Rocha Fonseca; Maria das Graças Volpe Nunes; Sandra Maria Aluísio

Recently, spell checking (or spelling correction) has regained attention due to the need to normalize user-generated content (UGC) on the web. UGC presents new challenges to spellers, as its register is much more informal and contains much more variability than traditional spelling correction systems can handle. This paper proposes two new approaches to spelling correction of UGC in Brazilian Portuguese (BP), both of which take phonetic errors into account. The first approach is based on three phonetic modules running in a pipeline. The second one is based on machine learning, with soft decision making, and considers context-sensitive misspellings. We compared our methods with others on a human-annotated UGC corpus of product reviews. The machine learning approach surpassed all other methods, with a 78.0% correction rate, a very low false positive rate (0.7%) and a false negative rate of 21.9%.

STIL | 2013