Publication


Featured research published by Ariani Di Felippo.


Journal of the Brazilian Computer Society | 2014

A survey of automatic term extraction for Brazilian Portuguese

Merley da Silva Conrado; Ariani Di Felippo; Thiago Alexandre Salgueiro Pardo; Solange Oliveira Rezende

Background: Term extraction is highly relevant as it is the basis for several tasks, such as the building of dictionaries, taxonomies, and ontologies, as well as the translation and organization of text data. Methods and Results: In this paper, we present a survey of the state of the art in automatic term extraction (ATE) for the Brazilian Portuguese language. The main contributions and projects related to this task have been classified according to the knowledge they use: statistical, linguistic, and hybrid (statistical and linguistic). We also present a review of the corpora used in term extraction for Brazilian Portuguese, as well as a geographic mapping of Brazil regarding such contributions, projects, and corpora, considering their origins. Conclusions: In spite of the importance of ATE, there are still several gaps to be filled, for instance, the lack of consensus regarding the formal definition of the meaning of ‘term’. Such gaps are larger for Brazilian Portuguese when compared to other languages, such as English, Spanish, and French. Examples of gaps for Brazilian Portuguese include the lack of a baseline ATE system, as well as the lack of use of more sophisticated linguistic information, such as the WordNet and Wikipedia knowledge bases. Nevertheless, there is an increase in the number of contributions related to ATE and an interesting tendency to use contrasting corpora and domain stoplists, even though most contributions only use frequency, noun phrases, and morphosyntactic patterns.


Processing of the Portuguese Language | 2014

Alignment-Based Sentence Position Policy in a News Corpus for Multi-document Summarization

Fernando Antônio Asevedo Nóbrega; Verônica Agostini; Renata T. Camargo; Ariani Di Felippo; Thiago Alexandre Salgueiro Pardo

This paper presents an empirical investigation of sentence position relevance in a corpus of news texts for generating abstractive multi-document summaries. Unlike previous work, we propose to use text-summary alignment information to compute sentence relevance.


Linguistic Annotation Workshop | 2015

A Qualitative Analysis of a Corpus of Opinion Summaries based on Aspects

Roque López; Thiago Alexandre Salgueiro Pardo; Lucas Avanço; Pedro Paulo Balage Filho; Alessandro Y. Bokan; Paula Christina Figueira Cardoso; Márcio de Souza Dias; Fernando Antônio Asevedo Nóbrega; Marco Antonio Sobrevilla Cabezudo; Jackson Wilke da Cruz Souza; Andressa Zacarias; Eloize Rossi Marques Seno; Ariani Di Felippo

Aspect-based opinion summarization is the task of automatically generating a summary for some aspects of a specific topic from a set of opinions. In most cases, to evaluate the quality of the automatic summaries, it is necessary to have a reference corpus of human summaries to analyze how similar they are. The scarcity of corpora in that task has been a limiting factor for many research works. In this paper, we introduce OpiSums-PT, a corpus of extractive and abstractive summaries of opinions written in Brazilian Portuguese. We use this corpus to analyze how similar human summaries are and how people take into account the issues of aspect coverage and sentiment orientation to generate manual summaries. The results of these analyses show that human summaries are diversified and people generate summaries only for some aspects, keeping the overall sentiment orientation with little variation.


Brazilian Symposium on Multimedia and the Web | 2008

OntoMethodus: a methodology to build domain-specific ontologies and its use in a system to support the generation of terminographic products

Ariani Di Felippo; Sandra Maria Aluísio; Leandro H. M. de Oliveira; Gladis Maria Barcellos Almeida

Given the importance of domain ontologies for developing terminographic products, we propose a seven-step methodology -- OntoMethodus -- to build ontologies especially from unstructured sources. Finally, we present e-Termos, an ongoing project to develop an environment to support generation of terminographic products in Brazilian Portuguese which uses the OntoMethodus.


Alfa: Revista de Linguística (São José do Rio Preto) | 2018

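CARACTERIZAÇÃO DA COMPLEMENTARIDADE TEMPORAL: SUBSÍDIOS PARA SUMARIZAÇÃO AUTOMÁTICA MULTIDOCUMENTO [Characterization of temporal complementarity: contributions to automatic multi-document summarization]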

Jackson Wilke da Cruz Souza; Ariani Di Felippo

Complementarity is a common multi-document phenomenon that often occurs among news texts about the same event. From a set of sentence pairs (in Portuguese) manually annotated with CST (Cross-Document Structure Theory) relations (Historical background and Follow-up) that make explicit the temporal complementarity among the sentences, we identified a potential set of linguistic attributes of such complementarity. Using Machine Learning algorithms, we evaluate the capacity of the attributes to discriminate between Historical background and Follow-up. JRip learned a small set of rules with high accuracy. Based on a set of 5 rules, the classifier discriminates the CST relations with 80% accuracy. According to the rules, the occurrence of a temporal expression in sentence 2 is the most discriminative feature in the task. As a contribution, the JRip classifier can improve the performance of CST discourse parsers for Portuguese.


Revista de Estudos da Linguagem | 2017

Exploring content selection strategies for Multilingual Multi-Document Summarization based on the Universal Network Language (UNL)

Matheus Rigobelo Chaud; Ariani Di Felippo

Multilingual Multi-Document Summarization aims at ranking the sentences of a cluster with (at least) 2 news texts (1 in the user’s language and 1 in a foreign language) and selecting the top-ranked sentences for a summary in the user’s language. We explored three concept-based statistics and one superficial strategy for sentence ranking. We used a bilingual corpus (Brazilian Portuguese-English) encoded in UNL (Universal Network Language) with source and summary sentences aligned based on content overlap. Our experiment shows that “concept frequency normalized by the number of concepts in the sentence” is the measure that best ranks the sentences selected by humans. However, it does not outperform the superficial strategy based on the position of the sentences in the texts. This indicates that the most frequent concepts are not always contained in the first sentences, which humans usually select to build the summaries because they convey the main information of the collection. Keywords: content selection; concept; statistical measure; multilingual corpus; multi-document summarization.
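
As a rough illustration of the measure this abstract names, "concept frequency normalized by the number of concepts in the sentence", the sketch below scores each sentence by summing the cluster-level frequencies of its concepts and dividing by the sentence's concept count. The data layout (one list of concept labels per sentence), the function name `rank_by_normalized_concept_frequency`, and the toy concepts are assumptions made for illustration; the paper's UNL encoding and exact formula are not reproduced here.

```python
from collections import Counter


def rank_by_normalized_concept_frequency(sentences):
    """Rank sentences by the mean cluster-level frequency of their concepts.

    `sentences` is a list of concept lists, one per sentence (a hypothetical
    layout chosen for this sketch). Returns (sentence_index, score) pairs,
    highest score first.
    """
    # Count how often each concept occurs across the whole cluster.
    cluster_freq = Counter(c for sent in sentences for c in sent)

    scored = []
    for idx, concepts in enumerate(sentences):
        if concepts:
            # Sum of concept frequencies, normalized by sentence concept count.
            score = sum(cluster_freq[c] for c in concepts) / len(concepts)
        else:
            score = 0.0  # a sentence with no concepts gets the lowest score
        scored.append((idx, score))

    # Python's sort is stable, so tied sentences keep their original order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy cluster: three sentences, each reduced to its concept labels.
cluster = [
    ["earthquake", "city", "damage"],
    ["earthquake", "damage"],
    ["mayor", "statement"],
]
ranking = rank_by_normalized_concept_frequency(cluster)
```

In this toy cluster the second sentence ranks first because both of its concepts recur elsewhere in the cluster. A position-based baseline of the kind the abstract compares against would instead keep the sentences in their original order.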


Processing of the Portuguese Language | 2016

Applying Lexical-Conceptual Knowledge for Multilingual Multi-document Summarization

Ariani Di Felippo; Fabrício E. S. Tosta; Thiago Alexandre Salgueiro Pardo

We define Multilingual Multi-Document Summarization (MMDS) as the process of identifying the main information of a cluster with (at least) two texts, one in the user’s language and one in a foreign language, and presenting it as a summary in the user’s language. Although it is a relevant task due to the increasing amount of on-line information in different languages, there are only baselines for (Brazilian) Portuguese, which apply machine translation to obtain a monolingual input and superficial features for sentence extraction. We report our investigation on the application of a conceptual frequency measure to build a summary in Portuguese from a bilingual cluster (Portuguese and English). The methods tackle two additional challenges: using Princeton WordNet for noun annotation and applying MT to translate selected sentences in English to Portuguese. The experiments were performed using a corpus of 20 clusters, and show that lexical-conceptual knowledge improves the linguistic quality and informativeness of extracts.


Meeting of the Association for Computational Linguistics | 2016

Phrase Generalization: a Corpus Study in Multi-Document Abstracts and Original News Alignments.

Ariani Di Felippo; Ani Nenkova

Content can be expressed at different levels of specificity, varying the amount of detail presented to the reader. The need to transform specific content into more general form naturally arises in summarization, where people and machines need to convey the gist of a text within imposed space constraints. Completely removing sentences and phrases is one way to reduce the level of detail. The bulk of work on summarization content selection and compression deals with these tasks. In this paper, we present a corpus study on a more subtle and understudied phenomenon: noun phrase generalization. Based on multi-document news and abstract alignments at the phrase level, we arrive at a five-category classification scheme and find that the most common category requires semantic interpretation and inference. The others rely on lexical substitution or deletion of details from the original expression. We provide a systematic analysis, elucidating the capabilities needed for automating the generation of more general or more specific references.


Revista de Estudos da Linguagem | 2005

Representação formal dos adjetivos valenciais com vistas ao Processamento Automático do Português [Formal representation of valency adjectives for the automatic processing of Portuguese]

Ariani Di Felippo; Bento Carlos Dias-da-Silva

Given the urgent demand for the construction of linguistically motivated lexicons for Natural Language Processing systems, this paper presents a linguistic-computational representation for the Brazilian Portuguese adjectives that project a valence. In order to propose this representation, the microstructure of the mental lexicon and the conception of lexical item are investigated from a (psycho)linguistic perspective. In addition, (i) the main properties of the valence adjectives are described and (ii) some of the main lexical-grammatical representation formalisms are investigated. Finally, a lexical entry template for these adjectives based on the feature structure or attribute-value matrix formalism is proposed.


Archive | 2011

CSTNews - A Discourse-Annotated Corpus for Single and Multi-Document Summarization of News Texts in Brazilian Portuguese

Paula Christina Figueira Cardoso; Erick Galani Maziero; Maria Lucía R. Castro Jorge; Ariani Di Felippo; Lucia Helena Machado Rino; Maria das Graças Volpe Nunes; Thiago Alexandre Salgueiro Pardo

Collaboration


Dive into Ariani Di Felippo's collaborations.

Top Co-Authors

Jackson Wilke da Cruz Souza

Federal University of São Carlos

Renata T. Camargo

Federal University of São Carlos
