Vanessa Queiroz Marinho
University of São Paulo
Publication
Featured research published by Vanessa Queiroz Marinho.
Brazilian Conference on Intelligent Systems | 2016
Vanessa Queiroz Marinho; Graeme Hirst; Diego R. Amancio
Concepts and methods of complex networks can be used to analyse texts at their different complexity levels. Examples of natural language processing (NLP) tasks studied via topological analysis of networks are keyword identification, automatic extractive summarization and authorship attribution. Even though a myriad of network measurements have been applied to study the authorship attribution problem, the use of motifs for text analysis has been restricted to a few works. The goal of this paper is to apply the concept of motifs, recurrent interconnection patterns, in the authorship attribution task. The absolute frequencies of all thirteen directed motifs with three nodes were extracted from the co-occurrence networks and used as classification features. The effectiveness of these features was verified with four machine learning methods. The results show that motifs are able to distinguish the writing style of different authors. In our best scenario, 57.5% of the books were correctly classified. The chance baseline for this problem is 12.5%. In addition, we have found that function words play an important role in these recurrent patterns. Taken together, our findings suggest that motifs should be further explored in other related linguistic tasks.
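The motif extraction described in this abstract can be illustrated with a minimal sketch. It assumes a co-occurrence network in which each word points to the word that immediately follows it, and it counts the connected three-node induced subgraphs by isomorphism class. The function names and the brute-force enumeration over all node triples are illustrative choices, not details taken from the paper, and the approach is only practical for small vocabularies.

```python
from collections import Counter
from itertools import combinations, permutations


def cooccurrence_edges(tokens):
    """Directed edges linking each word to the word that follows it.

    Assumption: adjacency-based co-occurrence, self-loops dropped.
    """
    return {(a, b) for a, b in zip(tokens, tokens[1:]) if a != b}


def motif_class(triple, edges):
    """Canonical code for the subgraph induced by three nodes.

    The code is the smallest tuple of six 0/1 edge flags over all
    orderings of the nodes, so isomorphic subgraphs share one code.
    """
    best = None
    for x, y, z in permutations(triple):
        pairs = [(x, y), (y, x), (x, z), (z, x), (y, z), (z, y)]
        code = tuple(int(p in edges) for p in pairs)
        if best is None or code < best:
            best = code
    return best


def is_connected(code):
    """Three nodes are weakly connected when at least two of the
    three node pairs are linked in some direction."""
    linked_pairs = sum(1 for i in (0, 2, 4) if code[i] or code[i + 1])
    return linked_pairs >= 2


def count_motifs(tokens):
    """Absolute frequency of each connected three-node motif class."""
    edges = cooccurrence_edges(tokens)
    nodes = sorted({n for e in edges for n in e})
    counts = Counter()
    for triple in combinations(nodes, 3):
        code = motif_class(triple, edges)
        if is_connected(code):
            counts[code] += 1
    return counts


# Toy usage: each key is a motif class, each value its frequency.
counts = count_motifs("the cat saw the dog".split())
```

Under these assumptions, the feature vector for a text would be the counts of the thirteen possible classes, with zeros for classes that do not occur.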
Workshop on Graph-Based Methods for Natural Language Processing | 2017
Vanessa Queiroz Marinho; Henrique Ferraz de Arruda; Thales S. Lima; Luciano da Fontoura Costa; Diego R. Amancio
Authorship attribution is a natural language processing task that has been widely studied, often by considering small order statistics. In this paper, we explore a complex network approach to assign the authorship of texts based on their mesoscopic representation, in an attempt to capture the flow of the narrative. As reported in this work, such an approach allowed the identification of the dominant narrative structure of the studied authors. This was possible because the mesoscopic approach takes into account relationships between different, not necessarily adjacent, parts of the text, and can therefore capture the story flow. The potential of the proposed approach is illustrated through principal component analysis, a comparison with the chance baseline, and network visualization. Such visualizations reveal individual characteristics of the authors, which can be understood as a kind of calligraphy.
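One way to picture a mesoscopic text network is sketched below. The abstract does not specify the construction, so everything here is an assumption made for illustration: nodes are fixed-size windows of tokens, and two windows are linked when the cosine similarity of their word counts reaches a threshold. The names `chunks`, `cosine` and `mesoscopic_edges`, the window size and the threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def chunks(tokens, size):
    """Split the token list into consecutive windows of `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mesoscopic_edges(tokens, size=4, threshold=0.5):
    """Link every pair of windows, adjacent or not, whose similarity
    reaches the threshold (assumed linking rule)."""
    bags = [Counter(c) for c in chunks(tokens, size)]
    return [
        (i, j)
        for i, j in combinations(range(len(bags)), 2)
        if cosine(bags[i], bags[j]) >= threshold
    ]


# Toy usage: the first and third windows share their vocabulary, so
# they are linked even though they are not adjacent in the text.
edges = mesoscopic_edges("a b c d e f g h a b c d".split())
```

Because every pair of windows is compared, links can span distant parts of a text, which is the property the abstract associates with capturing story flow.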
Physica A: Statistical Mechanics and Its Applications | 2018
Henrique Ferraz de Arruda; Vanessa Queiroz Marinho; Thales S. Lima; Diego R. Amancio; Luciano da Fontoura Costa
Text network analysis has received increasing attention as a consequence of its wide range of applications. In this work, we extend previous work founded on the study of topological features of mesoscopic networks. Here, the geometrical properties of visualized networks are quantified using several image analysis techniques and used as features for authorship attribution. The visual features were found to yield performance similar to that achieved with topological measurements. In addition, combining these two types of features improved the performance.
Journal of Complex Networks | 2018
Vanessa Queiroz Marinho; Graeme Hirst; Diego R. Amancio
The vast amount of data and the increase in computational capacity have allowed the analysis of texts from several perspectives, including the representation of texts as complex networks. Nodes of the network represent the words, and edges represent some relationship, usually word co-occurrence. Even though networked representations have been applied to study some tasks, such approaches are not usually combined with traditional models relying upon statistical paradigms. Because networked models are able to grasp textual patterns, we devised a hybrid classifier, called labelled motifs, that combines the frequency of common words with small structures found in the topology of the network, known as motifs. Our approach is illustrated in two contexts, authorship attribution and translationese identification. In the former, a set of novels written by different authors is analyzed. To identify translationese, texts from the Canadian Hansard and the European Parliament were classified as original or translated instances. Our results suggest that labelled motifs are able to represent texts and should be further explored in other tasks, such as the analysis of text complexity, language proficiency, and machine translation.
Journal of Complex Networks | 2018
Henrique Ferraz de Arruda; Filipi Nascimento Silva; Vanessa Queiroz Marinho; Diego R. Amancio; Luciano da Fontoura Costa
Meeting of the Association for Computational Linguistics | 2017
Edilson Anselmo Corrêa Júnior; Vanessa Queiroz Marinho; Leandro Borges dos Santos
Brazilian Conference on Intelligent Systems | 2017
Edilson Anselmo Correa; Vanessa Queiroz Marinho; Leandro Borges dos Santos; Thales Felipe Costa Bertaglia; Marcos Vinícius Treviso; Henrico Bertini Brum
arXiv: Computation and Language | 2016
Henrique Ferraz de Arruda; Filipi Nascimento Silva; Vanessa Queiroz Marinho; Diego R. Amancio; Luciano da Fontoura Costa
Archive | 2018
Henrique Ferraz de Arruda; Vanessa Queiroz Marinho; Luciano Costa; Diego R. Amancio
arXiv: Computation and Language | 2017
Vanessa Queiroz Marinho; Graeme Hirst; Diego R. Amancio