Vanessa Queiroz Marinho

Publication

Featured research published by Vanessa Queiroz Marinho.

brazilian conference on intelligent systems | 2016

Authorship Attribution via Network Motifs Identification

Vanessa Queiroz Marinho; Graeme Hirst; Diego R. Amancio

Concepts and methods of complex networks can be used to analyse texts at their different complexity levels. Examples of natural language processing (NLP) tasks studied via topological analysis of networks are keyword identification, automatic extractive summarization and authorship attribution. Even though a myriad of network measurements have been applied to study the authorship attribution problem, the use of motifs for text analysis has been restricted to a few works. The goal of this paper is to apply the concept of motifs, that is, recurrent interconnection patterns, to the authorship attribution task. The absolute frequencies of all thirteen directed motifs with three nodes were extracted from the co-occurrence networks and used as classification features. The effectiveness of these features was verified with four machine learning methods. The results show that motifs are able to distinguish the writing style of different authors. In our best scenario, 57.5% of the books were correctly classified. The chance baseline for this problem is 12.5%. In addition, we have found that function words play an important role in these recurrent patterns. Taken together, our findings suggest that motifs should be further explored in other related linguistic tasks.
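
The feature extraction described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a co-occurrence network links each word to the word that immediately follows it, and it counts induced three-node subgraphs by brute force, which only suits short texts. The function names (`cooccurrence_edges`, `canonical_form`, `is_weakly_connected`, `motif_counts`) are hypothetical, and any preprocessing the paper applies is omitted.

```python
from collections import Counter
from itertools import combinations, permutations


def cooccurrence_edges(tokens):
    """Directed edges word -> next word (assumed adjacency window of 1)."""
    return {(a, b) for a, b in zip(tokens, tokens[1:]) if a != b}


def canonical_form(nodes, edges):
    """Label an induced 3-node subgraph up to isomorphism.

    The label is the smallest sorted edge tuple obtained over all
    relabelings of the three nodes to 0, 1, 2.
    """
    best = None
    for perm in permutations(range(3)):
        relabel = dict(zip(nodes, perm))
        code = tuple(sorted(
            (relabel[u], relabel[v])
            for u in nodes for v in nodes
            if u != v and (u, v) in edges
        ))
        if best is None or code < best:
            best = code
    return best


def is_weakly_connected(code):
    """Three nodes are connected iff at least two node pairs are linked."""
    return len({frozenset(edge) for edge in code}) >= 2


def motif_counts(tokens):
    """Absolute frequency of each connected directed 3-node motif."""
    edges = cooccurrence_edges(tokens)
    nodes = sorted({word for edge in edges for word in edge})
    counts = Counter()
    for triple in combinations(nodes, 3):
        code = canonical_form(triple, edges)
        if is_weakly_connected(code):
            counts[code] += 1
    return counts


if __name__ == "__main__":
    for motif, freq in motif_counts("the cat saw the dog".split()).items():
        print(motif, freq)
```

Enumerating every possible edge set on three nodes with `canonical_form` yields exactly thirteen connected classes, which matches the thirteen directed motifs mentioned in the abstract. The resulting counts could then be arranged as a fixed-length feature vector for a classifier.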

workshop on graph based methods for natural language processing | 2017

On the "Calligraphy" of Books.

Vanessa Queiroz Marinho; Henrique Ferraz de Arruda; Thales S. Lima; Luciano da Fontoura Costa; Diego R. Amancio

Authorship attribution is a natural language processing task that has been widely studied, often by considering small order statistics. In this paper, we explore a complex network approach to assign the authorship of texts based on their mesoscopic representation, in an attempt to capture the flow of the narrative. Indeed, as reported in this work, such an approach allowed the identification of the dominant narrative structure of the studied authors. This was possible because the mesoscopic approach takes into account relationships between different, not necessarily adjacent, parts of the text, and can therefore capture the story flow. The potential of the proposed approach has been illustrated through principal component analysis, a comparison with the chance baseline method, and network visualization. Such visualizations reveal individual characteristics of the authors, which can be understood as a kind of calligraphy.

Physica A-statistical Mechanics and Its Applications | 2018

An image analysis approach to text analytics based on complex networks

Henrique Ferraz de Arruda; Vanessa Queiroz Marinho; Thales S. Lima; Diego R. Amancio; Luciano da Fontoura Costa

Text network analysis has received increasing attention as a consequence of its wide range of applications. In this work, we extend previous work founded on the study of topological features of mesoscopic networks. Here, the geometrical properties of visualized networks are quantified in terms of several image analysis techniques and used to support authorship attribution. It was found that the visual features account for performance similar to that achieved by using topological measurements. In addition, the combination of these two types of features improved the performance.

Journal of Complex Networks | 2018

Labelled network subgraphs reveal stylistic subtleties in written texts

Vanessa Queiroz Marinho; Graeme Hirst; Diego R. Amancio

The vast amount of data and increase of computational capacity have allowed the analysis of texts from several perspectives, including the representation of texts as complex networks. Nodes of the network represent the words, and edges represent some relationship, usually word co-occurrence. Even though networked representations have been applied to study some tasks, such approaches are not usually combined with traditional models relying upon statistical paradigms. Because networked models are able to grasp textual patterns, we devised a hybrid classifier, called labelled motifs, that combines the frequency of common words with small structures found in the topology of the network, known as motifs. Our approach is illustrated in two contexts, authorship attribution and translationese identification. In the former, a set of novels written by different authors is analyzed. To identify translationese, texts from the Canadian Hansard and the European parliament were classified as original or translated instances. Our results suggest that labelled motifs are able to represent texts and that they should be further explored in other tasks, such as the analysis of text complexity, language proficiency, and machine translation.

Journal of Complex Networks | 2018

Representation of texts as complex networks: a mesoscopic approach

Henrique Ferraz de Arruda; Filipi Nascimento Silva; Vanessa Queiroz Marinho; Diego R. Amancio; Luciano da Fontoura Costa

meeting of the association for computational linguistics | 2017

NILC-USP at SemEval-2017 Task 4: A Multi-view Ensemble for Twitter Sentiment Analysis.

Edilson Anselmo Corrêa Júnior; Vanessa Queiroz Marinho; Leandro Borges dos Santos

brazilian conference on intelligent systems | 2017

PELESent: Cross-Domain Polarity Classification Using Distant Supervision

Edilson Anselmo Correa; Vanessa Queiroz Marinho; Leandro Borges dos Santos; Thales Felipe Costa Bertaglia; Marcos Vinícius Treviso; Henrico Bertini Brum

arXiv: Computation and Language | 2016