Publication


Featured research published by Osvaldo Novais Oliveira.


Information Sciences | 2009

A complex network approach to text summarization

Lucas Antiqueira; Osvaldo Novais Oliveira; Luciano da Fontoura Costa; Maria das Graças Volpe Nunes

Automatic summarization of texts is now crucial for several information retrieval tasks owing to the huge amount of information available in digital media, which has increased the demand for simple, language-independent extractive summarization strategies. In this paper, we employ concepts and metrics of complex networks to select sentences for an extractive summary. The graph or network representing one piece of text consists of nodes corresponding to sentences, while edges connect sentences that share common meaningful nouns. Because various metrics could be used, we developed a set of 14 summarizers, generically referred to as CN-Summ, employing network concepts such as node degree, length of shortest paths, d-rings and k-cores. An additional summarizer was created which selects the highest ranked sentences in the 14 systems, as in a voting system. When applied to a corpus of Brazilian Portuguese texts, some CN-Summ versions performed better than summarizers that do not employ deep linguistic knowledge, with results comparable to state-of-the-art summarizers based on expensive linguistic resources. The use of complex networks to represent texts appears therefore as suitable for automatic summarization, consistent with the belief that the metrics of such networks may capture important text features.
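
The degree-based variant described above can be illustrated with a minimal sketch. It is not the CN-Summ implementation: the stopword list and the `content_words` helper are hypothetical stand-ins for the paper's selection of meaningful nouns, and only node degree is used to rank sentences.

```python
import re
from itertools import combinations

# Hypothetical stopword list; a crude stand-in for keeping only meaningful nouns.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "is", "are",
             "to", "on", "for", "by", "with", "that"}

def content_words(sentence):
    """Lowercased tokens minus stopwords (assumed proxy for shared nouns)."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return {w for w in tokens if w not in STOPWORDS}

def degree_summary(sentences, k):
    """Pick the k sentences with the highest degree, kept in original order."""
    words = [content_words(s) for s in sentences]
    degree = [0] * len(sentences)
    # Two sentences are linked when they share at least one content word.
    for i, j in combinations(range(len(sentences)), 2):
        if words[i] & words[j]:
            degree[i] += 1
            degree[j] += 1
    # Rank by degree (ties broken by position), then restore text order.
    ranked = sorted(range(len(sentences)), key=lambda i: (-degree[i], i))
    return [sentences[i] for i in sorted(ranked[:k])]

sentences = [
    "Networks model texts.",
    "Sentences are nodes in networks.",
    "Edges link sentences.",
    "Bananas grow quickly.",
]
print(degree_summary(sentences, 1))
```

Other metrics named in the abstract, such as shortest paths or k-cores, would replace the `degree` list with a different per-sentence score.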


Physica A: Statistical Mechanics and its Applications | 2007

Strong correlations between text quality and complex networks features

L. Antiqueira; Maria das Graças Volpe Nunes; Osvaldo Novais Oliveira; L. da F. Costa

Concepts of complex networks have been used to obtain metrics that were correlated to text quality established by scores assigned by human judges. Texts produced by high-school students in Portuguese were represented as scale-free networks (word adjacency model), from which typical network features such as the in/outdegree, clustering coefficient and shortest path were obtained. Another metric was derived from the dynamics of the network growth, based on the variation of the number of connected components. The scores assigned by the human judges according to three text quality criteria (coherence and cohesion, adherence to standard writing conventions and theme adequacy/development) were correlated with the network measurements. Text quality for all three criteria was found to decrease with increasing average values of outdegrees, clustering coefficient and deviation from the dynamics of network growth. Among the criteria employed, cohesion and coherence showed the strongest correlation, which probably indicates that the network measurements are able to capture how the text is developed in terms of the concepts represented by the nodes in the networks. Though based on a particular set of texts and specific language, the results presented here point to potential applications in other instances of text analysis.
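
As a rough illustration of the word adjacency model, the sketch below builds a directed network in which each word points to the word that follows it, then computes the average outdegree and the average clustering coefficient. It assumes unweighted edges, no stopword removal or lemmatization, and clustering computed on the undirected version of the network; the paper's preprocessing and exact definitions may differ.

```python
from collections import defaultdict
from itertools import combinations

def build_adjacency(tokens):
    """Directed network: edge a -> b if word b immediately follows word a."""
    successors = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # skip self-loops from immediately repeated words
            successors[a].add(b)
    return successors

def average_outdegree(tokens):
    """Mean number of distinct successor words per node."""
    nodes = set(tokens)
    if not nodes:
        return 0.0
    successors = build_adjacency(tokens)
    return sum(len(successors.get(w, ())) for w in nodes) / len(nodes)

def average_clustering(tokens):
    """Mean local clustering coefficient, ignoring edge direction (assumption)."""
    nodes = set(tokens)
    if not nodes:
        return 0.0
    neighbors = defaultdict(set)
    for a, targets in build_adjacency(tokens).items():
        for b in targets:
            neighbors[a].add(b)
            neighbors[b].add(a)
    total = 0.0
    for w in nodes:
        nbrs = neighbors.get(w, set())
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbors[u])
        total += links / (k * (k - 1) / 2)
    return total / len(nodes)

tokens = "the cat saw the dog".split()
print(average_outdegree(tokens), average_clustering(tokens))
```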


International Journal of Modern Physics C | 2008

Complex networks analysis of manual and machine translations

Diego R. Amancio; Lucas Antiqueira; Thiago Alexandre Salgueiro Pardo; Luciano da Fontoura Costa; Osvaldo Novais Oliveira; Maria das Graças Volpe Nunes

Complex networks have been increasingly used in text analysis, including in connection with natural language processing tools, as important text features appear to be captured by the topology and dynamics of the networks. Following previous works that apply complex networks concepts to text quality measurement, summary evaluation, and author characterization, we now focus on machine translation (MT). In this paper we assess the possible representation of texts as complex networks to evaluate cross-linguistic issues inherent in manual and machine translation. We show that different quality translations generated by MT tools can be distinguished from their manual counterparts by means of metrics such as in- (ID) and out-degrees (OD), clustering coefficient (CC), and shortest paths (SP). For instance, we demonstrate that the average OD in networks of automatic translations consistently exceeds the values obtained for manual ones, and that the CC values of source texts are not preserved for manual translations, but are for good automatic translations. This probably reflects the text rearrangements humans perform during manual translation. We envisage that such findings could lead to better MT tools and automatic evaluation metrics.


New Journal of Physics | 2011

Comparing intermittency and network measurements of words and their dependence on authorship

Diego R. Amancio; Eduardo G. Altmann; Osvaldo Novais Oliveira; Luciano da Fontoura Costa

Many features of texts and languages can now be inferred from statistical analyses using concepts from complex networks and dynamical systems. In this paper, we quantify how topological properties of word co-occurrence networks and intermittency (or burstiness) in word distribution depend on the style of authors. Our database contains 40 books by eight authors who lived in the nineteenth and twentieth centuries, for which the following network measurements were obtained: the clustering coefficient, average shortest path lengths and betweenness. We found that the two factors with the strongest dependence on authors were skewness in the distribution of word intermittency and the average shortest paths. Other factors such as betweenness and Zipf's law exponent show only weak dependence on authorship. Also assessed was the contribution from each measurement to authorship recognition using three machine learning methods. The best performance was about 65% accuracy upon combining complex networks and intermittency features with the nearest-neighbor algorithm of automatic authorship. From a detailed analysis of the interdependence of the various metrics, it is concluded that the methods used here are complementary for providing short- and long-scale perspectives on texts, which are useful for applications such as the identification of topical words and information retrieval.


Journal of Informetrics | 2012

Three-feature model to reproduce the topology of citation networks and the effects from authors’ visibility on their h-index

Diego R. Amancio; Osvaldo Novais Oliveira; Luciano da Fontoura Costa

Various factors are believed to govern the selection of references in citation networks, but a precise, quantitative determination of their importance has remained elusive. In this paper, we show that three factors can account for the referencing pattern of citation networks for two topics, namely “graphenes” and “complex networks”, thus allowing one to reproduce the topological features of the networks built with papers being the nodes and the edges established by citations. The most relevant factor was content similarity, while the other two – in-degree (i.e. citation counts) and age of publication – had varying importance depending on the topic studied. This dependence indicates that additional factors could play a role. Indeed, by intuition one should expect the reputation (or visibility) of authors and/or institutions to affect the referencing pattern, and this is only indirectly considered via the in-degree that should correlate with such reputation. Because information on reputation is not readily available, we simulated its effect on artificial citation networks considering two communities with distinct fitness (visibility) parameters. One community was assumed to have twice the fitness value of the other, which amounts to a double probability for a paper being cited. While the h-index for authors in the community with larger fitness evolved with time with slightly higher values than for the control network (no fitness considered), a drastic effect was noted for the community with smaller fitness.
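
The fitness experiment can be pictured with a small simulation. The sketch below is only an assumption-laden illustration: it draws each reference with probability proportional to `fitness * (citations + 1)`, leaves out the content-similarity and age factors of the three-feature model, and does not model how papers are assigned to authors. The names `grow_citation_network`, `fitness_of` and `h_index` are hypothetical.

```python
import random

def grow_citation_network(n_papers, refs_per_paper, fitness_of, seed=0):
    """Add papers one by one; each cites distinct earlier papers.

    Citation probability is assumed proportional to fitness * (citations + 1).
    """
    rng = random.Random(seed)
    citations = [0] * n_papers
    references = [[] for _ in range(n_papers)]
    for new in range(1, n_papers):
        candidates = list(range(new))
        weights = [fitness_of(p) * (citations[p] + 1) for p in candidates]
        k = min(refs_per_paper, len(candidates))
        chosen = set()
        while len(chosen) < k:  # resample until k distinct papers are cited
            chosen.add(rng.choices(candidates, weights=weights)[0])
        for p in chosen:
            citations[p] += 1
        references[new] = sorted(chosen)
    return citations, references

def h_index(counts):
    """Largest h such that h items have at least h citations each."""
    ranked = sorted(counts, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

# Two communities: even-numbered papers get twice the fitness of odd ones.
citations, references = grow_citation_network(
    50, 3, fitness_of=lambda p: 2.0 if p % 2 == 0 else 1.0
)
print(sum(citations), h_index(citations))
```

Here `h_index` is applied to the whole list of papers purely to show the calculation; an author-level h-index would need a mapping from papers to authors, which the abstract does not specify.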


International Conference on Communications, Circuits and Systems | 2006

Using Complex Networks for Language Processing: The Case of Summary Evaluation

Thiago Alexandre Salgueiro Pardo; Lucas Antiqueira; M. das Gracas Nunes; Osvaldo Novais Oliveira; L. da F. Costa

The ability to access embedded knowledge makes complex networks extremely promising for natural language processing, which normally requires deep knowledge representation that is not accessible with first-order statistics. In this paper, we demonstrate that features of complex networks, which have been shown to correlate with text quality, can be used to evaluate summaries. The metrics are the average degree, clustering coefficient, and the extent to which the dynamics of network growth deviates from a straight line. They were found to be much smaller for the high-quality, manual summaries, and increased for automatic summaries, thus pointing to a loss of quality, as expected. We also discuss the comparative performance of automatic summarizers.


Processing of the Portuguese Language | 2006

Modeling and evaluating summaries using complex networks

Thiago Alexandre Salgueiro Pardo; Lucas Antiqueira; Maria das Graças Volpe Nunes; Osvaldo Novais Oliveira; Luciano da Fontoura Costa

This paper presents a summary evaluation method based on a complex network measure. We show how to model summaries as complex networks and establish a possible correlation between summary quality and the measure known as dynamics of the network growth. It is a generic and language-independent method that enables easy and fast comparative evaluation of summaries. We evaluate our approach using manually produced summaries and automatic summaries produced by three automatic text summarizers for the Brazilian Portuguese language. The results are in agreement with human intuition and were shown to be statistically significant.


Journal of Statistical Mechanics: Theory and Experiment | 2012

Using complex networks to quantify consistency in the use of words

Diego R. Amancio; Osvaldo Novais Oliveira; L. da F. Costa

In this paper we have quantified the consistency of word usage in written texts represented by complex networks, where words were taken as nodes, by measuring the degree of preservation of the node neighborhood. Words were considered highly consistent if the authors used them with the same neighborhood. When ranked according to the consistency of use, the words obeyed a log-normal distribution, in contrast to Zipf's law that applies to the frequency of use. Consistency correlated positively with the familiarity and frequency of use, and negatively with ambiguity and age of acquisition. An inspection of some highly consistent words confirmed that they are used in very limited semantic contexts. A comparison of consistency indices for eight authors indicated that these indices may be employed for author recognition. Indeed, as expected, authors of novels could be distinguished from those who wrote scientific texts. Our analysis demonstrated the suitability of the consistency indices, which can now be applied in other tasks, such as emotion recognition.


Physica A: Statistical Mechanics and its Applications | 2011

Using metrics from complex networks to evaluate machine translation

Diego R. Amancio; Maria das Graças Volpe Nunes; Osvaldo Novais Oliveira; Thiago Alexandre Salgueiro Pardo; Lucas Antiqueira; L. da F. Costa


Inteligencia Artificial, Revista Iberoamericana de Inteligencia Artificial | 2007

Some issues on complex networks for author characterization

L. Antiqueira; Thiago Alexandre Salgueiro Pardo; M. das Gracas Nunes; Osvaldo Novais Oliveira

Collaboration


Dive into Osvaldo Novais Oliveira's collaborations.

Top Co-Authors

L. da F. Costa

University of São Paulo

L. Antiqueira

Spanish National Research Council

Volpe Nunes

University of São Paulo
