Alneu de Andrade Lopes
University of São Paulo
Publication
Featured research published by Alneu de Andrade Lopes.
Computers & Graphics | 2007
Alneu de Andrade Lopes; Roberto Pinho; Fernando Vieira Paulovich; Rosane Minghim
In many situations, individuals or groups of individuals are faced with the need to examine sets of documents to achieve understanding of their structure and to locate relevant information. In that context, this paper presents a framework for visual text mining to support exploration of both general structure and relevant topics within a textual document collection. Our approach starts by building a visualization from the text data set. On top of that, a novel technique is presented that generates and filters association rules to detect and display topics from a group of documents. Results have shown a very consistent match between the topics extracted using this approach and those actually present in the data set.
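To make the idea of generating and filtering association rules concrete, here is a minimal sketch of mining term-to-term rules from a group of documents. It assumes the standard support and confidence definitions and considers pairwise rules only; the function name `term_rules`, the thresholds, and the toy documents are hypothetical and are not taken from the paper.

```python
from itertools import permutations


def term_rules(docs, min_support=0.5, min_confidence=0.8):
    """Return rules (a, b, support, confidence) meaning "term a implies term b".

    docs is a list of sets of terms, one set per document.
    Generic association-rule filtering; not the paper's exact procedure.
    """
    n = len(docs)
    vocab = set().union(*docs)
    rules = []
    for a, b in permutations(sorted(vocab), 2):
        has_a = sum(1 for d in docs if a in d)
        both = sum(1 for d in docs if a in d and b in d)
        support = both / n          # fraction of documents containing a and b
        confidence = both / has_a   # has_a > 0 because a comes from the vocabulary
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Toy group of documents (hypothetical data).
docs = [
    {"graph", "link", "network"},
    {"graph", "network"},
    {"link", "network"},
    {"text", "mining"},
]
for a, b, support, confidence in term_rules(docs):
    print(f"{a} -> {b} (support={support:.2f}, confidence={confidence:.2f})")
```

Rules that survive both thresholds hint at terms that co-occur consistently in the group, which is one simple way to surface candidate topic labels.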
Social Network Analysis and Mining | 2013
Jorge Carlos Valverde-Rebaza; Alneu de Andrade Lopes
Online social networks and social media have become increasingly popular, showing exponential growth. This has attracted increasing research interest and, in turn, facilitated the emergence of new interdisciplinary research directions, such as social network analysis. In this scenario, link prediction is one of the most important tasks, since it deals with predicting whether a future relation will exist among members of a social network. Previous techniques for link prediction were based on structural (or topological) information. Nevertheless, structural information is not enough to achieve good performance in the link prediction task on large-scale social networks. Thus, the use of additional information, such as the interests or behaviors that nodes exhibit within their communities, may improve link prediction performance. In this paper, we analyze the viability of using a set of simple and inexpensive techniques that combine structural and community information to predict the existence of future links in a large-scale online social network such as Twitter. Twitter, a microblogging service, has emerged as a useful source of informative data shared by millions of users whose relationships require no reciprocation. The Twitter network was chosen because it is not yet well understood, mainly due to its directed and asymmetric links. Experiments show that our proposals can be used efficiently to improve unsupervised and supervised link prediction tasks in a directed, asymmetric, large-scale network.
Visualization and Data Analysis | 2006
Rosane Minghim; Fernando Vieira Paulovich; Alneu de Andrade Lopes
This paper presents a technique for generation of maps of documents targeted at placing similar documents in the same neighborhood. As a result, besides being able to group (and separate) documents by their contents, it runs at very manageable computational costs. Based on multi-dimensional projection techniques and an algorithm for projection improvement, it results in a surface map that allows the user to identify a number of important relationships between documents and sub-groups of documents via visualization and interaction. Visual attributes such as height, color, isolines and glyphs, as well as aural attributes (such as pitch), help add dimensions for integrated visual analysis. Exploration and narrowing of focus can be performed using a set of tools provided. This novel text mapping technique, named IDMAP (Interactive Document Map), is fully described in this paper. Results are compared with dimensionality reduction and clustering techniques for the same purposes. The maps are bound to support a large number of applications that rely on retrieval and examination of document collections and to complement the type of information offered by current knowledge domain visualizations.
Brazilian Symposium on Artificial Intelligence | 2012
Jorge Carlos Valverde-Rebaza; Alneu de Andrade Lopes
A cluster in a graph is a densely connected group of vertices that is sparsely connected to other groups. Hence, when predicting a future link between a pair of vertices, the vertices' common neighbors may play different roles depending on whether or not they belong to the same cluster. Based on that, we propose a new measure (WIC) for link prediction between a pair of vertices that considers the sets of their intra-cluster or within-cluster (W) and between-cluster or inter-cluster (IC) common neighbors. We also propose a set of measures, referred to as W forms, that use only the set of within-cluster common neighbors instead of the set of all common neighbors usually considered in the basic local similarity measures. Consequently, a clustering scheme must first be applied to the graph. Using three different clustering algorithms, we compared the WIC measure with ten basic local similarity measures and their counterpart W forms on ten real networks. Our analyses suggest that clustering information, no matter the clustering algorithm used, improves link prediction accuracy.
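The split of common neighbors into within-cluster (W) and inter-cluster (IC) sets can be sketched as follows. This is an illustration under stated assumptions: a common neighbor is treated as within-cluster when it shares a cluster with both endpoints, and the ratio score is just one plausible way to combine the two sets. The abstract does not give the WIC formula, so the helper names and the combination rule here are hypothetical.

```python
def split_common_neighbors(adj, cluster, x, y):
    """Split the common neighbors of x and y into (within, inter) sets.

    adj maps each vertex to its set of neighbors; cluster maps each vertex
    to a cluster id produced by any clustering algorithm.
    Assumption: z is "within-cluster" when x, y and z share one cluster.
    """
    common = adj[x] & adj[y]
    within = {z for z in common if cluster[z] == cluster[x] == cluster[y]}
    return within, common - within


def w_common_neighbors(adj, cluster, x, y):
    """W form of the common-neighbors score: count within-cluster ones only."""
    within, _ = split_common_neighbors(adj, cluster, x, y)
    return len(within)


def within_inter_ratio(adj, cluster, x, y):
    """Hypothetical combination: |W| / |IC|, or |W| when IC is empty."""
    within, inter = split_common_neighbors(adj, cluster, x, y)
    return len(within) / len(inter) if inter else float(len(within))


# Toy undirected graph (hypothetical): a and b share neighbors c, d and e.
adj = {
    "a": {"c", "d", "e"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b"},
    "e": {"a", "b"},
}
cluster = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1}
print(w_common_neighbors(adj, cluster, "a", "b"))   # within-cluster count
print(within_inter_ratio(adj, cluster, "a", "b"))   # ratio of W to IC
```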
Information Sciences | 2011
João Roberto Bertini; Liang Zhao; Robson Motta; Alneu de Andrade Lopes
Graphs are a powerful representation formalism that has been widely employed in machine learning and data mining. In this paper, we present a graph-based classification method that constructs a special graph, referred to as the K-associated graph, which is capable of representing similarity relationships among data cases and the proportion of class overlap. The main properties of K-associated graphs as well as the classification algorithm are described. Experimental evaluation indicates that the proposed technique captures the topological structure of the training data and leads to good results on classification tasks, particularly for noisy data. In comparison to other well-known classification techniques, the proposed approach shows the following interesting features: (1) A new measure, called purity, is introduced not only to characterize the degree of overlap among classes in the input data set, but also to construct the K-associated optimal graph for classification. (2) It performs nonlinear classification with automatic local adaptation according to the input data. In contrast to the K-nearest neighbor classifier, which uses a fixed K, the proposed algorithm is able to automatically consider different values of K in order to best fit the corresponding overlap of classes in different data subspaces, revealing both the local and global structure of the input data. (3) The proposed classification algorithm is nonparametric, implying high efficiency and no need for model selection in practical applications.
Journal of Computer Science and Technology | 2014
Rafael Geraldeli Rossi; Alneu de Andrade Lopes; Thiago de Paulo Faleiros; Solange Oliveira Rezende
Algorithms for numeric data classification have been applied to text classification. Usually the vector space model is used to represent text collections. The characteristics of this representation, such as sparsity and high dimensionality, sometimes impair the quality of general-purpose classifiers. Networks can be used to represent text collections, avoiding the high sparsity and allowing relationships among the different objects that compose a text collection to be modeled. Such network-based representations can improve the quality of the classification results. One of the simplest ways to represent textual collections by a network is through a bipartite heterogeneous network, which is composed of objects that represent the documents connected to objects that represent the terms. Heterogeneous bipartite networks do not require computation of similarities or relations among the objects and can be used to model any type of text collection. Due to the advantages of representing text collections through bipartite heterogeneous networks, in this article we present a text classifier which builds a classification model using the structure of a bipartite heterogeneous network. Such an algorithm, referred to as IMBHN (Inductive Model Based on Bipartite Heterogeneous Network), induces a classification model by assigning weights to the objects that represent the terms for each class of the text collection. An empirical evaluation using a large number of text collections from different domains shows that the proposed IMBHN algorithm produces significantly better results than the k-NN, C4.5, SVM, and Naive Bayes algorithms.
Information Sciences | 2013
João Roberto Bertini; Liang Zhao; Alneu de Andrade Lopes
Non-stationary classification problems concern changes in the data distribution over a classifier's lifetime. To face this problem, learning algorithms must reconcile essential but hard-to-combine attributes, such as good classification performance, stability, and low associated costs in processing time and memory. This paper presents an extension of the K-associated optimal graph learning algorithm to cope with classification over non-stationary domains. The algorithm relies on a graph structure consisting of many disconnected components (subgraphs). Such a graph enhances data representation by locally fitting groups of data according to a purity measure, which, in turn, quantifies the overlap between vertices of different classes. As a result, the graph can be used to accurately estimate the probability that unlabeled data belong to a given class. The proposed algorithm benefits from the dynamic evolution of the graph, updating its set of components as new data are presented over time and removing old components as new ones arise. Experimental results on artificial and real domains and further statistical analysis show that the proposed algorithm is an effective solution to non-stationary classification problems.
Information Processing and Management | 2016
Rafael Geraldeli Rossi; Alneu de Andrade Lopes; Solange Oliveira Rezende
Scalable algorithm based on bipartite networks to perform transduction. Unlabeled data effectively employed to improve classification performance. Better performance than algorithms based on vector space model or networks. Rigorous evaluation to show the drawbacks of the existing transductive algorithms. Trade-off analysis between inductive supervised and transductive classification.
Transductive classification is a useful way to classify texts when labeled training examples are insufficient. Several algorithms to perform transductive classification considering text collections represented in a vector space model have been proposed. However, the use of these algorithms is infeasible in practical applications due to the independence assumption among instances or terms and other drawbacks of these algorithms. Network-based algorithms have emerged to avoid the drawbacks of the algorithms based on the vector space model and to improve transductive classification. Networks are mostly used for label propagation, in which some labeled objects propagate their labels to other objects through the network connections. Bipartite networks are useful to represent text collections as networks and perform label propagation. The generation of this type of network avoids requirements such as collections with hyperlinks or citations, computation of similarities among all texts in the collection, as well as the setup of a number of parameters. In a bipartite heterogeneous network, objects correspond to documents and terms, and the connections are given by the occurrences of terms in documents. The label propagation is performed from documents to terms and then from terms to documents iteratively.
Nevertheless, instead of using terms just as a means of label propagation, in this article we propose the use of the bipartite network structure to define the relevance scores of terms for classes through an optimization process and then propagate these relevance scores to define labels for unlabeled documents. The new document labels are used to redefine the relevance scores of terms, which consequently redefine the labels of unlabeled documents in an iterative process. We demonstrated that the proposed approach surpasses the algorithms for transductive classification based on the vector space model or networks. Moreover, we demonstrated that the proposed algorithm effectively makes use of unlabeled documents to improve classification and that it is faster than other transductive algorithms.
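The baseline document-to-term-to-document label propagation described in the abstract can be sketched as below. This is a generic averaging scheme on a document-term bipartite network, not the optimization-based algorithm the article proposes; the function name `propagate_labels`, the averaging rule, and the toy collection are assumptions made for illustration.

```python
def propagate_labels(doc_terms, seed_labels, classes, iterations=10):
    """Generic label propagation on a document-term bipartite network.

    doc_terms maps each document id to its set of terms; seed_labels maps
    the labeled documents to their class. Labeled documents stay clamped.
    """
    doc_scores = {d: {c: 0.0 for c in classes} for d in doc_terms}
    for d, c in seed_labels.items():
        doc_scores[d][c] = 1.0

    # Invert the network: which documents contain each term.
    term_docs = {}
    for d, terms in doc_terms.items():
        for t in terms:
            term_docs.setdefault(t, set()).add(d)

    for _ in range(iterations):
        # Documents -> terms: a term takes the mean score of its documents.
        term_scores = {
            t: {c: sum(doc_scores[d][c] for d in docs) / len(docs) for c in classes}
            for t, docs in term_docs.items()
        }
        # Terms -> documents: an unlabeled document takes the mean of its terms.
        for d, terms in doc_terms.items():
            if d in seed_labels or not terms:
                continue
            doc_scores[d] = {
                c: sum(term_scores[t][c] for t in terms) / len(terms) for c in classes
            }

    return {d: max(classes, key=lambda c: doc_scores[d][c]) for d in doc_terms}


# Toy collection (hypothetical): two labeled and two unlabeled documents.
doc_terms = {
    "d1": {"goal", "team"},
    "d2": {"stock", "market"},
    "d3": {"team", "match"},
    "d4": {"market", "bank"},
}
seeds = {"d1": "sports", "d2": "finance"}
print(propagate_labels(doc_terms, seeds, ["sports", "finance"]))
```

The unlabeled documents inherit a class only through the terms they share with labeled ones, which is why no document-to-document similarity has to be computed.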
Neurocomputing | 2015
Robson Motta; Rosane Minghim; Alneu de Andrade Lopes; Maria Cristina Ferreira de Oliveira
Multidimensional projections are valuable tools to generate visualizations that support exploratory analysis of a wide variety of complex high-dimensional data. However, projection mappings obtained from different techniques vary considerably, and users exploring the mappings or selecting between projection techniques still have limited assistance in their task. Current methods to assess projection quality fail to capture properties that are paramount to user interpretation, such as the capability of conveying class information, or the preservation of groups and neighborhoods from the original space. In this paper we propose a unifying framework to derive objective measures of the local behavior of projection mappings that support interpreting the mappings and comparing solutions regarding several properties. A quality value is computed for each data point, from which a single global value may also be assigned to the projection. Measures are computed from a recently introduced data graph model known as Extended Minimum Spanning Tree (EMST). Measurements of the topology of EMST graphs, built relative to the original and projected data representations, are scale independent and afford evaluation of multiple properties. We introduce measures of visual properties and of preservation of properties from the original space. They are targeted at (i) depicting class segregation capability; (ii) quantifying ‘neighborhood purity’ regarding classes; (iii) evaluating neighborhood preservation; and finally (iv) evaluating group preservation. We introduce the measures and illustrate how they can inform users about the local and global behavior of projection techniques considering multiple mappings of artificial and real data sets.
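As a rough illustration of a per-point quality value that can be aggregated into a global one, the sketch below measures neighborhood preservation with plain k-nearest neighbors. It deliberately swaps in k-NN sets for the EMST graphs used in the paper, so it is a simpler stand-in rather than the proposed measure; the function names and toy coordinates are hypothetical.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbors of point i (ties broken by index)."""
    others = (j for j in range(len(points)) if j != i)
    ranked = sorted(others, key=lambda j: (math.dist(points[i], points[j]), j))
    return set(ranked[:k])


def neighborhood_preservation(original, projected, k):
    """Per-point fraction of original-space neighbors kept in the projection."""
    return [
        len(knn(original, i, k) & knn(projected, i, k)) / k
        for i in range(len(original))
    ]


# Toy data (hypothetical): the projection simply drops a constant coordinate.
original = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0)]
projected = [(0, 0), (1, 0), (2, 0), (10, 0)]

local = neighborhood_preservation(original, projected, k=2)
print(local)                    # one quality value per data point
print(sum(local) / len(local))  # single global value for the projection
```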
International Conference on Data Mining | 2012
Rafael Geraldeli Rossi; Thiago de Paulo Faleiros; Alneu de Andrade Lopes; Solange Oliveira Rezende
Usually, algorithms for categorization of numeric data have been applied to text categorization after a preprocessing phase which assigns weights to the textual terms treated as attributes. However, due to characteristics of textual data, some algorithms for data categorization are not efficient for text categorization. Characteristics of textual data such as sparsity and high dimensionality sometimes impair the quality of general-purpose classifiers. Here, we propose a text classifier based on a bipartite heterogeneous network used to represent textual document collections. Such an algorithm induces a classification model by assigning weights to the objects that represent the terms of the textual document collection. The induced weights correspond to the influence of the terms on the classification of the documents in which they appear. The least-mean-square algorithm is used in the inductive process. Empirical evaluation using a large number of textual document collections shows that the proposed IMBHN algorithm produces significantly better results than the k-NN, C4.5, SVM and Naïve Bayes algorithms.
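A minimal sketch of inducing per-class term weights with a least-mean-square update is shown below. It uses the generic Widrow-Hoff rule applied to term frequencies; details of IMBHN that the abstract does not state (output thresholding, stopping criterion, learning rate) are not reproduced, and the names `train_lms`, `classify`, `eta`, and `epochs` are assumptions.

```python
def train_lms(docs, classes, eta=0.1, epochs=100):
    """Induce weights[term][cls] with online least-mean-square updates.

    docs is a list of (term_frequencies, label) pairs, where
    term_frequencies maps each term to its frequency in the document.
    """
    weights = {}
    for _ in range(epochs):
        for terms, label in docs:
            for cls in classes:
                target = 1.0 if cls == label else 0.0
                output = sum(
                    weights.get(t, {}).get(cls, 0.0) * f for t, f in terms.items()
                )
                error = target - output
                # Move each connected term weight along the error gradient.
                for t, f in terms.items():
                    term_weights = weights.setdefault(t, {c: 0.0 for c in classes})
                    term_weights[cls] += eta * error * f
    return weights


def classify(weights, terms, classes):
    """Assign the class whose induced term weights give the highest score."""
    def score(cls):
        return sum(weights.get(t, {}).get(cls, 0.0) * f for t, f in terms.items())
    return max(classes, key=score)


# Toy training collection (hypothetical data).
classes = ["sports", "finance"]
docs = [
    ({"goal": 1, "team": 1}, "sports"),
    ({"team": 1, "match": 1}, "sports"),
    ({"stock": 1, "market": 1}, "finance"),
    ({"market": 1, "bank": 1}, "finance"),
]
weights = train_lms(docs, classes)
print(classify(weights, {"goal": 1, "match": 1}, classes))
```

Because the model is just a weight per term and class, classifying a new document only needs the weights of the terms it contains.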