George D. C. Cavalcanti
Federal University of Pernambuco
Publications
Featured research published by George D. C. Cavalcanti.
Expert Systems With Applications | 2013
Rafael Ferreira; Luciano de Souza Cabral; Rafael Dueire Lins; Gabriel de França Pereira e Silva; Fred Freitas; George D. C. Cavalcanti; Rinaldo Lima; Steven J. Simske; Luciano Favaro
Text summarization is the process of automatically creating a shorter version of one or more text documents. It is an important way of finding relevant information in large text libraries or on the Internet. Text summarization techniques are classified as extractive or abstractive. Extractive techniques produce a summary by selecting sentences from the documents according to some criteria; sentence scoring is the most widely used of these criteria. Abstractive summaries attempt to improve the coherence among sentences by eliminating redundancies and clarifying the context of sentences. This paper describes and performs a quantitative and qualitative assessment of 15 sentence-scoring algorithms available in the literature. Three different datasets (news, blog and article contexts) were evaluated. In addition, directions to improve the sentence extraction results obtained are suggested.
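The sentence-scoring idea surveyed above can be sketched with the classical word-frequency criterion: each sentence is scored by the normalized frequency of its words, and the top-scoring sentences form the summary. This is a minimal illustration, not code from the paper; the function and variable names are my own.

```python
from collections import Counter
import re

def score_sentences(text, top_n=1):
    """Return the top_n sentences by normalized word-frequency score."""
    sentences = [s.strip() for s in re.split(r'[.!?]', text) if s.strip()]
    freq = Counter(re.findall(r'\w+', text.lower()))
    scored = []
    for s in sentences:
        toks = re.findall(r'\w+', s.lower())
        # Score = mean corpus frequency of the sentence's words
        score = sum(freq[t] for t in toks) / max(len(toks), 1)
        scored.append((score, s))
    scored.sort(reverse=True)
    return [s for _, s in scored[:top_n]]
```

The surveyed algorithms replace this scoring function with criteria such as sentence position, cue words or TF-IDF, while the selection loop stays the same.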
congress on evolutionary computation | 2010
Nitai B. Silva; Ing Ren Tsang; George D. C. Cavalcanti; Ing-Jyh Tsang
A social network is composed of communities of individuals or organizations that are connected by a common interest. Online social networking sites such as Twitter, Facebook and Orkut are among the most visited sites on the Internet. There is currently great interest in understanding the complexities of this type of network from both theoretical and applied points of view. Understanding these social network graphs is important to improve current social network systems and to develop new applications. Here, we propose a friend recommendation system for social networks based on the topology of the network graph. The topology of the network that connects a user to his or her friends is examined, and a local social network called Oro-Aro is used in the experiments. We developed an algorithm that analyses the sub-graph composed of a user and all other people within three degrees of separation; however, only users separated by two degrees of separation are candidates to be suggested as friends. The algorithm uses the patterns defined by their connections to find users who behave similarly to the root user. The recommendation mechanism was developed based on the characterization and analysis of the network formed by the user's friends and friends-of-friends (FOF).
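A simple baseline for the friends-of-friends candidate set described above ranks each two-degrees-away user by the number of mutual friends shared with the root user. This sketch is illustrative only (the paper's algorithm uses richer connection patterns); all names are mine.

```python
from collections import Counter

def recommend_fof(graph, user, top_n=3):
    """graph: dict mapping each user to a set of direct friends.
    Returns up to top_n friend-of-friend candidates, ranked by
    the number of mutual friends with `user`."""
    friends = graph.get(user, set())
    counts = Counter()
    for f in friends:
        for fof in graph.get(f, set()):
            # Skip the user and anyone already connected to them
            if fof != user and fof not in friends:
                counts[fof] += 1
    return [u for u, _ in counts.most_common(top_n)]
```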
Pattern Recognition | 2015
Rafael M. O. Cruz; Robert Sabourin; George D. C. Cavalcanti; Tsang Ing Ren
Dynamic ensemble selection systems work by estimating the level of competence of each classifier from a pool of classifiers; only the most competent ones are selected to classify a given test sample. This is achieved by defining a criterion to measure the level of competence of a base classifier, such as its accuracy in local regions of the feature space around the query instance. However, using only one criterion about the behavior of a base classifier is not sufficient to accurately estimate its level of competence. In this paper, we present a novel dynamic ensemble selection framework using meta-learning. We propose five distinct sets of meta-features, each one corresponding to a different criterion for measuring the level of competence of a classifier for the classification of input samples. The meta-features are extracted from the training data and used to train a meta-classifier to predict whether or not a base classifier is competent enough to classify an input instance. During the generalization phase, the meta-features are extracted from the query instance and passed as input to the meta-classifier, which estimates whether a base classifier is competent enough to be added to the ensemble. Experiments are conducted over several small-sample-size classification problems, i.e., problems with a high degree of uncertainty due to the lack of training data. Experimental results show that the proposed meta-learning framework greatly improves classification accuracy when compared against current state-of-the-art dynamic ensemble selection techniques.
Highlights:
- We propose a novel dynamic ensemble selection framework using meta-learning.
- We present five sets of meta-features to measure the competence of a classifier.
- Results demonstrate the proposed framework outperforms current techniques.
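The single local-accuracy criterion that the paper argues is insufficient on its own can be sketched as follows: a classifier's competence for a query is its accuracy on the query's k nearest validation samples. This is one criterion only, not the paper's five-meta-feature framework; the names and the Euclidean-distance choice are my assumptions.

```python
import numpy as np

def select_most_competent(classifiers, X_val, y_val, query, k=3):
    """Return the index of the classifier with the highest accuracy
    in the local region (k nearest validation samples to `query`)."""
    dists = np.linalg.norm(X_val - query, axis=1)
    region = np.argsort(dists)[:k]
    competences = [
        np.mean([clf(x) == y for x, y in zip(X_val[region], y_val[region])])
        for clf in classifiers
    ]
    return int(np.argmax(competences))
```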
international conference on document analysis and recognition | 2009
Rodolfo P. dos Santos; Gabriela S. Clemente; Tsang Ing Ren; George D. C. Cavalcanti
Text extraction is an important phase in document recognition systems. To segment text from a document page, it is necessary to detect all possible manuscript text regions. In this article we propose an efficient algorithm to segment handwritten text lines. The algorithm uses a morphological operator to obtain image features, followed by a sequence of histogram projections and recovery steps to obtain the segmented text line regions. First, a Y histogram projection is performed, which yields the positions of the text lines. A threshold is applied to divide the lines into different regions, and a second threshold is used to eliminate false lines. These procedures, however, cause some loss of text line area, so a recovery method is proposed to minimize this effect. To detect the extreme positions of the text in the horizontal direction, an X histogram projection is then applied and, as in the Y direction, another threshold is used to eliminate false words. Finally, to optimize the area of each manuscript text line, a text selection step is carried out. Experimental results using the IAM database show that this new approach is robust, fast and produces very good score rates.
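The Y-projection step described above can be sketched as: sum the foreground pixels per row, threshold the resulting profile, and read off contiguous bands of text rows. The threshold value is a toy placeholder, and the recovery and X-projection stages are omitted; names are illustrative.

```python
import numpy as np

def text_line_bands(binary_img, thresh=1):
    """binary_img: 2-D array with 1 = ink.
    Returns (start_row, end_row) ranges of detected text lines."""
    profile = binary_img.sum(axis=1)   # Y histogram projection
    mask = profile >= thresh           # rows considered to contain text
    bands, start = [], None
    for row, on in enumerate(mask):
        if on and start is None:
            start = row                # band opens
        elif not on and start is not None:
            bands.append((start, row)) # band closes
            start = None
    if start is not None:
        bands.append((start, len(mask)))
    return bands
```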
congress on evolutionary computation | 2007
Gabriel L. F. B. C. Azevedo; George D. C. Cavalcanti; Edson C. B. Carvalho Filho
Techniques based on biometrics have been successfully applied to personal identification systems. One rather promising technique uses the keystroke dynamics of each user in order to recognize him or her. In this study, we present the development of a hybrid system based on support vector machines and stochastic optimization techniques, with the main objective of analysing these optimization algorithms for feature selection. We evaluate two optimization techniques for this task: genetic algorithms (GA) and particle swarm optimization (PSO). We use the standard GA and propose a PSO variation in which each particle is represented by a vector of probabilities that indicates the likelihood of selecting a particular feature and directly affects the original feature values. In this study, PSO outperformed GA with regard to classification error, processing time and feature reduction rate.
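The probability-vector particle encoding described above can be sketched as: each particle holds one probability per feature, a feature is selected when its probability exceeds a threshold, and the particle moves under a standard PSO velocity update. This is my reading of the encoding, with standard PSO coefficients; the paper's exact update may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def decode(particle, threshold=0.5):
    """Map a probability vector to a boolean feature-selection mask."""
    return particle > threshold

def pso_step(particle, velocity, personal_best, global_best,
             w=0.7, c1=1.5, c2=1.5):
    """One standard PSO update, with probabilities clipped to [0, 1]."""
    r1, r2 = rng.random(particle.shape), rng.random(particle.shape)
    velocity = (w * velocity
                + c1 * r1 * (personal_best - particle)
                + c2 * r2 * (global_best - particle))
    particle = np.clip(particle + velocity, 0.0, 1.0)
    return particle, velocity
```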
Expert Systems With Applications | 2014
Nara M. Portela; George D. C. Cavalcanti; Tsang Ing Ren
Magnetic resonance (MR) brain image segmentation of different anatomical structures or tissue types has become a critical requirement in the diagnosis of neurological diseases. Depending on the availability of training samples, image segmentation can be either supervised or unsupervised. While supervised learning requires a sufficient amount of labelled training data, which is expensive and time-consuming to obtain, unsupervised learning techniques suffer from the problem of getting trapped in local optima. Semi-supervised algorithms that include prior knowledge in the unsupervised learning can enhance the segmentation process without the need for labelled training data. This paper proposes a method to improve the quality of MR brain tissue segmentation and to accelerate the convergence process. The proposed method is a clustering-based semi-supervised classifier that does not need a set of labelled training data and uses less human expert analysis than a supervised approach. The proposed classifier labels the voxel clusters of an image slice and then uses the statistics and class label information of the resulting clusters to classify the remaining image slices by applying a Gaussian Mixture Model (GMM). The experimental results show that the proposed semi-supervised approach accelerates convergence and improves the accuracy of the results compared with the classical GMM approach.
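The propagation idea above can be sketched in one dimension: per-class Gaussian statistics are fitted on the clusters labelled in one slice, and the voxels of the next slice are classified by maximum likelihood. This single-Gaussian stand-in simplifies the paper's GMM; all names are illustrative.

```python
import numpy as np

def fit_class_stats(intensities, labels):
    """Fit (mean, std) per class from a labelled slice's intensities."""
    stats = {}
    for c in np.unique(labels):
        vals = intensities[labels == c]
        stats[c] = (vals.mean(), vals.std() + 1e-6)  # avoid zero std
    return stats

def classify(intensities, stats):
    """Assign each voxel intensity to the class of maximum log-likelihood."""
    def log_pdf(x, mu, sigma):
        return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma)
    classes = sorted(stats)
    scores = np.stack([log_pdf(intensities, *stats[c]) for c in classes])
    return np.array(classes)[scores.argmax(axis=0)]
```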
Expert Systems With Applications | 2012
Roberto H.W. Pinheiro; George D. C. Cavalcanti; Renato Fernandes Corrêa; Tsang Ing Ren
In this paper, we propose a filtering method for feature selection called ALOFT (At Least One FeaTure). The proposed method focuses on specific characteristics of the text categorization domain. It also ensures that every document in the training set is represented by at least one feature, and the number of selected features is determined in a data-driven way. We compare the effectiveness of the proposed method with the Variable Ranking method using three text categorization benchmarks (Reuters-21578, 20 Newsgroups and WebKB), two different classifiers (k-Nearest Neighbor and Naive Bayes) and five feature evaluation functions. The experiments show that ALOFT obtains equivalent or better results than the classical Variable Ranking method.
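The at-least-one-feature guarantee can be sketched as: walk over the training documents and, for each, add its highest-scoring term (under some feature evaluation function) to the selected set, so no document ends up with an empty representation. This is my condensed reading of the idea, not the paper's exact procedure.

```python
def aloft_select(docs, fef_score):
    """docs: list of term lists; fef_score: dict term -> FEF score.
    Returns a feature set covering every non-empty document."""
    selected = set()
    for doc in docs:
        if not doc:
            continue
        # The document's best-scoring term guarantees it is represented
        best = max(doc, key=lambda t: fef_score.get(t, 0.0))
        selected.add(best)
    return selected
```

Note that the size of the selected set falls out of the data (at most one new feature per document), which is the data-driven property the abstract mentions.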
Expert Systems With Applications | 2015
Roberto H.W. Pinheiro; George D. C. Cavalcanti; Tsang Ing Ren
Bag-of-words is the most widely used representation method in text categorization. It represents each document as a feature vector in which each position corresponds to a word. Since all words in the database are considered features, the feature vector can reach tens of thousands of features. Therefore, text categorization relies on feature selection to eliminate meaningless data and to reduce the execution time. In this paper, we propose two filtering methods for feature selection in text categorization: Maximum f Features per Document (MFD) and Maximum f Features per Document – Reduced (MFDR). Both algorithms determine the number of selected features f in a data-driven way using a global-ranking Feature Evaluation Function (FEF). The MFD method analyzes all documents to ensure that each document in the training set is represented in the final feature vector, whereas MFDR analyzes only the documents containing high-FEF-valued features, selecting fewer features and thereby avoiding unnecessary ones. The experimental study evaluated the effectiveness of the proposed methods on four text categorization databases (20 Newsgroups, Reuters, WebKB and TDT2) and three FEFs using the Naive Bayes classifier. The proposed methods present better or equivalent results compared with the ALOFT method in all cases, and with Variable Ranking in more than 93% of the cases.
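A per-document cap like the one MFD describes can be sketched as: each document contributes its top f terms by FEF score to the global feature set. In the paper f itself is determined in a data-driven way; here it is a fixed parameter, and the names are illustrative.

```python
def mfd_select(docs, fef_score, f=1):
    """docs: list of term lists; fef_score: dict term -> FEF score.
    Each document contributes up to its f best-scoring terms."""
    selected = set()
    for doc in docs:
        top = sorted(set(doc), key=lambda t: fef_score.get(t, 0.0),
                     reverse=True)[:f]
        selected.update(top)
    return selected
```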
international symposium on neural networks | 2010
Rafael M. O. Cruz; George D. C. Cavalcanti; Tsang Ing Ren
This paper presents a novel approach to cursive character recognition using multiple feature extraction algorithms and a classifier ensemble. Several feature extraction techniques, based on different approaches, are applied and evaluated, and two new techniques, Modified Edge Maps and Multi Zoning, are proposed; the former presents the best overall result. Based on these results, a combination of the feature sets is proposed in order to achieve high recognition performance, motivated by the observation that the feature sets are both independent and complementary. The ensemble combines the outputs generated by a classifier trained on each feature set separately. Both fixed and trained combination rules are evaluated using the C-Cube database. A trained combination scheme using an MLP network as combiner achieves the best results, which are also the best reported for the C-Cube database by a good margin.
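A fixed combination rule of the kind evaluated above can be sketched as: the per-class score matrices produced by the classifiers (one per feature set) are averaged, and the fused prediction is the argmax per sample. This illustrates the mean rule only; the trained MLP combiner would replace the averaging step.

```python
import numpy as np

def combine_mean(score_matrices):
    """score_matrices: list of (n_samples, n_classes) arrays, one per
    feature set / classifier. Returns fused class predictions."""
    fused = np.mean(score_matrices, axis=0)  # the fixed "mean" rule
    return fused.argmax(axis=1)
```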
international symposium on neural networks | 2003
J.C.B. Melo; George D. C. Cavalcanti; Katia S. Guimarães
The PCA linear transformation method is used for feature extraction in the protein secondary structure prediction problem. This dimensionality reduction is applied to PSI-BLAST profiles built on NCBI's non-redundant protein database. Different numbers of extracted components are used as input to three artificial neural networks with 30, 35 or 40 nodes in the hidden layer. These classifiers are trained with the RPROP algorithm. To estimate the accuracy of the predictor, sevenfold cross-validation is applied to CB396, a database previously used to evaluate the performance of several predictors. To increase the efficiency of the predictor presented here, the outputs of the classifiers are combined through five simple rules: product, average, voting, minimum and maximum. This original application of the PCA method yields relevant results. Even with a drastic reduction from 260 to 80 components, the accuracy obtained is at least 1% higher than the best published result for another predictor, CONSENSUS, a combination of four other predictors. With a reduction from 260 to 180 components the performance is even better, achieving a Q3 accuracy of 74.5%. These results flag PCA as a promising method for feature extraction in the secondary structure prediction problem.
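The dimensionality-reduction step above can be sketched with textbook PCA via the eigendecomposition of the covariance matrix: center the data, take the top eigenvectors, and project. This is a generic illustration of the technique, not the paper's pipeline; in the paper the input dimension is 260 (PSI-BLAST profile features) reduced to 80-180 components.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    Xc = X - X.mean(axis=0)                   # center the data
    cov = np.cov(Xc, rowvar=False)            # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # ascending eigenvalues
    top = eigvecs[:, ::-1][:, :n_components]  # top eigenvectors
    return Xc @ top
```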