Getúlio J. A. Amaral
Federal University of Pernambuco
Publication
Featured research published by Getúlio J. A. Amaral.
Journal of the American Statistical Association | 2007
Getúlio J. A. Amaral; Ian L. Dryden; Andrew T. A. Wood
We propose a novel bootstrap hypothesis testing approach for the problem of testing a null hypothesis of a common mean direction, mean polar axis, or mean shape across several populations of real unit vectors (the directional case) or complex unit vectors (the two-dimensional shape case). Multisample testing problems of this type arise frequently in directional statistics and shape analysis (as in other areas of statistics), but to date there has been relatively little discussion of nonparametric bootstrap approaches to this problem. The bootstrap approach described here is based on a statistic that can be expressed as the smallest eigenvalue of a certain positive definite matrix. We prove that this statistic has a limiting chi-squared distribution under the null hypothesis of equality of means across populations. Although we focus mainly on the version of the statistic in which neither isotropy within populations nor constant dispersion structure across populations is assumed, we explain how to modify the statistic so that either or both of these assumptions can be incorporated. Our numerical results indicate that the bootstrap approach proposed here may be expected to perform well in practice.
Communications in Statistics - Simulation and Computation | 2010
Getúlio J. A. Amaral; Luiz H. G. Dore; Rosangela Lessa; Borko Stosic
This work shows how the k-means method for clustering objects can be applied in the context of statistical shape analysis. Because the choice of a suitable distance measure is a key issue in shape analysis, the Hartigan and Wong k-means algorithm is adapted for this situation. Simulations on controlled artificial data sets demonstrate that distances on the pre-shape spaces are more appropriate than the Euclidean distance on the tangent space. Finally, results are presented for an application to a real problem in oceanography, which in fact motivated the current work.
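As an illustration of the kind of distance the abstract refers to, the following is a minimal sketch of the full Procrustes distance between two planar pre-shapes in complex notation. It is not the authors' adapted k-means algorithm, and the abstract does not say which pre-shape distances were used; the function names and the example landmarks are assumptions made for illustration.

```python
import numpy as np

def preshape(landmarks):
    """Map planar landmarks (complex array) to a pre-shape:
    remove location by centering, remove scale by normalizing."""
    z = np.asarray(landmarks, dtype=complex)
    z = z - z.mean()
    return z / np.linalg.norm(z)

def full_procrustes_distance(z, w):
    """Full Procrustes distance between pre-shapes z and w.
    Taking the modulus of the complex inner product removes rotation."""
    inner = np.vdot(z, w)  # sum of conj(z_j) * w_j
    return float(np.sqrt(max(0.0, 1.0 - abs(inner) ** 2)))

# Hypothetical example: a triangle, a rotated/scaled/shifted copy, and a different triangle.
triangle = np.array([0 + 0j, 1 + 0j, 0 + 1j])
moved = 2.5 * np.exp(1j * 0.7) * triangle + (3 - 2j)
other = np.array([0 + 0j, 2 + 0j, 0 + 1j])

d_same = full_procrustes_distance(preshape(triangle), preshape(moved))
d_diff = full_procrustes_distance(preshape(triangle), preshape(other))
```

Here `d_same` is zero up to rounding because the two configurations differ only by location, scale, and rotation, while `d_diff` is strictly positive. A shape-aware k-means would use a distance of this kind in place of the Euclidean distance when assigning objects to clusters.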
Knowledge-Based Systems | 2017
Leandro C. Souza; Renata M. C. R. Souza; Getúlio J. A. Amaral; Telmo de Menezes e Silva Filho
Interval symbolic data is a complex data type that can often be obtained by summarizing large datasets. All existing linear regression approaches for interval data use certain fixed reference points to model intervals, such as midpoints, ranges, and lower and upper bounds. This is a limitation, because different datasets might be better represented by different reference points. In this paper, we propose a new method for extracting knowledge from interval data. Our parametrized approach automatically extracts the best reference points from the regressor variables. These reference points are then used to build two linear regressions: one for the lower bounds of the response variable and another for its upper bounds. Before the regressions are applied, we compute a criterion to verify the mathematical coherence of the predicted values. Mathematical coherence means that the predicted upper bounds are greater than the predicted lower bounds. If the criterion shows that coherence is not guaranteed, we suggest the use of a novel interval Box-Cox transformation of the response variable. Experimental evaluations with synthetic and real interval datasets illustrate the advantages and usefulness of the proposed method for interval linear regression.
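The idea of a parametrized reference point and of mathematical coherence can be sketched as follows. This is only an illustration under stated assumptions: the convex-combination parametrization `lower + lam * (upper - lower)`, the fixed value `lam=0.5`, the toy data, and the in-sample coherence check are all hypothetical and are not taken from the paper, whose criterion is computed before the regressions are applied.

```python
import numpy as np

def reference_points(lower, upper, lam):
    """Hypothetical parametrization: a point inside each interval.
    lam=0 gives the lower bound, lam=0.5 the midpoint, lam=1 the upper bound."""
    return lower + lam * (upper - lower)

def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, x):
    return coef[0] + coef[1] * x

# Toy interval data (assumed for illustration only).
x_lower = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x_upper = x_lower + np.array([1.0, 1.5, 1.0, 2.0, 1.5])
y_lower = 2.0 * x_lower
y_upper = 2.0 * x_upper + 1.0

r = reference_points(x_lower, x_upper, lam=0.5)

# Two regressions: one for the response lower bounds, one for the upper bounds.
coef_lower = fit_line(r, y_lower)
coef_upper = fit_line(r, y_upper)
pred_lower = predict(coef_lower, r)
pred_upper = predict(coef_upper, r)

# Coherence: every predicted upper bound is at least the predicted lower bound.
coherent = bool(np.all(pred_upper >= pred_lower))
```

In this sketch `lam` is fixed by hand, whereas the abstract describes choosing the reference points automatically from the data.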
Communications in Statistics - Simulation and Computation | 2013
Getúlio J. A. Amaral; Olga Patricia Reyes Floréz; Francisco José A. Cysneiros
We describe methods to detect influential observations in a sample of pre-shapes when the underlying distribution is assumed to be complex Bingham. One of these methods is based on Cook's distance, which is derived from the likelihood of the complex Bingham distribution. Another method works in the tangent space and is based on local influence for the multivariate normal distribution. A method to detect outliers is also explained. The application of the methods is illustrated on both a real dataset and a simulated sample.
Iheringia Serie Zoologia | 2012
Carina Carneiro de Melo Moura; Elisângela da Silva Guimarães; Geraldo Jorge Barbosa de Moura; Getúlio J. A. Amaral; Arley C. da Silva
Spatio-temporal distribution and reproductive success of Eretmochelys imbricata on the beaches of Ipojuca, Pernambuco, Brazil. This study aimed to verify the spatio-temporal distribution of Eretmochelys imbricata (Linnaeus, 1766) and aspects of its reproductive biology, such as incubation time, reproductive success, biometric measurements of females, number of nests and fecundity. Data were collected from 2007 to 2010 on the beaches of Muro Alto, Cupe, Merepe, Porto de Galinhas, and Maracaipe, all located in the city of Ipojuca, state of Pernambuco, Brazil. Parameters relating to the reproductive biology and nesting areas of the species were comparatively analyzed. Eretmochelys imbricata was recorded nesting between October and May, and 350 nests were monitored over three seasons. The spawning peak occurred from January to March, revealing a seasonal pattern. The number of nests differed significantly between seasons. Merepe beach had a higher occurrence of nests (46 nests/km) than the other monitored beaches. Regarding reproductive biology, reproductive success was 65.6% and the incubation time ranged from 54 to 56 days. Biometric measurements were collected from 59 specimens, giving an average curved carapace length of 92.5 ± 4.5 cm and an average curved carapace width of 83.4 ± 5 cm. The results can be used to support conservation plans and demonstrate that the beaches recorded in this study are relevant as nesting areas for
Communications in Statistics - Simulation and Computation | 2009
Getúlio J. A. Amaral; Marcelo Rodrigo Portela Ferreira
Some bootstrap and boosting methods for problems related to classification are introduced in this article. The first method chooses better boosting weights by using a bootstrap search algorithm. The second method is a good way to define a classification frontier. A new formulation for boosting in linear discriminant analysis is given. Since in this new formulation the uncertainty is represented by the weighted covariance matrix, it is more appropriate from the conceptual point of view. Simulation results show that the proposed methods perform well in data analysis.
Communications in Statistics - Simulation and Computation | 2018
Abraão D. C. Nascimento; Raquel C. da Silva; Getúlio J. A. Amaral
Directional data are vectors on the unit sphere. When these vectors are not signed, the data are called axial data. The Watson distribution is one of the main models for axial data. This model has two parameters: dominant axis and concentration. Based on the Rényi divergence and the Bhattacharyya and Hellinger distances, we propose three hypothesis tests to check whether two samples come from populations having the same concentration parameter. Results from synthetic and real data indicate that the proposed tests can perform well on Watson data.
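For reference, the standard form of the Watson density and the usual textbook definitions of the three measures named above can be written as follows. The notation (dominant axis \mu, concentration \kappa, Kummer function M, order \beta) is an assumption, normalization conventions for the Hellinger distance vary, and the test statistics derived in the paper are not reproduced here.

```latex
% Watson density on the unit sphere S^{p-1}, with respect to the uniform distribution;
% M(a, b, \kappa) denotes the Kummer confluent hypergeometric function.
f(\pm x; \mu, \kappa) = \frac{1}{M\!\left(\tfrac{1}{2}, \tfrac{p}{2}, \kappa\right)}
  \exp\!\left\{ \kappa \left(\mu^{\top} x\right)^{2} \right\},
  \qquad x \in S^{p-1}.

% Renyi divergence of order \beta (\beta > 0, \beta \neq 1) between densities f and g:
D_{\beta}(f \,\|\, g) = \frac{1}{\beta - 1} \log \int f^{\beta} g^{1-\beta}.

% Bhattacharyya distance and (one common convention for) the Hellinger distance:
d_{B}(f, g) = -\log \int \sqrt{f g},
\qquad
d_{H}(f, g) = 1 - \int \sqrt{f g}.
```

Because f(x) = f(-x), the density is suited to axial data, where x and -x represent the same observation.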
Pattern Analysis and Applications | 2017
Diêgo B. M. Maciel; Getúlio J. A. Amaral; Renata M. C. R. de Souza; Bruno A. Pimentel
In fuzzy k-modes clustering, each individual has only one membership degree of interest per class, which may not be sufficient to model the ambiguity of the data precisely. Multivariate thinking helps expose the inherent structure and meaning within a set of classified variables. In this paper, a multivariate approach to membership degrees is presented to better handle ambiguous data that share properties of different clusters. This method is compared with other fuzzy k-modes methods from the literature using a multivariate internal index that is also proposed in this paper. Synthetic and real categorical data sets are considered in this study.
International Journal of Business Intelligence and Data Mining | 2017
Renata M. C. R. Souza; Maria P.S. Souza; Telmo de Menezes e Silva Filho; Getúlio J. A. Amaral
Swarm-based optimisation methods have been previously used for tackling clustering tasks, with good results. However, the results obtained by this kind of algorithm are highly dependent on the chosen fitness criterion. In this work, we investigate the influence of four different fitness criteria on swarm-based clustering performance. The first function is the typical sum of distances between instances and their cluster centroids, which is the most used clustering criterion. The remaining functions are based on three different types of data dispersion: total dispersion, within-group dispersion and between-groups dispersion. We use a swarm-based algorithm to optimise these criteria and perform clustering tasks with nine real and artificial datasets. For each dataset, we select the best criterion in terms of adjusted Rand index and compare it with three state-of-the-art swarm-based clustering algorithms, trained with their proposed criteria. Numerical results confirm the importance of selecting an appropriate fitness criterion for each clustering task.
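The four kinds of criteria mentioned above can be sketched for a fixed partition as follows. This is a minimal illustration, not the fitness functions used in the paper: the scalar sum-of-squares definitions of total, within-group, and between-groups dispersion are assumptions, as are the function name and the toy data.

```python
import numpy as np

def clustering_criteria(X, labels):
    """Compute four candidate fitness values for a given partition."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    grand_mean = X.mean(axis=0)

    total = ((X - grand_mean) ** 2).sum()  # total dispersion
    within = 0.0    # within-group dispersion
    between = 0.0   # between-groups dispersion
    sum_dist = 0.0  # sum of distances to cluster centroids

    for k in np.unique(labels):
        members = X[labels == k]
        centroid = members.mean(axis=0)
        within += ((members - centroid) ** 2).sum()
        between += len(members) * ((centroid - grand_mean) ** 2).sum()
        sum_dist += np.linalg.norm(members - centroid, axis=1).sum()

    return {"sum_dist": sum_dist, "total": total,
            "within": within, "between": between}

# Toy data: two well-separated groups (assumed for illustration).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
good = clustering_criteria(X, [0, 0, 0, 1, 1, 1])
bad = clustering_criteria(X, [0, 1, 0, 1, 0, 1])
```

With these definitions the total dispersion does not depend on the partition and equals the within-group plus the between-groups dispersion, so a partition with smaller within-group dispersion necessarily has larger between-groups dispersion. A swarm-based optimiser would search over partitions (or centroids) to minimise or maximise one of these values.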
Communications in Statistics - Theory and Methods | 2017
C. M. Barros; Getúlio J. A. Amaral; A. D. C. Nascimento; Audrey H.M.A. Cysneiros
A method for detecting outliers in axial data was proposed by Best and Fisher (1986). To extend that work, we propose four new methods. Two of them are suitable for outlier detection and depend on the classic geodesic distance and a modified version of this distance. The other two procedures, which are designed for detecting influential observations, are based on the Kullback–Leibler (KL) and Cook's distances. Simulation experiments are performed to compare all the methods considered. Detection and error rates are used as comparison criteria. Numerical results provide evidence in favor of the KL distance.
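A common way to measure the geodesic distance between two axes is the angle between them, ignoring sign. The sketch below assumes this is what "classic geodesic distance" refers to; the function name is hypothetical, and neither the modified distance nor the outlier detection rules from the paper are reproduced.

```python
import numpy as np

def axial_geodesic_distance(x, y):
    """Angle in [0, pi/2] between the axes spanned by x and y.
    The absolute value makes x and -x represent the same axis."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cosine = abs(np.dot(x, y)) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(cosine, 0.0, 1.0)))

north = np.array([0.0, 0.0, 1.0])
d_antipodal = axial_geodesic_distance(north, -north)               # same axis
d_orthogonal = axial_geodesic_distance(north, [1.0, 0.0, 0.0])     # perpendicular axes
```

An observation whose distance to an estimated dominant axis is unusually large relative to the rest of the sample would be a natural outlier candidate under this kind of measure.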