Paula Brito
University of Porto
Publications
Featured research published by Paula Brito.
Archive | 2007
Paula Brito; Patrice Bertrand; Guy Cucumel; Francisco de A. T. de Carvalho
This volume presents recent methodological developments in data analysis and classification. It covers a wide range of topics, including methods for classification and clustering, dissimilarity analysis, graph analysis, consensus methods, conceptual analysis of data, analysis of symbolic data, multivariate statistical methods, data mining and knowledge discovery in databases. Besides structural and theoretical results, the book presents a wide variety of applications in fields such as biology, microarray analysis, cyber traffic, bank fraud detection and text analysis. By combining new methodological advances with real applications, the volume should be of particular value to researchers and practitioners. It provides new analytical tools that are useful both in theoretical research and in the daily practice of classification and data analysis.
IEEE Transactions on Knowledge and Data Engineering | 1994
Paula Brito
We study assertion objects, which constitute a particular class of symbolic objects. Symbolic objects form a data-analysis-driven formalism that can be compared to propositional calculus. Unlike propositional calculus, however, it is oriented toward the duality between intension (characteristic properties) and extension (the set of all individuals satisfying a given set of properties). The set of assertion objects is endowed with a partial order and a quasi-order. We focus on the property of completeness, which expresses precisely the intension-extension duality. The order structure of complete assertion objects is studied using notions of lattice theory and Galois connections, extending R. Wille's work (1982) to multiple-valued data. Two results are then obtained for particular cases.
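The two relations on assertion objects can be illustrated on toy data. This is a minimal sketch that assumes the following:

- Individuals take a single category per variable.
- An assertion object is written as one set of admissible values per variable.
- The partial order is inclusion of these value sets, and the quasi-order is inclusion of extensions.

The data and the function names (`extension`, `leq_description`, `leq_extension`) are hypothetical and do not follow the paper's notation. The example shows why the extension-based relation is only a quasi-order: two distinct assertions can have the same extension.

```python
# Hypothetical toy data: each individual takes one category per variable.
DATA = {
    "w1": {"colour": "red", "size": "small"},
    "w2": {"colour": "red", "size": "large"},
    "w3": {"colour": "blue", "size": "small"},
}


def extension(assertion, data=DATA):
    """Individuals whose value on every variable lies in the assertion's value set."""
    return {w for w, row in data.items()
            if all(row[var] in values for var, values in assertion.items())}


def leq_description(a, b):
    """Assumed partial order: a <= b when every value set of a is contained in b's."""
    return all(a[var] <= b[var] for var in a)


def leq_extension(a, b, data=DATA):
    """Quasi-order: a precedes b when ext(a) is a subset of ext(b)."""
    return extension(a, data) <= extension(b, data)


# "green" never occurs in DATA, so a and b differ but have the same extension.
a = {"colour": {"red"}, "size": {"small", "large"}}
b = {"colour": {"red", "green"}, "size": {"small", "large"}}
print(leq_description(a, b), leq_description(b, a))  # True False
print(leq_extension(a, b), leq_extension(b, a))      # True True
```

Mutual precedence of `a` and `b` under `leq_extension`, with `a != b`, is exactly the failure of antisymmetry that separates a quasi-order from a partial order.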
Journal of Applied Statistics | 2012
Paula Brito; A. Pedro Duarte Silva
A parametric model for interval data is proposed, assuming a multivariate Normal or Skew-Normal distribution for the midpoints and log-ranges of the interval variables. The intrinsic nature of interval variables leads to special structures of the variance–covariance matrix, represented by five possible configurations. Maximum likelihood estimation for both models under all considered configurations is studied. The proposed model is then used in the context of analysis of variance and multivariate analysis of variance testing. To assess the behaviour of the proposed methodology, a simulation study is performed. For medium or large sample sizes, the tests have good power and their true significance levels approach the nominal levels when the constraints assumed for the model are respected. For small samples, however, significance levels close to the nominal ones cannot be guaranteed. Applications to Chinese meteorological data from three different regions and to credit card usage variables for different card designations illustrate the proposed methodology.
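The midpoint/log-range transformation and a Gaussian fit can be sketched with NumPy. The data layout is assumed: one row per observation and one column per interval variable. Only the unconstrained covariance case is shown. The paper's five structured covariance configurations and the Skew-Normal model are not implemented, and the function names are hypothetical.

```python
import numpy as np


def interval_features(lower, upper):
    """Stack midpoints (l + u) / 2 and log-ranges log(u - l) column-wise.

    lower, upper: arrays of shape (n_observations, p_variables).
    Returns an array of shape (n, 2p): p midpoint columns, then p log-range columns.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    if np.any(upper <= lower):
        raise ValueError("log-ranges need strictly positive ranges (u > l)")
    midpoints = (lower + upper) / 2.0
    log_ranges = np.log(upper - lower)
    return np.hstack([midpoints, log_ranges])


def gaussian_mle(features):
    """Unconstrained multivariate Normal MLE: sample mean and 1/n covariance."""
    mu = features.mean(axis=0)
    sigma = np.cov(features, rowvar=False, bias=True)
    return mu, sigma


# Hypothetical data: 3 observations of 2 interval variables.
lower = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0]]
upper = [[3.0, 14.0], [6.0, 14.0], [4.0, 15.0]]
X = interval_features(lower, upper)
mu, sigma = gaussian_mle(X)
print(X.shape, sigma.shape)  # (3, 4) (4, 4)
```

`bias=True` gives the 1/n maximum likelihood covariance rather than the unbiased 1/(n-1) estimator. Degenerate intervals (u = l) are rejected because their log-range is undefined.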
Archive | 1990
Paula Brito; Edwin Diday
The development of Symbolic Data Analysis stems from two needs: to process more general data than classical Data Analysis techniques can, and to develop methods that yield easily interpretable results. In this paper we show how, by adopting notions of Symbolic Data Analysis, we may enlarge the domain of the input data and obtain an “explained” output from a clustering method. We start by recalling the definitions and properties of symbolic objects (Diday (1987b), Diday and Brito (1989)). We consider three kinds of objects: those that take exactly one value per variable, those that may take more than one value per variable, and those in which the definition of one variable depends on the value taken by another. We then compare notions defined on symbolic objects with similar notions in the literature (Wille (1982), Ganter (1984), Duquenne (1986), Guenoche (1989)) and show how the former extend the latter. We then recall pyramidal clustering and the main properties of pyramids (Diday (1986)). Pyramids lie halfway between hierarchies and lattices. They generalize hierarchies by allowing non-disjoint clusters, yet, unlike lattices, their graphical representation has no crossings. This intermediate position led us to adopt pyramids to structure symbolic objects: they define an inheritance structure on the objects without losing “too much” information, and they have a readable graphical representation. We present a “symbolic pyramidal clustering” algorithm. It can be applied to data sets of these kinds of symbolic objects, even when variables are dependent. As output, it yields a pyramid whose clusters are represented by symbolic objects satisfying a given property. The inheritance structure between the clusters then allows rules to be generated.
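The position of pyramids between hierarchies and lattices can be made concrete with a checker for the usual definition of a pyramid. Under that definition, the collection of clusters must satisfy four conditions:

1. It contains the whole set.
2. It contains all singletons.
3. The intersection of any two clusters is either empty or itself a cluster.
4. Every cluster is an interval of some linear order.

This is a minimal sketch in which the linear order is supplied rather than searched for. `is_pyramid` is a hypothetical helper, not part of the paper's algorithm.

```python
from itertools import combinations


def is_interval(cluster, order):
    """True if the (non-empty) cluster occupies consecutive positions in the order."""
    positions = sorted(order.index(x) for x in cluster)
    return positions == list(range(positions[0], positions[0] + len(positions)))


def is_pyramid(clusters, order):
    """Check the pyramid conditions on the ground set `order` (a list)."""
    cs = {frozenset(c) for c in clusters}
    if frozenset(order) not in cs:                       # whole set present
        return False
    if any(frozenset([x]) not in cs for x in order):     # all singletons present
        return False
    for h, k in combinations(cs, 2):                     # intersections closed
        if h & k and (h & k) not in cs:
            return False
    return all(is_interval(c, order) for c in cs)        # intervals of the order


order = ["a", "b", "c", "d"]
singletons = [{x} for x in order]
# {a, b} and {b, c} overlap without nesting: allowed in a pyramid, not in a hierarchy.
pyramid = singletons + [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}, set(order)]
print(is_pyramid(pyramid, order))                                # True
print(is_pyramid(singletons + [{"a", "c"}, set(order)], order))  # False: {a, c} is not an interval
```

The interval condition is what gives the graphical representation its absence of crossings: every cluster can be drawn over a contiguous stretch of the ordered objects.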
Pattern Recognition | 2006
Helena Brás Silva; Paula Brito; Joaquim Pinto da Costa
Applying graph theory to clustering, we propose a partitional clustering method and a clustering tendency index. The method requires no initial assumptions about the data set. The number of clusters and the partition that best fits the data set are selected according to the optimal value of the clustering tendency index.
Annals of Operations Research | 1995
Paula Brito
We recall a formalism based on the notion of symbolic object (Diday [15], Brito and Diday [8]), which generalizes the classical tabular model of Data Analysis. We study assertion objects, a particular class of symbolic objects endowed with a partial order and a quasi-order. Operations are then defined on symbolic objects. We study the property of completeness, already considered in Brito and Diday [8], which expresses the extension-intension duality. We formalize this notion in the framework of the theory of Galois connections and study the order structure of complete assertion objects. We introduce the notion of c-connection, a pair of mappings (f, g) between two partially ordered sets that must satisfy given conditions. A complete assertion object is then defined as a fixed point of the composition f ∘ g. This mapping is called a “completeness operator” because it “completes” a given assertion object. The set of complete assertion objects forms a lattice, and we state how suprema and infima are obtained. Since the lattice structure is too complex to allow a clustering study of a data set, we have proposed a pyramidal clustering approach [8]. The symbolic pyramidal clustering method builds a pyramid bottom-up, and each cluster is described by a complete assertion object whose extension is the cluster itself. We thus obtain an inheritance structure on the data set, which then leads to the generation of rules.
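The completeness idea can be illustrated on toy data. This is a minimal sketch assuming single-valued categorical data and two mappings:

- The extension maps an assertion to the individuals it covers.
- The intension maps a set of individuals to the per-variable sets of values they take.

An assertion is complete when the intension of its extension returns it unchanged. The names `extension`, `intension` and `complete` are hypothetical and are not claimed to match the paper's mappings f and g.

```python
# Hypothetical toy data: each individual takes one category per variable.
DATA = {
    "w1": {"colour": "red", "size": "small"},
    "w2": {"colour": "red", "size": "large"},
    "w3": {"colour": "blue", "size": "small"},
}
VARIABLES = ("colour", "size")


def extension(assertion, data=DATA):
    """Individuals whose value on every variable lies in the assertion's value set."""
    return {w for w, row in data.items()
            if all(row[var] in assertion[var] for var in VARIABLES)}


def intension(individuals, data=DATA):
    """Most specific assertion covering the individuals: observed values per variable."""
    return {var: {data[w][var] for w in individuals} for var in VARIABLES}


def complete(assertion, data=DATA):
    """Completeness operator: the intension of the assertion's extension."""
    return intension(extension(assertion, data), data)


def is_complete(assertion, data=DATA):
    """An assertion is complete when it is a fixed point of the operator."""
    return complete(assertion, data) == assertion


a = {"colour": {"red", "green"}, "size": {"small", "large"}}
print(is_complete(a))               # False: "green" is never observed
c = complete(a)
print(c["colour"], is_complete(c))  # {'red'} True
```

Applying `complete` a second time changes nothing. That idempotence is what makes the complete assertions exactly the fixed points of the operator.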
Archive | 1994
Paula Brito
A method of symbolic pyramidal clustering is presented, which constructs a pyramid on a set of multi-valued data. The criteria guiding cluster formation are based on the notions of extension and intension and lead to clusters that admit a conjunctive description. Both the input data and the obtained clusters are represented within the formalism of symbolic objects. The method is positioned in the context of conceptual clustering methods, and its advantages and drawbacks are then discussed on the basis of some applications.
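One merge decision of a symbolic agglomerative step can be sketched as follows. The generality degree is assumed to be the product, over variables, of the proportion of the domain covered: interval length for interval variables and number of categories for categorical ones. Several parts of the method are deliberately omitted:

- the completeness check;
- the pyramid-specific constraints (each cluster merged at most twice, clusters forming intervals of an order).

All names and data are hypothetical.

```python
from itertools import combinations
from math import prod


def generality(assertion, domains):
    """Assumed generality degree: product of the proportions of each domain covered."""
    ratios = []
    for var, desc in assertion.items():
        dom = domains[var]
        if isinstance(desc, tuple):        # interval variable: (low, high)
            ratios.append((desc[1] - desc[0]) / (dom[1] - dom[0]))
        else:                              # categorical variable: set of values
            ratios.append(len(desc) / len(dom))
    return prod(ratios)


def generalise(a, b):
    """Join of two assertions: interval hull or set union, variable by variable."""
    joined = {}
    for var in a:
        if isinstance(a[var], tuple):
            joined[var] = (min(a[var][0], b[var][0]), max(a[var][1], b[var][1]))
        else:
            joined[var] = a[var] | b[var]
    return joined


def best_pair(objects, domains):
    """Names of the two objects whose generalisation is least general."""
    return min(combinations(objects, 2),
               key=lambda p: generality(generalise(objects[p[0]], objects[p[1]]), domains))


# Hypothetical data with one interval and one categorical variable.
domains = {"age": (0, 100), "sport": {"tennis", "golf", "swim"}}
objects = {
    "x1": {"age": (20, 30), "sport": {"tennis"}},
    "x2": {"age": (25, 35), "sport": {"tennis"}},
    "x3": {"age": (60, 70), "sport": {"golf"}},
}
print(best_pair(objects, domains))  # ('x1', 'x2')
```

Preferring the least general join favours clusters that keep a tight conjunctive description. Here, x1 and x2 share a sport and have close age ranges.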
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery | 2014
Paula Brito
Symbolic Data Analysis (SDA) provides a framework for the representation and analysis of data that exhibit inherent variability. In Data Mining and classical Statistics, each variable usually takes a single value for each entity under analysis. This is no longer the case when the entities are not single elements but groups formed on the basis of some given criteria. Then, for each variable, the variability inherent to each group should be taken into account. Likewise, when analysing concepts such as botanical species, disease descriptions or car models, the data entail intrinsic variability, which should be explicitly considered. To this end, new variable types have been introduced whose realizations are not single real values or categories, but sets, intervals or, more generally, distributions over a given domain. SDA provides methods for the (multivariate) analysis of such data that take into account, through various approaches, the variability expressed in the data representation.
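The new variable types can be illustrated by aggregating individual records into one symbolic description per group. Each group gets three kinds of descriptor:

- an interval for a numerical variable;
- a set of values for a categorical variable;
- a relative-frequency distribution for another categorical variable.

This is a minimal sketch; the record fields (`price`, `colour`, `fuel`), the grouping key and the class names are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass
class Interval:          # interval-valued variable
    low: float
    high: float

    def __post_init__(self):
        if self.low > self.high:
            raise ValueError("low must not exceed high")


@dataclass
class ValueSet:          # set-valued (multi-valued) categorical variable
    values: FrozenSet[str]


@dataclass
class Distribution:      # modal variable: relative frequencies over categories
    weights: Dict[str, float]


def describe_groups(records, key):
    """Aggregate individual records into one symbolic description per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    table = {}
    for group, rows in groups.items():
        prices = [r["price"] for r in rows]
        fuels = Counter(r["fuel"] for r in rows)
        table[group] = {
            "price": Interval(min(prices), max(prices)),
            "colour": ValueSet(frozenset(r["colour"] for r in rows)),
            "fuel": Distribution({f: n / len(rows) for f, n in fuels.items()}),
        }
    return table


# Hypothetical micro-data: individual cars, grouped by model.
records = [
    {"model": "A", "price": 10.0, "colour": "red", "fuel": "diesel"},
    {"model": "A", "price": 14.0, "colour": "blue", "fuel": "petrol"},
    {"model": "A", "price": 12.0, "colour": "red", "fuel": "diesel"},
    {"model": "B", "price": 20.0, "colour": "black", "fuel": "petrol"},
]
table = describe_groups(records, "model")
print(table["A"]["price"])  # Interval(low=10.0, high=14.0)
```

A group with a single member, such as model B, yields a degenerate interval and a one-point distribution. Classical single-valued data is therefore a special case of the symbolic table.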
Archive | 1998
Paula Brito
We start by recalling the basic algorithm for symbolic clustering using the hierarchical and pyramidal models. This algorithm clusters a set of assertion objects and provides a clustering structure in which each class is represented by an assertion object generalising all its members. We then show how to extend this algorithm to take into account probability distributions on discrete variables. The extension is obtained by suitably adapting the generalisation step and the generality degree used by the algorithm.
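One way to generalise probability distributions on a discrete variable is to take, for each category, the maximum of the probabilities. The result then bounds every member of the cluster from above. This choice is an assumption made for illustration, not necessarily the exact adaptation used in the paper, and the function names are hypothetical.

```python
from functools import reduce


def generalise_max(p, q):
    """Category-wise maximum of two distributions (missing categories count as 0).

    The result usually sums to more than 1, so it is an upper envelope of the
    inputs rather than a probability distribution."""
    categories = set(p) | set(q)
    return {c: max(p.get(c, 0.0), q.get(c, 0.0)) for c in categories}


def generalise_cluster(distributions):
    """Generalise all members of a cluster by repeated pairwise maxima."""
    return reduce(generalise_max, distributions)


def covers(description, p):
    """True if the description dominates distribution p on every category."""
    return all(prob <= description.get(c, 0.0) for c, prob in p.items())


# Hypothetical distributions of a discrete variable for two objects.
p = {"a": 0.7, "b": 0.3}
q = {"a": 0.2, "b": 0.5, "c": 0.3}
g = generalise_cluster([p, q])
print(sorted(g.items()))  # [('a', 0.7), ('b', 0.5), ('c', 0.3)]
```

The category-wise maximum is associative and commutative, so the order in which a cluster's members are merged does not change the resulting description.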
GfKl | 2007
Paula Brito
In this paper we discuss some issues which arise when applying classical data analysis techniques to interval data, focusing on the notions of dispersion, association and linear combinations of interval variables. We present some methods that have been proposed for analysing this kind of data, namely for clustering, discriminant analysis, linear regression and interval time series analysis.
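One dispersion measure for interval data assumes that each observed interval [a_i, b_i] carries a uniform distribution. The mean of the resulting mixture is (1/2n) Σ (a_i + b_i). Its variance is (1/3n) Σ (a_i² + a_i b_i + b_i²) minus the squared mean. This is one proposal among several in the symbolic data literature and is not necessarily the measure discussed in the paper. The function names are hypothetical.

```python
def interval_mean(intervals):
    """Mean of the mixture of uniform distributions on the intervals [a_i, b_i]."""
    n = len(intervals)
    return sum(a + b for a, b in intervals) / (2 * n)


def interval_variance(intervals):
    """Variance of that mixture: (1/3n) * sum(a^2 + a*b + b^2) - mean^2."""
    n = len(intervals)
    m = interval_mean(intervals)
    return sum(a * a + a * b + b * b for a, b in intervals) / (3 * n) - m * m


# [0, 2] and [2, 4] together spread mass uniformly over [0, 4]: variance 16/12.
print(interval_variance([(0.0, 2.0), (2.0, 4.0)]))
```

When every interval is degenerate (a_i = b_i), the formula reduces to the classical population variance, which is a useful sanity check.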