Edwin Diday | Researchain

Publication

Featured research published by Edwin Diday.

Pattern Recognition | 1991

Symbolic clustering using a new dissimilarity measure

K. Chidananda Gowda; Edwin Diday

A new dissimilarity measure, based on the “position”, “span” and “content” of symbolic objects, is proposed for symbolic clustering. The dissimilarity measure is new in the sense that it is not just another aspect of a similarity measure. In the proposed hierarchical agglomerative clustering methodology, composite symbolic objects are formed using a Cartesian join operator whenever a mutual pair of symbolic objects is selected for agglomeration based on minimum dissimilarity. The minimum dissimilarity values at the different merging levels are used to compute the cluster indicator values and hence to determine the number of clusters in the data. The results of applying the algorithm to numeric data with a known number of classes are described first to show the efficacy of the method. Subsequently, the results of experiments on two data sets of assertion-type symbolic objects, drawn from the domains of fat-oil and microcomputers, are presented.
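The Cartesian join mentioned in this abstract can be illustrated with a minimal sketch. The representation is an assumption made for illustration and is not taken from the paper: each symbolic object is a dict whose values are either a `(low, high)` tuple for a quantitative interval feature or a `frozenset` of categories for a qualitative feature. The function name `cartesian_join` is hypothetical.

```python
def cartesian_join(a, b):
    """Join two symbolic objects feature by feature.

    Assumed (illustrative) representation: dict of feature name to either
    a (low, high) tuple or a frozenset of categories.
    """
    joined = {}
    for feature in a:
        va, vb = a[feature], b[feature]
        if isinstance(va, tuple):
            # Interval feature: smallest interval covering both inputs.
            joined[feature] = (min(va[0], vb[0]), max(va[1], vb[1]))
        else:
            # Qualitative feature: union of the categories seen in either object.
            joined[feature] = frozenset(va) | frozenset(vb)
    return joined


# Made-up objects, used only to show the call shape.
obj_a = {"weight": (1.0, 3.0), "colour": frozenset({"red"})}
obj_b = {"weight": (2.0, 5.0), "colour": frozenset({"blue"})}
composite = cartesian_join(obj_a, obj_b)
```

In an agglomerative scheme of the kind described, the composite object would replace the merged pair before the next round of dissimilarity computations.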

Systems, Man, and Cybernetics | 1992

Symbolic clustering using a new similarity measure

K.C. Gowda; Edwin Diday

A hierarchical, agglomerative, symbolic clustering methodology based on a similarity measure that takes into consideration the position, span, and content of symbolic objects is proposed. The similarity measure used is of a new type in the sense that it is not just another aspect of dissimilarity. The clustering methodology forms composite symbolic objects using a Cartesian join operator when two symbolic objects are merged. The maximum and minimum similarity values at various merging levels permit the determination of the number of clusters in the data set. The composite symbolic objects representing different clusters give a description of the resulting classes and lead to knowledge acquisition. The algorithm is capable of discerning clusters in data sets made up of numeric as well as symbolic objects consisting of different types and combinations of qualitative and quantitative feature values. In particular, the algorithm is applied to fat-oil and microcomputer data.

Journal of the American Statistical Association | 2003

From the statistics of data to the statistics of knowledge: Symbolic data analysis

Lynne Billard; Edwin Diday

Increasingly, datasets are so large they must be summarized in some fashion so that the resulting summary dataset is of a more manageable size, while still retaining as much knowledge inherent to the entire dataset as possible. One consequence of this situation is that the data may no longer be formatted as single values such as is the case for classical data, but rather may be represented by lists, intervals, distributions, and the like. These summarized data are examples of symbolic data. This article looks at the concept of symbolic data in general, and then attempts to review the methods currently available to analyze such data. It quickly becomes clear that the range of methodologies available draws analogies with developments before 1900 that formed a foundation for the inferential statistics of the 1900s, methods largely limited to small (by comparison) datasets and classical data formats. The scarcity of available methodologies for symbolic data also becomes clear and so draws attention to an enormous need for the development of a vast catalog (so to speak) of new symbolic methodologies along with rigorous mathematical and statistical foundational work for these methods.
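As a rough illustration of how classical records can be summarized into interval-valued symbolic data, the sketch below groups records by a key and keeps the observed range of one variable per group. The function name, the record layout, and the sample values are all assumptions made for this example; the article itself covers many more symbolic formats (lists, distributions, and so on).

```python
from collections import defaultdict


def summarize_to_intervals(records, key, variable):
    """Collapse classical records into one (min, max) interval per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[variable])
    # Each group is now described by an interval instead of single values.
    return {group: (min(values), max(values)) for group, values in groups.items()}


# Made-up records, used only to show the call shape.
records = [
    {"group": "A", "value": 4.0},
    {"group": "A", "value": 7.5},
    {"group": "B", "value": 1.0},
]
intervals = summarize_to_intervals(records, key="group", variable="value")
```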

Archive | 1994

New Approaches in Classification and Data Analysis

Edwin Diday; Yves Lechevallier; Martin Schader; Patrice Bertrand; Bernard Burtschy

Classification and Clustering: Problems for the Future.- From classifications to cognitive categorization: the example of the road lexicon.- A review of graphical methods in Japan-from histogram to dynamic display.- New Data and New Tools: A Hypermedia Environment for Navigating Statistical Knowledge in Data Science.- On the logical necessity and priority of a monothetic conception of class, and on the consequent inadequacy of polythetic accounts of category and categorization.- Research and Applications of Quantification Methods in East Asian Countries.- Algorithms for a geometrical P.C.A. with the L1-norm.- Comparison of hierarchical classifications.- On quadripolar Robinson dissimilarity matrices.- An Ordered Set Approach to Neutral Consensus Functions.- From Apresjan Hierarchies and Bandelt-Dress Weak hierarchies to Quasi-hierarchies.- Spanning trees and average linkage clustering.- Adjustments of tree metrics based on minimum spanning trees.- The complexity of the median procedure for binary trees.- A multivariate analysis of a series of variety trials with special reference to classification of varieties.- Quality control of mixture. Application: The grass.- Mixture Analysis with Noisy Data.- Locally optimal tests on spatial clustering.- Choosing the Number of Clusters, Subset Selection of Variables, and Outlier Detection in the Standard Mixture-Model Cluster Analysis.- An examination of procedures for determining the number of clusters in a data set.- The gap test: an optimal method for determining the number of natural classes in cluster analysis.- Mode detection and valley seeking by binary morphological analysis of connectivity for pattern classification.- Interactive Class Classification Using Types.- K-means clustering in a low-dimensional Euclidean space.- Complexity relaxation of dynamic programming for cluster analysis.- Partitioning Problems in Cluster Analysis: A Review of Mathematical Programming Approaches.- Clusters and factors: neural algorithms for a novel representation of huge and highly multidimensional data sets.- Graphs and structural similarities.- A generalisation of the diameter criterion for clustering.- Percolation and multimodal data structuring.- Classification and Discrimination Techniques Applied to the Early Detection of Business Failure.- Recursive Partition and Symbolic Data Analysis.- Interpretation Tools For Generalized Discriminant Analysis.- Inference about rejected cases in discriminant analysis.- Structure Learning of Bayesian Networks by Genetic Algorithms.- On the representation of observational data used for classification and identification of natural objects.- Alternative strategies and CATANOVA testing in two-stage binary segmentation.- Alignment, Comparison and Consensus of Molecular Sequences.- An Empirical Evaluation of Consensus Rules for Molecular Sequences.- A Probabilistic Approach To Identifying Consensus In Molecular Sequences.- Applications of Distance Geometry to Molecular Conformation.- Classification of aligned biological sequences.- Use of Pyramids in Symbolic Data Analysis.- Proximity Coefficients between Boolean symbolic objects.- Conceptual Clustering in Structured Domains: A Theory Guided Approach.- Automatic Aid to Symbolic Cluster Interpretation.- Symbolic Clustering Algorithms using Similarity and Dissimilarity Measures.- Feature Selection for Symbolic Data Classification.- Towards extraction method of knowledge founded by symbolic objects.- One Method of Classification based on an Analysis of the Structural Relationship between Independent Variables.- The Integration of Neural Networks with Symbolic Knowledge Processing.- Ordering of Fuzzy k-Partitions.- On the Extension of Probability Theory and Statistics to the Handling of Fuzzy Data.- Fuzzy Regression.- Clustering and Aggregation of Fuzzy Preference Data: Agreement vs. Information.- Rough Classification with Valued Closeness Relation.- Representing proximities by network models.- An Eigenvector Algorithm to Fit lp-Distance Matrices.- A non linear approach to Non Symmetrical Data Analysis.- An Algorithmic Approach to Bilinear Models for Two-Way Contingency Tables.- New Approaches Based on Rankings in Sensory Evaluation.- Estimating failure times distributions from censored systems arranged in series.- Calibration Used as a Nonresponse Adjustment.- Least Squares Smoothers and Additive Decomposition.- High Dimensional Representations and Information Retrieval.- Experiments of Textual Data Analysis at Electricite de France.- Conception of a Data Supervisor in the Prospect of Piloting Management Quality of Service and Marketing.- Discriminant Analysis Using Textual Data.- Recent Developments in Case Based Reasoning: Improvements of Similarity Measures.- Contiguity in discriminant factorial analysis for image clustering.- Exploratory and Confirmatory Discrete Multivariate Analysis in a Probabilistic Approach for Studying the Regional Distribution of Aids in Angola.- Factor Analysis of Medical Image Sequences (FAMIS): Fundamental principles and applications.- Multifractal Segmentation of Medical Images.- The Human Organism-a Place to Thrive for the Immuno-Deficiency Virus.- Comparability and usefulness of newer and classical data analysis techniques. Application in medical domain classification.- The Classification of IRAS Point Sources.- Astronomical classification of the Hipparcos input catalogue.- Group identification and individual assignation of stars from kinematical and luminosity parameters.- Specific numerical and symbolic analysis of chronological series in view to classification of long period variable stars.- Author and Subject Index.

Archive | 2000

Regression Analysis for Interval-Valued Data

Lynne Billard; Edwin Diday

When observations in large data sets are aggregated into smaller more manageable data sizes, the resulting classifications of observations invariably involve symbolic data. In this paper, covariance and correlation functions are introduced for interval-valued symbolic data. These and their associated terms are then used to fit linear regression models to such data. The methods are illustrated with an example from cardiology.

Archive | 2002

Symbolic Regression Analysis

Lynne Billard; Edwin Diday

Billard and Diday (2000) developed procedures for fitting a regression equation to symbolic interval-valued data. The present paper compares that approach with several possible alternative models using classical techniques; the symbolic regression approach is preferred. Thence, a regression approach is provided for symbolic histogram-valued data. The results are illustrated with a medical data set.

Discrete Applied Mathematics | 2003

Maximal and stochastic Galois lattices

Edwin Diday; Richard Emilion

We present a general formula for the intent-extent mappings of a Galois lattice generated by individual descriptions which lie in any arbitrary lattice. The formulation is unique if a natural maximality condition is required. This formulation yields, as particular cases, formal concept binary Galois lattices of Wille, those defined by Brito or Blyth-Janowitz, as well as fuzzy or stochastic Galois lattices. For the case of random descriptors we show that the nodes of Galois lattices defined by distributions are limits of empirical Galois lattice nodes. Choquet capacities, t-norms and t-conorms appear as natural valuations of these lattices.

Statistical Analysis and Data Mining | 2011

Principal component analysis for interval-valued observations

A. Douzal-Chouakria; Lynne Billard; Edwin Diday

One feature of contemporary datasets is that instead of the single point value in the p-dimensional space ℜ^p seen in classical data, the data may take interval values, thus producing hypercubes in ℜ^p. This paper studies the vertices principal components methodology for interval-valued data and provides enhancements to allow for so-called ‘trivial’ intervals and generalized weight functions. It also introduces the concept of vertex contributions to the underlying principal components, a concept not possible for classical data, but one which provides a visualization method that further aids in the interpretation of the methodology. The method is illustrated in a dataset using measurements of facial characteristics obtained from a study of face recognition patterns for surveillance purposes. A comparison with analyses in which classical surrogates replace the intervals shows how the symbolic analysis gives more informative conclusions. A second example illustrates how the method can be applied even when the number of parameters exceeds the number of observations, as well as how uncertainty data can be accommodated.
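The hypercube view of an interval-valued observation can be sketched as follows. This only shows the naive expansion of one observation into its 2^p vertices; it is not the paper's procedure, and the function name `hypercube_vertices` is hypothetical. In a vertices-style analysis, the vertices of all observations would then be stacked into one matrix and passed to a classical PCA, a step omitted here.

```python
from itertools import product


def hypercube_vertices(intervals):
    """Return the 2**p corner points of a p-dimensional interval observation.

    `intervals` is a sequence of (low, high) pairs, one per variable.
    """
    # product picks either the low or the high end of every interval.
    return [tuple(vertex) for vertex in product(*intervals)]


# One made-up observation with p = 2 interval-valued variables.
vertices = hypercube_vertices([(0.0, 1.0), (2.0, 3.0)])
```

Note that a degenerate interval with `low == high` makes this naive expansion emit repeated vertices; the enhancements for ‘trivial’ intervals described in the abstract are not reproduced in this sketch.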

Computational Statistics & Data Analysis | 2006

I-Scal: Multidimensional scaling of interval dissimilarities

Patrick J. F. Groenen; Suzanne Winsberg; O. Rodríguez; Edwin Diday

Multidimensional scaling aims at reconstructing dissimilarities between pairs of objects by distances in a low-dimensional space. However, in some cases the dissimilarity itself is unknown, but the range of the dissimilarity is given. Such fuzzy data give rise to a data matrix in which each dissimilarity is an interval of values. These interval dissimilarities are modelled by the ranges of the distances defined as the minimum and maximum distance between two rectangles representing the objects. Previously, two approaches for such data have been proposed and one of them is investigated. A new algorithm called I-Scal is developed. Because I-Scal is based on iterative majorization, it has the advantage that each iteration is guaranteed to improve the solution until no improvement is possible. In addition, a rational start configuration is proposed that is helpful in locating a good-quality local minimum. In a simulation study, the quality of this algorithm is investigated and I-Scal is compared with one previously proposed algorithm. Finally, I-Scal is applied to an empirical example of dissimilarity intervals of sounds.
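The minimum and maximum distances between two rectangles follow from elementary geometry and can be sketched as below. The parameterization is an assumption made for illustration: each axis-aligned rectangle is given by a center and a per-dimension half-width (called the spread here), and Euclidean distance is used. The function name is hypothetical, and this covers only the distance range, not the I-Scal fitting algorithm.

```python
import math


def rectangle_distance_range(center_i, spread_i, center_j, spread_j):
    """Return (min, max) Euclidean distance between two axis-aligned rectangles.

    Each rectangle spans center - spread to center + spread in every dimension.
    """
    d_min_sq = 0.0
    d_max_sq = 0.0
    for xi, ri, xj, rj in zip(center_i, spread_i, center_j, spread_j):
        gap = abs(xi - xj)   # distance between centers on this axis
        reach = ri + rj      # combined half-widths on this axis
        # Closest points: the gap shrinks by the reach, but never below zero.
        d_min_sq += max(0.0, gap - reach) ** 2
        # Farthest points: opposite corners, so the gap grows by the reach.
        d_max_sq += (gap + reach) ** 2
    return math.sqrt(d_min_sq), math.sqrt(d_max_sq)


# Two made-up unit-spread rectangles whose centers are 4 apart on the first axis.
lower, upper = rectangle_distance_range((0.0, 0.0), (1.0, 1.0), (4.0, 0.0), (1.0, 1.0))
```

When both spreads are zero the rectangles collapse to points and the two values coincide with the ordinary distance.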

Archive | 1989

Symbolic Cluster Analysis

Edwin Diday; M. Paula Brito

The aim of this paper is to introduce the symbolic approach in data analysis and to show that it extends data analysis to more complex data which may be closer to the multidimensional reality. We introduce several kinds of symbolic objects (“events”, “assertions”, and also “hordes” and “synthesis” objects) which are defined by a logical conjunction of properties concerning the variables. They can take, for instance, several values on the same variable, and they are adapted to the case of missing and nonsense values. Background knowledge may be represented by hierarchical or pyramidal taxonomies. In clustering, the problem remains to find inter-class structures such as partitions, hierarchies and pyramids on symbolic objects. Symbolic data analysis is conducted on several principles: accuracy of the representation, coherence between the kind of objects used at input and output, knowledge predominance for driving the algorithms, self-explanation of the results. We define order, union and intersection between symbolic objects and we conclude that they are organised according to an inheritance lattice. We study several properties and qualities of symbolic objects, of classes and of classifications of symbolic objects. Modal symbolic objects are then introduced. Finally, we present an algorithm to represent the clusters of a partition by modal assertions and obtain a locally optimal partition according to a given criterion.
