Pedro Contreras
Royal Holloway, University of London
Publications
Featured research published by Pedro Contreras.
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery | 2012
Fionn Murtagh; Pedro Contreras
We survey agglomerative hierarchical clustering algorithms and discuss efficient implementations that are available in R and other software environments. We look at hierarchical self-organizing maps and mixture models. We review grid-based clustering, focusing on hierarchical density-based approaches. Finally, we describe a recently developed, very efficient (linear time) hierarchical clustering algorithm, which can also be viewed as a hierarchical grid-based algorithm.
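The agglomerative merge loop that these surveyed algorithms share can be made concrete with a deliberately naive, pure-Python single-linkage sketch. This is a textbook baseline for exposition only (the function name and the 1-D toy data are ours), not any of the surveyed R implementations:

```python
# Naive agglomerative hierarchical clustering (single linkage) on
# 1-D points: repeatedly merge the two closest clusters. This is the
# cubic-time baseline that efficient implementations improve on.

def single_linkage(points):
    """Return the sequence of merges (a flat dendrogram record)."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between closest members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((tuple(clusters[i]), tuple(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

# Two well-separated groups; the final merge joins them at distance 4.8.
merges = single_linkage([0.0, 0.1, 0.2, 5.0, 5.1])
```

Each of the n-1 merges scans all cluster pairs, which is what makes the classical algorithm quadratic or worse and motivates the linear-time alternatives below.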
SIAM Journal on Scientific Computing | 2008
Fionn Murtagh; Geoff Downs; Pedro Contreras
Coding of data, usually upstream of data analysis, has crucial implications for the data analysis results. By modifying the data coding (through use of less than full precision in data values) we can aid appreciably the effectiveness and efficiency of the hierarchical clustering. In our first application, this is used to lessen the quantity of data to be hierarchically clustered. The approach is a hybrid one, based on hashing and on the Ward minimum variance agglomerative criterion. In our second application, we derive a hierarchical clustering from relationships between sets of observations, rather than the traditional use of relationships between the observations themselves. This second application uses embedding in a Baire space, or longest common prefix ultrametric space. We compare this second approach, which is of O(n log n) complexity, to k-means.
Journal of Classification | 2012
Pedro Contreras; Fionn Murtagh
Artificial Intelligence Review | 2003
Fionn Murtagh; Tugba Taskaya; Pedro Contreras; Josiane Mothe; Kurt Englmeier
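The longest common prefix (Baire) distance that recurs throughout these abstracts can be sketched in a few lines. A minimal illustration, assuming strings of decimal digits and base 2 for the distance (both choices are ours for exposition, not necessarily the papers'):

```python
# Baire (longest common prefix) distance: d(x, y) = base**(-k), where
# k is the length of the shared prefix of the two strings. The base
# and function names are illustrative assumptions.

def baire_distance(x: str, y: str, base: float = 2.0) -> float:
    if x == y:
        return 0.0          # convention: identical strings coincide
    k = 0
    for a, b in zip(x, y):
        if a != b:
            break
        k += 1
    return base ** -k       # longer shared prefixes -> smaller distances

d_near = baire_distance("0.345", "0.346")  # shared prefix "0.34", k = 4
d_far = baire_distance("0.345", "0.999")   # shared prefix "0.",   k = 2
```

Because the distance depends only on prefix agreement, values sharing a prefix fall into the same ball at that level, which is what makes single-pass hierarchy construction possible.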
P-adic Numbers, Ultrametric Analysis, and Applications | 2012
Fionn Murtagh; Pedro Contreras
The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic-time agglomerative hierarchical clustering algorithm. In this work we evaluate empirically this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorithm properties; (ii) generalized ultrametrics, in terms of definition; and (iii) fast clustering through k-means partitioning, in terms of quality of results. For the latter, we carry out an in-depth astronomical study. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e., we seek to regress the spectrometric on the photometric redshifts, and we use clusterwise regression for this.
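The clusterwise regression idea can be pictured as: bucket objects Baire-style by a shared prefix of one variable, then fit a separate least-squares line in each bucket. A toy sketch with invented data (the prefix length and the values are our assumptions, not SDSS figures):

```python
# Clusterwise regression sketch: group (photometric, spectrometric)
# pairs by the first decimal digit of the photometric value (a
# Baire-style prefix bucket), then fit ordinary least squares per group.
from collections import defaultdict

def fit_line(pairs):
    """OLS slope and intercept for a list of (x, y) pairs."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return slope, (sy - slope * sx) / n

# Hypothetical (photometric, spectrometric) redshift pairs.
data = [(0.11, 0.12), (0.12, 0.13), (0.13, 0.14),
        (0.31, 0.30), (0.32, 0.31), (0.33, 0.32)]

clusters = defaultdict(list)
for photo, spec in data:
    clusters[f"{photo:.2f}"[:3]].append((photo, spec))  # keys "0.1", "0.3"

models = {key: fit_line(pairs) for key, pairs in clusters.items()}
```

Each bucket gets its own regression line, so the mapping from photometric to spectrometric redshift can differ across regions of the data, which is the point of the clusterwise approach.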
International Symposium on Statistical Learning and Data Sciences | 2015
Fionn Murtagh; Pedro Contreras
Following a short survey of input data types on which to construct interactive visual user interfaces, we report on a new and recent implementation taking concept hierarchies as input data. The visual user interfaces express domain ontologies which are based on these concept hierarchies. We detail a web-based implementation, and show examples of usage. An appendix surveys related systems, many of them commercial.
Archive | 2010
Pedro Contreras; Fionn Murtagh
We describe many vantage points on the Baire metric and its use in clustering data, or its use in preprocessing and structuring data in order to support search and retrieval operations. In some cases, we proceed directly to clusters and do not directly determine the distances. We show how a hierarchical clustering can be read directly from one pass through the data. We also offer insights on the practical implications of the precision of data measurement. As a mechanism for treating multidimensional data, including very high dimensional data, we use random projections.
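Reading a hierarchical clustering in one pass, as described here, amounts to inserting each value into a digit trie: every extra digit of precision adds one level of the multiway tree. A minimal sketch (the precision, names, and toy values are illustrative choices of ours):

```python
# One-pass Baire hierarchy for values in [0, 1): each value's decimal
# digits index a path in a dict-of-dicts trie, so the whole multiway
# tree is built in a single scan of the data.

def baire_hierarchy(values, precision=3):
    root = {}
    for v in values:                # a single pass through the data
        digits = f"{v:.{precision}f}".split(".")[1]
        node = root
        for d in digits:            # descend one level per digit
            node = node.setdefault(d, {})
    return root

# 0.123 and 0.124 share the path "1" -> "2" and split at the third digit;
# 0.456 branches off at the root.
tree = baire_hierarchy([0.123, 0.124, 0.456])
```

No pairwise distances are ever computed, which is the sense in which the clusters are obtained "directly" and in linear time.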
arXiv: Machine Learning | 2012
Fionn Murtagh; Pedro Contreras
For high-dimensional clustering and proximity finding, also referred to as high dimension and low sample size data, we use random projection with the following principle. Since close-to-orthogonal projections are far more probable than exactly orthogonal ones in high dimensions, we can exploit the rank-order sensitivity of the projected values. Our Baire-metric, divisive hierarchical clustering runs in linear time.
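The random projection step can be sketched as: project each high-dimensional row onto a single random unit vector and keep the scalar, which tends to preserve the rank order of nearby rows. A pure-Python illustration (the dimension, seed, and data are invented):

```python
# Project rows of a high-dimensional dataset onto one random unit
# vector. Rows that are close in the original space stay close in the
# projected scalar values with high probability.
import math
import random

def random_projection(rows, seed=0):
    random.seed(seed)
    dim = len(rows[0])
    r = [random.gauss(0, 1) for _ in range(dim)]     # random direction
    norm = math.sqrt(sum(c * c for c in r))
    r = [c / norm for c in r]                        # unit vector
    return [sum(a * b for a, b in zip(row, r)) for row in rows]

# Two nearby 50-dimensional rows and one distant row.
rows = [[1.0] * 50, [1.01] * 50, [5.0] * 50]
proj = random_projection(rows)
```

The projected scalars can then be fed to a Baire-style prefix clustering, keeping the whole pipeline linear in the number of objects.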
Archives of Data Science Series A (Online First) | 2017
Fionn Murtagh; Pedro Contreras
The Baire or longest common prefix ultrametric allows a hierarchy, a multiway tree, or ultrametric topology embedding, to be constructed very efficiently. The Baire distance is a 1-bounded ultrametric. For high dimensional data, one approach for the use of the Baire distance is to base the hierarchy construction on random projections. In this paper we use the Baire distance on the Sloan Digital Sky Survey (SDSS, http://www.sdss.org) archive. We are addressing the regression of (high quality, more costly to collect) spectroscopic and (lower quality, more readily available) photometric redshifts. Nonlinear regression is used for mapping photometric to spectroscopic redshifts.
Entropy | 2009
Fionn Murtagh; Pedro Contreras; Jean-Luc Starck
Data analysis and data mining are concerned with unsupervised pattern finding and structure determination in data sets. “Structure” can be understood as symmetry, and a range of symmetries are expressed by hierarchy. Such symmetries directly point to invariants that pinpoint intrinsic properties of the data and of the background empirical domain of interest. We review many aspects of hierarchy here, including ultrametric topology, generalized ultrametrics, linkages with lattices and other discrete algebraic structures, and p-adic number representations. By focusing on symmetries in data we have a powerful means of structuring and analyzing massive, high dimensional data stores. We illustrate the power of hierarchical clustering in case studies in chemistry and finance, and we provide pointers to other published case studies.
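The ultrametric topology reviewed here is characterized by the strong triangle inequality, d(x, z) ≤ max(d(x, y), d(y, z)), which is exactly what makes a distance hierarchical. A brute-force check on longest-common-prefix distances over an invented word set (names and data are ours):

```python
# Verify the strong triangle (ultrametric) inequality for the
# longest-common-prefix distance over all ordered triples of a small
# string set. An illustrative check, not a proof.
from itertools import permutations

def lcp_dist(x: str, y: str) -> float:
    if x == y:
        return 0.0
    k = 0
    for a, b in zip(x, y):
        if a != b:
            break
        k += 1
    return 2.0 ** -k

words = ["0.314", "0.315", "0.341", "0.500"]
ok = all(lcp_dist(x, z) <= max(lcp_dist(x, y), lcp_dist(y, z))
         for x, y, z in permutations(words, 3))
```

Every triangle under this distance is isosceles with the two longest sides equal, which is the geometric fingerprint of a hierarchy.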