Michael E. Houle
National Institute of Informatics
Publication
Featured research published by Michael E. Houle.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 1988
Michael E. Houle; Godfried T. Toussaint
For a set of points P in three-dimensional space, the width of P, W(P), is defined as the minimum distance between parallel planes of support of P. It is shown that W(P) can be computed in O(n log n + I) time and O(n) space, where I is the number of antipodal pairs of edges of the convex hull of P, and n is the number of vertices; in the worst case, I = O(n^2). For a convex polyhedron the time complexity becomes O(n + I). If P is a set of points in the plane, the complexity can be reduced to O(n log n). For simple polygons, linear time suffices.
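The planar case mentioned in this abstract can be illustrated with a short sketch. The code below is not taken from the paper: it assumes the standard approach of building the convex hull (Andrew's monotone chain, O(n log n)) and then sweeping an antipodal pointer around the hull (rotating calipers, linear in the hull size). The function names `cross`, `convex_hull`, and `planar_width` are hypothetical, and points are assumed to be (x, y) tuples.

```python
from math import hypot

def cross(o, a, b):
    """Signed area term (a - o) x (b - o); positive when o, a, b turn left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts

    def half(seq):
        chain = []
        for p in seq:
            # Drop vertices that would make a right turn or lie on a straight line.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half(pts)
    upper = half(reversed(pts))
    return lower[:-1] + upper[:-1]

def planar_width(points):
    """Minimum distance between two parallel supporting lines of the point set."""
    hull = convex_hull(points)
    h = len(hull)
    if h < 3:
        return 0.0  # degenerate input: all points coincide or are collinear
    best = float("inf")
    j = 1  # index of the hull vertex farthest from the current edge
    for i in range(h):
        a, b = hull[i], hull[(i + 1) % h]
        # Advance the antipodal pointer while the distance to edge (a, b) grows.
        while cross(a, b, hull[(j + 1) % h]) > cross(a, b, hull[j]):
            j = (j + 1) % h
        best = min(best, cross(a, b, hull[j]) / hypot(b[0] - a[0], b[1] - a[1]))
    return best
```

For example, `planar_width([(0, 0), (4, 0), (0, 3)])` evaluates the three altitudes of a 3-4-5 right triangle and returns the smallest one, the altitude onto the hypotenuse.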
international conference on data engineering | 2005
Michael E. Houle; Jun Sakuma
This paper introduces a practical index for approximate similarity queries of large multi-dimensional data sets: the spatial approximation sample hierarchy (SASH). A SASH is a multi-level structure of random samples, recursively constructed by building a SASH on a large randomly selected sample of data objects, and then connecting each remaining object to several of its approximate nearest neighbors from within the sample. Queries are processed by first locating approximate neighbors within the sample, and then using the pre-established connections to discover neighbors within the remainder of the data set. The SASH index relies on a pairwise distance measure, but otherwise makes no assumptions regarding the representation of the data. Experimental results are provided for query-by-example operations on protein sequence, image, and text data sets, including one consisting of more than 1 million vectors spanning more than 1.1 million terms - far in excess of what spatial search indices can handle efficiently. For sets of this size, the SASH can return a large proportion of the true neighbors roughly 2 orders of magnitude faster than sequential search.
Data Mining and Knowledge Discovery | 2016
Guilherme Oliveira Campos; Arthur Zimek; Jörg Sander; Ricardo J. G. B. Campello; Barbora Micenková; Erich Schubert; Ira Assent; Michael E. Houle
The evaluation of unsupervised outlier detection algorithms is a constant challenge in data mining research. Little is known regarding the strengths and weaknesses of different standard outlier detection models, and the impact of parameter choices for these algorithms. The scarcity of appropriate benchmark datasets with ground truth annotation is a significant impediment to the evaluation of outlier methods. Even when labeled datasets are available, their suitability for the outlier detection task is typically unknown. Furthermore, the biases of commonly-used evaluation measures are not fully understood. It is thus difficult to ascertain the extent to which newly-proposed outlier detection methods improve over established methods. In this paper, we perform an extensive experimental study on the performance of a representative set of standard k nearest neighborhood-based methods for unsupervised outlier detection, across a wide variety of datasets prepared for this purpose. Based on the overall performance of the outlier detection methods, we provide a characterization of the datasets themselves, and discuss their suitability as outlier detection benchmark sets. We also examine the most commonly-used measures for comparing the performance of different methods, and suggest adaptations that are more suitable for the evaluation of outlier detection results.
Journal of Graph Algorithms and Applications | 1998
Prosenjit Bose; Hazel Everett; Sándor P. Fekete; Michael E. Houle; Anna Lubiw; Henk Meijer; Kathleen Romanik; Günter Rote; Thomas C. Shermer; Sue Whitesides; Christian Zelle
This paper proposes a 3-dimensional visibility representation of graphs G = (V, E) in which vertices are mapped to rectangles floating in R^3 parallel to the x,y-plane, with edges represented by vertical lines of sight. We apply an extension of the Erdős-Szekeres Theorem in a geometric setting to obtain an upper bound of n = 56 for the largest representable complete graph K_n. On the other hand, we show by construction that n ≥ 22. These are the best existing bounds. We also note that planar graphs and complete bipartite graphs K_{m,n} are representable, but that the family of representable graphs is not closed under graph minors.
symposium on applications and the internet | 2003
Yasuhiko Morimoto; Masaki Aono; Michael E. Houle; Kevin S. McCurley
The content of the World-Wide Web is pervaded by information of a geographical or spatial nature, particularly location information such as addresses, postal codes, and telephone numbers. We present a system for extracting spatial knowledge from collections of Web pages gathered by Web-crawling programs. For each page determined to contain location information, we apply geocoding techniques to compute geographic coordinates, such as latitude-longitude pairs. Next, we augment the location information with keyword descriptors extracted from Web page contents. We then apply spatial data mining techniques on the augmented location information to derive spatial knowledge.
international conference on data mining | 2010
Timothy de Vries; Sanjay Chawla; Michael E. Houle
Time, cost and energy efficiency are critical factors for many data analysis techniques when the size and dimensionality of the data are very large. We investigate the use of Local Outlier Factor (LOF) for data of this type, providing a motivating example from real-world data. We propose Projection-Indexed Nearest-Neighbours (PINN), a novel technique that exploits extended nearest neighbour sets in a reduced-dimensional space to create an accurate approximation for k-nearest-neighbour distances, which is used as the core density measurement within LOF. The reduced dimensionality allows for efficient sub-quadratic indexing in the number of items in the data set, where previously only quadratic performance was possible. A detailed theoretical analysis of Random Projection (RP) and PINN shows that we are able to preserve the density of the intrinsic manifold of the data set after projection. Experimental results show that PINN outperforms the standard projection methods RP and PCA when measuring LOF for many high-dimensional real-world data sets of up to 300000 elements and 102600 dimensions.
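The idea of using an extended neighbour set in a projected space can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: a Gaussian random projection stands in for RP, a brute-force scan stands in for the index over the projected data, and the names `random_projection`, `sq_dist`, `approx_knn`, and the expansion factor `c` are hypothetical. The LOF computation itself is not shown.

```python
import math
import random

def random_projection(data, target_dim, seed=0):
    """Project each vector onto target_dim random Gaussian directions."""
    rng = random.Random(seed)
    d = len(data[0])
    scale = 1.0 / math.sqrt(target_dim)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)]
              for _ in range(target_dim)]
    return [[sum(row[j] * x[j] for j in range(d)) for row in matrix]
            for x in data]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def approx_knn(data, projected, q, k, c=3):
    """Approximate the k nearest neighbours of item q.

    Step 1: take the c*k nearest items in the projected space
            (assumption: a real system would query an index here).
    Step 2: re-rank those candidates by distance in the original space.
    """
    others = [i for i in range(len(data)) if i != q]
    candidates = sorted(others,
                        key=lambda i: sq_dist(projected[q], projected[i]))[:c * k]
    return sorted(candidates, key=lambda i: sq_dist(data[q], data[i]))[:k]
```

The distance from `q` to the last item returned by `approx_knn` then serves as an approximate k-nearest-neighbour distance. When `c * k` covers every other item, the candidate set is the whole data set and the result coincides with the exact neighbours.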
symposium on computational geometry | 1985
Michael E. Houle; Godfried T. Toussaint
Given a set of points P = {p_1, p_2, …, p_n} in three dimensions, the width of P, W(P), is defined as the minimum distance between parallel planes of support of P. It is shown that W(P) can be computed in O(n log n + I) time and O(n) space, where I is the number of antipodal pairs of edges of the convex hull of P, and in the worst case I = O(n^2). If P is a set of points in the plane, this complexity can be reduced to O(n log n). Finally, for simple polygons linear time suffices.
graph drawing | 2001
Carsten Friedrich; Michael E. Houle
Enabling the user of a graph drawing system to preserve the mental map between two different layouts of a graph is a major problem. Whenever a layout in a graph drawing system is modified, the mental map of the user must be preserved. One way in which the user can be helped in understanding a change of layout is through animation of the change. In this paper, we present clustering-based strategies for identifying groups of nodes sharing a common, simple motion from initial layout to final layout. Transformation of these groups is then handled separately in order to generate a smooth animation.
Algorithmica | 2001
Vladimir Estivill-Castro; Michael E. Houle
In this paper we present a method for clustering geo-referenced data suitable for applications in spatial data mining, based on the medoid method. The medoid method is related to k-MEANS, with the restriction that cluster representatives be chosen from among the data elements. Although the medoid method in general produces clusters of high quality, especially in the presence of noise, it is often criticized for the Ω(n^2) time that it requires. Our method incorporates both proximity and density information to achieve high-quality clusters in subquadratic time; it does not require that the user specify the number of clusters in advance. The time bound is achieved by means of a fast approximation to the medoid objective function, using Delaunay triangulations to store proximity information.
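To make the medoid objective and its quadratic cost concrete, here is a minimal brute-force sketch. It is a baseline illustration only and does not implement the Delaunay-based approximation described in the abstract; the names `medoid_cost` and `one_medoid` are hypothetical, and Euclidean distance is assumed.

```python
from math import dist

def medoid_cost(points, medoids):
    """Medoid objective: total distance from each point to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)

def one_medoid(points):
    """Exhaustive search for a single medoid.

    Every data element is tried as the representative, and each trial
    scans all points, so the search performs a quadratic number of
    distance evaluations.
    """
    return min(points, key=lambda candidate: medoid_cost(points, [candidate]))
```

With the points `[(0, 0), (1, 0), (2, 0), (3, 0), (100, 0)]`, `one_medoid` selects `(2, 0)`, a data element that the distant point at x = 100 does not pull away, whereas the mean of the x-coordinates would be 21.2.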
Information Sciences | 1993
Gen-Huey Chen; Michael E. Houle; Ming-Ter Kuo
A distributed algorithm is presented for constructing a nearly optimal Steiner tree in an asynchronous network represented by a weighted communication graph G = (V, E, c). The worst-case cost ratio of the obtained solution to any given minimum-cost Steiner tree Tmin is 2(1 − 1/l), where l is the number of leaves of Tmin. The message complexity of the algorithm is O(|E| + |V|∗(|V| + log|V|)) and the time complexity is O(|V|∗|V|), where S is the subset of nodes of G to be connected.