Eirini Spyropoulou
University of Bristol
Publications
Featured research published by Eirini Spyropoulou.
Data Mining and Knowledge Discovery | 2014
Eirini Spyropoulou; Tijl De Bie; Mario Boley
Mining patterns from multi-relational data is a problem attracting increasing interest within the data mining community. Traditional data mining approaches are typically developed for single-table databases and are not directly applicable to multi-relational data. Nevertheless, multi-relational data is a more truthful and therefore often also a more powerful representation of reality. Mining patterns of a suitably expressive syntax directly from this representation is thus a research problem of great importance. In this paper we introduce a novel approach to mining patterns in multi-relational data. We propose a new syntax for multi-relational patterns as complete connected subsets of database entities. We show how this pattern syntax is generally applicable to multi-relational data, while it reduces to well-known tiles (Geerts et al., Proceedings of Discovery Science, pp 278–289, 2004) when the data is a simple binary or attribute-value table. We propose RMiner, a simple yet practically efficient divide-and-conquer algorithm to mine such patterns, which is an instantiation of an algorithmic framework for efficiently enumerating all fixed points of a suitable closure operator (Boley et al., Theor Comput Sci 411(3):691–700, 2010). We show how the interestingness of patterns of the proposed syntax can conveniently be quantified using a general framework for quantifying the subjective interestingness of patterns (De Bie, Data Min Knowl Discov 23(3):407–446, 2011b). Finally, we illustrate the usefulness and the general applicability of our approach by discussing results on real-world and synthetic databases.
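In the binary special case mentioned above, the patterns reduce to tiles, and the closure operator whose fixed points are enumerated can be illustrated by the classic Galois closure on a binary table: take a column set, find the rows that support it, then keep every column shared by those rows. The sketch below is a brute-force illustration on a hypothetical toy database. It is not RMiner's divide-and-conquer algorithm, and the function names are assumptions. It closes every column subset, which is exponential and only meant to show what a fixed point (closed tile) is.

```python
from itertools import combinations

# Hypothetical toy binary database: row i is the set of columns holding a 1.
DB = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b", "d"},
    {"c", "d"},
]


def rows_containing(cols, db):
    """Indices of rows that have a 1 in every column of `cols`."""
    return frozenset(i for i, row in enumerate(db) if cols <= row)


def cols_shared(rows, db, all_cols):
    """Columns that are 1 in every row of `rows` (all columns if `rows` is empty)."""
    shared = set(all_cols)
    for i in rows:
        shared &= db[i]
    return frozenset(shared)


def closure(cols, db, all_cols):
    """Galois closure: columns -> supporting rows -> columns common to those rows."""
    return cols_shared(rows_containing(cols, db), db, all_cols)


def closed_tiles(db):
    """Brute force: close every non-empty column subset, keep distinct non-empty tiles."""
    all_cols = frozenset().union(*db)
    tiles = set()
    for k in range(1, len(all_cols) + 1):
        for combo in combinations(sorted(all_cols), k):
            cols = closure(frozenset(combo), db, all_cols)
            rows = rows_containing(cols, db)
            if rows and cols:
                tiles.add((rows, cols))
    return tiles


for rows, cols in sorted(closed_tiles(DB), key=lambda t: (sorted(t[1]), sorted(t[0]))):
    print(sorted(rows), sorted(cols))
```

Each printed pair is a fixed point of the closure. For example, closing `{"a"}` yields `{"a", "b"}` on rows 0, 1 and 2, and closing that result again changes nothing.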
International Conference on Data Mining | 2011
Eirini Spyropoulou; Tijl De Bie
Mining patterns from multi-relational data is a problem attracting increasing interest within the data mining community. Traditional data mining approaches are typically developed for highly simplified types of data, such as an attribute-value table or a binary database, so those methods are not directly applicable to multi-relational data. Nevertheless, multi-relational data is a more truthful and therefore often also a more powerful representation of reality. Mining patterns of a suitably expressive syntax directly from this representation is thus a research problem of great importance. In this paper we introduce a novel approach to mining patterns in multi-relational data. We propose a new syntax for multi-relational patterns as complete connected subgraphs in a representation of the database as a k-partite graph. We show how this pattern syntax is generally applicable to multi-relational data, while it reduces to well-known tiles [7] when the data is a simple binary or attribute-value table. We propose RMiner, an efficient algorithm to mine such patterns, and we introduce a method for quantifying their interestingness when contrasted with prior information of the data miner. Finally, we illustrate the usefulness of our approach by discussing results on real-world and synthetic databases.
Knowledge Discovery and Data Mining | 2010
Tijl De Bie; Kleanthis-Nikolaos Kontonasios; Eirini Spyropoulou
This paper suggests a framework for mining subjectively interesting pattern sets that is based on two components: (1) the encoding of prior information in a model for the data miner's state of mind; (2) the search for a pattern set that is maximally informative while efficient to convey to the data miner. We illustrate the framework with an instantiation for tile patterns in binary databases where prior information on the row and column marginals is available. This approach implements step (1) above by constructing the MaxEnt model with respect to the prior information [2, 3], and step (2) by relying on concepts from information and coding theory. We provide a brief overview of a number of possible extensions and future research challenges, including a key challenge related to the design of empirical evaluations for subjective interestingness measures.
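Step (1) for row and column marginals can be sketched concretely. The sketch assumes the standard form of this MaxEnt model: the cells become independent Bernoulli variables with p_ij = sigmoid(lam_i + mu_j), where the parameters are chosen so that expected row and column sums equal the given marginals. The fitting routine (bisection inside alternating updates), the function names and the toy matrix are illustrative choices, not the paper's implementation. A tile's information content is taken here as the self-information of all its cells being 1. The framework also weighs this against the cost of conveying the pattern, which the sketch omits.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def solve_offset(target, others, lo=-50.0, hi=50.0, iters=60):
    """Bisection for t such that sum(sigmoid(t + o) for o in others) == target."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(sigmoid(mid + o) for o in others) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def fit_maxent(row_sums, col_sums, sweeps=300):
    """Fit p[i][j] = sigmoid(lam[i] + mu[j]) so expected marginals match the given ones.

    Uses alternating exact updates: solve every lam given mu, then every mu given lam.
    Assumes the marginals admit a solution with all p strictly between 0 and 1.
    """
    lam = [0.0] * len(row_sums)
    mu = [0.0] * len(col_sums)
    for _ in range(sweeps):
        lam = [solve_offset(r, mu) for r in row_sums]
        mu = [solve_offset(c, lam) for c in col_sums]
    return [[sigmoid(l + m) for m in mu] for l in lam]


def tile_information(p, rows, cols):
    """Self-information in bits of learning that every cell of rows x cols is 1."""
    return -sum(math.log2(p[i][j]) for i in rows for j in cols)


# Hypothetical 3 x 4 binary database; only its marginals serve as prior information.
D = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
]
p = fit_maxent([sum(r) for r in D], [sum(c) for c in zip(*D)])
print(round(tile_information(p, rows=[0, 1], cols=[0]), 3))
```

Cells in dense rows and columns get higher probabilities, so a tile placed there carries fewer bits. This is how the prior on marginals discounts patterns that the data miner would already expect.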
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery | 2012
Kleanthis-Nikolaos Kontonasios; Eirini Spyropoulou; Tijl De Bie
Knowledge discovery methods often discover a large number of patterns. Although this can be considered of interest, it certainly presents considerable challenges too. Indeed, this set of patterns often contains many uninteresting patterns that risk overwhelming the data miner. In addition, a single interesting pattern can be discovered in a multitude of tiny variations that for all practical purposes are redundant. These issues are referred to as the pattern explosion problem. They lie at the basis of much recent research attempting to quantify interestingness and redundancy between patterns, with the purpose of filtering down a large pattern set to an interesting and compact subset. Many diverse approaches to interestingness and corresponding interestingness measures (IMs) have been proposed in the literature. Some of them, named objective IMs, define interestingness only based on objective criteria of the pattern and data at hand. Subjective IMs additionally depend on the user's prior knowledge about the dataset. Formalizing unexpectedness is probably the most common approach for defining subjective IMs, where a pattern is deemed unexpected if it contradicts the user's expectations about the dataset. Such subjective IMs based on unexpectedness form the focus of this paper. We categorize measures based on unexpectedness into two major subgroups, namely, syntactical and probabilistic approaches. Based on this distinction, we survey different methods for assessing the unexpectedness of patterns with a special focus on frequent itemsets, tiles, association rules, and classification rules.
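A minimal instance of the probabilistic category assumes that the user expects items to occur independently with their observed frequencies. An itemset is then unexpected to the degree that its support departs from the product of its items' supports. The lift-style score below is a textbook example chosen for illustration, not a specific measure from the survey. The function names and the transaction data are hypothetical.

```python
import math

# Hypothetical transaction database.
TRANSACTIONS = [{"a", "b"}, {"a", "b"}, {"c"}, {"c"}]


def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def independence_surprise(itemset, transactions):
    """log2(observed support / support expected if items occurred independently).

    Positive: more frequent than the independence expectation; negative: less.
    """
    observed = support(itemset, transactions)
    if observed == 0:
        return float("-inf")
    expected = math.prod(support({item}, transactions) for item in itemset)
    return math.log2(observed / expected)


print(independence_surprise({"a", "b"}, TRANSACTIONS))  # 1.0
print(independence_surprise({"a", "c"}, TRANSACTIONS))  # -inf
```

Here `{"a", "b"}` occurs twice as often as independence predicts (support 0.5 versus 0.25), giving one bit of surprise. A user whose expectations already encode the co-occurrence would need a richer background model.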
Discovery Science | 2013
Eirini Spyropoulou; Tijl De Bie; Mario Boley
We present a novel method for mining local patterns from multi-relational data in which relationships can be of any arity. More specifically, we define a new pattern syntax for such data, develop an efficient algorithm for mining it, and define a suitable interestingness measure that is able to take into account prior information of the data miner. Our approach is a strict generalisation of prior work on multi-relational data in which relationships were restricted to be binary, as well as of prior work on local pattern mining from a single n-ary relationship. Remarkably, despite being more general, our algorithm is as fast as or faster than the state of the art in these less general problem settings.
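In the single n-ary relationship setting that this work generalises, a common local pattern is an n-set: one set per dimension whose full Cartesian product lies in the relation, which makes it the n-ary analogue of a tile. The sketch only checks and extends n-sets on a hypothetical ternary relation. It does not reproduce the paper's multi-relational pattern syntax or its algorithm, and the helper names are assumptions.

```python
from itertools import product

# Hypothetical ternary relation (user, item, day).
R = {
    ("u1", "i1", "mon"), ("u1", "i2", "mon"),
    ("u2", "i1", "mon"), ("u2", "i2", "mon"),
    ("u1", "i1", "tue"),
}


def is_nset(relation, sets):
    """True if every tuple of the Cartesian product of `sets` is in `relation`."""
    return all(t in relation for t in product(*sets))


def extensions(relation, sets, dim, candidates):
    """Candidates that can join sets[dim] while the pattern stays an n-set."""
    ok = []
    for c in candidates:
        trial = [set(s) for s in sets]
        trial[dim].add(c)
        if is_nset(relation, trial):
            ok.append(c)
    return ok


pattern = [{"u1", "u2"}, {"i1", "i2"}, {"mon"}]
print(is_nset(R, pattern))                 # True
print(extensions(R, pattern, 2, ["tue"]))  # [] because ("u1", "i2", "tue") is missing
```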
IEEE International Conference on Data Science and Advanced Analytics | 2015
Jefrey Lijffijt; Eirini Spyropoulou; Bo Kang; Tijl De Bie
Methods for local pattern mining are fragmented along two dimensions: the pattern syntax, and the data types on which they are applicable. Pattern syntaxes include subgroups, n-sets, itemsets, and many more; common data types include binary, categorical, and real-valued. Recent research on relational pattern mining has shown how the aforementioned pattern syntaxes can be unified in a single framework. However, a unified model to deal with various data types is lacking, certainly for more complexly structured types such as real numbers, time of day (which is circular), geographical location, terms from a taxonomy, etc. We introduce P-N-RMiner, a generic tool for mining interesting local patterns in (relational) data with structured attributes. We show how to handle the attribute structures in a generic manner, by modelling them as partial orders. We also derive an information-theoretic subjective interestingness measure for such patterns and present an algorithm to efficiently enumerate the patterns. We find that (1) P-N-RMiner finds patterns that are substantially more informative, (2) the new interestingness measure cannot be approximated using existing methods, and (3) we can leverage the partial orders to speed up enumeration.
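As a hypothetical illustration of why a circular attribute fits a partial-order view, the sketch below represents time-of-day conditions as clockwise hour intervals that may wrap past midnight and orders them by inclusion. Some intervals are incomparable, so the order is partial rather than total. The interval encoding and the function names are assumptions, not P-N-RMiner's actual representation.

```python
def hour_interval(start, end):
    """Hours covered going clockwise from `start` to `end` inclusive, wrapping at midnight."""
    if not (0 <= start < 24 and 0 <= end < 24):
        raise ValueError("hours must be in 0..23")
    hours = {start}
    h = start
    while h != end:
        h = (h + 1) % 24
        hours.add(h)
    return frozenset(hours)


def at_least_as_specific(a, b):
    """Partial order on interval conditions: a <= b iff a covers a subset of b's hours."""
    return a <= b


night = hour_interval(22, 2)       # {22, 23, 0, 1, 2}
late_night = hour_interval(23, 1)  # {23, 0, 1}
evening = hour_interval(18, 23)    # {18, ..., 23}
print(at_least_as_specific(late_night, night))  # True
print(at_least_as_specific(evening, night), at_least_as_specific(night, evening))  # False False
```

A linear "less than or equal" on raw hour numbers would wrongly separate 23:00 from 01:00. The inclusion order instead respects the wrap-around and still lets a miner move from general conditions to more specific ones.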
European Conference on Machine Learning | 2013
Tijl De Bie; Eirini Spyropoulou
Exploratory Data Mining (EDM), the contemporary heir of Exploratory Data Analysis (EDA) pioneered by Tukey in the seventies, is the task of facilitating the extraction of interesting nuggets of information from possibly large and complexly structured data. Major conceptual challenges in EDM research are the understanding of how one can formalise a nugget of information (given the diversity of types of data of interest), and how one can formalise how interesting such a nugget of information is to a particular user (given the diversity of types of users and intended purposes). In this Nectar paper we briefly survey a number of recent contributions made by us and collaborators towards a theoretically motivated and practically usable resolution of these challenges.
Knowledge Discovery and Data Mining | 2016
Matthijs van Leeuwen; Tijl De Bie; Eirini Spyropoulou; Cedric Mesnage
The utility of a dense subgraph in gaining a better understanding of a graph has been formalised in numerous ways, each striking a different balance between approximating actual interestingness and computational efficiency. A difficulty in making this trade-off is that, while computational cost of an algorithm is relatively well-defined, a pattern’s interestingness is fundamentally subjective. This means that this latter aspect is often treated only informally or neglected, and instead some form of density is used as a proxy. We resolve this difficulty by formalising what makes a dense subgraph pattern interesting to a given user. Unsurprisingly, the resulting measure is dependent on the prior beliefs of the user about the graph. For concreteness, in this paper we consider two cases: one case where the user only has a belief about the overall density of the graph, and another case where the user has prior beliefs about the degrees of the vertices. Furthermore, we illustrate how the resulting interestingness measure is different from previous proposals. We also propose effective exact and approximate algorithms for mining the most interesting dense subgraph according to the proposed measure. Usefully, the proposed interestingness measure and approach lend themselves well to iterative dense subgraph discovery. Contrary to most existing approaches, our method naturally allows subsequently found patterns to be overlapping. The empirical evaluation highlights the properties of the new interestingness measure given different prior belief sets, and our approach’s ability to find interesting subgraphs that other methods are unable to find.
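For the first case considered (a belief about the overall density only), a simple stand-in for subjective interestingness is the surprise of a vertex set's internal edge count under a background model in which every vertex pair is an edge independently with the graph's density. The sketch computes this as the negative log of a binomial tail probability. It is a simplified illustration, not the measure derived in the paper, and the function names are hypothetical.

```python
import math


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def subgraph_surprise(num_vertices, num_edges, density):
    """Bits of surprise at seeing >= num_edges edges inside a vertex set, assuming
    each vertex pair is independently an edge with probability `density`."""
    pairs = num_vertices * (num_vertices - 1) // 2
    return -math.log2(binomial_tail(pairs, num_edges, density))


# Both vertex sets are complete (density 1), yet their surprise differs:
print(subgraph_surprise(3, 3, 0.5))  # triangle: 3.0
print(subgraph_surprise(4, 6, 0.5))  # 4-clique: 6.0
```

Density alone ranks a triangle and a 4-clique equally. A prior-aware score separates them and also grows as the background density falls, which illustrates why the abstract treats density as only a proxy for interestingness.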
International Conference Data Science | 2014
Eirini Spyropoulou; Tijl De Bie
Three recent trends aim to make local pattern mining more directly suited for use on data as it presents itself in practice, namely in a multi-relational form and affected by noise. The first of these trends is the generalisation of local pattern syntaxes to approximate, noise-tolerant variants (notably fault-tolerant itemset mining and community detection). The second is the development of pattern syntaxes that are directly applicable to multi-relational data. The third is to better quantify the interestingness of, and redundancy between, such local patterns. In this paper we leverage recent results from these lines of research to introduce a noise-tolerant pattern syntax for multi-relational data. We show how enumerating all patterns of this syntax in a given database can be done remarkably efficiently. We contribute a way to quantify the interestingness of these patterns, thus overcoming the pattern explosion problem. Finally, we show the usefulness of the pattern syntax and the scalability of the algorithm by presenting experimental results on real-world and synthetic data.
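The first trend, fault-tolerant itemset mining, relaxes the all-ones requirement of a tile. One common style of definition bounds the fraction of zeros in every row and every column of the selected block. The sketch below checks that condition on single-table binary data. The threshold `epsilon`, the function name and the data are assumptions, and the paper's noise-tolerant syntax for multi-relational data is not reproduced here.

```python
def is_fault_tolerant_tile(matrix, rows, cols, epsilon):
    """True if, within rows x cols, every selected row and every selected column
    has at most a fraction `epsilon` of zeros."""
    if not rows or not cols:
        return False
    for i in rows:
        if sum(matrix[i][j] == 0 for j in cols) > epsilon * len(cols):
            return False
    for j in cols:
        if sum(matrix[i][j] == 0 for i in rows) > epsilon * len(rows):
            return False
    return True


# Hypothetical binary data with one "missing" 1 at position (1, 1).
M = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
]
print(is_fault_tolerant_tile(M, [0, 1, 2], [0, 1, 2], epsilon=0.0))   # False
print(is_fault_tolerant_tile(M, [0, 1, 2], [0, 1, 2], epsilon=0.34))  # True
```

With `epsilon=0.0` the check reduces to an exact tile. A small positive `epsilon` recovers the 3 x 3 block despite the single noisy cell. Checking rows and columns separately stops the block from absorbing a row or column that is mostly zeros.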
Pattern Recognition Letters | 2016
Matt McVicar; Benjamin Sach; Cedric Mesnage; Jefrey Lijffijt; Eirini Spyropoulou; Tijl De Bie
Defining and computing distances between tree structures is a classical area of study in theoretical computer science, with practical applications in computational biology, information retrieval, text analysis, and many other areas. In this paper, we focus on rooted, unordered, uniquely-labelled trees such as taxonomies and other hierarchies. For such trees, we introduce the intuitive concept of a ‘local move’ operation as an atomic edit of a tree. We then introduce SuMoTED, a new edit distance measure between such trees, defined as the minimal number of local moves required to convert one tree into another. We show how SuMoTED can be computed using a scalable algorithm with quadratic time complexity. Finally, we demonstrate its use on a collection of music genre taxonomies.