Matthew J. Rattigan | Researchain

Publication

Featured research published by Matthew J. Rattigan.

SIGKDD Explorations | 2005

The case for anomalous link discovery

Matthew J. Rattigan; David D. Jensen

In this paper, we describe the challenges inherent to the task of link prediction, and we analyze one reason why many link prediction models perform poorly. Specifically, we demonstrate the effects of the extremely large class skew associated with the link prediction task. We then present an alternate task, anomalous link discovery (ALD), and qualitatively demonstrate the effectiveness of simple link prediction models for the ALD task. We show that even the simplistic structural models that perform poorly on link prediction can perform quite well at the ALD task.
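
The class skew mentioned in this abstract can be illustrated with simple arithmetic. The sketch below is not taken from the paper: the graph size, the edge count, and the classifier error rates are hypothetical numbers chosen only to show how few node pairs are actual links and what that does to precision.

```python
def link_prior(num_nodes: int, num_edges: int) -> float:
    """Fraction of unordered node pairs that are links in a simple undirected graph."""
    possible_pairs = num_nodes * (num_nodes - 1) // 2
    return num_edges / possible_pairs


def precision(prior: float, true_positive_rate: float, false_positive_rate: float) -> float:
    """Precision of a pairwise link classifier, given the prior probability of a link."""
    true_pos = prior * true_positive_rate
    false_pos = (1.0 - prior) * false_positive_rate
    return true_pos / (true_pos + false_pos)


# Hypothetical network: 10,000 nodes and 50,000 edges (assumed values).
prior = link_prior(num_nodes=10_000, num_edges=50_000)

# Hypothetical classifier: 90% true-positive rate, 1% false-positive rate.
prec = precision(prior, true_positive_rate=0.9, false_positive_rate=0.01)

print(f"prior probability of a link: {prior:.5f}")
print(f"precision of predicted links: {prec:.3f}")
```

Under these made-up numbers, roughly one pair in a thousand is a link, so even a classifier with a 1% false-positive rate returns mostly non-links among its predictions.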

international conference on machine learning | 2007

Graph clustering with network structure indices

Matthew J. Rattigan; Marc E. Maier; David D. Jensen

Graph clustering has become ubiquitous in the study of relational data sets. We examine two simple algorithms: a new graphical adaptation of the k-medoids algorithm and the Girvan-Newman method based on edge betweenness centrality. We show that they can be effective at discovering the latent groups or communities that are defined by the link structure of a graph. However, both approaches rely on prohibitively expensive computations, given the size of modern relational data sets. Network structure indices (NSIs) are a proven technique for indexing network structure and efficiently finding short paths. We show how incorporating NSIs into these graph clustering algorithms can overcome these complexity limitations. We also present promising quantitative and qualitative evaluations of the modified algorithms on synthetic and real data sets.
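
As a rough illustration of what a graph adaptation of k-medoids might look like, the sketch below uses hop-count distance from breadth-first search in place of a Euclidean metric. It is an assumption-laden toy, not the paper's algorithm: the function names, the tie-breaking, and the example graph are all hypothetical, and the exact all-pairs distance computation is precisely the expensive step that, according to the abstract, a network structure index is meant to replace.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop-count distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def graph_k_medoids(adj, initial_medoids, max_iters=10):
    """Cluster nodes around medoids using graph distance (illustrative sketch)."""
    inf = float("inf")
    # Exact all-pairs distances: costly on large graphs.
    dist = {node: bfs_distances(adj, node) for node in adj}
    medoids = list(initial_medoids)
    clusters = {}
    for _ in range(max_iters):
        # Assignment step: each node joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for node in adj:
            nearest = min(medoids, key=lambda m: dist[m].get(node, inf))
            clusters[nearest].append(node)
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c].get(n, inf) for n in members))
            for members in clusters.values()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return clusters


# Hypothetical example: two triangles joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
clusters = graph_k_medoids(adj, initial_medoids=[0, 5])
print(clusters)
```

With these starting medoids the toy graph separates into its two triangles. Other starting points can converge to a different partition, which is a general property of k-medoids rather than something specific to this sketch.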

knowledge discovery and data mining | 2006

Using structure indices for efficient approximation of network properties

Matthew J. Rattigan; Marc E. Maier; David D. Jensen

Statistics on networks have become vital to the study of relational data drawn from areas such as bibliometrics, fraud detection, bioinformatics, and the Internet. Calculating many of the most important measures, such as betweenness centrality, closeness centrality, and graph diameter, requires identifying short paths in these networks. However, finding these short paths can be intractable for even moderate-size networks. We introduce the concept of a network structure index (NSI), a composition of (1) a set of annotations on every node in the network and (2) a function that uses the annotations to estimate graph distance between pairs of nodes. We present several varieties of NSIs, examine their time and space complexity, and analyze their performance on synthetic and real data sets. We show that creating an NSI for a given network enables extremely efficient and accurate estimation of a wide variety of network statistics on that network.
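
The two-part definition of an NSI in this abstract (per-node annotations plus a distance-estimating function) can be made concrete with a small sketch. The landmark scheme below is chosen only for illustration and is not claimed to be one of the paper's NSI varieties: each node is annotated with its hop distances to a few hypothetical landmark nodes, and the distance function returns the shortest route through any landmark, which is an upper bound on the true distance.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop-count distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def build_annotations(adj, landmarks):
    """Part (1): annotate every node with its distance to each landmark."""
    per_landmark = [bfs_distances(adj, lm) for lm in landmarks]
    return {
        node: [d.get(node, float("inf")) for d in per_landmark]
        for node in adj
    }


def estimate_distance(annotations, u, v):
    """Part (2): estimate d(u, v) from annotations alone (an upper bound)."""
    if u == v:
        return 0
    return min(du + dv for du, dv in zip(annotations[u], annotations[v]))


# Hypothetical example: the path graph 0-1-2-3-4 with a single landmark at node 2.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
annotations = build_annotations(adj, landmarks=[2])

print(estimate_distance(annotations, 0, 4))  # exact: the landmark lies on the path
print(estimate_distance(annotations, 0, 1))  # overestimate: the route detours via node 2
```

Once the annotations are built, each estimate costs time proportional to the number of landmarks and needs no graph search, at the price of occasional overestimates. This trade-off is the general idea the sketch is meant to convey.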

international conference on data mining | 2007

Exploiting Network Structure for Active Inference in Collective Classification

Matthew J. Rattigan; Marc E. Maier; David D. Jensen

Active inference seeks to maximize classification performance while minimizing the amount of data that must be labeled ex ante. This task is particularly relevant in the context of relational data, where statistical dependencies among instances can be exploited to improve classification accuracy. We show that efficient methods for indexing network structure can be exploited to select high-value nodes for labeling. This approach substantially outperforms random selection and selection based on simple measures of local structure. We demonstrate the relative effectiveness of this selection approach through experiments with a relational neighbor classifier on a variety of real and synthetic data sets, and identify the necessary characteristics of the data set that allow this approach to perform well.

knowledge discovery and data mining | 2003

Information awareness: a prospective technical assessment

David D. Jensen; Matthew J. Rattigan; Hannah Blau

Recent proposals to apply data mining systems to problems in law enforcement, national security, and fraud detection have attracted both media attention and technical critiques of their expected accuracy and impact on privacy. Unfortunately, the majority of technical critiques have been based on simplistic assumptions about data, classifiers, inference procedures, and the overall architecture of such systems. We consider these critiques in detail, and we construct a simulation model that more closely matches realistic systems. We show how both the accuracy and privacy impact of a hypothetical system could be substantially improved, and we discuss the necessary and sufficient conditions for this improvement to be achieved. This analysis is neither a defense nor a critique of any particular system concept. Rather, our model suggests alternative technical designs that could mitigate some concerns, but also raises more specific conditions that must be met for such systems to be both accurate and socially desirable.

knowledge discovery and data mining | 2005

The case for anomalous link detection

Matthew J. Rattigan; David D. Jensen

In this paper, we describe the challenges inherent to the Link Prediction (LP) problem in multirelational data mining, and explore the reasons why many LP models have performed poorly. We present the alternate (and complementary) task of Anomalous Link Discovery (ALD) and qualitatively demonstrate the effectiveness of simple LP models for the ALD task.

ACM Transactions on Knowledge Discovery From Data | 2011

Indexing Network Structure with Shortest-Path Trees

Marc E. Maier; Matthew J. Rattigan; David D. Jensen

The ability to discover low-cost paths in networks has practical consequences for knowledge discovery and social network analysis tasks. Many analytic techniques for networks require finding low-cost paths, but exact methods for search become prohibitive for large networks, and data sets are steadily increasing in size. Short paths can be found efficiently by utilizing an index of network structure, which estimates network distances and enables rapid discovery of short paths. Through experiments on synthetic networks, we demonstrate that one such novel network structure index based on the shortest-path tree outperforms other previously proposed indices. We also show that it generalizes across arbitrarily weighted networks of various structures and densities, provides accurate estimates of distance, and has efficient time and space complexity. We present results on real data sets for several applications, including navigation, diameter estimation, centrality computation, and clustering, all made efficient by virtue of the network structure index.

international conference on data mining | 2010

Leveraging D-Separation for Relational Data Sets

Matthew J. Rattigan; David D. Jensen

Testing for marginal and conditional independence is a common task in machine learning and knowledge discovery applications. Prior work has demonstrated that conventional independence tests suffer from dramatically increased rates of Type I errors when naively applied to relational data. We use graphical models to specify the conditions under which these errors occur, and use those models to devise novel and accurate conditional independence tests.

Archive | 2003