Marc E. Maier

University of Massachusetts Amherst

Publication

Featured research published by Marc E. Maier.

international conference on machine learning | 2007

Graph clustering with network structure indices

Matthew J. Rattigan; Marc E. Maier; David D. Jensen

Graph clustering has become ubiquitous in the study of relational data sets. We examine two simple algorithms: a new graphical adaptation of the k-medoids algorithm and the Girvan-Newman method based on edge betweenness centrality. We show that they can be effective at discovering the latent groups or communities that are defined by the link structure of a graph. However, both approaches rely on prohibitively expensive computations, given the size of modern relational data sets. Network structure indices (NSIs) are a proven technique for indexing network structure and efficiently finding short paths. We show how incorporating NSIs into these graph clustering algorithms can overcome these complexity limitations. We also present promising quantitative and qualitative evaluations of the modified algorithms on synthetic and real data sets.
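As an illustration of the k-medoids idea mentioned in the abstract, here is a minimal sketch, not the authors' implementation. It assumes an unweighted, undirected graph stored as an adjacency dictionary, and the names `bfs_distances` and `graph_k_medoids` are hypothetical. It uses exact breadth-first distances between all pairs of nodes, which is the kind of expensive computation the abstract says an NSI is meant to avoid.

```python
from collections import deque


def bfs_distances(adj, source):
    """Exact hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def graph_k_medoids(adj, initial_medoids, max_iter=20):
    """Cluster nodes around medoids using graph distance (illustrative only)."""
    inf = float("inf")
    # All-pairs distances: the costly step on large graphs.
    dist = {u: bfs_distances(adj, u) for u in adj}
    medoids = list(initial_medoids)
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: each node joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for u in adj:
            nearest = min(medoids, key=lambda m: dist[m].get(u, inf))
            clusters[nearest].append(u)
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c].get(x, inf) for x in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return clusters


# Two triangles joined by a single bridge edge (2-3).
two_triangles = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
clusters = graph_k_medoids(two_triangles, initial_medoids=[0, 5])
```

On this toy graph the two triangles end up in separate clusters. Replacing the exact `dist` lookups with estimated distances would be one way to plug in a structure index, though the paper's actual integration may differ.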

knowledge discovery and data mining | 2006

Using structure indices for efficient approximation of network properties

Matthew J. Rattigan; Marc E. Maier; David D. Jensen

Statistics on networks have become vital to the study of relational data drawn from areas such as bibliometrics, fraud detection, bioinformatics, and the Internet. Calculating many of the most important measures, such as betweenness centrality, closeness centrality, and graph diameter, requires identifying short paths in these networks. However, finding these short paths can be intractable for even moderate-size networks. We introduce the concept of a network structure index (NSI), a composition of (1) a set of annotations on every node in the network and (2) a function that uses the annotations to estimate graph distance between pairs of nodes. We present several varieties of NSIs, examine their time and space complexity, and analyze their performance on synthetic and real data sets. We show that creating an NSI for a given network enables extremely efficient and accurate estimation of a wide variety of network statistics on that network.
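The two-part definition above (node annotations plus a distance-estimating function) can be sketched as follows. This is an assumption-laden illustration rather than any of the paper's NSI varieties: each node is annotated with its hop distance to a few chosen landmark nodes, and the estimate is the shortest route through any landmark. The function names and the choice of landmarks are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Exact hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_annotations(adj, landmarks):
    """Part (1): annotate every node with its distance to each landmark."""
    per_landmark = [bfs_distances(adj, l) for l in landmarks]
    return {
        node: tuple(d.get(node, float("inf")) for d in per_landmark)
        for node in adj
    }


def estimate_distance(annotations, u, v):
    """Part (2): estimate d(u, v) from annotations alone.

    By the triangle inequality this is an upper bound on the true distance.
    """
    if u == v:
        return 0
    return min(a + b for a, b in zip(annotations[u], annotations[v]))


# Path graph 0-1-2-3-4 with a single (assumed) landmark at node 2.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
annotations = build_annotations(path, landmarks=[2])
```

In this example the estimate for nodes 0 and 4 is exact because the landmark lies on their shortest path, while the estimate for neighbors 0 and 1 overshoots. That trade-off between annotation size and accuracy is the kind of question the abstract says the paper analyzes.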

international conference on data mining | 2007

Exploiting Network Structure for Active Inference in Collective Classification

Matthew J. Rattigan; Marc E. Maier; David D. Jensen

Active inference seeks to maximize classification performance while minimizing the amount of data that must be labeled ex ante. This task is particularly relevant in the context of relational data, where statistical dependencies among instances can be exploited to improve classification accuracy. We show that efficient methods for indexing network structure can be exploited to select high-value nodes for labeling. This approach substantially outperforms random selection and selection based on simple measures of local structure. We demonstrate the relative effectiveness of this selection approach through experiments with a relational neighbor classifier on a variety of real and synthetic data sets, and identify the necessary characteristics of the data set that allow this approach to perform well.

knowledge discovery and data mining | 2007

Relational data pre-processing techniques for improved securities fraud detection

Andrew S. Fast; Lisa Friedland; Marc E. Maier; Brian J. Taylor; David D. Jensen; Henry G. Goldberg; John Komoroske

Commercial datasets are often large, relational, and dynamic. They contain many records of people, places, things, events and their interactions over time. Such datasets are rarely structured appropriately for knowledge discovery, and they often contain variables whose meanings change across different subsets of the data. We describe how these challenges were addressed in a collaborative analysis project undertaken by the University of Massachusetts Amherst and the National Association of Securities Dealers (NASD). We describe several methods for data pre-processing that we applied to transform a large, dynamic, and relational dataset describing nearly the entirety of the U.S. securities industry, and we show how these methods made the dataset suitable for learning statistical relational models. To better utilize social structure, we first applied known consolidation and link formation techniques to associate individuals with branch office locations. In addition, we developed an innovative technique to infer professional associations by exploiting dynamic employment histories. Finally, we applied normalization techniques to create a suitable class label that adjusts for spatial, temporal, and other heterogeneity within the data. We show how these pre-processing techniques combine to provide the necessary foundation for learning high-performing statistical models of fraudulent activity.

Mathematical and Computer Modelling | 2007

A hybrid model for tumor-induced angiogenesis in the cornea in the presence of inhibitors

Heather A. Harrington; Marc E. Maier; Lé Santha Naidoo; N. Whitaker; Panayotis G. Kevrekidis

The present work formulates and analyzes, by means of numerical experiments, a model for tumor-induced angiogenesis in the presence of inhibitors in the cornea. Our model is a generalization of the earlier work of Tong and Yuan [S. Tong, F. Yuan, Numerical simulations of angiogenesis in the cornea, Microvascular Research 61 (2001) 14-27] to incorporate the role of inhibitors, relevant to experimental assays. The derived set of hybrid equations consists of partial differential equations for the tumor angiogenic factors and the inhibitors with a particle model for the motion of the endothelial cells. This is analyzed numerically in the two-dimensional setting. The relevant results are discussed and qualitative agreement with the experimental work is illustrated.

ACM Transactions on Knowledge Discovery From Data | 2011

Indexing Network Structure with Shortest-Path Trees

Marc E. Maier; Matthew J. Rattigan; David D. Jensen

The ability to discover low-cost paths in networks has practical consequences for knowledge discovery and social network analysis tasks. Many analytic techniques for networks require finding low-cost paths, but exact methods for search become prohibitive for large networks, and data sets are steadily increasing in size. Short paths can be found efficiently by utilizing an index of network structure, which estimates network distances and enables rapid discovery of short paths. Through experiments on synthetic networks, we demonstrate that one such novel network structure index based on the shortest-path tree outperforms other previously proposed indices. We also show that it generalizes across arbitrarily weighted networks of various structures and densities, provides accurate estimates of distance, and has efficient time and space complexity. We present results on real data sets for several applications, including navigation, diameter estimation, centrality computation, and clustering, all made efficient by virtue of the network structure index.

uncertainty in artificial intelligence | 2013