
Publication


Featured research published by Donniell E. Fishkind.


Journal of the American Statistical Association | 2012

A Consistent Adjacency Spectral Embedding for Stochastic Blockmodel Graphs

Daniel L. Sussman; Minh Tang; Donniell E. Fishkind; Carey E. Priebe

We present a method to estimate block membership of nodes in a random graph generated by a stochastic blockmodel. We use an embedding procedure motivated by the random dot product graph model, a particular example of the latent position model. The embedding associates each node with a vector; these vectors are clustered via minimization of a squared-error criterion. We prove that this method is consistent for assigning nodes to blocks, as only a negligible number of nodes will be misassigned. We prove consistency of the method for both directed and undirected graphs. The consistent block assignment makes possible consistent parameter estimation for a stochastic blockmodel. We extend the result to the setting where the number of blocks grows slowly with the number of nodes. Our method is also computationally feasible even for very large graphs. We compare our method with Laplacian spectral clustering through analysis of simulated data and a graph derived from Wikipedia documents.
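
To make the procedure concrete, here is a minimal sketch of adjacency spectral embedding followed by squared-error clustering. The simulated two-block graph, the embedding dimension d, and the use of scikit-learn's KMeans are illustrative choices rather than the paper's exact implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def adjacency_spectral_embedding(A, d):
    """Embed each node of the graph with (symmetric) adjacency matrix A into R^d
    using the d eigenpairs of largest magnitude."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))   # scale columns by sqrt(|eigenvalue|)

def cluster_blocks(A, d, K, seed=0):
    """Cluster embedded nodes into K blocks by minimizing squared error (k-means)."""
    X = adjacency_spectral_embedding(A, d)
    return KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(X)

# Example: a 2-block stochastic blockmodel with 200 nodes.
rng = np.random.default_rng(0)
z = np.repeat([0, 1], 100)                             # true block labels
B = np.array([[0.5, 0.1], [0.1, 0.4]])                 # block probability matrix
P = B[z][:, z]                                         # edge probabilities
A = np.triu(rng.random((200, 200)) < P, 1).astype(float)
A = A + A.T                                            # symmetric, hollow adjacency
labels = cluster_blocks(A, d=2, K=2)                   # block labels up to relabeling
```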


SIAM Journal on Matrix Analysis and Applications | 2013

Consistent Adjacency-Spectral Partitioning for the Stochastic Block Model When the Model Parameters Are Unknown

Donniell E. Fishkind; Daniel L. Sussman; Minh Tang; Joshua T. Vogelstein; Carey E. Priebe

For random graphs distributed according to a stochastic block model, we consider the inferential task of partitioning vertices into blocks using spectral techniques. Spectral partitioning using either the normalized Laplacian or the adjacency matrix has been shown to be consistent as the number of vertices tends to infinity. Importantly, both procedures require that the number of blocks and the rank of the communication probability matrix be known, even when the remaining parameters may be unknown. In this article, we prove that the (suitably modified) adjacency-spectral partitioning procedure, requiring only an upper bound on the rank of the communication probability matrix, is consistent. Indeed, this result demonstrates robustness to model misspecification; an overestimate of the rank may impose a moderate performance penalty, but the procedure is still consistent. Furthermore, we extend this procedure to the setting where adjacencies may have multiple modalities, and we allow for either directed or undirected graphs.
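
For reference, the communication probability matrix enters the model as follows; this is standard stochastic block model notation rather than anything specific to this article.

```latex
% n vertices, K blocks; Z is the n-by-K 0/1 block membership matrix (one 1 per row)
% and B is the K-by-K communication probability matrix.
P = Z B Z^{\top} \in [0,1]^{n \times n},
\qquad
\Pr[\,A_{ij} = 1\,] = P_{ij} \quad (i \neq j),
\qquad
\operatorname{rank}(P) \le \operatorname{rank}(B) \le K .
```

Since rank(P) is at most rank(B), an upper bound on the rank of the communication probability matrix bounds the embedding dimension, and the article shows that overestimating this rank still yields consistent partitioning.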


SIAM Journal on Matrix Analysis and Applications | 1999

The Moore–Penrose Generalized Inverse for Sums of Matrices

James Allen Fill; Donniell E. Fishkind

In this paper we exhibit, under suitable conditions, a neat relationship between the Moore–Penrose generalized inverse of a sum of two matrices and the Moore–Penrose generalized inverses of the individual terms. We include an application to the parallel sum of matrices.
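
As a reminder of the object involved (this is the standard definition, not the paper's sum formula), the Moore–Penrose inverse A^+ of a matrix A is the unique matrix satisfying the four Penrose conditions:

```latex
A A^{+} A = A, \qquad
A^{+} A A^{+} = A^{+}, \qquad
(A A^{+})^{*} = A A^{+}, \qquad
(A^{+} A)^{*} = A^{+} A .
```

In general (A+B)^+ is not A^+ + B^+, which is why conditions such as those identified in the paper are needed before the pseudoinverse of a sum can be expressed through the pseudoinverses of its terms.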


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2016

Graph Matching: Relax at Your Own Risk

Vince Lyzinski; Donniell E. Fishkind; Marcelo Fiori; Joshua T. Vogelstein; Carey E. Priebe; Guillermo Sapiro

Graph matching, the problem of aligning a pair of graphs to minimize their edge disagreements, has received widespread attention over the past several decades from both theoretical and applied communities, including combinatorics, computer vision, and connectomics. This attention can be partially attributed to its computational difficulty. Although many heuristics have been proposed in the literature to approximately solve graph matching, very few have any theoretical support for their performance. A common technique is to relax the discrete problem to a continuous one, thereby enabling practitioners to bring gradient-descent-type algorithms to bear. We prove that an indefinite relaxation (when solved exactly) almost always discovers the optimal permutation, while a common convex relaxation almost always fails to discover the optimal permutation. These theoretical results suggest that initializing the indefinite algorithm with the convex optimum might yield improved practical performance. Indeed, experimental results illuminate and corroborate these theoretical findings, demonstrating that excellent results are achieved on both benchmark and real-data problems by amalgamating the two approaches.
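
The objects involved can be written down compactly. Writing Π for the set of n × n permutation matrices and 𝒟 for its convex hull (the doubly stochastic matrices), a sketch of the formulations discussed, for symmetric adjacency matrices A and B and up to additive constants, is:

```latex
\text{graph matching:}\quad
\min_{P \in \Pi} \|AP - PB\|_{F}^{2}
  = \min_{P \in \Pi} \Bigl( \|A\|_{F}^{2} + \|B\|_{F}^{2}
    - 2\,\operatorname{tr}\bigl(A P B^{\top} P^{\top}\bigr) \Bigr),
```

```latex
\text{convex relaxation:}\quad
\min_{D \in \mathcal{D}} \|AD - DB\|_{F}^{2},
\qquad
\text{indefinite relaxation:}\quad
\min_{D \in \mathcal{D}} \; -\operatorname{tr}\bigl(A D B^{\top} D^{\top}\bigr).
```

Over Π the two objectives agree up to constants, but over 𝒟 they generally differ, which is what makes the choice of relaxation consequential.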


PLOS ONE | 2015

Fast approximate quadratic programming for graph matching

Joshua T. Vogelstein; John M. Conroy; Vince Lyzinski; Louis J. Podrazik; Steven G. Kratzer; Eric T. Harley; Donniell E. Fishkind; R. Jacob Vogelstein; Carey E. Priebe

Quadratic assignment problems arise in a wide variety of domains, spanning operations research, graph theory, computer vision, and neuroscience, to name a few. The graph matching problem is a special case of the quadratic assignment problem, and graph matching is increasingly important as graph-valued data becomes more prominent. With the aim of efficiently and accurately matching the large graphs common in big data, we present our graph matching algorithm, the Fast Approximate Quadratic assignment (FAQ) algorithm. We empirically demonstrate that our algorithm is faster and achieves a lower objective value on over 80% of the QAPLIB benchmark library than the previous state of the art. Applying our algorithm to our motivating example, matching C. elegans connectomes (brain graphs), we find that it achieves good performance efficiently.
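
A FAQ-style solve can be sketched in a few lines. This assumes a SciPy recent enough to ship scipy.optimize.quadratic_assignment, whose "faq" method implements this style of algorithm, and uses a small synthetic permuted graph as stand-in data.

```python
import numpy as np
from scipy.optimize import quadratic_assignment  # available in recent SciPy releases

rng = np.random.default_rng(0)
n = 50

# A symmetric random graph A and a vertex-permuted copy B (stand-in data).
A = np.triu((rng.random((n, n)) < 0.3).astype(float), 1)
A = A + A.T
perm_true = rng.permutation(n)
B = A[np.ix_(perm_true, perm_true)]

# FAQ-style solve: maximize trace(A P B^T P^T), i.e. minimize edge disagreements.
res = quadratic_assignment(A, B, method="faq", options={"maximize": True})
perm = res.col_ind   # estimated correspondence: vertex i of A <-> vertex perm[i] of B

disagreements = np.abs(A - B[np.ix_(perm, perm)]).sum() / 2
print("edge disagreements after matching:", int(disagreements))
```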


Parallel Computing | 2015

Spectral clustering for divide-and-conquer graph matching

Vince Lyzinski; Daniel L. Sussman; Donniell E. Fishkind; Henry Pao; Li Chen; Joshua T. Vogelstein; Youngser Park; Carey E. Priebe

We present a parallelized bijective graph matching algorithm that leverages seeds and is designed to match very large graphs. Our algorithm combines spectral graph embedding with existing state-of-the-art seeded graph matching procedures. We justify our approach by proving that modestly correlated, large stochastic block model random graphs are correctly matched utilizing very few seeds through our divide-and-conquer procedure. We also demonstrate the effectiveness of our approach in matching very large graphs in simulated and real data examples, showing up to a factor of 8 improvement in runtime with minimal sacrifice in accuracy.
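
The divide-and-conquer idea can be sketched schematically; this is not the authors' implementation. The sketch embeds both graphs, jointly clusters the embeddings to split the problem into pieces, and matches within each piece. The loop below runs serially and ignores the bookkeeping needed to keep the overall match bijective and to exploit seeds; KMeans and SciPy's FAQ solver are stand-ins.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.optimize import quadratic_assignment

def embed(A, d):
    """Adjacency spectral embedding into R^d."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

def divide_and_conquer_match(A, B, d=2, n_pieces=4, seed=0):
    """Jointly cluster the two embeddings, then match each piece separately.
    The per-piece matches could run in parallel; here they run serially."""
    XA, XB = embed(A, d), embed(B, d)
    km = KMeans(n_clusters=n_pieces, n_init=10, random_state=seed).fit(np.vstack([XA, XB]))
    ca, cb = km.labels_[: len(XA)], km.labels_[len(XA):]
    match = np.arange(len(A))                    # default: identity correspondence
    for c in range(n_pieces):
        ia, ib = np.where(ca == c)[0], np.where(cb == c)[0]
        m = min(len(ia), len(ib))                # crude handling of unequal piece sizes
        if m < 2:
            continue
        ia, ib = ia[:m], ib[:m]
        res = quadratic_assignment(A[np.ix_(ia, ia)], B[np.ix_(ib, ib)],
                                   method="faq", options={"maximize": True})
        match[ia] = ib[res.col_ind]              # piece-level vertex correspondence
    return match
```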


IEEE Transactions on Systems, Man, and Cybernetics | 2007

Disambiguation Protocols Based on Risk Simulation

Donniell E. Fishkind; Carey E. Priebe; Kendall Giles; Leslie N. Smith; Vural Aksakalli

Suppose there is a need to swiftly navigate through a spatial arrangement of possibly forbidden regions, with each region marked with the probability that it is, indeed, forbidden. In close proximity to any of these regions, you have the dynamic capability of disambiguating the region and learning for certain whether or not it is forbidden; only if it is not forbidden may you proceed through that region. The central issue is how to most effectively exploit this disambiguation capability to minimize the expected length of the traversal. Regions are never entered while they are possibly forbidden, and thus no risk is ever actually incurred. Nonetheless, for the sole purpose of deciding where to disambiguate, it may be advantageous to simulate risk, temporarily pretending that possibly forbidden regions are riskily traversable, with each potential traversal weighted by its level of undesirability, a function of its traversal length and traversal risk. In this paper, the simulated risk disambiguation protocol is introduced: you follow a shortest traversal (in this undesirability sense) until an ambiguous region is about to be entered; at that location, a disambiguation is performed on the ambiguous region, and the process is then repeated from the current location until the destination is reached. We introduce the tangent arc graph as a means of simplifying the implementation of simulated risk disambiguation protocols, and we show how to efficiently implement those protocols based on linear undesirability functions. The effectiveness of these disambiguation protocols is illustrated with examples, including one involving mine countermeasures path planning.
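
A toy illustration of the simulated risk protocol on a grid is sketched below. The linear undesirability used here (edge length plus a constant times the mark) is just one simple choice and not necessarily the paper's exact form; networkx, the grid size, and the ALPHA value are illustrative, and the sketch assumes the goal stays reachable as forbidden edges are discovered.

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)

# Grid traversal medium; a random subset of edges are "possibly forbidden",
# each marked with its probability rho of being forbidden.
G = nx.grid_2d_graph(20, 20)
for u, v in G.edges():
    G.edges[u, v]["length"] = 1.0
    G.edges[u, v]["rho"] = rng.uniform(0.1, 0.9) if rng.random() < 0.3 else 0.0

ALPHA = 5.0  # weight on the mark in the linear undesirability (illustrative)

def undesirability(u, v, d):
    # length plus a penalty proportional to the mark of an ambiguous edge
    return d["length"] + ALPHA * d["rho"]

def simulated_risk_traversal(G, start, goal):
    """Follow the least-undesirable path; disambiguate the first ambiguous edge
    about to be entered, then replan from the current location."""
    cost, here = 0.0, start
    while here != goal:
        path = nx.shortest_path(G, here, goal, weight=undesirability)
        nxt = path[1]
        e = G.edges[here, nxt]
        if e["rho"] > 0.0:                  # ambiguous: disambiguate before entering
            if rng.random() < e["rho"]:     # truly forbidden: never entered
                G.remove_edge(here, nxt)
                continue                    # replan from the same location
            e["rho"] = 0.0                  # now known to be traversable
        cost += e["length"]
        here = nxt
    return cost

print("traversal length:", simulated_risk_traversal(G, (0, 0), (19, 19)))
```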


The Annals of Applied Statistics | 2015

Vertex nomination schemes for membership prediction

Donniell E. Fishkind; Vince Lyzinski; Henry Pao; Li Chen; Carey E. Priebe

Suppose that a graph is realized from a stochastic block model where one of the blocks is of interest, but many or all of the vertices’ block labels are unobserved. The task is to order the vertices with unobserved block labels into a “nomination list” such that, with high probability, vertices from the interesting block are concentrated near the list’s beginning. We propose several vertex nomination schemes. Our basic, but principled, setting and development yields a best nomination scheme (a Bayes-optimal analogue), as well as a likelihood maximization nomination scheme that is practical to implement when there are on the order of a thousand vertices, and which is empirically near-optimal when the number of vertices is small enough to allow comparison to the best nomination scheme. We then illustrate the robustness of the likelihood maximization nomination scheme to the modeling challenges inherent in real data, using examples that include a social network involving human trafficking, the Enron graph, a worm brain connectome, and a political blog network.
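
The paper's likelihood maximization scheme is more involved than can be shown here; below is a much simpler spectral stand-in for the nomination idea (embed the graph, then rank unlabeled vertices by proximity to the known interesting-block vertices), intended only to make the input and output of a nomination scheme concrete.

```python
import numpy as np

def nominate(A, seeds_interesting, seeds_other, d=2):
    """Rank unlabeled vertices by squared distance of their adjacency spectral
    embedding to the centroid of the known interesting-block vertices.
    This is a simple spectral stand-in, not the paper's likelihood scheme."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    X = vecs[:, idx] * np.sqrt(np.abs(vals[idx]))
    centroid = X[seeds_interesting].mean(axis=0)
    labeled = set(seeds_interesting) | set(seeds_other)
    unlabeled = [v for v in range(len(A)) if v not in labeled]
    dist = {v: float(np.sum((X[v] - centroid) ** 2)) for v in unlabeled}
    return sorted(unlabeled, key=dist.get)   # nomination list: most promising first
```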


Journal of Classification | 2016

On the Incommensurability Phenomenon

Donniell E. Fishkind; Cencheng Shen; Youngser Park; Carey E. Priebe

Suppose that two large, multi-dimensional data sets are each noisy measurements of the same underlying random process, and principal components analysis is performed separately on the data sets to reduce their dimensionality. In some circumstances it may happen that the two lower-dimensional data sets have an inordinately large Procrustean fitting-error between them. The purpose of this manuscript is to quantify this “incommensurability phenomenon”. In particular, under specified conditions, the square Procrustean fitting-error of the two normalized lower-dimensional data sets is (asymptotically) a convex combination (via a correlation parameter) of the Hausdorff distance between the projection subspaces and the maximum possible value of the square Procrustean fitting-error for normalized data. We show how this gives rise to the incommensurability phenomenon, and we employ illustrative simulations and also use real data to explore how the incommensurability phenomenon may have an appreciable impact.
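
The quantity under study is straightforward to compute. The sketch below builds two noisy copies of the same data, reduces each separately with PCA, and measures the squared Procrustean fitting error after normalization; the dimensions, noise level, and Frobenius-norm normalization are illustrative choices.

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)

# Two noisy measurements of the same underlying 200 x 10 random process.
latent = rng.normal(size=(200, 10))
X1 = latent + 0.5 * rng.normal(size=latent.shape)
X2 = latent + 0.5 * rng.normal(size=latent.shape)

# Separate PCA reductions to 3 dimensions.
Y1 = PCA(n_components=3).fit_transform(X1)
Y2 = PCA(n_components=3).fit_transform(X2)

# Normalize, then measure the squared Procrustean fitting error between them.
Y1 /= np.linalg.norm(Y1)
Y2 /= np.linalg.norm(Y2)
R, _ = orthogonal_procrustes(Y1, Y2)   # best orthogonal alignment of Y1 onto Y2
error = np.linalg.norm(Y1 @ R - Y2) ** 2
print("squared Procrustean fitting error:", error)
```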


Journal of the Operational Research Society | 2011

Sensor information monotonicity in disambiguation protocols

Xugang Ye; Donniell E. Fishkind; Lowell Abrams; Carey E. Priebe

Previous work has considered the problem of swiftly traversing a marked traversal medium where the marks represent probabilities that associated local regions are traversable, further supposing that the traverser is equipped with a dynamic capability to disambiguate these regions en route. In practice, however, the marks are given by a noisy sensor and are only estimates of the respective probabilities of traversability. In this paper, we investigate the performance of disambiguation protocols that utilize such sensor readings. In particular, we investigate the difference in performance when a disambiguation protocol employs various sensors ranked by their estimation quality. We demonstrate that a superior sensor can yield superior traversal performance, so-called Sensor Information Monotonicity. In so doing, we provide the decision-maker with the wherewithal to quantitatively assess the advantage of a superior (and presumably more expensive) sensor in light of the associated improvement in performance.

Collaboration


Dive into Donniell E. Fishkind's collaborations.

Top Co-Authors

Vince Lyzinski, Johns Hopkins University
Lowell Abrams, George Washington University
Henry Pao, Johns Hopkins University
Minh Tang, Johns Hopkins University
Eric T. Harley, Johns Hopkins University
Keith Levin, Johns Hopkins University