Edward K. Kao
Massachusetts Institute of Technology
Publication
Featured research published by Edward K. Kao.
IEEE High Performance Extreme Computing Conference | 2017
Siddharth Samsi; Vijay Gadepally; Michael B. Hurley; Michael Jones; Edward K. Kao; Sanjeev Mohindra; Paul Monticciolo; Albert Reuther; Steven Smith; William S. Song; Diane Staheli; Jeremy Kepner
The rise of graph analytic systems has created a need for ways to measure and compare the capabilities of these systems. Graph analytics present unique scalability difficulties. The machine learning, high performance computing, and visual analytics communities have wrestled with these difficulties for decades and developed methodologies for creating challenges to move these communities forward. The proposed Subgraph Isomorphism Graph Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a graph challenge that is reflective of many real-world graph analytics processing systems. The Subgraph Isomorphism Graph Challenge is a holistic specification with multiple integrated kernels that can be run together or independently. Each kernel is well defined mathematically and can be implemented in any programming environment. Subgraph isomorphism is amenable to both vertex-centric implementations and array-based implementations (e.g., using the Graph-BLAS.org standard). The computations are simple enough that performance predictions can be made based on simple computing hardware models. The surrounding kernels provide the context for each kernel that allows rigorous definition of both the input and the output for each kernel. Furthermore, since the proposed graph challenge is scalable in both problem size and hardware, it can be used to measure and quantitatively compare a wide range of present day and future systems. Serial implementations in C++, Python, Python with Pandas, Matlab, Octave, and Julia have been implemented, and their single-threaded performance has been measured. Specifications, data, and software are publicly available at GraphChallenge.org.
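As a rough illustration of what an array-based implementation can look like, the sketch below counts triangles (the smallest non-trivial subgraph pattern) using only adjacency-matrix arithmetic. It is not taken from the challenge's reference code; the function name `count_triangles` and the choice of triangles as the example pattern are assumptions made for illustration.

```python
def count_triangles(adjacency):
    """Count triangles in an undirected simple graph.

    `adjacency` is a symmetric 0/1 matrix (list of lists) with a zero
    diagonal. Hypothetical helper, not part of any challenge code.
    """
    n = len(adjacency)
    # paths2[i][j] = number of length-2 paths from i to j, i.e. (A @ A)[i][j].
    paths2 = [
        [sum(adjacency[i][k] * adjacency[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]
    # Masking A @ A with A keeps only 2-paths whose endpoints are adjacent.
    # Each triangle is counted once per ordered pair of its vertices (6 times).
    closed = sum(
        paths2[i][j] * adjacency[i][j] for i in range(n) for j in range(n)
    )
    return closed // 6


# A 4-clique contains one triangle for every choice of 3 of its 4 vertices.
k4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
print(count_triangles(k4))
```

The same masked matrix product maps directly onto sparse linear-algebra libraries, which is what makes the array formulation attractive for performance modeling.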
IEEE Transactions on Signal Processing | 2014
Steven T. Smith; Edward K. Kao; Kenneth D. Senne; Garrett Bernstein; Scott Philips
A novel unified Bayesian framework for network detection is developed, under which a detection algorithm is derived based on random walks on graphs. The algorithm detects threat networks using partial observations of their activity, and is proved to be optimum in the Neyman-Pearson sense. The algorithm is defined by a graph, at least one observation, and a diffusion model for threat. A link to well-known spectral detection methods is provided, and the equivalence of the random walk and harmonic solutions to the Bayesian formulation is proven. A general diffusion model is introduced that utilizes spatio-temporal relationships between vertices, and is used for a specific space-time formulation that leads to significant performance improvements on coordinated covert networks. This performance is demonstrated using a new hybrid mixed-membership blockmodel introduced to simulate random covert networks with realistic properties.
Network detection is an important capability in many areas of applied research in which data can be represented as a graph of entities and relationships. Oftentimes the object of interest is a relatively small subgraph in an enormous, potentially uninteresting background. This aspect characterizes network detection as a “big data” problem. Graph partitioning and network discovery have been major research areas over the last ten years, driven by interest in internet search, cyber security, social networks, and criminal or terrorist activities. The specific problem of network discovery is addressed as a special case of graph partitioning in which membership in a small subgraph of interest must be determined. Algebraic graph theory is used as the basis to analyze and compare different network detection methods. A new Bayesian network detection framework is introduced that partitions the graph based on prior information and direct observations.
The new approach, called space-time threat propagation, is proved to maximize the probability of detection and is therefore optimum in the Neyman-Pearson sense. This optimality criterion is compared to spectral community detection approaches which divide the global graph into subsets or communities with optimal connectivity properties. We also explore a new generative stochastic model for covert networks and analyze using receiver operating characteristics the detection performance of both classes of optimal detection techniques.
Proceedings of SPIE | 2010
Ryan S. Holt; Peter A. Mastromarino; Edward K. Kao; Michael B. Hurley
Multi-class assignment is often used to aid in the exploitation of data in the Intelligence, Surveillance, and Reconnaissance (ISR) community. For example, tracking systems collect detections into tracks and recognition systems classify objects into various categories. The reliability of these systems is highly contingent upon the correctness of the assignments. Conventional methods and metrics for evaluating assignment correctness only convey partial information about the system performance and are usually tied to the specific type of system being evaluated. Recently, information theory has been successfully applied to the tracking problem in order to develop an overall performance evaluation metric. In this paper, the information-theoretic framework is extended to measure the overall performance of any multiclass assignment system, specifically, any system that can be described using a confusion matrix. The performance is evaluated based upon the amount of truth information captured and the amount of false information reported by the system. The information content is quantified through conditional entropy and mutual information computations using numerical estimates of the association probabilities. The end result is analogous to the Receiver Operating Characteristic (ROC) curve used in signal detection theory. This paper compares these information quality metrics to existing metrics and demonstrates how to apply these metrics to evaluate the performance of a recognition system.
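To make the confusion-matrix computation concrete, here is a minimal sketch of estimating mutual information and conditional entropies from raw counts. The paper's exact metric definitions are not given in the abstract, so the reading used here (mutual information as truth information captured, and the entropy of the report given the truth as information not explained by the truth) is an assumption, as are the function and key names.

```python
import math


def information_metrics(confusion):
    """Estimate information quantities (in bits) from a confusion matrix.

    confusion[i][j] = number of items of truth class i reported as class j.
    Hypothetical helper; the key names below are illustrative only.
    """
    total = sum(sum(row) for row in confusion)
    # Empirical joint distribution over (truth, report) pairs.
    joint = [[count / total for count in row] for row in confusion]
    p_truth = [sum(row) for row in joint]
    p_report = [sum(col) for col in zip(*joint)]

    def entropy(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    h_truth = entropy(p_truth)
    h_report = entropy(p_report)
    h_joint = entropy([p for row in joint for p in row])
    return {
        # I(T; R) = H(T) + H(R) - H(T, R)
        "mutual_information": h_truth + h_report - h_joint,
        # H(T | R): truth information the report fails to capture.
        "truth_given_report": h_joint - h_report,
        # H(R | T): reported information not explained by the truth.
        "report_given_truth": h_joint - h_truth,
    }


metrics = information_metrics([[8, 2], [1, 9]])
print(metrics)
```

Sweeping a decision threshold and plotting one conditional entropy against the mutual information would give a curve in the spirit of the ROC analogy the abstract describes.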
IEEE High Performance Extreme Computing Conference | 2017
Edward K. Kao; Vijay Gadepally; Michael B. Hurley; Michael Jones; Jeremy Kepner; Sanjeev Mohindra; Paul Monticciolo; Albert Reuther; Siddharth Samsi; William S. Song; Diane Staheli; Steven Smith
An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive benchmarks and challenges have proven to be an effective means to advance state-of-the-art performance and foster community collaboration. This paper describes a graph partition challenge with a baseline partition algorithm of sub-quadratic complexity. The algorithm employs rigorous Bayesian inferential methods based on a statistical model that captures characteristics of the real-world graphs. This strong foundation enables the algorithm to address limitations of well-known graph partition approaches such as modularity maximization. This paper describes various aspects of the challenge including: (1) the data sets and streaming graph generator, (2) the baseline partition algorithm with pseudocode, (3) an argument for the correctness of parallelizing the Bayesian inference, (4) different parallel computation strategies such as node-based parallelism and matrix-based parallelism, (5) evaluation metrics for partition correctness and computational requirements, (6) preliminary timing of a Python-based demonstration code and the open source C++ code, and (7) considerations for partitioning the graph in streaming fashion. Data sets, source code for the algorithm, and metrics, along with detailed documentation, are available at GraphChallenge.org.
International Conference on Acoustics, Speech, and Signal Processing | 2012
Steven T. Smith; Scott Philips; Edward K. Kao
This paper addresses threat propagation on space-time graphs, defined to be a time-sampled graph. The application considered is geographical sites connected by tracks, though such graphs arise in many fields. Several new concepts and efficient algorithms are introduced, specifically, the space-time adjacency matrix and harmonic threat propagation. The cued threat propagation problem is shown to be equivalent to the harmonic solution to Laplace's equation on the graph. Alternatively, the Perron-Frobenius theorem is applied to a modified space-time adjacency matrix to derive a concept of eigen-threat on space-time graphs. Both approaches yield fast, scalable algorithms for space-time threat propagation applicable to both very small and very large graphs. Algorithms are motivated by a continuous time stochastic process model. Detection performance is shown using simulated insurgent network data, for which harmonic space-time threat propagation achieves an 84% probability of detection with a 4% false alarm probability over the entire graph.
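The harmonic solution mentioned above can be sketched on an ordinary (non-space-time) graph: cued vertices are held at fixed values and every other vertex is repeatedly set to the average of its neighbors until the values settle. This is a generic Gauss-Seidel relaxation, not the paper's algorithm; the function name, the iteration count, and the zero-valued boundary vertex in the example are assumptions for illustration.

```python
def harmonic_propagation(neighbors, fixed, iterations=500):
    """Approximate the harmonic function on a graph with fixed vertex values.

    neighbors: dict mapping each vertex to a list of adjacent vertices.
    fixed: dict mapping cued/boundary vertices to their held values.
    Hypothetical helper using plain Gauss-Seidel sweeps.
    """
    values = {v: fixed.get(v, 0.0) for v in neighbors}
    for _ in range(iterations):
        for v, nbrs in neighbors.items():
            if v in fixed or not nbrs:
                continue  # held values and isolated vertices are not updated
            # Discrete Laplace equation: each free value is its neighbor mean.
            values[v] = sum(values[u] for u in nbrs) / len(nbrs)
    return values


# Path graph 0-1-2-3-4: vertex 0 is cued (threat 1), vertex 4 is held at 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
threat = harmonic_propagation(path, fixed={0: 1.0, 4: 0.0})
print({v: round(x, 3) for v, x in threat.items()})
```

On the path graph the result decays linearly from the cue to the boundary, which is the expected harmonic profile in one dimension. For large graphs a sparse linear solver would normally replace the explicit sweeps.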
International Conference on Acoustics, Speech, and Signal Processing | 2012
Scott Philips; Michael Yee; Edward K. Kao; Christian Anderson
Existing literature on network community detection typically exploits the structure of static associations between entities. However, real-world network data often consists of observations of coordinated interactions between members who belong to multiple communities. This paper presents a novel perspective and approach for activity-based community detection, where a community is defined as a group of actors engaged in correlated activities over time. Detection is performed by propagating membership iteratively to neighboring nodes through edges that represent interactions. We compare the proposed approach to two state-of-the-art methods based on modularity, and demonstrate its effectiveness on a simulated vehicle movement dataset and the Enron email corpus.
International Conference on Acoustics, Speech, and Signal Processing | 2014
Steven T. Smith; Edward K. Kao; Kenneth D. Senne; Garrett Bernstein
A Bayesian framework for network detection is developed based on random walks on graphs. Networks are detected using partial observations of their activity, and the Bayesian approach is proved to be optimum in the Neyman-Pearson sense, assuming random walk propagation on a given graph and diffusion model with absorbing states. The equivalence of the random walk and harmonic solutions to the Bayesian formulation is proven. A general diffusion model is introduced that utilizes spatio-temporal relationships between vertices, and is used for a specific space-time formulation that leads to significant performance improvements.
International Conference on Information Fusion | 2011
Steven T. Smith; Andrew Silberfarb; Scott Philips; Edward K. Kao; Christian Anderson
Archive | 2013
Michael B. Hurley; Edward K. Kao
IEEE Signal Processing Workshop on Statistical Signal Processing | 2018
Steven T. Smith; Edward K. Kao; Danelle C. Shah; Olga Simek; Donald B. Rubin