
Kernel Methods for Graph-structured Data Analysis

 

Abstract


Kernel Methods for Graph-structured Data Analysis, by Zhen Zhang. Doctor of Philosophy in Electrical Engineering, Washington University in St. Louis, 2019. Professor Arye Nehorai, Chair.

Structured data modeled as graphs arise in many application domains, such as computer vision, bioinformatics, and sociology. In this dissertation, we focus on three important topics in graph-structured data analysis: graph comparison, graph embeddings, and graph matching. For all three, we propose effective algorithms that make use of kernel functions and the corresponding reproducing kernel Hilbert spaces.

For the first topic, we develop effective graph kernels, named “RetGK,” for quantitatively measuring the similarities between graphs. Graph kernels, which are positive definite functions on graphs, are powerful similarity measures in the sense that they make various kernel-based learning algorithms, for example clustering, classification, and regression, applicable to structured data. Our graph kernels are obtained by a two-step embedding. In the first step, we represent the graph nodes with numerical vectors in Euclidean spaces; to do this, we revisit the concept of random walks and introduce a new node structural role descriptor, the return probability feature. In the second step, we represent the whole graph with an element of a reproducing kernel Hilbert space, from which we naturally obtain our graph kernels. The advantages of the proposed kernels are that they effectively exploit various node attributes while remaining scalable to large graphs. We conduct extensive graph classification experiments to evaluate our graph kernels. The experimental results show that they significantly outperform state-of-the-art approaches in both accuracy and computational efficiency.

For the second topic, we develop scalable attributed graph embeddings, named “SAGE.” Graph embeddings are Euclidean vector representations that encode both the attribute and the topological information of a graph. With graph embeddings, we can apply standard machine learning algorithms, such as neural networks, regression/classification trees, and generalized linear regression models, to graph-structured data. We also highlight that SAGE considers both edge attributes and node attributes, whereas RetGK considers only node attributes. “SAGE” extends “RetGK” in the sense that it is still based on the return probabilities of random walks and is derived from graph kernels, but it uses a different strategy, the “distance to kernel and embeddings” algorithm, to further represent graphs. To incorporate edge attributes, we introduce the adjoint graph, which converts edge attributes into node attributes. We conduct classification experiments on graphs with both node and edge attributes, and “SAGE” achieves better performance than all previous methods.

For the third topic, we develop a new algorithm, named “KerGM,” for graph matching. Typically, graph matching problems can be formulated as one of two kinds of quadratic assignment problems (QAPs): Koopmans-Beckmann’s QAP or Lawler’s QAP. In our work, we
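
The return probability feature used by RetGK can be made concrete with a short sketch. The snippet below is a minimal illustration and not the dissertation's implementation: it assumes the feature of a node is the probability that an s-step random walk started at that node returns to it, for s = 1, ..., S, which is the diagonal of P^s with P the row-normalized transition matrix. The helper name return_probability_features is hypothetical.

    # Minimal sketch (illustrative, not the dissertation's code) of the
    # return-probability feature: entry (v, s) is the probability that an
    # (s+1)-step random walk started at node v returns to v, i.e. the
    # diagonal of P^(s+1), where P = D^{-1} A. Assumes no isolated nodes.
    import numpy as np

    def return_probability_features(A, S=50):
        """A: (n, n) adjacency matrix. Returns an (n, S) feature matrix."""
        P = A / A.sum(axis=1, keepdims=True)   # row-normalized transition matrix
        feats = np.empty((A.shape[0], S))
        P_s = np.eye(A.shape[0])
        for s in range(S):
            P_s = P_s @ P                      # running power P^(s+1)
            feats[:, s] = np.diag(P_s)
        return feats

    # Example: on a 4-cycle, every node returns with probability 1/2 after 2 steps.
    A = np.array([[0., 1., 0., 1.],
                  [1., 0., 1., 0.],
                  [0., 1., 0., 1.],
                  [1., 0., 1., 0.]])
    print(return_probability_features(A, S=4))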
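For the edge-to-node attribute conversion in SAGE, the sketch below assumes that the adjoint graph coincides with the standard line graph, in which every edge of the original graph becomes a node; under that assumption, each edge attribute carries over as a node attribute. The networkx calls are real library functions, while the toy graph and the attribute name "weight" are only examples.

    # Sketch of converting edge attributes to node attributes via the
    # adjoint graph, here taken to be the line graph L(G): each edge of G
    # becomes a node of L(G), so G's edge attributes become node
    # attributes of L(G).
    import networkx as nx

    G = nx.Graph()
    G.add_edge("a", "b", weight=0.7)
    G.add_edge("b", "c", weight=0.2)
    G.add_edge("c", "a", weight=0.5)

    L = nx.line_graph(G)                  # nodes of L are the edges of G
    for u, v in L.nodes():
        L.nodes[(u, v)]["weight"] = G.edges[u, v]["weight"]

    print(list(L.nodes(data=True)))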
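The two QAP formulations named for graph matching can also be written down directly. The functions below evaluate the standard Koopmans-Beckmann and Lawler objectives for a candidate permutation matrix X; this is a generic sketch of those textbook objectives, not KerGM's own formulation, which is not described in this abstract.

    # Generic sketch of the two QAP objectives. A1, A2 are the two graphs'
    # adjacency (or affinity) matrices, K is an n^2 x n^2 affinity matrix,
    # and X is a candidate n x n permutation matrix.
    import numpy as np

    def koopmans_beckmann_objective(A1, A2, X):
        # tr(A1 X A2 X^T), maximized over permutation matrices X
        return np.trace(A1 @ X @ A2 @ X.T)

    def lawler_objective(K, X):
        # vec(X)^T K vec(X), with column-stacking vec
        x = X.reshape(-1, order="F")
        return x @ K @ x

    # Example: matching a graph to itself with the identity recovers tr(A @ A).
    A = np.array([[0., 1.], [1., 0.]])
    print(koopmans_beckmann_objective(A, A, np.eye(2)))   # 2.0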

DOI 10.7936/4wc3-w778
Language English
