GraphKKE: Graph Kernel Koopman Embedding for Human Microbiome Analysis
Kateryna Melnyk∗, Stefan Klus, Grégoire Montavon, and Tim O.F. Conrad
Department of Mathematics and Computer Science, Freie Universität Berlin
Department of Mathematics, University of Surrey
Electrical Engineering and Computer Science, Technische Universität Berlin
Zuse Institute Berlin
Abstract
More and more diseases have been found to be strongly correlated with disturbances in the microbiome constitution, e.g., obesity, diabetes, or some cancer types. Thanks to modern high-throughput omics technologies, it becomes possible to directly analyze the human microbiome and its influence on the health status. Microbial communities are monitored over long periods of time and the associations between their members are explored. These relationships can be described by a time-evolving graph. In order to understand the responses of the microbial community members to a distinct range of perturbations, such as antibiotics exposure or diseases, as well as general dynamical properties, the time-evolving graph of the human microbial communities has to be analyzed. This becomes especially challenging due to dozens of complex interactions among microbes and metastable dynamics. The key to solving this problem is the representation of the time-evolving graphs as fixed-length feature vectors preserving the original dynamics. We propose a method for learning the embedding of the time-evolving graph that is based on the spectral analysis of transfer operators and graph kernels. We demonstrate the efficacy of our method by showing that it can capture temporary changes in the time-evolving graph on both synthetic and real-world data. Furthermore, we show that our method can be applied to human microbiome data to study dynamic processes.
Only about 1 out of 10 cells in our body is actually a human cell. We are colonized by a diverse community of bacteria, archaea, and viruses, jointly referred to as the microbiome. About 1.5 kg of microbes live almost everywhere on and in the human body as symbionts, e.g., on the skin, in the mouth, or in the gut. They have a strong influence on both their hosts and environments. For example, more and more diseases have been found to be strongly correlated with disturbances in the microbiome constitution, e.g., obesity, diabetes, or some cancer types. Furthermore, recent studies have revealed that the gut microbiome also has a huge impact on brain functions and is related to disorders such as Alzheimer's disease (Xu and Wang [2016]). Most studies aiming at understanding the differences in the microbiome profiles of healthy and ill individuals, however, are focused on statistical constitution analysis, omitting the large variety of complex microbe–microbe and host–microbe interactions, which can be modeled as time-evolving graphs.

Figure 1: An example of a time-evolving graph of microbe interactions with two metastable states: healthy and ill. Red vertices indicate microbes whose concentrations decreased after a person became ill at time t + τ + 1. This reduction results in a change of the topology of the time-evolving graph, characterized by removing the edges between the red vertices and some of the yellow vertices.

It has also been found that although the constitution of the microbiome is constantly changing throughout our lives (in response to environmental factors), a healthy human microbiome can be considered as a metastable state lying in a minimum of some ecological stability landscape (Shaw et al. [2019]). Broadly speaking, metastability can be observed when, for short timescales, the system appears to be equilibrated, but at larger timescales, it undergoes transitions from one metastable state to other metastable states (Bovier and den Hollander [2006]). This phenomenon occurs in dynamical systems of various structures, including systems with vector-valued states, but also systems represented as time-evolving graphs.

As an illustration of a time-evolving graph that lies in an energy landscape with two metastable states, consider the time-evolving microbiome interaction graph shown in Figure 1, where vertices represent the concentrations of bacteria species and edges pairwise associations between them. In this example, a disease can be thought of as a perturbation that displaces the microbiome composition from its equilibrium (healthy) state. This displacement is characterized by removing the edges between the red vertices due to the reduction of the concentration of one of the vertices. Given an evolution of the graphs (in this example, the evolution of the microbe interactions), we aim at analyzing the dynamics occurring in the graph over time, namely, extracting the number of metastable states and their locations, as well as substructures of the graph which characterize the state space (e.g., the difference in the microbe interactions between the states healthy and ill). Moreover, the detection of the metastable states in the time-evolving graph can serve additional purposes such as graph clustering.

Figure 2: An illustration of the proposed method and the challenges which we aim to overcome. (a) Learning transfer operators using graph kernels, where k(·, ·) is a graph kernel and K_k is the Koopman operator. (b) In the learned embedding space it is possible to detect metastable states and to determine distinct substructures.

∗ Correspondence: [email protected]
Related work.
Two potential ways to detect metastable states in a time-evolving graph (e.g., the states healthy and ill in our example) are the following:
1. A typical solution would be to analyze the time-evolving graph directly in the space of graphs without taking into account potential temporal correlations. Practically, this can take the form of a simple kernel-based graph clustering algorithm. Classic graph kernels decompose graphs into substructures (e.g., walks (Kang et al. [2012]), subgraphs (Shervashidze et al. [2009]), paths (Borgwardt and Kriegel [2005]), and subtrees (Shervashidze et al. [2011])) and count the number of common substructures between graphs in order to obtain feature vectors. Afterwards, these feature vectors can be used by various machine learning approaches to cluster snapshots of the time-evolving graph. The problem with such methods is that they are incapable of capturing the time information, which is crucial for time-evolving graphs with metastability.
2. Another possible way is graph representation learning, which aims at finding a mapping that embeds the system into some low-dimensional space. That is, we represent a single snapshot of the time-evolving graph at each time point by a single vector retaining the original properties of the dynamics. After finding the optimal embedding space, the low-dimensional representation can be used as a feature input for diverse machine learning approaches for analyzing time-series data.
The recently proposed methods for graph representation learning focus mostly on static graphs. These methods can be broadly divided into two categories. The first category comprises methods for embedding graph substructures (e.g., vertices or subgraphs) (Perozzi et al. [2014], A. and J. [2016], Wang et al. [2016], Ou et al. [2016]). For instance, DeepWalk (Perozzi et al. [2014]) and node2vec (A. and J. [2016]) are approaches that use random walks to produce embeddings.
The only difference between them is that node2vec utilizes two hyperparameters, where one of them controls the likelihood of a random walk to return to the previously visited vertex and the other controls the likelihood to explore undiscovered parts of a graph. DeepWalk first traverses the graph with random walks in order to extract local structures and then uses the Skip-Gram algorithm to learn embeddings. The second category pertains to representation learning of the entire graph, which is used for the classification/clustering of a set of graphs. The graph2vec approach (Narayanan et al. [2017]) learns the embedding of a set of graphs using the idea of the Skip-Gram from doc2vec (Le and Mikolov [2014]). It comprises two main components: (1) the generation of rooted subgraphs around every vertex using the Weisfeiler–Lehman relabeling process from Shervashidze et al. [2011]; (2) learning the embedding of the given graphs following the Skip-Gram with negative sampling procedure. Although this approach is capable of projecting an entire set of graphs into a low-dimensional space, it does not capture the time-evolution of the graph.
Recently, some work has also been done on learning the embedding vectors of vertices in a time-evolving graph. Dyngraph2vec (Goyal et al. [2019]) is a deep-learning based approach which learns both the topological patterns in a graph and the temporal transitions using multiple nonlinear layers and recurrent layers. Moreover, it uses a lookback hyperparameter in the recurrent layers to control the length of temporal patterns. The idea of DynamicTriad (Zhou et al. [2018]) is to use a group of three vertices, a so-called triad, to model the dynamic changes of graph structures. This approach only considers patterns within two time steps, which means that it cannot capture patterns that exist for a longer period of time.
The main disadvantage of the substructure representation learning approaches, both for static and for time-evolving graphs, is that they are not able to project the entire set of snapshots of the time-evolving graph into a low-dimensional space.
Contribution.
To this end, we present an approach named graphKKE (the overall structure is shown in Figure 2), which is, to our knowledge, the first approach for representation learning of an entire time-evolving graph. Inspired by the kernel transfer operator approach proposed for molecular conformation analysis (Klus et al. [2020], Klus et al. [2018]), we use the same approach for learning the embeddings of time-evolving graphs. The method is based on the spectral analysis of transfer operators, such as the Perron–Frobenius or Koopman operator, in a reproducing kernel Hilbert space.
Overall, we highlight the following contributions:
• We propose graphKKE, a novel unsupervised representation learning technique to analyze a time-evolving graph, i.e., class labels of the graphs are not required for learning their embedding. Moreover, we demonstrate the applicability of graph kernels to time-evolving graphs. Our method is not only capable of preserving the information about the underlying dynamical graph patterns but also of taking into account the topological structure of the graph.
• We present a new simulation method for constructing artificial benchmark datasets of time-evolving graphs with metastability and with graph structures of different complexity. We demonstrate that graphKKE significantly outperforms other methods for graph representation learning on several benchmark problems.
• We illustrate that graphKKE can extract important associations among microbes and capture the temporal changes occurring in the time-evolving microbiome interaction graph.
The remainder of this paper is organized as follows: In Section 2, the problem of learning the embeddings of time-evolving graphs with metastable behavior is defined. In Section 3, we introduce transfer operators, graph kernels, and the method for the approximation of transfer operators using graph kernels. A model for the simulation of time-evolving benchmark graphs with metastability and the experiments with these benchmark datasets are presented in Sections 4 and 5. Eventually, Section 6 illustrates that it is possible to obtain a meaningful low-dimensional representation for microbiome data.
In order to state the problem formally, let us first introduce the necessary notation and definitions.
A graph G is a pair (V, E) with a non-empty set of vertices V(G) and a set of edges E(G) = {(v_i, v_j) | v_i, v_j ∈ V}. The set V(G) often represents the objects in the data and E(G) the relations between objects. We define the adjacency matrix of the graph G as the n × n matrix A with A_ij = 1 if the edge (v_i, v_j) ∈ E(G), and 0 otherwise. Furthermore, we say that Ḡ = (V̄, Ē) is a subgraph of a graph G = (V, E) if and only if V̄ ⊆ V and Ē ⊆ E ∧ ((v_i, v_j) ∈ Ē ⇒ v_i, v_j ∈ V̄).
A time-evolving graph G is given as a sequence of T graphs G = (G_0, . . . , G_{T−1}) at the consecutive time points {0, . . . , T−1} for some T ∈ ℕ. We call G_t a time-snapshot of G at time t. We say that the time-evolving graph G exhibits metastable behavior if G can be partitioned into s subsets G = 𝔾_0 ∪ · · · ∪ 𝔾_{s−1} for some s ≪ T such that for each time point t ∈ {0, . . . , T−2}

P(G_{t+1} ∈ 𝔾_i | G_t ∈ 𝔾_j) ≪ 1, if i ≠ j, and
P(G_{t+1} ∈ 𝔾_i | G_t ∈ 𝔾_j) ≈ 1, if i = j.

We call 𝔾_0, . . . , 𝔾_{s−1} the metastable states of the time-evolving graph G, and each G_t, t = 0, . . . , T−1, belongs to exactly one of the states 𝔾_i. In most cases, each state 𝔾_i is characterized by a certain pattern of graph attributes (i.e., edges, vertex labels).
We define our problem as follows: Given a time-evolving graph G = (G_0, . . . , G_{T−1}) with assumed metastable behavior, we aim to represent each time-snapshot G_t as a vector in a low-dimensional space ℝ^m, where m is the number of embedding dimensions, retaining the metastable behavior of G. Commonly, the number of embedding dimensions m is a hyperparameter that has to be tuned in order to obtain good performance; in our approach, we will show that m can be chosen to be the number of states s, which eliminates the need to optimize this hyperparameter.
In what follows, we first introduce transfer operators, kernel functions, and graph kernels. Afterwards, we present our approach, graphKKE, which is capable of learning embeddings of time-evolving graphs preserving temporal changes in a low-dimensional space.
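The metastability condition above can be checked empirically: given state labels for the time-snapshots, the transition probabilities between states are estimated by counting consecutive pairs. The helper name and the toy label sequence below are illustrative, not part of the paper's code:

```python
import numpy as np

def empirical_transition_matrix(labels, s):
    """Estimate P(G_{t+1} in state i | G_t in state j) from a label sequence.
    Row j holds the transition probabilities out of state j."""
    counts = np.zeros((s, s))
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a, b] += 1
    # normalize rows; rows of never-visited states stay zero
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# a metastable two-state sequence: long runs in each state, rare switches
labels = [0] * 50 + [1] * 50 + [0] * 50
P = empirical_transition_matrix(labels, 2)
# diagonal entries are close to 1, off-diagonal entries close to 0
```

For a genuinely metastable sequence, the diagonal of P dominates, matching the condition P(G_{t+1} ∈ 𝔾_i | G_t ∈ 𝔾_i) ≈ 1.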
In order to capture the temporal changes in the time-evolving graph, transfer operator theory will be used in our method. Therefore, we briefly discuss transfer operators and their applicability in the analysis of dynamical systems (for details, see Klus et al. [2016]). Information about the evolution of the system is contained in the spectral properties (such as eigenvalues and eigenfunctions) of linear operators. The most commonly used examples of such operators are the Koopman operator and the Perron–Frobenius operator.
Let {X_t}_{t≥0} be a stochastic process defined on a high-dimensional state space X ⊂ ℝ^d. The pointwise evolution of X_t can be formally described by the transition density function p_τ(y | x), which gives the probability to find the process at a point y after some lag time τ, given that it started in x at time 0. More formally, the transition density function is

p_τ(y | x) = P(X_{t+τ} = y | X_t = x).

With the aid of the transition density function, the Koopman operator expresses the evolution of a function of the state, also called an observable, whereas the Perron–Frobenius operator evolves probability densities. Let f_t ∈ L^∞(X) be an observable of the system. Then the Koopman operator K_τ : L^∞(X) → L^∞(X) is defined by

K_τ f_t(x) = ∫ p_τ(y | x) f_t(y) dy.   (1)

The evolution of probability densities can be described in a similar way. Assume the initial density of the system is given by g_t ∈ L^1(X). Then the Perron–Frobenius operator P_τ : L^1(X) → L^1(X) is defined by

P_τ g_t(x) = ∫ p_τ(x | y) g_t(y) dy.

A density π is called invariant density or equilibrium density if it is invariant under the action of P_τ, that is, P_τ π = π. Let u_t(x) = π(x)^{-1} g_t(x) be a probability density with respect to the equilibrium density π. Then the Perron–Frobenius operator with respect to the equilibrium density is defined as

T_τ u_t(x) = (1 / π(x)) ∫ p_τ(x | y) π(y) u_t(y) dy.
Both the Koopman operator K_τ and the Perron–Frobenius operator P_τ are linear, infinite-dimensional operators, which are adjoint to each other and, therefore, it should not matter which one we choose to study the behavior of the system. Moreover, although they are typically defined on the function spaces L^1 and L^∞, we assume that the operators are well-defined on L^2 (for details, see Klus et al. [2016]).
The information about the long-term behavior of the dynamical system is encoded in the spectral properties of these operators, such as eigenvalues and eigenfunctions (Klus et al. [2020]). More precisely, eigenfunctions with eigenvalues close to 1 of both the Koopman and Perron–Frobenius operators contain information about the locations of metastable states in the state space X.
Since transfer operators are infinite-dimensional, the goal is to obtain a finite-dimensional approximation of these operators. Below, we will show how to obtain such an approximation utilizing the evaluation of graph kernels on training data.
In this section, we describe kernel functions and a neighborhood aggregation graph kernel, the 1-dimensional Weisfeiler–Lehman kernel, since all our experiments make use of this graph kernel. However, one can potentially use other graph kernels, which can be tailored to specific applications.
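To make the spectral picture concrete before turning to kernels: for a finite-state Markov chain, the Koopman operator reduces to the transition matrix acting on observables, and eigenvalues close to 1 directly reveal the metastable decomposition. The 4-state chain below is a toy illustration of ours, not an example from the paper:

```python
import numpy as np

# A chain with two weakly coupled blocks {0, 1} and {2, 3}: within a block the
# chain mixes quickly, between blocks it switches only with probability eps.
eps = 0.01
P = np.array([
    [0.5 - eps/2, 0.5 - eps/2, eps/2,       eps/2      ],
    [0.5 - eps/2, 0.5 - eps/2, eps/2,       eps/2      ],
    [eps/2,       eps/2,       0.5 - eps/2, 0.5 - eps/2],
    [eps/2,       eps/2,       0.5 - eps/2, 0.5 - eps/2],
])
eigvals = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
# eigvals is approximately [1, 1 - 2*eps, 0, 0]: two eigenvalues near 1,
# then a large spectral gap, indicating two metastable states
```

The number of eigenvalues before the gap equals the number of metastable states, which is exactly the criterion graphKKE uses later for choosing the embedding dimension.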
Kernel function.
Kernel-based methods are machine learning algorithms that learn by comparing pairs of data points using similarity measures called kernel functions. We say that k : X × X → ℝ is a kernel on X if there is a Hilbert space ℍ and a feature map ϕ : X → ℍ such that

k(x, x′) = ⟨ϕ(x), ϕ(x′)⟩   (2)

for x, x′ ∈ X, where ⟨·, ·⟩ is the inner product on ℍ. A feature map ϕ exists if and only if k is a positive-semidefinite function. However, the kernel is normally not defined by an explicit representation of ϕ; instead, each kernel implicitly defines a potentially infinite-dimensional mapping ϕ.
For a given set of data points x_0, . . . , x_m ∈ X, the matrix K with K_ij = k(x_i, x_j) for i, j = 0, . . . , m is called the Gram matrix. The Gram matrix is positive semidefinite for all possible {x_0, . . . , x_m}.
Now let G be a set of graphs; then a kernel k : G × G → ℝ is called a graph kernel.
Gaussian kernel.
The most popular kernel function used in numerous kernel-based methods is the Gaussian kernel, which for two graphs G and Ĝ can be defined as

k(G, Ĝ) = exp(−‖A − Â‖²_F / σ²),

where A and Â are the respective adjacency matrices, σ > 0 is the bandwidth, and ‖·‖_F the Frobenius norm. The Hilbert space associated with the Gaussian kernel is an infinite-dimensional space ℍ that is dense in L^2 and, therefore, ℍ is a rich approximation space for L^2 (Klus et al. [2018]).
Weisfeiler–Lehman kernel.
In this work, we will use a neighborhood aggregation kernel, the Weisfeiler–Lehman (WL) kernel (Shervashidze et al. [2011]), for graphs with discrete vertex labels. However, one could choose any other class of graph kernels, such as graphlet kernels from Shervashidze et al. [2009] or random walk kernels from Kang et al. [2012].
We briefly give an overview of the Weisfeiler–Lehman kernel. Let G and Ĝ be graphs and l^(0) the set of unique original vertex labels of G and Ĝ. The key idea of this kernel is to augment each vertex label by the sorted set of neighboring vertex labels, and then to compress the augmented label into some new label using a hash function f. That is, at each iteration h = 1, 2, . . . , the 1-dimensional Weisfeiler–Lehman kernel computes a new set of vertex labels l^(h) such that

l^(h)_v = f(l^(h−1)_v + l^(h−1)_{u_1} + . . . + l^(h−1)_{u_k}), {u_1, . . . , u_k} = sorted(N(v)),

for all v ∈ V(G) ∪ V(Ĝ), where the symbol "+" denotes the concatenation of strings, N(v) is the set of neighbors of a vertex v, and sorted(N(v)) means that the vertex labels need to be sorted before concatenation. The hash function f is chosen in such a way that f(l^(h)(v)) = f(l^(h)(v′)) if and only if l^(h)(v) = l^(h)(v′), for v, v′ ∈ V(G) ∪ V(Ĝ). The next step is to compute a feature vector for each of the graphs G and Ĝ at each iteration h:

ϕ^(h)(G) = (C^(h)(G, l^(h)_0), . . . , C^(h)(G, l^(h)_{|l^(h)|})),

where l^(h) = {l^(h)_0, l^(h)_1, . . . , l^(h)_{|l^(h)|}} denotes the set of compressed vertex labels at iteration h and C^(h)(G, l^(h)_i) is the number of occurrences of the label l^(h)_i in the graph G at iteration h.
Finally, the Weisfeiler–Lehman kernel for two graphs G and Ĝ is defined as

k(G, Ĝ) = ⟨ϕ^(0)(G), ϕ^(0)(Ĝ)⟩ + . . . + ⟨ϕ^(h)(G), ϕ^(h)(Ĝ)⟩.

In the next subsection, we will introduce an approach for learning the embedding of a time-evolving graph using transfer operators and graph kernels.
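The relabel-count-dot-product scheme above can be sketched in a few lines. This is our minimal illustration, not the paper's implementation: graphs are adjacency dictionaries, and the augmented label tuple itself serves as the injective "hash" f, since two augmented labels compress to the same value exactly when they are equal:

```python
from collections import Counter

def wl_features(adj, labels, h):
    """Multiset of WL labels over iterations 0..h; adj maps v -> set of neighbors."""
    feats = Counter(labels.values())  # iteration 0: original vertex labels
    for _ in range(h):
        labels = {
            v: (labels[v],) + tuple(sorted(labels[u] for u in adj[v]))
            for v in adj
        }  # augmented label acts as its own hash: equal iff the tuples are equal
        feats += Counter(labels.values())
    return feats

def wl_kernel(g1, g2, h=1):
    """Sum over iterations of the dot products of label-count vectors."""
    f1, f2 = wl_features(*g1, h), wl_features(*g2, h)
    return sum(f1[key] * f2[key] for key in f1)

# a triangle and a path, all vertices carrying the same initial label 'a'
tri = ({0: {1, 2}, 1: {0, 2}, 2: {0, 1}}, {0: 'a', 1: 'a', 2: 'a'})
path = ({0: {1}, 1: {0, 2}, 2: {1}}, {0: 'a', 1: 'a', 2: 'a'})
wl_kernel(tri, path, h=1)  # -> 12: shared iteration-0 labels plus one shared degree-2 label
```

With identical initial labels, iteration 0 contributes 3·3 = 9, and iteration 1 distinguishes the graphs by vertex degree, so the cross-kernel value is smaller than either self-similarity (18 for the triangle, 14 for the path).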
Now, we introduce a graph kernel-based approximation method for time-evolving graphs inspired by the method proposed in Klus et al. [2018]. Since we cannot compute eigendecompositions of infinite-dimensional operators numerically, suitable finite-dimensional subspaces are typically considered. It was shown that the initial eigenvalue problem on L^2 can be approximated by an eigenvalue problem defined on the reproducing kernel Hilbert space ℍ utilizing only kernel evaluations.
Assume we have measurement data, given by a time-evolving graph G = (G_0, . . . , G_{T−1}), where each G_t is a single snapshot of G at time point t, and Ĝ is the set of graphs mapped forward for a time lag τ, that is, Ĝ_t = G_{t+τ}.
It was shown in Klus et al. [2020] that in order to find eigenfunctions of transfer operators, we need to solve auxiliary matrix eigenvalue problems, given by

K⁻¹_GG K_ĜG φ̃ = λ φ̃   (3)

and

K⁻¹_GG K_GĜ φ̃ = λ φ̃,   (4)

where [K_GG]_ij = k(G_i, G_j) and [K_ĜG]_ij = k(Ĝ_i, G_j) denote Gram matrices, k(·, ·) is a graph kernel, and K_GĜ = K⊤_ĜG. Equations (3) and (4) approximate the Koopman operator and the Perron–Frobenius operator, respectively.
This eigenvalue problem is closely related to kernel canonical correlation analysis (kernel CCA) (Klus et al. [2019]). Kernel CCA computes eigenfunctions of the forward-backward dynamics to identify so-called coherent sets. Coherent sets are a generalization of metastable sets and are regions of the state space that are not distorted over a certain time interval.
Additionally, in order to evaluate the eigenfunctions of these operators at a given graph, we set

φ = Ψ φ̃ and φ = Ψ K⁻¹_GG φ̃,

respectively, where Ψ = [k(·, G_0), . . . , k(·, G_{T−1})] is called a feature matrix.
We assume that K_GG is non-singular; otherwise, we replace the inverse by its regularized version (K_GG + ηI)⁻¹, with η ≥ 0. If k(·, ·) is a graph kernel, then we apply the following normalization:

k_norm(G_i, G_j) = k(G_i, G_j) / √(k(G_i, G_i) k(G_j, G_j)),

for all i, j = 0, . . . , T − 1. The same normalization is applied to the graphs in both G and Ĝ.
The number of states s in the time-evolving graph G is determined by the number of dominant eigenvalues close to 1. That is, if we have s dominant eigenvalues close to 1, then the time-evolving graph can be divided into s states G = 𝔾_0 ∪ · · · ∪ 𝔾_{s−1}. Moreover, all information about the long-term behavior of the time-evolving graph G is contained within the eigenfunctions associated with the s dominant eigenvalues close to 1. All things considered, the dominant eigenvalues can be used to determine the number of states s in the data and the dimension of the new low-dimensional space. The eigenfunctions associated with the dominant eigenvalues close to 1 are considered as the low-dimensional representation of the time-evolving graph G.
Most of the benchmark datasets, such as those from the chemo- and bio-informatics domains (see Kersting et al. [2016]), can be represented by static graphs. Thus, these datasets are not appropriate for our purposes, since they do not have time information and metastable behavior. Hence, in this section we present a model for generating time-evolving graphs with a comprehensible structure to estimate the performance of the proposed method.
Figure 3: An example of a trajectory of a particle in the 5-well potential described in Section 4. Points indicate the position of the particle at time t and blue lines show the movement of the particle from time point t to t + τ.
In order to obtain a time-evolving graph G with metastability, we use a stochastic differential equation to generate a trajectory based on which a set of time-snapshots of the graph G is then constructed.
Let us consider a particle in a 2-dimensional s-well potential given by the stochastic differential equation (SDE) (Klus et al. [2019]):

dX_t = −∇F(X_t) dt + √(β⁻¹) dW_t,   (5)

with the potential

F(x) = cos(s · arctan(x_2, x_1)) + 10 (√(x_1² + x_2²) − 1)².
Here, s denotes the number of wells, since we assume that the number of wells defines the number of states in the time-evolving graph G; the parameter β is the inverse temperature and W_t is a standard Wiener process. The particle stays in one of the wells for a relatively long time and then jumps to one of the neighboring wells. We consider one realization (trajectory) S of the stochastic process X = {X_t}_{t=0}^{L−1}, where L is the length of the trajectory. An example of such a trajectory S is shown in Figure 3, where the number of wells is s = 5 and β = 0.05. Before generating a time-evolving graph G, we cluster all points of S using k-means in order to obtain the ground truth labels for the time-snapshots of G. Every synthetic benchmark dataset is based on this trajectory and constructed as follows.
Figure 4: An illustration of our benchmark data at times (a) t = 0, (b) t = 256. In both (a) and (b), the left image shows a time-snapshot (G_0 and G_256, respectively) and the right image shows the points of the trajectory from Figure 3, clustered into 5 sets with k-means. Edges in red are removed from the graph. Vertices inside the circles are considered as patterns characterizing the corresponding states.
The construction of the time-evolving graph G = (G_0, . . . , G_{T−1}) can be described by a three-step process. In the first step, the trajectory S = {(x_1^(i), x_2^(i))}_{i=0}^{L−1} is generated using the SDE (5). We consider the case where the number of time points T in G is equal to the length L of S; we will then denote them both by T. In the second step, we choose the number of vertices n and assign positions (a_j, b_j), j = 0, . . . , n − 1, to the vertices v ∈ V(G_t) in a Cartesian coordinate system. The number of vertices n and their positions are the same for each G_t ∈ G, t = 0, . . . , T − 1; the random points (a_j, b_j) are drawn from a uniform distribution. Finally, in the third step of the construction process, we generate temporary patterns in the structure of the time-evolving graph such that it exhibits metastable behavior, in the following way. At each time point t ∈ {0, . . . , T − 1}, we draw a circle around the point (x_1^(t), x_2^(t)) ∈ S with radius r. We choose the radius r as the average of the radii of the clusters in S; r is the same for each t. Each time-snapshot G_t is first set to be a complete graph. We define the temporal patterns, which characterize each state of G, by removing all edges between vertices that are inside the current circle. In order to add noise to the data, we also remove edges outside the circle with the out-state probability. An example of the benchmark data is shown in Figure 4.
The code to generate the data and the code of graphKKE are available on GitHub at https://github.com/k-melnyk/graphKKE.
We illustrate the efficacy of the graphKKE method proposed in Section 3.3 on the benchmark dataset and a real-world dataset with an artificial signal. We will show that our method is capable of learning the embedding of the time-evolving graph maintaining all dynamic properties in such a way that it is possible to detect the metastable states in the low-dimensional space. Besides the experiments with benchmark and real-world datasets, we compare our method with several state-of-the-art approaches for graph clustering.
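The first step of the construction, sampling the SDE (5), can be sketched with a plain Euler–Maruyama discretization. This is our own illustration: the gradient is taken by finite differences rather than derived by hand, and the noise scaling √(β⁻¹) follows the reconstructed equation above (conventions with an extra factor of 2 also appear in the literature):

```python
import numpy as np

def F(x, s=5):
    # s-well potential from Eq. (5): s angular wells on a ring of radius 1
    return np.cos(s * np.arctan2(x[1], x[0])) + 10.0 * (np.hypot(x[0], x[1]) - 1.0) ** 2

def grad_F(x, s=5, eps=1e-6):
    # central finite differences, to avoid hand-deriving the gradient
    g = np.zeros(2)
    for i in range(2):
        e = np.zeros(2)
        e[i] = eps
        g[i] = (F(x + e, s) - F(x - e, s)) / (2 * eps)
    return g

def simulate(s=5, beta=0.05, dt=1e-3, steps=5000, seed=0):
    # Euler-Maruyama discretization of dX_t = -grad F(X_t) dt + sqrt(1/beta) dW_t
    rng = np.random.default_rng(seed)
    x = np.array([1.0, 0.0])  # start on the ring of radius 1
    traj = np.empty((steps, 2))
    for t in range(steps):
        x = x - grad_F(x, s) * dt + np.sqrt(dt / beta) * rng.standard_normal(2)
        traj[t] = x
    return traj

traj = simulate()
```

The radial term 10(√(x₁² + x₂²) − 1)² confines the particle near the unit circle, while the cosine term creates the s angular wells between which the metastable hopping occurs.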
Experimental setup.
In order to test the performance of the method proposed in Section 3 and compare the results to other baseline models, we generate the synthetic data described in Section 4 with different configurations of interest, such as the number of vertices n, the number of time steps T, and the number of states s. The datasets are summarized in Table 2. For each dataset, we set the out-state probability to 0.1. We apply graphKKE with the Weisfeiler–Lehman graph kernel with the number of iterations h = 1 and regularization parameter η. To obtain the ground truth labels for G, we apply k-means clustering to the SDE trajectory S. For the Weisfeiler–Lehman kernel, the initial set of vertex labels l^(0) is defined to be {0, 1, 2, . . . , n}.
Figure 5: The eigenvalues of the Koopman operator approximated by graphKKE for the 5DynG-100 dataset. The large spectral gap after the fifth eigenvalue reveals the presence of the 5 metastable states in the dataset.
Results & Analysis.
We visualize the result only for the 5DynG-100 dataset. The resulting eigenvalues are shown in Figure 5. A spectral gap after the fifth eigenvalue indicates that the time-evolving graph G can be divided into s = 5 metastable states G = 𝔾_0 ∪ · · · ∪ 𝔾_4. This implies that the number of embedding dimensions m is equal to the number of dominant eigenvalues, or the number of states s, since all information about the long-term behavior of the time-evolving graph is contained within the eigenfunctions associated with the s dominant eigenvalues close to 1. Thus, each time-snapshot of G is embedded into a new vector space ℝ^s, as m = s, where the low-dimensional representations of the time-snapshots are the eigenfunctions associated with the s dominant eigenvalues.
Applying k-means to the eigenfunctions associated with the five dominant eigenvalues results in five clusters. Since each state of the time-evolving graph is characterized by some common pattern in the topological structure, we average the adjacency matrices of each state. Thus, if we have a time-evolving graph with s states G = 𝔾_0 ∪ · · · ∪ 𝔾_{s−1} and {A_0, . . . , A_{s−1}} is the set of corresponding subsets of adjacency matrices, then

A_i^avg = (1 / |A_i|) Σ_{j=0}^{|A_i|−1} A_i^j,

where A_i^j ∈ A_i, i = 0, . . . , s − 1. Each average adjacency matrix A_i^avg is associated with the average graph G_i^avg.
Figure 6 shows, for the 5DynG-100 dataset, the graphs of each state, where vertices are colored according to their degrees in the average graph G_i^avg, i = 0, . . . , s − 1. Each plot shows the average graph for one state, where the color of the vertices reflects the average number of edges. Vertices in brown have more edges than vertices in yellow, and each state has a different cluster of vertices with fewer edges. Thus, each state-dependent graph exhibits a unique topology making it distinguishable from the others.
Figure 7: A visualization of the real-world MovingPic dataset, where columns are the concentrations of particular species over time. Darker color specifies a higher concentration. (a) shows the original MovingPic dataset and (b) the dataset with the added 5% noisy signal, as described in Section 5.2.
In this experiment, we apply graphKKE to analyze a microbiome dataset, MovingPic, coming from Caporaso et al. [2011], where one male and one female were sampled daily at three body sites (gut, skin, and mouth) for 15 months and for 6 months, respectively. As a feature matrix, the OTU table D ∈ ℕ^{T×p} is used, where T is the number of time points and p is the number of OTUs. The operational taxonomic units (OTUs) are defined as groups of closely related microbes or bacteria species.
We use the microbiome profile only from the skin, and since the data does not have any perturbations such as antibiotics exposure or diseases, we add an artificial noisy signal to the data in the following way. A practical justification for adding noise to the signal is that the human microbiome might react not only to major perturbations such as diseases or antibiotics exposure but also to some short-term daily fluctuations such as a change of lifestyle or stress. Moreover, the noise is added to test the robustness of graphKKE. Let d_i = [d_i^0, d_i^1, . . . , d_i^{T−1}] be the T-dimensional column vector of OTU counts of the i-th species. OTUs with less than 30% of total reads are removed from the matrix D. We randomly choose 100 OTUs that are used to add the noisy signal. A vector of length T is constructed using a sine wave function:

z = R · sin(2πt / ω),

and then, for each of the chosen OTUs i,

d_i = d_i + max(0, z + ε · w · z), w ∼ Normal(0, 1),

where ε is the level of Gaussian noise; we consider several values of ε. Figure 7 shows the OTU table without the artificial signal and with the artificial signal.
The next step is the construction of a time-evolving graph. Let d_t = [d_{t1}, d_{t2}, . . . , d_{tp}] be the p-dimensional row vector of OTU counts at time point t, t = 0, . . . , T − 1. The raw OTU counts are typically normalized by the total cumulative count c_t = Σ_{i=1}^p d_{ti} in order to account for the different sequencing depths (Lo and Marculescu [2019]). Thus, the normalization of d_t by the total cumulative count results in the relative abundance vector

x_t = [d_{t1}/c_t, d_{t2}/c_t, . . . , d_{tp}/c_t]

for each time point t, t = 0, . . . , T − 1. The time-snapshots of the time-evolving graph G = (G_0, . . . , G_{T−1}) are then constructed as follows. First of all, we compute the Pearson correlation coefficient of each pair of OTUs (d_i, d_j), with i, j = 1, . . . , p, in order to define an initial co-occurrence graph. We choose a correlation threshold of 0.5 for this initial graph: edges with a Pearson correlation coefficient in the range [−0.5, 0.5] are removed. Furthermore, to construct the time-snapshots for each t = 0, . . . , T − 1, we use the OTU counts: if the OTU count for the current vertex is zero, we remove the edges connecting this vertex and its neighboring vertices. The statistics of the pre-processed data can be seen in Table 2.
Table 1: Hyperparameters for graphKKE used in the comparative analysis in Section 5.3, where σ is the bandwidth, h the number of iterations, and η the regularization parameter.
For the time-lagged graphs, we set Ĝ_t = G_{t+τ}. That is, for the chosen lag time τ = 1, G = (G_0, . . . , G_{T−2}) and Ĝ = (G_1, . . . , G_{T−1}). From the two time-evolving graphs G and Ĝ, we compute the Gram matrices K_GG and K_GĜ using the Weisfeiler–Lehman kernel, where the number of iterations is set to h = 1, and the regularization parameter to η = 0.
9. Figure 8: The eigenvalues of the Koopman operator ap-proximated by graphKKE for different percentages ofGaussian noise added to the MovingPic dataset.
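The graph-construction steps above (relative abundances, correlation thresholding at 0.5, and pruning of zero-count vertices) can be sketched as follows. This is an illustrative reimplementation with hypothetical function names, not the authors' code:

```python
import numpy as np

def build_time_evolving_graph(D, threshold=0.5):
    """Construct time-snapshot adjacency matrices from an OTU count table.

    D : (T, p) array of raw OTU counts (T time points, p OTUs).
    Returns a (T, p, p) array of 0/1 adjacency matrices.
    """
    # Relative abundances: normalize each row by its total cumulative count
    c = D.sum(axis=1, keepdims=True)
    X = D / np.maximum(c, 1)

    # Initial co-occurrence graph: Pearson correlation between OTU time series;
    # edges with |correlation| <= 0.5 are dropped
    corr = np.corrcoef(X, rowvar=False)            # (p, p)
    base = (np.abs(corr) > threshold).astype(int)
    np.fill_diagonal(base, 0)

    # Per-time-point snapshots: a vertex with zero count loses all its edges
    T, p = D.shape
    snapshots = np.repeat(base[None, :, :], T, axis=0)
    for t in range(T):
        zero = D[t] == 0
        snapshots[t][zero, :] = 0
        snapshots[t][:, zero] = 0
    return snapshots
```

The result is one symmetric adjacency matrix per time point, sharing a common correlation backbone but differing wherever species drop out.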
Results & Analysis.
The eigenvalues detected by graphKKE for different percentages of Gaussian noise are shown in Figure 8. The gap after the second eigenvalue, together with the fact that these eigenvalues are close to 1, implies the presence of two states in the time-evolving graph G. The spectral gap after the fourth eigenvalue indicates the presence of four states, but we are not aware of a biological interpretation of the additional two states, since the original study does not mention any potential perturbations. The experiment also shows that graphKKE is robust to noise in the data. In order to find the locations of the states, we cluster the time-snapshots into two states using k-means applied to the two normalized eigenfunctions associated with the two dominant eigenvalues, with the number of clusters set to 2.

The following experiment demonstrates whether the detected states in the benchmark and real-world datasets correspond to the ground truth labels. Moreover, we show that graphKKE outperforms other methods for learning embeddings of time-evolving graphs.

Experimental setup.
The goal of this experiment is to compare graphKKE to several state-of-the-art representation learning and graph clustering approaches using benchmark and real-world datasets. The proposed approach with two different graph kernels, the Gaussian and the Weisfeiler–Lehman kernel, is compared with graph2vec (Narayanan et al. [2017]) and the original WL kernel (Shervashidze et al. [2011]). The main idea of graph2vec is explained in Section 1 and the WL kernel is discussed in Section 3.2. Since the analysis is done for the graph clustering task, we apply k-means to the resulting embedding vectors of every approach. Several embedding dimensions were tested for graph2vec. The hyperparameters of graphKKE were chosen empirically: the combinations of hyperparameters yielding the largest spectral gap were used. The bandwidth σ of the Gaussian kernel is critical for the performance of graphKKE; the optimal choice of σ is beyond the scope of this paper. For the MovingPic dataset, the level of Gaussian noise is set to 0.05 in this experiment.

Figure 9: The large spectral gap after the second eigenvalue of the Koopman operator approximated by graphKKE indicates that the cholera infection dataset can be divided into two metastable states.
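For intuition, the spectral step behind Figures 8 and 9 can be sketched as follows: given the Gram matrices K_GG and K_GĜ, a regularized eigenvalue problem for the empirical Koopman operator is solved and the dominant eigenvalues are inspected for a spectral gap. This is a simplified sketch following the kernel transfer-operator formulation (cf. Klus et al. [2020]); the matrix M and the helper names below are illustrative assumptions, not the exact graphKKE implementation:

```python
import numpy as np

def koopman_eigen(K_gg, K_ggh, eta=1e-3, k=5):
    """Approximate the leading Koopman eigenvalues and eigenfunctions.

    K_gg  : Gram matrix of the time-snapshots with themselves (K_GG).
    K_ggh : Gram matrix of the snapshots with their time-lagged copies (K_GGhat).
    eta   : regularization parameter; k : number of eigenpairs returned.
    """
    n = K_gg.shape[0]
    # Regularized empirical estimate of the Koopman operator
    M = np.linalg.solve(K_gg + eta * np.eye(n), K_ggh)
    evals, evecs = np.linalg.eig(M)
    order = np.argsort(-evals.real)[:k]
    # Eigenfunctions evaluated at the training snapshots
    phi = (K_gg @ evecs[:, order]).real
    return evals[order].real, phi
```

A gap after the s-th returned eigenvalue then suggests s metastable states, and clustering the rows of phi (e.g., with k-means) assigns each time-snapshot to a state, as in Section 5.1.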
Evaluation Metric.
In order to assess the results of the clustering of the embedding vectors for all approaches, the Adjusted Rand Index (ARI) is used. A higher ARI corresponds to greater accuracy in correctly identifying the ground truth labels/states.
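As a concrete illustration (using scikit-learn's implementation, which is our assumption; the paper does not name a library): ARI is invariant to a permutation of the cluster label names and is corrected for chance, so unrelated clusterings score near zero.

```python
from sklearn.metrics import adjusted_rand_score

truth = [0] * 5 + [1] * 5   # ground truth states
perm  = [1] * 5 + [0] * 5   # the same partition with label names swapped
rand  = [0, 1] * 5          # an unrelated alternating labeling

assert adjusted_rand_score(truth, perm) == 1.0   # perfect agreement
assert adjusted_rand_score(truth, rand) < 0.1    # near chance level
```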
Results & Analysis.
The graph clustering results for all datasets using graphKKE and other state-of-the-art methods are presented in Table 3 (experimental datasets). For graph2vec, the embedding dimension of 5 was used, as the dimension with the best ARI, to compare its result with the results of the other approaches. We observe that both graph2vec and the WL kernel perform poorly on the benchmark and real-world datasets. One reason for the poor embeddings is that these two methods do not take into account the time information, which is crucial in time-evolving graphs with metastability.

Additionally, this experiment shows that the metastable states detected using the embedding of graphKKE correspond exactly to the ground truth labels. In the benchmark data, the ground truth labels are the labels of the k-means clustering of the trajectory S. In the case of the MovingPic dataset, the ground truth labels correspond to the time periods when the sine wave function of the artificial signal is zero (label 0) or greater than zero (label 1).

Having studied the performance of graphKKE on benchmark datasets and the real-world dataset with the artificial signal, we now describe the application of our graphKKE approach to the microbiome data. Such data is more challenging than the benchmark data because the real-world data generating process is more complex and also contains noise.

Figure 10: The same interaction graph constructed with Pearson correlation coefficients for two different states detected using graphKKE. Vertices are colored according to the average number of edges: (a) period of cholera infection, (b) period of recovery. Brown color implies that particular species (vertices) have more associations than species (vertices) in yellow.
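For the MovingPic dataset, these ground truth labels can be derived directly from the artificial signal described in Section 5.2; a minimal sketch, where T, R, and ω are hypothetical values rather than the ones used in the paper:

```python
import numpy as np

T, R, omega = 120, 10.0, 30.0           # hypothetical signal parameters
t = np.arange(T)
z = R * np.sin(2 * np.pi * t / omega)   # the sine wave added to the OTU counts
labels = (z > 0).astype(int)            # label 1 while the signal is positive
```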
Background.
The microbiome data which we analyze in this section comes from a study about recovery from Vibrio cholerae infection (Hsiao et al. [2014]). Fecal microbiota was collected during the acute diarrhea and recovery periods of cholera in a cohort of seven Bangladeshi adults. In our experiments, we chose one patient, since there is variation in the constituents of the gut microbiota among individuals (Durack and Lynch [2019]), which can bias the result of detecting metastable states such as the diarrhea and recovery periods. The pre-processed OTU table was obtained from Zackular et al. [2015]. The aim is to determine whether there are metastable states in this data and, if possible, the number of metastable states and their locations.

The time-evolving graph is constructed from the given OTU table in the same way as for the MovingPic dataset, using the relative abundance vectors and Pearson correlation coefficients.

Table 2: Statistics of each dataset used in this paper. (Only the column headers, including the mean ± std. notation, were recoverable.)

Table 3: Graph clustering results (ARI) for all approaches; graphKKE with the Weisfeiler–Lehman kernel outperforms the other methods. (Columns: Dataset, graph2vec, WL kernel, graphKKE + WL kernel, graphKKE + Gaussian kernel; rows grouped into experimental datasets and the real-world dataset. The recoverable entries for CholeraInf are 0.29 for graph2vec and 0.66 for the WL kernel.)

For the real-world microbiome dataset, we apply graphKKE using the Weisfeiler–Lehman graph kernel. We set the number of iterations to 5 and the regularization parameter to 0.
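To make the kernel step concrete, here is a minimal pure-Python sketch of a Weisfeiler–Lehman subtree kernel on unlabeled graphs, together with the two lagged Gram matrices K_GG and K_GĜ for τ = 1. It is an illustrative reimplementation (graphs given as adjacency dicts, all vertices starting with the same label), not the implementation used in the paper:

```python
import numpy as np
from collections import Counter

def wl_features(adj_list, h=1):
    """WL subtree label histograms for graphs given as {node: set(neighbors)}."""
    feats = []
    for adj in adj_list:
        labels = {v: '0' for v in adj}        # uniform initial labels
        hist = Counter(labels.values())
        for _ in range(h):                    # WL relabeling iterations
            labels = {v: labels[v] + '|' + ''.join(sorted(labels[u] for u in adj[v]))
                      for v in adj}
            hist.update(labels.values())
        feats.append(hist)
    return feats

def wl_gram(feats_a, feats_b):
    """Gram matrix of WL subtree kernels: dot products of label histograms."""
    K = np.zeros((len(feats_a), len(feats_b)))
    for i, fa in enumerate(feats_a):
        for j, fb in enumerate(feats_b):
            K[i, j] = sum(fa[key] * fb[key] for key in fa)
    return K

def lagged_grams(graphs, h=1):
    """K_GG and K_GGhat for lag time tau = 1."""
    f = wl_features(graphs, h)
    return wl_gram(f[:-1], f[:-1]), wl_gram(f[:-1], f[1:])
```

Identical graphs produce identical histograms and hence maximal kernel values, while topological differences (e.g., a triangle versus a path) lower the score.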
The resulting eigenvalues are shown in Figure 9. Two dominant eigenvalues close to 1 imply that the time-snapshots G_t ∈ G constructed from the cholera infection dataset can be divided into two states. Moreover, the eigenfunctions associated with these two dominant eigenvalues contain all information about the long-term behavior of the time-evolving graph G. Thus, by applying a clustering method to the two eigenfunctions, we can find the locations of the metastable states in G. We again use k-means with k = 2 on the first two eigenfunctions. This results in two subsets G = G_0 ∪ G_1. After clustering the eigenfunctions into two sets, we can compare the topological structures of the time-snapshots of the two states. We compute the average adjacency matrices of each set as discussed in Section 5.1. The result is shown in Figure 10. We see that, depending on the state, different clusters of vertices have different degrees. This is due to the fact that the cholera infection causes marked shifts in the microbiome composition. The biological meaning of these clusters and how they are related to the healthy/ill states are open questions that need to be analyzed in future work.

Moreover, we compare the metastable states detected by graphKKE with the initial time periods of diarrhea and recovery. The ARI is shown in Table 3 (real-world dataset).

In addition, using the resulting embedding (eigenfunctions), we can further analyze the time-evolving graph G; e.g., we can predict the state of G at the next time points, or we can find the probability of G returning to the diarrhea state if the person continues living in this area.

Conclusion

The large variety of species and complex interactions in the microbiome makes it challenging for researchers to analyze the responses of the microbiome to different perturbations, such as diseases or antibiotic exposure, and their influence on human health.
However, most studies aiming at understanding these dynamics are primarily focused on statistical constitution analysis, omitting more complex interactions that can be described as a time-evolving graph. One solution is to represent each time-snapshot of the time-evolving graph as a fixed-length feature vector. Many existing approaches learn embeddings either of static graphs or of substructures such as nodes, edges, or subgraphs, whereas for some systems it is of great importance to embed the entire time-snapshots of the time-evolving graph into a low-dimensional space preserving global temporal mechanisms such as metastability.

In this paper, we introduced an unsupervised approach (i.e., class labels of single time-snapshots are not required to learn the embedding) for learning a mapping that embeds time-snapshots of a time-evolving graph exhibiting metastable behavior as points in a low-dimensional vector space. Our experiments on synthetic benchmark and real-world data show that our approach is capable of learning a low-dimensional representation of the time-evolving graph that preserves the metastable behavior. This embedding can then be clustered in order to split the individual time-snapshots of the time-evolving graph into states. Moreover, one can also analyze the dynamics occurring in the time-evolving graph (e.g., the probability of jumping from one state to another or the probability that the graph will return to one of the states) and apply different machine learning techniques. Since we are dealing with graph-structured data, which usually represents interactions between objects, we can extract structural information pertaining to particular states.
The latter is beneficial in the case of biological interactions such as microbiome data, where it is crucial to understand the differences between states (e.g., healthy/ill). To this end, experimental results have shown that our approach can outperform several state-of-the-art methods for representation learning of graphs. For instance, the comparative analysis has shown that applying only the Weisfeiler–Lehman kernel to the time-evolving graph is not sufficient to capture the underlying dynamical graph patterns and, consequently, to detect the metastable sets. We have shown that graph kernels are a powerful tool not only for analyzing static graphs but also for analyzing time-evolving graphs. The transfer operator approach in combination with graph kernels yields a method capable not only of extracting structural information from each time-snapshot of the time-evolving graph but also of identifying the evolution patterns that may exist in time-evolving graphs with metastability over long periods of time.
Acknowledgments
This work was supported by the German Ministry for Education and Research (BMBF) within the Berlin Big Data Center and the Berlin Center for Machine Learning (01IS14013A and 01IS18037J) and the Forschungscampus MODAL (project grant 3FO18501) and funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy – The Berlin Mathematics Research Center MATH+ (EXC-2046/1, project ID: 390685689).
References
Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 855–864, 2016. doi: 10.1145/2939672.2939754.

Karsten M. Borgwardt and Hans-Peter Kriegel. Shortest-path kernels on graphs. In Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM), 2005.

Anton Bovier and Frank den Hollander. Metastability: a potential theoretic approach. In Proceedings of the International Congress of Mathematicians, pages 499–518, 2006.

J. Gregory Caporaso, Christian L. Lauber, Elizabeth K. Costello, Donna Berg-Lyons, Antonio Gonzalez, Jesse Stombaugh, Dan Knights, Pawel Gajer, Jacques Ravel, Noah Fierer, Jeffrey I. Gordon, and Rob Knight. Moving pictures of the human microbiome. Genome Biology, 12:R50, 2011. doi: 10.1186/gb-2011-12-5-r50.

Juliana Durack and Susan V. Lynch. The gut microbiome: Relationships with disease and opportunities for therapy. The Journal of Experimental Medicine, 216(1), 2019. doi: 10.1084/jem.20180448.

Palash Goyal, Sujit Rokka Chhetri, and Arquimedes Canedo. dyngraph2vec: Capturing network dynamics using dynamic graph representation learning. Knowledge-Based Systems, 187, 2019. doi: 10.1016/j.knosys.2019.06.024.

Ansel Hsiao, A. M. Shamsir Ahmed, Sathish Subramanian, Nicholas W. Griffin, Lisa L. Drewry, William A. Petri, Rashidul Haque, Tahmeed Ahmed, and Jeffrey I. Gordon. Members of the human gut microbiota involved in recovery from Vibrio cholerae infection. Nature, 515:423–426, 2014. doi: 10.1038/nature13738.

U Kang, Hanghang Tong, and Jimeng Sun. Fast random walk graph kernel. In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), pages 828–838, 2012. doi: 10.1137/1.9781611972825.71.

Kristian Kersting, Nils M. Kriege, Christopher Morris, Petra Mutzel, and Marion Neumann. Benchmark data sets for graph kernels, 2016. URL http://graphkernels.cs.tu-dortmund.de.

Stefan Klus, Péter Koltai, and Christof Schütte. On the numerical approximation of the Perron–Frobenius and Koopman operator. Journal of Computational Dynamics, 3:51–79, 2016. doi: 10.3934/jcd.2016003.

Stefan Klus, Andreas Bittracher, Ingmar Schuster, and Christof Schütte. A kernel-based approach to molecular conformation analysis. Journal of Chemical Physics, 149, 2018. doi: 10.1063/1.5063533.

Stefan Klus, Brooke E. Husic, Mattes Mollenhauer, and Frank Noé. Kernel methods for detecting coherent structures in dynamical data. Chaos: An Interdisciplinary Journal of Nonlinear Science, 2019. doi: 10.1063/1.5100267.

Stefan Klus, Ingmar Schuster, and Krikamol Muandet. Eigendecompositions of transfer operators in reproducing kernel Hilbert spaces. Journal of Nonlinear Science, 30, 2020. doi: 10.1007/s00332-019-09574-z.

Quoc Le and Tomas Mikolov. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning (ICML), 2014.

Chieh Lo and Radu Marculescu. MetaNN: accurate classification of host phenotypes from metagenomic data using neural networks. BMC Bioinformatics, 20:314, 2019. doi: 10.1186/s12859-019-2833-2.

Annamalai Narayanan, Mahinthan Chandramohan, Rajasekar Venkatesan, Lihui Chen, Yang Liu, and Shantanu Jaiswal. graph2vec: Learning distributed representations of graphs. arXiv preprint, 2017.

Mingdong Ou, Peng Cui, Jian Pei, and Wenwu Zhu. Asymmetric transitivity preserving graph embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016.

Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 701–710, 2014. doi: 10.1145/2623330.2623732.

Liam P. Shaw, Hassan Bassam, Chris P. Barnes, A. Sarah Walker, Nigel Klein, and Francois Balloux. Modelling microbiome recovery after antibiotics using a stability landscape framework. The ISME Journal, 13:1–12, 2019. doi: 10.1038/s41396-019-0392-1.

Nino Shervashidze, S. V. N. Vishwanathan, Tobias Petri, Kurt Mehlhorn, and Karsten M. Borgwardt. Efficient graphlet kernels for large graph comparison. In 12th International Conference on Artificial Intelligence and Statistics, pages 488–495, 2009.

Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M. Borgwardt. Weisfeiler–Lehman graph kernels. Journal of Machine Learning Research, 12:2539–2561, 2011.

Daixin Wang, Peng Cui, and Wenwu Zhu. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1225–1234, 2016. doi: 10.1145/2939672.2939753.

Rong Xu and QuanQiu Wang. Towards understanding brain-gut-microbiome connections in Alzheimer's disease. BMC Systems Biology, 10, 2016. doi: 10.1186/s12918-016-0307-y.

Joseph P. Zackular, Nielson T. Baxter, Grace Y. Chen, and Patrick D. Schloss. Manipulation of the gut microbiota reveals role in colon tumorigenesis. mSphere, 2015. doi: 10.1128/mSphere.00001-15.

Lekui Zhou, Yang Yang, Xiang Ren, Fei Wu, and Yueting Zhuang. Dynamic network embedding by modeling triadic closure process. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018.