Non-parametric regression for networks
aa r X i v : . [ s t a t . M E ] S e p Non-parametric regression for networks
Katie E. Severn, Ian L. Dryden and Simon P. PrestonUniversity of Nottingham, University Park, Nottingham, NG7 2RD, UK
Abstract
Network data are becoming increasingly available, and so there is a need to develop suit-able methodology for statistical analysis. Networks can be represented as graph Laplacianmatrices, which are a type of manifold-valued data. Our main objective is to estimate a re-gression curve from a sample of graph Laplacian matrices conditional on a set of Euclideancovariates, for example in dynamic networks where the covariate is time. We develop anadapted Nadaraya-Watson estimator which has uniform weak consistency for estimation usingEuclidean and power Euclidean metrics. We apply the methodology to the Enron email corpusto model smooth trends in monthly networks and highlight anomalous networks. Another mo-tivating application is given in corpus linguistics, which explores trends in an authors writingstyle over time based on word co-occurrence networks.
Networks are of wide interest and able to represent many different phenomena, for example so-cial interactions and connections between regions in the brain (Kolaczyk, 2009; Ginestet et al.,2017). The study of dynamic networks has recently increased as more data of this type is becom-ing available (Rastelli et al., 2018), where networks evolve over time. In this paper we developsome non-parametric regression methods for modelling and predicting networks where covariatesare available. An application for this work is the study of dynamic networks derived from theEnron email corpus, in which each network corresponds to communications between employeesin a particular month (Diesner et al., 2005). Another motivating application is the study of evolv-ing writing styles in the novels of Jane Austen and Charles Dickens, in which each network is arepresentation of a novel based on word co-occurences, and the covariate is the time that writingof the novel began (Severn et al., 2020). In both applications, the goal is to model smooth trendsin the structure of the dynamic networks as they evolve over time.An example of previous work on dynamic network data is Friel et al. (2016) who embedded nodesof bipartite dynamic networks in a latent space, motivated by networks representing the connec-tion of leading Irish companies and board directors. We will also use embeddings in our work,although the bipartite constraints are not present. Further approaches include object functionalprincipal components analysis (Dubey and Mueller, 2019) applied to time-varying networks fromNew York taxi trip data; multi-scale time-series modelling (Kang et al., 2017) applied to magne-toencephalography data in neuroscience; and quotient space space methodology applied to brainarterial networks (Guo and Srivastava, 2020). 1he analysis of networks is a type of object oriented data analysis (Marron and Alonso, 2014), andimportant considerations are to decide what are the data objects and how are they represented. Weconsider datasets where each observation is a weighted network, denoted G m = ( V, E ) , compris-ing a set of nodes, V = { v , v , . . . , v m } , and a set of edge weights, E = { w ij : w ij ≥ , ≤ i, j ≤ m } , indicating nodes v i and v j are either connected by an edge of weight w ij > , or elseunconnected (if w ij = 0 ). An unweighted network is the special case with w ij ∈ { , } . Werestrict attention to networks that are undirected and without loops, so that w ij = w ji and w ii = 0 ,then any such network can be identified with its graph Laplacian matrix L = ( l ij ) , defined as l ij = ( − w ij , if i = j P k = i w ik , if i = j for ≤ i, j ≤ m . The graph Laplacian matrix can be written as L = D − A , in terms ofthe adjacency matrix, A = ( w ij ) , and degree matrix D = diag ( P mj =1 w j , . . . , P mj =1 w mj ) = diag ( A1 m ) , where m is the m -vector of ones. The i th diagonal element of D equals the degreeof node i . The space of m × m graph Laplacian matrices is of dimension m ( m − / and is L m = { L = ( l ij ) : L = L T ; l ij ≤ ∀ i = j ; L1 m = m } , (1)where m is the m -vector of zeroes. In fact the space L m is a closed convex subset of the coneof centred symmetric positive semi-definite m × m matrices and L m is a manifold with corners(Ginestet et al., 2017).Since the sample space L m for graph Laplacian data is non-Euclidean, standard approaches to non-parametric regression cannot be applied directly. In this paper, we use the statistical frameworkintroduced in Severn et al. (2020) for extrinsic analysis of graph Laplacian data, in which “extrin-sic” refers to the strategy of mapping data into a Euclidean space, where analysis is performed,before mapping back to L m . The choice of mapping enables freedom in the choice of metric usedfor the statistical analysis, and in various applications with manifold-valued data analysis there isevidence of advantage in using non-Euclidean metrics (Dryden et al., 2009; Pigoli et al., 2014).A summary of the key steps in the extrinsic approach is:i) embedding to another manifold M m by raising the graph Laplacian matrix to a power,ii) mapping from M m to a (Euclidean) tangent space T ν ( M m ) in which to carry out statisticalanalysis,iii) inverse mapping from the tangent space T ν ( M m ) to the embedding manifold M m ,iv) reversing the powering in i), and projecting back to graph Laplacian space L m ,which we explain in more detail as follows. First write L = U Ξ U T by the spectral decompo-sition theorem, with Ξ = diag ( ξ , . . . , ξ m ) and U = ( u , . . . , u m ) , where { ξ i } i =1 ,...,m are theeigenvalues, which are non-negative as any L is positive semi-definite, and { u i } i =1 ,...,m are thecorresponding eigenvectors of L . We consider the following map which raises the graph Laplacianto the power α > : F α ( L ) = L α = U Ξ α U T : L m → Image( L m ) ⊂ M m . (2)2n this paper we take M m to be the Euclidean space of symmetric m × m matrices. In terms ofF α ( · ) we define the power Euclidean distance between two graph Laplacians as d α ( L , L ) = k F α ( L ) − F α ( L ) k , (3)where k X k = p trace( X T X ) is the Frobenius norm, also known as the Euclidean norm. For thespecial case α = 1 , (3) is just the Euclidean distance. Severn et al. (2020) further considered a Pro-crustes distance which includes minimization over an orthogonal matrix, approximately allowingrelabelling of the nodes, and in this case the embedding manifold M m is a Riemannian manifoldknown as the size-and-shape space (Dryden and Mardia, 2016, p.99). However, in this paper forsimplicity and because the labelling is known, we shall just consider the power Euclidean metric.After the embedding the data are mapped to a tangent space T ν ( M m ) of the embedding manifoldat ν using the bijective transformation π − ν : M m → T ν ( M m ) . In our applications we take ν = 0 and the tangent co-ordinates are v = vech( H F α ( L ) H T ) , where vech() is the vector of elements above and including the diagonal and H is the Helmert sub-matrix(Dryden and Mardia, 2016, p49). The main point of note is that the tangent space is a Euclideanspace of dimension m ( m − / . Hence, multivariate linear statistical analysis can be carried outin T ν ( M m ) ; for example, Severn et al. (2020) considered estimation, two-sample hypothesis tests,and linear regression.After carrying out statistical analysis, the fitted values are then transformed back to the graphLaplacian space as follows: P L ◦ G α ◦ π ν : T ν ( M m ) → L m (4)where G α : M m → M m is a map that reverses the power, and P L : M m → L m is the projectionto the closest point in graph Laplacian space using Euclidean distance. The projection P L is ob-tained by solving a convex optimization problem using quadratic programming, and the solutionis therefore unique. See Severn et al. (2020) for full details of this framework.For visualising results, it is useful to map the data and fitted regression lines in L m into R . Inthis paper we do so using principal component analysis (PCA) such that the two plotted dimen-sions reflect the two orthogonal dimensions of greatest sample variability in the tangent space(Severn et al., 2020). We first review the classical Nadaraya-Watson estimator (Watson, 1964; Nadaraya, 1965) be-fore defining an analogous version for data on L m . Consider the regression problem where wewant to predict an unknown variable y ( x ) ∈ R with known covariate x ∈ R p for the dataset ofindependent and identically distributed random variables ( { Y , X } , . . . , { Y n , X n } ) observed at ( { y , x } , . . . , { y n , x n } ) with E {| Y |} < ∞ . The aim is to estimate the regression function m ( x ) = E [ y ( x ) | x ] . ˆ m ( x ) = P ni =1 K h ( x − x i ) y i P ni =1 K h ( x − x i ) , (5)where K h ≥ is a kernel function with bandwidth h > .Consider now a version of the regression problem with y ( x ) replaced with an m × m graph Lapla-cian matrix L ( x ) ∈ L m with known covariates x ∈ R p and dataset ( { L , x } , . . . , { L n , x n } ) with E {| ( L ) ij |} < ∞ , i = 1 , . . . , m ; j = 1 , . . . , m . We wish to estimate the regression function Λ ( x ) = E [ L ( x ) | x ] . A natural analogue of ˆ m ( x ) in (5) for graph Laplacian data given covariate, x ∈ R p , is ˆ L NW ( x ) = P ni =1 K h ( x − x i ) L i P ni =1 K h ( x − x i ) = n X i =1 W hi ( x ) L i , (6)where W hi ( x ) = K h ( x − x i ) P ni =1 K h ( x − x i ) ≥ and note that P ni =1 W hi ( x ) = 1 . A common choice of kernel function is the Gaussian kernel K h ( u ) = 1 h √ π exp (cid:18) − k u k h (cid:19) , (7)which is bounded above and strictly positive for all u . We use a truncated version of (7) such that K h ( u ) = 0 for k u k > c (with c large) in order that this truncated kernel has compact support, asrequired by theoretical results presented later. Wherever ˆ L NW is defined (meaning that at least oneof the K h ( x − x i ) is non-zero) it is a sum of positively weighted graph Laplacians. Since the space L m is a convex cone (Ginestet et al., 2017), itself defined as the sum of positively weighted graphLaplacians, thus ˆ L NW ( t ) ∈ L m as required.The estimator in (6) can equivalently be written as the graph Laplacian that minimises a weightedsum of squared Euclidean, d , distances to the sample data ˆ L NW ( x ) = arg inf L ∈L m n X i =1 W hi ( x ) d ( L i , L ) = arg inf L ∈L m n X i =1 W hi ( x ) k L i − L k , (8)using weighted least squares. In principle, the Euclidean distance d in (8) can be replaced with adifferent distance metric, d , though solving for the estimator entails an optimisation on the mani-fold L m , which can be theoretically and computationally challenging. Hence instead we generaliseto other distances via an extrinsic approach, and define ˆ L NW,d ( x ) = P L arg inf L ∈M m n X i =1 W hi ( x ) d ( L i , L ) ! , (9)which is simpler provided, as here, the embedding manifold M m is chosen such that the opti-misation is straightforward. The projection is needed to map back to graph Laplacian space L m .4or the power Euclidean metric, d α , consider the Nadaraya-Watson estimator in the tangent space T ν ( M m ) , ˆ L M m ,α ( x ) = n X i =1 W hi ( x ) π − ν ( F α ( L i )) , (10)in terms of which, after mapping back to L m using (4), the resulting Nadaraya-Watson estimatorin the graph Laplacian space is ˆ L NW,α ( x ) = P L ◦ G α ◦ π ν (cid:16) ˆ L M m ,α ( x ) (cid:17) . (11)When α = 1 this simplifies to (6). First we show that the Nadaraya-Watson estimator for graph Laplacians (6) is uniformly weaklyconsistent.Let J n = E (cid:26)Z (cid:13)(cid:13)(cid:13) ˆ L NW ( x ) − Λ ( x ) (cid:13)(cid:13)(cid:13) µ (d x ) (cid:27) (12)where µ is the probability measure of x . Proposition 1.
Suppose the kernel function K h is non-negative on R p , bounded above, has com-pact support and is strictly positive in a neighbourhood of the origin. If E {k L k } < ∞ , as h → and nh d → ∞ it follows that J n → . Hence the Nadaraya-Watson estimator ˆ L NW ( x ) isuniformly weakly consistent for the true regression function Λ ( x ) .Proof. Consider the univariate regression problem for the ( i, j ) th element of ˆ L NW ( x ) . FromDevroye and Wagner (1980) we know that under the conditions of the proposition we have J n,ij = E (Z (cid:12)(cid:12)(cid:12)(cid:12)(cid:16) ˆ L NW ( x ) (cid:17) ij − ( Λ ( x )) ij (cid:12)(cid:12)(cid:12)(cid:12) µ (d x ) ) → , i = 1 , . . . , m ; j = 1 , . . . , m, as h → and nh d → ∞ ; and since J n = m X i =1 m X j =1 J n,ij , thus J n → .The result can be extended to the power Euclidean distance based Nadaraya-Watson estimator (11).Let J n,α = E (cid:26)Z (cid:13)(cid:13)(cid:13) ˆ L NW,α ( x ) − Λ ( x ) (cid:13)(cid:13)(cid:13) µ (d x ) (cid:27) . (13)where µ is the probability measure of X . Proposition 2.
Under the conditions of Proposition 1 it follows that J n,α → . Hence the powerEuclidean Nadaraya-Watson estimator ˆ L α ( x ) is uniformly weakly consistent for the true regressionfunction Λ ( x ) . roof. First embed the graph Laplacians in the Euclidean manifold M m and map to a tangentspace T ν ( M m ) . Consider the univariate regression problem for the ( i, j ) th element of π − ν ( F α ( L ( x ))) .Again from Devroye and Wagner (1980); Spiegelman and Sacks (1980) we know that under theconditions of the proposition we have uniform weak consistency in the tangent space. J n,α,ij = E (Z (cid:12)(cid:12)(cid:12)(cid:12)(cid:16) ˆ L M m ,α ( x ) (cid:17) ij − π − ν ( F α ( Λ ( x ))) ij (cid:12)(cid:12)(cid:12)(cid:12) µ (d x ) ) → , i = 1 , . . . , m ; j = 1 , . . . , m, as h → and nh d → ∞ . Also, using the continuous mapping theorem and Pythagorean argumentsas in Severn et al. (2020), we have J n,α ≤ m X i =1 m X j =1 J n,α,ij → , as h → and nh d → ∞ . The result that the power Euclidean Nadaraya-Watson estimator is uniformly weakly consistentgives reassurance that the method is a sensible practical approach to non-parametric regressionfor predicting networks. The result is asymptotic, however, which leaves open the question ofhow to choose the bandwidth, h , in practice. One way to do so is to select it via cross validation(Efron and Tibshirani, 1993) as follows. Denote by ˆ L − i ( x ; h ) a Nadaraya-Watson estimator at x ,based on distance metric d , with bandwidth h , trained on all the training observations excluding the i th. Selection of bandwidth by cross validation then involves choosing h to minimise the criterion n X i =1 d (cid:16) L i , ˆ L − i ( x i ; h ) (cid:17) . (14) The Enron dataset was made public during the legal investigation of Enron by the Federal EnergyRegulatory Commission (Klimt and Yang, 2004) and an overview can be found in Diesner et al.(2005). Similar to Shetty and Adibi (2004) we use this data to form social networks between m = 151 employees and the data are available from Rossi and Ahmed (2015). For each monthwe create a network with employees as nodes. The edges between nodes have weights that are thenumber of emails exchanged between the two employees in the given month. The networks weconsider are for the whole months from June 1999 (month 1) to May 2002 (month 36), and westandardise by dividing by the trace of the graph Laplacian for each month. The aim is to modelsmooth trends in the structure of the dynamic networks as they evolve over time, and we also wishto highlight anomalous months where the network is rather unusual compared to the fitted trend.In Figure 1 we plot the distances between consecutive monthly graph Laplacians using Euclideandistance (a) and square root Euclidean distance (b). Some of the largest successive distances are6 a) . . . . . month E u c li dean d i s t an c e (b) . . . . . month S qua r e r oo t E u c li dean d i s t an c e Figure 1:
Distances, d ( L i − , L i ) , i = 2 , . . . , , between consecutive observations for the monthlyEnron networks for (a) the Euclidean metric, d , and (b) the square root Euclidean metric, d . at times − , − , − , − , − , and these are possible candidate positions foranomalous networks that are rather different.We provide a PCA plot of the first two PC scores in Figure 2(a),(b) and include the Nadaraya-Watson estimator projected into the space of the first two PCs. Here the bandwidth has beenchosen by cross-validation as h = 2 for the Euclidean case and h = 1 for the square root metric.The Nadaraya-Watson estimator provides a smooth path through the data, and the structure isclearer in the square root metric plot.We are interested in finding anomalies in the Enron dynamic networks and so we compute thedistances from each network to the fitted value from the Nadaraya-Watson estimate. Figure 2shows these residual distances of each graph Laplacian to the fitted Nadaraya-Watson values for (c)the Euclidean metric and (d) the square root metric. Some of the largest residuals are months 1,7,35for Euclidean and 7,33,34,35 for the square root metric, and these are candidates for anomalies.From Figure 2(b) it looks like there is an approximate horseshoe shape in the PC score plot whichcould be an example of the horseshoe effect (Kendall, 1971; Diaconis et al., 2008; Morton et al.,2017). We might conclude there is a change point in the data around months 20-26 from theseplots but this may be misleading (Kendall, 1970). Explained in Mardia et al. (1979, page 412),the horseshoe effect occurs when the distances which are “large”, between data points, appear thesame as those that are “moderate”. Morton et al. (2017) described this as a “saturation property”of the metric, and so on the PCA plot the point corresponding to a ‘large’ time is pulled in closerto time 1 than we intuitively would expect.As an alternative to PCA, which seeks to address this horseshoe effect, we consider multidimen-sional scaling (MDS) with a Mahalanobis metric in the tangent space (Mardia et al., 1979, p.31)7 a) −0.2 0.0 0.2 0.4 . . . . coordinate 1 c oo r d i na t e (b) −0.6 −0.2 0.0 0.2 0.4 0.6 − . . . . . . coordinate 1 c oo r d i na t e (c) . . . . . . month d ( L ^ , L ) (d) . . . . . . month d ( L ^ , L ) Figure 2:
PCA plots showing the data and Nadaraya-Watson curves, and residual plots for theEnron network data. In each plot the red digits indicate the observation number (month index).In the upper plots the black lines show the Nadaraya-Watson regression curves in the space ofthe first two principal components, using (a) the Euclidean metric, d , and h = 2 ; (b) the Squareroot Euclidean metric, d , and h = 1 . We performed the calculations for a various values h =0 . , , , , , and the chosen value of h was whichever that was optimal with respect to (14) . Plots(c) and (d) are corresponding residual plots showing distance d ( ˆ L i , L i ) between the fitted values ˆ L i and the observations L i . a) −40 −20 0 20 40 60 − − Coordinate 1 C oo r d i na t e (b) −6 −4 −2 0 2 4 6 − . − . . . . Coordinate 1 C oo r d i na t e (c) −0.1 0.0 0.1 0.2 0.3 − . . . Coordinate 1 C oo r d i na t e
12 34 56 78910111213141516171819202122232425262728293031323334 3536 (d) −1.0 −0.5 0.0 0.5 1.0 − . − . − . . . . Coordinate 1 C oo r d i na t e Figure 3:
The red digits indicate the month of the data. MDS plots using the Mahalanobis metricfor (a) the Enron data with α = 1 and (b) with α = using an overall estimate from ρ , and theMahalanobis metric when ρ is estimated to maximize the variance explained by PC1 for (c) theEnron data with α = 1 and (d) with α = . L k and L l , at times k and l respectively, which is: q ( π − ( F α ( L k )) − π − ( F α ( L l )) − µ ) T Σ − kl ( π − ( F α ( L k )) − π − ( F α ( L l )) − µ ) , where µ and Σ kl are the mean and covariance matrix of π − ( F α ( L k )) − π − ( F α ( L l )) respectively.Here we take µ as zero and consider an isotropic AR(1) model which has covariance matrix Σ kl = σ ρ | k − l | − ρ I m ( m − , which is a diagonal matrix where the diagonal elements are the variance of elements and wehave assumed a 0 covariance between any other elements. Writing y k = π − ( F α ( L k )) and v k = π − ( F α ( L k − )) we estimate ρ by least squares ρ = P nk =2 ( y Tk v k ) P nk =2 ( v Tk v k ) , and we take σ = 1 asthis is just an overall scale parameter. The Mahalanobis metric between graph Laplacians, L k and L l , can now be written as = r − ρρ | k − l | ( π − (( F α ( L k )) − π − ( F α ( L l ))) T ( π − ( F α ( L k )) − π − ( F α ( L l )))= r − ρρ | k − l | k π − ( F α ( L k )) − π − ( F α ( L l )) k = r − ρρ | k − l | k F α ( L k ) − F α ( L l ) k = r − ρρ | k − l | d α ( L k , L l ) . The plot of MDS with the Mahalanobis distance are given in Figure 3(a)-(b). In both plots thereare large distances between the first few and last few observations compared to the central observa-tions, which is broadly in keeping with Figure 1(a),(b) although the middle observations do seemtoo close together in the MDS plots. We consider an alternative estimate in choosing ρ that max-imises the variance explained by the first PC scores for each example, shown in Figure 3(c)-(d).These final MDS plots are more in agreement with the distance plots of Figure 1. In particular inFigure 3(d) we see that months 7, 34, 35, 36 look rather different from the rest.Finally we consider the main features of all the results from Figure 1-Figure 3 and we see that the7th, 34th and 35th months stand out as strong anomalies. The 7th month corresponds to December1999, and this is picked out to be an anomaly in Wang et al. (2014), believed to coincide withEnron’s tentative sham energy deal with Merrill Lynch created to meet profit expectations andboost the stock price. Month 34 and 35 correspond to March and April 2002 these correspond to theformer Enron auditor, Arthur Andersen, being indicted for obstruction of justice (The Guardian,2006). We consider an application where it is of interest to analyze dynamic networks from the novelsof Jane Austen and Charles Dickens. The 7 novels of Austen and 16 novels of Dickens wererepresented as samples of network data by Severn et al. (2020). Each novel is represented by anetwork where each node of the network is a word, and edges are formed with weights proportionalto the number of times a pair of words co-occurs closely in the text. For each novel we producea network counting pairwise word co-occurrences, and words are said to co-occur if they appear10 a) −0.02 0.00 0.02 − . . . coordinate 1 c oo r d i na t e (b) −0.08 0.00 0.08 − . . . coordinate 1 c oo r d i na t e (c) −0.02 0.00 0.02 − . . . coordinate 1 c oo r d i na t e
67 3 154 267 3 154 22012 17 1615 21181913109 228 231114 2012 17 1615 21181913109 228 231114 (d) −0.08 0.00 0.08 − . . . coordinate 1 c oo r d i na t e
67 3 154 22012 17161521 181913109 228 231114 (e) −0.02 0.00 0.02 − . . . coordinate 1 c oo r d i na t e (f) −0.08 0.00 0.08 − . . . coordinate 1 c oo r d i na t e Figure 4: Regression paths for Austen novels (blue) between the years 1794 to 1815, numbered1-7 according to chronology of the novels, and for Dickens novels (red) between the years 1836to 1870, numbered 8-23; using (left to right) d = d and d = d , with bandwidth (top to bottom) h = 1 , , , of which h = 4 gave the smallest value of the cross-validation criterion (14).11ithin five words of each other in the text. A choice that needs to be made is if we allow co-occurrences over sentence boundaries and chapter boundaries (Evert, 2008, Section 3), and for thisdataset we allow it. The data are obtained from CLiC (Mahlberg et al., 2016).We take the node set V as the m = 1000 most common words across all the novels of Austen andDickens. A pre-processing step for the novels is to normalise each graph Laplacian, in order toremove the gross effects of different lengths of the novels, by dividing each graph Laplacian byits own trace, resulting in a trace of 1 for each novel. Our key statistical goal is to investigate theauthors’ evolving writing styles, by carrying out non-parametric regression with a graph Laplacianresponse on the year t that each novel was written.We apply the Nadaraya-Watson regression to the Jane Austen and Charles Dickens networks sep-arately to predict their writing styles at different times. The response is a graph Laplacian andthe covariate is time t for each novel, with a separate regression for each novelist. We comparedusing the metrics d and d . For each author a Nadaraya-Watson estimate was produced for ev-ery 6 months within the period the author was writing. We compared different bandwidths, h , inthe Gaussian kernel. The results are shown in Figure 4 plotted on the first and second principalcomponent space for all the novels.For both metrics for Dickens when h = 1 the regression lines are not at all smooth. For bothmetrics with h = 2 the regression line for Dickens appears to show an initial smooth trend, then aturning point around the years 1850 and 1851 (between David Copperfield and Bleak House whichare novels 16 and 17 in Figure 4). After 1851 there is much less dependence on time. This changein structure is especially evident in the h = 4 plot for both metrics, which has the smallest valueof the cross-validation criterion (14) out of these choices h ∈ { , , } . In the year 1851 Dickenshad a tragic year including his wife having a nervous breakdown, his father dying and his youngestchild dying (Charles Dickens Info, 2020). We see that the possible turning point is around thesame time as these significant events.As there are far fewer novels written by Austen it is less obvious if there is any turning point in herwriting, however it is clear that Lady Susan (novel 1) is an anomaly, not fitting with the regressioncurve that does follow Austen’s other works more closely. Lady Susan is Austen’s earliest work,and is a short novella published 54 years after Austen’s death. The two applications presented involve a scalar covariate, but the Nadaraya-Watson estimator isappropriate to more general covariates, e.g. spatial covariates. A further extension would be toadapt the method of kriging, also referred to as Gaussian process prediction. Kriging is a geospatialmethod for prediction at points on a random field (e.g. see Cressie, 1993), and Pigoli et al. (2016)considered kriging for manifold-valued data. The kriging predictor of an unknown graph Laplacian L ( x ) on a random field with known coordinates x for the dataset ( { L , x } , . . . , { L n , x n } ) is ofthe form Z ( x ) = P ni =1 b ( x i ) L i , where the weights, b ( x i ) , are determined by minimizing the meansquare prediction error for a given covariance function.The Nadaraya-Watson estimator can also be applied in a reverse setting where some variable t i is dependent on the graph Laplacian L i , this can be written as t i = t ( L i ) . This could be used12f, for example, one had the times networks were produced and then wanted to predict the time anew network was produced. In this case the Nadaraya-Watson estimator is a linear combination ofknown t i values, weighted by the graph Laplacian distances, given by ˆ t ( L ) = P ni =1 K h ( d ( L , L i )) t i P ni =1 K h ( d ( L , L i )) , (15)where d can be any metric between two graph Laplacians. Severn (2019) provided an applicationof this approach using the Gaussian kernel defined in (7), predicting the time that a novel waswritten using the network graph Laplacian as a covariate.Other metrics could also be used, for example the Procrustes metric of Severn et al. (2020). Tosolve (9) for the Procrustes metric, the algorithm for obtaining a weighted generalised Procrustesmean given in Dryden and Mardia (2016, Chapter 7) can be implemented.In Euclidean space there are more general results for the Nadaraya-Watson estimator includingweak convergence in L p norm, rather than p = 2 results that we have used (Devroye and Wagner,1980; Spiegelman and Sacks, 1980). More general results also exist, e.g. see Walk (2002), in-cluding strong consistency. It will be interesting to explore which of these results can be extendedto graph Laplacians, although the additive properties of the p = 2 case have been particularlyimportant in our work. Acknowledgments
This work was supported by the Engineering and Physical Sciences Research Council [grant num-ber EP/T003928/1]. The datasets were derived from the following resources available in the publicdomain: The Network Repository http://networkrepository.com and CLiC https://clic.bham.ac.uk
References
Statistics for Spatial Data,
Revised Edition. Wiley, New York.Devroye, L. P. and Wagner, T. J. (1980). Distribution-free consistency results in nonparametricdiscrimination and regression function estimation.
Ann. Statist. , 8(2):231–239.Diaconis, P., Goel, S., and Holmes, S. (2008). Horseshoes in multidimensional scaling and localkernel methods.
The Annals of Applied Statistics , 2(3):777–807.Diesner, J., Frantz, T. L., and Carley, K. M. (2005). Communication networks from the Enron emailcorpus “it’s always about the people. Enron is no different”.
Computational & MathematicalOrganization Theory , 11(3):201–228.Dryden, I. L., Koloydenko, A., and Zhou, D. (2009). Non-Euclidean statistics for covariancematrices, with applications to diffusion tensor imaging.
Ann. Appl. Stat. , 3(3):1102–1123.13ryden, I. L. and Mardia, K. V. (2016).
Statistical shape analysis with applications in R . WileySeries in Probability and Statistics. John Wiley & Sons, Ltd., Chichester, second edition.Dubey, P. and Mueller, H.-G. (2019). Functional models for time-varying random objects.
Journalof the Royal Statistical Society, Series B . to appear.Efron, B. and Tibshirani, R. (1993).
An Introduction to the Bootstrap . Chapman and Hall, London.Evert, S. (2008). Corpora and collocations.
Corpus linguistics. An international handbook ,2:1212–1248.Friel, N., Rastelli, R., Wyse, J., and Raftery, A. E. (2016). Interlocking directorates in Irish com-panies using a latent space model for bipartite networks.
Proceedings of the National Academyof Sciences , 113(24):6629–6634.Ginestet, C. E., Li, J., Balachandran, P., Rosenberg, S., Kolaczyk, E. D., et al. (2017). Hypoth-esis testing for network data in functional neuroimaging.
The Annals of Applied Statistics ,11(2):725–750.Guo, X. and Srivastava, A. (2020). Representations, metrics and statistics for shape analysis ofelastic graphs. In , pages 3629–3638, Los Alamitos, CA, USA. IEEE Computer Society.Kang, X., Ganguly, A., and Kolaczyk, E. D. (2017). Dynamic networks with multi-scale temporalstructure. In .IEEE Signal Processing Society.Kendall, D. G. (1970). A mathematical approach to seriation.
Philosophical Transactions of theRoyal Society of London. Series A, Mathematical and Physical Sciences , 269(1193):125–134.Kendall, D. G. (1971). Abundance matrices and seriation in archaeology.
Probability Theory andRelated Fields , 17(2):104–112.Klimt, B. and Yang, Y. (2004). Introducing the Enron corpus. In
Proceedings of the Conferenceon Email and Anti-Spam , Mountain View, CA.Kolaczyk, E. D. (2009).
Statistical analysis of network data: methods and models . SpringerScience & Business Media.Mahlberg, M., Stockwell, P., de Joode, J., Smith, C., and O’Donnell, M. B. (2016). CLiC Dick-ens: novel uses of concordances for the integration of corpus stylistics and cognitive poetics.
Corpora , 11(3):433–463.Mardia, K. V., Kent, J. T., and Bibby, J. M. (1979).
Multivariate Analysis . Academic Press,London.Marron, J. S. and Alonso, A. M. (2014). Overview of object oriented data analysis.
BiometricalJournal , 56(5):732–753.Morton, J. T., Toran, L., Edlund, A., Metcalf, J. L., Lauber, C., and Knight, R. (2017). Uncoveringthe horseshoe effect in microbial analyses.
MSystems , 2(1):e00166–16.Nadaraya, ´E. (1965). On non-parametric estimates of density functions and regression curves.
Theory of Probability & Its Applications , 10(1):186–190.14igoli, D., Aston, J. A. D., Dryden, I. L., and Secchi, P. (2014). Distances and inference forcovariance operators.
Biometrika , 101(2):409–422.Pigoli, D., Menafoglio, A., and Secchi, P. (2016). Kriging prediction for manifold-valued randomfields.
Journal of Multivariate Analysis , 145:117 – 131.Rastelli, R., Latouche, P., and Friel, N. (2018). Choosing the number of groups in a latent stochasticblock model for dynamic networks.
Network Science , 6(4):469–493.Rossi, R. A. and Ahmed, N. K. (2015). The network data repository with interactive graph analyticsand visualization. In
AAAI . http://networkrepository.com.Severn, K. E. (2019).
Manifold-valued data analysis of networks and shapes . PhD thesis, Univer-sity of Nottingham.Severn, K. E., Dryden, I. L., and Preston, S. P. (2020). Manifold valued data analysis of samplesof networks, with applications in corpus linguistics. arXiv e-prints , page arXiv:1902.08290.Shetty, J. and Adibi, J. (2004). The Enron email dataset database schema and brief statistical report.
Information sciences institute technical report, University of Southern California , 4(1):120–128.Spiegelman, C. and Sacks, J. (1980). Consistent window estimation in nonparametric regression.
Ann. Statist.
Modeling Uncertainty: An Examinationof Stochastic Theory, Methods, and Applications , pages 201–223, New York, NY. Springer US.Wang, H., Tang, M., Park, Y., and Priebe, C. E. (2014). Locality statistics for anomaly detectionin time series of graphs.
IEEE Transactions on Signal Processing , 62(3):703–717.Watson, G. S. (1964). Smooth regression analysis.