A Latent Space Model for Multilayer Network Data
Juan Sosa, Universidad Nacional de Colombia, Colombia ∗
Brenda Betancourt, University of Florida, United States †

Abstract
In this work, we propose a Bayesian statistical model to simultaneously characterize two or more social networks defined over a common set of actors. The key feature of the model is a hierarchical prior distribution that allows us to represent the entire system jointly, achieving a compromise between dependent and independent networks. Among other things, such a specification easily allows us to visualize multilayer network data in a low-dimensional Euclidean space, generate a weighted network that reflects the consensus affinity between actors, establish a measure of correlation between networks, assess cognitive judgements that subjects form about the relationships among actors, and perform clustering tasks at different social instances. Our model's capabilities are illustrated using several real-world data sets, taking into account different types of actors, sizes, and relations.
Keywords:
Bayesian Modeling, Cognitive Social Structures Data, Latent Space Models, Markov Chain Monte Carlo, Multilayer Network Data, Social Networks.
The study of the information that emerges from the interconnectedness among autonomous elements in a system (and from the elements themselves) is extremely important for understanding many phenomena. Structures formed by these elements (individuals or actors) and their interactions (ties or connections), commonly known as networks, are popular in many research areas such as finance (studying alliances and conflicts among countries as part of the global economy), social science (studying interpersonal social relationships and social schemes of collaboration such as legislative cosponsorship networks), biology (studying arrangements of interacting genes, proteins, or organisms), epidemiology (studying the spread of an infectious disease), and computer science (studying the Internet, the World Wide Web, and communication networks), to mention just a few examples, primarily because interactions typically arise under several contexts or points of view.

∗ [email protected]   † bbetancourt@ufl.edu

Relational structures consisting of J types of interactions (layers or views) established over a common set of I actors are regularly referred to as either multilayer or multiview network data, which results in a sequence of J adjacency matrices Y_1, …, Y_J, with Y_j = [y_{i,i',j}], for i, i' = 1, …, I, i ≠ i', and j = 1, …, J, each having structural zeros along the main diagonal (note that y_{i,i',j} ≡ y_{i',i,j} for undirected relations). This type of data is very frequent nowadays. For instance, the interactions that employees have with others according to their roles at work are not necessarily the same as the interpersonal relationships they build among themselves; however, the corresponding social structures defined by these two types of relationships may have some characteristics in common.
Thus, given the richness of information provided in Y = {Y_j}, our main goal consists of modeling dependencies both within and between layers in order to formally test features of the social dynamics in the system.

A very popular statistical model in the literature for a single network is the latent position model given in Hoff et al. (2002). According to this model, interaction probabilities marginally depend on how close or far apart actors are in a latent "social space" (a K-dimensional vector space, typically R^K, in which each individual occupies a fixed position). This formulation is appealing because latent structures based on distances naturally induce transitivity and homophily, which are typical features found in many social networks. Other meaningful advances in latent space models for networks can be found in Nowicki and Snijders (2001), Schweinberger and Snijders (2003), Hoff (2005, 2008, 2009), Handcock et al. (2007), Linkletter (2007), Krivitsky and Handcock (2008), Krivitsky et al. (2009), and Li et al. (2011).

Here, we extend Hoff's latent position model in order to describe the generative process of cross-sectional multilayer network data. The key feature of our model is a hierarchical prior distribution that allows us to characterize the entire system jointly. Such a prior specification is very convenient for multiple reasons. First, the model provides a direct description of actors' roles within and across networks at global and specific levels. Second, it provides the tools for representing several network features effortlessly at any instance. Finally, the proposed framework accounts for dependence structures between layers, which is key to performing formal tests about actor and network characteristics. Perhaps the closest in spirit to our modeling strategy is the work given in Gollini and Murphy (2016) and Salter-Townshend and McCormick (2017).
Unlike our approach, Gollini and Murphy (2016) introduce a latent space model assuming that the interaction probabilities in each network view are explained by a unique latent variable. Later, Salter-Townshend and McCormick (2017) consider the same assumption but in the context of a multivariate Bernoulli likelihood, which leads to a clear estimate of inter-view dependence. Our proposal builds on the latent configuration of these models by considering a full hierarchical prior specification that provides a parsimonious characterization of actors from many perspectives.

Aside from the previous work, other alternatives for studying multilayer network data have emerged during the last two years from the latent space modeling perspective. In brain connectomics, Durante and Dunson (2018) present a Bayesian nonparametric approach via mixture modeling, which reduces dimensionality and efficiently incorporates network information within each mixture component by leveraging latent space representations. More recently, Wang et al. (2019) propose a method to estimate a common structure and low-dimensional individual-specific deviations from replicated networks, based on a logistic regression mapping combined with a hierarchical singular value decomposition.
In turn, D'Angelo and collaborators extend latent space models in other contexts, by considering node-specific effects (D'Angelo et al., 2018), network-specific parameters and edge-specific covariates (D'Angelo et al., 2019), and, finally, a clustering structure in the framework of an infinite mixture distribution (D'Angelo et al., 2020). Other important advances from a frequentist point of view are available in Zhang (2020). Additional work related to cross-sectional multilayer network data includes community detection (e.g., Han et al., 2015, Reyes and Rodriguez, 2016, Paul and Chen, 2016, Gao et al., 2019, Paez et al., 2019, and Paul et al., 2020), and perception assessment in cognitive social structures (e.g., Swartz et al., 2015, Sewell, 2019, and Sosa and Rodriguez, 2021). Finally, from the dynamic point of view, there is a large variety of approaches to modeling network evolution over time (e.g., Durante and Dunson, 2014, Hoff, 2015, Sewell and Chen, 2015, 2016, 2017, Gupta et al., 2018, Kim et al., 2018, Turnbull, 2020, Betancourt et al., 2020).

Our contribution is several-fold. In Section 2, we present our proposal for modeling multilayer network data, including prior elicitation. In Section 3, we discuss the topics of identifiability and model selection for our approach. Next, in Section 4, we provide two illustrations using popular data sets in the literature, for which we develop formal tests involving network correlation and perceptual agreement, as well as a full analysis of the social dynamics. In Section 5, we carry out a cross-validation study on additional datasets in order to test the predictive capabilities of our proposed model. Finally, concluding remarks and directions for future work are provided in Section 6.

Latent space models
Since the foundational work of Hoff et al. (2002), generalized linear mixed models have become a popular alternative for modeling networks. In particular, consider an undirected binary network Y = [y_{i,i'}] in which the y_{i,i'}s, i, i' = 1, …, I, i < i', are assumed to be conditionally independent with interaction probabilities ϑ_{i,i'} ≡ Pr(y_{i,i'} = 1 | ζ, γ_{i,i'}) = Φ(ζ + γ_{i,i'}), where Φ(·) denotes the cumulative distribution function of the standard Gaussian distribution (other link functions can be considered), ζ is a fixed effect representing the global propensity of observing an edge between actors i and i', and γ_{i,i'} is an unobserved dyad-specific random effect representing any additional patterns unrelated to those captured by ζ. Following results in Hoover (1982) and Aldous (1985) (see also Hoff, 2008), it can be shown that if the matrix of random effects [γ_{i,i'}] is jointly exchangeable, then there exists a symmetric function α(·,·) and a sequence of independent random variables (vectors) u_1, …, u_I such that γ_{i,i'} = α(u_i, u_{i'}). It is mainly through α(·,·) that we are able to capture relevant features of the network. A number of potential formulations for α(·,·) have been explored in the literature to date; see Minhas et al. (2019) and Sosa and Buitrago (2021) for a review.

In particular, consider the latent position model (LPM) given in Hoff et al. (2002). This model assumes that each actor i has an unknown position u_i in a social space of latent characteristics, typically u_i = (u_{i,1}, …, u_{i,K}) ∈ R^K, where K is assumed to be known, and that the probability of an edge between two actors may decrease as the latent characteristics of the individuals become farther apart from each other.
In this spirit, the latent effects can be specified as γ_{i,i'} = −e^θ ‖u_i − u_{i'}‖, where e^θ serves as a weighting factor that regulates the contribution attributed to the latent effects, and therefore, y_{i,i'} | ζ, θ, u_i, u_{i'} ~ind Ber(Φ(ζ − e^θ ‖u_i − u_{i'}‖)). In order to perform a fully Bayesian analysis, we must specify a prior distribution on the model parameters; a standard choice that works well in practice consists in setting mutually independent prior distributions, ζ ~ N(0, τ²_ζ), θ ~ N(0, τ²_θ), and u_i ~iid N_K(0, σ² I), for constants τ²_ζ, τ²_θ, σ² > 0, although other similar formulations are available (e.g., Rastelli et al., 2019). Thus, the entire model has IK + 2 unknown parameters to estimate, namely, ζ, θ, u_1, …, u_I, associated with the hyperparameters τ²_ζ, τ²_θ, and σ², which need to be picked sensibly to ensure appropriate model performance. In our experience, letting τ²_ζ = τ²_θ = 3 and σ² = 1/… is a reasonable choice; however, other heuristics are possible (e.g., Krivitsky et al., 2009). Figure 1 provides a directed acyclic graph (DAG) representation of the LPM for a single network. In the following section we present an extension of this model that is suited for multilayer networks.

Figure 1: DAG representation of the LPM for a single network. Circles represent either random variables or random vectors, and the edges convey conditional independence. Squares represent fixed quantities (constants).

Here, we present our approach to simultaneously model a set of J ≥ 2 undirected binary networks Y_1, …, Y_J, defined over a common set of I actors, with Y_j = [y_{i,i',j}] for j = 1, …, J.
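Before moving on, the single-network LPM just reviewed can be sketched in a few lines of code; the multilayer model extends exactly this computation layer by layer. The helper names below (`lpm_probs`, `lpm_loglik`) are ours, not the authors'.

```python
# Sketch of the single-network LPM: Pr(y_{ii'} = 1) = Phi(zeta - e^theta * ||u_i - u_{i'}||).
# Helper names are illustrative, not from the paper.
import math
import numpy as np

def _phi(x):
    """Standard Gaussian CDF, elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def lpm_probs(zeta, theta, U):
    """Interaction probabilities for all dyads given latent positions U (I x K)."""
    D = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=-1)  # pairwise distances
    P = _phi(zeta - math.exp(theta) * D)
    np.fill_diagonal(P, 0.0)  # structural zeros along the main diagonal
    return P

def lpm_loglik(Y, zeta, theta, U):
    """Bernoulli log-likelihood over the upper triangle (undirected network)."""
    P = lpm_probs(zeta, theta, U)
    iu = np.triu_indices(Y.shape[0], k=1)
    p = np.clip(P[iu], 1e-12, 1.0 - 1e-12)
    return float(np.sum(Y[iu] * np.log(p) + (1 - Y[iu]) * np.log(1.0 - p)))
```

Note that a larger e^θ makes the edge probability decay faster with latent distance, which is the regularization role played by θ in the text.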
Since each network contains relevant information about a particular aspect of the social dynamics, instead of just fitting independent LPMs to each network, the main idea behind our approach consists of borrowing information across networks by means of a hierarchical prior specification on the interaction probabilities ϑ_{i,i',j}.

Our model is a direct hierarchical extension of the LPM for a single network that accommodates relevant features associated with multilayer network data. Here, we still assume that observations are conditionally independent, y_{i,i',j} | ϑ_{i,i',j} ~ind Ber(ϑ_{i,i',j}), and construct a hierarchical prior on the array [ϑ_{i,i',j}] by letting

    ϑ_{i,i',j} = Φ(ζ_j − e^{θ_j} ‖u_{i,j} − u_{i',j}‖),   (1)

where the additional index j makes explicit the reference to network j, i.e., ζ_j is the global propensity of observing an edge between actors i and i' in network j, e^{θ_j} is a weighting factor that regulates the contribution attributed to the latent effects in network j, and u_{i,j} = (u_{i,j,1}, …, u_{i,j,K}) is the latent position of actor i in network j. In this context, note that the interpretation of the latent structure remains unchanged: if u_{i,j} and u_{i',j} "move away" from each other in the social space, then ‖u_{i,j} − u_{i',j}‖ increases, and therefore the probability of observing an edge between actors i and i' in network j may decrease, depending on the regularization provided by e^{θ_j}.

If mutually independent prior distributions were assigned to each set of ζ_j s, θ_j s, and u_{i,j} s, then such a formulation would be equivalent to independently fitting a LPM to each network. Instead, we consider a hierarchical prior distribution that characterizes the heterogeneity of the model parameters across networks.
Our approach parsimoniously places conditionally independent Gaussian priors as follows:

    ζ_j | µ_ζ, τ²_ζ ~iid N(µ_ζ, τ²_ζ),   θ_j | µ_θ, τ²_θ ~iid N(µ_θ, τ²_θ),   u_{i,j} | η_i, σ² ~ind N_K(η_i, σ² I).   (2)

On the one hand, (µ_ζ, τ²_ζ) and (µ_θ, τ²_θ) parameterize the sampling distributions that describe the heterogeneity across networks in terms of the fixed effects ζ_1, …, ζ_J and the weighting log-factors θ_1, …, θ_J, respectively. On the other hand, the mean η_i = (η_{i,1}, …, η_{i,K}) can be conveniently interpreted as the average "global" position of actor i in relation to the dynamics that define social interactions in the system. Now, we can capture similarities among the observed networks and borrow information across them, mainly by placing a common prior distribution on η_1, …, η_I. Thus, we let

    η_i | ν, κ² ~iid N_K(ν, κ² I),   σ² ~ IG(a_σ, b_σ),   (3)

in order to characterize between-actor mean sampling variability in a straightforward fashion. Furthermore, note that the sampling variability of the latent positions, σ², is assumed to be constant across actors and networks. We believe this is a sensible choice because inferences on the latent positions seem to be invariant when we eliminate such an assumption.

Finally, the model is completed by specifying prior distributions in a conjugate fashion on the remaining model parameters:

    µ_ζ ~ N(m_ζ, v_ζ),   µ_θ ~ N(m_θ, v_θ),   ν ~ N_K(m_ν, V_ν),   τ²_ζ ~ IG(a_ζ, b_ζ),   τ²_θ ~ IG(a_θ, b_θ),   κ² ~ IG(a_κ, b_κ),   (4)

where a_σ, b_σ, a_ζ, b_ζ, a_θ, b_θ, a_κ, b_κ, m_ζ, v_ζ, m_θ, v_θ, m_ν, V_ν are fixed hyperparameters. Therefore, the full set of model parameters is

    Υ ≡ Υ_{I,J,K} = (ζ_1, …, ζ_J, θ_1, …, θ_J, u_{1,1}, …, u_{I,J}, η_1, …, η_I, σ², µ_ζ, τ²_ζ, µ_θ, τ²_θ, ν, κ²),

which includes IK(J + 1) + 2J + K + 6 unknown quantities to estimate.
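To make the hierarchy in (1) through (4) concrete, the following sketch draws one realization of the interaction-probability array from the prior. The function name and the specific hyperparameter values are illustrative placeholders, not the paper's elicited values.

```python
# Forward simulation of the MNLPM prior hierarchy, eqs. (1)-(4).
# Hyperparameter values here are illustrative placeholders.
import math
import numpy as np

def _phi(x):
    """Standard Gaussian CDF, elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def _inv_gamma(rng, a, b):
    """One draw from IG(a, b): the reciprocal of a Gamma(a, rate=b) draw."""
    return 1.0 / rng.gamma(shape=a, scale=1.0 / b)

def sample_mnlpm_prior(I, J, K, rng):
    # Top level: global hyperparameters, eq. (4).
    mu_zeta, mu_theta = rng.normal(0.0, 1.0, size=2)
    nu = rng.normal(0.0, 1.0, size=K)
    tau2_zeta, tau2_theta = _inv_gamma(rng, 3.0, 1.0), _inv_gamma(rng, 3.0, 1.0)
    kappa2, sigma2 = _inv_gamma(rng, 3.0, 1.0), _inv_gamma(rng, 3.0, 1.0)
    # Middle level: network-specific effects and actors' global positions, eqs. (2)-(3).
    zeta = rng.normal(mu_zeta, math.sqrt(tau2_zeta), size=J)
    theta = rng.normal(mu_theta, math.sqrt(tau2_theta), size=J)
    eta = nu + math.sqrt(kappa2) * rng.normal(size=(I, K))
    # Bottom level: layer-specific positions and interaction probabilities, eq. (1).
    U = eta[:, None, :] + math.sqrt(sigma2) * rng.normal(size=(I, J, K))
    Theta = np.empty((I, I, J))
    for j in range(J):
        Uj = U[:, j, :]
        D = np.linalg.norm(Uj[:, None, :] - Uj[None, :, :], axis=-1)
        Theta[:, :, j] = _phi(zeta[j] - math.exp(theta[j]) * D)
        np.fill_diagonal(Theta[:, :, j], 0.0)
    return Theta, U, eta
```

The shared draws of η_i, µ_ζ, and µ_θ across all J layers are precisely what induces the between-layer dependence described in the text.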
Figure 2 shows a DAG representation of our Multilayer Network Latent Position Model (MNLPM) given in (1), (2), (3), and (4). Note the clear hierarchical structure of the model. This model for multilayer network data is such that the resulting joint marginal distribution of the data is fully jointly exchangeable, which means that the joint distribution of {y_{i,i',j}} is the same as the distribution of {y_{π₁(i),π₂(i'),π₃(j)}} only if π₁ = π₂, where π₁ and π₂ are permutations of [I], and π₃ is a permutation of [J]. Full joint exchangeability (rather than a weaker form of exchangeability) is particularly attractive in this setting because all indexes i, i' (and potentially j) refer to the same set of actors (Sosa and Rodriguez, 2021).

Figure 2: DAG representation of the MNLPM for multilayer network data.

Once again, careful elicitation of the hyperparameters is key to ensure appropriate model performance, since the model is sensitive to this choice. In our experience, the following heuristic procedure produces adequate results for a wide variety of multilayer network datasets. In the absence of prior information, we set m_θ = m_ζ = 0, m_ν = 0, and V_ν = v_ν I in order to center the model, roughly speaking, around an Erdős–Rényi model (Erdős and Rényi, 1959), and also to ensure that the prior distributions are invariant to rotations of the latent space (see also Section 3.1).

Now, we establish some constraints that allow us to appropriately contrast models constructed with different values of K. Thus, for the prior distributions on the variance parameters, we naturally let a_ζ = a_θ = a_σ = a_κ = 3, which leads to a proper prior with finite moments and a coefficient of variation equal to 1. Under this setup, it can be shown that marginally Var(ζ_j) = b_ζ/2 + v_ζ, Var(θ_j) = b_θ/2 + v_θ, and Var(u_{i,j,k}) = b_σ/2 + b_κ/2 + v_ν, each of which we split equally among all terms.
First, resembling a regular LPM, we set Var(u_{i,j,k}) = 1/3 a priori, such that b_σ = b_κ = 2/9 and v_ν = 1/9. Then, from a naive application of the delta method, we obtain that

    Var(θ_j) ≐ log²( −Φ⁻¹(ϑ₀) / E(‖u_{i,j} − u_{i',j}‖) ),

where ϑ₀ is the prior probability of observing an edge between any two actors (which can be tuned to reflect prior information), and u_{i,j} − u_{i',j} ~ N_K(0, (2/3) I) as long as Var(u_{i,j,k}) = Var(η_{i,k}) = 1/3. In our experiments, we set b_θ = Var(θ_j) and v_θ = Var(θ_j)/2 with ϑ₀ = 0.…. Finally, we set b_ζ = Var(ζ_j) and v_ζ = Var(ζ_j)/2 with Var(ζ_j) = 4 E(‖u_{i,j} − u_{i',j}‖), which allows a wide range of values of ζ_j.

Table 1 displays specific hyperparameter values for K = 1, …, 6. In addition, Figure 3 shows histograms of 10,000 independent realizations from the induced marginal prior distribution of the interaction probabilities, ϑ_{i,i',j}, for several values of K. Note that these distributions are quite similar, exhibiting a mode at ϑ_{i,i',j} = 0.… (as expected) and then a somewhat uniform behavior with a slight peak towards ϑ_{i,i',j} = 1.

Table 1: Expected value and standard deviation of the prior latent distance d_{i,i',j} = ‖u_{i,j} − u_{i',j}‖, rate hyperparameters (b_ζ, b_θ, b_σ, b_κ), and variance components (v_ζ, v_θ, v_ν) for the prior distributions on the variance parameters, for K = 1, …, 6.

For a given latent dimension K, the posterior distribution p(Υ | Y) is explored using Markov chain Monte Carlo methods (MCMC; e.g., Gamerman and Lopes, 2006). The computational algorithm entails a combination of Gibbs sampling and Metropolis–Hastings steps (e.g., Haario et al., 2001). Details about the MCMC algorithm can be found in Appendix A.
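The induced marginal prior of ϑ_{i,i',j} (the object histogrammed in Figure 3) can be approximated by forward Monte Carlo. The sketch below assumes the equal variance split described above (with b_σ = b_κ = 2/9 and v_ν = 1/9 as illustrative values) and uses standard-normal placeholders for the (µ, τ²) hierarchy of ζ_j and θ_j, so it should be read as illustrative rather than as a reproduction of the paper's exact elicitation.

```python
# Monte Carlo draws from the induced marginal prior of the interaction
# probability theta_{ii'j}. Hyperparameter values are illustrative, not Table 1's.
import math
import numpy as np

def induced_prior_draws(K, n=5000, seed=0):
    rng = np.random.default_rng(seed)
    a = 3.0
    b_sigma = b_kappa = 2.0 / 9.0   # rate hyperparameters (assumed equal split)
    v_nu = 1.0 / 9.0
    phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    out = np.empty(n)
    for s in range(n):
        sigma2 = 1.0 / rng.gamma(a, 1.0 / b_sigma)   # IG(a, b) via reciprocal Gamma
        kappa2 = 1.0 / rng.gamma(a, 1.0 / b_kappa)
        nu = rng.normal(0.0, math.sqrt(v_nu), size=K)
        eta = nu + math.sqrt(kappa2) * rng.normal(size=(2, K))   # two actors
        u = eta + math.sqrt(sigma2) * rng.normal(size=(2, K))
        zeta = rng.normal()    # placeholder for the (mu_zeta, tau2_zeta) hierarchy
        theta = rng.normal()   # placeholder for the (mu_theta, tau2_theta) hierarchy
        out[s] = phi(zeta - math.exp(theta) * np.linalg.norm(u[0] - u[1]))
    return out
```

Histogramming `induced_prior_draws(K)` for K = 1, …, 6 reproduces the kind of prior sensitivity check shown in Figure 3.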
In the following sections we discuss the issues of identifiability and model selection, which are common in the latent space modeling framework.

Figure 3: Marginal prior distribution of the interaction probabilities ϑ_{i,i',j} for K = 1, …, 6 (panels (a) through (f)).

Our proposed MNLPM inherits the property of invariance to rotations and reflections of the social space from the simple latent space model of Hoff et al. (2002). Indeed, for any K × K orthogonal matrix Q, the likelihood associated with the reparameterization ũ_{i,j} = Q u_{i,j} is independent of Q since ‖u_{i,j} − u_{i',j}‖ = ‖ũ_{i,j} − ũ_{i',j}‖. This lack of identifiability does not affect our ability to make inferences about the interaction probabilities ϑ_{i,i',j} (which are identifiable). However, it does hinder our ability to provide posterior estimates of latent-position-based measures (e.g., network correlation), including the latent positions themselves.

We address this invariance issue using a common post-processing strategy used by Hoff et al. (2002) and many others. In particular, the problem of identifiability is addressed through a post-processing step in which the B posterior samples are rotated/reflected to a shared coordinate system. For each sample Υ^(b), for b = 1, …, B, an orthogonal transformation matrix Q̃^(b) is obtained by minimizing the Procrustes distance,

    Q̃^(b) = argmin_{Q ∈ S_K} tr{ (E^(1) − E^(b) Q)^T (E^(1) − E^(b) Q) },   (5)

where S_K denotes the set of K × K orthogonal matrices and E^(b) is the I × K matrix whose I rows correspond to the transposes of η_1^(b), …, η_I^(b). The minimization problem in (5) can be easily solved using singular value decompositions (e.g., see Borg and Groenen, 2005).
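In practice this Procrustes alignment is a one-liner built on the SVD; a minimal sketch (function names ours):

```python
# Procrustes alignment of posterior samples, eq. (5): with L D R^T the SVD of
# E1^T Eb, the optimal orthogonal matrix is Q = R L^T.
import numpy as np

def procrustes_rotation(E_ref, E_b):
    """Orthogonal Q minimizing tr{(E_ref - E_b Q)^T (E_ref - E_b Q)}."""
    L, _, Rt = np.linalg.svd(E_ref.T @ E_b)   # E_ref^T E_b = L diag(d) Rt
    return Rt.T @ L.T                          # Q = R L^T

def align_samples(E_list):
    """Rotate every sampled coordinate matrix onto the first one."""
    E_ref = E_list[0]
    return [E_b @ procrustes_rotation(E_ref, E_b) for E_b in E_list]
```

If a sample differs from the reference only by a rotation or reflection, the aligned version recovers the reference exactly, which is what makes posterior averaging of latent positions meaningful.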
Indeed, Q̃^(b) = R^(b) L^(b)T, where L^(b) D^(b) R^(b)T is the singular value decomposition of E^(1)T E^(b). Once the matrices Q̃^(1), …, Q̃^(B) have been obtained, posterior inference for the latent positions of our model is based on the transformed coordinates ũ^(b)_{i,j} = Q̃^(b) u^(b)_{i,j} and η̃^(b)_i = Q̃^(b) η^(b)_i.

In general, setting K = 2 for the dimension of the latent space is a sensible choice, as it simplifies the visualization and description of social relationships. However, our objective goes beyond a mere description of multilayer network data, and in consequence the value of K plays a critical role in the results.

The network literature has largely focused on the Bayesian Information Criterion (BIC; e.g., Hoff, 2005, Handcock et al., 2007, Airoldi et al., 2009). However, the BIC is often inappropriate for hierarchical models, since the hierarchical structure implies that the effective number of parameters will typically be less than the actual number of parameters in the likelihood. An alternative to the BIC is the Watanabe-Akaike Information Criterion (WAIC; Watanabe, 2010, 2013, Gelman et al., 2014),
    WAIC(K) = −2 Σ_{j=1}^J Σ_{i<i'} log E( p(y_{i,i',j} | Υ) | Y ) + 2 p_WAIC,

where p_WAIC = Σ_{j=1}^J Σ_{i<i'} Var( log p(y_{i,i',j} | Υ) | Y ) is the effective number of parameters, and the expectation and variance are taken with respect to the posterior distribution of Υ (approximated in practice with the MCMC samples). Lower WAIC values indicate better expected predictive performance.
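Given pointwise log-likelihood evaluations over posterior samples, the WAIC in the Gelman et al. (2014) form can be computed as below; the matrix layout is an assumption of this sketch, not the paper's code.

```python
# WAIC from an (S x n) matrix of pointwise log-likelihoods: S posterior samples,
# n observed dyads y_{ii'j} stacked across layers.
import numpy as np

def waic(loglik):
    S = loglik.shape[0]
    # lppd: log pointwise predictive density, via a log-mean-exp over samples.
    lppd = np.sum(np.logaddexp.reduce(loglik, axis=0) - np.log(S))
    # Effective number of parameters: pointwise posterior variance of the log-lik.
    p_waic = np.sum(np.var(loglik, axis=0, ddof=1))
    return -2.0 * lppd + 2.0 * p_waic
```

Model selection over K then amounts to fitting the MNLPM for each candidate dimension and keeping the one with the smallest `waic` value.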
In this section, we illustrate and evaluate the performance of our MNLPM using two benchmark data sets: the bank wiring room data of Roethlisberger and Dickson (2003) and the friendship cognitive social structure of Krackhardt (1987). The main contributions involving the characterization of the social dynamics with our MNLPM are formal approaches to measuring network correlation and perceptual agreement, which are respectively illustrated with each data set.
These are the observational data on I = 14 Western Electric (Hawthorne Plant) employees from the bank wiring room presented in Roethlisberger and Dickson (2003). The employees worked in a single room and include two inspectors (actors 1 and 2), three solderers (actors 12, 13, and 14), and nine wiremen or assemblers (actors 3 to 11). The authors gathered data about J = 4 symmetric interaction categories: participation in horseplay (Horseplay, network 1), participation in arguments about open windows (Arguments, network 2), friendship (Friendship, network 3), and antagonistic behaviour (Antagonism, network 4). This dataset is nowadays considered a standard reference for testing models for multilayer network data (e.g., Bartz-Beielstein et al., 2014, Liu, 2020, and Abdollahpouri et al., 2020). Figure 4 displays a visualization of all the relational layers.
Figure 4: Visualization of the bank wiring room data: (a) Horseplay, (b) Arguments, (c) Friendship, (d) Antagonism. Inspectors are shown in blue, solderers in yellow, and wiremen or assemblers in red.
We implement our MNLPM for the bank wiring room data following the computational approach described in Section 3. The results presented below are based on B = 10,000 samples from the posterior distribution, obtained after thinning the original Markov chains every 10 observations and a burn-in period of 100,000 iterations. All chains mix reasonably well. The left panel of Figure 5 shows the WAIC computed for several MNLPMs fitted using a range of latent dimensions of the social space. This criterion clearly supports K = 3 (WAIC = 197.1) as an optimal choice, which is the latent dimension we use in all our analyses henceforth. The effective sample sizes of the model parameters resulting from the MCMC algorithm discussed above range from 4,081 to 10,000. The right panel of Figure 5 displays the log-likelihood chain associated with the latent dimension that optimizes the WAIC, which shows no signs of lack of convergence.

Figure 5: (a) WAIC values used to select the latent dimension K of the social space for the Bayesian analysis of the bank wiring room data using our MNLPM, and (b) log-likelihood chain associated with the value of K that optimizes the WAIC (K = 3).

Unlike other models for multilayer network data (e.g., Salter-Townshend and McCormick, 2017), our approach provides a straightforward mechanism for constructing a "consensus" network. Indeed, the average positions η_1, …, η_I can be used to generate a weighted network, υ_{i,i'} = Φ(µ_ζ − e^{µ_θ} ‖η_i − η_{i'}‖), that "collapses" all the relational layers into a single network by weighting them according to the mean parameters of the hierarchical prior distribution. The consensus network can be very useful when an overall summary of the social dynamics is required. Figure 6 displays heat maps for the matrix of posterior means E(υ_{i,i'} | Y) and the proportion of observed links across networks, J⁻¹ Σ_{j=1}^J y_{i,i',j}. Although the estimate provided by our model is "denser" than the empirical proportion, note that the estimates are similar. This also suggests that the model correctly characterizes the data generating process.

Figure 6: Consensus network estimates for the bank wiring room data. The left panel provides the posterior mean E(υ_{i,i'} | Y) under our MNLPM, and the right panel shows the proportion of observed links across networks, J⁻¹ Σ_{j=1}^J y_{i,i',j}.
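The consensus network υ is a deterministic function of (µ_ζ, µ_θ, η_1, …, η_I), so its posterior mean can be approximated by evaluating it at each posterior draw and averaging; a sketch (function names ours):

```python
# Consensus weighted network: upsilon_{ii'} = Phi(mu_zeta - e^{mu_theta} * ||eta_i - eta_{i'}||).
import math
import numpy as np

def consensus_network(mu_zeta, mu_theta, Eta):
    """Eta: (I x K) matrix of global positions for one posterior draw."""
    D = np.linalg.norm(Eta[:, None, :] - Eta[None, :, :], axis=-1)
    V = 0.5 * (1.0 + np.vectorize(math.erf)((mu_zeta - math.exp(mu_theta) * D) / math.sqrt(2.0)))
    np.fill_diagonal(V, 0.0)  # structural zeros along the main diagonal
    return V

def posterior_mean_consensus(draws):
    """draws: iterable of (mu_zeta, mu_theta, Eta) posterior samples."""
    mats = [consensus_network(z, t, E) for z, t, E in draws]
    return np.mean(mats, axis=0)
```

The resulting matrix is the model-based counterpart of the empirical link proportion shown in the right panel of Figure 6.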
Actor-specific latent positions u_{1,1}, …, u_{I,J} provide a powerful tool for describing social interactions. Figure 7 shows Procrustes-transformed latent position estimates E(ũ_{i,j} | Y) along the two dimensions with the highest variability for each layer of the system. Even though the social behaviour is similar across layers, there are important particularities. First, note that the social dynamics of Horseplay and Friendship are quite similar, except that in Horseplay the latent positions seem to be more clustered together. Furthermore, many social patterns are evident. For instance, actors who have close positions in Horseplay and Friendship typically have distant positions in Antagonism; such an effect is particularly clear between actors 3 and 6, and actors 10 and 14. Lastly, complementing the insights provided by the consensus network, the average positions η_1, …, η_I can also be very useful for a "global" visualization of the social dynamics.

Figure 7: Posterior means of the Procrustes-transformed latent positions E(ũ_{i,j} | Y) along the two dimensions with the highest variability, for the bank wiring room data: (a) Horseplay, (b) Arguments, (c) Friendship, (d) Antagonism. Inspectors are shown in blue, solderers in yellow, and wiremen or assemblers in red.

As pointed out above, latent positions allow us to distinguish groups of actors that fulfil similar social roles. In order to identify such clusters, we can either apply some unsupervised clustering technique (e.g., hierarchical clustering, k-means clustering) or directly include in the model a set of parameters that assign actors to groups. The latter is preferable, since the uncertainty related to the clustering task can be directly quantified along with its relationship to other model parameters (see Section 6 for more details).

Another key feature of the MNLPM is that it implicitly allows us to obtain correlation measures between layers as a direct by-product of the model parameterization.
Indeed, we define the correlation between layers j and j', for j, j' = 1, …, J, as

    ρ_{j,j'} = cor( u*_{1,j}, …, u*_{I,j} ; u*_{1,j'}, …, u*_{I,j'} ),

where u*_{i,j} is the maximum Procrustes-transformed latent characteristic across latent dimensions of actor i in layer j, i.e., u*_{i,j} = max{ũ_{i,j,1}, …, ũ_{i,j,K}} (alternative definitions for ρ_{j,j'} are possible, e.g., by considering the median instead of the maximum). This approach represents the network correlation after jointly accounting for the social structure encoded within each layer, thanks to the MNLPM's hierarchical specification. The left panel of Figure 8 shows credible intervals along with point estimates for all pairwise network correlations. We see that all pairs of layers are positively correlated. This characteristic is particularly evident between Horseplay and Friendship (networks 1 and 3), which is consistent with the social dynamics described above.

Lastly, we test our correlation approach by considering a set of independent networks with no underlying structure. To do so, we independently generate J = 4 Erdős–Rényi networks with I = 14 actors and interaction probability 0.1, and then fit the MNLPM in order to obtain the pairwise network correlations. The right panel of Figure 8 presents the corresponding inference for these quantities. We see that the correlation between every pair of layers is negligible, since the credible intervals are centered at zero, which is the expected behaviour for data with no correlation structure.
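Given Procrustes-aligned positions, ρ_{j,j'} reduces to ordinary correlations between per-actor summaries; a sketch under the maximum-coordinate definition above (function name ours):

```python
# Pairwise layer correlations rho_{jj'} from aligned latent positions.
import numpy as np

def layer_correlations(U_tilde):
    """U_tilde: aligned positions of shape (I, J, K); returns a (J, J) matrix."""
    u_star = U_tilde.max(axis=2)              # (I, J): max coordinate per actor and layer
    return np.corrcoef(u_star, rowvar=False)  # correlate layers across actors
```

Swapping `.max(axis=2)` for `np.median(U_tilde, axis=2)` gives the median variant mentioned in the text; in either case the computation would be repeated over posterior samples to obtain the credible intervals of Figure 8.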
Figure 8: 95% credible intervals and posterior means for all pairwise network correlations ρ_{j,j'}. Left panel: bank wiring room data (participation in horseplay, network 1; participation in arguments about open windows, network 2; friendship, network 3; and antagonistic behaviour, network 4). Right panel: synthetic data.

We assess our MNLPM fit using both in-sample and out-of-sample metrics. Our in-sample assessment relies on two approaches. First, we compare the observed data y_{i,i',j} against the corresponding posterior means of the interaction probabilities, E(ϑ_{i,i',j} | Y). We see in Figure 9 that the point estimates concur with the raw data, which clearly suggests that the model fits the data well in terms of reproducibility.

Figure 9: Multilayer network data y_{i,i',j} (raw data) and interaction probability posterior means E(ϑ_{i,i',j} | Y) (probabilities) for the bank wiring room data: Horseplay, Arguments, Friendship, and Antagonism.

Now, following Gelman et al. (2013), we further explore the in-sample fit of each model by replicating pseudo-data from the fitted model and calculating a variety of summary statistics for each sample, whose distributions are then compared against their values in the original sample. Figure 10 shows credible intervals along with point estimates for a set of relevant network measures, including the density, assortativity, and clustering coefficient, among others (see Kolaczyk and Csárdi, 2014, for details about these structural summaries). Note that the model appropriately captures these structural features (perhaps with the exception of the assortativity in Antagonism), since the observed values belong to the corresponding credible intervals; indeed, most of the estimates virtually coincide with the observed values.
Thus, pseudo-data generation also provides evidence of proper in-sample properties in favour of our model. Finally, the out-of-sample predictive performance of the model is presented in Section 5.

Figure 10: 95% credible intervals, posterior means (black circle), and observed values (red square) associated with the empirical distribution of a battery of summary statistics, namely (a) density, (b) clustering coefficient, (c) assortativity, (d) mean geodesic distance, (e) mean eigen-centrality, and (f) mean degree, based on 10,000 replicas of the bank wiring room dataset (participation in horseplay, network 1; participation in arguments about open windows, network 2; friendship, network 3; and antagonistic behaviour, network 4).
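The replication-based check described above (draw pseudo-networks from the fitted interaction probabilities and compare summary statistics against their observed values) can be sketched as follows. The function names and the `(B, I, I)` layout of the posterior draws are hypothetical, and for brevity only the density and the global clustering coefficient are implemented.

```python
import numpy as np

def density(A):
    """Edge density of a simple undirected graph given its adjacency matrix."""
    I = A.shape[0]
    return A.sum() / (I * (I - 1))

def transitivity(A):
    """Global clustering coefficient: closed triplets over connected triplets."""
    A2 = A @ A
    A3 = A2 @ A
    open_triplets = A2.sum() - np.trace(A2)   # connected triples (i, k, j)
    return np.trace(A3) / open_triplets if open_triplets > 0 else 0.0

def posterior_predictive_check(stat, A_obs, theta_samples, rng):
    """Compare an observed statistic with its posterior predictive distribution.

    theta_samples: (B, I, I) array of posterior draws of the interaction
    probabilities for one layer (hypothetical layout).
    Returns the observed value and the B replicated values.
    """
    reps = []
    for P in theta_samples:
        U = rng.random(P.shape)
        Y = np.triu((U < P).astype(float), k=1)   # replicate the upper triangle
        Y = Y + Y.T                               # symmetrize; diagonal stays 0
        reps.append(stat(Y))
    return stat(A_obs), np.array(reps)
```

Credible intervals for each statistic are then obtained from the empirical quantiles of the replicated values, exactly as summarized in Figure 10.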
In this section we develop a formal test to assess the level of "agreement" (as opposed to "accuracy", which requires the definition of an external "gold standard") between an actor's self-perception of their own position in a social environment and that of other actors embedded in the same system (e.g., Swartz et al., 2015, Sewell, 2019, Sosa and Rodriguez, 2021). Our approach relies on the hierarchical structure of the MNLPM, which allows us to define a measure of cognitive agreement. The posterior distribution of such a measure makes it possible to identify those individuals whose position in the social space agrees with the judgements of other actors.

A cognitive social structure (CSS) is defined by a set of cognitive judgements that subjects form about the relationships among actors (themselves as well as others) who are embedded in a common environment. Hence, each subject reports a full description of the social network structure. We consider a CSS reported by Krackhardt (1987) in which $I = 21$ management personnel in a high-tech machine manufacturing organization were observed in order to evaluate the effects of a recent management intervention program. Each person was asked to fill out a questionnaire indicating not only who he/she believes his/her friends are, but also his/her perception of others' friendships. Thus, we have a collection composed of $J = 21$ undirected binary networks $Y_1, \ldots, Y_J$, with $Y_j = [y_{i,i',j}]$, defined over a common set of $I = 21$ actors, such that $y_{i,i',j} = 1$ if $i$ and $i'$ are friends of each other, and $y_{i,i',j} = 0$ otherwise. Some attribute information about each executive was also available, including corporate level (president, vice-president, or general manager) and department membership (there are four departments, labelled from 1 to 4; the CEO is not in any department).
Such information can potentially be included in the analysis (see Section 6 for details).

Part of these multilayer network data, along with the "consensus" network, is represented in Figure 11. In this context, a link is present between two actors according to the consensus if at least half of the personnel have reported that link. Note that even though the variability in the perceptions is not negligible, there are some commonalities across networks. For instance, more than half of the management personnel believe that actors 2 and 18 (both vice-presidents), actors 21 (vice-president) and 17 (manager), both in department 2, and actors 14 (vice-president) and 3 (manager), both in the same department, are friends with each other.

Figure 11: Visualization of some networks in the friendship CSS data corresponding to actors 7, 14, and 17, along with the consensus network. Vertex shape indicates the executive's level in the company (star: president, actor 7; square: vice-presidents, actors 2, 14, 18, 21; and circles: managers), whereas vertex color indicates the executive's department in the company (the president does not belong to any department).
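The majority-rule consensus network used in Figure 11 is straightforward to construct. The following is a minimal sketch, assuming the reports are stacked in a `(J, I, I)` binary array (a layout chosen purely for illustration).

```python
import numpy as np

def consensus_network(Y):
    """Majority-rule consensus from a CSS: a tie is declared present when at
    least half of the J reporters perceive it.

    Y: (J, I, I) binary array of reported adjacency matrices.
    """
    J = Y.shape[0]
    votes = Y.sum(axis=0)                 # how many reporters see each tie
    C = (votes >= J / 2).astype(int)
    np.fill_diagonal(C, 0)                # keep the structural zeros
    return C
```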
We consider the agreement question, in which we ask whether an individual's perception of their relationships is the same as the perception that others hold. To answer this question, we define the assessment parameter $\delta_i$, for $i = 1, \ldots, I$, as the difference between subject $i$'s self-assessment and the mean assessment of subject $i$ by others, i.e.,
$$\delta_i = \left\| \tilde{u}_{i,i} \right\| - \left\| \frac{1}{I-1} \sum_{j \neq i} \tilde{u}_{i,j} \right\|,$$
where $\tilde{u}_{i,j}$ is the Procrustes-transformed version of $u_{i,j}$. This quantity is an effort to parametrize the accuracy of self-assessment in perceiving ties.

The bottom panel of Figure 12 provides credible intervals along with point estimates for the personal assessment parameters $\delta_1, \ldots, \delta_I$, based on $B = 10{,}000$ samples of the posterior distribution, obtained after thinning the original Markov chains every 10 observations and a burn-in period of 100,000 iterations, associated with the value of $K$ that optimizes the WAIC ($K = 6$). We see that most actors have a slightly elevated view of themselves in terms of their capacity to befriend others, whereas very few have a negative view. On the other hand, actors 10, 15, 17, and 19 have a significantly inflated perception of their ability to form friendship ties. Note that the results of this test are quite consistent with the exploratory data analysis discussed previously.

Finally, in order to exemplify the social behaviour of an unskillful actor in perceiving relations, we show in Figure 13 Procrustes-transformed latent position estimates along the two dimensions with highest variance for actor 17 (who clearly has a misleading view of his/her surroundings according to the test), as perceived by this actor, $\mathbb{E}(\tilde{u}_{i,17} \mid Y)$,

Figure 12: Top panel: Normalized degree distribution across networks.
The $i$-th boxplot summarizes the distribution of the degree for all reporters except $i$, while the self-perceived degree is represented by a triangle ($\triangle$) and the respective degree in the consensus network by a cross ($\times$). Bottom panel: 95% credible intervals and posterior means for the distribution of the personal assessment parameters $\delta_i$. Thicker lines correspond to credible intervals that do not contain zero.

and as perceived by all actors, $\mathbb{E}(\tilde{u}_{17,i} \mid Y)$. These plots are consistent with those from Figure 12. Actor 17 sees him/herself in quite a "central" position of the friendship relations; however, according to the general opinion, actor 17 is clearly isolated from the others. This is again consistent with the test shown in Figure 12.

Figure 13: Posterior means of Procrustes-transformed latent positions along the two dimensions with highest variance for actor 17 (circled). Left panel: As perceived by actor 17, $\mathbb{E}(\tilde{u}_{i,17} \mid Y)$. Right panel: As perceived by all actors, $\mathbb{E}(\tilde{u}_{17,i} \mid Y)$.

As an additional goodness-of-fit assessment, we carry out cross-validation experiments on several multilayer network datasets (see Table 2) exhibiting different kinds of actors, sizes, and relations. More specifically, we perform a five-fold cross-validation (CV) in which five randomly selected subsets of roughly equal size of the dataset are treated as missing and then predicted using the rest of the data.

Acronym   Reference                           Actors   Layers   Edges
wiring    Roethlisberger and Dickson (2003)   14       4        79
tech      Krackhardt (1987)                   21       21       550
seven     Vickers and Chan (1981)             29       3        222
girls     Steglich et al. (2006)              50       3        119
aarhus    Magnani et al. (2013)               61       5        620
micro     Banerjee et al. (2013)              77       6        903

Table 2: Multilayer network datasets for which a series of cross-validation experiments are performed using independently fitted LPMs (baseline) and our MNLPM. Note that wiring and tech are widely analysed in Section 4. Also, micro corresponds to the network data of village number 10.

We summarize our findings in Table 3, where we report the average area under the receiver operating characteristic curve (AUC) and the WAIC for each dataset described in Table 2. The values correspond to the prediction of missing links using independently fitted LPMs (IFLPM), our MNLPM, and also a variant of the MNLPM that is very reminiscent of Gollini and Murphy (2016). The latter, referred to as GMLPM, considers unique latent positions with no hierarchical structure, in such a way that $\vartheta_{i,i',j} = \Phi(\zeta_j - e^{\theta_j} \|u_i - u_{i'}\|)$. In this context, the AUC is a measure of how well a given model is capable of predicting missing links (higher AUC values are better). We report the AUC for the models with the optimal value of $K$ according to the WAIC criterion. As before, our predictions are based on $B = 10{,}000$ samples of the posterior distribution, obtained after thinning the original Markov chains every 10 observations and a burn-in period of 100,000 iterations.

We see that the MNLPM is clearly the best alternative in terms of both prediction and goodness-of-fit. Specifically, the out-of-sample performance of IFLPM and MNLPM is practically the same for wiring, as is that of GMLPM and MNLPM for seven, girls, and micro. For all the other datasets, the MNLPM has better predictive behaviour than its competitors. Such an effect is particularly clear when fitting the MNLPM as opposed to IFLPM, which provides even more evidence that considering our hierarchical prior as in the MNLPM is beneficial. On the other hand, not surprisingly, the WAIC of GMLPM is

Measure              AUC                       WAIC
Acronym    IFLPM   GMLPM   MNLPM    IFLPM   GMLPM   MNLPM
wiring
tech
seven
girls
aarhus
micro

Table 3: Average AUCs and WAICs corresponding to the prediction of missing links in a series of CV experiments to assess the predictive performance of IFLPM, GMLPM, and MNLPM, using each dataset provided in Table 2.
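The cross-validation protocol above (hold out dyads, score the held-out links with posterior mean probabilities, and summarize with the AUC) can be sketched as follows. The AUC is computed here via the Mann–Whitney statistic, and the function names are hypothetical rather than part of the paper's software.

```python
import numpy as np

def auc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney statistic (ties averaged)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):            # average ranks over tied scores
        mask = scores == s
        ranks[mask] = ranks[mask].mean()
    pos, neg = (y_true == 1), (y_true == 0)
    n1, n0 = pos.sum(), neg.sum()
    return (ranks[pos].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

def five_fold_dyads(n_dyads, rng, folds=5):
    """Randomly partition dyad indices into roughly equal folds; each fold is
    treated as missing in turn and predicted from the remaining data."""
    idx = rng.permutation(n_dyads)
    return np.array_split(idx, folds)
```

For each fold, the model would be refit with the held-out entries treated as missing, and `auc` applied to the held-out binary ties and their posterior mean interaction probabilities.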
This paper presents a novel approach to modeling multilayer network data with a method that encourages the flow of information across networks, as opposed to an independent characterization of each of them. Our proposal is based on a natural hierarchical extension of a latent space distance model, which provides a direct description of actors' roles within and across networks at global and specific levels. Furthermore, our experiments provide sufficient empirical evidence to establish that our approach is highly competitive in terms of prediction and goodness-of-fit.

Our MNLPM is amenable to many generalizations. First, the model can be extended to represent patterns in the data related to known covariates by letting $\vartheta_{i,i',j} = \Phi(x_{i,i'}^T \zeta_j - e^{\theta_j} \|u_{i,j} - u_{i',j}\|)$, where $x_{i,i'} = (x_{i,i',1}, \ldots, x_{i,i',P})$, in addition to a global intercept, is a vector of predictors that incorporates known attributes associated with actors $i$ and $i'$, and $\zeta_j = (\zeta_{j,1}, \ldots, \zeta_{j,P})$ is an unknown vector of fixed effects. Furthermore, in order to represent more general combinations of structural equivalence and homophily in varying degrees, it is also possible to consider other types of latent effects as in a factorial model, by letting $\vartheta_{i,i',j} = \Phi(\zeta_j + u_i^T \Lambda_j u_{i'})$, where $\Lambda_j = \operatorname{diag}(\lambda_{j,1}, \ldots, \lambda_{j,K})$ is a $K \times K$ diagonal matrix.
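To make the two extensions concrete, the corresponding link probabilities can be sketched as small functions. The argument names are hypothetical, and the standard normal CDF Φ is implemented with `math.erf`.

```python
from math import erf, exp, sqrt
import numpy as np

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def prob_distance_covariates(x, zeta, theta, u_i, u_ip):
    """Covariate-extended distance model:
    theta_{i,i',j} = Phi(x^T zeta_j - e^{theta_j} ||u_{i,j} - u_{i',j}||)."""
    return Phi(float(x @ zeta) - exp(theta) * np.linalg.norm(u_i - u_ip))

def prob_bilinear(zeta, lam, u_i, u_ip):
    """Factorial (eigenmodel-style) variant:
    theta_{i,i',j} = Phi(zeta_j + u_i^T Lambda_j u_{i'}),
    with the diagonal of Lambda_j passed as the vector `lam`."""
    return Phi(zeta + float(u_i @ (lam * u_ip)))
```

Note that positive entries of `lam` reward coordinate-wise similarity in sign (homophily-like effects), while negative entries can capture stochastic-equivalence-like patterns.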
Lastly, the model can also be modified to handle directed networks by distinguishing latent "sender" positions, $u_{i,j}$, from latent "receiver" positions, $v_{i,j}$, which leads to $\vartheta_{i,i',j} = \Phi(\zeta_j - e^{\theta_j} \|u_{i,j} - v_{i',j}\|)$.

In the same spirit as Green and Hastie (2009), we can also conceive a trans-dimensional version of the model that treats the latent dimension $K$ as a model parameter (as opposed to a fixed pre-specified quantity). This is quite challenging, aside from the computational complexity, since in a varying-dimension setting it is not clear how to provide a meaningful interpretation of the latent dimensions. Moreover, following Guhaniyogi and Rodriguez (2020), a truncation of a non-parametric process can also be incorporated into the model, but based on the authors' experience, results are likely to be quite similar.

Also, note that social positions might exhibit clustering patterns, which can be modelled directly by incorporating cluster assignment parameters $\xi_1, \ldots, \xi_I$ into the hierarchical specification of the model through a Categorical-Dirichlet prior (e.g., Handcock et al., 2007, Krivitsky and Handcock, 2008, Krivitsky et al., 2009). Specifically, we can assume that all the actors in the system are clustered into $H$ groups, each of which occupies a position $\varphi_h$ in the social space, $h = 1, \ldots, H$. In this way, we can think of actor $i$'s average position $\eta_i$ as a Normal deviation from the position of the group to which it belongs, i.e., $\eta_i \mid \{\varphi_h\}, \kappa^2, \xi_i \overset{\text{ind}}{\sim} N_K(\varphi_{\xi_i}, \kappa^2 I)$, where $\xi_i = h$ means that actor $i$ belongs to cluster $h$. Nonparametric Bayes approaches in the same spirit as Rodriguez (2015) and D'Angelo et al. (2019) are also possible.

Finally, we recommend considering alternative inference methods in order to accommodate "big networks", which is currently an active research area in computational statistics (e.g., Gollini and Murphy, 2016, Ma and Ma, 2017, Spencer et al., 2020, Aliverti and Russo, 2020).

References
Abdollahpouri, A., Salavati, C., Arkat, J., Tab, F. A., and Manbari, Z. (2020). A multi-objective model for identifying valuable nodes in complex networks with minimum cost. Cluster Computing, 23(4):2719–2733.

Airoldi, E., Blei, D., Fienberg, S., and Xing, E. (2009). Mixed membership stochastic blockmodels. In Advances in Neural Information Processing Systems, pages 33–40.

Aldous, D. J. (1985). Exchangeability and related topics. In École d'Été de Probabilités de Saint-Flour XIII–1983, pages 1–198. Springer.

Aliverti, E. and Russo, M. (2020). Stratified stochastic variational inference for high-dimensional network factor model. arXiv preprint arXiv:2006.14217.

Banerjee, A., Chandrasekhar, A., Duflo, E., and Jackson, M. (2013). The diffusion of microfinance. Science, 341(6144).

Bartz-Beielstein, T., Branke, J., Filipič, B., and Smith, J. (2014). Parallel Problem Solving from Nature – PPSN XIII: 13th International Conference, Ljubljana, Slovenia, September 13-17, 2014, Proceedings, volume 8672. Springer.

Betancourt, B., Rodriguez, A., and Boyd, N. (2020). Modelling and prediction of financial trading networks: an application to the New York Mercantile Exchange natural gas futures market. Journal of the Royal Statistical Society: Series C (Applied Statistics), 69(1):195–218.

Borg, I. and Groenen, P. (2005). Modern Multidimensional Scaling: Theory and Applications. Springer Science & Business Media.

D'Angelo, S., Alfò, M., and Fop, M. (2020). Model-based clustering for multivariate networks. arXiv preprint arXiv:2001.05260.

D'Angelo, S., Alfò, M., and Murphy, T. B. (2018). Node-specific effects in latent space modelling of multidimensional networks.

D'Angelo, S., Murphy, T. B., and Alfò, M. (2019). Latent space modelling of multidimensional networks with application to the exchange of votes in Eurovision Song Contest. Annals of Applied Statistics, 13(2):900–930.

Durante, D. and Dunson, D. (2014). Nonparametric Bayes dynamic modelling of relational data. Biometrika, 101(4):883–898.

Durante, D. and Dunson, D. (2018). Bayesian inference and testing of group differences in brain networks. Bayesian Analysis, 13(1):29–58.

Erdös, P. and Rényi, A. (1959). On random graphs. Publicationes Mathematicae, 6(290-297):5.

Gamerman, D. and Lopes, H. (2006). Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. CRC Press.

Gao, L., Witten, D., and Bien, J. (2019). Testing for association in multi-view network data. arXiv preprint arXiv:1909.11640.

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. CRC Press.

Gelman, A., Hwang, J., and Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24(6):997–1016.

Gollini, I. and Murphy, T. (2016). Joint modeling of multiple network views. Journal of Computational and Graphical Statistics, 25(1):246–265.

Green, P. and Hastie, D. (2009). Reversible jump MCMC. Genetics, 155(3):1391–1403.

Guhaniyogi, R. and Rodriguez, A. (2020). Joint modeling of longitudinal relational data and exogenous variables. Bayesian Analysis.

Gupta, S., Sharma, G., and Dukkipati, A. (2018). Evolving latent space model for dynamic networks. arXiv preprint arXiv:1802.03725.

Haario, H., Saksman, E., and Tamminen, J. (2001). An adaptive Metropolis algorithm. Bernoulli, 7(2):223–242.

Han, Q., Xu, K., and Airoldi, E. (2015). Consistent estimation of dynamic and multi-layer block models. In International Conference on Machine Learning, pages 1511–1520.

Handcock, M., Raftery, A., and Tantrum, J. (2007). Model-based clustering for social networks. Journal of the Royal Statistical Society: Series A (Statistics in Society), 170(2):301–354.

Hoff, P. (2009). Multiplicative latent factor models for description and prediction of social networks. Computational and Mathematical Organization Theory, 15(4):261–272.

Hoff, P. (2015). Multilinear tensor regression for longitudinal relational data. The Annals of Applied Statistics, 9(3):1169.

Hoff, P. D. (2005). Bilinear mixed-effects models for dyadic data. Journal of the American Statistical Association, 100(469):286–295.

Hoff, P. D. (2008). Modeling homophily and stochastic equivalence in symmetric relational data. In Advances in Neural Information Processing Systems, pages 657–664.

Hoff, P. D., Raftery, A. E., and Handcock, M. S. (2002). Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460):1090–1098.

Hoover, D. N. (1982). Row-column exchangeability and a generalized model for probability. Exchangeability in Probability and Statistics, pages 281–291.

Kim, B., Lee, K., Xue, L., and Niu, X. (2018). A review of dynamic network models with latent variables. Statistics Surveys, 12:105.

Kolaczyk, E. D. and Csárdi, G. (2014). Statistical Analysis of Network Data with R, volume 65. Springer.

Krackhardt, D. (1987). Cognitive social structures. Social Networks, 9(2):109–134.

Krivitsky, P. and Handcock, M. (2008). Fitting latent cluster models for networks with latentnet. Journal of Statistical Software, 24(5).

Krivitsky, P. N., Handcock, M. S., Raftery, A. E., and Hoff, P. D. (2009). Representing degree distributions, clustering, and homophily in social networks with latent cluster random effects models. Social Networks, 31(3):204–213.

Li, W.-J., Yeung, D.-Y., and Zhang, Z. (2011). Generalized latent factor models for social network analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain.

Linkletter, C. (2007). Spatial Process Models for Social Network Analysis. PhD thesis, Citeseer.

Liu, Q. (2020). The 10th International Conference on Computer Engineering and Networks, volume 1274. Springer Nature.

Ma, Z. and Ma, Z. (2017). Exploration of large networks with covariates via fast and universal latent space model fitting. arXiv preprint arXiv:1705.02372.

Magnani, M., Micenkova, B., and Rossi, L. (2013). Combinatorial analysis of multiple networks. arXiv preprint arXiv:1303.4986.

Minhas, S., Hoff, P. D., and Ward, M. D. (2019). Inferential approaches for network analysis: AMEN for latent factor models. Political Analysis, 27(2):208–222.

Nowicki, K. and Snijders, T. (2001). Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96(455):1077–1087.

Paez, M., Amini, A., and Lin, L. (2019). Hierarchical stochastic block model for community detection in multiplex networks. arXiv preprint arXiv:1904.05330.

Paul, S. and Chen, Y. (2016). Consistent community detection in multi-relational data through restricted multi-layer stochastic blockmodel. Electronic Journal of Statistics, 10(2):3807–3870.

Paul, S., Chen, Y., et al. (2020). Spectral and matrix factorization methods for consistent community detection in multi-layer networks. The Annals of Statistics, 48(1):230–250.

Rastelli, R., Maire, F., and Friel, N. (2019). Computationally efficient inference for latent position network models. arXiv preprint arXiv:1804.02274.

Reyes, P. and Rodriguez, A. (2016). Stochastic blockmodels for exchangeable collections of networks. arXiv preprint arXiv:1606.05277.

Rodriguez, A. (2015). A Bayesian nonparametric model for exchangeable multinetwork data based on fragmentation and coagulation processes. Technical report, University of California, Santa Cruz.

Roethlisberger, F. and Dickson, W. (2003). Management and the Worker, volume 5. Psychology Press.

Salter-Townshend, M. and McCormick, T. H. (2017). Latent space models for multiview network data. The Annals of Applied Statistics, 11(3):1217.

Schweinberger, M. and Snijders, T. (2003). Settings in social networks: A measurement model. Sociological Methodology, 33(1):307–341.

Sewell, D. (2019). Latent space models for network perception data. Network Science, 7(2):160–179.

Sewell, D. and Chen, Y. (2015). Latent space models for dynamic networks. Journal of the American Statistical Association, 110(512):1646–1657.

Sewell, D. and Chen, Y. (2016). Latent space models for dynamic networks with weighted edges. Social Networks, 44:105–116.

Sewell, D. and Chen, Y. (2017). Latent space approaches to community detection in dynamic networks. Bayesian Analysis, 12(2):351–377.

Sosa, J. and Buitrago, L. (2021). A review of latent space models for social networks. Revista Colombiana de Estadística, 44(1):171–200.

Sosa, J. and Rodriguez, A. (2021). A latent space model for cognitive social structures data. Social Networks, 65(1):85–97.

Spencer, N. A., Junker, B., and Sweet, T. M. (2020). Faster MCMC for Gaussian latent position network models. arXiv preprint arXiv:2006.07687.

Steglich, C., Snijders, T. A., and West, P. (2006). Applying SIENA. Methodology, 2(1):48–56.

Swartz, T., Gill, P., and Muthukumarana, S. (2015). A Bayesian approach for the analysis of triadic data in cognitive social structures. Journal of the Royal Statistical Society: Series C (Applied Statistics), 64(4):593–610.

Turnbull, K. (2020). Advancements in Latent Space Network Modelling. PhD thesis, Lancaster University.

Vickers, M. and Chan, S. (1981). Representing classroom social structure. Victoria Institute of Secondary Education, Melbourne.

Wang, L., Zhang, Z., and Dunson, D. (2019). Common and individual structure of brain networks. The Annals of Applied Statistics, 13(1):85–112.

Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11(Dec):3571–3594.

Watanabe, S. (2013). A widely applicable Bayesian information criterion. Journal of Machine Learning Research, 14(Mar):867–897.

Zhang, X. (2020). Statistical Analysis for Network Data using Matrix Variate Models and Latent Space Models. PhD thesis, The University of Michigan.
A MCMC algorithm
The posterior distribution is given by:
$$p(\Upsilon \mid Y) \propto \prod_{j=1}^{J} \prod_{i < i'} \operatorname{Ber}\!\left( y_{i,i',j} \mid \Phi(\zeta_j - e^{\theta_j}\|u_{i,j} - u_{i',j}\|) \right) \times \prod_{i=1}^{I} \prod_{j=1}^{J} N_K(u_{i,j} \mid \eta_i, \sigma^2 I) \times \prod_{i=1}^{I} N_K(\eta_i \mid \nu, \kappa^2 I) \times \prod_{j=1}^{J} N(\theta_j \mid \mu_\theta, \tau^2_\theta)\, N(\zeta_j \mid \mu_\zeta, \tau^2_\zeta) \times p(\sigma^2)\, p(\nu)\, p(\kappa^2)\, p(\mu_\theta)\, p(\tau^2_\theta)\, p(\mu_\zeta)\, p(\tau^2_\zeta).$$
2. Sample $\eta_i^{(b+1)}$, $i = 1, \ldots, I$, from:
$$\eta_i \mid \text{rest} \sim N_K\!\left( \left[ \tfrac{1}{\kappa^2} + \tfrac{J}{\sigma^2} \right]^{-1} \left[ \tfrac{\nu}{\kappa^2} + \tfrac{1}{\sigma^2} \sum_j u_{i,j} \right],\ \left[ \tfrac{1}{\kappa^2} + \tfrac{J}{\sigma^2} \right]^{-1} I \right).$$
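Step 2 is a standard conjugate Gaussian update, combining the prior $N_K(\nu, \kappa^2 I)$ with the $J$ likelihood terms for actor $i$. A minimal sketch (the array layouts are assumptions for illustration):

```python
import numpy as np

def sample_eta_i(u_i, nu, sigma2, kappa2, rng):
    """Draw eta_i from its Gaussian full conditional.

    u_i: (J, K) latent positions of actor i across the J layers;
    nu: (K,) population mean; sigma2, kappa2: variance parameters.
    The precision combines the prior N_K(nu, kappa2 I) with the J
    likelihood terms N_K(u_{i,j} | eta_i, sigma2 I).
    """
    J = u_i.shape[0]
    prec = 1.0 / kappa2 + J / sigma2
    mean = (nu / kappa2 + u_i.sum(axis=0) / sigma2) / prec
    return rng.normal(mean, np.sqrt(1.0 / prec))
```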
3. Sample $(\sigma^2)^{(b+1)}$ from:
$$\sigma^2 \mid \text{rest} \sim \operatorname{IG}\!\left( a_\sigma + \tfrac{IJK}{2},\ b_\sigma + \tfrac{1}{2} \sum_{i,j} (u_{i,j} - \eta_i)^T (u_{i,j} - \eta_i) \right).$$
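Step 3 is a conjugate inverse-gamma update. Since NumPy only provides a gamma sampler, the draw below is taken as the reciprocal of a Gamma variate with the matching rate (a standard device, not specific to this paper); the array layouts are assumptions.

```python
import numpy as np

def sample_sigma2(u, eta, a_sigma, b_sigma, rng):
    """Draw sigma^2 from its inverse-gamma full conditional.

    u: (I, J, K) latent positions; eta: (I, K) actor-specific means.
    An IG(a, b) variate is the reciprocal of a Gamma(a, rate=b) variate,
    i.e. numpy's gamma with scale = 1/b.
    """
    I, J, K = u.shape
    resid = u - eta[:, None, :]                 # broadcast eta over the J layers
    shape = a_sigma + I * J * K / 2.0
    rate = b_sigma + 0.5 * np.sum(resid ** 2)
    return 1.0 / rng.gamma(shape, 1.0 / rate)
```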
4. Sample $\nu^{(b+1)}$ from:
$$\nu \mid \text{rest} \sim N_K\!\left( \left[ V_\nu^{-1} + \tfrac{I}{\kappa^2} I \right]^{-1} \left[ V_\nu^{-1} m_\nu + \tfrac{1}{\kappa^2} \sum_i \eta_i \right],\ \left[ V_\nu^{-1} + \tfrac{I}{\kappa^2} I \right]^{-1} \right).$$
5. Sample $(\kappa^2)^{(b+1)}$ from:
$$\kappa^2 \mid \text{rest} \sim \operatorname{IG}\!\left( a_\kappa + \tfrac{IK}{2},\ b_\kappa + \tfrac{1}{2} \sum_i (\eta_i - \nu)^T (\eta_i - \nu) \right).$$
6. Sample $\theta_j^{(b+1)}$, $j = 1, \ldots, J$, according to an adaptive Metropolis-Hastings algorithm with the full conditional distribution:
$$p(\theta_j \mid \text{rest}) \propto \prod_{i < i'} \operatorname{Ber}\!\left( y_{i,i',j} \mid \Phi(\zeta_j - e^{\theta_j}\|u_{i,j} - u_{i',j}\|) \right) \times N(\theta_j \mid \mu_\theta, \tau^2_\theta).$$
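A plain (non-adaptive) random-walk Metropolis version of step 6 can be sketched as follows; in the adaptive variant, the proposal standard deviation `step` would be tuned from the chain's history, as in Haario et al. (2001). The function names and array layouts are hypothetical.

```python
import numpy as np
from math import erf, sqrt

def log_Phi(z):
    """Log of the standard normal CDF (clipped for numerical safety)."""
    p = 0.5 * (1.0 + erf(z / sqrt(2.0)))
    return np.log(np.clip(p, 1e-12, 1 - 1e-12))

def log_cond_theta(theta_j, zeta_j, U_j, Y_j, mu_theta, tau2_theta):
    """Log full conditional of theta_j: layer-j Bernoulli likelihood plus the
    N(mu_theta, tau2_theta) prior (up to an additive constant)."""
    I = Y_j.shape[0]
    lp = -0.5 * (theta_j - mu_theta) ** 2 / tau2_theta
    for i in range(I - 1):
        for ip in range(i + 1, I):
            z = zeta_j - np.exp(theta_j) * np.linalg.norm(U_j[i] - U_j[ip])
            lp += Y_j[i, ip] * log_Phi(z) + (1 - Y_j[i, ip]) * log_Phi(-z)
    return lp

def mh_step_theta(theta_j, step, log_cond, rng):
    """One random-walk Metropolis step; `step` would be tuned adaptively."""
    prop = theta_j + step * rng.normal()
    if np.log(rng.random()) < log_cond(prop) - log_cond(theta_j):
        return prop
    return theta_j
```

The update for $\zeta_j$ in step 9 has exactly the same structure, with the roles of $\theta_j$ and $\zeta_j$ exchanged in the proposal.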
7. Sample $\mu_\theta^{(b+1)}$ from:
$$\mu_\theta \mid \text{rest} \sim N\!\left( \left[ \tfrac{1}{v^2_\theta} + \tfrac{J}{\tau^2_\theta} \right]^{-1} \left[ \tfrac{m_\theta}{v^2_\theta} + \tfrac{\sum_j \theta_j}{\tau^2_\theta} \right],\ \left[ \tfrac{1}{v^2_\theta} + \tfrac{J}{\tau^2_\theta} \right]^{-1} \right).$$
8. Sample $(\tau^2_\theta)^{(b+1)}$ from:
$$\tau^2_\theta \mid \text{rest} \sim \operatorname{IG}\!\left( a_\theta + \tfrac{J}{2},\ b_\theta + \tfrac{1}{2} \sum_j (\theta_j - \mu_\theta)^2 \right).$$
9. Sample $\zeta_j^{(b+1)}$, $j = 1, \ldots, J$, according to an adaptive Metropolis-Hastings algorithm with the full conditional distribution:
$$p(\zeta_j \mid \text{rest}) \propto \prod_{i < i'} \operatorname{Ber}\!\left( y_{i,i',j} \mid \Phi(\zeta_j - e^{\theta_j}\|u_{i,j} - u_{i',j}\|) \right) \times N(\zeta_j \mid \mu_\zeta, \tau^2_\zeta).$$
10. Sample $\mu_\zeta^{(b+1)}$ from:
$$\mu_\zeta \mid \text{rest} \sim N\!\left( \left[ \tfrac{1}{v^2_\zeta} + \tfrac{J}{\tau^2_\zeta} \right]^{-1} \left[ \tfrac{m_\zeta}{v^2_\zeta} + \tfrac{\sum_j \zeta_j}{\tau^2_\zeta} \right],\ \left[ \tfrac{1}{v^2_\zeta} + \tfrac{J}{\tau^2_\zeta} \right]^{-1} \right).$$
11. Sample $(\tau^2_\zeta)^{(b+1)}$ from:
$$\tau^2_\zeta \mid \text{rest} \sim \operatorname{IG}\!\left( a_\zeta + \tfrac{J}{2},\ b_\zeta + \tfrac{1}{2} \sum_j (\zeta_j - \mu_\zeta)^2 \right).$$

B Notation
The cardinality of a set $A$ is denoted by $|A|$. If $P$ is a logical proposition, then $\mathbb{1}\{P\} = 1$ if $P$ is true, and $\mathbb{1}\{P\} = 0$ if $P$ is false. $\lfloor x \rfloor$ denotes the floor of $x$, whereas $[n]$ denotes the set of all integers from 1 to $n$, i.e., $\{1, \ldots, n\}$. The Gamma function is given by $\Gamma(x) = \int_0^\infty u^{x-1} e^{-u}\, \mathrm{d}u$.

Matrices and vectors with entries consisting of subscripted variables are denoted by a boldfaced version of the letter for that variable. For example, $\mathbf{x} = (x_1, \ldots, x_n)$ denotes an $n \times 1$ column vector with entries $x_1, \ldots, x_n$. We use $\mathbf{0}$ and $\mathbf{1}$ to denote the column vectors with all entries equal to 0 and 1, respectively, and $\mathbf{I}$ to denote the identity matrix. A subindex in this context refers to the corresponding dimension; for instance, $\mathbf{I}_n$ denotes the $n \times n$ identity matrix. The transpose of a vector $\mathbf{x}$ is denoted by $\mathbf{x}^T$, and analogously for matrices. Moreover, if $\mathbf{X}$ is a square matrix, we use $\operatorname{tr}(\mathbf{X})$ to denote its trace and $\mathbf{X}^{-1}$ to denote its inverse. The norm of $\mathbf{x}$, given by $\sqrt{\mathbf{x}^T \mathbf{x}}$, is denoted by $\|\mathbf{x}\|$.