[PDF] Network Psychometrics

Abstract

This chapter provides a general introduction of network modeling in psychometrics. The chapter starts with an introduction to the statistical model formulation of pairwise Markov random fields (PMRF), followed by an introduction of the PMRF suitable for binary data: the Ising model. The Ising model is a model used in ferromagnetism to explain phase transitions in a field of particles. Following the description of the Ising model in statistical physics, the chapter continues to show that the Ising model is closely related to models used in psychometrics. The Ising model can be shown to be equivalent to certain kinds of logistic regression models, loglinear models and multi-dimensional item response theory (MIRT) models. The equivalence between the Ising model and the MIRT model puts standard psychometrics in a new light and leads to a strikingly different interpretation of well-known latent variable models. The chapter gives an overview of methods that can be used to estimate the Ising model, and concludes with a discussion on the interpretation of latent variables given the equivalence between the Ising model and MIRT.

Full PDF

NNetwork Psychometrics

Sacha Epskamp, Gunter K. J. Maris, Lourens J. Waldorp and DennyBorsboom

University of Amsterdam, Department of Psychological MethodsAbstractThis chapter provides a general introduction of network modeling in psy-chometrics. The chapter starts with an introduction to the statistical modelformulation of pairwise Markov random ﬁelds (PMRF), followed by an in-troduction of the PMRF suitable for binary data: the

Ising model . The Isingmodel is a model used in ferromagnetism to explain phase transitions in aﬁeld of particles. Following the description of the Ising model in statisticalphysics, the chapter continues to show that the Ising model is closely re-lated to models used in psychometrics. The Ising model can be shown to beequivalent to certain kinds of logistic regression models, loglinear models andmulti-dimensional item response theory (MIRT) models. The equivalencebetween the Ising model and the MIRT model puts standard psychometricsin a new light and leads to a strikingly diﬀerent interpretation of well-knownlatent variable models. The chapter gives an overview of methods that canbe used to estimate the Ising model, and concludes with a discussion onthe interpretation of latent variables given the equivalence between the Isingmodel and MIRT.In fact, statistical ﬁeld theory may have even more to oﬀer. It always struckme that there appears to be a close connection between the basic expressionsunderlying item-response theory and the solutions of elementary lattice ﬁeldsin statistical physics. For instance, there is almost a one-to-one formal corre-spondence of the solution of the Ising model (a lattice with nearest neighborinteraction between binary-valued sites; e.g., Kindermann et al. 1980, Chapter1) and the Rasch model (Fischer, 1974). —Peter Molenaar (2003, p. 82)

Introduction

In recent years, network models have been proposed as an alternative way of lookingat psychometric problems (Van Der Maas et al., 2006; Cramer et al., 2010; Borsboom and

Please cite as: Epskamp, S., Maris, G., Waldorp, L.J., and Borsboom, D. (in press). Network Psycho-metrics. In Irwing, P., Hughes, D., and Booth, T. (Eds.),

Handbook of Psychometrics . New York: Wiley. a r X i v : . [ s t a t . M E ] J un ETWORK PSYCHOMETRICS 2Cramer, 2013). In these models, psychometric item responses are conceived of as proxies forvariables that directly interact with each other. For example, the symptoms of depression(such as loss of energy, sleep problems, and low self esteem) are traditionally thought ofas being determined by a common latent variable (depression, or the liability to becomedepressed; Aggen et al. 2005). In network models, these symptoms are instead hypothesizedto form networks of mutually reinforcing variables (e.g., sleep problems may lead to loss ofenergy, which may lead to low self esteem, which may cause rumination that in turn mayreinforce sleep problems). On the face of it, such network models oﬀer an entirely diﬀerentconceptualization of why psychometric variables cluster in the way that they do. However,it has also been suggested in the literature that latent variables may somehow correspond tosets of tightly intertwined observables (e.g., see the Appendix of Van Der Maas et al. 2006),and as the above quote shows, Molenaar (2003) already suspected that network models inphysics are closely connected to psychometric models with latent variables.In the current chapter, we aim to make this connection explicit. As we will show, aparticular class of latent variable models (namely, multidimensional Item Response Theorymodels) yields exactly the same probability distribution over the observed variables as aparticular class of network models (namely, Ising models). In the current chapter, we exploitthe consequences of this equivalence. We will ﬁrst introduce the general class of models usedin network analysis called Markov Random Fields. Speciﬁcally, we will discuss the Markovrandom ﬁeld for binary data called the

Ising Model , which originated from statistical physicsbut has since been used in many ﬁelds of science. We will show how the Ising Model relatesto psychometrical practice, with a focus on the equivalence between the Ising Model andmultidimensional item response theory. We will demonstrate how the Ising model can beestimated and ﬁnally, we will discuss the conceptual implications of this equivalence.

Notation

Throughout this chapter we will denote random variables with capital letters and pos-sible realizations with lower case letters; vectors will be represented with bold-faced letters.For parameters, we will use boldfaced capital letters to indicate matrices instead of vectorswhereas for random variables we will use boldfaced capital letters to indicate a randomvector. Roman letters will be used to denote observable variables and parameters (such asthe number of nodes) and Greek letters will be used to denote unobservable variables andparameters that need to be estimated.In this chapter we will mainly model the random vector X : X > = h X X . . . X P i , containing P binary variables that take the values 1 (e.g., correct, true or yes) and − state , of X with x > = h x x . . . x p i . Let N be the number of observations and n ( xxx ) the numberof observations that have response pattern xxx . Furthermore, let i denote the subscript of arandom variable and j the subscript of a diﬀerent random variable ( j = i ). Thus, X i isthe i th random variable and x i its realization. The superscript − ( . . . ) will indicate thatelements are removed from a vector; for example, X − ( i ) indicates the random vector XXX

ETWORK PSYCHOMETRICS 3 X X X Figure 1 . Example of a PMRF of three nodes, X , X and X , connected by two edges,one between X and X and one between X and X .without X i : X − ( i ) = h X , . . . , X i − , X i +1 , . . . .X P i , and x − ( i ) indicates its realization. Sim-ilarly, X − ( i,j ) indicates XXX without X i and X j and x − ( i,j ) its realization. An overview of allnotations used in this chapter can be seen in Appendix B. Markov Random Fields

A network, also called a graph, can be encoded as a set G consisting of two sets: V ,which contains the nodes in the network, and E , which contains the edges that connect thesenodes. For example, the graph in Figure 1 contains three nodes: V = { , , } , which areconnected by two edges: E = { (1 , , (2 , } . We will use this type of network to representa pairwise Markov random ﬁeld (PMRF; Lauritzen 1996; Murphy 2012), in which nodesrepresent observed random variables and edges represent (conditional) association betweentwo nodes. More importantly, the absence of an edge represents the Markov property thattwo nodes are conditionally independent given all other nodes in the network: X i ⊥⊥ X j | X − ( i,j ) = x − ( i,j ) ⇐⇒ ( i, j ) E (1)Thus, a PMRF encodes the independence structure of the system of nodes. In the case ofFigure 1, X and X are independent given that we know X = x . This could be due toseveral reasons; there might be a causal path from X to X or vise versa, X might be thecommon cause of X and X , unobserved variables might cause the dependencies between X and X and X and X , or the edges in the network might indicate actual pairwiseinteractions between X and X and X and X .Of particular interest to psychometrics are models in which the presence of latentcommon causes induces associations among the observed variables. If such a commoncause model holds, we cannot condition on any observed variable to completely remove the Throughout this chapter, nodes in a network designate variables, hence the terms are used interchange-ably.

ETWORK PSYCHOMETRICS 4association between two nodes (Pearl, 2000). Thus, if an unobserved variable acts as acommon cause to some of the observed variables, we should ﬁnd a fully connected clique inthe PMRF that describes the associations among these nodes. The network in Figure 1, forexample, cannot represent associations between three nodes that are subject to the inﬂuenceof a latent common cause; if that were the case, it would be impossible to obtain conditionalindependence between X and X by conditioning on X . Parameterizing Markov Random Fields

A PMRF can be parameterized as a product of strictly positive potential functions φ ( x ) (Murphy, 2012): Pr ( XXX = xxx ) = 1 Z Y i φ i ( x i ) Y φ ij ( x i , x j ) , (2)in which Q i takes the product over all nodes, i = 1 , , . . . , P , Q takes the product overall distinct pairs of nodes i and j ( j > i ), and Z is a normalizing constant such that theprobability function sums to unity over all possible patterns of observations in the samplespace: Z = X xxx Y i φ i ( x i ) Y φ ij ( x i , x j ) . Here, P xxx takes the sum over all possible realizations of XXX . All φ ( x ) functions result inpositive real numbers, which encode the potentials : the preference for the relevant part of XXX to be in some state. The φ i ( x i ) functions encode the node potentials of the network;the preference of node X i to be in state x i , regardless of the state of the other nodes inthe network. Thus, φ i ( x i ) maps the potential for X i to take the value x i regardless of therest of the network. If φ i ( x i ) = 0, for instance, then X i will never take the value x i , while φ i ( x i ) = 1 indicates that there is no preference for X i to take any particular value and φ i ( x i ) = ∞ indicates that the system always prefers X i to take the value x i . The φ ij ( x i , x j )functions encode the pairwise potentials of the network; the preference of nodes X i and X j to both be in states x i and x j . As φ ij ( x i , x j ) grows higher we would expect to observe X j = x j whenever X i = x i . Note that the potential functions are not identiﬁed; we canmultiply both φ i ( x i ) or φ ij ( x i , x j ) with some constant for all possible outcomes of x i , inwhich case this constant becomes a constant multiplier to (2) and is cancelled out in thenormalizing constant Z . A typical identiﬁcation constraint on the potential functions is toset the marginal geometric means of all outcomes equal to 1; over all possible outcomes ofeach argument, the logarithm of each potential function should sum to 0: X x i ln φ i ( x i ) = X x i ln φ ij ( x i , x j ) = X x j ln φ ij ( x i , x j ) = 0 ∀ x i , x j (3)in which P x i denotes the sum over all possible realizations for X i , and P x j denotes thesum over all possible realizations of X j .We assume that every node has a potential function φ i ( x i ) and nodes only have a rel-evant pairwise potential function φ ij ( x i , x j ) when they are connected by an edge; thus, twoETWORK PSYCHOMETRICS 5unconnected nodes have a constant pairwise potential function which, due to identiﬁcationabove, is equal to 1 for all possible realizations of X i and X j : φ ij ( x i , x j ) = 1 ∀ x i , x j ⇐⇒ ( i, j ) E. (4)From Equation (2) it follows that the distribution of XXX marginalized over X k and X l , that is, the marginal distribution of XXX − ( k,l ) (the random vector XXX without elements X k and X l ), has the following form: Pr (cid:0) XXX − ( k,l ) = xxx − ( k,l ) (cid:1) = X x k ,x l Pr (

XXX = xxx )= 1 Z Y i k,l } φ i ( x i ) Y φ ij ( x i , x j ) (5) X x k ,x l  φ k ( x k ) φ l ( x l ) φ kl ( x k , x l ) Y i k,l } φ ik ( x i , x k ) φ il ( x i , x l )  , in which Q i k,l } takes the product over all nodes except node k and l and Q takes the product over all unique pairs of nodes that do not involve k and l . The expressionin (5) has two important consequences. First, (5) does not have the form of (2); a PMRFis not a PMRF under marginalization. Second, dividing (2) by (5) an expression can beobtained for the conditional distribution of { X k , X l } given that we know XXX − ( k,l ) = xxx − ( k,l ) :Pr (cid:16) X k , X l | XXX − ( k,l ) = xxx − ( k,l ) (cid:17) = Pr ( XXX = xxx )Pr (cid:0) XXX − ( k,l ) = xxx − ( k,l ) (cid:1) = φ ∗ k ( x k ) φ ∗ l ( x l ) φ kl ( x k , x l ) P x k ,x l φ ∗ k ( x k ) φ ∗ l ( x l ) φ kl ( x k , x l ) , (6)in which: φ ∗ k ( x k ) = φ k ( x k ) Y i k,l } φ ik ( x i , x k )and: φ ∗ l ( x l ) = φ l ( x l ) Y i k,l } φ il ( x i , x l ) . Now, (6) does have the same form as (2); a PMRF is a PMRF under conditioning. Further-more, if there is no edge between nodes k and l , φ kl ( x k , x l ) = 1 according to (4), in whichcase (6) reduces to a product of two independent functions of x k and x l which renders X k and X l independent; thus proving the Markov property in (1). The Ising Model

The node potential functions φ i ( x i ) can map a unique potential for every possiblerealization of X i and the pairwise potential functions φ ij ( x i , x j ) can likewise map uniquepotentials to every possible pair of outcomes for X i and X j . When the data are binary,only two realizations are possible for x i , while four realizations are possible for the pair x i and x j . Under the constraint that the log potential functions should sum to 0 over allmarginals, this means that in the binary case each potential function has one degree ofETWORK PSYCHOMETRICS 6freedom. If we let all X ’s take the values 1 and −

1, there exists a conveniently loglinearmodel representation for the potential functions:ln φ i ( x i ) = τ i x i ln φ ij ( x i , x j ) = ω ij x i x j . The parameters τ i and ω ij are real numbers. In the case that x i = 1 and x j = 1, it canbe seen that these parameters form an identity link with the logarithm of the potentialfunctions: τ i = ln φ i (1) ω ij = ln φ ij (1 , . These parameters are centered on 0 and have intuitive interpretations. The τ i parameterscan be interpreted as threshold parameters . If τ i = 0 the model does not prefer to be inone state or the other, and if τ i is higher (lower) the model prefers node X i to be in state1 (-1). The ω ij parameters are the network parameters and denote the pairwise interactionbetween nodes X i and X j ; if ω ij = 0 there is no edge between nodes X i and X j : ω ij ( = 0 if ( i, j ) E ∈ R if ( i, j ) ∈ E . (7)The higher (lower) ω ij becomes, the more nodes X i and X j prefer to be in the same (dif-ferent) state. Implementing these potential functions in (2) gives the following distributionfor XXX : Pr ( X = x ) = 1 Z exp X i τ i x i + X ω ij x i x j  (8) Z = X x exp X i τ i x i + X ω ij x i x j  , which is known as the Ising model (Ising, 1925).Table 1 Probability of all states from the network in Figure 1. x x x Potential Probability-1 -1 -1 3.6693 0.35141 -1 -1 1.1052 0.1058-1 1 -1 0.4066 0.03891 1 -1 0.9048 0.0866-1 -1 1 1.1052 0.10581 -1 1 0.3329 0.0319-1 1 1 0.9048 0.08661 1 1 2.0138 0.1928ETWORK PSYCHOMETRICS 7

NSSN (a)

SNSN (b)

NSNS (c)

SNNS (d)

Figure 2 . Example of the eﬀect of holding two magnets with a north and south pole closeto each other. The arrows indicate the direction the magnets want to move; the same poles,as in (b) and (c), repulse each other and opposite poles, as in (a) and (d), attract eachother.For example, consider the PMRF in Figure 1. In this network there are three nodes( X , X and X ), and two edges (between X and X , and between X and X ). Supposethese three nodes are binary, and take the values 1 and −

1. We can then model this PMRFas an Ising model with 3 threshold parameters, τ , τ and τ and two network parameters, ω and ω . Suppose we set all threshold parameters to τ = τ = τ = − .

1, whichindicates that all nodes have a general preference to be in the state −

1. Furthermore wecan set the two network parameters to ω = ω = 0 .

5. Thus, X and X prefer to be in thesame state, and X and X prefer to be in the same state as well. Due to these interactions, X and X become associated; these nodes also prefer to be in the same state, even thoughthey are independent once we condition on X . We can then compute the non-normalizedpotentials exp (cid:16)P i τ i x i + P ω ij x i x j (cid:17) for all possible outcomes of XXX and ﬁnally dividethat value by the sum over all non-normalized potentials to compute the probabilities ofeach possible outcome. For instance, for the state X = − , X = 1 and X = −

1, we cancompute the potential as exp ( − . . − . − . − . ≈ . Z ≈ . P ( X = − , X = − , X = −

1) is the highestprobable state in Table 1, due to the threshold parameters being all negative. Furthermore,the probability P ( X = 1 , X = 1 , X = 1) is the second highest probability in Table 1; ifone node is put into state 1 then all nodes prefer to be in that state due to the networkstructure.The Ising model was introduced in statistical physics, to explain the phenomenon ofmagnetism. To this end, the model was originally deﬁned on a ﬁeld of particles connected ona lattice. We will give a short introduction on this application in physics because it exempli-ﬁes an important aspect of the Ising model; namely, that the interactions between nodes canETWORK PSYCHOMETRICS 8 NSNSSNNS NSNSSNNS NSNSSNNS NSSNSNNS (a) X = = X = - X = = = X = - X = = X = X = - X = = X = - = - X = (b) Figure 3 . A ﬁeld of particles (a) can be repressented by a network shaped as a lattice as in(b). +1 indicates that the north pole is alligned upwards and − ferromagnetism .Exactly the same process causes the arrow of a compass to align with the magnetic ﬁeldof the Earth itself, causing it to point north. Any material that is ferromagnetic, such asa plate of iron, consists of particles that behave in the same way as magnets; they have anorth and south pole and lie in some direction. Suppose the particles can only lie in twodirections: the north pole can be up or the south pole can be up. Figure ?? shows a simple2-dimensional representation of a possible state for a ﬁeld of 4 × X i , which can take the values − X i being in state x i only depends on the direct neighbors (north, south east and west) of particle i . With thisassumption in place, the system in Figure ?? can be represented as a PMRF on a lattice,as represented in Figure ?? .A certain amount of energy is required for a system of particles to be in some state,such as in Figure 2. For example, in Figure ?? the node X is in the state − X and X are both in the same state and thus aligned, which reducesstress on the system and thus reduces the energy function. The other neighbors of X , X and X , are in the opposite state of X , and thus are not aligned, which increasing theETWORK PSYCHOMETRICS 9stress on the system. The total energy conﬁguration can be summarized in the Hamiltonian function: H ( x ) = − X i τ i x i − X ω ij x i x j , which is used in the Gibbs distribution (Murphy, 2012) to model the probability of XXX beingin some state xxx : Pr ( X = x ) = exp ( − βH ( x )) Z . (9)The parameter β indicates the inverse temperature of the system, which is not identiﬁablesince we can multiply β with some constant and divide all τ and ω parameters with thatsame constant to obtain the same probability. Thus, it can arbitrarily be set to β = 1.Furthermore, the minus signs in the Gibbs distribution and Hamiltonian cancel out, leadingto the Ising model as expressed in (8).The threshold parameters τ i indicate the natural deposition for particle i to pointup or down, which could be due to the inﬂuence of an external magnetic ﬁeld not part ofthe system of nodes in XXX . For example, suppose we model a single compass, there is onlyone node thus the Hamiltonian reduces to − τ x . Let X = 1 indicate the compass pointsnorth and X = − τ should be positive as thecompass has a natural tendency to point north due to the presence of the Earth’s magneticﬁeld. As such, the τ parameters are also called external ﬁelds. The network parameters ω ij indicate the interaction between two particles. Its sign indicates if particles i and j tend to be in the same state (positive; ferromagnetic) or in diﬀerent states (negative; anti-ferromagnetic). The absolute value, | ω ij | , indicates the strength of interaction. For any twonon-neighboring particles ω ij will be 0 and for neighboring particles the stronger ω ij thestronger the interaction between the two. Because the closer magnets, and thus particles,are moved together the stronger the magnetic force, we can interpret | ω ij | as a measure for closeness between two nodes.While the inverse temperature β is not identiﬁable in the sense of parameter esti-mation, it is an important element in the Ising model; in physics the temperature can bemanipulated whereas the ferromagnetic strength or distance between particles cannot. Theinverse temperature plays a crucial part in the entropy of (9) (Wainwright and Jordan,2008): Entropy ( XXX ) = E [ − ln Pr ( X = x )]= − β E (cid:20) − ln exp ( − H ( x )) Z ∗ (cid:21) , (10)in which Z ∗ is the rescaled normalizing constant without inverse temperature β . Theexpectation E h − ln exp( − H ( x )) Z ∗ i can be recognized as the entropy of the Ising model as deﬁnedin (8). Thus, the inverse temperature β directly scales the entropy of the Ising model. As β shrinks to 0, the system is “heated up” and all states become equally likely, causing ahigh level of entropy. If β is subsequently increased, then the probability function becomesconcentrated on a smaller number of states, and the entropy shrinks to eventually onlyallow the state in which all particles are aligned. The possibility that all particles becomealigned is called spontaneous magnetization (Lin, 1992; Kac, 1966); when all particles areETWORK PSYCHOMETRICS 10aligned (all X are either 1 or −

1) the entire ﬁeld of particles becomes magnetized, whichis how iron can be turned into a permanent magnet. We take this behavior as a particularimportant aspect of the Ising model; behavior on microscopic level (interactions betweenneighboring particles) can cause noticeable behavior on macroscopic level (the creation ofa permanent magnet).In our view, psychological variables may behave in the same way. For example,interactions between components of a system (e.g., symptoms of depression) can causesynchronized eﬀects of the system as a whole (e.g., depression as a disorder). Do note that,in setting up such analogies, we need to interpret the concepts of closeness and neighborhoodless literally than in the physical sense. Concepts such as “sleep deprivation” and “fatigue”can be said to be close to each other, in that they mutually inﬂuence each other; sleepdeprivation can lead to fatigue and in turn fatigue can lead to a disrupted sleeping rhythm.The neighborhood of these symptoms can then be deﬁned as the symptoms that frequentlyco-occur with sleep deprivation and fatigue, which can be seen in a network as a cluster ofconnected nodes. As in the Ising model, the state of these nodes will tend to be the sameif the connections between these nodes are positive. This leads to the interpretation thata latent trait, such as depression, can be seen as a cluster of connected nodes (Borsboomet al., 2011). In the next section, we will prove that there is a clear relationship betweennetwork modeling and latent variable modeling; indeed, clusters in a network can causedata to behave as if they were generated by a latent variable model.

The Ising Model in Psychometrics

In this section, we show that the Ising model is equivalent or closely related to promi-nent modeling techniques in psychometrics. We will ﬁrst discuss the relationship betweenthe Ising model and loglinear analysis and logistic regressions, next show that the Isingmodel can be equivalent to Item Response Theory (IRT) models that dominate psychomet-rics. In addition, we highlight relevant earlier work on the relationship between IRT andthe Ising model.To begin, we can gain further insight in the Ising model by looking at the conditionaldistribution of X i given that we know the value of the remaining nodes: X ( − i ) = x ( − i ) :Pr (cid:16) X i | X ( − i ) = x ( − i ) (cid:17) = Pr ( X = x )Pr (cid:16) X ( − i ) = x ( − i ) (cid:17) = Pr ( X = x ) P x i Pr (cid:16) X i = x i , X ( − i ) = x ( − i ) (cid:17) = exp (cid:16) x i (cid:16) τ i + P j ω ij x j (cid:17)(cid:17)P x i exp (cid:16) x i (cid:16) τ k + P j ω ij x j (cid:17)(cid:17) , (11)in which P x i takes the sum over both possible outcomes of x i . We can recognize thisexpression as a logistic regression model (Agresti, 1990). Thus, the Ising model can be seenas the joint distribution of response and predictor variables, where each variable is predictedby all other variables in the network. The Ising model therefore forms a predictive networkin which the neighbors of each node, the set of connected nodes, represent the variablesthat predict the outcome of the node of interest.ETWORK PSYCHOMETRICS 11Note that the deﬁnition of Markov random ﬁelds in (2) can be extended to includehigher order interaction terms:Pr ( XXX = xxx ) = 1 Z Y i φ i ( x i ) Y φ ij ( x i , x j ) Y φ ijk ( x i , x j , x k ) · · · , all the way up to the P -th order interaction term, in which case the model becomes sat-urated. Specifying ν ... ( . . . ) = ln φ ... ( . . . ) for all potential functions, we obtain a log-linearmodel: Pr (

XXX = xxx ) = 1 Z exp X i ν i ( x i ) + X ν ij ( x i , x j ) + X ν ijk ( x i , x j , x k ) · · · ! . Let n ( xxx ) be the number of respondents with response pattern xxx from a sample of N respondents. Then, we may model the expected frequency n ( xxx ) as follows: E [ n ( xxx )] = N Pr (

XXX = xxx )= exp ν + X i ν i ( x i ) + X ν ij ( x i , x j ) + X ν ijk ( x i , x j , x k ) · · · ! , (12) in which ν = ln N − ln Z . The model in (12) has extensively been used in loglinear analysis(Agresti, 1990; Wickens, 1989) . In loglinear analysis, the same constrains are typically usedas in (3); all ν functions should sum to 0 over all margins. Thus, if at most second-orderinteraction terms are included in the loglinear model, it is equivalent to the Ising modeland can be represented exactly as in (8). The Ising model, when represented as a loglinearmodel with at most second-order interactions, has been used in various ways. Agresti(1990) and Wickens (1989) call the model the homogeneous association model. Becauseit does not include three-way or higher order interactions, the association between X i and X j —the odds-ratio—is constant for any conﬁguration of XXX − ( i,j ) . Also, Cox (1972; Coxand Wermuth 1994) used the same model, but termed it the quadratic exponential binarydistribution , which has since often been used in biometrics and statistics (e.g., Fitzmauriceet al. 1993; Zhao and Prentice 1990). Interestingly, none of these authors mention the Isingmodel. The Relation Between the Ising Model and Item Response Theory

In this section we will show that the Ising model is a closely related modeling frame-work of Item Response Theory (IRT), which is of central importance to psychometrics. Infact, we will show that the Ising model is equivalent to a special case of the multivariate2-parameter logistic model (MIRT). However, instead of being hypothesized common causesof the item responses, in our representation the latent variables in the model are generated by cliques in the network.In IRT, the responses on a set of binary variables

XXX are assumed to be determinedby an set of M ( M ≤ P ) latent variables ΘΘΘ:ΘΘΘ > = h Θ Θ . . . Θ M i . both Agresti and Wickens used λ rather than ν to denote the log potentials, which we changed in thischapter to avoid confusion with eigenvalues and the LASSO tuning parameter. ETWORK PSYCHOMETRICS 12These latent variables are often denoted as abilities , which betrays the roots of the model ineducational testing. In IRT, the probability of obtaining a realization x i on the variable X i —often called items —is modeled through item response functions, which model the probabilityof obtaining one of the two possible responses (typically, scored 1 for correct responses and0 for incorrect responses) as a function of θθθ . For instance, in the Rasch (1960) model, alsocalled the one parameter logistic model (1PL), only one latent trait is assumed ( M = 1 andΘΘΘ = Θ) and the conditional probability of a response given the latent trait takes the formof a simple logistic function:Pr( X i = x i | Θ = θ ) = exp ( x i α ( θ − δ i )) P x i exp ( x i α ( θ − δ i )) , in which δ i acts as a diﬃculty parameter and α is a common discrimination parameter forall items. A typical generalization of the 1PL is the Birnbaum (1968) model, often called thetwo-parameter logistic model (2PL), in which the discrimination is allowed to vary betweenitems: Pr( X i = x i | Θ = θ ) = exp ( x i α i ( θ − δ i )) P x i exp ( x i α i ( θ − δ i )) . The 2PL reduces to the 1PL if all discrimination parameters are equal: α = α = . . . = α .Generalizing the 2PL model to more than 1 latent variable ( M >

1) leads to the 2PLmultidimensional IRT model (MIRT; Reckase 2009):Pr( X i = x i | ΘΘΘ = θθθ ) MIRT = exp (cid:16) x i (cid:16) ααα > i θθθ − δ i (cid:17)(cid:17)P x i exp (cid:0) x i (cid:0) ααα > i θθθ − δ i (cid:1)(cid:1) , (13)in which θθθ is a vector of length M that contains the realization of ΘΘΘ, while ααα i is a vectorof length M that contains the discrimination of item i on every latent trait in the multidi-mensional space. The MIRT model reduces to the 2PL model if ααα i equals zero in all butone of its elements.Because IRT assumes local independence—the items are independent of each otherafter conditioning on the latent traits—the joint conditional probability of XXX = xxx can bewritten as product of the conditional probabilities of each item:Pr( XXX = xxx | ΘΘΘ = θθθ ) = Y i Pr( X i = x i | ΘΘΘ = θθθ ) . (14)The marginal probability, and thus the likelihood, of the 2PL MIRT model can be obtainedby integrating over distribution f ( θθθ ) of ΘΘΘ:Pr( XXX = xxx ) = Z ∞−∞ f ( θθθ ) Pr( XXX = xxx | ΘΘΘ = θθθ ) d θθθ, (15)in which the integral is over all M latent variables. For typical distributions of ΘΘΘ, such asa multivariate Gaussian distribution, this likelihood does not have a closed form solution.Furthermore, as M grows it becomes hard to numerically approximate (15). However, if thedistribution of ΘΘΘ is chosen such that it is conditionally Gaussian—the posterior distributionof ΘΘΘ given that we observed XXX = xxx takes a Gaussian form—we can obtain a closed formETWORK PSYCHOMETRICS 13solution for (15). Furthermore, this closed form solution is, in fact, the Ising model aspresented in (8).As also shown by Marsman et al. (2015) and in more detail in Appendix A of thischapter, after reparameterizing τ i = − δ i and − q λ j / q ij = α ij , in which q ij is the i thelement of the j th eigenvector of ΩΩΩ (with an arbitrary diagonal chosen such that ΩΩΩ ispositive deﬁnite), the Ising model is equivalent to a MIRT model in which the posteriordistribution of the latent traits is equal to the product of univariate normal distributionswith equal variance: Θ j | X = x ∼ N ± X i a ij x i , r ! . The mean of these univariate posterior distributions for Θ j is equal to the weighted sumscore ± P i a ij x i . Finally, since f ( θθθ ) = X xxx f ( θθθ | XXX = xxx ) Pr( XXX = xxx ) , we can see that the marginal distribution of ΘΘΘ in (15) is a mixture of multivariate Gaussiandistributions with homogenous variance–covariance , with the mixing probability equal tothe marginal probability of observing each response pattern.Whenever α ij = 0 for all i and some dimension j —i.e., none of the items discriminateon the latent trait—we can see that the marginal distribution of Θ j becomes a Gaussiandistribution with mean 0 and standard-deviation p /

2. This corresponds to completerandomness; all states are equally probable given the latent trait. When discriminationparameters diverge from 0, the probability function becomes concentrated on particularresponse patterns. For example, in case X designates the response variable for a very easyitem, while X is the response variable for a very hard item, the state in which the ﬁrst itemis answered correctly and the second incorrectly becomes less likely. This corresponds to adecrease in entropy and, as can be seen in (10), is related to the temperature of the system.The lower the temperature, the more the system prefers to be in states in which all itemsare answered correctly or incorrectly. When this happens, the distribution of Θ j divergesfrom a Gaussian distribution and becomes a bi-modal distribution with two peaks, centeredon the weighted sumscores that correspond to situations in which all items are answeredcorrectly or incorrectly. If the entropy is relatively high, f (Θ j ) can be well approximated bya Gaussian distribution, whereas if the entropy is (extremely) low a mixture of two Gaussiandistributions best approximates f (Θ j ).For example, consider again the network structure of Figure 1. When we parameter-ized all threshold functions τ = τ = τ = − . ω = ω = 0 .  . . .

50 0 .  , which is not positive semi-deﬁnite. Subtracting the lowest eigenvalue, − .  .

707 0 . . .

707 0 .

50 0 . .  . It’s eigenvalue decomposition is as follows:

QQQ =  .

500 0 .

707 0 . .

707 0 . − . . − .

707 0 .  λλλ = h .

414 0 .

707 0 . i . Using the transformations τ i = − δ i and − q λ j / q ij = α ij (arbitrarily using the negativeroot) deﬁned above we can then form the equivalent MIRT model with discriminationparameters AAA and diﬃculty parameters δδδ : δδδ = h . . . i AAA =  .

841 0 .

841 01 .

189 0 00 . − .

841 0  . Thus, the model in Figure 1 is equivalent to a model with two latent traits: one deﬁningthe general coherence between all three nodes and one deﬁning the contrast between theﬁrst and the third node. The distributions of all three latent traits can be seen in Figure 4.In Table 1, we see that the probability is the highest for the two states in which all threenodes take the same value. This is reﬂected in the distribution of the ﬁrst latent trait in ?? : because all discrimination parameters relating to this trait are positive, the weightedsumscores of X = X = X = − X = X = X = 1 are dominant and causea small bimodality in the distribution. For the second trait, ?? shows an approximatelynormal distribution, because this trait acts as a contrast and cancels out the preference forall variables to be in the same state. Finally, the third latent trait is nonexistent, sinceall of its discrimination parameters equal 0; ?? simply shows a Gaussian distribution withstandard deviation q .This proof serves to demonstrate that the Ising model is equivalent to a MIRT modelwith a posterior Gaussian distribution on the latent traits; the discrimination parametercolumn vector α j α j α j —the item discrimination parameters on the j th dimension—is directlyrelated to the j th eigenvector of the Ising model graph structure ΩΩΩ, scaled by its j th eigen-vector. Thus, the latent dimensions are orthogonal, and the rank of ΩΩΩ directly correspondsto the number of latent dimensions. In the case of a Rasch model, the rank of ΩΩΩ should be 1and all ω ij should have exactly the same value, corresponding to the common discriminationparameter; for the uni-dimensional Birnbaum model the rank of ΩΩΩ still is 1 but now the ω ij parameters can vary between items, corresponding to diﬀerences in item discrimination.The use of a posterior Gaussian distribution to obtain a closed form solution for (15)is itself not new in the psychometric literature, although it has not previously been linkedETWORK PSYCHOMETRICS 15 −2 0 2 q (a) −2 0 2 q (b) −2 0 2 q (c) Figure 4 . The distributions of the three latent traits in the equivalent MIRT model to theIsing model from Figure ?? to the Ising model and the literature related to it. Olkin and Tate (1961) already pro-posed to model binary variables jointly with conditional Gaussian distributed continuousvariables. Furthermore, Holland (1990) used the “Dutch identity” to show that a represen-tation equivalent to an Ising model could be used to characterize the marginal distributionof an extended Rasch model (Cressie and Holland, 1983). Based on these results, Andersonand colleagues proposed an IRT modeling framework using log-multiplicative associationmodels and assuming conditional Gaussian latents (Anderson and Vermunt, 2000; Andersonand Yu, 2007); this approach has been implemented in the R package “plRasch” (Andersonet al., 2007; Li and Hong, 2014).With our proof we furthermore show that the clique factorization of the networkstructure generated a latent trait with a functional distribution through a mathematicaltrick. Thus, the network perspective and common cause perspectives could be interpreted astwo diﬀerent explanations of the same phenomena: cliques of correlated observed variables.In the next section, we show how the Ising model can be estimated. Estimating the Ising Model

We can use (8) to obtain the log-likelihood function of a realization xxx : L ( τττ , ΩΩΩ; xxx ) = ln Pr (

XXX = xxx ) = X i τ i x i + X ω ij x i x j − ln Z. (16)Note that the constant Z is only constant with regard to xxx (as it sums over all possiblerealizations) and is not a constant with regard to the τ and ω parameters; Z is often calledthe partition function because it is a function of the parameters. Thus, while when samplingfrom the Ising distribution Z does not need to be evaluated, but it does need to be evaluatedwhen maximizing the likelihood function. Estimating the Ising model is notoriously hardbecause the partition function Z is often not tractable to compute (Kolaczyk, 2009). Ascan be seen in (8), Z requires a sum over all possible conﬁgurations of xxx ; computing Z requires summing over 2 k terms, which quickly becomes intractably large as k grows. Thus,maximum likelihood estimation of the Ising model is only possible for trivially small datasets (e.g., k < Z (Sebastiani and Sørbye, 2002; Green and Richardson, 2002; DrydenETWORK PSYCHOMETRICS 16et al., 2003) or circumventing Z entirely via sampling auxiliary variables (Møller et al.,2006; Murray, 2007; Murray et al., 2006). Such sampling algorithms can however still becomputationally costly.Because the Ising model is equivalent to the homogeneous association model in log-linear analysis (Agresti, 1990), the methods used in log-linear analysis can also be used toestimate the Ising model. For example, the iterative proportional ﬁtting algorithm (Haber-man, 1972), which is implemented in the loglin function in the statistical programminglanguage R (R Core Team, 2016), can be used to estimate the parameters of the Isingmodel. Furthermore, log-linear analysis can be used for model selection in the Ising modelby setting certain parameters to zero. Alternatively, while the full likelihood in (8) is hard tocompute, the conditional likelihood for each node in (11) is very easy and does not includean intractable normalizing constant; the conditional likelihood for each node correspondsto a multiple logistic regression (Agresti, 1990): L i ( τττ , ΩΩΩ; xxx ) = x i  τ i + X j ω ij x j  − X x i exp  x i  τ i + X j ω ij x j  . Here, the subscript i indicates that the likelihood function is based on the conditionalprobability for node i given the other nodes. Instead of optimizing the full likelihood of(8), the pseudolikelihood (PL; Besag 1975) can be optimized instead. The pseudolikelihoodapproximates the likelihood with the product of univariate conditional likelihoods in (11):ln PL = k X i =1 L i ( τττ , ΩΩΩ; xxx )Finally, disjoint pseudolikelihood estimation can be used. In this approach, each condi-tional likelihood is optimized separately (Liu and Ihler, 2012). This routine correspondsto repeatedly performing a multiple logistic regression in which one node is the responsevariable and all other nodes are the predictors; by predicting x i from xxx ( − i ) estimates can beobtained for ωωω i and τ i . After estimating a multiple logistic regression for each node on allremaining nodes, a single estimate is obtained for every τ i and two estimates are obtainedfor every ω ij –the latter can be averaged to obtain an estimate of the relevant network pa-rameter. Many statistical programs, such as the R function glm , can be used to performlogistic regressions. Estimation of the Ising model via log-linear modeling, maximal pseu-dolikelihood, and repeated multiple logistic regressions and have been implemented in the EstimateIsing function in the R package IsingSampler (Epskamp, 2014b).While the above-mentioned methods of estimating the Ising model are tractable,they all require a considerable amount of data to obtain reliable estimates. For example,in log-linear analysis, cells in the 2 P contingency table that are zero—which will occuroften if N < P —can cause parameter estimates to grow to ∞ (Agresti, 1990), and inlogistic regression predictors with low variance (e.g., a very hard item) can substantivelyincrease standard errors (Whittaker, 1990). To estimate the Ising model, P thresholds and P ( P − / ‘ regularization–commonly known asthe least absolute shrinkage and selection operator (LASSO; Tibshirani 1996)–in which thesum of absolute parameter values is penalized to be under some value. Ravikumar et al.(2010) employed ‘ -regularized logistic regression to estimate the structure of the Isingmodel via disjoint maximum pseudolikelihood estimation. For each node i the followingexpression is maximized (Friedman et al., 2010):max τ i ,ωωω i [ L i ( τττ , ΩΩΩ; xxx ) − λ Pen ( ωωω i )] (17)Where ωωω i is the i th row (or column due to symmetry) of ΩΩΩ and Pen ( ωωω i ) denotes the penaltyfunction, which is deﬁned in the LASSO as follows:Pen ‘ ( ωωω i ) = || ωωω i || = k X j =1 ,j != i | ω ij | The λ in (17) is the regularization tuning parameter. The problem in above is equivalentto the constrained optimization problem:max τ i ,ωωω i [ L i ( τττ , ΩΩΩ; xxx )] , subject to || ωωω i || < C in which C is a constant that has a one-to-one monotone decreasing relationship with λ (Lee et al., 2006). If λ = 0, C will equal the sum of absolute values of the maximumlikelihood solution; increasing λ will cause C to be smaller, which forces the estimates of ωωω i to shrink. Because the penalization uses absolute values, this causes parameter estimatesto shrink to exactly zero. Thus, in moderately high values for λ a sparse solution to thelogistic regression problem is obtained in which many coeﬃcients equal zero; the LASSOresults in simple predictive models including only a few predictors.Ravikumar et al. (2010) used LASSO to estimate the neighborhood—the connectednodes—of each node, resulting in an unweighted graph structure. In this approach, an edgeis selected in the model if either ω ij and ω ji is nonzero (the OR-rule) or if both are nonzero(the AND-rule). To obtain estimates for the weights ω ij and ω ji can again be averaged. The λ parameter is typically speciﬁed such that an optimal solution is obtained, which is com-monly done through cross-validation or, more recently, by optimizing the extended Bayesianinformation criterion (EBIC; Chen and Chen 2008; Foygel and Drton 2010; Foygel Barberand Drton 2015; van Borkulo et al. 2014).In K -fold cross-validation, the data are subdivided in K (usually K = 10) blocks.For each of these blocks a model is ﬁtted using only the remaining K − λ values, the predictive accuracy of this model can be computed, andsubsequently the λ under which the data were best predicted is chosen. If the sample sizeis relatively low, the predictive accuracy is typically much better for λ > λ = 0; it is preferred to regularize to avoid over-ﬁtting.ETWORK PSYCHOMETRICS 18Alternatively, an information criterion can be used to directly penalize the likelihoodfor the number of parameters. The EBIC (Chen and Chen, 2008) augments the Bayesianinformation Criterion (BIC) with a hyperparameter γ to additionally penalize the largespace of possible models (networks):EBIC = − L i ( τττ , ΩΩΩ; xxx ) + | ωωω i | ln ( N ) + 2 γ | ωωω i | ln ( k − | ωωω i | is the number of nonzero parameters in ωωω i . Setting γ = 0 .

25 works well forthe Ising model (Foygel Barber and Drton, 2015). An optimal λ can be chosen either forthe entire Ising model, which improves parameter estimation, or for each node separately indisjoint pseudolkelihood estimation, which improves neighborhood selection. While K -foldcross-validation does not require the computation of the intractable likelihood function,EBIC does. Thus, when using EBIC estimation λ need be chosen per node. We haveimplemented ‘ -regularized disjoint pseudolikelihood estimation of the Ising model usingEBIC to select a tuning parameter per node in the R package IsingFit (van Borkulo andEpskamp, 2014; van Borkulo et al., 2014), which uses glmnet for optimization (Friedmanet al., 2010).The LASSO works well in estimating sparse network structures for the Ising modeland can be used in combination with cross-validation or an information criterion to arriveat an interpretable model. However, it does so under the assumption that the true modelin the population is sparse. So what if reality is not sparse, and we would not expect manymissing edges in the network? As discussed earlier in this chapter, the absence of edgesindicate conditional independence between nodes; if all nodes are caused by an unobservedcause we would not expect missing edges in the network but rather a low-rank networkstructure. In such cases, ‘ regularization—also called ridge regression—can be used whichuses a quadratic penalty function:Pen ‘ ( ωωω i ) = || ωωω i || = k X j =1 ,j != i ω ij With this penalty parameters will not shrink to exactly zero but more or less smooth out;when two predictors are highly correlated the LASSO might pick only one where ridgeregression will average out the eﬀect of both predictors. Zou and Hastie (2005) proposed acompromise between both penalty functions in the elastic net , which uses another tuningparameter, α , to mix between ‘ and ‘ regularization:Pen ElasticNet ( ωωω i ) = k X j =1 ,j != i

12 (1 − α ) ω ij + α | ω ij | If α = 1, the elastic net reduces to the LASSO penalty, and if α = 0 the elastic netreduces to the ridge penalty. When α > λ and α . Since moving towards ‘ regularization reducessparsity, selection of the tuning parameters using EBIC is less suited in the elastic net.Crossvalidation, however, is still capable of sketching the predictive accuracy for diﬀerentvalues of both α and λ . Again, the R package glmnet (Friedman et al., 2010) can be used forestimating parameters using the elastic net. We have implemented a procedure to computeETWORK PSYCHOMETRICS 19the Ising model for a range of λ and α values and obtain the predictive accuracy in the R package elasticIsing (Epskamp, 2014a).One issue that is currently debated is inference of regularized parameters. Since thedistribution of LASSO parameters is not well-behaved (Bühlmann and van de Geer, 2011;Bühlmann, 2013), Meinshausen et al. (2009) developed the idea of using repeated samplesplitting, where in the ﬁrst sample the sparse set of variables are selected, followed bymultiple comparison corrected p -values in the second sample. Another interesting idea isto remove the bias introduced by regularization, upon which ‘standard’ procedures can beused (van de Geer et al., 2013). As a result the asymptotic distribution of the so-calledde-sparsiﬁed LASSO parameters is normal with the true parameter as mean and eﬃcientvariance (i.e., achieves the Cramér-Rao bound).. Standard techniques are then applied andeven conﬁdence intervals with good coverage are obtained. The limitations here are (i) thesparsity level, which has to be ≤ p n/ ln( P ), and (ii) the ’beta-min’ assumption, whichimposes a lower bound on the value of the smallest obtainable coeﬃcient (Bühlmann andvan de Geer, 2011).Finally, we can use the equivalence between MIRT and the Ising model to estimatea low-rank approximation of the Ising Model. MIRT software, such as the R package mirt (Chalmers, 2012), can be used for this purpose. More recently, Marsman et al. (2015) haveused the equivalence also presented in this chapter as a method for estimating low-rankIsing model using Full-data-information estimation. A good approximation of the Isingmodel can be obtained if the true Ising model is indeed low-rank, which can be checked bylooking at the eigenvalue decomposition of the elastic Net approximation or by sequentiallyestimating the ﬁrst eigenvectors through adding more latent factors in the MIRT analysisor estimating sequentially higher rank networks using the methodology of Marsman et al.(2015). Example Analysis

To illustrate the methods described in this chapter we simulated two datasets, bothwith 500 measurements on 10 dichotomous scored items. The ﬁrst dataset, dataset A, wassimulated according to a multidimensional Rasch model, in which the ﬁrst ﬁve items aredetermined by the ﬁrst factor and the last ﬁve items by the second factor. Factor levelswhere sampled from a multivariate normal distribution with unit variance and a correlationof 0 .

5, while item diﬃculties where sampled from a standard normal distribution. Thesecond dataset, dataset B, was sampled from a sparse network structure according to aBoltzmann Machine. A scale-free network was simulated using the Barabasi game algorithm(Barabási and Albert, 1999) in the R package igraph (Csardi and Nepusz, 2006) and arandom connection probability of 5%. The edge weights where subsequently sampled froma uniform distribution between 0 .

75 and 1 (in line with the conception that most itemsin psychometrics relate positively with each other) and thresholds where sampled froma uniform distribution between − −

1. To simulate the responses the R package IsingSampler was used. The datasets where analyzed using the elasticIsing package in R (Epskamp, 2014a); 10-fold cross-validation was used to estimate the predictive accuracy oftuning parameters λ and α on a grid of 100 logarithmically spaced λ values between 0 . α values equally spaced between 0 and 1.ETWORK PSYCHOMETRICS 20 La m bda A l pha A cc u r a cy (a) La m bda A l pha A cc u r a cy (b)(c) (d) l l l l l l l l l l l l l l l l l l l l Component E i gen v a l ue (e) l l l l l l l l l l l l l l l l l l l l Component E i gen v a l ue (f) Figure 5 . Analysis results of two simulated datasets; left panels show results based ona dataset simulated according to a 2-factor MIRT Model, while right panels show resultsbased on a dataset simulated with a sparse scale-free network. Panels (a) and (b) show thepredictive accuracy under diﬀerent elastic net tuning parameters λ and α , panels (c) and(d) the estimated optimal graph structures and panels (e) and (f) the eigenvalues of thesegraphs.ETWORK PSYCHOMETRICS 21Figure 5 shows the results of the analyses. The left panels show the results for datasetA and the right panel shows the result for dataset B. The top panels show the negativemean squared prediction error for diﬀerent values of λ and α . In both datasets, regularizedmodels perform better than unregularized models. The plateaus on the right of the graphsshow the performance of the independence graph in which all network parameters are setto zero. Dataset A obtained a maximum accuracy at α = 0 and λ = 0 . ‘ -regularization is preferred over ‘ regularization, which is to be expected since thedata were simulated under a model in which none of the edge weights should equal zero.In dataset B a maximum was obtained at α = 0 .

960 and λ = 0 . ‘ is preferred. The middle panels show visualizationsof the obtained best performing networks made with the qgraph package (Epskamp et al.,2012); green edges represent positive weights, red edges negative weights and the widerand more saturated an edge the stronger the absolute weight. It can be seen that datasetA portrays two clusters while Dataset B portrays a sparse structure. Finally, the bottompanels show the eigenvalues of both graphs; Dataset A clearly indicates two dominantcomponents whereas Dataset B does not indicate any dominant component.These results show that the estimation techniques perform adequately, as expected.As discussed earlier in this chapter, the eigenvalue decomposition directly corresponds tothe number of latent variables present if the common cause model is true, as is the case indataset A. Furthermore, if the common cause model is true the resulting graph should notbe sparse but low rank, as is the case in the results on dataset A. The Interpretation of Latent Variables in Psychometric Models

Since Spearman’s (1904) conception of general intelligence as the common determi-nant of observed diﬀerences in cognitive test scores, latent variables have played a centralrole in psychometric models. The theoretical status of the latent variable in psychometricmodels has been controversial and the topic of heated debates in various subﬁelds of psychol-ogy, like those concerned with the study of intelligence (e.g., Jensen 1998) and personality(McCrae and Costa, 2008). The pivotal issue in these debates is whether latent variablesposited in statistical models have referents outside of the model; that is, the central questionis whether latent variables like g in intelligence or “extraversion” in personality research re-fer to a property of individuals that exists independently of the model ﬁtting exercise of theresearcher (Borsboom et al., 2003; Van Der Maas et al., 2006; Cramer et al., 2010). If theydo have such independent existence, then the model formulation appears to dictate a causalrelation between latent and observed variables, in which the former cause the latter; afterall, the latent variable has all the formal properties of a common cause because it screens oﬀthe correlation between the item responses (a property denoted local independence in thepsychometric literature; Borsboom 2005; Reichenbach 1991). The condition of vanishingtetrads , that Spearman (1904) introduced as a model test for the veracity of the commonfactor model is currently seen as one of the hallmark conditions of the common cause model(Bollen and Lennox, 1991).This would suggest that the latent variable model is intimately intertwined with aso-called reﬂective measurement model interpretation (Edwards and Bagozzi, 2000; Howellet al., 2007), also known as an eﬀect indicators model (Bollen and Lennox, 1991) in which themeasured attribute is represented as the cause of the test scores. This conceptualizationETWORK PSYCHOMETRICS 22is in keeping with causal accounts of measurement and validity (Borsboom et al., 2003;Markus and Borsboom, 2013) and indeed seems to ﬁt the intuition of researchers in ﬁeldswhere psychometric models dominate, like personality. For example, McCrae and Costa(2008) note that they assume that extraversion causes party-going behavior, and as suchthis trait determines the answer to the question “do you often go to parties” in a causalfashion. Jensen (1998) oﬀers similar ideas on the relation between intelligence and the g -factor. Also, in clinical psychology, Reise and Waller (2009, p. 26) note that “to modelitem responses to a clinical instrument [with IRT], a researcher must ﬁrst assume that theitem covariation is caused by a continuous latent variable”.However, not all researchers are convinced that a causal interpretation of the rela-tion between latent and observed variable makes sense. For instance, McDonald (2003)notes that the interpretation is somewhat vacuous as long as no substantive theoreticalof empirical identiﬁcation of the latent variable can be given; a similar point is made byBorsboom and Cramer (2013). That is, as long as the sole evidence for the existence of alatent variable lies in the structure of the data to which it is ﬁtted, the latent variable ap-pears to have a merely statistical meaning and to grant such a statistical entity substantivemeaning appears to be tantamount to overinterpreting the model. Thus, the common causeinterpretation of latent variables at best enjoys mixed support.A second interpretation of latent variables that has been put forward in the literatureis one in which latent variables do not ﬁgure as common causes of the item responses, but asso-called behavior domains. Behavior domains are sets of behaviors relevant to substantiveconcepts like intelligence, extraversion, or cognitive ability (Mulaik and McDonald, 1978;McDonald, 2003). For instance, one can think of the behavior domain of addition as beingdeﬁned through the set of all test items of the form x + y = . . . . The actual items in a test areconsidered to be a sample from that domain. A latent variable can then be conceptualizedas a so-called tail-measure deﬁned on the behavior domain (Ellis and Junker, 1997). Onecan intuitively think of this as the total test score of a person on the inﬁnite set of itemsincluded in the behavior domain. Ellis and Junker (1997) have shown that, if the itemresponses included in the domain satisfy the properties of monotonicity, positive association,and vanishing conditional independence, the latent variable can indeed be deﬁned as a tailmeasure. The relation between the item responses and the latent variable is, in this case,not sensibly construed as causal, because the item responses are a part of the behaviordomain; this violates the requirement, made in virtually all theories of causality, that causeand eﬀect should be separate entities (Markus and Borsboom, 2013). Rather, the relationbetween item responses and latent variable is conceptualized as a sampling relation, whichmeans the inference from indicators to latent variable is not a species of causal inference,but of statistical generalization.Although in some contexts the behavior domain interpretation does seem plausible, ithas several theoretical shortcomings of its own. Most importantly, the model interpretationappears to beg the important explanatory question of why we observe statistical associationsbetween item responses. For instance, Ellis and Junker (1997) manifest conditions specifythat the items included in a behavior domain should look exactly as if they were generatedby a common cause; in essence, the only sets of items that would qualify as behavior domainsare inﬁnite sets of items that would ﬁt a unidimensional IRT model perfectly. The questionof why such sets would ﬁt a unidimensional model is thus left open in this interpretation.ETWORK PSYCHOMETRICS 23A second problem is that the model speciﬁes inﬁnite behavior domains (measures on ﬁnitedomains cannot be interpreted as latent variables because the axioms of Ellis and Junkerwill not be not satisﬁed in this case). In many applications, however, it is quite hard tocome up with more than a few dozen of items before one starts repeating oneself (e.g.,think of psychopathology symptoms or attitude items), and if one does come up with largersets of items the unidimensionality requirement is typically violated. Even in applicationsthat would seem to naturally suit the behavior domain interpretation, like the additionability example given earlier, this is no trivial issue. Thus, the very property that buys thebehavior domain interpretation its theoretical force (i.e., the construction of latent variablesas tail measures on an inﬁnite set of items that satisﬁes a unidimensional IRT model) is itssubstantive Achilles’ heel.Thus, the common cause interpretation of the latent variable model seems too makeassumptions about the causal background of test scores that appear overly ambitious giventhe current scientiﬁc understanding of test scores. The behavior domain interpretation ismuch less demanding, but appears to be of limited use in situations where only a limitednumber of items is of interest and in addition oﬀers no explanatory guidance with respectto answering the question why items hang together as they do. The network model mayoﬀer a way out of this theoretical conundrum because it speciﬁes a third way of lookingat latent variables, as explained in this chapter. As Van Der Maas et al. (2006) showed,data generated under a network model could explain the positive manifold often found inintelligence research which is often described as the g factor or general intelligence; a g factor emerged from a densely connected network even though it was not “real”. This ideasuggests the interpretation of latent variables as functions deﬁned as cliques in a networkof interacting components (Borsboom et al., 2011; Cramer et al., 2010, 2012). As we haveshown in this chapter, this relation between networks and latent variables is quite general:given simple models of the interaction between variables, as encoded in the Ising model, oneexpects data that conform to psychometric models with latent variables. The theoreticalimportance of this result is that (a) it allows for a model interpretation that invokes nocommon cause of the item responses as in the reﬂective model interpretation, but (b) doesnot require assumptions about inﬁnite behavior domains either.Thus, network approaches can oﬀer a theoretical middle ground between causal andsampling interpretations of psychometric models. In a network, there clearly is nothing thatcorresponds to a causally eﬀective latent variable, as posited in the reﬂective measurementmodel interpretation (Bollen and Lennox, 1991; Edwards and Bagozzi, 2000). The networkmodel thus evades the problematic assignment of causal force to latent variables like theg-factor and extraversion. These arise out of the network structure as epiphenomena; totreat them as causes of item responses involves an unjustiﬁed reiﬁcation. On the other hand,however, the latent variable model as it arises out of a network structure does not requirethe antecedent identiﬁcation of an inﬁnite set of response behaviors as hypothesized to existin behavior domain theory. Networks are typically ﬁnite structures that involve a limitednumber of nodes engaged in a limited number of interactions. Each clique in the networkstructure will generate one latent variable with entirely transparent theoretical propertiesand an analytically tractable distribution function. Of course, for a full interpretation ofthe Ising model analogous to that in physics, one has to be prepared to assume that theconnections between nodes in the network signify actual interactions (i.e., they are notETWORK PSYCHOMETRICS 24merely correlations); that is, connections between nodes are explicitly not spurious as theyare in the reﬂective latent variable model, in which the causal eﬀect of the latent variableproduces the correlations between item responses. But if this assumption is granted, thetheoretical status of the ensuing latent variable is transparent and may in many contextsbe less problematic than the current conceptions in terms of reﬂective measurement modelsand behavior domains are.Naturally, even though the Ising and IRT models have statistically equivalent repre-sentations, the interpretations of the model in terms of common causes and networks arenot equivalent. That is, there is a substantial diﬀerence between the causal implications of areﬂective latent variable model and of an Ising model. However, because for a given datasetthe models are equivalent, distinguishing network models from common cause models re-quires the addition of (quasi-) experimental designs into the model. For example, supposethat in reality an Ising model holds for a set of variables; say we consider the depressionsymptoms “insomnia” and “feelings of worthlessness”. The model implies that, if we wereto causally intervene on the system by reducing or increasing insomnia, a change in feelingsof worthlessness should ensue. In the latent variable model, in which the association be-tween feelings of worthlessness and insomnia is entirely due to the common inﬂuence of alatent variable, an experimental intervention that changes insomnia will not be propagatedthrough the system. In this case, the intervention variable will be associated only withinsomnia, which means that the items will turn out to violate measurement invariance withrespect to the intervention variable (Mellenbergh, 1989; Meredith, 1993). Thus, interven-tions on individual nodes in the system can propagate to other nodes in a network model,but not in a latent variable model. This is a testable implication in cases where one hasexperimental interventions that plausibly target a single node in the system. Fried et al.(2014) have identiﬁed a number of factors in depression that appear to work in this way.Note that a similar argument does not necessarily work with variables that are causalconsequences of the observed variables. Both in a latent variable model and in a networkmodel, individual observed variables might have distinct outgoing eﬀects, i.e., aﬀect uniquesets of external variables. Thus, insomnia may directly cause bags under the eyes, whilefeelings of worthlessness do not, without violating assumptions of either model. In thenetwork model, this is because the outgoing eﬀects of nodes do not play a role in the networkif they do not feed back into the nodes that form the network. In the reﬂective model,this is because the model only speaks on the question of where the systematic variance inindicator variables comes from (i.e., this is produced by a latent variable), but not on whatthat systematic variance causes. As an example, one may measure the temperature of waterby either putting a thermometer into the water, or by testing whether one can boil an eggin it. Both the thermometer reading and the boiled egg are plausibly construed as eﬀects ofthe temperature in the water (the common cause latent variable in the system). However,only the boiled egg has the outgoing eﬀect of satisfying one’s appetite.In addition to experimental interventions on the elements of the system, a networkmodel rather than a latent variable model allows one to deduce what would happen uponchanging the connectivity of the system. In a reﬂective latent variable model, the associa-tions between variables are a function of the eﬀect of the latent variable and the amount ofnoise present in the individual variables. Thus, the only ways to change the correlation be-tween items is by changing the eﬀect of the latent variable (e.g., by restricting the varianceETWORK PSYCHOMETRICS 25in the latent variable so as to produce restriction of range eﬀects in the observables) or byincreasing noise in the observed variables (e.g., by increasing variability in the conditionsunder which the measurements are taken). Thus, in a standard reﬂective latent variablemodel, the connection between observed variables is purely a correlation, and one can onlychange it indirectly through the variable that have proper causal roles in the system (i.e.,latent variables and error variables).However, in a network model, the associations between observed variables are notspurious; they are real, causally potent pathways, and thus externally forced changes inconnection strengths can be envisioned. Such changes will aﬀect the behavior of the systemin a way that can be predicted from the model structure. For example, it is well knownthat increasing the connectivity of an Ising model can change its behavior from being linear(in which the total number of active nodes grows proportionally to the strength of externalperturbations of the system) to being highly nonlinear. Under a situation of high connec-tivity, an Ising network features tipping points: in this situation, very small perturbationscan have catastrophic eﬀects. To give an example, a weakly connected network of depres-sion symptoms could only be made depressed by strong external eﬀects (e.g., the death ofa spouse), whereas a strongly connected network could tumble into a depression throughsmall perturbations (e.g., an annoying phone call from one’s mother in law). Such a vul-nerable network will also feature very speciﬁc behavior; for instance, when the network isapproaching a transition, it will send out early warning signals like increased autocorrela-tion in a time series (Scheﬀer et al., 2009). Recent investigations suggest that such signalsare indeed present in time series of individuals close to a transition (van de Leemput et al.,2014). Latent variable models have no such consequences.Thus, there are at least three ways in which network models and reﬂective latentvariable models can be distinguished: through experimental manipulations of individualnodes, through experimental manipulations of connections in the network, and throughinvestigation of the behavior of systems under highly frequent measurements that allowone to study the dynamics of the system in time series. Of course, a ﬁnal and directrefutation of the network model would occur if one could empirically identify a latentvariable (e.g., if one could show that the latent variable in a model for depression items wasin fact identical with a property of the system that could be independently identiﬁed; say,serotonin shortage in the brain). However, such identiﬁcations of abstract psychometriclatent variables with empirically identiﬁable common causes do not appear forthcoming.Arguably, then, psychometrics may do better to bet on network explanations of associationpatterns between psychometric variables than to hope for the empirical identiﬁcation oflatent common causes. Conclusion

The correspondence between the Ising model and the MIRT model oﬀers novel inter-pretations of long standing psychometric models, but also opens a gateway through whichthe psychometric can be connected to the physics literature. Although we have only begunto explore the possibilities that this connection may oﬀer, the results are surprising and,in our view, oﬀer a fresh look on the problems and challenges of psychometrics. In thecurrent chapter, we have illustrated how network models could be useful in the conceptu-alization of psychometric data. The bridge between network models and latent variablesETWORK PSYCHOMETRICS 26oﬀers research opportunities that range from model estimation to the philosophical analysisof measurement in psychology, and may very well alter our view of the foundations on whichpsychometric models should be built.As we have shown, network models may yield probability distributions that are exactlyequivalent to this of IRT models. This means that latent variables can receive a novelinterpretation: in addition to an interpretation of latent variables as common causes of theitem responses (Bollen & Lennox, 1991; Edwards & Bagozzi, 2000), or as behavior domainsfrom which the responses are a sample (Ellis and Junker, 1997; McDonald, 2003), we can nowalso conceive of latent variables as mathematical abstractions that are deﬁned on cliquesof variables in a network. The extension of psychometric work to network modeling ﬁtscurrent developments in substantive psychology, in which network models have often beenmotivated by critiques of the latent variable paradigm. This has for instance happened inthe context of intelligence research (Van Der Maas et al., 2006), clinical psychology (Crameret al., 2010; Borsboom and Cramer, 2013), and personality (Cramer et al., 2012; Costantiniet al., 2015). It should be noted that, in view of the equivalence between latent variablemodels and network models proven here, even though these critiques may impinge on thecommon cause interpretation of latent variable models, they do not directly apply to latentvariable models themselves. Latent variable models may in fact ﬁt psychometric data well because these data result from a network of interacting components. In such a case, thelatent variable should be thought of as a convenient ﬁction, but the latent variable modelmay nevertheless be useful; for instance, as we have argued in the current chapter, theMIRT model can be proﬁtably used to estimate the parameters of a (low rank) network.Of course, the reverse holds as well: certain network structures may ﬁt the data becausecliques of connected network components result from unobserved common causes in thedata. An important question is under which circumstances the equivalence between theMIRT model and the Ising model breaks down, i.e., which experimental manipulationsor extended datasets could be used to decide between a common cause versus a networkinterpretation of the data. In the current paper, we have oﬀered some suggestions for furtherwork in this direction, which we think oﬀers considerable opportunities for psychometricprogress.As psychometrics starts to deal with network models, we think the Ising model of-fers a canonical form for network psychometrics, because it deals with binary data andis equivalent to well-known models from IRT. The Ising model has several intuitive inter-pretations: as a model for interacting components, as an association model with at mostpairwise interactions, and as the joint distribution of response and predictor variables ina logistic regression. Especially the analogy between networks of psychometric variables(e.g., psychopathology symptoms such as depressed mood, fatigue, and concentration loss)and networks of interacting particles (e.g., as in the magnetization examples) oﬀers sugges-tive possibilities for the construction of novel theoretical accounts of the relation betweenconstructs (e.g., depression) and observables as modeled in psychometrics (e.g., symptoma-tology). In the current chapter, we only focused on the Ising model for binary data, but ofcourse the work we have ignited here invites extensions in various other directions. For ex-ample, for polymotous data, the generalized Potts model could be used, although it shouldbe noted that this model does require the response options to be discrete values that areshared over all variables, which may not suit typical psychometric applications. AnotherETWORK PSYCHOMETRICS 27popular type of PMRF is the Gaussian Random Field (GRF; Lauritzen 1996), which hasexactly the same form as the model in (18) except that now x is continuous and assumedto follow a multivariate Gaussian density. This model is considerably appealing as it hasa tractable normalizing constant rather than the intractable partition function of the Isingmodel. The inverse of the covariance matrix—the precision matrix—can be standardizedas a partial correlation matrix and directly corresponds to the Ω matrix of the Ising model.Furthermore, where the Ising model reduces to a series of logistic regressions for each node,the GRF reduces to a multiple linear regression for each node. It can easily be proventhat also in the GRF the rank of the (partial) correlation matrix—cliques in the network—correspond to the latent dimensionality if the common cause model is true (Chandrasekaranet al., 2012). A great body of literature exists on estimating and ﬁtting GRFs even when theamount of observations is limited versus the amount of nodes (Meinshausen and Bühlmann,2006; Friedman et al., 2008; Foygel and Drton, 2010). Furthermore, promising methods arenow available for the estimation of a GRF even in non-Gaussian data, provided the dataare continuous (Liu et al., 2009, 2012). ReferencesAggen, S. H., Neale, M. C., and Kendler, K. S. (2005). DSM criteria for major depres-sion: evaluating symptom patterns using latent-trait item response models. PsychologicalMedicine , 35(04):475–487.Agresti, A. (1990).

Categorical data analysis . John Wiley & Sons Inc, New York, NY.Anderson, C. J., Li, Z., and Vermunt, J. (2007). Estimation of models in the rasch family forpolytomous items and multiple latent variables.

Journal of Statistical Software , 20(6):1–36.Anderson, C. J. and Vermunt, J. K. (2000). Log-multiplicative association models as latentvariable models for nominal and/or ordinal data.

Sociological Methodology , 30(1):81–121.Anderson, C. J. and Yu, H.-T. (2007). Log-multiplicative association models as item re-sponse models.

Psychometrika , 72(1):5–23.Barabási, A.-L. and Albert, R. (1999). Emergence of scaling in random networks. science ,286(5439):509–512.Besag, J. (1975). Statistical analysis of non-lattice data.

The statistician , 24:179–195.Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’sability. In Lord, F. and Novick, M., editors,

Statistical theories of mental test scores .Addison-Wesley, Reading, MA.Bollen, K. and Lennox, R. (1991). Conventional Wisdom on Measurement: A StructuralEquation Perspective.

Psychological Bulletin , 110(2):305–314.Borsboom, D. (2005).

Measuring the mind: Conceptual issues in contemporary psychomet-rics . Cambridge University Press, Cambridge, UK.ETWORK PSYCHOMETRICS 28Borsboom, D. and Cramer, A. O. J. (2013). Network analysis: an integrative approach tothe structure of psychopathology.

Annual review of clinical psychology , 9:91–121.Borsboom, D., Cramer, A. O. J., Schmittmann, V. D., Epskamp, S., and Waldorp, L. J.(2011). The small world of psychopathology.

PloS one , 6(11):e27407.Borsboom, D., Mellenbergh, G. J., and Van Heerden, J. (2003). The theoretical status oflatent variables.

Psychological review , 110(2):203–219.Bühlmann, P. (2013). Statistical signiﬁcance in high-dimensional linear models.

Bernoulli ,19(4):1212–1242.Bühlmann, P. and van de Geer, S. (2011).

Statistics for High-Dimensional Data: Methods,Theory and Applications . Springer, New York, NY, USA.Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the Renvironment.

Journal of Statistical Software , 48(6):1–29.Chandrasekaran, V., Parrilo, P. A., and Willsky, A. S. (2012). Latent variable graphicalmodel selection via convex optimization (with discussion).

The Annals of Statistics ,40(4):1935–1967.Chen, J. and Chen, Z. (2008). Extended bayesian information criteria for model selectionwith large model spaces.

Biometrika , 95(3):759–771.Costantini, G., Epskamp, S., Borsboom, D., Perugini, M., Mõttus, R., Waldorp, L. J., andCramer, A. O. J. (2015). State of the aRt personality research: A tutorial on networkanalysis of personality data in R.

Journal of Research in Personality , 54:13–29.Cox, D. R. (1972). The analysis of multivariate binary data.

Applied statistics , 21:113–120.Cox, D. R. and Wermuth, N. (1994). A note on the quadratic exponential binary distribu-tion.

Biometrika , 81(2):403–408.Cramer, A. O. J., Sluis, S., Noordhof, A., Wichers, M., Geschwind, N., Aggen, S. H.,Kendler, K. S., and Borsboom, D. (2012). Dimensions of normal personality as networksin search of equilibrium: You can’t like parties if you don’t like people.

European Journalof Personality , 26(4):414–431.Cramer, A. O. J., Waldorp, L., van der Maas, H., and Borsboom, D. (2010). Comorbidity:A Network Perspective.

Behavioral and Brain Sciences , 33(2-3):137–150.Cressie, N. and Holland, P. W. (1983). Characterizing the manifest probabilities of latenttrait models.

Psychometrika , 48(1):129–141.Csardi, G. and Nepusz, T. (2006). The igraph software package for complex network re-search.

InterJournal , Complex Systems:1695.Dryden, I. L., Scarr, M. R., and Taylor, C. C. (2003). Bayesian texture segmentation ofweed and crop images using reversible jump markov chain monte carlo methods.

Journalof the Royal Statistical Society: Series C (Applied Statistics) , 52(1):31–50.ETWORK PSYCHOMETRICS 29Edwards, J. and Bagozzi, R. (2000). On the Nature and Direction of Relationships BetweenConstructs and Measures.

Psychological Methods , 5(2):155–174.Ellis, J. L. and Junker, B. W. (1997). Tail-measurability in monotone latent variable models.

Psychometrika , 62(4):495–523.Epskamp, S. (2014a). elasticIsing: Ising network estimation using Elastic net and k-foldcross-validation . R package version 0.1.Epskamp, S. (2014b).

IsingSampler: Sampling methods and distribution functions for theIsing model . R package version 0.1.1.Epskamp, S., Cramer, A., Waldorp, L., Schmittmann, V. D., and Borsboom, D. (2012).qgraph: Network visualizations of relationships in psychometric data.

Journal of Statis-tical Software , 48(1):1–18.Fischer, G. H. (1974).

Einführung in die Theorie psychologischer tests: Grundlagen undAnwendungen . Huber, Bern, CH.Fitzmaurice, G. M., Laird, N. M., and Rotnitzky, A. G. (1993). Regression models fordiscrete longitudinal responses.

Statistical Science , 8:284–299.Foygel, R. and Drton, M. (2010). Extended Bayesian information criteria for Gaussiangraphical models.

Advances in Neural Information Processing Systems , 23:2020–2028.Foygel Barber, R. and Drton, M. (2015). High-dimensional Ising model selection withbayesian information criteria.

Electronic Journal of Statistics , 9(1):567âĂŞ607.Fried, E. I., Nesse, R. M., Zivin, K., Guille, C., and Sen, S. (2014). Depression is morethan the sum score of its parts: individual DSM symptoms have diﬀerent risk factors.

Psychological medicine , 44(10):2067–2076.Friedman, J. H., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimationwith the graphical lasso.

Biostatistics , 9(3):432–441.Friedman, J. H., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalizedlinear models via coordinate descent.

Journal of Statistical Software , 33(1):1–22.Green, P. J. and Richardson, S. (2002). Hidden markov models and disease mapping.

Journal of the American statistical association , 97(460):1055–1070.Haberman, S. J. (1972). Log-linear ﬁt for contingency tables—algorithm AS51.

AppliedStatistics , 21:218–225.Holland, P. W. (1990). The dutch identity: A new tool for the study of item responsemodels.

Psychometrika , 55(1):5–18.Howell, R. D., Breivik, E., and Wilcox, J. B. (2007). Reconsidering formative measurement.

Psychological methods , 12(2):205–218.ETWORK PSYCHOMETRICS 30Ising, E. (1925). Beitrag zur theorie des ferromagnetismus.

Zeitschrift für Physik A Hadronsand Nuclei , 31(1):253–258.Jensen, A. R. (1998).

The g factor: The science of mental ability . Praeger, Westport, CT,USA.Kac, M. (1966). Mathematical mechanism of phase transition . Gordon & Breach, NewYork, NY, USA.Kindermann, R., Snell, J. L., et al. (1980).

Markov random ﬁelds and their applications ,volume 1. American Mathematical Society, Providence, RI, USA.Kolaczyk, E. D. (2009).

Statistical analysis of network data . Springer, New York, NY, USA.Lauritzen, S. L. (1996).

Graphical models . Clarendon Press, Oxford, UK.Lee, S.-I., Lee, H., Abbeel, P., and Ng, A. Y. (2006). Eﬃcient ‘ regularized logisticregression. In Proceedings of the National Conference on Artiﬁcial Intelligence , volume 21,page 401. Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press; 1999.Li, Z. and Hong, F. (2014). plRasch: Log Linear by Linear Association models and Raschfamily models by pseudolikelihood estimation . R package version 1.0.Lin, K.-Y. (1992). Spontaneous magnetization of the Ising model.

Chinese Journal ofPhysics , 30(3):287–319.Liu, H., Han, F., Yuan, M., Laﬀerty, J. D., and Wasserman, L. (2012). High-dimensionalsemiparametric gaussian copula graphical models.

The Annals of Statistics , 40(4):2293–2326.Liu, H., Laﬀerty, J. D., and Wasserman, L. (2009). The nonparanormal: Semiparametricestimation of high dimensional undirected graphs.

The Journal of Machine LearningResearch , 10:2295–2328.Liu, Q. and Ihler, A. (2012). Distributed parameter estimation via pseudo-likelihood.

Pro-ceedings of the International Conference on Machine Learning (ICML) .Markus, K. A. and Borsboom, D. (2013). Reﬂective measurement models, behavior domains,and common causes.

New Ideas in Psychology , 31(1):54–64.Marsman, M., Maris, G., Bechger, T., and Glas, C. (2015). Bayesian inference for low-rankising networks.

Scientiﬁc reports , 5(9050):1–7.McCrae, R. R. and Costa, P. T. (2008). Empirical and theoretical status of the ﬁve-factormodel of personality traits.

Sage handbook of personality theory and assessment , 1:273–294.McDonald, R. P. (2003). Behavior domains in theory and in practice.

Alberta Journal ofEducational Research , 49:212–230.ETWORK PSYCHOMETRICS 31Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selectionwith the lasso.

The annals of statistics , pages 1436–1462.Meinshausen, N., Meier, L., and Bühlmann, P. (2009). P-values for high-dimensional re-gression.

Journal of the American Statistical Association , 104(488):1671–1681.Mellenbergh, G. J. (1989). Item bias and item response theory.

International Journal ofEducational Research , 13(2):127–143.Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance.

Psychometrika , 58(4):525–543.Molenaar, P. C. M. (2003). State space techniques in structural equation modeling.Møller, J., Pettitt, A. N., Reeves, R., and Berthelsen, K. K. (2006). An eﬃcient markovchain monte carlo method for distributions with intractable normalising constants.

Biometrika , 93(2):451–458.Mulaik, S. A. and McDonald, R. P. (1978). The eﬀect of additional variables on factorindeterminacy in models with a single common factor.

Psychometrika , 43(2):177–192.Murphy, K. P. (2012).

Machine learning: a probabilistic perspective . MIT press, Cambridge,MA, USA.Murray, I. (2007).

Advances in Markov chain Monte Carlo methods . PhD thesis, GatsbyComputational neuroscience unit, University College London.Murray, I., Ghahramani, Z., and MacKay, D. J. C. (2006). MCMC for doubly-intractabledistributions. In

Uncertainty in Artiﬁcial Intellegence (UAI) , pages 359–366. AUAI Press.Olkin, I. and Tate, R. F. (1961). Multivariate correlation models with mixed discrete andcontinuous variables.

The Annals of Mathematical Statistics , 32:448–465.Pearl, J. (2000).

Causality: Models, Reasoning, and Inference . Cambridge Univ Pr.R Core Team (2016).

R: A Language and Environment for Statistical Computing . R Foun-dation for Statistical Computing, Vienna, Austria.Rasch, G. (1960).

Probabilistic models for some intelligence and attainment tests . DanishInstitute for Educational Research, Copenhagen, DM.Ravikumar, P., Wainwright, M. J., and Laﬀerty, J. D. (2010). High-dimensional Ising modelselection using ‘ -regularized logistic regression. The Annals of Statistics , 38(3):1287–1319.Reckase, M. D. (2009).

Multidimensional item response theory . Springer, New York, NY,USA.Reichenbach, H. (1991).

The direction of time , volume 65. University of California Press,Berkeley, CA, USA.ETWORK PSYCHOMETRICS 32Reise, S. P. and Waller, N. G. (2009). Item response theory and clinical measurement.

Annual review of clinical psychology , 5:27–48.Scheﬀer, M., Bascompte, J., Brock, W. A., Brovkin, V., Carpenter, S. R., Dakos, V., Held,H., van Nes, E. H., Rietkerk, M., and Sugihara, G. (2009). Early-warning signals forcritical transitions.

Nature , 461(7260):53–59.Sebastiani, G. and Sørbye, S. H. (2002). A bayesian method for multispectral image dataclassiﬁcation.

Journal of Nonparametric Statistics , 14(1-2):169–180.Spearman, C. (1904). "general intelligence," objectively determined and measured.

TheAmerican Journal of Psychology , 15(2):201–292.Tibshirani, R. (1996). Regression shrinkage and selection via the lasso.

Journal of the RoyalStatistical Society. Series B (Methodological) , 58:267–288.van Borkulo, C. D., Borsboom, D., Epskamp, S., Blanken, T. F., Boschloo, L., Schoevers,R. A., and Waldorp, L. J. (2014). A new method for constructing networks from binarydata.

Scientiﬁc reports , 4(5918):1–10.van Borkulo, C. D. and Epskamp, S. (2014).

IsingFit: Fitting Ising models using the eLassomethod . R package version 0.2.0.van de Geer, S., Bühlmann, P., and Ritov, Y. (2013). On asymptotically optimal conﬁdenceregions and tests for high-dimensional models. arXiv preprint , page arXiv:1303.0518.van de Leemput, I. A., Wichers, M., Cramer, A. O. J., Borsboom, D., Tuerlinckx, F.,Kuppens, P., van Nes, E. H., Viechtbauer, W., Giltay, E. J., Aggen, S. H., Derom, C.,Jacobs, N., Kendler, K. S., van der Maas, H. L. J., Neale, M. C., Peeters, F., Thiery, E.,Zachar, P., and Scheﬀer, M. (2014). Critical slowing down as early warning for the onsetand termination of depression. 111(1):87–92.Van Der Maas, H. L., Dolan, C. V., Grasman, R. P., Wicherts, J. M., Huizenga, H. M.,and Raijmakers, M. E. (2006). A dynamical model of general intelligence: the positivemanifold of intelligence by mutualism.

Psychological review , 113(4):842–861.Wainwright, M. J. and Jordan, M. I. (2008). Graphical models, exponential families, andvariational inference.

Foundations and Trends R (cid:13) in Machine Learning , 1(1-2):1–305.Whittaker, J. (1990). Graphical models in applied multivariate statistics . John Wiley &Sons, Chichester, UK.Wickens, T. D. (1989).

Multiway contingency tables analysis for the social sciences .Lawrence Erlbaum Associates, Hillsdale, NJ, USA.Zhao, L. P. and Prentice, R. L. (1990). Correlated binary regression using a quadraticexponential model.

Biometrika , 77(3):642–648.Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net.

Journal of the Royal Statistical Society: Series B (Statistical Methodology) , 67(2):301–320.ETWORK PSYCHOMETRICS 33Appendix AProof of Equivalence Between the Ising Model and MIRTTo prove the equivalence between the Ising model and MIRT, we ﬁrst need to rewrite theIsing Model in matrix form: p ( XXX = xxx ) = 1 Z exp (cid:18) τττ > xxx + 12 xxx > ΩΩΩ xxx (cid:19) , (18)in which ΩΩΩ is an P × P matrix containing network parameters ω ij as its elements, whichcorresponds in graph theory to the adjacency or weights matrix. Note that, in this represen-tation, the diagonal values of ΩΩΩ are used. However, since x i can be only − x i = 1 forany combination, and the diagonal values are cancelled out in the normalizing constant Z .Thus, arbitrary values can be used in the diagonal of ΩΩΩ. Since ΩΩΩ is a real and symmetricalmatrix, we can take the usual eigenvalue decomposition:ΩΩΩ = QQQ

ΛΛΛ

QQQ > , in which ΛΛΛ is a diagonal matrix containing eigenvalues λ , λ , . . . , λ P on its diagonal, and QQQ is an orthonormal matrix containing eigenvectors qqq , . . . , qqq P as its columns. Inserting theeigenvalue decomposition into (18) gives: p ( XXX = xxx ) = 1 Z exp X i τ i x i ! Y j exp  λ j X i q ij x i !  . (19)Due to the unidentiﬁed and arbitrary diagonal of ΩΩΩ we can force ΩΩΩ to be positive semi-deﬁnite—requiring all eigenvalues to be nonnegative—by shifting the eigenvalues with someconstant c : ΩΩΩ + cIII = QQQ (ΛΛΛ + cIII ) QQQ > . Following the work of Kac (1966), we can use the following identity: e y = Z ∞−∞ e − ct − t √ π d t, with y = q λ j ( P i q ij x i ) and t = θ j to rewrite (19) as follows: p ( XXX = xxx ) = 1 Z Z ∞−∞ exp (cid:16)P j − θ j (cid:17) √ π P Y i exp x i τ i + X j − r λ j q ij θ j !! d θθθ. Reparameterizing τ i = − δ i and − q λ j q ij = α ij we obtain: p ( XXX = xxx ) = Z ∞−∞ Z exp (cid:16)P j − θ j (cid:17) √ π P Y i exp (cid:16) x i (cid:16) ααα > i θθθ − δ i (cid:17)(cid:17) d θθθ. (20)ETWORK PSYCHOMETRICS 34The same transformations can be used to obtain a diﬀerent expression for Z : Z = Z ∞−∞ exp (cid:16)P j − θ j (cid:17) √ π P X xxx Y i exp (cid:16) x i (cid:16) ααα > i θθθ − δ i (cid:17)(cid:17) d θθθ = Z ∞−∞ exp (cid:16)P j − θ j (cid:17) √ π P Y i X x i exp (cid:16) x i (cid:16) ααα > i θθθ − δ i (cid:17)(cid:17) d θθθ. (21)Finally, inserting (21) into (20), multiplying by Q i P xi exp ( x i ( ααα > i θθθ − δ i )) Q i P xi exp ( x i ( ααα > i θθθ − δ i )) , and rearranginggives: p ( XXX = xxx ) = Z ∞−∞ exp (cid:16)P j − θ j (cid:17) √ π P Q i P x i exp (cid:16) x i (cid:16) ααα > i θθθ − δ i (cid:17)(cid:17)R ∞−∞ exp (cid:16)P j − θ j (cid:17) √ π P Q i P x i exp (cid:0) x i (cid:0) ααα > i θθθ − δ i (cid:1)(cid:1) d θθθ · Y i exp (cid:16) x i (cid:16) ααα > i θθθ − δ i (cid:17)(cid:17)P x i exp (cid:0) x i (cid:0) ααα > i θθθ − δ i (cid:1)(cid:1) d θθθ. (22)The ﬁrst part of the integral on the right hand side of (22) corresponds to a distri-bution that sums to 1 for a P -dimensional random vector ΘΘΘ: f ( θθθ ) ∝ exp (cid:16)P j − θ j (cid:17) √ π P Y i X x i exp (cid:16) x i (cid:16) ααα > i θθθ − δ i (cid:17)(cid:17) , and the second part corresponds to the 2-parameter logistic MIRT probability of the re-sponse vector as in (13): P ( XXX = xxx | ΘΘΘ = θθθ ) = Y i exp (cid:16) x i (cid:16) ααα > i θθθ − δ i (cid:17)(cid:17)P x i exp (cid:0) x i (cid:0) ααα > i θθθ − δ i (cid:1)(cid:1) . We can look further at this distribution by using Bayes’ rule to examine the conditionaldistribution of θθθ given

XXX = xxx : f ( θθθ | XXX = xxx ) ∝ Pr (

XXX = xxx | ΘΘΘ = θθθ ) f ( θθθ ) ∝ exp (cid:16) xxx > AAAθθθ − θθθ > θθθ (cid:17) ∝ exp − (cid:18) θθθ − AAA > xxx (cid:19) > III (cid:18) θθθ − AAA > xxx (cid:19)! and see that the posterior distribution of ΘΘΘ is a multivariate Gaussian distribution:ΘΘΘ | XXX = xxx ∼ N P ± A > x , r III ! , (23)ETWORK PSYCHOMETRICS 35in which AAA is a matrix containing the discrimination parameters ααα i as its rows and ± indicates that columns a j could be multiplied with − q λ j , simply indicating whether the items overall are positivelyor negatively inﬂuenced by the latent trait θ . Additionally, Since the variance–covariancematrix of θ equals zero in all nondiagonal elements, θ is orthogonal. Thus, the multivariatedensity can be decomposed as the product of univariate densities:Θ j | X = x ∼ N ± X i a ij x i , r ! . Appendix BETWORK PSYCHOMETRICS 36Glossary of Notation

Symbol Dimension Description { . . . } Set of distinct values.( a, b ) Interval between a and b . P N Number of variables. N N Number of observations. X {− , } P Random vector of binary variables. x {− , } P A possible realization of X . n ( xxx ) N Number of observations withresponse pattern xxx . i , j , k and l { , , . . . , P } , j = i Subscripts of random variables. X − ( i ) {− , } P − Random vector of binary variableswithout X i . x − ( i ) {− , } P − A possible realization of X − ( i ) . X − ( i,j ) {− , } P − Random vector of binary variableswithout X i and X j . x − ( i,j ) {− , } P − A possible realization of X − ( i ) .Pr ( . . . ) → (0 ,

1) Probability function. φ i ( x i ) {− , } → R > Node potential function. φ i ( x i , x j ) {− , } → R > Pairwise potential function. τ i R Threshold parameter for node X i in the Ising model. Deﬁned as τ i = ln φ i (1). τττ R P Vector of threshold parameters,containing τ i as its i th element. ω ij R Network parameter between nodes X i and X j in the Ising model.Deﬁned as ω ij = ln φ ij (1 , R P × P andsymmetrical Matrix of network parameters,containing ω ij as its ij th element. ωωω i R P The i th row or column of ΩΩΩ.Pen ( ωωω i ) R P → R Penalization function of ωωω i . β R > Inverse temperature in the Isingmodel. H ( xxx ) {− , } P → R Hamiltonian function denoting theenergy of state xxx in the Isingmodel. ν ... ( . . . ) → R The log potential functions, usedin loglinear analysis. M N The number of latent factors. Θ R M Random vector of continuouslatent variables. θ R M Realization of ΘΘΘ. L ( τττ , ΩΩΩ; xxx ) → R Likelihood function based onPr (

XXX = xxx ). L i ( τττ , ΩΩΩ; xxx ) → R Likelihood function based onPr (cid:0) X i = x i | XXX − ( i ) = xxx − ( i ) (cid:1) . λ R > LASSO tuning parameter α (0 ,,