Peter D. Hoff
University of Washington
Publication
Featured research published by Peter D. Hoff.
Journal of the American Statistical Association | 2002
Peter D. Hoff; Adrian E. Raftery; Mark S. Handcock
Network models are widely used to represent relational information among interacting units. In studies of social networks, recent emphasis has been placed on random graph models where the nodes usually represent individual social actors and the edges represent the presence of a specified relation between actors. We develop a class of models where the probability of a relation between actors depends on the positions of individuals in an unobserved “social space.” We make inference for the social space within maximum likelihood and Bayesian frameworks, and propose Markov chain Monte Carlo procedures for making inference on latent positions and the effects of observed covariates. We present analyses of three standard datasets from the social networks literature, and compare the method to an alternative stochastic blockmodeling approach. In addition to improving on model fit for these datasets, our method provides a visual and interpretable model-based spatial representation of social relationships and improves on existing methods by allowing the statistical uncertainty in the social space to be quantified and graphically represented.
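As a rough illustration of the latent "social space" idea in this abstract, the sketch below evaluates the log-likelihood of a directed binary network when the log-odds of a tie fall with the Euclidean distance between latent positions. It assumes the distance form of the model with covariates omitted; the function and argument names (`latent_distance_loglik`, `adjacency`, `positions`, `alpha`) are hypothetical and not taken from the article or any software package.

```python
import math


def latent_distance_loglik(adjacency, positions, alpha):
    """Log-likelihood of a directed binary network under a distance model.

    Assumption for this sketch: logit P(y_ij = 1) = alpha - ||z_i - z_j||,
    with no observed covariates and ties conditionally independent
    given the latent positions.
    """
    n = len(adjacency)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # self-ties are not modeled
            eta = alpha - math.dist(positions[i], positions[j])
            # Bernoulli log-likelihood of one dyad: y * eta - log(1 + exp(eta))
            total += adjacency[i][j] * eta - math.log1p(math.exp(eta))
    return total
```

Under this form, configurations that place tied actors close together and untied actors far apart receive a higher log-likelihood, which is what makes the fitted positions readable as a map of the network.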
Journal of the American Statistical Association | 2005
Peter D. Hoff
This article discusses the use of a symmetric multiplicative interaction effect to capture certain types of third-order dependence patterns often present in social networks and other dyadic datasets. Such an effect, along with standard linear fixed and random effects, is incorporated into a generalized linear model, and a Markov chain Monte Carlo algorithm is provided for Bayesian estimation and inference. In an example analysis of international relations data, accounting for such patterns improves model fit and predictive performance.
The Annals of Applied Statistics | 2007
Peter D. Hoff
Quantitative studies in many fields involve the analysis of multivariate data of diverse types, including measurements that we may consider binary, ordinal and continuous. One approach to the analysis of such mixed data is to use a copula model, in which the associations among the variables are parameterized separately from their univariate marginal distributions. The purpose of this article is to provide a method of semiparametric inference for copula models via the construction of what we call a marginal set likelihood function for the association parameters. The proposed method of inference can be viewed as a generalization of marginal likelihood estimation, in which inference for a parameter of interest is based on a summary statistic whose sampling distribution is not a function of any nuisance parameters. In the context of copula estimation, the marginal set likelihood is a function of the association parameters only and its applicability does not depend on any assumptions about the marginal distributions of the data, thus making it appropriate for the analysis of mixed continuous and discrete data with arbitrary margins. Estimation and inference for parameters of the Gaussian copula are available via a straightforward Markov chain Monte Carlo algorithm based on Gibbs sampling.
Social Networks | 2009
Pavel N. Krivitsky; Mark S. Handcock; Adrian E. Raftery; Peter D. Hoff
Social network data often involve transitivity, homophily on observed attributes, clustering, and heterogeneity of actor degrees. We propose a latent cluster random effects model to represent all of these features, and we describe a Bayesian estimation method for it. The model is applicable to both binary and non-binary network data. We illustrate the model using two real datasets. We also apply it to two simulated network datasets with the same, highly skewed, degree distribution, but very different network behavior: one unstructured and the other with transitivity and clustering. Models based on degree distributions, such as scale-free, preferential attachment and power-law models, cannot distinguish between these very different situations, but our model does.
Computational and Mathematical Organization Theory | 2009
Peter D. Hoff
We discuss a statistical model of social network data derived from matrix representations and symmetry considerations. The model can include known predictor information in the form of a regression term, and can represent additional structure via sender-specific and receiver-specific latent factors. This approach allows for the graphical description of a social network via the latent factors of the nodes, and provides a framework for the prediction of missing links in network data.
Journal of Computational and Graphical Statistics | 2009
Peter D. Hoff
Orthonormal matrices play an important role in reduced-rank matrix approximations and the analysis of matrix-valued data. A matrix Bingham–von Mises–Fisher distribution is a probability distribution on the set of orthonormal matrices that includes linear and quadratic terms in the log-density, and arises as a posterior distribution in latent factor models for multivariate and relational data. This article describes rejection and Gibbs sampling algorithms for sampling from this family of distributions, and illustrates their use in the analysis of a protein–protein interaction network. Supplemental materials, including code and data to generate all of the numerical results in this article, are available online.
Bayesian Analysis | 2011
Peter D. Hoff
Modern datasets are often in the form of matrices or arrays, potentially having correlations along each set of data indices. For example, data involving repeated measurements of several variables over time may exhibit temporal correlation as well as correlation among the variables. A possible model for matrix-valued data is the class of matrix normal distributions, which is parametrized by two covariance matrices, one for each index set of the data. In this article we describe an extension of the matrix normal model to accommodate multidimensional data arrays, or tensors. We generate a class of array normal distributions by applying a group of multilinear transformations to an array of independent standard normal random variables. The covariance structures of the resulting class take the form of outer products of dimension-specific covariance matrices. We derive some properties of these covariance structures and the corresponding array normal distributions, discuss maximum likelihood and Bayesian estimation of covariance parameters and illustrate the model in an analysis of multivariate longitudinal network data. Some key words: Gaussian, matrix normal, multiway data, network, tensor, Tucker decomposition.
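The construction described in this abstract can be written compactly. The notation below is an assumption made for illustration (a mean-zero K-way array, with \(\times_k\) denoting multiplication of a matrix along the k-th index and vec stacking the first index fastest); it is a sketch of the idea rather than a quotation from the article.

```latex
% Z: array of independent standard normal entries, dimensions m_1 x ... x m_K
% A_k: an m_k x m_k matrix acting along the k-th index
Y = Z \times_1 A_1 \times_2 A_2 \cdots \times_K A_K,
\qquad
\Sigma_k = A_k A_k^{\top},
\qquad
\operatorname{Cov}\bigl[\operatorname{vec}(Y)\bigr]
  = \Sigma_K \otimes \Sigma_{K-1} \otimes \cdots \otimes \Sigma_1 .
```

With K = 2 this reduces to the matrix normal model, with one covariance matrix for the rows and one for the columns.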
Journal of Peace Research | 2007
Michael D. Ward; Peter D. Hoff
The authors examine a standard gravity model of international commerce augmented to include political as well as institutional influences on bilateral trade. Using annual data from 1980-2001, they estimate regression coefficients and residual dependencies using a hierarchy of models in each year. Rather than gauge the generalizability of these patterns via traditional measures of statistical significance such as p-values, this article develops and employs a strategy to evaluate the out-of-sample predictive strength of various models. The analysis of recent international commerce shows that in addition to a typical gravity-model specification, political and institutional variables are important. The article also demonstrates that the often-reported link between international conflict and bilateral trade is elusive, and that inclusion of conflict in a trade model can sometimes lead to reduced out-of-sample predictive performance. Further, this article illustrates that there are substantial, persistent residual exporter- and importer-specific effects, and that ignoring such patterns in relational trade data results in an incomplete picture of international commerce, even in the context of a well-established framework such as the gravity model.
Journal of the American Statistical Association | 2007
Peter D. Hoff
Many multivariate data-analysis techniques for an m × n matrix Y are related to the model Y = M + E, where Y is an m × n matrix of full rank and M is an unobserved mean matrix of rank K < (m ∧ n). Typically the rank of M is estimated in a heuristic way and then the least-squares estimate of M is obtained via the singular value decomposition of Y, yielding an estimate that can have a very high variance. In this article we suggest a model-based alternative to the preceding approach by providing prior distributions and posterior estimation for the rank of M and the components of its singular value decomposition. In addition to providing more accurate inference, such an approach has the advantage of being extendable to more general data-analysis situations, such as inference in the presence of missing data and estimation in a generalized linear modeling framework.
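For reference, the heuristic least-squares approach that this abstract contrasts with can be stated as follows. The symbols \(U\), \(D\), \(V\), \(d_k\), \(u_k\), \(v_k\) are introduced here for illustration; the rank K is assumed to have been fixed beforehand.

```latex
% Singular value decomposition of the data matrix, with d_1 >= d_2 >= ... >= 0
Y = U D V^{\top} = \sum_{k=1}^{m \wedge n} d_k \, u_k v_k^{\top},
\qquad
% Least-squares estimate of a rank-K mean matrix: keep the K largest terms
\hat{M}_K = \sum_{k=1}^{K} d_k \, u_k v_k^{\top}
          = \arg\min_{\operatorname{rank}(M) \le K} \lVert Y - M \rVert_F^{2} .
```

The estimate uses the leading singular values and vectors of Y without any shrinkage, which is one way to see why it can have high variance when the noise E is not small.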
Journal of Computational and Graphical Statistics | 2012
Adrian E. Raftery; Xiaoyue Niu; Peter D. Hoff; Ka Yee Yeung
Network models are widely used in social sciences and genome sciences. The latent space model proposed by Hoff et al. (2002), and extended by Handcock et al. (2007) to incorporate clustering, provides a visually interpretable model-based spatial representation of relational data and takes account of several intrinsic network properties. Due to the structure of the likelihood function of the latent space model, the computational cost is of order O(N²), where N is the number of nodes. This makes it infeasible for large networks. In this article, we propose an approximation of the log-likelihood function. We adapt the case-control idea from epidemiology and construct a case-control log-likelihood, which is an unbiased estimator of the full log-likelihood. Replacing the full likelihood by the case-control likelihood in the Markov chain Monte Carlo estimation of the latent space model reduces the computational time from O(N²) to O(N), making it feasible for large networks. We evaluate its performance using simulated and real data. We fit the model to a large protein–protein interaction dataset using the case-control likelihood and use the model-fitted link probabilities to identify false positive links. Supplemental materials are available online.
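A minimal sketch of the case-control idea, under simplifying assumptions: each node keeps all of its ties (cases), draws a uniform random subset of its non-ties (controls), and reweights the sampled terms so that the estimate is unbiased for the full log-likelihood. The uniform sampling scheme, the covariate-free distance model, and all names (`pair_loglik`, `full_loglik`, `case_control_loglik`, `n_controls`) are assumptions for illustration and may differ from the sampling design used in the article.

```python
import math
import random


def pair_loglik(y, eta):
    """Bernoulli log-likelihood of one dyad with log-odds eta."""
    return y * eta - math.log1p(math.exp(eta))


def full_loglik(adjacency, positions, alpha):
    """Exact log-likelihood: one term per ordered pair of distinct nodes."""
    n = len(adjacency)
    return sum(
        pair_loglik(adjacency[i][j],
                    alpha - math.dist(positions[i], positions[j]))
        for i in range(n) for j in range(n) if i != j
    )


def case_control_loglik(adjacency, positions, alpha, n_controls, rng):
    """Unbiased estimate of full_loglik using sampled non-ties.

    n_controls (>= 1) is the number of non-ties sampled per node.
    The index lists are rebuilt on every call only to keep the sketch
    short; since the network is fixed, they could be computed once.
    """
    n = len(adjacency)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        cases = [j for j in others if adjacency[i][j] == 1]
        zeros = [j for j in others if adjacency[i][j] == 0]
        # All observed ties contribute exactly.
        for j in cases:
            eta = alpha - math.dist(positions[i], positions[j])
            total += pair_loglik(1, eta)
        # Non-ties are subsampled and scaled up by the inverse sampling rate.
        if zeros:
            controls = rng.sample(zeros, min(n_controls, len(zeros)))
            weight = len(zeros) / len(controls)
            for j in controls:
                eta = alpha - math.dist(positions[i], positions[j])
                total += weight * pair_loglik(0, eta)
    return total
```

In sparse networks the number of ties per node is small, so with a fixed `n_controls` the number of likelihood terms evaluated grows roughly linearly in the number of nodes rather than quadratically.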