Pierre Latouche
University of Paris
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Pierre Latouche.
The Annals of Applied Statistics | 2011
Pierre Latouche; Etienne Birmelé; Christophe Ambroise
Complex systems in nature and in society are often represented as networks, describing the rich set of interactions between objects of interest. Many deterministic and probabilistic clustering methods have been developed to analyze such structures. Given a network, almost all of them partition the vertices into disjoint clusters, according to their connection profile. However, recent studies have shown that these techniques were too restrictive and that most of the existing networks contained overlapping clusters. To tackle this issue, we present in this paper the Overlapping Stochastic Block Model. Our approach allows the vertices to belong to multiple clusters, and, to some extent, generalizes the well-known Stochastic Block Model [Nowicki and Snijders (2001)]. We show that the model is generically identifiable within classes of equivalence and we propose an approximate inference procedure, based on global and local variational techniques. Using toy data sets as well as the French Political Blogosphere network and the transcriptional network of Saccharomyces cerevisiae, we compare our work with other approaches.
Statistical Modelling | 2012
Pierre Latouche; Etienne Birmelé; Christophe Ambroise
It is now widely accepted that knowledge can be acquired from networks by clustering their vertices according to the connection profiles. Many methods have been proposed and in this paper we concentrate on the Stochastic Block Model (SBM). The clustering of vertices and the estimation of SBM model parameters have been subject to previous work, and numerous inference strategies such as variational expectation maximization (EM) and classification EM have been proposed. However, SBM still suffers from a lack of criteria to estimate the number of components in the mixture. To our knowledge, only one model-based criterion, Integrated Complete-data Likelihood (ICL), has been derived for SBM in the literature. It relies on an asymptotic approximation of the integrated complete-data likelihood and recent studies have shown that it tends to be too conservative in the case of small networks. To tackle this issue, we propose a new criterion that we call Integrated Likelihood Variational Bayes (ILvb), based on a non-asymptotic approximation of the marginal likelihood. We describe how the criterion can be computed through a variational Bayes EM algorithm.
Statistical Modelling | 2015
Etienne Côme; Pierre Latouche
The stochastic block model (SBM) is a mixture model for the clustering of nodes in networks. The SBM has now been employed for more than a decade to analyze very different types of networks in many scientific fields, including biology and the social sciences. Recently, an analytical expression based on the collapsing of the SBM parameters has been proposed, in combination with a sampling procedure that allows the clustering of the vertices and the estimation of the number of clusters to be performed simultaneously. Although the corresponding algorithm can technically accommodate up to 10 000 nodes and millions of edges, the Markov chain, however, tends to exhibit poor mixing properties, that is, low acceptance rates, for large networks. Therefore, the number of clusters tends to be highly overestimated, even for a very large number of samples. In this article, we rely on a similar expression, which we call the integrated complete data log likelihood, and propose a greedy inference algorithm that focuses on maximizing this exact quantity. This algorithm incurs a smaller computational cost than existing inference techniques for the SBM and can be employed to analyze large networks (several tens of thousands of nodes and millions of edges) with no convergence problems. Using toy datasets, the algorithm exhibits improvements over existing strategies, both in terms of clustering and model selection. An application to a network of blogs related to illustrations and comics is also provided.
GfKl | 2009
Pierre Latouche; Etienne Birmelé; Christophe Ambroise
Networks are used in many scientific fields such as biology, social science, and information technology. They aim at modelling, with edges, the way objects of interest, represented by vertices, are related to each other. Looking for clusters of vertices, also called communities or modules, has appeared to be a powerful approach for capturing the underlying structure of a network. In this context, the Block-Clustering model has been applied on random graphs. The principle of this method is to assume that given the latent structure of a graph, the edges are independent and generated from a parametric distribution. Many EM-like strategies have been proposed, in a frequentist setting, to optimize the parameters of the model. Moreover, a criterion, based on an asymptotic approximation of the Integrated Classification Likelihood (ICL), has recently been derived to estimate the number of classes in the latent structure. In this paper, we show how the Block-Clustering model can be described in a full Bayesian framework and how the posterior distribution, of all the parameters and latent variables, can be approximated efficiently applying Variational Bayes (VB). We also propose a new non-asymptotic Bayesian model selection criterion. Using simulated data sets, we compare our approach to other strategies. We show that our criterion can outperform ICL.
The Annals of Applied Statistics | 2014
Yacine Jernite; Pierre Latouche; Charles Bouveyron; Patrick Rivera; Laurent Jegou; Stéphane Lamassé
In the last two decades, many random graph models have been proposed to extract knowledge from networks. Most of them look for communities or more generally clusters of vertices with homogeneous connection profiles. While the first models focused on networks with binary edges only, extensions now allow to deal with valued networks. Recently, new models were also introduced in order to characterize connection patterns in networks through mixed memberships. This work was motivated by the need of analyzing a historical network where a partition of the vertices is given and where edges are typed. A known partition is seen as a decomposition of a network into subgraphs that we propose to model using a stochastic model with unknown latent clusters. Each subgraph has its own mixing vector and sees its vertices associated to the clusters. The vertices then connect with a probability depending on the subgraphs only, while the types of the edges are assumed to be sampled from the latent clusters. A variational Bayes expectation-maximization algorithm is proposed for inference as well as a model selection criterion for the estimation of the cluster number. Experiments are carried out on simulated data to assess the approach. The proposed methodology is then applied to an ecclesiastical network in merovingian Gaul. An R package, called Rambo, implementing the inference algorithm is available on the CRAN.
Network Science | 2017
Jason Wyse; Nial Friel; Pierre Latouche
We consider the task of simultaneous clustering of the two node sets involved in a bipartite network. The approach we adopt is based on use of the exact integrated complete likelihood for the latent block model. Using this allows one to infer the number of clusters as well as cluster memberships using a greedy search. This gives a model-based clustering of the node sets. Experiments on simulated bipartite network data show that the greedy search approach is vastly more scalable than competing Markov chain Monte Carlo based methods. Application to a number of real observed bipartite networks demonstrate the algorithms discussed.
Statistics and Computing | 2016
Pierre Latouche; Stéphane Robin
W-graph refers to a general class of random graph models that can be seen as a random graph limit. It is characterized by both its graphon function and its motif frequencies. In this paper, relying on an existing variational Bayes algorithm for the stochastic block models (SBMs) along with the corresponding weights for model averaging, we derive an estimate of the graphon function as an average of SBMs with increasing number of blocks. In the same framework, we derive the variational posterior frequency of any motif. A simulation study and an illustration on a social network complete our work.
Statistics and Computing | 2018
Charles Bouveyron; Pierre Latouche; Rawya Zreik
Due to the significant increase of communications between individuals via social media (Facebook, Twitter, Linkedin) or electronic formats (email, web, e-publication) in the past two decades, network analysis has become an unavoidable discipline. Many random graph models have been proposed to extract information from networks based on person-to-person links only, without taking into account information on the contents. This paper introduces the stochastic topic block model, a probabilistic model for networks with textual edges. We address here the problem of discovering meaningful clusters of vertices that are coherent from both the network interactions and the text contents. A classification variational expectation-maximization algorithm is proposed to perform inference. Simulated datasets are considered in order to assess the proposed approach and to highlight its main features. Finally, we demonstrate the effectiveness of our methodology on two real-word datasets: a directed communication network and an undirected co-authorship network.
Social Network Analysis and Mining | 2016
Marco Corneli; Pierre Latouche; Fabrice Rossi
We develop a model in which interactions between nodes of a dynamic network are counted by non-homogeneous Poisson processes. In a block modelling perspective, nodes belong to hidden clusters (whose number is unknown) and the intensity functions of the counting processes only depend on the clusters of nodes. In order to make inference tractable, we move to discrete time by partitioning the entire time horizon in which interactions are observed in fixed-length time sub-intervals. First, we derive an exact integrated classification likelihood criterion and maximize it relying on a greedy search approach. This allows to estimate the memberships to clusters and the number of clusters simultaneously. Then, a maximum likelihood estimator is developed to estimate nonparametrically the integrated intensities. We discuss the over-fitting problems of the model and propose a regularized version solving these issues. Experiments on real and simulated data are carried out in order to assess the proposed methodology.
advances in social networks analysis and mining | 2015
Marco Corneli; Pierre Latouche; Fabrice Rossi
The stochastic block model (SBM) [1] describes interactions between nodes of a network following a probabilistic approach. Nodes belong to hidden clusters and the probabilities of interactions only depend on these clusters. Interactions of time varying intensity are not taken into account. By partitioning the whole time horizon, in which interactions are observed, we develop a non stationary extension of the SBM, allowing us to simultaneously cluster the nodes of a network and the fixed time intervals in which interactions take place. The number of clusters as well as memberships to clusters are finally obtained through the maximization of the complete-data integrated likelihood relying on a greedy search approach. Experiments are carried out in order to assess the proposed methodology.