Xiaoran Yan | Researchain

Publication

Featured research published by Xiaoran Yan.

PLOS ONE | 2016

The "Majority Illusion" in Social Networks

Kristina Lerman; Xiaoran Yan; Xin-Zeng Wu

Individuals’ decisions, from what product to buy to whether to engage in risky behavior, often depend on the choices, behaviors, or states of other people. People, however, rarely have global knowledge of the states of others, but must estimate them from the local observations of their social contacts. Network structure can significantly distort individuals’ local observations. Under some conditions, a state that is globally rare in a network may be dramatically over-represented in the local neighborhoods of many individuals. This effect, which we call the “majority illusion,” leads individuals to systematically overestimate the prevalence of that state, which may accelerate the spread of social contagions. We develop a statistical model that quantifies this effect and validate it with measurements in synthetic and real-world networks. We show that the illusion is exacerbated in networks with a heterogeneous degree distribution and disassortative structure.
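The effect described above can be illustrated with a minimal sketch. It is not the statistical model from the paper: the function name `majority_illusion_fraction`, the "more than half of neighbors" threshold, and the toy star graph are all assumptions chosen for illustration.

```python
from typing import Dict, List, Set


def majority_illusion_fraction(adj: Dict[int, List[int]], active: Set[int]) -> float:
    """Fraction of non-isolated nodes with more than half of their neighbors active.

    Hypothetical helper for illustration; the strict ">" threshold is an assumption.
    """
    fooled = 0
    counted = 0
    for node, nbrs in adj.items():
        if not nbrs:
            continue  # isolated nodes observe nothing, so skip them
        counted += 1
        active_nbrs = sum(1 for n in nbrs if n in active)
        if 2 * active_nbrs > len(nbrs):
            fooled += 1
    return fooled / counted if counted else 0.0


# Toy star graph: hub 0 is linked to leaves 1..5, and only the hub is active.
star = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
active = {0}

global_prevalence = len(active) / len(star)            # 1 of 6 nodes is active
illusion = majority_illusion_fraction(star, active)    # 5 of 6 nodes see an active majority
print(global_prevalence, illusion)
```

In this toy case the active state is globally rare, yet every leaf sees it in its entire neighborhood, because the single active node is also the highest-degree node.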

Journal of Statistical Mechanics: Theory and Experiment | 2014

Model selection for degree-corrected block models

Xiaoran Yan; Cosma Rohilla Shalizi; Jacob E. Jensen; Florent Krzakala; Cristopher Moore; Lenka Zdeborová; Pan Zhang; Yaojia Zhu

The proliferation of models for networks raises challenging problems of model selection: the data are sparse and globally dependent, and models are typically high-dimensional and have large numbers of latent variables. Together, these issues mean that the usual model-selection criteria do not work properly for networks. We illustrate these challenges, and show one way to resolve them, by considering the key network-analysis problem of dividing a graph into communities or blocks of nodes with homogeneous patterns of links to the rest of the network. The standard tool for undertaking this is the stochastic block model, under which the probability of a link between two nodes is a function solely of the blocks to which they belong. This imposes a homogeneous degree distribution within each block; this can be unrealistic, so degree-corrected block models add a parameter for each node, modulating its overall degree. The choice between ordinary and degree-corrected block models matters because they make very different inferences about communities. We present the first principled and tractable approach to model selection between standard and degree-corrected block models, based on new large-graph asymptotics for the distribution of log-likelihood ratios under the stochastic block model, finding substantial departures from classical results for sparse graphs. We also develop linear-time approximations for log-likelihoods under both the stochastic block model and the degree-corrected model, using belief propagation. Applications to simulated and real networks show excellent agreement with our approximations. Our results thus both solve the practical problem of deciding on degree correction and point to a general approach to model selection in network analysis.
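The difference between the two model families can be sketched in a few lines. The abstract does not state a parametrization, so this sketch assumes the common Poisson form in which the expected number of links between nodes u and v is `omega[g[u]][g[v]]` for the ordinary block model and `theta[u] * theta[v] * omega[g[u]][g[v]]` for the degree-corrected one. All names and numbers below are hypothetical.

```python
from typing import List


def sbm_rate(omega: List[List[float]], g: List[int], u: int, v: int) -> float:
    """Expected links between u and v: depends only on their blocks."""
    return omega[g[u]][g[v]]


def dc_sbm_rate(omega: List[List[float]], theta: List[float],
                g: List[int], u: int, v: int) -> float:
    """Degree-corrected variant: each node's theta scales its overall degree."""
    return theta[u] * theta[v] * omega[g[u]][g[v]]


# Two blocks of two nodes each (assumed toy values).
omega = [[0.5, 0.05], [0.05, 0.5]]
g = [0, 0, 1, 1]

# With every theta equal to 1, the degree-corrected model reduces to the ordinary one.
uniform = [1.0, 1.0, 1.0, 1.0]
same = all(
    sbm_rate(omega, g, u, v) == dc_sbm_rate(omega, uniform, g, u, v)
    for u in range(4) for v in range(4)
)

# A heterogeneous theta lets node 0 act as a hub within the same block structure.
theta = [2.0, 0.5, 1.0, 1.0]
hub_within = dc_sbm_rate(omega, theta, g, 0, 1)
hub_across = dc_sbm_rate(omega, theta, g, 0, 2)
print(same, hub_within, hub_across)
```

The extra per-node parameters are what make the choice between the two models a genuine model-selection problem: the degree-corrected model always has more freedom to fit the data.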

Knowledge Discovery and Data Mining | 2013

Scalable text and link analysis with mixed-topic link models

Yaojia Zhu; Xiaoran Yan; Lise Getoor; Cristopher Moore

Many data sets contain rich information about objects, as well as pairwise relations between them. For instance, in networks of websites, scientific papers, and other documents, each node has content consisting of a collection of words, as well as hyperlinks or citations to other nodes. In order to perform inference on such data sets, and make predictions and recommendations, it is useful to have models that are able to capture the processes which generate the text at each node and the links between them. In this paper, we combine classic ideas in topic modeling with a variant of the mixed-membership block model recently developed in the statistical physics community. The resulting model has the advantage that its parameters, including the mixture of topics of each document and the resulting overlapping communities, can be inferred with a simple and scalable expectation-maximization algorithm. We test our model on three data sets, performing unsupervised topic classification and link prediction. For both tasks, our model outperforms several existing state-of-the-art methods, achieving higher accuracy with significantly less computation, analyzing a data set with 1.3 million words and 44 thousand links in a few minutes.

Knowledge Discovery and Data Mining | 2011

Active learning for node classification in assortative and disassortative networks

Cristopher Moore; Xiaoran Yan; Yaojia Zhu; Jean-Baptiste Rouquier; Terran Lane

Active learning for networked data, which aims to predict the labels of other nodes accurately from the known labels of a small subset of nodes, is attracting growing interest because it is especially useful when labeled data are expensive to obtain. However, most existing methods either apply only to networks with assortative community structure, focus on node attribute data with links, or work in single mode, which generally incurs a higher learning and query cost than batch active learning. In this paper, we propose a batch-mode active learning method that uses information-theoretic techniques and random walks to select which nodes to label. The proposed method requires only network topology as its input, does not need to know the number of blocks in advance, and makes no initial assumptions about how the blocks connect. We test our method on two types of networks, those with assortative structure and those with disassortative structure, and compare it with a single-mode active learning method that is otherwise similar to ours, as well as with several simple batch-mode active learning methods that use information-theoretic techniques and simple heuristics such as degree or betweenness centrality. The experimental results show that the proposed method significantly outperforms them.

Knowledge Discovery and Data Mining | 2014

The interplay between dynamics and networks: centrality, communities, and Cheeger inequality

Rumi Ghosh; Shang-Hua Teng; Kristina Lerman; Xiaoran Yan

We study the interplay between a dynamic process and the structure of the network on which it is defined. Specifically, we examine the impact of this interaction on quality measures of network clusters and on node centrality. This enables us to effectively identify network communities and important nodes participating in the dynamics. As the first step towards this objective, we introduce an umbrella framework for defining and characterizing an ensemble of dynamic processes on a network. This framework generalizes the traditional Laplacian framework to continuous-time biased random walks and also allows us to model some epidemic processes over a network. For each dynamic process in our framework, we can define a function that measures the quality of every subset of nodes as a potential cluster (or community) with respect to this process on a given network. This subset-quality function generalizes the traditional conductance measure for graph partitioning. We partially justify our choice of the quality function by showing that the classic Cheeger's inequality, which relates the conductance of the best cluster in a network with a spectral quantity of its Laplacian matrix, can be extended from the Laplacian-conductance setting to this more general setting.
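For reference, the traditional conductance measure that the subset-quality function generalizes can be sketched as follows. This shows only the classic definition (cut size divided by the smaller of the two volumes) on an unweighted, undirected graph, not the paper's generalized function; the helper name and toy graph are assumptions.

```python
from typing import Dict, List, Set


def conductance(adj: Dict[int, List[int]], subset: Set[int]) -> float:
    """Classic conductance: cut(S, rest) / min(vol(S), vol(rest))."""
    # Edges leaving the subset (each crossing edge is seen once, from its end in S).
    cut = sum(1 for u in subset for v in adj[u] if v not in subset)
    # Volume is the sum of degrees on each side.
    vol_s = sum(len(adj[u]) for u in subset)
    vol_rest = sum(len(adj[u]) for u in adj if u not in subset)
    denom = min(vol_s, vol_rest)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom


# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
barbell = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}

good_cluster = conductance(barbell, {0, 1, 2})  # one crossing edge, volume 7 per side
poor_cluster = conductance(barbell, {0})        # both of node 0's edges leave the set
print(good_cluster, poor_cluster)
```

Lower values indicate better clusters: the triangle is well separated from the rest of the graph, while a single node is not a meaningful cluster.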

Journal of Complex Networks | 2014

Oriented and degree-generated block models: generating and inferring communities with inhomogeneous degree distributions

Yaojia Zhu; Xiaoran Yan; Cristopher Moore

The stochastic block model is a powerful tool for inferring community structure from network topology. However, it predicts a Poisson degree distribution within each community, while most real-world networks have a heavy-tailed degree distribution. The degree-corrected block model can accommodate arbitrary degree distributions within communities. But since it takes the vertex degrees as parameters rather than generating them, it cannot use them to help it classify the vertices, and its natural generalization to directed graphs cannot even use the orientations of the edges. In this paper, we present variants of the block model with the best of both worlds: they can use vertex degrees and edge orientations in the classification process, while tolerating heavy-tailed degree distributions within communities. We show that for some networks, including synthetic networks and networks of word adjacencies in English text, these new block models achieve a higher accuracy than either standard or degree-corrected block models.

International Conference on Social Computing | 2015

Structural Properties of Ego Networks

Sidharth Gupta; Xiaoran Yan; Kristina Lerman

The structure of real-world social networks in large part determines the evolution of social phenomena, including opinion formation, diffusion of information and influence, and the spread of disease. Globally, network structure is characterized by features such as degree distribution, degree assortativity, and clustering coefficient. However, information about global structure is usually not available to each vertex. Instead, each vertex's knowledge is generally limited to the locally observable portion of the network consisting of the subgraph over its immediate neighbors. Such subgraphs, known as ego networks, have properties that can differ substantially from those of the global network. In this paper, we study the structural properties of ego networks and show how they relate to the global properties of networks from which they are derived. Through empirical comparisons and mathematical derivations, we show that structural features, similar to static attributes, suffer from paradoxes. We quantify the differences between global information about network structure and local estimates. This knowledge allows us to better identify and correct the biases arising from incomplete local information.
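One well-known gap between local estimates and global values, the friendship paradox, can be sketched as below. This is an illustrative example of a local-versus-global bias in general, not a reproduction of the paper's derivations; the function names and the toy star graph are assumptions.

```python
from typing import Dict, List


def mean_degree(adj: Dict[int, List[int]]) -> float:
    """Global average degree over all vertices."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def mean_neighbor_degree(adj: Dict[int, List[int]]) -> float:
    """Average, over non-isolated vertices, of the mean degree each vertex sees locally."""
    local_estimates = [
        sum(len(adj[n]) for n in nbrs) / len(nbrs)
        for nbrs in adj.values()
        if nbrs
    ]
    return sum(local_estimates) / len(local_estimates)


# Toy star graph: hub 0 is linked to leaves 1..5.
star = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}

global_value = mean_degree(star)            # (5 + 5 * 1) / 6
local_value = mean_neighbor_degree(star)    # (1 + 5 * 5) / 6
print(global_value, local_value)
```

Most vertices here observe only the hub, so their local estimate of typical degree is far higher than the global average.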

Advances in Social Networks Analysis and Mining | 2016

Bayesian model selection of stochastic block models

Xiaoran Yan

A central problem in analyzing networks is partitioning them into modules or communities. One of the best tools for this is the stochastic block model, which clusters vertices into blocks with statistically homogeneous patterns of links. Despite its flexibility and popularity, there has been a lack of principled statistical model selection criteria for the stochastic block model. Here we propose a Bayesian framework for choosing the number of blocks as well as comparing it to the more elaborate degree-corrected block models, ultimately leading to a universal model selection framework capable of comparing multiple modeling combinations. We also investigate its theoretical connection to the minimum description length principle.

International MICCAI Workshop on Medical Computer Vision | 2015

Information-Theoretic Clustering of Neuroimaging Metrics Related to Cognitive Decline in the Elderly

Madelaine Daianu; Greg Ver Steeg; Adam Mezher; Neda Jahanshad; Talia M. Nir; Xiaoran Yan; Gautam Prasad; Kristina Lerman; Aram Galstyan; Paul M. Thompson

As Alzheimer’s disease progresses, there are changes in metrics of brain atrophy and network breakdown derived from anatomical or diffusion MRI. Neuroimaging biomarkers of cognitive decline are crucial to identify, but few studies have investigated how sets of biomarkers cluster in terms of the information they provide. Here, we evaluated more than 700 frequently studied diffusion and anatomical measures in 247 elderly participants from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). We used a novel unsupervised machine learning technique, CorEx, to identify groups of measures with high multivariate mutual information; we computed latent factors to explain correlations among them. We visualized groups of measures discovered by CorEx in a hierarchical structure and determined how well they predict cognitive decline. Clusters of variables significantly predicted cognitive decline, including measures of cortical gray matter, and correlated measures of brain networks derived from graph theory and spectral graph theory.

Inverse Problems | 2017

Modified Cheeger and ratio cut methods using the Ginzburg-Landau functional for classification of high-dimensional data

Ekaterina Merkurjev; Andrea L. Bertozzi; Xiaoran Yan; Kristina Lerman

Recent advances in clustering have included continuous relaxations of the Cheeger cut problem and those which address its linear approximation using the graph Laplacian. In this paper, we show how to use the graph Laplacian to solve the fully nonlinear Cheeger cut problem, as well as the ratio cut optimization task. Both problems are connected to total variation minimization, and the related Ginzburg-Landau functional is used in the derivation of the methods. The graph framework discussed in this paper is undirected. The resulting algorithms are efficient ways to cluster the data into two classes, and they can be easily extended to the case of multiple classes, or used on a multiclass data set via recursive bipartitioning. In addition to showing results on benchmark data sets, we also show an application of the algorithm to hyperspectral video data.
