Yanghua Xiao
Fudan University
Publication
Featured research published by Yanghua Xiao.
extending database technology | 2010
Wentao Wu; Yanghua Xiao; Wei Wang; Zhenying He; Zhihui Wang
With more and more social network data being released, protecting the sensitive information within social networks from leakage has become an important concern of publishers. Adversaries with some background structural knowledge about a target individual can easily re-identify that individual in the network, even if the identifiers have been replaced by randomized integers (i.e., the network is naively anonymized). Since a great deal of topological information can be used to attack a victim's privacy, resisting such structural re-identification becomes a great challenge. Previous works investigated only a minority of such structural attacks, without considering protection against re-identification under any potential structural knowledge about a target. To achieve this objective, in this paper we propose the k-symmetry model, which modifies a naively anonymized network so that for any vertex in the network, there exist at least k - 1 structurally equivalent counterparts. We also propose sampling methods to extract approximate versions of the original network from the anonymized network so that statistical properties of the original network can be evaluated. Extensive experiments show that we can successfully recover a variety of such properties of the original network through aggregations on quite a small number of sample graphs.
international conference on management of data | 2014
Wanyun Cui; Yanghua Xiao; Haixun Wang; Wei Wang
Community search is important in social network analysis. For a given vertex in a graph, the goal is to find the best community the vertex belongs to. Intuitively, the best community for a given vertex should be in the vicinity of the vertex. However, existing solutions use global search to find the best community. These algorithms, although straightforward, are very costly, as all vertices in the graph may need to be visited. In this paper, we propose a local search strategy, which searches in the neighborhood of a vertex to find the best community for the vertex. We show that, because the minimum degree measure used to evaluate the goodness of a community is not monotonic, designing efficient local search solutions is a very challenging task. We present theories and algorithms of local search to address this challenge. The efficiency of our local search strategy is verified by extensive experiments on both synthetic networks and a variety of real networks with millions of nodes.
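For orientation, the global search baseline that the abstract contrasts with can be sketched as a greedy peeling procedure: repeatedly delete a minimum-degree vertex and remember the intermediate subgraph with the largest minimum degree that still contains the query vertex. This is a minimal sketch of that well-known baseline, not the paper's local search algorithm; the function name `global_community_search` and the adjacency-dict input format are assumptions made for illustration.

```python
from collections import deque

def global_community_search(adj, query):
    """Greedy peeling baseline (hypothetical helper, not the paper's method).

    adj: dict mapping each vertex to a set of neighbors (undirected graph).
    Returns (community, min_degree) for the query vertex.
    """
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    best_score, best_vertices = -1, set()
    while alive:
        # The current minimum degree is the "goodness" of the alive subgraph.
        v = min(alive, key=lambda u: degree[u])
        if degree[v] > best_score:
            best_score, best_vertices = degree[v], set(alive)
        # Once the query vertex itself has minimum degree, no later
        # subgraph containing it can score higher, so stop.
        if degree[query] == degree[v]:
            break
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                degree[w] -= 1
    # Keep only the connected component of the query in the best snapshot.
    community, queue = {query}, deque([query])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in best_vertices and w not in community:
                community.add(w)
                queue.append(w)
    return community, best_score

# Toy graph: a triangle {0,1,2}, a 4-clique {3,4,5,6}, a bridge 2-3,
# and a pendant vertex 7 attached to 0.
graph = {
    0: {1, 2, 7}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5, 6}, 4: {3, 5, 6}, 5: {3, 4, 6}, 6: {3, 4, 5},
    7: {0},
}
community, score = global_community_search(graph, 4)
```

Note that the loop may touch every vertex of the graph before it stops, which is the cost the abstract attributes to global search.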
international conference on management of data | 2013
Wanyun Cui; Yanghua Xiao; Haixun Wang; Yiqi Lu; Wei Wang
A great deal of research has been conducted on modeling and discovering communities in complex networks. In most real-life networks, an object often participates in multiple overlapping communities. In view of this, recent research has focused on mining overlapping communities in complex networks. The algorithms essentially materialize a snapshot of the overlapping communities in the network. This approach has three drawbacks, however. First, the mining algorithm uses the same global criterion to decide whether a subgraph qualifies as a community. In other words, the criterion is fixed and predetermined. But in reality, communities for different vertices may have very different characteristics. Second, it is costly, time consuming, and often unnecessary to find communities for an entire network. Third, the approach does not support dynamically evolving networks. In this paper, we focus on online search of overlapping communities, that is, given a query vertex, we find meaningful overlapping communities the vertex belongs to in an online manner. In doing so, each search can use a community criterion tailored to the query vertex. To support this approach, we introduce a novel model for overlapping communities, and we provide theoretical guidelines for tuning the model. We present several algorithms for online overlapping community search and we conduct comprehensive experiments to demonstrate the effectiveness of the model and the algorithms. We also suggest many potential applications of our model and algorithms.
extending database technology | 2009
Yanghua Xiao; Wentao Wu; Jian Pei; Wei Wang; Zhenying He
Shortest path queries (SPQ) are essential in many graph analysis and mining tasks. However, answering shortest path queries on the fly on large graphs is costly. To answer shortest path queries online, we may materialize and index shortest paths. However, a straightforward index of all shortest paths in a graph of N vertices takes O(N^2) space. In this paper, we tackle the problem of indexing shortest paths and answering shortest path queries online. As many large real graphs have been shown to be richly symmetric, the central idea of our approach is to use graph symmetry to reduce the index size while retaining the correctness and the efficiency of shortest path query answering. Technically, we develop a framework to index a large graph at the orbit level instead of the vertex level so that the number of breadth-first search trees materialized is reduced from O(N) to O(|Δ|), where |Δ| ≤ N is the number of orbits in the graph. We explore orbit adjacency and local symmetry to obtain compact breadth-first-search trees (compact BFS-trees). An extensive empirical study using both synthetic data and real data shows that compact BFS-trees can be built efficiently and the space cost can be reduced substantially. Moreover, online shortest path query answering can be achieved using compact BFS-trees.
international conference on data engineering | 2014
Lu Wang; Yanghua Xiao; Bin Shao; Haixun Wang
Billion-node graphs pose significant challenges at all levels from storage infrastructures to programming models. It is critical to develop a general purpose platform for graph processing. A distributed memory system is considered a feasible platform supporting online query processing as well as offline graph analytics. In this paper, we study the problem of partitioning a billion-node graph on such a platform, an important consideration because it has direct impact on load balancing and communication overhead. It is challenging not just because the graph is large, but because we can no longer assume that the data can be organized in arbitrary ways to maximize the performance of the partitioning algorithm. Instead, the algorithm must adopt the same data and programming model adopted by the system and other applications. In this paper, we propose a multi-level label propagation (MLP) method for graph partitioning. Experimental results show that our solution can partition billion-node graphs within several hours on a distributed memory system consisting of merely several machines, and the quality of the partitions produced by our approach is comparable to state-of-the-art approaches applied on toy-size graphs.
Physical Review E | 2008
Yanghua Xiao; Momiao Xiong; Wei Wang; Hui Wang
Many real networks have been found to have a rich degree of symmetry, which is a universal structural property of complex networks, yet has rarely been studied so far. One of the fascinating problems related to symmetry is the exploration of the origin of symmetry in real networks. For this purpose, we summarized the statistics of local symmetric motifs that contribute to local symmetry of networks. Analysis of these statistics shows that the symmetry of complex networks is a consequence of a similar linkage pattern, which means that vertices with similar degrees tend to share common neighbors. An improved version of the Barabási-Albert model integrating the similar linkage pattern successfully reproduces the symmetry of real networks, indicating that the similar linkage pattern is the underlying ingredient responsible for the emergence of symmetry in complex networks.
Physica A-statistical Mechanics and Its Applications | 2008
Yanghua Xiao; Wentao Wu; Hui Wang; Momiao Xiong; Wei Wang
Precisely quantifying the heterogeneity or disorder of network systems is important in studies of the behaviors and functions of network systems. Although various degree-based entropies are available to measure the heterogeneity of real networks, the heterogeneity implied by the structure of a network cannot yet be precisely quantified. Hence, we propose a new structure entropy based on automorphism partition. Analysis of extreme cases shows that entropy based on automorphism partition can quantify the structural heterogeneity of networks more precisely than degree-based entropies. We also summarized symmetry and heterogeneity statistics of many real networks, finding that real networks are more heterogeneous in the view of automorphism partition than what has been depicted under the measurement of degree-based entropies, and that structural heterogeneity is strongly negatively correlated with the symmetry of real networks.
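As a rough illustration of why a partition-based entropy can separate cases that a degree-based one cannot, the sketch below computes the Shannon entropy of the cell-size distribution of a vertex partition. It assumes the structure entropy takes this plain Shannon form, which may differ from the paper's exact definition (for example in normalization), and it takes the partition as given rather than computing automorphism orbits. The helper name `partition_entropy` is hypothetical. The extreme case used here is the Frucht graph: 12 vertices, every vertex of degree 3, and no automorphism other than the identity.

```python
import math

def partition_entropy(cells, base=2):
    """Shannon entropy of a vertex partition (assumed form, see lead-in).

    cells: iterable of collections of vertices; each cell has probability
    |cell| / N, where N is the total number of vertices.
    """
    sizes = [len(c) for c in cells]
    n = sum(sizes)
    return sum((s / n) * math.log(n / s, base) for s in sizes)

vertices = list(range(12))

# Degree partition of the Frucht graph: all 12 vertices share degree 3,
# so there is a single cell and the entropy is zero.
degree_cells = [vertices]

# Automorphism partition of the Frucht graph: the automorphism group is
# trivial, so every vertex is its own orbit.
orbit_cells = [[v] for v in vertices]

degree_entropy = partition_entropy(degree_cells)
orbit_entropy = partition_entropy(orbit_cells)
```

Under this assumed form, the degree-based value reports no heterogeneity at all, while the orbit-based value reaches its maximum of log2(12) for a 12-vertex graph.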
very large data bases | 2013
Zichao Qi; Yanghua Xiao; Bin Shao; Haixun Wang
The emergence of real life graphs with billions of nodes poses significant challenges for managing and querying these graphs. One of the fundamental queries submitted to graphs is the shortest distance query. Online BFS (breadth-first search) and offline pre-computing pairwise shortest distances are prohibitive in time or space complexity for billion-node graphs. In this paper, we study the feasibility of building distance oracles for billion-node graphs. A distance oracle provides approximate answers to shortest distance queries by using a pre-computed data structure for the graph. Sketch-based distance oracles are good candidates because they assign each vertex a sketch of bounded size, which means they have linear space complexity. However, state-of-the-art sketch-based distance oracles lack efficiency or accuracy when dealing with big graphs. In this paper, we address the scalability and accuracy issues by focusing on optimizing the three key factors that affect the performance of distance oracles: landmark selection, distributed BFS, and answer generation. We conduct extensive experiments on both real networks and synthetic networks to show that we can build distance oracles of affordable cost and efficiently answer shortest distance queries even for billion-node graphs.
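The general idea of a sketch-based distance oracle can be illustrated with a small single-machine example: each vertex stores its BFS distance to a few landmarks, and a query is answered with the smallest sum of distances through a shared landmark, which by the triangle inequality is an upper bound on the true distance. This is a minimal sketch of the generic landmark technique on an undirected, unweighted graph, not the paper's distributed system; the function names and the choice to pass landmarks in explicitly are assumptions for illustration.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def build_sketches(adj, landmarks):
    """Offline step: one BFS per landmark; each vertex keeps a bounded sketch."""
    sketches = {v: {} for v in adj}
    for landmark in landmarks:
        for v, d in bfs_distances(adj, landmark).items():
            sketches[v][landmark] = d
    return sketches

def estimate_distance(sketches, u, v):
    """Online step: upper bound on d(u, v) via the best shared landmark.

    Returns None when the two sketches share no landmark.
    """
    if u == v:
        return 0
    shared = sketches[u].keys() & sketches[v].keys()
    return min((sketches[u][l] + sketches[v][l] for l in shared), default=None)

# Path graph 0 - 1 - 2 - 3 - 4 with the middle vertex as the only landmark.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
sketches = build_sketches(path, landmarks=[2])
far_estimate = estimate_distance(sketches, 0, 4)   # landmark lies on the path
near_estimate = estimate_distance(sketches, 0, 1)  # landmark is a detour
```

The second query shows the accuracy issue the abstract refers to: when no landmark lies on a shortest path between the two vertices, the estimate overshoots the true distance, which is why landmark selection matters.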
Pattern Recognition | 2008
Yanghua Xiao; Hua Dong; Wentao Wu; Momiao Xiong; Wei Wang; Baile Shi
In recent years, evaluating graph distance has become more and more important in a variety of real applications, and many graph distance measures have been proposed. Among all of those measures, structure-based graph distance measures have become the research focus because they do not depend on the definition of cost functions. However, existing structure-based graph distance measures have a low degree of precision because only the node and edge information of graphs is employed in these measures. To improve the precision of graph distance measures, we define the substructure abundance vector (SAV) to capture more substructure information of a graph. Furthermore, based on SAV, we propose unified graph distance measures which are generalizations of the existing structure-based graph distance measures. In general, the unified graph distance measures can evaluate graph distance at a much finer granularity. We also show that unified graph distance measures based on occurrence mapping and some of their variants are metrics. Finally, we apply the unified graph distance metric and its variants to population evolution analysis and construct distance graphs of marker networks in three populations, which reflect the single nucleotide polymorphism (SNP) linkage disequilibrium (LD) differences among these populations.
conference on information and knowledge management | 2011
Kun Xu; Lei Zou; Jeffrey Xu Yu; Lei Chen; Yanghua Xiao; Dongyan Zhao
In this paper, we study a variant of reachability queries, called label-constraint reachability (LCR) queries. Specifically, given a label set S and two vertices u1 and u2 in a large directed graph G, we verify whether there exists a path from u1 to u2 under label constraint S. Like traditional reachability queries, LCR queries are very useful in applications such as pathway finding in biological networks, inference over RDF (Resource Description Framework) graphs, and relationship finding in social networks. However, LCR queries are much more complicated than their traditional counterpart. Several techniques are proposed in this paper to minimize the search space in computing the path-label transitive closure. Furthermore, we demonstrate the superiority of our method by extensive experiments.
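The query semantics can be made concrete with a plain online search: a breadth-first traversal from u1 that only follows edges whose label belongs to S. This is a minimal sketch of the naive baseline for answering a single LCR query, not the path-label transitive closure technique the paper proposes; the function name `lcr_reachable` and the edge-list representation are assumptions for illustration.

```python
from collections import deque

def lcr_reachable(edges, source, target, allowed_labels):
    """Return True if target is reachable from source using only edges
    whose label is in allowed_labels.

    edges: dict mapping a vertex to a list of (neighbor, label) pairs,
    describing a directed edge-labeled graph.
    """
    allowed = set(allowed_labels)
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for w, label in edges.get(u, ()):
            # Skip edges that violate the label constraint.
            if label in allowed and w not in seen:
                seen.add(w)
                queue.append(w)
    return False

# Two paths from "a" to "d", each mixing labels "x" and "y".
labeled_graph = {
    "a": [("b", "x"), ("c", "y")],
    "b": [("d", "y")],
    "c": [("d", "x")],
}
only_x = lcr_reachable(labeled_graph, "a", "d", {"x"})
both = lcr_reachable(labeled_graph, "a", "d", {"x", "y"})
```

Here "d" is reachable from "a" in the ordinary sense, yet the query under S = {x} fails because every path needs both labels. This dependence on the label set is what makes precomputing answers harder than for traditional reachability.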