Yuchen Zhao

University of Illinois at Chicago

Publication

Featured research published by Yuchen Zhao.

international conference on data engineering | 2011

Outlier detection in graph streams

Charu C. Aggarwal; Yuchen Zhao; Philip S. Yu

A number of applications in social networks, telecommunications, and mobile computing create massive streams of graphs. In many such applications, it is useful to detect structural abnormalities which are different from the “typical” behavior of the underlying network. In this paper, we will provide first results on the problem of structural outlier detection in massive network streams. Such problems are inherently challenging because of the high volume of the underlying network stream. The stream scenario also increases the computational challenges for the approach. We use a structural connectivity model to define outliers in graph streams. To handle the sparsity problem of massive networks, we dynamically partition the network in order to construct statistically robust models of the connectivity behavior. We design a reservoir sampling method to maintain structural summaries of the underlying network. These structural summaries are designed to create robust, dynamic, and efficient models for outlier detection in graph streams. We present experimental results illustrating the effectiveness and efficiency of our approach.
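
The abstract above mentions reservoir sampling as the way structural summaries are maintained. As a rough illustration only, the sketch below shows the classic single-pass reservoir sampling scheme applied to a stream of edges. It is not the paper's method (which samples under structural constraints); the function name `reservoir_sample_edges` and its parameters are hypothetical.

```python
import random


def reservoir_sample_edges(edge_stream, capacity, seed=None):
    """Keep a uniform random sample of at most `capacity` edges from a stream.

    Hypothetical helper for illustration: plain reservoir sampling,
    not the structural variant described in the paper.
    """
    rng = random.Random(seed)
    reservoir = []
    for seen, edge in enumerate(edge_stream, start=1):
        if len(reservoir) < capacity:
            # Fill the reservoir with the first `capacity` edges.
            reservoir.append(edge)
        else:
            # The new edge replaces a stored one with probability capacity / seen.
            slot = rng.randrange(seen)
            if slot < capacity:
                reservoir[slot] = edge
    return reservoir


# A synthetic stream of 1000 distinct edges, read once.
stream = ((i, i + 1) for i in range(1000))
sample = reservoir_sample_edges(stream, capacity=10, seed=0)
```

The sample size stays fixed no matter how long the stream runs, which is what makes this kind of summary usable when the full graph cannot be stored.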

knowledge discovery and data mining | 2013

Inferring social roles and statuses in social networks

Yuchen Zhao; Guan Wang; Philip S. Yu; Shaobo Liu; Simon Zhang

Users in online social networks hold a variety of social roles and statuses. For example, users on Twitter can be characterized as advertisers, content contributors, information receivers, etc.; users on LinkedIn can hold different professional roles, such as engineer, salesperson, and recruiter. Previous research work mainly focuses on using categorical and textual information to predict the attributes of users. However, it cannot be applied to a large number of users in real social networks, since much of such information is missing, outdated, or non-standard. In this paper, we investigate the social roles and statuses that people assume in online social networks from the perspective of network structures, since what makes social networks unique is that they connect people. We quantitatively analyze a number of key social principles and theories that correlate with social roles and statuses. We systematically study how the network characteristics reflect the social situations of users in an online society. We discover patterns of homophily, the tendency of users to connect with users with similar social roles and statuses. In addition, we observe that different factors in social theories influence the social role/status of an individual user to varying extents, since these social principles represent different aspects of the network. We then introduce an optimization framework based on Factor Conditioning Symmetry, and we propose a probabilistic model to integrate the optimization framework on local structural information as well as network influence to infer the unknown social roles and statuses of online users. We present experimental results to show the effectiveness of the inference.

international conference on data engineering | 2012

On Text Clustering with Side Information

Charu C. Aggarwal; Yuchen Zhao; Philip S. Yu

Text clustering has become an increasingly important problem in recent years because of the tremendous amount of unstructured data which is available in various forms in online forums such as the web, social networks, and other information networks. In most cases, the data is not purely available in text form. A lot of side-information is available along with the text documents. Such side-information may be of different kinds, such as the links in the document, user-access behavior from web logs, or other non-textual attributes which are embedded into the text document. Such attributes may contain a tremendous amount of information for clustering purposes. However, the relative importance of this side-information may be difficult to estimate, especially when some of the information is noisy. In such cases, it can be risky to incorporate side-information into the clustering process, because it can either improve the quality of the representation for clustering, or can add noise to the process. Therefore, we need a principled way to perform the clustering process, so as to maximize the advantages from using this side information. In this paper, we design an algorithm which combines classical partitioning algorithms with probabilistic models in order to create an effective clustering approach. We present experimental results on a number of real data sets in order to illustrate the advantages of using such an approach.

international conference on data mining | 2011

Positive and Unlabeled Learning for Graph Classification

Yuchen Zhao; Xiangnan Kong; Philip S. Yu

The problem of graph classification has drawn much attention in the last decade. Conventional approaches to graph classification focus on mining discriminative subgraph features under supervised settings. The feature selection strategies strictly follow the assumption that both positive and negative graphs exist. However, in many real-world applications, the negative graph examples are not available. In this paper we study the problem of how to select useful subgraph features and perform graph classification based upon only positive and unlabeled graphs. This problem is challenging and different from previous works on PU learning, because there are no predefined features in graph data. Moreover, the subgraph enumeration problem is NP-hard. We need to identify a subset of unlabeled graphs that are most likely to be negative graphs. However, the negative graph selection problem and the subgraph feature selection problem are correlated. Before the reliable negative graphs can be identified, we need to have a set of useful subgraph features. In order to address this problem, we first derive an evaluation criterion to estimate the dependency between subgraph features and class labels based on a set of estimated negative graphs. In order to build accurate models for the PU learning problem on graph data, we propose an integrated approach to concurrently select the discriminative features and the negative graphs in an iterative manner. Experimental results illustrate the effectiveness and efficiency of the proposed method.

IEEE Transactions on Knowledge and Data Engineering | 2014

On the Use of Side Information for Mining Text Data

Charu C. Aggarwal; Yuchen Zhao; Philip S. Yu

In many text mining applications, side-information is available along with the text documents. Such side-information may be of different kinds, such as document provenance information, the links in the document, user-access behavior from web logs, or other non-textual attributes which are embedded into the text document. Such attributes may contain a tremendous amount of information for clustering purposes. However, the relative importance of this side-information may be difficult to estimate, especially when some of the information is noisy. In such cases, it can be risky to incorporate side-information into the mining process, because it can either improve the quality of the representation for the mining process, or can add noise to the process. Therefore, we need a principled way to perform the mining process, so as to maximize the advantages from using this side information. In this paper, we design an algorithm which combines classical partitioning algorithms with probabilistic models in order to create an effective clustering approach. We then show how to extend the approach to the classification problem. We present experimental results on a number of real data sets in order to illustrate the advantages of using such an approach.

conference on information and knowledge management | 2010

On wavelet decomposition of uncertain time series data sets

Yuchen Zhao; Charu C. Aggarwal; Philip S. Yu

In this paper, we will explore the construction of wavelet decompositions of uncertain data. Uncertain representations of data sets require significantly more space, and it is therefore even more important to construct compressed representations for such cases. We will use a hierarchical optimization technique in order to construct the most effective partitioning for our wavelet representation. We explore two different schemes which optimize the uncertainty in the resulting representation. We will show that the incorporation of uncertainty into the design of the wavelet representations significantly improves the compression rate of the representation. We present experimental results illustrating the effectiveness of our approach.
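
For readers unfamiliar with wavelet decompositions, the sketch below shows a plain Haar decomposition of a deterministic series by repeated pairwise averaging and differencing. It is only a baseline illustration under stated assumptions: it ignores uncertainty entirely and does not implement the hierarchical optimization described in the abstract. The function name `haar_decompose` is hypothetical.

```python
def haar_decompose(values):
    """Return Haar coefficients: the overall average, then details coarse to fine.

    Hypothetical helper for illustration; assumes a deterministic series
    whose length is a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current = list(values)
    details = []
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        diffs = [(current[i] - current[i + 1]) / 2 for i in pairs]
        # Coarser levels are computed later, so prepend the finer details.
        details = diffs + details
        current = averages
    return current + details


coeffs = haar_decompose([8, 6, 2, 4])
```

Compression comes from dropping small detail coefficients; the paper's contribution concerns how to choose the representation when each value carries uncertainty.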

Statistical Analysis and Data Mining | 2010

A framework for clustering massive graph streams: Submission to Best of SDM 2010 Issue

Charu C. Aggarwal; Yuchen Zhao; Philip S. Yu

In this paper, we examine the problem of clustering massive graph streams. Graph clustering poses significant challenges because of the complex structures which may be present in the underlying data. The massive size of the underlying graph makes explicit structural enumeration very difficult. Consequently, most techniques for clustering multidimensional data are difficult to generalize to the case of massive graphs. Recently, methods have been proposed for clustering graph data, though these methods are designed for static data, and are not applicable to the case of graph streams. Furthermore, these techniques are especially ineffective for the case of massive graphs, since a huge number of distinct edges may need to be tracked simultaneously. This results in storage and computational challenges during the clustering process. In order to deal with the natural problems arising from the use of massive disk-resident graphs, we propose a technique for creating hash-compressed microclusters from graph streams. The compressed microclusters are designed by using a hash-based compression of the edges onto a smaller domain space. We provide theoretical results which show that the hash-based compression continues to maintain bounded accuracy in terms of distance computations. Since clustering is a data summarization technique, it can also be naturally extended to the problem of evolution analysis. We provide experimental results which illustrate the accuracy and efficiency of the underlying method. © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 399-416, 2010
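
The idea of hashing edges onto a smaller domain can be illustrated with a short sketch. The code below maps each edge to one of a fixed number of buckets and keeps per-bucket counts, so memory depends on the bucket count and not on the number of distinct edges. This is an assumption-laden illustration, not the paper's microcluster structure: the names `edge_bucket` and `compress_graph`, the choice of CRC32 as the hash, and the treatment of edges as undirected are all hypothetical.

```python
import zlib
from collections import Counter


def edge_bucket(u, v, num_buckets):
    """Map an edge to a bucket index in [0, num_buckets).

    Assumption: edges are undirected, so (u, v) and (v, u) share a bucket.
    """
    key = f"{min(u, v)}-{max(u, v)}".encode("utf-8")
    return zlib.crc32(key) % num_buckets


def compress_graph(edges, num_buckets):
    """Summarize a graph as edge counts per hash bucket."""
    counts = Counter()
    for u, v in edges:
        counts[edge_bucket(u, v, num_buckets)] += 1
    return counts


graph = [(1, 2), (2, 1), (3, 4), (5, 6)]
summary = compress_graph(graph, num_buckets=8)
```

Distinct edges may collide in one bucket, which is the source of the approximation error that the abstract says is bounded.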

asia pacific web conference | 2014

An effective approach on overlapping structures discovery for co-clustering

Wangqun Lin; Yuchen Zhao; Philip S. Yu; Bo Deng

Co-clustering, which explores the inter-connected structures between objects and features simultaneously, has drawn much attention in the past decade. Most existing methods for co-clustering focus on partition-based approaches, which assume that each entry of the data matrix can only be assigned to one cluster. However, in real-world applications, cluster structures can potentially overlap. In this paper, we propose a novel overlapping co-clustering method by introducing the density-guided principle for identifying discriminative features (objects). This is done by simultaneously finding the non-overlapping blocks. Based on the discovered blocks, an effective strategy is utilized to select the features (objects), which can discriminate the specified object (feature) cluster from other object (feature) clusters. Finally, according to the discriminative features (objects), a novel overlapping method, OPS, is proposed. Experimental studies on both synthetic and real-world data sets demonstrate the effectiveness and efficiency of the proposed OPS method.

international conference on data engineering | 2017

On Edge Classification in Networks with Structure and Content

Charu C. Aggarwal; Yao Li; Philip S. Yu; Yuchen Zhao

The problem of node classification has been widely studied in a variety of network-based scenarios. In this paper, we will study the more challenging scenario in which some of the edges in a content-based network are labeled, and it is desirable to use this information in order to determine the labels of other arbitrary edges. Furthermore, each edge is associated with text content, which may correspond to either communication or relationship information between the different nodes. Such a problem often arises in the context of many social or communication networks in which edges are associated with communication between different nodes, and the text is associated with the content of the communication. This situation can also arise in many online social networks such as chat messengers or email networks, where the edges in the network may also correspond to the actual content of the chats or emails. The problem of edge classification is much more challenging from a scalability point of view, because the number of edges is typically significantly larger than the number of nodes in the network. In this paper, we will design a holistic classification approach which can combine content and structure for effective edge classification.

Knowledge and Information Systems | 2016

Multi-type clustering in heterogeneous information networks

Wangqun Lin; Philip S. Yu; Yuchen Zhao; Bo Deng

Heterogeneous information networks have drawn much attention in recent years due to their significant applications, such as text mining, e-commerce, social networks, and bioinformatics. Clustering different types of objects simultaneously based upon not only their relations of the same type, but also the relations between different types of objects can improve the clustering quality mutually. In this paper, we propose a general model, in which both the homogeneous and heterogeneous relations are considered simultaneously, to describe the structure of the heterogeneous information networks and devise a novel parameter-free multi-type overlapped clustering approach. In this model, different types of relations between different types of objects are represented by a group of matrices. In this way, we transform the multi-type clustering problem into an information compression problem. Subsequently, greedy search approaches, which aim at describing the group of relational matrices with the fewest bits, are proposed. Moreover, by discovering the discriminative clusters among different types of objects, we devise effective parameter-free strategies to discover either overlapping or non-overlapping structure among different types of clusters. Extensive experiments on real-world and synthetic data sets demonstrate that our methods are effective and efficient.
