Cunchao Tu
Tsinghua University
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Cunchao Tu.
meeting of the association for computational linguistics | 2017
Cunchao Tu; Han Liu; Zhiyuan Liu; Maosong Sun
Network embedding (NE) is playing a critical role in network analysis, due to its ability to represent vertices with efficient low-dimensional embedding vectors. However, existing NE models aim to learn a fixed context-free embedding for each vertex and neglect the diverse roles when interacting with other vertices. In this paper, we assume that one vertex usually shows different aspects when interacting with different neighbor vertices, and should own different embeddings respectively. Therefore, we present Context-Aware Network Embedding (CANE), a novel NE model to address this issue. CANE learns context-aware embeddings for vertices with mutual attention mechanism and is expected to model the semantic relationships between vertices more precisely. In experiments, we compare our model with existing NE models on three real-world datasets. Experimental results show that CANE achieves significant improvement than state-of-the-art methods on link prediction and comparable performance on vertex classification. The source code and datasets can be obtained from https://github.com/thunlp/CANE.
empirical methods in natural language processing | 2016
Huimin Chen; Maosong Sun; Cunchao Tu; Yankai Lin; Zhiyuan Liu
Document-level sentiment classification aims to predict user’s overall sentiment in a document about a product. However, most of existing methods only focus on local text information and ignore the global user preference and product characteristics. Even though some works take such information into account, they usually suffer from high model complexity and only consider wordlevel preference rather than semantic levels. To address this issue, we propose a hierarchical neural network to incorporate global user and product information into sentiment classification. Our model first builds a hierarchical LSTM model to generate sentence and document representations. Afterwards, user and product information is considered via attentions over different semantic levels due to its ability of capturing crucial semantic components. The experimental results show that our model achieves significant and consistent improvements compared to all state-of-theart methods. The source code of this paper can be obtained from https://github. com/thunlp/NSC.
international joint conference on artificial intelligence | 2017
Cheng Yang; Maosong Sun; Zhiyuan Liu; Cunchao Tu
Many Network Representation Learning (NRL) methods have been proposed to learn vector representations for vertices in a network recently. In this paper, we summarize most existing NRL methods into a unified two-step framework, including proximity matrix construction and dimension reduction. We focus on the analysis of proximity matrix construction step and conclude that an NRL method can be improved by exploring higher order proximities when building the proximity matrix. We propose Network Embedding Update (NEU) algorithm which implicitly approximates higher order proximities with theoretical approximation bound and can be applied on any NRL methods to enhance their performances. We conduct experiments on multi-label classification and link prediction tasks. Experimental results show that NEU can make a consistent and significant improvement over a number of NRL methods with almost negligible running time on all three publicly available datasets. The source code of this paper can be obtained from https://github.com/thunlp/NEU.
Chinese National Conference on Social Media Processing | 2014
Cunchao Tu; Zhiyuan Liu; Maosong Sun
Some microblog services encourage users to annotate themselves with multiple tags, indicating their attributes and interests. User tags play an important role for personalized recommendation and information retrieval. In order to better understand the semantics of user tags, we propose Tag Correspondence Model (TCM) to identify complex correspondences of tags from the rich context of microblog users. In TCM, we divide the context of a microblog user into various sources (such as short messages, user profile, and neighbors). With a collection of users with annotated tags, TCM can automatically learn the correspondences of user tags from the multiple sources. With the learned correspondences, we are able to interpret implicit semantics of tags. Moreover, for the users who have not annotated any tags, TCM can suggest tags according to users’ context information. Extensive experiments on a real-world dataset demonstrate that our method can efficiently identify correspondences of tags, which may eventually represent semantic meanings of tags.
Chinese National Conference on Social Media Processing | 2015
Cunchao Tu; Zhiyuan Liu; Maosong Sun
User profession plays an important role in commercial services such as personalized recommendation and targeted advertising. In practice, profession information is usually unavailable due to privacy and other reasons. In this paper, we explore the task of identifying user professions according to their behaviors in social media. The task confronts the following challenges which make it non-trivial: how to incorporate heterogeneous information of user behaviors, how to effectively utilize both labeled and unlabeled data, and how to exploit community structure. To address these challenges, we present a framework of PRofession Identification in Social Media (PRISM). It takes advantages of both personal information and community structure of users in the following aspects: (1) We present a cascaded two-level classifier with heterogeneous personal features to measure the confidences of users belonging to different professions. (2) We present a multi-training process to take advantages of both labeled and unlabeled data to enhance classification performance. (3) We design a profession identification method synthetically considering the confidences from personal features and community structure. We collect a real-world dataset to conduct experiments, and experimental results demonstrate significant effectiveness of our method compared with other baseline methods.
international joint conference on artificial intelligence | 2017
Cunchao Tu; Zhengyan Zhang; Zhiyuan Liu; Maosong Sun
Conventional network representation learning (NRL) models learn low-dimensional vertex representations by simply regarding each edge as a binary or continuous value. However, there exists rich semantic information on edges and the interactions between vertices usually preserve distinct meanings, which are largely neglected by most existing NRL models. In this work, we present a novel Translation-based NRL model, TransNet, by regarding the interactions between vertices as a translation operation. Moreover, we formalize the task of Social Relation Extraction (SRE) to evaluate the capability of NRL methods on modeling the relations between vertices. Experimental results on SRE demonstrate that TransNet significantly outperforms other baseline methods by 10% to 20% on hits@1. The source code and datasets can be obtained from https: //github.com/thunlp/TransNet.
ACM Transactions on Intelligent Systems and Technology | 2017
Cunchao Tu; Zhiyuan Liu; Huanbo Luan; Maosong Sun
Profession is an important social attribute of people. It plays a crucial role in commercial services such as personalized recommendation and targeted advertising. In practice, profession information is usually unavailable due to privacy and other reasons. In this article, we explore the task of identifying user professions according to their behaviors in social media. The task confronts the following challenges that make it non-trivial: how to incorporate heterogeneous information of user behaviors, how to effectively utilize both labeled and unlabeled data, and how to exploit community structure. To address these challenges, we present a framework called Profession Identification in Social Media. It takes advantage of both personal information and community structure of users in the following aspects: (1) We present a cascaded two-level classifier with heterogeneous personal features to measure the confidence of users belonging to different professions. (2) We present a multi-training process to take advantages of both labeled and unlabeled data to enhance classification performance. (3) We design a profession identification method synthetically considering the confidences from personal features and community structure. We collect a real-world dataset to conduct experiments, and experimental results demonstrate the significant effectiveness of our method compared with other baseline methods. By applying prediction on large-scale users, we also analyze characteristics of microblog users, finding that there are significant diversities among users of different professions in demographics, social network structures, and linguistic styles.
Journal of Computer Science and Technology | 2015
Cunchao Tu; Zhiyuan Liu; Maosong Sun
Some microblog services encourage users to annotate themselves with multiple tags, indicating their attributes and interests. User tags play an important role for personalized recommendation and information retrieval. In order to better understand the semantics of user tags, we propose Tag Correspondence Model (TCM) to identify complex correspondences of tags from the rich context of microblog users. The correspondence of a tag is referred to as a unique element in the context which is semantically correlated with this tag. In TCM, we divide the context of a microblog user into various sources (such as short messages, user profile, and neighbors). With a collection of users with annotated tags, TCM can automatically learn the correspondences of user tags from multiple sources. With the learned correspondences, we are able to interpret implicit semantics of tags. Moreover, for the users who have not annotated any tags, TCM can suggest tags according to users’ context information. Extensive experiments on a real-world dataset demonstrate that our method can efficiently identify correspondences of tags, which may eventually represent semantic meanings of tags.
Journal of Computer Science and Technology | 2017
Ayana; Shiqi Shen; Yankai Lin; Cunchao Tu; Yu Zhao; Zhiyuan Liu; Maosong Sun
Recently, neural models have been proposed for headline generation by learning to map documents to headlines with recurrent neural network. In this work, we give a detailed introduction and comparison of existing work and recent improvements in neural headline generation, with particular attention on how encoders, decoders and neural model training strategies alter the overall performance of the headline generation system. Furthermore, we perform quantitative analysis of most existing neural headline generation systems and summarize several key factors that impact the performance of headline generation systems. Meanwhile, we carry on detailed error analysis to typical neural headline generation systems in order to gain more comprehension. Our results and conclusions are hoped to benefit future research studies.
international joint conference on artificial intelligence | 2016
Cunchao Tu; Weicheng Zhang; Zhiyuan Liu; Maosong Sun