Chengyu Wang | Researchain


Publication

Featured research published by Chengyu Wang.

international conference on data engineering | 2015

Challenges in Chinese knowledge graph construction

Chengyu Wang; Ming Gao; Xiaofeng He; Rong Zhang

The automatic construction of large-scale knowledge graphs has received much attention from both academia and industry in the past few years. Notable knowledge graph systems include Google Knowledge Graph, DBPedia, YAGO, NELL, Probase and many others. A knowledge graph organizes information in a structured way by explicitly describing the relations among entities. Since entity identification and relation extraction depend heavily on the language itself, data sources largely determine how the data are processed, how relations are extracted, and ultimately how knowledge graphs are formed, all of which involve deep analysis of the lexicon, syntax and semantics of the content. Much progress has been made on knowledge graphs in English. In this paper, we discuss the challenges facing Chinese knowledge graph construction, which arise because Chinese differs significantly from English in several linguistic respects. Specifically, we analyze the challenges from three aspects: data sources, taxonomy derivation and knowledge extraction. We also present our insights into addressing these challenges.

asia-pacific web conference | 2015

User Generated Content Oriented Chinese Taxonomy Construction

Jinyang Li; Chengyu Wang; Xiaofeng He; Rong Zhang; Ming Gao

The taxonomy is one of the basic components of a knowledge graph, as it establishes the types of classes and the semantic relations among them. Taxonomies are normally constructed either manually or by language-dependent rules or patterns for type and relation extraction or inference. Existing work on building taxonomies for knowledge graphs mostly targets English. In this paper, we propose a novel approach for large-scale Chinese taxonomy construction based on user generated content. We take Chinese Wikipedia as the data source, develop methods to extract classes and their relations from user-tagged categories, and build up the taxonomy using a bottom-up strategy. The algorithms can be easily applied to other Wiki-style data sources. The experiments show that the constructed Chinese taxonomy achieves better results in both quality and quantity.

meeting of the association for computational linguistics | 2017

Transductive Non-linear Learning for Chinese Hypernym Prediction

Chengyu Wang; Junchi Yan; Aoying Zhou; Xiaofeng He

Finding the correct hypernyms for entities is essential for taxonomy learning, fine-grained entity categorization, query understanding, etc. Due to the flexibility of the Chinese language, it is challenging to identify hypernyms in Chinese accurately. Rather than extracting hypernyms from texts, in this paper, we present a transductive learning approach to establish mappings from entities to hypernyms in the embedding space directly. It combines linear and non-linear embedding projection models, with the capacity of encoding arbitrary language-specific rules. Experiments on real-world datasets illustrate that our approach outperforms previous methods for Chinese hypernym prediction.

Journal of Combinatorial Optimization | 2015

Optimizing word set coverage for multi-event summarization

Jihong Yan; Wenliang Cheng; Chengyu Wang; Jun Liu; Ming Gao; Aoying Zhou

We have witnessed the proliferation of the Internet over the past few decades. A large amount of textual information is generated on the Web, and it is impossible for individuals to locate and digest all the latest updates available there. Text summarization provides an efficient way to generate short, concise abstracts from massive document collections. These documents involve many events that are hard for the summarization procedure to identify directly. We propose a novel methodology that identifies events from these text corpora and creates a summary for each event. We employ a probabilistic topic model to learn the potential topics from the documents and then discover events in terms of the topic distributions of documents. To target the summarization, we define the word set coverage problem (WSCP) to capture the most representative sentences to summarize an event. To solve the WSCP, we propose an approximation algorithm for the optimization problem. We conduct a set of experiments to evaluate our proposed approach on two real datasets: Sina news and Johnson & Johnson medical news. On both datasets, our proposed method outperforms competitive baselines as measured by the harmonic mean of coverage and conciseness.
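The abstract does not spell out the WSCP formulation or the approximation algorithm, so the following is only an illustrative sketch of a coverage-style sentence selection. It assumes a simple greedy heuristic: repeatedly pick the sentence that adds the most not-yet-covered words. The function name `greedy_word_cover`, the whitespace tokenization and the `budget` parameter are hypothetical and are not taken from the paper.

```python
def greedy_word_cover(sentences, budget):
    """Greedy sketch: choose up to `budget` sentences, each time taking the
    sentence that covers the most words not covered so far.

    This is an assumed stand-in heuristic, not the paper's algorithm.
    """
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    word_sets = [set(s.lower().split()) for s in sentences]
    covered = set()
    chosen = []
    for _ in range(budget):
        best_idx, best_gain = None, 0
        for i, words in enumerate(word_sets):
            if i in chosen:
                continue
            gain = len(words - covered)  # number of newly covered words
            if gain > best_gain:         # strict '>' keeps the earliest sentence on ties
                best_idx, best_gain = i, gain
        if best_idx is None:             # nothing adds new words: stop early
            break
        chosen.append(best_idx)
        covered |= word_sets[best_idx]
    return chosen, covered


sentences = ["a b c", "a b", "d e", "c"]
chosen, covered = greedy_word_cover(sentences, budget=2)
```

With a budget of two, the sketch first takes the sentence with three new words and then the one contributing two further words; sentences that add nothing new are never selected.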

international world wide web conferences | 2016

NERank: Ranking Named Entities in Document Collections

Chengyu Wang; Rong Zhang; Xiaofeng He; Aoying Zhou

While most of the entity ranking research focuses on Web corpora with user queries as input, little has been done to rank entities directly from documents. We propose a ranking algorithm NERank to address this issue. NERank employs a random walk process on a weighted tripartite graph mined from the document collection. We evaluate NERank over real-life document datasets and compare it with baselines. Experimental results show the effectiveness of our method.
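How NERank builds or weights its tripartite graph is not described in the abstract. As a rough illustration of the general idea of scoring nodes with a random walk over a weighted graph, the sketch below runs a PageRank-style walk with uniform teleportation. The graph contents, the node names, the `damping` value and the function `random_walk_scores` are all assumptions made for this example.

```python
def random_walk_scores(edges, damping=0.85, iters=100):
    """Score nodes by a random walk with uniform teleportation.

    `edges` maps each node to a dict of {neighbor: weight}. This is a generic
    PageRank-style sketch, not the NERank algorithm itself.
    """
    nodes = sorted(set(edges) | {v for nbrs in edges.values() for v in nbrs})
    n = len(nodes)
    score = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Every node receives the teleportation share first.
        nxt = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            nbrs = edges.get(u, {})
            total = sum(nbrs.values())
            if total == 0:
                # Dangling node: spread its mass uniformly over all nodes.
                for v in nodes:
                    nxt[v] += damping * score[u] / n
                continue
            for v, w in nbrs.items():
                # Move mass along each edge in proportion to its weight.
                nxt[v] += damping * score[u] * w / total
        score = nxt
    return score


# Hypothetical toy graph: two documents linked to two entities, stored symmetrically.
graph = {
    "doc1": {"A": 3.0, "B": 1.0},
    "doc2": {"A": 2.0},
    "A": {"doc1": 3.0, "doc2": 2.0},
    "B": {"doc1": 1.0},
}
scores = random_walk_scores(graph)
```

In this toy graph, entity A is linked more heavily than entity B, so the walk assigns it the higher score; an actual tripartite construction would add a third node type and its own edge weights.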

international conference on data engineering | 2015

On the rise and fall of Sina Weibo: Analysis based on a fixed user group

Fan Xia; Qunyan Zhang; Chengyu Wang; Weining Qian; Aoying Zhou

The micro-blogging service Sina Weibo became China's most free-flowing and important source of news and opinions just a few years ago. Following its launch in the summer of 2009, Sina Weibo grew quickly, attracting hundreds of millions of users and seeing its biggest boom around 2011. However, several reports indicate a decrease in activity on Sina Weibo. In our study, we reveal the prosperity and decline of Sina Weibo by analyzing how a fixed user group's collective behaviors change throughout the whole development process. A huge dataset based on Sina Weibo, along with search engine data, is used in this study. In this paper we model the popularity of a single tweet and of multiple tweets. Then we define a statistic representing the information propagation capability of Sina Weibo. The well-known time series prediction model, ARMA, is used to model and predict its trend. In addition, we extract both internal features, i.e. features of Sina Weibo, and external features, i.e. the public's attention. Their trends are presented and analyzed. Then detailed experiments are conducted to measure the correlation and causality between them and our proposed statistic. The approaches we present in this paper clearly show the prosperity and decline of this microblogging community.

database systems for advanced applications | 2014

TaxiHailer: A Situation-Specific Taxi Pick-Up Points Recommendation System

Leyi Song; Chengyu Wang; Xiaoyi Duan; Bing Xiao; Xiao Liu; Rong Zhang; Xiaofeng He; Xueqing Gong

This demonstration presents TaxiHailer, a situation-specific recommendation system for passengers who are eager to find a taxi. Given a query with departure point, destination and time, it recommends pick-up points within a specified distance, ranked by potential waiting time. Unlike existing work, we consider three sets of features to build regression models, as well as Poisson process models for road segment clusters. We evaluate and choose the most suitable model for each cluster under different situations. TaxiHailer also gives destination-aware recommendations for pick-up points with driving directions. We evaluate our recommendation results on real GPS datasets.
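The abstract mentions Poisson process models but not how they are fitted or used. As a minimal sketch under stated assumptions: if vacant taxis pass a pick-up point as a homogeneous Poisson process with rate λ, the waiting time until the next taxi is exponentially distributed with mean 1/λ, and λ can be estimated as the observed count divided by the observation time. The function `rank_pickup_points`, the input layout and the example place names are hypothetical.

```python
def rank_pickup_points(observations):
    """Rank pick-up points by expected waiting time under a Poisson assumption.

    `observations` maps a point name to (vacant_taxi_count, observed_minutes).
    With estimated rate = count / minutes, the expected wait is 1 / rate,
    i.e. minutes / count. Illustrative only; not TaxiHailer's actual model.
    """
    ranked = []
    for name, (count, minutes) in observations.items():
        if count == 0:
            expected_wait = float("inf")  # no taxis observed: no finite estimate
        else:
            expected_wait = minutes / count
        ranked.append((name, expected_wait))
    # Shortest expected wait first.
    ranked.sort(key=lambda item: item[1])
    return ranked


# Hypothetical counts of vacant taxis seen during one hour at three points.
ranking = rank_pickup_points({
    "gate": (30, 60),
    "corner": (6, 60),
    "alley": (0, 60),
})
```

Here the point with 30 vacant taxis per hour has an expected wait of two minutes and ranks first. A situation-specific system would presumably estimate separate rates per time slot or weather condition, which this sketch leaves out.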

Frontiers of Computer Science in China | 2018

A retrospective of knowledge graphs

Jihong Yan; Chengyu Wang; Wenliang Cheng; Ming Gao; Aoying Zhou

Information on the Internet is fragmented and presented in different data sources, which makes automatic knowledge harvesting and understanding formidable for machines, and even for humans. Knowledge graphs have become prevalent in both industry and academia in recent years as one of the most efficient and effective knowledge integration approaches. Techniques for knowledge graph construction can mine information from structured, semi-structured, or even unstructured data sources, and finally integrate the information into knowledge, represented in a graph. Furthermore, a knowledge graph can organize information in an easy-to-maintain, easy-to-understand and easy-to-use manner. In this paper, we summarize techniques for constructing knowledge graphs. We review the existing knowledge graph systems developed by both academia and industry. We discuss in detail the process of building knowledge graphs, and survey state-of-the-art techniques for automatic knowledge graph checking and expansion via logical inference and reasoning. We also review the issues of graph data management by introducing knowledge data models and graph databases, especially from a NoSQL point of view. Finally, we give an overview of current knowledge graph systems and discuss future research directions.

web information systems engineering | 2016

Event Phase Extraction and Summarization

Chengyu Wang; Rong Zhang; Xiaofeng He; Guomin Zhou; Aoying Zhou

Text summarization aims to generate a single, concise representation for documents. For Web applications, documents related to an event retrieved by search engines usually describe several event phases implicitly, making it difficult for existing approaches to identify, extract and summarize these phases. In this paper, we aim to mine and summarize event phases automatically from a stream of news data on the Web. We model the semantic relations of news via a graph model called Temporal Content Coherence Graph. A structural clustering algorithm EPCluster is designed to separate news articles corresponding to event phases. After that, we calculate the relevance of news articles based on a vertex-reinforced random walk algorithm and generate event phase summaries in a relevance maximum optimization framework. Experiments on news datasets illustrate the effectiveness of our approach.

conference on information and knowledge management | 2016

Error Link Detection and Correction in Wikipedia

Chengyu Wang; Rong Zhang; Xiaofeng He; Aoying Zhou

The hyperlink structure of Wikipedia forms a rich semantic network connecting entities and concepts, making it a valuable source for knowledge harvesting. Wikipedia, as crowd-sourced data, faces various data quality issues which significantly impact knowledge systems that depend on it as an information source. One such issue occurs when an anchor text in a Wikipage links to a wrong Wikipage, causing the error link problem. While much previous work has focused on leveraging Wikipedia for entity linking, little has been done to detect error links. In this paper, we address the error link problem and propose algorithms to detect and correct error links. We introduce an efficient method to generate candidate error links based on iterative ranking in an Anchor Text Semantic Network. This greatly reduces the problem space. A more accurate pairwise learning model is then used to detect error links from the reduced candidate set, while suggesting correct links at the same time. This approach is effective when data sparsity is a challenging issue. The experiments on both English and Chinese Wikipedia illustrate the effectiveness of our approach. We also provide a preliminary analysis of possible causes of error links in English and Chinese Wikipedia.
