Network


Zhichun Wang's latest external collaborations at the country level.

Hotspot


The research topics in which Zhichun Wang is active.

Publication


Featured research published by Zhichun Wang.


International World Wide Web Conference | 2012

Cross-lingual knowledge linking across wiki knowledge bases

Zhichun Wang; Juanzi Li; Zhigang Wang; Jie Tang

Wikipedia has become one of the largest knowledge bases on the Web, attracting 513 million page views per day in January 2012. However, one critical issue for Wikipedia is that articles in different languages are very unbalanced. For example, the number of articles on the English Wikipedia has reached 3.8 million, while the number of Chinese articles is still less than half a million, and there are only 217 thousand cross-lingual links between articles of the two languages. On the other hand, there are more than 3.9 million Chinese wiki articles on Baidu Baike and Hudong.com, two popular encyclopedias in Chinese. One important question is how to link the knowledge entries distributed across different knowledge bases. This would immensely enrich the information in online knowledge bases and benefit many applications. In this paper, we study the problem of cross-lingual knowledge linking and present a linkage factor graph model. Features are defined according to some interesting observations. Experiments on the Wikipedia data set show that our approach achieves a high precision of 85.8% with a recall of 88.1%. The approach found 202,141 new cross-lingual links between English Wikipedia and Baidu Baike.
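The linkage factor graph model itself is not spelled out in the abstract; the sketch below only illustrates, under assumed data structures, the kind of cross-lingual features (overlap of already-linked outlinks and categories) that such a model could consume. Function and field names are hypothetical.

```python
# Illustrative only: toy cross-lingual features for one candidate article
# pair; the paper's factor graph model and feature set are not shown here.

def jaccard(a, b):
    """Jaccard overlap between two sets; 0.0 if both are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def candidate_features(en_article, zh_article, known_links):
    """Features for an (English, Chinese) article pair.

    Articles are dicts with 'outlinks' and 'categories'; known_links maps
    already-linked English titles to their Chinese counterparts.
    """
    mapped_outlinks = {known_links[t] for t in en_article["outlinks"] if t in known_links}
    mapped_categories = {known_links[c] for c in en_article["categories"] if c in known_links}
    return {
        "outlink_overlap": jaccard(mapped_outlinks, zh_article["outlinks"]),
        "category_overlap": jaccard(mapped_categories, zh_article["categories"]),
    }

en = {"outlinks": ["China", "Ming dynasty"], "categories": ["Fortifications"]}
zh = {"outlinks": ["中国", "明朝"], "categories": ["城墙"]}
print(candidate_features(en, zh, {"China": "中国", "Ming dynasty": "明朝"}))
```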


Knowledge-Based Systems | 2015

NewsMiner: Multifaceted news analysis for event search

Lei Hou; Juanzi Li; Zhichun Wang; Jie Tang; Peng Zhang; Ruibing Yang; Qian Zheng

Online news has become increasingly prevalent due to its convenience for information acquisition; meanwhile, the rapid development of social applications enables news to be generated and spread through various channels at an unprecedented rate. How to organize and integrate news from multiple sources, and how to analyze and present news to users, are two challenging problems. In this article, we represent news as a link-centric heterogeneous network and formalize the news analysis and mining task as a link discovery problem. More specifically, we propose a co-mention and context-based knowledge linking method and a topic-level social content alignment method to establish links between news and external sources (i.e., knowledge bases and social content), and introduce a unified probabilistic model for topic extraction and inner-relationship discovery within events. We further present a multifaceted ranking strategy to rank the linked events, topics, and entities simultaneously. Extensive experiments demonstrate the advantage of the proposed approaches over baseline methods, and the online system we developed (NewsMiner) has been running for more than three years.
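As a rough illustration of the link-centric heterogeneous network mentioned above, the following sketch builds a toy graph with news, entity, and topic nodes using networkx; the node ids and edge labels are assumptions, not the NewsMiner data model.

```python
# Toy link-centric heterogeneous network with news, entity, and topic nodes;
# node ids and edge labels are illustrative, not the NewsMiner data model.
import networkx as nx

G = nx.Graph()
G.add_node("news:n1", kind="news")
G.add_node("entity:Beijing", kind="entity")
G.add_node("topic:t3", kind="topic")

# The links whose discovery the paper formalizes: news-entity (knowledge
# linking) and news-topic (topic extraction) edges.
G.add_edge("news:n1", "entity:Beijing", link="mentions")
G.add_edge("news:n1", "topic:t3", link="belongs_to")

# Example query: all entities linked to a given news article.
print([v for v in G["news:n1"] if G.nodes[v]["kind"] == "entity"])
```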


Journal of Zhejiang University Science C | 2012

Knowledge extraction from Chinese wiki encyclopedias

Zhichun Wang; Zhigang Wang; Juanzi Li; Jeff Z. Pan

The vision of the Semantic Web is to build a ‘Web of data’ that enables machines to understand the semantics of information on the Web. The Linked Open Data (LOD) project encourages people and organizations to publish various open data sets as Resource Description Framework (RDF) data on the Web, which promotes the development of the Semantic Web. Among the various LOD datasets, DBpedia has proved to be a successful structured knowledge base and has become the central interlinking hub of the Web of data in English. In the Chinese language, however, there is little linked data published and linked to DBpedia. This hinders the structured knowledge sharing of both Chinese and cross-lingual resources. This paper presents an approach for building a large-scale Chinese structured knowledge base from Chinese wiki resources, including Hudong and Baidu Baike. The proposed approach first builds an ontology based on the wiki category system and infoboxes, and then extracts instances from wiki articles. Using Hudong as the source, our approach builds an ontology containing 19,542 concepts and 2,381 properties; 802,593 instances are extracted and described using the concepts and properties in the extracted ontology, and 62,679 of them are linked to equivalent instances in DBpedia. From Baidu Baike, our approach builds an ontology containing 299 concepts, 37 object properties, and 5,590 datatype properties; 1,319,703 instances are extracted, and 84,343 of them are linked to instances in DBpedia. We provide RDF dumps and a SPARQL endpoint to access the established Chinese knowledge bases. The knowledge bases built using our approach can be used not only in Chinese linked data building, but also in many useful applications of large-scale knowledge bases, such as question answering and semantic search.
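Assuming a local copy of one of the published RDF dumps, a query like the following could retrieve instances linked to DBpedia with rdflib; the file name below is a placeholder, not the actual release artifact.

```python
# Sketch of querying the extracted knowledge base with rdflib, assuming a
# local copy of one of the RDF dumps; the file name is a placeholder.
from rdflib import Graph

g = Graph()
g.parse("hudong_instances.nt", format="nt")  # hypothetical dump file name

# Instances linked to their DBpedia equivalents via owl:sameAs.
query = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?instance ?dbpedia WHERE {
    ?instance owl:sameAs ?dbpedia .
    FILTER(STRSTARTS(STR(?dbpedia), "http://dbpedia.org/resource/"))
}
LIMIT 10
"""
for row in g.query(query):
    print(row.instance, row.dbpedia)
```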


Knowledge-Based Systems | 2013

Large scale instance matching via multiple indexes and candidate selection

Juanzi Li; Zhichun Wang; Xiao Zhang; Jie Tang

Instance matching aims to discover linkage between different descriptions of real-world objects across heterogeneous data sources. With the rapid development of the Semantic Web, and especially of linked data, automatic instance matching has become a fundamental issue for ontological data sharing and integration. Instances in ontologies are often at large scale, containing millions or even hundreds of millions of objects, so directly applying previous schema-level ontology matching methods is infeasible. In this paper, we systematically investigate the characteristics of instance matching and then propose a scalable and efficient instance matching approach named VMI. VMI generates multiple vectors for the different kinds of information contained in the ontology instances and uses a set of inverted-index-based rules to obtain the primary matching candidates. It then employs user-customized property values to further eliminate incorrect matchings. Finally, the similarities of matching candidates are computed as integrated vector distances and the matching results are extracted. Experiments on the instance track of OAEI 2009 and OAEI 2010 show that the proposed method achieves better effectiveness and efficiency than the top performer RiMOM on most of the datasets (a speedup of more than 100 times and slightly better performance, +3.0% to 5.0% in terms of F1-score). Experiments on LinkedMDB and DBpedia show that VMI obtains results comparable to the SILK system (about 26,000 results with good quality).
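The exact VMI rules are not reproduced here; the following minimal sketch only illustrates the general idea of inverted-index-based candidate selection, where instances sharing an indexed label token become matching candidates so that full pairwise comparison is avoided. Names and the tokenization scheme are assumptions.

```python
# Sketch of index-based candidate selection: instances that share at least
# one indexed label token become matching candidates.
from collections import defaultdict

def build_inverted_index(instances):
    """Map each label token to the ids of instances containing it."""
    index = defaultdict(set)
    for iid, labels in instances.items():
        for label in labels:
            for token in label.lower().split():
                index[token].add(iid)
    return index

def candidate_pairs(source, target):
    """Return (source_id, target_id) pairs that share an indexed token."""
    index = build_inverted_index(target)
    pairs = set()
    for sid, labels in source.items():
        for label in labels:
            for token in label.lower().split():
                for tid in index.get(token, ()):
                    pairs.add((sid, tid))
    return pairs

src = {"s1": ["Beijing Normal University"], "s2": ["Tsinghua University"]}
tgt = {"t1": ["Beijing Normal Univ."], "t2": ["Peking University"]}
print(sorted(candidate_pairs(src, tgt)))
```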


Knowledge-Based Systems | 2013

A unified approach to matching semantic data on the Web

Zhichun Wang; Juanzi Li; Yue Zhao; Rossitza Setchi; Jie Tang

In recent years, the Web has evolved from a global information space of linked documents to a space where data are linked as well. The Linking Open Data (LOD) project has enabled a large number of semantic datasets to be published on the Web. Due to the open and distributed nature of the Web, both the schema (ontology classes and properties) and instances of the published datasets may have heterogeneity problems. In this context, the matching of entities from different datasets is important for the integration of information from different data sources. Recently, much work has been conducted on ontology matching to resolve the schema heterogeneity problem in the semantic datasets. However, there is no unified framework for matching both schema entities and instances. This paper presents a unified matching approach to finding equivalent entities in ontologies and LOD datasets on the Web. The approach first combines multiple lexical matching strategies using a novel voting-based aggregation method; then it utilizes the structural information and the already found correspondences to discover additional ones. We evaluated our approach using datasets from both OAEI and LOD. The results show that the voting-based aggregation method provides highly accurate matching results, and that the structural propagation procedure effectively improves the recall of the results.
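As a minimal sketch of the aggregation idea, assuming simple stand-in matchers and thresholds rather than the paper's actual strategies and voting scheme, several lexical similarity measures can be combined by majority vote:

```python
# Toy voting-based aggregation of lexical matchers: a pair is accepted
# when a majority of matchers exceed their thresholds. The matchers and
# thresholds below are illustrative assumptions.
from difflib import SequenceMatcher

def edit_sim(a, b):
    return SequenceMatcher(None, a, b).ratio()

def token_jaccard(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0

def prefix_sim(a, b, n=4):
    return 1.0 if a[:n].lower() == b[:n].lower() else 0.0

MATCHERS = [(edit_sim, 0.8), (token_jaccard, 0.5), (prefix_sim, 1.0)]

def vote_match(label_a, label_b):
    """Accept the pair if more than half of the matchers vote 'match'."""
    votes = sum(1 for f, thr in MATCHERS if f(label_a, label_b) >= thr)
    return votes > len(MATCHERS) / 2

print(vote_match("Semantic Web", "The Semantic Web"))  # True
print(vote_match("Semantic Web", "Social Network"))    # False
```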


International Semantic Web Conference | 2013

Discovering Missing Semantic Relations between Entities in Wikipedia

Mengling Xu; Zhichun Wang; Rongfang Bie; Juanzi Li; Chen Zheng; Wantian Ke; Mingquan Zhou

Wikipedia's infoboxes contain rich structured information about various entities, which has been explored by the DBpedia project to generate large-scale Linked Data sets. Among all the infobox attributes, those having hyperlinks in their values identify semantic relations between entities, which are important for creating RDF links between DBpedia's instances. However, quite a few hyperlinks have not been annotated by editors in infoboxes, which causes many relations between entities to be missing in Wikipedia. In this paper, we propose an approach for automatically discovering the missing entity links in Wikipedia's infoboxes, so that the missing semantic relations between entities can be established. Our approach first identifies entity mentions in the given infoboxes, and then computes several features to estimate the possibility that a given attribute value links to a candidate entity. A learning model is used to obtain the weights of the different features and predict the destination entity for each attribute value. We evaluated our approach on English Wikipedia data; the experimental results show that it can effectively find the missing relations between entities and significantly outperforms the baseline methods in terms of both precision and recall.
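A rough sketch of the recipe described above, with made-up features and training data, could score candidate destination entities with a learned model and pick the most probable one:

```python
# Illustrative candidate scoring with a learned model; the feature set,
# learner, and toy numbers are assumptions, not the paper's actual setup.
from sklearn.linear_model import LogisticRegression

# Each row: [string similarity to candidate title,
#            prior popularity of the candidate,
#            context overlap with the infobox's article]
X_train = [[0.9, 0.8, 0.6], [0.3, 0.1, 0.0], [0.7, 0.5, 0.4], [0.2, 0.6, 0.1]]
y_train = [1, 0, 1, 0]  # 1 = the attribute value links to this candidate

model = LogisticRegression().fit(X_train, y_train)

def best_candidate(candidates):
    """Pick the candidate entity with the highest predicted link probability."""
    scored = [(model.predict_proba([feats])[0][1], name) for name, feats in candidates]
    return max(scored)

print(best_candidate([("Paris", [0.8, 0.7, 0.5]), ("Paris, Texas", [0.8, 0.2, 0.1])]))
```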


Journal of Computer Science and Technology | 2016

RiMOM-IM: A Novel Iterative Framework for Instance Matching

Chao Shao; Linmei Hu; Juanzi Li; Zhichun Wang; Tong Lee Chung; Jun-Bo Xia

Instance matching, which aims at discovering correspondences between instances in different knowledge bases, is a fundamental issue for ontological data sharing and integration in the Semantic Web. Although many instance matching approaches have already been proposed, ensuring both high accuracy and efficiency is still a big challenge when dealing with large-scale knowledge bases. This paper proposes an iterative framework, RiMOM-IM (RiMOM-Instance Matching). The key idea behind this framework is to fully utilize the distinctive and available matching information to improve efficiency and control error propagation. We participated in the 2013 and 2014 campaigns of the Ontology Alignment Evaluation Initiative (OAEI), and our system ranked first. Furthermore, experiments on previous OAEI datasets also show that our system performs best.
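A toy illustration of the iterative idea, not the RiMOM-IM algorithm itself: in each round only sufficiently confident pairs are accepted, and accepted matches then serve as extra evidence in later rounds.

```python
# Simplified iterative matching sketch: accepted matches boost the scores
# of neighbouring candidate pairs in subsequent rounds.

def iterative_match(sim, neighbours_a, neighbours_b, threshold=0.9, rounds=3):
    """sim: {(a, b): base similarity}; neighbours_*: adjacency dicts."""
    matched = {}
    for _ in range(rounds):
        for (a, b), s in sim.items():
            if a in matched or b in matched.values():
                continue  # keep the mapping one-to-one
            # Fraction of a's neighbours whose accepted match is a neighbour
            # of b: evidence gathered in earlier rounds.
            na = neighbours_a.get(a, [])
            nb = neighbours_b.get(b, [])
            boost = sum(1 for n in na if matched.get(n) in nb) / max(len(na), 1)
            if s + 0.2 * boost >= threshold:
                matched[a] = b
    return matched

sim = {("a1", "b1"): 0.95, ("a2", "b2"): 0.85, ("a3", "b3"): 0.6}
print(iterative_match(sim, {"a2": ["a1"]}, {"b2": ["b1"]}))
```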


International Semantic Technology Conference | 2011

Building a large scale knowledge base from Chinese wiki encyclopedia

Zhichun Wang; Zhigang Wang; Juanzi Li; Jeff Z. Pan

DBpedia has proved to be a successful structured knowledge base, and large-scale Semantic Web data has been built by using DBpedia as the central interlinking hub of the Web of Data in English. In Chinese, however, due to the heavy imbalance in size (no more than one tenth) between the English and Chinese Wikipedia, little Chinese linked data has been published and linked to DBpedia, which hinders structured knowledge sharing both within Chinese resources and across languages. This paper aims at building a large-scale Chinese structured knowledge base from Hudong, one of the largest Chinese wiki encyclopedia websites. An upper-level ontology schema in Chinese is first learned based on the category system and infobox information in Hudong. In total, 19,542 concepts are inferred and organized in a hierarchy with a maximum depth of 20 levels, and 2,381 properties with domain and range information are learned from the attributes in the Hudong infoboxes. Then, 802,593 instances are extracted and described using the concepts and properties in the learned ontology. These instances cover a wide range of things, including persons, organizations, and places. Among all the instances, 62,679 are linked to identical instances in DBpedia. Moreover, the paper provides RDF dumps and a SPARQL endpoint to access the established Chinese knowledge base. The general upper-level ontology and wide coverage make the knowledge base a valuable Chinese semantic resource. It can be used not only in building Chinese linked data, the fundamental work for building multilingual knowledge bases across heterogeneous resources in different languages, but also to facilitate many useful applications of large-scale knowledge bases, such as knowledge question answering and semantic search.
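As a small illustration of the category-based part of the ontology learning described above, the sketch below derives a depth for each category from child-parent pairs; the data and the cycle handling are simplified assumptions, not the paper's actual procedure.

```python
# Toy hierarchy construction from wiki category parent/child pairs.
from collections import defaultdict

def build_hierarchy(parent_pairs):
    """parent_pairs: iterable of (child_category, parent_category)."""
    parents = defaultdict(set)
    for child, parent in parent_pairs:
        parents[child].add(parent)

    def depth(cat, seen=()):
        if cat in seen or cat not in parents:  # root reached or cycle guard
            return 0
        return 1 + max(depth(p, seen + (cat,)) for p in parents[cat])

    all_cats = set(parents) | {p for ps in parents.values() for p in ps}
    return {cat: depth(cat) for cat in all_cats}

pairs = [("Universities in Beijing", "Universities in China"),
         ("Universities in China", "Educational organizations")]
print(build_hierarchy(pairs))
```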


Asia-Pacific Web Conference | 2014

Learning to Compute Semantic Relatedness Using Knowledge from Wikipedia

Chen Zheng; Zhichun Wang; Rongfang Bie; Mingquan Zhou

Recently, Wikipedia has become a very important resource for computing semantic relatedness (SR) between entities. Several approaches have already been proposed to compute SR based on Wikipedia. Most of the existing approaches use certain kinds of information in Wikipedia (e.g. links, categories, and texts) and compute the SR by empirically designed measures. We have observed that these approaches produce very different results for the same entity pair in some cases. Therefore, how to select appropriate features and measures to best approximate the human judgment on SR becomes a challenging problem. In this paper, we propose a supervised learning approach for computing SR between entities based on Wikipedia. Given two entities, our approach first maps entities to articles in Wikipedia; then different kinds of features of the mapped articles are extracted from Wikipedia, which are then combined with different relatedness measures to produce nine raw SR values of the entity pair. A supervised learning algorithm is proposed to learn the optimal weights of different raw SR values. The final SR is computed as the weighted average of raw SRs. Experiments on benchmark datasets show that our approach outperforms baseline methods.
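The weight-learning step can be sketched as fitting a linear model from the nine raw SR values to human judgments; the learner and the toy numbers below are assumptions for illustration, not the paper's algorithm.

```python
# Illustrative weight learning: fit a linear model mapping nine raw SR
# values to human-judged relatedness, then combine raw SRs for new pairs.
from sklearn.linear_model import LinearRegression

# Each row: nine raw SR values for one entity pair (toy numbers).
raw_sr = [[0.8, 0.7, 0.9, 0.6, 0.8, 0.7, 0.9, 0.8, 0.7],
          [0.2, 0.3, 0.1, 0.4, 0.2, 0.3, 0.2, 0.1, 0.3],
          [0.5, 0.6, 0.5, 0.5, 0.4, 0.6, 0.5, 0.5, 0.6]]
human_scores = [0.85, 0.20, 0.55]  # gold-standard judgments

model = LinearRegression().fit(raw_sr, human_scores)

def final_sr(raw_values):
    """Weighted combination of the nine raw SR values for a new pair."""
    return float(model.predict([raw_values])[0])

print(final_sr([0.7, 0.6, 0.8, 0.5, 0.7, 0.6, 0.8, 0.7, 0.6]))
```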


CSWS | 2013

Co-mention and Context-Based Entity Linking

Qian Zheng; Juanzi Li; Zhichun Wang; Lei Hou

Recently, online news has become one of the most important resources from which people get useful information. Linking named entities in news articles to existing knowledge bases is a critical task for helping readers understand the news. In this paper, we propose an approach for linking entities in Chinese news articles to Chinese knowledge bases. Our approach first recognizes three types of named entities (i.e., person, location, and organization) and then uses a disambiguation method to link entities occurring in news articles to entities in knowledge bases. In the disambiguation process, co-mentioned entities are used as features to compute the context similarities between entities in news and entities in knowledge bases; the disambiguation results are decided by a threshold-filtering method on the context similarities. Experiments on linking entities in Sina news to the Hudong knowledge base validate the effectiveness of our approach; it achieves 84.39%, 84.02%, and 86.16% F1-scores in the task of linking person entities, location entities, and organization entities, respectively.
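A minimal sketch of the threshold-filtered disambiguation step described above, assuming a Jaccard similarity over co-mentioned entities and illustrative data:

```python
# Illustrative disambiguation by context similarity with threshold filtering.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def link_entity(co_mentions, candidates, threshold=0.3):
    """co_mentions: entities co-mentioned with the target in the news article;
    candidates: {kb_entity: entities related to it in the knowledge base}."""
    best = max(candidates, key=lambda c: jaccard(co_mentions, candidates[c]))
    score = jaccard(co_mentions, candidates[best])
    return best if score >= threshold else None  # unlinkable below threshold

news_context = ["Beijing", "CBD"]
kb = {"Chaoyang District, Beijing": ["Beijing", "CBD"],
      "Chaoyang, Liaoning": ["Liaoning", "Northeast China"]}
print(link_entity(news_context, kb))
```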

Collaboration


An overview of Zhichun Wang's collaborations.

Top Co-Authors
Chen Zheng

Beijing Normal University

Mingquan Zhou

Beijing Normal University
