Bifan Wei | Researchain

Publication

Featured research published by Bifan Wei.

International Conference on Neural Information Processing | 2012

MOTIF-RE: motif-based hypernym/hyponym relation extraction from wikipedia links

Bifan Wei; Jun Liu; Jian Ma; Qinghua Zheng; Wei Zhang; Boqin Feng

Hypernym/hyponym relation extraction plays an essential role in taxonomy learning. Conventional methods based on lexico-syntactic patterns or machine learning usually rely on content-related features. In this paper, we find that the proportions of hyperlinks of different semantic types vary markedly across network motifs. Based on this observation, we propose MOTIF-RE, an algorithm for extracting hypernym/hyponym relations from Wikipedia hyperlinks. The extraction process consists of three steps: 1) Build a directed graph from a set of domain-specific Wikipedia articles. 2) Count the occurrences of hyperlinks in every three-node network motif and create a feature vector for every hyperlink. 3) Train a classifier to identify the semantic relation of each hyperlink. We created three domain-specific Wikipedia article sets to test MOTIF-RE. Experiments on the individual datasets show that MOTIF-RE outperforms the baseline algorithm by about 30% in terms of F1-measure. Cross-domain experiments show similar results, which indicates that MOTIF-RE has fairly good domain adaptation ability.

European Semantic Web Conference | 2017

Exploiting Source-Object Networks to Resolve Object Conflicts in Linked Data

Wenqiang Liu; Jun Liu; Haimeng Duan; Wei Hu; Bifan Wei

Considerable effort has been exerted to increase the scale of Linked Data. However, an inevitable problem arises when dealing with data integration from multiple sources. Various sources often provide conflicting objects for a certain predicate of the same real-world entity, thereby causing the so-called object conflict problem. At present, the object conflict problem has not received sufficient attention in the Linked Data community. Thus, in this paper, we first formalize object conflict resolution as computing the joint distribution of variables on a heterogeneous information network called the Source-Object Network, which successfully captures three correlations from objects and Linked Data sources. Then, we introduce a novel approach based on network effects, called ObResolution (object resolution), to identify the true object among multiple conflicting objects. ObResolution adopts a pairwise Markov Random Field (pMRF) to model all evidence under a unified framework. Extensive experimental results on six real-world datasets show that our method achieves higher accuracy than existing approaches and that it is robust and consistent across various domains.

International World Wide Web Conferences | 2013

DFT-extractor: a system to extract domain-specific faceted taxonomies from wikipedia

Bifan Wei; Jun Liu; Jian Ma; Qinghua Zheng; Wei Zhang; Boqin Feng

Extracting faceted taxonomies from the Web has received increasing attention in recent years from the web mining community. We demonstrate in this study a novel system called DFT-Extractor, which automatically constructs domain-specific faceted taxonomies from Wikipedia in three steps: 1) It crawls domain terms from Wikipedia by using a modified topical crawler. 2) Then it exploits a classification model to extract hyponym relations with the use of motif-based features. 3) Finally, it constructs a faceted taxonomy by applying a community detection algorithm and a group of heuristic rules. DFT-Extractor also provides a graphical user interface to visualize the learned hyponym relations and the tree structure of taxonomies.

International World Wide Web Conferences | 2017

TruthDiscover: Resolving Object Conflicts on Massive Linked Data

Wenqiang Liu; Jun Liu; Haimeng Duan; Jian Zhang; Wei Hu; Bifan Wei

Considerable effort has been made to increase the scale of Linked Data. However, because of the openness of the Semantic Web and the ease of extracting Linked Data from semi-structured sources (e.g., Wikipedia) and unstructured sources, many Linked Data sources often provide conflicting objects for a certain predicate of a real-world entity. Existing methods cannot be trivially extended to resolve conflicts in Linked Data because Linked Data has a scale-free property. In this demonstration, we present a novel system called TruthDiscover, to identify the truth in Linked Data with a scale-free property. First, TruthDiscover leverages the topological properties of the Source Belief Graph to estimate the priori beliefs of sources, which are utilized to smooth the trustworthiness of sources. Second, the Hidden Markov Random Field is utilized to model interdependencies among objects for estimating the trust values of objects accurately. TruthDiscover can visualize the process of resolving conflicts in Linked Data.

IEEE Transactions on Knowledge and Data Engineering | 2014

Motif-Based Hyponym Relation Extraction from Wikipedia Hyperlinks

Bifan Wei; Jun Liu; Jian Ma; Qinghua Zheng; Wei Zhang; Boqin Feng

Discovering hyponym relations among domain-specific terms is a fundamental task in taxonomy learning and knowledge acquisition. However, the great diversity of domain corpora and the lack of labeled training sets make this task very challenging for conventional methods that are based on text content. In this study, the hyperlink structure of Wikipedia article pages was found to contain recurring network motifs that indicate the probability of a hyperlink being a hyponym hyperlink. Hence, a novel hyponym relation extraction approach based on the network motifs of Wikipedia hyperlinks was proposed. This approach automatically constructs motif-based features from the hyperlink structure of a domain; every hyperlink is mapped to a 13-dimensional feature vector based on the 13 types of three-node motifs. The approach extracts structural information from Wikipedia and heuristically creates a labeled training set. Classification models for hyponym relation extraction were learned from the training sets. Two experiments were conducted to validate our approach on seven domain-specific datasets obtained from Wikipedia. The first experiment, which used manually labeled data, verified the effectiveness of the motif-based features. The second experiment, which used an automatically labeled training set from different domains, showed that the proposed approach performs better than the approach based on lexico-syntactic patterns and achieves results comparable to those of the approach based on textual features. The experimental results show the practicability and fairly good domain scalability of the proposed approach.
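
The mapping from a hyperlink to a 13-dimensional motif vector can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`canonical`, `edge_features`) and the toy graph are hypothetical, and the sketch simply counts, for each directed edge, how often it takes part in each of the 13 weakly connected three-node subgraph types. How the paper orders the motifs or treats an edge's position within a motif is not stated in the abstract and is not modeled here.

```python
from itertools import permutations

# The six possible directed edges among three nodes labelled 0, 1, 2.
PAIRS = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1)]


def canonical(edges):
    """Return an isomorphism-invariant code for a digraph on nodes 0..2."""
    best = None
    for p in permutations(range(3)):
        code = 0
        for bit, (i, j) in enumerate(PAIRS):
            if (p[i], p[j]) in edges:
                code |= 1 << bit
        if best is None or code < best:
            best = code
    return best


def weakly_connected(edges):
    """Three nodes are weakly connected iff at least two node pairs are linked."""
    return len({frozenset(e) for e in edges}) >= 2


def motif_index():
    """Enumerate all 64 three-node digraphs and index the connected classes."""
    codes = set()
    for mask in range(64):
        edges = {PAIRS[b] for b in range(6) if mask >> b & 1}
        if weakly_connected(edges):
            codes.add(canonical(edges))
    return {code: k for k, code in enumerate(sorted(codes))}


MOTIFS = motif_index()  # 13 classes; the index order is arbitrary


def edge_features(graph_edges):
    """Map every directed edge (u, v) to a 13-dimensional motif count vector."""
    edges = set(graph_edges)
    nodes = {n for e in edges for n in e}
    features = {}
    for u, v in edges:
        if u == v:  # assumption: self-links are ignored
            continue
        vec = [0] * len(MOTIFS)
        for w in nodes - {u, v}:
            trio = (u, v, w)
            local = {(i, j) for i, j in PAIRS if (trio[i], trio[j]) in edges}
            if weakly_connected(local):
                vec[MOTIFS[canonical(local)]] += 1
        features[(u, v)] = vec
    return features


# Toy hyperlink graph: a feed-forward triangle A -> B -> C with A -> C.
features = edge_features([("A", "B"), ("B", "C"), ("A", "C")])
```

The resulting vectors could then be passed to any off-the-shelf classifier; the sketch checks every third node for every edge, so it is only suitable for small graphs.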

Archive | 2011

An Extended Supervised Term Weighting Method for Text Categorization

Bifan Wei; Boqin Feng; Feijuan He; Xiaoyu Fu

When Support Vector Machines (SVM) are used for automatic text categorization, text representation and term weighting have a significant impact on classification performance. Conventional supervised weighting methods focus only on the frequency characteristics of feature terms and ignore their semantic characteristics. Inspired by supervised weighting methods, the semantic distance between terms and categories is introduced into the calculation of term weights. First, each category is modeled with two vectors of feature terms, called category core terms, which are acquired by machine learning methods. Second, the semantic distance between feature terms and category core terms is calculated based on a semantic database. Third, the global weight factor is replaced by the semantic distance to calculate the weight of every term. On the standard benchmark Reuters-21578, this kind of term weighting scheme generally produces satisfactory classification results using SVMlight as the classifier with default parameters.
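
The third step, replacing the global weight factor with a semantic score, can be sketched as follows. This is only an illustration under stated assumptions: the `RELATEDNESS` table is a hypothetical stand-in for the semantic database, the scores are made-up relatedness values in [0, 1] (higher means semantically closer), and taking the maximum over the core terms is one possible choice that the abstract does not specify.

```python
from collections import Counter

# Hypothetical relatedness scores in [0, 1]; a stand-in for a semantic database.
RELATEDNESS = {
    ("bank", "finance"): 0.9,
    ("loan", "finance"): 0.8,
    ("river", "finance"): 0.1,
}


def semantic_factor(term, core_terms, default=0.0):
    """Closeness of a term to its best-matching category core term.

    Assumes core_terms is non-empty.
    """
    return max(
        1.0 if term == core else RELATEDNESS.get((term, core), default)
        for core in core_terms
    )


def weight_document(tokens, core_terms):
    """Local term frequency multiplied by the semantic factor."""
    term_freq = Counter(tokens)
    return {
        term: count * semantic_factor(term, core_terms)
        for term, count in term_freq.items()
    }


weights = weight_document(["bank", "loan", "bank", "river"], ["finance"])
```

In this toy document, "bank" receives the largest weight because it is both frequent and close to the core term, while "river" is down-weighted despite appearing in the text.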

Neurocomputing | 2018

Representation learning over multiple knowledge graphs for knowledge graphs alignment

Wenqiang Liu; Jun Liu; Mengmeng Wu; Samar Abbas; Wei Hu; Bifan Wei; Qinghua Zheng

The goal of knowledge graph representation learning is to encode both entities and relations into a low-dimensional embedding space. Most current work has demonstrated the benefits of knowledge graph embedding in single knowledge graph completion tasks, such as relation extraction. The most significant distinction between multiple knowledge graph embedding and single knowledge graph embedding is that the former must consider the alignments between multiple knowledge graphs (KGs), which is very helpful to applications built on multiple KGs, such as KB-QA and KG integration. In this paper, we propose a new automatic representation learning model over multiple knowledge graphs (MGTransE) that adopts a bootstrapping method. More specifically, MGTransE consists of three core components: the Structure Model, the Semantically Smooth Embedding Model, and the Iterative Smoothness Model. Experimental results on two real-world datasets show that our method achieves better performance on two new multiple-KG tasks than state-of-the-art KG embedding models, and that it also preserves the key properties of knowledge graph embedding on traditional single-KG tasks compared with methods learned from a single KG.

Knowledge and Information Systems | 2018

Answering why-not questions on SPARQL queries

Meng Wang; Jun Liu; Bifan Wei; Siyu Yao; Hongwei Zeng; Lei Shi

SPARQL, the W3C standard for RDF query languages, has gained significant popularity in recent years. An increasing amount of effort is currently being exerted to improve the functionality and usability of SPARQL-based search engines. However, explaining missing items in the results of SPARQL queries or the so-called why-not question has not received sufficient attention. In this study, we first formalize why-not questions on SPARQL queries and then propose a novel explanation model, called answering why-not questions on SPARQL (ANNA) to answer why-not questions using a divide-and-conquer strategy. ANNA adopts a graph-based approach and an operator-based approach to generate logical explanations at the triple pattern level and the query operator level, respectively, which helps users refine their initial queries. Extensive experimental results on two real-world RDF datasets show that the proposed model and algorithms can provide high-quality explanations in terms of both effectiveness and efficiency.
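
To make the idea of a triple-pattern-level explanation concrete, here is a minimal sketch. It is not the ANNA algorithm: the tiny in-memory triple set, the query, and the `why_not` helper are all hypothetical, and the check simply binds the query variable to the missing item and reports which triple patterns can no longer be matched.

```python
# Toy RDF-like data as (subject, predicate, object) tuples; purely illustrative.
TRIPLES = {
    ("alice", "type", "Person"),
    ("alice", "livesIn", "Paris"),
    ("bob", "type", "Person"),
}

# Conceptual query: SELECT ?x WHERE { ?x type Person . ?x livesIn Paris }
PATTERNS = [("?x", "type", "Person"), ("?x", "livesIn", "Paris")]


def matches(pattern, binding, triples):
    """True if some triple matches the pattern after applying the binding.

    Variables left unbound act as independent wildcards (no join checking).
    """
    bound = tuple(binding.get(term, term) for term in pattern)
    return any(
        all(b.startswith("?") or b == t for b, t in zip(bound, triple))
        for triple in triples
    )


def why_not(item, var, patterns, triples):
    """List the triple patterns that exclude `item` from the results."""
    return [p for p in patterns if not matches(p, {var: item}, triples)]


failing = why_not("bob", "?x", PATTERNS, TRIPLES)
```

Here `failing` contains only the `livesIn` pattern, which suggests that relaxing or correcting that pattern would bring the missing item into the results.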

Knowledge and Information Systems | 2018

A new truth discovery method for resolving object conflicts over Linked Data with scale-free property

Wenqiang Liu; Jun Liu; Bifan Wei; Haimeng Duan; Wei Hu

Considerable effort has been exerted to increase the scale of Linked Data. However, an inevitable problem arises when dealing with data integration from multiple sources. Various sources often provide conflicting objects for a certain predicate of the same real-world entity, thereby causing the so-called object conflict problem. Existing truth discovery methods cannot be trivially extended to resolve object conflict problems because Linked Data has a scale-free property, i.e., most of the sources provide few objects, whereas only a few sources have numerous objects. In this study, we propose a novel approach called TruthDiscover to determine the most trustworthy object in Linked Data with a scale-free property. More specifically, TruthDiscover consists of two core components: Priori Belief Estimation, which smooths the trustworthiness of sources by leveraging the topological properties of the Source Belief Graph, and Truth Computation, which infers the trustworthiness of sources and the trust values of objects. Experiments conducted on six datasets show that TruthDiscover achieves higher accuracy than existing approaches and that it is robust and consistent across various domains.
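
The interplay between source trustworthiness and object trust values can be illustrated with a generic iterative scheme. The sketch below is not TruthDiscover: it does not build a Source Belief Graph or use a Markov random field. The function name, the `prior` and `prior_weight` parameters, and the example claims are assumptions chosen only to show how smoothing toward a prior keeps a source with very few claims from receiving an extreme trust score.

```python
def truth_discovery(claims, prior=0.5, prior_weight=2.0, rounds=10):
    """Generic iterative truth discovery for one (entity, predicate) pair.

    claims: dict mapping each source to the object value it asserts.
    Returns the best-supported object, the object scores, and source trust.
    """
    trust = {source: prior for source in claims}
    score = {}
    for _ in range(rounds):
        # Object score: total trust of the sources asserting it, normalised.
        score = {}
        for source, obj in claims.items():
            score[obj] = score.get(obj, 0.0) + trust[source]
        total = sum(score.values())
        score = {obj: value / total for obj, value in score.items()}
        # Source trust: score of its single claim, smoothed toward the prior.
        trust = {
            source: (prior_weight * prior + score[obj]) / (prior_weight + 1)
            for source, obj in claims.items()
        }
    best = max(score, key=score.get)
    return best, score, trust


# Three hypothetical sources disagree on one value.
best, score, trust = truth_discovery({"s1": "1879", "s2": "1879", "s3": "1880"})
```

With two of the three sources agreeing, the value "1879" wins and the dissenting source ends up with lower, but not zero, trust because of the smoothing term.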

Advanced Data Mining and Applications | 2017

Quality prediction of newly proposed questions in CQA by leveraging weakly supervised learning

Yuanhao Zheng; Bifan Wei; Jun Liu; Meng Wang; Weitong Chen; Bei Wu; Yihe Chen

Community Question Answering (CQA) websites provide a platform for users to ask questions and share their knowledge. Good questions on CQA websites can improve the user experience and attract more users. To the best of our knowledge, few studies have examined question quality, especially the quality of newly proposed questions. In this work, we consider a good question to be one that is popular and answerable on CQA websites. The community features of questions are extracted automatically and used to acquire a massive set of good questions. The text features and asker features of good questions are then used to train our weakly supervised model, based on a Convolutional Neural Network, to recognize good newly proposed questions. We conduct extensive experiments on the publicly available dataset from StackExchange, and our best result achieves an F1-score of 91.5%, outperforming the baselines.