Publication


Featured research published by Xiaofeng He.


Frontiers of Computer Science in China | 2013

Towards modeling popularity of microblogs

Haixin Ma; Weining Qian; Fan Xia; Xiaofeng He; Jun Xu; Aoying Zhou

As one kind of social media, microblogs are widely used for sensing the real world. The popularity of a microblog is an important measure of the influence of a piece of information. This paper studies models and modeling techniques for the popularity of microblogs. A huge data set based on Sina Weibo, one of the most popular microblogging services, is used in the study. First, two different types of popularity, namely the number of retweets and the number of possible views, are defined, and their relationship is discussed. Then, the temporal dynamics of tweets’ popularity, including lifecycles and tipping points, are studied. A piecewise sigmoid model is used to model the temporal dynamics. Empirical studies show the effectiveness of our modeling methods.
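The abstract does not give the exact form of the piecewise sigmoid model, so the following is only a minimal sketch of the general idea, assuming each burst of attention adds its own logistic (S-shaped) increment to the cumulative retweet count. The function names, the `(start, height, rate, midpoint)` parameterization, and the sample values are all hypothetical.

```python
import math


def sigmoid(t, height, rate, midpoint):
    """Logistic curve rising from 0 to `height`, steepest at `midpoint`."""
    return height / (1.0 + math.exp(-rate * (t - midpoint)))


def piecewise_sigmoid(t, pieces):
    """Cumulative popularity at time t (hypothetical parameterization).

    `pieces` is a list of (start, height, rate, midpoint) tuples. A piece
    contributes only once t reaches its start, and its value at the start is
    subtracted so the overall curve stays continuous.
    """
    total = 0.0
    for start, height, rate, midpoint in pieces:
        if t >= start:
            total += sigmoid(t, height, rate, midpoint) - sigmoid(start, height, rate, midpoint)
    return total


# Illustrative values only: an initial burst, then a second wave after hour 24.
pieces = [(0, 1000, 1.0, 5), (24, 400, 0.5, 30)]
curve = [piecewise_sigmoid(hour, pieces) for hour in range(0, 101)]
```

In this sketch the midpoint of each piece plays the role of a tipping point, since that is where the piece grows fastest.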


International Conference on Data Engineering | 2015

Challenges in Chinese knowledge graph construction

Chengyu Wang; Ming Gao; Xiaofeng He; Rong Zhang

The automatic construction of large-scale knowledge graphs has received much attention from both academia and industry in the past few years. Notable knowledge graph systems include Google Knowledge Graph, DBPedia, YAGO, NELL, Probase and many others. A knowledge graph organizes information in a structured way by explicitly describing the relations among entities. Since entity identification and relation extraction depend heavily on the language itself, data sources largely determine how the data are processed, how relations are extracted, and ultimately how knowledge graphs are formed, all of which deeply involve the analysis of the lexicon, syntax and semantics of the content. Much progress has already been made on knowledge graphs in English. In this paper, we discuss the challenges facing Chinese knowledge graph construction, since Chinese differs significantly from English in various linguistic respects. Specifically, we analyze the challenges from three aspects: data sources, taxonomy derivation and knowledge extraction. We also present our insights into addressing these challenges.


Asia-Pacific Web Conference | 2015

User Generated Content Oriented Chinese Taxonomy Construction

Jinyang Li; Chengyu Wang; Xiaofeng He; Rong Zhang; Ming Gao

The taxonomy is one of the basic components of a knowledge graph, as it establishes the types of classes and the semantic relations among the classes. Taxonomies are normally constructed either manually or by language-dependent rules or patterns for type and relation extraction or inference. Existing work on building taxonomies for knowledge graphs mostly targets English. In this paper, we propose a novel approach for large-scale Chinese taxonomy construction based on user generated content. We take Chinese Wikipedia as the data source, develop methods to extract classes and their relations from user-tagged categories, and build up the taxonomy using a bottom-up strategy. The algorithms can be easily applied to other Wiki-style data sources. The experiments show that the constructed Chinese taxonomy achieves better results in both quality and quantity.


International Conference on Data Mining | 2013

Search Behavior Based Latent Semantic User Segmentation for Advertising Targeting

Xueqing Gong; Xinyu Guo; Rong Zhang; Xiaofeng He; Aoying Zhou

The popularity of internet usage greatly motivates online advertising. Compared to advertising on traditional media, online advertising has rich information as well as the necessary techniques to achieve precise user targeting. This rich information includes the search behaviors of a user, such as the queries issued or the ads clicked by the user. For popular websites with a large number of active users, ad delivery targeted at individual users puts too much burden on the system. User segmentation relieves this burden by grouping users with similar interests together; the ad delivery system then targets user segments, instead of individual users, to display relevant ads. Existing user segmentation work either adapts clustering methods such as K-means without considering the hidden semantics embedded in the data, or treats users as data instances and clusters them indirectly even when latent semantics is incorporated into the transformed data, as with PLSA or LDA. In this paper, we present a search behavior based latent semantic user segmentation method and validate its effectiveness on new ads. Instead of being treated as data instances, users are used as attributes of user-issued queries or clicked ads, which are themselves considered to be the data instances. LDA is then applied to this data set to directly obtain the user segments. Compared to popular K-means clustering, our approach achieves higher CTR values on new ads, with only simple search information.


Asia-Pacific Web Conference | 2013

Selecting a Diversified Set of Reviews

Wenzhe Yu; Rong Zhang; Xiaofeng He; Chaofeng Sha

Online product reviews provide helpful information for user decision-making. However, since user-generated reviews have proliferated in recent years, it is critical to deal with information overload on e-commerce sites. In this paper, we propose an approach to select a small set of representative reviews for each product, which considers both attribute coverage and opinion diversity under the requirement of providing high-quality reviews. First, we assign a weight to each attribute, which measures the attribute's importance and helps select useful reviews; second, we cluster reviews into different groups representing different concerns, which leads to better diversification results, especially when selecting smaller sets of reviews; finally, we perform a set of experiments on real datasets to verify our ideas.
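The abstract does not spell out the selection procedure. As a rough illustration of weighted attribute coverage with opinion diversity, the sketch below uses a simple greedy maximum-coverage heuristic, which is a stand-in technique and not the authors' algorithm; it also omits the clustering step. The review IDs, attribute weights, and the `select_reviews` helper are hypothetical.

```python
def select_reviews(reviews, weights, k):
    """Greedily pick up to k reviews that cover the most attribute weight.

    reviews: dict mapping review id -> set of (attribute, opinion) pairs.
    weights: dict mapping attribute -> importance weight.
    Covering the same attribute with a different opinion counts as new
    coverage, which encourages opinion diversity.
    """
    chosen, covered = [], set()

    def gain(review_id):
        # Weight of the (attribute, opinion) pairs not yet covered.
        return sum(weights.get(attr, 0) for attr, _ in reviews[review_id] - covered)

    for _ in range(min(k, len(reviews))):
        best = max((r for r in reviews if r not in chosen), key=gain)
        if gain(best) == 0:
            break  # nothing new left to cover
        chosen.append(best)
        covered |= reviews[best]
    return chosen


# Illustrative data only.
reviews = {
    "r1": {("battery", "+"), ("screen", "+")},
    "r2": {("battery", "+")},
    "r3": {("battery", "-"), ("price", "-")},
}
weights = {"battery": 3, "screen": 2, "price": 1}
selected = select_reviews(reviews, weights, k=2)  # -> ["r1", "r3"]
```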


Meeting of the Association for Computational Linguistics | 2017

Transductive Non-linear Learning for Chinese Hypernym Prediction

Chengyu Wang; Junchi Yan; Aoying Zhou; Xiaofeng He

Finding the correct hypernyms for entities is essential for taxonomy learning, fine-grained entity categorization, query understanding, etc. Due to the flexibility of the Chinese language, it is challenging to identify hypernyms in Chinese accurately. Rather than extracting hypernyms from texts, in this paper, we present a transductive learning approach to establish mappings from entities to hypernyms in the embedding space directly. It combines linear and non-linear embedding projection models, with the capacity of encoding arbitrary language-specific rules. Experiments on real-world datasets illustrate that our approach outperforms previous methods for Chinese hypernym prediction.


Knowledge and Information Systems | 2016

Learning user credibility for product ranking

Rong Zhang; Ming Gao; Xiaofeng He; Aoying Zhou

With the explosion of user-generated content (UGC) in electronic commerce, this kind of data is mined for trust or credibility calculation, which plays an important role in business selection. The most commonly used UGC is user reviews and ratings. A new consumer without any experience of a product will read this UGC to get an overview. However, open and dynamic e-commerce platforms may give rise to unfair or deceitful reviews and ratings. Detecting trustworthy reviewers and generating authentic ratings for customers is therefore both urgent and useful. In this paper, we present a twin-bipartite graph model to capture the review and ranking relationships among users, products and shops. We design a feedback mechanism to obtain a consistent ranking among different levels of objects, namely users and items. In the algorithm, we adjust customer credibility values by feedback that considers rating consistency, and we adjust ratings by combining customer credibility with the originally assigned ratings. We increase the credibility of a customer if the customer gives a high (low) score to a good (bad) product and decrease it if the customer gives a low (high) score to a good (bad) product. We also detect inconsistency between semantic ratings (the review comments) and numerical ratings (scores). To deal with it, we train a classifier on automatically constructed training data. The trained classifier is used to predict semantic scores from review comments. Finally, we calculate the scores of products by considering both the customer credibility and the predicted scores. We conduct experiments using a large amount of real-world data. The experimental results show that our proposed approach provides better product ranking than the baseline systems.
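The feedback between customer credibility and product scores can be illustrated with a small fixed-point iteration. This is only a sketch under simplifying assumptions: it uses a single user-product rating table rather than the twin-bipartite graph, ignores the review-text classifier, and the update formulas, the `rank_products` name, and the sample ratings are hypothetical.

```python
def rank_products(ratings, rounds=20, max_score=5.0):
    """Alternate between scoring products and re-estimating user credibility.

    ratings: dict mapping (user, product) -> numeric score in [1, max_score].
    Returns (scores, credibility).
    """
    users = {u for u, _ in ratings}
    products = {p for _, p in ratings}
    credibility = {u: 1.0 for u in users}
    scores = {}
    for _ in range(rounds):
        # Product score: credibility-weighted mean of the ratings it received.
        for p in products:
            pairs = [(credibility[u], s) for (u, q), s in ratings.items() if q == p]
            scores[p] = sum(c * s for c, s in pairs) / sum(c for c, _ in pairs)
        # Credibility: high when a user's ratings agree with the current scores.
        for u in users:
            gaps = [abs(s - scores[p]) for (v, p), s in ratings.items() if v == u]
            agreement = 1.0 - sum(gaps) / (len(gaps) * (max_score - 1.0))
            credibility[u] = max(1e-6, agreement)  # floor keeps weights positive
    return scores, credibility


# Illustrative data: user "c" rates against the other two users.
ratings = {
    ("a", "good"): 5, ("b", "good"): 5, ("c", "good"): 1,
    ("a", "bad"): 1, ("b", "bad"): 1, ("c", "bad"): 5,
}
scores, credibility = rank_products(ratings)
```

In this toy run the dissenting user's credibility shrinks each round, so the product scores drift toward the ratings of the two consistent users.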


International World Wide Web Conferences | 2016

NERank: Ranking Named Entities in Document Collections

Chengyu Wang; Rong Zhang; Xiaofeng He; Aoying Zhou

While most of the entity ranking research focuses on Web corpora with user queries as input, little has been done to rank entities directly from documents. We propose a ranking algorithm NERank to address this issue. NERank employs a random walk process on a weighted tripartite graph mined from the document collection. We evaluate NERank over real-life document datasets and compare it with baselines. Experimental results show the effectiveness of our method.
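The abstract does not describe the layers of the tripartite graph or the exact walk, so the following is a generic sketch of a random walk with restart on a weighted graph, computed by power iteration. The three node layers (`doc:`, `ent:`, `word:`), the edge weights, and the function names are assumptions made purely for illustration.

```python
def add_edge(graph, a, b, weight):
    """Add an undirected weighted edge to a dict-of-dicts graph."""
    graph.setdefault(a, {})[b] = weight
    graph.setdefault(b, {})[a] = weight


def random_walk_rank(graph, damping=0.85, iterations=50):
    """Stationary visiting probabilities of a random walk with restart."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Restart mass is spread uniformly over all nodes.
        nxt = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n, neighbors in graph.items():
            total = sum(neighbors.values())
            for m, w in neighbors.items():
                # Move to a neighbor with probability proportional to edge weight.
                nxt[m] += damping * rank[n] * w / total
        rank = nxt
    return rank


# Hypothetical tripartite graph: edges only connect different layers.
graph = {}
add_edge(graph, "doc:1", "ent:Alice", 3)
add_edge(graph, "doc:2", "ent:Alice", 2)
add_edge(graph, "doc:2", "ent:Bob", 1)
add_edge(graph, "ent:Alice", "word:ceo", 2)
add_edge(graph, "ent:Bob", "word:ceo", 1)
add_edge(graph, "doc:1", "word:ceo", 1)

ranks = random_walk_rank(graph)
entities = sorted((n for n in ranks if n.startswith("ent:")), key=ranks.get, reverse=True)
```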


Frontiers of Computer Science in China | 2015

Product-oriented review summarization and scoring

Rong Zhang; Wenzhe Yu; Chaofeng Sha; Xiaofeng He; Aoying Zhou

Currently, there are many online review web sites where consumers can freely write comments about different kinds of products and services. These comments are quite useful for other potential consumers. However, the number of online comments is often large and continues to grow as more and more consumers contribute. In addition, one comment may mention more than one product and contain opinions about different products, mentioning something good and something bad, yet the products share only a single overall score. Therefore, it is not easy to know the quality of an individual product from these comments. This paper presents a novel approach to generate review summaries including scores and description snippets with respect to each individual product. From the large number of comments, we first extract the context (snippet) that includes a description of the products and choose those snippets that express consumer opinions on them. We then propose several methods to predict the rating (from 1 to 5 stars) of the snippets. Finally, we derive a generic framework for generating summaries from the snippets. We design a new snippet selection algorithm, based on a standard seat allocation algorithm, to ensure that the returned results preserve the opinion-aspect statistical properties and attribute-aspect coverage. Through experiments we demonstrate empirically that our methods are effective. We also quantitatively evaluate each step of our approach.
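The abstract does not say which seat allocation algorithm is used. As one well-known example of the family, the sketch below applies the D'Hondt method to decide how many summary slots each opinion group receives in proportion to its snippet count. The group names and counts are hypothetical.

```python
def dhondt(counts, seats):
    """Allocate `seats` slots across groups in proportion to `counts`.

    D'Hondt method: each slot goes to the group with the largest quotient
    count / (slots already won + 1).
    """
    allocation = {group: 0 for group in counts}
    for _ in range(seats):
        winner = max(counts, key=lambda g: counts[g] / (allocation[g] + 1))
        allocation[winner] += 1
    return allocation


# Illustrative snippet counts per opinion group, with 8 summary slots.
snippet_counts = {"positive": 100, "neutral": 80, "negative": 30, "mixed": 20}
slots = dhondt(snippet_counts, 8)
# -> {"positive": 4, "neutral": 3, "negative": 1, "mixed": 0}
```

A proportional allocation of this kind keeps the summary's opinion distribution close to that of the full snippet set, which is the property the abstract asks the selection step to preserve.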


Database Systems for Advanced Applications | 2014

TaxiHailer: A Situation-Specific Taxi Pick-Up Points Recommendation System

Leyi Song; Chengyu Wang; Xiaoyi Duan; Bing Xiao; Xiao Liu; Rong Zhang; Xiaofeng He; Xueqing Gong

This demonstration presents TaxiHailer, a situation-specific recommendation system for passengers who are eager to find a taxi. Given a query with a departure point, destination and time, it recommends pick-up points within a specified distance, ranked by potential waiting time. Unlike existing work, we consider three sets of features to build regression models, as well as Poisson process models for road segment clusters. We evaluate the models and choose the most suitable one for each cluster under different situations. TaxiHailer also gives destination-aware recommendations for pick-up points with driving directions. We evaluate our recommendation results on real GPS datasets.
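To make the Poisson process part concrete: if vacant taxis pass a pick-up point as a Poisson process with rate λ per minute, the waiting time is exponentially distributed, so the expected wait is 1/λ and the probability of getting a taxi within t minutes is 1 − e^(−λt). The sketch below ranks candidate points on that basis. The point names and rates are hypothetical, and the system's regression models and clustering are not reproduced here.

```python
import math


def expected_wait(rate_per_min):
    """Mean waiting time (minutes) when taxis arrive as a Poisson process."""
    return 1.0 / rate_per_min


def prob_taxi_within(rate_per_min, minutes):
    """Probability that at least one taxi arrives within `minutes`."""
    return 1.0 - math.exp(-rate_per_min * minutes)


def rank_pickup_points(rates):
    """Order pick-up points by expected waiting time, shortest first.

    rates: dict mapping point name -> estimated vacant-taxi arrival rate
    (taxis per minute) for the queried time slot.
    """
    return sorted(rates, key=lambda point: expected_wait(rates[point]))


# Illustrative arrival rates only.
rates = {"gate": 0.2, "corner": 0.5, "mall": 0.1}
ranking = rank_pickup_points(rates)  # -> ["corner", "gate", "mall"]
```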

Collaboration

Top Co-Authors

Aoying Zhou, East China Normal University
Chengyu Wang, East China Normal University
Rong Zhang, East China Normal University
Xueqing Gong, East China Normal University
Yan Fan, East China Normal University
Ming Gao, East China Normal University
Bing Xiao, East China Normal University
Fan Xia, East China Normal University
Guohai Xu, East China Normal University