Zhenying He | Researchain

Publication

Featured research published by Zhenying He.

Expert Systems With Applications | 2015

Top-k similarity search in heterogeneous information networks with x-star network schema

Mingxi Zhang; Hao Hu; Zhenying He; Wei Wang

The efficiency improvement is evident for similarity computation. The effectiveness of the returned results is good for similarity search. A pruning algorithm is presented to support fast online query processing. The accuracy loss of the pruning algorithm can be controlled by setting thresholds. An x-star network is an information network that consists of centers with connections among themselves, and attributes of different types linked to these centers. As x-star networks become ubiquitous, extracting knowledge from x-star networks has become an important task. Similarity search in an x-star network aims to find the centers similar to a given query center, which has numerous applications including collaborative filtering, community mining and web search. Although existing methods such as SimRank and P-Rank yield promising results, they are not applicable to massive x-star networks. In this paper, we propose a structural-based similarity measure, NetSim, for efficiently computing similarity between centers in an x-star network. The similarity between attributes is computed in the pre-processing stage as the expected meeting probability over the attribute network, which is extracted from the whole structure of the x-star network. The similarity between centers is computed online from the attribute similarities, based on the intuition that similar centers are linked with similar attributes. NetSim requires less time and space than existing methods, since the attribute network is significantly smaller than the whole x-star network. To support fast online query processing, we develop a pruning algorithm that builds a pruning index and prunes candidate centers that are not promising. Extensive experiments demonstrate the effectiveness and efficiency of our method in comparison with the state-of-the-art measures.

web intelligence | 2012

E-rank: A Structural-Based Similarity Measure in Social Networks

Mingxi Zhang; Zhenying He; Hao Hu; Wei Wang

As social networks (SNs) become ubiquitous and massive, similarity computation among entities becomes more challenging and draws extensive interest from various research fields. SimRank is a well-known similarity measure. However, it considers only meetings between two nodes that walk along paths of equal length, since the path length increases strictly with each iteration of the similarity computation. In addition, it does not differentiate the importance of each link. In this paper, we propose a novel structural similarity measure, E-Rank (Entity Rank), for effectively computing the structural similarity of entities in SNs, based on the intuition that two entities are similar if they can arrive at common entities. E-Rank can be readily applied to social networks for measuring similarities of entities. Extensive experiments demonstrate the effectiveness of E-Rank in comparison with the state-of-the-art measures.

Proceedings of the 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) | 2013

Diversifying Query Suggestions by Using Topics from Wikipedia

Hao Hu; Mingxi Zhang; Zhenying He; Peng Wang; Wei Wang

Diversified query suggestion has emerged recently as a way to recommend queries that are both relevant and diverse. Most existing work diversifies suggestions through query log analysis; however, for structured data, query logs are not always available. To this end, this paper studies the problem of suggesting diverse query terms by using topics from Wikipedia. Wikipedia is a successful online encyclopedia with high coverage of entities and concepts. We first obtain all relevant topics from Wikipedia, and then map each term to these topics. As the mapping is a nontrivial task, we leverage information from both Wikipedia and the structured data to semantically map each term to topics. Finally, we propose a fast algorithm to efficiently generate the suggestions. Extensive evaluations are conducted on a real dataset, and our approach yields promising results.

Expert Systems With Applications | 2015

Efficient link-based similarity search in web networks

Mingxi Zhang; Hao Hu; Zhenying He; Liping Gao; Liujie Sun

The pre-computation cost in the off-line stage is significantly reduced. The efficiency of query processing is optimized by a pruning algorithm. The accuracy loss of the pruning algorithm is controlled by tuning a threshold. The returned results are effective and acceptable. Similarity search in web networks, which aims to find entities similar to a given entity, is one of the core tasks in network analysis. With the proliferation of web applications, including web search and recommendation systems, SimRank has become a well-known measure for evaluating entity similarity in a network. However, existing work computes SimRank iteratively over a huge similarity matrix, which is expensive in time and space and cannot efficiently support similarity search over large networks. In this paper, we propose a link-based similarity search method, WebSim, for efficiently finding similar entities in web networks. WebSim defines the similarity between entities as the 2-hop similarity of SimRank. To reduce computation cost, we divide the similarity search process into two stages: an off-line stage and an on-line stage. In the off-line stage, the 1-hop similarities are computed, and an optimized algorithm is designed to reduce unnecessary accumulation operations on zero similarities. In the on-line stage, the 2-hop similarities are computed, and a pruning algorithm is developed to support fast query processing by searching for similar entries in a partial sums index derived from the 1-hop similarities. Index items that are lower than a given threshold are skipped to reduce the search space. Compared to the iterative SimRank computation, the time and space cost of similarity computation is significantly reduced, since WebSim maintains only the 1-hop similarity matrix, which is much smaller than the multi-hop one. Experiments comparing WebSim with SimRank and its optimized algorithms demonstrate that WebSim achieves on average a 99.83% reduction in the time cost and a 92.12% reduction in the space cost of similarity computation, and achieves on average 99.98% NDCG.
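
To make the "hop" terminology concrete, here is a minimal sketch of the textbook naive SimRank iteration, stopped after a fixed number of rounds. It is not WebSim itself: it has no off-line/on-line split, no partial sums index and no pruning. The toy graph, the function name `simrank`, the decay factor 0.8, and the reading of "2-hop" as "two iteration rounds" are all assumptions made for illustration.

```python
from itertools import product


def simrank(in_neighbors, iterations=2, decay=0.8):
    """Naive SimRank over a dict mapping node -> list of in-neighbors.

    Stops after `iterations` rounds; decay=0.8 is an assumed value.
    """
    nodes = list(in_neighbors)
    # Round 0: a node is only similar to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            in_a, in_b = in_neighbors[a], in_neighbors[b]
            if not in_a or not in_b:
                # Nodes without in-neighbors have no evidence of similarity.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, then decay.
            total = sum(sim[(x, y)] for x in in_a for y in in_b)
            new_sim[(a, b)] = decay * total / (len(in_a) * len(in_b))
        sim = new_sim
    return sim


# Hypothetical toy graph: each entry lists the nodes that link TO the key.
graph = {
    "univ": [],
    "profA": ["univ"],
    "profB": ["univ"],
    "studentA": ["profA"],
    "studentB": ["profB"],
}
scores = simrank(graph, iterations=2)
print(scores[("profA", "profB")], scores[("studentA", "studentB")])
```

In this toy graph the two professors become similar after one round, while the two students only become similar after the second round, because their similarity is inherited from the professors. Under the stated assumption, this illustrates why a 1-hop matrix can be much sparser than a multi-hop one.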

Neurocomputing | 2015

A comprehensive structural-based similarity measure in directed graphs

Mingxi Zhang; Hao Hu; Zhenying He; Liping Gao; Liujie Sun

Computing similarity between two nodes in directed graphs plays an increasingly important role in various research fields, including clustering, collaborative filtering and community mining. Many similarity measures have been proposed in recent years, such as SimRank, PSimRank and SimFusion. However, these measures consider only the expected meeting probability over paths of equal length, which may omit some latent similar nodes. In addition, the link importance of each edge is not distinguished, which may lead to unreasonable rankings when searching for similar nodes. In this paper, we propose a structural-based similarity measure, ESimRank, for effectively computing similarities in directed graphs. We first define effective relationship strength (ERS) to distinguish link importance by utilizing node activity, node attraction and link frequency. We then formalize the ESimRank equation by combining ERS with the expected meeting probabilities over paths of any length. Compared to existing similarity measures, ESimRank can find more latent similar nodes and produce rankings of better quality. To support fast similarity computation, we develop an extended partial sums-based algorithm, which reduces the time complexity significantly. Extensive experiments demonstrate the effectiveness and efficiency of ESimRank in comparison with the state-of-the-art similarity measures.

World Wide Web | 2018

Finding maximal ranges with unique topics in a text database

Zhihui Yang; Huixin Ma; Zhenying He; X. Sean Wang

Recent years have witnessed the rapid growth of text data, and thus the increasing importance of in-depth analysis of text data for various applications. Text data are often organized in a database with documents labeled by attributes like time and location. Different documents manifest different topics. The topics of the documents may change along the attributes of the documents, and such changes have been the subject of research in the past. However, previous analysis techniques, such as topic detection and tracking, topic lifetime, and burstiness, all focus on the topic behavior of the documents in a given attribute range without contrasting them with the documents in the overall range. This paper introduces the concept of unique topics, referring to those topics that appear frequently only within a small range of documents but not in the whole range. These unique topics may reflect some unique characteristics of documents in this small range not found outside of the range. The paper presents an efficient pruning-based algorithm that, for a user-given set of keywords and a user-given attribute, finds the maximal ranges along the given attribute and their unique topics that are highly related to the given keyword set. Thorough experiments show that the algorithm is effective in various scenarios.

database systems for advanced applications | 2016

ListMerge: Accelerating Top-k Aggregation Queries Over Large Number of Lists

Shile Zhang; Chao Sun; Zhenying He

Sorted lists are widely used for feature indexing in a variety of applications, such as multimedia databases and information retrieval. Answering top-k aggregation queries on a set of lists plays an increasingly important role in these domains. Unfortunately, the existing solutions, such as TA-style (threshold-style) algorithms, do not guarantee superior performance on a large number of lists. In this paper, we introduce a merge-based strategy, called ListMerge, to accelerate TA-style algorithms. ListMerge exploits a critical observation about TA-style algorithms: if the aggregation functions are monotone and distributive, it is much more efficient to merge several lists together and then apply a TA-style algorithm. This observation also inspires the development of our cost model, which can evaluate the best number of merged lists. Experimental results show that ListMerge can outperform the baseline algorithms by up to 4-20 times on synthetic datasets generated by various distributions.
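
The observation above can be illustrated with a small sketch: a basic threshold algorithm (TA) for sum aggregation, plus a helper that merges groups of lists by summing scores before TA is applied. This is only an illustration under stated assumptions, not the ListMerge implementation or its cost model. The names `threshold_algorithm` and `merge_lists`, the fixed `group_size`, the toy data, and the conventions that scores are nonnegative and a missing item scores 0 are all hypothetical.

```python
import heapq


def threshold_algorithm(lists, k):
    """Top-k items by summed score over a set of lists (dict: item -> score).

    Assumes nonnegative scores; an item missing from a list scores 0 there.
    """
    sorted_lists = [sorted(l.items(), key=lambda kv: -kv[1]) for l in lists]
    seen = {}
    max_depth = max(len(s) for s in sorted_lists)
    for depth in range(max_depth):
        threshold = 0
        for s in sorted_lists:
            if depth < len(s):
                item, score = s[depth]  # sorted access
                threshold += score
                if item not in seen:
                    # Random access to every list for the full aggregate.
                    seen[item] = sum(l.get(item, 0) for l in lists)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        # Stop once the k-th best seen score reaches the threshold:
        # no unseen item can have a larger aggregate.
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


def merge_lists(lists, group_size):
    """Merge consecutive groups of lists by summing scores per item.

    Valid here because sum is monotone and distributive.
    """
    merged = []
    for i in range(0, len(lists), group_size):
        combined = {}
        for l in lists[i:i + group_size]:
            for item, score in l.items():
                combined[item] = combined.get(item, 0) + score
        merged.append(combined)
    return merged


# Hypothetical toy data: four score lists over items a-d.
lists = [
    {"a": 9, "b": 7, "c": 1},
    {"a": 8, "b": 2, "c": 6},
    {"a": 1, "b": 9, "d": 5},
    {"b": 3, "c": 4, "d": 8},
]
direct = threshold_algorithm(lists, k=2)
via_merge = threshold_algorithm(merge_lists(lists, group_size=2), k=2)
print(direct, via_merge)
```

Both calls return the same top-2 items, but the second runs TA over two lists instead of four. How many lists to merge is exactly the question the paper's cost model addresses; the fixed group size here is a placeholder.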

database systems for advanced applications | 2018

iExplore: Accelerating Exploratory Data Analysis by Predicting User Intention

Zhihui Yang; Jiyang Gong; Chaoying Liu; Yinan Jing; Zhenying He; Kai Zhang; X. Sean Wang

Exploratory data analysis over large datasets has become an increasingly prevalent use case. However, users are easily overwhelmed by the data and might take a long time to find interesting facts. In this paper, we design a system called iExplore to assist users in this time-consuming data exploration task by predicting user intention. Moreover, we propose an intention model to give the iExplore system a comprehensive understanding of the user's intention. Thus, the exploratory process can be accelerated by intention-driven recommendation and prefetching mechanisms. Extensive experiments demonstrate that the intention-driven iExplore system can significantly lighten the burden on users and facilitate the exploratory process.

database systems for advanced applications | 2018

Online Subset Topic Modeling for Interactive Documents Exploration

Linwei Li; Yaobo Wu; Yixiong Ke; Chaoying Liu; Yinan Jing; Zhenying He; Xiaoyang Sean Wang

Data exploration over text databases is an important problem. In an exploration scenario, users look for something useful without knowing in advance exactly what they are looking for, until they find it. Therefore, labor-intensive effort is often required, since users have to review the overview (or detail) results of ad-hoc queries and adjust the queries (e.g., zoom or filter) continuously. Probabilistic topic models are often adopted to provide the overview for a given text collection, since they can discover the underlying thematic structures of unstructured text data. However, training a topic model for a selected document collection is time-consuming. Moreover, continuous query adjustment leads to frequent model retraining, which wastes a large amount of time and is therefore unsuitable for online exploration. To remedy this problem, this paper presents STMS, an algorithm for efficiently constructing topic structures in document subsets. STMS accelerates the process of subset modeling by leveraging global precomputation and applying an efficient sampling-based inference algorithm. Experiments on real-world datasets show that STMS achieves speed-ups of orders of magnitude over a standard topic model, while remaining comparable in terms of modeling quality.

database systems for advanced applications | 2017

An Adaptive Data Partitioning Scheme for Accelerating Exploratory Spark SQL Queries

Chenghao Guo; Zhigang Wu; Zhenying He; X. Sean Wang

For data analysis, it is useful to explore the data set with a sequence of queries, frequently using the results from the previous queries to shape the next queries. Thus, data used in the previous queries are often reused, at least in part, in the next queries. This fact may be used to accelerate queries with data partitioning, a widely used technique that enables skipping irrelevant data for better I/O performance. To obtain effective partitions that are likely to cover the future query workload, we propose an adaptive partitioning scheme that combines data-driven and user-driven metrics to guide data partitioning, together with a heuristic model that uses a metric plugin system to support different exploratory patterns. For partition storage and management, we propose an effective partition index structure for quickly finding appropriate partitions to answer queries. The system helps improve the performance of exploratory queries.
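
The general idea of skipping irrelevant data through partitioning can be sketched as follows. This is a generic min/max partition-pruning example, not the paper's adaptive scheme, metrics, or partition index, and it does not use Spark. The `Partition` class, its fields, and the toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """A hypothetical partition with min/max metadata on one column."""
    name: str
    min_value: int
    max_value: int
    rows: list


def partitions_to_scan(partitions, low, high):
    """Keep only partitions whose [min, max] range overlaps [low, high]."""
    return [p for p in partitions if p.max_value >= low and p.min_value <= high]


def range_query(partitions, low, high):
    """Answer low <= value <= high while skipping non-overlapping partitions."""
    hits = []
    for p in partitions_to_scan(partitions, low, high):
        hits.extend(v for v in p.rows if low <= v <= high)
    return hits


# Hypothetical toy data: three partitions split on value ranges.
parts = [
    Partition("p0", 0, 9, [1, 5, 9]),
    Partition("p1", 10, 19, [10, 15]),
    Partition("p2", 20, 29, [22, 28]),
]
scanned = [p.name for p in partitions_to_scan(parts, 8, 16)]
result = range_query(parts, 8, 16)
print(scanned, result)
```

Here the query over the range 8 to 16 reads only `p0` and `p1`; `p2` is skipped from its metadata alone. An adaptive scheme would additionally choose partition boundaries so that future exploratory queries touch as few partitions as possible.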
