Weihong Han
National University of Defense Technology
Publication
Featured research published by Weihong Han.
International Symposium on Intelligence Computation and Applications | 2007
Le Wang; Yan Jia; Weihong Han
Instant communication technologies such as Instant Messaging (IM) are now widely used. For such large-scale mass-communication media, clustering text content is a practical way to analyze the characteristics of instant messages and to find or track hot social topics. However, a single instant message usually contains few keywords, which may even be latent, and a single message cannot describe the conversational context. This differs markedly from general documents and makes common clustering algorithms unsuitable. A novel method called WR-KMeans is proposed, which combines related instant messages into a conversation and enriches each conversation's vector with words that do not appear in the conversation but are closely related to words that do. WR-KMeans then clusters in this extended conversation vector space in the same way as k-means. Experiments on public datasets show that WR-KMeans outperforms the traditional k-means and bisecting k-means algorithms.
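The enrichment idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' WR-KMeans implementation: the abstract does not say how word relatedness is measured, so the Jaccard co-occurrence score, the `threshold` and `damping` parameters, and all function names below are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations
import math


def cooccurrence_relatedness(conversations):
    """Assumed relatedness measure: Jaccard overlap of the conversations
    in which two words appear (the source does not specify one)."""
    doc_freq = Counter()
    pair_freq = Counter()
    for words in conversations:
        vocab = sorted(set(words))
        doc_freq.update(vocab)
        pair_freq.update(combinations(vocab, 2))
    related = {}
    for (a, b), n_ab in pair_freq.items():
        score = n_ab / (doc_freq[a] + doc_freq[b] - n_ab)
        related.setdefault(a, {})[b] = score
        related.setdefault(b, {})[a] = score
    return related


def enrich(words, related, threshold=0.3, damping=0.5):
    """Add words that are absent from the conversation but related to
    words it contains, with a damped weight (hypothetical weighting)."""
    base = {w: float(c) for w, c in Counter(words).items()}
    enriched = dict(base)
    for w, count in base.items():
        for other, score in related.get(w, {}).items():
            if other not in base and score >= threshold:
                weight = damping * score * count
                enriched[other] = max(enriched.get(other, 0.0), weight)
    return enriched


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


corpus = [["goal", "match"], ["goal", "match"], ["goal"], ["match"]]
related = cooccurrence_relatedness(corpus)
a = enrich(["goal"], related)
b = enrich(["match"], related)
print(cosine({"goal": 1.0}, {"match": 1.0}), cosine(a, b))
```

In this toy corpus, the two one-word conversations share no terms, so their raw similarity is zero; after enrichment they overlap and a k-means style algorithm could place them in the same cluster.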
Computer Engineering and Science | 2007
Le Wang; Li Tian; Yan Jia; Weihong Han
To address the major challenges of current web document clustering, namely the huge volume of documents, high-dimensional processing, and the understandability of clusters, we propose a simple hybrid algorithm (SHDC) based on top-k frequent term sets and k-means. The top-k frequent term sets are used to produce k initial means, which are regarded as initial clusters and further refined by k-means. The final clustering is returned by k-means, while an understandable description of the clusters is provided by the k frequent term sets. Experimental results on two public datasets indicate that SHDC outperforms two other representative clustering algorithms (farthest-first k-means and randomly initialized k-means) in both efficiency and effectiveness.
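A rough sketch of the seeding idea described above follows. It simplifies heavily and should not be read as the published SHDC algorithm: it assumes frequent term sets of size one (single terms ranked by document frequency), uses raw term counts as vectors, and every function name is hypothetical.

```python
from collections import Counter


def top_k_terms(docs, k):
    """Top-k terms by document frequency (size-1 term sets, an assumption)."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def centroid(vectors):
    """Mean of sparse term-count vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: count / len(vectors) for term, count in total.items()}


def sq_dist(u, v):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in set(u) | set(v))


def shdc_sketch(docs, k, iterations=10):
    vectors = [Counter(doc) for doc in docs]
    seeds = top_k_terms(docs, k)
    # Each frequent term defines an initial cluster: the documents containing it.
    means = [centroid([v for v in vectors if term in v]) for term in seeds]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(len(means)), key=lambda j: sq_dist(vec, means[j]))
            for vec in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so k-means has converged
        labels = new_labels
        for j in range(len(means)):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                means[j] = centroid(members)
    return seeds, labels


docs = [
    ["python", "code"], ["python", "script"], ["python", "bug"],
    ["soccer", "goal"], ["soccer", "match"],
]
print(shdc_sketch(docs, k=2))
```

The seed terms double as human-readable cluster descriptions, which is the understandability benefit the abstract attributes to the frequent term sets.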
International Conference on Trustworthy Computing and Services | 2012
Fei Liu; Yan Jia; Weihong Han
We propose a new k-anonymity algorithm for publishing datasets with privacy protection. We improve clustering techniques to lower data distortion and enhance the diversity of sensitive attribute values. The algorithm has four phases. In phase one, tuples are distributed into several groups such that all tuples in a group share the same sensitive value. In phase two, groups smaller than a threshold are merged and then partitioned into several clusters according to their quasi-identifier attributes; each cluster becomes an equivalence class. In phase three, the remaining tuples are distributed evenly among the clusters to satisfy L-diversity. Finally, the quasi-identifier attribute values in each cluster are generalized to satisfy k-anonymity. We used the OCC dataset to compare our algorithm with a classic clustering-based method. Empirical results show that our algorithm can publish datasets with high security and limited information loss.
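The final generalization step and the two privacy conditions named above can be sketched as follows. This covers only that last phase plus a check, not the four-phase algorithm itself; the clusters are supplied by hand, generalization to (min, max) ranges is an assumed scheme for numeric attributes, and the record fields and function names are invented for the example.

```python
from collections import defaultdict


def generalize(cluster, quasi_identifiers):
    """Replace each quasi-identifier value with the cluster-wide (min, max)
    range, so all records in the cluster become indistinguishable on them."""
    ranges = {
        attr: (min(r[attr] for r in cluster), max(r[attr] for r in cluster))
        for attr in quasi_identifiers
    }
    return [{**record, **ranges} for record in cluster]


def equivalence_classes(records, quasi_identifiers):
    """Group records that share identical quasi-identifier values."""
    classes = defaultdict(list)
    for record in records:
        classes[tuple(record[a] for a in quasi_identifiers)].append(record)
    return list(classes.values())


def satisfies(records, quasi_identifiers, sensitive, k, l):
    """True if every equivalence class has at least k records (k-anonymity)
    and at least l distinct sensitive values (distinct l-diversity)."""
    return all(
        len(cls) >= k and len({r[sensitive] for r in cls}) >= l
        for cls in equivalence_classes(records, quasi_identifiers)
    )


qi = ["age", "zip"]
clusters = [
    [{"age": 23, "zip": 10001, "disease": "flu"},
     {"age": 27, "zip": 10003, "disease": "cold"}],
    [{"age": 41, "zip": 20010, "disease": "asthma"},
     {"age": 45, "zip": 20018, "disease": "flu"}],
]
raw = [record for cluster in clusters for record in cluster]
published = [record for cluster in clusters for record in generalize(cluster, qi)]
print(satisfies(raw, qi, "disease", k=2, l=2))        # raw records are unique
print(satisfies(published, qi, "disease", k=2, l=2))  # generalized clusters
```

Wider ranges mean more information loss, which is why the way tuples are clustered before generalization matters.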
Journal of Computational Science | 2018
Yong Quan; Yan Jia; Bin Zhou; Weihong Han; Shudong Li
Information can spread rapidly over social networks via the relationships and interactions among people. To reveal the intricate mechanism underlying information propagation, the problem of repost behavior prediction has recently drawn extensive attention. In this paper, we propose a novel method to measure time-sensitive mutual influence based on the temporal behavior patterns of users, and we develop an efficient algorithm to calculate it via a discretization method. To predict repost behavior more accurately, we introduce two further features, user interests and information content, and design an effective measure for each to capture its predictive power. We then combine time-sensitive mutual influence with these two features into a feature-based method and evaluate its performance on repost prediction. Extensive experiments on a real large-scale microblogging dataset demonstrate that our method achieves better performance than several baseline methods.
Asia-Pacific Web Conference | 2014
Fei Liu; Hong Yin; Weihong Han
Focusing on outlier detection in uncertain datasets, we combine distance-based outlier detection techniques with classic uncertainty models. Both the variability of data values and the incompleteness of their probability distributions are considered. In our work, all data objects in an uncertain dataset are described using the x-tuple model with their respective probabilities. We find that outliers in uncertain datasets are probabilistic: the neighbors of a data object differ across possible worlds. Based on the possible-world and x-tuple models, we propose a new definition of top-K relative outliers and the RPOS algorithm. In RPOS, all data objects are compared with each other to find the most probable outliers. Two pruning strategies are used to improve efficiency, and we construct additional data structures for acceleration. We evaluate our approach on both synthetic and real datasets. Experimental results demonstrate that our method detects outliers more effectively than existing algorithms in uncertain environments and is also more efficient.
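To make the possible-world view above concrete, the sketch below enumerates every world of a tiny one-dimensional x-tuple dataset and sums the probability of the worlds in which each object is a distance-based outlier. It is a brute-force illustration, not the RPOS algorithm: the abstract does not give the definition of a relative outlier or the pruning strategies, so the outlier rule (fewer than `min_neighbors` objects within `radius`) and all names here are assumptions.

```python
from itertools import product


def outlier_probabilities(xtuples, radius, min_neighbors):
    """xtuples: one list per object of (value, probability) alternatives.
    If an object's probabilities sum to less than 1, the remainder is the
    probability that the object is absent from a world."""
    choices = []
    for alternatives in xtuples:
        missing = 1.0 - sum(p for _, p in alternatives)
        options = list(alternatives)
        if missing > 1e-12:
            options.append((None, missing))  # None marks "absent"
        choices.append(options)

    probs = [0.0] * len(xtuples)
    for world in product(*choices):  # one alternative chosen per object
        world_prob = 1.0
        for _, p in world:
            world_prob *= p
        values = [v for v, _ in world]
        for i, v in enumerate(values):
            if v is None:
                continue  # an absent object cannot be an outlier
            neighbors = sum(
                1 for j, u in enumerate(values)
                if j != i and u is not None and abs(u - v) <= radius
            )
            if neighbors < min_neighbors:
                probs[i] += world_prob
    return probs


def top_k_outliers(xtuples, radius, min_neighbors, k):
    """Indices of the k objects most likely to be outliers."""
    probs = outlier_probabilities(xtuples, radius, min_neighbors)
    return sorted(range(len(probs)), key=lambda i: (-probs[i], i))[:k]


data = [
    [(1.0, 1.0)],              # certain object
    [(1.2, 1.0)],              # certain object
    [(1.1, 0.5), (9.0, 0.5)],  # near the others in one world, far in the other
]
print(outlier_probabilities(data, radius=0.5, min_neighbors=1))
print(top_k_outliers(data, radius=0.5, min_neighbors=1, k=1))
```

The number of possible worlds grows exponentially with the number of uncertain objects, which is why pruning strategies such as those mentioned in the abstract are needed in practice.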
Asia-Pacific Web Conference | 2014
Lu Deng; Weihong Han; Aiping Li; Yong Quan
With the rapid growth of the Internet, the volume of data and information based on virtual identities has grown as well. Much existing work on data storage centers on storing data in a distributed environment. This raises a new issue: how to decide efficiently where data should be placed. Virtual identity data have characteristic patterns of their own. In this paper, a partitioning model based on virtual identity data is proposed, and the Cassandra database is used as an example to describe it. Experiments test the feasibility of the model, and the results show that it can effectively reduce retrieval time.
Asia-Pacific Web Conference | 2007
Le Wang; Li Tian; Yan Jia; Weihong Han
Archive | 2012
Shuang Tan; Li He; Yan Jia; Binxing Fang; Lihua Yin; Weihong Han; Bin Zhou; Hua Fan; Wenmao Liu; Haining Yu; Juan Chen
Archive | 2012
Hua Fan; Li He; Yan Jia; Binxing Fang; Lihua Yin; Weihong Han; Shuqiang Yang; Bin Zhou; Shuang Tan; Jianfeng Zhang; Juan Chen
Archive | 2012
Li He; Shuang Tan; Yan Jia; Binxing Fang; Lihua Yin; Weihong Han; Bin Zhou; Hua Fan; Wenmao Liu; Haining Yu; Juan Chen