Weihong Han

National University of Defense Technology

Publication

Featured research published by Weihong Han.

International Symposium on Intelligence Computation and Applications | 2007

Instant message clustering based on extended vector space model

Le Wang; Yan Jia; Weihong Han

Instant communication technologies such as Instant Messaging (IM) are widely used. For such large-scale mass-communication media, clustering the text content of instant messages is a practical way to analyze its characteristics and to discover or track hot social topics. However, an instant message usually contains few keywords, some of which may be only latent, and a single message cannot describe the conversational context. This differs sharply from general documents and makes common clustering algorithms unsuitable. We propose a novel method, WR-KMeans, which combines related instant messages into a conversation and enriches each conversation's vector with words that do not appear in the conversation but are closely related to words that do. WR-KMeans then performs k-means-style clustering on this extended vector space of conversations. Experiments on public datasets show that WR-KMeans outperforms the traditional k-means and bisecting k-means algorithms.
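A minimal sketch of the enrich-then-cluster idea, under stated assumptions: messages are already grouped into conversations (that step is not shown), word relatedness is approximated by how many conversations two words share, and an absent word's weight is `alpha` times its mean co-occurrence with the conversation's own words. The weighting, the farthest-first initialization, and every function name here are hypothetical illustrations, not the authors' exact formulation.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence(conversations):
    """Count, for each ordered word pair, how many conversations contain both."""
    co = Counter()
    for conv in conversations:
        for a, b in combinations(sorted(set(conv)), 2):
            co[(a, b)] += 1
            co[(b, a)] += 1
    return co


def enriched_vector(conv, co, vocab, alpha=0.5):
    """Unit-length vector: term frequencies plus weights for related absent words.

    Hypothetical weighting: an absent word w gets alpha * (mean co-occurrence of
    w with the conversation's own words). conv must be non-empty.
    """
    tf = Counter(conv)
    vec = []
    for w in vocab:
        if w in tf:
            vec.append(float(tf[w]))
        else:
            vec.append(alpha * sum(co[(w, u)] for u in tf) / len(tf))
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]  # unit length, so Euclidean distance tracks cosine


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-first initialization."""
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors, key=lambda v: min(dist2(v, c) for c in centers)))
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(v, centers[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def wr_kmeans_sketch(conversations, k, alpha=0.5):
    """Hypothetical end-to-end sketch: enrich conversation vectors, then k-means."""
    vocab = sorted({w for conv in conversations for w in conv})
    co = cooccurrence(conversations)
    vectors = [enriched_vector(conv, co, vocab, alpha) for conv in conversations]
    return kmeans(vectors, k)


if __name__ == "__main__":
    convs = [
        ["football", "goal", "match"], ["goal", "match", "team"], ["football", "team"],
        ["stock", "market", "price"], ["price", "market", "trade"], ["stock", "trade"],
    ]
    print(wr_kmeans_sketch(convs, k=2))  # [0, 0, 0, 1, 1, 1]
```

In the toy run, the short conversation `["football", "team"]` gains weight on `goal` and `match` through enrichment, which is the effect the abstract describes for keyword-poor messages.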

Computer Engineering and Science | 2007

A Hybrid Algorithm for Web Document Clustering Based on Frequent Term Sets and k-Means

Le Wang; Li Tian; Yan Jia; Weihong Han

To address the major challenges of current web document clustering, namely the huge volume of documents, high-dimensional processing, and the understandability of clusters, we propose a simple hybrid algorithm (SHDC) based on top-k frequent term sets and k-means. The top-k frequent term sets are used to produce k initial means, which serve as initial clusters and are further refined by k-means. k-means returns the final optimal clustering, while the k frequent term sets provide an understandable description of each cluster. Experimental results on two public datasets indicate that SHDC outperforms two other representative clustering algorithms (farthest-first k-means and k-means with random initialization) in both efficiency and effectiveness.
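A small sketch of the pipeline the abstract outlines: mine frequent term sets, take k of them as seeds, use the centroid of the documents containing each set as an initial mean, refine with k-means, and keep the term sets as cluster descriptions. Brute-force mining of sets up to size 2, ranking by support, and the rule that skips a term set whose documents overlap an already chosen one are assumptions made for this illustration; the paper's actual top-k selection may differ.

```python
from collections import Counter
from itertools import combinations


def frequent_term_sets(docs, min_support=2, max_size=2):
    """Brute-force support counts for term sets of size 1..max_size."""
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc))
        for size in range(1, max_size + 1):
            counts.update(combinations(terms, size))
    return {ts: c for ts, c in counts.items() if c >= min_support}


def top_k_seeds(docs, k, min_support=2):
    """Rank term sets by support (then size, then name) and keep k of them.

    Hypothetical rule: skip a set whose documents overlap an already chosen set's,
    so the k seeds start from different document groups.
    """
    freq = frequent_term_sets(docs, min_support)
    chosen, covered = [], set()
    for ts in sorted(freq, key=lambda t: (-freq[t], -len(t), t)):
        ids = {i for i, doc in enumerate(docs) if set(ts) <= set(doc)}
        if ids & covered:
            continue
        chosen.append((ts, ids))
        covered |= ids
        if len(chosen) == k:
            break
    return chosen


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def shdc_sketch(docs, k, iters=20):
    """Seed k-means with frequent-term-set centroids; return labels and descriptions."""
    seeds = top_k_seeds(docs, k)
    if len(seeds) < k:
        raise ValueError("not enough disjoint frequent term sets for k seeds")
    vocab = sorted({t for doc in docs for t in doc})
    vecs = [[float(c[t]) for t in vocab] for c in map(Counter, docs)]
    centers = [mean([vecs[i] for i in sorted(ids)]) for _, ids in seeds]
    labels = [0] * len(vecs)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(v, centers[j])) for v in vecs]
        for j in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == j]
            if members:
                centers[j] = mean(members)
    return labels, [" ".join(ts) for ts, _ in seeds]


if __name__ == "__main__":
    docs = [
        ["apple", "banana", "fruit"], ["banana", "fruit", "juice"], ["fruit", "apple"],
        ["python", "code", "bug"], ["code", "bug", "test"], ["python", "code"],
    ]
    print(shdc_sketch(docs, k=2))  # ([1, 1, 1, 0, 0, 0], ['code', 'fruit'])
```

Seeding from term sets makes the initial means deterministic, and the seed term sets double as cluster labels, which mirrors the understandability argument in the abstract.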

International Conference on Trustworthy Computing and Services | 2012

A Multi-phase k-anonymity Algorithm Based on Clustering Techniques

Fei Liu; Yan Jia; Weihong Han

We propose a new k-anonymity algorithm for publishing datasets with privacy protection. We improve clustering techniques to reduce data distortion and enhance the diversity of sensitive attribute values. The algorithm has four phases. In phase one, tuples are distributed into several groups so that all tuples in a group share the same sensitive value. In phase two, groups smaller than a threshold are merged and then partitioned into several clusters according to their quasi-identifier attributes; each cluster becomes an equivalence class. In phase three, the remaining tuples are distributed evenly among the clusters to satisfy L-diversity. Finally, the quasi-identifier attribute values in each cluster are generalized to satisfy k-anonymity. We used the OCC dataset to compare our algorithm with a classic clustering-based method. Empirical results show that our algorithm can publish datasets with high security and limited information loss.
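The abstract does not give enough detail to reproduce phases one to three, so the sketch below covers only the final step under simple assumptions: quasi-identifiers are numeric tuples, clusters are already formed, and each value is generalized to its cluster's (min, max) range. It also includes checks for k-anonymity and for distinct l-diversity, one common reading of L-diversity. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def generalize(clusters):
    """Phase-four sketch: replace each numeric quasi-identifier by its cluster's
    (min, max) range; sensitive values are kept unchanged.

    clusters: list of clusters, each a list of (qi_tuple, sensitive_value).
    """
    table = []
    for cluster in clusters:
        columns = zip(*(qi for qi, _ in cluster))
        ranges = tuple((min(col), max(col)) for col in columns)
        table.extend((ranges, s) for _, s in cluster)
    return table


def is_k_anonymous(table, k):
    """Every generalized quasi-identifier combination occurs at least k times."""
    return all(n >= k for n in Counter(qi for qi, _ in table).values())


def is_l_diverse(table, ell):
    """Distinct l-diversity: each equivalence class has >= ell sensitive values."""
    classes = defaultdict(set)
    for qi, s in table:
        classes[qi].add(s)
    return all(len(values) >= ell for values in classes.values())


if __name__ == "__main__":
    # Hypothetical records: (age, zip code) quasi-identifiers and a diagnosis.
    clusters = [
        [((25, 47601), "flu"), ((27, 47602), "cold"), ((29, 47605), "flu")],
        [((41, 47901), "cancer"), ((45, 47903), "flu"), ((48, 47909), "cold")],
    ]
    table = generalize(clusters)
    print(table[0])  # (((25, 29), (47601, 47605)), 'flu')
    print(is_k_anonymous(table, 3), is_l_diverse(table, 2), is_l_diverse(table, 3))
```

Smaller, tighter clusters keep the generalized ranges narrow, which is why clustering quality drives the information loss the abstract mentions.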

Journal of Computational Science | 2018

Repost prediction incorporating time-sensitive mutual influence in social networks

Yong Quan; Yan Jia; Bin Zhou; Weihong Han; Shudong Li

Nowadays, information can spread rapidly over social networks through the relationships and interactions among people. To reveal the intricate mechanism underlying information propagation, the problem of repost behavior prediction has recently drawn extensive attention. In this paper, we propose a novel method to measure time-sensitive mutual influence based on users' temporal behavior patterns, and we develop an efficient algorithm to calculate it via a discretization method. To predict repost behavior more accurately, we introduce two further features, user interests and information content, and design an effective measurement for each to capture its predictive power. We then combine time-sensitive mutual influence with these two features in a feature-based method and evaluate its performance on repost prediction. Extensive experiments on a real large-scale microblogging dataset demonstrate that our method achieves better performance than several baseline methods.
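One way to picture a discretized, time-sensitive influence feature is sketched below. It is an illustration under assumptions, not the paper's formula: the day is split into equal time bins, the influence of an author on a follower in a bin is the smoothed fraction of the author's posts in that bin that the follower reposted, and the influence, interest, and content scores are combined with hand-picked logistic weights that a real system would learn from data. Only one direction (author on follower) is modeled; a mutual measure would also account for the reverse direction.

```python
import math
from collections import defaultdict


def time_bin(hour, bins_per_day=4):
    """Discretize an hour of day (0-23) into one of bins_per_day equal bins."""
    return int(hour) * bins_per_day // 24


def influence_table(history, bins_per_day=4, smoothing=1.0):
    """Smoothed estimate of P(follower reposts | author posts in time bin b).

    history: iterable of (author, follower, post_hour, reposted) observations.
    Returns {(author, follower, bin): probability}.
    """
    seen = defaultdict(float)
    hits = defaultdict(float)
    for author, follower, hour, reposted in history:
        key = (author, follower, time_bin(hour, bins_per_day))
        seen[key] += 1.0
        hits[key] += 1.0 if reposted else 0.0
    return {key: (hits[key] + smoothing) / (seen[key] + 2 * smoothing) for key in seen}


def repost_probability(influence, interest, content,
                       weights=(3.0, 2.0, 1.0), bias=-3.0):
    """Logistic combination of three features in [0, 1] (hypothetical weights)."""
    z = bias + weights[0] * influence + weights[1] * interest + weights[2] * content
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    history = [
        ("alice", "bob", 9, True), ("alice", "bob", 10, True),
        ("alice", "bob", 22, False), ("alice", "bob", 23, False),
    ]
    table = influence_table(history)
    morning = table[("alice", "bob", time_bin(9))]  # (2 + 1) / (2 + 2) = 0.75
    night = table[("alice", "bob", time_bin(22))]   # (0 + 1) / (2 + 2) = 0.25
    print(repost_probability(morning, 0.5, 0.5) > repost_probability(night, 0.5, 0.5))
```

For an author-follower pair with no history in a bin, `table.get(key, 0.5)` returns the same value the smoothing formula gives for zero observations.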

Asia-Pacific Web Conference | 2014

A Top K Relative Outlier Detection Algorithm in Uncertain Datasets

Fei Liu; Hong Yin; Weihong Han

Focusing on outlier detection in uncertain datasets, we combine distance-based outlier detection techniques with classic uncertainty models, considering both the variability of data values and the incompleteness of their probability distributions. All data objects in an uncertain dataset are described using the x-tuple model with their respective probabilities. We find that outliers in uncertain datasets are probabilistic, because a data object's neighbors differ across possible worlds. Based on the possible world and x-tuple models, we propose a new definition of top K relative outliers and the RPOS algorithm, which compares all data objects with each other to find the most probable outliers. Two pruning strategies are used to improve efficiency, and we also construct auxiliary data structures for acceleration. We evaluate our approach on both synthetic and real datasets. Experimental results demonstrate that our method detects outliers more effectively than existing algorithms in uncertain environments and is also more efficient.
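A brute-force sketch of probabilistic distance-based outliers under the x-tuple model (Python 3.8+ for `math.prod` and `math.dist`). Each x-tuple is a list of mutually exclusive (point, probability) alternatives; if the probabilities sum to less than 1, the tuple may be absent. The code enumerates every possible world and computes, for each x-tuple, the probability that it exists and has fewer than `min_neighbors` other points within `radius`, then returns the K most probable outliers. This ranking is a simplified stand-in for the paper's top K relative outlier definition, and it omits the RPOS pruning strategies and data structures, so it is exponential in the number of x-tuples and only suitable for tiny inputs.

```python
import math
from itertools import product


def possible_worlds(xtuples):
    """Yield (points, probability) for every possible world.

    Each x-tuple picks one alternative, or is absent (None) when its
    alternatives' probabilities sum to less than 1.
    """
    options = []
    for alts in xtuples:
        opts = list(alts)
        missing = 1.0 - sum(p for _, p in alts)
        if missing > 1e-12:
            opts.append((None, missing))
        options.append(opts)
    for combo in product(*options):
        yield [point for point, _ in combo], math.prod(p for _, p in combo)


def outlier_probabilities(xtuples, radius, min_neighbors):
    """P(x-tuple i exists and has < min_neighbors other points within radius)."""
    probs = [0.0] * len(xtuples)
    for points, prob in possible_worlds(xtuples):
        for i, p in enumerate(points):
            if p is None:
                continue
            neighbors = sum(
                1 for j, q in enumerate(points)
                if j != i and q is not None and math.dist(p, q) <= radius
            )
            if neighbors < min_neighbors:
                probs[i] += prob
    return probs


def top_k_outliers(xtuples, k, radius, min_neighbors):
    """Indices and probabilities of the k most probable outliers."""
    probs = outlier_probabilities(xtuples, radius, min_neighbors)
    order = sorted(range(len(xtuples)), key=lambda i: -probs[i])
    return [(i, probs[i]) for i in order[:k]]


if __name__ == "__main__":
    xtuples = [
        [((0.0, 0.0), 1.0)],
        [((0.5, 0.0), 1.0)],
        [((0.0, 0.5), 0.6), ((10.0, 10.0), 0.4)],  # sometimes near, sometimes far
        [((9.0, 9.0), 0.5)],                        # present with probability 0.5
    ]
    result = top_k_outliers(xtuples, k=2, radius=1.0, min_neighbors=1)
    print([(i, round(p, 3)) for i, p in result])  # [(3, 0.5), (2, 0.4)]
```

The third x-tuple shows why outliers are probabilistic here: it has a neighbor in the worlds where it sits at (0.0, 0.5) and none in the worlds where it sits at (10.0, 10.0).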

Asia-Pacific Web Conference | 2014

The Research of Partitioning Model Based on Virtual Identity Data

Lu Deng; Weihong Han; Aiping Li; Yong Quan

With the rapid development of the Internet, data and information based on virtual identities are growing quickly. Much work on data storage focuses on storing data in a distributed environment, which raises a new issue: how to decide efficiently where data should be placed. Virtual identity data have their own characteristic patterns. In this paper, we propose a partitioning model based on virtual identity data, using the Cassandra database as an example to describe the model. Experiments test the feasibility of the model, and the results show that it can effectively reduce retrieval time.
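The abstract does not describe the model itself. As background, Cassandra places each row by hashing its partition key to a token on a ring of nodes, so the choice of partition key decides which node stores the data. The sketch below is a generic consistent-hashing token ring in that spirit, assuming (hypothetically) that the virtual identity is used as the partition key, so all records of one identity land on one node and a lookup touches a single node. It uses MD5 from `hashlib` as a stand-in hash (Cassandra's default partitioner is Murmur3) and omits replication; `TokenRing` and `place` are illustrative names.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """64-bit token from MD5 (stand-in for Cassandra's Murmur3 partitioner)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    """Nodes own several ring positions (virtual nodes) for smoother balance."""

    def __init__(self, nodes, vnodes=16):
        self._ring = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, partition_key: str) -> str:
        """Owner = first ring position at or after the key's token (wrapping)."""
        i = bisect.bisect_left(self._tokens, token(partition_key)) % len(self._ring)
        return self._ring[i][1]


def place(records, ring, key_fn):
    """Group records by the node that owns their partition key."""
    placement = {}
    for rec in records:
        placement.setdefault(ring.node_for(key_fn(rec)), []).append(rec)
    return placement


if __name__ == "__main__":
    ring = TokenRing(["node-a", "node-b", "node-c"])
    records = [{"identity": f"user{i % 5}", "seq": i} for i in range(20)]
    placement = place(records, ring, key_fn=lambda r: r["identity"])
    for node, recs in sorted(placement.items()):
        print(node, sorted({r["identity"] for r in recs}))
```

Keying by identity trades some load balance (a very active identity concentrates on one node) for single-node retrieval of that identity's data.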

Asia-Pacific Web Conference | 2007