Zongda Wu
Wenzhou University
Publication
Featured research published by Zongda Wu.
Knowledge and Information Systems | 2015
Guandong Xu; Zongda Wu; Guiling Li; Enhong Chen
As a prevalent type of Web advertising, contextual advertising refers to the placement of the most relevant commercial ads within the content of a Web page, to provide a better user experience and as a result increase the user’s ad-click rate. However, due to the intrinsic problems of homonymy and polysemy, the low intersection of keywords, and a lack of sufficient semantics, traditional keyword matching techniques are not able to effectively handle contextual matching and retrieve relevant ads for the user, resulting in an unsatisfactory performance in ad selection. In this paper, we introduce a new contextual advertising approach to overcome these problems, which uses Wikipedia thesaurus knowledge to enrich the semantic expression of a target page (or an ad). First, we map each page into a keyword vector, upon which two additional feature vectors, the Wikipedia concept and category vector derived from the Wikipedia thesaurus structure, are then constructed. Second, to determine the relevant ads for a given page, we propose a linear similarity fusion mechanism, which combines the above three feature vectors in a unified manner. Last, we validate our approach using a set of real ads, real pages along with the external Wikipedia thesaurus. The experimental results show that our approach outperforms the conventional contextual advertising matching approaches and can substantially improve the performance of ad selection.
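The linear similarity fusion step described above can be sketched as follows. The sparse-vector representation, the field names (`keywords`, `concepts`, `categories`), and the fusion weights are illustrative assumptions, not the paper's exact formulation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two sparse vectors (dicts: term -> weight).
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def fused_similarity(page, ad, alpha=0.5, beta=0.3, gamma=0.2):
    # Linear fusion of the keyword, Wikipedia-concept and Wikipedia-category
    # similarities between a page and an ad. The weights alpha/beta/gamma
    # are placeholders; in practice they would be tuned empirically.
    return (alpha * cosine(page["keywords"], ad["keywords"])
            + beta * cosine(page["concepts"], ad["concepts"])
            + gamma * cosine(page["categories"], ad["categories"]))
```

With identical feature vectors the fused score reaches the sum of the weights (here 1.0), and with fully disjoint vectors it drops to 0, so ads can be ranked for a page by this single score.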
Knowledge Based Systems | 2013
Guiling Li; Olli Bräysy; Liangxiao Jiang; Zongda Wu; Yuanzhen Wang
The problem of finding time series discords has attracted much attention recently due to its numerous applications, and several algorithms have been suggested. However, most of them suffer from high computation cost and cannot meet the requirements of real applications. In this paper, we propose a novel discord discovery algorithm, BitClusterDiscord, which is based on bit representation clustering. First, we use PAA (Piecewise Aggregate Approximation) bit serialization to segment time series, so as to capture the main variation characteristics of time series and avoid the influence of noise. Second, we present an improved K-Medoids clustering algorithm to merge several patterns with similar variation behaviors into a common cluster. Finally, based on bit representation clustering, we design two pruning strategies and propose an effective algorithm for time series discord discovery. Extensive experiments have demonstrated that the proposed approach can not only effectively find discords in time series, but also greatly improve computational efficiency.
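The PAA bit-serialization step can be sketched as follows: average each equal-width segment (PAA), then emit one bit per segment marking whether it lies above the global mean. The segment count and the above-the-mean rule illustrate the general idea only, not the paper's exact parameters.

```python
def paa(series, n_segments):
    # Piecewise Aggregate Approximation: the mean of each equal-width segment.
    n = len(series)
    means = []
    for i in range(n_segments):
        lo = i * n // n_segments
        hi = (i + 1) * n // n_segments
        seg = series[lo:hi]
        means.append(sum(seg) / len(seg))
    return means

def bit_serialize(series, n_segments):
    # One bit per PAA segment: 1 if its mean is above the global mean,
    # else 0. This coarse sketch captures the main variation behaviour
    # while smoothing out point-wise noise.
    mean = sum(series) / len(series)
    return tuple(1 if m > mean else 0 for m in paa(series, n_segments))
```

Series with the same bit pattern can then be grouped by a K-Medoids-style clustering, and whole clusters pruned during discord search.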
The Computer Journal | 2012
Zongda Wu; Guandong Xu; Yanchun Zhang; Peter Dolog; Chenglang Lu
The current boom of the Web is associated with the revenues originating from Web advertising. As one prevalent type of Web advertising, contextual advertising refers to the placement of the most relevant commercial textual ads within the content of a Web page, so as to provide a better user experience and thereby increase the revenues of Web site owners and the advertising platform. Therefore, in contextual advertising, the relevance of the selected ads to a Web page is essential. However, some problems, such as homonymy and polysemy, the low intersection of keywords, and context mismatch, can lead to the selection of irrelevant textual ads for a Web page, meaning that a simple keyword-matching technique generally gives poor accuracy. To overcome these problems and thus improve the relevance of contextual ads, in this paper we propose a novel Wikipedia-based matching technique which, using selective matching strategies, selects a certain number of relevant articles from Wikipedia as an intermediate semantic reference model for matching Web pages and textual ads. We call this technique SIWI: Selective Wikipedia Matching. Instead of using all Wikipedia articles, it matches only the most relevant articles for a page (or a textual ad), resulting in an effective improvement of overall matching performance. An experimental evaluation is conducted over a set of real textual ads, a set of Web pages from the Internet, and a dataset of more than 260,000 articles from Wikipedia. The experimental results show that our method performs better than existing matching strategies, can handle the matching over the large dataset of Wikipedia articles efficiently, and achieves a satisfactory contextual advertising effect.
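The selective-matching idea can be sketched as follows: score every Wikipedia article against a document, keep only the top-k, and represent the document in that reduced reference space. The set-overlap scoring and the article data shown here are hypothetical simplifications of the paper's method.

```python
def siwi_vector(doc_terms, articles, k=3):
    # Selective Wikipedia matching sketch: rank articles (dicts:
    # title -> set of terms) by overlap with the document's terms,
    # keep the k most relevant, and represent the document by its
    # overlap with just those articles. Pages and ads compared in this
    # reduced space avoid scanning the whole Wikipedia collection.
    scored = sorted(articles.items(),
                    key=lambda kv: -len(doc_terms & kv[1]))[:k]
    return {title: len(doc_terms & terms) for title, terms in scored}
```

A page and an ad are then considered a good match when their top-k reference vectors agree on the same articles.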
Conference on Information and Knowledge Management | 2011
Zongda Wu; Guandong Xu; Rong Pan; Yanchun Zhang; Zhiwen Hu; Jianfeng Lu
As a prevalent type of Web advertising, contextual advertising refers to the placement of the most relevant ads into a Web page, so as to increase the number of ad-clicks. However, problems such as homonymy and polysemy, the low intersection of keywords, etc., can lead to the selection of irrelevant ads for a page. In this paper, we present a new contextual advertising approach to overcome these problems, which uses Wikipedia concept and category information to enrich the content representation of an ad (or a page). First, we map each ad and page into a keyword vector, a concept vector and a category vector. Next, we select the relevant ads for a given page based on a similarity metric that combines the above three feature vectors. Last, we evaluate our approach using real ads and pages, as well as a great number of concepts and categories from Wikipedia. Experimental results show that our approach can effectively improve the precision of ad selection.
Information Sciences | 2015
Zongda Wu; Jie Shi; Chenglang Lu; Enhong Chen; Guandong Xu; Guiling Li; Sihong Xie; Philip S. Yu
Users of web search engines are increasingly worried that their query activities may expose what topics they are interested in and, in turn, compromise their privacy. It would be desirable for a search engine to protect the true query intention of users without compromising precision-recall performance. In this paper, we propose a client-based approach to address this problem. The basic idea is to issue plausible but innocuous pseudo queries together with a user query, so as to mask the user intention. First, we present a privacy model which formulates plausibility and innocuousness, and then the requirements which should be satisfied to ensure that the user intention is effectively protected against a search engine. Second, based on a semantic reference space derived from Wikipedia, we propose an approach to construct a group of pseudo queries that exhibit a characteristic distribution similar to that of a given user query, but point to irrelevant topics, so as to meet the security requirements defined by the privacy model. Finally, we conduct extensive experimental evaluations to demonstrate the practicality and effectiveness of our approach.
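The client-side masking step can be sketched as follows. The hard part, constructing pseudo queries with a similar characteristic distribution from the Wikipedia-derived reference space, is assumed to have happened upstream; this sketch only shows how the genuine query is hidden inside the submitted batch.

```python
import random

def masked_batch(user_query, pseudo_queries, rng=None):
    # Issue the genuine query together with plausible-but-innocuous
    # pseudo queries in shuffled order, so an observing search engine
    # cannot single out which query carries the true intention.
    rng = rng or random.Random()
    batch = [user_query] + list(pseudo_queries)
    rng.shuffle(batch)
    return batch
```

The client then discards the results of the pseudo queries locally, so precision and recall for the genuine query are unaffected.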
Expert Systems With Applications | 2014
Guiling Li; Zhihua Cai; Xiaojun Kang; Zongda Wu; Yuanzhen Wang
Streaming time series segmentation is one of the major problems in streaming time series mining; it creates a high-level representation of streaming time series and thus provides important support for many time series mining tasks, such as indexing, clustering, classification, and discord discovery. However, the data elements in streaming time series, which usually arrive online, are fast-changing and unbounded in size, leading to a high requirement on the computing efficiency of time series segmentation. Thus, how to segment streaming time series accurately under the constraint of computing efficiency is a challenging task. In this paper, we propose an exponential smoothing prediction-based segmentation algorithm (ESPSA). The proposed algorithm is developed based on a sliding window model, and uses the typical exponential smoothing method to calculate the smoothed value of each arriving data element of the streaming time series as the prediction of the future data. Besides, to determine whether a data element is a segmenting key point, we study the statistical characteristics of the prediction error and then deduce the relationship between the prediction error and the compression rate. Extensive experiments on both synthetic and real datasets demonstrate that the proposed algorithm can segment streaming time series effectively and efficiently. More importantly, compared with candidate algorithms, the proposed algorithm reduces the computing time by orders of magnitude.
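The prediction-and-threshold idea behind ESPSA can be sketched as follows. The smoothing constant and the fixed error threshold are illustrative placeholders; the paper derives the boundary condition from the statistics of the prediction error and the target compression rate.

```python
def es_segment(stream, alpha=0.5, threshold=2.0):
    # Exponential smoothing segmentation sketch: the smoothed value of
    # the data seen so far serves as the one-step-ahead prediction; a
    # point whose prediction error exceeds the threshold is flagged as
    # a segmenting key point, and smoothing restarts from it.
    key_points = []
    smoothed = stream[0]
    for i, x in enumerate(stream[1:], start=1):
        error = abs(x - smoothed)          # prediction error
        if error > threshold:
            key_points.append(i)           # boundary of a new segment
            smoothed = x                   # restart smoothing
        else:
            smoothed = alpha * x + (1 - alpha) * smoothed
    return key_points
```

Each element is processed in constant time, which is what makes this style of segmentation suitable for unbounded streams.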
Neurocomputing | 2013
Zongda Wu; Guandong Xu; Chenglang Lu; Enhong Chen; Yanchun Zhang; Hong Zhang
Web advertising, a form of online advertising which uses the Internet as a medium to post product or service information and attract customers, has become one of the most important marketing channels. As one prevalent type of web advertising, contextual advertising refers to the placement of the most relevant ads at appropriate positions of a web page, so as to provide a better user experience and increase the user's ad-click rate. However, most existing contextual advertising techniques only consider how to select ads that are as relevant to a given page as possible, without considering the positional effect of the ad placement on the page, resulting in unsatisfactory ad local context relevance. In this paper, we address the novel problem of position-wise contextual advertising, i.e., how to select and place relevant ads properly for a target web page. In our proposed approach, the relevant ads are selected based on not only global context relevance but also local context relevance, so that the embedded ads yield contextual relevance both to the whole target page and to the insertion positions where the ads are placed. In addition, to improve the accuracy of the global and local context relevance measures, rich Wikipedia knowledge is used to enhance the semantic feature representation of pages and ad candidates. Last, we evaluate our approach using a set of ads and pages downloaded from the Internet, and demonstrate the effectiveness of our approach.
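The combination of global and local context relevance can be sketched as follows; the cosine representation and the trade-off weight `lam` are illustrative assumptions, not the paper's exact measure.

```python
import math

def cosine(u, v):
    # Cosine similarity between sparse vectors (dicts: term -> weight).
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def position_score(ad_vec, page_vec, local_vec, lam=0.6):
    # Position-wise relevance sketch: combine the ad's relevance to the
    # whole page (global context) with its relevance to the text
    # surrounding one candidate insertion position (local context).
    return lam * cosine(ad_vec, page_vec) + (1 - lam) * cosine(ad_vec, local_vec)
```

Scoring each (ad, position) pair this way lets an ad that matches the page overall still be rejected for a slot whose surrounding text is about something else.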
Intelligent Information Systems | 2015
Guandong Xu; Yu Zong; Ping Jin; Rong Pan; Zongda Wu
In social annotation systems, users annotate digital data sources with tags, which are freely chosen textual descriptions. Tags are used to index, annotate, and retrieve resources as additional resource metadata. Poor retrieval performance remains a major challenge of most social annotation systems, resulting from the ambiguity, redundancy, and limited semantics of tags. Clustering is a useful tool to handle these problems in social annotation systems. In this paper, we propose a novel tag clustering algorithm based on kernel information propagation. This approach uses kernel density estimation on the kNN neighborhood directed graph as a starting point to reveal the prestige rank of tags in tagging data. The random walk with restart algorithm is then employed to determine the center points of tag clusters. The main strength of the proposed approach is its capability of partitioning tags from the perspective of tag prestige rank rather than intuitive similarity calculation alone. Experimental studies on six real-world datasets demonstrate the effectiveness and superiority of the proposed method against other state-of-the-art clustering approaches in terms of various evaluation metrics.
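The random-walk-with-restart component can be sketched as follows on an adjacency-list graph; the tiny graph, the restart probability, and the fixed iteration count are illustrative, and the kernel-density construction of the kNN graph is assumed to have produced the graph upstream.

```python
def rwr(adj, seed, restart=0.15, iters=100):
    # Random walk with restart on a directed graph ({node: [neighbours]}).
    # At each step the walker follows an out-edge with probability
    # (1 - restart) and jumps back to the seed with probability restart;
    # a dangling node also sends its mass to the seed. The resulting
    # visit probabilities serve as a prestige ranking of nodes (tags)
    # relative to the seed.
    nodes = list(adj)
    p = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            out = adj[n]
            if out:
                share = (1 - restart) * p[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                nxt[seed] += (1 - restart) * p[n]
        nxt[seed] += restart
        p = nxt
    return p
```

Tags that keep accumulating probability mass under such walks are high-prestige candidates for cluster centres.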
IEEE Transactions on Services Computing | 2018
Zongda Wu; Guiling Li; Qi Liu; Guandong Xu; Enhong Chen
Personalized recommendation has demonstrated its effectiveness in alleviating the problem of information overload on the Internet. However, evidence shows that, due to concerns about personal privacy, users' reluctance to disclose their personal information has become a major barrier to the development of personalized recommendation. In this paper, we propose to generate a group of fake preference profiles, so as to cover up the user's sensitive subjects and thus protect user privacy in personalized recommendation. First, we present a client-based framework for user privacy protection, which requires no change to existing recommendation algorithms and no compromise of recommendation accuracy. Second, based on the framework, we introduce a privacy protection model, which formulates the two requirements that ideal fake preference profiles should satisfy: (1) the similarity of feature distribution, which measures the effectiveness of fake preference profiles in hiding a genuine user preference profile; and (2) the exposure degree of sensitive subjects, which measures the effectiveness of fake preference profiles in covering up the sensitive subjects. Finally, based on a subject repository of product classification, we present an implementation algorithm that satisfies the privacy protection model. Both theoretical analysis and experimental evaluation demonstrate the effectiveness of our proposed approach.
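The two requirements of the privacy model can be sketched as simple measures over subject-count profiles; the cosine formulation and the occurrence-fraction definition are illustrative stand-ins for the paper's exact metrics.

```python
import math

def distribution(profile):
    # Normalise a subject -> count profile into a probability distribution.
    total = sum(profile.values())
    return {s: c / total for s, c in profile.items()}

def feature_similarity(genuine, fake):
    # Requirement (1): cosine similarity between the subject distributions
    # of the genuine profile and a fake one. Good fakes score high, so
    # they are statistically plausible cover for the genuine profile.
    p, q = distribution(genuine), distribution(fake)
    dot = sum(p[s] * q.get(s, 0.0) for s in p)
    return dot / (math.sqrt(sum(v * v for v in p.values())) *
                  math.sqrt(sum(v * v for v in q.values())))

def exposure_degree(profiles, sensitive):
    # Requirement (2): fraction of all subject occurrences across the
    # submitted profiles that fall on sensitive subjects. Good fakes
    # drive this down while feature_similarity stays high.
    total = sum(sum(p.values()) for p in profiles)
    hit = sum(c for p in profiles for s, c in p.items() if s in sensitive)
    return hit / total if total else 0.0
```

An implementation would search the subject repository for fakes that maximise the first measure while minimising the second.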
Expert Systems With Applications | 2017
Zongda Wu; Li Lei; Guiling Li; Hui Huang; Chengren Zheng; Enhong Chen; Guandong Xu
Highlights: a topic model based approach for novel summarization is proposed; an importance evaluation function for sentence candidates is designed; a summary smoothing approach is presented to improve the summary readability.

Most existing automatic text summarization algorithms target multi-document collections of relatively short length, and are thus difficult to apply directly to novel documents, which are free in structure and long in length. In this paper, aiming at novel documents, we propose a topic modeling based approach to extractive automatic summarization, so as to achieve a good balance among compression ratio, summarization quality and machine readability. First, based on topic modeling, we extract the candidate sentences associated with topic words from a preprocessed novel document. Second, with the goals of compression ratio and topic diversity, we design an importance evaluation function to select the most important sentences from the candidate sentences, and thus generate an initial novel summary. Finally, we smooth the initial summary to overcome the semantic confusion caused by ambiguous or synonymous words, so as to improve the summary readability. We experimentally evaluate our proposed approach on a real novel dataset. The experimental results show that, compared to those from other candidate algorithms, each automatic summary generated by our approach has not only a higher compression ratio, but also better summarization quality.
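The importance-driven sentence selection can be sketched as a greedy coverage procedure; the topic-word-overlap scoring with a discount for already-covered words is an illustrative stand-in for the paper's importance evaluation function, chosen because it naturally balances compression (few sentences) against topic diversity (new words rewarded).

```python
def select_summary(sentences, topic_words, k=2):
    # Greedy extractive selection: score each candidate sentence by the
    # topic words it covers, discounting words already covered by the
    # summary so far, and pick the best k sentences.
    covered, summary = set(), []
    for _ in range(k):
        best, best_score = None, 0
        for s in sentences:
            if s in summary:
                continue
            score = len({w for w in s.lower().split() if w in topic_words}
                        - covered)
            if score > best_score:
                best, best_score = s, score
        if best is None:
            break                       # no sentence adds new topic words
        summary.append(best)
        covered |= {w for w in best.lower().split() if w in topic_words}
    return summary
```

The smoothing stage described in the abstract would then rewrite the selected sentences to resolve ambiguous or synonymous words.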