Victor Cheng | Researchain


Publication

Featured research published by Victor Cheng.

Pattern Recognition | 2004

Dissimilarity learning for nominal data

Victor Cheng; Chun-hung Li; James Tin-Yau Kwok; Chi-Kwong Li

Defining a good distance (dissimilarity) measure between patterns is of crucial importance in many classification and clustering algorithms. While a lot of work has been performed on continuous attributes, nominal attributes are more difficult to handle. A popular approach is to use the value difference metric (VDM) to define a real-valued distance measure on nominal values. However, VDM treats the attributes separately and ignores any possible interactions among attributes. In this paper, we propose the use of adaptive dissimilarity matrices for measuring the dissimilarities between nominal values. These matrices are learned via optimizing an error function on the training samples. Experimental results show that this approach leads to better classification performance. Moreover, it also allows easier interpretation of (dis)similarity between different nominal values.
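For orientation, the baseline that the abstract contrasts against, the value difference metric, can be sketched in a few lines. This is a minimal illustration of the standard VDM for a single nominal attribute, not the adaptive dissimilarity matrices proposed in the paper. The function name vdm_for_attribute, the exponent q, and the toy data are assumptions made for the example.

```python
from collections import Counter


def vdm_for_attribute(values, labels, q=2):
    """Build a VDM distance function for one nominal attribute.

    d(a, b) = sum over classes c of |P(c | a) - P(c | b)| ** q,
    with the conditional probabilities estimated from training counts.
    Values never seen in training are not handled in this sketch.
    """
    classes = sorted(set(labels))
    value_counts = Counter(values)
    joint_counts = Counter(zip(values, labels))

    def cond_prob(value, cls):
        return joint_counts[(value, cls)] / value_counts[value]

    def distance(a, b):
        return sum(abs(cond_prob(a, c) - cond_prob(b, c)) ** q for c in classes)

    return distance


# Hypothetical toy data: one nominal attribute and a binary class label.
colors = ["red", "red", "blue", "blue", "green", "green"]
labels = ["pos", "pos", "neg", "neg", "pos", "neg"]
d = vdm_for_attribute(colors, labels)

red_blue = d("red", "blue")    # classes fully disagree
red_green = d("red", "green")  # classes partly agree
```

Because each attribute gets its own distance function, any interaction between attributes is invisible to this measure, which is the limitation the abstract points out.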

Web Intelligence | 2006

Personalized Spam Filtering with Semi-supervised Classifier Ensemble

Victor Cheng; Chun Hung Li

The proliferation of unsolicited emails, also known as spam, poses a significant burden to email users worldwide. Recent research on spam filtering has shown that high accuracy can be obtained if labeled email examples are available from the particular user of the spam filter. However, the time-consuming process of providing personalized labeled training examples is often inconvenient or impossible due to privacy issues. In this paper, a semi-supervised personalized spam filter based on a classifier ensemble is proposed that classifies users' emails accurately by learning from both generic labeled emails and personalized unlabeled emails. The proposed multi-stage classification process begins by learning an SVM model from labeled generic data. Unlabeled user emails are then fed to this SVM to generate personalized labeled data for constructing personalized naive Bayes classifiers. Furthermore, some personalized labeled examples are generated by exploiting rare word distributions and then fed into a semi-supervised classifier. The multi-stage results are integrated with SVMs learned from generic labeled emails to produce the final classification results. Experimental results show that the proposed approaches can significantly increase the classification accuracy in spam filtering.

IEEE Transactions on Knowledge and Data Engineering | 2014

Probabilistic Aspect Mining Model for Drug Reviews

Victor Cheng; Clement H. C. Leung; Jiming Liu; Alfredo Milani

Recent findings show that online reviews, blogs, and discussion forums on chronic diseases and drugs are becoming important supporting resources for patients. Extracting information from these substantial bodies of text is useful and challenging. We developed a generative probabilistic aspect mining model (PAMM) for identifying the aspects/topics relating to class labels or categorical meta-information of a corpus. Unlike many other unsupervised or supervised approaches, PAMM has a unique feature in that it focuses on finding aspects relating to one class only rather than finding aspects for all classes simultaneously in each execution. This reduces the chance of having aspects formed from mixing concepts of different classes; hence the identified aspects are easier for people to interpret. The aspects found also have the property that they are class-distinguishing: they can be used to distinguish a class from other classes. An efficient EM algorithm is developed for parameter estimation. Experimental results on reviews of four different drugs show that PAMM is able to find better aspects than other common approaches, when measured with mean pointwise mutual information and classification accuracy. In addition, the derived aspects were also assessed by humans based on different specified perspectives, and PAMM was found to be rated highest.
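The abstract evaluates aspects with mean pointwise mutual information (PMI). As a rough illustration of that measure only, PMI for a word pair can be estimated from document-level co-occurrence: PMI(a, b) = log( p(a, b) / (p(a) p(b)) ). The sketch below assumes documents are represented as sets of words; the function name pmi and the toy reviews are hypothetical, and the paper's exact evaluation protocol is not reproduced here.

```python
import math


def pmi(word_a, word_b, documents):
    """Pointwise mutual information of two words from document co-occurrence.

    Each document is a set of words. Returns -inf when the pair never
    co-occurs, since log(0) is undefined.
    """
    n = len(documents)
    p_a = sum(word_a in doc for doc in documents) / n
    p_b = sum(word_b in doc for doc in documents) / n
    p_ab = sum(word_a in doc and word_b in doc for doc in documents) / n
    if p_ab == 0:
        return float("-inf")
    return math.log(p_ab / (p_a * p_b))


# Hypothetical toy corpus of four short reviews.
reviews = [
    {"pain", "relief"},
    {"pain", "relief"},
    {"nausea"},
    {"pain"},
]

related = pmi("pain", "relief", reviews)    # log((2/4) / ((3/4) * (2/4)))
unrelated = pmi("pain", "nausea", reviews)  # never co-occur
```

Averaging this score over the top word pairs of an aspect gives one common coherence-style summary: a positive value means the words co-occur more often than independence would predict.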

Signal-Image Technology and Internet-Based Systems | 2007

Topic Detection via Participation Using Markov Logic Network

Victor Cheng; Chun-hung Li

The advent of Web 2.0 enables the proliferation of online communities in which a tremendous number of Internet users contribute and share enormous amounts of information. Proper exploitation of community structure helps in retrieving useful information and in better understanding the features of these communities. We employ a Markov Logic Network to explore topic tracking by finding clusters, which represent latent topics, that best fit a set of rules. Rather than using content to investigate the discussions of a community, user participation is used, because topics are believed to be reflected to some extent in participation preferences. User participation is also easier to process than text. The clustering results show that this approach can reveal the latent topics of a community effectively.

Web Intelligence | 2008

Linked Topic and Interest Model for Web Forums

Victor Cheng; Chun-hung Li

In Web forum analysis, both discussion topics and author interests are of great concern. We introduce a linked topic and interest model based on latent Dirichlet allocation (LDA) to explore discussion topics and author interests. Rather than having two separate models or modeling combined topics and interests with just one hidden topic assignment variable, the proposed model has separate but linked hidden variables for topic and interest exploration. As exact model parameter inference is intractable, Gibbs sampling is employed to estimate topic, author, and interest distributions. The joint distribution of the linked hidden variables also provides an interpretation of an interest in terms of weighted topics or vice versa. We apply the model to a NIPS data set and a corpus containing the text content of a popular digital camera Web forum. Topics and interests discovered by using the model are demonstrated. The model's generalization capability is also assessed by means of perplexity, and the results show that the linked topic and interest model performs better than the LDA document-topic model and the author-topic model.
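Perplexity, the generalization measure mentioned in the abstract, is the exponential of the negative mean log-likelihood per held-out token, so lower is better. The sketch below is a generic illustration that assumes the per-token probabilities have already been computed by some topic model; it does not implement the linked topic and interest model, and the function name and numbers are hypothetical.

```python
import math


def perplexity(token_probs):
    """Perplexity of held-out tokens.

    token_probs holds the probability the model assigned to each
    held-out token; all entries must be strictly positive.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-mean_log_prob)


# A model that spreads probability evenly over four words ...
uniform_model = perplexity([0.25, 0.25, 0.25, 0.25])
# ... versus one that is more confident on some of the same tokens.
sharper_model = perplexity([0.5, 0.5, 0.25, 0.25])
```

A uniform guess over four words gives a perplexity of about 4, which matches the intuition that perplexity behaves like an effective vocabulary size the model is choosing among.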

International Conference on Natural Computation | 2006

Classification of online discussions via content and participation

Victor Cheng; Chi-sum Yeung; Chun Hung Li

Web forums and online communities are becoming increasingly important information sources. While there is significant research on the classification of Web pages, relatively little is known about the classification of online discussions. By observing the special nature of user participation in Web communities, we propose classification methods based on user participation and the text content of online discussions. Support vector machines are employed in this study to classify and analyze discussions based on content and participation. It is found that the accuracy of using participation as classification features can be very high in certain communities. The use of a high-dimensional classifier can be effective in enhancing retrieval and classification of online discussion topics.

Knowledge Discovery and Data Mining | 2007

Combining supervised and semi-supervised classifier for personalized spam filtering

Victor Cheng; Chun-hung Li

This paper addresses the problem of spam filtering for an individual email user under the condition that only public-domain labeled emails are given as training data and all emails from the user's inbox are unlabeled. Owing to differences in wording and distribution between these emails, a conventional supervised classifier such as an SVM cannot produce accurate results because it assumes that the training and testing data come from the same source and have the same distribution. We model these discrepancies as variation of the decision hyperplane and derive a criterion for selecting reliable emails whose classified labels the user is likely to agree with. A semi-supervised classifier then uses these emails as the training set and propagates the label information to the other unlabeled emails by exploiting their distribution in feature space. Experimental results show that this combined classifier strategy can classify emails for an individual user with high accuracy.
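The general pattern described here, keeping only the confidently classified emails as seeds for a semi-supervised stage, can be sketched as follows. The abstract does not spell out the paper's selection criterion, so this example substitutes a plain margin threshold on the classifier's decision score; the threshold value, the function name, and the scores are all assumptions made for illustration.

```python
def select_confident(scores, threshold=1.0):
    """Pick pseudo-labeled seeds from signed decision scores.

    scores are signed distances from a decision hyperplane
    (positive = spam, negative = not spam). Only items at least
    `threshold` away from the hyperplane are kept, as (index, label).
    """
    return [
        (i, 1 if s > 0 else -1)
        for i, s in enumerate(scores)
        if abs(s) >= threshold
    ]


# Hypothetical decision scores for four unlabeled inbox emails.
inbox_scores = [2.3, -0.2, -1.7, 0.4]
seeds = select_confident(inbox_scores)
unlabeled = [i for i in range(len(inbox_scores)) if i not in {j for j, _ in seeds}]
```

The emails left in unlabeled are the ones a later semi-supervised step would label by propagating information from the seeds through feature space.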

International Conference on Data Mining | 2014

Medical Error Prevention Based on Path Integration System Approach

Sheung Wai Chan; Clement H. C. Leung; Victor Cheng; Jiming Liu

Recent findings show that medical errors are prevalent and lead to many unnecessary iatrogenic deaths and injuries. Medical error studies using different approaches, such as the person approach or the system approach, give clinicians a better understanding of and valuable insight into error prevention through the use of guidelines, standardized procedures, devices, and so on. A novel approach is proposed, known as the Path Integration System Approach (PISA), based on information technology (IT), for the design of health system processes to reduce adverse events. Unlike the person approach or the system approach, which basically address error-prone procedures and situations by building more error barriers or changing human behaviour, PISA is concerned with reconstructing the medical procedure paths and system operations to lower the association between clinical staff and medical errors, through the judicious deployment of Information and Communication Technology. It is shown that PISA has the potential to achieve medical error reduction in excess of 70%. Examples and guidelines are given to illustrate the application of the approach. The paper proposes an integrated approach which transforms the effective deployment of IT systems to focus on integration and communication between subsystems. Through the adoption of PISA, such integration may be achieved to the benefit of different groups of stakeholders in medical error prevention.

Knowledge Discovery and Data Mining | 2011

Classification probabilistic PCA with application in domain adaptation

Victor Cheng; Chun-hung Li

Conventional dimensionality reduction algorithms such as principal component analysis (PCA) and non-negative matrix factorization (NMF) are unsupervised. Supervised probabilistic PCA (SPPCA) can utilize label information. However, this information is usually treated as regression targets rather than discrete nominal labels. We propose a classification probabilistic PCA (CPPCA), which is an extension of probabilistic PCA. Unlike SPPCA, the class label information is turned into a class probability function by using a sigmoidal function. As the posterior distribution of the latent variables is non-Gaussian, we use the Laplace approximation with Expectation Maximization (EM) to obtain the solution. The formulation is applied to a domain adaptation classification problem where the labeled training data and unlabeled test data come from different but related domains. Experimental results show that the proposed model is more accurate than conventional probabilistic PCA, SPPCA, and its semi-supervised version. Its performance is similar to that of popular dedicated algorithms for domain adaptation, namely structural correspondence learning (SCL) and its variants.

International Conference on Wavelet Analysis and Pattern Recognition | 2008

Large margin maximum entropy machines for classifier combination

Zhili Wu; Chun-hung Li; Victor Cheng

Majority voting in classifier combination treats all base classifiers equally without considering their performance differences. By analyzing the constraints imposed by the margins of an ensemble classifier, a set of weights can be computed to give better prediction than majority voting. We propose a regularized classifier combination strategy that maximizes the entropy of the probability weights assigned to base classifiers subject to the margin constraints of the ensemble classifier. Furthermore, we show that a sparse solution with a set of support vectors for the ensemble classifier can be obtained.
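To make the contrast between majority voting and weighted combination concrete, the sketch below votes with probability weights over base classifiers and reports the entropy of those weights. Uniform weights have the highest entropy and reduce to majority voting. The weights here are set by hand for illustration; the paper's margin-constrained optimization that would actually learn them is not implemented, and all names are hypothetical.

```python
import math


def weighted_vote(predictions, weights):
    """Combine +1/-1 base predictions using non-negative weights summing to 1."""
    score = sum(w * p for w, p in zip(weights, predictions))
    return 1 if score >= 0 else -1


def entropy(weights):
    """Shannon entropy (in nats) of a probability weight vector."""
    return -sum(w * math.log(w) for w in weights if w > 0)


# Three base classifiers; the first disagrees with the other two.
base_predictions = [1, -1, -1]

uniform_weights = [1 / 3, 1 / 3, 1 / 3]  # equivalent to majority voting
skewed_weights = [0.6, 0.2, 0.2]         # trusts the first classifier more

majority_result = weighted_vote(base_predictions, uniform_weights)
weighted_result = weighted_vote(base_predictions, skewed_weights)
```

With the skewed weights the single trusted classifier overrules the other two, at the cost of lower entropy than the uniform assignment; the trade-off between these two pressures is what a maximum-entropy objective with margin constraints would balance.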
