Tak-Lam Wong
University of Hong Kong
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Tak-Lam Wong.
international acm sigir conference on research and development in information retrieval | 2008
Tak-Lam Wong; Wai Lam; Tik-Shun Wong
We have developed an unsupervised framework for simultaneously extracting and normalizing attributes of products from multiple Web pages originated from different sites. Our framework is designed based on a probabilistic graphical model that can model the page-independent content information and the page-dependent layout information of the text fragments in Web pages. One characteristic of our framework is that previously unseen attributes can be discovered from the clue contained in the layout format of the text fragments. Our framework tackles both extraction and normalization tasks by jointly considering the relationship between the content and layout information. Dirichlet process prior is employed leading to another advantage that the number of discovered product attributes is unlimited. An unsupervised inference algorithm based on variational method is presented. The semantics of the normalized attributes can be visualized by examining the term weights in the model. Our framework can be applied to a wide range of Web mining applications such as product matching and retrieval. We have conducted extensive experiments from four different domains consisting of over 300 Web pages from over 150 different Web sites, demonstrating the robustness and effectiveness of our framework.
knowledge discovery and data mining | 2009
Bo Chen; Wai Lam; Ivor W. Tsang; Tak-Lam Wong
One common predictive modeling challenge occurs in text mining problems is that the training data and the operational (testing) data are drawn from different underlying distributions. This poses a great difficulty for many statistical learning methods. However, when the distribution in the source domain and the target domain are not identical but related, there may exist a shared concept space to preserve the relation. Consequently a good feature representation can encode this concept space and minimize the distribution gap. To formalize this intuition, we propose a domain adaptation method that parameterizes this concept space by linear transformation under which we explicitly minimize the distribution difference between the source domain with sufficient labeled data and target domains with only unlabeled data, while at the same time minimizing the empirical loss on the labeled data in the source domain. Another characteristic of our method is its capability for considering multiple classes and their interactions simultaneously. We have conducted extensive experiments on two common text mining problems, namely, information extraction and document classification to demonstrate the effectiveness of our proposed method.
Information Processing and Management | 2016
Haoran Xie; Xiaodong Li; Tao Wang; Raymond Y. K. Lau; Tak-Lam Wong; Li Chen; Fu Lee Wang; Qing Li
We present a framework SenticRank to incorporate sentiment for personalized search.Content-based and collaborative sentiment ranking methods are proposed.We compare the proposed sentiment-based search with baselines experimentally.We study the influence of sentiment corpora by using some sentiment dictionaries.Sentiment-based information can significantly improve performance in folksonomy. In recent years, there has been a rapid growth of user-generated data in collaborative tagging (a.k.a. folksonomy-based) systems due to the prevailing of Web 2.0 communities. To effectively assist users to find their desired resources, it is critical to understand user behaviors and preferences. Tag-based profile techniques, which model users and resources by a vector of relevant tags, are widely employed in folksonomy-based systems. This is mainly because that personalized search and recommendations can be facilitated by measuring relevance between user profiles and resource profiles. However, conventional measurements neglect the sentiment aspect of user-generated tags. In fact, tags can be very emotional and subjective, as users usually express their perceptions and feelings about the resources by tags. Therefore, it is necessary to take sentiment relevance into account into measurements. In this paper, we present a novel generic framework SenticRank to incorporate various sentiment information to various sentiment-based information for personalized search by user profiles and resource profiles. In this framework, content-based sentiment ranking and collaborative sentiment ranking methods are proposed to obtain sentiment-based personalized ranking. To the best of our knowledge, this is the first work of integrating sentiment information to address the problem of the personalized tag-based search in collaborative tagging systems. Moreover, we compare the proposed sentiment-based personalized search with baselines in the experiments, the results of which have verified the effectiveness of the proposed framework. In addition, we study the influences by popular sentiment dictionaries, and SenticNet is the most prominent knowledge base to boost the performance of personalized search in folksonomy.
IEEE Transactions on Knowledge and Data Engineering | 2010
Tak-Lam Wong; Wai Lam
This paper presents a Bayesian learning framework for adapting information extraction wrappers with new attribute discovery, reducing human effort in extracting precise information from unseen Web sites. Our approach aims at automatically adapting the information extraction knowledge previously learned from a source Web site to a new unseen site, at the same time, discovering previously unseen attributes. Two kinds of text-related clues from the source Web site are considered. The first kind of clue is obtained from the extraction pattern contained in the previously learned wrapper. The second kind of clue is derived from the previously extracted or collected items. A generative model for the generation of the site-independent content information and the site-dependent layout format of the text fragments related to attribute values contained in a Web page is designed to harness the uncertainty involved. Bayesian learning and expectation-maximization (EM) techniques are developed under the proposed generative model for identifying new training data for learning the new wrapper for new unseen sites. Previously unseen attributes together with their semantic labels can also be discovered via another EM-based Bayesian learning based on the generative model. We have conducted extensive experiments from more than 30 real-world Web sites in three different domains to demonstrate the effectiveness of our framework.
data and knowledge engineering | 2009
Tak-Lam Wong; Wai Lam
We develop an unsupervised learning framework which can jointly extract information and conduct feature mining from a set of Web pages across different sites. One characteristic of our model is that it allows tight interactions between the tasks of information extraction and feature mining. Decisions for both tasks can be made in a coherent manner leading to solutions which satisfy both tasks and eliminate potential conflicts at the same time. Our approach is based on an undirected graphical model which can model the interdependence between the text fragments within the same Web page, as well as text fragments in different Web pages. Web pages across different sites are considered simultaneously and hence information from different sources can be effectively leveraged. An approximate learning algorithm is developed to conduct inference over the graphical model to tackle the information extraction and feature mining tasks. We demonstrate the efficacy of our framework by applying it to two applications, namely, important product feature mining from vendor sites, and hot item feature mining from auction sites. Extensive experiments on real-world data have been conducted to demonstrate the effectiveness of our framework.
international conference on data mining | 2005
Tak-Lam Wong; Wai Lam
Online auction Web sites are fast changing, highly dynamic, and complex as they involve tremendous sellers and potential buyers, as well as a huge amount of items listed for bidding. We develop a two-phase framework which aims at mining and summarizing hot items from multiple auction Web sites to assist decision making. The objective of the first phase is to automatically extract the product features and product feature values of the items from the descriptions provided by the sellers. We design a HMM-based learning method to train an extended HMM model which can adapt to the unseen Web page from which the information is extracted. The goal of the second phase is to discover and summarize the hot items based on the extracted information. We formulate the hot item mining task as a semi-supervised learning problem and employ the graph mincuts algorithm to accomplish this task. The summary of the hot items is then generated by considering the frequency and the position of the product features being mentioned in the descriptions. We have conducted extensive experiments from several real-world auction Web sites to demonstrate the effectiveness of our framework.
IEEE MultiMedia | 2016
Haoran Xie; Di Zou; Raymond Y. K. Lau; Fu Lee Wang; Tak-Lam Wong
Compared to intentional word learning, incidental word learning better motivates learners, integrates development of more language skills, and provides richer contexts. The effectiveness of incidental word learning tasks can also be increased by employing materials that learners are more familiar with or interested in. Here, the authors present a framework to generate incidental word learning tasks via load-based profiles measured through the involvement load hypothesis, and topic-based profiles obtained from social media. They also conduct an experiment on real participants and find that the proposed framework promotes more effective and enjoyable word learning than intentional word learning. This article is part of a special issue on social media for learning.
web search and data mining | 2013
Lidong Bing; Wai Lam; Tak-Lam Wong
We develop a new framework to achieve the goal of Wikipedia entity expansion and attribute extraction from the Web. Our framework takes a few existing entities that are automatically collected from a particular Wikipedia category as seed input and explores their attribute infoboxes to obtain clues for the discovery of more entities for this category and the attribute content of the newly discovered entities. One characteristic of our framework is to conduct discovery and extraction from desirable semi-structured data record sets which are automatically collected from the Web. A semi-supervised learning model with Conditional Random Fields is developed to deal with the issues of extraction learning and limited number of labeled examples derived from the seed entities. We make use of a proximate record graph to guide the semi-supervised learning process. The graph captures alignment similarity among data records. Then the semi-supervised learning process can leverage the unlabeled data in the record set by controlling the label regularization under the guidance of the proximate record graph. Extensive experiments on different domains have been conducted to demonstrate its superiority for discovering new entities and extracting attribute content.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2013
Bo Chen; Wai Lam; Ivor W. Tsang; Tak-Lam Wong
We propose a framework for adapting text mining models that discovers low-rank shared concept space. Our major characteristic of this concept space is that it explicitly minimizes the distribution gap between the source domain with sufficient labeled data and the target domain with only unlabeled data, while at the same time it minimizes the empirical loss on the labeled data in the source domain. Our method is capable of conducting the domain adaptation task both in the original feature space as well as in the transformed Reproducing Kernel Hilbert Space (RKHS) using kernel tricks. Theoretical analysis guarantees that the error of our adaptation model can be bounded with respect to the embedded distribution gap and the empirical loss in the source domain. We have conducted extensive experiments on two common text mining problems, namely, document classification and information extraction, to demonstrate the efficacy of our proposed framework.
international conference on data mining | 2004
Tak-Lam Wong; Wai Lam
We develop a probabilistic framework for adapting information extraction wrappers with new attribute discovery. Wrapper adaptation aims at automatically adapting a previously learned wrapper from the source Web site to a new unseen site for information extraction. One unique characteristic of our framework is that it can discover new or previously unseen attributes as well as headers from the new site. It is based on a generative model for the generation of text fragments related to attribute items and formatting data in a Web page. To solve the wrapper adaptation problem, we consider two kinds of information from the source Web site. The first kind of information is the extraction knowledge contained in the previously learned wrapper from the source Web site. The second kind of information is the previously extracted or collected items. We employ a Bayesian learning approach to automatically select a set of training examples for adapting a wrapper for the new unseen site. To solve the new attribute discovery problem, we develop a model which analyzes the surrounding text fragments of the attributes in the new unseen site. A Bayesian learning method is developed to discover the new attributes and their headers. EM technique is employed in both Bayesian learning models. We conducted extensive experiments from a number of real-world Web sites to demonstrate the effectiveness of our framework.