Shing-Kit Chan

The Chinese University of Hong Kong

Publication

Featured research published by Shing-Kit Chan.

ACM Transactions on Information Systems | 2007

Named entity translation matching and learning: With application for mining unseen translations

Wai Lam; Shing-Kit Chan; Ruizhang Huang

This article introduces a named entity matching model that makes use of both semantic and phonetic evidence. The matching of semantic and phonetic information is captured by a unified framework via a bipartite graph model. By considering various technical challenges of the problem, including order insensitivity and partial matching, this approach is less rigid than existing approaches and highly robust. One major component is a phonetic matching model which exploits similarity at the phoneme level. Two learning algorithms for learning the similarity information of basic phonemic matching units based on training examples are investigated. By applying the proposed named entity matching model, a mining system is developed for discovering new named entity translations from daily Web news. The system is able to discover new name translations that cannot be found in the existing bilingual dictionary.

Knowledge Discovery and Data Mining | 2006

Extracting and summarizing hot item features across different auction web sites

Tak-Lam Wong; Wai Lam; Shing-Kit Chan

Online auction Web sites are fast changing and highly dynamic. It is difficult to digest the vast and poorly organized information contained in these sites. We develop a unified framework that automatically extracts product features and summarizes the hot item features across different auction Web sites. One challenge is extracting useful information from the product descriptions provided by sellers, which vary widely in layout. We formulate the problem as a single graph labeling problem using conditional random fields, which can model the relationships among neighbouring tokens in a Web page and among tokens from different pages, as well as other information such as the hot item features across different auction sites. We have conducted extensive experiments on several real-world auction Web sites to demonstrate the effectiveness of our framework.

International Conference on Data Mining | 2010

Pseudo Conditional Random Fields: Joint Training Approach to Segmenting and Labeling Sequence Data

Shing-Kit Chan; Wai Lam

Cascaded approaches have long been used to carry out sub-tasks in sequence in order to accomplish a major task. We put the cascaded approach in a probabilistic framework and analyze possible reasons for cascaded errors. To reduce the occurrence of cascaded errors, a constraint needs to be added when performing joint training. We propose a pseudo Conditional Random Field (pseudo-CRF) approach that models two sub-tasks as two Conditional Random Fields (CRFs). We then present the formulation in the context of a linear chain CRF for solving problems on sequence data. Joint training for a pseudo-CRF reuses the existing, well-developed, efficient inference algorithms for a linear chain CRF; joint training would otherwise require approximate inference algorithms or simulations with long computation times. Our experimental results show an interesting fact: a jointly trained CRF model in a pseudo-CRF may perform worse than a separately trained CRF on a sub-task. However, the overall system performance of a pseudo-CRF outperforms that of a cascaded approach. We implement the implicit constraint as a soft constraint so that users can define the penalty cost for violating it. To work on large-scale datasets, we further propose a parallel implementation of the pseudo-CRF approach, which can run on a multi-core CPU or on a GPU that supports multi-threading. Our experimental results show that it can achieve a 12-fold speedup.

Knowledge Discovery and Data Mining | 2009

An Efficient Method for Generating, Storing and Matching Features for Text Mining

Shing-Kit Chan; Wai Lam

Log-linear models have been widely used in text mining tasks because they can incorporate a large number of possibly correlated features. In text mining, these possibly correlated features are generated by conjunctions of features. They are usually used with log-linear models to estimate robust conditional distributions. To avoid manual construction of feature conjunctions, we propose a new algorithmic framework called F-tree for automatically generating and storing conjunctions of features in text mining tasks. This compact graph-based data structure allows fast one-vs-all matching of features in the feature space, which is crucial for many text mining tasks. Based on this hierarchical data structure, we propose a systematic method for removing redundant features to further reduce memory usage and improve performance. We conduct large-scale experiments on three publicly available datasets and show that this automatic method can match the state-of-the-art performance achieved by manual construction of features.

International Joint Conference on Natural Language Processing | 2008

Chinese NER Using CRFs and Logic for the Fourth SIGHAN Bakeoff

Xiaofeng Yu; Wai Lam; Shing-Kit Chan; Yiu Kei Wu; Bo Chen

International Conference on Data Mining | 2007

A Cascaded Approach to Biomedical Named Entity Recognition Using a Unified Model

Shing-Kit Chan; Wai Lam; Xiaofeng Yu

International Joint Conference on Natural Language Processing | 2008

A Framework Based on Graphical Models with Logic for Chinese Named Entity Recognition

Xiaofeng Yu; Wai Lam; Shing-Kit Chan

Bioinformatics and Bioengineering | 2007

Efficient Methods for Biomedical Named Entity Recognition

Shing-Kit Chan; Wai Lam

SIAM International Conference on Data Mining | 2006

Collaborative Information Extraction and Mining from Multiple Web Documents

Tak-Lam Wong; Wai Lam; Shing-Kit Chan

International Joint Conference on Natural Language Processing | 2008

An Online Cascaded Approach to Biomedical Named Entity Recognition

Shing-Kit Chan; Wai Lam; Xiaofeng Yu

Collaboration

Dive into Shing-Kit Chan's collaborations.

Top Co-Authors

Wai Lam

The Chinese University of Hong Kong

Xiaofeng Yu

The Chinese University of Hong Kong

Tak-Lam Wong

University of Hong Kong

Bo Chen

The Chinese University of Hong Kong

Ruizhang Huang

The Chinese University of Hong Kong

Yiu Kei Wu

The Chinese University of Hong Kong
