Kam-Fai Wong

The Chinese University of Hong Kong

Publication

Featured research published by Kam-Fai Wong.

ACM Transactions on Information Systems | 2008

Interpreting TF-IDF term weights as making relevance decisions

Ho Chung Wu; Robert W. P. Luk; Kam-Fai Wong; Kui Lam Kwok

A novel probabilistic retrieval model is presented. It forms a basis to interpret the TF-IDF term weights as making relevance decisions. It simulates the local relevance decision-making for every location of a document, and combines all of these “local” relevance decisions as the “document-wide” relevance decision for the document. The significance of interpreting TF-IDF in this way is the potential to: (1) establish a unifying perspective about information retrieval as relevance decision-making; and (2) develop advanced TF-IDF-related term weights for future elaborate retrieval models. Our novel retrieval model is simplified to a basic ranking formula that directly corresponds to the TF-IDF term weights. In general, we show that the term-frequency factor of the ranking formula can be rendered into different term-frequency factors of existing retrieval systems. In the basic ranking formula, the remaining quantity -log p(r̄ | t ∈ d) is interpreted as the probability of randomly picking a nonrelevant usage (denoted by r̄) of term t. Mathematically, we show that this quantity can be approximated by the inverse document-frequency (IDF). Empirically, we show that this quantity is related to IDF, using four reference TREC ad hoc retrieval data collections.
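For readers unfamiliar with the TF-IDF weights the abstract refers to, the following is a minimal Python sketch of a plain TF-IDF ranking. It is not the paper's probabilistic model: the raw term count as the TF factor, the `log(N / df)` form of IDF, and the function names `idf`, `tfidf_score` and `rank` are all assumptions made for illustration.

```python
import math
from collections import Counter

def idf(term, docs):
    """Inverse document frequency, log(N / df); 0.0 if no document has the term."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def tfidf_score(query, doc, docs):
    """Sum of (raw term frequency * IDF) over the query terms."""
    tf = Counter(doc)
    return sum(tf[t] * idf(t, docs) for t in query)

def rank(query, docs):
    """Return document indices ordered from highest to lowest score."""
    scores = [tfidf_score(query, d, docs) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Each document here is a list of tokens. Real retrieval systems typically replace the raw count with a normalised term-frequency factor, which is the part the abstract says can be rendered into different forms.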

Asia-Pacific Software Engineering Conference | 2000

Component-based software engineering: technologies, development frameworks, and quality assurance schemes

Xia Cai; Michael R. Lyu; Kam-Fai Wong; Roy Ko

The component-based software development approach is based on the idea of developing software systems by selecting appropriate off-the-shelf components and then assembling them with a well-defined software architecture. Because this new software development paradigm is very different from the traditional approach, quality assurance (QA) for component-based software development is a new topic in the software engineering community. In this paper, we survey current component-based software technologies, describe their advantages and disadvantages, and discuss the features they inherit. We also address QA issues for component-based software. As a major contribution, we propose a QA model for component-based software which covers component requirement analysis, component development, component certification, component customization, and system architecture design, integration, testing and maintenance.

International Journal of Production Research | 1998

A TSP-based heuristic for forming machine groups and part families

Chun-Hung Cheng; Y.P. Gupta; W.H. Lee; Kam-Fai Wong

Cellular manufacturing has been proposed as a layout approach to improve manufacturing efficiency and productivity. In implementing cellular manufacturing, parts are grouped into part families based on their similarity in manufacturing, and machines are grouped into machine cells to reduce intercellular movement of parts. To model the cellular manufacturing problem, a machine-part incidence matrix is often used. A cell formation algorithm must produce machine cells and associated part families to minimize intercellular movement of parts. In this paper, the cell formation problem is formulated as a travelling salesman problem (TSP) and a solution methodology based on genetic algorithms (GAs) is proposed to solve the TSP-cell formation problem. The proposed algorithm compares very favourably with a well-known algorithm available in the literature.

International Conference on Computational Linguistics | 2008

Extractive Summarization Using Supervised and Semi-Supervised Learning

Kam-Fai Wong; Mingli Wu; Wenjie Li

It is difficult to identify sentence importance from a single point of view. In this paper, we propose a learning-based approach to combine various sentence features. They are categorized as surface, content, relevance and event features. Surface features are related to extrinsic aspects of a sentence. Content features measure a sentence based on content-conveying words. Event features represent sentences by the events they contain. Relevance features evaluate a sentence by its relatedness to other sentences. Experiments show that the combined features improve summarization performance significantly. Although the evaluation results are encouraging, the supervised learning approach requires much labeled data. Therefore, we investigate co-training by combining labeled and unlabeled data. Experiments show that this semi-supervised learning approach achieves comparable performance to its supervised counterpart and saves about half of the labeling time cost.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007

Cross-lingual query suggestion using query logs of different languages

Wei Gao; Cheng Niu; Jian-Yun Nie; Ming Zhou; Jian Hu; Kam-Fai Wong; Hsiao-Wuen Hon

Query suggestion aims to suggest relevant queries for a given query, which helps users better specify their information needs. Previously, the suggested terms were mostly in the same language as the input query. In this paper, we extend it to cross-lingual query suggestion (CLQS): for a query in one language, we suggest similar or relevant queries in other languages. This is very important in scenarios of cross-language information retrieval (CLIR) and cross-lingual keyword bidding for search engine advertisement. Instead of relying on existing query translation technologies for CLQS, we present an effective means to map the input query of one language to queries of the other language in the query log. Important monolingual and cross-lingual information, such as word translation relations and word co-occurrence statistics, is used to estimate the cross-lingual query similarity with a discriminative model. Benchmarks show that the resulting CLQS system significantly outperforms a baseline system based on dictionary-based query translation. In addition, the resulting CLQS system is tested on French-to-English CLIR tasks on TREC collections. The results demonstrate higher effectiveness than the traditional query translation methods.

International Conference on Data Engineering | 2008

Multiple Materialized View Selection for XPath Query Rewriting

Nan Tang; Jeffrey Xu Yu; M.T. Ozsu; Byron Choi; Kam-Fai Wong

We study the problem of answering XPATH queries using multiple materialized views. Despite earlier efforts on answering queries using a single materialized view, answering queries using multiple views remains relatively new. We address two important aspects of this problem: multiple-view selection and equivalent multiple-view rewriting. With regard to the first problem, we propose an NFA-based approach (called VFILTER) to filter views that cannot be used to answer a given query. We then present the criterion for multiple view/query answerability. Based on the output of VFILTER, we further propose a heuristic method to identify a minimal view set that can answer a given query. For the problem of multiple-view rewriting, we first refine the materialized fragments of each selected view (similar to pushing selection) and then join the refined fragments using an encoding scheme. Finally, we extract the result of the query from the materialized fragments of a single view. Experiments show the efficiency of our approach.

Computers & Operations Research | 2003

FACOPT: a user friendly FACility layout OPTimization system

Jaydeep Balakrishnan; Chun Hung Cheng; Kam-Fai Wong

Abstract: The facility layout problem is a well-researched one. However, few effective and user-friendly approaches have been proposed. Since it is an NP-hard problem, various optimization approaches for small problems and heuristic approaches for the larger problems have been proposed. For the most part, the more effective algorithms are not user friendly. On the other hand, user-friendly methods have not been effective in handling intricacies such as unequal department sizes. In this research, we present FACOPT, a heuristic approach that is effective and user friendly. The software uses two methods, simulated annealing and genetic algorithm, to solve the facility layout problem. Computational tests are also done to identify good parameter values and to compare the performance of the two algorithms.

Scope and purpose: Various methods have been proposed for facility layout where departments are laid out within a facility. However, many of them are not flexible enough to handle intricacies such as unequal department sizes. Others do not provide user-friendly interfaces. Thus, there is a need for user-friendly software incorporating effective and flexible procedures. In this research we present FACOPT, a heuristic approach that is effective and user friendly. The software uses two different approaches to solve the facility layout problem effectively. FACOPT has a Visual BASIC interface and runs under a Windows environment for ease of use.
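As a rough illustration of how simulated annealing can be applied to a layout problem, the Python sketch below places equal-sized departments in a single row and swaps pairs to reduce flow-weighted distance. It is not FACOPT: the single-row, equal-size simplification, the cost function, the cooling schedule and every name and default value (`layout_cost`, `anneal`, `steps`, `t0`, `cooling`, `seed`) are assumptions chosen only to keep the example short.

```python
import math
import random

def layout_cost(order, flow):
    """Flow-weighted distance for departments placed left to right in one row.

    Only the upper triangle of `flow` is read (flow[a][b] with a < b).
    """
    pos = {dept: i for i, dept in enumerate(order)}
    n = len(order)
    return sum(
        flow[a][b] * abs(pos[a] - pos[b])
        for a in range(n)
        for b in range(a + 1, n)
    )

def anneal(flow, steps=2000, t0=10.0, cooling=0.995, seed=0):
    """Simulated annealing over department orderings (needs at least 2 departments)."""
    rng = random.Random(seed)
    n = len(flow)
    current = list(range(n))
    cur_cost = layout_cost(current, flow)
    best, best_cost = current[:], cur_cost
    temp = t0
    for _ in range(steps):
        # Propose a neighbour by swapping two departments.
        i, j = rng.sample(range(n), 2)
        current[i], current[j] = current[j], current[i]
        new_cost = layout_cost(current, flow)
        delta = new_cost - cur_cost
        # Always accept improvements; accept worse layouts with a
        # probability that shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = current[:], cur_cost
        else:
            current[i], current[j] = current[j], current[i]  # undo the swap
        temp *= cooling
    return best, best_cost
```

The function returns the best ordering seen and its cost. Handling unequal department sizes, which the abstract highlights, would need a richer layout representation than a simple permutation.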

International Joint Conference on Natural Language Processing | 2004

Phoneme-Based transliteration of foreign names for OOV problem

Wei Gao; Kam-Fai Wong; Wai Lam

A proper noun dictionary is never complete, rendering name translation from English to Chinese ineffective. One way to solve this problem is not to rely on a dictionary alone but to adopt automatic translation according to pronunciation similarities, i.e. to map phonemes comprising an English name to the phonetic representations of the corresponding Chinese name. This process is called transliteration. We present a statistical transliteration method. An efficient algorithm for aligning phoneme chunks is described. Unlike rule-based approaches, our method is data-driven. Compared to source-channel based statistical approaches, we adopt a direct transliteration model, i.e. the direction of probabilistic estimation conforms to the transliteration direction. We demonstrate comparable performance to a source-channel based system.

Conference on Information and Knowledge Management | 2015

Detect Rumors Using Time Series of Social Context Information on Microblogging Websites

Jing Ma; Wei Gao; Zhongyu Wei; Yueming Lu; Kam-Fai Wong

Automatically identifying rumors from online social media, especially microblogging websites, is an important research issue. Most existing work on rumor detection focuses on modeling features related to microblog contents, users and propagation patterns, but ignores the importance of the variation of these social context features during message propagation over time. In this study, we propose a novel approach to capture the temporal characteristics of these features based on the time series of the rumor's lifecycle, for which a time series modeling technique is applied to incorporate various social context information. Our experiments using the events in two microblog datasets confirm that the method outperforms state-of-the-art rumor detection approaches by large margins. Moreover, our model demonstrates strong performance on detecting rumors at an early stage after their initial broadcast.
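One simple way to turn an event's posts into a time series of the kind described above is to split its lifecycle into equal intervals, compute a feature per interval, and append the change between consecutive intervals. The Python sketch below does this for a single feature, the post count. The equal-width bucketing, the use of first differences and the names `interval_counts` and `with_deltas` are assumptions for illustration, not details confirmed by the abstract.

```python
def interval_counts(timestamps, n_intervals):
    """Count posts in each of n equal-width intervals of the event's lifecycle.

    `timestamps` must be a non-empty list of numbers (e.g. seconds).
    """
    start, end = min(timestamps), max(timestamps)
    # Fall back to width 1.0 when every post shares the same timestamp.
    width = (end - start) / n_intervals or 1.0
    counts = [0] * n_intervals
    for t in timestamps:
        # The last timestamp lands exactly on the upper edge, so clamp it.
        idx = min(int((t - start) / width), n_intervals - 1)
        counts[idx] += 1
    return counts

def with_deltas(series):
    """Append first differences so a model sees both level and change."""
    deltas = [b - a for a, b in zip(series, series[1:])]
    return series + deltas

features = with_deltas(interval_counts([0, 1, 2, 10, 11, 20], n_intervals=4))
```

Here `features` is `[3, 0, 2, 1, -3, 2, -1]`: four interval counts followed by three differences. The same pattern could be repeated for other content, user or propagation features and the resulting vectors concatenated before training a classifier.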

Journal of the Association for Information Science and Technology | 2000

Aboutness from a commonsense perspective

Peter D. Bruza; Dawei Song; Kam-Fai Wong

Information retrieval (IR) is driven by a process that decides whether a document is about a query. Recent attempts spawned from a logic-based information retrieval theory have formalized properties characterizing “aboutness,” but no consensus has yet been reached. The proposed properties are largely determined by the underlying framework within which aboutness is defined. In addition, some properties are only sound within the context of a given IR model, but are not sound from the perspective of the user. For example, a common form of aboutness, namely overlapping aboutness, implies precision degrading properties such as compositional monotonicity. Therefore, the motivating question for this article is: independent of any given IR model, and examined within an information-based, abstract framework, what are commonsense properties of aboutness (and its dual, nonaboutness)? We propose a set of properties characterizing aboutness and nonaboutness from a commonsense perspective. Special attention is paid to the rules prescribing conservative behavior of aboutness with respect to information composition. The interaction between aboutness and nonaboutness is modeled via normative rules. The completeness, soundness, and consistency of the aboutness proof systems are analyzed and discussed. A case study based on monotonicity shows that many current IR systems are either monotonic or nonmonotonic. An interesting class of IR models, namely those that are conservatively monotonic, is identified.
