Publication


Featured research published by Hwee Tou Ng.


Computational Linguistics | 2001

A machine learning approach to coreference resolution of noun phrases

Wee Meng Soon; Hwee Tou Ng; Daniel Chung Yong Lim

In this paper, we present a learning approach to coreference resolution of noun phrases in unrestricted text. The approach learns from a small, annotated corpus and the task includes resolving not just a certain type of noun phrase (e.g., pronouns) but rather general noun phrases. It also does not restrict the entity types of the noun phrases; that is, coreference is assigned whether they are of organization, person, or other types. We evaluate our approach on common data sets (namely, the MUC-6 and MUC-7 coreference corpora) and obtain encouraging results, indicating that on the general noun phrase coreference task, the learning approach holds promise and achieves accuracy comparable to that of nonlearning approaches. Our system is the first learning-based system that offers performance comparable to that of state-of-the-art nonlearning systems on these data sets.
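
The mention-pair setup described above can be pictured with a minimal sketch. This is not the authors' implementation: toy feature dictionaries stand in for the paper's hand-crafted features (string match, distance, agreement, and so on), and scikit-learn's decision tree stands in for the C5 learner used in the original system.

```python
# Minimal sketch of mention-pair coreference classification.
# Assumption: toy feature dictionaries replace the paper's hand-crafted features.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

# Each training instance describes a (candidate antecedent, anaphor) pair.
pairs = [
    {"string_match": 1, "distance": 0, "both_proper": 1, "number_agree": 1},
    {"string_match": 0, "distance": 3, "both_proper": 0, "number_agree": 0},
    {"string_match": 0, "distance": 1, "both_proper": 0, "number_agree": 1},
    {"string_match": 1, "distance": 2, "both_proper": 1, "number_agree": 1},
]
labels = [1, 0, 0, 1]  # 1 = coreferent, 0 = not coreferent

vec = DictVectorizer()
clf = DecisionTreeClassifier(random_state=0).fit(vec.fit_transform(pairs), labels)

# At test time, pair each noun phrase with its preceding candidates and link it
# to the closest candidate that the classifier marks as coreferent.
test_pair = vec.transform([{"string_match": 1, "distance": 1,
                            "both_proper": 1, "number_agree": 1}])
print(clf.predict(test_pair))
```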


International ACM SIGIR Conference on Research and Development in Information Retrieval | 1997

Feature selection, perceptron learning, and a usability case study for text categorization

Hwee Tou Ng; Wei Boon Goh; Kok Leong Low

In this paper, we describe an automated learning approach to text categorization based on perceptron learning and a new feature selection metric, called correlation coefficient. Our approach has been tested on the standard Reuters text categorization collection. Empirical results indicate that our approach outperforms the best published results on this Reuters collection. In particular, our new feature selection method yields considerable improvement. We also investigate the usability of our automated learning approach by actually developing a system that categorizes texts into a tree of categories. We compare the accuracy of our learning approach to a rule-based, expert system approach that uses a text categorization shell built by Carnegie Group. Although our automated learning approach still gives a lower accuracy, by appropriately incorporating a set of manually chosen words to use as features, the combined, semi-automated approach yields accuracy close to the rule-based approach.
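
A rough sketch of the two ingredients, feature selection by correlation coefficient followed by perceptron training, is given below. It assumes the metric is the signed square root of the chi-square statistic (the form usually associated with this paper's correlation coefficient) and uses a tiny illustrative corpus; the counts a, b, c, d are term presence/absence crossed with category membership.

```python
# Sketch of correlation-coefficient feature selection plus perceptron training.
# Assumption: the metric is computed as the signed square root of chi-square.
import math
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Perceptron

docs = ["grain wheat export", "wheat price rises", "oil barrel price", "crude oil export"]
labels = [1, 1, 0, 0]  # 1 = in the target category, 0 = not

vec = CountVectorizer(binary=True)
X = vec.fit_transform(docs).toarray()
N = len(docs)

def corr_coeff(term_col, labels):
    a = sum(1 for x, y in zip(term_col, labels) if x and y)          # present, in category
    b = sum(1 for x, y in zip(term_col, labels) if x and not y)      # present, not in category
    c = sum(1 for x, y in zip(term_col, labels) if not x and y)      # absent, in category
    d = sum(1 for x, y in zip(term_col, labels) if not x and not y)  # absent, not in category
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d)) or 1.0
    return math.sqrt(N) * (a * d - b * c) / denom

scores = [corr_coeff(X[:, j], labels) for j in range(X.shape[1])]
top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:3]
print("selected features:", [vec.get_feature_names_out()[j] for j in top])

# One binary perceptron per category, trained on the selected features only.
clf = Perceptron().fit(X[:, top], labels)
```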


Meeting of the Association for Computational Linguistics | 1996

Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-Based Approach

Hwee Tou Ng; Hian Beng Lee

In this paper, we present a new approach for word sense disambiguation (WSD) using an exemplar-based learning algorithm. This approach integrates a diverse set of knowledge sources to disambiguate word sense, including part of speech of neighboring words, morphological form, the unordered set of surrounding words, local collocations, and verb-object syntactic relation. We tested our WSD program, named LEXAS, on both a common data set used in previous work, as well as on a large sense-tagged corpus that we separately constructed. LEXAS achieves a higher accuracy on the common data set, and performs better than the most frequent heuristic on the highly ambiguous words in the large corpus tagged with the refined senses of WORDNET.
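
The exemplar-based idea can be illustrated with a nearest-neighbour classifier over the combined knowledge sources. The sketch below is only an approximation: scikit-learn's k-NN stands in for PEBLS, the exemplar-based learner used by LEXAS, and the features are invented for illustration.

```python
# Sketch of exemplar-based WSD over multiple knowledge sources.
# Assumption: k-nearest-neighbour stands in for PEBLS; features are illustrative.
from sklearn.feature_extraction import DictVectorizer
from sklearn.neighbors import KNeighborsClassifier

# One instance per occurrence of the ambiguous word "interest".
instances = [
    {"pos_-1": "JJ", "surround:rate": 1, "colloc_-1:high": 1},
    {"pos_-1": "IN", "surround:hobby": 1, "colloc_-1:great": 1},
    {"pos_-1": "DT", "surround:bank": 1, "colloc_-1:low": 1},
]
senses = ["interest%finance", "interest%attention", "interest%finance"]

vec = DictVectorizer()
knn = KNeighborsClassifier(n_neighbors=1).fit(vec.fit_transform(instances), senses)

test = vec.transform([{"pos_-1": "JJ", "surround:rate": 1, "colloc_-1:low": 1}])
print(knn.predict(test))  # the nearest stored exemplar decides the sense
```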


Empirical Methods in Natural Language Processing | 2002

An Empirical Evaluation of Knowledge Sources and Learning Algorithms for Word Sense Disambiguation

Yoong Keok Lee; Hwee Tou Ng

In this paper, we evaluate a variety of knowledge sources and supervised learning algorithms for word sense disambiguation on SENSEVAL-2 and SENSEVAL-1 data. Our knowledge sources include the part-of-speech of neighboring words, single words in the surrounding context, local collocations, and syntactic relations. The learning algorithms evaluated include Support Vector Machines (SVM), Naive Bayes, AdaBoost, and decision tree algorithms. We present empirical results showing the relative contribution of the component knowledge sources and the different learning algorithms. In particular, using all of these knowledge sources and SVM (i.e., a single learning algorithm) achieves accuracy higher than the best official scores on both SENSEVAL-2 and SENSEVAL-1 test data.
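
The central point of the evaluation, that a single SVM over all knowledge sources combined performs well, can be sketched as follows. The feature groups and sense labels are invented for illustration, and each target word would get its own classifier in practice.

```python
# Sketch: combine all knowledge sources into one feature vector per instance
# and train a single SVM per target word. Features and labels are illustrative.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

def combine(pos_feats, context_feats, colloc_feats, syntax_feats):
    feats = {}
    for group in (pos_feats, context_feats, colloc_feats, syntax_feats):
        feats.update(group)
    return feats

train = [
    combine({"pos_-1": "DT"}, {"ctx:loan": 1}, {"colloc:-1_the": 1}, {"obj_of": "rob"}),
    combine({"pos_-1": "JJ"}, {"ctx:river": 1}, {"colloc:-1_muddy": 1}, {"subj_of": "erode"}),
]
senses = ["bank%finance", "bank%river"]

vec = DictVectorizer()
clf = LinearSVC().fit(vec.fit_transform(train), senses)
print(clf.predict(vec.transform([combine({"pos_-1": "DT"}, {"ctx:loan": 1}, {}, {})])))
```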


International Conference on Computational Linguistics | 2002

Named entity recognition: a maximum entropy approach using global information

Hai Leong Chieu; Hwee Tou Ng

This paper presents a maximum entropy-based named entity recognizer (NER). It differs from previous machine learning-based NERs in that it uses information from the whole document to classify each word, with just one classifier. Previous work that involves the gathering of information from the whole document often uses a secondary classifier, which corrects the mistakes of a primary sentence-based classifier. In this paper, we show that the maximum entropy framework is able to make use of global information directly, and achieves performance that is comparable to the best previous machine learning-based NERs on MUC-6 and MUC-7 test data.
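
One way to picture "global information with just one classifier" is to add document-level features directly to each token's feature vector, as in the sketch below. Logistic regression stands in for the maximum entropy learner, and the single global feature shown (the token occurs capitalised elsewhere in the same document) is only one of the many cues a real system would use.

```python
# Sketch of a maxent-style NER tagger whose per-token features include
# document-level ("global") evidence. Logistic regression approximates maxent.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def token_features(doc_tokens, i):
    tok = doc_tokens[i]
    return {
        "word": tok.lower(),
        "is_cap": int(tok[0].isupper()),
        "prev": doc_tokens[i - 1].lower() if i > 0 else "<s>",
        # Global feature: the same token occurs capitalised elsewhere in the document.
        "cap_elsewhere": int(any(t == tok and t[0].isupper()
                                 for j, t in enumerate(doc_tokens) if j != i)),
    }

doc = ["Mr", "Chen", "said", "Chen", "will", "visit", "Singapore", "."]
tags = ["O", "PER", "O", "PER", "O", "O", "LOC", "O"]

vec = DictVectorizer()
X = vec.fit_transform([token_features(doc, i) for i in range(len(doc))])
clf = LogisticRegression(max_iter=1000).fit(X, tags)
print(clf.predict(vec.transform([token_features(doc, 3)])))
```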


Natural Language Engineering | 2014

A PDTB-Styled End-to-End Discourse Parser

Ziheng Lin; Hwee Tou Ng; Min-Yen Kan

Since the release of the large discourse-level annotation of the Penn Discourse Treebank (PDTB), research work has been carried out on certain subtasks of this annotation, such as disambiguating discourse connectives and classifying Explicit or Implicit relations. We see a need to construct a full parser on top of these subtasks and propose a way to evaluate the parser. In this work, we have designed and developed an end-to-end discourse parser to parse free texts in the PDTB style in a fully data-driven approach. The parser consists of multiple components joined in a sequential pipeline architecture, which includes a connective classifier, argument labeler, explicit classifier, non-explicit classifier, and attribution span labeler. Our trained parser first identifies all discourse and non-discourse relations, locates and labels their arguments, and then classifies the sense of the relation between each pair of arguments. For the identified relations, the parser also determines the attribution spans, if any, associated with them. We introduce novel approaches to locate and label arguments, and to identify attribution spans. We also significantly improve on the current state-of-the-art connective classifier. We propose and present a comprehensive evaluation from both component-wise and error-cascading perspectives, in which we illustrate how each component performs in isolation, as well as how the pipeline performs with errors propagated forward. The parser gives an overall system F1 score of 46.80 percent for partial matching utilizing gold standard parses, and 38.18 percent with full automation.
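
The pipeline architecture can be summarised as a skeleton like the one below. Component names follow the paper, but every component is reduced to a trivial placeholder so that only the control flow is visible; in the actual parser each step is a trained classifier or labeler.

```python
# Skeleton of the sequential PDTB-style pipeline; all components are placeholders.
def connective_classifier(text):
    # Identify tokens acting as discourse connectives (placeholder lexicon lookup).
    return [w for w in text.split() if w.lower() in {"because", "but", "although"}]

def argument_labeler(text, connective):
    # Locate Arg1 and Arg2 spans for the connective (placeholder: split at the connective).
    left, _, right = text.partition(connective)
    return left.strip(), right.strip()

def explicit_classifier(connective, arg1, arg2):
    return "Contingency.Cause" if connective.lower() == "because" else "Comparison"

def non_explicit_classifier(arg1, arg2):
    # Sense for adjacent sentence pairs with no connective (not exercised below).
    return "Expansion"

def attribution_span_labeler(arg):
    return [arg] if " said " in f" {arg} " else []

text = "He stayed home because it rained heavily."
for conn in connective_classifier(text):
    arg1, arg2 = argument_labeler(text, conn)
    sense = explicit_classifier(conn, arg1, arg2)
    attributions = attribution_span_labeler(arg1) + attribution_span_labeler(arg2)
    print(conn, "|", sense, "|", arg1, "||", arg2, "|", attributions)
```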


North American Chapter of the Association for Computational Linguistics | 2003

Named entity recognition with a maximum entropy approach

Hai Leong Chieu; Hwee Tou Ng

The named entity recognition (NER) task involves identifying noun phrases that are names, and assigning a class to each name. This task has its origin in the Message Understanding Conferences (MUC) in the 1990s, a series of conferences aimed at evaluating systems that extract information from natural language texts. It became evident that in order to achieve good performance in information extraction, a system needs to be able to recognize names. A separate subtask on NER was created in MUC-6 and MUC-7 (Chinchor, 1998).


Meeting of the Association for Computational Linguistics | 2003

Exploiting Parallel Texts for Word Sense Disambiguation: An Empirical Study

Hwee Tou Ng; Bin Wang; Yee Seng Chan

A central problem of word sense disambiguation (WSD) is the lack of manually sense-tagged data required for supervised learning. In this paper, we evaluate an approach to automatically acquire sense-tagged training data from English-Chinese parallel corpora, which are then used for disambiguating the nouns in the SENSEVAL-2 English lexical sample task. Our investigation reveals that this method of acquiring sense-tagged data is promising. On a subset of the most difficult SENSEVAL-2 nouns, the accuracy difference between the two approaches (training on automatically acquired versus manually sense-tagged data) is only 14.0%, and the difference could narrow further to 6.5% if we disregard the advantage that manually sense-tagged data have in their sense coverage. Our analysis also highlights the importance of the issue of domain dependence in evaluating WSD programs.
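
The data-acquisition step can be pictured as follows. The word alignments and the mapping from Chinese translations to senses are given here as toy dictionaries; in the paper they come from real English-Chinese parallel corpora and from manually assigning senses to the Chinese translations of each target noun.

```python
# Sketch of harvesting sense-tagged English examples from word-aligned parallel text.
# Assumption: alignments and the translation-to-sense mapping are toy stand-ins.
translation_to_sense = {
    "渠道": "channel%medium",  # channel of communication
    "频道": "channel%tv",      # television channel
}

aligned_pairs = [
    # (English sentence, Chinese word aligned to the target noun "channel")
    ("They opened a new channel of communication.", "渠道"),
    ("She switched to another channel during the ads.", "频道"),
]

training_data = [
    (sentence, translation_to_sense[zh])
    for sentence, zh in aligned_pairs
    if zh in translation_to_sense
]
for sentence, sense in training_data:
    print(sense, "<-", sentence)
```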


International World Wide Web Conference | 2003

Mining topic-specific concepts and definitions on the web

Bing Liu; Chee Wee Chin; Hwee Tou Ng

Traditionally, when one wants to learn about a particular topic, one reads a book or a survey paper. With the rapid expansion of the Web, learning in-depth knowledge about a topic from the Web is becoming increasingly important and popular. This is also due to the Web's convenience and its richness of information. In many cases, learning from the Web may even be essential because in our fast changing world, emerging topics appear constantly and rapidly. There is often not enough time for someone to write a book on such topics. To learn such emerging topics, one can resort to research papers. However, research papers are often hard for non-researchers to understand, and few research papers cover every aspect of the topic. In contrast, many Web pages often contain intuitive descriptions of the topic. To find such Web pages, one typically uses a search engine. However, current search techniques are not designed for in-depth learning. Top-ranking pages from a search engine may not contain any description of the topic. Even if they do, the description is usually incomplete since it is unlikely that the owner of the page has good knowledge of every aspect of the topic. In this paper, we attempt a novel and challenging task, mining topic-specific knowledge on the Web. Our goal is to help people learn in-depth knowledge of a topic systematically on the Web. The proposed techniques first identify those sub-topics or salient concepts of the topic, and then find and organize those informative pages, containing definitions and descriptions of the topic and sub-topics, just like those in a book. Experimental results using 28 topics show that the proposed techniques are highly effective.
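
One small ingredient of such a system, matching definition-style sentences for a topic, can be sketched as below. This is not the paper's method in full: the actual approach also exploits HTML structure and first identifies salient sub-topics; only simple lexical definition patterns are shown here.

```python
# Sketch of lexical definition-pattern matching for a given topic.
# Assumption: patterns are illustrative; the paper also uses HTML structure cues.
import re

def find_definitions(topic, sentences):
    patterns = [
        rf"\b{re.escape(topic)}\b\s+is\s+(a|an|the)\s+",
        rf"\b{re.escape(topic)}\b\s+is\s+defined\s+as\s+",
        rf"\b{re.escape(topic)}\b\s+refers\s+to\s+",
    ]
    return [s for s in sentences
            if any(re.search(p, s, flags=re.IGNORECASE) for p in patterns)]

sentences = [
    "Data mining is the process of discovering patterns in large data sets.",
    "Many companies invest heavily in data mining tools.",
    "Data mining refers to extracting knowledge from data.",
]
print(find_definitions("data mining", sentences))
```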


Empirical Methods in Natural Language Processing | 2008

A Generative Model for Parsing Natural Language to Meaning Representations

Wei Lu; Hwee Tou Ng; Wee Sun Lee; Luke Zettlemoyer

In this paper, we present an algorithm for learning a generative model of natural language sentences together with their formal meaning representations with hierarchical structures. The model is applied to the task of mapping sentences to hierarchical representations of their underlying meaning. We introduce dynamic programming techniques for efficient training and decoding. In experiments, we demonstrate that the model, when coupled with a discriminative reranking technique, achieves state-of-the-art performance when tested on two publicly available corpora. The generative model degrades robustly when presented with instances that are different from those seen in training. This allows a notable improvement in recall compared to previous models.
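
The discriminative reranking step mentioned above can be illustrated in isolation. The k-best meaning representations and their generative log-probabilities are assumed to be given; a simple linear model with hypothetical features re-scores the candidates.

```python
# Sketch of discriminative reranking over a generative parser's k-best list.
# Assumption: candidates, log-probabilities, and features are invented for illustration.
def rerank(kbest, weights):
    # kbest: list of (meaning_representation, generative_logprob, feature_dict)
    def score(item):
        mr, logprob, feats = item
        return logprob + sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return max(kbest, key=score)

kbest = [
    ("answer(river(loc_2(stateid('texas'))))", -4.1, {"matches_wh_word": 1.0}),
    ("answer(state(loc_1(riverid('red'))))",   -3.9, {"matches_wh_word": 0.0}),
]
weights = {"matches_wh_word": 0.5}
print(rerank(kbest, weights))
```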

Collaboration


Dive into Hwee Tou Ng's collaborations.

Top Co-Authors

Daniel Dahlmeier (National University of Singapore)
Yee Seng Chan (National University of Singapore)
Chang Liu (National University of Singapore)
Raymond J. Mooney (University of Texas at Austin)
Hai Leong Chieu (DSO National Laboratories)
Shamil Chollampatt (National University of Singapore)
Wee Sun Lee (National University of Singapore)
Preslav Nakov (Qatar Computing Research Institute)
Min-Yen Kan (National University of Singapore)
Wei Lu (National University of Singapore)