

Publication


Featured research published by Yee Seng Chan.


Meeting of the Association for Computational Linguistics | 2003

Exploiting Parallel Texts for Word Sense Disambiguation: An Empirical Study

Hwee Tou Ng; Bin Wang; Yee Seng Chan

A central problem of word sense disambiguation (WSD) is the lack of manually sense-tagged data required for supervised learning. In this paper, we evaluate an approach to automatically acquire sense-tagged training data from English-Chinese parallel corpora, which are then used for disambiguating the nouns in the SENSEVAL-2 English lexical sample task. Our investigation reveals that this method of acquiring sense-tagged data is promising. On a subset of the most difficult SENSEVAL-2 nouns, the accuracy difference between training on parallel-text-acquired data and training on manually sense-tagged data is only 14.0%, and the difference could narrow further to 6.5% if we disregard the advantage that manually sense-tagged data have in their sense coverage. Our analysis also highlights the importance of the issue of domain dependence in evaluating WSD programs.
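The core idea can be sketched as a toy example (all sentences, Chinese words, and sense labels below are illustrative, not the paper's actual resources): when the Chinese word aligned to an English word unambiguously indicates one sense, that occurrence becomes a sense-tagged training example at no manual cost.

```python
# Hypothetical mapping from Chinese translations of "bank" to sense labels.
TRANSLATION_TO_SENSE = {
    "银行": "bank%financial_institution",
    "河岸": "bank%sloping_land",
}

def harvest_examples(aligned_pairs):
    """Collect sense-tagged examples from word-aligned sentence pairs.

    aligned_pairs: iterable of (english_context, chinese_translation).
    Returns a list of (context, sense) training examples; occurrences
    whose translation has no sense mapping are discarded.
    """
    examples = []
    for context, translation in aligned_pairs:
        sense = TRANSLATION_TO_SENSE.get(translation)
        if sense is not None:
            examples.append((context, sense))
    return examples

corpus = [
    ("deposit money in the bank", "银行"),
    ("fishing from the river bank", "河岸"),
    ("the bank of clouds", "云层"),   # no sense mapping: skipped
]
print(harvest_examples(corpus))
```

The harvested (context, sense) pairs can then feed any supervised WSD learner in place of manually annotated data.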


Meeting of the Association for Computational Linguistics | 2007

NUS-PT: Exploiting Parallel Texts for Word Sense Disambiguation in the English All-Words Tasks

Yee Seng Chan; Hwee Tou Ng; Zhi Zhong

We participated in the SemEval-2007 coarse-grained English all-words task and fine-grained English all-words task. We used a supervised learning approach with SVM as the learning algorithm. The knowledge sources used include local collocations, parts-of-speech, and surrounding words. We gathered training examples from English-Chinese parallel corpora, SemCor, and the DSO corpus. Our system for the fine-grained English all-words task was trained with the fine-grained sense inventory of WordNet, while our system for the coarse-grained English all-words task was trained with the coarse-grained sense inventory released by the task organizers. Our scores (for both recall and precision) are 0.825 and 0.587 for the coarse-grained and fine-grained English all-words tasks respectively. These scores put our systems in first place for the coarse-grained English all-words task and second place for the fine-grained English all-words task.


Meeting of the Association for Computational Linguistics | 2006

Estimating Class Priors in Domain Adaptation for Word Sense Disambiguation

Yee Seng Chan; Hwee Tou Ng

Instances of a word drawn from different domains may have different sense priors (the proportions of the different senses of a word). This in turn affects the accuracy of word sense disambiguation (WSD) systems trained and applied on different domains. This paper presents a method to estimate the sense priors of words drawn from a new domain, and highlights the importance of using well calibrated probabilities when performing these estimations. By using well calibrated probabilities, we are able to estimate the sense priors effectively to achieve significant improvements in WSD accuracy.
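One standard way to re-estimate class priors from calibrated posteriors is an iterative EM-style procedure; the sketch below is an illustration of that general idea, not necessarily the paper's exact algorithm, and the toy data are made up. Each posterior is reweighted by the ratio of the current prior estimate to the training prior, then the priors are re-averaged.

```python
import numpy as np

def estimate_priors(posteriors, train_priors, iters=100):
    """EM-style re-estimation of class priors on a new domain.

    posteriors: (n_examples, n_classes) classifier outputs P(c|x) on the
    new domain, assumed well calibrated. train_priors: class proportions
    in the training data. Returns estimated priors for the new domain.
    """
    p = train_priors.copy()
    for _ in range(iters):
        # E-step: reweight each posterior by the ratio of current to
        # training priors, then renormalize per example.
        adj = posteriors * (p / train_priors)
        adj /= adj.sum(axis=1, keepdims=True)
        # M-step: new priors are the mean of the adjusted posteriors.
        p = adj.mean(axis=0)
    return p

# Toy data: training was balanced, but the new domain favors sense 0.
posteriors = np.array([[0.9, 0.1]] * 8 + [[0.2, 0.8]] * 2)
train_priors = np.array([0.5, 0.5])
new_priors = estimate_priors(posteriors, train_priors)
print(new_priors)
```

The reason calibration matters is visible in the E-step: miscalibrated posteriors are reweighted by the wrong amounts, so the estimated priors drift away from the true domain proportions.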


Empirical Methods in Natural Language Processing | 2008

Decomposability of Translation Metrics for Improved Evaluation and Efficient Algorithms

David Chiang; Steve DeNeefe; Yee Seng Chan; Hwee Tou Ng

Bleu is the de facto standard for evaluation and development of statistical machine translation systems. We describe three real-world situations involving comparisons between different versions of the same systems where one can obtain improvements in Bleu scores that are questionable or even absurd. These situations arise because Bleu lacks the property of decomposability, a property which is also computationally convenient for various applications. We propose a very conservative modification to Bleu and a cross between Bleu and word error rate that address these issues while improving correlation with human judgments.
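The lack of decomposability can be seen with a minimal BLEU implementation (a toy sketch: single reference, up to bigrams, unsmoothed): the corpus-level score is not the average of the per-sentence scores, so per-sentence credit assignment is ill-defined.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(pairs, max_n=2):
    """Corpus-level BLEU over (hypothesis, reference) token-list pairs."""
    match = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in pairs:
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            match[n - 1] += sum((h & r).values())  # clipped n-gram counts
            total[n - 1] += sum(h.values())
    if 0 in match:
        return 0.0  # unsmoothed: any zero n-gram precision zeroes BLEU
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / max_n
    bp = min(1.0, math.exp(1 - ref_len / hyp_len))  # brevity penalty
    return bp * math.exp(log_prec)

pairs = [
    ("the cat sat".split(), "the cat sat".split()),
    ("a dog".split(), "the dog barked loudly".split()),
]
corpus = bleu(pairs)
mean_sentence = sum(bleu([p]) for p in pairs) / len(pairs)
print(corpus, mean_sentence)  # the two values differ
```

Because the clipped counts and the brevity penalty are pooled over the whole corpus before the geometric mean is taken, BLEU does not decompose over sentences.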


Empirical Methods in Natural Language Processing | 2008

Word Sense Disambiguation Using OntoNotes: An Empirical Study

Zhi Zhong; Hwee Tou Ng; Yee Seng Chan

The accuracy of current word sense disambiguation (WSD) systems is affected by the fine-grained sense inventory of WordNet as well as a lack of training examples. Using the WSD examples provided through OntoNotes, we conduct the first large-scale WSD evaluation involving hundreds of word types and tens of thousands of sense-tagged examples, while adopting a coarse-grained sense inventory. We show that though WSD systems trained with a large number of examples can obtain a high level of accuracy, they nevertheless suffer a substantial drop in accuracy when applied to a different domain. To address this issue, we propose combining a domain adaptation technique using feature augmentation with active learning. Our results show that this approach is effective in reducing the annotation effort required to adapt a WSD system to a new domain. Finally, we propose that one can maximize the dual benefits of reducing the annotation effort while ensuring an increase in WSD accuracy, by only performing active learning on the set of most frequently occurring word types.
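Feature augmentation of the kind the abstract refers to can be sketched as follows (a minimal illustration with made-up feature names, not the paper's implementation): every feature is duplicated into a shared copy and a domain-specific copy, letting the learner decide per feature whether the two domains behave alike.

```python
def augment(features, domain):
    """Duplicate each feature into a shared copy and a domain-specific
    copy. Weights learned on the shared copies transfer across domains;
    weights on the domain-specific copies capture domain-only behavior.
    """
    out = {}
    for name, value in features.items():
        out["shared:" + name] = value
        out[domain + ":" + name] = value
    return out

# A toy WSD feature vector for one training instance.
x = {"prev_word=river": 1, "pos=NN": 1}
print(augment(x, "target"))
```

Source-domain and target-domain instances are augmented with their respective domain tags and then pooled into one training set for an ordinary classifier.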


Meeting of the Association for Computational Linguistics | 2007

SemEval-2007 Task 11: English Lexical Sample Task via English-Chinese Parallel Text

Hwee Tou Ng; Yee Seng Chan

We made use of parallel texts to gather training and test examples for the English lexical sample task. Two tracks were organized for our task. The first track used examples gathered from an LDC corpus, while the second track used examples gathered from a Web corpus. In this paper, we describe the process of gathering examples from the parallel corpora, the differences with similar tasks in previous SENSEVAL evaluations, and present the results of participating systems.


Machine Translation | 2009

MaxSim: performance and effects of translation fluency

Yee Seng Chan; Hwee Tou Ng

This paper evaluates the performance of our recently proposed automatic machine translation evaluation metric MaxSim and examines the impact of translation fluency on the metric. MaxSim calculates a similarity score between a pair of English system-reference sentences by comparing information items such as n-grams across the sentence pair. Unlike most metrics which perform binary matching, MaxSim also computes similarity scores between items and models them as nodes in a bipartite graph to select a maximum weight matching. Our experiments show that MaxSim is competitive with state-of-the-art metrics on benchmark datasets.
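The maximum weight bipartite matching step can be illustrated with a brute-force toy (the similarity scores below are made up, and MaxSim itself uses an efficient matching algorithm rather than enumeration): each system-side item may match each reference-side item with a graded score, and the matching picks the assignment that maximizes total similarity.

```python
from itertools import permutations

def max_weight_matching(sim):
    """Brute-force maximum weight bipartite matching.

    sim: similarity matrix, rows = system items, cols = reference items,
    with len(rows) <= len(cols). Tries every injective assignment of
    rows to columns and returns the best total weight. For illustration
    only; real implementations use a polynomial-time algorithm.
    """
    n_rows, n_cols = len(sim), len(sim[0])
    best = 0.0
    for perm in permutations(range(n_cols), n_rows):
        score = sum(sim[i][perm[i]] for i in range(n_rows))
        best = max(best, score)
    return best

# Toy graded similarities between 2 system items and 3 reference items.
sim = [
    [1.0, 0.2, 0.0],
    [0.3, 0.6, 0.4],
]
print(max_weight_matching(sim))
```

Binary matching would only credit exact matches; the graded scores plus the matching let near-matches (e.g. morphological variants) earn partial credit.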


Meeting of the Association for Computational Linguistics | 2007

Word Sense Disambiguation Improves Statistical Machine Translation

Yee Seng Chan; Hwee Tou Ng; David Chiang


Meeting of the Association for Computational Linguistics | 2007

Domain Adaptation with Active Learning for Word Sense Disambiguation

Yee Seng Chan; Hwee Tou Ng


Meeting of the Association for Computational Linguistics | 2008

MAXSIM: A Maximum Similarity Metric for Machine Translation Evaluation

Yee Seng Chan; Hwee Tou Ng

Collaboration


Dive into Yee Seng Chan's collaborations.

Top Co-Authors

Hwee Tou Ng, National University of Singapore
Zhi Zhong, National University of Singapore
David Chiang, University of Notre Dame
Bin Wang, National University of Singapore
Steve DeNeefe, Information Sciences Institute