Percy Chi Shun Cheung
Hong Kong University of Science and Technology
Publications
Featured research published by Percy Chi Shun Cheung.
international conference on computational linguistics | 2004
Pascale Fung; Percy Chi Shun Cheung
We propose a completely unsupervised method for mining parallel sentences from quasi-comparable bilingual texts that differ greatly in size and contain both in-topic and off-topic documents. We discuss and analyze bilingual corpora with various levels of comparability. We propose that, just as better document matching leads to better parallel sentence extraction, better sentence matching also leads to better document matching. Based on this, we use multi-level bootstrapping to iteratively improve the alignments between documents, sentences, and bilingual word pairs. Ours is the first method that relies neither on supervised training data, such as a sentence-aligned corpus, nor on temporal information, such as the publication date of a news article. Experimental results show a 23% improvement over a method without multi-level bootstrapping.
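The abstract describes a loop in which document, sentence, and word-pair alignments improve each other. It does not give the scoring functions, so the following is only a toy sketch under assumed choices: lexicon-overlap similarity for documents and sentences, best-match sentence alignment, and Dice-coefficient learning of new word pairs. The names `bootstrap` and `similarity`, the thresholds, and the data are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import product

def translation_table(lexicon):
    """Map each source word to the set of its known target translations."""
    table = {}
    for s, t in lexicon:
        table.setdefault(s, set()).add(t)
    return table

def similarity(src, tgt, table):
    """Assumed score: translated-token matches divided by the longer token list."""
    if not src or not tgt:
        return 0.0
    tgt_set = set(tgt)
    hits = sum(1 for w in src if table.get(w, set()) & tgt_set)
    return hits / max(len(src), len(tgt))

def flatten(doc):
    return [w for sent in doc for w in sent]

def bootstrap(src_docs, tgt_docs, seed_lexicon, doc_thr=0.3, sent_thr=0.5,
              dice_thr=0.8, min_count=1, max_iter=5):
    """Hypothetical multi-level bootstrapping: documents -> sentences -> word pairs."""
    lexicon = set(seed_lexicon)
    pairs = set()
    for _ in range(max_iter):
        table = translation_table(lexicon)
        # Level 1: document matching with the current (possibly enlarged) lexicon.
        doc_pairs = [(i, j) for i, j in product(range(len(src_docs)), range(len(tgt_docs)))
                     if similarity(flatten(src_docs[i]), flatten(tgt_docs[j]), table) >= doc_thr]
        # Level 2: inside each matched document pair, align every source sentence
        # to its most similar target sentence if the score clears sent_thr.
        new_pairs = set()
        for i, j in doc_pairs:
            for a, s in enumerate(src_docs[i]):
                scores = [similarity(s, t, table) for t in tgt_docs[j]]
                if not scores:
                    continue
                b = max(range(len(scores)), key=scores.__getitem__)
                if scores[b] >= sent_thr:
                    new_pairs.add((i, a, j, b))
        # Level 3: learn pairs between untranslated source words and uncovered
        # target words that co-occur in the extracted sentence pairs (Dice score).
        covered = {t for _, t in lexicon}
        c_src, c_tgt, c_pair = Counter(), Counter(), Counter()
        for i, a, j, b in new_pairs:
            ws = {w for w in src_docs[i][a] if w not in table}
            vs = {v for v in tgt_docs[j][b] if v not in covered}
            c_src.update(ws)
            c_tgt.update(vs)
            c_pair.update(product(ws, vs))
        learned = {(w, v) for (w, v), n in c_pair.items()
                   if n >= min_count and 2 * n / (c_src[w] + c_tgt[v]) >= dice_thr}
        if new_pairs == pairs and not learned:
            break  # converged: no new sentence pairs and no new word pairs
        pairs, lexicon = new_pairs, lexicon | learned
    return pairs, lexicon
```

On a toy pair of one-document corpora, the first pass extracts only sentences the seed lexicon already covers. Word pairs learned from those sentences then let a later pass align a sentence that the seed lexicon alone could not, which is the feedback between levels that the abstract describes.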
Machine Translation | 2004
Percy Chi Shun Cheung; Pascale Fung
Code-switching is very common among bilingual speakers, and spoken queries by these speakers are typically in mixed language. In this paper, we propose an unsupervised method for mixed-language query understanding that uses only a monolingual corpus and a bilingual dictionary. Secondary-language words mixed into a primary-language query are translated into words in the primary language. We found that using a single disambiguation feature for translation is more effective than using multiple features, provided this feature is based on the most salient seed-word, chosen automatically by confidence scoring. We propose and compare four types of disambiguation features based on context seed-words. A baseline method uses the nearest neighboring seed-word as the disambiguation feature. Multiple-context seed-word voting is also proposed in order to enlarge the context window. In contrast, merely using inverse distance as the weight on context words degrades performance, as it runs counter to the potential underlying syntactic relations between words. Our final proposal uses multiple-context seed-words and the translation candidates of all mixed-language words to select a single most salient seed-word for translation disambiguation. With this feature, translation disambiguation accuracy is 83.7% for all words in the ATIS spontaneous speech query database and 66.7% for content words.
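Three of the disambiguation strategies named above can be contrasted in a small sketch. The abstract does not say how a seed-word scores a translation candidate, so this assumes raw co-occurrence counts from a primary-language corpus, treats every query word not in the bilingual dictionary as a seed-word, and defines confidence as the margin between the best and second-best candidate scores. It also simplifies the final method by ignoring the translation candidates of other mixed-language words. All function names, the word `reservar`, and the counts are hypothetical.

```python
from collections import Counter

def cooc(counts, a, b):
    """Symmetric co-occurrence lookup in a (hypothetical) monolingual count table."""
    return counts.get((a, b), counts.get((b, a), 0))

def best_candidate(candidates, seed, counts):
    """Translation candidate that co-occurs most often with a single seed-word."""
    return max(candidates, key=lambda c: cooc(counts, c, seed))

def seeds(query, pos, lexicon):
    """Assumed seed-words: every other query word that is not a dictionary (secondary) word."""
    return [(i, w) for i, w in enumerate(query) if i != pos and w not in lexicon]

def nearest_seed_baseline(query, pos, lexicon, counts):
    """Baseline: disambiguate with the nearest neighboring seed-word only."""
    _, seed = min(seeds(query, pos, lexicon), key=lambda iw: abs(iw[0] - pos))
    return best_candidate(lexicon[query[pos]], seed, counts)

def seed_voting(query, pos, lexicon, counts):
    """Multiple-context voting: each seed-word votes for its preferred candidate."""
    candidates = lexicon[query[pos]]
    votes = Counter(best_candidate(candidates, w, counts) for _, w in seeds(query, pos, lexicon))
    return votes.most_common(1)[0][0]

def confidence(candidates, seed, counts):
    """Assumed confidence score: margin between the top two candidate scores."""
    scores = sorted((cooc(counts, c, seed) for c in candidates), reverse=True)
    return scores[0] - (scores[1] if len(scores) > 1 else 0)

def most_salient_seed(query, pos, lexicon, counts):
    """Pick the single highest-confidence seed-word, then disambiguate with it alone."""
    candidates = lexicon[query[pos]]
    _, seed = max(seeds(query, pos, lexicon), key=lambda iw: confidence(candidates, iw[1], counts))
    return best_candidate(candidates, seed, counts)

# Toy example: "reservar" stands in for a secondary-language word.
lexicon = {"reservar": ["book", "reserve"]}
counts = {("reserve", "please"): 3, ("book", "please"): 2,
          ("book", "a"): 40, ("reserve", "a"): 40,
          ("book", "flight"): 50, ("reserve", "flight"): 5}
query = ["please", "reservar", "a", "flight"]
print(nearest_seed_baseline(query, 1, lexicon, counts))  # → reserve
print(seed_voting(query, 1, lexicon, counts))            # → book
print(most_salient_seed(query, 1, lexicon, counts))      # → book
```

In this contrived query the nearest seed-word, "please", is a weak cue, so the baseline picks a different translation than voting or the most salient seed-word ("flight"). The counts are invented and say nothing about the accuracies reported in the abstract.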
Conference on Empirical Methods in NLP 2004, Barcelona, Spain | 2004
Pascale Fung; Percy Chi Shun Cheung
LREC Workshop on the Amazing Utility of Parallel Corpora, Portugal, May 2004 | 2004
Percy Chi Shun Cheung; Pascale Fung
conference of the international speech communication association | 1999
Yi Liu; Pascale Fung; Percy Chi Shun Cheung
empirical methods in natural language processing | 2004
Pascale Fung; Percy Chi Shun Cheung
The First AEARU Web Technology Workshop, Kyoto | 1998
Pascale Fung; Percy Chi Shun Cheung; Kwok Leung Lam; Wai Kat Liu; Yuen Yee Lo; Chi Yuen Ma
international conference on spoken language processing | 1998
Pascale Fung; Percy Chi Shun Cheung; Kwok Leung Lam; Wai Kat Lau; Yuen Yee Lo
LREC Workshop on Usage of Parallel Corpora, Lisbon, May 2004 | 2004
Percy Chi Shun Cheung; Pascale Fung
conference of the international speech communication association | 1999
Xiaohu Liu; Pascale Fung; Percy Chi Shun Cheung