Publication


Featured research published by Percy Chi Shun Cheung.


International Conference on Computational Linguistics | 2004

Multi-level bootstrapping for extracting parallel sentences from a quasi-comparable corpus

Pascale Fung; Percy Chi Shun Cheung

We propose a completely unsupervised method for mining parallel sentences from quasi-comparable bilingual texts which have very different sizes, and which include both in-topic and off-topic documents. We discuss and analyze different bilingual corpora with various levels of comparability. We propose that while better document matching leads to better parallel sentence extraction, better sentence matching also leads to better document matching. Based on this, we use multi-level bootstrapping to iteratively improve the alignments between documents, sentences, and bilingual word pairs. Our method is the first that does not rely on any supervised training data, such as a sentence-aligned corpus, or temporal information, such as the publishing date of a news article. It is validated by experimental results that show a 23% improvement over a method without multi-level bootstrapping.


Machine Translation | 2004

Translation Disambiguation in Mixed Language Queries

Percy Chi Shun Cheung; Pascale Fung

Code-switching is very common among bilingual speakers. Spoken queries by these speakers are typically in mixed language. In this paper, we propose an unsupervised method for mixed-language query understanding, using only a monolingual corpus and a bilingual dictionary. Secondary-language words mixed in a primary-language query are translated into words in the primary language. We found that using a single disambiguation feature for translation is more effective than using multiple features, provided this feature is based on the most salient seed-word, chosen automatically by confidence scoring. We propose and compare four types of disambiguation features that are based on context seed-words. A baseline method uses the nearest neighboring seed-word as the disambiguation feature. Multiple-context seed-word voting is also proposed in order to enlarge the context window. On the other hand, merely using the inverse distance as weights on context words degrades performance, as it runs counter to the potential underlying syntactic relations between words. Our final proposal is a solution that uses multiple-context seed-words and the translation candidates of all mixed-language words to select the single most salient seed-word for translation disambiguation. The translation disambiguation accuracy for this feature is 83.7% for all words in the ATIS spontaneous speech query database, and 66.7% for content words.


Conference on Empirical Methods in NLP 2004, Barcelona, Spain | 2004

Mining Very-Non-Parallel Corpora: Parallel Sentence and Lexicon Extraction via Bootstrapping and EM

Pascale Fung; Percy Chi Shun Cheung


LREC Workshop on the amazing utility of parallel corpora, Portugal, May 2004 | 2004

Sentence Alignment in Parallel, Comparable and Quasi-comparable Corpora

Percy Chi Shun Cheung; Pascale Fung


Conference of the International Speech Communication Association | 1999

Decision tree-based triphones are robust and practical for Mandarin speech recognition

Yi Liu; Pascale Fung; Percy Chi Shun Cheung


Empirical Methods in Natural Language Processing | 2004

Mining Very-Non-Parallel Corpora: Parallel Sentence and Lexicon Extraction via Bootstrapping and EM

Pascale Fung; Percy Chi Shun Cheung


The First AEARU Web Technology Workshop, Kyoto | 1998

SALSA, A Multilingual Speech-Based Web Browser

Pascale Fung; Percy Chi Shun Cheung; Kwok Leung Lam; Wai Kat Liu; Yuen Yee Lo; Chi Yuen Ma


International Conference on Spoken Language Processing | 1998

SALSA version 1.0, A Speech-Based Web Browser

Pascale Fung; Percy Chi Shun Cheung; Kwok Leung Lam; Wai Kat Lau; Yuen Yee Lo


LREC Workshop on Usage of Parallel Corpora, Lisbon, May 2004 | 2004

From Parallel Corpora to Non-Parallel Corpora, Methods for Bilingual Sentence Extraction

Percy Chi Shun Cheung; Pascale Fung


Conference of the International Speech Communication Association | 1999

A Monolingual Semantic Decoder Based on Word Sense Disambiguation for Mixed Language

Xiaohu Liu; Pascale Fung; Percy Chi Shun Cheung

Collaboration


Percy Chi Shun Cheung's collaborations.

Top Co-Authors

Pascale Fung

Hong Kong University of Science and Technology

Yi Liu

Hong Kong University of Science and Technology
