Oi Yee Kwong | Researchain

Publication

Featured researches published by Oi Yee Kwong.

Archive | 2005

Natural Language Processing – IJCNLP 2004

Keh-Yih Su; Jun’ichi Tsujii; Jong-Hyeok Lee; Oi Yee Kwong

Information Retrieval: A New Method for Sentiment Classification in Text Retrieval; Topic Tracking Based on Linguistic Features; The Use of Monolingual Context Vectors for Missing Translations in Cross-Language Information Retrieval; Automatic Image Annotation Using Maximum Entropy Model.
Corpus-Based Parsing: Corpus-Based Analysis of Japanese Relative Clause Constructions; Parsing Biomedical Literature; Parsing the Penn Chinese Treebank with Semantic Knowledge; Using a Partially Annotated Corpus to Build a Dependency Parser for Japanese.
Web Mining: Entropy as an Indicator of Context Boundaries: An Experiment Using a Web Search Engine; Automatic Discovery of Attribute Words from Web Documents; Aligning Needles in a Haystack: Paraphrase Acquisition Across the Web; Confirmed Knowledge Acquisition Using Mails Posted to a Mailing List.
Rule-Based Parsing: Automatic Partial Parsing Rule Acquisition Using Decision Tree Induction; Chunking Using Conditional Random Fields in Korean Texts; High Efficiency Realization for a Wide-Coverage Unification Grammar; Linguistically-Motivated Grammar Extraction, Generalization and Adaptation.
Disambiguation: PP-Attachment Disambiguation Boosted by a Gigantic Volume of Unambiguous Examples; Adapting a Probabilistic Disambiguation Model of an HPSG Parser to a New Domain; A Hybrid Approach to Single and Multiple PP Attachment Using WordNet; Period Disambiguation with Maxent Model.
Text Mining: Acquiring Synonyms from Monolingual Comparable Texts; A Method of Recognizing Entity and Relation; Inversion Transduction Grammar Constraints for Mining Parallel Sentences from Quasi-Comparable Corpora; Automatic Term Extraction Based on Perplexity of Compound Words.
Document Analysis: Document Clustering with Grouping and Chaining Algorithms; Using Multiple Discriminant Analysis Approach for Linear Text Segmentation; Classifying Chinese Texts in Two Steps; Assigning Polarity Scores to Reviews Using Machine Learning Techniques.
Ontology and Thesaurus: Analogy as Functional Recategorization: Abstraction with HowNet Semantics; PLSI Utilization for Automatic Thesaurus Construction; Analysis of an Iterative Algorithm for Term-Based Ontology Alignment; Finding Taxonomical Relation from an MRD for Thesaurus Extension.
Relation Extraction: Relation Extraction Using Support Vector Machine; Discovering Relations Between Named Entities from a Large Raw Corpus Using Tree Similarity-Based Clustering; Automatic Relation Extraction with Model Order Selection and Discriminative Label Identification; Mining Inter-Entity Semantic Relations Using Improved Transductive Learning.
Text Classification: A Preliminary Work on Classifying Time Granularities of Temporal Questions; Classification of Multiple-Sentence Questions.
Transliteration: A Rule Based Syllabification Algorithm for Sinhala; An Ensemble of Grapheme and Phoneme for Machine Transliteration.
Machine Translation - I: Improving Statistical Word Alignment with Ensemble Methods; Empirical Study of Utilizing Morph-Syntactic Information in SMT.
Question Answering: Instance-Based Generation for Interactive Restricted Domain Question Answering Systems; Answering Definition Questions Using Web Knowledge Bases; Exploring Syntactic Relation Patterns for Question Answering; Web-Based Unsupervised Learning for Query Formulation in Question Answering.
Morphological Analysis: A Chunking Strategy Towards Unknown Word Detection in Chinese Word Segmentation; A Lexicon-Constrained Character Model for Chinese Morphological Analysis; Relative Compositionality of Multi-word Expressions: A Study of Verb-Noun (V-N) Collocations; Automatic Extraction of Fixed Multiword Expressions.
Machine Translation - II: Phrase-Based Statistical Machine Translation: A Level of Detail Approach; Why Is Zero Marking Important in Korean?; A Phrase-Based Context-Dependent Joint Probability Model for Named Entity Translation; Machine Translation Based on Constraint-Based Synchronous Grammar.
Text Summarization: A Machine Learning Approach to Sentence Ordering for Multidocument Summarization and Its Evaluation; Significant Sentence Extraction by Euclidean Distance Based on Singular Value Decomposition.
Named Entity Recognition: Two-Phase Biomedical Named Entity Recognition Using A Hybrid Method; Heuristic Methods for Reducing Errors of Geographic Named Entities Learned by Bootstrapping.
Linguistic Resources and Tools: Building a Japanese-Chinese Dictionary Using Kanji/Hanzi Conversion; Automatic Acquisition of Basic Katakana Lexicon from a Given Corpus; CTEMP: A Chinese Temporal Parser for Extracting and Normalizing Temporal Information; French-English Terminology Extraction from Comparable Corpora.
Discourse Analysis: A Twin-Candidate Model of Coreference Resolution with Non-Anaphor Identification Capability; Improving Korean Speech Acts Analysis by Using Shrinkage and Discourse Stack; Anaphora Resolution for Biomedical Literature by Exploiting Multiple Resources; Automatic Slide Generation Based on Discourse Structure Analysis.
Semantic Analysis - I: Using the Structure of a Conceptual Network in Computing Semantic Relatedness; Semantic Role Labelling of Prepositional Phrases; Global Path-Based Refinement of Noisy Graphs Applied to Verb Semantics; Semantic Role Tagging for Chinese at the Lexical Level.
NLP Applications: Detecting Article Errors Based on the Mass Count Distinction; Principles of Non-stationary Hidden Markov Model and Its Applications to Sequence Labeling Task; Integrating Punctuation Rules and Naive Bayesian Model for Chinese Creation Title Recognition; A Connectionist Model of Anticipation in Visual Worlds.
Tagging: Automatically Inducing a Part-of-Speech Tagger by Projecting from Multiple Source Languages Across Aligned Corpora; The Verbal Entries and Their Description in a Grammatical Information-Dictionary of Contemporary Tibetan; Tense Tagging for Verbs in Cross-Lingual Context: A Case Study; Regularisation Techniques for Conditional Random Fields: Parameterised Versus Parameter-Free.
Semantic Analysis - II: Exploiting Lexical Conceptual Structure for Paraphrase Generation; Word Sense Disambiguation by Relative Selection; Towards Robust High Performance Word Sense Disambiguation of English Verbs Using Rich Linguistic Features; Automatic Interpretation of Noun Compounds Using WordNet Similarity.
Language Models: An Empirical Study on Language Model Adaptation Using a Metric of Domain Similarity; A Comparative Study of Language Models for Book and Author Recognition.
Spoken Language: Lexical Choice via Topic Adaptation for Paraphrasing Written Language to Spoken Language; A Case-Based Reasoning Approach for Speech Corpus Generation.
Terminology Mining: Web-Based Terminology Translation Mining; Extracting Terminologically Relevant Collocations in the Translation of Chinese Monograph.

International Conference on Computational Linguistics | 2004

Morpheme-based derivation of bipolar semantic orientation of Chinese words

Raymond W.M. Yuen; Terence Y.W. Chan; Tom B. Y. Lai; Oi Yee Kwong; Benjamin Ka-Yin T'sou

The evaluative character of a word is called its semantic orientation (SO). A positive SO indicates desirability (e.g., good, honest) and a negative SO indicates undesirability (e.g., bad, ugly). This paper presents a method, based on Turney (2003), for inferring the SO of a word from its statistical association with strongly polarized words and morphemes in Chinese. Morphemes are much less numerous than words, and a small number of fundamental morphemes can be used in the modified system to great advantage. The algorithm was tested on 1,249 words (604 positive and 645 negative) in a corpus of 34 million words. Run with 20 and 40 polarized words respectively, it gave high precision (79.96% to 81.05%) but low recall (45.56% to 59.57%). Run with 20 polarized morphemes, or single characters, in the same corpus, it gave a high precision of 80.23% and a high recall of 85.03%. We concluded that morphemes in Chinese, as in any language, constitute a distinct sub-lexical unit which, though small in number, has greater linguistic significance than words, as seen by the significant enhancement of results with a much smaller corpus than that required by Turney.
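The idea of scoring a word by its statistical association with polarized seeds can be illustrated with a small sketch. This is not the authors' implementation: it assumes a generic pointwise mutual information (PMI) score, treats each sentence as the co-occurrence window, and uses an arbitrary smoothing constant; the function name `so_pmi` and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def so_pmi(target, pos_seeds, neg_seeds, sentences, smoothing=0.01):
    """Semantic orientation as PMI with positive seeds minus PMI with negative seeds.

    `sentences` is a list of token lists; tokens may be words or single
    characters (morphemes). Co-occurrence means "appears in the same sentence",
    which is an assumption made for this sketch.
    """
    unit_count = Counter()   # number of sentences containing each unit
    pair_count = Counter()   # number of sentences containing each pair of units
    for sent in sentences:
        units = set(sent)
        unit_count.update(units)
        pair_count.update(frozenset(p) for p in combinations(sorted(units), 2))
    n = len(sentences)

    def pmi(a, b):
        # Smoothing keeps the logarithm defined when a pair never co-occurs.
        joint = pair_count[frozenset((a, b))] + smoothing
        return math.log2(joint * n / ((unit_count[a] + smoothing) * (unit_count[b] + smoothing)))

    return sum(pmi(target, s) for s in pos_seeds) - sum(pmi(target, s) for s in neg_seeds)


# Toy corpus, invented for illustration only.
corpus = [
    ["good", "kind"],
    ["good", "kind"],
    ["bad", "cruel"],
    ["bad", "cruel"],
    ["good", "cruel"],
]
print(so_pmi("kind", ["good"], ["bad"], corpus) > 0)   # associated with the positive seed
print(so_pmi("cruel", ["good"], ["bad"], corpus) < 0)  # associated with the negative seed
```

A positive score suggests a positive orientation and a negative score a negative one. Replacing word seeds with single-character seeds would correspond to the morpheme-based variant the abstract describes, under the same assumptions.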

Conference of the European Chapter of the Association for Computational Linguistics | 2003

Categorial fluidity in Chinese and its implications for part-of-speech tagging

Oi Yee Kwong; Benjamin K. Tsou

This paper discusses the theoretical and practical concerns in part-of-speech (POS) tagging for Chinese. Unlike other languages such as English, Chinese lacks morphological marking in association with categorial alternations. We consider such categorial fluidity a continuum, and any categorial shift a transition, with special focus on the verb-noun shift. Preliminary observations are reported on this phenomenon from empirical data, and we suggest that POS tagging should not only be theoretically valid but also sufficiently capture the extent of categorial fluidity as reflected by the data.

International Journal of Computer Processing of Languages | 2005

Sentiment and Content Analysis of Chinese News Coverage

Benjamin Ka-Yin T'sou; Oi Yee Kwong; Wei Lung Wong; Tom B. Y. Lai

Typical news coverage contains both objective facts and subjective sentiments. This is especially true for newsworthy individuals and organizations, and media opinion on strategic subjects. Analysis either on demand or on a longitudinal basis provides a critical source of information heretofore not readily nor economically obtainable for a range of meaningful purposes. One application is the monitoring of positive or negative summative news coverage on targeted subjects.

Archive | 2012

New Perspectives on Computational and Cognitive Strategies for Word Sense Disambiguation

Oi Yee Kwong

Cognitive and Computational Strategies for Word Sense Disambiguation examines cognitive strategies by humans and computational strategies by machines, for WSD in parallel. Focusing on a psychologically valid property of words and senses, author Oi Yee Kwong discusses their concreteness or abstractness and draws on psycholinguistic data to examine the extent to which existing lexical resources resemble the mental lexicon as far as the concreteness distinction is concerned. The text also investigates the contribution of different knowledge sources to WSD in relation to this very intrinsic nature of words and senses.

Software - Practice and Experience | 2003

Bilingual legal document retrieval and management using XML

Robert W. P. Luk; Benjamin K. Tsou; Tom B. Y. Lai; Oi Yee Kwong; Francis C. Y. Chik; Lawrence Y. L. Cheung

In certain bilingual and multi‐lingual societies, translated legal documents are as important as the original legal documents because they have the same legal status as the originals. However, there is little reported work on the retrieval and management of bilingual legal documents. We describe the design and development of a bilingual document retrieval and management prototype, called ELDoS, which is used by court interpreters and judges from the Hong Kong Judiciary. Since the speed of retrieval is a major concern for user acceptance, and therefore for widespread deployment of the system, the architecture of the prototype is designed to balance the workload of the client and server. Extensible Markup Language (XML) is used to mark up the bilingual legal documents for a variety of document retrieval and management tasks. XML enables the use of XML Stylesheet Language Transformation (XSLT) to align bilingual data in the client, instead of the server, and improve alignment speed linearly with respect to the size of the document, using a high‐end PC, when the server has no concurrent access. The design of the interface was continually improved after extensive consultation with court interpreters and after the user acceptance tests. In our evaluation, the facilities for highlighting translated terms have a macro‐averaged precision of 90+% and a macro‐averaged recall of 80+%, which were considered acceptable by our users. We believe that the experience in the design and development of this prototype is applicable to other language pairs as well as to other domains.
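For readers unfamiliar with macro-averaging, the following minimal sketch shows how such figures are typically computed. It assumes the average is taken over documents, which the abstract does not state; the function name and the sample data are hypothetical.

```python
def macro_precision_recall(per_doc):
    """Average per-document precision and recall with equal weight per document.

    `per_doc` is a list of (predicted_terms, gold_terms) pairs, each a set.
    Averaging over documents is an assumption made for this sketch.
    """
    precisions, recalls = [], []
    for predicted, gold in per_doc:
        true_positives = len(predicted & gold)
        precisions.append(true_positives / len(predicted) if predicted else 0.0)
        recalls.append(true_positives / len(gold) if gold else 0.0)
    k = len(per_doc)
    return sum(precisions) / k, sum(recalls) / k


# Invented example: two documents with highlighted versus reference terms.
docs = [
    ({"a", "b"}, {"a"}),       # precision 0.5, recall 1.0
    ({"c"}, {"c", "d"}),       # precision 1.0, recall 0.5
]
precision, recall = macro_precision_recall(docs)
print(precision, recall)
```

Macro-averaging weights every document equally, so a short document with few terms influences the result as much as a long one.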

Meeting of the Association for Computational Linguistics | 2009

Homophones and Tonal Patterns in English-Chinese Transliteration

Oi Yee Kwong

The abundance of homophones in Chinese significantly increases the number of similarly acceptable candidates in English-to-Chinese transliteration (E2C). The dialectal factor also leads to different transliteration practices. We compare E2C between Mandarin Chinese and Cantonese, and report work in progress for dealing with homophones and tonal patterns despite potentially skewed distributions of individual Chinese characters in the training data.

International Conference Natural Language Processing | 2006

Feasibility of enriching a Chinese synonym dictionary with a synchronous Chinese corpus

Oi Yee Kwong; Benjamin K. Tsou

This paper reports on a first step toward the construction of a Pan-Chinese lexical resource. We investigated the plausibility of extending and enhancing an existing Chinese synonym dictionary, the Tongyici Cilin, with lexical items from the financial news domain obtained from a synchronous Chinese corpus, LIVAC. Results showed that 23-40% of the words from various subcorpora are unique to the individual communities, and as much as 70% of such unique items are not yet covered in Cilin. Our next step would be to explore automatic means for extracting related lexical items from the corpus, and to incorporate them into existing semantic classifications.

Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009) | 2009

Graphemic Approximation of Phonological Context for English-Chinese Transliteration

Oi Yee Kwong

Although direct orthographic mapping has been shown to outperform phoneme-based methods in English-to-Chinese (E2C) transliteration, it is observed that phonological context plays an important role in resolving graphemic ambiguity. In this paper, we investigate the use of surface graphemic features to approximate local phonological context for E2C. In the absence of an explicit phonemic representation of the English source names, experiments show that the previous and next character of a given English segment could effectively capture the local context affecting its expected pronunciation, and thus its rendition in Chinese.
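The notion of using the previous and next character as a stand-in for phonological context can be sketched as a simple feature extractor. This is an illustration only, not the paper's system: the function name, the boundary markers `<s>` and `</s>`, and the lowercasing step are assumptions, and the segmentation of the name is taken as given.

```python
def context_features(name, start, end):
    """Return an English segment with its left and right neighbouring characters.

    `start` and `end` are character offsets of the segment within `name`.
    Boundary markers are hypothetical placeholders for name-initial and
    name-final positions.
    """
    text = name.lower()
    return {
        "segment": text[start:end],
        "prev": text[start - 1] if start > 0 else "<s>",
        "next": text[end] if end < len(text) else "</s>",
    }


# The segment "li" in "Clinton", with "c" before it and "n" after it.
print(context_features("Clinton", 1, 3))
```

Features of this shape could be fed to any classifier that chooses a Chinese character for each English segment; which model the paper uses is not stated in the abstract.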

International Conference on Computational Linguistics | 2002

Some considerations on guidelines for bilingual alignment and terminology extraction

Lawrence Y. L. Cheung; Tom B. Y. Lai; Robert W. P. Luk; Oi Yee Kwong; King Kui Sin; Benjamin K. Tsou

Despite progress in the development of computational means, human input is still critical in the production of consistent and usable aligned corpora and term banks. This is especially true for specialized corpora and term banks whose end-users are often professionals with very stringent requirements for accuracy, consistency and coverage. In the compilation of a high-quality Chinese-English legal glossary for the ELDoS project, we have identified a number of issues that make human input critical for term alignment and extraction. They include the identification of low-frequency terms, paraphrastic expressions and discontinuous units, and the maintenance of consistent term granularity. Although manual intervention can more satisfactorily address these issues, steps must also be taken to address intra- and inter-annotator inconsistency.