Jonathan J. Webster | Researchain



Publication

Featured research published by Jonathan J. Webster.

International Conference on Computational Linguistics | 1992

Tokenization as the initial phase in NLP

Jonathan J. Webster; Chunyu Kit

In this paper, the authors address the significance and complexity of tokenization, the beginning step of NLP. Notions of word and token are discussed and defined from the viewpoints of lexicography and pragmatic implementation, respectively. Automatic segmentation of Chinese words is presented as an illustration of tokenization. Practical approaches to identification of compound tokens in English, such as idioms, phrasal verbs and fixed expressions, are developed.
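The idea of treating idioms, phrasal verbs and fixed expressions as single tokens can be illustrated with a minimal sketch. It assumes a greedy longest-match lookup against a small hand-made lexicon; the lexicon entries and the function name `tokenize` are hypothetical, and this is not the procedure described in the paper.

```python
# Hypothetical toy lexicon of multiword expressions; not taken from the paper.
COMPOUNDS = {
    ("kick", "the", "bucket"),
    ("give", "up"),
    ("in", "spite", "of"),
}
MAX_LEN = max(len(c) for c in COMPOUNDS)


def tokenize(text):
    """Split on whitespace, then merge known compounds by greedy longest match."""
    words = text.lower().split()
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate span first, shrinking down to two words.
        for n in range(min(MAX_LEN, len(words) - i), 1, -1):
            span = tuple(words[i:i + n])
            if span in COMPOUNDS:
                tokens.append(" ".join(span))
                i += n
                break
        else:
            # No compound starts at this position; emit the single word.
            tokens.append(words[i])
            i += 1
    return tokens
```

Calling `tokenize("They give up in spite of help")` keeps `give up` and `in spite of` together as compound tokens, while the remaining words stay single tokens.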

Information Sciences | 2008

Chinese word segmentation as morpheme-based lexical chunking

Guo-Hong Fu; Chunyu Kit; Jonathan J. Webster

Chinese word segmentation plays an important role in many Chinese language processing tasks such as information retrieval and text mining. Recent research in Chinese word segmentation focuses on tagging approaches with either characters or words as tagging units. In this paper we present a morpheme-based chunking approach and implement it in a two-stage system. It consists of two main components, namely a morpheme segmentation component that segments an input sentence into a sequence of morphemes based on morpheme-formation models and bigram language models, and a lexical chunking component that labels each segmented morpheme's position in a word of a special type with the aid of lexicalized hidden Markov models. To facilitate these tasks, a statistically-based technique is also developed for automatically compiling a morpheme dictionary from a segmented or tagged corpus. To evaluate this approach, we conduct a closed test and an open test using the 2005 SIGHAN Bakeoff data. Our system demonstrates state-of-the-art performance on different test sets, showing the benefits of choosing morphemes as tagging units. Furthermore, the open test results indicate significant performance enhancement using lexicalization and part-of-speech features.

international conference natural language processing | 2003

Transductive HMM based Chinese text chunking

Heng Li; Jonathan J. Webster; Chunyu Kit; Tianshun Yao

We present a novel methodology to enhance Chinese text chunking with the aid of transductive Hidden Markov Models (transductive HMMs, henceforth). We consider chunking as a special tagging problem and attempt to utilize, via a number of transformation functions, as much relevant contextual information as possible for model training. These functions enable the models to make use of contextual information to a greater extent and keep us away from costly changes of the original training and tagging process. Each of them results in an individual model with certain pros and cons. Through a number of experiments, we succeed in integrating the best two models into a significantly better one. We carry out the chunking experiments on the HIT Chinese Treebank corpus. Experimental results show that it is an effective approach, achieving an F score of 82.38%.
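Treating chunking as tagging, and scoring it with an F score, can be sketched as follows. The sketch assumes a BIO tag scheme (`B-NP`, `I-NP`, `O`) and exact-span matching, which are common conventions but are not stated in the abstract; the function names are hypothetical.

```python
def extract_chunks(tags):
    """Turn BIO tags such as B-NP, I-NP, O into a set of (start, end, type) spans."""
    chunks = set()
    start, kind = None, None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel closes a trailing chunk
        begins = tag.startswith("B-")
        continues = tag.startswith("I-") and kind == tag[2:]
        if not continues and kind is not None:
            # The open chunk ends just before position i.
            chunks.add((start, i, kind))
            start, kind = None, None
        if begins or (tag.startswith("I-") and kind is None):
            # Open a new chunk (an I- tag with no open chunk also starts one).
            start, kind = i, tag[2:]
    return chunks


def f_score(gold_tags, pred_tags):
    """Harmonic mean of chunk-level precision and recall."""
    gold, pred = extract_chunks(gold_tags), extract_chunks(pred_tags)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)
```

With gold tags `["B-NP", "I-NP", "O", "B-VP"]` and predicted tags `["B-NP", "I-NP", "O", "O"]`, one of two gold chunks is found and the single predicted chunk is correct, so precision is 1, recall is 0.5 and the F score is 2/3.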

International Conference on the Computer Processing of Oriental Languages | 2009

An Extractive Text Summarizer Based on Significant Words

Xiaoyue Liu; Jonathan J. Webster; Chunyu Kit

Document summarization can be viewed as a reductive distilling of source text through content condensation, while words with high quantities of information are believed to carry more content and thereby importance. In this paper, we propose a new quantification measure for word significance used in natural language processing (NLP) tasks, and successfully apply it to an extractive text summarization approach. In a query-based summarization setting, the correlation between user queries and sentences to be scored is established from both the micro (i.e. at the word level) and the macro (i.e. at the sentence level) perspectives, resulting in an effective ranking formula. The experiments, both on a generic single document summarization evaluation, and on a query-based multi-document evaluation, verify the effectiveness of the proposed measures and show that the proposed approach achieves a state-of-the-art performance.

Mexican International Conference on Artificial Intelligence | 2006

Mapping FrameNet and SUMO with WordNet Verb: Statistical Distribution of Lexical-Ontological Realization

Ian C. Chow; Jonathan J. Webster

Automatic acquisition of lexical knowledge is critical to a wide range of natural language processing tasks. Verb knowledge is especially important in semantic parsing. Verbs denote relational information of lexicogrammar and semantically state the participants and event involved in the meaning construed. This paper describes a statistical distribution approach to reuse and integrate information from the Suggested Upper Merged Ontology (SUMO), WordNet and FrameNet. The mapping between word-meanings, frame-semantics and world concepts suggests a heuristic approach for linking WordNet verbs and FrameNet frames, providing a knowledge base for Semantic Role Labeling (SRL) that identifies the appropriate range of possible semantic roles with respect to the event evoked by the verb. This is accomplished through the verbs covered by both FrameNet and WordNet, taking the shared lexical knowledge as learning data to map SUMO concepts with FrameNet frames. The exploitation of the mapping aims at automatically populating FrameNet frames with WordNet data, constructing a knowledge base for semantic parsing.

International Conference on Computational Linguistics | 2009

Integration of Linguistic Resources for Verb Classification: FrameNet Frame, WordNet Verb and Suggested Upper Merged Ontology

Ian C. Chow; Jonathan J. Webster

The work described in this paper was originally motivated by the construction of a lexical semantic knowledge base for analysis of the Ideational Metafunction of language in Systemic Functional Grammar and the Generalized Upper Model ontology. The work involves mapping FrameNet Frames with Ideational Meanings and instantiating WordNet verbs as the meaning-evoking linguistic elements. As the work evolved, the developed method has allowed the assignment of sense-tagged WordNet verbs to the FrameNet Lexical Units of each Frame. The task is achieved by linking FrameNet Frames with SUMO (Suggested Upper Merged Ontology) concepts. We describe our method of mapping, which reuses and integrates linkages between WordNet, FrameNet and SUMO. The generated verb list is further examined with WordNet::Similarity, a semantic similarity and relatedness measuring system.

international conference natural language processing | 2011

Lexical cohesion for evaluation of machine translation at document level

Billy Tak-Ming Wong; Cecilia F. K. Pun; Chunyu Kit; Jonathan J. Webster

This paper studies how the granularity of machine translation evaluation can be extended from sentence to document level. While most state-of-the-art evaluation metrics focus on the sentence level, we emphasize the importance of document structure, showing that lexical cohesion, in which cohesive devices tie salient words between sentences together as a text, is a critical feature for highlighting the superior quality of human translation over machine translation. An experiment shows that this feature can bring forth a 3–5% improvement in the correlation of automatic evaluation results with human judgments of machine translation outputs at the document level.
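A rough sense of how lexical cohesion could be quantified at document level is given by the sketch below. It assumes the simplest cohesive device only, exact repetition of a content word across sentences, and a tiny hypothetical stopword list; the paper's actual feature definition is not given in the abstract and may differ.

```python
# Small hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "is", "was", "and", "it"}


def repetition_cohesion(sentences):
    """Fraction of content-word tokens whose word form occurs in two or more sentences."""
    content = [
        [w for w in s.lower().split() if w not in STOPWORDS]
        for s in sentences
    ]
    # Count in how many sentences each content word appears.
    sentence_count = {}
    for words in content:
        for w in set(words):
            sentence_count[w] = sentence_count.get(w, 0) + 1
    tokens = [w for words in content for w in words]
    if not tokens:
        return 0.0
    tied = sum(1 for w in tokens if sentence_count[w] >= 2)
    return tied / len(tokens)
```

For the two sentences `"the committee approved the plan"` and `"the plan was costly"`, only `plan` recurs across sentences, so two of the five content-word tokens are tied and the score is 0.4.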

international conference natural language processing | 2005

Mapping WordNet to a relational network

Jonathan J. Webster; Ian C. Chow

This paper discusses the implementation of a lexical knowledge base, based on a conceptualization of the problem domain in terms of relational network notation. In particular, we focus here on the importing of lemmas as instances from WordNet 2.0 into the knowledge base. Relational network notation (RNN) offers a simple yet powerful means for representing lexicogrammatical, semantic and sememic information. RNN is driven by relational network theory and incorporates developments in the theory which have been shown to not only describe but also explain linguistic phenomena in a neurologically plausible manner.

Proceedings of the Second SIGHAN Workshop on Chinese Language Processing | 2003

Integrating Ngram Model and Case-based Learning for Chinese Word Segmentation

Chunyu Kit; Zhiming Xu; Jonathan J. Webster

This paper presents our recent work for participation in the First International Chinese Word Segmentation Bake-off (ICWSB-1). It is based on a general-purpose ngram model for word segmentation and a case-based learning approach to disambiguation. This system excels in identifying in-vocabulary (IV) words, achieving a recall of around 96–98%. Here we present our strategies for language model training and disambiguation rule learning, analyze the system's performance, and discuss areas for further improvement, e.g., out-of-vocabulary (OOV) word discovery.
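How an ngram model can drive word segmentation may be easier to see in a minimal sketch. It assumes the simplest case, a unigram model with dynamic programming over all split points, and uses made-up probabilities; it does not reproduce the authors' system or its case-based disambiguation rules.

```python
import math

# Toy unigram probabilities (hypothetical values, for illustration only).
UNIGRAM = {"研究": 0.02, "研究生": 0.01, "生命": 0.02, "命": 0.001, "起源": 0.01}
UNKNOWN_LOGP = math.log(1e-8)  # floor for single characters not in the lexicon
MAX_WORD_LEN = max(len(w) for w in UNIGRAM)


def segment(sentence):
    """Return the word sequence with the highest total unigram log-probability."""
    n = len(sentence)
    best = [-math.inf] * (n + 1)  # best[i]: best score for sentence[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word in that split
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = sentence[start:end]
            if word in UNIGRAM:
                logp = math.log(UNIGRAM[word])
            elif len(word) == 1:
                logp = UNKNOWN_LOGP  # unknown single character
            else:
                continue  # unknown multi-character strings are not candidates
            score = best[start] + logp
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow the back-pointers from the end of the sentence.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(sentence[start:end])
        end = start
    return words[::-1]
```

On the ambiguous string `研究生命起源`, the toy model prefers `研究 / 生命 / 起源` over `研究生 / 命 / 起源` because the first split has the higher total log-probability. Characters outside the lexicon fall back to single-character tokens, which is where OOV word discovery would need to take over.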

BioMed Research International | 2013

Gene Prioritization of Resistant Rice Gene against Xanthomas oryzae pv. oryzae by Using Text Mining Technologies

Jingbo Xia; Xing Zhang; Daojun Yuan; Lingling Chen; Jonathan J. Webster; Alex Chengyu Fang

To effectively assess the likelihood that an unknown rice protein is resistant to Xanthomonas oryzae pv. oryzae, a hybrid strategy is proposed to enhance gene prioritization by combining text mining technologies with a sequence-based approach. The text mining technique of term frequency inverse document frequency is used to measure the importance of distinguished terms which reflect biomedical activity in rice before candidate genes are screened and vital terms are produced. Afterwards, a built-in classifier under the chaos game representation algorithm is used to sieve the best possible candidate gene. Our experimental results show that the combination of these two methods achieves enhanced gene prioritization.
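The term frequency inverse document frequency weighting mentioned above can be sketched in a few lines. This assumes one common formulation, relative term frequency multiplied by the natural log of the inverse document frequency; the abstract does not say which variant the authors used, and the example documents are invented.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized abstracts, for illustration only.
docs = [
    ["rice", "blight", "resistance"],
    ["rice", "yield"],
    ["rice", "blight"],
]
weights = tf_idf(docs)
```

In this toy corpus `rice` appears in every document and receives weight 0, while `resistance`, which appears in only one document, outranks `blight` in the first document. Terms that distinguish a document are the ones that score highly.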
