Christopher S. G. Khoo
Nanyang Technological University
Publication
Featured research published by Christopher S. G. Khoo.
Meeting of the Association for Computational Linguistics | 2000
Christopher S. G. Khoo; Syin Chan; Yun Niu
This paper reports the first part of a project that aims to develop a knowledge extraction and knowledge discovery system that extracts causal knowledge from textual databases. In this initial study, we develop a method to identify and extract cause-effect information that is explicitly expressed in medical abstracts in the Medline database. A set of graphical patterns was constructed that indicate the presence of a causal relation in sentences, and which part of the sentence represents the cause and which part represents the effect. The patterns are matched against the syntactic parse trees of sentences, and the parts of the parse tree that match the slots in the patterns are extracted as the cause or the effect.
Journal of Information Science | 2010
Tun Thura Thet; Jin-Cheon Na; Christopher S. G. Khoo
In this article, a method for automatic sentiment analysis of movie reviews is proposed, implemented and evaluated. In contrast to most studies that focus on determining only sentiment orientation (positive versus negative), the proposed method performs fine-grained analysis to determine both the sentiment orientation and sentiment strength of the reviewer towards various aspects of a movie. Sentences in review documents contain independent clauses that express different sentiments toward different aspects of a movie. The method adopts a linguistic approach of computing the sentiment of a clause from the prior sentiment scores assigned to individual words, taking into consideration the grammatical dependency structure of the clause. The prior sentiment scores of about 32,000 individual words are derived from SentiWordNet with the help of a subjectivity lexicon. Negation is delicately handled. The output sentiment scores can be used to identify the most positive and negative clauses or sentences with respect to particular movie aspects.
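The clause-level scoring described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prior scores below are made-up values rather than SentiWordNet entries, the names `PRIOR`, `NEGATORS` and `clause_score` are hypothetical, and negation is approximated by token order instead of the grammatical dependency structure the paper uses.

```python
# Hypothetical prior sentiment scores; the paper derives its priors from SentiWordNet.
PRIOR = {"good": 0.6, "great": 0.8, "boring": -0.5, "bad": -0.7}
# Assumed, deliberately small set of negation words.
NEGATORS = {"not", "never", "no"}


def clause_score(tokens):
    """Sum the prior scores of the sentiment words in one clause.

    A negator flips the sign of the next sentiment word that follows it.
    This token-order rule is a simplification of dependency-based negation.
    """
    score = 0.0
    negate = False
    for tok in tokens:
        word = tok.lower()
        if word in NEGATORS:
            negate = True
            continue
        prior = PRIOR.get(word)
        if prior is not None:
            score += -prior if negate else prior
            negate = False  # the negation has been consumed
    return score
```

For example, `clause_score("the plot was not good".split())` yields a negative score because "not" flips the prior of "good". Ranking clauses by such scores is one way to surface the most positive and most negative clauses for a given movie aspect.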
ACM/IEEE Joint Conference on Digital Libraries | 2002
Ee-Peng Lim; Dion Hoe-Lian Goh; Zehua Liu; Wee-Keong Ng; Christopher S. G. Khoo; Susan Ellen Higgins
As the World Wide Web evolves into an immense information network, it is tempting to build new digital library services and expand existing digital library services to make use of web content. In this paper, we present the design and implementation of G-Portal, a web portal that aims to provide digital library services over geospatial and georeferenced content found on the World Wide Web. G-Portal adopts a map-based user interface to visualize and manipulate the distributed geospatial and georeferenced content. Annotation capabilities are supported, allowing users to contribute geospatial and georeferenced objects as well as their associated metadata. The other features included in G-Portal's design are query support, content classification, and content maintenance. This paper will mainly focus on the architecture design, visualization and annotation capabilities of G-Portal.
Archive | 2002
Ee-Peng Lim; Schubert Foo; Christopher S. G. Khoo; Hsinchun Chen; Edward A. Fox; Shalini R. Urs; Thanos Costantino
After a decade of research and development, digital libraries are becoming operational systems and services. This paper summarizes some of the challenges involved in that transition. Digital libraries as systems are converging with digital libraries as institutions, particularly as we consider the service aspects. They are enabling technologies for applications such as classroom instruction, information retrieval, and electronic commerce. Because usability depends heavily upon context, research on uses and users of digital libraries needs to be conducted in a wide array of environments. Interoperability and scaling continue to be major issues, but the problems are better understood. While technical work on interoperability and scaling continues, institutional collaboration is an emerging focus. Concern for an information infrastructure to support digital libraries is moving toward the concept of “cyberinfrastructure,” now that distributed networks are widely deployed and access is becoming ubiquitous. Appropriate evaluation methods and metrics are requirements for sustainable digital libraries that have received little attention until recently. We need to know what works and in what contexts. Evaluation has many aspects and can address a variety of goals, such as usability, maintainability, interoperability, scalability, and economic viability. Lastly, two areas that have received considerable discussion elsewhere are noted: digital preservation and the role of information institutions such as libraries and archives.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 1999
Yubin Dai; Teck Ee Loh; Christopher S. G. Khoo
A new statistical formula for identifying 2-character words in Chinese text, called the contextual information formula, was developed empirically by performing stepwise logistic regression using a sample of sentences that had been manually segmented. Contextual information in the form of the frequency of characters that are adjacent to the bigram being processed as well as the weighted document frequency of the overlapping bigrams were found to be significant factors for predicting the probability that the bigram constitutes a word. Local information (the number of times the bigram occurs in the document being segmented) and the position of the bigram in the sentence were not found to be useful in determining words. The contextual information formula was found to be significantly and substantially better than the mutual information formula in identifying 2-character words. The method can also be used for identifying multi-word terms in English text.
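For orientation, the mutual information baseline mentioned in this abstract can be sketched in its common pointwise form, log2(P(ab) / (P(a) P(b))). This is not the contextual information formula itself, whose terms and coefficients are not given here. The function name `bigram_pmi` and the use of simple relative-frequency estimates are assumptions made for illustration.

```python
import math
from collections import Counter


def bigram_pmi(sentences):
    """Return {bigram: pointwise mutual information} for adjacent character pairs.

    Probabilities are plain relative frequencies over the given sentences
    (an assumption; a real segmenter would estimate them from a large corpus).
    """
    char_counts = Counter()
    bigram_counts = Counter()
    for s in sentences:
        char_counts.update(s)
        bigram_counts.update(s[i:i + 2] for i in range(len(s) - 1))

    n_chars = sum(char_counts.values())
    n_bigrams = sum(bigram_counts.values())

    pmi = {}
    for bigram, count in bigram_counts.items():
        p_bigram = count / n_bigrams
        p_first = char_counts[bigram[0]] / n_chars
        p_second = char_counts[bigram[1]] / n_chars
        pmi[bigram] = math.log2(p_bigram / (p_first * p_second))
    return pmi
```

A bigram with high mutual information is one whose two characters co-occur more often than their individual frequencies would predict, which is the signal the baseline uses to propose it as a word.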
Information Processing and Management | 2001
Christopher S. G. Khoo; Sung Hyon Myaeng; Robert N Oddy
This study attempted to use semantic relations expressed in text, in particular cause-effect relations, to improve information retrieval effectiveness. The study investigated whether the information obtained by matching cause-effect relations expressed in documents with the cause-effect relations expressed in users’ queries can be used to improve document retrieval results, in comparison to using just keyword matching without considering relations. An automatic method for identifying and extracting cause-effect information in Wall Street Journal text was developed. Causal relation matching was found to yield a small but significant improvement in retrieval results when the weights used for combining the scores from different types of matching were customized for each query. Causal relation matching did not perform better than word proximity matching (i.e. matching pairs of causally related words in the query with pairs of words that co-occur within document sentences), but the best results were obtained when causal relation matching was combined with word proximity matching. The best kind of causal relation matching was found to be one in which one member of the causal relation (either the cause or the effect) was represented as a wildcard that could match with any word.
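The wildcard form of causal relation matching can be illustrated with a small sketch. It assumes that relations have already been extracted as (cause, effect) word pairs. The names `WILDCARD`, `relation_matches` and `count_causal_matches` are hypothetical, and the study's actual scoring and weighting scheme is not reproduced.

```python
WILDCARD = "*"  # assumed marker for "matches any word"


def relation_matches(query_rel, doc_rel):
    """True if each query slot equals the document slot or is a wildcard."""
    return all(q == WILDCARD or q == d for q, d in zip(query_rel, doc_rel))


def count_causal_matches(query_rels, doc_rels):
    """Count (query relation, document relation) pairs that match."""
    return sum(
        1
        for q in query_rels
        for d in doc_rels
        if relation_matches(q, d)
    )
```

With a query relation `("smoking", "*")`, a document containing `("smoking", "cancer")` counts as a match whatever the stated effect, while `("rain", "flood")` does not.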
Information Processing and Management | 1994
Christopher S. G. Khoo; Danny Chiang Choon Poo
The various ways of improving the online catalog for subject searching are reviewed. The paper then discusses the expert system approach to developing a subject search front-end. It is suggested that an effective expert front-end can be developed by focusing on search strategies. A design for a rule-based expert system front-end is described. Possible search strategies and selection rules are illustrated. The inference structure of the system is based on Clancey's model of heuristic classification.
Journal of Information Science | 2008
Shiyan Ou; Christopher S. G. Khoo; Dion Hoe-Lian Goh
This paper describes a new concept-based multi-document summarization system that employs discourse parsing, information extraction and information integration. Dissertation abstracts in the field of sociology were selected as sample documents for this study. The summarization process includes four major steps: (1) parsing dissertation abstracts into five standard sections; (2) extracting research concepts (often operationalized as research variables) and their relationships, the research methods used and the contextual relations from specific sections of the text; (3) integrating similar concepts and relationships across different abstracts; and (4) combining and organizing the different kinds of information using a variable-based framework, and presenting them in an interactive web-based interface. The accuracy of each summarization step was evaluated by comparing the system-generated output against human coding. The user evaluation carried out in the study indicated that the majority of subjects (70%) preferred the concept-based summaries generated using the system to the sentence-based summaries generated using traditional sentence extraction techniques.
International Health Informatics Symposium | 2012
Lorraine Goeuriot; Jin-Cheon Na; Wai Yan Min Kyaing; Christopher S. G. Khoo; Yun-Ke Chang; Yin-Leng Theng; Jung-Jae Kim
Opinion mining consists of extracting from a text the opinions expressed by its author and their polarity. Lexical resources, such as polarized lexicons, are needed for this task. Opinion mining in the medical domain has not been well explored, partly because little credence is given to patients and their opinions (although more and more of them are using social media). We are interested in opinion mining of user-generated content on drugs/medication. In this paper, we present the creation of our lexical resources and their adaptation to the medical domain. We first describe the creation of a general lexicon, containing opinion words from the general domain and their polarity. Then we present the creation of a medical opinion lexicon, based on a corpus of drug reviews. We show that some words have a different polarity in the general domain and in the medical one. Some words generally considered neutral are opinionated in medical texts. Finally, we evaluate the lexicons and show with a simple algorithm that using our general lexicon gives better results than other well-known ones on our corpus and that adding the domain lexicon improves them as well.
International Conference on Conceptual Structures | 1994
Sung-Hyun Myaeng; Christopher S. G. Khoo; Ming Li
This paper describes our large-scale effort to build a conceptual Information Retrieval system that converts a large volume of natural language text into Conceptual Graph representation by means of knowledge-based processing. In order to automatically extract concepts and conceptual relations between concepts from texts, we constructed a knowledge base consisting of over 12,000 case frames for verbs and a large number of other linguistic patterns that reveal conceptual relations. They were used to process a Wall Street Journal database covering a period of three years. We describe our methods for constructing the knowledge base, how the linguistic knowledge is used to process the text, and how the retrieval system makes use of the rich representation of documents and information needs.