Tobun Dorbin Ng | Researchain

Publication

Featured research published by Tobun Dorbin Ng.

Journal of the Association for Information Science and Technology | 1997

A concept space approach to addressing the vocabulary problem in scientific information retrieval: an experiment on the worm community system

Hsinchun Chen; Tobun Dorbin Ng; Joanne Martinez; Bruce R. Schatz

This research presents an algorithmic approach to addressing the vocabulary problem in scientific information retrieval and information sharing, using the molecular biology domain as an example. We first present a literature review of cognitive studies related to the vocabulary problem and vocabulary-based search aids (thesauri) and then discuss techniques for building robust and domain-specific thesauri to assist in cross-domain scientific information retrieval. Using a variation of the automatic thesaurus generation techniques, which we refer to as the concept space approach, we recently conducted an experiment in the molecular biology domain in which we created a C. elegans worm thesaurus of 7,657 worm-specific terms and a Drosophila fly thesaurus of 15,626 terms. About 30% of these terms overlapped, which created vocabulary paths from one subject domain to the other. Based on a cognitive study of term association involving four biologists, we found that a large percentage (59.6-85.6%) of the terms suggested by the subjects were identified in the conjoined fly-worm thesaurus. However, we found only a small percentage (6.4-18.1%) of the associations suggested by the subjects in the thesaurus. In a follow-up document retrieval study involving eight fly biologists, an actual worm database (Worm Community System), and the conjoined fly-worm thesaurus, subjects were able to find more relevant documents (an increase from about 9 documents to 20) and to improve the document recall level (from 32.41% to 65.28%) when using the thesaurus, although the precision level did not improve significantly. Implications of adopting the concept space approach for addressing the vocabulary

Journal of the Association for Information Science and Technology | 1995

An algorithmic approach to concept exploration in a large knowledge network (automatic thesaurus consultation): symbolic branch-and-bound search vs. connectionist Hopfield net activation

Hsinchun Chen; Tobun Dorbin Ng

This paper presents a framework for knowledge discovery and concept exploration. In order to enhance the concept exploration capability of knowledge‐based systems and to alleviate the limitations of the manual browsing approach, we have developed two spreading activation‐based algorithms for concept exploration in large, heterogeneous networks of concepts (e.g., multiple thesauri). One algorithm, which is based on the symbolic AI paradigm, performs a conventional branch‐and‐bound search on a semantic net representation to identify other highly relevant concepts (a serial, optimal search process). The second algorithm, which is based on the neural network approach, executes the Hopfield net parallel relaxation and convergence process to identify “convergent” concepts for some initial queries (a parallel, heuristic search process). Both algorithms can be adopted for automatic, multiple‐thesauri consultation. We tested these two algorithms on a large text‐based knowledge network of about 13,000 nodes (terms) and 80,000 directed links in the area of computing technologies. This knowledge network was created from two external thesauri and one automatically generated thesaurus. We conducted experiments to compare the behaviors and performances of the two algorithms with the hypertext‐like browsing process. Our experiment revealed that manual browsing achieved higher term recall but lower term precision in comparison to the algorithmic systems. However, it was also a much more laborious and cognitively demanding process. In document retrieval, there were no statistically significant differences in document recall and precision between the algorithms and the manual browsing process. In light of the effort required by the manual browsing process, our proposed algorithmic approach presents a viable option for efficiently traversing large‐scale, multiple thesauri (knowledge network).
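The parallel relaxation idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `spreading_activation`, the sigmoid parameters `theta` and `slope`, the stopping threshold `eps`, and the toy network are all assumptions chosen for illustration. Query terms are clamped to full activation, every other term repeatedly takes a sigmoid of the weighted activation flowing in over directed links, and iteration stops once the activations stop changing.

```python
import math

def spreading_activation(links, seeds, max_iters=50, eps=1e-4,
                         theta=0.5, slope=0.1):
    """Hopfield-style relaxation over a directed, weighted term network.

    links: {source_term: {target_term: weight}} (hypothetical layout)
    seeds: query terms, clamped to activation 1.0
    Returns {term: activation in [0, 1]}.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    seeds = set(seeds)
    act = {n: (1.0 if n in seeds else 0.0) for n in nodes}

    for _ in range(max_iters):
        new_act = {}
        for n in nodes:
            if n in seeds:
                new_act[n] = 1.0  # query terms stay fully active
                continue
            # Weighted sum of activation arriving over incoming links.
            net = sum(targets.get(n, 0.0) * act[src]
                      for src, targets in links.items())
            # Sigmoid squashing; theta and slope are illustrative values.
            new_act[n] = 1.0 / (1.0 + math.exp(-(net - theta) / slope))
        change = sum(abs(new_act[n] - act[n]) for n in nodes)
        act = new_act
        if change < eps:  # converged: activations are stable
            break
    return act

# Toy network (invented terms and weights).
toy_links = {
    "database": {"sql": 0.9, "index": 0.6},
    "sql": {"query": 0.8},
    "cooking": {"recipe": 0.9},
}
result = spreading_activation(toy_links, ["database"])
ranked = sorted(result, key=result.get, reverse=True)
```

In this toy run, terms reachable from the query term end up with high activation, while the unrelated "cooking" and "recipe" terms stay near zero, which is the sense in which the network "converges" on relevant concepts.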

IEEE Transactions on Pattern Analysis and Machine Intelligence | 1996

A parallel computing approach to creating engineering concept spaces for semantic retrieval: the Illinois Digital Library Initiative project

Hsinchun Chen; Bruce R. Schatz; Tobun Dorbin Ng; Joanne Martinez; Amy Kirchhoff; Chienting Lin

This research presents preliminary results generated from the semantic retrieval research component of the Illinois Digital Library Initiative (DLI) project. Using a variation of the automatic thesaurus generation techniques, which we refer to as the concept space approach, we aimed to create graphs of domain-specific concepts (terms) and their weighted co-occurrence relationships for all major engineering domains. Merging these concept spaces and providing traversal paths across different concept spaces could potentially help alleviate the vocabulary (difference) problem evident in large-scale information retrieval. In order to address the scalability issue related to large-scale information retrieval and analysis for the current Illinois DLI project, we conducted experiments using the concept space approach on parallel supercomputers. Our test collection included computer science and electrical engineering abstracts extracted from the INSPEC database. The concept space approach called for extensive textual and statistical analysis (a form of knowledge discovery) based on automatic indexing and co-occurrence analysis algorithms, both previously tested in the biology domain. Initial testing results using a 512-node CM-5 and a 16-processor SGI Power Challenge were promising.
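As a rough illustration of what a graph of terms with "weighted co-occurrence relationships" might look like, the sketch below builds a small concept space from already-indexed documents. The weighting used here, the share of documents containing term a that also contain term b, is a simplifying assumption and not the formula from the paper; the function name `build_concept_space` and the sample documents are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_concept_space(docs):
    """Build an asymmetric co-occurrence graph from indexed documents.

    docs: list of term lists, one list per document.
    Returns {a: {b: weight}} where weight = df(a, b) / df(a), i.e. the
    fraction of documents containing a that also contain b.
    (Illustrative weighting only, not the published formula.)
    """
    doc_freq = Counter()   # number of documents containing each term
    pair_freq = Counter()  # number of documents containing both terms
    for terms in docs:
        unique = sorted(set(terms))
        doc_freq.update(unique)
        for a, b in combinations(unique, 2):
            pair_freq[(a, b)] += 1

    space = {}
    for (a, b), n in pair_freq.items():
        space.setdefault(a, {})[b] = n / doc_freq[a]
        space.setdefault(b, {})[a] = n / doc_freq[b]
    return space

# Invented sample: three tiny "abstracts" after indexing.
sample_docs = [
    ["gene", "protein"],
    ["gene", "protein", "worm"],
    ["gene", "fly"],
]
space = build_concept_space(sample_docs)
```

The weights are deliberately asymmetric: in the sample, every document mentioning "protein" also mentions "gene", so the link from "protein" to "gene" is stronger than the reverse. Because each document contributes its counts independently, this kind of computation can be split across processors, which is one plausible reason it suits parallel machines.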

IEEE Intelligent Systems | 1993

Generating, integrating, and activating thesauri for concept-based document retrieval

Hsinchun Chen; Kevin J. Lynch; Koushik Basu; Tobun Dorbin Ng

A blackboard-based document management system that uses a neural network spreading-activation algorithm which lets users traverse multiple thesauri is discussed. Guided by heuristics, the algorithm activates related terms in the thesauri and converges on the most pertinent concepts. The system provides two control modes: a browsing module and an activation module that determine the sequence of operations. With the browsing module, users have full control over which knowledge sources to browse and what terms to select. The system's query formation; the retrieving, ranking and selection of documents; and thesaurus activation are described.

ACM Multimedia | 2002

Collages as dynamic summaries for news video

Michael G. Christel; Alexander G. Hauptmann; Howard D. Wactlar; Tobun Dorbin Ng

This paper introduces the video collage, a novel, effective interface for browsing and interpreting video collections. The paper discusses how collages are automatically produced, illustrates their use, and evaluates their effectiveness as summaries across news stories. Collages are presentations of text and images derived from multiple video sources, which provide an interactive visualization for a set of video documents, summarizing their contents and providing a navigation aid for further exploration. The dynamic creation of collages is based on user context, e.g., an originating query, coupled with automatic processing to refine the candidate imagery. Named entity identification and common phrase extraction provide descriptive text. The dynamic manipulation of collages allows user-directed browsing and reveals additional detail. The utility of collages as summaries is examined with respect to other published news summaries.

Artificial Intelligence Review | 1999

Medical Data Mining on the Internet: Research on a Cancer Information System

Andrea L. Houston; Hsinchun Chen; Susan M. Hubbard; Bruce R. Schatz; Tobun Dorbin Ng; Robin R. Sewell; Kristin M. Tolle

This paper discusses several data mining algorithms and techniques that we have developed at the University of Arizona Artificial Intelligence Lab. We have implemented these algorithms and techniques into several prototypes, one of which focuses on medical information developed in cooperation with the National Cancer Institute (NCI) and the University of Illinois at Urbana-Champaign. We propose an architecture for medical knowledge information systems that will permit data mining across several medical information sources and discuss a suite of data mining tools that we are developing to assist NCI in improving public access to and use of their existing vast cancer information collections.

Journal of the Association for Information Science and Technology | 1998

Alleviating search uncertainty through concept associations: automatic indexing, co-occurrence analysis, and parallel computing

Hsinchun Chen; Joanne Martinez; Amy Kirchhoff; Tobun Dorbin Ng; Bruce R. Schatz

In this article, we report research on an algorithmic approach to alleviating search uncertainty in a large information space. Grounded on object filtering, automatic indexing, and co-occurrence analysis, we performed a large-scale experiment using a parallel supercomputer (SGI Power Challenge) to analyze 400,000+ abstracts in an INSPEC computer engineering collection. Two system-generated thesauri, one based on a combined object filtering and automatic indexing method, and the other based on automatic indexing only, were compared with the human-generated INSPEC subject thesaurus. Our user evaluation revealed that the system-generated thesauri were better than the INSPEC thesaurus in concept recall, but in concept precision the 3 thesauri were comparable. Our analysis also revealed that the terms suggested by the 3 thesauri were complementary and could be used to significantly increase variety in search terms and thereby reduce search uncertainty.
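The recall and precision measures used in this and several other abstracts on this page follow the standard set-based definitions, which can be sketched as below. The function name `recall_precision` and the term sets are invented for illustration and are not data from the study; the second call shows why pooling suggestions from complementary thesauri can raise recall.

```python
def recall_precision(suggested, relevant):
    """Standard set-based recall and precision.

    recall    = |suggested AND relevant| / |relevant|
    precision = |suggested AND relevant| / |suggested|
    Returns (recall, precision); an empty denominator yields 0.0.
    """
    suggested, relevant = set(suggested), set(relevant)
    hits = len(suggested & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(suggested) if suggested else 0.0
    return recall, precision

# Invented example: terms a user judged relevant, and terms suggested
# by two hypothetical thesauri with little overlap between them.
relevant_terms = {"parallel computing", "indexing", "thesaurus", "recall"}
thesaurus_a = {"parallel computing", "indexing", "hardware"}
thesaurus_b = {"thesaurus", "retrieval", "recall"}

single = recall_precision(thesaurus_a, relevant_terms)
combined = recall_precision(thesaurus_a | thesaurus_b, relevant_terms)
```

With these made-up sets, the single thesaurus covers half of the relevant terms, while the combined suggestions cover all of them at the same precision.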

Storage and Retrieval for Image and Video Databases | 2003

Video Retrieval using Speech and Image Information

Alexander G. Hauptmann; Rong Jin; Tobun Dorbin Ng

Video contains multiple types of audio and visual information, which are difficult to extract, combine or trade off in general video information retrieval. This paper provides an evaluation of the effects of different types of information used for video retrieval from a video collection. A number of different sources of information are present in most typical broadcast video collections and can be exploited for information retrieval. We will discuss the contributions of automatically recognized speech transcripts, image similarity matching, face detection and video OCR in the context of experiments performed as part of the 2001 TREC Video Retrieval Track evaluation conducted by the National Institute of Standards and Technology. For the queries used in this evaluation, image matching and video OCR proved to be the deciding aspects of video information retrieval.

Decision Support Systems | 2000

Exploring the use of concept spaces to improve medical information retrieval

Andrea L. Houston; Hsinchun Chen; Bruce R. Schatz; Susan M. Hubbard; Robin R. Sewell; Tobun Dorbin Ng

This research investigated the application of techniques successfully used in previous information retrieval research to the more challenging area of medical informatics. It was performed on a biomedical document collection testbed, CANCERLIT, provided by the National Cancer Institute (NCI), which contains information on all types of cancer therapy. The quality or usefulness of terms suggested by three different thesauri, one based on MeSH terms, one based solely on terms from the document collection, and one based on the Unified Medical Language System (UMLS) Metathesaurus, was explored with the ultimate goal of improving CANCERLIT information search and retrieval. Researchers affiliated with the University of Arizona Cancer Center evaluated lists of related terms suggested by different thesauri for 12 different directed searches in the CANCERLIT testbed. The preliminary results indicated that among the thesauri, there were no statistically significant differences in either term recall or precision. Surprisingly, there was almost no overlap of relevant terms suggested by the different thesauri for a given search. This suggests that recall could be significantly improved by using a combined thesaurus approach. © 2000 Elsevier Science B.V. All rights reserved.

Journal of Chemical Information and Computer Sciences | 1995

Using Backpropagation Networks for the Estimation of Aqueous Activity Coefficients of Aromatic Organic Compounds

Hsiao-Hui Chow; Hsinchun Chen; Tobun Dorbin Ng; Paul B. Myrdal; Samuel H. Yalkowsky

This research examined the applicability of using a neural network approach to the estimation of aqueous activity coefficients of aromatic organic compounds from fragmented structural information. A set of 95 compounds was used to train the neural network, and the trained network was tested on a set of 31 compounds. A comparison was made between the results and those obtained using multiple linear regression analysis. With the proper selection of neural network parameters, the backpropagation network provided a more accurate prediction of the aqueous activity coefficients for testing data than did regression analysis. This research indicates that neural networks have the potential to become a useful analytical technique for quantitative prediction of structure-activity relationships.
