Woojin Paik | Researchain

Publication

Featured researches published by Woojin Paik.

ACM Transactions on Information Systems | 1994

Text categorization for multiple users based on semantic features from a machine-readable dictionary

Elizabeth D. Liddy; Woojin Paik; Edmund S. Yu

The text categorization module described here provides a front-end filtering function for the larger DR-LINK text retrieval system [Liddy and Myaeng 1993]. The model evaluates a large incoming stream of documents to determine which documents are sufficiently similar to a profile at the broad subject level to warrant more refined representation and matching. To accomplish this task, each substantive word in a text is first categorized using a feature set based on the semantic Subject Field Codes (SFCs) assigned to individual word senses in a machine-readable dictionary. When tested on 50 user profiles and 550 megabytes of documents, results indicate that the feature set that is the basis of the text categorization module and the algorithm that establishes the boundary of categories of potentially relevant documents accomplish their tasks with a high level of performance. This means that the category of potentially relevant documents for most profiles would contain at least 80% of all documents later determined to be relevant to the profile. The number of documents in this set would be uniquely determined by the system's category-boundary predictor, and this set is likely to contain less than 5% of the incoming stream of documents.
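The filtering idea in this abstract can be illustrated with a minimal sketch. It is not the DR-LINK implementation: the tiny word-to-SFC lexicon, the code labels, the cosine similarity measure, and the fixed threshold are all assumptions made for illustration. In particular, the fixed threshold only stands in for the category-boundary predictor, whose details the abstract does not give.

```python
import math
from collections import Counter

# Hypothetical word -> Subject Field Code lookup. A real system would read
# these codes from a machine-readable dictionary; the labels are invented.
TOY_SFC_LEXICON = {
    "bank": "EC", "loan": "EC", "interest": "EC",   # economics
    "court": "LW", "judge": "LW",                   # law
    "treaty": "PL", "election": "PL",               # politics
}


def sfc_vector(text):
    """Return normalized SFC frequencies for the words found in the lexicon."""
    codes = Counter(
        TOY_SFC_LEXICON[word]
        for word in text.lower().split()
        if word in TOY_SFC_LEXICON
    )
    total = sum(codes.values())
    return {code: count / total for code, count in codes.items()} if total else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(code, 0.0) for code, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_stream(profile_text, documents, threshold=0.5):
    """Keep documents whose SFC vector is close to the profile, best first."""
    profile = sfc_vector(profile_text)
    scored = [(cosine(profile, sfc_vector(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored if score >= threshold]
```

Calling `filter_stream("bank loan interest", docs)` on a small list of sentences would keep only those whose lexicon words fall mostly under the same assumed code as the profile.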

human language technology | 1993

Development, implementation and testing of a discourse model for newspaper texts

Elizabeth D. Liddy; Kenneth A. Mcvearry; Woojin Paik; Edmund S. Yu; Mary McKenna

Texts of a particular type evidence a discernible, predictable schema. These schemata can be delineated, and as such provide models of their respective text-types which are of use in automatically structuring texts. We have developed a Text Structurer module which recognizes text-level structure for use within a larger information retrieval system to delineate the discourse-level organization of each document's contents. This allows those document components which are more likely to contain the type of information suggested by the user's query to be selected for higher weighting. We chose newspaper text as the first text type to implement. Several iterations of manually coding a randomly chosen sample of newspaper articles enabled us to develop a newspaper text model. This process suggested that our intellectual decomposing of texts relied on six types of linguistic information, which were incorporated into the Text Structurer module. Evaluation of the results of the module led to a revision of the underlying text model and of the Text Structurer itself.

human language technology | 1993

Interpretation of proper nouns for information retrieval

Woojin Paik; Elizabeth D. Liddy; Edmund S. Yu; Mary McKenna

Most of the unknown words in texts which degrade the performance of natural language processing systems are proper nouns. On the other hand, proper nouns are recognized as a crucial source of information for identifying a topic in a text, extracting contents from a text, or detecting relevant documents in information retrieval (Rau, 1991).

acm/ieee joint conference on digital libraries | 2001

Breaking the metadata generation bottleneck: preliminary findings

Elizabeth D. Liddy; Stuart A. Sutton; Woojin Paik; Eileen Allen; Sarah C. Harwell; Michelle Monsour; Anne M. Turner; Jennifer Liddy

The goal of our 18-month NSDL-funded project is to develop Natural Language Processing and Machine Learning technology which will accomplish automatic metadata generation for individual educational resources in digital collections. The metadata tags that the system will be learning to automatically assign are the full complement of Gateway to Educational Materials (GEM) metadata tags – from the nationally recognized consortium of organizations concerned with access to educational resources. The documents that comprise the sample for this research come from the Eisenhower National Clearinghouse on Science and Mathematics.

Journal of The Korean Society for Information Management | 2006

Analysis of Korean Patent & Trademark Retrieval Query Log to Improve Retrieval and Query Reformulation Efficiency

Jee-Yeon Lee; Woojin Paik

To develop recommendations for improving patent & trademark retrieval efficiency, we analyzed 100,016 patent & trademark search requests issued by 17,559 unique users over a period of 193 days. By analyzing 2,202 multi-query sessions, in which one user issued two or more queries consecutively, we discovered a number of clues for improving retrieval efficiency. The session analysis also led to suggestions for new system features to help users reformulate queries. The patent & trademark retrieval users were found to be similar to typical web users in certain respects, especially in issuing short queries. However, we also found that the patent & trademark retrieval users used Boolean operators more than typical web search users did. The multi-query session analysis showed that users had five intentions in reformulating queries: paraphrasing, specialization, generalization, alternation, and interruption. These same intentions are also seen among web search engine users.
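As a rough illustration of the session analysis described in this abstract, the sketch below groups log records into per-user sessions and keeps those with two or more queries. The record layout `(user_id, timestamp, query)`, the function names, and the 30-minute inactivity cutoff are assumptions; the abstract does not say how the authors delimited sessions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity cutoff between sessions; not taken from the study.
SESSION_GAP = timedelta(minutes=30)


def split_sessions(log):
    """Group (user_id, timestamp, query) records into per-user sessions."""
    by_user = defaultdict(list)
    for user_id, timestamp, query in log:
        by_user[user_id].append((timestamp, query))

    sessions = []
    for user_id, events in by_user.items():
        events.sort(key=lambda event: event[0])
        current = [events[0][1]]
        # Walk consecutive pairs and start a new session after a long pause.
        for (prev_time, _), (time, query) in zip(events, events[1:]):
            if time - prev_time > SESSION_GAP:
                sessions.append((user_id, current))
                current = []
            current.append(query)
        sessions.append((user_id, current))
    return sessions


def multi_query_sessions(sessions):
    """Keep only sessions in which the user issued two or more queries."""
    return [session for session in sessions if len(session[1]) >= 2]
```

Each multi-query session returned this way is a list of consecutive queries from one user, which is the unit on which reformulation intentions would then be judged.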

Archive | 1996