Woojin Paik
Syracuse University
Publication
Featured research published by Woojin Paik.
ACM Transactions on Information Systems | 1994
Elizabeth D. Liddy; Woojin Paik; Edmund S. Yu
The text categorization module described here provides a front-end filtering function for the larger DR-LINK text retrieval system [Liddy and Myaeng 1993]. The module evaluates a large incoming stream of documents to determine which documents are sufficiently similar to a profile at the broad subject level to warrant more refined representation and matching. To accomplish this task, each substantive word in a text is first categorized using a feature set based on the semantic Subject Field Codes (SFCs) assigned to individual word senses in a machine-readable dictionary. When tested on 50 user profiles and 550 megabytes of documents, both the feature set underlying the text categorization module and the algorithm that establishes the boundary of the category of potentially relevant documents performed their tasks at a high level. For most profiles, the category of potentially relevant documents would contain at least 80% of all documents later determined to be relevant to the profile. The number of documents in this set would be uniquely determined by the system's category-boundary predictor, and the set is likely to contain less than 5% of the incoming stream of documents.
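The filtering idea above (map words to Subject Field Codes, compare a document's code profile with a user profile, keep what falls inside a boundary) can be sketched as follows. This is a minimal illustration under loudly stated assumptions, not the DR-LINK implementation: the toy lexicon `TOY_SFC_LEXICON` and the function names are hypothetical, each word gets a single code with no sense disambiguation, cosine similarity is a swapped-in similarity measure, and the fixed `cutoff` merely stands in for the category-boundary predictor, whose method the abstract does not describe.

```python
import math
from collections import Counter

# Hypothetical toy lexicon mapping words to Subject Field Codes (SFCs).
# The real module takes SFCs from a machine-readable dictionary's word senses;
# this sketch assigns a single code per word and skips sense disambiguation.
TOY_SFC_LEXICON = {
    "stock": "ECONOMICS", "market": "ECONOMICS", "shares": "ECONOMICS",
    "election": "POLITICS", "senate": "POLITICS", "vote": "POLITICS",
    "virus": "MEDICINE", "vaccine": "MEDICINE",
}


def sfc_vector(text: str) -> dict[str, float]:
    """Replace each known word by its SFC and return normalized code frequencies."""
    codes = [TOY_SFC_LEXICON[w] for w in text.lower().split() if w in TOY_SFC_LEXICON]
    counts = Counter(codes)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()} if total else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse SFC vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(code, 0.0) for code, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_stream(profile: str, documents: list[str], cutoff: float) -> list[tuple[float, str]]:
    """Keep documents whose SFC vector is at least `cutoff` similar to the profile.

    `cutoff` is a fixed, hypothetical stand-in for the category-boundary predictor.
    """
    profile_vec = sfc_vector(profile)
    scored = [(cosine(sfc_vector(doc), profile_vec), doc) for doc in documents]
    kept = [pair for pair in scored if pair[0] >= cutoff]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)
```

With the profile `"stock market"`, a document whose known words all map to ECONOMICS scores 1.0, while one with two MEDICINE words and one ECONOMICS word scores 1/√5 (about 0.45). Working at the level of broad subject codes rather than individual words is what lets such a filter discard most of a stream cheaply before more refined matching.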
Human Language Technology | 1993
Elizabeth D. Liddy; Kenneth A. Mcvearry; Woojin Paik; Edmund S. Yu; Mary McKenna
Texts of a particular type evidence a discernible, predictable schema. These schemata can be delineated, and as such provide models of their respective text types that are useful for automatically structuring texts. We have developed a Text Structurer module which recognizes text-level structure for use within a larger information retrieval system, delineating the discourse-level organization of each document's contents. This allows those document components which are more likely to contain the type of information suggested by the user's query to be selected for higher weighting. We chose newspaper text as the first text type to implement. Several iterations of manually coding a randomly chosen sample of newspaper articles enabled us to develop a newspaper text model. This process suggested that our intellectual decomposition of texts relied on six types of linguistic information, which were incorporated into the Text Structurer module. Evaluation of the module's results led to a revision of the underlying text model and of the Text Structurer itself.
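The retrieval use of the Text Structurer, giving higher weight to components likely to hold the kind of information a query asks for, can be illustrated with a small scoring sketch. Everything here is an assumption for illustration: the component labels (`LEAD`, `CONSEQUENCE`, `HISTORY`), the function `component_weighted_score`, the simple term-match count, and the `boost` value are hypothetical, since the abstract names neither the newspaper model's components nor the weighting scheme.

```python
# Hypothetical component labels for a newspaper article; the actual component
# set of the newspaper text model is not listed in the abstract.
Segment = tuple[str, str]  # (component_label, text)


def component_weighted_score(
    segments: list[Segment],
    query_terms: list[str],
    preferred: set[str],
    boost: float = 2.0,
) -> float:
    """Count query-term matches, weighting matches in preferred components by `boost`.

    `preferred` holds the components a query is assumed to point to, and `boost`
    is an illustrative weight, not a value from the original system.
    """
    terms = {t.lower() for t in query_terms}
    score = 0.0
    for label, text in segments:
        # Count how many words in this component match a query term.
        hits = sum(1 for word in text.lower().split() if word in terms)
        score += hits * (boost if label in preferred else 1.0)
    return score
```

For a query about the consequences of a flood, matches inside a `CONSEQUENCE` segment count double while matches in other segments count once, so documents whose relevant content sits in the expected component rank higher than documents that only mention the terms in passing.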
Human Language Technology | 1993
Woojin Paik; Elizabeth D. Liddy; Edmund S. Yu; Mary McKenna
Most of the unknown words that degrade the performance of natural language processing systems on real texts are proper nouns. At the same time, proper nouns are recognized as a crucial source of information for identifying the topic of a text, extracting content from a text, or detecting relevant documents in information retrieval (Rau, 1991).
ACM/IEEE Joint Conference on Digital Libraries | 2001
Elizabeth D. Liddy; Stuart A. Sutton; Woojin Paik; Eileen Allen; Sarah C. Harwell; Michelle Monsour; Anne M. Turner; Jennifer Liddy
The goal of our 18-month NSDL-funded project is to develop Natural Language Processing and Machine Learning technology that will automatically generate metadata for individual educational resources in digital collections. The metadata tags that the system will learn to assign automatically are the full complement of Gateway to Educational Materials (GEM) metadata tags, from the nationally recognized consortium of organizations concerned with access to educational resources. The documents that make up the sample for this research come from the Eisenhower National Clearinghouse on Science and Mathematics.
Journal of The Korean Society for Information Management | 2006
Jee-Yeon Lee; Woojin Paik
To develop recommendations for improving patent and trademark retrieval efficiency, we analyzed 100,016 patent and trademark search requests issued by 17,559 unique users over a period of 193 days. By analyzing 2,202 multi-query sessions, in which a single user issued two or more queries consecutively, we discovered a number of clues for improving retrieval efficiency. The session analysis also led to suggestions for new system features to help users reformulate their queries. Patent and trademark retrieval users were found to resemble typical web users in certain respects, especially in issuing short queries. However, they used Boolean operators more often than typical web search users. The multi-query sessions also showed that users had five intentions when reformulating queries: paraphrasing, specialization, generalization, alternation, and interruption, all of which have also been observed among web search engine users.
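The first step of this kind of log study, pulling out multi-query sessions in which one user issues two or more consecutive queries, can be sketched as below. This is an illustration under assumptions, not the study's procedure: the `LogEntry` record, the function name, and especially the 30-minute inactivity gap used to split sessions are hypothetical, because the abstract does not say how session boundaries were drawn.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import NamedTuple


class LogEntry(NamedTuple):
    user_id: str
    timestamp: datetime
    query: str


# Hypothetical inactivity gap; the abstract does not state how sessions were delimited.
SESSION_TIMEOUT = timedelta(minutes=30)


def multi_query_sessions(log: list[LogEntry], timeout: timedelta = SESSION_TIMEOUT) -> list[list[str]]:
    """Split each user's queries into sessions at gaps longer than `timeout`,
    then keep only sessions containing two or more queries."""
    by_user: dict[str, list[LogEntry]] = defaultdict(list)
    for entry in log:
        by_user[entry.user_id].append(entry)

    sessions: list[list[LogEntry]] = []
    for entries in by_user.values():
        entries.sort(key=lambda e: e.timestamp)
        current = [entries[0]]
        for prev, nxt in zip(entries, entries[1:]):
            if nxt.timestamp - prev.timestamp > timeout:
                # Gap too long: close the current session and start a new one.
                sessions.append(current)
                current = []
            current.append(nxt)
        sessions.append(current)

    return [[e.query for e in s] for s in sessions if len(s) >= 2]
```

Each returned session is an ordered list of queries, which is the unit on which reformulation intentions such as specialization or generalization would then be identified; that labeling step is not sketched here.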
Archive | 1996
Elizabeth D. Liddy; Woojin Paik; Mary McKenna; Ming Li
Archive | 1996
Elizabeth D. Liddy; Woojin Paik; Mary McKenna; Michael L. Weiner; Edmund S. Yu; Theodore G. Diamond; Bhaskaran Balakrishnan; David L. Snyder
Archive | 1997
Woojin Paik; Elizabeth D. Liddy; Jennifer Liddy; Ian Harcourt Niles; Eileen Allen
Archive | 1996
Elizabeth D. Liddy; Woojin Paik; Edmund S. Yu; Ming Li
Archive | 1993
Elizabeth D. Liddy; Woojin Paik; Edmund S. Yu