Yeha Lee
Pohang University of Science and Technology
Publication
Featured research published by Yeha Lee.
European Conference on Information Retrieval | 2009
Seung-Hoon Na; Yeha Lee; Sang-Hyob Nam; Jong-Hyeok Lee
Lexicon-based approaches have been widely used for opinion retrieval due to their simplicity. However, no previous work has focused on the domain-dependency problem in opinion lexicon construction. This paper proposes simple feedback-style learning of a query-specific opinion lexicon using the set of top-retrieved documents in response to a query. The proposed learning starts from an initial domain-independent general lexicon and creates a query-specific lexicon by re-estimating the opinion probabilities of the initial lexicon based on the top-retrieved documents. Experimental results on recent TREC test sets show that the query-specific lexicon provides a significant improvement over previous approaches, especially on the BLOG-06 topics.
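The re-estimation step can be illustrated with a minimal sketch, assuming a general lexicon that maps each term to an opinion probability and a pseudo-relevant set of top-retrieved documents represented as token lists; the interpolation weight `alpha` and the helper names are hypothetical, not taken from the paper.

```python
from collections import Counter

def query_specific_lexicon(general_lexicon, top_docs, alpha=0.5):
    """Re-estimate opinion probabilities from top-retrieved documents.

    general_lexicon: dict term -> initial (domain-independent) opinion probability
    top_docs: list of documents, each a list of tokens (pseudo-relevant set)
    alpha: hypothetical interpolation weight between prior and feedback estimate
    """
    # Weight each lexicon term by how opinionated its feedback documents look
    # overall; using the average prior opinion probability of a document as
    # that weight is a simplifying assumption for illustration.
    feedback_counts = Counter()
    total = 0.0
    for doc in top_docs:
        doc_opinion = sum(general_lexicon.get(t, 0.0) for t in doc) / max(len(doc), 1)
        for t in doc:
            if t in general_lexicon:
                feedback_counts[t] += doc_opinion
                total += doc_opinion

    lexicon = {}
    for term, prior in general_lexicon.items():
        feedback_prob = feedback_counts[term] / total if total > 0 else prior
        # Interpolate the domain-independent prior with the query-specific estimate.
        lexicon[term] = (1 - alpha) * prior + alpha * feedback_prob
    return lexicon
```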
European Conference on Information Retrieval | 2009
Sang-Hyob Nam; Seung-Hoon Na; Yeha Lee; Jong-Hyeok Lee
One of the important issues in blog search engines is extracting clean text from a blog post. In practice, this extraction is hindered by a large amount of non-relevant content in the original post, such as menus, banners, and site descriptions, which makes ranking less effective. The problem is that this non-relevant content is not encoded in a unified way but differs from one blog site to another. A commercial vendor would therefore need site-specific tuning, such as hand-crafted rules for eliminating non-relevant content from every blog site, which is very inefficient. Instead of this labor-intensive approach, this paper first observes that much of this non-relevant content remains unchanged across consecutive blog posts, and then proposes a simple and effective algorithm, DiffPost, which eliminates it based on the content difference between two consecutive posts from the same blog site. Experimental results on the TREC blog track are remarkable: the retrieval system using DiffPost achieves an improvement of about 10% in MAP (Mean Average Precision) over the same system without DiffPost.
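The core idea of DiffPost can be sketched in a few lines, assuming posts from the same blog site are compared as sets of text lines; treating the line as the unit of comparison is an assumption for illustration, not necessarily the paper's exact granularity.

```python
def diffpost(current_post, previous_post):
    """Keep only the lines of `current_post` that do not also appear in
    `previous_post` from the same blog site.

    Non-relevant content (menus, banners, site descriptions) tends to repeat
    across consecutive posts, so the set difference removes it while the
    post-specific text survives.
    """
    previous_lines = set(previous_post.splitlines())
    cleaned = [line for line in current_post.splitlines()
               if line.strip() and line not in previous_lines]
    return "\n".join(cleaned)
```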
Asia Information Retrieval Symposium | 2008
Seung-Hoon Na; In-Su Kang; Yeha Lee; Jong-Hyeok Lee
Passage retrieval has been regarded as an alternative way to resolve the length-normalization problem, since passages have more uniform lengths and topics than documents. An important issue in passage retrieval is determining the type of passage. Among several passage types, the arbitrary passage type, which varies dynamically according to the query, has shown the best performance. However, the previous arbitrary passage type has not been fully explored, since it still imposes a fixed-length restriction such as n consecutive words. This paper proposes a new type of passage, the completely-arbitrary passage, obtained by eliminating all restrictions on both passage length and starting position, thereby maximally relaxing the original arbitrary passage type. The main advantage of completely-arbitrary passages is that the proximity of query terms is well supported in passage retrieval, whereas non-completely-arbitrary passages cannot clearly support it. Extensive experiments show that passage retrieval using completely-arbitrary passages significantly improves over both document retrieval and passage retrieval using previous non-completely-arbitrary passages, on six standard TREC test collections, in the context of language modeling approaches.
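A minimal sketch of what "completely arbitrary" means in practice: every span of a document, with any start position and any length, is a candidate passage. The density-style scoring function below is only a stand-in for illustration; the paper scores passages with language models.

```python
def best_completely_arbitrary_passage(doc_tokens, query_terms):
    """Score every span [i, j] of the document (any start, any length) and
    return the best one. Shorter spans that pack in more query terms score
    higher, which is how query-term proximity gets rewarded.
    """
    query_terms = set(query_terms)
    best_score, best_span = 0.0, (0, 0)
    n = len(doc_tokens)
    for i in range(n):                      # every possible start position
        matches = 0
        for j in range(i, n):               # every possible end position
            if doc_tokens[j] in query_terms:
                matches += 1
            length = j - i + 1
            score = matches * matches / length   # rewards tight, term-dense spans
            if score > best_score:
                best_score, best_span = score, (i, j)
    return best_span, best_score
```

The naive enumeration above is quadratic in document length; an efficient system would restrict the search to spans bounded by query-term occurrences.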
Asia Information Retrieval Symposium | 2008
Seung-Hoon Na; In-Su Kang; Yeha Lee; Jong-Hyeok Lee
Unlike traditional document-level feedback, passage-level feedback restricts the context for selecting relevant terms to a passage in a document rather than the entire document. It can thus avoid selecting non-relevant terms from non-relevant parts of a document. The most recent work on passage-level feedback has been investigated from the viewpoint of fixed-window passages. However, fixed-window passages are limited for optimizing passage-level feedback, since they include query-independent portions. To minimize the query-independence of a passage, this paper proposes a new type of passage, the completely-arbitrary passage. Based on this, we devise a novel two-stage passage feedback consisting of passage retrieval and passage extension as sub-steps, unlike previous single-stage passage feedback that relies only on passage retrieval. Experimental results show that the proposed two-stage passage-level feedback improves over document-level feedback significantly more than the single-stage passage feedback that uses fixed-window passages.
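A rough sketch of the two-stage idea, assuming the best-matching passage of a feedback document has already been retrieved in stage one; the fixed extension width and frequency-based term selection in stage two are illustrative assumptions, not the paper's exact extension method.

```python
from collections import Counter

def two_stage_passage_feedback(doc_tokens, passage_span, extension=25, top_terms=10):
    """Stage 1 (assumed done): `passage_span` = (start, end) of the retrieved
    passage within the feedback document.
    Stage 2: extend the passage boundaries before extracting expansion terms,
    so that relevant context just outside the retrieved passage is not lost.
    """
    start, end = passage_span
    ext_start = max(0, start - extension)
    ext_end = min(len(doc_tokens), end + extension)
    counts = Counter(doc_tokens[ext_start:ext_end])
    return [term for term, _ in counts.most_common(top_terms)]
```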
Information Retrieval | 2012
Yeha Lee; Seung-Hoon Na; Jong-Hyeok Lee
Blog feed search aims to identify blog feeds of recurring interest to users on a given topic. A blog feed, the retrieval unit for blog feed search, comprises blog posts on diverse topics. This topical diversity often degrades the performance of blog feed search. To alleviate the problem, this paper proposes several approaches based on passage retrieval, which is widely regarded as effective for handling topical diversity at the document level in ad-hoc retrieval. We define global and local evidence for blog feed search, which correspond to the document-level and passage-level evidence of passage retrieval, respectively, and investigate their influence on blog feed search in terms of both initial retrieval and pseudo-relevance feedback. For initial retrieval, we propose a retrieval framework that integrates global evidence with local evidence. For pseudo-relevance feedback, we gather feedback information from the local evidence of the top K ranked blog feeds to capture diverse and accurate information related to a given topic. Experimental results show that our approaches using local evidence consistently and significantly outperform traditional ones.
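One simple way to realize the integration of global and local evidence described above is a linear interpolation of feed-level and post-level retrieval scores; the weight `lam` and the averaging over the k best posts below are illustrative assumptions rather than the paper's exact framework.

```python
def feed_score(global_score, post_scores, lam=0.7, k=3):
    """Combine global (feed-level) evidence with local (post-level) evidence.

    global_score: retrieval score of the whole feed against the query
    post_scores: retrieval scores of the individual posts in the feed
    lam: interpolation weight; k: number of best posts to average (assumptions)
    """
    best_local = sorted(post_scores, reverse=True)[:k]
    local_score = sum(best_local) / max(len(best_local), 1)
    return lam * global_score + (1 - lam) * local_score
```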
International Conference on the Computer Processing of Oriental Languages | 2009
Jungi Kim; Hun-Young Jung; Sang-Hyob Nam; Yeha Lee; Jong-Hyeok Lee
This paper proposes a method that automatically creates a subjectivity lexicon in a new language from a subjectivity lexicon in a resource-rich language, using only a bilingual dictionary. We resolve some of the difficulties in selecting appropriate senses when translating the lexicon, and present a framework that sequentially applies an iterative link analysis algorithm to enhance the quality of the lexicons in both the source and target languages. Experimental results show that the method improves the subjectivity lexicon in the source language as well as creating a good-quality lexicon in the new language.
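The iterative link analysis can be sketched as a HITS-style propagation over the bipartite graph induced by the bilingual dictionary, where each source word is linked to its target-language translations. The alternating update and normalization below are illustrative assumptions, not the paper's exact algorithm.

```python
def propagate_subjectivity(source_scores, dictionary, iterations=10):
    """Propagate subjectivity scores across a bilingual dictionary.

    source_scores: dict source word -> initial subjectivity score
    dictionary: dict source word -> list of target-language translations
    Returns refined source scores and scores for every target word, obtained
    by alternating propagation in both directions (HITS-like iteration).
    """
    target_scores = {}
    for _ in range(iterations):
        # source -> target: a target word inherits scores from its translations
        target_scores = {}
        for s, targets in dictionary.items():
            for t in targets:
                target_scores[t] = target_scores.get(t, 0.0) + source_scores.get(s, 0.0)
        norm = sum(target_scores.values()) or 1.0
        target_scores = {t: v / norm for t, v in target_scores.items()}
        # target -> source: refine source scores from their translations
        new_source = {s: sum(target_scores.get(t, 0.0) for t in targets)
                      for s, targets in dictionary.items()}
        norm = sum(new_source.values()) or 1.0
        source_scores = {s: v / norm for s, v in new_source.items()}
    return source_scores, target_scores
```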
International Journal of Computer Processing of Languages | 2009
Jungi Kim; Hun-Young Jung; Yeha Lee; Jong-Hyeok Lee
This paper proposes a method that automatically creates a sentiment lexicon in a new language from a sentiment lexicon in a resource-rich language, using only a bilingual dictionary. We resolve some of the difficulties in selecting appropriate senses when translating the lexicon, and present a framework that sequentially applies an iterative link analysis algorithm to enhance the quality of the lexicons in both the source and target languages. Experimental results show that the method improves the sentiment lexicon in the source language as well as creating a good-quality lexicon in the new language.
Information Retrieval | 2014
Yeha Lee; Jong-Hyeok Lee
A huge volume of news stories is reported by various news channels on a daily basis. Subscribing to all the stories and keeping track of the important ones day after day is very time-consuming. This paper proposes several approaches to identifying important news stories. To this end, we take advantage of the blogosphere as an information source for evaluating the importance of news stories. Blogs reflect the diverse opinions of bloggers about news stories, and the attention these stories receive can help estimate their importance. In this paper, we define the popularity of a news story in the blogosphere as the attention it attracts from users. We measure the popularity of stories in the blogosphere from two viewpoints: content and timeline. In terms of content, we propose several approaches to estimating language models for a news story and for blog posts, and we evaluate the importance of the story using these language models. Furthermore, we generate a temporal profile of a news story by exploring the timeline of blog posts related to the story, and evaluate its importance based on this temporal profile. We experimentally verify the effectiveness of the proposed approaches for identifying top news stories.
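The timeline viewpoint can be sketched as follows, assuming a news story is represented by a set of characteristic terms and blog posts carry publication dates; the term-overlap threshold and the peak-weighted importance measure are illustrative assumptions, not the paper's actual estimators.

```python
from collections import Counter

def temporal_profile(story_terms, blog_posts):
    """Build a temporal profile for a news story from the blogosphere.

    story_terms: set of representative terms for the story (e.g. its headline)
    blog_posts: iterable of (post_date, post_tokens) pairs
    A post counts toward a day if it mentions at least 3 story terms
    (an assumed threshold for illustration).
    """
    profile = Counter()
    for post_date, tokens in blog_posts:
        if len(story_terms & set(tokens)) >= 3:
            profile[post_date] += 1
    return profile

def story_importance(profile):
    """One simple importance measure: total attention weighted by its peak."""
    if not profile:
        return 0.0
    return sum(profile.values()) * max(profile.values())
```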
International Conference on the Computer Processing of Oriental Languages | 2009
Yeha Lee; Jungi Kim; Jong-Hyeok Lee
Sentiment analysis of weblogs is a challenging problem. Most previous work used the semantic orientations of words or phrases to classify the sentiments of weblogs. The problem with this approach is that the semantic orientations of words or phrases are determined without considering the domain of the weblogs. Weblogs contain the authors' various opinions about multifaceted topics, so semantic orientation has to be treated as domain-dependent. In this paper, we present an unsupervised learning model based on the aspect model to classify the sentiments of weblogs. Our model uses the domain-dependent semantic orientations of latent variables, rather than of words or phrases, to classify the sentiments of weblogs. Experiments on several domains confirm that our model assigns domain-dependent semantic orientations to latent variables correctly and classifies the sentiments of weblogs effectively.
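Once an aspect-model (PLSA-style) decomposition has assigned each weblog a distribution over latent variables, and each latent variable a domain-dependent orientation, the classification step itself is simple; the sketch below assumes those two inputs are already estimated, and the sign threshold at zero is an illustrative assumption.

```python
def classify_weblog_sentiment(topic_posterior, orientation):
    """Classify a weblog's sentiment from latent-variable orientations.

    topic_posterior: dict latent variable z -> P(z | weblog), e.g. from an
        aspect-model decomposition of the weblog collection
    orientation: dict z -> domain-dependent semantic orientation in [-1, 1]

    The weblog is scored by the expected orientation over its latent variables.
    """
    score = sum(p * orientation.get(z, 0.0) for z, p in topic_posterior.items())
    return ("positive" if score >= 0 else "negative"), score
```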
International Conference on the Computer Processing of Oriental Languages | 2009
Sang-Hyob Nam; Seung-Hoon Na; Jungi Kim; Yeha Lee; Jong-Hyeok Lee
This paper presents a new partially supervised approach to phrase-level sentiment analysis that first automatically constructs a polarity-tagged corpus and then learns sequential sentiment tagging from that corpus. The approach uses only sentiment sentences, which are readily available on the Internet, and does not require a manually constructed polarity-tagged corpus, which is hard to build. With this approach, the system is able to classify phrase-level sentiment automatically. The results show that a system can learn sentiment expressions without a manually polarity-tagged corpus.
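A minimal sketch of the corpus-construction stage, assuming sentence-level polarity comes for free from the web (e.g. rated reviews) and a small polarity lexicon decides which tokens inherit the sentence's tag; this projection heuristic is an illustrative assumption, and the resulting sequences would then train a sequential tagger such as a CRF.

```python
def build_polarity_tagged_corpus(sentiment_sentences, lexicon):
    """Automatically construct a (noisy) polarity-tagged corpus.

    sentiment_sentences: list of (tokens, sentence_polarity) pairs, where the
        sentence polarity is obtained for free from web sources
    lexicon: dict token -> polarity hint ("POS"/"NEG") used to decide which
        tokens inside the sentence receive the sentence's tag

    Tokens whose lexicon hint matches the sentence polarity are tagged with
    it; all other tokens get "O".
    """
    corpus = []
    for tokens, polarity in sentiment_sentences:
        tags = [polarity if lexicon.get(t) == polarity else "O" for t in tokens]
        corpus.append(list(zip(tokens, tags)))
    return corpus
```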