Jung Tae Lee | Researchain

Publication

Featured research published by Jung Tae Lee.

Empirical Methods in Natural Language Processing | 2008

Bridging Lexical Gaps between Queries and Questions on Large Online Q&A Collections with Compact Translation Models

Jung Tae Lee; Sang Bum Kim; Young In Song; Hae Chang Rim

Lexical gaps between queries and questions (documents) have been a major issue in question retrieval on large online question and answer (Q&A) collections. Previous studies address the issue by implicitly expanding queries with the help of translation models pre-constructed using statistical techniques. However, since it is possible for unimportant words (e.g., non-topical words, common words) to be included in the translation models, a lack of noise control on the models can cause degradation of retrieval performance. This paper investigates a number of empirical methods for eliminating unimportant words in order to construct compact translation models for retrieval purposes. Experiments conducted on a real world Q&A collection show that substantial improvements in retrieval performance can be achieved by using compact translation models.
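
The abstract does not say which pruning methods were evaluated, so the following is only a rough sketch of the general idea of compacting a word-to-word translation table. It assumes two hypothetical filters (a stopword list and a minimum probability `min_prob`) and renormalizes what remains; the function name, table layout, and thresholds are illustrative assumptions, not the paper's method.

```python
def compact_translation_table(table, stopwords, min_prob=0.01):
    """Prune a translation table {source: {target: P(target | source)}}.

    Hypothetical filters (assumptions, not taken from the paper):
      * drop source and target words found in `stopwords`
      * drop target words whose probability is below `min_prob`
    Surviving probabilities are renormalized to sum to 1 per source word.
    """
    compact = {}
    for source, targets in table.items():
        if source in stopwords:
            continue
        kept = {
            target: prob
            for target, prob in targets.items()
            if target not in stopwords and prob >= min_prob
        }
        total = sum(kept.values())
        if total > 0:
            compact[source] = {t: p / total for t, p in kept.items()}
    return compact


# Toy table with made-up probabilities, for illustration only.
table = {
    "car": {"car": 0.5, "vehicle": 0.3, "the": 0.195, "zebra": 0.005},
    "the": {"the": 0.9, "car": 0.1},
}
compact = compact_translation_table(table, stopwords={"the"})
```

In this toy run the entry for "the" disappears entirely, and "car" keeps only "car" and "vehicle" as translation candidates, which is the kind of noise reduction the abstract describes.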

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2012

Finding interesting posts in Twitter based on retweet graph analysis

Min Chul Yang; Jung Tae Lee; Seung Wook Lee; Hae Chang Rim

Millions of posts are being generated in real-time by users in social networking services, such as Twitter. However, a considerable number of those posts are mundane posts that are of interest to the authors and possibly their friends only. This paper investigates the problem of automatically discovering valuable posts that may be of potential interest to a wider audience. Specifically, we model the structure of Twitter as a graph consisting of users and posts as nodes and retweet relations between the nodes as edges. We propose a variant of the HITS algorithm for producing a static ranking of posts. Experimental results on real world data demonstrate that our method can achieve better performance than several baseline methods.
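
The abstract does not describe how the proposed variant differs from HITS, so the sketch below shows only the standard HITS iteration on a user-post graph, as a point of reference. It assumes each edge `(user, post)` means "this user retweeted this post", treats users as hubs and posts as authorities, and uses made-up data; the function name and edge format are hypothetical.

```python
import math


def hits_user_post(edges, iterations=50):
    """Standard HITS on (user, post) retweet edges.

    Users act as hubs and posts as authorities. This is the textbook
    algorithm, not the variant proposed in the paper.
    """
    edges = list(set(edges))  # ignore duplicate retweet records
    users = {u for u, _ in edges}
    posts = {p for _, p in edges}
    hub = {u: 1.0 for u in users}
    auth = {p: 1.0 for p in posts}

    def normalize(scores):
        # Scale to unit Euclidean length so scores stay bounded.
        norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
        return {k: v / norm for k, v in scores.items()}

    for _ in range(iterations):
        # A post's authority is the sum of the hub scores of its retweeters.
        new_auth = dict.fromkeys(posts, 0.0)
        for u, p in edges:
            new_auth[p] += hub[u]
        auth = normalize(new_auth)

        # A user's hub score is the sum of the authorities of retweeted posts.
        new_hub = dict.fromkeys(users, 0.0)
        for u, p in edges:
            new_hub[u] += auth[p]
        hub = normalize(new_hub)

    return hub, auth


# Toy data: three users retweet p1, one of them also retweets p2.
edges = [("alice", "p1"), ("bob", "p1"), ("carol", "p1"), ("bob", "p2")]
hub, auth = hits_user_post(edges)
ranking = sorted(auth, key=auth.get, reverse=True)
```

Sorting posts by authority score gives a static ranking in the sense the abstract mentions; here `p1` ranks above `p2` because more users retweeted it.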

Pattern Recognition Letters | 2012

Content-based mobile spam classification using stylistically motivated features

Dae Neung Sohn; Jung Tae Lee; Kyoung-Soo Han; Hae Chang Rim

The brevity of mobile phone messages makes it difficult to distinguish lexical patterns that identify spam. This paper proposes a novel approach to spam classification of extremely short messages using not only lexical features that reflect the content of a message but also new stylistic features that indicate the manner in which the message is written. Experiments on two mobile phone message collections in two different languages show that the approach significantly outperforms previous content-based approaches, regardless of language.

Journal of Computer Science and Technology | 2014

Discovering High-Quality Threaded Discussions in Online Forums

Jung Tae Lee; Min Chul Yang; Hae Chang Rim

Archives of threaded discussions generated by users in online forums and discussion boards contain valuable knowledge on various topics. However, not all threads are useful because of deliberate abuses, such as trolling and flaming, that are commonly observed in online conversations. The existence of various users with different levels of expertise also makes it difficult to assume that every discussion thread stored online contains high-quality content. Although finding high-quality threads automatically can help both users and search engines sift through a huge amount of thread archives and make use of these potentially useful resources effectively, no previous work to our knowledge has studied this task. In this paper, we propose an automatic method for distinguishing high-quality threads from low-quality ones in online discussion sites. We first suggest four different artificial measures for inducing the overall quality of a thread based on ratings of its posts. We then propose two tasks involving prediction of thread quality without using post rating information. We adopt a popular machine learning framework to solve the two prediction tasks. Experimental results on a real-world forum archive demonstrate that our method can significantly improve the prediction performance across all four measures of thread quality on both tasks. We also compare how different types of features derived from various aspects of threads contribute to the overall performance and investigate key features that play a crucial role in discovering high-quality threads in online discussion sites.

Pacific Rim International Conference on Artificial Intelligence | 2008

Combining Local and Global Resources for Constructing an Error-Minimized Opinion Word Dictionary

Linh Hoang; Jung Tae Lee; Young In Song; Hae Chang Rim

A lexical dictionary consisting of opinion words and their polar orientations plays a crucial role in opinion mining tasks (e.g., sentiment classification). Previous works on the automatic construction of such a dictionary suffer from errors (i.e., incorrect identification of the polar orientations of words in the dictionary). To address the problem, this paper proposes an Error Minimization Algorithm that reduces the errors caused by the automatic compilation process in order to construct a reasonable opinion word dictionary. The proposed algorithm combines global and local resources for extracting and refining the dictionary with minimum errors. Empirical results show that our proposed approach is effective for enhancing the performance of the sentiment classification task.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

High precision opinion retrieval using sentiment-relevance flows

Seung Wook Lee; Jung Tae Lee; Young In Song; Hae Chang Rim

Opinion retrieval involves measuring the opinion score of a document with respect to a given topic. We propose a new method, namely sentiment-relevance flow, that naturally unifies the topic relevance and the opinionated nature of a document. Experiments conducted over a large-scale Web corpus show that the proposed approach improves the performance of opinion retrieval in terms of precision at top ranks.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2009

Finding advertising keywords on video scripts

Jung Tae Lee; Hyung-Dong Lee; Hee Seon Park; Young In Song; Hae Chang Rim

A key to success in contextual in-video advertising is finding advertising keywords in video content effectively, but there has been little literature in the area so far. This paper presents some preliminary results of our learning-based system that finds relevant advertising keywords for a particular scene of video content using its script. The system is trained with not only features proven useful in earlier studies but also novel features that reflect the situation of a targeted scene. Experimental results show that the new features are potentially helpful for enhancing the accuracy of keyword extraction for contextual in-video advertising.

Intelligent Information Systems | 2012

A new generative opinion retrieval model integrating multiple ranking factors

Seung Wook Lee; Young In Song; Jung Tae Lee; Kyoung-Soo Han; Hae Chang Rim

In this paper, we present clear and formal definitions of ranking factors that should be considered in opinion retrieval and propose a new opinion retrieval model which simultaneously combines the factors from the generative modeling perspective. The proposed model formally unifies relevance-based ranking with subjectivity detection at the document level by taking multiple ranking factors into consideration: topical relevance, subjectivity strength, and opinion-topic relatedness. The topical relevance measures how strongly a document relates to a given topic, and the subjectivity strength indicates the likelihood that the document contains subjective information. The opinion-topic relatedness reflects whether the subjective information is expressed with respect to the topic of interest. We also show the universality of our model by introducing derivations of the model that represent other existing opinion retrieval approaches. Experimental results on a large-scale blog retrieval test collection demonstrate that not only are the individual ranking factors necessary in opinion retrieval but they also cooperate advantageously to produce a better document ranking when used together. The retrieval performance of the proposed model is comparable to that of previous systems in the literature.

Asia Information Retrieval Symposium | 2012

Detecting Informative Messages Based on User History in Twitter

Chang Woo Chun; Jung Tae Lee; Seung Wook Lee; Hae Chang Rim

As more and more users participate in various social networking services, the volume of streaming data is increasing considerably. It is necessary to find valuable messages in the huge volume of data archived every moment. This paper investigates the problem of detecting informative messages in Twitter and proposes effective methods to solve the problem based on user history. Most of the information in tweets shares a common defect: it is affected by the influence of the user's level within the Twitter network. Our key idea is to leverage each user’s history observed from a large-scale dataset as features to determine whether a new message is informative or not, compared to their previous messages. This allows us to normalize the influence of individual users on tweets and to estimate the probability of informativeness. Experimental results on real Twitter data show that our method can effectively improve the performance of identifying informative tweets.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010

Achieving high accuracy retrieval using intra-document term ranking

Hyun Wook Woo; Jung Tae Lee; Seung Wook Lee; Young In Song; Hae Chang Rim

Most traditional ranking models roughly score the relevance of a given document by observing simple term statistics, such as the occurrence of query terms within the document or within the collection. Intuitively, the relative importance of query terms with regard to other individual non-query terms in a document can also be exploited to promote the ranks of documents in which the query is the main topic. In this paper, we introduce a simple technique named intra-document term ranking, which involves ranking all the terms in a document according to their relative importance within that particular document. We demonstrate that the information regarding the rank positions of given query terms within the intra-document term ranking can be useful for enhancing the precision of top-retrieved results by traditional ranking models. Experiments are conducted on three standard TREC test collections.
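
The abstract does not state how term importance is measured, so the sketch below assumes a simple TF-IDF weight purely for illustration. It ranks every term in one document and then looks up where the query terms fall in that ranking; the function names, the smoothing in the IDF formula, and the toy statistics are all assumptions.

```python
import math
from collections import Counter


def intra_document_term_ranking(doc_tokens, doc_freq, num_docs):
    """Rank all terms of one document by an assumed TF-IDF weight.

    Returns {term: rank}, where rank 1 is the most important term.
    Ties are broken alphabetically to keep the ranking deterministic.
    """
    tf = Counter(doc_tokens)
    scores = {
        term: count * math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1))
        for term, count in tf.items()
    }
    ordered = sorted(scores, key=lambda term: (-scores[term], term))
    return {term: position + 1 for position, term in enumerate(ordered)}


def query_term_ranks(query_tokens, ranks):
    """Rank position of each query term, or None if absent from the document."""
    return {term: ranks.get(term) for term in query_tokens}


# Toy document and made-up collection statistics.
doc = "spam filter spam mobile the the the".split()
doc_freq = {"the": 1000, "spam": 20, "filter": 50, "mobile": 100}
ranks = intra_document_term_ranking(doc, doc_freq, num_docs=1000)
positions = query_term_ranks(["spam", "mobile", "phone"], ranks)
```

A document where the query terms sit near rank 1 is more likely to be mainly about the query, so these positions could serve as an extra signal on top of a traditional ranking score.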
