Jongwoo Ha | Researchain

Publication

Featured research published by Jongwoo Ha.

ACM Transactions on the Web | 2013

Semantic contextual advertising based on the open directory project

Jung Hyun Lee; Jongwoo Ha; Jin Yong Jung; SangKeun Lee

Contextual advertising seeks to place relevant textual ads within the content of generic webpages. In this article, we explore a novel semantic approach to contextual advertising. This consists of three tasks: (1) building a well-organized hierarchical taxonomy of topics, (2) developing a robust classifier for effectively finding the topics of pages and ads, and (3) ranking ads based on their topical relevance to pages. First, we heuristically build our own taxonomy of topics from the Open Directory Project (ODP). Second, we investigate how to increase classification accuracy by taking the unique characteristics of the ODP into account. Last, we measure the topical relevance of ads by applying a link analysis technique to the similarity graph carefully derived from our taxonomy. Experiments show that our classification method improves macro-averaged F1 (Ma-F1) by as much as 25.7% over the baseline classifier. In addition, our ranking method enhances the relevance of ads substantially, by up to 10% in terms of precision at k, compared to a representative strategy.
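The two evaluation measures named in this abstract, Ma-F1 (macro-averaged F1) and precision at k, can be sketched as follows. This is a minimal illustration of the standard definitions, not the authors' evaluation code; the function names and the toy inputs are assumptions made here for illustration.

```python
def precision_at_k(ranked_ads, relevant_ads, k):
    """Fraction of the top-k ranked ads that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ads[:k]
    hits = sum(1 for ad in top_k if ad in relevant_ads)
    return hits / k


def macro_f1(true_labels, predicted_labels):
    """Unweighted mean of per-class F1 over the classes seen in true_labels."""
    pairs = list(zip(true_labels, predicted_labels))
    scores = []
    for c in sorted(set(true_labels)):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical ad identifiers and topic labels.
    print(precision_at_k(["a", "b", "c", "d"], {"a", "c"}, k=2))
    print(macro_f1(["x", "x", "y"], ["x", "y", "y"]))
```

Because the macro average weights every class equally, rare topics in a large taxonomy count as much as frequent ones.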

IEEE Internet Computing | 2014

EPE: An Embedded Personalization Engine for Mobile Users

Jongwoo Ha; Jung Hyun Lee; SangKeun Lee

The proposed embedded personalization engine (EPE) utilizes valuable in-device usage data for inferring mobile user interests in a privacy-preserving manner. To provide users with personalized services, the proposed approach analyzes both the usage data inside a mobile device and service items--such as news articles and mobile apps--using the Open Directory Project (ODP) as a knowledge base. Embedded classification and ranking methodologies effectively match such service items with inferred user interests. The scenario-based evaluation clearly shows that the proposed EPE gives users highly personalized services with both reasonable perceived latency and little energy consumption.

Web Information and Data Management | 2009

Novel web page classification techniques in contextual advertising

Jung Jin Lee; Jung Hyun Lee; Jongwoo Ha; SangKeun Lee

Contextual advertising seeks to place relevant ads on generic web pages based on their contents. Recently, it has been observed that classifying web pages into a well-organized taxonomy of topics is promising for matching topically relevant ads to web pages. Following this observation, in this paper we propose two methods to increase classification accuracy for web pages in the context of contextual advertising. Our strategy is to enhance the baseline classifier by reflecting unique features of web pages and the taxonomy. In particular, category tags extracted from web pages are utilized to augment term weights, and the hierarchical structure of the taxonomy is taken into account to categorize web pages with high confidence. We conduct a series of experiments to evaluate the proposed methods, and the results show that classification accuracy increases by up to 11% compared to the baseline classifier.

International Conference Data Science | 2014

Toward robust classification using the Open Directory Project

Jongwoo Ha; Jung Hyun Lee; Won Jun Jang; Yong Ku Lee; SangKeun Lee

The Open Directory Project (ODP) is a large-scale, high-quality, publicly available web directory utilized in many studies and real-world applications. In this paper, we explore training data expansion techniques for text classification as one possible direction for dealing with the sparsity of the ODP dataset. We propose a dozen classification methods, which can be differentiated by (1) the categories from which training data is expanded, and (2) how the expanded training data is merged to generate centroid vectors. Evaluation results show that training data expansion significantly improves classification performance over representative classifiers. We also find that (1) child and descendant categories are more valuable sources for expanding training data than parent and ancestor categories, and (2) distance-based weighting is superior to simple averaging for merging the expanded training data.
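The idea of expanding a category's training data from its descendants and merging it with distance-based weights can be sketched as below. This is only an illustrative sketch: the abstract does not give the weighting function, so the `1 / (1 + distance)` weight, the dictionary-based term vectors, and all function and category names are assumptions, and the taxonomy is assumed to be a tree.

```python
import math
from collections import Counter


def mean_vector(docs):
    """Average a list of term-frequency dicts into one vector."""
    total = Counter()
    for doc in docs:
        total.update(doc)
    n = len(docs)
    return {t: v / n for t, v in total.items()} if n else {}


def expanded_centroid(category, children, docs_by_category):
    """Centroid built from a category and its descendants.

    Each category's mean vector is weighted by 1 / (1 + distance), where
    distance is the number of edges from `category` (an assumed weighting).
    """
    weighted = Counter()
    total_weight = 0.0
    stack = [(category, 0)]
    while stack:
        node, dist = stack.pop()
        vec = mean_vector(docs_by_category.get(node, []))
        if vec:
            w = 1.0 / (1 + dist)
            for term, value in vec.items():
                weighted[term] += w * value
            total_weight += w
        for child in children.get(node, []):
            stack.append((child, dist + 1))
    if not total_weight:
        return {}
    return {t: v / total_weight for t, v in weighted.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    # Hypothetical two-level taxonomy with one document per category.
    children = {"Sports": ["Sports/Soccer"]}
    docs = {"Sports": [{"game": 1}], "Sports/Soccer": [{"goal": 2}]}
    centroid = expanded_centroid("Sports", children, docs)
    print(centroid)
    print(cosine({"goal": 1}, centroid))
```

A document would then be assigned to the category whose expanded centroid has the highest cosine similarity; replacing the weight with a constant gives the simple-averaging variant the abstract compares against.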

Database Systems for Advanced Applications | 2009

Fractional PageRank Crawler: Prioritizing URLs Efficiently for Crawling Important Pages Early

Md. Hijbul Alam; Jongwoo Ha; SangKeun Lee

Crawling important pages early is a well-studied problem. However, the availability of different types of frameworks for publishing web content greatly increases the number of web pages. Therefore, the crawler should be fast enough to prioritize and download the important pages. As the importance of a page is not known before or during its download, the crawler needs a great deal of time to approximate that importance in order to prioritize the download of web pages. In this research, we propose Fractional PageRank crawlers that prioritize the downloaded pages for the purpose of discovering important URLs early during the crawl. Our experiments demonstrate that they improve the running time dramatically while crawling the important pages early.

ACM Symposium on Applied Computing | 2012

Extending Open Directory Project to represent user interests

Seulgi So; Jung Hyun Lee; Daoun Jung; Jongwoo Ha; SangKeun Lee

Effective inference of user interests is crucial to personalization. Utilizing the Open Directory Project (ODP) categories is an effective way to infer user interests; it represents those interests in the form of ODP categories, i.e., nouns. In this paper, we build a knowledge base to represent user interests in the form of (noun, verb) pairs. We expect that this approach will enable us to represent user interests more precisely, since verbs clarify the context of nouns. To this end, we develop a verb extraction engine that extends ODP categories with their related verbs. It employs various information sources to automatically identify a set of related verbs for an arbitrary ODP category. Thus, we obtain extended ODP categories in the form of (noun, verb) pairs that can be utilized for various personalization services. The experimental results show the efficacy of our verb extraction engine.

Database and Expert Systems Applications | 2009

Energy Efficient and Progressive Strategy for Processing Skyline Queries on Air

Jongwoo Ha; Yoon Kyung Kwon; Jae Ho Choi; SangKeun Lee

Computing the skyline and its variations is attracting a lot of attention in the database community. However, processing such queries in wireless broadcast environments remains an unexplored problem, despite its unique benefits compared to other environments. In this paper, we propose a strategy to process skyline queries for the possible expansion of current data broadcasting services. For energy-efficient processing of skyline queries, the Sweep space-filling curve is utilized on top of the existing DSI structure to generate the broadcast program at the server side. Corresponding algorithms for processing skyline queries are also proposed for mobile clients. Moreover, we extend the DSI structure based on a novel concept of Minimized Dominating Points (MDP) in order to provide a progressive algorithm for the queries. We evaluate our strategy by performing a simulation, and the experimental results demonstrate the energy efficiency of the proposed methods.
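For readers unfamiliar with skyline queries, the underlying definition can be sketched with a simple in-memory scan. This illustrates only the dominance relation and a basic nested-loop skyline; it is not the broadcast-based DSI or MDP strategy proposed in the paper. The function names are assumptions, and smaller values are assumed to be better in every dimension.

```python
def dominates(p, q):
    """True if p is at least as good as q in every dimension and strictly
    better in at least one (smaller values are assumed to be better)."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def skyline(points):
    """Return the points that are not dominated by any other point."""
    result = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue  # p is dominated by a current candidate; discard it
        # p survives, so drop any candidates that p dominates.
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result


if __name__ == "__main__":
    # Hypothetical (price, distance) pairs.
    hotels = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
    print(skyline(hotels))
```

A progressive algorithm, as mentioned in the abstract, would report skyline points as soon as they are confirmed rather than only after the full scan.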

Information Sciences | 2015

XQStream++: Fast tuple extraction algorithm for streaming XML data

Byung Gul Ryu; Jongwoo Ha; SangKeun Lee

Tuple extraction from streaming XML should be cost-effective for real-time query evaluation. StreamTX has recently exhibited good performance in terms of both running time and memory usage in supporting tuple extraction queries for streaming XML. However, we empirically observe that StreamTX incurs unnecessary computational overhead, since it builds on TwigStack, an XML query processing algorithm originally developed for stored XML. In this paper, we first design a non-recursive XQStream algorithm to handle the inefficient recursive calls of StreamTX. Subsequently, we extend the basic XQStream by incorporating two novel schemes: (1) the relational pointer to efficiently and effectively evaluate the structural relationship of elements, and (2) the pattern reuse to reduce redundant path evaluations for pattern matching. The performance evaluation on various datasets provides new empirical findings. First, XQStream++, which incorporates the relational pointer and the pattern reuse scheme into XQStream, significantly outperforms the state-of-the-art algorithms in running time with a small, nearly constant memory usage. Second, the most recently released XQuery engines outperform StreamTX in running time.

KSII Transactions on Internet and Information Systems | 2012

Vocabulary Expansion Technique for Advertisement Classification

Jin Yong Jung; Jung Hyun Lee; Jongwoo Ha; SangKeun Lee

Contextual advertising is an important revenue source for major service providers on the Web. Ads classification is one of the main tasks in contextual advertising, and it is used to retrieve semantically relevant ads with respect to the content of web pages. However, it is difficult for traditional text classification methods to achieve satisfactory performance in ads classification due to the scarcity of term features in ads. In this paper, we propose a novel ads classification method that handles the lack of term features when classifying ads with short text. The proposed method utilizes a vocabulary expansion technique using semantic associations among terms learned from large-scale search query logs. The evaluation results show that our methodology achieves improvements of 4.0% to 9.7% in terms of the hierarchical F-measure over the baseline classifiers without vocabulary expansion.

Knowledge and Information Systems | 2012