Gae-won You

Pohang University of Science and Technology

Publication

Featured research published by Gae-won You.

ACM/IFIP/USENIX International Conference on Middleware | 2011

Scalable load balancing in cluster storage systems

Gae-won You; Seung-won Hwang; Navendu Jain

Enterprise and cloud data centers comprise tens of thousands of servers providing petabytes of storage to a large number of users and applications. At such a scale, these storage systems face two key challenges: (a) hot-spots due to the dynamic popularity of stored objects and (b) high reconfiguration costs of data migration due to bandwidth oversubscription in the data center network. Existing storage solutions, however, are unsuitable for addressing these challenges because of the large number of servers and data objects. This paper describes the design, implementation, and evaluation of Ursa, which scales to a large number of storage nodes and objects and aims to minimize latency and bandwidth costs during system reconfiguration. Toward this goal, Ursa formulates an optimization problem that selects a subset of objects from hot-spot servers and performs topology-aware migration to minimize reconfiguration costs. As exact optimization is computationally expensive, we devise scalable approximation techniques for node selection and efficient divide-and-conquer computation. Our evaluation shows that Ursa achieves cost-effective load balancing while scaling to large systems and is time-responsive in computing placement decisions, e.g., about two minutes for 10K nodes and 10M objects.
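
The abstract does not spell out Ursa's optimization, so the following is only a minimal greedy sketch of the general idea of relieving hot-spot servers through topology-aware migration; it is not Ursa's formulation, which solves an optimization problem with approximation and divide-and-conquer techniques. The function `rebalance`, the per-node `capacity` limit, and the hop-count table `hops` are hypothetical assumptions made for illustration.

```python
from typing import Dict, List, Tuple

def rebalance(
    objects: Dict[str, Tuple[str, float]],
    capacity: float,
    hops: Dict[Tuple[str, str], int],
    nodes: List[str],
) -> List[Tuple[str, str, str]]:
    """Greedily move objects off overloaded nodes, preferring nearby targets.

    `objects` maps object id -> (current node, load) and is updated in place.
    `hops` gives a hypothetical network distance for every ordered node pair.
    Returns the migrations as (object id, source node, target node).
    """
    load = {n: 0.0 for n in nodes}
    for node, obj_load in objects.values():
        load[node] += obj_load

    moves: List[Tuple[str, str, str]] = []
    for src in nodes:
        # Move the hottest objects first so that fewer migrations are needed.
        hot = sorted(
            (oid for oid, (n, _) in objects.items() if n == src),
            key=lambda oid: objects[oid][1],
            reverse=True,
        )
        for oid in hot:
            if load[src] <= capacity:
                break  # this node is no longer a hot-spot
            obj_load = objects[oid][1]
            # Only targets that can absorb the object without becoming hot-spots.
            targets = [t for t in nodes if t != src and load[t] + obj_load <= capacity]
            if not targets:
                continue
            # Topology-aware choice: fewest hops first, then the lightest node.
            dst = min(targets, key=lambda t: (hops[(src, t)], load[t]))
            load[src] -= obj_load
            load[dst] += obj_load
            objects[oid] = (dst, obj_load)
            moves.append((oid, src, dst))
    return moves
```

For example, with nodes `A`, `B`, `C`, a capacity of 6, objects of load 5 and 4 on `A` and load 1 on `B`, and `B` one hop from `A` while `C` is two hops away, the sketch moves only the load-5 object from `A` to `B`. A greedy pass like this carries no optimality guarantee; it only illustrates why the choice of which objects to move, and where, drives migration cost.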

Information Sciences | 2008

Search structures and algorithms for personalized ranking

Gae-won You; Seung-won Hwang

As data of unprecedented scale become accessible on the Web, personalization, that is, narrowing down retrieval to meet user-specific information needs, is becoming increasingly critical. For instance, while web search engines have traditionally retrieved the same results for all users, they have begun to offer beta services that personalize results by adapting to user-specific contexts such as prior search history or other application contexts. In contrast to search engines, which deal with unstructured text data, this paper studies how to enable such personalization in the context of structured data retrieval. In particular, we adopt a contextual ranking model to formalize personalization as a cost-based optimization over collected contextual rankings. With this formalism, personalization can be abstracted as the cost-optimal retrieval of the contextual ranking that most closely matches the user-specific retrieval context. With the matching context retrieved, we adopt a machine learning approach to effectively and efficiently identify the ideal personalized ranked results for this specific user. Our empirical evaluations over synthetic and real-life data validate both the efficiency and effectiveness of our framework.
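
To make the idea of cost-based retrieval over collected contextual rankings concrete, here is a minimal sketch that returns the stored ranking whose context is cheapest to match against the user's context. The weighted attribute-mismatch cost in `context_cost`, the function `personalize`, and the dictionary representation of contexts are hypothetical assumptions; the paper's actual cost model and its machine learning step are not reproduced.

```python
from typing import Dict, List, Tuple

Context = Dict[str, str]  # hypothetical: context attribute -> value

def context_cost(query: Context, stored: Context, weights: Dict[str, float]) -> float:
    """Hypothetical cost: weighted count of context attributes that disagree.

    Attributes without an explicit weight count as 1.0.
    """
    keys = set(query) | set(stored)
    return sum(weights.get(k, 1.0) for k in keys if query.get(k) != stored.get(k))

def personalize(
    query: Context,
    rankings: List[Tuple[Context, List[str]]],  # (context, ranked item ids)
    weights: Dict[str, float],
) -> List[str]:
    """Return the ranking collected under the cost-minimal matching context."""
    _, best = min(rankings, key=lambda r: context_cost(query, r[0], weights))
    return best
```

With a stored summer/family ranking and a winter/friends ranking, a winter/family query whose `season` attribute carries weight 2 retrieves the winter ranking, since mismatching `companion` (cost 1) is cheaper than mismatching `season` (cost 2).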

ACM Transactions on Storage | 2013

Ursa: Scalable Load and Power Management in Cloud Storage Systems

Gae-won You; Seung-won Hwang; Navendu Jain

Enterprise and cloud data centers comprise tens of thousands of servers providing petabytes of storage to a large number of users and applications. At such a scale, these storage systems face two key challenges: (1) hot-spots due to the dynamic popularity of stored objects and (2) high operational costs due to power and cooling. Existing storage solutions, however, are unsuitable for addressing these challenges because of the large number of servers and data objects. This article describes the design, implementation, and evaluation of Ursa, a system that scales to a large number of storage nodes and objects and aims to minimize latency and bandwidth costs during system reconfiguration. Toward this goal, Ursa formulates an optimization problem that selects a subset of objects from hot-spot servers and performs topology-aware migration to minimize reconfiguration costs. As exact optimization is computationally expensive, we devise scalable approximation techniques for node selection and efficient divide-and-conquer computation. We also show that the same dynamic reconfiguration techniques can be leveraged to reduce power costs by dynamically migrating data off under-utilized nodes and powering up servers neighboring existing hot-spots to reduce reconfiguration costs. Our evaluation shows that Ursa achieves cost-effective load management, is time-responsive in computing placement decisions (e.g., about two minutes for 10K nodes and 10M objects), and provides power savings of 15% to 37%.
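
The power-saving idea of migrating data off under-utilized nodes can be illustrated with a minimal greedy sketch. The function `consolidate`, the `capacity` limit, and the `low_watermark` threshold below which a node becomes a candidate for powering down are hypothetical; the sketch covers only the draining step, not the powering-up of servers near hot-spots.

```python
from typing import Dict, Set

def consolidate(load: Dict[str, float], capacity: float, low_watermark: float) -> Set[str]:
    """Return nodes that can be powered down after draining them.

    `load` maps node -> total load and is updated in place as load moves.
    Hypothetical policy: drain the least-utilized nodes first, and only if the
    remaining powered-on nodes have enough headroom to absorb all their load.
    """
    powered_off: Set[str] = set()
    for node in sorted(load, key=load.get):
        if load[node] >= low_watermark:
            continue  # busy enough to stay on
        receivers = [n for n in load if n != node and n not in powered_off]
        headroom = {n: max(0.0, capacity - load[n]) for n in receivers}
        if sum(headroom.values()) < load[node]:
            continue  # not enough spare capacity elsewhere; keep this node on
        # Pour the load into the receivers with the most headroom first.
        remaining = load[node]
        for n in sorted(receivers, key=headroom.get, reverse=True):
            moved = min(remaining, headroom[n])
            load[n] += moved
            remaining -= moved
            if remaining <= 0:
                break
        load[node] = 0.0
        powered_off.add(node)
    return powered_off
```

Because the least-utilized nodes are drained first, a node can receive load and later be drained itself, moving that data twice; a more careful policy would pick receivers only among nodes that will stay powered on.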

Information Sciences | 2012

Interactive skyline queries

Jongwuk Lee; Gae-won You; Seung-won Hwang; Joachim Selke; Wolf-Tilo Balke

When issuing user-specific queries, users often express vague and imprecise information needs. Skyline queries, with their intuitive query formulation, identify the most interesting objects under incomplete user preferences. However, the applicability of skyline queries suffers from a severe drawback: incomplete user preferences often lead to an impractically large skyline. To address this problem, we develop an interactive preference elicitation framework that iteratively updates the skyline as user preferences are collected at each iteration. In this process, the framework aims both to minimize user interaction and to maximize skyline reduction, while keeping query formulation intuitive. Users thus need only answer a few well-chosen questions generated by the framework. We validate the effectiveness and efficiency of our framework in extensive experimental settings and demonstrate that a few questions are enough to obtain a skyline of manageable size.
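
The effect described here, where incomplete preferences inflate the skyline and each answered question shrinks it, can be reproduced with a small sketch. It assumes a single categorical attribute whose values are only partially ordered by elicited pairs such as `("B", "C")`, meaning "B is preferred to C", plus numeric attributes where lower is better. The names `closure`, `dominates`, and `skyline` are hypothetical, and the framework's strategy for choosing which question to ask is not modeled.

```python
from itertools import product
from typing import List, Set, Tuple

# (categorical value, numeric attributes); lower numeric values are better.
Point = Tuple[str, Tuple[float, ...]]
Prefs = Set[Tuple[str, str]]  # (a, b) means "value a is preferred to value b"

def closure(prefs: Prefs) -> Prefs:
    """Transitive closure of the elicited preference pairs (assumed acyclic)."""
    result = set(prefs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(result), repeat=2):
            if b == c and (a, d) not in result:
                result.add((a, d))
                changed = True
    return result

def dominates(p: Point, q: Point, prefs: Prefs) -> bool:
    """p dominates q if it is at least as good everywhere and better somewhere.

    Categorical values with no known preference are incomparable.
    """
    cat_better = (p[0], q[0]) in prefs
    cat_ok = p[0] == q[0] or cat_better
    num_ok = all(x <= y for x, y in zip(p[1], q[1]))
    strictly = cat_better or any(x < y for x, y in zip(p[1], q[1]))
    return cat_ok and num_ok and strictly

def skyline(points: List[Point], prefs: Prefs) -> List[Point]:
    full = closure(prefs)
    return [q for q in points if not any(dominates(p, q, full) for p in points)]
```

With three options `("A", (300.0,))`, `("B", (250.0,))`, and `("C", (400.0,))`, an empty preference set leaves all three in the skyline because the categorical values are incomparable; once the user states that `B` is preferred to `C`, option `C` drops out.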

Information Sciences | 2016

Optimizing skyline queries over incomplete data

Jongwuk Lee; Hyeonseung Im; Gae-won You

Skyline queries have been widely used as an attractive operator in multi-criteria decision-making applications. Because the notion of skyline queries is intuitive, many skyline algorithms have been developed for various data settings. However, most skyline algorithms rely on the assumption of completeness, i.e., that all values of all points are known. In many cases this assumption does not hold, so conventional skyline algorithms cannot be applied. To handle incomplete data, existing work redefines the dominance notion over the subspace of dimensions common to both points. However, this can incur too many pairwise comparisons over incomplete data. To address this problem, we first propose a new sorting-based bucket skyline algorithm using two optimization techniques: bucket-level and point-level orders. For cases where too few or no skyline points exist over incomplete data, we develop a novel skyline ranking method that adjusts two user-specific parameters for retrieving meaningful skyline points. Lastly, we empirically evaluate the efficiency and effectiveness of our proposed algorithms over both synthetic and real-life datasets.
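
The redefined dominance that this abstract attributes to existing work, comparing two points only on the dimensions where both have known values, can be sketched as follows, together with the pairwise baseline whose cost motivates the bucket algorithm. This is not the paper's sorting-based bucket algorithm; using `None` to mark a missing value and treating lower values as better are assumptions.

```python
from typing import List, Optional, Sequence

Point = Sequence[Optional[float]]  # None marks a missing value; lower is better

def dominates_incomplete(p: Point, q: Point) -> bool:
    """Dominance over the common subspace of dimensions known in both points."""
    common = [(x, y) for x, y in zip(p, q) if x is not None and y is not None]
    return (
        bool(common)
        and all(x <= y for x, y in common)
        and any(x < y for x, y in common)
    )

def naive_incomplete_skyline(points: List[Point]) -> List[Point]:
    """Pairwise baseline with O(n^2) comparisons over all point pairs."""
    return [q for q in points if not any(dominates_incomplete(p, q) for p in points)]
```

Under this definition dominance need not be transitive and can even be cyclic: for `a = (1, None, 10)`, `b = (3, 2, None)`, and `c = (None, 5, 3)`, `a` dominates `b`, `b` dominates `c`, and `c` dominates `a`, so the skyline is empty. This is one way the "too few or no skyline points" situation mentioned in the abstract can arise.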

ACM Symposium on Applied Computing | 2007

Personalized ranking: a contextual ranking approach

Gae-won You; Seung-won Hwang

As data of unprecedented scale become accessible on the Web, personalization, that is, narrowing down retrieval to meet user-specific information needs, is becoming increasingly critical. For instance, in text retrieval, whereas traditional web search engines retrieve the same results for all users, major commercial search engines are starting to support personalization, improving search quality by adapting to user-specific retrieval contexts, e.g., prior search history or other application contexts. This paper studies how to enable such personalization in the context of structured data retrieval. In particular, we adopt a context-sensitive ranking model to formalize personalization as a cost-based optimization over collected context-sensitive rankings. With this formalism, personalization essentially amounts to retrieving the context-sensitive ranking that matches the specific user's retrieval context and generating a personalized ranking accordingly. To this end, we adopt a machine learning approach to effectively and efficiently identify the ideal personalized ranked results for this specific user. Our empirical evaluations over real-life data validate both the effectiveness and efficiency of our framework.

World Wide Web | 2013

SocialSearch+: enriching social network with web evidences

Gae-won You; Jin-woo Park; Seung-won Hwang; Zaiqing Nie; Ji-Rong Wen

This paper introduces the problem of searching for social network accounts, e.g., Twitter accounts, using the rich information available on the Web, e.g., people's names, attributes, and relationships to other people. For this purpose, we need to map Twitter accounts to Web entities. However, existing solutions building upon naive textual matching inevitably suffer from low precision due to false positives (e.g., fake impersonator accounts) and low recall due to false negatives (e.g., accounts using nicknames). To overcome these limitations, we leverage "relational" evidence extracted from the Web corpus. We consider two types of evidence resources. First, web-scale entity relationship graphs extracted from name co-occurrences crawled from the Web; this co-occurrence relationship can be interpreted as an "implicit" counterpart of Twitter follower relationships. Second, web-scale relational repositories such as Freebase, which offer complementary strengths. Using both textual and relational features obtained from these resources, we learn a ranking function that aggregates these features to accurately order candidate matches. Another key contribution of this paper is formulating confidence scoring as a problem separate from relevance ranking. A baseline approach uses the relevance of the top match itself as the confidence score. In contrast, we train a separate classifier using not only the top relevance score but also various statistical features extracted from the relevance scores of all candidates, and we empirically validate that our approach outperforms the baseline. We evaluate our proposed system using real-life, Internet-scale entity-relationship and social network graphs.
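
The separation between relevance ranking and confidence scoring can be illustrated by the kind of input a confidence classifier might consume. The specific features below (top score, gap to the runner-up, mean, population standard deviation, and candidate count) and the function name `confidence_features` are hypothetical choices, not the paper's feature set, and the classifier itself is omitted.

```python
import statistics
from typing import Dict, List

def confidence_features(scores: List[float]) -> Dict[str, float]:
    """Hypothetical statistical features over the relevance scores of all
    candidate matches (assumes at least one candidate). A separately trained
    classifier, not shown, would map these features to a confidence value."""
    ranked = sorted(scores, reverse=True)
    top = ranked[0]
    second = ranked[1] if len(ranked) > 1 else 0.0
    return {
        "top": top,  # the baseline confidence uses only this value
        "gap": top - second,  # margin over the runner-up candidate
        "mean": statistics.mean(ranked),
        "stdev": statistics.pstdev(ranked),
        "n_candidates": float(len(ranked)),
    }
```

Two candidate lists with the same top score, `[0.9, 0.85, 0.1]` and `[0.9, 0.2, 0.1]`, receive identical baseline confidence, yet the much larger `gap` in the second list suggests a less ambiguous match; the top score alone cannot capture this difference.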

ACM Transactions on Information Systems | 2012

Efficient Entity Translation Mining: A Parallelized Graph Alignment Approach

Gae-won You; Seung-won Hwang; Young-In Song; Long Jiang; Zaiqing Nie

This article studies the problem of mining entity translations, specifically English and Chinese name pairs. Existing efforts can be categorized into (a) transliteration-based approaches that leverage phonetic similarity and (b) corpus-based approaches that exploit bilingual co-occurrences. These approaches suffer from inaccuracy and scarcity, respectively. In contrast, we use the under-leveraged resource of monolingual entity co-occurrences crawled from entity search engines, represented as two entity-relationship graphs extracted from the corpora of the two languages. Our problem is then abstracted as finding correct mappings across the two graphs. To achieve this goal, we propose a holistic approach that exploits both transliteration similarity and monolingual co-occurrences. This approach, which builds upon monolingual corpora, complements existing corpus-based work that requires scarce parallel or comparable corpora, while significantly boosting the accuracy of transliteration-based work. In addition, by parallelizing the mapping process on multicore architectures, we speed up the computation by more than 10 times per unit accuracy. We validated the effectiveness and efficiency of our proposed approach using real-life datasets.
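
As a rough illustration of combining transliteration similarity with monolingual co-occurrence structure, the sketch below propagates scores between candidate pairs in the style of similarity flooding, a generic graph-matching technique substituted here for the paper's method. The candidate similarity table `translit`, the mixing weight `alpha`, the fixed iteration count, and the function name `align` are hypothetical, and the multicore parallelization is not shown.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]  # entity -> co-occurring entities in one language

def align(
    g_en: Graph,
    g_zh: Graph,
    translit: Dict[Tuple[str, str], float],
    alpha: float = 0.5,
    iterations: int = 10,
) -> Dict[str, str]:
    """Return the best-scoring Chinese match for each English entity.

    A pair's score mixes its transliteration similarity (weight `alpha`) with
    structural support: for each English neighbor, the best score it has with
    any Chinese neighbor of the candidate, averaged over English neighbors.
    Only pairs listed in `translit` are scored.
    """
    score = dict(translit)
    for _ in range(iterations):
        new: Dict[Tuple[str, str], float] = {}
        for (e, c), sim in translit.items():
            ne, nc = g_en.get(e, []), g_zh.get(c, [])
            support = 0.0
            if ne:
                support = sum(
                    max((score.get((x, y), 0.0) for y in nc), default=0.0) for x in ne
                ) / len(ne)
            new[(e, c)] = alpha * sim + (1 - alpha) * support
        score = new
    best: Dict[str, Tuple[float, str]] = {}
    for (e, c), s in score.items():
        if e not in best or s > best[e][0]:
            best[e] = (s, c)
    return {e: c for e, (_, c) in best.items()}
```

In a toy setup, the English graph links `Bill Gates` and `Microsoft`; on the Chinese side, the hypothetical entity ids `ZH_GATES` and `ZH_MICROSOFT` are linked, as are `ZH_DECOY` and `ZH_OTHER_CO`. Suppose `Bill Gates` has similarity 0.6 to `ZH_GATES` and 0.7 to `ZH_DECOY`, and `Microsoft` has 0.5 to `ZH_MICROSOFT` and 0.1 to `ZH_OTHER_CO`. With `alpha=0.5`, `Bill Gates` still maps to `ZH_GATES`, because its neighbor aligns well with `Microsoft`; with `alpha=1.0` the structure is ignored and `ZH_DECOY` wins.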

Information Sciences | 2008

Supporting personalized ranking over categorical attributes

Gae-won You; Seung-won Hwang; Hwanjo Yu

This paper studies how to enable effective ranked retrieval over data with categorical attributes, in particular by supporting personalized ranked retrieval of highly relevant data. While ranked retrieval has been actively studied lately, existing efforts have focused only on ranking over numerical or text data. However, many real-life datasets contain a large number of categorical attributes, in combination with numerical and text attributes, which existing methods cannot efficiently support: unlike numerical attributes, which have an inherent natural ordering, categorical attributes have no such ordering, which complicates both the formulation and the processing of ranking. This paper studies the efficient and effective support of ranking over categorical data, as well as its uniform support with other types of attributes.
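
One simple way to make categorical values rankable is to let each user attach preference scores to values and combine them with weighted numeric attributes. The sketch below does exactly that with a full scan; the linear scoring function, the names `top_k`, `cat_prefs`, and `num_weights`, and the default score of 0 for unlisted values are hypothetical assumptions, not the paper's formulation or its efficient processing method.

```python
import heapq
from typing import Any, Dict, List

Row = Dict[str, Any]

def top_k(
    rows: List[Row],
    cat_prefs: Dict[str, Dict[str, float]],
    num_weights: Dict[str, float],
    k: int,
) -> List[Row]:
    """Return the k best rows under a hypothetical personalized linear score.

    `cat_prefs` maps a categorical attribute to the user's score for each value
    (unlisted values score 0); `num_weights` weights numeric attributes, where
    larger values are better. This is a full scan, not an optimized top-k method.
    """
    def score(row: Row) -> float:
        s = sum(prefs.get(row[attr], 0.0) for attr, prefs in cat_prefs.items())
        return s + sum(w * float(row[attr]) for attr, w in num_weights.items())

    return heapq.nlargest(k, rows, key=score)
```

For three cars, a Volvo (red, 30 mpg), a BMW (black, 25 mpg), and a Kia (red, 40 mpg), a user who scores `Volvo` at 2, `BMW` at 1, and `red` at 0.5, with an `mpg` weight of 0.05, gets the Volvo (score 4.0) and then the Kia (2.5) as the top two, ahead of the BMW (2.25).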

Distributed and Parallel Databases | 2009

Ranking strategies and threats: a cost-based pareto optimization approach

Youngdae Kim; Gae-won You; Seung-won Hwang

Skyline queries have gained attention as an effective way to identify desirable objects that are "not dominated" by any other object in the dataset. From a market perspective, such objects are favored as Pareto-optimal choices, as each such object has at least one competitive edge against every other object, i.e., it is not dominated. In other words, non-skyline objects have room for Pareto-optimal improvements toward a more favorable market position. The goal of this paper is to identify, for such non-skyline objects, the cost-minimal Pareto-optimal improvement strategy. More specifically, we abstract this problem as a mixed integer programming problem and develop a novel algorithm for efficiently identifying the optimal solution. In addition, the problem can be reversed to identify, for a skyline product, the top-k threats: objects that could become competitors after Pareto-optimal improvements with the k lowest costs. Through extensive experiments using synthetic and real-life datasets, we show that our proposed framework is both efficient and scalable.
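
The paper solves the improvement problem as a mixed integer program; as a small-scale illustration only, the sketch below brute-forces the cost-minimal improvement for one non-skyline object. It assumes lower values are better, non-negative per-unit improvement costs `unit_cost`, and that escaping a dominator means undercutting it by a small hypothetical `margin` in at least one dimension; the function names are also hypothetical. Only the current dominators need to be considered: any object that dominates an improved point also dominated the original one.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, ...]  # lower is better in every dimension

def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    return all(x <= y for x, y in zip(p, q)) and any(x < y for x, y in zip(p, q))

def cheapest_improvement(
    target: Vec, others: List[Vec], unit_cost: Vec, margin: float = 1e-6
) -> Tuple[float, Vec]:
    """Brute-force the cost-minimal improvement that leaves `target` undominated.

    Coordinates may only decrease. Candidate values per dimension are the
    current value or just below some dominator's value, so the search is exact
    over this grid but exponential in the number of dimensions.
    """
    dominators = [d for d in others if dominates(d, target)]
    candidates = [
        sorted({target[i]} | {d[i] - margin for d in dominators})
        for i in range(len(target))
    ]
    best: Optional[Tuple[float, Vec]] = None
    for point in product(*candidates):
        if any(dominates(d, point) for d in dominators):
            continue
        cost = sum(c * (t - v) for c, t, v in zip(unit_cost, target, point))
        if best is None or cost < best[0]:
            best = (cost, tuple(point))
    # The corner that undercuts every dominator in every dimension always escapes.
    assert best is not None
    return best
```

For a target `(5.0, 5.0)` dominated only by `(4.0, 4.0)`, with unit costs `(1.0, 2.0)`, the cheapest escape lowers the first coordinate to just below 4 at a cost of about 1. The search grows as the number of candidates per dimension raised to the number of dimensions, which is one reason a dedicated optimization approach is preferable at realistic sizes.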
