Dongyeop Kang
KAIST
Publication
Featured research published by Dongyeop Kang.
web search and data mining | 2011
Dongyeop Kang; Daxin Jiang; Jian Pei; Zhen Liao; Xiaohui Sun; Ho-Jin Choi
In addition to search queries and the corresponding clickthrough information, search engine logs record multidimensional information about user search activities, such as search time, location, vertical, and search device. Multidimensional mining of search logs can provide novel insights and useful knowledge for both search engine users and developers. In this paper, we describe our topic-concept cube project, which addresses the business need of supporting multidimensional mining of search logs effectively and efficiently. We answer two challenges. First, search queries and clickthrough data are well recognized as sparse, and thus have to be aggregated properly for effective analysis. Second, there is often a gap between the topic hierarchies in multidimensional aggregate analysis and queries in search logs. To address those challenges, we develop a novel topic-concept model that learns a hierarchy of concepts and topics automatically from search logs. Enabled by the topic-concept model, we construct a topic-concept cube that supports online multidimensional mining of search log data. A distinct feature of our approach is that, in addition to the standard dimensions such as time and location, our topic-concept cube has a dimension of topics and concepts, which substantially facilitates the analysis of log data. To handle a huge amount of log data, we develop distributed algorithms for learning model parameters efficiently. We also devise approaches to computing a topic-concept cube. We report an empirical study verifying the effectiveness and efficiency of our approach on a real data set of 1.96 billion queries and 2.73 billion clicks.
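As a rough illustration of what multidimensional aggregation with a topic dimension can look like, the sketch below rolls up click counts over every subset of three dimensions. The record layout, the sample values, and the names `LOGS`, `DIMENSIONS`, and `build_cube` are assumptions made for this example; it does not reproduce the paper's topic-concept model or its distributed cube computation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical log records: (hour, location, topic, clicks).
# In the paper the topic would come from the learned topic-concept model;
# here it is simply given.
LOGS = [
    (9, "US", "sports", 3),
    (9, "US", "travel", 1),
    (10, "KR", "sports", 2),
    (10, "US", "sports", 4),
]
DIMENSIONS = ("hour", "location", "topic")


def build_cube(logs):
    """Sum clicks for every subset of DIMENSIONS (one cuboid per subset)."""
    cube = {}
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(range(len(DIMENSIONS)), size):
            cuboid = defaultdict(int)
            for record in logs:
                # Group by the selected dimensions only.
                key = tuple(record[i] for i in dims)
                cuboid[key] += record[-1]
            names = tuple(DIMENSIONS[i] for i in dims)
            cube[names] = dict(cuboid)
    return cube


cube = build_cube(LOGS)
# Roll-up by topic alone, and drill-down by location and topic.
by_topic = cube[("topic",)]
by_location_topic = cube[("location", "topic")]
```

Materializing every cuboid is only practical for a handful of dimensions; it is used here to keep the example short.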
conference on information and knowledge management | 2014
Dongyeop Kang; Woosang Lim; Kijung Shin; Lee Sael; U Kang
How can we scale up logistic regression, or L1 regularized loss minimization in general, for terabyte-scale data that do not fit in memory? How can we design an efficient distributed algorithm? Although there exist two major algorithms for logistic regression, namely Stochastic Gradient Descent (SGD) and Stochastic Coordinate Descent (SCD), they face limitations in distributed environments. Distributed SGD enables data parallelism (i.e., different machines access different parts of the input data), but it does not allow feature parallelism (i.e., different machines compute different subsets of the output), and thus the communication cost is high. On the other hand, Distributed SCD allows feature parallelism, but it does not allow data parallelism and thus is not suitable to work in distributed environments. In this paper we propose DF-DSCD (Data/Feature Distributed Stochastic Coordinate Descent), an efficient distributed algorithm for logistic regression, or L1 regularized loss minimization in general. DF-DSCD allows both data and feature parallelism. The benefits of DF-DSCD are (a) full utilization of the capabilities provided by modern distributed computing platforms like MapReduce to analyze web-scale data, and (b) independence of each machine in updating parameters with little communication cost. We prove the convergence of DF-DSCD theoretically, and also show empirical evidence that it is scalable, handles very high-dimensional data with up to 29 million features, and converges 2.2 times faster than competitors.
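For readers unfamiliar with the building block, the following is a minimal single-machine sketch of stochastic coordinate descent for L1 regularized logistic regression. It is not DF-DSCD: the data and feature partitioning across machines is not shown. The toy data, the function names, and the choice of a fixed curvature bound of 1/4 (valid when every feature lies in [-1, 1]) are assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step for L1)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def objective(X, y, w, lam):
    """Mean logistic loss (labels in {0, 1}) plus lam * ||w||_1."""
    loss = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi))
        loss += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - yi * z
    return loss / len(X) + lam * sum(abs(wj) for wj in w)


def scd_logistic(X, y, lam=0.01, steps=2000, seed=0):
    """Update one randomly chosen coordinate per step."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    z = [0.0] * n   # cached margins w . x_i
    beta = 0.25     # assumed curvature bound: features lie in [-1, 1]
    for _ in range(steps):
        j = rng.randrange(d)
        grad = sum(X[i][j] * (sigmoid(z[i]) - y[i]) for i in range(n)) / n
        new_wj = soft_threshold(w[j] - grad / beta, lam / beta)
        delta = new_wj - w[j]
        if delta != 0.0:
            for i in range(n):
                z[i] += delta * X[i][j]
            w[j] = new_wj
    return w


# Toy data: feature 0 is informative, feature 1 is noise, feature 2 is a bias.
X = [[1.0, 0.2, 1.0], [0.8, -0.5, 1.0], [-1.0, 0.3, 1.0],
     [-0.7, -0.4, 1.0], [0.9, 0.1, 1.0], [-0.9, -0.1, 1.0]]
y = [1, 1, 0, 0, 1, 0]
w = scd_logistic(X, y)
```

Each step touches a single weight, which is what makes feature parallelism natural for coordinate descent; computing `grad` still scans every example, which hints at why data parallelism needs extra design.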
international conference on data mining | 2014
Dongyeop Kang; Donggyun Han; Nahea Park; Sangtae Kim; U Kang; Soobin Lee
Given massive heterogeneous online media, how can we summarize events, and discover causal relationships among them, in real time? Indeed, we are living in a deluge of information: every day, hundreds of thousands of news articles are published, millions of postings are written on social media and internet forums, and billions of search queries are generated by Internet users. To convey news events that interest users, along with their big pictures, for better understanding, building a real-time event recommendation system is indispensable. Our proposed system, Eventera, aggregates massive online media from heterogeneous channels, summarizes them into events, discovers meaningful associations by bridging the events, and generates a sequence map of events that provides a big picture of how real-life events interact with each other over time. We demonstrate how our system helps users understand events and their causal relationships effectively.
Journal of the Korea society of IT services | 2013
Joon-Young Park; Soobin Lee; Dongyeop Kang; YoungTae Seok
The rapid growth and dissemination of touch-based mobile devices such as smartphones and tablet PCs gives numerous benefits to people using a variety of multimedia content. Thanks to their portability, these devices enable users to watch a soccer game, search for videos on YouTube, and sometimes tag content while on the road. However, the limited screen size of mobile devices, and the touch-based character input methods that depend on it, are still major obstacles to searching and tagging multimedia content. In this paper, we propose WalkieTagging, which provides a much more intuitive approach than previous ones. Like previous video tagging services, WalkieTagging, as a voice-based annotation service, supports inserting detailed annotation data, including start time, duration, and tags, with little effort from users. To evaluate our method, we developed an Android-based WalkieTagging application and conducted a two-week user study. In our experiments with a total of 46 people, we observed that participants found our system more convenient and useful than a touch-based one. Consequently, we found that voice-based annotation methods can provide users with more convenience and satisfaction than touch-based methods in mobile environments.
conference on recommender systems | 2015
Jung-Woo Ha; Dongyeop Kang; Hyuna Pyo; Jeonghee Kim
empirical methods in natural language processing | 2017
Dongyeop Kang; Varun Gangal; Ang Lu; Zheng Chen; Eduard H. Hovy
european conference on machine learning | 2014
Dongyeop Kang; Youngja Park; Suresh Chari
meeting of the association for computational linguistics | 2018
Dongyeop Kang; Tushar Khot; Ashish Sabharwal; Eduard H. Hovy
north american chapter of the association for computational linguistics | 2018
Dongyeop Kang; Waleed Ammar; Bhavana Dalvi; Madeleine van Zuylen; Sebastian Kohlmeier; Eduard H. Hovy; Roy Schwartz
national conference on artificial intelligence | 2018
Chu-Cheng Lin; Patrick Pantel; Michael Gamon; Dongyeop Kang