Chulyun Kim
Gachon University
Publications
Featured research published by Chulyun Kim.
Information Systems | 2011
Won Ho Kim; Ok-Ran Jeong; Chulyun Kim; Jungmin So
The Internet and Web technologies were originally developed assuming an ideal world in which all users are honorable. However, a dark side has emerged and bedeviled the world. It includes spam, malware, hacking, phishing, denial-of-service attacks, click fraud, invasion of privacy, defamation, fraud, violation of digital property rights, etc. Responses to the dark side of the Internet have included technologies, legislation, law enforcement, litigation, public awareness efforts, etc. In this paper, we explore and provide taxonomies of the causes and costs of the attacks and of the types of responses to them.
Journal of Systems and Software | 2007
Chulyun Kim; Jong-Hwa Lim; Raymond T. Ng; Kyuseok Shim
Discovering sequential patterns is an important problem for many applications. Existing algorithms find qualitative sequential patterns, in the sense that only items are included in the patterns. However, in many applications, such as business and scientific ones, quantitative attributes are often recorded in the data, and existing algorithms ignore them. Quantity information included in mined sequential patterns can provide useful insight to users. In this paper, we consider the problem of mining sequential patterns with quantities. We demonstrate that naive extensions of existing algorithms for sequential patterns are inefficient, as they may enumerate the search space blindly. To alleviate the situation, we propose hash filtering and quantity sampling techniques that significantly improve the performance of the naive extensions. Experimental results confirm that, compared with the naive extensions, these schemes not only substantially improve execution time but also scale better for sequential patterns with quantities.
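To make the problem concrete, here is a minimal sketch of what containment and support might look like for a sequential pattern with quantities. It assumes a representation that is not taken from the paper: each transaction maps an item to an integer quantity, and each pattern element constrains an item's quantity to an inclusive range. It shows only the basic support test, not the hash filtering or quantity sampling techniques.

```python
from typing import Dict, List, Tuple

# Assumed representation (not from the paper):
# a transaction maps item -> purchased quantity,
Transaction = Dict[str, int]
# a data sequence is a time-ordered list of transactions,
Sequence = List[Transaction]
# and a pattern element maps item -> inclusive (min_qty, max_qty) range.
PatternElement = Dict[str, Tuple[int, int]]
Pattern = List[PatternElement]


def element_matches(elem: PatternElement, txn: Transaction) -> bool:
    """True if every item of elem occurs in txn with a quantity in its range."""
    return all(item in txn and lo <= txn[item] <= hi
               for item, (lo, hi) in elem.items())


def contains(seq: Sequence, pattern: Pattern) -> bool:
    """Subsequence test: pattern elements must match transactions in order.
    Matching each element at the earliest possible transaction is safe."""
    i = 0
    for txn in seq:
        if i < len(pattern) and element_matches(pattern[i], txn):
            i += 1
    return i == len(pattern)


def support(db: List[Sequence], pattern: Pattern) -> float:
    """Fraction of data sequences in db that contain the pattern."""
    return sum(contains(s, pattern) for s in db) / len(db)


db = [
    [{"bread": 2}, {"milk": 1, "eggs": 12}],
    [{"bread": 5}, {"milk": 3}],
    [{"milk": 1}, {"bread": 2}],
]
pattern = [{"bread": (1, 3)}, {"milk": (1, 2)}]
print(support(db, pattern))  # only the first sequence matches, so support is 1/3
```

A naive miner would run a check like this for many candidate quantity ranges per item. That is the kind of blind enumeration the abstract says hash filtering and quantity sampling are meant to prune.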
IEEE Transactions on Knowledge and Data Engineering | 2011
Chulyun Kim; Kyuseok Shim
The World Wide Web is the most useful source of information. To achieve high publishing productivity, many websites populate their web pages automatically from common templates filled with content. The templates give readers easy access to the content through consistent structures. For machines, however, templates are considered harmful, since the irrelevant terms they contain degrade the accuracy and performance of web applications. Thus, template detection techniques have recently received a lot of attention as a way to improve the performance of search engines and the clustering and classification of web documents. In this paper, we present novel algorithms for extracting templates from a large number of web documents generated from heterogeneous templates. We cluster the web documents based on the similarity of the underlying template structures in the documents, so that the template for each cluster is extracted simultaneously. We develop a novel goodness measure with a fast approximation for clustering and provide a comprehensive analysis of our algorithm. Our experimental results with real-life data sets confirm the effectiveness and robustness of our algorithm compared with state-of-the-art template detection algorithms.
International Journal of Web Information Systems | 2011
Ok-Ran Jeong; Chulyun Kim; Won Ho Kim; Jungmin So
Purpose – A botnet is a network of computers on the Internet infected with software robots (or bots). There are numerous botnets, and some of them control millions of computers. Cyber criminals use botnets to send spam e-mails, launch denial-of-service attacks, and commit click fraud and data theft. Governments use botnets for political purposes or to wage cyber warfare. The purpose of this paper is to review botnet threats and the responses to them.
Design/methodology/approach – The paper describes how botnets are created and operated. It then discusses botnets in terms of architecture, attacking behaviors, communication protocols, observable botnet activities, rally mechanisms, and evasion techniques. Finally, it reviews state-of-the-art techniques for detecting and counteracting botnets, as well as legal responses to botnet threats.
Findings – Botnets have become the platform for many online threats such as spam, denial of service attacks, phishing, data thefts, and online fr...
International Conference on Data Engineering | 2004
Chulyun Kim; Jong-Hwa Lim; Raymond T. Ng; Kyuseok Shim
In this paper, we consider the problem of mining sequential patterns with quantities. Naive extensions to existing algorithms for sequential patterns are inefficient, as they may enumerate the search space blindly. To alleviate the situation, we propose hash filtering and quantity sampling techniques that significantly improve the performance of the naive extensions.
Information Systems | 2015
Chulyun Kim; Kyuseok Shim
NoSQL systems are increasingly adopted for Web applications requiring scalability that relational database systems cannot provide. Although NoSQL systems were not designed to support joins, the need for joins has emerged as these systems are applied to a wide variety of applications. Furthermore, joins performed in NoSQL systems are generally similarity joins, which find similar pairs of records, rather than exact-match joins. Since Web applications often use the MapReduce framework, we develop a solution to perform similarity joins in NoSQL systems using the MapReduce framework.
Author highlights:
We developed a set-similarity join solution in NoSQL using MapReduce.
Our set-similarity join algorithm can avoid redundant comparisons between join attribute values in the MapReduce framework.
We substantially decreased the amount of network traffic in the MapReduce framework.
We reduced the number of comparisons needed to find all similar pairs by extending the prefix filtering technique to the MapReduce framework.
Our solution achieved up to an order of magnitude improvement in performance over the most efficient existing solution.
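As a rough illustration of the prefix filtering idea mentioned in the highlights, the following single-process sketch simulates a map phase that groups records by their prefix tokens and a reduce phase that verifies candidate pairs with Jaccard similarity. It is a generic version of prefix filtering under hypothetical names (`similarity_join`, `prefix_length`), not the authors' algorithm. In particular, a non-matching pair that shares several prefix tokens is still compared once per shared token, which is the kind of redundant comparison their solution is said to avoid.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b)


def prefix_length(size: int, t: float) -> int:
    # Two sets with Jaccard >= t must share a token among the first
    # size - ceil(t * size) + 1 tokens under a common global ordering.
    # The tiny epsilon guards against float error; it can only lengthen the prefix.
    return size - math.ceil(t * size - 1e-9) + 1


def similarity_join(records: Dict[str, Set[str]], t: float) -> Set[Tuple[str, str]]:
    """Return all id pairs (a, b), a < b, whose token sets have Jaccard >= t."""
    # Global token order: rarest first, so prefixes contain selective tokens.
    freq = Counter(tok for toks in records.values() for tok in toks)

    # "Map": emit (prefix token -> record id).
    groups: Dict[str, List[str]] = defaultdict(list)
    for rid, toks in records.items():
        ordered = sorted(toks, key=lambda tok: (freq[tok], tok))
        for tok in ordered[:prefix_length(len(ordered), t)]:
            groups[tok].append(rid)

    # "Reduce": verify candidate pairs inside each group. An accepted pair is
    # not re-verified, but a non-matching pair that shares several prefix
    # tokens is still compared once per shared token.
    result: Set[Tuple[str, str]] = set()
    for rids in groups.values():
        for a, b in combinations(sorted(rids), 2):
            if (a, b) not in result and jaccard(records[a], records[b]) >= t:
                result.add((a, b))
    return result
```

Ordering tokens from rarest to most frequent is the usual choice because it keeps the prefix groups, and therefore the candidate pairs sent to each reducer, small.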
International Journal of Data Warehousing and Mining | 2014
Won Kim; Ok-Ran Jeong; Chulyun Kim
Today there is much hype about big data. The discussions seem to revolve around data mining technology, social Web data, and open-source NoSQL and Hadoop platforms. However, database, data warehouse, and OLAP technologies are also integral parts of big data. Big data involves data from all sources, not just social Web data. Further, big data requires not only technology but also a painstaking process for identifying, collecting, and preparing sufficient amounts of relevant data. This paper provides a holistic view of big data.
Information Systems | 2011
Chulyun Kim; Hui Miao; Kyuseok Shim
As the Internet flourishes, online advertising has become essential to marketing campaigns for business applications. To run a marketing campaign, advertisers provide their advertisements to Internet publishers, and commissions are paid to the publishers based on the clicks on the posted advertisements or on purchases of the advertised products. Since the payment given to a publisher is proportional to the number of clicks received on the advertisements it posts, dishonest publishers are motivated to inflate the number of clicks on the advertisements hosted on their web sites. Because click fraud threatens the reliability of online advertising, online advertisers make efforts to prevent it effectively. However, the methods used for click fraud are also becoming more complex and sophisticated. In this paper, we study the problem of detecting coalition attacks of click fraud. Coalition attacks are among the latest sophisticated click-fraud techniques, because by joining a coalition, fraudsters can obtain more gain with a lower probability of being detected. We introduce new definitions for coalitions and propose a novel algorithm called CATCH to find such coalitions. Extensive experiments with synthetic and real-life data sets confirm that our notion of coalition allows us to detect coalitions much more effectively than that of previous work.
Information Integration and Web-based Applications & Services | 2010
Won Ho Kim; Ok-Ran Jeong; Chulyun Kim; Jungmin So
A botnet is a network of computers on the Internet infected with software robots, or bots. There are numerous botnets, and some of them control millions of computers. Botnets have become the platform for the scourges of the Internet, namely spam e-mails, denial-of-service attacks, click fraud, theft of sensitive information, cyber sabotage, cyber warfare, etc. In this paper, we review the status of botnets, how they work, and how they may be defeated.
Multimedia Tools and Applications | 2015
Chulyun Kim
As data sizes grow much larger, distributed data processing is becoming very important for managing huge amounts of data. MapReduce, well known as Google's data processing environment, is the most popular distributed platform, offering good scalability and fault tolerance. Many traditional algorithms designed for a single machine are being adapted to the MapReduce platform. In this paper, we analyze a novel algorithm to generate wavelet synopses on the distributed MapReduce framework. The wavelet synopsis is one of the most popular dimensionality reduction methods and has been studied in various areas such as query optimization, approximate query answering, and feature selection. In the proposed algorithm, the wavelet synopsis can be computed in a single MapReduce phase, and, by minimizing the amount of data communicated through the network of the distributed MapReduce platform, all computations run in almost linear time. We theoretically study the properties of constructing wavelet synopses on partitioned data sets and the correctness of the proposed algorithm.
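The sketch below illustrates unnormalized Haar wavelet decomposition and one property that makes partition-based construction possible. When the data is split into aligned blocks whose size is a power of two, every detail coefficient finer than the block size depends on only one block, so each "mapper" can compute those coefficients locally and pass on just its block average. The function names, the simulated map/reduce split, and the top-B selection rule (largest coefficients after normalizing by level) are assumptions for illustration. This is not presented as the paper's single-phase algorithm.

```python
from typing import Dict, List


def haar(data: List[float]) -> List[float]:
    """Unnormalized Haar decomposition; len(data) must be a power of two.
    Returns [overall average, coarsest detail, ..., finest details]."""
    level = list(data)
    details: List[float] = []
    while len(level) > 1:
        pairs = range(0, len(level), 2)
        avgs = [(level[i] + level[i + 1]) / 2 for i in pairs]
        dets = [(level[i] - level[i + 1]) / 2 for i in pairs]
        details = dets + details  # coarser levels are placed before finer ones
        level = avgs
    return level + details


def haar_partitioned(data: List[float], num_parts: int) -> List[float]:
    """Same result as haar(data), built from aligned partitions.
    Assumes num_parts is a power of two that divides len(data)."""
    size = len(data) // num_parts
    # "Map": each partition decomposes its own block locally.
    local = [haar(data[p * size:(p + 1) * size]) for p in range(num_parts)]
    # "Reduce": the coarse coefficients depend only on the block averages.
    result = haar([coeffs[0] for coeffs in local])
    # Splice in local details level by level; in each local list, the level
    # with 2**lvl coefficients occupies indices [2**lvl, 2**(lvl + 1)).
    lvl = 0
    while 2 ** lvl < size:
        for coeffs in local:
            result.extend(coeffs[2 ** lvl:2 ** (lvl + 1)])
        lvl += 1
    return result


def synopsis(coeffs: List[float], budget: int) -> Dict[int, float]:
    """Keep the `budget` coefficients with the largest magnitude after dividing
    each by sqrt(2**level), where level 0 is the coarsest resolution."""
    def weight(i: int) -> float:
        level = 0 if i == 0 else i.bit_length() - 1
        return abs(coeffs[i]) / (2 ** level) ** 0.5
    keep = sorted(range(len(coeffs)), key=weight, reverse=True)[:budget]
    return {i: coeffs[i] for i in sorted(keep)}
```

For the classic input [2, 2, 0, 2, 3, 5, 4, 4], `haar` returns [2.75, -1.25, 0.5, 0, 0, -1, -1, 0], and `haar_partitioned` produces the same coefficients with 2 or 4 partitions. Only one average per partition has to cross the (simulated) network to obtain the coarse coefficients.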