Chulyun Kim | Researchain

Publication

Featured research published by Chulyun Kim.

Information Systems | 2011

The dark side of the Internet: Attacks, costs and responses

Won Ho Kim; Ok-Ran Jeong; Chulyun Kim; Jungmin So

The Internet and Web technologies have originally been developed assuming an ideal world where all users are honorable. However, the dark side has emerged and bedeviled the world. This includes spam, malware, hacking, phishing, denial of service attacks, click fraud, invasion of privacy, defamation, frauds, violation of digital property rights, etc. The responses to the dark side of the Internet have included technologies, legislation, law enforcement, litigation, public awareness efforts, etc. In this paper, we explore and provide taxonomies of the causes and costs of the attacks, and types of responses to the attacks.

Journal of Systems and Software | 2007

SQUIRE: Sequential pattern mining with quantities

Chulyun Kim; Jong-Hwa Lim; Raymond T. Ng; Kyuseok Shim

Discovering sequential patterns is an important problem for many applications. Existing algorithms find qualitative sequential patterns in the sense that only items are included in the patterns. However, for many applications, such as business and scientific applications, quantitative attributes are often recorded in the data, which are ignored by existing algorithms. Quantity information included in the mined sequential patterns can provide useful insight to the users. In this paper, we consider the problem of mining sequential patterns with quantities. We demonstrate that naive extensions to existing algorithms for sequential patterns are inefficient, as they may enumerate the search space blindly. To alleviate the situation, we propose hash filtering and quantity sampling techniques that significantly improve the performance of the naive extensions. Experimental results confirm that compared with the naive extensions, these schemes not only improve the execution time substantially but also show better scalability for sequential patterns with quantities.

IEEE Transactions on Knowledge and Data Engineering | 2011

TEXT: Automatic Template Extraction from Heterogeneous Web Pages

Chulyun Kim; Kyuseok Shim

The World Wide Web is the most useful source of information. To achieve high publishing productivity, the webpages in many websites are automatically populated by filling common templates with contents. The templates give readers easy access to the contents, guided by consistent structures. However, for machines, the templates are considered harmful since they degrade the accuracy and performance of web applications due to irrelevant terms in templates. Thus, template detection techniques have received a lot of attention recently to improve the performance of search engines, clustering, and classification of web documents. In this paper, we present novel algorithms for extracting templates from a large number of web documents which are generated from heterogeneous templates. We cluster the web documents based on the similarity of underlying template structures in the documents so that the template for each cluster is extracted simultaneously. We develop a novel goodness measure with its fast approximation for clustering and provide comprehensive analysis of our algorithm. Our experimental results with real-life data sets confirm the effectiveness and robustness of our algorithm compared to the state of the art for template detection algorithms.

International Journal of Web Information Systems | 2011

Botnets: threats and responses

Ok-Ran Jeong; Chulyun Kim; Won Ho Kim; Jungmin So

Purpose – A botnet is a network of computers on the internet infected with software robots (or bots). There are numerous botnets, and some of them control millions of computers. Cyber criminals use botnets to launch spam e-mails and denial of service attacks; and commit click fraud and data theft. Governments use botnets for political purposes or to wage cyber warfare. The purpose of this paper is to review the botnet threats and the responses to the botnet threats.

Design/methodology/approach – The paper describes how botnets are created and operated. Then, the paper discusses botnets in terms of architecture, attacking behaviors, communication protocols, observable botnet activities, rally mechanisms, and evasion techniques. Finally, the paper reviews state-of-the-art techniques for detecting and counteracting botnets, and also legal responses to botnet threats.

Findings – Botnets have become the platform for many online threats such as spam, denial of service attacks, phishing, data thefts, and online fr...

International Conference on Data Engineering | 2004

SQUIRE: sequential pattern mining with quantities

Chulyun Kim; Jong-Hwa Lim; Raymond T. Ng; Kyuseok Shim

In this paper, we consider the problem of mining sequential patterns with quantities. Naive extensions to existing algorithms for sequential patterns are inefficient, as they may enumerate the search space blindly. To alleviate the situation, we propose hash filtering and quantity sampling techniques that significantly improve the performance of the naive extensions.

Information Systems | 2015

Supporting set-valued joins in NoSQL using MapReduce

Chulyun Kim; Kyuseok Shim

NoSQL systems are increasingly adopted for Web applications requiring scalability that relational database systems cannot meet. Although NoSQL systems have not been designed to support joins, as they are applied to a wide variety of applications, the need to support joins has emerged. Furthermore, joins performed in NoSQL systems are generally similarity joins, rather than exact-match joins, which find similar pairs of records. Since Web applications often use the MapReduce framework, we develop a solution to perform similarity joins in NoSQL systems using the MapReduce framework.

Author highlights: We developed a set-similarity join solution in NoSQL using MapReduce. Our set-similarity join algorithm can avoid redundant comparisons between join attribute values in the MapReduce framework. We decreased substantially the amount of network traffic in the MapReduce framework. We reduced the number of comparisons to find all similar pairs by extending the prefix filtering technique for the MapReduce Framework. Our solution resulted in up to an order of magnitude improvement in performance over the most efficient existing solution.
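
The prefix filtering idea mentioned in the highlights can be illustrated on a single machine. The sketch below is not the paper's MapReduce algorithm; it only shows the general technique for a Jaccard threshold, assuming non-empty sets of string tokens. The names `prefix_filter_join`, `prefix_length`, and `jaccard` are hypothetical and chosen for this example.

```python
from collections import defaultdict
from math import ceil


def jaccard(a, b):
    """Jaccard similarity of two sets (empty sets are treated as dissimilar here)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def prefix_length(size, threshold):
    """Number of leading tokens that must be indexed for a set of this size.

    If two sets reach the Jaccard threshold, their prefixes of this length
    (under one shared token order) must have at least one token in common.
    The small epsilon guards against floating-point round-up in ceil().
    """
    return size - ceil(threshold * size - 1e-9) + 1


def prefix_filter_join(records, threshold):
    """Return sorted (id, id) pairs whose Jaccard similarity >= threshold.

    records: dict mapping a record id to a non-empty set of string tokens.
    """
    # Shared global order: rare tokens first, so prefixes are selective.
    freq = defaultdict(int)
    for tokens in records.values():
        for tok in tokens:
            freq[tok] += 1

    index = defaultdict(list)  # token -> ids whose prefix contains it
    pairs = []
    for rid in sorted(records):
        ordered = sorted(records[rid], key=lambda tok: (freq[tok], tok))
        candidates = set()
        for tok in ordered[:prefix_length(len(ordered), threshold)]:
            candidates.update(index[tok])
            index[tok].append(rid)
        # Only candidates sharing a prefix token are verified exactly.
        for other in sorted(candidates):
            if jaccard(records[rid], records[other]) >= threshold:
                pairs.append((other, rid))
    return sorted(pairs)
```

Records whose prefixes share no token are never compared, which is where the savings over an all-pairs comparison come from. A distributed version would additionally have to decide how to route records by prefix token, which this sketch does not attempt.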

International Journal of Data Warehousing and Mining | 2014

A Holistic View of Big Data

Won Kim; Ok-Ran Jeong; Chulyun Kim

Today there is much hype about big data. The discussions seem to revolve around data mining technology, social Web data, and the open source platform of NoSQL and Hadoop. However, database, data warehouse and OLAP technologies are also integral parts of big data. Big data involves data from all sources, not just social Web data. Further, big data requires not only technology, but also a painstaking process for identifying, collecting, and preparing sufficient amounts of relevant data. This paper provides a holistic view of big data.

Information Systems | 2011

CATCH: A detecting algorithm for coalition attacks of hit inflation in internet advertising

Chulyun Kim; Hui Miao; Kyuseok Shim

As the Internet flourishes, online advertising becomes essential for marketing campaigns for business applications. To perform a marketing campaign, advertisers provide their advertisements to Internet publishers, and commissions are paid to the publishers based on the clicks made on the posted advertisements or the purchases of the advertised products. Since the payment given to a publisher is proportional to the number of clicks received for the advertisements posted by the publisher, dishonest publishers are motivated to inflate the number of clicks on the advertisements hosted on their web sites. Since preventing click fraud is critical for online advertising to be reliable, online advertisers make efforts to prevent it effectively. However, the methods used for click fraud are also becoming more complex and sophisticated. In this paper, we study the problem of detecting coalition attacks of click fraud. The coalition attack is one of the latest sophisticated techniques utilized for click fraud because, by joining a coalition, fraudsters can obtain not only more gain but also a lower probability of being detected. We introduce new definitions for the coalition and propose a novel algorithm called CATCH to find such coalitions. Extensive experiments with synthetic and real-life data sets confirm that our notion of coalition allows us to detect coalitions much more effectively than that of previous work.

Information Integration and Web-based Applications & Services | 2010

On botnets

Won Ho Kim; Ok-Ran Jeong; Chulyun Kim; Jungmin So

A botnet is a network of computers on the Internet infected with software robots (bots). There are numerous botnets. Some of them control millions of computers. Botnets have become the platform for the scourge of the Internet, namely spam e-mails, denial of service attacks, click fraud, theft of sensitive information, cyber sabotage, cyber warfare, etc. In this paper, we review the status of the botnets, how they work, and how they may be defeated.

Multimedia Tools and Applications | 2015

Theoretical analysis of constructing wavelet synopsis on partitioned data sets

Chulyun Kim

Data sets keep growing, and distributed data processing is becoming very important for managing data of this size. MapReduce, well known as Google’s data processing environment, is the most popular distributed platform, with good scalability and fault tolerance. Many traditional single-machine algorithms are being adapted to the MapReduce platform. In this paper we analyze a novel algorithm to generate wavelet synopses on the distributed MapReduce framework. Wavelet synopsis is one of the most popular dimensionality reduction methods and has been studied in various areas such as query optimization, approximate query answering, feature selection, etc. In the proposed algorithm, the wavelet synopsis can be calculated by a single MapReduce phase, and, by minimizing the amount of data communicated through the network of the distributed MapReduce platform, all computations are processed within almost linear time complexity. We theoretically study the properties of constructing wavelet synopsis on partitioned data sets and the correctness of the proposed algorithm.
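
For readers unfamiliar with the term, a wavelet synopsis can be sketched on a single machine as follows. This is not the paper's partitioned MapReduce algorithm; it assumes the orthonormal Haar wavelet, an input whose length is a power of two, and a synopsis that keeps the `budget` largest coefficients by absolute value. The function names are hypothetical and chosen for this example.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_transform(data):
    """Orthonormal Haar decomposition; len(data) must be a power of two."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(x) for x in data]
    length = n
    while length > 1:
        half = length // 2
        step = coeffs[:length]
        for i in range(half):
            a, b = step[2 * i], step[2 * i + 1]
            coeffs[i] = (a + b) / SQRT2         # scaled average
            coeffs[half + i] = (a - b) / SQRT2  # scaled detail
        length = half
    return coeffs


def inverse_haar(coeffs):
    """Invert haar_transform, returning the reconstructed data."""
    data = list(coeffs)
    length = 2
    while length <= len(data):
        half = length // 2
        step = data[:length]
        for i in range(half):
            s, d = step[i], step[half + i]
            data[2 * i] = (s + d) / SQRT2
            data[2 * i + 1] = (s - d) / SQRT2
        length *= 2
    return data


def wavelet_synopsis(data, budget):
    """Keep the `budget` largest-magnitude coefficients and zero the rest."""
    coeffs = haar_transform(data)
    keep = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(keep[:budget])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]
```

Calling `inverse_haar(wavelet_synopsis(data, budget))` gives an approximation of `data` built from at most `budget` nonzero coefficients. With the orthonormal basis used here, keeping the largest coefficients minimizes the squared reconstruction error for a given budget.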
