Gangin Lee
Sejong University
Publication
Featured research published by Gangin Lee.
Knowledge Based Systems | 2014
Unil Yun; Gangin Lee; Keun Ho Ryu
Frequent pattern mining over data streams is currently one of the most interesting fields in data mining. Current databases need more immediate processing since enormous amounts of data are being accumulated and updated in real time. However, existing traditional approaches are not entirely suitable for a data stream environment since they operate with more than two database scans. Moreover, frequent pattern mining over data streams mostly generates an enormous number of frequent patterns, thereby causing significant overhead. In addition, as weight conditions are very useful for reflecting the importance of each object in the real world, it is necessary to apply them to the mining process in order to obtain more practical, meaningful patterns. To address these problems, we propose a novel method for mining Weighted Maximal Frequent Patterns (WMFPs) over data streams, called MWS (Maximal frequent pattern mining with Weight conditions over data Streams). MWS guarantees efficient mining performance in the data stream environment by scanning stream databases only once, and it reduces the overhead of pattern extraction by using a condensed representation: the maximal frequent pattern form instead of the general one. Furthermore, MWS contributes to enhanced reliability of the mining results by applying weight conditions to each element of the data streams. Extensive experiments show that MWS outperforms previous algorithms.
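To make the notion of a weighted maximal frequent pattern concrete, here is a minimal brute-force sketch. It is not the MWS algorithm: it does not use a single-scan stream structure, and the weighted support measure (support multiplied by the average item weight) is one common definition that is assumed here, not taken from the abstract. The function name, the toy transactions, and the weights are all hypothetical.

```python
from itertools import combinations


def weighted_maximal_patterns(transactions, weights, min_wsup):
    """Return weighted frequent patterns that have no weighted frequent superset.

    Assumption: weighted support = support count * average item weight.
    Every candidate is enumerated, so this is only suitable for toy data.
    """
    items = sorted({item for t in transactions for item in t})
    frequent = []
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            cand_set = set(cand)
            support = sum(1 for t in transactions if cand_set <= t)
            avg_weight = sum(weights[i] for i in cand) / k
            if support * avg_weight >= min_wsup:
                frequent.append(frozenset(cand))
    # Keep only patterns that are not a proper subset of another frequent pattern.
    return [p for p in frequent if not any(p < q for q in frequent)]


# Hypothetical toy data.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
weights = {"a": 1.0, "b": 0.5, "c": 0.8}
wmfps = weighted_maximal_patterns(transactions, weights, min_wsup=2.0)
print(sorted(sorted(p) for p in wmfps))  # [['a', 'b'], ['a', 'c']]
```

Five patterns pass the threshold here (a, b, c, ab, ac), but only the two maximal ones are reported, which illustrates how the maximal form shrinks the output.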
Journal of Intelligent and Fuzzy Systems | 2015
Gangin Lee; Unil Yun; Heungmo Ryang
Erasable pattern mining is one of the variations of frequent pattern mining, and its main goal is to maximize the production capacity of manufacturing industries by quickly overcoming the financial crises that can occur in industries. However, existing erasable pattern mining approaches utilize only the profit information of each product and do not consider the distinct weights of the items composing the products. In addition, previous algorithms spend much time and space mining erasable patterns due to their inefficient pattern mining methods. In a real-life environment, considering the weights of items in each product can be more important than calculating product profits only. For this reason, we propose a novel algorithm for mining weighted erasable patterns by considering the distinct weight of each item. Moreover, we discuss both discovering weighted erasable patterns and minimizing the resource consumption of erasable pattern mining processes utilizing weight conditions. In particular, our approach has advantages in time and space consumption compared to existing approaches because our algorithm uses a pattern pruning method and stores the item information included in product databases by using both compact tree and hash list structures. We present a performance evaluation on both real and synthetic datasets to demonstrate the efficiency of our algorithm.
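As a point of reference, the basic (unweighted) erasable pattern check can be sketched as follows. This assumes the usual definition from the erasable pattern literature: the gain of an itemset is the total profit of all products that use at least one of its items, and the itemset is erasable when that gain does not exceed a threshold fraction of the total profit. The item weighting, pruning method, and tree and hash list structures of the proposed algorithm are not reproduced; the function name and the product data are hypothetical.

```python
from itertools import combinations


def erasable_patterns(products, threshold):
    """Map each erasable itemset to its gain.

    products:  list of (set_of_items, profit) pairs
    threshold: fraction of the total profit that may be lost
    Assumption: gain(X) = sum of profits of products containing any item of X.
    """
    total_profit = sum(profit for _, profit in products)
    limit = threshold * total_profit
    items = sorted({item for comps, _ in products for item in comps})
    result = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            cand_set = set(cand)
            # A product is lost as soon as one of its items is erased.
            gain = sum(profit for comps, profit in products if cand_set & comps)
            if gain <= limit:
                result[frozenset(cand)] = gain
    return result


# Hypothetical product database: (items needed, profit).
products = [({"a", "b"}, 100), ({"b", "c"}, 50), ({"c"}, 30), ({"a", "d"}, 20)]
erasable = erasable_patterns(products, threshold=0.5)
print({"".join(sorted(p)): g for p, g in erasable.items()})
# {'c': 80, 'd': 20, 'cd': 100}
```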
Future Generation Computer Systems | 2017
Gangin Lee; Unil Yun
The concept of uncertain pattern mining was recently proposed to fulfill the demand for processing databases with uncertain data, and various relevant methods have been devised. However, previous approaches have the following limitations. State-of-the-art methods based on tree structures can cause fatal problems in terms of runtime and memory usage, depending on the characteristics of uncertain databases and threshold settings, because their tree data structures can become excessively large and complicated during mining. Various approximation approaches have been suggested to overcome such problems; however, they improve mining performance at the cost of the accuracy of the mining results. To solve these problems, we propose an exact, efficient algorithm for mining uncertain frequent patterns based on novel data structures and mining techniques, which can also guarantee the correctness of the mining results without any false positives. The newly proposed list-based data structures and pruning techniques allow a complete set of uncertain frequent patterns to be mined more efficiently without pattern losses. We also demonstrate that the proposed algorithm outperforms previous state-of-the-art approaches in both theoretical and empirical aspects. In particular, we provide analytical results of a performance evaluation on various types of datasets to show the efficiency of our method in terms of runtime, memory usage, and scalability.
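The following sketch illustrates what an exact uncertain frequent pattern is under the expected support model, which is the measure commonly used in this area and is assumed here because the abstract does not name one. Each transaction maps an item to its existence probability, and the expected support of a pattern is the sum, over transactions containing it, of the product of its item probabilities. The list-based structures and pruning techniques of the proposed algorithm are not reproduced; the function names and the data are hypothetical.

```python
from itertools import combinations
from math import prod


def expected_support(pattern, db):
    """Sum over transactions of the product of the pattern's item probabilities."""
    return sum(
        prod(t[i] for i in pattern)
        for t in db
        if all(i in t for i in pattern)
    )


def mine_uncertain(db, min_esup):
    """Exact level-wise enumeration of patterns with expected support >= min_esup."""
    items = sorted({i for t in db for i in t})
    frequent = {}
    for k in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, k):
            esup = expected_support(cand, db)
            if esup >= min_esup:
                frequent[cand] = esup
                found = True
        if not found:
            # Expected support never grows when a pattern is extended,
            # so no longer pattern can qualify.
            break
    return frequent


# Hypothetical uncertain database: item -> existence probability.
db = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.5, "c": 0.4},
    {"a": 1.0, "b": 0.5, "c": 0.6},
]
patterns = mine_uncertain(db, min_esup=1.2)
print(sorted(patterns))  # [('a',), ('a', 'b'), ('b',)]
```

Because every expected support is computed exactly, the result contains no false positives, at the price of enumerating candidates exhaustively.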
Expert Systems With Applications | 2016
Unil Yun; Gangin Lee
We propose an incremental mining algorithm that finds weighted maximal frequent itemsets. We devise strategies for guaranteeing correctness of the proposed algorithm. We suggest performance improving techniques for the incremental pattern mining. We provide extensive, comprehensive performance evaluation results.
Frequent itemset mining allows us to find hidden, important information in large databases. Moreover, processing incremental databases has become more essential in the itemset mining area because huge amounts of data are accumulated continually in a variety of application fields, and users want to obtain mining results from such incremental data in more efficient ways. One of the major problems in incremental itemset mining is that the mining results can be very large depending on threshold settings and data volumes. In addition, it is quite hard to analyze all of them and find meaningful information. Furthermore, not all of the mining results are actually important. In this paper, to solve these problems, we propose an algorithm for mining weighted maximal frequent itemsets from incremental databases. By scanning a given incremental database only once, the proposed algorithm can not only conduct mining operations suitable for the incremental environment but also extract a smaller number of important itemsets compared to previous approaches. The proposed method is also relevant to expert and intelligent systems since it can automatically provide more meaningful pattern results reflecting the characteristics of given incremental databases and threshold settings, which can help users analyze the given data more easily. Our comprehensive experimental results show that the proposed algorithm is more efficient and scalable than previous state-of-the-art algorithms.
Future Generation Computer Systems | 2016
Unil Yun; Gangin Lee
As one of the variations of frequent pattern mining, erasable pattern mining discovers patterns with benefits lower than or equal to a user-specified threshold from a product database. Although traditional erasable pattern mining algorithms can perform their mining operations in static mining environments, they are not suitable for dealing with dynamic data stream environments. In such dynamic data streams, algorithms have to process data immediately with only one database scan in order to meet the requirements of data stream mining. However, previous tree-based erasable pattern mining methods have difficulty in processing dynamic data streams because they need two or more database scans to construct their tree structures. In addition, they also do not consider specific information about each item within a product database, although mining operations need to take such additional item information into account in order to find more useful erasable patterns. For this reason, in this paper, we propose a weighted erasable pattern mining algorithm suitable for sliding window-based data stream environments. The algorithm employs tree and list data structures for more efficient mining processes and solves the problems of previous erasable pattern mining approaches by using a sliding window-based stream processing technique and an item weight-based pattern pruning method. We compare the performance of the proposed algorithm to state-of-the-art tree-based approaches on various real and synthetic datasets. Experimental results show that our method is more efficient and scalable than the competitors in terms of runtime, memory, and pattern generation.
Knowledge Based Systems | 2017
Unil Yun; Heungmo Ryang; Gangin Lee; Hamido Fujita
High utility pattern mining has been actively researched as one of the significant topics in the data mining field since this approach can overcome the limitation of traditional pattern mining, which cannot fully consider the characteristics of real-world databases. Moreover, database volumes have gradually grown in various applications such as sales data of retail markets and connection information of web services, and general methods for static databases are not suitable for processing dynamic databases and extracting useful information from them. Although incremental utility pattern mining approaches have been suggested, previous approaches need at least two scans for incremental utility pattern mining regardless of the data structure used. However, approaches with multiple scans are not adequate for stream environments. In this paper, we propose an efficient algorithm for mining high utility patterns from incremental databases with one database scan based on a list-based data structure without candidate generation. Experimental results with real and synthetic datasets show that the proposed algorithm outperforms previous one-phase construction methods with candidate generation.
Cluster Computing | 2015
Jiwon Kim; Unil Yun; Heungmo Ryang; Gangin Lee; Eunchul Yoon; Keun Ho Ryu
In recent years, with the increasing number of blogs used to share information, the proportion of blogs on the World Wide Web has risen. As a result, the problem of information quality has emerged due to the rapidly increasing amount of information in the blogosphere. Therefore, discovering good quality information is one of the significant issues in the blog space. In this paper, we propose an algorithm for efficient blog ranking, called WCT (a blog ranking algorithm using Weighted Comments and Trackbacks). This method performs a ranking process through not only interconnection analysis of blogs but also structural weights for contents in the blogs. Moreover, we conduct a performance evaluation and compare the experimental results of our algorithm and a previous algorithm, which show that our approach has higher performance than the other blog retrieval method.
Journal of Intelligent and Fuzzy Systems | 2016
Unil Yun; Donggyu Kim; Heungmo Ryang; Gangin Lee; Kyung-Min Lee
Utility pattern mining is a technique that finds valuable patterns in large databases in which each item has importance and quantity information associated with it. The representative utility pattern mining technique, high utility pattern mining (HUPM), calculates the utilities of patterns by summing all of the item utilities in the patterns. However, such utility measures for patterns in HUPM have a drawback in which patterns with long lengths tend to have utilities sufficient to become high utility patterns. For this reason, high average utility pattern mining (HAUPM), which employs different utility measures, has been studied in order to take such pattern length factors into account. Recently, techniques for handling stream data have become necessary because many data sources, e.g. sensors and POS devices, produce data in real time. However, all the existing HAUPM algorithms are unable to find up-to-date, meaningful patterns over data streams. We thus propose the first sliding window based HAUPM algorithm discovering recent high average utility patterns over data streams. Based on the sliding window model, our algorithm divides stream data into numerous batches, and keeps only recent batches in its window. Thereby, the algorithm can mine recent, important patterns over data streams. We also introduce a new strategy that enhances the performance of our algorithm by minimizing the overestimated average utilities stored in the proposed data structure. The experimental results show that our algorithm outperforms the competitors.
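The sliding window model and the average utility measure described above can be illustrated with a short sketch. It assumes the usual definition in which the average utility of a pattern in a transaction is the sum of its item utilities divided by the pattern length, and it simply rescans the window for each query. The data structure and the overestimation-reducing strategy of the proposed algorithm are not reproduced; the class name, method names, and data are hypothetical.

```python
from collections import deque


class SlidingWindow:
    """Keeps only the most recent `size` batches of a transaction stream."""

    def __init__(self, size):
        # A deque with maxlen drops the oldest batch automatically.
        self.batches = deque(maxlen=size)

    def add_batch(self, batch):
        self.batches.append(batch)

    def average_utility(self, pattern):
        """Sum, over window transactions containing the pattern, of
        (sum of the pattern's item utilities) / (pattern length)."""
        total = 0.0
        for batch in self.batches:
            for transaction in batch:
                if all(item in transaction for item in pattern):
                    total += sum(transaction[i] for i in pattern) / len(pattern)
        return total


# Hypothetical stream: each transaction maps item -> utility (quantity * unit profit).
window = SlidingWindow(size=2)
window.add_batch([{"a": 4, "b": 2}])
window.add_batch([{"a": 6, "b": 4, "c": 1}])
before = window.average_utility(("a", "b"))  # batches 1 and 2: 3.0 + 5.0
window.add_batch([{"a": 2}, {"a": 2, "b": 6}])
after = window.average_utility(("a", "b"))   # batches 2 and 3: 5.0 + 4.0
print(before, after)  # 8.0 9.0
```

Dividing by the pattern length is what keeps long patterns from qualifying merely because they contain many items.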
International Journal of Pattern Recognition and Artificial Intelligence | 2016
Gangin Lee; Unil Yun; Heungmo Ryang; Donggyu Kim
Since the concept of frequent pattern mining was proposed, there have been many efforts to obtain useful pattern information from large databases. As one of them, applying weight conditions allows us to mine weighted frequent patterns considering unique importance of each item composing databases, and the result of analysis for the patterns provides more useful information than that of considering only frequency or support information. However, although this approach gives us more meaningful pattern information, the number of patterns found from large databases is extremely large in general; therefore, analyzing all of them may become inefficient and hard work. Thus, it is essential to apply a method that can selectively extract representative patterns from the enormous ones. Moreover, in the real-world applications, unexpected errors such as noise may occur, which can have a negative effect on the values of databases. Although the changes by the error are quite small, the characteristics of generated pat...
Symmetry | 2015
Gangin Lee; Unil Yun; Heungmo Ryang; Donggyu Kim
Frequent graph pattern mining is one of the most interesting areas in data mining, and many researchers have developed a variety of approaches, suggesting efficient, useful mining techniques that integrate fundamental graph mining with other advanced mining work. However, previous graph mining approaches have a critical limitation: they cannot reflect important real-world characteristics because they cannot process both (1) different element importance and (2) multiple minimum support thresholds suitable for each graph element. In other words, graph elements in the real world have not only frequency factors but also their own importance; in addition, the various elements composing graphs may require different thresholds according to their characteristics. However, traditional approaches do not consider such features. To overcome these issues, we propose a new frequent graph pattern mining method, which can deal with both different element importance and multiple minimum support thresholds. Through the devised algorithm, we can obtain more meaningful graph pattern results with higher importance. We also demonstrate that the proposed algorithm outperforms previous state-of-the-art approaches in terms of graph pattern generation, runtime, and memory usage.