Unil Yun
Sejong University
Publication
Featured research published by Unil Yun.
Information Sciences | 2007
Unil Yun
Most algorithms for frequent pattern mining use a support constraint to prune the combinatorial search space, but support-based pruning alone is not enough. After mining datasets to obtain frequent patterns, the resulting patterns can have weak affinity. Although the minimum support can be increased, doing so is not effective for finding correlated patterns with increased weight and/or support affinity. Interestingness measures have been proposed to detect correlated patterns, but no existing approach considers both support and weight. In this paper, we present a new strategy, Weighted interesting pattern mining (WIP), in which a new measure, weight-confidence, is suggested to mine correlated patterns with weight affinity. A weight range is used to decide weight boundaries, and h-confidence serves to identify support affinity patterns. In WIP, without additional computation cost, the original h-confidence is used instead of the upper bound of h-confidence to improve performance. WIP not only balances the two measures of weight and support, but also considers weight affinity and/or support affinity between items within patterns, so more correlated patterns can be detected. To our knowledge, ours is the first work specifically to consider weight affinity between items of patterns. A comprehensive performance study shows that WIP is efficient and scalable for finding affinity patterns. Moreover, it generates fewer but more valuable correlated patterns. To decrease the number of thresholds, w-confidence, h-confidence and weighted support can be used selectively according to application requirements.
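The two affinity measures named in this abstract can be illustrated with a minimal sketch. It assumes the commonly cited definitions, which the abstract itself does not spell out: h-confidence as the support of a pattern divided by the largest support of any single item in it, and weight-confidence as the smallest item weight in the pattern divided by the largest. The function names, the toy transactions and the weights are hypothetical.

```python
from typing import Dict, FrozenSet, List


def support(pattern: FrozenSet[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item of the pattern."""
    return sum(1 for t in transactions if pattern <= t)


def h_confidence(pattern: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Assumed definition: support(pattern) / max support of a single item in it."""
    max_item_support = max(support(frozenset([i]), transactions) for i in pattern)
    if max_item_support == 0:
        return 0.0
    return support(pattern, transactions) / max_item_support


def w_confidence(pattern: FrozenSet[str], weights: Dict[str, float]) -> float:
    """Assumed definition: min item weight / max item weight within the pattern."""
    ws = [weights[i] for i in pattern]
    return min(ws) / max(ws)


# Hypothetical toy data, not taken from the paper.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("a")]
weights = {"a": 0.8, "b": 0.6, "c": 0.2}

ab = frozenset("ab")
ac = frozenset("ac")
print(h_confidence(ab, transactions), w_confidence(ab, weights))
print(h_confidence(ac, transactions), w_confidence(ac, weights))
```

In this toy data both patterns have the same support affinity, but {a, c} has a much lower weight affinity than {a, b}, so a weight-confidence threshold would separate them where h-confidence alone would not.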
Expert Systems With Applications | 2014
Unil Yun; Heungmo Ryang; Keun Ho Ryu
High utility itemset mining considers the importance of items, such as profit and item quantities in transactions. Recently, mining high utility itemsets has emerged as one of the most significant research issues due to a wide range of real-world applications such as retail market data analysis and stock market prediction. Although many relevant algorithms have been proposed in recent years, they generate a large number of candidate itemsets, which degrades mining performance. In this paper, we propose an algorithm named MU-Growth (Maximum Utility Growth) with two techniques for pruning candidates effectively in the mining process. Moreover, we suggest a tree structure, named MIQ-Tree (Maximum Item Quantity Tree), which captures database information in a single pass. The proposed data structure is restructured to reduce overestimated utilities. Performance evaluation shows that MU-Growth not only decreases the number of candidates but also outperforms state-of-the-art tree-based algorithms that rely on overestimation methods in terms of runtime, with similar memory usage.
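To make the notions of utility and overestimated utility concrete, here is a minimal sketch. It is not MU-Growth or the MIQ-Tree; it only computes the exact utility of an itemset and the transaction-weighted utility (TWU), a widely used overestimate that the abstract does not name explicitly. The profit table, the database and the function names are hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical unit profits and a toy database of {item: quantity} transactions.
profits: Dict[str, int] = {"a": 5, "b": 2, "c": 1}
db: List[Dict[str, int]] = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1},
]


def transaction_utility(tx: Dict[str, int], profits: Dict[str, int]) -> int:
    """Total utility of a whole transaction: sum of quantity * unit profit."""
    return sum(q * profits[i] for i, q in tx.items())


def utility(itemset: Iterable[str], db: List[Dict[str, int]], profits: Dict[str, int]) -> int:
    """Exact utility: count only the itemset's own items, in transactions containing all of them."""
    items = list(itemset)
    total = 0
    for tx in db:
        if all(i in tx for i in items):
            total += sum(tx[i] * profits[i] for i in items)
    return total


def twu(itemset: Iterable[str], db: List[Dict[str, int]], profits: Dict[str, int]) -> int:
    """Overestimate: sum the full utility of every transaction that contains the itemset."""
    items = list(itemset)
    return sum(transaction_utility(tx, profits) for tx in db if all(i in tx for i in items))


print(utility({"a"}, db, profits), twu({"a"}, db, profits))
print(utility({"a", "c"}, db, profits), twu({"a", "c"}, db, profits))
```

The overestimate is never smaller than the exact utility, so it is safe for pruning, but the gap between the two is what produces surplus candidates; tightening such overestimates is the kind of reduction the abstract describes.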
Knowledge Based Systems | 2008
Unil Yun
Sequential pattern mining is an essential research topic with broad applications; it discovers the set of frequent subsequences satisfying a support threshold in a sequence database. The major problems of mining sequential patterns are that a huge set of sequential patterns is generated and that the computation time is high. Although efficient algorithms have been developed to tackle these problems, their performance degrades dramatically when mining long sequential patterns in dense databases or when using low minimum supports. In addition, the algorithms may reduce the number of patterns, but unimportant patterns are still found among the resulting patterns. It would be better if the unimportant patterns could be pruned first, resulting in fewer but more important patterns after mining. In this paper, we suggest a new framework for mining weighted frequent patterns in which weight constraints are pushed deeply into sequential pattern mining. Previous sequential mining algorithms treat sequential patterns uniformly, while real sequential patterns differ in importance. In our approach, the weights of items are given according to their priority or importance. During the mining process, we consider not only the supports but also the weights of patterns. Based on the framework, we present a weighted sequential pattern mining algorithm (WSpan). To our knowledge, this is the first work to mine weighted sequential patterns. The experimental results show that WSpan detects fewer but more important weighted sequential patterns in large sequence databases, even with a low minimum threshold.
Knowledge Based Systems | 2014
Unil Yun; Keun Ho Ryu
Outstanding frequent pattern mining guarantees both fast runtime and low memory usage across data of different types and sizes. However, it is hard to improve both at once, since runtime is in general inversely proportional to memory usage. Researchers have made efforts to overcome this problem and have proposed mining methods that improve both through various approaches. Many state-of-the-art mining algorithms use tree structures; they create nodes independently and connect them with pointers when constructing their trees. Accordingly, these methods keep pointers for each node in the trees, which is inefficient because numerous pointers must be managed and maintained. In this paper, we propose a novel tree structure to address this limitation. Our new structure, LP-tree (Linear Prefix Tree), is composed of array forms and minimizes pointers between nodes. In addition, LP-tree uses the minimum information required in the mining process and accesses the corresponding nodes linearly. We also suggest an algorithm that applies LP-tree to the mining process. The algorithm is evaluated through various experiments, and the experimental results show that our approach outperforms previous algorithms in terms of runtime, memory, and scalability.
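The general idea of replacing per-node pointers with array storage can be sketched as follows. This is not the LP-tree itself, whose internal layout the abstract does not describe; it is a generic prefix tree kept in parallel arrays, where a node is just an integer index and its parent is found by array lookup. The class name, method names and sample transactions are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class ArrayPrefixTree:
    """Generic prefix tree stored in parallel arrays (an illustration, not the LP-tree)."""

    def __init__(self) -> None:
        # Index 0 is the root; every node is identified by its array index.
        self.items: List[Optional[str]] = [None]
        self.counts: List[int] = [0]
        self.parents: List[int] = [-1]
        # Lookup table from (parent index, item) to child index.
        self.children: Dict[Tuple[int, str], int] = {}

    def insert(self, transaction: Sequence[str]) -> None:
        """Insert one (already ordered) transaction, sharing existing prefixes."""
        node = 0
        for item in transaction:
            key = (node, item)
            child = self.children.get(key)
            if child is None:
                child = len(self.items)
                self.items.append(item)
                self.counts.append(0)
                self.parents.append(node)
                self.children[key] = child
            self.counts[child] += 1
            node = child

    def prefix_of(self, index: int) -> List[str]:
        """Walk parent indices back to the root to recover the path of a node."""
        path: List[str] = []
        while index > 0:
            path.append(self.items[index])
            index = self.parents[index]
        return path[::-1]


tree = ArrayPrefixTree()
for tx in (["a", "b", "c"], ["a", "b"], ["a", "d"]):
    tree.insert(tx)
print(tree.items, tree.counts, tree.prefix_of(3))
```

Storing nodes contiguously means a traversal is a sequence of index lookups rather than pointer dereferences over separately allocated objects, which is the kind of saving the abstract attributes to an array-based layout.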
Knowledge Based Systems | 2007
Unil Yun
Frequent pattern mining is one of the main concerns in data mining tasks. In frequent pattern mining, closed frequent pattern mining and weighted frequent pattern mining are two main approaches to reducing the search space. Although many related studies have been suggested, no mining algorithm considers both paradigms. Although closed frequent pattern mining represents exactly the same knowledge and weighted frequent pattern mining provides a way to discover more important patterns, combining closed frequent pattern mining with weighted frequent pattern mining may lose information. Based on our analysis of joining orders, we propose closed weighted frequent pattern mining, and show how to discover succinct but lossless closed frequent patterns with weight constraints. To our knowledge, ours is the first work specifically to consider both constraints. An extensive performance study shows that our algorithm outperforms previous algorithms. In addition, it is efficient and scalable.
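The claim that closed patterns are a succinct but lossless representation can be illustrated with a brute-force sketch. It covers only the unweighted notion of closedness (a frequent pattern is closed when no proper superset has the same support) and does not reproduce the paper's weighted algorithm. The function names and the toy transactions are hypothetical, and the exhaustive enumeration is only suitable for tiny examples.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def frequent_itemsets(transactions: List[FrozenSet[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Enumerate every itemset and keep those meeting the support threshold (brute force)."""
    items = sorted(set().union(*transactions))
    result: Dict[FrozenSet[str], int] = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            count = sum(1 for t in transactions if pattern <= t)
            if count >= min_support:
                result[pattern] = count
    return result


def closed_patterns(frequent: Dict[FrozenSet[str], int]) -> Dict[FrozenSet[str], int]:
    """Keep a pattern only if no proper superset has the same support."""
    return {
        p: s
        for p, s in frequent.items()
        if not any(p < q and s == sq for q, sq in frequent.items())
    }


transactions = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("c")]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_patterns(frequent)
print(len(frequent), len(closed))
```

The support of any frequent pattern can be recovered as the largest support among the closed patterns that contain it, which is why the smaller closed set loses no information.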
Applied Intelligence | 2015
Unil Yun; Heungmo Ryang
Pattern mining is a data mining technique used for discovering significant patterns and has been applied to various applications such as disease analysis in medical databases and decision making in business. Frequent pattern mining based on item frequencies is the most fundamental topic in the pattern mining field. However, it is difficult to discover important patterns on the basis of frequencies alone, since characteristics of real-world databases such as the relative importance of items and non-binary transactions are not reflected. In this regard, utility pattern mining has been considered an emerging research topic that deals with these characteristics. Meanwhile, in real-world applications, data newly generated by continuous operation, or data from other databases brought in for integrated analysis, can be gradually added to the current database. To deal efficiently with both existing and new data as one database, it is necessary to reflect the added data in previous analysis results without analyzing the whole database again. In this paper, we propose an algorithm called HUPID-Growth (High Utility Patterns in Incremental Databases Growth) for mining high utility patterns in incremental databases. Moreover, we suggest a tree structure constructed with a single database scan, named HUPID-Tree (High Utility Patterns in Incremental Databases Tree), and a restructuring method with a novel data structure called TIList (Tail-node Information List) in order to process incremental databases more efficiently. We conduct various experiments to evaluate performance against state-of-the-art algorithms. The experimental results show that the proposed algorithm processes real datasets more efficiently than previous ones.
Knowledge Based Systems | 2014
Unil Yun; Gangin Lee; Keun Ho Ryu
Frequent pattern mining over data streams is currently one of the most interesting fields in data mining. Current databases require more immediate processing, since enormous amounts of data are being accumulated and updated in real time. However, existing traditional approaches are not entirely suitable for a data stream environment since they operate with more than two database scans. Moreover, frequent pattern mining over data streams mostly generates an enormous number of frequent patterns, thereby causing significant overhead. In addition, as weight conditions are very useful for reflecting the importance of each object in the real world, it is necessary to apply them to the mining process in order to obtain more practical, meaningful patterns. To address these problems, we propose a novel method for mining Weighted Maximal Frequent Patterns (WMFPs) over data streams, called MWS (Maximal frequent pattern mining with Weight conditions over data Streams). MWS guarantees efficient mining performance in the data stream environment by scanning stream databases only once, and avoids the overhead of pattern extraction by using an abbreviated notation: the maximal frequent pattern form instead of the general one. Furthermore, MWS enhances the reliability of the mining results by applying weight conditions to each element of the data streams. Extensive experiments show that MWS has outstanding performance in comparison to previous algorithms.
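The abbreviated maximal form mentioned in this abstract can be sketched in a few lines. The example ignores weights and streaming entirely and only shows the standard notion of maximality: a frequent pattern is maximal when no proper superset is frequent, and by downward closure every frequent pattern is a subset of some maximal one. The function names and the toy pattern set are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Set


def maximal_patterns(frequent: Set[FrozenSet[str]]) -> Set[FrozenSet[str]]:
    """Keep only the frequent patterns that have no frequent proper superset."""
    return {p for p in frequent if not any(p < q for q in frequent)}


def expand(maximal: Set[FrozenSet[str]]) -> Set[FrozenSet[str]]:
    """Recover all frequent patterns as the non-empty subsets of the maximal ones."""
    recovered: Set[FrozenSet[str]] = set()
    for pattern in maximal:
        items = sorted(pattern)
        for size in range(1, len(items) + 1):
            recovered.update(frozenset(c) for c in combinations(items, size))
    return recovered


# Hypothetical set of frequent patterns (supports omitted on purpose).
frequent = {frozenset(s) for s in ("a", "b", "c", "ab", "ac", "bc", "abc", "d")}
maximal = maximal_patterns(frequent)
print(sorted("".join(sorted(p)) for p in maximal))
```

Eight frequent patterns collapse to two maximal ones here. The trade-off is that the maximal form records which patterns are frequent but not the individual supports of their subsets.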
Information Systems | 2006
Unil Yun; John J. Leggett
Sequential pattern mining algorithms have been developed to mine the set of frequent subsequences satisfying a minimum support constraint in a sequence database. However, previous sequential mining algorithms treat sequential patterns uniformly, although sequential patterns differ in importance. Another main problem in most sequence mining algorithms is that they still generate an exponentially large number of sequential patterns when the minimum support is lowered, and they provide no way to adjust the number of sequential patterns other than increasing the minimum support. In this paper, we propose a weighted sequential pattern mining algorithm called WSpan. Our main approach is to push the weight constraints into the sequential pattern growth approach while maintaining the downward closure property. A weight range is defined to maintain the downward closure property, and items are given different weights within the weight range. When scanning a sequence database, the maximum weight in the sequence database is used to prune weighted infrequent sequential patterns, and in the mining step, the maximum weights of the projected sequence databases are used. By doing so, the downward closure property can be maintained. WSpan generates fewer but more important weighted sequential patterns in large databases, particularly dense databases with a low minimum support, by adjusting a weight range.
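The role of the maximum weight in preserving downward closure can be sketched as below. The sketch assumes that the weight of a pattern is the average of its item weights and that a pattern is weighted frequent when support times weight reaches a threshold; the abstract does not state these definitions, so they are assumptions. Sequences are simplified to tuples of single items, and all names and data are hypothetical.

```python
from typing import Dict, List, Tuple

Sequence = Tuple[str, ...]


def is_subsequence(pattern: Sequence, sequence: Sequence) -> bool:
    """True if the pattern's items appear in the sequence in the same order."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(pattern: Sequence, db: List[Sequence]) -> int:
    return sum(1 for s in db if is_subsequence(pattern, s))


def weighted_support(pattern: Sequence, db: List[Sequence], weights: Dict[str, float]) -> float:
    """Assumed definition: support multiplied by the average item weight of the pattern."""
    avg_weight = sum(weights[i] for i in pattern) / len(pattern)
    return support(pattern, db) * avg_weight


def can_prune(pattern: Sequence, db: List[Sequence], max_weight: float, min_wsup: float) -> bool:
    """Prune only when even the maximum weight cannot lift the pattern over the threshold.

    Support never grows when a pattern is extended and no average can exceed
    max_weight, so a pruned pattern has no weighted frequent extension.
    """
    return support(pattern, db) * max_weight < min_wsup


# Hypothetical toy sequence database and item weights.
db = [("a", "b", "c"), ("a", "c"), ("b", "c"), ("a", "b")]
weights = {"a": 0.9, "b": 0.5, "c": 0.3}
max_weight = max(weights.values())
min_wsup = 1.5

for pattern in [("a",), ("a", "c"), ("a", "b", "c")]:
    print(pattern, weighted_support(pattern, db, weights), can_prune(pattern, db, max_weight, min_wsup))
```

In this toy run ("a", "c") is not weighted frequent under the assumed definition, yet it is not pruned, because the bound must stay valid for extensions that might contain heavier items; ("a", "b", "c") is pruned because its support is too low for any weight to compensate.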
Knowledge Based Systems | 2011
Unil Yun; Keun Ho Ryu
In the data mining area, weighted frequent pattern mining has been suggested to find important frequent patterns by considering the weights of patterns. Further extensions with weight constraints have been proposed, such as mining weighted association rules, weighted sequential patterns, weighted closed patterns, frequent patterns with dynamic weights, weighted graphs, and weighted sub-trees or substructures. In previous approaches to weighted frequent pattern mining, weighted supports of patterns were matched exactly to prune weighted infrequent patterns. However, in a noisy environment, a small change in the weights or supports of items seriously affects the result sets. This may make the weighted frequent patterns less useful in a noisy environment. In this paper, we propose the robust concept of mining approximate weighted frequent patterns. Based on the framework of weight-based pattern mining, an approximate factor is defined to relax the requirement for exact equality between weighted supports of patterns and a minimum threshold. We then address the concept of mining approximate weighted frequent patterns to find important patterns with or without noisy data. We analyze the characteristics of approximate weighted frequent patterns and run extensive performance tests.
Applied Intelligence | 2014
Unil Yun
Top-k frequent pattern mining finds interesting patterns from the highest support down to the k-th support. The approach can be applied effectively in numerous fields such as marketing, finance, and bio-data analysis, since it does not require a minimum support threshold. Top-k mining methods use the support of the k-th pattern, not a user-specified minimum support. Thus, the methods conduct mining operations based on very low supports until the k-th pattern is detected. When a low support is used in the mining process, single-paths with numerous items are generated, and the top-k mining algorithm extracts valid patterns by combining the items of each single-path. Therefore, the larger the number of combinations, the greater the time and memory consumption. In this paper, in order to mine top-k frequent patterns more efficiently, we consider converting patterns obtained from single-paths into composite patterns during the mining process and recovering the original patterns when the top-k frequent patterns are extracted. For this, we define a new concept, the composite pattern, and propose novel techniques for reducing pattern combinations in the single-path. Two algorithms are introduced in this paper: CRM (Combination Reducing method), which applies our reduction technique, and CRMN (Combination Reducing method for N-itemset), which also considers N-itemsets, i.e., patterns’ lengths. A performance evaluation shows that the CRM and CRMN algorithms efficiently reduce pattern combinations in single-paths compared to state-of-the-art algorithms. The experimental results also illustrate that our approaches have outstanding performance in terms of runtime, memory, and scalability.
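The combination blow-up on single-paths that motivates this work can be shown with a small sketch. It does not implement composite patterns, CRM or CRMN; it only enumerates the patterns of one single-path and then selects the k with the highest support. It assumes the usual property of an FP-tree style single-path, namely that supports do not increase from root to leaf, so a combination's support is the smallest support among its items. The path, names and values are hypothetical.

```python
import heapq
from itertools import combinations
from typing import FrozenSet, List, Tuple

PathNode = Tuple[str, int]  # (item, support), ordered from root to leaf


def single_path_patterns(path: List[PathNode]) -> List[Tuple[FrozenSet[str], int]]:
    """Enumerate every non-empty item combination of a single-path with its support."""
    patterns = []
    for size in range(1, len(path) + 1):
        for combo in combinations(path, size):
            items = frozenset(item for item, _ in combo)
            # Assumed single-path property: support is the minimum over the chosen nodes.
            patterns.append((items, min(s for _, s in combo)))
    return patterns


def top_k(patterns: List[Tuple[FrozenSet[str], int]], k: int) -> List[Tuple[FrozenSet[str], int]]:
    """Select the k patterns with the highest support (ties broken arbitrarily)."""
    return heapq.nlargest(k, patterns, key=lambda p: p[1])


path = [("a", 5), ("b", 4), ("c", 4), ("d", 2)]
patterns = single_path_patterns(path)
print(len(patterns))  # 2**4 - 1 combinations for a path of 4 items
print([s for _, s in top_k(patterns, 3)])
```

A path of n items yields 2^n - 1 combinations, so a path of 20 items already produces more than a million candidate patterns, most of which a top-k query discards; this is the cost that a compact representation of single-path patterns aims to avoid.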