Ming-Yen Lin | Researchain

Publication

Featured research published by Ming-Yen Lin.

International Conference on Ubiquitous Information Management and Communication | 2012

Apriori-based frequent itemset mining algorithms on MapReduce

Ming-Yen Lin; Pei-Yu Lee; Sue-Chen Hsueh

Many parallelization techniques have been proposed to enhance the performance of Apriori-like frequent itemset mining algorithms. Characterized by its map and reduce functions, MapReduce has emerged as a framework that excels at mining datasets of terabyte scale or larger on either homogeneous or heterogeneous clusters. Minimizing the scheduling overhead of each map-reduce phase and maximizing the utilization of nodes in each phase are the keys to a successful MapReduce implementation. In this paper, we propose three algorithms, named SPC, FPC, and DPC, to investigate effective implementations of the Apriori algorithm in the MapReduce framework. DPC dynamically combines candidates of various lengths and outperforms both the straightforward algorithm SPC and the fixed-passes combined-counting algorithm FPC. Extensive experimental results also show that all three algorithms scale up linearly with respect to dataset size and cluster size.
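
As a rough illustration of the straightforward approach described above (one map-reduce phase per candidate length), the following sketch simulates the map and reduce steps in a single Python process. It is not the authors' SPC implementation: the function names, the in-memory "cluster", and the absolute `min_count` threshold are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

def map_phase(transaction, candidates):
    """Emit (candidate, 1) for every candidate itemset contained in the transaction."""
    items = set(transaction)
    return [(c, 1) for c in candidates if set(c) <= items]

def reduce_phase(pairs):
    """Sum the emitted counts per candidate itemset."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

def generate_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (Apriori property)."""
    items = sorted({i for itemset in frequent for i in itemset})
    prev = set(frequent)
    return [c for c in combinations(items, k)
            if all(sub in prev for sub in combinations(c, k - 1))]

def apriori_mapreduce(transactions, min_count):
    """Run one simulated map-reduce phase per candidate length (hypothetical driver)."""
    candidates = [(i,) for i in sorted({i for t in transactions for i in t})]
    result, k = {}, 1
    while candidates:
        # Map: every transaction is checked against the current candidates.
        pairs = [p for t in transactions for p in map_phase(t, candidates)]
        # Reduce: aggregate counts and keep the frequent itemsets.
        counts = reduce_phase(pairs)
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)
        k += 1
        candidates = generate_candidates(list(frequent), k)
    return result

transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
frequent_itemsets = apriori_mapreduce(transactions, min_count=3)
```

In this sketch every candidate length costs one full phase, which is the scheduling overhead the abstract says the combined-counting variants try to reduce by counting candidates of several lengths in the same phase.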

International Conference on Tools with Artificial Intelligence | 1998

Incremental update on sequential patterns in large databases

Ming-Yen Lin; Suh-Yin Lee

Mining sequential patterns in a transactional database is time-consuming because of its complexity. Maintaining existing patterns after a database update is a non-trivial task, since appended data sequences may invalidate old patterns and create new ones. In contrast to re-mining, the key to improving mining performance in the proposed incremental update algorithm is to make effective use of the knowledge already discovered. By counting over the appended data sequences rather than the entire updated database in most cases, fast filtering of the patterns found in the previous mining run and successive reductions in candidate sequences together make efficient updating of sequential patterns possible.

Journal of Information Science and Engineering | 2005

Fast discovery of sequential patterns through memory indexing and database partitioning

Ming-Yen Lin; Suh-Yin Lee

Sequential pattern mining is a challenging issue because of the high complexity of discovering temporal patterns in numerous sequences. Current mining approaches either require frequent database scanning or generate several intermediate databases. As databases may fit into the ever-increasing main memory, efficient memory-based discovery of sequential patterns is becoming possible. In this paper, we propose a memory indexing approach for fast sequential pattern mining, named MEMISP. During the whole process, MEMISP scans the sequence database only once to read data sequences into memory. The find-then-index technique is applied recursively to find the items that constitute a frequent sequence and to construct a compact index set that indicates the set of data sequences for further exploration. As a result of effective index advancing, fewer and shorter data sequences need to be processed in MEMISP as the discovered patterns get longer. Moreover, we can estimate the maximum size of the total memory required in MEMISP, which is independent of the minimum support threshold. Experimental results indicate that MEMISP outperforms both GSP and PrefixSpan (general version) without the need for either candidate generation or database projection. When the database is too large to fit into memory in a single batch, we partition the database, mine patterns in each partition, and validate the true patterns in a second pass of database scanning. Experiments performed on extra-large databases demonstrate the good performance and scalability of MEMISP, even with very low minimum support. Therefore, MEMISP can efficiently mine sequence databases of any size, for any minimum support value.
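
The find-then-index idea can be sketched in a much simplified form. The code below is not MEMISP itself: it assumes every element of a data sequence is a single item, represents an index entry as a hypothetical `(sequence_id, position)` pair, and uses an absolute `min_count` threshold. It only aims to show how an index set can advance through in-memory sequences without generating candidates or copying sub-databases.

```python
from collections import defaultdict

def mine(sequences, min_count):
    """Return {pattern: support count} for in-memory sequences of single items."""
    patterns = {}
    # Initial index set: every sequence, starting at position 0.
    start_index = [(sid, 0) for sid in range(len(sequences))]
    _grow((), start_index, sequences, min_count, patterns)
    return patterns

def _grow(prefix, index_set, sequences, min_count, patterns):
    # "Find": for each indexed sequence, locate the first occurrence of every
    # item at or after the indexed position.
    next_index = defaultdict(list)
    for sid, start in index_set:
        first_seen = {}
        seq = sequences[sid]
        for pos in range(start, len(seq)):
            first_seen.setdefault(seq[pos], pos)
        for item, pos in first_seen.items():
            # "Index": the advanced entry points just past the matched item.
            next_index[item].append((sid, pos + 1))
    for item in sorted(next_index):
        entries = next_index[item]
        if len(entries) >= min_count:
            pattern = prefix + (item,)
            patterns[pattern] = len(entries)
            # Only sequences supporting the pattern are explored further,
            # and only their remaining suffixes are scanned.
            _grow(pattern, entries, sequences, min_count, patterns)

sequences = [list("abcb"), list("acb"), list("ab"), list("bca")]
found = mine(sequences, min_count=3)
```

Because each recursive call receives only the entries that support the current pattern, the index set shrinks and the scanned suffixes shorten as patterns grow, which mirrors the "fewer and shorter data sequences" observation in the abstract.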

Information Systems | 2004

Incremental update on sequential patterns in large databases by implicit merging and efficient counting

Ming-Yen Lin; Suh-Yin Lee

Current approaches to sequential pattern mining usually assume that mining is performed on a static sequence database. However, databases change as they are updated, so discovered patterns may become invalid and new patterns may be created. In addition to its higher complexity, the maintenance of sequential patterns is more challenging than that of association rules owing to sequence merging. Sequence merging, which is unique to sequence databases, requires the appended new sequences to be merged with the existing ones if their customer ids are the same. Re-mining the whole database appears to be inevitable, since the information collected in the previous discovery is corrupted by sequence merging. Instead of re-mining, the proposed IncSP (Incremental Sequential Pattern Update) algorithm solves the maintenance problem through effective implicit merging and efficient separate counting over appended sequences. Patterns found previously are incrementally updated rather than re-mined from scratch. Moreover, the technique of early candidate pruning further speeds up the discovery of new patterns. Empirical evaluation using comprehensive synthetic data shows that IncSP is fast and scalable.
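
To make the sequence-merging problem concrete, the sketch below performs the merge explicitly, which is the costly step that IncSP's implicit merging is designed to avoid. The dictionary layout (customer id mapped to a list of single items) and all function names are assumptions for illustration only.

```python
def merge_sequences(original, appended):
    """Append each customer's new items to the existing sequence with the same id."""
    merged = {cid: list(seq) for cid, seq in original.items()}
    for cid, seq in appended.items():
        merged.setdefault(cid, []).extend(seq)
    return merged

def contains(seq, pattern):
    """True if pattern occurs in seq as a (not necessarily contiguous) subsequence."""
    it = iter(seq)
    return all(item in it for item in pattern)

def support(database, pattern):
    """Number of customer sequences that contain the pattern."""
    return sum(contains(seq, pattern) for seq in database.values())

original = {1: ["a", "b"], 2: ["a", "c"]}
appended = {1: ["c"], 3: ["a", "c"]}   # customer 1 already exists, customer 3 is new
pattern = ("a", "c")

naive = support(original, pattern) + support(appended, pattern)   # 1 + 1 = 2
merged = support(merge_sequences(original, appended), pattern)    # 3
```

Adding the old count to a count over the appended part gives 2, while the merged database supports the pattern 3 times: customer 1 only contains the pattern once the old and new parts of the sequence are joined. This is why previously collected counts cannot simply be summed with counts over the increment.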

Data Warehousing and Knowledge Discovery | 2002

Fast Discovery of Sequential Patterns by Memory Indexing

Ming-Yen Lin; Suh-Yin Lee

Mining sequential patterns is an important issue because of the complexity of discovering temporal patterns in sequences. Current mining approaches either require many database scans or generate several intermediate databases. As databases may fit into the ever-increasing main memory, efficient memory-based discovery of sequential patterns will become possible. In this paper, we propose a memory indexing approach for fast sequential pattern mining, named MEMISP. During the whole process, MEMISP scans the sequence database only once to read data sequences into memory. The find-then-index technique recursively finds the items that constitute a frequent sequence and constructs a compact index set that indicates the set of data sequences for further exploration. Through effective index advancing, fewer and shorter data sequences need to be processed in MEMISP as the discovered patterns get longer. Moreover, the maximum size of the total memory required in MEMISP, which is independent of the minimum support threshold, can be estimated. The experiments indicate that MEMISP outperforms both the GSP and PrefixSpan algorithms. MEMISP also has good linear scalability, even with very low minimum support. When the database is too large to fit in memory in a single batch, we partition the database, mine patterns in each partition, and validate the true patterns in a second pass of database scanning. Therefore, MEMISP can efficiently mine databases of any size, for any minimum support value.
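
The partition-then-validate step can be sketched as follows. This is an assumption-laden illustration rather than the paper's procedure: the per-partition miner is a brute-force stand-in (not MEMISP), sequences hold single items, support is given as a ratio, and `max_len` is a hypothetical cap that keeps the enumeration small. The sketch relies on the standard observation that a pattern meeting the ratio globally must meet it in at least one partition.

```python
from collections import Counter
from itertools import combinations

def local_patterns(partition, min_ratio, max_len):
    """Brute-force stand-in miner: subsequences frequent within one partition."""
    counts = Counter()
    for seq in partition:
        # Distinct subsequences of this sequence, each counted once per sequence.
        subs = {tuple(seq[i] for i in idx)
                for k in range(1, max_len + 1)
                for idx in combinations(range(len(seq)), k)}
        counts.update(subs)
    threshold = min_ratio * len(partition)
    return {p for p, n in counts.items() if n >= threshold}

def contains(seq, pattern):
    """True if pattern occurs in seq as a subsequence."""
    it = iter(seq)
    return all(item in it for item in pattern)

def partitioned_mine(database, min_ratio, partition_size, max_len=3):
    # Pass 1: mine each partition separately and pool the local patterns.
    candidates = set()
    for start in range(0, len(database), partition_size):
        part = database[start:start + partition_size]
        candidates |= local_patterns(part, min_ratio, max_len)
    # Pass 2: validate every pooled candidate against the whole database.
    threshold = min_ratio * len(database)
    result = {}
    for pattern in candidates:
        count = sum(contains(seq, pattern) for seq in database)
        if count >= threshold:
            result[pattern] = count
    return result

database = [list("abc"), list("ab"), list("ac"), list("bc")]
patterns = partitioned_mine(database, min_ratio=0.5, partition_size=2)
```

Here the first partition proposes `("a", "b", "c")` as a local pattern, and the second pass discards it because only one of the four sequences contains it.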

Information Sciences | 2012

High utility pattern mining using the maximal itemset property and lexicographic tree structures

Ming-Yen Lin; Tzer-Fu Tu; Sue-Chen Hsueh

The problem of high utility mining is to discover all of the high utility itemsets in a transactional database. Most algorithms find high utility itemsets in two steps. The first step identifies all of the potential itemsets. The second step then determines the high utility itemsets from the set of potential itemsets. The large number of potential itemsets in the first step is generally the mining bottleneck. If we can reduce the number of potential itemsets, the mining performance can be improved significantly. In this paper, we use a maximal itemset property and propose an algorithm called UMMI (high Utility Mining using the Maximal Itemset property) to significantly reduce the number of potential itemsets in the first step. In the second step, UMMI uses an effective lexicographic tree structure to determine all of the high utility itemsets. In our experiments using synthetic datasets, UMMI generally outperforms three existing algorithms: CTU-PRO, an optimized version of TWU-mining, and Two-Phase. On average, UMMI is 5, 3, and 7 times faster than CTU-PRO, TWU-mining, and Two-Phase, respectively. In a real data experiment, UMMI is 6 times faster than Two-Phase, while the other two algorithms could not complete the mining in a reasonable amount of time. UMMI uses an approximately fixed amount of memory, which is generally less than that used by the other algorithms in each mining run. The experimental results show that the proposed algorithm can mine the high utility itemsets efficiently. In addition, UMMI is linearly scalable with respect to the number of transactions.
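
The two-step scheme can be illustrated with the commonly used transaction-weighted utility (TWU) upper bound. The sketch below is a brute-force illustration, not UMMI: it does not use the maximal itemset property or a lexicographic tree, and the data layout (each transaction maps an item to its quantity, `profit` maps an item to its unit profit) is an assumption for the example.

```python
from itertools import combinations

def contains(transaction, itemset):
    """True if every item of the itemset appears in the transaction."""
    return all(item in transaction for item in itemset)

def transaction_utility(transaction, profit):
    """Total utility of a transaction: sum of quantity * unit profit."""
    return sum(qty * profit[item] for item, qty in transaction.items())

def utility(itemset, database, profit):
    """Exact utility of an itemset over all transactions containing it."""
    return sum(t[i] * profit[i]
               for t in database if contains(t, itemset)
               for i in itemset)

def twu(itemset, database, profit):
    """Transaction-weighted utility: an upper bound on the itemset's utility."""
    return sum(transaction_utility(t, profit)
               for t in database if contains(t, itemset))

def two_step_mine(database, profit, min_util):
    items = sorted(profit)
    # Step 1: potential itemsets are those whose TWU reaches the threshold.
    potential = [c for k in range(1, len(items) + 1)
                 for c in combinations(items, k)
                 if twu(c, database, profit) >= min_util]
    # Step 2: compute exact utilities and keep the true high utility itemsets.
    high = {}
    for c in potential:
        u = utility(c, database, profit)
        if u >= min_util:
            high[c] = u
    return potential, high

profit = {"a": 5, "b": 2, "c": 1}
database = [{"a": 1, "b": 2}, {"b": 3, "c": 4}, {"a": 2, "c": 1}]
potential, high = two_step_mine(database, profit, min_util=10)
```

With these numbers, `("c",)` has a TWU of 21 but an exact utility of 5, so it survives step 1 and is rejected in step 2. Over-estimates of this kind are what make the set of potential itemsets large, which is the bottleneck the abstract describes.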

International Symposium on Consumer Electronics | 2011

Secure cloud storage for convenient data archive of smart phones

Sue-Chen Hsueh; Jing-Yan Lin; Ming-Yen Lin

The data stored in smart phones become more important as more applications are deployed and executed. Once a smart phone is damaged or lost, the valuable information kept in the device is lost with it. If cloud storage can be integrated with cloud services for periodic data backup of a mobile client, the risk of data loss can be minimized. However, without proper authentication and protection, important data might be exposed to a malicious third party during the retrieval or transmission of information over wireless cloud storage. Therefore, in this paper, we design an archive mechanism that integrates cloud storage, hybrid cryptography, and digital signatures to meet the security requirements for data storage of mobile phones. Our mechanism not only prevents illegal access by malicious attackers but also allows desired information to be shared with targeted friends under distinct access rights.

Knowledge and Information Systems | 2005

Efficient mining of sequential patterns with time constraints by delimited pattern growth

Ming-Yen Lin; Suh-Yin Lee

An active research topic in data mining is the discovery of sequential patterns, which finds all frequent subsequences in a sequence database. The generalized sequential pattern (GSP) algorithm was proposed to solve the mining of sequential patterns with time constraints, such as time gaps and sliding time windows. Recent studies indicate that the pattern-growth methodology could speed up sequence mining. However, the capability to mine sequential patterns with time constraints was previously available only within the Apriori framework. Therefore, we propose the DELISP (delimited sequential pattern) approach to provide this capability within the pattern-growth methodology. DELISP reduces the size of projected databases through bounded and windowed projection techniques. Bounded projection keeps only time-gap valid subsequences, and windowed projection saves nonredundant subsequences satisfying the sliding time-window constraint. Furthermore, the delimited growth technique directly generates constraint-satisfying patterns and speeds up the pattern growing process. Comprehensive experiments show that DELISP has good scalability and outperforms the well-known GSP algorithm in the discovery of sequential patterns with time constraints.

Information Sciences | 2008

Fast discovery of sequential patterns in large databases using effective time-indexing

Ming-Yen Lin; Sue-Chen Hsueh; Chia-Wen Chang

Sequential pattern mining algorithms can often produce more accurate results if they work with specific constraints in addition to the support threshold. Many systems implement time-independent constraints by selecting qualified patterns. This selection cannot implement time-dependent constraints, because the support computation process must validate the time attributes of every data sequence during mining. Therefore, we propose a memory time-indexing approach, called METISP, to discover sequential patterns with time constraints including minimum-gap, maximum-gap, exact-gap, sliding window, and duration constraints. METISP scans the database into memory and constructs time-index sets for effective processing. METISP uses index sets and a pattern-growth strategy to mine patterns without generating any candidates or sub-databases. The index sets narrow down the search space to the sets of designated in-memory data sequences, and speed up the counting of potential items within the indicated ranges. Our comprehensive experiments show that METISP has better efficiency, even with low support and large databases, than the well-known GSP and DELISP algorithms. METISP scales up linearly with respect to database size.
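
To show why time-dependent constraints must be validated against the time attributes of each data sequence, here is a minimal checker for three of the constraints listed above. It is not METISP and uses no time-index sets. The conventions are assumptions for this sketch: a data sequence is a time-ordered list of `(time, item)` pairs with single items, a gap is valid when `min_gap <= gap <= max_gap` (an exact gap is the case `min_gap == max_gap`), and `duration` bounds the time between the first and last matched elements. Sliding windows are not covered.

```python
def satisfies(seq, pattern, min_gap=0, max_gap=float("inf"), duration=float("inf")):
    """True if some occurrence of pattern in seq meets all time constraints."""

    def search(p_idx, start, first_time, prev_time):
        if p_idx == len(pattern):
            return True
        for pos in range(start, len(seq)):
            time, item = seq[pos]
            if item != pattern[p_idx]:
                continue
            if prev_time is not None:
                gap = time - prev_time
                if gap < min_gap:
                    continue      # too close to the previous match, try a later event
                if gap > max_gap:
                    break         # times are sorted, so later events are even further away
                if time - first_time > duration:
                    break         # the whole occurrence would span too long
            first = time if first_time is None else first_time
            if search(p_idx + 1, pos + 1, first, time):
                return True
            # Otherwise backtrack: a later match of this element may still work.
        return False

    return search(0, 0, None, None)

seq = [(1, "a"), (2, "b"), (5, "a"), (6, "b"), (20, "c")]
ok = satisfies(seq, ("a", "b", "c"), max_gap=14)        # True, via a@1, b@6, c@20
too_tight = satisfies(seq, ("a", "b", "c"), max_gap=3)  # False
```

The first call succeeds only through backtracking: the earliest match `a@1, b@2` leaves a gap of 18 to `c@20`, whereas `a@1, b@6` leaves a gap of 14. A filter applied after ordinary mining would know only that the items occur in order, which is why the support computation itself has to examine the timestamps.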

Information Sciences | 2004

Interactive sequence discovery by incremental mining

Ming-Yen Lin; Suh-Yin Lee

Sequential pattern mining has become a challenging task in data mining due to its complexity. Essentially, the mining algorithms discover all the frequent patterns meeting the user-specified minimum support threshold. However, it is very unlikely that the user will obtain satisfactory patterns in just one query. Usually the user must try various support thresholds to mine the database for the final desirable set of patterns. Consequently, the time-consuming mining process has to be repeated several times. Current approaches are inadequate for such interactive mining because of the long processing time required for each query. In order to reduce the response time for each query during the interactive process, we propose a knowledge base assisted mining algorithm for interactive sequence discovery. The proposed approach utilizes the knowledge acquired from each mining process, accumulates the counting information to facilitate efficient counting of patterns, and speeds up the whole interactive mining process. Furthermore, the knowledge base makes possible the direct generation of new candidate sets and the concurrent support counting of variable-sized candidates. For some queries, database access is not required at all, because the pattern information is already kept in the knowledge base. The experiments conducted show that our approach outperforms GSP, a state-of-the-art sequential pattern mining algorithm, by several orders of magnitude for interactive sequence discovery.
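
The simplest benefit described above, answering a query without touching the database, can be sketched with a small cache of support counts. This covers only the case where the new threshold is no lower than one already mined. The paper's knowledge base also reuses counting information for lower thresholds, which is not shown here. The class, the `db_accesses` counter, and the toy miner are hypothetical, and the database is assumed not to change between queries.

```python
from collections import Counter

def count_items(database, min_count):
    """Toy stand-in miner: supports of single items only, one count per sequence."""
    counts = Counter(item for seq in database for item in set(seq))
    return {(item,): n for item, n in counts.items() if n >= min_count}

class KnowledgeBase:
    """Caches pattern supports from the lowest threshold mined so far."""

    def __init__(self, miner):
        self._miner = miner       # callable: (database, min_count) -> {pattern: count}
        self._counts = {}
        self._lowest = None
        self.db_accesses = 0      # how many queries actually had to mine the database

    def query(self, database, min_count):
        if self._lowest is None or min_count < self._lowest:
            # The cached knowledge is insufficient, so mine and remember the result.
            self._counts = self._miner(database, min_count)
            self._lowest = min_count
            self.db_accesses += 1
        # Every pattern frequent at min_count is already in the cache.
        return {p: n for p, n in self._counts.items() if n >= min_count}

database = [["a", "b"], ["a", "c"], ["a"]]
kb = KnowledgeBase(count_items)
first = kb.query(database, min_count=1)    # mines the database
second = kb.query(database, min_count=2)   # answered from the knowledge base
```

After both queries `kb.db_accesses` is still 1: raising the threshold only filters counts that are already stored.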
