Junqiang Liu | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Junqiang Liu is active.

Explore More

Publication

Featured researches published by Junqiang Liu.

knowledge discovery and data mining | 2002

Mining frequent item sets by opportunistic projection

Junqiang Liu; Yunhe Pan; Ke Wang; Jiawei Han

In this paper, we present a novel algorithm Opportune Project for mining complete set of frequent item sets by projecting databases to grow a frequent item set tree. Our algorithm is fundamentally different from those proposed in the past in that it opportunistically chooses between two different structures, array-based or tree-based, to represent projected transaction subsets, and heuristically decides to build unfiltered pseudo projection or to make a filtered copy according to features of the subsets. More importantly, we propose novel methods to build tree-based pseudo projections and array-based unfiltered projections for projected transaction subsets, which makes our algorithm both CPU time efficient and memory saving. Basically, the algorithm grows the frequent item set tree by depth first search, whereas breadth first search is used to build the upper portion of the tree if necessary. We test our algorithm versus several other algorithms on real world datasets, such as BMS-POS, and on IBM artificial datasets. The empirical results show that our algorithm is not only the most efficient on both sparse and dense databases at all levels of support threshold, but also highly scalable to very large databases.

knowledge discovery and data mining | 2002

Top Down FP-Growth for Association Rule Mining

Ke Wang; Liu Tang; Jiawei Han; Junqiang Liu

In this paper, we propose an efficient algorithm, called TD-FP-Growth (the shorthand for Top-Down FP-Growth), to mine frequent patterns. TD-FP-Growth searches the FP-tree in the top-down order, as opposed to the bottom-up order of previously proposed FP-Growth. The advantage of the topdown search is not generating conditional pattern bases and sub-FP-trees, thus, saving substantial amount of time and space. We extend TD-FP-Growth to mine association rules by applying two new pruning strategies: one is to push multiple minimum supports and the other is to push the minimum confidence. Experiments show that these algorithms and strategies are highly effective in reducing the search space.

international conference on data mining | 2012

Direct Discovery of High Utility Itemsets without Candidate Generation

Junqiang Liu; Ke Wang; Benjamin C. M. Fung

Utility mining emerged recently to address the limitation of frequent itemset mining by introducing interestingness measures that reflect both the statistical significance and the users expectation. Among utility mining problems, utility mining with the itemset share framework is a hard one as no anti-monotone property holds with the interestingness measure. The state-of-the-art works on this problem all employ a two-phase, candidate generation approach, which suffers from the scalability issue due to the huge number of candidates. This paper proposes a high utility itemset growth approach that works in a single phase without generating candidates. Our basic approach is to enumerate itemsets by prefix extensions, to prune search space by utility upper bounding, and to maintain original utility information in the mining process by a novel data structure. Such a data structure enables us to compute a tight bound for powerful pruning and to directly identify high utility itemsets in an efficient and scalable way. We further enhance the efficiency significantly by introducing recursive irrelevant item filtering with sparse data, and a lookahead strategy with dense data. Extensive experiments on sparse and dense, synthetic and real data suggest that our algorithm outperforms the state-of-the-art algorithms over one order of magnitude.

international conference on data engineering | 2010

On optimal anonymization for l + -diversity

Junqiang Liu; Ke Wang

Publishing person specific data while protecting privacy is an important problem. Existing algorithms that enforce the privacy principle called l-diversity are heuristic based due to the NP-hardness. Several questions remain open: can we get a significant gain in the data utility from an optimal solution compared to heuristic ones; can we improve the utility by setting a distinct privacy threshold per sensitive value; is it practical to find an optimal solution efficiently for real world datasets. This paper addresses these questions. Specifically, we present a pruning based algorithm for finding an optimal solution to an extended form of the l-diversity problem. The novelty lies in several strong techniques: a novel structure for enumerating all solutions, methods for estimating cost lower bounds, strategies for dynamically arranging the enumeration order and updating lower bounds. This approach can be instantiated with any reasonable cost metric. Experiments on real world datasets show that our algorithm is efficient and improves the data utility.

IEEE Transactions on Knowledge and Data Engineering | 2016

Mining High Utility Patterns in One Phase without Generating Candidates

Junqiang Liu; Ke Wang; Benjamin C. M. Fung

Utility mining is a new development of data mining technology. Among utility mining problems, utility mining with the itemset share framework is a hard one as no anti-monotonicity property holds with the interestingness measure. Prior works on this problem all employ a two-phase, candidate generation approach with one exception that is however inefficient and not scalable with large databases. The two-phase approach suffers from scalability issue due to the huge number of candidates. This paper proposes a novel algorithm that finds high utility patterns in a single phase without generating candidates. The novelties lie in a high utility pattern growth approach, a lookahead strategy, and a linear data structure. Concretely, our pattern growth approach is to search a reverse set enumeration tree and to prune search space by utility upper bounding. We also look ahead to identify high utility patterns without enumeration by a closure property and a singleton property. Our linear data structure enables us to compute a tight bound for powerful pruning and to directly identify high utility patterns in an efficient and scalable way, which targets the root cause with prior algorithms. Extensive experiments on sparse and dense, synthetic and real world data suggest that our algorithm is up to 1 to 3 orders of magnitude more efficient and is more scalable than the state-of-the-art algorithms.

Journal of Medical Systems | 2016

Improvement of a Privacy Authentication Scheme Based on Cloud for Medical Environment

Shin-Yan Chiou; Zhaoqin Ying; Junqiang Liu

Medical systems allow patients to receive care at different hospitals. However, this entails considerable inconvenience through the need to transport patients and their medical records between hospitals. The development of Telecare Medicine Information Systems (TMIS) makes it easier for patients to seek medical treatment and to store and access medical records. However, medical data stored in TMIS is not encrypted, leaving patients’ private data vulnerable to external leaks. In 2014, scholars proposed a new cloud-based medical information model and authentication scheme which would not only allow patients to remotely access medical services but also protects patient privacy. However, this scheme still fails to provide patient anonymity and message authentication. Furthermore, this scheme only stores patient medical data, without allowing patients to directly access medical advice. Therefore, we propose a new authentication scheme, which provides anonymity, unlinkability, and message authentication, and allows patients to directly and remotely consult with doctors. In addition, our proposed scheme is more efficient in terms of computation cost. The proposed system was implemented in Android system to demonstrate its workability.

Knowledge and Information Systems | 2013

Anonymizing bag-valued sparse data by semantic similarity-based clustering

Junqiang Liu; Ke Wang

Web query logs provide a rich wealth of information, but also present serious privacy risks. We preserve privacy in publishing vocabularies extracted from a web query log by introducing vocabulary k-anonymity, which prevents the privacy attack of re-identification that reveals the real identities of vocabularies. A vocabulary is a bag of query-terms extracted from queries issued by a user at a specified granularity. Such bag-valued data are extremely sparse, which makes it hard to retain enough utility in enforcing k-anonymity. To the best of our knowledge, the prior works do not solve such a problem, among which some achieve a different privacy principle, for example, differential privacy, some deal with a different type of data, for example, set-valued data or relational data, and some consider a different publication scenario, for example, publishing frequent keywords. To retain enough data utility, a semantic similarity-based clustering approach is proposed, which measures the semantic similarity between a pair of terms by the minimum path distance over a semantic network of terms such as WordNet, computes the semantic similarity between two vocabularies by a weighted bipartite matching, and publishes the typical vocabulary for each cluster of semantically similar vocabularies. Extensive experiments on the AOL query log show that our approach can retain enough data utility in terms of loss metrics and in frequent pattern mining.

international conference on big data | 2015

Secure Outsourced Frequent Pattern Mining by Fully Homomorphic Encryption

Junqiang Liu; Jiuyong Li; Shijian Xu; Benjamin C. M. Fung

With the advent of the big data era, outsourcing data storage together with data mining tasks to cloud service providers is becoming a trend, which however incurs security and privacy issues. To address the issues, this paper proposes two protocols for mining frequent patterns securely on the cloud by employing fully homomorphic encryption. One protocol requires little communication between the client and the cloud service provider, the other incurs less computation cost. Moreover, a new privacy notion, namely \(\alpha \)-pattern uncertainty, is proposed to reinforce the second protocol. Our scenario has two advantages: one is stronger privacy protection, and the other is that the outsourced data can be used in different mining tasks. Experimental evaluation demonstrates that the proposed protocols provide a feasible solution to the issues.

database and expert systems applications | 2015

Parallel Eclat for Opportunistic Mining of Frequent Itemsets

Junqiang Liu; Yongsheng Wu; Qingfeng Zhou; Benjamin C. M. Fung; Fanghui Chen; Binxiao Yu

Mining frequent itemsets is an essential data mining problem. As the big data era comes, the size of databases is becoming so large that traditional algorithms will not scale well. An approach to the issue is to parallelize the mining algorithm, which however is a challenge that has not been well addressed yet. In this paper, we propose a MapReduce-based algorithm, Peclat, that parallelizes the vertical mining algorithm, Eclat, with three improvements. First, Peclat proposes a hybrid vertical data format to represent the data, which saves both space and time in the mining process. Second, Peclat adopts the pruning technique from the Apriori algorithm to improve efficiency of breadth-first search. Third, Peclat employs an ordering of itemsets that helps balancing the workloads. Extensive experiments demonstrate that Peclat outperforms the existing MapReduce-based algorithms significantly.

international conference on data mining | 2010

Enforcing Vocabulary k-Anonymity by Semantic Similarity Based Clustering

Junqiang Liu; Ke Wang

Web query logs provide a rich wealth of information, but also present serious privacy risks. We consider publishing vocabularies, bags of query-terms extracted from web query logs, which has a variety of applications. We aim at preventing identity disclosure of such bag-valued data. The key feature of such data is the extreme sparsity, which renders conventional anonymization techniques not working well in retaining enough utility. We propose a semantic similarity based clustering approach to address the issue. We measure the semantic similarity between two vocabularies by a weighted bipartite matching and present a greedy algorithm to cluster vocabularies by the semantic similarities. Extensive experiments on the AOL query log show that our approach retains more data utility than existing approaches.

Explore More