Justin Zhan
University of Nevada, Las Vegas
Publications
Featured research published by Justin Zhan.
Journal of Big Data | 2015
Xing Fang; Justin Zhan
Sentiment analysis, or opinion mining, is one of the major tasks of natural language processing (NLP) and has gained much attention in recent years. In this paper, we aim to tackle the problem of sentiment polarity categorization, which is one of the fundamental problems of sentiment analysis. A general process for sentiment polarity categorization is proposed with detailed process descriptions. Data used in this study are online product reviews collected from Amazon.com. Experiments for both sentence-level and review-level categorization are performed with promising outcomes. Finally, we give insight into our future work on sentiment analysis.
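To make the sentence-level categorization step concrete, here is a minimal, hedged sketch of a bag-of-words polarity classifier; the tiny labeled review sentences and the scikit-learn pipeline are illustrative assumptions, not the model actually used in the paper.

```python
# Illustrative sketch only: a simple bag-of-words sentiment polarity classifier.
# The toy labeled sentences and the scikit-learn pipeline are assumptions for
# demonstration, not the paper's actual categorization process.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_sentences = [
    "great product, works as advertised",
    "terrible quality, broke after one day",
    "absolutely love it",
    "waste of money, very disappointed",
]
train_labels = ["positive", "negative", "positive", "negative"]

# Unigram + bigram counts feed a logistic-regression polarity classifier.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(train_sentences, train_labels)

print(model.predict(["the battery life is great"]))    # expected: positive
print(model.predict(["arrived damaged and useless"]))  # expected: negative
```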
Knowledge Based Systems | 2016
Jerry Chun-Wei Lin; Wensheng Gan; Philippe Fournier-Viger; Tzung-Pei Hong; Justin Zhan
In the field of data mining, the topic of high-utility itemset mining (HUIM) has recently gained a lot of attention from researchers, as it takes into account many factors that are useful for decision-making by retail managers. In the past, many algorithms have been presented for HUIM, but most of them suffer from the limitation of using a single minimum utility threshold to identify high-utility itemsets (HUIs). For real-life applications, finding itemsets using a single threshold is inadequate and unfair since each item is different; hence, the diversity or importance of each item should be considered. This paper proposes a solution to this issue by defining the novel task of HUIM with multiple minimum utility thresholds (named HUIM-MMU). This task lets users specify a different minimum utility threshold for each item to identify more useful and specific HUIs, which generate more profit than HUIs discovered with a single minimum utility threshold. The HUI-MMU algorithm is designed to mine HUIs in a level-wise manner. The sorted downward closure (SDC) property and the least minimum utility (LMU) concept are developed to avoid a combinatorial explosion when identifying HUIs and to ensure the completeness and correctness of HUI-MMU. Meanwhile, two improved algorithms, namely HUI-MMUTID and HUI-MMUTE, are presented based on the TID-index and EUCP strategies; these strategies can be used to speed up the mining performance. Substantial experiments on both real-life and synthetic datasets show that the designed algorithms can efficiently and effectively discover the complete set of HUIs in databases while considering multiple minimum utility thresholds.
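To illustrate the multiple-threshold idea, the hedged sketch below computes itemset utilities on a toy database and keeps an itemset when its total utility reaches the smallest per-item threshold among its items (in the spirit of the LMU concept). The toy data and the brute-force enumeration are assumptions for illustration, not the level-wise HUI-MMU algorithm.

```python
# Hedged sketch of high-utility itemset mining with multiple minimum utility
# thresholds. Brute-force enumeration on a toy database, not HUI-MMU itself.
from itertools import combinations

# Each transaction maps an item to its purchased quantity.
transactions = [
    {"a": 2, "b": 1},
    {"a": 1, "c": 3},
    {"a": 2, "b": 2, "c": 1},
]
profit = {"a": 5, "b": 3, "c": 1}        # unit profit per item
min_util = {"a": 20, "b": 12, "c": 6}    # a distinct minimum utility per item

def utility(itemset, tx):
    # Utility of an itemset in one transaction (0 if not fully contained).
    if not all(i in tx for i in itemset):
        return 0
    return sum(tx[i] * profit[i] for i in itemset)

items = sorted(profit)
for size in range(1, len(items) + 1):
    for itemset in combinations(items, size):
        total = sum(utility(itemset, tx) for tx in transactions)
        threshold = min(min_util[i] for i in itemset)  # LMU-style threshold
        if total >= threshold:
            print(itemset, total)
```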
Journal of Big Data | 2015
Pravin Chopade; Justin Zhan
Community structure is thought to be one of the main organizing principles in most complex networks. Big data and complex networks are analyzed by researchers worldwide, and of special interest are groups of vertices within which connections are dense. In this paper we begin by discussing community dynamics and exploring complex network structural parameters. We put forward structural and functional models for analyzing complex networks under perturbations. We introduce modified adjacency and modified Laplacian matrices, and further introduce a degree centrality (weighted Laplacian centrality) based on the modified Laplacian, as well as a weighted micro-community centrality. We discuss their robustness and importance for micro-community detection in social and technological complex networks with overlapping communities. We also introduce a 'k-clique sub-community' overlapping community detection method based on degree and weighted micro-community centrality. The proposed algorithms use an optimal partition of k-clique sub-communities for modularity optimization. We establish a relationship between degree centrality and modularity. The proposed method, together with the modified adjacency matrix, helps address this NP-hard problem.
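As a hedged illustration of a Laplacian-based centrality, the sketch below scores each node of a small weighted graph by the drop in Laplacian energy when that node is removed. This generic weighted Laplacian centrality is shown only to convey the idea; the toy adjacency matrix is an assumption, and the paper's modified adjacency and Laplacian constructions are not reproduced here.

```python
# Hedged sketch: a Laplacian-energy-based centrality on a small weighted graph.
# Not the paper's modified-Laplacian or micro-community centrality.
import numpy as np

# Symmetric weighted adjacency matrix of a toy 4-node graph (assumed data).
A = np.array([
    [0.0, 2.0, 1.0, 0.0],
    [2.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 3.0],
    [0.0, 1.0, 3.0, 0.0],
])

def laplacian_energy(adj):
    L = np.diag(adj.sum(axis=1)) - adj   # weighted Laplacian
    return np.trace(L @ L)               # equals the sum of squared eigenvalues

full_energy = laplacian_energy(A)
for v in range(len(A)):
    keep = [i for i in range(len(A)) if i != v]
    reduced = A[np.ix_(keep, keep)]
    centrality = (full_energy - laplacian_energy(reduced)) / full_energy
    print(f"node {v}: weighted Laplacian centrality {centrality:.3f}")
```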
Engineering Applications of Artificial Intelligence | 2016
Jerry Chun-Wei Lin; Lu Yang; Philippe Fournier-Viger; Jimmy Ming-Thai Wu; Tzung-Pei Hong; Leon Shyue-Liang Wang; Justin Zhan
High-utility itemset mining (HUIM) has become a critical issue in recent years since, unlike frequent itemset mining (FIM) or association-rule mining (ARM), it can reveal profitable products by considering both the quantity and profit factors. Several algorithms have been presented to mine high-utility itemsets (HUIs), and most of them have to handle an exponential search space when the number of distinct items and the size of the database are very large. In the past, a heuristic HUPEumu-GRAM algorithm was proposed to mine HUIs based on a genetic algorithm (GA). Among evolutionary computation (EC) techniques, particle swarm optimization (PSO) requires fewer parameters than the GA-based approach. Since the traditional PSO mechanism is designed for continuous problems, in this paper the discrete PSO is adopted to encode the particles as binary variables. An efficient PSO-based algorithm named HUIM-BPSOsig is proposed to efficiently find HUIs. It first sets the number of discovered high-transaction-weighted utilization 1-itemsets (1-HTWUIs) as the particle size based on the transaction-weighted utility (TWU) model, which can greatly reduce the combinatorial problem in the evolution process. The sigmoid function is adopted in the particle-updating process of the designed HUIM-BPSOsig algorithm. Substantial experiments on real-life datasets show that the proposed algorithm achieves better results than the state-of-the-art GA-based algorithm.
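The hedged sketch below shows the core bit-update step of a binary PSO, where the sigmoid of each velocity component gives the probability that the corresponding bit (an item included in the candidate itemset or not) is set to 1. The parameter values and particle layout are assumptions for illustration, not the actual HUIM-BPSOsig implementation.

```python
# Hedged sketch of a sigmoid-based binary PSO bit update (illustration only).
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def update_particle(position, velocity, personal_best, global_best,
                    w=0.7, c1=1.5, c2=1.5, v_max=4.0):
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        r1, r2 = random.random(), random.random()
        v_new = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)
        v_new = max(-v_max, min(v_max, v_new))            # clamp the velocity
        new_velocity.append(v_new)
        # The sigmoid of the velocity is the probability that this bit is 1.
        new_position.append(1 if random.random() < sigmoid(v_new) else 0)
    return new_position, new_velocity

# One particle over five candidate 1-itemsets (bit = item selected or not).
pos, vel = [0, 1, 0, 1, 0], [0.0] * 5
pos, vel = update_particle(pos, vel, personal_best=[1, 1, 0, 0, 1],
                           global_best=[1, 0, 1, 0, 1])
print(pos, vel)
```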
Journal of Big Data | 2016
Matin Pirouz; Justin Zhan
This paper proposes an algorithm called optimized relativity search to reduce the number of nodes in a graph when attempting to decrease the running time of personalized PageRank (PPR) estimation. Even though similar estimations have been done, this method significantly increases the speed of computation, making it a feasible candidate for large-graph solutions such as search engines and the friend-recommendation techniques used in social media. In this study, the weighted PageRank method was combined with the Monte-Carlo technique and a local update algorithm over a reduced map space; this algorithm was developed to achieve a more accurate and faster search method than FAST PPR. The experimental results showed that for nodes with a high in-degree, the speed of estimation was twice as fast as FAST PPR, at the expense of a small loss in accuracy.
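The hedged sketch below illustrates the Monte-Carlo part of such an estimation: many short random walks are started from the source node, and the fraction of walks ending at each node approximates its personalized PageRank. The toy graph, walk count, and teleport probability are assumptions for illustration, not the optimized relativity search algorithm.

```python
# Hedged sketch of Monte-Carlo personalized PageRank estimation (illustration).
import random
from collections import Counter

graph = {                 # directed adjacency lists of a toy graph (assumed)
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a", "d"],
    "d": ["a"],
}

def monte_carlo_ppr(source, walks=10_000, alpha=0.15):
    end_counts = Counter()
    for _ in range(walks):
        node = source
        while random.random() > alpha:       # keep walking with prob. 1 - alpha
            neighbors = graph.get(node)
            node = random.choice(neighbors) if neighbors else source
        end_counts[node] += 1
    # The end-point distribution of the walks approximates PPR w.r.t. source.
    return {n: c / walks for n, c in end_counts.items()}

print(monte_carlo_ppr("a"))
```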
Knowledge Based Systems | 2017
Jimmy Ming-Thai Wu; Justin Zhan; Jerry Chun-Wei Lin
High-utility itemset mining (HUIM) is a major contemporary data mining issue. Unlike frequent itemset mining (FIM), which only considers the frequency factor, HUIM applies both the quantity and profit factors to reveal the most profitable products. Several previous approaches have been proposed to mine high-utility itemsets (HUIs), and most of them have to handle an exponential search space when the number of distinct items and the size of the database are both very large. Therefore, two evolutionary computation (EC) techniques, the genetic algorithm (GA) and particle swarm optimization (PSO), were previously applied to mine HUIs; in those studies, GAs and PSOs could obtain a large number of high-utility itemsets within a limited time. In this paper, a novel algorithm based on another evolutionary computation technique, ant colony optimization (ACO), is proposed to resolve this issue. Unlike GAs and PSOs, ACO produces a feasible solution in a constructive way and can avoid generating unreasonable solutions as much as possible; thus, a well-designed ACO approach can always obtain suitable solutions efficiently. An ant colony system (ACS), extended from ACO, is used in the proposed high-utility itemset mining by ACS (HUIM-ACS) algorithm to efficiently find HUIs. In general, an EC algorithm cannot guarantee that the provided solution is the global optimum, but the designed HUIM-ACS algorithm maps the complete solution space onto a routing graph and includes two pruning processes; it therefore guarantees that all HUIs are obtained when no candidate edge remains from the starting point. In addition, HUIM-ACS does not evaluate the same feasible solution twice, to avoid wasting computational resources. Substantial experiments on real-life datasets show that the proposed algorithm outperforms the other heuristic algorithms for mining HUIs in terms of the number of discovered HUIs and convergence.
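The hedged sketch below conveys only the constructive flavour of ant-colony search over itemsets: each ant assembles a candidate itemset guided by pheromone, the candidate's utility is evaluated, and pheromone on the chosen items is reinforced. The toy data, selection rule, and update rule are assumptions for illustration and do not reproduce HUIM-ACS's routing graph or its two pruning processes.

```python
# Hedged sketch of an ACO-style constructive search for profitable itemsets.
import random

transactions = [{"a": 2, "b": 1}, {"a": 1, "c": 3}, {"a": 2, "b": 2, "c": 1}]
profit = {"a": 5, "b": 3, "c": 1}
items = sorted(profit)
pheromone = {i: 1.0 for i in items}

def utility(itemset):
    # Total utility of an itemset over transactions that fully contain it.
    return sum(sum(tx[i] * profit[i] for i in itemset)
               for tx in transactions if all(i in tx for i in itemset))

best_set, best_util = (), 0
for _ in range(200):                             # 200 ants
    total = sum(pheromone.values())
    # Each ant includes an item with a pheromone-proportional probability.
    itemset = tuple(i for i in items if random.random() < pheromone[i] / total)
    u = utility(itemset) if itemset else 0
    if u > best_util:
        best_set, best_util = itemset, u
    for i in itemset:                            # reinforce the chosen items
        pheromone[i] = 0.9 * pheromone[i] + 0.1 * (u / 10.0)

print(best_set, best_util)
```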
Engineering Applications of Artificial Intelligence | 2017
Wensheng Gan; Jerry Chun-Wei Lin; Philippe Fournier-Viger; Han-Chieh Chao; Jimmy Ming-Thai Wu; Justin Zhan
Weighted frequent itemset mining (WFIM) has been proposed as an extension of frequent itemset mining that considers not only the frequency of items but also their relative importance. However, using WFIM algorithms in real applications raises some problems. First, they do not consider how recent the patterns are. Second, traditional WFIM algorithms cannot handle uncertain data, although this type of data is common in real life. To address these limitations, this paper introduces the concept of Recent High Expected Weighted Itemset (RHEWI), which considers the recency, weight, and uncertainty of patterns; by considering these three factors, more up-to-date and relevant results are found. A projection-based algorithm named RHEWI-P is presented to mine RHEWIs using a novel upper-bound downward closure (UBDC) property. An improved version of this algorithm, called RHEWI-PS, is further proposed based on a novel sorted upper-bound downward closure (SUBDC) property for pruning unpromising candidate itemsets early. An experimental evaluation against the state-of-the-art HEWI-Uapriori algorithm was carried out on both real-world and synthetic datasets; the results show that the proposed algorithms are highly efficient and suitable for mining the desired patterns.
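To show how the three factors can interact, the hedged sketch below scores a single itemset in an uncertain database by combining expected (probabilistic) support, a recency decay, and averaged item weights. The decay form, weights, and toy data are assumptions for illustration, not the RHEWI-P or RHEWI-PS algorithms.

```python
# Hedged sketch: recency- and weight-adjusted expected support of one itemset.
uncertain_db = [                   # oldest first; {item: existence probability}
    {"a": 0.9, "b": 0.4},
    {"a": 0.6, "c": 0.8},
    {"a": 0.8, "b": 0.7, "c": 0.5},
]
item_weight = {"a": 0.9, "b": 0.5, "c": 0.7}
decay = 0.8                        # recency decay per elapsed transaction

def recent_expected_weighted_support(itemset):
    score = 0.0
    for age, tx in enumerate(reversed(uncertain_db)):   # age 0 = most recent
        if all(i in tx for i in itemset):
            prob = 1.0
            for i in itemset:
                prob *= tx[i]      # expected support under uncertainty
            score += (decay ** age) * prob
    weight = sum(item_weight[i] for i in itemset) / len(itemset)
    return weight * score

print(recent_expected_weighted_support(("a", "b")))
```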
Engineering Applications of Artificial Intelligence | 2017
Wensheng Gan; Jerry Chun-Wei Lin; Philippe Fournier-Viger; Han-Chieh Chao; Justin Zhan
Frequent pattern mining (FPM) is an important topic in data mining for discovering implicit but useful information. Many algorithms have been proposed for this task, but most of them suffer from an important limitation: they rely on a single uniform minimum support threshold as the sole criterion to identify frequent patterns (FPs). Using a single threshold value to assess the usefulness of all items in a database is inadequate and unfair in real-life applications, since each item is different and not all items should be treated the same. Several algorithms have been developed for mining FPs with multiple minimum supports, but most of them are time-consuming and require a large amount of memory. In this paper, we address this issue by introducing a novel approach named Frequent Pattern mining with Multiple minimum supports from the Enumeration-tree (FP-ME). In the developed Set-Enumeration-tree with Multiple minimum supports (ME-tree) structure, a new sorted downward closure (SDC) property of FPs and the least minimum support (LMS) concept are used to effectively prune the search space. The proposed FP-ME algorithm can directly discover FPs from the ME-tree without candidate generation. Moreover, an improved algorithm, named FP-MEDiffSet, is also developed based on the DiffSet concept to further increase mining performance. Substantial experiments on both real-life and synthetic datasets show that the proposed algorithms not only avoid the rare item problem, but also efficiently and effectively discover the complete set of FPs in transactional databases while considering multiple minimum supports, and outperform the state-of-the-art CFP-growth++ algorithm in terms of execution time, memory usage, and scalability.
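As a hedged illustration of mining with multiple minimum supports, the sketch below keeps a pattern when its support reaches the smallest per-item threshold among its items (in the spirit of the LMS concept). The toy data and brute-force enumeration are assumptions for illustration, not the Set-Enumeration-tree-based FP-ME algorithm.

```python
# Hedged sketch of frequent pattern mining with multiple minimum supports.
from itertools import combinations

transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b", "d"}]
min_sup = {"a": 2, "b": 2, "c": 2, "d": 1}   # a distinct threshold per item

items = sorted(min_sup)
for size in range(1, len(items) + 1):
    for pattern in combinations(items, size):
        support = sum(1 for tx in transactions if set(pattern) <= tx)
        if support >= min(min_sup[i] for i in pattern):  # LMS-style check
            print(pattern, support)
```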
Journal of Big Data | 2016
Matin Pirouz; Justin Zhan; Shahab Tayeb
Detecting community structures and relation patterns in social networks, and ranking them, provides great knowledge about a network. Such knowledge can be utilized for target marketing or for grouping similar, yet distinct, nodes. The ever-growing variety of social networks necessitates the detection of minute and scattered communities, an important problem across different research fields including biology, social studies, and physics. Existing community detection algorithms, such as fast-unfolding or modularity-based methods, are either incapable of finding graph anomalies or too slow and impractical for large graphs. The main contributions of this work are twofold: (i) we optimize the Attractor algorithm, speeding it up by a factor that depends on the complexity of the graph (the more complex a social graph is, the greater the gain), and (ii) we propose a community-ranking algorithm for the first time. The former is achieved by amalgamating loops and incorporating breadth-first search (BFS) for edge alignment and for filling in the missing cache, preserving a time constant equal to the number of edges in the graph. For the latter, we make the first attempt to quantify how influential each community is in a given graph, ranking communities by their normalized impact factor.
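Since the abstract does not spell out the normalized impact factor, the hedged sketch below shows only one plausible way to rank communities by a normalized influence score, here each community's share of the graph's edge endpoints. The scoring rule, toy edges, and community assignment are all assumptions for illustration, not the paper's ranking method.

```python
# Hedged sketch: rank communities by a normalized influence score (assumed rule).
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e"), ("e", "f")]
communities = {"C1": {"a", "b", "c"}, "C2": {"d", "e", "f"}}

degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1

total = sum(degree.values())
scores = {name: sum(degree[n] for n in members) / total
          for name, members in communities.items()}

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(name, round(score, 3))
```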
International Journal of Fuzzy Systems | 2017
Jerry Chun-Wei Lin; Ting Li; Philippe Fournier-Viger; Tzung-Pei Hong; Jimmy Ming-Tai Wu; Justin Zhan
Traditional association-rule mining and frequent itemset mining can only handle binary databases, in which each item or attribute is represented as either 0 or 1. Several algorithms have been developed to discover fuzzy frequent itemsets by applying fuzzy set theory to quantitative databases; most of them consider the maximum scalar cardinality to find, at most, one represented item among the transformed linguistic terms. This paper presents the MFFI-Miner algorithm to mine the complete set of multiple fuzzy frequent itemsets (MFFIs) without candidate generation. An efficient fuzzy-list structure was designed to keep the essential information for the mining process, greatly reducing the computation required for database scans. Two efficient pruning strategies are developed to reduce the search space, thus speeding up the mining process to discover MFFIs directly. Substantial experiments were conducted to compare the performance of the proposed algorithm to state-of-the-art approaches in terms of execution time, memory usage, and node analysis.
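The hedged sketch below illustrates the fuzzification step that such approaches rely on: purchase quantities are mapped to degrees of membership in linguistic terms via simple triangular membership functions, and the fuzzy support of an (item, term) pair is the sum of those degrees over the database. The membership shapes and toy data are assumptions for illustration, not MFFI-Miner's fuzzy-list structure or pruning strategies.

```python
# Hedged sketch of fuzzifying quantities and computing fuzzy support.
def memberships(quantity, low=1, mid=3, high=5):
    # Triangular membership degrees of a quantity in three linguistic terms.
    return {
        "low":  max(0.0, min(1.0, (mid - quantity) / (mid - low))),
        "mid":  max(0.0, 1.0 - abs(quantity - mid) / (mid - low)),
        "high": max(0.0, min(1.0, (quantity - mid) / (high - mid))),
    }

transactions = [{"a": 1, "b": 4}, {"a": 3, "b": 2}, {"a": 5, "b": 5}]

def fuzzy_support(item, term):
    # Fuzzy support = sum of membership degrees over all transactions.
    return sum(memberships(tx[item])[term] for tx in transactions if item in tx)

for term in ("low", "mid", "high"):
    print("a", term, round(fuzzy_support("a", term), 2))
```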