Jilian Zhang | Researchain

Publication

Featured research published by Jilian Zhang.

Applied Intelligence | 2007

Semi-parametric optimization for missing data imputation

Yongsong Qin; Shichao Zhang; Jilian Zhang; Chengqi Zhang

Missing data imputation is an important issue in machine learning and data mining. In this paper, we propose a new and efficient imputation method for one kind of missing data: semi-parametric data. Our imputation method aims at an optimal evaluation of the Root Mean Square Error (RMSE), distribution function and quantile after the missing data are imputed. We evaluate our approach experimentally on both simulated and real data, and demonstrate that our stochastic semi-parametric regression imputation is much better than existing deterministic semi-parametric regression imputation in efficiency and effectiveness.

Transactions on Computational Science | 2008

Missing value imputation based on data clustering

Shichao Zhang; Jilian Zhang; Xiaofeng Zhu; Yongsong Qin; Chengqi Zhang

We propose an efficient nonparametric missing value imputation method based on clustering, called CMI (Clustering-based Missing value Imputation), for dealing with missing values in target attributes. In our approach, we impute the missing values of an instance A with plausible values generated, using a kernel-based method, from the instances that contain no missing values and are most similar to A. Specifically, we first divide the dataset (including the instances with missing values) into clusters. Next, the missing values of an instance A are patched up with the plausible values generated from A's cluster. Extensive experiments show the effectiveness of the proposed method in the missing value imputation task.
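
The two steps described above (cluster all instances, then fill each missing target from complete instances in the same cluster with a kernel) can be sketched as follows. This is a minimal illustration, not the authors' CMI implementation: the one-dimensional k-means, the Gaussian kernel, the bandwidth parameter and all function names are assumptions made for the example.

```python
import math

def kmeans_1d(xs, k, iters=20):
    """Plain Lloyd's k-means on scalar values; returns one cluster label per value."""
    srt = sorted(xs)
    # Deterministic initialisation: spread the centres over the sorted values.
    centres = [srt[(2 * j + 1) * len(srt) // (2 * k)] for j in range(k)]
    labels = [0] * len(xs)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(x - centres[j])) for x in xs]
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:
                centres[j] = sum(members) / len(members)
    return labels

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u)

def cluster_kernel_impute(xs, ys, k=2, bandwidth=1.0):
    """Fill None entries of ys with a kernel-weighted mean of the observed ys
    that fall in the same cluster (clusters are built on the fully observed xs)."""
    labels = kmeans_1d(xs, k)
    complete = [j for j in range(len(xs)) if ys[j] is not None]
    filled = list(ys)
    for i, y in enumerate(ys):
        if y is not None:
            continue
        donors = [j for j in complete if labels[j] == labels[i]]
        if not donors:              # assumption: fall back to all complete instances
            donors = complete
        weights = [gaussian_kernel((xs[i] - xs[j]) / bandwidth) for j in donors]
        total = sum(weights)
        if total == 0.0:            # all donors too far away: use the plain mean
            filled[i] = sum(ys[j] for j in donors) / len(donors)
        else:
            filled[i] = sum(w * ys[j] for w, j in zip(weights, donors)) / total
    return filled

# Two well-separated groups; one target value is missing in each.
xs = [1.0, 1.2, 0.9, 1.1, 10.0, 10.3, 9.8, 10.1]
ys = [2.0, 2.2, 1.9, None, 20.0, 20.4, None, 20.1]
print(cluster_kernel_impute(xs, ys, k=2, bandwidth=0.5))
```

Restricting donors to the same cluster keeps the kernel average local, so the missing value in the low group is never pulled towards the high group.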

International Conference on Industrial Informatics | 2006

Clustering-based Missing Value Imputation for Data Preprocessing

Chengqi Zhang; Yongsong Qin; Xiaofeng Zhu; Jilian Zhang; Shichao Zhang

Missing value imputation is a current yet challenging issue in machine learning and data mining. Missing value imputation is a procedure that replaces the missing values in a dataset with plausible values, which are generally generated from the dataset using a deterministic or random method. In this paper we propose a new and efficient missing value imputation method based on data clustering, called CRI (clustering-based random imputation). In our approach, we fill in the missing values of an instance with plausible values generated from the data similar to this instance using a kernel-based random method. Specifically, we first divide the dataset (excluding instances with missing values) into clusters. Each instance with missing values is then assigned to the cluster most similar to it. Finally, the missing values of an instance A are patched up with plausible values generated by applying a kernel-based method to the instances in A's cluster. Our experiments (some of them with the decision tree induction system C5.0) demonstrate the effectiveness of the proposed method in the missing value imputation task.

Expert Systems With Applications | 2009

POP algorithm: Kernel-based imputation to treat missing values in knowledge discovery from databases

Yongsong Qin; Shichao Zhang; Xiaofeng Zhu; Jilian Zhang; Chengqi Zhang

One way to fill in missing values is to use correlations between the attributes of the data. The problem is that it is difficult to identify relations within data containing missing values. Accordingly, we develop a kernel-based missing data imputation method in this paper. This approach aims at making an optimal inference on statistical parameters (mean, distribution function and quantile) after the missing data are imputed. We refer to this approach as the parameter optimization method (POP algorithm). We experimentally evaluate our approach and demonstrate that our POP algorithm (random regression imputation) is much better than deterministic regression imputation, both in efficiency and in generating an inference on the above parameters.
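
The contrast between deterministic and random regression imputation can be illustrated with a small sketch. It is not the POP algorithm: an ordinary least-squares line stands in for the kernel regression of the abstract, and the residual-resampling step, the function names and the toy data are assumptions chosen for brevity.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def regression_impute(xs, ys, stochastic=False, rng=None):
    """Fill None entries of ys from a regression on the complete cases.

    Deterministic mode uses the fitted value. Stochastic mode adds a residual
    drawn at random from the complete cases, so the imputed values keep some
    of the scatter that a fitted line alone would remove.
    """
    rng = rng or random.Random(0)
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    slope, intercept = fit_line([x for x, _ in complete], [y for _, y in complete])
    residuals = [y - (slope * x + intercept) for x, y in complete]
    filled = []
    for x, y in zip(xs, ys):
        if y is None:
            y = slope * x + intercept
            if stochastic:
                y += rng.choice(residuals)
        filled.append(y)
    return filled

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, None, 9.8, None]
print(regression_impute(xs, ys))                    # imputed values lie on the fitted line
print(regression_impute(xs, ys, stochastic=True))   # fitted value plus a sampled residual
```

Deterministic imputation places every imputed point exactly on the fitted curve, which tends to understate the variance of the completed data; adding a random residual is one common way to restore it, which matters when the goal is inference on the distribution function or quantiles rather than on the mean alone.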

Information Sciences | 2007

EDUA: An efficient algorithm for dynamic database mining

Shichao Zhang; Jilian Zhang; Chengqi Zhang

Maintaining frequent itemsets (patterns) is one of the most important issues faced by the data mining community. While many algorithms for pattern discovery have been developed, relatively little work has been reported on mining dynamic databases, a major area of application in this field. In this paper, a new algorithm, namely the Efficient Dynamic Database Updating Algorithm (EDUA), is designed for mining dynamic databases. It works well when data deletion is carried out in any subset of a database that is partitioned according to the arrival time of the data. A pruning technique is proposed for improving the efficiency of the EDUA algorithm. Extensive experiments are conducted to evaluate the proposed approach and it is demonstrated that the EDUA is efficient.
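
The abstract does not spell out how EDUA works, so the following is only a sketch of the general idea it relies on: if the database is partitioned by arrival time and itemset counts are kept per partition, deleting a partition becomes a subtraction instead of a full rescan. The class name, the cap on itemset size and the exhaustive counting are assumptions for illustration; the pruning technique mentioned in the abstract is not modelled.

```python
from collections import Counter
from itertools import combinations

class PartitionedItemsetCounts:
    """Keeps itemset counts per time partition so that adding or deleting a
    whole partition updates the global counts without rescanning the rest."""

    def __init__(self, max_size=2):
        self.max_size = max_size      # largest itemset size that is counted
        self.partitions = {}          # partition id -> (Counter, number of transactions)
        self.total = Counter()
        self.n_transactions = 0

    def _count(self, transactions):
        counts = Counter()
        for t in transactions:
            items = sorted(set(t))
            for size in range(1, self.max_size + 1):
                counts.update(combinations(items, size))
        return counts

    def add_partition(self, pid, transactions):
        counts = self._count(transactions)
        self.partitions[pid] = (counts, len(transactions))
        self.total.update(counts)
        self.n_transactions += len(transactions)

    def delete_partition(self, pid):
        counts, n = self.partitions.pop(pid)
        self.total.subtract(counts)
        self.n_transactions -= n

    def frequent(self, min_support):
        """Itemsets whose relative support is at least min_support."""
        if self.n_transactions == 0:
            return {}
        return {
            itemset: count / self.n_transactions
            for itemset, count in self.total.items()
            if count > 0 and count / self.n_transactions >= min_support
        }

db = PartitionedItemsetCounts()
db.add_partition("jan", [["a", "b"], ["a", "c"]])
db.add_partition("feb", [["a", "b"], ["b", "c"]])
print(sorted(db.frequent(0.5)))
db.delete_partition("jan")            # old data expires; only counts are adjusted
print(sorted(db.frequent(0.6)))
```

Counting every itemset up to a fixed size is feasible only for small examples; a practical algorithm would need candidate pruning of the kind the abstract refers to.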

Knowledge Discovery and Data Mining | 2007

GBKII: an imputation method for missing values

Chengqi Zhang; Jilian Zhang; Yongsong Qin; Shichao Zhang

Missing data imputation is a current and challenging issue in machine learning and data mining, because missing values in a dataset can generate bias that affects the quality of the learned patterns or the classification performance. To deal with this issue, this paper proposes a Grey-Based K-NN Iteration Imputation method, called GBKII, for imputing missing values. GBKII is an instance-based imputation method, which is referred to as a non-parametric regression method in statistics. It is also efficient at handling categorical attributes. We experimentally evaluate our approach and demonstrate that GBKII is much more efficient than the k-NN and mean-substitution methods.
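
A nearest-neighbour imputation that ranks neighbours by a grey relational grade, rather than by Euclidean distance, might look like the sketch below. It uses the standard grey relational coefficient with distinguishing coefficient rho = 0.5 and is not the GBKII algorithm itself: the iteration step and the handling of categorical attributes are left out, and the function names and data are hypothetical.

```python
def grey_relational_grades(query, candidates, rho=0.5):
    """Grey relational grade between a query vector and each candidate vector.

    Coefficient for attribute k: (d_min + rho * d_max) / (d_k + rho * d_max),
    where d_k is the absolute difference on attribute k and d_min, d_max are
    taken over all candidates and attributes. The grade is the mean coefficient.
    """
    deltas = [[abs(q - c) for q, c in zip(query, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:                     # every candidate is identical to the query
        return [1.0] * len(candidates)
    return [
        sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
        for row in deltas
    ]

def grey_knn_impute(query, candidates, targets, k=2):
    """Impute a missing target as the mean target of the k candidates with
    the highest grey relational grade."""
    grades = grey_relational_grades(query, candidates)
    ranked = sorted(range(len(candidates)), key=lambda i: grades[i], reverse=True)
    nearest = ranked[:k]
    return sum(targets[i] for i in nearest) / len(nearest)

# Complete instances (features assumed to be on comparable scales) and their targets.
candidates = [[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.2, 4.8]]
targets = [10.0, 12.0, 50.0, 54.0]
print(grey_knn_impute([1.05, 0.95], candidates, targets, k=2))
```

Because the coefficients are normalised by the largest difference in the candidate set, the grade is a relative similarity in (0, 1]; attributes should be rescaled beforehand so that no single attribute dominates.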

Pacific Rim International Conference on Artificial Intelligence | 2006

Optimized parameters for missing data imputation

Shichao Zhang; Yongsong Qin; Jilian Zhang; Chengqi Zhang

One way to fill in missing values is to use attribute correlations within the data. However, it is difficult to identify such relations within data containing missing values. Accordingly, we develop a kernel-based missing data imputation method in this paper. This approach aims at optimal inference on two statistical parameters, the mean and the distribution function, after the missing data are imputed. We refer to this approach as the parameter optimization method (POP algorithm, a random regression imputation). We experimentally evaluate our approach and demonstrate that our POP algorithm is much better than deterministic regression imputation in the efficiency of generating an inference on the above two parameters. The results also show that our algorithm is computationally efficient, robust and stable for missing data imputation.

Knowledge and Information Systems | 2007

Mining follow-up correlation patterns from time-related databases

Shichao Zhang; Zifang Huang; Jilian Zhang

Research on traditional association rules has gained great attention during the past decade. Generally, an association rule A → B is used to predict that B likely occurs when A occurs. This is a kind of strong correlation, and indicates that the two events will probably happen simultaneously. However, in real-world applications such as bioinformatics and medical research, there are many follow-up correlations between itemsets A and B, for example, B is likely to occur n times after A has occurred m times. That is, the correlative itemsets do not belong to the same transaction. We refer to this relation as a follow-up correlation pattern (FCP). The task of mining FCPs poses greater challenges for efficient processing than normal pattern discovery, because the number of potentially interesting patterns becomes extremely large once the length limit of transactions no longer exists. In this paper, we develop an efficient algorithm to identify FCPs in time-related databases. We also experimentally evaluate our approach, and provide extensive results on mining this new kind of pattern.
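
To make the difference from a classical rule concrete, the sketch below estimates how often itemset B appears in a later transaction after A has appeared, rather than in the same transaction. It covers only the simplest case (m = n = 1) with a fixed look-ahead window; the window parameter, the confidence definition and the function name are assumptions for illustration and are not taken from the paper.

```python
def follow_up_confidence(transactions, a, b, window):
    """Fraction of transactions containing itemset a that are followed by a
    transaction containing itemset b within the next `window` transactions.

    transactions: list of item collections, ordered by time.
    """
    a, b = set(a), set(b)
    occurrences = hits = 0
    for i, t in enumerate(transactions):
        if not a <= set(t):
            continue
        occurrences += 1
        later = transactions[i + 1 : i + 1 + window]
        if any(b <= set(u) for u in later):
            hits += 1
    return hits / occurrences if occurrences else 0.0

# A time-ordered log in which "rash" tends to follow "fever" but never co-occurs with it.
log = [{"fever"}, {"rash"}, {"cough"}, {"fever"}, {"cough"}, {"rash"}, {"fever"}]
print(follow_up_confidence(log, {"fever"}, {"rash"}, window=2))
print(follow_up_confidence(log, {"fever"}, {"rash"}, window=1))
```

A classical association rule would assign fever → rash zero support on this log, since the two items never share a transaction; the follow-up view recovers the relation by looking across transactions.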

International Conference on Data Mining | 2006

Identifying Follow-Correlation Itemset-Pairs

Shichao Zhang; Jilian Zhang; Xiaofeng Zhu; Zifang Huang

An association rule A → B is useful for predicting that B will likely occur when A occurs. This is a classical association rule. In real-world applications, such as bioinformatics and medical research, there are many follow correlations between itemsets A and B: B likely occurs n times after A has occurred m times, written as <Am, Bn>. We refer to this follow-correlation as P3.1 itemset-pairs because <A3, B1> like that in the example (Example 2) should be uninterested in association analysis. This paper designs an efficient algorithm for identifying P3.1 itemset-pairs in sequential data. We experimentally evaluate our approach, and demonstrate that the proposed approach is efficient and promising.

Transactions on Computational Science | 2009

Missing Data Analysis: A Kernel-Based Multi-Imputation Approach

Shichao Zhang; Zhi Jin; Xiaofeng Zhu; Jilian Zhang

Many missing data analysis techniques are single-imputation methods. However, single imputation cannot provide valid standard errors and confidence intervals, since it ignores the uncertainty implicit in the fact that the imputed values are not the actual values. Filling in each missing value with a set of plausible values is called multi-imputation. In this paper we propose a kernel-based stochastic non-parametric multi-imputation method under the MAR (Missing at Random) and MCAR (Missing Completely at Random) missing mechanisms in nonparametric regression settings. Furthermore, we present a kernel-based stochastic semi-parametric multi-imputation method for cases where we have some prior knowledge about the dataset with missing values. Our algorithms are designed specifically with the aim of optimizing the confidence interval and the relative efficiency. The proposed technique is evaluated experimentally, using simulated data and real data, and the results demonstrate that our method performs much better than the NORM method, and is promising.
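
Why several imputations give usable standard errors where a single one does not can be shown with the standard pooling rules for multiple imputation (Rubin's rules): the total variance is the average within-imputation variance plus an inflated between-imputation variance. The sketch below is a generic illustration, not the kernel-based method of the paper; the random hot-deck draw used to create each completed dataset assumes MCAR, and the function names are hypothetical.

```python
import random
import statistics

def pool_rubin(estimates, variances):
    """Combine m point estimates and their variances with Rubin's rules.

    total variance = within + (1 + 1/m) * between
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)   # sample variance, requires m >= 2
    return pooled, within + (1 + 1 / m) * between

def multiply_impute_mean(values, m=5, seed=0):
    """Estimate the mean of `values` (None marks a missing entry) from m
    completed datasets, each filled by random draws from the observed values."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    estimates, variances = [], []
    for _ in range(m):
        completed = [v if v is not None else rng.choice(observed) for v in values]
        estimates.append(statistics.fmean(completed))
        # Variance of the sample mean within this completed dataset.
        variances.append(statistics.variance(completed) / len(completed))
    return pool_rubin(estimates, variances)

data = [4.1, None, 5.0, 3.8, None, 4.6, 5.3, None, 4.4, 4.9]
mean, total_variance = multiply_impute_mean(data, m=5)
print(mean, total_variance ** 0.5)    # pooled estimate and its standard error
```

The between-imputation term is what a single imputation cannot supply: it measures how much the estimate changes from one plausible completion to another, which is exactly the uncertainty the abstract says single imputation ignores.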
