Zhengxin Chen | Researchain

Publication

Featured research published by Zhengxin Chen.

International Journal of Information Technology and Decision Making | 2007

A descriptive framework for the field of data mining and knowledge discovery

Yong Shi; Zhengxin Chen; Yi Peng

As our ability to collect and store various types of datasets continually increases, the demand for advanced techniques and tools to understand and make use of these large datasets keeps growing. No single existing field is capable of satisfying these needs. Data Mining and Knowledge Discovery (DMKD), which utilizes methods, techniques, and tools from diverse disciplines, emerged in the last decade to solve this problem. It brings together knowledge and theories from several fields, including databases, machine learning, optimization, statistics, and data visualization, and has been applied to various real-life applications. Even though data mining has made significant progress during the past fifteen years, most research effort has been devoted to developing effective and efficient algorithms that can extract knowledge from data, and not enough attention has been paid to the philosophical foundations of data mining. The objective of this research is to provide a conceptual framework that identifies the major research areas of DMKD for students and beginners and describes the longitudinal changes in DMKD research activities. Using textual documents collected from premier DMKD journals, conference proceedings, syllabi, and dissertations, this study is intended to address the following issues: What are the major subjects of this field? What is the central theme? What are the connections among these subjects? What are the longitudinal changes of DMKD research? To answer these questions, this research uses a combination of grounded theory and document clustering. The result will represent previous and current DMKD research activities in the form of a framework. The resulting framework should allow people to comprehend the entire domain of DMKD research and assist in identifying areas in need of more research effort.

Decision Support Systems | 2008

A Multi-criteria Convex Quadratic Programming model for credit data analysis

Yi Peng; Gang Kou; Yong Shi; Zhengxin Chen

Speed and scalability are two essential issues in data mining and knowledge discovery. This paper proposes a mathematical programming model that addresses these two issues and applies the model to credit classification problems. The proposed Multi-criteria Convex Quadratic Programming (MCQP) model is highly efficient (computing time complexity between O(n^1.5) and O(n^2)) and scalable to massive problems (size of O(10^9)) because it only needs to solve linear equations to find the global optimal solution. Kernel functions are introduced into the model to solve nonlinear problems. In addition, the theoretical relationship between the proposed MCQP model and SVM is discussed.

Information Sciences | 2009

Multiple criteria mathematical programming for multi-class classification and application in network intrusion detection

Gang Kou; Yi Peng; Zhengxin Chen; Yong Shi

Multi-class classification problems are harder to solve and less studied than binary classification problems. The goal of this paper is to present a multi-criteria mathematical programming (MCMP) model for multi-class classification. Furthermore, we introduce the concept of e-support vector to facilitate computation in large-scale applications. Instead of requiring the solution of a convex mathematical programming problem, computing the optimal solution of the model requires only matrix computation. Using two network intrusion datasets, we demonstrate that the proposed model can achieve both high classification accuracy and low false alarm rates for multi-class network intrusion classification.

Pattern Recognition Letters | 2002

An iterative initial-points refinement algorithm for categorical data clustering

Ying Sun; Qiuming Zhu; Zhengxin Chen

The original k-means clustering algorithm is designed to work primarily on numeric data sets. This prevents the algorithm from being directly applied to categorical data clustering in many data mining applications. The k-modes algorithm [Z. Huang, Clustering large data sets with mixed numeric and categorical value, in: Proceedings of the First Pacific Asia Knowledge Discovery and Data Mining Conference. World Scientific, Singapore, 1997, pp. 21-34] extended the k-means paradigm to cluster categorical data by using a frequency-based method to update the cluster modes, in place of the k-means approach of minimizing a numerically valued cost. However, as is the case with most data clustering algorithms, the algorithm requires a pre-setting or random selection of the initial points (modes) of the clusters. Differences in the initial points often lead to considerably different clustering results. In this paper we present an experimental study on applying Bradley and Fayyad's iterative initial-point refinement algorithm to k-modes clustering to improve the accuracy and repeatability of the clustering results [cf. P. Bradley, U. Fayyad, Refining initial points for k-mean clustering, in: Proceedings of the 15th International Conference on Machine Learning, Morgan Kaufmann, Los Altos, CA, 1998]. Experiments show that the k-modes clustering algorithm using refined initial points leads to higher-precision results much more reliably than the random selection method without refinement, thus making the refinement process applicable to many data mining applications with categorical data.
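
As a rough illustration of the frequency-based mode update described in the abstract above, the following is a minimal Python sketch of plain k-modes. It is not the authors' code and it does not implement the Bradley and Fayyad refinement step; the function names (`hamming`, `update_mode`, `k_modes`) and the use of simple matching dissimilarity are assumptions made for this example. The initial modes are passed in explicitly, which makes it easy to see how the result depends on them.

```python
from collections import Counter

def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def update_mode(records):
    """Frequency-based mode: the most common category in each attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def k_modes(records, initial_modes, max_iter=20):
    """Plain k-modes (illustrative sketch); returns (labels, modes)."""
    modes = [tuple(m) for m in initial_modes]
    labels = []
    for _ in range(max_iter):
        # Assign each record to its nearest mode (ties go to the lowest index).
        labels = [
            min(range(len(modes)), key=lambda j: hamming(r, modes[j]))
            for r in records
        ]
        # Recompute each mode from its members; keep the old mode if empty.
        new_modes = []
        for j, old in enumerate(modes):
            members = [r for r, lab in zip(records, labels) if lab == j]
            new_modes.append(update_mode(members) if members else old)
        if new_modes == modes:  # converged: modes no longer change
            break
        modes = new_modes
    return labels, modes
```

Calling `k_modes(records, initial_modes)` with different `initial_modes` on the same records is a simple way to observe the sensitivity to initial points that motivates the refinement study.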

Information & Software Technology | 2001

An integrated interactive environment for knowledge discovery from heterogeneous data resources

Miao Chen; Qiuming Zhu; Zhengxin Chen

Discovering knowledge such as causal relations among objects in large data collections is very important in many decision-making processes. In this paper, we present our development of an integrated environment acting as a software agent for discovering correlative attributes of data objects from multiple heterogeneous resources. The environment provides the necessary supporting tools and processing engines for acquiring, collecting, and extracting relevant information from multiple data resources, and then forming meaningful knowledge patterns. The agent system features an interactive user interface that provides useful communication channels for human supervisors to actively engage in consultation and guidance throughout the knowledge discovery process. A cross-reference technique is employed for searching and discovering a coherent set of correlative patterns from the heterogeneous data resources. A Bayesian network approach is applied as a knowledge representation scheme for recording and manipulating the discovered causal relations. The system employs common data warehousing and OLAP techniques to form an integrated data repository and generate database queries over large data collections from various distinct data resources.

International Journal of Information Technology and Decision Making | 2006

From data mining to behavior mining

Zhengxin Chen

The knowledge economy requires data mining to be more goal-oriented so that more tangible results can be produced. This requirement implies that the semantics of the data should be incorporated into the mining process. Data mining is ready to deal with this challenge because recent developments in data mining have shown an increasing interest in the mining of complex data (as exemplified by graph mining, text mining, etc.). By incorporating the relationships of the data along with the data itself (rather than focusing on the data alone), complex data injects semantics into the mining process, thus enhancing the potential for making a better contribution to the knowledge economy. Since the relationships between the data reveal certain behavioral aspects underlying the plain data, this shift of mining from simple data to complex data signals a fundamental change to a new stage in the research and practice of knowledge discovery, which can be termed behavior mining. Behavior mining also has the potential to unify some other recent activities in data mining. We discuss important aspects of behavior mining and its implications for the future of data mining.

CASDMKM'04 Proceedings of the 2004 Chinese Academy of Sciences Conference on Data Mining and Knowledge Management | 2004

A multiple-criteria quadratic programming approach to network intrusion detection

Gang Kou; Yi Peng; Yong Shi; Zhengxin Chen; Xiaojun Chen

The early and reliable detection and deterrence of malicious attacks, from both external and internal sources, is a crucial issue for today’s e-business. There are various methods available today for intrusion detection; however, every method has its limitations and new approaches should still be explored. The objectives of this study are twofold: to discuss the formulation of the Multiple Criteria Quadratic Programming (MCQP) approach, and to investigate the applicability of the quadratic classification method to the intrusion detection problem. The demonstration of a successful Multiple Criteria Quadratic Programming application in intrusion detection can add another option to the network security toolbox. The classification results are examined by cross-validation and improved by an ensemble method. The results demonstrated that MCQP is excellent and stable. Furthermore, the outcome of MCQP can be improved by the ensemble method.

International Conference on Computational Science | 2005

Improving clustering analysis for credit card accounts classification

Yi Peng; Gang Kou; Yong Shi; Zhengxin Chen

In credit card portfolio management, predicting cardholders’ behavior is key to reducing the charge-off risk of credit card issuers. The most commonly used methods for predicting credit card defaulters are credit scoring models. Most of these credit scoring models use supervised classification methods. Although these methods have made considerable progress in bankruptcy prediction, they are unsuitable for data records without predefined class labels. Therefore, it is worthwhile to investigate the applicability of unsupervised learning methods to credit card accounts classification. The objectives of this paper are: (1) to explore an unsupervised learning method, cluster analysis, for credit card accounts classification, and (2) to improve clustering classification results using ensemble and supervised learning methods. In particular, a general-purpose clustering toolkit, CLUTO, from the University of Minnesota, was used to classify a real-life credit card dataset, and two supervised classification methods, decision tree and multiple-criteria linear programming (MCLP), were used to improve the clustering results. The classification results indicate that clustering can be used either as a stand-alone classification method or as a preprocessing step for supervised classification methods.

International Conference on Computational Science | 2004

Cross-Validation and Ensemble Analyses on Multiple-Criteria Linear Programming Classification for Credit Cardholder Behavior

Yi Peng; Gang Kou; Zhengxin Chen; Yong Shi

In credit card portfolio management, predicting cardholders’ behavior is key to reducing the charge-off risk of credit card issuers. As a promising data mining approach, multiple criteria linear programming (MCLP) has been successfully applied to classify credit cardholders’ behavior into two or more groups for business intelligence. The objective of this paper is to study the stability of MCLP in classifying credit cardholders’ behavior by using cross-validation and ensemble techniques. An overview of the two-group MCLP model formulation and a description of the dataset used in this paper are introduced first. Then cross-validation and ensemble methods are tested in turn. As the results demonstrate, the classification rates of the cross-validation and ensemble methods are close to the rates obtained using MCLP alone. In other words, MCLP is a relatively stable method for classifying credit cardholders’ behavior.
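
The stability check described in the abstract above (k-fold cross-validation followed by an ensemble of the fold models) can be sketched roughly as follows. This is only an illustration under stated assumptions: the classifier is a nearest-centroid stand-in, not MCLP, and the function names, the deterministic fold assignment, and the majority-vote rule are all hypothetical choices for this example. It also assumes at least `k` records and that every training split contains each class.

```python
from collections import Counter

def train_centroids(X, y):
    """Stand-in classifier (NOT MCLP): one mean vector per class."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def predict(model, row):
    """Return the label whose centroid is closest in squared distance."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(sorted(model), key=lambda lab: dist2(model[lab]))

def k_fold_indices(n, k):
    """Deterministic folds for the sketch: record i goes to fold i % k."""
    return [[i for i in range(n) if i % k == f] for f in range(k)]

def cross_validate(X, y, k=3):
    """Train one model per fold; return the models and held-out accuracies."""
    models, accuracies = [], []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        model = train_centroids(train_X, train_y)
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
        models.append(model)
    return models, accuracies

def ensemble_predict(models, row):
    """Majority vote over the fold models."""
    votes = Counter(predict(m, row) for m in models)
    return votes.most_common(1)[0][0]
```

If the per-fold accuracies and the ensemble's predictions stay close to those of a single model trained on all the data, that is the kind of evidence the abstract reads as stability.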

International Conference on Conceptual Structures | 2007

Application of Classification Methods to Individual Disability Income Insurance Fraud Detection

Yi Peng; Gang Kou; Alan Sabatka; Jeff Matza; Zhengxin Chen; Deepak Khazanchi; Yong Shi

As the number of electronic insurance claims increases each year, it is difficult to detect insurance fraud in a timely manner by manual methods alone. The objective of this study is to use classification modeling techniques to identify suspicious policies to assist manual inspections. The predictive models can label high-risk policies, help investigators focus on suspicious records, and accelerate the claim-handling process. The study uses health insurance data with some known suspicious and normal policies. These known policies are used to train the predictive models. Missing values and irrelevant variables are removed before building the predictive models. Three predictive models, Naive Bayes (NB), decision tree, and Multiple Criteria Linear Programming (MCLP), are trained using the claim data. The experimental study shows that NB outperformed decision tree and MCLP in terms of classification accuracy.
