Publication


Featured research published by Yi Peng.


International Journal of Information Technology and Decision Making | 2007

A descriptive framework for the field of data mining and knowledge discovery

Yong Shi; Zhengxin Chen; Yi Peng

As our ability to collect and store various types of datasets continually increases, the demand for advanced techniques and tools to understand and make use of these large datasets keeps growing. No single existing field is capable of satisfying these needs. Data Mining and Knowledge Discovery (DMKD), which utilizes methods, techniques, and tools from diverse disciplines, emerged in the last decade to solve this problem. It brings together knowledge and theories from several fields, including databases, machine learning, optimization, statistics, and data visualization, and has been applied to various real-life applications. Even though data mining has made significant progress during the past fifteen years, most research effort has been devoted to developing effective and efficient algorithms that can extract knowledge from data, and not enough attention has been paid to the philosophical foundations of data mining. The objective of this research is to provide a conceptual framework that identifies the major research areas of DMKD for students and beginners and describes the longitudinal changes in DMKD research activities. Using textual documents collected from premier DMKD journals, conference proceedings, syllabi, and dissertations, this study is intended to address the following questions: What are the major subjects of this field? What is the central theme? What are the connections among these subjects? What are the longitudinal changes in DMKD research? To answer these questions, this research uses a combination of grounded theory and document clustering. The result will represent previous and current DMKD research activities in the form of a framework. The resulting framework should allow people to comprehend the entire domain of DMKD research and help identify areas in need of more research effort.


International Journal of Information Technology and Decision Making | 2005

Classifying Credit Card Accounts for Business Intelligence and Decision Making: A Multiple-Criteria Quadratic Programming Approach

Yong Shi; Yi Peng; Gang Kou; Zhengxin Chen

A major challenge in credit card portfolio management is to classify and predict credit cardholders' behaviors with reliable precision, because cardholders' behaviors are rather dynamic in nature. This is crucial for creditors because it allows them to take proactive actions and minimize charge-off and bankruptcy losses. Although the methods used in the area of credit portfolio management have improved significantly, the demand for alternative and sophisticated analytical tools is still strong. The objective of this paper is to propose a multiple criteria quadratic programming (MCQP) method to classify credit card accounts for business intelligence and decision making. MCQP is intended to predict credit cardholders' behaviors from a nonlinear perspective, which is justifiable because both the objective functions and the constraints in credit card account classification may be nonlinear. Using a real-life credit card dataset from a major US bank, the MCQP method is compared with popular and similar classification methods: linear discriminant analysis, decision tree, multiple criteria linear programming, support vector machine, and neural network. The results indicate that MCQP is a promising business intelligence method in credit card portfolio management.


Optimization Methods & Software | 2003

Multiple criteria linear programming approach to data mining: Models, algorithm designs and software development

Gang Kou; Xiantao Liu; Yi Peng; Yong Shi; Morgan Wise; Weixuan Xu

It is well known that data mining has been implemented by statistical regression, decision tree induction, neural networks, rough sets, fuzzy sets, and other techniques. This paper promotes a multiple criteria linear programming (MCLP) approach to data mining based on linear discriminant analysis. It first describes the fundamental connections between MCLP and data mining, including several general models of MCLP approaches. Given the general models, it focuses on an architecture for designing MCLP-data mining algorithms in terms of a real-life business intelligence process. This architecture consists of finding MCLP solutions, preparing mining scores, and interpreting the knowledge patterns. Second, the paper elaborates on the software development of the MCLP-data mining algorithms. Based on pseudocode, two versions of the software (for the SAS and Linux platforms) are discussed. Finally, a software performance analysis over business and experimental databases is reported to show its mining and prediction power. As part of the performance analysis, a series of data testing comparisons between the MCLP and decision tree induction approaches is demonstrated. These findings suggest that the MCLP-data mining techniques have great potential for discovering knowledge patterns from a large-scale real-life database or data warehouse.
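
For readers unfamiliar with MCLP, the two-group model is often written in the form below. This is a sketch of the formulation as it commonly appears in the MCLP literature, not text quoted from this abstract; the symbols $A_i$ (attribute vector of record $i$), $X$ (attribute weights), $b$ (boundary value), $\alpha_i$ (overlap of a misclassified record with the boundary), $\beta_i$ (distance of a correctly classified record from the boundary), and the group names $G_1$, $G_2$ are notation assumed here for illustration.

```latex
\begin{align*}
\min \sum_i \alpha_i \quad &\text{and} \quad \max \sum_i \beta_i \\
\text{s.t.}\quad A_i X &= b + \alpha_i - \beta_i, \quad A_i \in G_1, \\
A_i X &= b - \alpha_i + \beta_i, \quad A_i \in G_2, \\
\alpha_i &\ge 0, \quad \beta_i \ge 0 .
\end{align*}
```

The two objectives conflict, so in practice they are typically combined (for example by weighting or by a compromise solution) before a linear programming solver is applied; which combination the paper uses is not stated in the abstract.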


Annals of Operations Research | 2005

Discovering Credit Cardholders' Behavior by Multiple Criteria Linear Programming

Gang Kou; Yi Peng; Yong Shi; Morgan Wise; Weixuan Xu

In credit card portfolio management, predicting the cardholder’s spending behavior is key to reducing the risk of bankruptcy. Given a set of attributes for major aspects of credit cardholders and predefined classes for spending behaviors, this paper proposes a classification model that uses multiple criteria linear programming to discover behavior patterns of credit cardholders. It presents a general classification model that can theoretically handle any number of classes. Then, it focuses on a typical case where the cardholders’ behaviors are predefined as four classes. A dataset from a major US bank is used to demonstrate the applicability of the proposed method.


CASDMKM'04: Proceedings of the 2004 Chinese Academy of Sciences Conference on Data Mining and Knowledge Management | 2004

A multiple-criteria quadratic programming approach to network intrusion detection

Gang Kou; Yi Peng; Yong Shi; Zhengxin Chen; Xiaojun Chen

The early and reliable detection and deterrence of malicious attacks, from both external and internal sources, is a crucial issue for today’s e-business. There are various methods available today for intrusion detection; however, every method has its limitations, and new approaches should still be explored. The objectives of this study are twofold: to discuss the formulation of the Multiple Criteria Quadratic Programming (MCQP) approach, and to investigate the applicability of this quadratic classification method to the intrusion detection problem. A successful demonstration of MCQP in intrusion detection can add another option to the network security toolbox. The classification results are examined by cross-validation and improved by an ensemble method. The results demonstrate that MCQP performs well and is stable. Furthermore, the outcome of MCQP can be improved by the ensemble method.


International Conference on Computational Science | 2005

Improving clustering analysis for credit card accounts classification

Yi Peng; Gang Kou; Yong Shi; Zhengxin Chen

In credit card portfolio management, predicting the cardholders’ behavior is key to reducing the charge-off risk of credit card issuers. The most commonly used methods for predicting credit card defaulters are credit scoring models. Most of these credit scoring models use supervised classification methods. Although these methods have made considerable progress in bankruptcy prediction, they are unsuitable for data records without predefined class labels. Therefore, it is worthwhile to investigate the applicability of unsupervised learning methods in credit card account classification. The objectives of this paper are: (1) to explore an unsupervised learning method, cluster analysis, for credit card account classification, and (2) to improve clustering classification results using ensemble and supervised learning methods. In particular, a general-purpose clustering toolkit, CLUTO, from the University of Minnesota, was used to classify a real-life credit card dataset, and two supervised classification methods, decision tree and multiple-criteria linear programming (MCLP), were used to improve the clustering results. The classification results indicate that clustering can be used either as a stand-alone classification method or as a preprocessing step for supervised classification methods.


International Conference on Computational Science | 2004

Cross-Validation and Ensemble Analyses on Multiple-Criteria Linear Programming Classification for Credit Cardholder Behavior

Yi Peng; Gang Kou; Zhengxin Chen; Yong Shi

In credit card portfolio management, predicting the cardholders’ behavior is key to reducing the charge-off risk of credit card issuers. As a promising data mining approach, multiple criteria linear programming (MCLP) has been successfully applied to classify credit cardholders’ behavior into two or more groups for business intelligence. The objective of this paper is to study the stability of MCLP in classifying credit cardholders’ behavior by using cross-validation and ensemble techniques. An overview of the two-group MCLP model formulation and a description of the dataset used in this paper are introduced first. Then cross-validation and ensemble methods are tested in turn. As the results demonstrate, the classification rates of the cross-validation and ensemble methods are close to the rates of using MCLP alone. In other words, MCLP is a relatively stable method for classifying credit cardholders’ behavior.
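
The stability check described in this abstract can be illustrated with a minimal sketch of k-fold cross-validation followed by a majority-vote ensemble. Everything here is an assumption made for illustration: the function names are hypothetical, the folds are interleaved, majority voting is only one possible ensemble rule, and the base learner is a simple nearest-centroid classifier standing in for MCLP, which the abstract does not specify in enough detail to reproduce.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Record = Tuple[Sequence[float], int]  # (feature vector, class label)
Classifier = Callable[[Sequence[float]], int]


def train_nearest_centroid(records: List[Record]) -> Classifier:
    """Stand-in base learner (NOT MCLP): predict the label of the nearest class mean."""
    sums: Dict[int, List[float]] = {}
    counts: Counter = Counter()
    for features, label in records:
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] += 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(x: Sequence[float]) -> int:
        # Squared Euclidean distance to each class centroid.
        return min(
            centroids,
            key=lambda lab: sum((a - c) ** 2 for a, c in zip(x, centroids[lab])),
        )

    return predict


def cross_validate(
    records: List[Record],
    k: int,
    train: Callable[[List[Record]], Classifier] = train_nearest_centroid,
) -> Tuple[float, List[Classifier]]:
    """Return (mean held-out accuracy, the k trained classifiers). Assumes k <= len(records)."""
    n = len(records)
    models: List[Classifier] = []
    accuracies: List[float] = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # interleaved fold: fold, fold+k, ...
        train_set = [r for i, r in enumerate(records) if i not in test_idx]
        test_set = [records[i] for i in sorted(test_idx)]
        model = train(train_set)
        hits = sum(model(x) == y for x, y in test_set)
        accuracies.append(hits / len(test_set))
        models.append(model)
    return sum(accuracies) / k, models


def majority_vote(models: List[Classifier], x: Sequence[float]) -> int:
    """Ensemble prediction: the label chosen by most of the fold models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```

Comparing the mean held-out accuracy with the accuracy of a single model trained on all records gives a rough stability indication in the spirit of the abstract. Using an odd number of folds avoids tied votes in the two-group case.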


International Conference on Conceptual Structures | 2007

Application of Classification Methods to Individual Disability Income Insurance Fraud Detection

Yi Peng; Gang Kou; Alan Sabatka; Jeff Matza; Zhengxin Chen; Deepak Khazanchi; Yong Shi

As the number of electronic insurance claims increases each year, it is difficult to detect insurance fraud in a timely manner by manual methods alone. The objective of this study is to use classification modeling techniques to identify suspicious policies to assist manual inspections. The predictive models can label high-risk policies, help investigators focus on suspicious records, and accelerate the claim-handling process. The study uses health insurance data with some known suspicious and normal policies. These known policies are used to train the predictive models. Missing values and irrelevant variables are removed before building the predictive models. Three predictive models, Naive Bayes (NB), decision tree, and Multiple Criteria Linear Programming (MCLP), are trained using the claim data. The experimental study shows that NB outperformed decision tree and MCLP in terms of classification accuracy.


International Conference on Service Systems and Service Management | 2006

Application of Clustering Methods to Health Insurance Fraud Detection

Yi Peng; Gang Kou; Alan Sabatka; Zhengxin Chen; Deepak Khazanchi; Yong Shi

Health insurance fraud detection is an important and challenging task. Traditionally, insurance companies use human inspections and heuristic rules to detect fraud. As the size of databases increases, the traditional approaches may miss a large portion of fraud for two main reasons. First, it is impossible to detect all health care fraud by manual inspection over large databases. Second, new types of health care fraud emerge constantly, and SQL operations based on heuristic rules cannot identify these newly emerging fraud schemes. Such a situation demands more sophisticated analytical methods and techniques that are capable of detecting fraudulent activities in large databases. The goal of this paper is to understand and detect suspicious health care fraud in large databases using clustering techniques. Specifically, this paper applies two clustering methods, SAS EM and CLUTO, to a large real-life health insurance dataset and compares the performance of these two methods.


International Conference on Service Systems and Service Management | 2006

Recent trends in Data Mining (DM): Document Clustering of DM Publications

Yi Peng; Gang Kou; Zhengxin Chen; Yong Shi

Data mining (DM) brings together knowledge and theories from several fields, including databases, machine learning, optimization, statistics, and data visualization, and has been applied to various real-life applications. A large number of data mining articles have been published. The goal of this study is to establish an overview of past and current data mining research activities from the titles and abstracts of more than 1400 textual documents collected from premier data mining journals and conference proceedings. Specifically, this study applied document clustering approaches to determine which subjects had been studied over the last several years and which subjects are currently popular, and to describe the longitudinal changes in data mining publications.

Collaboration


Yi Peng's collaborators.

Top Co-Authors

Yong Shi

Chinese Academy of Sciences

Gang Kou

University of Nebraska Omaha

Zhengxin Chen

University of Nebraska Omaha

Weixuan Xu

Chinese Academy of Sciences

Deepak Khazanchi

University of Nebraska Omaha

Morgan Wise

National Australia Bank

Xiantao Liu

American Petroleum Institute

Jia Wan

Chinese Academy of Sciences

Xinyang Zhang

Chinese Academy of Sciences

Yajun Guo

Second Military Medical University
