Publication


Featured research published by Xiao-Bai Li.


European Journal of Operational Research | 1999

A multiple criteria approach to data envelopment analysis

Xiao-Bai Li; Gary R. Reeves

In this paper, we present a Multiple Criteria Data Envelopment Analysis (MCDEA) model that can be used to improve the discriminating power of DEA methods and to yield more reasonable input and output weights without a priori information about the weights. In the proposed model, several different efficiency measures, including classical DEA efficiency, are defined under the same constraints. Each measure serves as a criterion to be optimized. Efficiencies are then evaluated under the framework of multiple objective linear programming (MOLP). The method is illustrated through three examples in which data sets are taken from previous research on DEA's discriminating power and weight restriction.


IEEE Transactions on Automatic Control | 2005

General model-set design methods for multiple-model approach

X.R. Li; Zhanlue Zhao; Xiao-Bai Li

The multiple-model approach provides state-of-the-art solutions to many problems involving estimation, filtering, control, and/or modeling. One of the most important problems in applying the multiple-model approach is the design of the model set used in a multiple-model algorithm. To our knowledge, however, it has never been addressed systematically in the literature. This paper deals with this challenging topic in a general setting. General problems of model-set design are considered. A concept of a random model is introduced. That is, we propose modeling both the models used in a multiple-model (MM) algorithm and the true model as random variables. Three classes of general methods for optimal design of model sets are proposed: minimizing distribution mismatch, minimizing modal distance, and moment matching. Theoretical results that address many of the associated issues are presented. Examples are given that demonstrate how some of these theoretical results can be used, as well as their effectiveness. Many of the general results presented in this paper are also useful for performance evaluation of MM algorithms.


INFORMS Journal on Computing | 2001

A Dynamic Programming Based Pruning Method for Decision Trees

Xiao-Bai Li; James R. Sweigart; James T. C. Teng; Joan M. Donohue; Lori A. Thombs

This paper concerns a decision-tree pruning method, a key issue in the development of decision trees. We propose a new method that applies the classical optimization technique, dynamic programming, to a decision-tree pruning procedure. We show that the proposed method generates a sequence of pruned trees that are optimal with respect to tree size. The dynamic-programming-based pruning (DPP) algorithm is then compared with cost-complexity pruning (CCP) in an experimental study. The results of our study indicate that DPP performs better than CCP in terms of classification accuracy.


Decision Support Systems | 2009

Identity disclosure protection: A data reconstruction approach for privacy-preserving data mining

Dan Zhu; Xiao-Bai Li; Shuning Wu

Identity disclosure is one of the most serious privacy concerns in today's information age. A well-known method for protecting against identity disclosure is k-anonymity. A dataset provides k-anonymity protection if the information for each individual in the dataset cannot be distinguished from that of at least k-1 other individuals whose information also appears in the dataset. There is a flaw in k-anonymity that would still allow an intruder to discern the confidential information of individuals in the anonymized data. To overcome this problem, we propose a data reconstruction approach to achieve k-anonymity protection in predictive data mining. In this approach, the potentially identifying attributes are first masked using aggregation (for numeric data) and swapping (for nominal data). A genetic algorithm technique is then applied to the masked data to find a good subset of it. This subset is then replicated to form the released dataset that satisfies the k-anonymity constraint.
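The k-anonymity definition in the abstract above can be sketched as a simple check. This is only an illustration of the definition, not the paper's reconstruction method; the function name `is_k_anonymous`, the attribute names, and the toy records are all hypothetical.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    that occurs in `records` is shared by at least k records."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical toy data: "age_band" and "zip3" are the potentially
# identifying attributes; "diagnosis" is the confidential attribute.
records = [
    {"age_band": "30-39", "zip3": "018", "diagnosis": "flu"},
    {"age_band": "30-39", "zip3": "018", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip3": "021", "diagnosis": "flu"},
    {"age_band": "40-49", "zip3": "021", "diagnosis": "flu"},
]

two_anonymous = is_k_anonymous(records, ["age_band", "zip3"], 2)
three_anonymous = is_k_anonymous(records, ["age_band", "zip3"], 3)
```

In this toy data the second group satisfies 2-anonymity, yet both of its records carry the same diagnosis. That is one commonly cited weakness of plain k-anonymity; the abstract does not say whether it is the specific flaw the authors address.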


European Journal of Operational Research | 2008

Adaptive data reduction for large-scale transaction data

Xiao-Bai Li; Varghese S. Jacob

Data reduction is an important issue in the field of data mining. The goal of data reduction techniques is to extract a subset of data from a massive dataset while maintaining the properties and characteristics of the original data in the reduced set. This allows an otherwise difficult or impossible data mining task to be carried out efficiently and effectively. This paper describes a new method for selecting a subset of data that closely represents the original data in terms of its joint and univariate distributions. A pair of distance criteria, motivated by the χ²-statistic, is used for measuring the goodness-of-fit between the distributions of the reduced and full datasets. Under these criteria, the data reduction problem can be formulated as a bi-objective quadratic program. A genetic algorithm technique is used in the search/optimization process. Experiments conducted on several real-world data sets demonstrate the effectiveness of the proposed method.
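To make the idea of a χ²-motivated goodness-of-fit criterion concrete, here is a minimal sketch for a single categorical attribute. It uses the textbook Pearson form (observed minus expected, squared, divided by expected) and is an assumption for illustration; the paper's actual pair of criteria is not given in the abstract. The function name `chi2_distance` and the toy data are hypothetical.

```python
from collections import Counter

def chi2_distance(full, reduced):
    """Pearson-style chi-square distance between the category counts
    observed in `reduced` and the counts expected from `full`.
    Assumes `reduced` is drawn from `full`, so it contains no
    category that is absent from `full`."""
    n_full, n_red = len(full), len(reduced)
    full_counts = Counter(full)
    red_counts = Counter(reduced)
    distance = 0.0
    for category, count in full_counts.items():
        expected = n_red * count / n_full  # expected count in the subset
        observed = red_counts.get(category, 0)
        distance += (observed - expected) ** 2 / expected
    return distance

# Hypothetical data: a 100-record attribute and two 10-record subsets.
full = ["a"] * 60 + ["b"] * 30 + ["c"] * 10
good_subset = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
poor_subset = ["a"] * 2 + ["b"] * 2 + ["c"] * 6

good_score = chi2_distance(full, good_subset)
poor_score = chi2_distance(full, poor_subset)
```

A subset that mirrors the full distribution scores zero, and a skewed one scores higher, so a search procedure such as a genetic algorithm could use a distance of this kind as a fitness value to minimize.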


Information Systems Research | 2011

Protecting Privacy Against Record Linkage Disclosure: A Bounded Swapping Approach for Numeric Data

Xiao-Bai Li; Sumit Sarkar

Record linkage techniques have been widely used in areas such as antiterrorism, crime analysis, epidemiologic research, and database marketing. On the other hand, such techniques are also being increasingly used for identity matching that leads to the disclosure of private information. These techniques can be used to effectively reidentify records even in deidentified data. Consequently, the use of such techniques can lead to individual privacy being severely eroded. Our study addresses this important issue and provides a solution to resolve the conflict between privacy protection and data utility. We propose a data-masking method for protecting private information against record linkage disclosure that preserves the statistical properties of the data for legitimate analysis. Our method recursively partitions a data set into smaller subsets such that data records within each subset are more homogeneous after each partition. The partition is made orthogonal to the maximum variance dimension represented by the first principal component in each partitioned set. The attribute values of a record in a subset are then masked using a double-bounded swapping method. The proposed method, which we call multivariate swapping trees, is nonparametric in nature and does not require any assumptions about statistical distributions of the original data. Experiments conducted on real-world data sets demonstrate that the proposed approach significantly outperforms existing methods in terms of both preventing identity disclosure and preserving data quality.


Systems, Man, and Cybernetics | 2012

Evaluation of Estimation Algorithms: Credibility Tests

X.R. Li; Zhanlue Zhao; Xiao-Bai Li

Assessments of estimation performance are often available. For example, many statistical estimators and filters provide assessments of the first two moments of their own estimation error (i.e., mean-square error [MSE] or error covariance matrix and bias). Are these assessments credible in that they reflect the true situation? The paper addresses this important yet little-studied topic, referred to as the credibility of the assessments (or the estimators that make the assessments). We define the concept of credibility and formulate three classes of commonly encountered credibility-testing problems: MSE alone, bias alone, and MSE and bias jointly. Taking advantage of results in multivariate statistical analysis, we present several statistical tests for the credibility problems formulated, and we analyze and discuss in detail the pros and cons of the proposed tests, contrasting them with the existing test. How these tests can be used and how they perform are illustrated by representative numerical examples. For the existing MSE credibility test, we explain its underlying principle and analyze, discuss, and demonstrate its drawbacks and limitations. We also propose a test for comparing different credibility assessments.


Journal of Data and Information Quality | 2009

A Bayesian Approach for Estimating and Replacing Missing Categorical Data

Xiao-Bai Li

We propose a new approach for estimating and replacing missing categorical data. With this approach, the posterior probabilities of a missing attribute value belonging to a certain category are estimated using the simple Bayes method. Two alternative methods for replacing the missing value are proposed: The first replaces the missing value with the value having the estimated maximum probability; the second uses a value that is selected with probability proportional to the estimated posterior distribution. The effectiveness of the proposed approach is evaluated based on some important data quality measures for data warehousing and data mining. The results of the experimental study demonstrate the effectiveness of the proposed approach.
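The two replacement rules described in the abstract above can be sketched as follows. This is a minimal illustration assuming a standard naive Bayes estimate with Laplace smoothing (the smoothing is an assumption, not something the abstract states); the function names and the toy records are hypothetical.

```python
import random
from collections import Counter

def posterior(complete_rows, target, observed):
    """Naive Bayes posterior over the categories of `target`, given the
    observed attribute values of a record whose `target` is missing."""
    class_counts = Counter(row[target] for row in complete_rows)
    total = len(complete_rows)
    scores = {}
    for category, n_c in class_counts.items():
        score = n_c / total  # prior probability of the category
        for attr, value in observed.items():
            n_values = len({row[attr] for row in complete_rows})
            matches = sum(1 for row in complete_rows
                          if row[target] == category and row[attr] == value)
            # Laplace smoothing: an assumption of this sketch.
            score *= (matches + 1) / (n_c + n_values)
        scores[category] = score
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()}

def replace_max(post):
    """First rule: pick the category with the largest posterior."""
    return max(post, key=post.get)

def replace_sampled(post, rng):
    """Second rule: draw a category with probability proportional
    to its estimated posterior."""
    categories = list(post)
    return rng.choices(categories, weights=[post[c] for c in categories])[0]

# Hypothetical complete records; "risk" is missing in the new record.
rows = [
    {"smoker": "yes", "risk": "high"},
    {"smoker": "yes", "risk": "high"},
    {"smoker": "no", "risk": "high"},
    {"smoker": "no", "risk": "low"},
    {"smoker": "no", "risk": "low"},
    {"smoker": "no", "risk": "low"},
]
post = posterior(rows, "risk", {"smoker": "yes"})
by_max = replace_max(post)
by_sample = replace_sampled(post, random.Random(0))
```

The maximum-probability rule always returns the same value for the same evidence, while the sampled rule spreads replacements across categories in proportion to the posterior, which tends to keep the attribute's overall distribution closer to the original.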


Management Science | 2013

Class-Restricted Clustering and Microperturbation for Data Privacy

Xiao-Bai Li; Sumit Sarkar

The extensive use of information technologies by organizations to collect and share personal data has raised strong privacy concerns. To respond to the public's demand for data privacy, a class of clustering-based data masking techniques is increasingly being used for privacy-preserving data sharing and analytics. Traditional clustering-based approaches for masking numeric attributes, while addressing re-identification risks, typically do not consider the disclosure risk of categorical confidential attributes. We propose a new approach to deal with this problem. The proposed method clusters data such that the data points within a group are similar in the non-confidential attribute values, whereas the confidential attribute values within a group are well distributed. To accomplish this, the clustering method, which is based on a minimum spanning tree (MST) technique, uses two risk-utility tradeoff measures in the growing and pruning stages of the MST technique, respectively. As part of our approach, we also propose a novel cluster-level micro-perturbation method for masking data that overcomes a common problem of traditional clustering-based methods for data masking, which is their inability to preserve important statistical properties such as the variance of attributes and the covariance across attributes. We show that the mean vector and the covariance matrix of the masked data generated using the micro-perturbation method are unbiased estimates of the original mean vector and covariance matrix. An experimental study on several real-world datasets demonstrates the effectiveness of the proposed approach.


Business Information Systems | 2013

Developing privacy solutions for sharing and analysing healthcare data

Luvai Motiwalla; Xiao-Bai Li

The extensive use of electronic health data has increased privacy concerns. While most healthcare organizations are conscientious in protecting the data in their databases, very few take enough precautions to protect data that is shared with third-party organizations. Recently, regulators have tightened the laws to enforce privacy protection. The goal of this research is to explore the application of data masking solutions for protecting patient privacy when data is shared with external organizations for research, analysis, and other similar purposes. Specifically, this research project develops a system that protects data without removing sensitive attributes. Our application allows high-quality data analysis with the masked data. Dataset-level properties and statistics remain approximately the same after data masking; however, individual record-level values are altered to prevent privacy disclosure. A pilot evaluation study on large real-world healthcare data shows the effectiveness of our solution in privacy protection.

Collaboration

Top Co-Authors

Sumit Sarkar, University of Texas at Austin

Luvai Motiwalla, University of Massachusetts Lowell

Xiaoping Liu, University of Massachusetts Lowell

James R. Sweigart, University of South Carolina

X.R. Li, University of New Orleans

Dan Zhu, Iowa State University

Hua Zheng, University of Massachusetts Medical School

James T. C. Teng, University of Texas at Arlington

Joan M. Donohue, University of South Carolina