Publication


Featured research published by Bowei Xi.


International World Wide Web Conferences | 2004

A smart hill-climbing algorithm for application server configuration

Bowei Xi; Zhen Liu; Mukund Raghavachari; Cathy H. Xia; Li Zhang

The overwhelming success of the Web as a mechanism for facilitating information retrieval and for conducting business transactions has led to an increase in the deployment of complex enterprise applications. These applications typically run on Web Application Servers, which assume the burden of managing many tasks, such as concurrency, memory management, database access, etc., required by these applications. The performance of an Application Server depends heavily on appropriate configuration. Configuration is a difficult and error-prone task due to the large number of configuration parameters and complex interactions between them. We formulate the problem of finding an optimal configuration for a given application as a black-box optimization problem. We propose a smart hill-climbing algorithm using ideas of importance sampling and Latin Hypercube Sampling (LHS). The algorithm is efficient in both searching and random sampling. It consists of estimating a local function, and then, hill-climbing in the steepest descent direction. The algorithm also learns from past searches and restarts in a smart and selective fashion using the idea of importance sampling. We have carried out extensive experiments with an on-line brokerage application running in a WebSphere environment. Empirical results demonstrate that our algorithm is more efficient than and superior to traditional heuristic methods.
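As a rough illustration of the sampling idea named in this abstract, the sketch below pairs a basic Latin Hypercube sampler with a very simplified local search. It is not the authors' algorithm: the local function estimation and the importance-sampling restarts are omitted, and the names `latin_hypercube`, `hill_climb`, `shrink`, and `rounds` are hypothetical choices made for this example.

```python
import random

def latin_hypercube(n_samples, bounds, rng):
    """Draw n_samples points so that, in every dimension, the range is cut
    into n_samples equal strata and each stratum is used exactly once."""
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        # One uniform draw inside each stratum, then shuffle the stratum order.
        column = [low + (k + rng.random()) * width for k in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

def hill_climb(cost, bounds, n_samples=10, shrink=0.5, rounds=20, seed=0):
    """Simplified search (an assumption, not the published method): sample a
    neighborhood with LHS, move to the best point if it improves the cost,
    otherwise shrink the neighborhood."""
    rng = random.Random(seed)
    best = min(latin_hypercube(n_samples, bounds, rng), key=cost)
    radius = [(high - low) / 2 for low, high in bounds]
    for _ in range(rounds):
        # Neighborhood around the current best point, clipped to the bounds.
        local = [(max(low, b - r), min(high, b + r))
                 for (low, high), b, r in zip(bounds, best, radius)]
        candidate = min(latin_hypercube(n_samples, local, rng), key=cost)
        if cost(candidate) < cost(best):
            best = candidate
        else:
            radius = [r * shrink for r in radius]
    return best
```

In a configuration-tuning setting, `cost` would wrap a measured quantity such as response time for a given parameter vector, which is why the sketch treats it as a black box and never asks for gradients.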


Analytica Chimica Acta | 2011

Principal component directed partial least squares analysis for combining nuclear magnetic resonance and mass spectrometry data in metabolomics: Application to the detection of breast cancer

Haiwei Gu; Zhengzheng Pan; Bowei Xi; Vincent Asiago; Brian Musselman; Daniel Raftery

Nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS) are the two most commonly used analytical tools in metabolomics, and their complementary nature makes the combination particularly attractive. A combined analytical approach can improve the potential for providing reliable methods to detect metabolic profile alterations in biofluids or tissues caused by disease, toxicity, etc. In this paper, 1H NMR spectroscopy and direct analysis in real time (DART)-MS were used for the metabolomics analysis of serum samples from breast cancer patients and healthy controls. Principal component analysis (PCA) of the NMR data showed that the first principal component (PC1) scores could be used to separate cancer from normal samples. However, no such obvious clustering could be observed in the PCA score plot of DART-MS data, even though DART-MS can provide a rich and informative metabolic profile. Using a modified multivariate statistical approach, the DART-MS data were then reevaluated by orthogonal signal correction (OSC) pretreated partial least squares (PLS), in which the Y matrix in the regression was set to the PC1 score values from the NMR data analysis. This approach, and a similar one using the first latent variable from PLS-DA of the NMR data, resulted in a significant improvement of the separation between the disease samples and normal samples, and a metabolic profile related to breast cancer could be extracted from DART-MS. The new approach allows the disease classification to be expressed on a continuum as opposed to a binary scale and thus better represents the disease and healthy classifications. An improved metabolic profile obtained by combining MS and NMR by this approach may be useful to achieve more accurate disease detection and gain more insight regarding disease mechanisms and biology.


Archive | 2006

Network tomography: A review and recent developments

Earl Lawrence; George Michailidis; Vijayan N. Nair; Bowei Xi

The modeling and analysis of computer communications networks give rise to a variety of interesting statistical problems. This paper focuses on network tomography, a term used to characterize two classes of large-scale inverse problems. The first deals with passive tomography where aggregate data are collected at the individual router/node level and the goal is to recover path-level information. The main problem of interest here is the estimation of the origin-destination traffic matrix. The second, referred to as active tomography, deals with reconstructing link-level information from end-to-end path-level measurements obtained by actively probing the network. The primary application in this case is estimation of quality-of-service parameters such as loss rates and delay distributions. The paper provides a review of the statistical issues and developments in network tomography with an emphasis on active tomography. An application to Internet telephony is used to illustrate the results.


NMR in Biomedicine | 2009

1H NMR metabolomics study of age profiling in children

Haiwei Gu; Zhengzheng Pan; Bowei Xi; Bryan E. Hainline; Narasimhamurthy Shanaiah; Vincent Asiago; G. A. Nagana Gowda; Daniel Raftery

Metabolic profiling of urine provides a fingerprint of personalized endogenous metabolite markers that correlate to a number of factors such as gender, disease, diet, toxicity, medication, and age. It is important to study these factors individually, if possible, to unravel their unique contributions. In this study, age‐related metabolic changes in children aged 12 years and below were analyzed by 1H NMR spectroscopy of urine. The effect of age on the urinary metabolite profile was observed as a distinct age‐dependent clustering even from the unsupervised principal component analysis. Further analysis, using partial least squares with orthogonal signal correction regression with respect to age, resulted in the identification of an age‐related metabolic profile. Metabolites that correlated with age included creatinine, creatine, glycine, betaine/TMAO, citrate, succinate, and acetone. Although creatinine increased with age, all the other metabolites decreased. These results may be potentially useful in assessing the biological age (as opposed to chronological) of young humans as well as in providing a deeper understanding of the confounding factors in the application of metabolomics.


Journal of the American Statistical Association | 2006

Estimating network loss rates using active tomography

Bowei Xi; George Michailidis; Vijayan N. Nair

Active network tomography refers to an interesting class of large-scale inverse problems that arise in estimating the quality of service parameters of computer and communications networks. This article focuses on estimation of loss rates of the internal links of a network using end-to-end measurements of nodes located on the periphery. A class of flexible experiments for actively probing the network is introduced, and conditions under which all of the link-level information is estimable are obtained. Maximum likelihood estimation using the EM algorithm, the structure of the algorithm, and the properties of the maximum likelihood estimators are investigated. This includes simulation studies using the ns (network simulator) to obtain realistic network traffic. The optimal design of probing experiments is also studied. Finally, application of the results to network monitoring is briefly illustrated.
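To make the inverse problem concrete, here is a minimal sketch for the smallest possible case: a two-receiver tree in which one shared link feeds two leaf links. It assumes multicast-style probes and independent losses on each link, and it uses a closed-form moment estimator rather than the EM-based maximum likelihood procedure described in the abstract. The function name `estimate_loss_rates` and the input layout are hypothetical.

```python
def estimate_loss_rates(outcomes):
    """Estimate link loss rates on a two-receiver tree.

    outcomes: list of (received_at_1, received_at_2) booleans, one pair per
    probe sent from the root. Returns (shared_link, leaf_1, leaf_2) loss rates.

    With pass rates a0 (shared), a1 and a2 (leaves) and independent losses:
        P(1 receives)    = a0 * a1
        P(2 receives)    = a0 * a2
        P(both receive)  = a0 * a1 * a2
    so a0 = p1 * p2 / p12, a1 = p12 / p2, a2 = p12 / p1.
    """
    n = len(outcomes)
    p1 = sum(1 for r1, _ in outcomes if r1) / n
    p2 = sum(1 for _, r2 in outcomes if r2) / n
    p12 = sum(1 for r1, r2 in outcomes if r1 and r2) / n
    if p12 == 0:
        raise ValueError("no probe reached both receivers; rates are not estimable")
    shared_pass = p1 * p2 / p12
    return (1 - shared_pass, 1 - p12 / p2, 1 - p12 / p1)
```

The key point the sketch shows is that only end-to-end receipt records are used, yet the loss on the interior shared link can still be separated from the losses on the two leaf links because the receivers' outcomes are correlated through that shared link.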


Methods of Molecular Biology | 2014

Statistical Analysis and Modeling of Mass Spectrometry-Based Metabolomics Data

Bowei Xi; Haiwei Gu; Hamid Baniasadi; Daniel Raftery

Multivariate statistical techniques are used extensively in metabolomics studies, ranging from biomarker selection to model building and validation. Two model-independent variable selection techniques, principal component analysis and two-sample t-tests, are discussed in this chapter, as well as classification and regression models and model-related variable selection techniques, including partial least squares, logistic regression, support vector machine, and random forest. Model evaluation and validation methods, such as leave-one-out cross-validation, Monte Carlo cross-validation, and receiver operating characteristic analysis, are introduced with an emphasis on avoiding over-fitting the data. The advantages and the limitations of the statistical techniques are also discussed in this chapter.
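One of the validation methods listed here, Monte Carlo cross-validation, can be sketched in a few lines. The example below is only an illustration under stated assumptions: the helper names, the 30% test fraction, and the toy one-dimensional nearest-mean classifier are all hypothetical and are not taken from the chapter.

```python
import random

def monte_carlo_cv(samples, labels, fit, predict,
                   n_splits=100, test_fraction=0.3, seed=0):
    """Average held-out accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    n_test = max(1, int(round(test_fraction * len(samples))))
    accuracies = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        # The model is fitted on the training part only, so the test
        # accuracy is not inflated by samples it has already seen.
        model = fit([samples[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / n_test)
    return sum(accuracies) / n_splits

def fit_nearest_mean(xs, ys):
    """Toy one-dimensional classifier: store the mean value of each class."""
    means = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(values) / len(values)
    return means

def predict_nearest_mean(means, x):
    """Assign x to the class whose stored mean is closest."""
    return min(means, key=lambda label: abs(x - means[label]))
```

Any variable selection step would need to run inside `fit`, on the training part of each split, for the averaged accuracy to remain an honest guard against over-fitting.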


Data Mining and Knowledge Discovery | 2011

Classifier evaluation and attribute selection against active adversaries

Murat Kantarcioglu; Bowei Xi; Chris Clifton

Many data mining applications, such as spam filtering and intrusion detection, are faced with active adversaries. In all these applications, the future data sets and the training data set are no longer from the same population, due to the transformations employed by the adversaries. Hence a main assumption for the existing classification techniques no longer holds and initially successful classifiers degrade easily. This becomes a game between the adversary and the data miner: The adversary modifies its strategy to avoid being detected by the current classifier; the data miner then updates its classifier based on the new threats. In this paper, we investigate the possibility of an equilibrium in this seemingly never-ending game, where neither party has an incentive to change. Modifying the classifier causes too many false positives with too little increase in true positives; changes by the adversary decrease the utility of the false negative items that are not detected. We develop a game-theoretic framework where equilibrium behavior of adversarial classification applications can be analyzed, and provide solutions for finding an equilibrium point. A classifier’s equilibrium performance indicates its eventual success or failure. The data miner could then select attributes based on their equilibrium performance, and construct an effective classifier. A case study on online lending data demonstrates how to apply the proposed game-theoretic framework to a real application.


Knowledge Discovery and Data Mining | 2012

Adversarial support vector machine learning

Yan Zhou; Murat Kantarcioglu; Bhavani M. Thuraisingham; Bowei Xi

Many learning tasks such as spam filtering and credit card fraud detection face an active adversary that tries to avoid detection. For learning problems that deal with an active adversary, it is important to model the adversary's attack strategy and develop robust learning models to mitigate the attack. These are the two objectives of this paper. We consider two attack models: a free-range attack model that permits arbitrary data corruption and a restrained attack model that anticipates more realistic attacks that a reasonable adversary would devise under penalties. We then develop optimal SVM learning strategies against the two attack models. The learning algorithms minimize the hinge loss while assuming the adversary is modifying data to maximize the loss. Experiments are performed on both artificial and real data sets. We demonstrate that optimal solutions may be overly pessimistic when the actual attacks are much weaker than expected. More importantly, we demonstrate that it is possible to develop a much more resilient SVM learning model while making loose assumptions on the data corruption models. When derived under the restrained attack model, our optimal SVM learning strategy provides more robust overall performance under a wide range of attack parameters.
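The phrase "minimize the hinge loss while assuming the adversary is modifying data to maximize the loss" can be illustrated for a linear classifier. The abstract does not give the attack formulations, so the sketch below assumes a simple box-shaped attack: each feature of a malicious point (label +1) may be shifted toward the feature's minimum or maximum, scaled by a hypothetical severity parameter `c_f` in [0, 1]. The function name and signature are likewise assumptions for this example.

```python
def worst_case_hinge(w, b, x, y, x_min, x_max, c_f):
    """Hinge loss of a linear classifier after a worst-case box attack.

    Assumed attack (not taken from the paper): for a malicious point
    (y == +1) each feature j may move by delta_j with
        c_f * (x_min[j] - x[j]) <= delta_j <= c_f * (x_max[j] - x[j]).
    Benign points (y == -1) are left unchanged.
    Assumes x_min[j] <= x[j] <= x_max[j] for every feature.
    """
    if y != 1:
        margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
        return max(0.0, 1.0 - margin)
    score = b
    for wj, xj, lo, hi in zip(w, x, x_min, x_max):
        # The adversary lowers the score: push the feature down when its
        # weight is positive, up when the weight is negative or zero.
        delta = c_f * (lo - xj) if wj > 0 else c_f * (hi - xj)
        score += wj * (xj + delta)
    return max(0.0, 1.0 - score)
```

Because the inner maximization has this closed form for a linear model under a box constraint, a learner could in principle minimize the sum of such worst-case losses directly; with `c_f = 0` the function reduces to the ordinary hinge loss.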


Electronic Journal of Statistics | 2010

Statistical analysis and modeling of Internet VoIP traffic for network engineering

Bowei Xi; Hui Chen; William S. Cleveland; Thomas Telkamp

Network engineering for quality-of-service (QoS) of Internet voice communication (VoIP) can benefit substantially from simulation study of the VoIP packet traffic on a network of routers. This requires accurate statistical models for the packet arrivals to the network from a gateway. The arrival point process is the superposition, or statistical multiplexing, of the arrival processes of packets of individual calls. The packets of each call form a transient point process with on-intervals of transmission and off-intervals of silence. This article presents the development and validation of models for the multiplexed process based on statistical analyses of VoIP traffic from the Global Crossing (GBLX) international network: 48 hr of VoIP arrival times and headers of 1.315 billion packets from 332,018 calls. Statistical models and methods involve point processes and their superposition; time series autocorrelations and power spectra; long-range dependence; random effects and hierarchical modeling; bootstrapping; robust estimation; modeling independence and identical distribution; and visualization methods for model building. The result is two models validated by the analyses that can generate accurate synthetic multiplexed packet traffic. One is a semiempirical model: empirical data are a part of the model. The second is a mathematical model: the components are parametric statistical models. This is the first comprehensive modeling of VoIP traffic based on data from a service provider carrying a full range of VoIP applications. The models can be used for simulation of any IP network architecture, wireline or wireless, because the modeling is for the IP-inbound traffic to an IP network. This is achieved because the GBLX data, collected on an IP link, have properties very close to those they had when they entered the GBLX network.

Supported by NSF DMS-0904548. Supported by ARO MURI Award W911NF-08-1-0238. AMS 2000 subject classifications: Primary 62P30, 62-07; secondary 62M10, 62M15.
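The superposition of per-call on/off packet processes described in the abstract above can be sketched as a toy simulation. Everything numeric here is an assumption for illustration only: the 20 ms packet gap, the exponentially distributed on and off durations, and the fixed call duration are not the validated models from the article, and the function names are hypothetical.

```python
import heapq
import random

def call_arrivals(start, rng, packet_gap=0.02, mean_on=1.0,
                  mean_off=1.5, duration=10.0):
    """Packet times (seconds) for one call: evenly spaced packets during
    on-intervals of transmission, nothing during off-intervals of silence."""
    times = []
    t = start
    end = start + duration
    while t < end:
        # Assumed exponential length for the on-interval, capped at call end.
        on_end = min(end, t + rng.expovariate(1.0 / mean_on))
        while t < on_end:
            times.append(t)
            t += packet_gap
        # Assumed exponential length for the silence that follows.
        t = on_end + rng.expovariate(1.0 / mean_off)
    return times

def multiplex(calls):
    """Superpose the per-call arrival streams into one sorted stream."""
    return list(heapq.merge(*calls))
```

A short usage example: `multiplex([call_arrivals(s, random.Random(i)) for i, s in enumerate([0.0, 0.5, 1.0])])` yields the merged arrival times that a router would see from three overlapping calls.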


International Conference on Tools with Artificial Intelligence | 2007

Conceptual Clustering Categorical Data with Uncertainty

Yuni Xia; Bowei Xi

Many real datasets have uncertain categorical attribute values that are only approximately measured or imputed. Uncertainty in categorical data is commonplace in many applications, including biological annotation, medical diagnosis and automatic error detection. In such domains, the exact value of an attribute is often unknown, but may be estimated from a number of reasonable alternatives. Current conceptual clustering algorithms do not provide a convenient means for handling this type of uncertainty. In this paper we extend the traditional conceptual clustering algorithm to explicitly handle uncertainty in data values. We propose a new total utility (TU) index for measuring the quality of the clustering, and we develop improved algorithms for efficiently clustering uncertain categorical data, based on the COBWEB conceptual clustering algorithm. Experimental results using real datasets demonstrate how these algorithms and the new TU measure can effectively improve the performance of clustering through the use of internal probabilistic information.

Collaboration


Top Co-Authors

Murat Kantarcioglu
University of Texas at Dallas

Haiwei Gu
University of Washington

Daniel Raftery
University of Washington

Yan Zhou
University of Texas at Dallas