Publication


Featured research published by Xingye Qiao.


Nature Medicine | 2015

The oral and gut microbiomes are perturbed in rheumatoid arthritis and partly normalized after treatment

Dongya Zhang; Huijue Jia; Qiang Feng; Donghui Wang; Di Liang; Xiang-ni Wu; Junhua Li; Longqing Tang; Yin Li; Zhou Lan; Bing Chen; Yanli Li; Huanzi Zhong; Hailiang Xie; Zhuye Jie; Weineng Chen; Shanmei Tang; Xiaoqiang Xu; Xiaokai Wang; Xianghang Cai; Sheng Liu; Yan Xia; Jiyang Li; Xingye Qiao; Jumana Y. Al-Aama; Hua Chen; Wang L; Qingjun Wu; Fengchun Zhang; Wenjie Zheng

We carried out metagenomic shotgun sequencing and a metagenome-wide association study (MGWAS) of fecal, dental and salivary samples from a cohort of individuals with rheumatoid arthritis (RA) and healthy controls. Concordance was observed between the gut and oral microbiomes, suggesting overlap in the abundance and function of species at different body sites. Dysbiosis was detected in the gut and oral microbiomes of RA patients, but it was partially resolved after RA treatment. Alterations in the gut, dental or saliva microbiome distinguished individuals with RA from healthy controls, were correlated with clinical measures and could be used to stratify individuals on the basis of their response to therapy. In particular, Haemophilus spp. were depleted in individuals with RA at all three sites and negatively correlated with levels of serum autoantibodies, whereas Lactobacillus salivarius was over-represented in individuals with RA at all three sites and was present in increased amounts in cases of very active RA. Functionally, the redox environment, transport and metabolism of iron, sulfur, zinc and arginine were altered in the microbiota of individuals with RA. Molecular mimicry of human antigens related to RA was also detectable. Our results establish specific alterations in the gut and oral microbiomes in individuals with RA and suggest potential ways of using microbiome composition for prognosis and diagnosis.


Journal of the American Statistical Association | 2010

Weighted Distance Weighted Discrimination and Its Asymptotic Properties

Xingye Qiao; Hao Helen Zhang; Yufeng Liu; Michael J. Todd; J. S. Marron

While Distance Weighted Discrimination (DWD) is an appealing approach to classification in high dimensions, it was designed for balanced datasets. In the case of unequal costs, biased sampling, or unbalanced data, there are major improvements available, using appropriately weighted versions of DWD (wDWD). A major contribution of this paper is the development of optimal weighting schemes for various nonstandard classification problems. In addition, we discuss several alternative criteria and propose an adaptive weighting scheme (awDWD) and demonstrate its advantages over nonadaptive weighting schemes under some situations. The second major contribution is a theoretical study of weighted DWD. Both high-dimensional low sample-size asymptotics and Fisher consistency of DWD are studied. The performance of weighted DWD is evaluated using simulated examples and two real data examples. The theoretical results are also confirmed by simulations.


Biometrics | 2009

Adaptive weighted learning for unbalanced multicategory classification

Xingye Qiao; Yufeng Liu

In multicategory classification, standard techniques typically treat all classes equally. This treatment can be problematic when the dataset is unbalanced in the sense that certain classes have very small class proportions compared to others. The minority classes may be ignored or discounted during the classification process due to their small proportions. This can be a serious problem if those minority classes are important. In this article, we study the problem of unbalanced classification and propose new criteria to measure classification accuracy. Moreover, we propose three different weighted learning procedures, two one-step weighted procedures, as well as one adaptive weighted procedure. We demonstrate the advantages of the new procedures, using multicategory support vector machines, through simulated and real datasets. Our results indicate that the proposed methodology can handle unbalanced classification problems effectively.
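The effect of class weighting on an unbalanced problem can be sketched in a few lines. The example below is only an illustration, not the criteria or procedures proposed in the paper: it assumes a simple inverse-proportion weight n / (K * n_k) per class, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def inverse_proportion_weights(labels):
    """Weight each class by n / (K * n_k), so rare classes count more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def weighted_error(y_true, y_pred, weights):
    """Weighted misclassification rate: each error is charged its class weight."""
    total = sum(weights[y] for y in y_true)
    wrong = sum(weights[y] for y, p in zip(y_true, y_pred) if y != p)
    return wrong / total


# Toy unbalanced labels (made up): 90% class "a", 8% "b", 2% "c".
labels = ["a"] * 90 + ["b"] * 8 + ["c"] * 2
weights = inverse_proportion_weights(labels)

# A classifier that ignores the minority classes entirely.
always_a = ["a"] * len(labels)
plain_error = sum(y != p for y, p in zip(labels, always_a)) / len(labels)
balanced_error = weighted_error(labels, always_a, weights)
```

Under the unweighted criterion the always-"a" rule looks accurate, while the weighted criterion charges it for every minority class it misses.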


Procedia Computer Science | 2011

Partial Least Squares (PLS) Applied to Medical Bioinformatics

Walker H. Land; William S. Ford; Jin-Woo Park; Ravi Mathur; Nathan Hotchkiss; John J. Heine; Steven Eschrich; Xingye Qiao; Timothy J. Yeatman

PLS first creates uncorrelated latent variables that are linear combinations of the original input vectors Xi, with the weights of each linear combination proportional to the covariance. Second, a least squares regression is performed on the subset of extracted latent variables, which leads to a lower, biased variance on the transformed data. This process leads to a lower-variance estimate of the regression coefficients than the Ordinary Least Squares regression approach. Classical Principal Component Analysis (PCA), linear PLS and kernel ridge regression (KRR) are well-known shrinkage estimators designed to deal with multicollinearity, which can be a serious problem. That is, multicollinearity can dramatically influence the effectiveness of a regression model by changing the values and signs of estimated regression coefficients given different but similar data samples, thereby leading to a regression model that represents training data reasonably well but generalizes poorly to validation and test data. We explain how to address these problems, and then perform a hypothesis-driven preliminary PLS research study and sensitivity analysis on a microarray colon cancer data set, without a combinatorial analysis, since PLS will eliminate the unnecessary variables. Research studies as well as preliminary results are described in the results section.
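The two steps described in the abstract (covariance-proportional weights, then least squares on the latent variable) can be sketched for a single component. This is a minimal, dependency-free illustration of the first component of standard single-response PLS, assuming the covariance in question is between each centered predictor and the response. The function names and toy data are hypothetical and are not taken from the paper.

```python
def center(column):
    """Subtract the mean from every entry of a list of numbers."""
    mean = sum(column) / len(column)
    return [v - mean for v in column]


def first_pls_component(X, y):
    """Return (w, t, b): weight vector, latent scores, and slope of y on t."""
    n, p = len(X), len(X[0])
    cols = [center([row[j] for row in X]) for j in range(p)]
    yc = center(y)
    # Step 1: weights proportional to the covariance of each predictor with y.
    w = [sum(c * v for c, v in zip(col, yc)) for col in cols]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Latent variable: a linear combination of the centered predictors.
    t = [sum(w[j] * cols[j][i] for j in range(p)) for i in range(n)]
    # Step 2: least squares regression of centered y on the latent variable.
    b = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return w, t, b


# Two nearly collinear predictors (toy data, made up for illustration).
X = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]
y = [1.0, 2.0, 3.0, 4.0, 5.0]
w, t, b = first_pls_component(X, y)
```

Because the two predictors are regressed through one shared latent variable, the fit involves a single slope rather than two unstable, nearly interchangeable coefficients.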


Procedia Computer Science | 2012

PNN/GRNN Ensemble Processor Design for Early Screening of Breast Cancer

Walker H. Land; Xinpei Ma; Erin Barnes; Xingye Qiao; John J. Heine; Timothy Masters; Jin-Woo Park

Breast cancer screening refers to the screening of asymptomatic, generally healthy women for breast cancer, to identify those who should receive a follow-up check. Early screening can detect non-invasive ductal carcinoma in situ (called “pre breast cancer”), which almost never forms a lump and is generally undetectable except by mammography. This paper describes the design and preliminary evaluation of a PNN/GRNN ensemble pre-screener, in the context of a possible pre-screening protocol, which may, if required, include other data. The results show that the ensemble technique provides almost a 20% AUC increase over the average standalone PNN and almost 10% over the best-performing PNN.


Biometrics | 2014

A statistical approach to set classification by feature selection with applications to classification of histopathology images

Sungkyu Jung; Xingye Qiao

Set classification problems arise when classification tasks are based on sets of observations as opposed to individual observations. In set classification, a classification rule is trained with N sets of observations, where each set is labeled with class information, and the prediction of a class label is performed also with a set of observations. Data sets for set classification appear, for example, in diagnostics of disease based on multiple cell nucleus images from a single tissue. Relevant statistical models for set classification are introduced, which motivate a set classification framework based on context-free feature extraction. By understanding a set of observations as an empirical distribution, we employ a data-driven method to choose those features which contain information on location and major variation. In particular, the method of principal component analysis is used to extract the features of major variation. Multidimensional scaling is used to represent features as vector-valued points on which conventional classifiers can be applied. The proposed set classification approaches achieve better classification results than competing methods in a number of simulated data examples. The benefits of our method are demonstrated in an analysis of histopathology images of cell nuclei related to liver cancer.


Procedia Computer Science | 2012

GRNN Ensemble Classifier for Lung Cancer Prognosis Using Only Demographic and TNM features

J. David Schaffer; Jin-Woo Park; Erin Barnes; Qiyi Lu; Xingye Qiao; Youping Deng; Yan Li; Walker H. Land

Predicting the recurrence of non-small cell lung cancer remains a clinical challenge. The current best practice employs heuristic decisions based on the TNM classification scheme that many believe can be improved upon. Much research has recently been devoted to searching for gene signatures derived from gene expression microarrays for this challenge, but a consensus signature is still elusive. We present an approach to first create a benchmark for recurrence prediction based only upon gender, age and TNM features that uses several learning classifier induction methods and combines them into an ensemble using a recent extension of the general regression neural network. Using this approach on a pooled sample of 422 patients from two previously published studies (Shedden and Raponi), we demonstrate error rates in the low 20% range for both false positives and false negatives. Future work will focus on determining whether gene signatures can be found that improve this performance.
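For readers unfamiliar with the general regression neural network, the sketch below shows the basic GRNN output, a Gaussian-kernel weighted average of training targets, applied to the outputs of several base classifiers. It is not the extension used in the paper. The smoothing parameter `sigma`, the function name, and all scores are assumptions made up for illustration.

```python
import math


def grnn_predict(x, train_X, train_y, sigma=1.0):
    """Basic GRNN output: Gaussian-kernel weighted average of training targets."""
    weights = []
    for xi in train_X:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    return sum(w * yi for w, yi in zip(weights, train_y)) / total


# Hypothetical base-classifier scores for four patients (made-up numbers);
# each row holds the outputs of three base classifiers.
train_X = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.3], [0.8, 0.9, 0.7], [0.9, 0.8, 0.9]]
train_y = [0.0, 0.0, 1.0, 1.0]  # 1.0 = recurrence

low = grnn_predict([0.15, 0.15, 0.2], train_X, train_y, sigma=0.2)
high = grnn_predict([0.85, 0.85, 0.8], train_X, train_y, sigma=0.2)
```

A new patient whose base-classifier scores resemble the non-recurrence rows receives an ensemble score near 0, and one resembling the recurrence rows receives a score near 1.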


Procedia Computer Science | 2011

A complex adaptive system using statistical learning theory as an inline preprocess for clinical survival analysis

Dan Margolis; Walker H. Land; Ronald Gottlieb; Xingye Qiao

New advances in medicine have led to a disparity between the existing information about patients and the ability of clinicians to utilize it. Lack of training and incompatibility with clinical techniques have made the use of the complex adaptive systems approach difficult. To avoid this, we used statistical learning theory as an inline preprocess between existing data collection methods and clinical analysis of data. Clinicians would be able to use this system without any changes to their techniques, while improving accuracy. We used data from CT scans of patients with metastatic carcinoma to predict prognosis. Specifically, we used the standard for evaluating response to treatment, RECIST, and new qualitative and quantitative features. An Evolutionary Programming trained Support Vector Machine (EP-SVM) was used to preprocess the data for two traditional survival analysis techniques: Cox Proportional Hazard Models and Kaplan Meier curves. This was compared to Logistic Regression (LR) and to using cutoff points. Analyses were also done to compare different inputs and different radiologists. The EP-SVM outperformed both LR and the cutoff method significantly and allowed us to both intelligently combine data from multiple sources and identify the most predictive features without necessitating changes in clinical methods.


Statistical Analysis and Data Mining | 2018

Sparse Fisher's linear discriminant analysis for partially labeled data

Qiyi Lu; Xingye Qiao

Classification is an important tool with many useful applications. Among the many classification methods, Fisher's Linear Discriminant Analysis (LDA) is a traditional model-based approach which makes use of the covariance information. However, in the high-dimensional, low-sample size setting, LDA cannot be directly deployed because the sample covariance is not invertible. While there are modern methods designed to deal with high-dimensional data, they may not fully use the covariance information as LDA does. Hence in some situations, it is still desirable to use a model-based method such as LDA for classification. This article exploits the potential of LDA in more complicated data settings. In many real applications, it is costly to manually place labels on observations; hence often only a small portion of labeled data is available while a large number of observations are left without a label. It is a great challenge to obtain good classification performance through the labeled data alone, especially when the dimension is greater than the size of the labeled data. In order to overcome this issue, we propose a semi-supervised sparse LDA classifier to take advantage of the seemingly useless unlabeled data. They provide additional information which helps to boost the classification performance in some situations. A direct estimation method is used to reconstruct LDA and achieve the sparsity; meanwhile we employ the difference-convex algorithm to handle the non-convex loss function associated with the unlabeled data. Theoretical properties of the proposed classifier are studied. Our simulated examples help to understand when and how the information extracted from the unlabeled data can be useful. A real data example further illustrates the usefulness of the proposed method.


Journal of the American Statistical Association | 2016

Stabilized Nearest Neighbor Classifier and its Statistical Properties

Will Wei Sun; Xingye Qiao; Guang Cheng

The stability of statistical analysis is an important indicator for reproducibility, which is one main principle of the scientific method. It entails that similar statistical conclusions can be reached based on independent samples from the same underlying population. In this article, we introduce a general measure of classification instability (CIS) to quantify the sampling variability of the prediction made by a classification method. Interestingly, the asymptotic CIS of any weighted nearest neighbor classifier turns out to be proportional to the Euclidean norm of its weight vector. Based on this concise form, we propose a stabilized nearest neighbor (SNN) classifier, which distinguishes itself from other nearest neighbor classifiers by taking the stability into consideration. In theory, we prove that SNN attains the minimax optimal convergence rate in risk, and a sharp convergence rate in CIS. The latter rate result is established for general plug-in classifiers under a low-noise condition. Extensive simulated and real examples demonstrate that SNN achieves a considerable improvement in CIS over existing nearest neighbor classifiers, with comparable classification accuracy. We implement the algorithm in a publicly available R package snn. Supplementary materials for this article are available online.
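The stated link between instability and the weight vector's norm can be made concrete for plain k-nearest neighbors, whose weight vector puts 1/k on each of the k nearest points. The sketch below only computes that Euclidean norm, which equals 1/sqrt(k). It does not compute the SNN weights and does not use the snn package, and the function names are hypothetical.

```python
import math


def knn_weights(n, k):
    """Weight vector of plain k-NN: 1/k on the k nearest points, 0 elsewhere."""
    return [1.0 / k] * k + [0.0] * (n - k)


def euclidean_norm(w):
    """Euclidean (L2) norm of a weight vector."""
    return math.sqrt(sum(v * v for v in w))


n = 100  # assumed training sample size for the illustration
norms = {k: euclidean_norm(knn_weights(n, k)) for k in (1, 4, 25, 100)}
```

Taking the abstract's proportionality result as given, the shrinking norm suggests why larger k gives more stable predictions, and why a classifier can trade accuracy against stability through its choice of weights.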

Collaboration


Dive into Xingye Qiao's collaborations.

Top Co-Authors

John J. Heine
University of South Florida

Qiyi Lu
Binghamton University

Yufeng Liu
University of North Carolina at Chapel Hill

J. S. Marron
University of North Carolina at Chapel Hill