Xiaoqian Jiang

University of California, San Diego

Publication

Featured research published by Xiaoqian Jiang.

ACM Symposium on Applied Computing | 2006

Privacy-preserving SVM using nonlinear kernels on horizontally partitioned data

Hwanjo Yu; Xiaoqian Jiang; Jaideep Vaidya

Traditional Data Mining and Knowledge Discovery algorithms assume free access to data, either at a centralized location or in federated form. Increasingly, privacy and security concerns restrict this access, thus derailing data mining projects. What we need is distributed knowledge discovery that is sensitive to this problem. The key is to obtain valid results, while providing guarantees on the non-disclosure of data. Support vector machine classification is one of the most widely used classification methodologies in data mining and machine learning. It is based on solid theoretical foundations and has wide practical application. This paper proposes a privacy-preserving solution for support vector machine (SVM) classification, PP-SVM for short. Our solution constructs the global SVM classification model from the data distributed at multiple parties, without disclosing the data of each party to others. We assume that data is horizontally partitioned -- each party collects the same features of information for different data objects. We quantify the security and efficiency of the proposed method, and highlight future challenges.

Knowledge Discovery and Data Mining | 2006

Privacy-Preserving SVM classification on vertically partitioned data

Hwanjo Yu; Jaideep Vaidya; Xiaoqian Jiang

Classical data mining algorithms implicitly assume complete access to all data, either in centralized or federated form. However, privacy and security concerns often prevent sharing of data, thus derailing data mining projects. Recently, there has been growing focus on finding solutions to this problem. Several algorithms have been proposed that do distributed knowledge discovery, while providing guarantees on the non-disclosure of data. Classification is an important data mining problem applicable in many diverse domains. The goal of classification is to build a model which can predict an attribute (binary attribute in this work) based on the rest of attributes. We propose an efficient and secure privacy-preserving algorithm for support vector machine (SVM) classification over vertically partitioned data.
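The abstract does not describe the protocol itself, so the following is only a minimal sketch of one property that makes SVM training on vertically partitioned data tractable: for a linear kernel, the Gram matrix over the joined records equals the sum of the Gram matrices each party computes on its own feature columns. The party names and toy data are hypothetical, and the sketch deliberately omits any secure-sum or encryption step that a real privacy-preserving protocol would need.

```python
def gram(rows):
    """Linear-kernel Gram matrix: entry [i][j] is the dot product of rows i and j."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]

# Hypothetical toy data: the same three records, with features split by party.
party_a = [[1, 2], [0, 1], [3, 1]]  # party A holds features 1 and 2
party_b = [[4], [2], [0]]           # party B holds feature 3

# Each party computes a Gram matrix on its own columns only.
k_a = gram(party_a)
k_b = gram(party_b)

# Element-wise sum of the local matrices (done in the clear here; an assumption
# of this sketch, not a secure computation).
k_sum = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(k_a, k_b)]

# Reference: Gram matrix computed on the fully joined records.
joined = [a + b for a, b in zip(party_a, party_b)]
k_full = gram(joined)

print(k_sum == k_full)
```

Because an SVM solver needs only the kernel matrix and the labels, this additive structure means no party has to reveal its raw feature values to build a shared model, provided the summation itself is protected.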

Knowledge and Information Systems | 2008

Privacy-preserving SVM classification

Jaideep Vaidya; Hwanjo Yu; Xiaoqian Jiang

Traditional Data Mining and Knowledge Discovery algorithms assume free access to data, either at a centralized location or in federated form. Increasingly, privacy and security concerns restrict this access, thus derailing data mining projects. What is required is distributed knowledge discovery that is sensitive to this problem. The key is to obtain valid results, while providing guarantees on the nondisclosure of data. Support vector machine classification is one of the most widely used classification methodologies in data mining and machine learning. It is based on solid theoretical foundations and has wide practical application. This paper proposes a privacy-preserving solution for support vector machine (SVM) classification, PP-SVM for short. Our solution constructs the global SVM classification model from data distributed at multiple parties, without disclosing the data of each party to others. Solutions are sketched out for data that is vertically, horizontally, or even arbitrarily partitioned. We quantify the security and efficiency of the proposed method, and highlight future challenges.

Journal of the American Medical Informatics Association | 2012

iDASH: integrating data for analysis, anonymization, and sharing

Lucila Ohno-Machado; Vineet Bafna; Aziz A. Boxwala; Brian E. Chapman; Wendy W. Chapman; Kamalika Chaudhuri; Michele E. Day; Claudiu Farcas; Nathaniel D. Heintzman; Xiaoqian Jiang; Hyeoneui Kim; Jihoon Kim; Michael E. Matheny; Frederic S. Resnic; Staal A. Vinterbo

iDASH (integrating data for analysis, anonymization, and sharing) is the newest National Center for Biomedical Computing funded by the NIH. It focuses on algorithms and tools for sharing data in a privacy-preserving manner. Foundational privacy technology research performed within iDASH is coupled with innovative engineering for collaborative tool development and data-sharing capabilities in a private Health Insurance Portability and Accountability Act (HIPAA)-certified cloud. Driving Biological Projects, which span different biological levels (from molecules to individuals to populations) and focus on various health conditions, help guide research and development within this Center. Furthermore, training and dissemination efforts connect the Center with its stakeholders and educate data owners and data consumers on how to share and use clinical and biological data. Through these various mechanisms, iDASH implements its goal of providing biomedical and behavioral researchers with access to data, software, and a high-performance computing environment, thus enabling them to generate and test new hypotheses.

Briefings in Bioinformatics | 2017

Deep learning for healthcare: review, opportunities and challenges.

Riccardo Miotto; Fei Wang; Shuang Wang; Xiaoqian Jiang; Joel T. Dudley

Gaining knowledge and actionable insights from complex, high-dimensional and heterogeneous biomedical data remains a key challenge in transforming health care. Various types of data have been emerging in modern biomedical research, including electronic health records, imaging, -omics, sensor data and text, which are complex, heterogeneous, poorly annotated and generally unstructured. Traditional data mining and statistical learning approaches typically need to first perform feature engineering to obtain effective and more robust features from those data, and then build prediction or clustering models on top of them. There are lots of challenges on both steps in a scenario of complicated data and lacking of sufficient domain knowledge. The latest advances in deep learning technologies provide new effective paradigms to obtain end-to-end learning models from complex data. In this article, we review the recent literature on applying deep learning technologies to advance the health care domain. Based on the analyzed work, we suggest that deep learning approaches could be the vehicle for translating big biomedical data into improved human health. However, we also note limitations and needs for improved methods development and applications, especially in terms of ease-of-understanding for domain experts and citizen scientists. We discuss such challenges and suggest developing holistic and meaningful interpretable architectures to bridge deep learning models and human interpretability.

Journal of the American Medical Informatics Association | 2012

Grid Binary LOgistic REgression (GLORE): building shared models without sharing data

Yuan Wu; Xiaoqian Jiang; Jihoon Kim; Lucila Ohno-Machado

Objective: The classification of complex or rare patterns in clinical and genomic data requires the availability of a large, labeled patient set. While methods that operate on large, centralized data sources have been extensively used, little attention has been paid to understanding whether models such as binary logistic regression (LR) can be developed in a distributed manner, allowing researchers to share models without necessarily sharing patient data. Material and methods: Instead of bringing data to a central repository for computation, we bring computation to the data. The Grid Binary LOgistic REgression (GLORE) model integrates decomposable partial elements or non-privacy sensitive prediction values to obtain model coefficients, the variance-covariance matrix, the goodness-of-fit test statistic, and the area under the receiver operating characteristic (ROC) curve. Results: We conducted experiments on both simulated and clinically relevant data, and compared the computational costs of GLORE with those of a traditional LR model estimated using the combined data. We showed that our results are the same as those of LR to a precision of 10^-15. In addition, GLORE is computationally efficient. Limitation: In GLORE, the calculation of coefficient gradients must be synchronized at different sites, which involves some effort to ensure the integrity of communication. Ensuring that the predictors have the same format and meaning across the data sets is necessary. Conclusion: The results suggest that GLORE performs as well as LR and allows data to remain protected at their original sites.
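The idea of combining "decomposable partial elements" can be illustrated with a small sketch. It assumes a Newton-Raphson update in which each site sends only its local gradient and information matrix to a coordinator; the function names, the toy data, and the two-site split are hypothetical, and communication and security are not modeled.

```python
import math

def local_stats(X, y, beta):
    """Gradient and information matrix computed at one site from its own rows."""
    d = len(beta)
    grad = [0.0] * d
    info = [[0.0] * d for _ in range(d)]
    for row, label in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
        for i in range(d):
            grad[i] += (label - p) * row[i]
            for j in range(d):
                info[i][j] += p * (1.0 - p) * row[i] * row[j]
    return grad, info

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def fit(sites, dim, iters=25):
    """Coordinator loop: sum the per-site aggregates, then take a Newton step."""
    beta = [0.0] * dim
    for _ in range(iters):
        grad = [0.0] * dim
        info = [[0.0] * dim for _ in range(dim)]
        for X, y in sites:
            g, h = local_stats(X, y, beta)
            for i in range(dim):
                grad[i] += g[i]
                for j in range(dim):
                    info[i][j] += h[i][j]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta

# Hypothetical toy data: an intercept column plus one predictor.
X_all = [[1.0, float(x)] for x in range(6)]
y_all = [0, 0, 1, 0, 1, 1]
sites = [(X_all[:3], y_all[:3]), (X_all[3:], y_all[3:])]

beta_distributed = fit(sites, 2)
beta_pooled = fit([(X_all, y_all)], 2)
print(beta_distributed, beta_pooled)
```

Since the gradient and information matrix are sums over records, adding the per-site aggregates gives the same quantities as the pooled data, so the two fits should agree up to floating-point rounding.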

Journal of the American Medical Informatics Association | 2012

Calibrating predictive model estimates to support personalized medicine

Xiaoqian Jiang; Melanie Osl; Jihoon Kim; Lucila Ohno-Machado

Objective: Predictive models that generate individualized estimates for medically relevant outcomes are playing increasing roles in clinical care and translational research. However, current methods for calibrating these estimates lose valuable information. Our goal is to develop a new calibration method that conserves as much information as possible and compares favorably to existing methods in terms of important performance measures: discrimination and calibration. Material and methods: We propose an adaptive technique that utilizes individualized confidence intervals (CIs) to calibrate predictions. We evaluate this new method, adaptive calibration of predictions (ACP), in artificial and real-world medical classification problems, in terms of areas under the ROC curves, the Hosmer-Lemeshow goodness-of-fit test, mean squared error, and computational complexity. Results: ACP compared favorably to other calibration methods such as binning, Platt scaling, and isotonic regression. In several experiments, binning, isotonic regression, and Platt scaling failed to improve the calibration of a logistic regression model, whereas ACP consistently improved the calibration while maintaining the same discrimination or even improving it in some experiments. In addition, the ACP algorithm is not computationally expensive. Limitations: The calculation of CIs for individual predictions may be cumbersome for certain predictive models. ACP is not completely parameter-free: the length of the CI employed may affect its results. Conclusions: ACP can generate estimates that may be more suitable for individualized predictions than estimates that are calibrated using existing methods. Further studies are necessary to explore the limitations of ACP.
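For orientation, here is a minimal sketch of binning, one of the baseline calibration methods named in the abstract. It is not ACP: the abstract gives no algorithmic detail for ACP, so none is attempted here. The sketch assumes equal-width bins on [0, 1], and the function names and toy data are hypothetical.

```python
def fit_binning(scores, labels, n_bins=2):
    """Return, per equal-width bin, the observed fraction of positive labels."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 falls in the last bin
        sums[b] += y
        counts[b] += 1
    return [sums[b] / counts[b] if counts[b] else None for b in range(n_bins)]

def calibrate(score, table):
    """Map a raw score to its bin's observed rate (raw score if the bin was empty)."""
    n_bins = len(table)
    b = min(int(score * n_bins), n_bins - 1)
    return table[b] if table[b] is not None else score

# Hypothetical raw model scores and true outcomes.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

table = fit_binning(scores, labels, n_bins=2)
print(table, calibrate(0.55, table))
```

Every score in a bin is replaced by the same value, which is the kind of information loss the abstract attributes to existing calibration methods.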

International Conference on Image Processing | 2007

New Directions in Contact Free Hand Recognition

Xiaoqian Jiang; Wanhong Xu; Latanya Sweeney; Yiheng Li; Ralph Gross; Daniel Yurovsky

The ability to quickly compute hand geometry measurements from a freely posed hand offers advantages to biometric identification systems. While hand geometry systems are not new, typical measurements of lengths and widths of fingers and palms require rigid placement of the hand against pegs. Slight deviations in hand position, finger stretch or pressure can yield different measurements. This paper offers novel approaches to computing hand geometry measurements from frontal views of freely posed hands. These approaches offer advantages in hygiene, comfort and reliability. Our algorithms segment the hand from a known background under spot lights and locate feature points along the fingers and wrists. Given a database of 54 hand images, with three different images of the same hand of each subject, our approach uniquely identified a previously unseen hand with an overall accuracy of 92%.

Bioinformatics | 2015

HEALER: homomorphic computation of ExAct Logistic rEgRession for secure rare disease variants analysis in GWAS

Shuang Wang; Yuchen Zhang; Wenrui Dai; Kristin E. Lauter; Miran Kim; Yuzhe Tang; Hongkai Xiong; Xiaoqian Jiang

MOTIVATION: Genome-wide association studies (GWAS) have been widely used in discovering the association between genotypes and phenotypes. Human genome data contain valuable but highly sensitive information. Unprotected disclosure of such information might put individuals' privacy at risk. It is important to protect human genome data. Exact logistic regression is a bias-reduction method based on a penalized likelihood to discover rare variants that are associated with disease susceptibility. We propose the HEALER framework to facilitate secure rare variants analysis with a small sample size. RESULTS: We target at the algorithm design aiming at reducing the computational and storage costs to learn a homomorphic exact logistic regression model (i.e. evaluate P-values of coefficients), where the circuit depth is proportional to the logarithmic scale of data size. We evaluate the algorithm performance using rare Kawasaki Disease datasets. AVAILABILITY AND IMPLEMENTATION: Download HEALER at http://research.ucsd-dbmi.org/HEALER/ CONTACT: [email protected] SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Journal of the American Medical Informatics Association | 2013

Privacy-preserving heterogeneous health data sharing

Noman Mohammed; Xiaoqian Jiang; Rui Chen; Benjamin C. M. Fung; Lucila Ohno-Machado

OBJECTIVE: Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among existing privacy models, ε-differential privacy provides one of the strongest privacy guarantees and makes no assumptions about an adversary's background knowledge. All existing solutions that ensure ε-differential privacy handle the problem of disclosing relational and set-valued data in a privacy-preserving manner separately. In this paper, we propose an algorithm that considers both relational and set-valued data in differentially private disclosure of healthcare data. METHODS: The proposed approach makes a simple yet fundamental switch in differentially private algorithm design: instead of listing all possible records (i.e., a contingency table) for noise addition, records are generalized before noise addition. The algorithm first generalizes the raw data in a probabilistic way, and then adds noise to guarantee ε-differential privacy. RESULTS: We showed that the disclosed data could be used effectively to build a decision tree induction classifier. Experimental results demonstrated that the proposed algorithm is scalable and performs better than existing solutions for classification analysis. LIMITATION: The resulting utility may degrade when the output domain size is very large, making it potentially inappropriate to generate synthetic data for large health databases. CONCLUSIONS: Unlike existing techniques, the proposed algorithm allows the disclosure of health data containing both relational and set-valued data in a differentially private manner, and can retain essential information for discriminative analysis.
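The "generalize first, then add noise" ordering can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it assumes a fixed, deterministic generalization of ages into 20-year ranges (the abstract describes a probabilistic generalization step), a counting query with sensitivity 1, and Laplace noise with scale 1/ε. All function names and data are hypothetical.

```python
import random
from collections import Counter

def generalize_age(age, width=20):
    """Replace an exact age with a coarse range label such as '20-39'."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

def laplace(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_counts(ages, epsilon, rng, domain):
    """Count records per generalized group, then add Laplace(1/epsilon) noise.

    Noise is added for every group in the domain, including empty ones, so the
    set of released groups does not depend on the data.
    """
    counts = Counter(generalize_age(a) for a in ages)
    return {g: counts.get(g, 0) + laplace(1.0 / epsilon, rng) for g in domain}

# Hypothetical raw ages and the fixed output domain of generalized groups.
ages = [23, 37, 41, 45, 62, 68, 71]
domain = ["20-39", "40-59", "60-79"]

rng = random.Random(0)  # seeded only to make the sketch repeatable
noisy = noisy_counts(ages, epsilon=1.0, rng=rng, domain=domain)
print(noisy)
```

Generalizing first keeps the number of noisy cells small (three here, rather than one per distinct age), which is the motivation the abstract gives for not adding noise to a full contingency table.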
