Wenrui Dai

University of California, San Diego

Publication

Featured research published by Wenrui Dai.

Bioinformatics | 2015

HEALER: homomorphic computation of ExAct Logistic rEgRession for secure rare disease variants analysis in GWAS

Shuang Wang; Yuchen Zhang; Wenrui Dai; Kristin E. Lauter; Miran Kim; Yuzhe Tang; Hongkai Xiong; Xiaoqian Jiang

MOTIVATION Genome-wide association studies (GWAS) have been widely used in discovering the association between genotypes and phenotypes. Human genome data contain valuable but highly sensitive information. Unprotected disclosure of such information might put individuals' privacy at risk, so it is important to protect human genome data. Exact logistic regression is a bias-reduction method based on a penalized likelihood to discover rare variants that are associated with disease susceptibility. We propose the HEALER framework to facilitate secure rare variants analysis with a small sample size. RESULTS Our algorithm design aims to reduce the computational and storage costs of learning a homomorphic exact logistic regression model (i.e. evaluating P-values of coefficients), where the circuit depth is proportional to the logarithm of the data size. We evaluate the algorithm performance using rare Kawasaki Disease datasets. AVAILABILITY AND IMPLEMENTATION Download HEALER at http://research.ucsd-dbmi.org/HEALER/ CONTACT: [email protected] SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.

BMC Medical Informatics and Decision Making | 2015

FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption

Yuchen Zhang; Wenrui Dai; Xiaoqian Jiang; Hongkai Xiong; Shuang Wang

Background: The increasing availability of genome data motivates massive research studies in personalized treatment and precision medicine. Public cloud services provide a flexible way to mitigate the storage and computation burden in conducting genome-wide association studies (GWAS). However, data privacy is a widespread concern when sharing sensitive information in a cloud environment. Methods: We presented a novel framework (FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption) to fully outsource GWAS (i.e., chi-square statistic computation) using homomorphic encryption. The proposed framework enables secure divisions over encrypted data. We introduced two division protocols (i.e., secure errorless division and secure approximation division) with a trade-off between complexity and accuracy in computing chi-square statistics. Results: The proposed framework was evaluated for the task of chi-square statistic computation with two case-control datasets from the 2015 iDASH genome privacy protection challenge. Experimental results show that the performance of FORESEE can be significantly improved through algorithmic optimization and parallel computation. Remarkably, the secure approximation division provides a significant performance gain without missing any significant SNPs in the chi-square association test using the aforementioned datasets. Conclusions: Unlike many existing HME-based studies, in which final results need to be computed by the data owner due to the lack of a secure division operation, the proposed FORESEE framework supports complete outsourcing to the cloud and outputs the final encrypted chi-square statistics.
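
For orientation, the statistic being outsourced can be sketched in plaintext. The following is a minimal illustration, assuming the common allelic chi-square test on a 2x2 case-control allele-count table; the function name `allelic_chi_square` and the exact form of the statistic are assumptions, not taken from the paper, and nothing here is encrypted. The final division is the step that, per the abstract, needs a secure division protocol when the counts are ciphertexts.

```python
def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table.

    Rows are cases and controls; columns are counts of allele A and allele B.
    Plaintext illustration only; hypothetical helper, not the FORESEE protocol.
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case = case_a + case_b
    row_ctrl = ctrl_a + ctrl_b
    col_a = case_a + ctrl_a
    col_b = case_b + ctrl_b
    denom = row_case * row_ctrl * col_a * col_b
    if denom == 0:
        raise ValueError("a row or column total is zero")
    # Numerator and denominator need only additions and multiplications;
    # the division below is the operation that is hard under encryption.
    diff = case_a * ctrl_b - case_b * ctrl_a
    return n * diff * diff / denom
```

Keeping the numerator and denominator as separate integer expressions mirrors why a dedicated division step is singled out: everything before the last line is a polynomial in the counts.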

BMC Medical Informatics and Decision Making | 2016

Secure Multi-pArty Computation Grid LOgistic REgression (SMAC-GLORE)

Haoyi Shi; Chao Jiang; Wenrui Dai; Xiaoqian Jiang; Yuzhe Tang; Lucila Ohno-Machado; Shuang Wang

Background: In biomedical research, data sharing and information exchange are very important for improving quality of care, accelerating discovery, and promoting the meaningful secondary use of clinical data. A big concern in biomedical data sharing is the protection of patient privacy because inappropriate information leakage can put patient privacy at risk. Methods: In this study, we deployed a grid logistic regression framework based on Secure Multi-party Computation (SMAC-GLORE). Unlike our previous work in GLORE, SMAC-GLORE protects not only patient-level data, but also all the intermediary information exchanged during the model-learning phase. Results: The experimental results demonstrate the feasibility of secure distributed logistic regression across multiple institutions without sharing patient-level data. Conclusions: In this study, we developed a circuit-based SMAC-GLORE framework. The proposed framework provides a practical solution for secure distributed logistic regression model learning.

IEEE Transactions on Image Processing | 2016

Sparse Representation With Spatio-Temporal Online Dictionary Learning for Promising Video Coding

Wenrui Dai; Yangmei Shen; Xin Tang; Junni Zou; Hongkai Xiong; Chang Wen Chen

Classical dictionary learning methods for video coding suffer from high computational complexity and impaired coding efficiency because they disregard the underlying distribution. This paper proposes a spatio-temporal online dictionary learning (STOL) algorithm to speed up the convergence rate of dictionary learning with a guarantee of approximation error. The proposed algorithm incorporates stochastic gradient descent to form a dictionary of pairs of 3D low-frequency and high-frequency spatio-temporal volumes. In each iteration of the learning process, it randomly selects one sample volume and updates the atoms of the dictionary by minimizing the expected cost, rather than optimizing the empirical cost over the complete training data as batch learning methods such as K-SVD do. Since the selected volumes are supposed to be independent identically distributed samples from the underlying distribution, decomposition coefficients attained from the trained dictionary are desirable for sparse representation. Theoretically, it is proved that the proposed STOL could achieve better approximation for sparse representation than K-SVD and maintain both structured sparsity and hierarchical sparsity. It is shown to outperform batch gradient descent methods (K-SVD) in the sense of convergence speed and computational complexity, and its upper bound for prediction error is asymptotically equal to the training error. With lower computational complexity, extensive experiments validate that the STOL-based coding scheme achieves performance improvements over H.264/AVC or High Efficiency Video Coding as well as existing super-resolution-based methods in rate-distortion performance and visual quality.
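
The one-sample-per-iteration idea in the abstract can be illustrated with a generic stochastic-gradient dictionary update. This is a rough sketch, not the STOL algorithm: it assumes plain vectors instead of paired 3D spatio-temporal volumes, unit-norm atoms, and a 1-sparse code (only the best-matching atom is used); `online_update`, `train`, and the learning rate `lr` are hypothetical names and values.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v] if norm > 0 else list(v)


def online_update(dictionary, sample, lr=0.1):
    """One stochastic-gradient step on a single sample (mutates dictionary).

    Returns (atom index, coefficient, residual norm before the update).
    """
    # 1-sparse coding: pick the atom most correlated with the sample.
    scores = [dot(atom, sample) for atom in dictionary]
    k = max(range(len(dictionary)), key=lambda i: abs(scores[i]))
    coef = scores[k]
    residual = [x - coef * d for x, d in zip(sample, dictionary[k])]
    # Gradient of 0.5 * ||x - coef * d||^2 with respect to d is -coef * residual,
    # so the descent step moves the atom along +coef * residual.
    moved = [d + lr * coef * r for d, r in zip(dictionary[k], residual)]
    dictionary[k] = normalize(moved)  # keep atoms at unit norm
    return k, coef, math.sqrt(dot(residual, residual))


def train(dictionary, samples, steps, seed=0):
    """Draw one random sample per step, as in online (non-batch) learning."""
    rng = random.Random(seed)
    for _ in range(steps):
        online_update(dictionary, rng.choice(samples))
    return dictionary
```

Each step touches one sample and one atom, which is the source of the lower per-iteration cost compared with a batch method that re-fits the dictionary against the complete training set.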

IEEE Transactions on Image Processing | 2014

Large Discriminative Structured Set Prediction Modeling With Max-Margin Markov Network for Lossless Image Coding

Wenrui Dai; Hongkai Xiong; Jia Wang; Yuan F. Zheng

Inherent statistical correlation for context-based prediction and structural interdependencies for local coherence are not fully exploited in existing lossless image coding schemes. This paper proposes a novel prediction model where the optimal correlated prediction for a set of pixels is obtained in the sense of the least code length. It not only exploits the spatial statistical correlations for the optimal prediction directly based on 2D contexts, but also formulates the data-driven structural interdependencies to make the prediction error coherent with the underlying probability distribution for coding. Under the joint constraints for local coherence, max-margin Markov networks are incorporated to combine support vector machines structurally to make max-margin estimation for a correlated region. Specifically, it aims to produce multiple predictions in the blocks with the model parameters learned in such a way that the distinction between the actual pixel and all possible estimations is maximized. It is proved that, with the growth of sample size, the prediction error is asymptotically upper bounded by the training error under the decomposable loss function. Incorporated into the lossless image coding framework, the proposed model outperforms most prediction schemes reported.

BMC Medical Genomics | 2017

PRESAGE: PRivacy-preserving gEnetic testing via SoftwAre Guard Extension

Feng Chen; Chenghong Wang; Wenrui Dai; Xiaoqian Jiang; Noman Mohammed; Momin Al Aziz; Nazmus Sadat; Cenk Sahinalp; Kristin E. Lauter; Shuang Wang

Background: Advances in DNA sequencing technologies have prompted a wide range of genomic applications to improve healthcare and facilitate biomedical research. However, privacy and security concerns have emerged as a challenge for utilizing cloud computing to handle sensitive genomic data. Methods: We present one of the first implementations of a Software Guard Extension (SGX) based securely outsourced genetic testing framework, which leverages multiple cryptographic protocols and a minimal perfect hash scheme to enable efficient and secure data storage and computation outsourcing. Results: We compared the performance of the proposed PRESAGE framework with the state-of-the-art homomorphic encryption scheme, as well as the plaintext implementation. The experimental results demonstrated a significant performance improvement over the homomorphic encryption methods and a small computational overhead in comparison to the plaintext implementation. Conclusions: The proposed PRESAGE provides an alternative solution for secure and efficient genomic data outsourcing in an untrusted cloud by using a hybrid framework that combines secure hardware and multiple crypto protocols.

IEEE Transactions on Big Data | 2016

Big Data Privacy in Biomedical Research

Shuang Wang; Luca Bonomi; Wenrui Dai; Feng Chen; Cynthia Cheung; Cinnamon S. Bloss; Samuel Cheng; Xiaoqian Jiang

Biomedical research often involves studying patient data that contain personal information. Inappropriate use of these data might lead to leakage of sensitive information, which can put patient privacy at risk. The problem of preserving patient privacy has received increasing attention in the era of big data. Many privacy methods have been developed to protect against various attack models. This paper reviews relevant topics in the context of biomedical research. We discuss privacy preserving technologies related to (1) record linkage, (2) synthetic data generation, and (3) genomic data privacy. We also discuss the ethical implications of big data privacy in biomedicine and present challenges in future research directions for improving data privacy in biomedical research.

Data Compression Conference | 2014

Gaussian Process Regression Based Prediction for Lossless Image Coding

Wenrui Dai; Hongkai Xiong

LS-based adaptation cannot fully exploit high-dimensional correlations in image signals, as a linear prediction model in the input space of supports is ill-suited to capturing higher-order statistics. This paper proposes Gaussian process regression for prediction in lossless image coding. Incorporating kernel functions, the prediction support is projected into a high-dimensional feature space to fit the anisotropic and nonlinear image statistics. Instead of being directly conditioned on the support, Gaussian process regression is leveraged to make predictions in the feature space. The model parameters are optimized by measuring the similarities based on the training set, which are evaluated by a combined kernel function in the sense of translation and rotation invariance among supports mapped in the feature space. Experimental results show that the proposed predictor outperforms most benchmark predictors reported.

Data Compression Conference | 2013

An Adaptive Difference Distribution-Based Coding with Hierarchical Tree Structure for DNA Sequence Compression

Wenrui Dai; Hongkai Xiong; Xiaoqian Jiang; Lucila Ohno-Machado

Previous reference-based compression methods for DNA sequences do not fully exploit the intrinsic statistics because they consider only approximate matches. In this paper, an adaptive difference distribution-based coding framework is proposed over fragments of nucleotides with a hierarchical tree structure. To keep the distribution of the difference sequence between the reference and target sequences concentrated, the sub-fragment size and matching offset used for prediction adapt to the stepped size structure. Matching against approximate repeats in the reference uses a Hamming-like weighted distance measure function in a local region close to the current fragment, such that the accuracy of matching and the overhead of describing the matching offset can be balanced. A well-designed coding scheme compacts both the difference sequence and the additional parameters, e.g. sub-fragment size and matching offset. Experimental results show that the proposed scheme achieves 150% compression improvement in comparison with the best reference-based compressor GReEn.
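
The local matching step described in this abstract can be sketched as a windowed search scored by a weighted Hamming distance. This is an illustrative sketch under stated assumptions: the per-position `weights`, the linear `offset_penalty` standing in for the cost of describing the matching offset, and the function names are all hypothetical and not taken from the paper.

```python
def weighted_hamming(a, b, weights=None):
    """Sum of per-position weights where the two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if weights is None:
        weights = [1.0] * len(a)  # plain Hamming distance by default
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def best_local_match(reference, fragment, center, radius, offset_penalty=0.0):
    """Search offsets within `radius` of `center` in the reference.

    Returns (offset, cost) for the lowest-cost alignment, or None if the
    fragment does not fit anywhere in the window. The penalty term is an
    assumed stand-in for the overhead of coding a distant offset.
    """
    m = len(fragment)
    lo = max(0, center - radius)
    hi = min(len(reference) - m, center + radius)
    best = None
    for off in range(lo, hi + 1):
        cost = weighted_hamming(reference[off:off + m], fragment)
        cost += offset_penalty * abs(off - center)
        if best is None or cost < best[1]:
            best = (off, cost)
    return best
```

Restricting the search to a window around the expected position keeps offsets small, and a nonzero `offset_penalty` lets a slightly worse match win when it is much cheaper to describe.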

Data Compression Conference | 2016

Compressive Tensor Sampling with Structured Sparsity

Yong Li; Wenrui Dai; Hongkai Xiong

Conventional Compressive Sensing (CS) obscures the intrinsic structures of multidimensional signals with the vectorized representation. Although tensor-based CS methods can preserve the intrinsic multidimensional structures with reduced computational complexity, their sampling efficiency and recovery performance are degraded by the assumption of standard/simple sparsity. This paper proposes a general and adaptive model that incorporates structured sparsity into tensor representation to fit the varying nonstationary statistics of multidimensional signals. To guarantee block sparsity, subspace clustering is adopted to adaptively generate the union of tensor subspaces, with the basis of each tensor subspace learned for optimized representation. For sampled tensors, a stable recovery algorithm is developed to achieve desirable recovery performance using fewer degrees of freedom. Moreover, the proposed model inherits the merit of tensor-based CS in alleviating the computational and storage burden in sampling and recovery. Experimental results demonstrate that the proposed model can achieve better recovery performance in video sampling in comparison to the state-of-the-art tensor-based method.
