The Annals of Applied Statistics | 2019

A penalized regression model for the joint estimation of eQTL associations and gene network structure


Abstract


Background: A critical task in the study of biological systems is understanding how gene expression is regulated within the cell. This problem is typically divided into multiple separate tasks, including performing eQTL mapping to identify SNP-gene relationships and estimating gene network structure to identify gene-gene relationships.
Aim: In this work, we pursue a holistic approach to discovering the patterns of gene regulation in the cell. We present a new method for jointly performing eQTL mapping and gene network estimation while encouraging a transfer of information between the two tasks.
Data: We evaluate our approach on both synthetic data and a real yeast eQTL dataset that contains 1,157 SNP genotypes and 1,409 gene expression measurements for 114 yeast samples.
Methods: To construct a unified model for jointly performing eQTL mapping and gene network inference, we formulate the problem as a multiple-output regression task in which we aim to learn the regression coefficients while simultaneously estimating the conditional independence relationships among the set of response variables. The approach we develop uses structured sparsity penalties to encourage the sharing of information between the regression coefficients and the output network in a mutually beneficial way. Our model, inverse-covariance-fused lasso, is formulated as a biconvex optimization problem that we solve via alternating minimization. We derive new, efficient optimization routines, based on existing state-of-the-art methods, to solve each convex sub-problem.
Results: We demonstrate the value of our approach by applying our method to both simulated data and a real yeast eQTL dataset. Experimental results demonstrate that our approach outperforms a large number of existing methods on the recovery of the true sparse structure of both the eQTL associations and the gene network.
Conclusions: We show that inverse-covariance-fused lasso can be used to perform joint eQTL mapping and gene network estimation on a yeast dataset, yielding more biologically coherent results than previous work. Furthermore, the same problem setting appears in many different applications, and therefore our model can be deployed in a wide range of domains.
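
Illustrative sketch: The abstract does not state the objective function, so the following is only a generic form that a biconvex, penalized multiple-output regression of this kind could take. All symbols are assumptions introduced here for illustration: X is the n x p genotype matrix, Y the n x q expression matrix (n = 114, p = 1,157, q = 1,409 for the yeast data), B the p x q matrix of regression coefficients, Θ the q x q inverse covariance (precision) matrix of the responses, and Ω an unspecified structured sparsity penalty that couples B and Θ. The weights λ1, λ2 and γ are hypothetical tuning parameters.

```latex
% Generic joint objective (assumed form, not quoted from the paper)
\[
  \min_{B,\;\Theta \succ 0}\; f(B,\Theta)
  = \frac{1}{n}\,\operatorname{tr}\!\left[(Y - XB)\,\Theta\,(Y - XB)^{\top}\right]
  - \log\det\Theta
  + \lambda_1 \lVert B \rVert_1
  + \lambda_2 \lVert \Theta \rVert_1
  + \gamma\,\Omega(B,\Theta)
\]

% Alternating minimization: each step fixes one block and solves for the other
\[
  B^{(t+1)} = \arg\min_{B}\; f\!\left(B,\,\Theta^{(t)}\right),
  \qquad
  \Theta^{(t+1)} = \arg\min_{\Theta \succ 0}\; f\!\left(B^{(t+1)},\,\Theta\right)
\]
```

Under these assumptions, the smooth part of f is convex in B for fixed Θ (a weighted least-squares term) and convex in Θ for fixed B (a Gaussian log-likelihood term), so if Ω is convex in each argument separately, each update is a convex sub-problem even though f is not jointly convex. Zeros in the off-diagonal entries of Θ correspond to conditional independence between pairs of genes, and zeros in B correspond to the absence of a SNP-gene association.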

Volume 13
Pages 248-270
DOI 10.1214/18-AOAS1186
Language English
Journal The Annals of Applied Statistics
