Publication


Featured research published by Qingzhao Zhang.


Computational Statistics & Data Analysis | 2014

Model detection for functional polynomial regression

Tao Zhang; Qingzhao Zhang; Qihua Wang

A functional polynomial regression model, which includes the functional linear model and the functional quadratic model as two special cases, is considered. In functional polynomial regression, one must balance the costs and benefits of using more parameters in the model. A model detection method is developed to determine which orders of the polynomial are significant in functional polynomial regression. The proposed methods can identify the true model consistently and have good prediction performance. Numerical studies clearly confirm our theories.


Journal of Multivariate Analysis | 2013

A note on tail dependence regression

Qingzhao Zhang; Deyuan Li; Hansheng Wang

In financial practice, it is important to understand the dependence structure between the returns of individual assets and the market index. This is particularly true under extreme situations. Theoretically, this amounts to regressing the dependence relationship against a set of pre-specified predictive variables. To this end, we propose here a novel method called tail dependence regression. It assumes a tail dependence index model between individual assets and the market index. Subsequently, such a tail dependence index is modeled as a linear combination of the predictors through a monotonic transformation. An approximate maximum likelihood method is then developed to estimate the unknown regression coefficients. The asymptotic properties of the resulting estimators are investigated theoretically. Numerical studies including both simulated and real datasets are presented for illustration purposes.


Statistics & Probability Letters | 2017

Focused information criterion and model averaging with generalized rank regression

Qingzhao Zhang; Xiaogang Duan; Shuangge Ma

Generalized rank regression, which is a class of weighted rank regression with weights based on factor space, provides a powerful tool for conducting robust estimation. In this article, we first establish the asymptotic properties of generalized rank regression under local model misspecification. We then apply generalized rank regression to the focused information criterion and frequentist model averaging and establish their properties.


Archive | 2017

Identifying Gene–Environment Interactions Associated with Prognosis Using Penalized Quantile Regression

Guohua Wang; Yinjun Zhao; Qingzhao Zhang; Yangguang Zang; Sanguo Zhang; Shuangge Ma

In the omics era, it has been well recognized that for complex traits and outcomes, the interactions between genetic and environmental factors (i.e., the G×E interactions) have important implications beyond the main effects. Most of the existing interaction analyses have been focused on continuous and categorical traits. Prognosis is of essential importance for complex diseases. However, because of their significantly greater complexity, prognosis outcomes have been less studied. In the existing interaction analysis on prognosis outcomes, the most common practice is to fit marginal (semi)parametric models (for example, Cox) using likelihood-based estimation and then identify important interactions based on significance level. Such an approach has limitations. First, data contamination is not uncommon. With likelihood-based estimation, even a single contaminated observation can result in severely biased estimation and misleading conclusions. Second, when the sample size is not large, the significance-based approach may not be reliable. To overcome these limitations, in this study, we adopt quantile-based estimation, which is robust to data contamination. Two techniques are adopted to accommodate right censoring. For identifying important interactions, we adopt penalization as an alternative to significance level. An efficient computational algorithm is developed. Simulation shows that the proposed method can significantly outperform the alternative. We analyze a lung cancer prognosis study with gene expression measurements.
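The robustness claim in the abstract above can be illustrated with a toy sketch. This is not the authors' method: it ignores covariates, censoring, and penalization, and the data, the grid search, and all function names are assumptions introduced here for illustration. It only shows that the minimizer of the quantile (check) loss barely moves when one observation is contaminated, whereas the mean (the squared-loss minimizer) shifts substantially.

```python
import statistics


def check_loss(residual: float, tau: float) -> float:
    """Quantile (check) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_check_loss(values, center, tau=0.5):
    """Sum of check losses of the residuals (value - center)."""
    return sum(check_loss(v - center, tau) for v in values)


# Hypothetical toy data: five observations, the last one contaminated.
clean = [1.0, 2.0, 3.0, 4.0, 5.0]
contaminated = clean[:-1] + [500.0]

# Location estimate under the check loss (tau = 0.5, i.e. the median),
# found by a simple grid search over candidates 0.0, 0.5, ..., 500.0.
grid = [i * 0.5 for i in range(1001)]
quantile_fit = min(grid, key=lambda c: total_check_loss(contaminated, c))

# Location estimate under squared loss is the sample mean.
mean_fit = statistics.mean(contaminated)

print("check-loss fit on contaminated data:", quantile_fit)
print("mean on contaminated data:", mean_fit)
print("mean on clean data:", statistics.mean(clean))
```

In this sketch the check-loss estimate stays at the value it would take on the clean data, while the mean is pulled far toward the contaminated point.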


Journal of Multivariate Analysis | 2017

Improved model checking methods for parametric models with responses missing at random

Zhihua Sun; Feifei Chen; Xiao Hua Zhou; Qingzhao Zhang

In this paper, we consider the lack-of-fit test of a parametric model when the response variable is missing at random. The popular imputation and inverse probability weighting methods are first employed to tackle the missing data. Then by employing the projection technique, we propose empirical-process-based testing methods to check the appropriateness of the parametric model. The asymptotic properties of the test statistics are obtained under the null and local alternative hypothetical models. It is shown that the proposed testing methods are consistent, and can detect local alternative hypothetical models converging to the null model at the parametric rate. To determine the critical values, a consistent bootstrap method is proposed, and its asymptotic properties are established. The simulation results show that the tests outperform the existing methods in terms of empirical sizes and powers, especially under the situation with high dimensional covariates. Analysis of a diabetes data set of Pima Indians is carried out to demonstrate the application of the testing procedures.


Genetic Epidemiology | 2017

Inferring gene regulatory relationships with a high-dimensional robust approach

Yangguang Zang; Qing Zhao; Qingzhao Zhang; Yang Li; Sanguo Zhang; Shuangge Ma

Gene expression (GE) levels have important biological and clinical implications. They are regulated by copy number alterations (CNAs). Modeling the regulatory relationships between GEs and CNAs facilitates understanding disease biology and can also have value in translational medicine. The expression level of a gene can be regulated by its cis‐acting as well as trans‐acting CNAs, and the set of trans‐acting CNAs is usually not known, which poses a high‐dimensional selection and estimation problem. Most of the existing studies share a common limitation in that they cannot accommodate long‐tailed distributions or contamination of GE data. In this study, we develop a high‐dimensional robust regression approach to infer the regulatory relationships between GEs and CNAs. A high‐dimensional regression model is used to accommodate the effects of both cis‐acting and trans‐acting CNAs. A density power divergence loss function is used to accommodate long‐tailed GE distributions and contamination. Penalization is adopted for regularized estimation and selection of relevant CNAs. The proposed approach is effectively realized using a coordinate descent algorithm. Simulation shows that it has competitive performance compared to the nonrobust benchmark and the robust LAD (least absolute deviation) approach. We analyze TCGA (The Cancer Genome Atlas) data on cutaneous melanoma and study GE‐CNA regulations in the RAP (regulation of apoptosis) pathway, which further demonstrates the satisfactory performance of the proposed approach.


Genomics | 2018

Robust identification of gene-environment interactions for prognosis using a quantile partial correlation approach

Yaqing Xu; Mengyun Wu; Qingzhao Zhang; Shuangge Ma

Gene-environment (G-E) interactions have important implications for the etiology and progression of many complex diseases. Compared to continuous markers and categorical disease status, prognosis has been less investigated, with the additional challenges brought by the unique characteristics of survival outcomes. Most of the existing G-E interaction approaches for prognosis data share the limitation that they cannot accommodate long-tailed or contaminated outcomes. In this study, for prognosis data, we develop a robust G-E interaction identification approach using the censored quantile partial correlation (CQPCorr) technique. The proposed approach is built on the quantile regression technique (and hence has a solid statistical basis), uses weights to easily accommodate censoring, and adopts partial correlation to identify important interactions while properly controlling for the main genetic and environmental effects. In simulation, it outperforms multiple competitors with more accurate identification. In the analysis of TCGA data on lung cancer and melanoma, it yields biologically sensible findings that differ from those obtained using the alternatives.


Journal of the American Statistical Association | 2017

Promoting Similarity of Sparsity Structures in Integrative Analysis With Penalization

Yuan Huang; Qingzhao Zhang; Sanguo Zhang; Jian Huang; Shuangge Ma

For data with high-dimensional covariates but small sample sizes, the analysis of single datasets often generates unsatisfactory results. The integrative analysis of multiple independent datasets provides an effective way of pooling information and outperforms single-dataset and several alternative multi-dataset methods. Under many scenarios, multiple datasets are expected to share common important covariates, that is, the corresponding models have similarity in their sparsity structures. However, the existing methods do not have a mechanism to promote the similarity in sparsity structures in integrative analysis. In this study, we consider penalized variable selection and estimation in integrative analysis. We develop an L0-penalty-based method, which explicitly promotes the similarity in sparsity structures. Computationally, it is realized using a coordinate descent algorithm. Theoretically, it has the selection and estimation consistency properties. Under a wide spectrum of simulation scenarios, it has identification and estimation performance comparable to or better than the alternatives. In the analysis of three lung cancer datasets with gene expression measurements, it identifies genes with sound biological implications and satisfactory prediction performance. Supplementary materials for this article are available online.


Genetic Epidemiology | 2017

Analysis of cancer gene expression data with an assisted robust marker identification approach

Hao Chai; Xingjie Shi; Qingzhao Zhang; Qing Zhao; Yuan Huang; Shuangge Ma

Gene expression (GE) studies have been playing a critical role in cancer research. Despite tremendous effort, the analysis results are still often unsatisfactory, because of the weak signals and high data dimensionality. Analysis is often further challenged by the long‐tailed distributions of the outcome variables. In recent multidimensional studies, data have been collected on GEs as well as their regulators (e.g., copy number alterations (CNAs), methylation, and microRNAs), which can provide additional information on the associations between GEs and cancer outcomes. In this study, we develop an ARMI (assisted robust marker identification) approach for analyzing cancer studies with measurements on GEs as well as regulators. The proposed approach borrows information from regulators and can be more effective than analyzing GE data alone. A robust objective function is adopted to accommodate long‐tailed distributions. Marker identification is effectively realized using penalization. The proposed approach has an intuitive formulation and is computationally affordable. Simulation shows its satisfactory performance under a variety of settings. TCGA (The Cancer Genome Atlas) data on melanoma and lung cancer are analyzed, which leads to biologically plausible marker identification and superior prediction.


Journal of Multivariate Analysis | 2018

Robust network-based analysis of the associations between (epi)genetic measurements

Cen Wu; Qingzhao Zhang; Yu Jiang; Shuangge Ma

With its important biological implications, modeling the associations of gene expression (GE) and copy number variation (CNV) has been extensively conducted. Such analysis is challenging because of the high data dimensionality, lack of knowledge of the CNVs regulating a specific GE, different behaviors of the cis-acting and trans-acting CNVs, possible long-tailed distributions and contamination of GE measurements, and correlations between CNVs. The existing methods fail to address one or more of these challenges. In this study, a new method is developed to model the GE-CNV associations more effectively. Specifically, for each GE, a partially linear model, with a nonlinear cis-acting CNV effect, is assumed. A robust loss function is adopted to accommodate long-tailed distributions and data contamination. We adopt penalization to accommodate the high dimensionality and identify relevant CNVs. A network structure is introduced to accommodate the correlations among CNVs. The proposed method comprehensively accommodates multiple challenging characteristics of GE-CNV modeling and effectively overcomes the limitations of existing methods. We develop an effective computational algorithm and rigorously establish the consistency properties. Simulation shows the superiority of the proposed method over alternatives. The TCGA (The Cancer Genome Atlas) data on the PCD (programmed cell death) pathway are analyzed, and the proposed method yields improved prediction and stability as well as biologically plausible findings.

Collaboration


Dive into Qingzhao Zhang's collaboration.

Top Co-Authors

Sanguo Zhang

Chinese Academy of Sciences

Yangguang Zang

Chinese Academy of Sciences

Qihua Wang

Chinese Academy of Sciences

Yu Jiang

University of Memphis

Guohua Wang

Chinese Academy of Sciences

Xinqi Wu

Chinese Academy of Sciences
