Publication


Featured research published by Yijun Zuo.


Genetics | 2006

Two-stage designs in case-control association analysis

Yijun Zuo; Guohua Zou; Hongyu Zhao

DNA pooling is a cost-effective approach for collecting information on marker allele frequencies in genetic studies. It is often suggested as a screening tool to identify a subset of candidate markers from a very large number of markers, to be followed up by more accurate and informative individual genotyping. In this article, we investigate several statistical properties and design issues related to this two-stage design, including the selection of candidate markers for second-stage analysis, the statistical power of this design, and the probability that truly disease-associated markers are ranked among the top after second-stage analysis. We have derived analytical results on the proportion of markers to be selected for second-stage analysis. For example, to detect disease-associated markers with an allele frequency difference of 0.05 between cases and controls from an initial sample of 1000 cases and 1000 controls, our results suggest that when the measurement errors are small (0.005), ∼3% of the markers should be selected. Regarding the statistical power to identify disease-associated markers, we find that the measurement errors associated with DNA pooling have little effect on the power of the two-stage design. This is in contrast to the one-stage pooling scheme, where measurement errors may have a large effect on statistical power. As for the probability that the disease-associated markers are ranked among the top in the second stage, we show that, for reasonably large sample sizes, there is a high probability that at least one disease-associated marker is ranked among the top when the allele frequency differences between cases and controls are at least 0.05, even when the errors associated with DNA pooling in the first stage are not small. Therefore, the two-stage design with DNA pooling as a screening tool offers an efficient strategy for genomewide association studies, even when the measurement errors associated with DNA pooling are nonnegligible.
For any disease model, we find that all the statistical results essentially depend on the population allele frequency and the allele frequency differences between the cases and controls at the disease-associated markers. The general conclusions hold whether the second stage uses an entirely independent sample or includes both the samples used in the first stage and an independent set of samples.
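The stage-1 screening step can be sketched in Python. This is a minimal illustration under an assumed model, not the authors' derivation: each pooled allele-frequency estimate is taken to carry binomial sampling variance p(1-p)/(2n) plus an independent measurement-error variance, markers are ranked by the absolute value of the resulting z-score, and the top fraction is passed on to individual genotyping. The function names, the marker tuple layout and the marker ids are hypothetical, and the second-stage test is not shown.

```python
import math

def stage1_z(p_case, p_ctrl, n_case, n_ctrl, err_sd):
    """Approximate z-score for one marker from pooled allele frequencies.

    Assumed model (hypothetical): each pool's frequency estimate has binomial
    sampling variance p(1-p)/(2n) for n diploid individuals, plus an
    independent measurement-error variance err_sd**2.
    """
    p_bar = (p_case * n_case + p_ctrl * n_ctrl) / (n_case + n_ctrl)
    var = (p_bar * (1 - p_bar) * (1 / (2 * n_case) + 1 / (2 * n_ctrl))
           + 2 * err_sd ** 2)  # one error term per pool
    return (p_case - p_ctrl) / math.sqrt(var)

def select_for_stage2(markers, fraction, n_case, n_ctrl, err_sd):
    """Rank markers by |z| and keep the top `fraction` of them (at least one).

    `markers` is a list of (marker_id, pooled_case_freq, pooled_ctrl_freq).
    """
    ranked = sorted(
        markers,
        key=lambda m: abs(stage1_z(m[1], m[2], n_case, n_ctrl, err_sd)),
        reverse=True,
    )
    k = max(1, math.ceil(fraction * len(markers)))
    return [m[0] for m in ranked[:k]]

markers = [("rs_a", 0.30, 0.25), ("rs_b", 0.30, 0.30), ("rs_c", 0.31, 0.30)]
print(select_for_stage2(markers, 0.03, 1000, 1000, 0.005))  # ['rs_a']
```

With the sample sizes, measurement-error SD and 3% selection fraction quoted in the abstract, only the toy marker with the 0.05 frequency difference is carried forward; in a real genome-wide scan the fraction would apply to hundreds of thousands of markers.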


Journal of Statistical Planning and Inference | 2000

On the performance of some robust nonparametric location measures relative to a general notion of multivariate symmetry

Yijun Zuo; Robert Serfling

Several robust nonparametric location estimators are examined with respect to several criteria, with emphasis on the criterion that they should agree with the point of symmetry in the case of a symmetric distribution. For this purpose, a broad version of multidimensional symmetry is introduced, namely “halfspace symmetry”, generalizing the well-known notions of “central” and “angular” symmetry. Characterizations of these symmetry notions are established, permitting their properties and interrelations to be illuminated. The particular location measures considered consist of several nonparametric notions of multidimensional median: the “L2” (or “spatial”), “Tukey/Donoho halfspace”, “projection”, and “Liu simplicial” medians, all of which are robust in the sense of having a nonzero breakdown point. It is established that the first three of these in general do identify the point of symmetry when it exists, whereas the last, the Liu simplicial median, fails to do so in some circumstances. Combining this finding with consideration of other criteria such as affine equivariance, preservation of stochastic order, and degree of robustness, we conclude that among these choices, the “halfspace” and “projection” medians, both of which are based on projection pursuit methodology, are the most attractive overall.
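The “L2” or spatial median mentioned above minimizes the sum of Euclidean distances Σ‖x_i − y‖. A standard way to compute it, not necessarily the one used in the paper, is Weiszfeld's fixed-point iteration: a weighted average of the data with weights 1/‖x_i − y‖. The sketch below adds the Vardi-Zhang adjustment for iterates that land exactly on a data point; the function name and tolerances are illustrative choices.

```python
import math

def spatial_median(points, tol=1e-12, max_iter=10_000):
    """Spatial (L2) median via Weiszfeld iterations with the Vardi-Zhang
    modification for iterates that coincide with a data point."""
    d = len(points[0])
    y = [sum(p[j] for p in points) / len(points) for j in range(d)]  # start at the mean
    for _ in range(max_iter):
        eta = 0           # number of data points equal to the current iterate
        den = 0.0
        num = [0.0] * d
        for p in points:
            dist = math.dist(p, y)
            if dist == 0.0:
                eta += 1
                continue
            w = 1.0 / dist
            den += w
            for j in range(d):
                num[j] += w * p[j]
        if den == 0.0:    # every data point coincides with y
            return y
        t = [v / den for v in num]  # plain Weiszfeld update
        if eta == 0:
            new_y = t
        else:
            r = den * math.dist(t, y)  # norm of sum_i w_i (x_i - y)
            lam = 1.0 if r == 0.0 else min(1.0, eta / r)
            new_y = [(1 - lam) * ti + lam * yi for ti, yi in zip(t, y)]
        if math.dist(new_y, y) < tol:
            return new_y
        y = new_y
    return y

# Centrally symmetric sample: the spatial median is the centre of symmetry.
print(spatial_median([(3, 3), (1, 3), (2, 4), (2, 2)]))  # [2.0, 3.0]
# Collinear sample: reduces to the univariate median.
print(spatial_median([(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]))  # approximately [1.0, 0.0]
```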


Annals of Statistics | 2005

Depth weighted scatter estimators

Yijun Zuo; Hengjian Cui

General depth weighted scatter estimators are introduced and investigated. For general depth functions, we find that these affine equivariant scatter estimators are Fisher consistent and unbiased for a wide range of multivariate distributions, and we show that the sample scatter estimators are strongly and √n-consistent and asymptotically normal, and that the influence functions of the estimators exist and are bounded in general. We then concentrate on a specific case of the general depth weighted scatter estimators, the projection depth weighted scatter estimators, which include as a special case the well-known Stahel-Donoho scatter estimator, whose limiting distribution had remained open until this paper. Large-sample behavior (consistency and asymptotic normality), efficiency, and finite-sample behavior (breakdown point and relative efficiency) of the sample projection depth weighted scatter estimators are thoroughly investigated. The influence function and the maximum bias of the projection depth weighted scatter estimators are derived and examined. Unlike typical high-breakdown competitors, the projection depth weighted scatter estimators can combine a high breakdown point with high efficiency while enjoying a bounded influence function and a moderate maximum bias curve. Comparisons with leading estimators on asymptotic relative efficiency and gross error sensitivity reveal that the projection depth weighted scatter estimators behave very well overall and, consequently, are very favorable choices of affine equivariant multivariate scatter estimators.
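A depth weighted scatter estimator can be pictured as a weighted covariance in which each observation's weight is a function of its depth. The sketch below is an illustration only, under several assumptions: projection depth PD(x) = 1/(1 + O(x)) is approximated in two dimensions by taking the supremum in the outlyingness O(x) over a grid of directions (directions with zero MAD are skipped), the location is the depth-weighted mean, no consistency factor is applied, and the default weight function w(d) = d is a hypothetical choice. It does not inherit the theoretical guarantees established in the paper, and the function names are hypothetical.

```python
import math
import statistics

def outlyingness(x, data, n_dirs=180):
    """Grid approximation of sup_u |u'x - Med(u'X)| / MAD(u'X) in 2-D
    (directions whose projected MAD is zero are skipped)."""
    worst = 0.0
    for k in range(n_dirs):
        theta = math.pi * k / n_dirs
        c, s = math.cos(theta), math.sin(theta)
        proj = [c * p[0] + s * p[1] for p in data]
        med = statistics.median(proj)
        mad = statistics.median([abs(v - med) for v in proj])
        if mad > 0:
            worst = max(worst, abs(c * x[0] + s * x[1] - med) / mad)
    return worst

def depth_weighted_scatter(data, weight=lambda d: d, n_dirs=180):
    """Depth-weighted mean and 2x2 scatter matrix.

    Each point gets weight w(PD(x)) with PD(x) = 1 / (1 + O(x)); the default
    w(d) = d is a hypothetical choice, and no consistency factor is applied.
    """
    w = [weight(1.0 / (1.0 + outlyingness(p, data, n_dirs))) for p in data]
    total = sum(w)
    mx = sum(wi * p[0] for wi, p in zip(w, data)) / total
    my = sum(wi * p[1] for wi, p in zip(w, data)) / total
    sxx = sum(wi * (p[0] - mx) ** 2 for wi, p in zip(w, data)) / total
    syy = sum(wi * (p[1] - my) ** 2 for wi, p in zip(w, data)) / total
    sxy = sum(wi * (p[0] - mx) * (p[1] - my) for wi, p in zip(w, data)) / total
    return (mx, my), [[sxx, sxy], [sxy, syy]]

# Toy data: a 5 x 5 grid plus one gross outlier.
data = [(i, j) for i in range(-2, 3) for j in range(-2, 3)] + [(50.0, 50.0)]
center, scatter = depth_weighted_scatter(data)
print(center, scatter)
```

On this toy data the outlier's approximate depth is below 0.02, so the weighted x-variance is far smaller than the classical variance of the x-coordinates, which exceeds 90. Other weight functions can be passed through the `weight` argument.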


Annals of Statistics | 2006

Multidimensional trimming based on projection depth

Yijun Zuo

As estimators of location parameters, univariate trimmed means are well known for their robustness and efficiency. They can serve as robust alternatives to the sample mean while possessing high efficiency at normal as well as heavy-tailed models. This paper introduces multidimensional trimmed means based on projection depth induced regions. Robustness of these depth trimmed means is investigated in terms of the influence function and the finite sample breakdown point. The influence function captures local robustness, whereas the breakdown point measures the global robustness of estimators. It is found that the projection depth trimmed means are highly robust both locally and globally. Asymptotics of the depth trimmed means are investigated via those of the directional radius of the depth induced regions. The strong consistency, asymptotic representation and limiting distribution of the depth trimmed means are obtained. Relative to the mean and other leading competitors, the depth trimmed means are highly efficient at normal or symmetric models and overwhelmingly more efficient when these models are contaminated. Simulation studies confirm that the asymptotic efficiency results remain valid in finite samples.
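In one dimension the projection depth has the closed form PD(x) = 1/(1 + |x − Med|/MAD), so the depth region {x : PD(x) ≥ α} is the interval Med ± MAD·(1/α − 1). The sketch below shows a plain (unweighted) depth trimmed mean in this univariate special case only: it averages the observations lying in that region. The function names are hypothetical, and the multivariate estimators of the paper are not reproduced.

```python
import statistics

def projection_depth_1d(x, data):
    """Univariate projection depth 1 / (1 + |x - Med| / MAD)."""
    med = statistics.median(data)
    mad = statistics.median([abs(v - med) for v in data])
    if mad == 0:
        raise ValueError("MAD is zero; projection depth is undefined here")
    return 1.0 / (1.0 + abs(x - med) / mad)

def depth_trimmed_mean(data, alpha):
    """Plain average of the observations whose depth is at least alpha.

    Raises statistics.StatisticsError if no observation is that deep.
    """
    kept = [v for v in data if projection_depth_1d(v, data) >= alpha]
    return statistics.fmean(kept)

data = [1, 2, 3, 4, 5, 100]
print(depth_trimmed_mean(data, 0.3))  # 3.0 (only the outlier 100 is trimmed)
print(depth_trimmed_mean(data, 0.5))  # 3.5 (1 and 100 are trimmed)
print(statistics.fmean(data))         # about 19.17, pulled up by the outlier
```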


Journal of Multivariate Analysis | 2010

Smooth depth contours characterize the underlying distribution

Linglong Kong; Yijun Zuo

The Tukey depth is an innovative concept in multivariate data analysis. It can be used to extend the univariate notion of order, and its advantages, to the multivariate setting. While it is still an open question whether the depth contours uniquely determine the underlying distribution, some positive answers have been provided. We extend these results to distributions with smooth depth contours, with elliptically symmetric distributions as special cases. The key ingredient of our proofs is the well-known Cramér-Wold theorem.


Annals of Statistics | 2006

On the limiting distributions of multivariate depth-based rank sum statistics and related tests

Yijun Zuo; Xuming He

A depth-based rank sum statistic for multivariate data, introduced by Liu and Singh [J. Amer. Statist. Assoc. 88 (1993) 252-260] as an extension of the Wilcoxon rank sum statistic for univariate data, has been used in multivariate rank tests in quality control and in experimental studies. Those applications, however, are based on a conjectured limiting distribution provided by Liu and Singh [J. Amer. Statist. Assoc. 88 (1993) 252-260]. The present paper proves the conjecture under general regularity conditions and therefore validates various applications of the rank sum statistic in the literature. The paper also shows that the corresponding rank sum tests can be more powerful than Hotelling's T² test and some commonly used multivariate rank tests in detecting location-scale changes in multivariate distributions.
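Depth-based rank methods in the spirit of Liu and Singh order observations by their depth with respect to a reference sample. The sketch below computes, in one dimension, the depth-based rank R(y) (the share of reference points that are at most as deep as y) and its average Q over a new sample, using the univariate Tukey depth min(#{x ≤ y}, #{x ≥ y}). With continuous data and no change in distribution, R(Y) is uniform on (0, 1), so Q is expected to be near 1/2; values well below 1/2 indicate that the new sample lies in the outskirts of the reference cloud (for example after a location shift or a scale increase). This is an illustration only: the limiting-distribution results of the paper are not reproduced, and the function names are hypothetical.

```python
def tukey_depth_1d(y, sample):
    """Univariate Tukey depth as a count: min(#{x <= y}, #{x >= y})."""
    below = sum(1 for x in sample if x <= y)
    above = sum(1 for x in sample if x >= y)
    return min(below, above)

def depth_rank(y, reference, ref_depths):
    """R(y): share of reference points whose depth does not exceed that of y."""
    d = tukey_depth_1d(y, reference)
    return sum(1 for dx in ref_depths if dx <= d) / len(reference)

def quality_index(reference, new):
    """Q: average of the depth-based ranks R(y) over the new sample."""
    ref_depths = [tukey_depth_1d(x, reference) for x in reference]
    return sum(depth_rank(y, reference, ref_depths) for y in new) / len(new)

reference = list(range(1, 10))
print(quality_index(reference, [5]))         # 1.0 (the deepest possible point)
print(quality_index(reference, [100, 101]))  # 0.0 (far outside the reference data)
```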


Statistics and Computing | 2014

Computing projection depth and its associated estimators

Xiaohui Liu; Yijun Zuo

To facilitate the application of projection depth, an exact algorithm is proposed from the viewpoint of cutting a convex polytope with hyperplanes. Based on this algorithm, one can obtain a finite number of optimal direction vectors, which are x-free and therefore enable us (Liu et al., Preprint, 2011) to compute the projection depth and most of its associated estimators in dimension p≥2, including the Stahel-Donoho location and scatter estimators, the projection trimmed mean, and projection depth contours and the projection median, among others. Both real and simulated examples are provided to illustrate the performance of the proposed algorithm.
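The exact polytope-cutting algorithm is not reproduced here. As a rough baseline for comparison, the sketch below approximates the supremum in the projection outlyingness O(x) = sup_u |u'x − Med(u'X)|/MAD(u'X) by a maximum over a fixed grid of directions in two dimensions. It therefore returns a lower bound on O(x), and hence an upper bound on PD(x) = 1/(1 + O(x)). The deepest sample point is then used as a crude stand-in for the projection median, which is properly a maximizer over all of the plane. The grid size and function names are hypothetical choices.

```python
import math
import statistics

def outlyingness(x, data, n_dirs=360):
    """Approximate projection outlyingness of x w.r.t. 2-D data.

    The supremum over unit directions u is replaced by a maximum over n_dirs
    equally spaced angles in [0, pi) (u and -u give the same value), so the
    result is a lower bound on the exact outlyingness. Directions whose
    projected MAD is zero are skipped in this sketch.
    """
    worst = 0.0
    for k in range(n_dirs):
        theta = math.pi * k / n_dirs
        c, s = math.cos(theta), math.sin(theta)
        proj = [c * p[0] + s * p[1] for p in data]
        med = statistics.median(proj)
        mad = statistics.median([abs(v - med) for v in proj])
        if mad > 0:
            worst = max(worst, abs(c * x[0] + s * x[1] - med) / mad)
    return worst

def projection_depth(x, data, n_dirs=360):
    """PD(x) = 1 / (1 + O(x)); an upper bound on the exact depth here."""
    return 1.0 / (1.0 + outlyingness(x, data, n_dirs))

def deepest_point(data, n_dirs=360):
    """Sample point of maximal approximate depth (a crude projection-median proxy)."""
    return max(data, key=lambda p: projection_depth(p, data, n_dirs))

data = [(1, 1), (1, -1), (-1, 1), (-1, -1), (0, 0)]
print(deepest_point(data))             # (0, 0)
print(projection_depth((0, 0), data))  # 1.0
```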


Annals of Human Genetics | 2006

A Combinatorial Searching Method for Detecting a Set of Interacting Loci Associated with Complex Traits

Qiuying Sha; Xiaofeng Zhu; Yijun Zuo; Richard S. Cooper; Shuanglin Zhang

Complex diseases are presumed to be the result of the interaction of several genes and environmental factors, with each gene having only a small effect on the disease. Mapping complex disease genes is therefore one of the greatest challenges facing geneticists. Most current approaches to association studies essentially evaluate one marker or one gene (haplotype approach) at a time. These approaches ignore the possibility that the effects of multilocus functional genetic units may play a larger role than single-locus effects in determining trait variability. In this article, we propose a Combinatorial Searching Method (CSM) to detect a set of interacting loci (which may be unlinked) that predicts the complex trait. In the CSM, a simple filter is first applied to all possible locus-sets to retain the candidate locus-sets; a new objective function, based on cross-validation and on partitions of the multi-locus genotypes, is then used to evaluate the retained locus-sets. The locus-set with the largest value of the objective function is the final locus-set, and a permutation procedure is performed to evaluate the overall p-value of the test for association between the final locus-set and the trait. The performance of the method is evaluated by simulation studies as well as by application to a real data set. The simulation studies show that the CSM has reasonable power to detect high-order interactions. When the CSM is applied to a real data set to detect the locus-set (among the 13 loci in the ACE gene) that predicts systolic blood pressure (SBP) or diastolic blood pressure (DBP), we find that a four-locus gene-gene interaction model best predicts SBP with an overall p-value of 0.033, and similarly a two-locus gene-gene interaction model best predicts DBP with an overall p-value of 0.045.
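The overall search-then-permute structure can be sketched as follows. This is not the CSM itself: the filter step is omitted, every locus set up to a given size is scored exhaustively, and the cross-validated objective is replaced by a hypothetical in-sample score, namely the share of trait variance explained by the multi-locus genotype cells. Because adding a locus can only refine the cells, this in-sample score never decreases as loci are added, which is one motivation for a cross-validated objective. The permutation step reruns the whole search on shuffled traits so that the p-value accounts for having picked the best of many sets. All names and the toy data are hypothetical.

```python
import itertools
import random
from collections import defaultdict

def explained_share(genotypes, trait, loci):
    """Share of trait variance explained by the multi-locus genotype cells
    defined by `loci` (hypothetical objective; no cross-validation)."""
    mean = sum(trait) / len(trait)
    sst = sum((t - mean) ** 2 for t in trait)
    if sst == 0:
        return 0.0
    cells = defaultdict(list)
    for i, t in enumerate(trait):
        cells[tuple(genotypes[locus][i] for locus in loci)].append(t)
    ssw = 0.0
    for vals in cells.values():
        m = sum(vals) / len(vals)
        ssw += sum((v - m) ** 2 for v in vals)
    return 1.0 - ssw / sst

def best_locus_set(genotypes, trait, max_size):
    """Exhaustively score every locus set with 1..max_size loci (no filter step)."""
    candidates = [c for k in range(1, max_size + 1)
                  for c in itertools.combinations(range(len(genotypes)), k)]
    scores = {c: explained_share(genotypes, trait, c) for c in candidates}
    best = max(candidates, key=scores.get)
    return best, scores[best]

def permutation_p_value(genotypes, trait, max_size, n_perm=200, seed=0):
    """Compare the best observed score with the best score after permuting the trait."""
    _, observed = best_locus_set(genotypes, trait, max_size)
    rng = random.Random(seed)
    perm = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if best_locus_set(genotypes, perm, max_size)[1] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy data: genotypes[locus][individual]; the trait is the product of loci 0 and 1.
genotypes = [
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
    [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
    [0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0],
]
trait = [a * b for a, b in zip(genotypes[0], genotypes[1])]
print(best_locus_set(genotypes, trait, max_size=2))  # ((0, 1), 1.0)
print(permutation_p_value(genotypes, trait, max_size=2))
```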


Communications in Statistics - Simulation and Computation | 2014

Computing Halfspace Depth and Regression Depth

Xiaohui Liu; Yijun Zuo

In this article, we consider the exact computation of the well-known halfspace depth (HD) and regression depth (RD) from the viewpoint of cutting a convex cone with hyperplanes. Two new algorithms are proposed for computing these two notions of depth. The first is relatively straightforward but quite inefficient, whereas the second is much faster. Notably, both can be implemented in spaces of dimension greater than three. Numerical examples are provided to illustrate their performance.
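The paper's algorithms cut a convex cone with hyperplanes and also cover regression depth and higher dimensions. For orientation only, here is a much simpler and different technique, a brute-force computation of bivariate halfspace depth. The number of data points in a closed halfplane bounded by a line through x changes only when the line sweeps past a data point, so testing one direction inside each arc between these critical directions gives the exact minimum. It runs in O(n²) time and returns the depth as a count (divide by n for the normalized depth); the function name is hypothetical.

```python
import math

def halfspace_depth_2d(x, data):
    """Tukey (halfspace) depth of x w.r.t. 2-D data, returned as a count.

    For a unit direction u, N(u) = #{p : u . (p - x) >= 0} changes only where
    u is perpendicular to some p - x, so the minimum of N is attained inside
    one of the arcs between these critical directions. We test the midpoint
    of every arc, which takes O(n^2) time overall.
    """
    diffs = [(p[0] - x[0], p[1] - x[1]) for p in data]
    two_pi = 2 * math.pi
    crit = sorted({(math.atan2(dy, dx) + s * math.pi / 2) % two_pi
                   for dx, dy in diffs if (dx, dy) != (0, 0)
                   for s in (-1, 1)})
    if not crit:
        return len(data)  # every data point coincides with x
    best = len(data)
    for a, b in zip(crit, crit[1:] + [crit[0] + two_pi]):
        if b - a < 1e-12:
            continue  # skip zero-width arcs caused by rounding
        mid = (a + b) / 2
        ux, uy = math.cos(mid), math.sin(mid)
        count = sum(1 for dx, dy in diffs if ux * dx + uy * dy >= 0)
        best = min(best, count)
    return best

square = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
print(halfspace_depth_2d((0, 0), square))  # 2
print(halfspace_depth_2d((1, 1), square))  # 1
print(halfspace_depth_2d((5, 5), square))  # 0
```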


Archive | 2006

Robust location and scatter estimators in multivariate analysis

Yijun Zuo

The sample mean vector and the sample covariance matrix are the cornerstones of classical multivariate analysis. They are optimal when the underlying data are normal. They are, however, notorious for being extremely sensitive to outliers and heavy-tailed noise. This article surveys robust alternatives to these classical location and scatter estimators and discusses their applications in multivariate data analysis.
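A univariate illustration of the sensitivity described above: replacing a single observation by a wild value drags the sample mean arbitrarily far, while the sample median stays within the range of the clean data until about half of the sample is replaced. The robust multivariate estimators surveyed in the article aim for this kind of resistance; the helper name below is hypothetical.

```python
import statistics

def shifts_under_contamination(data, k, bad_value):
    """Replace the first k observations by bad_value and report how far
    the sample mean and the sample median move."""
    contaminated = [bad_value] * k + list(data[k:])
    mean_shift = abs(statistics.fmean(contaminated) - statistics.fmean(data))
    median_shift = abs(statistics.median(contaminated) - statistics.median(data))
    return mean_shift, median_shift

data = list(range(1, 10))
print(shifts_under_contamination(data, 1, 1e6))     # (111111.0, 1)
print(shifts_under_contamination(data, 5, 1e6)[1])  # 999995.0: the median breaks down
```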

Collaboration


Top Co-Authors

Guolian Kang, St. Jude Children's Research Hospital
Xiaohui Liu, Jiangxi University of Finance and Economics
Robert Serfling, University of Texas at Dallas
Guohua Zou, Chinese Academy of Sciences
Hengjian Cui, Beijing Normal University
Xuming He, University of Michigan
Ji-Feng Zhang, Chinese Academy of Sciences