Kazuyoshi Yata
University of Tsukuba
Publications
Featured research published by Kazuyoshi Yata.
Journal of Multivariate Analysis | 2012
Kazuyoshi Yata; Makoto Aoshima
In this article, we propose a new estimation methodology to deal with PCA for high-dimension, low-sample-size (HDLSS) data. We first show that HDLSS datasets have different geometric representations depending on whether a ρ-mixing-type dependency appears in the variables or not. When the ρ-mixing-type dependency appears in the variables, the HDLSS data converge to an n-dimensional surface of the unit sphere with increasing dimension. We pay special attention to this phenomenon. We propose a method called the noise-reduction methodology to estimate the eigenvalues of an HDLSS dataset. We show that the eigenvalue estimator enjoys consistency properties along with its limiting distribution in the HDLSS context. We also consider consistency properties of the PC directions and apply the noise-reduction methodology to estimating PC scores. Finally, we give an application to discriminant analysis for HDLSS datasets by using the inverse covariance matrix estimator induced by the noise-reduction methodology.
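The noise-reduction idea can be sketched numerically. The closed form used below, which subtracts the average of the remaining eigenvalues of the dual covariance matrix from each sample eigenvalue, is an assumption for illustration; the paper's exact estimator should be taken from the article itself.

```python
import numpy as np

def noise_reduction_eigenvalues(X):
    """Eigenvalue estimates for an n x d data matrix X with n << d.

    Returns (sample_eigenvalues, noise_reduced_eigenvalues), both in
    descending order.  The correction used here (an assumed closed form)
    subtracts the average of the remaining eigenvalues from each sample
    eigenvalue of the n x n dual covariance matrix.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                          # centre each variable
    S_dual = Xc @ Xc.T / (n - 1)                     # n x n dual covariance matrix
    lam = np.sort(np.linalg.eigvalsh(S_dual))[::-1]  # sample eigenvalues, descending
    trace = lam.sum()                                # equals tr(S) of the d x d matrix
    lam_nr = lam.copy()
    for j in range(n - 2):                           # stop while the divisor stays positive
        lam_nr[j] = lam[j] - (trace - lam[: j + 1].sum()) / (n - 2 - j)
    return lam, lam_nr
```

On spiked simulated data with, say, n = 20 and d = 2000, the corrected leading eigenvalue lands far closer to the true spike than the raw sample eigenvalue, whose upward bias is of order tr(Σ)/n.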
Sequential Analysis | 2011
Makoto Aoshima; Kazuyoshi Yata
In this article, we consider a variety of inference problems for high-dimensional data. The purpose of this article is to suggest directions for future research and possible solutions for p ≫ n problems by using new types of two-stage estimation methodologies. This is the first attempt to apply sequential analysis to high-dimensional statistical inference while ensuring prespecified accuracy. We offer sample size determination for inference problems by creating new types of multivariate two-stage procedures. To develop the theory and methodologies, the most important and basic idea is asymptotic normality as p → ∞. By developing this asymptotic normality, we first give (a) a given-bandwidth confidence region for the square loss. In addition, we give (b) a two-sample test that assures prespecified size and power simultaneously, together with (c) an equality test for two covariance matrices. We also give (d) a two-stage discriminant procedure that controls misclassification rates so that they are no more than a prespecified value. Moreover, we propose (e) a two-stage variable selection procedure that screens variables in the first stage and selects a significant set of associated variables from among a set of candidate variables in the second stage. Following the variable selection procedure, we consider (f) variable selection for high-dimensional regression that compares favorably with the lasso in terms of both assurance of accuracy and computational cost. Further, we consider variable selection for classification and propose (g) a two-stage discriminant procedure after screening some variables. Finally, we consider (h) pathway analysis for high-dimensional data by constructing a multiple test of correlation coefficients.
Journal of Multivariate Analysis | 2010
Kazuyoshi Yata; Makoto Aoshima
In this paper, we propose a new methodology to deal with PCA in high-dimension, low-sample-size (HDLSS) data situations. The idea is to estimate eigenvalues via the singular values of a cross data matrix. We provide consistency properties of the eigenvalue estimates as well as their limiting distribution when the dimension d and the sample size n both grow to infinity in such a way that n is much lower than d. We apply the new methodology to estimating PC directions and PC scores in HDLSS data situations. We give an application of the findings in this paper to a mixture model to classify a dataset into two clusters. We demonstrate how the new methodology performs by using HDLSS data from a microarray study of prostate cancer.
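The cross-data-matrix idea can be illustrated with a short sketch: split the sample into two halves, form the cross matrix between them, and read off its singular values. The splitting scheme and scaling below are assumptions for illustration; the intuition is that the noise in the two halves is independent, so the singular values largely avoid the upward noise bias that ordinary sample eigenvalues pick up when n ≪ d.

```python
import numpy as np

def cross_data_matrix_eigenvalues(X):
    """Estimate eigenvalues for an n x d data matrix X with n << d via
    the singular values of a cross data matrix between two halves of the
    sample (the splitting and scaling here are illustrative choices)."""
    n = X.shape[0]
    n1 = n // 2
    n2 = n - n1
    X1c = X[:n1] - X[:n1].mean(axis=0)               # centre each half separately
    X2c = X[n1:] - X[n1:].mean(axis=0)
    D = X1c @ X2c.T / np.sqrt((n1 - 1) * (n2 - 1))   # n1 x n2 cross data matrix
    return np.linalg.svd(D, compute_uv=False)        # singular values, descending
```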
Communications in Statistics-theory and Methods | 2009
Kazuyoshi Yata; Makoto Aoshima
In this article, we investigate both sample eigenvalues and principal component (PC) directions along with PC scores when the dimension d and the sample size n both grow to infinity in such a way that n is much lower than d. We consider general settings that include the case when the eigenvalues are all in the range of sphericity. We assume neither normality nor a ρ-mixing condition. We attempt to find a difference among the eigenvalues by choosing n with a suitable order of d. We give the consistency properties for both the sample eigenvalues and the PC directions along with the PC scores. We also show that the sample eigenvalue has a Gaussian limiting distribution when the population counterpart is of multiplicity one.
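Numerical work in this n ≪ d regime typically relies on the standard duality between the large d × d sample covariance matrix and the small n × n dual matrix, whose nonzero eigenvalues coincide. This is textbook linear algebra rather than a result of the paper; a quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(42)
n, d = 15, 1000
X = rng.standard_normal((n, d))
Xc = X - X.mean(axis=0)                  # centre each variable
S = Xc.T @ Xc / (n - 1)                  # d x d sample covariance (large)
S_dual = Xc @ Xc.T / (n - 1)             # n x n dual matrix (small)
# After centering, at most n - 1 eigenvalues are nonzero; they agree.
lam_big = np.sort(np.linalg.eigvalsh(S))[::-1][: n - 1]
lam_small = np.sort(np.linalg.eigvalsh(S_dual))[::-1][: n - 1]
assert np.allclose(lam_big, lam_small)
```

In practice one therefore works with the dual matrix, which costs O(n²d) to form instead of O(d²n) for the full covariance matrix.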
Communications in Statistics-theory and Methods | 2010
Kazuyoshi Yata; Makoto Aoshima
High-dimension, low-sample-size (HDLSS) data are becoming common in various fields such as genetic microarrays, medical imaging, text recognition, finance, and chemometrics. Such data have surprising and often counterintuitive geometric structures because of the high-dimensional noise that dominates and corrupts the local neighborhoods. In this article, we estimate the intrinsic dimension (ID), which allows one to distinguish between deterministic chaos and random noise in HDLSS data. A new ID estimating methodology is given and its properties are studied by using a d-asymptotic approach.
Journal of Multivariate Analysis | 2013
Kazuyoshi Yata; Makoto Aoshima
In this paper, we consider tests of correlation when the sample size is much lower than the dimension. We propose a new estimation methodology called the extended cross-data-matrix methodology. By applying the method, we give a new test statistic for high-dimensional correlations. We show that the test statistic is asymptotically normal when p → ∞ and n → ∞. We propose a test procedure along with sample size determination to ensure both prespecified size and power for testing high-dimensional correlations. We further develop a multiple testing procedure to control both the familywise error rate and power. Finally, we demonstrate how the test procedures perform in actual data analyses by using two microarray data sets.
Journal of Multivariate Analysis | 2013
Kazuyoshi Yata; Makoto Aoshima
In this paper, we propose a general spiked model called the power spiked model in high-dimensional settings. We derive relations among the data dimension, the sample size and the high-dimensional noise structure. We first consider asymptotic properties of the conventional estimator of eigenvalues. We show that the estimator is affected by the high-dimensional noise structure directly, so that it becomes inconsistent. In order to overcome such difficulties in a high-dimensional situation, we develop new principal component analysis (PCA) methods called the noise-reduction methodology and the cross-data-matrix methodology under the power spiked model. We show that the new PCA methods can enjoy consistency properties not only for eigenvalues but also for PC directions and PC scores in high-dimensional settings.
Sequential Analysis | 2015
Makoto Aoshima; Kazuyoshi Yata
In this article, we consider a geometric classifier that is applicable to multiclass classification for high-dimensional data. We show the consistency property and the asymptotic normality of the geometric classifier under certain mild conditions. We discuss sample size determination so that the geometric classifier can ensure that its misclassification rates are less than prespecified thresholds. We give a two-stage procedure to estimate the sample sizes required in such a geometric classifier and propose a misclassification rate–adjusted classifier (MRAC) based on the geometric classifier. We evaluate the performance of the MRAC theoretically and numerically. Finally, we demonstrate the MRAC in actual data analyses by using a microarray data set.
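As a rough illustration of this family of geometric, non-likelihood classifiers, the sketch below implements a simple bias-corrected Euclidean-distance rule: the tr(S_i)/n_i terms offset the extra distance that ||x0 − x̄_i||² accumulates in high dimension from estimating each class mean. This rule is an assumed stand-in for illustration only; the paper's geometric classifier and its sample size determination are more involved.

```python
import numpy as np

def distance_based_classify(x0, X1, X2):
    """Assign x0 to class 1 or 2 by bias-corrected squared Euclidean distance.

    X1, X2 are n_i x d training matrices.  Subtracting tr(S_i)/n_i offsets
    the extra E||x0 - xbar_i||^2 incurred by estimating the class mean from
    only n_i observations in high dimension (illustrative rule only, not
    the paper's geometric classifier).
    """
    def score(X):
        n = X.shape[0]
        xbar = X.mean(axis=0)
        Xc = X - xbar
        tr_S = (Xc ** 2).sum() / (n - 1)   # tr(S_i) without forming the d x d matrix
        return np.sum((x0 - xbar) ** 2) - tr_S / n
    return 1 if score(X1) < score(X2) else 2
```

The bias correction matters precisely when d is large relative to n_i: without it, the class with the smaller training sample is systematically penalized.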
Sequential Analysis | 2010
Kazuyoshi Yata
We consider fixed-size estimation for a linear function of mean vectors from π_i : N_p(μ_i, Σ_i), i = 1, …, k. The goal of inference is to construct a fixed-span confidence region with required accuracy. We find a new sample size formulation and propose a two-stage estimation methodology that gives a fixed-span confidence region satisfying the probability requirement approximately. We show that the proposed methodology greatly reduces the sample size while enjoying asymptotic first-order consistency when the dimensionality p is extremely high. With the help of simulation studies, we show that the proposed methodology is still efficient even when p is moderate. We give an actual example to illustrate how the proposed methodology is used in inference.
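The pilot-then-final-sample mechanics of a two-stage procedure can be sketched in the classical univariate Stein form. Everything below (the univariate setting, the normal critical value in place of the t quantile, and the specific sample size formula) is an assumption for illustration; the paper's multivariate formulation differs.

```python
import math
from statistics import NormalDist, variance

def two_stage_sample_size(pilot, half_width, alpha=0.05):
    """Stein-type two-stage rule for a fixed-width confidence interval for a mean.

    Stage 1: the pilot sample estimates the variance.  Stage 2: the total
    sample size N is chosen so the (+/- half_width) interval has approximate
    coverage 1 - alpha.  The normal quantile is used here for simplicity;
    Stein's exact procedure uses the t quantile with len(pilot) - 1 df.
    """
    m = len(pilot)
    s2 = variance(pilot)                         # pilot variance estimate
    z = NormalDist().inv_cdf(1 - alpha / 2)      # approximate critical value
    return max(m, math.ceil(z ** 2 * s2 / half_width ** 2))
```

For a pilot sample with variance 10 and half-width 1 at the 5% level, the rule asks for ⌈1.96² · 10⌉ = 39 observations in total; a wide half-width leaves the pilot sample sufficient.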
Methodology and Computing in Applied Probability | 2018
Makoto Aoshima; Kazuyoshi Yata
In this paper, we consider high-dimensional quadratic classifiers in non-sparse settings. The quadratic classifiers proposed in this paper draw information about heterogeneity effectively through both the differences of growing mean vectors and of covariance matrices. We show that they hold a consistency property in which misclassification rates tend to zero as the dimension goes to infinity under non-sparse settings. We also propose a quadratic classifier after feature selection by using both the differences of mean vectors and covariance matrices. We discuss the performance of the classifiers in numerical simulations and actual data analyses. Finally, we give concluding remarks about the choice of classifier for high-dimensional, non-sparse data.