Publication


Featured research published by Makoto Aoshima.


Journal of Multivariate Analysis | 2012

Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations

Kazuyoshi Yata; Makoto Aoshima

In this article, we propose a new estimation methodology to deal with PCA for high-dimension, low-sample-size (HDLSS) data. We first show that HDLSS datasets have different geometric representations depending on whether a ρ-mixing-type dependency appears in variables or not. When the ρ-mixing-type dependency appears in variables, the HDLSS data converge to an n-dimensional surface of unit sphere with increasing dimension. We pay special attention to this phenomenon. We propose a method called the noise-reduction methodology to estimate eigenvalues of a HDLSS dataset. We show that the eigenvalue estimator holds consistency properties along with its limiting distribution in HDLSS context. We consider consistency properties of PC directions. We apply the noise-reduction methodology to estimating PC scores. We also give an application in the discriminant analysis for HDLSS datasets by using the inverse covariance matrix estimator induced by the noise-reduction methodology.
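The abstract does not state the estimator itself, so the following is only a minimal sketch. It assumes the noise-reduction estimate of the j-th eigenvalue is the j-th eigenvalue of the n x n dual sample covariance matrix minus the average of the remaining eigenvalues, i.e. λ̂_j − (tr(S_D) − Σ_{i≤j} λ̂_i)/(n − 1 − j) for j = 1, …, n − 2. The function name, the simulated one-spike data, and the spike size are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def noise_reduced_eigenvalues(X):
    """Return (conventional, noise-reduced) eigenvalue estimates.

    X is an n x d array whose rows are observations (n much smaller than d).
    The correction formula is an assumed form of the noise-reduction idea.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                      # center each variable
    S_dual = Xc @ Xc.T / (n - 1)                 # n x n dual sample covariance
    lam = np.linalg.eigvalsh(S_dual)[::-1]       # eigenvalues, descending
    top = lam[: n - 2]                           # estimates for j = 1..n-2
    j = np.arange(1, n - 1)
    remaining = np.trace(S_dual) - np.cumsum(top)  # mass left after the top j
    reduced = top - remaining / (n - 1 - j)      # subtract average noise level
    return top, reduced

# Illustrative data: one spiked direction with variance 400, unit noise elsewhere.
rng = np.random.default_rng(0)
n, d, spike = 20, 2000, 400.0
X = rng.standard_normal((n, d))
X[:, 0] *= np.sqrt(spike)
conventional, reduced = noise_reduced_eigenvalues(X)
print(conventional[0], reduced[0])
```

In this simulated setting the conventional estimate of the leading eigenvalue carries an extra term of roughly d/(n − 1) contributed by the noise coordinates, and the correction removes approximately that amount.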


Sequential Analysis | 2011

Two-Stage Procedures for High-Dimensional Data

Makoto Aoshima; Kazuyoshi Yata

In this article, we consider a variety of inference problems for high-dimensional data. The purpose of this article is to suggest directions for future research and possible solutions about p ≫ n problems by using new types of two-stage estimation methodologies. This is the first attempt to apply sequential analysis to high-dimensional statistical inference ensuring prespecified accuracy. We offer the sample size determination for inference problems by creating new types of multivariate two-stage procedures. To develop theory and methodologies, the most important and basic idea is the asymptotic normality when p → ∞. By developing asymptotic normality when p → ∞, we first give (a) a given-bandwidth confidence region for the square loss. In addition, we give (b) a two-sample test to assure prespecified size and power simultaneously together with (c) an equality-test procedure for two covariance matrices. We also give (d) a two-stage discriminant procedure that controls misclassification rates being no more than a prespecified value. Moreover, we propose (e) a two-stage variable selection procedure that provides screening of variables in the first stage and selects a significant set of associated variables from among a set of candidate variables in the second stage. Following the variable selection procedure, we consider (f) variable selection for high-dimensional regression to compare favorably with the lasso in terms of the assurance of accuracy and the computational cost. Further, we consider variable selection for classification and propose (g) a two-stage discriminant procedure after screening some variables. Finally, we consider (h) pathway analysis for high-dimensional data by constructing a multiple test of correlation coefficients.


Journal of Multivariate Analysis | 2010

Effective PCA for high-dimension, low-sample-size data with singular value decomposition of cross data matrix

Kazuyoshi Yata; Makoto Aoshima

In this paper, we propose a new methodology to deal with PCA in high-dimension, low-sample-size (HDLSS) data situations. We give an idea of estimating eigenvalues via singular values of a cross data matrix. We provide consistency properties of the eigenvalue estimation as well as its limiting distribution when the dimension d and the sample size n both grow to infinity in such a way that n is much lower than d. We apply the new methodology to estimating PC directions and PC scores in HDLSS data situations. We give an application of the findings in this paper to a mixture model to classify a dataset into two clusters. We demonstrate how the new methodology performs by using HDLSS data from a microarray study of prostate cancer.
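The idea of estimating eigenvalues from the singular values of a cross data matrix can be sketched as follows. This is a hedged illustration rather than the paper's exact procedure: it assumes the sample is split into two halves in the order given, each half is centered separately, and the cross matrix is scaled by the square root of (n1 − 1)(n2 − 1). The function name and the simulated data are hypothetical.

```python
import numpy as np

def cross_data_matrix_singular_values(X):
    """Singular values of the cross data matrix built from two halves of X.

    X is an n x d array whose rows are observations; n >= 4 is assumed so
    that both halves have at least two observations.
    """
    n = X.shape[0]
    n1 = (n + 1) // 2
    n2 = n - n1
    X1, X2 = X[:n1], X[n1:]
    X1c = X1 - X1.mean(axis=0)                   # center the first half
    X2c = X2 - X2.mean(axis=0)                   # center the second half
    cross = X1c @ X2c.T / np.sqrt((n1 - 1) * (n2 - 1))  # n1 x n2 cross matrix
    return np.linalg.svd(cross, compute_uv=False)       # descending order

# Illustrative data: one spiked direction with variance 400, unit noise elsewhere.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 2000))
X[:, 0] *= 20.0
estimates = cross_data_matrix_singular_values(X)
print(estimates[:3])
```

Because the noise in the two halves is independent, its cross products average out near zero instead of accumulating on the diagonal, which is the intuition for why the singular values are less inflated than the eigenvalues of an ordinary sample covariance matrix.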


Communications in Statistics - Theory and Methods | 2009

PCA Consistency for Non-Gaussian Data in High Dimension, Low Sample Size Context

Kazuyoshi Yata; Makoto Aoshima

In this article, we investigate both sample eigenvalues and Principal Component (PC) directions along with PC scores when the dimension d and the sample size n both grow to infinity in such a way that n is much lower than d. We consider general settings that include the case when the eigenvalues are all in the range of sphericity. We do not assume either the normality or a ρ-mixing condition. We attempt finding a difference among the eigenvalues by choosing n with a suitable order of d. We give the consistency properties for both the sample eigenvalues and the PC directions along with the PC scores. We also show that the sample eigenvalue has a Gaussian limiting distribution when the population counterpart is of multiplicity one.


Sequential Analysis | 2002

TWO-STAGE ESTIMATION OF A LINEAR FUNCTION OF NORMAL MEANS WITH SECOND-ORDER APPROXIMATIONS

Makoto Aoshima; Nitis Mukhopadhyay

When a confidence interval tends to be too wide, its effectiveness in bolstering any inferential statements becomes limited. Hence, an experimenter may opt to construct a confidence interval with some preassigned “small” width and preassigned “large” confidence coefficient so that any inferences drawn from this can be of some value in practice. We consider k(≥2) independent normal populations with unknown means and unknown and unequal variances. We discuss the estimation problem for a linear function of the population means with a fixed-width (=2d) confidence interval having the preassigned confidence coefficient (≥1 − α), d > 0, 0 < α < 1. But, the goal of having such a confidence interval with both preassigned width and confidence coefficient is not attainable when the sample sizes are held fixed in advance [Dantzig (1940)[8], Ghosh et al. (1997, Sec. 3.7)[13]]. Chapman (1950)[4] first gave a Stein-type two-stage procedure for the problem on hand when k = 2. It is known that in a k-sample problem, the analogous two-stage procedure requires the upper percentage points of the distribution of the sum of k independent Student's t variates. First, a Cornish-Fisher expansion of such a percentage point is derived (Theorem 2.1) in general, followed by Tables 1 and 2 of these (approximate) percentage points which are constructed by using this expansion, with the pilot sample sizes not necessarily all equal, when k = 2, α = .05, .01 and k = 3, α = .05. Next, under the limited additional assumption that each unknown population variance has a known positive lower bound, the Chapman-type two-stage estimation procedure is modified along the lines of the one-sample considerations of Mukhopadhyay and Duggan (1997)[23]. For this modified two-stage procedure, various second-order expansions for both the lower and upper bounds of the average sample sizes (Theorem 2.2) and the associated confidence coefficient (Theorem 2.3) are obtained. We may remark that the second-order expansions are meant to provide faster rates of convergence for useful approximations. Then, through extensive sets of simulations we show that the extent of over-sampling experienced by the Chapman-type procedure is significantly reduced under the new modification when k = 2(1)5, α = .05. We include examples and data to illustrate usefulness of the modified k-sample two-stage estimation technique when k = 2, 3. Additionally, the importance of the asymptotic second-order terms is highlighted with the help of data analysis (Examples 1 and 2, Sec. 3).
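For orientation, the classical one-sample Stein two-stage rule, which the Chapman-type k-sample procedure generalizes, can be sketched as below. This is the simpler one-sample analogue and not the modified procedure of the article: a pilot sample of size m gives a variance estimate, and the total sample size is the larger of m and the smallest integer exceeding t² s² / d², where d is the half-width and t is the upper α/2 point of Student's t with m − 1 degrees of freedom. The function name and the pilot data are illustrative, and the critical value is supplied by the caller (2.262 is the standard tabulated value for 9 degrees of freedom and α = .05).

```python
import math
import statistics

def stein_total_sample_size(pilot, half_width, t_crit):
    """Total sample size for a fixed-width interval of half-width `half_width`.

    pilot      : first-stage observations (length m >= 2)
    t_crit     : upper alpha/2 point of Student's t with m - 1 degrees of freedom
    """
    m = len(pilot)
    s2 = statistics.variance(pilot)              # unbiased pilot variance
    required = math.floor(t_crit ** 2 * s2 / half_width ** 2) + 1
    return max(m, required)                      # never fewer than the pilot size

pilot = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]          # illustrative pilot sample, m = 10
total_n = stein_total_sample_size(pilot, half_width=1.0, t_crit=2.262)
print(total_n)  # 47
```

The second stage then draws total_n − m further observations, and the interval is centered at the mean of all total_n observations. A wide target interval leaves the pilot sample sufficient on its own, which is where the over-sampling discussed in the abstract comes from.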


Communications in Statistics - Theory and Methods | 2010

Intrinsic Dimensionality Estimation of High-Dimension, Low Sample Size Data with D-Asymptotics

Kazuyoshi Yata; Makoto Aoshima

High-dimension, low sample size (HDLSS) data are becoming common in various fields such as genetic microarrays, medical imaging, text recognition, finance, chemometrics, and so on. Such data have surprising and often counterintuitive geometric structures because of the high-dimensional noise that dominates and corrupts the local neighborhoods. In this article, we estimate the intrinsic dimension (ID) that allows one to distinguish between deterministic chaos and random noise of HDLSS data. A new ID estimating methodology is given and its properties are studied by using a d-asymptotic approach.
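The counterintuitive geometry mentioned in this abstract can be seen in a small simulation. The sketch below assumes the simplest possible setting, independent standard normal coordinates, which is an assumption made for illustration and far narrower than the settings studied in the article; it does not implement the ID estimator. With n fixed and d large, every observation has squared length close to d and every pair of observations is nearly orthogonal, so the scaled Gram matrix is close to the identity.

```python
import random

def scaled_gram(n, d, seed=0):
    """Gram matrix of n simulated d-dimensional points, divided by d."""
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    return [
        [sum(a * b for a, b in zip(data[i], data[j])) / d for j in range(n)]
        for i in range(n)
    ]

n, d = 5, 2000
gram = scaled_gram(n, d)
# Largest deviation of the scaled squared norms from 1.
diag_err = max(abs(gram[i][i] - 1.0) for i in range(n))
# Largest scaled inner product between distinct points.
off_diag = max(abs(gram[i][j]) for i in range(n) for j in range(n) if i != j)
print(diag_err, off_diag)
```

Both printed quantities shrink at rate about 1/sqrt(d), so pure noise in high dimension looks like a rigid, nearly regular simplex rather than a diffuse cloud.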


Journal of Multivariate Analysis | 2013

Correlation tests for high-dimensional data using extended cross-data-matrix methodology

Kazuyoshi Yata; Makoto Aoshima

In this paper, we consider tests of correlation when the sample size is much lower than the dimension. We propose a new estimation methodology called the extended cross-data-matrix methodology. By applying the method, we give a new test statistic for high-dimensional correlations. We show that the test statistic is asymptotically normal when p → ∞ and n → ∞. We propose a test procedure along with sample size determination to ensure both prespecified size and power for testing high-dimensional correlations. We further develop a multiple testing procedure to control both family-wise error rate and power. Finally, we demonstrate how the test procedures perform in actual data analyses by using two microarray data sets.


Sequential Analysis | 1997

Two-stage procedures for estimating a linear function of multinormal mean vectors

Yoshikazu Takada; Makoto Aoshima

We consider the problem of constructing a fixed-size confidence region for a linear function of mean vectors of k multinormal populations. The covariance matrices are assumed to be known except for the unknown scalar multipliers. A two-stage procedure is proposed to derive such a confidence region. We also discuss the asymptotic efficiency of the procedure.


Journal of Multivariate Analysis | 2013

PCA consistency for the power spiked model in high-dimensional settings

Kazuyoshi Yata; Makoto Aoshima

In this paper, we propose a general spiked model called the power spiked model in high-dimensional settings. We derive relations among the data dimension, the sample size and the high-dimensional noise structure. We first consider asymptotic properties of the conventional estimator of eigenvalues. We show that the estimator is affected by the high-dimensional noise structure directly, so that it becomes inconsistent. In order to overcome such difficulties in a high-dimensional situation, we develop new principal component analysis (PCA) methods called the noise-reduction methodology and the cross-data-matrix methodology under the power spiked model. We show that the new PCA methods can enjoy consistency properties not only for eigenvalues but also for PC directions and PC scores in high-dimensional settings.


Journal of Statistical Planning and Inference | 2002

A two-stage procedure for estimating a linear function of K multinormal mean vectors when covariance matrices are unknown

Makoto Aoshima; Yoshikazu Takada; Muni S. Srivastava

We consider the problem of constructing a fixed-size confidence region for a linear function of mean vectors of k multinormal populations, where all covariance matrices are completely unknown. A two-stage procedure is proposed to construct such a confidence region. It is shown that the proposed two-stage procedure is consistent and its asymptotic property for the expected sample size is also given. A Monte Carlo simulation study is given for an illustration.

Collaboration


Dive into Makoto Aoshima's collaboration.

Top Co-Authors

Aki Ishii

University of Tsukuba
