Ming-Yen Cheng
National Taiwan University
Publications
Featured research published by Ming-Yen Cheng.
Journal of the Royal Statistical Society, Series B (Statistical Methodology) | 1998
Ming-Yen Cheng; Peter Hall
Summary. Nonparametric tests of modality are a distribution-free way of assessing evidence about inhomogeneity in a population, provided that the potential subpopulations are sufficiently well separated. They include the excess mass and dip tests, which are equivalent in univariate settings and are alternatives to the bandwidth test. Only very conservative forms of the excess mass and dip tests are available at present, however, and for that reason they are generally not competitive with the bandwidth test. In the present paper we develop a practical approach to calibrating the excess mass and dip tests to improve their level accuracy and power substantially. Our method exploits the fact that the limiting distribution of the excess mass statistic under the null hypothesis depends on unknowns only through a constant, which may be estimated. Our calibrated test exploits this fact and is shown to have greater power and level accuracy than the bandwidth test has. The latter tends to be quite conservative, even in an asymptotic sense. Moreover, the calibrated test avoids difficulties that the bandwidth test has with spurious modes in the tails, which often must be discounted through subjective intervention of the experimenter.
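As a rough illustration of the excess mass idea behind the test, the Python sketch below computes a grid-approximated excess mass difference statistic for unimodality versus bimodality. The interval grid, the range of λ values, and the name excess_mass_stat are illustrative choices, the search is brute force and only suitable for small samples, and the paper's calibration step (estimating the constant in the null limiting distribution) is not reproduced.

# A minimal sketch of the (uncalibrated) excess mass difference statistic,
# approximated on a coarse grid of candidate interval endpoints.
import numpy as np
from itertools import combinations

def excess_mass_stat(x, lambdas=None, n_grid=20):
    x = np.sort(np.asarray(x, float))
    n = x.size
    grid = np.quantile(x, np.linspace(0, 1, n_grid))   # candidate endpoints
    if lambdas is None:
        lambdas = np.linspace(0.05, 2.0, 15) / (x[-1] - x[0])
    intervals = [(a, b) for a, b in combinations(grid, 2)]

    def mass(a, b):                       # empirical mass of [a, b]
        return np.searchsorted(x, b, 'right') - np.searchsorted(x, a, 'left')

    stat = 0.0
    for lam in lambdas:
        e = np.array([mass(a, b) / n - lam * (b - a) for a, b in intervals])
        e1 = e.max()                      # best single interval
        e2 = e1                           # best pair of disjoint intervals
        for i, (a1, b1) in enumerate(intervals):
            for j in range(i + 1, len(intervals)):
                a2, b2 = intervals[j]
                if b1 <= a2 or b2 <= a1:  # disjoint intervals only
                    e2 = max(e2, e[i] + e[j])
        stat = max(stat, e2 - e1)         # excess mass difference at this lambda
    return stat

Larger values of the statistic indicate evidence of a second mode; for instance, it tends to be noticeably larger for a well-separated two-component normal mixture than for a single normal sample of the same size.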
Journal of the American Statistical Association | 2009
Ming-Yen Cheng; Wenyang Zhang; Lu-Hung Chen
Multiparameter likelihood models (MLMs) with multiple covariates have a wide range of applications; however, they encounter the “curse of dimensionality” problem when the dimension of the covariates is large. We develop a generalized multiparameter likelihood model that copes with multiple covariates and adapts to dynamic structural changes well. It includes some popular models, such as the partially linear and varying-coefficient models, as special cases. We present a simple, effective two-step method to estimate both the parametric and the nonparametric components when the model is fixed. The proposed estimator of the parametric component has the n^{-1/2} convergence rate, and the estimator of the nonparametric component enjoys an adaptivity property. We suggest a data-driven procedure for selecting the bandwidths, and propose an initial estimator in profile likelihood estimation of the parametric part to ensure stability of the approach in general settings. We further develop an automatic procedure to identify constant parameters in the underlying model. We provide a simulation study and an application to infant mortality data of China to demonstrate the performance of our proposed method.
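The two-step idea can be pictured on the partially linear special case mentioned in the abstract, Y = X @ beta + g(U) + eps. The sketch below uses profile least squares in place of the general multiparameter likelihood and a fixed bandwidth rather than a data-driven one; the function names local_linear_smoother and partially_linear_fit are hypothetical.

# A minimal two-step (profile) sketch for the partially linear special case.
import numpy as np

def local_linear_smoother(u, h):
    """Return the n x n local linear smoother matrix for design points u."""
    n = u.size
    S = np.empty((n, n))
    for i in range(n):
        d = u - u[i]
        w = np.exp(-0.5 * (d / h) ** 2)             # Gaussian kernel weights
        Z = np.column_stack([np.ones(n), d])
        WZ = w[:, None] * Z
        rows = np.linalg.solve(Z.T @ WZ, WZ.T)      # (Z'WZ)^{-1} Z'W
        S[i] = rows[0]                              # fitted value = intercept row
    return S

def partially_linear_fit(y, X, u, h):
    S = local_linear_smoother(u, h)
    Xt, yt = X - S @ X, y - S @ y                   # profile out the smooth part
    beta = np.linalg.lstsq(Xt, yt, rcond=None)[0]   # step 1: parametric component
    g_hat = S @ (y - X @ beta)                      # step 2: nonparametric component
    return beta, g_hat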
Journal of the Royal Statistical Society, Series B (Statistical Methodology) | 1997
Ming-Yen Cheng
Integrated squared density derivatives are important to the plug-in type of bandwidth selector for kernel density estimation. Conventional estimators of these quantities are inefficient when there is a non-smooth boundary in the support of the density. We introduce estimators that utilize density derivative estimators obtained from local polynomial fitting. They retain the rates of convergence in mean-squared error that are familiar from non-boundary cases, and the constant coefficients have similar forms. The estimators and the formula for their asymptotically optimal bandwidths, which depend on integrated products of density derivatives, are applied to automatic bandwidth selection for local linear density estimation. Simulation studies show that the constructed bandwidth rule and the Sheather-Jones bandwidth are competitive in non-boundary cases, but the former overcomes boundary problems whereas the latter does not.
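For orientation, the sketch below shows the standard plug-in route: estimate the integrated squared second derivative with a Gaussian kernel derivative estimator and insert it into the AMISE-optimal bandwidth formula. It omits the local polynomial boundary correction that is the paper's contribution, and the pilot bandwidth is a crude normal-reference choice.

# A minimal plug-in bandwidth sketch: theta = integral of (f'')^2 is estimated
# on a grid and plugged into h = [R(K) / (mu2^2 * theta * n)]^{1/5}.
import numpy as np

def int_sq_second_deriv(x, g):
    """Estimate the integral of (f'')^2 with pilot bandwidth g."""
    x = np.asarray(x, float)
    grid = np.linspace(x.min() - 3 * g, x.max() + 3 * g, 400)
    u = (grid[:, None] - x[None, :]) / g
    phi = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    f2 = ((u ** 2 - 1) * phi).sum(axis=1) / (x.size * g ** 3)  # Gaussian K''
    return np.trapz(f2 ** 2, grid)

def plugin_bandwidth(x, g=None):
    x = np.asarray(x, float)
    n = x.size
    if g is None:
        g = 1.06 * x.std() * n ** (-1 / 5)          # normal-reference pilot
    theta = int_sq_second_deriv(x, g)
    RK, mu2 = 1 / (2 * np.sqrt(np.pi)), 1.0         # Gaussian kernel constants
    return (RK / (mu2 ** 2 * theta * n)) ** (1 / 5)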
Journal of the American Statistical Association | 2013
Ming-Yen Cheng; Hau-Tieng Wu
High-dimensional data analysis has been an active area, and the main focus areas have been variable selection and dimension reduction. In practice, it occurs often that the variables are located on an unknown, lower-dimensional nonlinear manifold. Under this manifold assumption, one purpose of this article is regression and gradient estimation on the manifold, and another is developing a new tool for manifold learning. As regards the first aim, we suggest directly reducing the dimensionality to the intrinsic dimension d of the manifold, and performing the popular local linear regression (LLR) on a tangent plane estimate. An immediate consequence is a dramatic reduction in the computational time when the ambient space dimension p ≫ d. We provide rigorous theoretical justification of the convergence of the proposed regression and gradient estimators by carefully analyzing the curvature, boundary, and nonuniform sampling effects. We propose a bandwidth selector that can handle heteroscedastic errors. With reference to the second aim, we analyze carefully the asymptotic behavior of our regression estimator both in the interior and near the boundary of the manifold, and make explicit its relationship with manifold learning, in particular estimating the Laplace–Beltrami operator of the manifold. In this context, we also make clear that it is important to use a smaller bandwidth in the tangent plane estimation than in the LLR. A simulation study and applications to the Isomap face data and a clinically computed tomography scan dataset are used to illustrate the computational speed and estimation accuracy of our methods. Supplementary materials for this article are available online.
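A minimal sketch of the two-stage fit described above: local PCA around the query point supplies d tangent coordinates, and a kernel-weighted linear regression is carried out in those coordinates. Unlike the paper's recommendation, the same bandwidth is used for the tangent plane estimate and the regression, and manifold_llr is an illustrative name.

# Local linear regression on a tangent plane estimated by weighted local PCA.
import numpy as np

def manifold_llr(X, y, x0, d, h):
    diff = X - x0                                   # ambient-space differences
    dist = np.linalg.norm(diff, axis=1)
    w = np.exp(-0.5 * (dist / h) ** 2)              # Gaussian kernel weights
    keep = w > 1e-8
    diff, y_loc, w = diff[keep], y[keep], w[keep]
    # top-d right singular vectors of the weighted differences span the
    # estimated tangent plane at x0
    _, _, Vt = np.linalg.svd(np.sqrt(w)[:, None] * diff, full_matrices=False)
    T = diff @ Vt[:d].T                             # d-dimensional tangent coordinates
    Z = np.column_stack([np.ones(T.shape[0]), T])
    WZ = w[:, None] * Z
    coef = np.linalg.solve(Z.T @ WZ, WZ.T @ y_loc)
    return coef[0], coef[1:]                        # fit at x0 and tangent gradient

Working in the d tangent coordinates rather than the p ambient coordinates is what delivers the computational saving when p is much larger than d.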
Annals of Statistics | 2014
Ming-Yen Cheng; Toshio Honda; Jialiang Li; Heng Peng
Ultra-high dimensional longitudinal data are increasingly common and the analysis is challenging both theoretically and methodologically. We offer a new automatic procedure for finding a sparse semivarying coefficient model, which is widely accepted for longitudinal data analysis. Our proposed method first reduces the number of covariates to a moderate order by employing a screening procedure, and then identifies both the varying and constant coefficients using a group SCAD estimator, which is subsequently refined by accounting for the within-subject correlation. The screening procedure is based on working independence and B-spline marginal models. Under weaker conditions than those in the literature, we show that with high probability only irrelevant variables will be screened out, and the number of selected variables can be bounded by a moderate order. This allows the desirable sparsity and oracle properties of the subsequent structure identification step. Note that existing methods require some kind of iterative screening in order to achieve this, thus they demand heavy computational effort and consistency is not guaranteed. The refined semivarying coefficient model employs profile least squares, local linear smoothing and nonparametric covariance estimation, and is semiparametric efficient. We also suggest ways to implement the proposed methods, and to select the tuning parameters. An extensive simulation study is summarized to demonstrate its finite sample performance and the yeast cell cycle data is analyzed.
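The screening step can be pictured as follows: each covariate is regressed marginally on a spline basis under working independence (pooled over all longitudinal observations) and ranked by residual sum of squares. The sketch uses a truncated power spline basis as an easy stand-in for B-splines and omits the group SCAD structure identification and the correlation-adjusted refinement.

# Marginal nonparametric screening with a cubic spline basis.
import numpy as np

def spline_basis(x, n_knots=5):
    """Cubic truncated-power spline basis (a stand-in for B-splines)."""
    knots = np.quantile(x, np.linspace(0.1, 0.9, n_knots))
    cols = [np.ones_like(x), x, x ** 2, x ** 3]
    cols += [np.clip(x - k, 0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def marginal_screen(X, y, keep):
    """Return indices of the `keep` covariates with smallest marginal RSS."""
    rss = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        B = spline_basis(X[:, j])
        coef, *_ = np.linalg.lstsq(B, y, rcond=None)
        rss[j] = np.sum((y - B @ coef) ** 2)
    return np.argsort(rss)[:keep]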
Journal of Computational and Graphical Statistics | 2008
Ming-Yen Cheng; Marc Raimondo
In this article we propose an implementation of the so-called zero-crossing-time detection technique specifically designed for estimating the location of jump points in the first derivative (kinks) of a regression function f. Our algorithm relies on a new class of kernel functions having a second derivative with vanishing moments and an asymmetric first derivative steep enough near the origin. We provide a software package which, for a sample of size n, produces estimators with an accuracy of order at least O(n^{-2/5}). This contrasts with current algorithms for kink estimation, which at best provide an accuracy of order O(n^{-1/3}). In the software, the kernel statistic is standardized and compared to the universal threshold to test the existence of a kink. A simulation study shows that our algorithm enjoys very good finite sample properties even for low sample sizes. The method reveals kink features in real data sets with high noise levels at places where traditional smoothers tend to oversmooth the data.
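In the same spirit, though not with the paper's zero-crossing kernel construction, the sketch below flags a kink by comparing one-sided local slopes and testing the standardized maximum against the universal threshold sqrt(2 log n); the window size m and the noise estimate are ad hoc choices for an equispaced design.

# A simple stand-in kink detector: one-sided slope differences, standardised
# and compared with the universal threshold.
import numpy as np

def detect_kink(x, y, m=15):
    n = y.size
    # noise level from successive differences (robust to a smooth trend)
    sigma = np.median(np.abs(np.diff(y))) / (np.sqrt(2) * 0.6745)

    def slope(xi, yi):
        xc = xi - xi.mean()
        sxx = np.sum(xc ** 2)
        return np.sum(xc * yi) / sxx, sigma ** 2 / sxx   # OLS slope and its variance

    stat = np.full(n, -np.inf)
    for i in range(m, n - m):
        bL, vL = slope(x[i - m:i], y[i - m:i])           # slope on the left window
        bR, vR = slope(x[i:i + m], y[i:i + m])           # slope on the right window
        stat[i] = abs(bR - bL) / np.sqrt(vL + vR)        # standardised difference
    i_hat = int(np.argmax(stat))
    has_kink = stat[i_hat] > np.sqrt(2 * np.log(n))      # universal threshold
    return has_kink, x[i_hat], stat[i_hat]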
Computer Vision and Pattern Recognition | 2011
Lu-Hung Chen; Yao-Hsiang Yang; Chu-Song Chen; Ming-Yen Cheng
Natural images are known to carry several distinct properties that are not shared by randomly generated images. In this article we utilize the scale-invariance property of natural images to construct a filter that extracts features invariant to illumination conditions. In contrast to most existing methods, which assume that such features lie in the high-frequency part of the spectrum, we show by analyzing the power spectra of natural images that some of these features can lie in the low-frequency part as well. From this fact, we derive a Wiener filter approach to best separate the illumination-invariant features from an image. We also provide a linear-time algorithm for the proposed Wiener filter, which only involves solving linear equations with a narrowly banded matrix. Our experiments on face recognition under variable lighting show that the proposed method achieves the best recognition rate and is generally faster than state-of-the-art methods.
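A toy version of the Wiener separation can be written in a few lines, assuming an illumination power spectrum proportional to 1/|w|^2 (the scale-invariance prior) and a flat feature spectrum; the paper instead derives the spectra from natural image statistics and replaces the FFT used here with a linear-time banded solver.

# Frequency-domain Wiener separation of a log-domain image into a smooth
# illumination part and an illumination-invariant feature part, under assumed
# power spectra.
import numpy as np

def wiener_features(img, c=10.0, eps=1e-3):
    logI = np.log1p(np.asarray(img, float))
    fy = np.fft.fftfreq(logI.shape[0])[:, None]
    fx = np.fft.fftfreq(logI.shape[1])[None, :]
    w2 = fy ** 2 + fx ** 2 + eps                   # squared frequency magnitude
    S_illum = c / w2                               # assumed illumination spectrum
    S_feat = np.ones_like(w2)                      # assumed (flat) feature spectrum
    H = S_feat / (S_feat + S_illum)                # Wiener filter for the features
    F = np.fft.fft2(logI)
    return np.real(np.fft.ifft2(H * F))            # illumination-invariant map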
Annals of Statistics | 2007
Ming-Yen Cheng; Liang Peng; Jyh-Shyang Wu
A variance reduction technique in nonparametric smoothing is proposed: at each point of estimation, form a linear combination of a preliminary estimator evaluated at nearby points with the coefficients specified so that the asymptotic bias remains unchanged. The nearby points are chosen to maximize the variance reduction. We study in detail the case of univariate local linear regression. While the new estimator retains many advantages of the local linear estimator, it has appealing asymptotic relative efficiencies. Bandwidth selection rules are available by a simple constant factor adjustment of those for local linear estimation. A simulation study indicates that the finite sample relative efficiency often matches the asymptotic relative efficiency for moderate sample sizes. This technique is very general and has a wide range of applications.
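The bias-preserving combination can be sketched as follows: a preliminary local linear fit is evaluated at three offset points and combined with coefficients solving sum c = 1, sum c*d = 0, sum c*d^2 = 0 (d denoting the offsets), which leaves the leading bias term unchanged. The offsets used below are illustrative and are not the variance-optimal placement derived in the paper.

# Variance-reduction sketch: bias-preserving combination of local linear fits.
import numpy as np

def local_linear(x, y, x0, h):
    d = x - x0
    w = np.exp(-0.5 * (d / h) ** 2)                # Gaussian kernel weights
    Z = np.column_stack([np.ones_like(d), d])
    WZ = w[:, None] * Z
    return np.linalg.solve(Z.T @ WZ, WZ.T @ y)[0]  # fitted value at x0

def vr_local_linear(x, y, x0, h, delta=0.4):
    offsets = np.array([-delta * h, delta * h, 2 * delta * h])  # nearby points
    A = np.vstack([np.ones(3), offsets, offsets ** 2])
    c = np.linalg.solve(A, np.array([1.0, 0.0, 0.0]))  # combination coefficients
    fits = np.array([local_linear(x, y, x0 + o, h) for o in offsets])
    return c @ fits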
Journal of the Chinese Statistical Association (中國統計學報) | 2006
Ming-Yen Cheng; Shan Sun
In this article, we summarize several quantile estimators and related bandwidth selection methods, and we propose two new bandwidth selection methods. Using four distributions (standard normal, exponential, double exponential, and lognormal), we simulate the methods and compare their efficiencies with that of the empirical quantile. It turns out that kernel-smoothed quantile estimators, regardless of which bandwidth selection method is used, are more efficient than the empirical quantile estimator in most situations; the gain is especially large when the sample size is relatively small. However, no single method outperforms all the others for every distribution.
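For concreteness, a basic kernel-smoothed quantile estimator looks as follows: the order statistics are averaged with Gaussian kernel weights centred at the target probability p. The bandwidth shown is a crude rule of thumb, not one of the data-driven selectors compared in the article.

# Kernel-smoothed quantile estimator with Gaussian kernel weights.
import numpy as np
from math import erf, sqrt

def kernel_quantile(x, p, h=None):
    x = np.sort(np.asarray(x, float))
    n = x.size
    if h is None:
        h = 0.9 * n ** (-1 / 5)                 # crude bandwidth on the [0, 1] scale
    Phi = lambda t: 0.5 * (1 + erf(t / sqrt(2)))
    grid = np.arange(n + 1) / n                 # 0, 1/n, ..., 1
    w = np.array([Phi((grid[i + 1] - p) / h) - Phi((grid[i] - p) / h)
                  for i in range(n)])           # kernel mass over ((i-1)/n, i/n]
    w /= w.sum()                                # renormalise mass lost at the ends
    return np.dot(w, x)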
Bernoulli | 1998
Ming-Yen Cheng; Peter Hall; D. Michael Titterington
A criterion based on the integral of the mean squared error is used for defining an optimal design measure in the context of local linear regression, when the bandwidth is chosen in a locally optimal manner. An algorithm is proposed that constructs a sequence of piecewise uniform designs with the help of current estimates of the integral of the mean squared error. These estimates do not require direct estimation of the second derivative of the regression function. Asymptotic properties of the algorithm are established and numerical results illustrate the gains that can be made, relative to a uniform design, by using the optimal design or sub-optimal, piecewise uniform designs. The behaviour of the algorithm in practice is also illustrated.
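A much-simplified sequential version of such an algorithm is sketched below: fit the current data, score each bin of [0, 1] with a crude squared-residual proxy for its contribution to the integrated mean squared error, and place the next batch of design points uniformly within each bin in proportion to that score. The proxy is not the paper's estimator, which avoids direct estimation of the second derivative, and the bin count and allocation rounding are ad hoc choices.

# Sequential piecewise-uniform design sketch driven by a crude per-bin score.
import numpy as np

def local_linear(x, y, x0, h):
    d = x - x0
    w = np.exp(-0.5 * (d / h) ** 2)
    Z = np.column_stack([np.ones_like(d), d])
    WZ = w[:, None] * Z
    return np.linalg.solve(Z.T @ WZ, WZ.T @ y)[0]

def next_design(x, y, h, n_new, n_bins=10):
    fitted = np.array([local_linear(x, y, xi, h) for xi in x])
    resid2 = (y - fitted) ** 2
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    score = np.array([resid2[idx == b].sum() + 1e-12 for b in range(n_bins)])
    alloc = np.maximum(1, np.round(n_new * score / score.sum())).astype(int)
    # piecewise uniform: equispaced new points inside each bin
    new_x = np.concatenate([np.linspace(edges[b], edges[b + 1], alloc[b] + 2)[1:-1]
                            for b in range(n_bins)])
    return np.sort(new_x)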