Jinchi Lv
University of Southern California
Publications
Featured research published by Jinchi Lv.
IEEE Transactions on Information Theory | 2011
Jianqing Fan; Jinchi Lv
Penalized likelihood methods are fundamental to ultrahigh dimensional variable selection. How high a dimensionality such methods can handle remains largely unknown. In this paper, we show that in the context of generalized linear models, such methods possess model selection consistency with oracle properties even for dimensionality of nonpolynomial (NP) order of sample size, for a class of penalized likelihood approaches using folded-concave penalty functions, which were introduced to ameliorate the bias problems of convex penalty functions. This fills a long-standing gap in the literature, where the dimensionality was typically only allowed to grow slowly with the sample size. Our results are also applicable to penalized likelihood with the L1-penalty, which is a convex function at the boundary of the class of folded-concave penalty functions under consideration. Coordinate optimization is implemented for finding the solution paths, and its performance is evaluated by simulation examples and a real data analysis.
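The best-known folded-concave penalty in this literature is the SCAD penalty of Fan and Li (2001). As a minimal illustration of the "folded-concave" idea referenced in the abstract (this is a standard textbook parameterization, not code from the paper):

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD folded-concave penalty of Fan and Li (2001), elementwise.

    It behaves like the L1 penalty near zero, transitions quadratically,
    and is constant beyond a*lam, which removes the bias that a pure L1
    penalty places on large coefficients.
    """
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    flat = lam**2 * (a + 1) / 2
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, flat))

# The penalty is flat for |t| > a*lam, so large signals are not shrunk:
print(scad_penalty([0.0, 0.5, 2.0, 10.0], lam=0.5))
```

Note how the three pieces meet continuously at `t = lam` and `t = a*lam`; concavity on the positive axis is what yields nearly unbiased estimates of large coefficients.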
Annals of Statistics | 2009
Jinchi Lv; Yingying Fan
Model selection and sparse recovery are two important problems for which many regularization methods have been proposed. We study the properties of regularization methods in both problems under the unified framework of regularized least squares with concave penalties. For model selection, we establish conditions under which a regularized least squares estimator enjoys a nonasymptotic property, called the weak oracle property, where the dimensionality can grow exponentially with sample size. For sparse recovery, we present a sufficient condition that ensures the recoverability of the sparsest solution. In particular, we approach both problems by considering a family of penalties that give a smooth homotopy between L0 and L1 penalties. We also propose the sequentially and iteratively reweighted squares (SIRS) algorithm for sparse recovery. Numerical studies support our theoretical results and demonstrate the advantage of our new methods for model selection and sparse recovery.
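A concrete example of a smooth homotopy between the L0 and L1 penalties is the transformed-L1 family used in this line of work. The sketch below assumes the form rho_a(t) = (a+1)|t|/(a+|t|); treat it as an illustration of the homotopy idea rather than the paper's exact penalty:

```python
import numpy as np

def homotopy_penalty(t, a):
    """A smooth homotopy between the L0 and L1 penalties, elementwise.

    rho_a(t) = (a+1)|t| / (a + |t|): as a -> infinity this approaches |t|
    (the L1 penalty), and as a -> 0+ it approaches the indicator
    1{t != 0} (the L0 penalty). The parameter a thus interpolates
    between convex and maximally concave regularization.
    """
    t = np.abs(np.asarray(t, dtype=float))
    return (a + 1) * t / (a + t)

t = 0.8
print(homotopy_penalty(t, a=1e6))   # close to |t| = 0.8 (L1 limit)
print(homotopy_penalty(t, a=1e-6))  # close to 1 (L0 limit)
```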
Journal of the American Statistical Association | 2013
Wei Lin; Jinchi Lv
High-dimensional sparse modeling with censored survival data is of great practical importance, as exemplified by modern applications in high-throughput genomic data analysis and credit risk analysis. In this article, we propose a class of regularization methods for simultaneous variable selection and estimation in the additive hazards model, by combining the nonconcave penalized likelihood approach and the pseudoscore method. In a high-dimensional setting where the dimensionality can grow fast, polynomially or nonpolynomially, with the sample size, we establish the weak oracle property and oracle property under mild, interpretable conditions, thus providing strong performance guarantees for the proposed methodology. Moreover, we show that the regularity conditions required by the L1 method are substantially relaxed by a certain class of sparsity-inducing concave penalties. As a result, concave penalties such as the smoothly clipped absolute deviation, minimax concave penalty, and smooth integration of counting and absolute deviation can significantly improve on the L1 method and yield sparser models with better prediction performance. We present a coordinate descent algorithm for efficient implementation and rigorously investigate its convergence properties. The practical use and effectiveness of the proposed methods are demonstrated by simulation studies and a real data example.
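The coordinate descent strategy mentioned above cycles through one coefficient at a time, each update having a closed form. A minimal sketch for the simplest case, the Lasso with least-squares loss (the paper applies the same cycling idea to the additive-hazards pseudoscore loss with concave penalties instead):

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator, the closed-form L1 coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the Lasso (least-squares loss).

    Minimizes (1/2n)||y - X beta||^2 + lam * ||beta||_1 by repeatedly
    solving each one-dimensional subproblem exactly.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X**2).sum(axis=0)
    r = y - X @ beta  # residual, maintained incrementally
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]                      # remove coordinate j
            z = X[:, j] @ r / n
            beta[j] = soft_threshold(z, lam) / (col_sq[j] / n)
            r -= X[:, j] * beta[j]                      # add it back
    return beta
```

For a concave penalty, the only change in this scheme is that the soft-thresholding step is replaced by the penalty's own univariate thresholding rule.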
Journal of the American Statistical Association | 2013
Yingying Fan; Jinchi Lv
High-dimensional data analysis has motivated a spectrum of regularization methods for variable selection and sparse modeling, with two popular classes being convex and concave ones. A long debate has taken place on whether one class dominates the other, an important question both in theory and to practitioners. In this article, we characterize the asymptotic equivalence of regularization methods, with general penalty functions, in a thresholded parameter space under the generalized linear model setting, where the dimensionality can grow exponentially with the sample size. To assess their performance, we establish the oracle inequalities—as in Bickel, Ritov, and Tsybakov (2009)—of the global minimizer for these methods under various prediction and variable selection losses. These results reveal an interesting phase transition phenomenon. For polynomially growing dimensionality, the L1-regularization method of the Lasso and concave methods are asymptotically equivalent, having the same convergence rates in the oracle inequalities. For exponentially growing dimensionality, concave methods are asymptotically equivalent but have faster convergence rates than the Lasso. We also establish a stronger property of the oracle risk inequalities of the regularization methods, as well as the sampling properties of computable solutions. Our new theoretical results are illustrated and justified by simulation and real data examples.
PLOS Biology | 2015
Sungshin Kim; Kenji Ogawa; Jinchi Lv; Nicolas Schweighofer; Hiroshi Imamizu
Recent computational and behavioral studies suggest that motor adaptation results from the update of multiple memories with different timescales. Here, we designed a model-based functional magnetic resonance imaging (fMRI) experiment in which subjects adapted to two opposing visuomotor rotations. A computational model of motor adaptation with multiple memories was fitted to the behavioral data to generate time-varying regressors of brain activity. We identified regional specificity to timescales: in particular, the activity in the inferior parietal region and in the anterior-medial cerebellum was associated with memories for intermediate and long timescales, respectively. A sparse singular value decomposition analysis of variability in specificities to timescales over the brain identified four components, two fast, one middle, and one slow, each associated with different brain networks. Finally, a multivariate decoding analysis showed that activity patterns in the anterior-medial cerebellum progressively represented the two rotations. Our results support the existence of brain regions associated with multiple timescales in adaptation and a role of the cerebellum in storing multiple internal models.
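The "multiple memories with different timescales" idea can be made concrete with the classic two-state model of Smith et al. (2006), in which a fast process learns quickly but forgets quickly and a slow process does the reverse. The sketch below is a simplified stand-in for the multi-timescale model fitted in the paper (whose exact parameterization differs); the default retention and learning rates are those commonly quoted for the two-state model:

```python
import numpy as np

def multirate_adaptation(perturbation, A=(0.59, 0.992), B=(0.21, 0.02)):
    """Two-state (fast/slow) model of motor adaptation.

    Each state decays by its retention factor A and learns from the
    current error with rate B. Net adaptation is the sum of the states;
    A = (A_fast, A_slow), B = (B_fast, B_slow).
    """
    (Af, As), (Bf, Bs) = A, B
    xf = xs = 0.0
    net = []
    for f in perturbation:
        e = f - (xf + xs)          # motor error on this trial
        xf = Af * xf + Bf * e      # fast process: learns fast, forgets fast
        xs = As * xs + Bs * e      # slow process: learns slowly, retains well
        net.append(xf + xs)
    return np.array(net)

# Adaptation to a constant perturbation, followed by washout at zero:
out = multirate_adaptation([1.0] * 100 + [0.0] * 20)
```

Regressors built from the time courses of such fast and slow states are what allow the fMRI analysis to localize brain activity associated with each timescale.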
Biometrika | 2014
Yingying Fan; Jinchi Lv
Two important goals of high-dimensional modelling are prediction and variable selection. In this article, we consider regularization with combined L1 and concave penalties, and study the sampling properties of the global optimum of the suggested method in ultrahigh-dimensional settings. The L1 penalty provides the minimum regularization needed for removing noise variables in order to achieve oracle prediction risk, while a concave penalty imposes additional regularization to control model sparsity. In the linear model setting, we prove that the global optimum of our method enjoys the same oracle inequalities as the lasso estimator and admits an explicit bound on the false sign rate, which can be asymptotically vanishing. Moreover, we establish oracle risk inequalities for the method and the sampling properties of computable solutions. Numerical studies suggest that our method yields more stable estimates than using a concave penalty alone.
Annals of Statistics | 2017
Yinfei Kong; Daoji Li; Yingying Fan; Jinchi Lv
Feature interactions can contribute to a large proportion of variation in many prediction models. In the era of big data, the coexistence of high dimensionality in both responses and covariates poses unprecedented challenges in identifying important interactions. In this paper, we suggest a two-stage interaction identification method, called the interaction pursuit via distance correlation (IPDC), in the setting of high-dimensional multi-response interaction models that exploits feature screening applied to transformed variables with distance correlation followed by feature selection. Such a procedure is computationally efficient, generally applicable beyond the heredity assumption, and effective even when the number of responses diverges with the sample size. Under mild regularity conditions, we show that this method enjoys nice theoretical properties including the sure screening property, support union recovery, and oracle inequalities in prediction and estimation for both interactions and main effects. The advantages of our method are supported by several simulation studies and real data analysis.
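The marginal utility that drives the screening stage of IPDC is the distance correlation of Székely, Rizzo and Bakirov (2007). A self-contained sketch of the sample statistic (an illustration of the screening ingredient, not the IPDC pipeline itself):

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation via double-centered distance matrices.

    Unlike Pearson correlation, the population distance correlation is
    zero only under independence, so it can detect the nonlinear and
    interaction signals that IPDC screens for.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    def centered_dist(z):
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

    A, B = centered_dist(x), centered_dist(y)
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

# A purely quadratic relationship: Pearson correlation is near zero,
# but the distance correlation is clearly positive.
rng = np.random.default_rng(1)
x = rng.standard_normal(500)
print(distance_correlation(x, x**2))
```

The O(n^2) pairwise-distance matrices make this naive version suitable only for moderate n; faster O(n log n) algorithms exist for the univariate case.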
Archive | 2006
Jianqing Fan; Yingying Fan; Jinchi Lv
High dimensionality comparable to sample size is common in many statistical problems. We examine covariance matrix estimation in the asymptotic framework where the dimensionality p tends to infinity as the sample size n increases. Motivated by the Arbitrage Pricing Theory in finance, a multi-factor model is employed to reduce dimensionality and to estimate the covariance matrix. The factors are observable and the number of factors K is allowed to grow with p. We investigate the impact of p and K on the performance of the model-based covariance matrix estimator. Under mild assumptions, we establish convergence rates and asymptotic normality of the model-based estimator. Its performance is compared with that of the sample covariance matrix. We identify situations under which the factor approach improves performance substantially or marginally. The impacts of covariance matrix estimation on portfolio allocation and risk management are studied. The asymptotic results are supported by a thorough simulation study.
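The basic factor-model covariance estimator has a simple plug-in form: regress each series on the observed factors, then combine the factor covariance with a diagonal estimate of the idiosyncratic variances. A minimal numpy sketch of this construction (an illustration of the idea, not the paper's estimator with its refinements):

```python
import numpy as np

def factor_covariance(Y, F):
    """Plug-in factor-model covariance estimator.

    Y : (n, p) data matrix, F : (n, K) observed factors.
    Returns B' Cov(F) B + diag of residual variances, where B holds the
    least-squares factor loadings of each column of Y on F.
    """
    n, p = Y.shape
    X = np.column_stack([np.ones(n), F])          # add intercept
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)  # (K+1, p)
    B = coef[1:]                                  # factor loadings, (K, p)
    resid = Y - X @ coef
    cov_F = np.cov(F, rowvar=False)
    if np.ndim(cov_F) == 0:                       # K == 1 edge case
        cov_F = np.array([[float(cov_F)]])
    return B.T @ cov_F @ B + np.diag(resid.var(axis=0, ddof=X.shape[1]))
```

Because the rank-K factor part plus a diagonal is well conditioned, this estimator remains usable when p is comparable to n, where the raw sample covariance matrix degrades.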
Annals of Statistics | 2016
Yingying Fan; Jinchi Lv
Large-scale precision matrix estimation is of fundamental importance yet challenging in many contemporary applications for recovering Gaussian graphical models. In this paper, we suggest a new approach of innovated scalable efficient estimation (ISEE) for estimating large precision matrix. Motivated by the innovated transformation, we convert the original problem into that of large covariance matrix estimation. The suggested method combines the strengths of recent advances in high-dimensional sparse modeling and large covariance matrix estimation. Compared to existing approaches, our method is scalable and can deal with much larger precision matrices with simple tuning. Under mild regularity conditions, we establish that this procedure can recover the underlying graphical structure with significant probability and provide efficient estimation of link strengths. Both computational and theoretical advantages of the procedure are evidenced through simulation and real data examples.
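ISEE itself works through the innovated transformation; a much simpler baseline that illustrates how regressions recover a precision matrix is classical neighborhood regression: in a Gaussian model, regressing X_j on the remaining variables gives Omega[j, j] = 1/Var(residual_j) and Omega[j, k] = -beta_jk * Omega[j, j]. The sketch below uses unpenalized least squares, so it is only sensible for n much larger than p; it is NOT the ISEE procedure, which instead uses scaled sparse regressions to handle large p:

```python
import numpy as np

def nodewise_precision(X):
    """Neighborhood-regression estimate of a precision matrix (n >> p).

    For each variable j, regress X_j on all other columns; the residual
    variance and regression coefficients determine row j of Omega.
    """
    n, p = X.shape
    X = X - X.mean(axis=0)
    Omega = np.zeros((p, p))
    for j in range(p):
        others = np.delete(np.arange(p), j)
        beta, *_ = np.linalg.lstsq(X[:, others], X[:, j], rcond=None)
        resid = X[:, j] - X[:, others] @ beta
        ojj = n / (resid @ resid)
        Omega[j, j] = ojj
        Omega[j, others] = -beta * ojj
    return (Omega + Omega.T) / 2  # symmetrize the two estimates of each entry
```

Zero entries of the precision matrix correspond to absent edges in the Gaussian graphical model, which is why accurate estimation of Omega recovers the graph.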
Annals of Statistics | 2013
Jinchi Lv
High-dimensional data sets are commonly collected in many contemporary applications arising in various fields of scientific research. We present two views of finite samples in high dimensions: a probabilistic one and a nonprobabilistic one. With the probabilistic view, we establish the concentration property and robust spark bound for large random design matrices generated from elliptical distributions, with the former related to the sure screening property and the latter related to sparse model identifiability. An interesting concentration phenomenon in high dimensions is revealed. With the nonprobabilistic view, we derive general bounds on dimensionality with some distance constraint on sparse models. These results provide new insights into the impacts of high dimensionality in finite samples.

1. Introduction. Thanks to the advances of information technologies, large-scale data sets with a large number of variables or dimensions are commonly collected in many contemporary applications that arise in different fields of science, engineering and the humanities. Examples include marketing data in business, panel data in economics and finance, genomics data in health sciences, and brain imaging data in neuroscience, among many others. The emergence of a large amount of information contained in high-dimensional data sets provides opportunities, as well as unprecedented challenges, for developing new statistical methods and theory. See, for example, Hall (2006) and Fan and Li (2006) for insights and discussions on the statistical challenges associated with high dimensionality, and Fan and Lv (2010) for a brief review of some recent developments in high-dimensional sparse modeling with variable selection. The approach of variable selection aims to effectively identify important variables and efficiently estimate their effects on a response variable of interest.
For the purpose of prediction and variable selection, it is important to understand and characterize the impacts of high dimensionality in finite samples. Hall, Marron and Neeman (2005) investigated this problem under the asymptotic framework of fixed sample size n and diverging dimensionality p, and revealed an interesting geometric representation of high dimension, low sample size data. When viewed in the diverging p-dimensional Euclidean space, the randomness in the