Shaogao Lv
Southwestern University of Finance and Economics
Publication
Featured research published by Shaogao Lv.
Neural Computation | 2016
Yunlong Feng; Shaogao Lv; Hanyuan Hang; Johan A. K. Suykens
Kernelized elastic net regularization (KENReg) is a kernelization of the well-known elastic net regularization (Zou & Hastie, 2005). The kernel in KENReg is not required to be a Mercer kernel since it learns from a kernelized dictionary in the coefficient space. Feng, Yang, Zhao, Lv, and Suykens (2014) showed that KENReg has some nice properties including stability, sparseness, and generalization. In this letter, we continue our study on KENReg by conducting a refined learning theory analysis. This letter makes the following three main contributions. First, we present refined error analysis on the generalization performance of KENReg. The main difficulty of analyzing the generalization error of KENReg lies in characterizing the population version of its empirical target function. We overcome this by introducing a weighted Banach space associated with the elastic net regularization. We are then able to conduct elaborated learning theory analysis and obtain fast convergence rates under proper complexity and regularity assumptions. Second, we study the sparse recovery problem in KENReg with fixed design and show that the kernelization may improve the sparse recovery ability compared to the classical elastic net regularization. Finally, we discuss the interplay among different properties of KENReg that include sparseness, stability, and generalization. We show that the stability of KENReg leads to generalization, and its sparseness confidence can be derived from generalization. Moreover, KENReg is stable and can be simultaneously sparse, which makes it attractive theoretically and practically.
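For readers unfamiliar with the coefficient-space formulation, a minimal sketch of a kernelized elastic net objective (the squared loss and notation here are our illustrative choices, not necessarily the paper's exact setup): given samples (x_1, y_1), ..., (x_n, y_n) and a kernel K, one searches for coefficients \alpha = (\alpha_1, \ldots, \alpha_n) minimizing

\frac{1}{n}\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{n}\alpha_j K(x_i, x_j)\Big)^2 + \lambda_1\|\alpha\|_1 + \lambda_2\|\alpha\|_2^2,

and predicts with f_\alpha(x) = \sum_{j}\alpha_j K(x, x_j). Because the penalty acts on the coefficient vector rather than on an RKHS norm, K is not required to be a Mercer kernel, which is the flexibility emphasized in the abstract.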
Information Sciences | 2015
Shaogao Lv; Fanyin Zhou
One of the most promising kernel learning methods is the lp-type multiple kernel learning proposed by Kloft et al. (2009). This method can adaptively select kernel functions in supervised learning problems. Generalization error bounds for such methods have recently received wide attention in machine learning and statistics. The present study aims to establish a new generalization error bound under a more general framework, in which the correlation among reproducing kernel Hilbert spaces (RKHSs) is taken into account and the smoothness restriction on the target function is relaxed. In this case, the interaction between the estimation and approximation errors must be handled simultaneously. Optimal learning rates are derived by applying the local Rademacher complexity technique; they are given in terms of the capacity of the RKHSs spanned by the multiple kernels and the regularity of the target function.
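As a point of reference, one common way to write lp-type multiple kernel learning with candidate spaces H_1, ..., H_M (notation ours; the paper's exact formulation may differ) is

\min_{f_m \in H_m}\ \frac{1}{n}\sum_{i=1}^{n}\ell\Big(y_i, \sum_{m=1}^{M} f_m(x_i)\Big) + \lambda\Big(\sum_{m=1}^{M}\|f_m\|_{H_m}^{p}\Big)^{2/p}, \qquad 1 \le p \le 2,

where the block lp penalty interpolates between the sparsity-inducing l1 case and the uniformly weighted l2 case. The learning rates described above bound the excess risk of such an estimator via the local Rademacher complexity of the class spanned by the multiple kernels.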
Neural Computation | 2015
Shaogao Lv
Gradient learning (GL), initially proposed by Mukherjee and Zhou (2006), has proved to be a powerful tool for conducting variable selection and dimension reduction simultaneously. This approach provides a nonparametric estimator of the gradient based on positive definite kernels, without estimating the true function itself, so it has wide applicability and allows for complex interactions between predictors. In terms of theory, however, existing generalization bounds for GL rely on capacity-independent techniques, so the capacity of the kernel classes is not characterized completely. This letter therefore considers GL estimators that minimize the empirical convex risk. We prove generalization bounds for such estimators with rates that are faster than previous results. Moreover, we provide a novel upper bound for the Rademacher chaos complexity of order two, which also plays an important role in general pairwise estimation problems, including ranking and scoring.
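For context, the gradient learning functional of Mukherjee and Zhou (2006) has roughly the following least squares form (written here as an illustration; the present letter allows general convex losses): one estimates a vector-valued function \vec{f} = (f^1, \ldots, f^p), modelling the gradient of the regression function, by minimizing

\frac{1}{n^2}\sum_{i,j=1}^{n} w_{ij}\Big(y_i - y_j + \vec{f}(x_i)\cdot(x_j - x_i)\Big)^2 + \lambda\|\vec{f}\|_K^2,

where w_{ij} is a localizing weight (for instance a Gaussian in \|x_i - x_j\|) and \|\cdot\|_K is the norm of the vector-valued RKHS. Components f^k with small norm point to variables that contribute little, which is how variable selection and dimension reduction arise from a single estimator.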
Mathematical and Computer Modelling | 2013
Shaogao Lv; Yunlong Feng
In recent years there has been increasing interest in learning with the coefficient-based space spanned by a kernel function, since it provides great flexibility for the learning process and adapts easily to other algorithms. We investigate spectral clustering algorithms learned with an l1-regularization scheme in a coefficient-based hypothesis space. The main difficulty in studying spectral clustering in our setting is that the hypothesis space not only depends on a coefficient-based space but also on certain constraint conditions. We overcome this difficulty technically by a local polynomial reproduction formula and a construction method. The consistency of the sparsity-aware spectral clustering algorithms is stated in terms of properties of the data space, the underlying measure, the kernel, and the regularity of the target function.
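As generic background (our own summary, not the paper's construction), spectral clustering embeds the data through the leading eigenvectors of a normalized graph Laplacian

L_n = I - D^{-1/2} W D^{-1/2}, \qquad W_{ij} = k(x_i, x_j), \qquad D = \mathrm{diag}\Big(\sum_j W_{1j}, \ldots, \sum_j W_{nj}\Big),

and then clusters the rows of the resulting embedding, for example by k-means. Consistency asks that these empirical eigenvectors converge, as the sample size grows, to eigenfunctions of a population-level operator; the paper studies this question when the relevant hypothesis functions are learned in a coefficient-based space under an l1 penalty.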
Neural Processing Letters | 2017
Guo Niu; Zhengming Ma; Shaogao Lv
As a class of semi-supervised learning methods, manifold regularization learning has recently attracted a lot of attention due to its great success in exploiting the underlying geometric structure among data. This paper presents a novel semi-supervised approach that combines manifold regularization learning with the idea of multiple kernels, termed ensemble multiple-kernel manifold regularization learning. In our approach, the multiple kernels we introduce not only add flexibility and diversity to the candidate space for the learning problem, but also act as a similarity measure used to search for an optimal graph Laplacian. In other words, the proposed method allows us to learn an 'ideal' kernel and an optimal graph Laplacian simultaneously, which distinguishes it significantly from existing methods. The associated optimization problem is solved efficiently by an alternating iteration procedure. We conduct experiments on four real-world data sets to demonstrate the benefits of the proposed method.
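For orientation, the classical manifold regularization objective (written here in our notation as a reference point, not the paper's final formulation) over l labelled and u unlabelled points is

\min_{f \in H_K}\ \frac{1}{l}\sum_{i=1}^{l}\ell\big(y_i, f(x_i)\big) + \gamma_A\|f\|_K^2 + \gamma_I\,\mathbf{f}^{\top} L\,\mathbf{f},

where \mathbf{f} = (f(x_1), \ldots, f(x_{l+u}))^{\top} and L is a graph Laplacian built from all the data. The approach above additionally forms the kernel as a weighted combination of several candidates, say K = \sum_m d_m K_m with d_m \ge 0, and uses the same combination as the similarity that defines the graph Laplacian, so that the kernel and the Laplacian are learned jointly by alternating between the two sets of variables.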
Neurocomputing | 2013
Shaogao Lv; Tiefeng Ma; Liu Liu; Yunlong Feng
Learning with coefficient-based regularization has recently attracted a considerable amount of attention in both machine learning and statistics. This paper presents a kernelized version of a quantile estimator integrated with coefficient-based regularization, which can be solved efficiently by simple linear programming. Fast convergence rates are obtained under mild conditions on the underlying distribution. Moreover, the algorithm adapts easily to large-scale problems, and sparse solutions are often achieved, as with the Lasso. In our work we make the following main contributions: first, improved learning rates are obtained by employing so-called variance bounds, which are optimal in the learning theory literature; second, we establish stronger convergence rates by employing self-calibration inequalities; third, our learning rates can also be attained by a simple data-dependent parameter selection method; finally, the performance of the classical algorithm and our new algorithm is compared in a simulation study and on a real problem.
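To make the linear programming reduction concrete, here is a minimal sketch in Python (our own illustrative code with a Gaussian kernel; variable names and parameter values are ours, not the authors' implementation). Because the pinball loss and the l1 penalty are both piecewise linear, splitting the residuals and the coefficients into nonnegative parts turns the problem into a standard LP.

import numpy as np
from scipy.optimize import linprog

def gaussian_kernel(X, Z, sigma=1.0):
    # Pairwise Gaussian kernel matrix K[i, j] = exp(-||x_i - z_j||^2 / (2 sigma^2)).
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_quantile_l1(X, y, tau=0.5, lam=0.1, sigma=1.0):
    # Minimize (1/n) sum_i pinball_tau(y_i - sum_j alpha_j K(x_i, x_j)) + lam * ||alpha||_1
    # as an LP in nonnegative variables (u, v, a, b), with residual = u - v and alpha = a - b.
    n = len(y)
    K = gaussian_kernel(X, X, sigma)
    c = np.concatenate([np.full(n, tau / n),          # u: positive part of the residual
                        np.full(n, (1.0 - tau) / n),  # v: negative part of the residual
                        np.full(n, lam),              # a: positive part of alpha
                        np.full(n, lam)])             # b: negative part of alpha
    I = np.eye(n)
    # Equality constraint: u_i - v_i + sum_j K[i, j] (a_j - b_j) = y_i.
    A_eq = np.hstack([I, -I, K, -K])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    z = res.x
    alpha = z[2 * n:3 * n] - z[3 * n:]
    return alpha, K

# Tiny usage example: fitted tau-quantile values at the training points are K @ alpha,
# and the l1 penalty typically leaves many coefficients exactly zero.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(60, 1))
y = np.sin(3.0 * X[:, 0]) + 0.2 * rng.standard_normal(60)
alpha, K = kernel_quantile_l1(X, y, tau=0.5, lam=0.05)
print("nonzero coefficients:", int(np.sum(np.abs(alpha) > 1e-6)))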
Statistics | 2018
Anchao Song; Tiefeng Ma; Shaogao Lv; Changsheng Lin
Under the sufficient dimension reduction (SDR) framework, we propose a model-free variable selection method for reducing the number of redundant predictors. The method adopts the distance correlation as a dependence measure to quantify the relevance and redundancy of a predictor, and searches for a set of relevant but non-redundant predictors. Two forward screening algorithms are given to find an approximate solution to the set of relevant but non-redundant predictors. The screening consistency of the proposed method and algorithms is studied in full. The effectiveness of the proposed method and algorithms is illustrated by simulation experiments and two real examples. The experimental results show that the proposed method can effectively exclude redundant predictors and yield a more parsimonious subset of relevant predictors.
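As a rough illustration of screening with distance correlation (a naive greedy sketch under our own simplifying choices, including the hypothetical relevance and redundancy thresholds; it is not either of the paper's two forward algorithms), one can repeatedly add the predictor most associated with the response while skipping predictors that are nearly redundant given the variables already selected:

import numpy as np

def _centered_dist(x):
    # Double-centred Euclidean distance matrix, the building block of distance covariance.
    x = x.reshape(len(x), -1)
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))
    return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

def dist_corr(x, y):
    # Sample distance correlation between two samples with n observations each.
    A, B = _centered_dist(x), _centered_dist(y)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0

def forward_screen(X, y, relevance=0.1, redundancy=0.9, max_vars=10):
    # Greedy sketch: add the candidate with the largest distance correlation with y,
    # unless it is almost fully explained by the predictors already selected.
    selected, candidates = [], list(range(X.shape[1]))
    while candidates and len(selected) < max_vars:
        scores = {j: dist_corr(X[:, j], y) for j in candidates}
        j_best = max(scores, key=scores.get)
        if scores[j_best] < relevance:
            break
        candidates.remove(j_best)
        if selected and dist_corr(X[:, j_best], X[:, selected]) > redundancy:
            continue  # relevant but redundant: skip it
        selected.append(j_best)
    return selected

# Tiny usage example on synthetic data with two truly relevant predictors.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20))
y = X[:, 0] + 0.8 * X[:, 3] ** 2 + 0.1 * rng.standard_normal(200)
print(forward_screen(X, y))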
Journal of Multivariate Analysis | 2018
Shaogao Lv; Mengying You; Huazhen Lin; Heng Lian; Jian Huang
In this paper we study the l1-penalized partial likelihood estimator for the sparse high-dimensional Cox proportional hazards model. In particular, we investigate how the l1-penalized partial likelihood estimation recovers the sparsity pattern and the conditions under which sign support consistency is guaranteed. We establish sign recovery consistency and l∞-error bounds for the Lasso partial likelihood estimator under suitable and interpretable conditions, including mutual incoherence conditions. More importantly, we show that the incoherence conditions and the bounds on the minimal non-zero coefficients are necessary, which provides significant and instructive implications for understanding the Lasso for the Cox model. Numerical studies are presented to illustrate the theoretical results.
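For reference, the estimator discussed above is (in standard notation, with t_i the observed time, \delta_i the censoring indicator, and x_i the covariate vector)

\hat{\beta} = \arg\min_{\beta}\ -\frac{1}{n}\sum_{i=1}^{n}\delta_i\Big\{x_i^{\top}\beta - \log\sum_{j:\,t_j \ge t_i}\exp(x_j^{\top}\beta)\Big\} + \lambda\|\beta\|_1.

Sign support consistency then means that, with high probability, the nonzero pattern and the signs of \hat{\beta} coincide with those of the true coefficient vector; the mutual incoherence condition limits how strongly the irrelevant covariates may be correlated with the relevant ones.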
Mathematical Problems in Engineering | 2016
Shaogao Lv; Luhong Wang
Partial linear models, a popular family of semiparametric models, provide an interpretable and flexible framework for modelling complex data. One challenging question in partial linear models is structure identification: deciding which components are linear and which are nonlinear, especially for high-dimensional data. This paper considers the structure identification problem in general partial linear single-index models, where the link function is unknown. We propose two penalized methods based on a modern dimension reduction technique. Under certain regularity conditions, we show that the second estimator is able to identify the underlying true model structure correctly. The convergence rate of the new estimator is established as well.
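As a point of reference (our own generic notation; the paper's precise model may differ in details), a partial linear single-index model can be written as

Y = g\big(X_{\mathcal{A}}^{\top}\theta\big) + X_{\mathcal{B}}^{\top}\beta + \varepsilon,

with g an unknown link function. Structure identification then means deciding which covariates belong to the nonlinear index part X_{\mathcal{A}} and which enter linearly through X_{\mathcal{B}}; the penalized estimators mentioned above aim to recover this partition consistently.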
Mathematical Problems in Engineering | 2015
Chao Zhang; Shaogao Lv
Kernel selection is a central issue in kernel methods of machine learning. In this paper, we investigate regularized learning schemes based on kernel design methods. Our ideal kernel is derived from a simple iterative procedure using large-scale unlabeled data in a semi-supervised framework. Compared with most existing approaches, our algorithm avoids solving multiple optimization problems when learning the kernel, and its computation is as efficient as that of standard single-kernel algorithms. Moreover, large amounts of information associated with the input space can be exploited, so generalization ability improves accordingly. We provide some theoretical support for the least squares case in our setting; these advantages are also demonstrated in a simulation experiment and a real data analysis.
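Roughly speaking (our own reading of the efficiency claim, not the paper's derivation), once the designed kernel \tilde{K} has been constructed from the labelled and unlabelled data, the least squares case mentioned above reduces to an ordinary single-kernel regularized problem,

\min_{f \in H_{\tilde{K}}}\ \frac{1}{n}\sum_{i=1}^{n}\big(y_i - f(x_i)\big)^2 + \lambda\|f\|_{\tilde{K}}^2,

which is why the computation is as efficient as that of a standard single-kernel algorithm.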