Kernel methods are increasingly used in statistics and machine learning. They rest on the assumption of an inner product space and improve predictive performance by modeling the similarity structure of the input samples. Traditional methods such as support vector machines (SVMs) were originally defined, together with their regularization procedures, without reference to a Bayesian perspective; nevertheless, understanding the background of these methods from a Bayesian point of view yields important insights.
The introduction of kernel methods not only improves the performance of a wide range of learning machines, but also offers a new perspective on the theoretical foundations of machine learning.
Kernels come in many forms and are not necessarily positive semi-definite, so the underlying structure may go beyond the traditional inner product space to the more general reproducing kernel Hilbert space (RKHS). In Bayesian probability theory, kernel methods are a key component of Gaussian processes, where the kernel function is known as the covariance function. Kernel methods have traditionally been applied to supervised learning problems with a vector-valued input space and a scalar-valued output space; in recent years they have been extended to handle multiple-output problems such as multi-task learning.
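To make the covariance-function role concrete, the following minimal sketch (assuming a squared-exponential kernel; the name rbf_kernel and the chosen parameters are illustrative, not taken from the text) builds a Gaussian process prior covariance matrix from a kernel and draws sample functions from it:

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    # Squared-exponential kernel; in the Gaussian process view it plays
    # the role of the covariance function.
    sq_dists = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dists / length_scale ** 2)

# Input locations at which the prior is evaluated.
x = np.linspace(-3.0, 3.0, 50)

# Covariance matrix of the zero-mean GP prior: K[i, j] = k(x_i, x_j).
K = rbf_kernel(x, x)

# Draw three sample functions from N(0, K); the small jitter term keeps
# the covariance matrix numerically positive definite.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(len(x)), K + 1e-10 * np.eye(len(x)), size=3)
```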
The main task of supervised learning is to estimate the output at a new input point based on the input and output data of a training set. For example, given a new input point $x'$, we wish to learn a scalar-valued estimator $\hat{f}(x')$, and this estimate is based on a training set $S$. The training set consists of $n$ input–output pairs, $S = (X, Y) = ((x_1, y_1), \ldots, (x_n, y_n))$. A common approach to estimation is to use a symmetric, positive bivariate function $k(\cdot, \cdot)$, often called a kernel function.
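A common example of such a kernel (one illustrative choice among many; the length-scale $\ell > 0$ is a free parameter) is the squared-exponential kernel

$$ k(x, x') = \exp\!\left( -\frac{\lVert x - x' \rVert^2}{2 \ell^2} \right), $$

which is symmetric and positive semi-definite.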
The challenge of supervised learning is to learn effectively from known input–output pairs and to generalize this knowledge to unseen data points.
In the regularization framework, the main assumption is that the set of functions $\mathcal{F}$ is contained in a reproducing kernel Hilbert space $H_k$. The properties of a reproducing kernel Hilbert space make it particularly attractive. First, the reproducing property, which gives the space its name, allows any function in the space to be evaluated through an inner product with the kernel. Second, functions in the space lie in the closure of linear combinations of the kernel at given points, which means that linear and generalized linear models can be constructed in a unified framework. Third, the squared norm of the space can be used to measure the complexity of a function.
The reproducing kernel Hilbert space thus not only provides flexibility in representing functions, but also offers a practical framework for balancing model complexity against fit to the data.
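In symbols, and for a function of the form $f = \sum_{i=1}^{n} c_i\, k(x_i, \cdot)$ (a standard summary using the notation above), these three properties read

$$ f(x) = \langle f, k(x, \cdot) \rangle_{H_k}, \qquad f(x) = \sum_{i=1}^{n} c_i\, k(x_i, x), \qquad \lVert f \rVert_{H_k}^2 = \sum_{i,j=1}^{n} c_i\, c_j\, k(x_i, x_j). $$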
The explicit form of the estimator is obtained by minimizing a regularized functional. This functional consists of two main parts: on the one hand, it accounts for the mean squared prediction error on the training data; on the other hand, it contains a norm term that controls model complexity through the regularization parameter. The regularization parameter $\lambda$ determines how strongly complexity and instability are penalized in the reproducing kernel Hilbert space.
In this way we not only obtain useful estimates but also greatly reduce the risk of overfitting.
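Concretely, one standard formulation of this minimization (using the notation above; the $1/n$ scaling of the empirical error is one common convention) is

$$ \hat{f} = \operatorname*{arg\,min}_{f \in H_k} \; \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2 + \lambda \lVert f \rVert_{H_k}^2, $$

and by the representer theorem the minimizer can be written as a finite kernel expansion over the training points,

$$ \hat{f}(x') = \sum_{i=1}^{n} c_i\, k(x_i, x'), \qquad c = (K + \lambda n I)^{-1} Y, $$

where $K$ is the $n \times n$ kernel matrix with entries $K_{ij} = k(x_i, x_j)$.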
Combining these results, estimation in a reproducing kernel Hilbert space admits a translation from the traditional regularization view to a Bayesian perspective: regularization and Bayesian inference ultimately yield essentially equivalent estimators. This correspondence illustrates the potential of kernel methods to support the development of a diverse family of machine learning models.
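As a minimal numerical illustration of this equivalence (a sketch assuming the squared-exponential kernel above, synthetic data, and the identification of the Gaussian process noise variance with $\lambda n$; all names and parameters here are illustrative), the regularized estimator coincides with the Gaussian process posterior mean:

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    # Squared-exponential kernel, used both as an RKHS kernel and as a
    # GP covariance function.
    sq_dists = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dists / length_scale ** 2)

rng = np.random.default_rng(1)
x_train = rng.uniform(-3, 3, size=20)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(20)
x_test = np.linspace(-3, 3, 100)

lam = 0.1                      # regularization parameter lambda
n = len(x_train)
K = rbf_kernel(x_train, x_train)
K_star = rbf_kernel(x_test, x_train)

# Regularization view: kernel ridge regression, c = (K + lambda*n*I)^{-1} Y.
c = np.linalg.solve(K + lam * n * np.eye(n), y_train)
f_ridge = K_star @ c

# Bayesian view: GP posterior mean with noise variance sigma^2 = lambda * n.
sigma2 = lam * n
f_gp = K_star @ np.linalg.solve(K + sigma2 * np.eye(n), y_train)

assert np.allclose(f_ridge, f_gp)   # the two estimators coincide
```

Both predictions reduce to $K_*(K + \lambda n I)^{-1} Y$, which is why the final assertion holds up to floating-point error.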
In the future, as data and computing power grow, will these methods become important milestones in the evolution of machine learning?