In data processing, feature scaling is a method used to normalize the range of independent variables or features. The process is also known as data normalization and is typically performed during data preprocessing. Its main purpose is to bring features with different ranges onto a comparable scale so that machine learning algorithms treat them consistently, which often improves model accuracy and training efficiency.
The ranges of raw features often vary so widely that, in some machine learning algorithms, the objective function does not work properly without normalization.
For example, many classifiers compute the distance between two points using the Euclidean distance. If one feature has a much broader range of values than the others, the computed distance will be dominated by that feature. The ranges of all features should therefore be normalized so that each feature contributes approximately proportionately to the final distance.
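As a concrete sketch (with made-up income and age values, and assumed feature ranges, purely for illustration), the raw Euclidean distance between two points is dominated by the large-range feature until both features are rescaled:

```python
import numpy as np

# Two hypothetical data points: (income in dollars, age in years).
a = np.array([52_000.0, 25.0])
b = np.array([48_000.0, 60.0])

# Raw Euclidean distance: the income axis dominates completely,
# and the 35-year age gap is essentially invisible.
print(np.linalg.norm(a - b))  # ~4000.0

# After min-max scaling each feature to [0, 1] (feature ranges assumed
# for illustration), both features contribute comparably.
income_range = (20_000.0, 100_000.0)
age_range = (18.0, 80.0)

def scale(v, lo, hi):
    return (v - lo) / (hi - lo)

a_s = np.array([scale(a[0], *income_range), scale(a[1], *age_range)])
b_s = np.array([scale(b[0], *income_range), scale(b[1], *age_range)])
print(np.linalg.norm(a_s - b_s))  # ~0.57, driven by both features
```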
Another reason for feature scaling is that gradient descent converges much faster on scaled features. If the loss function includes a regularization term, feature scaling also ensures that the penalty is applied uniformly to all coefficients. Empirical studies show that feature scaling can significantly improve the convergence speed of stochastic gradient descent, and in support vector machines it can significantly reduce the time needed to find support vectors.
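To make the gradient-descent claim concrete, here is a minimal sketch on a synthetic two-feature regression problem (the data, the gd_steps helper, and the learning rates are all assumptions chosen for illustration): on the unscaled data, stability forces a tiny learning rate and convergence stalls, while on standardized data a much larger rate converges in a handful of steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data with wildly different feature scales.
X = np.column_stack([rng.uniform(0, 1, 200),       # feature 1: range ~[0, 1]
                     rng.uniform(0, 1000, 200)])   # feature 2: range ~[0, 1000]
y = X @ np.array([3.0, 0.002]) + rng.normal(0, 0.01, 200)

def gd_steps(X, y, lr, tol=1e-8, max_iter=200_000):
    """Batch gradient descent on mean squared error; returns the iteration count."""
    w = np.zeros(X.shape[1])
    for i in range(max_iter):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w_new = w - lr * grad
        if np.max(np.abs(w_new - w)) < tol:
            return i
        w = w_new
    return max_iter  # did not converge within the cap

# Unscaled: learning rates much above ~1e-6 diverge, and at 1e-6
# the run typically exhausts the iteration cap without converging.
print("unscaled:", gd_steps(X, y, lr=1e-6))

# Standardized: a learning rate of 0.1 is stable and converges quickly.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
print("scaled:  ", gd_steps(Xs, y, lr=0.1))
```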
Feature scaling is often used in applications involving distance and similarity between data points, such as clustering and similarity search.
Min-max normalization is one of the simplest methods: it rescales the range of a feature to [0, 1] or [-1, 1]. The choice of target range depends on the characteristics of the data. The formula for this method is as follows:
x' = (x - min(x)) / (max(x) - min(x))
Assume that the student weight data fall in the range [160 pounds, 200 pounds]. To scale the data, we first subtract 160 from each student's weight and then divide the result by 40 (i.e., the difference between the maximum and minimum weights). To rescale to an arbitrary interval [a, b], the formula becomes:
x' = a + (x - min(x)) * (b - a) / (max(x) - min(x))
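Both variants are one-liners in practice. Here is a minimal sketch using hypothetical student weights in the [160, 200] range from the example above:

```python
import numpy as np

# Hypothetical student weights in pounds, spanning [160, 200].
weights = np.array([160.0, 172.0, 185.0, 200.0, 168.0])

# Rescale to [0, 1]: subtract the minimum (160) and divide by the range (40).
scaled_01 = (weights - weights.min()) / (weights.max() - weights.min())
print(scaled_01)  # 160 -> 0.0, 200 -> 1.0

# Rescale to an arbitrary interval [a, b], e.g. [-1, 1].
a, b = -1.0, 1.0
scaled_ab = a + (weights - weights.min()) * (b - a) / (weights.max() - weights.min())
print(scaled_ab)  # 160 -> -1.0, 200 -> 1.0
```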
The formula for mean normalization is:
x' = (x - mean(x)) / (max(x) - min(x))
Here mean(x) is the mean of the feature vector.
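A short sketch of mean normalization on the same hypothetical weight data:

```python
import numpy as np

weights = np.array([160.0, 172.0, 185.0, 200.0, 168.0])

# Mean normalization: center on the mean, then divide by the range.
mean_norm = (weights - weights.mean()) / (weights.max() - weights.min())
print(mean_norm)         # all values now lie within [-1, 1]
print(mean_norm.mean())  # ~0.0: the feature is centered
```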
A related method divides by the standard deviation instead of the range; this is called standardization (or z-score normalization). Standardization causes the values of each feature to have zero mean (the data minus the mean) and unit variance, and it is widely used in many machine learning algorithms. The general procedure is to compute the mean and standard deviation of each feature, subtract the mean from each value, and then divide each value by the feature's standard deviation. The formula is as follows:
x' = (x - mean(x)) / σ
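A sketch of standardization on the same hypothetical data (scikit-learn's StandardScaler applies the same transform column-wise):

```python
import numpy as np

weights = np.array([160.0, 172.0, 185.0, 200.0, 168.0])

# Standardization (z-score): subtract the mean, divide by the standard deviation.
z = (weights - weights.mean()) / weights.std()
print(z.mean())  # ~0.0 -> zero mean
print(z.std())   # 1.0  -> unit variance
```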
Robust scaling is standardization using the median and interquartile range (IQR), which makes it insensitive to outliers. The formula is:
x' = (x - Q2(x)) / (Q3(x) - Q1(x))
Here Q1, Q2, and Q3 are the first, second (median), and third quartiles of the feature, respectively.
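A sketch with a made-up outlier appended to the weight data, showing that the median and IQR keep the scale of the bulk of the data intact:

```python
import numpy as np

# Hypothetical weights with one extreme outlier appended.
weights = np.array([160.0, 172.0, 185.0, 200.0, 168.0, 400.0])

q1, q2, q3 = np.percentile(weights, [25, 50, 75])

# Robust scaling: center on the median, divide by the interquartile range.
robust = (weights - q2) / (q3 - q1)
print(robust)  # the outlier still stands out, but it does not distort the others
```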
Unit vector normalization treats each data point as a vector and divides it by its norm. The formula is:
x' = x / ||x||
Any vector norm can be used, but the most common choices are the L1 and L2 norms.
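A minimal sketch of L1 and L2 normalization for a single hypothetical point:

```python
import numpy as np

x = np.array([3.0, 4.0])  # a data point treated as a vector

# L2 (Euclidean) normalization: the result has unit Euclidean length.
x_l2 = x / np.linalg.norm(x, ord=2)
print(x_l2, np.linalg.norm(x_l2))  # [0.6 0.8] 1.0

# L1 normalization: the absolute values of the components sum to 1.
x_l1 = x / np.linalg.norm(x, ord=1)
print(x_l1, np.abs(x_l1).sum())    # [0.4286 0.5714] 1.0
```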
In machine learning model training, feature scaling is a critical step: unscaled data can degrade not only a model's performance but also the efficiency of the algorithm. As data sets grow ever larger and more complex, choosing and applying the right feature scaling method becomes all the more important, since successful machine learning models depend on careful data processing and preparation.