In statistics, the sample standard deviation is a basic tool for describing the dispersion of data. However, this indicator often fails to reflect the real situation as well as we would like. This is not only because the data used to compute the sample standard deviation may be biased, but also because the mathematical methods used for inference may be imperfect. In particular, consider Bessel's correction, which divides by n-1 instead of n in the calculation of the sample variance in order to correct the bias in estimating the population variance. Can it really eliminate these biases completely?
Although the sample variance (with Bessel's correction) is an unbiased estimator of the population variance, its square root, the sample standard deviation, is a biased estimator of the population standard deviation: the square root is a strictly concave function, so Jensen's inequality gives E[S] < √E[S²] = σ whenever S is not constant.
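The bias is easy to see in simulation. Below is a minimal sketch in Python; the standard normal population, the sample size n = 5, the trial count, and the seed are all illustrative choices rather than anything fixed above:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0        # true population standard deviation
n = 5              # a small sample size makes the bias easy to see
trials = 200_000

samples = rng.normal(0.0, sigma, size=(trials, n))
# Bessel-corrected sample variance: ddof=1 divides by n-1 instead of n.
s2 = samples.var(axis=1, ddof=1)
s = np.sqrt(s2)

print(f"mean of S^2: {s2.mean():.4f}   (unbiased for sigma^2 = {sigma**2})")
print(f"mean of S:   {s.mean():.4f}   (biased low for sigma = {sigma})")
```

For a normal population the shortfall is the classical c4(n) factor; at n = 5 the mean of S lands near 0.94σ rather than σ.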
According to statistical theory, the purpose of Bessel's correction is to remove the bias introduced by limited samples: when the sample mean is substituted for the unknown population mean, deviations are measured from a point fitted to the sample itself, so they come out systematically too small. Dividing by n-1 instead of n restores unbiasedness for the sample variance, but the square root of that estimate is no longer unbiased, which is one of the reasons the sample standard deviation often cannot accurately reflect the real situation.
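The textbook calculation behind this, for an i.i.d. sample $X_1, \dots, X_n$ with mean $\mu$ and variance $\sigma^2$, is

$$
\mathbb{E}\!\left[\sum_{i=1}^{n}(X_i-\bar{X})^2\right]
= \mathbb{E}\!\left[\sum_{i=1}^{n}(X_i-\mu)^2\right] - n\,\mathbb{E}\!\left[(\bar{X}-\mu)^2\right]
= n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2,
$$

so $S^2 = \frac{1}{n-1}\sum_{i}(X_i-\bar{X})^2$ satisfies $\mathbb{E}[S^2] = \sigma^2$, while $\mathbb{E}[S] < \sigma$ by the concavity argument above.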
For example, in the extreme case of a sample of size 1, the sample variance cannot be computed at all: there is no second observation to deviate from, and the n-1 denominator makes the estimate an undefined 0/0. As the sample size increases the problem softens but does not vanish: the calculated value still reflects the sample itself, and is therefore of only limited help in recovering the true characteristics of the population.
For the sample standard deviation, the situation changes when the population mean is known: deviations can be measured from a fixed reference point rather than from the sample mean, so no degree of freedom is spent estimating the mean.
Bessel's correction is necessary only when we lack knowledge of the population mean. If an exact population mean is available, it is more efficient to use all n degrees of freedom when calculating the variance and standard deviation, and dividing by n is then already unbiased. In addition, for different population distributions, other correction factors may be needed to reduce the error further, because simply applying Bessel's correction does not always give the best mean squared error (MSE); for a normal population, for instance, dividing by n+1 yields a smaller MSE for the variance than dividing by n-1.
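A simulation in the same spirit makes the trade-off concrete. This sketch assumes a normal population with illustrative values (μ = 0, σ² = 4, n = 10) and compares the divisors discussed above:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2 = 0.0, 4.0
n, trials = 10, 200_000

x = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
xbar = x.mean(axis=1, keepdims=True)

ss_mean = ((x - xbar) ** 2).sum(axis=1)  # deviations from the sample mean
ss_mu = ((x - mu) ** 2).sum(axis=1)      # deviations from the known population mean

estimators = [
    ("known mu, divide by n", ss_mu / n),          # unbiased, uses all n dof
    ("Bessel, divide by n-1", ss_mean / (n - 1)),  # unbiased when mu is unknown
    ("divide by n", ss_mean / n),                  # biased low
    ("divide by n+1", ss_mean / (n + 1)),          # smallest MSE on normal data
]
for name, est in estimators:
    bias = est.mean() - sigma2
    mse = ((est - sigma2) ** 2).mean()
    print(f"{name:24s} bias={bias:+.4f}  MSE={mse:.4f}")
```

On normal data the n+1 divisor indeed comes out with the smallest MSE despite its bias, while the known-mean estimator is unbiased without sacrificing a degree of freedom.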
To explore this in more depth, let's consider a concrete example. Suppose the entire population is the data set (0, 0, 0, 1, 2, 9). The population mean is 2 and the population variance is 62/6 ≈ 10.33, driven mostly by the single outlying value 9. If we rely on a small sample that happens to miss that outlier, say the three zeros, the sample variance is 0, and the resulting estimate is badly misleading: no correction factor can recover variability that the sample never observed.
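Checking the example directly (the all-zero sample is just the unlucky case described above):

```python
import numpy as np

population = np.array([0, 0, 0, 1, 2, 9])
print(population.mean())              # 2.0, the population mean
print(population.var(ddof=0))         # 62/6 ~ 10.33, the population variance

unlucky_sample = np.array([0, 0, 0])  # a small sample that missed the outlier
print(unlucky_sample.var(ddof=1))     # 0.0, even with Bessel's correction
```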
If the sample mean and the population mean differ, then computing deviations from any value other than the sample mean, the population mean included, inevitably produces a larger sum of squares, because the sample mean is by construction the point that minimizes that sum. This shrinkage is precisely what Bessel's correction compensates for.
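This follows from a one-line identity that holds for any constant $c$:

$$
\sum_{i=1}^{n}(x_i - c)^2 \;=\; \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - c)^2 \;\ge\; \sum_{i=1}^{n}(x_i - \bar{x})^2,
$$

with equality only when $c = \bar{x}$. Substituting $c = \mu$ shows why squared deviations taken from the sample mean systematically understate those taken from the true population mean.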
As in many situations in statistics, so-called accuracy is often hard to achieve, especially for a quantity as routinely used as the sample standard deviation. Many varying factors, depending on the sample size or the underlying distribution, can affect the estimate, and these effects are often beyond our complete control and prediction.
Can we fundamentally re-examine the accuracy of the sample standard deviation, setting Bessel's correction aside? And more importantly, can we find more precise statistical methods to reduce the bias in these measurements, so that comparisons between different samples can be considered true and trustworthy?