In the field of statistics, the p-value is undoubtedly one of the most popular and controversial concepts. The p-value represents the probability of observing a result at least as extreme as the test statistic when the null hypothesis is true. This number is crucial for the interpretation and publication of research results, but its misuse and misunderstanding are widespread in the scientific community.
The American Statistical Association states: "The p-value does not measure the probability that the research hypothesis is true or the probability that the data were generated by random chance alone."
In statistics, each assumption about the distribution of observed data is called a statistical hypothesis. When we test a specific hypothesis, the null hypothesis, our goal is to assess how compatible the observed data are with that hypothesis.
The null hypothesis usually states that a parameter (such as a correlation or a mean difference) is zero in the specific context of the study. For example, suppose a test statistic T follows a standard normal distribution N(0, 1) under the null hypothesis. If we reject the null hypothesis, it usually means that we lend some support to the parameter being non-zero. But this alone does not capture everything the data tell us.
The calculation of the p-value is the core of statistical testing. If observations are drawn from a distribution and a statistic is calculated from them, the p-value is the probability, assuming the null hypothesis is true, of obtaining a statistic at least as extreme as the one observed. For example, if t is the observed value of the statistic of interest, the p-value can be viewed as the probability of observing a value at least as extreme as t under the null hypothesis H0.
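For the standard normal test statistic mentioned above, this definition can be sketched in a few lines of Python. The function names here (`normal_cdf`, `two_sided_p_value`) are illustrative, not from any particular library; only the standard library is used.

```python
from math import erf, sqrt

def normal_cdf(x):
    # CDF of the standard normal distribution N(0, 1),
    # expressed via the error function erf
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def two_sided_p_value(t):
    # Probability, under H0, of a statistic at least as extreme
    # as |t| in either tail of N(0, 1)
    return 2.0 * (1.0 - normal_cdf(abs(t)))

# A test statistic of 1.96 gives a two-sided p-value of about 0.05,
# the familiar boundary of the conventional significance level
print(round(two_sided_p_value(1.96), 3))
```

Note that the p-value depends on how "extreme" is defined: a two-sided test counts both tails, while a one-sided test counts only one.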
For a chosen significance level α, we reject the null hypothesis H0 when the p-value is less than or equal to α.
When conducting a hypothesis test, researchers will set the significance level α in advance, usually 0.05. If the calculated p-value is lower than this value, it means that the observed data are sufficiently incompatible with the null hypothesis to reject it. But this does not mean that the null hypothesis is absolutely wrong.
The American Statistical Association notes that p-values are often misused. In particular, some scholars tend to assume that the alternative hypothesis is valid simply because the p-value is less than 0.05, while ignoring the importance of other supporting evidence. Many statisticians suggest that the p-value should not be viewed as a tool to measure the correctness of a hypothesis, but should be combined with other statistical indicators to make a comprehensive assessment.
For example, if we want to test whether a coin is fair, suppose we toss it 20 times and it comes up heads 14 times. Our null hypothesis is that the coin is fair. In this case, we calculate the p-value to find out the probability of getting so many heads given a fair coin. If this probability is very small, we have reason to doubt the fairness of the coin.
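The coin example can be computed exactly with the binomial distribution. This is a minimal sketch using only the standard library; the function name `coin_p_value` is illustrative.

```python
from math import comb

def coin_p_value(n, k, p0=0.5):
    # One-sided p-value: probability of observing k or more heads
    # in n tosses, assuming the null hypothesis that the probability
    # of heads is p0 (a fair coin when p0 = 0.5)
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i)
               for i in range(k, n + 1))

# 14 or more heads in 20 tosses of a fair coin
print(round(coin_p_value(20, 14), 4))  # ≈ 0.0577
```

Here the one-sided p-value is about 0.058, just above the conventional 0.05 threshold, so under that criterion the data would not be judged sufficiently incompatible with the fair-coin hypothesis.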
Conclusion: "The p-value does not make a statement about the correctness of a hypothesis; rather, it measures how incompatible the observed data are with a particular model."
The p-value is undoubtedly an indispensable tool in scientific research, but it should be used with caution. Researchers who understand what the p-value is, recognize its limitations, and learn to interpret and report it appropriately will read their data more correctly. Beyond the p-value, then, what evaluation criteria does scientific progress really need?