In scientific research and statistical analysis, the p-value is an important but frequently misunderstood concept. It arises in null hypothesis significance testing, where it represents the probability of observing data at least as extreme as the actual result, assuming the null hypothesis is true. However, misunderstanding and misuse of p-values are common across the sciences. It is therefore worth taking a closer look at what the p-value really means and how it should be applied.
While reporting p-values for statistical tests is standard practice in many academic publications, their misunderstanding and misuse have become a major topic of debate.
In statistics, any conjecture about the unknown probability distribution of observed data is called a statistical hypothesis. If we state a single hypothesis and the purpose of the test is to assess whether that hypothesis is tenable, the procedure is called null hypothesis testing. The null hypothesis asserts that the effect under study does not exist; typically, it assumes that some parameter, such as a correlation or a difference in means, is zero. When we run the test, we compute a test statistic from the data and use it to judge whether the observed result is statistically significant.
By definition, the p-value is the probability of obtaining a test statistic at least as extreme as the observed result, assuming the null hypothesis is true. The smaller the p-value, the stronger the grounds for doubting the null hypothesis. However, a small p-value does not prove that the null hypothesis is false.
The American Statistical Association states, "The p-value does not measure the chance that the research hypothesis is true, nor does it measure the chance that the data were generated randomly."
The p-value is widely used in statistical hypothesis testing. Before conducting a study, the researcher selects a model (the null hypothesis) and a significance level α (most commonly 0.05). If the p-value is less than α, the observed data are deemed sufficiently inconsistent with the null hypothesis to reject it. However, many statisticians have raised concerns about the misuse and misinterpretation of p-values, such as treating any p-value below 0.05 as positive support for the alternative hypothesis.
Other statisticians recommend abandoning p-values and focusing more on other inferential statistics methods, such as confidence intervals, likelihood ratios, or Bayes factors.
Calculating a p-value typically requires three ingredients: the test statistic, whether the researcher chose a one-tailed or two-tailed test, and the data. If the null hypothesis is true, p-values are uniformly distributed between 0 and 1, which means that repeating the same test on fresh data will usually yield a different p-value, even when the null hypothesis is true.
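This uniformity is easy to verify by simulation. The sketch below (an illustration, not taken from the article) repeatedly draws samples that satisfy the null hypothesis, runs a two-sided z-test on each, and checks that roughly 5% of the resulting p-values fall below 0.05, as a uniform distribution would predict:

```python
import math
import random

random.seed(0)

def z_test_pvalue(sample, mu0=0.0, sigma=1.0):
    """Two-sided z-test p-value for the mean of a sample with known sigma."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    # Standard normal CDF via erf; two-sided tail probability.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Repeat the same test many times on fresh data drawn under the null.
pvals = [z_test_pvalue([random.gauss(0, 1) for _ in range(30)])
         for _ in range(10_000)]

# Under a true null, p-values are roughly uniform on [0, 1],
# so about 5% of them should land below 0.05.
below_005 = sum(p < 0.05 for p in pvals) / len(pvals)
print(f"share of p-values below 0.05: {below_005:.3f}")
```

Each run produces a different p-value even though the null hypothesis is always true here, which is exactly the point: a single p-value is a random quantity, not a fixed property of the hypothesis.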
Suppose you conduct an experiment to test whether a coin is fair, and in 20 tosses it comes up heads 14 times. The null hypothesis is that the coin is fair. If we perform a right-tailed test, that is, ask whether the coin is biased towards heads, then the p-value is the probability of observing at least 14 heads if the coin is fair.
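This p-value can be computed exactly from the binomial distribution; a minimal sketch:

```python
from math import comb

n, k = 20, 14  # 20 tosses, 14 heads observed

# Right-tailed p-value: P(X >= 14) when X ~ Binomial(20, 0.5),
# i.e. the chance of at least 14 heads from a fair coin.
p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
print(f"p-value = {p_value:.4f}")  # ≈ 0.0577
```

Since 0.0577 is above the conventional significance level of 0.05, this one-sided test would not reject the null hypothesis of a fair coin, despite the seemingly lopsided result.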
In summary, p-values are undoubtedly an integral part of statistics, but we must be careful when using them to judge research hypotheses. Interpreting a p-value in light of its context and the corresponding research design is a necessary step. Do you now have a deeper understanding of this number?