In medicine and psychology, "clinical significance" refers to the practical importance of a treatment effect, that is, whether it has a real and perceptible impact on daily life. This article examines the difference between statistical significance and practical significance, and the role clinical significance plays in deciding whether treatment has changed a patient's diagnostic label.
Statistical significance is used in hypothesis testing to evaluate the null hypothesis (i.e., the hypothesis that there is no relationship between the variables).
The significance level is usually set at α = 0.05 or 0.01; it is the probability of falsely rejecting a null hypothesis that is in fact true. If a difference is significant at α = 0.05, a result at least as extreme as the one observed would occur with a probability of at most 5% if the null hypothesis were true. This is a purely statistical statement, however, and says nothing about the magnitude or clinical importance of the difference.
In contrast, practical significance concerns the effectiveness of an intervention or treatment and quantifies how much change the treatment produces. It is expressed through measures such as effect size, number needed to treat (NNT), and preventive fraction. Effect size is one type of practical significance: it quantifies how far a sample deviates from expectation, which helps in interpreting research results. Note, however, that effect sizes have their own potential sources of bias and usually describe group-level change rather than individual-level change.
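To make the practical-significance measures named above concrete, here is a minimal Python sketch. It assumes Cohen's d (with a pooled standard deviation) as the effect size, which is only one of several possible choices, and the function names and sample numbers are invented for illustration.

```python
import math
from statistics import mean, variance

def cohens_d(treated, control):
    """Standardized mean difference, using the pooled sample standard deviation."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / math.sqrt(pooled_var)

def number_needed_to_treat(control_event_rate, treated_event_rate):
    """NNT is the reciprocal of the absolute risk reduction."""
    risk_reduction = control_event_rate - treated_event_rate
    if risk_reduction <= 0:
        raise ValueError("treatment does not reduce risk; NNT is not defined")
    return 1.0 / risk_reduction

def preventive_fraction(control_event_rate, treated_event_rate):
    """Share of control-group events that the treatment prevents."""
    return (control_event_rate - treated_event_rate) / control_event_rate

# Hypothetical symptom scores (lower is better) and hypothetical event rates.
treated_scores = [12, 14, 11, 15, 13]
control_scores = [16, 18, 15, 19, 17]
print(cohens_d(treated_scores, control_scores))
print(number_needed_to_treat(0.40, 0.25))
print(preventive_fraction(0.40, 0.25))
```

All three numbers summarize the group as a whole; none of them says whether any particular patient has changed.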
Clinical significance answers the question, "Is the effect of the treatment large enough to change the patient's diagnostic label?"
In psychology and psychotherapy, "clinical significance" has a more precise definition. In clinical research, it refers to the ability of a treatment to bring a patient to the point of no longer meeting the criteria for a diagnosis. For example, a treatment may produce a statistically significant change in depressive symptoms and have a large effect size, but this does not mean that patients have moved from dysfunctional to functional, that is, that they no longer meet the diagnostic criteria for depression.
There are many methods for calculating clinical significance. Five common ones are the Jacobson-Truax method, the Gulliksen-Lord-Novick method, the Edwards-Nunnally method, the Hageman-Arrindell method, and hierarchical linear modeling (HLM).
The Jacobson-Truax method is a common method for calculating clinical significance. It is built on the reliable change index (RCI), which is the difference between a participant's pre-test and post-test scores divided by the standard error of the score difference. Based on the direction of the RCI and whether a cutoff score has been met, participants are classified as recovered, improved, unchanged, or worsened.
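The RCI calculation and the four-way classification can be sketched in Python as follows. This sketch assumes the usual form of the standard error of the difference (the square root of two times the standard error of measurement, itself derived from the scale's standard deviation and reliability) and the conventional 1.96 threshold. The cutoff score is supplied by the caller, and all function names, parameter names, and numbers are illustrative.

```python
import math

def reliable_change_index(pre, post, sd, reliability):
    """Pre-to-post difference divided by the standard error of the difference."""
    se_measurement = sd * math.sqrt(1.0 - reliability)
    se_difference = math.sqrt(2.0) * se_measurement
    return (post - pre) / se_difference

def classify(pre, post, sd, reliability, cutoff, lower_is_better=True, z=1.96):
    """Assign one of the four categories from the RCI and a caller-supplied cutoff."""
    rci = reliable_change_index(pre, post, sd, reliability)
    gain = -rci if lower_is_better else rci  # positive gain means improvement
    if gain > z:
        crossed = post < cutoff if lower_is_better else post > cutoff
        return "recovered" if crossed else "improved"
    if gain < -z:
        return "worsened"
    return "unchanged"

# Hypothetical symptom scale: SD 7.5, reliability 0.80, cutoff 14, lower is better.
for post_score in (12, 20, 26, 41):
    print(post_score, classify(30, post_score, sd=7.5, reliability=0.80, cutoff=14))
```

With these made-up values, a patient starting at 30 is classified as recovered at 12, improved at 20, unchanged at 26, and worsened at 41.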
The Gulliksen-Lord-Novick method is similar to the Jacobson-Truax method, but it also takes regression to the mean into account. It does so by expressing the pre-test and post-test scores as deviations from the mean of the relevant population and dividing by the population standard deviation.
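As a rough illustration, the sketch below uses one commonly cited form of the Gulliksen-Lord-Novick index, in which the pre-test deviation from the population mean is shrunk by the scale's reliability before being compared with the post-test deviation. The exact formula is an assumption here and should be checked against the primary literature; the function name and numbers are hypothetical.

```python
import math

def rc_gln(pre, post, pop_mean, pop_sd, reliability):
    """Change index that allows for regression to the mean (assumed form).

    The pre-test deviation from the population mean is multiplied by the
    reliability, which gives the post-test deviation expected from
    regression alone; the index measures how far the observed post-test
    score departs from that expectation.
    """
    numerator = (post - pop_mean) - reliability * (pre - pop_mean)
    denominator = pop_sd * math.sqrt(1.0 - reliability ** 2)
    return numerator / denominator

# Hypothetical values: population mean 20, SD 7.5, reliability 0.80.
print(rc_gln(pre=30, post=12, pop_mean=20, pop_sd=7.5, reliability=0.80))
```

As with the RCI, the result is typically compared against a normal-distribution threshold such as 1.96 to decide whether the change is reliable.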
The Edwards-Nunnally method is a more stringent alternative for calculating clinical significance. In this approach, pre-test scores are adjusted for reliability and confidence intervals are constructed around the adjusted pre-test scores, so that the actual score change required to show clinical significance is larger than under the Jacobson-Truax method.
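A minimal sketch of this idea in Python is given below. It assumes that the adjustment shrinks the pre-test score toward the population mean in proportion to the reliability, and that the interval spans two standard errors of measurement on either side. Both details are assumptions based on a commonly cited description of the method, and the names and numbers are hypothetical.

```python
import math

def adjusted_pretest_interval(pre, pop_mean, pop_sd, reliability, width=2.0):
    """Confidence interval around the reliability-adjusted pre-test score."""
    adjusted_pre = reliability * (pre - pop_mean) + pop_mean
    se_measurement = pop_sd * math.sqrt(1.0 - reliability)
    return adjusted_pre - width * se_measurement, adjusted_pre + width * se_measurement

def classify_change(pre, post, pop_mean, pop_sd, reliability, lower_is_better=True):
    """A post-test score outside the interval counts as reliable change."""
    low, high = adjusted_pretest_interval(pre, pop_mean, pop_sd, reliability)
    if low <= post <= high:
        return "unchanged"
    improved = post < low if lower_is_better else post > high
    return "improved" if improved else "worsened"

# Hypothetical values: population mean 20, SD 7.5, reliability 0.80, pre-test 40.
print(adjusted_pretest_interval(40, pop_mean=20, pop_sd=7.5, reliability=0.80))
for post_score in (25, 33, 45):
    print(post_score, classify_change(40, post_score, 20, 7.5, 0.80))
```

The sketch only decides whether change is reliable; whether an improved patient also counts as recovered would still depend on a separate cutoff score.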
The Hageman-Arrindell method involves indices of both group change and individual change, using a reliable change index to indicate the extent to which a patient has improved. Like the Jacobson-Truax method, it yields four categories: worsened, not reliably changed, improved but not recovered, and recovered.
Hierarchical linear modeling examines change through growth curve analysis rather than a simple pre-test versus post-test comparison, and therefore requires at least three data points per patient. In an HLM analysis, an estimate of change is calculated for each participant, and growth curve models can also be fitted for groups and dyads.
Finally, although statistical significance and practical significance are different things, in the clinical setting a good treatment effect must be not only statistically significant but also have practical clinical impact. How to define a "successful" treatment is therefore a question that each of us needs to reflect on.