In psychometrics, content validity is the extent to which a measurement instrument represents all facets of the construct it is intended to measure. For example, a depression scale that assesses only affective symptoms while ignoring behavioral symptoms may lack content validity. Because judging content coverage is inherently subjective, content validity is less clear-cut to establish: interpretations of a given personality trait, such as extraversion, often differ, and when experts disagree about what a trait comprises, high content validity is difficult to achieve.
Content validity is distinct from face validity. Face validity assesses only whether a test appears valid on its surface, not whether it actually measures what it claims to measure.
Face validity concerns whether a test looks like it "works" to test takers, administrators, and other technically unsophisticated observers: when candidates see the material, do they believe it tests their knowledge or skills? Content validity, by contrast, requires trained subject-matter reviewers to evaluate whether the test items cover the defined content domain, and it involves more rigorous statistical testing than face validity. Content validity is most often cited in academic and vocational testing, where test items must reflect the knowledge required in a particular subject area (e.g., history) or job skill (e.g., accounting). In clinical settings, content validity refers to the correspondence between test items and the symptom content of a disorder.
The method for measuring content validity proposed by Lawshe is essentially a way to quantify agreement among raters, and it remains widely used today.
In a widely cited method proposed by Lawshe in 1975, each expert reviewer on a panel answers the following question for each test item: "Is the skill or knowledge measured by this item 'essential,' 'useful but not essential,' or 'not necessary'?" According to Lawshe, if more than half of the panelists indicate that an item is essential, the item has at least some content validity, and the degree of content validity increases as more panelists agree that the item is essential. Building on these assumptions, Lawshe developed a formula he called the content validity ratio (CVR).
The content validity ratio ranges from −1 to +1, with positive values indicating that more than half of the expert reviewers rated the item essential.
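Lawshe's formula is CVR = (n_e − N/2) / (N/2), where n_e is the number of panelists rating an item "essential" and N is the total panel size. The following minimal Python sketch implements it; the function name and the example ratings are illustrative, not taken from Lawshe's paper.

```python
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe's content validity ratio: CVR = (n_e - N/2) / (N/2).

    Returns +1.0 when every panelist rates the item "essential",
    0.0 when exactly half do, and -1.0 when none do.
    """
    if n_panelists <= 0 or not 0 <= n_essential <= n_panelists:
        raise ValueError("need 0 <= n_essential <= n_panelists and n_panelists > 0")
    half = n_panelists / 2
    return (n_essential - half) / half

# Example: 9 of 12 panelists rate an item "essential".
print(content_validity_ratio(9, 12))  # 0.5, i.e. clearly more than half
```

An item's CVR is typically compared against a tabulated minimum value for the given panel size; items falling below that threshold are candidates for removal.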
In further research on Lawshe's method, scholars noticed an anomaly in the published table of minimum CVR values: the critical value for a panel of eight reviewers departed from the otherwise smooth trend across panel sizes. This drew the attention of Wilson, Pan, and Schumsky, who proposed a recalculation of the values in 2012. Comparing their own exact binomial calculations with Schipper's original values, they found that the table published by Lawshe and Schipper actually reflected a two-tailed test but had been mislabeled as one-tailed, creating confusion about the original values. Errors of this kind make the critical values of the content validity ratio inconsistent across different numbers of reviewers.
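As a rough sketch of the recalculation logic, assuming the exact one-tailed binomial approach described above: under the null hypothesis that each panelist rates an item "essential" with probability 0.5, find the smallest number of "essential" votes whose upper-tail probability falls at or below the chosen significance level, then convert that count to a CVR threshold. This illustrates the approach only; it is not a reproduction of Wilson, Pan, and Schumsky's published table.

```python
from math import comb

def critical_cvr(n_panelists: int, alpha: float = 0.05) -> float:
    """Minimum significant CVR under an exact one-tailed binomial test.

    Null hypothesis: each of N panelists independently rates the item
    "essential" with probability 0.5.  We find the smallest vote count k
    with P(X >= k) <= alpha and convert k to a CVR via (k - N/2) / (N/2).
    """
    for k in range(n_panelists + 1):
        # Upper-tail probability of observing k or more "essential" votes.
        tail = sum(comb(n_panelists, i) for i in range(k, n_panelists + 1)) / 2**n_panelists
        if tail <= alpha:
            half = n_panelists / 2
            return (k - half) / half
    return float("nan")  # no vote count reaches significance for this panel size

# Illustrative critical values at alpha = .05 for several panel sizes.
for n in (8, 10, 15, 20, 25):
    print(n, round(critical_cvr(n), 3))
```

Because the test is discrete, the resulting thresholds do not decrease smoothly with panel size, which is exactly why mislabeling a one-tailed table as two-tailed (or vice versa) produces the kind of inconsistencies described above.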
Re-evaluating the critical values of the content validity ratio is therefore crucial: the corrected thresholds change which items researchers retain or discard, and they allow the importance and impact of content validity to be examined within a new framework of understanding. This line of research not only reveals possible biases in test design and evaluation but also prompts the academic community to reflect on the reliability and validity of content measurement.
From theory to practice, Lawshe's content validity ratio is not merely a set of numbers; it is a safeguard that the tests we use genuinely reflect the personality traits or behavioral standards we care about.
In discussing content validity, one cannot help but ask: how do we strike a balance between subjectivity and objectivity, and thereby strengthen our confidence in test results?