In psychometrics, content validity (also known as logical validity) refers to the extent to which a measurement tool represents all facets of a given construct. For example, if a depression assessment tool assesses only the affective aspects of depression and ignores the behavioral aspects, its content validity is questionable.
Because judging content validity involves a degree of subjectivity, it relies on some consensus about what a construct, such as the personality trait of extraversion, comprises.
Content validity differs from face validity, which concerns not what the test actually measures but what it superficially appears to measure. Face validity assesses whether a test "looks valid" to the examinees who take it, the administrators who decide to use it, and other technically untrained observers.
Content validity requires recognized experts in the relevant field to evaluate whether the test items fully reflect the defined content, and it involves more rigorous statistical testing than face validity does. Content validity is most commonly applied to academic and vocational tests, where items must reflect a specific body of knowledge, such as history, or a vocational skill, such as accounting.
In clinical settings, content validity refers to the correspondence between test items and the symptom content of a syndrome.
C. H. Lawshe proposed a widely used method of measuring content validity. It is essentially a method for gauging agreement among raters or judges regarding how essential a particular item is. In an article on pre-employment testing, Lawshe (1975) recommended that each member of an expert panel answer the following question for each item: "Is the skill or knowledge measured by this item 'essential,' 'useful, but not essential,' or 'not necessary' to the performance of the job?"
According to Lawshe's reasoning, if more than half of the panelists indicate that an item is "essential," the item has at least some content validity. The degree of content validity increases as larger numbers of panelists agree that the item is essential.
Based on these assumptions, Lawshe developed a formula called the Content Validity Ratio (CVR).
The formula is:

CVR = (n_e - N/2) / (N/2)

where n_e is the number of experts rating the item "essential" and N is the total number of experts on the panel. The formula yields values ranging from +1 to -1, with positive values indicating that at least half of the experts rated the item as essential. The mean CVR across all items may be used as an indicator of the overall content validity of the test.
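As a concrete illustration, here is a minimal Python sketch of the computation; the function names cvr and mean_cvr are hypothetical and not from Lawshe's paper:

```python
def cvr(n_essential: int, n_total: int) -> float:
    """Lawshe's content validity ratio: CVR = (n_e - N/2) / (N/2)."""
    if not 0 <= n_essential <= n_total:
        raise ValueError("n_essential must lie between 0 and n_total")
    half = n_total / 2
    return (n_essential - half) / half

def mean_cvr(essential_counts: list[int], n_total: int) -> float:
    """Average CVR across all items, an overall content validity index."""
    return sum(cvr(n, n_total) for n in essential_counts) / len(essential_counts)

# Example: a 10-expert panel where 9 experts rate an item "essential"
print(cvr(9, 10))                # (9 - 5) / 5 = 0.8
print(mean_cvr([9, 10, 6], 10))  # mean of 0.8, 1.0, 0.2 ≈ 0.667
```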
Lawshe (1975) also provided a table of critical values for the CVR so that test evaluators could judge, for a panel of a given size, how large the CVR must be to exceed chance expectation. The table was computed for Lawshe by his friend Lowell Schipper. Close inspection of this published table reveals an anomaly: the critical value rises steadily as the panel shrinks from 40 experts (minimum = .29) to 9 experts (minimum = .78), but then unexpectedly drops at 8 experts (minimum = .75) before reaching its ceiling at 7 experts (minimum = .99).
The anomaly, however, follows from the arithmetic of the formula: with 8 raters, 7 "essential" ratings and 1 other rating produce a CVR of .75. If .75 were not accepted as the critical value, all 8 raters would have to rate the item "essential," yielding a CVR of 1.00. Keeping the critical values in strictly increasing order would therefore force the value at 8 raters to be 1.00, which would itself break the pattern: a "perfect" rating would be required at 8 raters but at no panel size above or below it.
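Using the hypothetical cvr helper from the sketch above, the arithmetic behind the anomaly can be checked directly:

```python
# With 8 raters, a single dissent already drops the CVR to .75 ...
print(cvr(7, 8))   # (7 - 4) / 4 = 0.75
# ... so the only attainable value above .75 is unanimity.
print(cvr(8, 8))   # (8 - 4) / 4 = 1.00
```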
Wilson, Pan, and Schumsky (2012) attempted to correct this error but found no explanation in Lawshe's writing and no publication by Schipper describing how the table of critical values was computed. Wilson and colleagues determined that Schipper's values closely followed a normal approximation to the binomial distribution. By comparing Schipper's values with newly calculated binomial values, they also found that Lawshe and Schipper had mislabeled their published table as representing a one-tailed test, when the values actually mirrored the two-tailed values of the binomial distribution. Wilson and colleagues subsequently published a recalculation of the critical values for the content validity ratio, providing critical values in unit steps at multiple alpha levels.
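A minimal sketch of how such critical values can be derived under the normal approximation that Wilson and colleagues identified: under the null hypothesis that each of N raters independently marks an item "essential" with probability .5, the CVR has mean 0 and standard deviation 1/sqrt(N), so the two-tailed critical value is approximately z(alpha/2)/sqrt(N). The function below is illustrative, not the published recalculation:

```python
from math import sqrt

# Two-tailed z quantiles of the standard normal for common alpha levels
Z_TWO_TAILED = {0.10: 1.645, 0.05: 1.960, 0.01: 2.576}

def critical_cvr(n_total: int, alpha: float = 0.05) -> float:
    """Approximate critical CVR for a panel of n_total experts.

    Under H0 (each rater says "essential" with p = .5), the count of
    "essential" ratings is Binomial(N, .5), so the CVR has mean 0 and
    standard deviation 1/sqrt(N), giving z / sqrt(N) as the cutoff.
    """
    return Z_TWO_TAILED[alpha] / sqrt(n_total)

# Example: approximate cutoffs for panels of 40, 9, and 8 experts
for n in (40, 9, 8):
    print(n, round(critical_cvr(n), 3))  # 40 -> 0.31, 9 -> 0.653, 8 -> 0.693
```

Note that values computed this way rise monotonically as the panel shrinks, with no dip at 8 experts, which is consistent with treating the dip in Schipper's table as an artifact of the formula's coarse steps at small panel sizes.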
Content validity plays a vital role in psychological testing because it ensures that a test adequately samples the construct it is intended to measure. As tests come into use across an ever wider range of settings, the emphasis on content validity in both professional and academic contexts has become more pronounced. When considering future test designs, we should ask ourselves: how can the content validity of psychological tests be improved more effectively to support more accurate assessment?