From Surface to Substance: Do You Know the Surprising Difference Between Face Validity and Content Validity?

In psychometrics, content validity (also known as logical validity) refers to the extent to which a measurement instrument covers all aspects of a specific construct. For example, a depression scale that assesses only the affective dimension while ignoring the behavioral dimension may be judged to lack content validity. Judging content validity involves a degree of subjectivity, since it requires some consensus about the construct in question (such as extraversion). If experts disagree about what a particular personality trait comprises, high content validity cannot be achieved.

Content validity differs from face validity, which refers to what a test appears to measure rather than what the test actually measures.

In test applications, face validity assesses whether a test "seems valid" to participants, administrators, and other technically nonexpert observers. Content validity, by contrast, requires recognized subject matter experts to evaluate whether the test items assess the defined content, and this evaluation involves more rigorous statistical testing than the assessment of face validity. Content validity is most commonly examined in academic and vocational testing, where test items need to reflect the knowledge actually required for a subject area (such as history) or an occupational skill (such as accounting).

In clinical applications, content validity refers to the correspondence between test items and the symptom content of a syndrome.

Methods of measuring content validity

A widely used method for assessing content validity was proposed by C. H. Lawshe. It is essentially a way of gauging agreement among reviewers or raters on how essential a particular item is. Lawshe (1975) recommended that each subject matter expert (SME) on the review panel answer the following question for every item: "Is the skill or knowledge measured by this item 'essential', 'useful but not essential', or 'not necessary' to the performance of the job?" According to Lawshe, if more than half of the reviewers indicate that an item is essential, that item has at least some content validity; the more reviewers who agree that an item is essential, the higher its degree of content validity.

Using these assumptions, Lawshe developed a formula called the content validity ratio.

The expression of this formula is:

CVR = (n_e − N/2) / (N/2)

Here CVR stands for content validity ratio, n_e is the number of subject matter experts who rated the item as "essential", and N is the total number of subject matter experts. The formula yields values ranging from +1 to −1, with positive values indicating that at least half of the experts rated the item as essential. The mean CVR across items can be used to indicate the content validity of the overall test.
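To make the arithmetic concrete, the following Python sketch computes the CVR for a single item and averages it across items. The rating labels, function names, and example panel are illustrative assumptions, not part of Lawshe's published procedure.

```python
# Minimal sketch of Lawshe's content validity ratio (CVR) for a panel of SMEs.
# The label "essential" and the helper names below are assumptions for illustration.

def content_validity_ratio(ratings):
    """Compute the CVR for one item.

    ratings: list of SME judgements, e.g. ["essential", "useful but not essential"].
    Only the count of "essential" ratings (n_e) enters the formula.
    """
    n = len(ratings)                                   # total number of SMEs (N)
    n_e = sum(1 for r in ratings if r == "essential")  # SMEs rating the item essential
    return (n_e - n / 2) / (n / 2)

def mean_cvr(items):
    """Average the item-level CVRs as a rough index of overall content validity."""
    return sum(content_validity_ratio(r) for r in items) / len(items)

# Example: 10 SMEs rate two items.
item_1 = ["essential"] * 8 + ["useful but not essential"] * 2  # CVR = (8 - 5) / 5 = 0.6
item_2 = ["essential"] * 5 + ["not necessary"] * 5             # CVR = 0.0, exactly half agree
print(content_validity_ratio(item_1))  # 0.6
print(mean_cvr([item_1, item_2]))      # 0.3
```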

Lawshe (1975) also provided a table of critical values by which a test evaluator can determine, for a given number of subject matter experts, how large the calculated CVR must be to exceed what would be expected by chance; the table was computed for Lawshe by his colleague Lowell Schipper. A closer look at this published table reveals an anomaly. In Schipper's table, the critical value of the CVR increases monotonically from the case of 40 experts (minimum value = 0.29) to the case of 9 experts (minimum value = 0.78), only to drop back down at 8 experts (minimum value = 0.75) before reaching its highest value at 7 experts (minimum value = 0.99). Applying the formula to a panel of 8 reviewers in which 7 rate an item as essential and 1 does not yields a CVR of exactly 0.75. If 0.75 were not the intended critical value, all 8 reviewers would have to rate the item as essential, giving a CVR of 1.00; but requiring a "perfect" value of 1.00 for 8 reviewers would itself be inconsistent with the rest of the table, where perfection is not demanded for other panel sizes.

Whether this deviation from the monotonically increasing pattern of the rest of the table was due to a calculation error by Schipper or to a typing or typesetting error is unclear. Wilson, Pan, and Schumsky attempted to correct the error in 2012 but could find no explanation in Lawshe's writing, nor any publication by Schipper describing how the table of critical values was computed. They determined that Schipper's values were close to a normal approximation to the binomial distribution. By comparing Schipper's values with newly calculated binomial values, they also found that Lawshe and Schipper had erroneously labeled the published table as a one-tailed test when the values in fact mirrored those of a two-tailed binomial test. Wilson and colleagues subsequently published a recalculated table of critical values for the content validity ratio, providing critical values at multiple significance levels.
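The sketch below illustrates how a critical CVR can be derived from an exact two-tailed binomial test, in the spirit of Wilson, Pan, and Schumsky's description. The significance level, function name, and loop structure are assumptions for illustration; this is not their published calculation.

```python
# Illustrative derivation of a critical CVR from an exact two-tailed binomial test.
from math import comb

def critical_cvr(n_sme, alpha=0.05):
    """Smallest CVR exceeding chance agreement for a panel of n_sme experts.

    Under the null hypothesis each SME rates an item "essential" with p = 0.5,
    so the number of "essential" ratings follows Binomial(n_sme, 0.5). We find the
    smallest n_e whose upper-tail probability is at most alpha/2 (two-tailed) and
    convert it to a CVR.
    """
    for n_e in range(n_sme + 1):
        upper_tail = sum(comb(n_sme, k) for k in range(n_e, n_sme + 1)) / 2 ** n_sme
        if upper_tail <= alpha / 2:
            return (n_e - n_sme / 2) / (n_sme / 2)
    return None  # no agreement level reaches significance with so few SMEs

# With 8 SMEs the exact two-tailed test requires unanimous agreement (CVR = 1.00),
# which is why the 0.75 entry in Schipper's table stands out as anomalous.
print(critical_cvr(8))   # 1.0
print(critical_cvr(40))  # 0.35 under this exact test; the published tables differ somewhat
```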

In-depth discussion of content validity not only has important implications for test design but also encourages new ways of thinking in psychometrics. In the process, should we reconsider how the validity of tests can be measured effectively, so that they can be applied more wisely in real-life situations?
