Publications


Featured research published by Edith M. L. A. van Krimpen-Stoop.


Applied Psychological Measurement | 1999

The Null Distribution of Person-Fit Statistics for Conventional and Adaptive Tests.

Edith M. L. A. van Krimpen-Stoop; Rob R. Meijer

Several person-fit statistics have been proposed to detect item score patterns that do not fit an item response theory model. To classify response patterns as misfitting, the distribution of a person-fit statistic is needed. The theoretical null distributions of several fit statistics have been derived for paper-and-pencil (P&P) tests. However, it is unknown whether these distributions also hold for computerized adaptive tests (CAT). A three-part simulation study was conducted. In the first study, the theoretical distribution of the l_z statistic across trait (θ) levels was investigated for CAT and P&P tests, along with the distribution of the l_z* statistic proposed by Snijders (in press). Results indicated that the distributions of both l_z and l_z* differed from the theoretical distribution in CAT. The second study examined the distributions of l_z and l_z* using simulation. These simulated distributions, when based on the estimated trait level θ̂, were found to be problematic in CAT. In the third study, the detection rates of l_z* and l_z were compared; the rates for both statistics were similar in most cases.
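The l_z statistic referred to here is the standardized log-likelihood of an item score pattern. As a rough, hedged illustration of how it is computed, here is a minimal Python sketch under a two-parameter logistic (2PL) model; the item parameters, responses, and trait value below are made up, and in a CAT the trait value would be the interim estimate θ̂.

```python
import numpy as np

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def lz_statistic(x, theta, a, b):
    """Standardized log-likelihood person-fit statistic l_z:
    (observed log-likelihood - its expectation) / its standard deviation."""
    p = p_2pl(theta, a, b)
    l0 = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))    # observed log-likelihood
    e = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))     # expected log-likelihood
    v = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)      # variance of the log-likelihood
    return (l0 - e) / np.sqrt(v)

# Illustrative (hypothetical) item parameters and one response pattern.
a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
x = np.array([1, 1, 0, 1, 0])
print(lz_statistic(x, theta=0.3, a=a, b=b))  # large negative values suggest misfit
```

Under the theoretical null distribution, l_z is treated as standard normal; the point of the study is that this approximation breaks down in CAT.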


Psychometrika | 2003

Using Response Times to Detect Aberrant Responses in Computerized Adaptive Testing

Wim J. van der Linden; Edith M. L. A. van Krimpen-Stoop

A lognormal model for response times is used to check response times for aberrances in examinee behavior on computerized adaptive tests. Both classical procedures and Bayesian posterior predictive checks are presented. For a fixed examinee, responses and response times are independent; checks based on response times thus offer information independent of the results of checks on response patterns. Empirical examples of the use of classical and Bayesian checks for detecting two different types of aberrance in response times are presented. The Bayesian checks had higher detection rates than the classical checks, but at the cost of higher false-alarm rates. A guideline for the choice between the two types of checks is offered.
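As a hedged sketch of the classical variant of such a check: under a lognormal response-time model of the kind used here, the log response time on item j is normal with mean β_j − τ (time intensity minus examinee speed) and standard deviation 1/α_j, so standardized residuals can be screened directly. All parameter values below are invented for illustration, and the parameters are treated as known.

```python
import numpy as np
from scipy import stats

def rt_residuals(t, tau, alpha, beta):
    """Standardized log response-time residuals: ln T_j is assumed normal
    with mean beta_j - tau and standard deviation 1/alpha_j."""
    return alpha * (np.log(t) - (beta - tau))

# Hypothetical item parameters and one examinee's response times (seconds).
alpha = np.array([2.0, 1.5, 1.8, 2.2])     # precisions of the log response times
beta  = np.array([4.0, 4.2, 3.8, 4.5])     # time intensities
t     = np.array([55.0, 70.0, 2.0, 90.0])  # the 2-second response looks aberrant

z = rt_residuals(t, tau=0.1, alpha=alpha, beta=beta)
# Classical check: with known parameters, the sum of squared residuals
# is chi-square with one degree of freedom per item.
p_value = stats.chi2.sf(np.sum(z ** 2), df=len(t))
print(z.round(2), p_value)
```

A Bayesian posterior predictive check would instead compare each observed time against its posterior predictive distribution rather than against point estimates of the parameters.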


Journal of Educational and Behavioral Statistics | 2001

CUSUM-Based Person-Fit Statistics for Adaptive Testing.

Edith M. L. A. van Krimpen-Stoop; Rob R. Meijer

Item scores that do not fit an assumed item response theory model may cause the latent trait value to be inaccurately estimated. Several person-fit statistics for detecting nonfitting score patterns on paper-and-pencil tests have been proposed, but in the context of computerized adaptive tests (CAT) the use of person-fit analysis has hardly been explored. Because the distributions of existing person-fit statistics have been shown not to hold in a CAT, new person-fit statistics are proposed in this study, and critical values for these statistics are derived from existing statistical theory. The proposed statistics are sensitive to runs of correct or incorrect item scores; they are based either on all items administered in a CAT or on subsets of items, use observed and expected item scores, and employ cumulative sum (CUSUM) procedures. The theoretical and empirical distributions of the statistics are compared and detection rates are investigated. Results showed that the nominal and empirical Type I error rates were comparable for CUSUM procedures when the number of items in each subset and the number of measurement points were not too small. Detection rates of the CUSUM procedures were superior to those of the other fit statistics. Applications of the statistics are discussed.
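To make the CUSUM idea concrete, here is a minimal sketch of paired upper and lower CUSUM charts on item-score residuals. The residual definition and the threshold h are simple stand-ins, not the statistics or critical values derived in the paper.

```python
import numpy as np

def cusum_person_fit(x, p, h=0.5):
    """Flag a score pattern as misfitting when a one-sided CUSUM of the
    score residuals drifts past the threshold h.

    x : 0/1 item scores in order of administration
    p : model-implied success probabilities at the interim ability estimates
    h : decision threshold; in practice derived from theory or simulation
    """
    n = len(x)
    c_plus, c_minus = 0.0, 0.0
    for xi, pi in zip(x, p):
        t = (xi - pi) / n                # residual for this item
        c_plus = max(0.0, c_plus + t)    # accumulates runs of unexpectedly correct scores
        c_minus = min(0.0, c_minus + t)  # accumulates runs of unexpectedly incorrect scores
        if c_plus > h or c_minus < -h:
            return True
    return False

# Hypothetical pattern: early items missed, later (harder) items suddenly correct.
x = np.array([0, 0, 0, 0, 1, 1, 1, 1])
p = np.array([0.7, 0.6, 0.6, 0.5, 0.4, 0.3, 0.3, 0.2])
print(cusum_person_fit(x, p, h=0.2))
```

Because the chart accumulates residuals sequentially, it can localize where in the test the misfit occurs, which is the appeal of the statistical process control framing.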


Computerized Adaptive Testing: Theory and Practice | 2000

Detecting Person Misfit in Adaptive Testing Using Statistical Process Control Techniques

Edith M. L. A. van Krimpen-Stoop; Rob R. Meijer

The aim of a computerized adaptive test (CAT) is to construct an optimal test for an individual examinee. To achieve this, the ability of the examinee is estimated during test administration and items are selected that match the current ability estimate. This is done using an item response theory (IRT) model that is assumed to describe an examinee’s response behavior. It is questionable, however, whether the assumed IRT model gives a good description of each examinee’s test behavior. For those examinees for whom this is not the case, the current ability estimate may be inadequate as a measure of the ability level and as a result the construction of an optimal test may be flawed. There are all sorts of factors that may cause an ability estimate to be invalidated. For example, examinees may take a CAT to familiarize themselves with the questions and randomly guess the correct answers on almost all items in the test. Or examinees may have preknowledge of some of the items in the item pool and correctly answer these items independent of their trait level and the item characteristics. These types of aberrant behavior invalidate the ability estimate and it therefore seems useful to investigate the fit of an item score pattern to the test model. Research with respect to methods that provide information about the fit of an individual item score pattern to a test model is usually referred to as appropriateness measurement or person-fit measurement. Most studies in this area are, however, in the context of paper-and-pencil (P&P) tests. As will be argued below, the application of person-fit theory presented in the context of P&P tests cannot simply be generalized to CAT. The aim of this article is first to give an introduction to existing person-fit research in the context of P&P tests, then to discuss some
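Since this passage describes the basic CAT mechanism, a toy sketch may help fix ideas. This is not the authors' implementation: it is a minimal Rasch-based adaptive loop with a made-up item pool, grid-search maximum-likelihood estimation, and closest-difficulty item selection (which, under the Rasch model, coincides with maximum-information selection).

```python
import numpy as np

def p_rasch(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def mle_theta(scores, bs, grid=np.linspace(-4, 4, 161)):
    """Grid-search maximum-likelihood estimate of theta from the items so far."""
    ps = p_rasch(grid[:, None], np.asarray(bs)[None, :])
    ll = (np.asarray(scores) * np.log(ps)
          + (1 - np.asarray(scores)) * np.log(1 - ps)).sum(axis=1)
    return grid[np.argmax(ll)]

def toy_cat(true_theta, b_pool, n_items=10, seed=1):
    """Adaptive loop: pick the unused item closest in difficulty to the
    current estimate, simulate a response, re-estimate theta."""
    rng = np.random.default_rng(seed)
    theta_hat, used, scores, bs = 0.0, set(), [], []
    for _ in range(n_items):
        j = min((k for k in range(len(b_pool)) if k not in used),
                key=lambda k: abs(b_pool[k] - theta_hat))
        used.add(j)
        bs.append(b_pool[j])
        scores.append(int(rng.random() < p_rasch(true_theta, b_pool[j])))
        theta_hat = mle_theta(scores, bs)
    return theta_hat, scores

pool = np.linspace(-3, 3, 50)  # hypothetical item pool of 50 Rasch items
print(toy_cat(true_theta=0.8, b_pool=pool))
```

The point made in the passage follows directly: every item selection depends on θ̂, so an examinee whose behavior the model does not describe receives both a distorted estimate and a badly targeted test.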


Applied Psychological Measurement | 2002

Detection of Person Misfit in Computerized Adaptive Tests with Polytomous Items.

Edith M. L. A. van Krimpen-Stoop; Rob R. Meijer

Item scores that do not fit an assumed item response theory model may cause the latent trait value to be inaccurately estimated. For a computerized adaptive test (CAT) using dichotomous items, several person-fit statistics for detecting misfitting item score patterns have been proposed. Both for paper-and-pencil (P&P) tests and CATs, detection of person misfit with polytomous items is hardly explored. In this study, the nominal and empirical null distributions of the standardized log-likelihood statistic for polytomous items are compared both for P&P tests and CATs. Results showed that the empirical distribution of this statistic differed from the assumed standard normal distribution for both P&P tests and CATs. Second, a new person-fit statistic based on the cumulative sum (CUSUM) procedure from statistical process control was proposed. By means of simulated data, critical values were determined that can be used to classify a pattern as fitting or misfitting. The effectiveness of the CUSUM to detect simulees with item preknowledge was investigated. Detection rates using the CUSUM were high for realistic numbers of disclosed items.
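The simulation-based derivation of critical values mentioned here follows a general recipe: simulate response patterns from the model, compute the statistic for each, and read a critical value off the empirical tail. As a hedged sketch, the recipe is shown below for a dichotomous standardized log-likelihood statistic (the same l_z as in the earlier sketch) rather than the polytomous statistic and CUSUM of the paper; all parameter values are invented.

```python
import numpy as np

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def lz(x, theta, a, b):
    """Standardized log-likelihood person-fit statistic (dichotomous case)."""
    p = p_2pl(theta, a, b)
    l0 = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    e = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    v = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (l0 - e) / np.sqrt(v)

rng = np.random.default_rng(0)
a = rng.uniform(0.8, 1.6, size=30)   # hypothetical discriminations
b = rng.uniform(-2.0, 2.0, size=30)  # hypothetical difficulties
theta = 0.0

# Build the empirical null distribution from model-fitting simulees.
vals = [lz((rng.random(30) < p_2pl(theta, a, b)).astype(int), theta, a, b)
        for _ in range(10_000)]
crit = np.quantile(vals, 0.05)  # empirical 5% critical value (low values = misfit)
print(crit)
```

Patterns with a statistic below crit would be classified as misfitting at the 5% level, which is exactly the fitting/misfitting classification rule described above.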


Essays on Item Response Theory | 2001

Person Fit Across Subgroups: An Achievement Testing Example

Rob R. Meijer; Edith M. L. A. van Krimpen-Stoop

Item response theory (IRT) models are used to describe answering behavior on tests and examinations. Although items may fit an IRT model, some persons may produce misfitting item score patterns, for example as a result of cheating or lack of motivation. Several statistics have been proposed to detect deviant item score patterns. Misfitting item score patterns may be related to group characteristics such as gender or race. Investigating misfitting item score patterns across different groups is strongly related to differential item functioning (DIF). In this study, the usefulness of person fit for comparing item score patterns across different groups was investigated. In particular, the effect of model misspecification due to DIF on person fit was explored. Empirical data from a math test were analyzed with respect to misfitting item score patterns and DIF, for men and women and for black and white examinees. Results indicated small differences between subgroups in the number of misfitting item score patterns. The influence of DIF on the fit of a score pattern was also small, for both gender and ethnic groups. These results imply that person-fit analysis is not very sensitive to model misspecification at the item level.


OMD Research Report | 1998

Simulating the null distribution of person-fit statistics for conventional and adaptive tests

Rob R. Meijer; Edith M. L. A. van Krimpen-Stoop


OMD Research Report | 1998

Statistical tests for person misfit in computerized adaptive testing

Cees A. W. Glas; Rob R. Meijer; Edith M. L. A. van Krimpen-Stoop


LSAC Research Report Series | 2006

Exploring new methods to detect person misfit in CAT

Rob R. Meijer; Edith M. L. A. van Krimpen-Stoop


LSAC Research Report Series | 2005

The use of person-fit statistics in computerized adaptive testing

Rob R. Meijer; Edith M. L. A. van Krimpen-Stoop
