Publication


Featured research published by Jesper Tijmstra.


Journal of Educational and Behavioral Statistics | 2014

Testing for aberrant behavior in response time modeling

Sukaesi Marianti; Jean-Paul Fox; Marianna Avetisyan; Bernard P. Veldkamp; Jesper Tijmstra

Many standardized tests are now administered via computer rather than paper-and-pencil format. In a computer-based testing environment, it is possible to record not only the test taker’s response to each question (item) but also the amount of time spent by the test taker in considering and answering each item. Response times (RTs) provide information not only about the test taker’s ability and response behavior but also about item and test characteristics. This study focuses on the use of RTs to detect aberrant test-taker responses. An example of such aberrance is a correct answer with a short RT on a difficult question. Such aberrance may be displayed when a test taker or test takers have preknowledge of the items. Another example is rapid guessing, wherein the test taker displays unusually short RTs for a series of items. When rapid guessing occurs at the end of a timed test, it often indicates that the test taker has run out of time before completing the test. In this study, Bayesian tests of significance for detecting various types of aberrant RT patterns are proposed and evaluated. In a simulation study, the tests were successful in identifying aberrant response patterns. A real data example is given to illustrate the use of the proposed person-fit tests for RTs.
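As a rough illustration of the residual logic behind such person-fit checks (this is not the Bayesian significance tests proposed in the paper), the sketch below flags responses whose log response time falls far below what a simple lognormal response-time model would predict; all parameter names and numbers are hypothetical.

```python
# Illustrative sketch, not the paper's Bayesian tests: flag unusually fast
# responses under a simple lognormal response-time model, where the log RT of
# person p on item i is assumed normal with mean lambda_i - tau_p and item
# standard deviation sigma_i. All names and values here are hypothetical.
import numpy as np

def flag_fast_responses(log_rt, lam, tau, sigma, z_cut=-1.96):
    """log_rt: (persons, items) matrix of log response times.
    lam, sigma: per-item time intensities and residual SDs.
    tau: per-person speed parameters.
    Returns a boolean matrix marking suspiciously fast responses."""
    expected = lam[None, :] - tau[:, None]      # model-implied mean log RT
    z = (log_rt - expected) / sigma[None, :]    # standardized residuals
    return z < z_cut                            # very fast given the model

# Small simulated example
rng = np.random.default_rng(1)
n_persons, n_items = 200, 20
lam = rng.normal(4.0, 0.3, n_items)
sigma = np.full(n_items, 0.4)
tau = rng.normal(0.0, 0.3, n_persons)
log_rt = lam[None, :] - tau[:, None] + rng.normal(0.0, 0.4, (n_persons, n_items))
print(flag_fast_responses(log_rt, lam, tau, sigma).mean())  # about 0.025 by chance
```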


Psychometrika | 2017

Modelling Conditional Dependence Between Response Time and Accuracy

Maria Bolsinova; Paul De Boeck; Jesper Tijmstra

The assumption of conditional independence between response time and accuracy given speed and ability is commonly made in response time modelling. However, this assumption might be violated in some cases, meaning that the relationship between the response time and the response accuracy of the same item cannot be fully explained by the correlation between overall speed and ability. We propose to explicitly model the residual dependence between time and accuracy by incorporating the effects of the residual response time on the intercept and the slope parameter of the IRT model for response accuracy. We present an empirical example of a violation of conditional independence from a low-stakes educational test and show that our new model reveals interesting phenomena about the dependence of the item properties on whether the response is relatively fast or slow. For more difficult items, responding slowly is associated with a higher probability of a correct response, whereas for easier items, responding slowly is associated with a lower probability of a correct response. Moreover, for many of the items, slower responses were less informative about ability because their discrimination parameters decrease with residual response time.
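A schematic rendering of this kind of model, with all symbols chosen here for illustration rather than taken from the paper: let $z_{pi}$ denote the residual log response time of person $p$ on item $i$ (the log response time after accounting for the person's speed and the item's time intensity), and let it shift both the slope and the intercept of a two-parameter IRT model,

$$
P(X_{pi} = 1 \mid \theta_p, z_{pi}) = \mathrm{logit}^{-1}\!\left[ (a_i + a_i^{*} z_{pi})\,\theta_p + (b_i + b_i^{*} z_{pi}) \right],
$$

where conditional independence corresponds to $a_i^{*} = b_i^{*} = 0$. A negative $a_i^{*}$ captures slower responses being less informative about ability, and the sign of $b_i^{*}$ determines whether responding slowly raises or lowers the probability of a correct response on that item.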


Journal of Educational and Behavioral Statistics | 2016

Posterior predictive checks for conditional independence between response time and accuracy

Maria Bolsinova; Jesper Tijmstra

Conditional independence (CI) between response time and response accuracy is a fundamental assumption of many joint models for time and accuracy used in educational measurement. In this study, posterior predictive checks (PPCs) are proposed for testing this assumption. These PPCs are based on three discrepancy measures reflecting different observable consequences of different types of violations of CI. Simulation studies are performed to evaluate the specificity and robustness of the procedure and its sensitivity to different types of conditional dependence, and to compare it with existing methods. The new procedure outperforms the existing methods in most of the simulation conditions. The use of the PPCs is illustrated using arithmetic test data.
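The general PPC recipe behind such checks can be sketched as follows. The discrepancy used below (the mean within-item correlation between accuracy and log response time) is a stand-in rather than one of the three measures proposed in the paper, and the posterior draws and replication function are hypothetical placeholders for a fitted model that assumes conditional independence.

```python
# Skeleton of a posterior predictive check for conditional independence.
# `posterior_draws` and `simulate_replicate` are hypothetical placeholders for
# a fitted joint model that assumes conditional independence; the discrepancy
# below is a simple stand-in, not one of the paper's three measures.
import numpy as np

def discrepancy(acc, log_rt):
    """Mean over items of the correlation between accuracy and log RT."""
    n_items = acc.shape[1]
    cors = [np.corrcoef(acc[:, i], log_rt[:, i])[0, 1] for i in range(n_items)]
    return float(np.mean(cors))

def ppc_p_value(acc_obs, log_rt_obs, posterior_draws, simulate_replicate):
    d_obs = discrepancy(acc_obs, log_rt_obs)
    d_rep = []
    for draw in posterior_draws:                    # one replicated data set per draw
        acc_rep, log_rt_rep = simulate_replicate(draw)
        d_rep.append(discrepancy(acc_rep, log_rt_rep))
    # Posterior predictive p-values near 0 or 1 suggest conditional dependence.
    return float(np.mean(np.array(d_rep) >= d_obs))
```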


British Journal of Mathematical and Statistical Psychology | 2017

Response moderation models for conditional dependence between response time and response accuracy

Maria Bolsinova; Jesper Tijmstra; Dylan Molenaar

It is becoming more feasible and common to register response times in the application of psychometric tests. Researchers thus have the opportunity to jointly model response accuracy and response time, which provides users with more relevant information. The most common choice is to use the hierarchical model (van der Linden, 2007, Psychometrika, 72, 287), which assumes conditional independence between response time and accuracy, given a person's speed and ability. However, this assumption may be violated in practice if, for example, persons vary their speed or differ in their response strategies, leading to conditional dependence between response time and accuracy and confounding measurement. We propose six nested hierarchical models for response time and accuracy that allow for conditional dependence, and discuss their relationship to existing models. Unlike existing approaches, the proposed hierarchical models allow for various forms of conditional dependence in the model and allow the effect of continuous residual response time on response accuracy to be item-specific, person-specific, or both. Estimation procedures for the models are proposed, as well as two information criteria that can be used for model selection. Parameter recovery and the usefulness of the information criteria are investigated using simulation, indicating that the procedure works well and is likely to select the appropriate model. Two empirical applications are discussed to illustrate the different types of conditional dependence that may occur in practice and how these can be captured using the proposed hierarchical models.
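For reference, the baseline hierarchical model mentioned above links an IRT model for accuracy and a lognormal model for response time only through the second-level distribution of the person parameters; a sketch, using a two-parameter logistic measurement model for concreteness:

$$
P(X_{pi} = 1 \mid \theta_p) = \mathrm{logit}^{-1}\!\left[ a_i (\theta_p - b_i) \right], \qquad
\log T_{pi} \mid \tau_p \sim N\!\left( \lambda_i - \tau_p,\ \sigma_i^2 \right), \qquad
(\theta_p, \tau_p) \sim N_2(\boldsymbol{\mu}, \boldsymbol{\Sigma}).
$$

Conditional independence means that, given $\theta_p$ and $\tau_p$, the response $X_{pi}$ and the response time $T_{pi}$ are independent; the six models proposed in the paper relax exactly this restriction by letting residual response time enter the accuracy model.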


Frontiers in Psychology | 2017

Conditional Dependence between Response Time and Accuracy: An Overview of its Possible Sources and Directions for Distinguishing between Them

Maria Bolsinova; Jesper Tijmstra; Dylan Molenaar; Paul De Boeck

With the widespread use of computerized tests in educational measurement and cognitive psychology, registration of response times has become feasible in many applications. Considering these response times helps provide a more complete picture of the performance and characteristics of persons beyond what is available based on response accuracy alone. Statistical models such as the hierarchical model (van der Linden, 2007) have been proposed that jointly model response time and accuracy. However, these models make restrictive assumptions about the response processes (RPs) that may not be realistic in practice, such as the assumption that the association between response time and accuracy is fully explained by taking speed and ability into account (conditional independence). Assuming conditional independence forces one to ignore that many relevant individual differences may play a role in the RPs beyond overall speed and ability. In this paper, we critically consider the assumption of conditional independence and the important ways in which it may be violated in practice from a substantive perspective. We consider both conditional dependences that may arise when all persons attempt to solve the items in similar ways (homogeneous RPs) and those that may be due to persons differing in fundamental ways in how they deal with the items (heterogeneous processes). The paper provides an overview of what we can learn from observed conditional dependences. We argue that explaining and modeling these differences in the RPs is crucial to increase both the validity of measurement and our understanding of the relevant RPs.


Measurement: Interdisciplinary Research & Perspective | 2015

Can Response Speed Be Fixed Experimentally, and Does This Lead to Unconfounded Measurement of Ability?

Maria Bolsinova; Jesper Tijmstra

Goldhammer (this issue) proposes an interesting approach to dealing with the speededness of item responses. Rather than modeling speed as a latent variable that varies from person to person, he proposes to use experimental conditions that are expected to fix the speed, thereby eliminating individual differences on this dimension in order to make unconfounded comparisons of a person's ability possible. We applaud his efforts to consider the gains that can be obtained by changing the test conditions to better match the measurement aims of ability tests, rather than just considering altering the measurement model. We agree that the model provides an interesting theoretical exploration of possible conditions under which the measurement of speed and ability would not be confounded by the speed-ability compromise. However, the model is only able to achieve this unconfounded measurement of ability by imposing a number of restrictive assumptions. We believe that the merit of the approach will depend on the extent to which these assumptions are likely to be met in practice. We will discuss two main concerns: issues with the practical realizability of fixing effective speed and the consequences of fixing speed for the measurement of ability.


Educational and Psychological Measurement | 2017

Item-Score Reliability in Empirical-Data Sets and Its Relationship With Other Item Indices

Eva A. O. Zijlmans; Jesper Tijmstra; Klaas Sijtsma

Reliability is usually estimated for a total score, but it can also be estimated for item scores. Item-score reliability can be useful for assessing the repeatability of an individual item score in a group. Three methods to estimate item-score reliability are discussed, known as method MS, method λ6, and method CA. The item-score reliability methods are compared with four well-known and widely accepted item indices: the item-rest correlation, the item-factor loading, the item scalability, and the item discrimination. Realistic values for item-score reliability in empirical data sets are examined to obtain an impression of the values to be expected in other empirical data sets. The relations between the three item-score reliability methods and the four item indices are investigated. Tentatively, a minimum value for the item-score reliability methods to be used in item analysis is recommended.
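As a small illustration of one of the four reference indices listed above, the sketch below computes item-rest correlations for a binary item-score matrix; the item-score reliability methods MS, λ6, and CA themselves are not reproduced here.

```python
# Item-rest correlation: the correlation between an item score and the sum of
# the remaining item scores. One of the four reference indices mentioned in
# the abstract; the reliability methods MS, lambda_6, and CA are not shown.
import numpy as np

def item_rest_correlations(scores):
    """scores: (persons, items) matrix of item scores."""
    total = scores.sum(axis=1)
    return np.array([
        np.corrcoef(scores[:, i], total - scores[:, i])[0, 1]
        for i in range(scores.shape[1])
    ])

# Example with simulated binary (Rasch-like) item scores
rng = np.random.default_rng(0)
ability = rng.normal(size=500)
difficulty = rng.normal(size=10)
p_correct = 1.0 / (1.0 + np.exp(-(ability[:, None] - difficulty[None, :])))
scores = (rng.random((500, 10)) < p_correct).astype(int)
print(item_rest_correlations(scores).round(2))
```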


Psychometrika | 2013

Testing Manifest Monotonicity Using Order-Constrained Statistical Inference

Jesper Tijmstra; David J. Hessen; Peter G. M. van der Heijden; Klaas Sijtsma

Most dichotomous item response models share the assumption of latent monotonicity, which states that the probability of a positive response to an item is a nondecreasing function of a latent variable intended to be measured. Latent monotonicity cannot be evaluated directly, but it implies manifest monotonicity across a variety of observed scores, such as the restscore, a single item score, and in some cases the total score. In this study, we show that manifest monotonicity can be tested by means of the order-constrained statistical inference framework. We propose a procedure that uses this framework to determine whether manifest monotonicity should be rejected for specific items. This approach provides a likelihood ratio test for which the p-value can be approximated through simulation. A simulation study is presented that evaluates the Type I error rate and power of the test, and the procedure is applied to empirical data.
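A simplified descriptive check of manifest monotonicity on the restscore can be sketched as follows; this is not the order-constrained likelihood-ratio test proposed in the paper, which accounts for sampling variability instead of relying on raw sample proportions.

```python
# Descriptive check only: the proportion of positive responses to an item
# should be nondecreasing across restscore groups. Any decreases found here
# may still be due to chance, which is why a formal test is needed.
import numpy as np

def manifest_monotonicity_violations(scores, item, min_group_size=10):
    """scores: (persons, items) binary matrix. Returns the restscore groups,
    the proportion of positive responses per group, and the adjacent group
    pairs where the proportion decreases."""
    rest = scores.sum(axis=1) - scores[:, item]
    groups = [r for r in np.unique(rest) if np.sum(rest == r) >= min_group_size]
    props = np.array([scores[rest == r, item].mean() for r in groups])
    drops = [(groups[k], groups[k + 1])
             for k in range(len(props) - 1) if props[k + 1] < props[k]]
    return groups, props, drops
```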


Psychometrika | 2015

Evaluating Manifest Monotonicity Using Bayes Factors

Jesper Tijmstra; Herbert Hoijtink; Klaas Sijtsma

The assumption of latent monotonicity in item response theory models for dichotomous data cannot be evaluated directly, but observable consequences such as manifest monotonicity facilitate the assessment of latent monotonicity in real data. Standard methods for evaluating manifest monotonicity typically produce a test statistic that is geared toward falsification, which can only provide indirect support in favor of manifest monotonicity. We propose the use of Bayes factors to quantify the degree of support available in the data in favor of manifest monotonicity or against manifest monotonicity. Through the use of informative hypotheses, this procedure can also be used to determine the support for manifest monotonicity over substantively or statistically relevant alternatives to manifest monotonicity, rendering the procedure highly flexible. The performance of the procedure is evaluated using a simulation study, and the application of the procedure is illustrated using empirical data.
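One common way to obtain Bayes factors for inequality-constrained (informative) hypotheses such as manifest monotonicity is the encompassing-prior approach, sketched below; whether the paper uses exactly this computation is not claimed here, and the inputs are assumed to be draws of group-wise success probabilities from an unconstrained prior and posterior.

```python
# Encompassing-prior sketch: the Bayes factor of the order-constrained
# hypothesis (nondecreasing success probabilities over restscore groups)
# against the unconstrained model is the proportion of posterior draws
# satisfying the constraint divided by the proportion of prior draws
# satisfying it. Illustrative only; not necessarily the paper's computation.
import numpy as np

def encompassing_bayes_factor(posterior_probs, prior_probs):
    """posterior_probs, prior_probs: (draws, groups) arrays of success
    probabilities per restscore group from the unconstrained model."""
    def fraction_monotone(draws):
        return np.mean(np.all(np.diff(draws, axis=1) >= 0, axis=1))
    return fraction_monotone(posterior_probs) / fraction_monotone(prior_probs)
```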


Psychonomic Bulletin & Review | 2018

Why checking model assumptions using null hypothesis significance tests does not suffice: A plea for plausibility

Jesper Tijmstra

This article explores whether the null hypothesis significance testing (NHST) framework provides a sufficient basis for the evaluation of statistical model assumptions. It is argued that while NHST-based tests can provide some degree of confirmation for the model assumption that is evaluated—formulated as the null hypothesis—these tests do not inform us of the degree of support that the data provide for the null hypothesis and to what extent the null hypothesis should be considered to be plausible after having taken the data into account. Addressing the prior plausibility of the model assumption is unavoidable if the goal is to determine how plausible it is that the model assumption holds. Without assessing the prior plausibility of the model assumptions, it remains fully uncertain whether the model of interest gives an adequate description of the data and thus whether it can be considered valid for the application at hand. Although addressing the prior plausibility is difficult, ignoring the prior plausibility is not an option if we want to claim that the inferences of our statistical model can be relied upon.
