Maria Bolsinova
University of Amsterdam
Publications
Featured research published by Maria Bolsinova.
British Journal of Mathematical and Statistical Psychology | 2016
Maria Bolsinova; Gunter Maris
An important distinction between different models for response time and accuracy is whether conditional independence (CI) between response time and accuracy is assumed. In the present study, a test for CI given an exponential family model for accuracy (for example, the Rasch model or the one-parameter logistic model) is proposed and evaluated in a simulation study. The procedure is based on non-parametric Kolmogorov-Smirnov tests. As an illustrative example, the CI test was applied to data from an arithmetic test for secondary education.
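The abstract gives no implementation details, but the general idea of checking CI conditional on a sufficient statistic for ability can be sketched in a few lines of code. The sketch below is our illustration, not code from the paper: respondents are grouped by their rest score (sufficient for ability under a Rasch-type model once the item itself is removed), and within each group the response times of correct and incorrect responses to an item are compared with a two-sample Kolmogorov-Smirnov test. The array names and the minimum group size are arbitrary assumptions.

import numpy as np
from scipy.stats import ks_2samp

def ci_check(resp, rt, item):
    # resp, rt: (n_persons, n_items) arrays of 0/1 accuracy and response times
    rest = resp.sum(axis=1) - resp[:, item]              # rest score, excluding the item itself
    pvals = []
    for s in np.unique(rest):
        grp = rest == s
        t_correct = rt[grp & (resp[:, item] == 1), item]
        t_incorrect = rt[grp & (resp[:, item] == 0), item]
        if len(t_correct) > 5 and len(t_incorrect) > 5:  # need both outcomes in the score group
            pvals.append(ks_2samp(t_correct, t_incorrect).pvalue)
    return pvals  # many small p-values across score groups point to a CI violation

Under CI the two conditional response-time distributions should coincide within every score group, so the p-values should look roughly uniform rather than piling up near zero.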
Psychometrika | 2017
Maria Bolsinova; Paul De Boeck; Jesper Tijmstra
The assumption of conditional independence between response time and accuracy given speed and ability is commonly made in response time modelling. However, this assumption might be violated in some cases, meaning that the relationship between the response time and the response accuracy of the same item cannot be fully explained by the correlation between overall speed and ability. We propose to explicitly model the residual dependence between time and accuracy by incorporating the effects of the residual response time on the intercept and the slope parameter of the IRT model for response accuracy. We present an empirical example of a violation of conditional independence from a low-stakes educational test and show that our new model reveals interesting phenomena about the dependence of the item properties on whether the response is relatively fast or slow. For more difficult items, responding slowly is associated with a higher probability of a correct response, whereas for easier items responding slowly is associated with a lower probability of a correct response. Moreover, for many of the items slower responses were less informative about ability, because their discrimination parameters decrease with residual response time.
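The abstract describes the model verbally; in schematic form (our notation, not necessarily the paper's), the residual log response time z_{pi} of person p on item i enters both the slope and the intercept of a two-parameter accuracy model:

P(X_{pi} = 1 \mid \theta_p, z_{pi}) = \frac{\exp\{(a_i + a_i^{*} z_{pi})\,\theta_p + b_i + b_i^{*} z_{pi}\}}{1 + \exp\{(a_i + a_i^{*} z_{pi})\,\theta_p + b_i + b_i^{*} z_{pi}\}},

where a_i^{*} and b_i^{*} capture how responding relatively slowly or quickly changes the item's discrimination and easiness; setting a_i^{*} = b_i^{*} = 0 recovers conditional independence.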
Journal of Educational and Behavioral Statistics | 2016
Maria Bolsinova; Jesper Tijmstra
Conditional independence (CI) between response time and response accuracy is a fundamental assumption of many joint models for time and accuracy used in educational measurement. In this study, posterior predictive checks (PPCs) are proposed for testing this assumption. These PPCs are based on three discrepancy measures reflecting different observable consequences of different types of violations of CI. Simulation studies are performed to evaluate the specificity of the procedure, its robustness, and its sensitivity to different types of conditional dependence, and to compare it with existing methods. The new procedure outperforms the existing methods in most of the simulation conditions. The use of the PPCs is illustrated using arithmetic test data.
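The three discrepancy measures are not spelled out in the abstract, so the sketch below only illustrates the generic PPC logic with one plausible discrepancy, the item-level correlation between accuracy and log response time; the function names, the discrepancy, and the simulate_rep interface are our assumptions, not the paper's.

import numpy as np

def item_rt_accuracy_corr(resp, rt):
    # discrepancy: per-item correlation between accuracy (0/1) and log response time
    logt = np.log(rt)
    return np.array([np.corrcoef(resp[:, i], logt[:, i])[0, 1] for i in range(resp.shape[1])])

def ppc_pvalues(posterior_draws, resp, rt, simulate_rep):
    # simulate_rep(draw) must return replicated (resp_rep, rt_rep) generated under the fitted model
    observed = item_rt_accuracy_corr(resp, rt)
    exceed = np.zeros_like(observed)
    for draw in posterior_draws:
        resp_rep, rt_rep = simulate_rep(draw)
        exceed += (item_rt_accuracy_corr(resp_rep, rt_rep) >= observed)
    return exceed / len(posterior_draws)  # extreme per-item p-values flag conditional dependence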
British Journal of Mathematical and Statistical Psychology | 2017
Maria Bolsinova; Jesper Tijmstra; Dylan Molenaar
It is becoming more feasible and common to register response times in the application of psychometric tests. Researchers thus have the opportunity to jointly model response accuracy and response time, which provides users with more relevant information. The most common choice is to use the hierarchical model (van der Linden, 2007, Psychometrika, 72, 287), which assumes conditional independence between response time and accuracy, given a person's speed and ability. However, this assumption may be violated in practice if, for example, persons vary their speed or differ in their response strategies, leading to conditional dependence between response time and accuracy and confounding measurement. We propose six nested hierarchical models for response time and accuracy that allow for conditional dependence, and discuss their relationship to existing models. Unlike existing approaches, the proposed hierarchical models allow for various forms of conditional dependence and allow the effect of continuous residual response time on response accuracy to be item-specific, person-specific, or both. Estimation procedures for the models are proposed, as well as two information criteria that can be used for model selection. Parameter recovery and the usefulness of the information criteria are investigated using simulation, indicating that the procedure works well and is likely to select the appropriate model. Two empirical applications are discussed to illustrate the different types of conditional dependence that may occur in practice and how these can be captured using the proposed hierarchical models.
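For reference, the baseline hierarchical model being extended (van der Linden, 2007) combines an item response model for accuracy with a lognormal model for response time, linked only through the joint distribution of ability and speed. Schematically (the original uses a normal-ogive accuracy model; a logistic version is shown here for simplicity):

P(X_{pi} = 1 \mid \theta_p) = \frac{\exp\{a_i(\theta_p - b_i)\}}{1 + \exp\{a_i(\theta_p - b_i)\}}, \qquad \ln T_{pi} \mid \tau_p \sim N(\beta_i - \tau_p, \sigma_i^2), \qquad (\theta_p, \tau_p) \sim N_2(\mu, \Sigma).

The six proposed models relax this structure by letting residual response time affect the accuracy part of the model in item-specific and/or person-specific ways.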
Frontiers in Psychology | 2017
Maria Bolsinova; Jesper Tijmstra; Dylan Molenaar; Paul De Boeck
With the widespread use of computerized tests in educational measurement and cognitive psychology, registration of response times has become feasible in many applications. Considering these response times helps provide a more complete picture of the performance and characteristics of persons beyond what is available based on response accuracy alone. Statistical models such as the hierarchical model (van der Linden, 2007) have been proposed that jointly model response time and accuracy. However, these models make restrictive assumptions about the response processes (RPs) that may not be realistic in practice, such as the assumption that the association between response time and accuracy is fully explained by taking speed and ability into account (conditional independence). Assuming conditional independence forces one to ignore that many relevant individual differences may play a role in the RPs beyond overall speed and ability. In this paper, we critically consider the assumption of conditional independence and the important ways in which it may be violated in practice from a substantive perspective. We consider both conditional dependences that may arise when all persons attempt to solve the items in similar ways (homogeneous RPs) and those that may be due to persons differing in fundamental ways in how they deal with the items (heterogeneous processes). The paper provides an overview of what we can learn from observed conditional dependences. We argue that explaining and modeling these differences in the RPs is crucial to increase both the validity of measurement and our understanding of the relevant RPs.
Measurement: Interdisciplinary Research & Perspective | 2015
Maria Bolsinova; Jesper Tijmstra
Goldhammer (this issue) proposes an interesting approach to dealing with the speededness of item responses. Rather than modeling speed as a latent variable that varies from person to person, he proposes to use experimental conditions that are expected to fix the speed, thereby eliminating individual differences on this dimension in order to make unconfounded comparisons of a person's ability possible. We applaud his efforts to consider the gains that can be obtained by changing the test conditions to better match the measurement aims of ability tests, rather than merely altering the measurement model. We agree that the model provides an interesting theoretical exploration of possible conditions under which the measurement of speed and ability would not be confounded by the speed-ability compromise. However, the model is only able to achieve this unconfounded measurement of ability by imposing a number of restrictive assumptions. We believe that the merit of the approach will depend on the extent to which these assumptions are likely to be met in practice. We will discuss two main concerns: issues with the practical realizability of fixing effective speed, and the consequences of fixing speed for the measurement of ability.
British Journal of Mathematical and Statistical Psychology | 2018
Dylan Molenaar; Maria Bolsinova; Jeroen K. Vermunt
In item response theory, modelling the item response times in addition to the item responses may improve the detection of possible between- and within-subject differences in the process that resulted in the responses. For instance, if respondents rely on rapid guessing on some items but not on all, the joint distribution of the responses and response times will be a multivariate within-subject mixture distribution. Suitable parametric methods to detect these within-subject differences have been proposed. In these approaches, a distribution needs to be assumed for the within-class response times. In this paper, it is demonstrated that these parametric within-subject approaches may produce false positives and biased parameter estimates if the assumption concerning the response time distribution is violated. A semi-parametric approach is proposed which relies on categorized response times. This approach is shown to produce hardly any false positives or parameter bias. In addition, the semi-parametric approach results in approximately the same power as the parametric approach.
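A minimal sketch of the categorization step behind the semi-parametric approach: response times are replaced by ordinal categories (here, within-item quantile bins), so that no parametric distribution has to be assumed for the within-class response times. The number of bins and the use of quantile-based cut points are illustrative choices on our part, not necessarily those of the paper.

import numpy as np

def categorize_rt(rt, n_bins=5):
    # rt: (n_persons, n_items) response times -> integer categories 0 .. n_bins-1 per item
    cats = np.empty_like(rt, dtype=int)
    for i in range(rt.shape[1]):
        edges = np.quantile(rt[:, i], np.linspace(0, 1, n_bins + 1)[1:-1])  # interior cut points
        cats[:, i] = np.digitize(rt[:, i], edges)
    return cats  # these categories, together with the responses, enter the within-subject mixture model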
Psychological Methods | 2017
Maria Bolsinova; Herbert Hoijtink; Jorine Adinda Vermeulen; Anton Béguin
Linking and equating procedures are used to make the results of different test forms comparable. In cases where no assumption of randomly equivalent groups can be made, some form of linking design is used. In practice, the amount of data available to link the two tests is often very limited for logistical and security reasons, which affects the precision of the linking procedures. This study proposes to enhance the quality of linking procedures based on sparse data by using Bayesian methods which combine the information in the linking data with background information captured in informative prior distributions. We propose two methods for eliciting prior knowledge about the difference in difficulty of two tests from subject-matter experts and explain how these results can be used in the specification of priors. To illustrate the proposed methods and evaluate the quality of linking with and without informative priors, an empirical example of linking primary school mathematics tests is presented. The results suggest that informative priors can increase the precision of linking without decreasing its accuracy.
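As a toy illustration of why an elicited prior helps with sparse linking data, consider a normal prior on the difficulty shift between two forms combined with a noisy data-based estimate via a standard normal-normal update; the numbers, the parameterization, and the reduction of the linking problem to a single shift parameter are simplifying assumptions for this sketch, not the paper's method.

def posterior_shift(prior_mean, prior_sd, data_est, data_se):
    # conjugate normal-normal update for a single difficulty-shift parameter
    prior_prec, data_prec = 1 / prior_sd**2, 1 / data_se**2
    post_var = 1 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * data_est)
    return post_mean, post_var**0.5

# Experts expect form B to be 0.2 logits harder (sd 0.15); sparse linking data suggest 0.5 (se 0.4).
print(posterior_shift(0.2, 0.15, 0.5, 0.4))  # the posterior shrinks the noisy estimate toward the prior

The posterior standard deviation is smaller than that of either source alone, which is the sense in which informative priors increase the precision of the link.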
The Annals of Applied Statistics | 2016
Maria Bolsinova; Gunter Maris; Herbert Hoijtink
One of the important questions in the practice of educational testing is how a particular test should be scored. In this paper we consider what an appropriate simple scoring rule should be for the Dutch as a second language test, consisting of listening and reading items. As in many other applications, the Rasch model, which would allow the test to be scored with a simple sumscore, is here too restrictive to adequately represent the data. In this study we propose an exploratory algorithm which clusters the items into subscales, each fitting a Rasch model, and thus provides a scoring rule based on the observed data. The scoring rule produces either a weighted sumscore based on equal weights within each subscale or a set of sumscores (one for each of the subscales). An MCMC algorithm which makes it possible to determine the number of Rasch scales constituting the test and to unmix these scales is introduced and evaluated in simulations. Using the results of the unmixing, we conclude that the Dutch language test can be scored with a weighted sumscore with three different weights.
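In schematic form (our notation), the resulting scoring rule partitions the items into Rasch subscales C_1, ..., C_K and reports either the vector of subscale sumscores or a single weighted sumscore with one weight per subscale:

S_p^{(k)} = \sum_{i \in C_k} x_{pi}, \qquad S_p = \sum_{k=1}^{K} w_k \sum_{i \in C_k} x_{pi},

which for the Dutch language test amounts to a weighted sumscore with three distinct weights.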
Journal of Traffic and Transportation Engineering | 2015
Erik Roelofs; Maria Bolsinova; Angela Verschoor; Jan Vissers
In line with changed views on driver training and driving instructor preparation, a competence-based instructor exam was introduced in the Netherlands. The exam consists of two parts: (1) multimedia theory tests; (2) a performance lesson for driving instruction and coaching. An implicit idea behind the redesigned exam is that it can have a positive backwash effect on the quality of driving instructor preparation programs. This study aims to evaluate the reliability, validity and fairness of the theory tests, which appear in different versions for successive groups of PDIs (prospective driving instructors). Data from 4,741 PDIs, enrolled between January 2010 and October 2012, were used for the analysis. The results of the psychometric analyses show that the theory tests yielded reliable and fair decisions about instructor certification. The predictive validity of the theory tests for the final performance assessment was low. Implications for the design and on-the-fly maintenance of exam item banks are discussed. Follow-up studies will focus on the question of whether the improved instructor exam ultimately produces safer drivers.