
Publication


Featured research published by Wim J. van der Linden.


Applied Psychological Measurement | 1995

IRT-based internal measures of differential functioning of items and tests

Nambury S. Raju; Wim J. van der Linden; Paul F. Fleer

Internal measures of differential functioning of items and tests (DFIT) based on item response theory (IRT) are proposed. Within the DFIT context, the new differential test functioning (DTF) index leads to two new measures of differential item functioning (DIF) with the following properties: (1) the compensatory DIF (CDIF) indexes for all items in a test sum to the DTF index for that test and, unlike current DIF procedures, the CDIF index for an item does not assume that the other items in the test are unbiased; (2) the noncompensatory DIF (NCDIF) index, which assumes that the other items in the test are unbiased, is comparable to some of the IRT-based DIF indexes; and (3) CDIF and NCDIF, as well as DTF, are equally valid for polytomous and multidimensional IRT models. Monte Carlo study results, comparing these indexes with Lord's χ² test, the signed area measure, and the unsigned area measure, demonstrate that the DFIT framework is accurate in assessing DTF, CDIF, and NCDIF.
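Property (1), the additivity of CDIF to DTF, follows directly from the definitions DTF = E[D²] and CDIF_i = E[d_i·D], where d_i is the focal-minus-reference difference in item response functions and D = Σ d_i. The following is a minimal numerical sketch, assuming 2PL items with invented parameters (not taken from the paper), that illustrates this property:

```python
import math
import random

def p2pl(theta, a, b):
    """2PL item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical (a, b) parameters for the reference and focal groups;
# only item 1 differs between groups, i.e., only item 1 shows DIF.
ref = [(1.2, 0.0), (0.8, -0.5), (1.5, 0.7)]
foc = [(1.2, 0.3), (0.8, -0.5), (1.5, 0.7)]

random.seed(1)
thetas = [random.gauss(0, 1) for _ in range(2000)]  # focal-group abilities

# d_i(theta): focal minus reference response function for item i.
d = [[p2pl(t, *foc[i]) - p2pl(t, *ref[i]) for t in thetas] for i in range(3)]
D = [sum(d[i][s] for i in range(3)) for s in range(len(thetas))]

n = len(thetas)
dtf = sum(x * x for x in D) / n                        # DTF     = E[D^2]
cdif = [sum(di[s] * D[s] for s in range(n)) / n        # CDIF_i  = E[d_i * D]
        for di in d]
ncdif = [sum(x * x for x in di) / n for di in d]       # NCDIF_i = E[d_i^2]

assert abs(sum(cdif) - dtf) < 1e-9  # CDIF indexes sum to the DTF index
```

Note that CDIF needs no unbiasedness assumption about the other items, whereas NCDIF is zero here for items 2 and 3 only because their parameters are identical across groups.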


Applied Psychological Measurement | 1998

Optimal Assembly of Psychological and Educational Tests

Wim J. van der Linden

The advent of computers in psychological and educational measurement has led to the need for algorithms for optimal assembly of tests from item banks. This paper reviews the optimal test assembly literature and introduces the contributions to this Special Issue. Four approaches to computerized test assembly are discussed: heuristic-based test assembly, 0-1 linear programming, network-flow programming, and an optimal design approach. Applications of these methods to a variety of problems are examined, including IRT-based test assembly, classical test assembly, assembling multiple test forms, item matching, observed-score equating, constrained adaptive testing, assembling tests with item sets, item bank design, and assembling tests with multiple traits. A bibliography on optimal test assembly is provided.
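In the 0-1 programming view, each item receives a binary decision variable, the objective maximizes test information, and the content specifications become linear constraints. The toy sketch below, with a hypothetical six-item bank and invented information values, solves the 0-1 problem by exhaustive search rather than with a real solver:

```python
from itertools import combinations

# Hypothetical item bank: (item id, content area, information at theta = 0).
bank = [(1, "algebra", 0.52), (2, "algebra", 0.31), (3, "geometry", 0.47),
        (4, "geometry", 0.28), (5, "reading", 0.44), (6, "reading", 0.35)]

def assemble(bank, length=3):
    """Exhaustive 0-1 search: maximize total information subject to the
    content constraint that all three areas are covered (toy version)."""
    best, best_info = None, -1.0
    for form in combinations(bank, length):
        if len({item[1] for item in form}) < 3:  # constraint violated
            continue
        info = sum(item[2] for item in form)
        if info > best_info:
            best, best_info = form, info
    return best, best_info

form, info = assemble(bank)
print([i[0] for i in form], round(info, 2))  # → [1, 3, 5] 1.43
```

Real applications replace the enumeration with a mixed-integer solver, since banks with hundreds of items make brute force infeasible.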


Computer adaptive testing: Theory and practice | 2000

Item selection and ability estimation in adaptive testing

Wim J. van der Linden; Peter J. Pashley

The last century saw a tremendous progression in the refinement and use of standardized linear tests. The College Board administered its first exam in 1901, and the first Scholastic Assessment Test (SAT) was given in 1926. Since then, progressively more sophisticated standardized linear tests have been developed for a multitude of assessment purposes, such as college placement, professional licensure, higher-education admissions, and tracking educational standing or progress. Standardized linear tests are now administered around the world. For example, the Test of English as a Foreign Language (TOEFL) has been delivered in approximately 88 countries.


Psychometrika | 2009

Multidimensional Adaptive Testing with Optimal Design Criteria for Item Selection

Joris Mulder; Wim J. van der Linden

Several criteria from the optimal design literature are examined for use with item selection in multidimensional adaptive testing. In particular, it is examined which criteria are appropriate for adaptive testing in which all abilities are intentional, some should be considered a nuisance, or the interest is in the testing of a composite of the abilities. Both the theoretical analyses and the studies of simulated data in this paper suggest that the criteria of A-optimality and D-optimality lead to the most accurate estimates when all abilities are intentional, with the former slightly outperforming the latter. The criterion of E-optimality showed occasional erratic behavior for this case of adaptive testing, and its use is not recommended. If some of the abilities are nuisances, application of the criterion of As-optimality (or Ds-optimality), which focuses on the subset of intentional abilities, is recommended. For the measurement of a linear combination of abilities, the criterion of c-optimality yielded the best results. The preferences of each of these criteria for items with specific patterns of parameter values were also assessed. It was found that the criteria differed mainly in their preferences for items with different patterns of values for their discrimination parameters.
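These criteria are all functions of the (posterior) information matrix M: A-optimality minimizes tr(M⁻¹), D-optimality maximizes det(M), and E-optimality maximizes the smallest eigenvalue. A small sketch with two hypothetical 2×2 information matrices (invented values for a two-dimensional ability vector, not from the paper) shows how a selection rule would compare candidate items:

```python
import math

def criteria(M):
    """A-, D-, and E-optimality values for a symmetric 2x2 information matrix.
    A: trace of the inverse (minimize); D: determinant (maximize);
    E: smallest eigenvalue (maximize)."""
    (a, b), (_, d) = M
    det = a * d - b * b
    A = (a + d) / det                                   # tr(M^-1) for 2x2
    mean = (a + d) / 2
    half = math.sqrt(((a - d) / 2) ** 2 + b * b)        # symmetric 2x2 eigenvalues
    E = mean - half
    return A, det, E

# Hypothetical information matrices after adding each of two candidate items.
M1 = [[2.0, 0.3], [0.3, 1.5]]
M2 = [[1.8, 0.1], [0.1, 1.8]]

A1, D1, E1 = criteria(M1)
A2, D2, E2 = criteria(M2)
print(round(A1, 3), round(D1, 3), round(E1, 3))
print(round(A2, 3), round(D2, 3), round(E2, 3))
```

In this made-up comparison, the more balanced matrix M2 wins on all three criteria; in practice the criteria can disagree, which is exactly what the paper's simulations investigate.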


Psychometrika | 2003

USING RESPONSE TIMES TO DETECT ABERRANT RESPONSES IN COMPUTERIZED ADAPTIVE TESTING

Wim J. van der Linden; Edith M. L. A. van Krimpen-Stoop

A lognormal model for response times is used to check response times for aberrances in examinee behavior on computerized adaptive tests. Both classical procedures and Bayesian posterior predictive checks are presented. For a fixed examinee, responses and response times are independent; checks based on response times thus offer information independent of the results of checks on response patterns. Empirical examples of the use of classical and Bayesian checks for detecting two different types of aberrances in response times are presented. The detection rates for the Bayesian checks outperformed those for the classical checks, but at the cost of higher false-alarm rates. A guideline for the choice between the two types of checks is offered.
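Under the lognormal model, the log response time on an item is normal with mean β − τ (item time intensity minus examinee speed) and standard deviation 1/α, so a classical check reduces to a standardized residual. A minimal sketch with invented parameter values (the symbols follow the common parameterization of this model, not necessarily the paper's notation):

```python
import math

def rt_check(log_time, alpha, beta, tau, z_crit=1.96):
    """Classical residual check under the lognormal response-time model:
    ln T ~ N(beta - tau, alpha**-2). Returns (z, flagged)."""
    z = alpha * (log_time - (beta - tau))
    return z, abs(z) > z_crit

# Hypothetical parameters: discrimination alpha, time intensity beta,
# examinee speed tau, implying a typical log-time of beta - tau = 2.3 (~10 s).
alpha, beta, tau = 2.0, 2.8, 0.5

z1, flag1 = rt_check(math.log(10.0), alpha, beta, tau)  # ~10 s: plausible
z2, flag2 = rt_check(math.log(2.0), alpha, beta, tau)   # ~2 s: suspiciously fast
print(round(z1, 2), flag1, round(z2, 2), flag2)
```

A Bayesian posterior predictive check would replace the point estimates of α, β, and τ with draws from their posterior distributions.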


Applied Psychological Measurement | 2010

IRT parameter estimation with response times as collateral information

Wim J. van der Linden; Rinke Klein Entink; Jean-Paul Fox

Hierarchical modeling of responses and response times on test items facilitates the use of response times as collateral information in the estimation of the response parameters. In addition to the regular information in the response data, two sources of collateral information are identified: (a) the joint information in the responses and the response times summarized in the estimates of the second-level parameters and (b) the information in the posterior distribution of the response parameters given the response times. The latter is shown to be a natural empirical prior distribution for the estimation of the response parameters. Unlike traditional hierarchical item response theory (IRT) modeling, where the gain in estimation accuracy is typically paid for by an increase in bias, use of this posterior predictive distribution improves the accuracy and reduces the bias of IRT parameter estimates. In an empirical study, the improvements are demonstrated for the estimation of the person and item parameters in a three-parameter response model.


European Journal of Operational Research | 1991

Achievement test construction using 0-1 linear programming

Jos J. Adema; Ellen Boekkooi-Timminga; Wim J. van der Linden

In educational testing, the work of professional test agencies has shown a trend towards item banking. Achievement test construction is viewed as selecting items from a test item bank such that certain specifications are met. As the number of possible tests is large and practice usually imposes various constraints on the selection process, a mathematical programming approach suggests itself. In this paper it is shown how to formulate achievement test construction as a 0-1 linear programming problem. A heuristic for solving the problem is proposed and two examples are given. It is concluded that a 0-1 linear programming approach fits the problem of test construction in an appropriate way and offers test agencies the possibility of computerizing their services.
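The paper's heuristic is not spelled out in this abstract, but a typical greedy heuristic for this class of problems takes items in decreasing order of their objective-function contribution while respecting content quotas. A hypothetical sketch with an invented bank and quotas (not the authors' algorithm):

```python
# Hypothetical bank: (item id, content area, information); one item per area.
bank = [(1, "A", 0.60), (2, "A", 0.50), (3, "B", 0.55),
        (4, "B", 0.20), (5, "C", 0.40)]
quota = {"A": 1, "B": 1, "C": 1}

def greedy_assemble(bank, quota):
    """Greedy heuristic: scan items in decreasing information order,
    skipping any item whose content-area quota is already filled."""
    form, used = [], {area: 0 for area in quota}
    for item in sorted(bank, key=lambda it: -it[2]):
        _id, area, _info = item
        if used[area] < quota[area]:
            form.append(item)
            used[area] += 1
    return form

print([i[0] for i in greedy_assemble(bank, quota)])  # → [1, 3, 5]
```

Greedy selection is fast but can be suboptimal when constraints interact; the 0-1 formulation makes the optimal solution well-defined even when a heuristic is used to approximate it.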


Journal of Educational and Behavioral Statistics | 2006

Assembling a computerized adaptive testing item pool as a set of linear tests

Wim J. van der Linden; Adelaide Ariel; Bernard P. Veldkamp

Test-item writing efforts typically result in item pools with an undesirable correlational structure between the content attributes of the items and their statistical information. If such pools are used in computerized adaptive testing (CAT), the algorithm may be forced to select items with less than optimal information, that violate the content constraints, and/or have unfavorable exposure rates. Although at first sight somewhat counterintuitive, it is shown that if the CAT pool is assembled as a set of linear test forms, undesirable correlations can be broken down effectively. It is proposed to assemble such pools using a mixed integer programming model with constraints that guarantee that each test meets all content specifications and an objective function that requires them to have maximal information at a well-chosen set of ability values. An empirical example with a previous master pool from the Law School Admission Test (LSAT) yielded a CAT with nearly uniform bias and mean-squared error functions for the ability estimator and item-exposure rates that satisfied the target for all items in the pool.
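The objective "maximal information at a well-chosen set of ability values" can be made concrete with the 2PL item information function I(θ) = a²P(θ)(1 − P(θ)), summed over a form's items at each target θ. A sketch with invented item parameters (the actual LSAT pool and model are not reproduced here):

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Hypothetical three-item form, (a, b) per item, and target ability values.
form = [(1.0, -1.0), (1.2, 0.0), (0.9, 1.0)]
targets = [-1.0, 0.0, 1.0]

# Objective value for this form: information summed over the target thetas.
objective = sum(info_2pl(t, a, b) for t in targets for a, b in form)
print(round(objective, 3))
```

In the mixed-integer model, this sum becomes a linear function of the 0-1 item-selection variables, since each item's information at each target θ is a precomputed coefficient.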


Zeitschrift Fur Psychologie-journal of Psychology | 2008

Some New Developments in Adaptive Testing Technology

Wim J. van der Linden

In an ironic twist of history, modern psychological testing has returned to an adaptive format quite common when testing was not yet standardized. Important stimuli to the renewed interest in adaptive testing have been the development of item-response theory in psychometrics, which models the responses on test items using separate parameters for the items and test takers, and the use of computers in test administration, which enables us to estimate the parameter for a test taker and select the items in real time. This article reviews a selection from the latest developments in the technology of adaptive testing, such as constrained adaptive item selection, adaptive testing using rule-based item generation, multidimensional adaptive testing, adaptive use of test batteries, and the use of response times in adaptive testing.


Applied Psychological Measurement | 2006

Detecting Answer Copying Using the Kappa Statistic

Leonardo S. Sotaridona; Wim J. van der Linden; Rob R. Meijer

A statistical test for detecting answer copying on multiple-choice tests based on Cohen's kappa is proposed. The test is free of any assumptions on the response processes of the examinees suspected of copying and having served as the source, except for the usual assumption that these processes are probabilistic. Because the asymptotic null and alternative distributions of the kappa statistic are derived under the assumption of common marginal probabilities for all items, a recoding of the item alternatives is proposed to approximate this case. The results from a simulation study in this article show that under this recoding, the test approximates its nominal Type I error rates and has promising power functions.
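For reference, Cohen's kappa compares the observed agreement between the suspected copier's and the source's answer vectors with the agreement expected by chance from the marginal option frequencies. A minimal sketch with invented answer vectors (the paper's actual contribution, the recoding and the asymptotic distributions, is not reproduced here):

```python
def cohens_kappa(pairs, options):
    """Cohen's kappa for agreement between two response vectors.
    pairs: list of (source_answer, copier_answer), one tuple per item."""
    n = len(pairs)
    p_obs = sum(1 for s, c in pairs if s == c) / n      # observed agreement
    p_chance = 0.0                                      # chance agreement
    for opt in options:
        p_src = sum(1 for s, _ in pairs if s == opt) / n
        p_cop = sum(1 for _, c in pairs if c == opt) / n
        p_chance += p_src * p_cop
    return (p_obs - p_chance) / (1 - p_chance)

# Hypothetical answer vectors on a six-item multiple-choice test.
pairs = [("A", "A"), ("B", "B"), ("C", "C"),
         ("D", "A"), ("A", "A"), ("B", "C")]
print(round(cohens_kappa(pairs, "ABCD"), 3))  # → 0.538
```

A detection test would then compare the observed kappa for a suspected pair against its null distribution under independent responding.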

Collaboration


Dive into Wim J. van der Linden's collaboration.
