Marilyn S. Wingersky
Princeton University
Publications
Featured research published by Marilyn S. Wingersky.
Applied Psychological Measurement | 1984
Marilyn S. Wingersky; Frederic M. Lord
The sampling errors of maximum likelihood estimates of item response theory parameters are studied in the case when both person and item parameters are estimated simultaneously. A check on the validity of the standard error formulas is carried out. The effect of varying sample size, test length, and the shape of the ability distribution is investigated. Finally, the effect of anchor-test length on the standard error of item parameters is studied numerically for the situation, common in equating studies, when two groups of examinees each take a different test form together with the same anchor test. The results encourage the use of rectangular or bimodal ability distributions, and also the use of very short anchor tests.
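As a point of reference for the quantities discussed above, here is a minimal sketch of the three-parameter logistic (3PL) response function and an information-based asymptotic standard error for a maximum likelihood ability estimate, the simplest case of the standard errors the paper studies. The item parameters are hypothetical, and the 3PL form with D = 1.7 scaling is a standard convention rather than a detail taken from the paper.

```python
import numpy as np

def p_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

def item_information(theta, a, b, c, D=1.7):
    """Fisher information contributed by one 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c, D)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

# Hypothetical item parameters for a short test (illustration only).
a = np.array([1.0, 1.2, 0.8, 1.5])   # discriminations
b = np.array([-0.5, 0.0, 0.5, 1.0])  # difficulties
c = np.array([0.2, 0.2, 0.25, 0.2])  # lower asymptotes ("guessing")

theta = 0.3
test_info = item_information(theta, a, b, c).sum()
se_theta = 1.0 / np.sqrt(test_info)  # asymptotic SE of the ML ability estimate
print(f"Test information at theta={theta}: {test_info:.3f}, SE(theta) = {se_theta:.3f}")
```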
Applied Psychological Measurement | 1994
Rebecca Zwick; Dorothy T. Thayer; Marilyn S. Wingersky
Simulated data were used to investigate the performance of modified versions of the Mantel-Haenszel method of differential item functioning (DIF) analysis in computerized adaptive tests (CATs). Each simulated examinee received 25 items from a 75-item pool. A three-parameter logistic item response theory (IRT) model was assumed, and examinees were matched on expected true scores based on their CAT responses and estimated item parameters. The CAT-based DIF statistics were found to be highly correlated with DIF statistics based on nonadaptive administration of all 75 pool items and with the true magnitudes of DIF in the simulation. Average DIF statistics and average standard errors also were examined for items with various characteristics. Finally, a study was conducted of the accuracy with which the modified Mantel-Haenszel procedure could identify CAT items with substantial DIF using a classification system now implemented by some testing programs. These additional analyses provided further evidence that the CAT-based DIF procedures performed well. More generally, the results supported the use of IRT-based matching variables in DIF analysis. Index terms: adaptive testing, computerized adaptive testing, differential item functioning, item bias, item response theory.
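For readers unfamiliar with the Mantel-Haenszel machinery the abstract refers to, here is a minimal sketch of the common odds ratio across matched score strata and the MH D-DIF delta metric used in ETS-style classification. The counts are hypothetical, and the CAT-specific modification studied in the paper (matching on expected true scores from CAT responses) is not reproduced here.

```python
import numpy as np

def mantel_haenszel_d_dif(tables):
    """Mantel-Haenszel common odds ratio and MH D-DIF (delta metric).

    `tables` holds one 2x2 table of counts per matching stratum:
    ((A, B), (C, D)) = ((ref_right, ref_wrong), (focal_right, focal_wrong)).
    """
    num = sum(A * D / (A + B + C + D) for (A, B), (C, D) in tables)
    den = sum(B * C / (A + B + C + D) for (A, B), (C, D) in tables)
    alpha_mh = num / den
    # Negative MH D-DIF indicates DIF against the focal group.
    return alpha_mh, -2.35 * np.log(alpha_mh)

# Hypothetical counts in three score strata (illustration only).
tables = [((40, 10), (35, 15)),
          ((30, 20), (25, 25)),
          ((20, 30), (15, 35))]
alpha, d_dif = mantel_haenszel_d_dif(tables)
print(f"alpha_MH = {alpha:.3f}, MH D-DIF = {d_dif:.3f}")
```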
Applied Psychological Measurement | 1982
Isaac I. Bejar; Marilyn S. Wingersky
This paper reports a feasibility study of item response theory (IRT) as a means of equating the Test of Standard Written English (TSWE). The study focused on the possibility of pre-equating, that is, deriving the equating transformation prior to the final administration of the test. The three-parameter logistic model was postulated as the response model, and its fit was assessed at the item, subscore, and total-score levels. Minor problems were found at each of these levels, but on the whole the three-parameter model was found to portray the data well. The adequacy of the equating provided by IRT procedures was investigated in two TSWE forms. It was concluded that pre-equating does not appear to present problems beyond those inherent to IRT equating.
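The abstract does not spell out which IRT equating procedure was used; the sketch below illustrates one common choice, IRT true-score equating under the 3PL, in which a true score on form X is mapped through a common ability value to the form Y scale. The item parameters are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq

def tcc(theta, a, b, c, D=1.7):
    """Test characteristic curve: expected number-right score under the 3PL."""
    return np.sum(c + (1 - c) / (1 + np.exp(-D * a * (theta - b))))

def true_score_equate(score_x, params_x, params_y):
    """Map a true score on form X to the form Y scale via a common theta.

    `score_x` must lie between sum(c) and the number of items on form X,
    the attainable range of the test characteristic curve.
    """
    theta = brentq(lambda t: tcc(t, *params_x) - score_x, -6.0, 6.0)
    return tcc(theta, *params_y)

# Hypothetical (a, b, c) parameters for two short forms (illustration only).
form_x = (np.array([1.0, 1.2, 0.9]), np.array([-0.3, 0.2, 0.8]), np.array([0.2, 0.2, 0.2]))
form_y = (np.array([0.8, 1.1, 1.3]), np.array([-0.1, 0.4, 0.6]), np.array([0.2, 0.2, 0.2]))
print(f"X true score 2.0 equates to Y true score {true_score_equate(2.0, form_x, form_y):.3f}")
```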
ETS Research Report Series | 1993
Rebecca Zwick; Dorothy T. Thayer; Marilyn S. Wingersky
Simulated data were used to investigate the performance of modified versions of the Mantel-Haenszel and standardization methods of differential item functioning (DIF) analysis in computer-adaptive tests (CATs). Each “examinee” received 25 items out of a 75-item pool. A three-parameter logistic item response model was assumed, and examinees were matched on expected true scores based on their CAT responses and on estimated item parameters. Both DIF methods performed well. The CAT-based DIF statistics were highly correlated with DIF statistics based on nonadaptive administration of all 75 pool items and with the true magnitudes of DIF in the simulation. DIF methods were also investigated for “pretest items,” for which item parameter estimates were assumed to be unavailable. The pretest DIF statistics were generally well-behaved and also had high correlations with the true DIF. The pretest DIF measures, however, tended to be slightly smaller in magnitude than their CAT-based counterparts. Also, in the case of the Mantel-Haenszel approach, the pretest DIF statistics tended to have somewhat larger standard errors than the CAT DIF statistics.
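To complement the Mantel-Haenszel sketch above, here is a minimal sketch of the standardization method's P-DIF index: a weighted mean difference in proportion correct between focal and reference groups across matched score strata. Weighting by focal-group counts is one common convention, not a detail confirmed by the abstract, and the per-stratum values are hypothetical.

```python
import numpy as np

def std_p_dif(p_focal, p_ref, n_focal):
    """Standardization P-DIF: focal-minus-reference difference in proportion
    correct, averaged over matched score strata with focal-group weights."""
    w = np.asarray(n_focal, dtype=float)
    diff = np.asarray(p_focal, dtype=float) - np.asarray(p_ref, dtype=float)
    return np.sum(w * diff) / w.sum()

# Hypothetical per-stratum proportions correct and focal counts (illustration only).
print(std_p_dif(p_focal=[0.55, 0.65, 0.80],
                p_ref=[0.60, 0.72, 0.85],
                n_focal=[50, 80, 40]))
```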
Educational and Psychological Measurement | 1969
Marilyn S. Wingersky; Diana M. Lees; Virginia Lennon; Frederic M. Lord
The program takes a frequency distribution of number-right test scores and produces (1) an estimated distribution of true scores for the group tested, computed on the assumption that the errors of measurement follow a certain compound binomial distribution, (2) the corresponding smoothed distribution of actual scores, and (3) a chi-square statistic for comparing the smoothed and actual distributions. All instructions and background information necessary for practical use of the program are given.
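As a rough illustration of the smoothing step described above, the sketch below computes the observed-score distribution implied by a discrete true-score distribution and a Pearson chi-square against observed frequencies. It simplifies the error model to a plain binomial rather than the compound binomial the program actually assumes, and all data are hypothetical.

```python
import numpy as np
from scipy.stats import binom

def smoothed_score_dist(n_items, true_props, weights):
    """Observed number-right distribution implied by a discrete true-score
    distribution under a simple binomial error model (a simplification of
    the compound binomial model the program assumes)."""
    scores = np.arange(n_items + 1)
    return sum(w * binom.pmf(scores, n_items, p)
               for p, w in zip(true_props, weights))

def chi_square(observed_freqs, fitted_probs):
    """Pearson chi-square comparing observed and smoothed distributions."""
    expected = fitted_probs * observed_freqs.sum()
    return np.sum((observed_freqs - expected) ** 2 / expected)

# Hypothetical data: 10-item test, observed frequencies of scores 0..10.
obs = np.array([2, 5, 9, 14, 20, 22, 18, 12, 7, 4, 2], dtype=float)
fitted = smoothed_score_dist(10, true_props=[0.3, 0.5, 0.7], weights=[0.3, 0.4, 0.3])
print(f"chi-square = {chi_square(obs, fitted):.2f}")
```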
Applied Psychological Measurement | 1984
Frederic M. Lord; Marilyn S. Wingersky
Journal of Educational Measurement | 1993
Robert J. Mislevy; Kathleen M. Sheehan; Marilyn S. Wingersky
Journal of Educational Measurement | 1995
Rebecca Zwick; Dorothy T. Thayer; Marilyn S. Wingersky
ETS Research Report Series | 1982
Frederic M. Lord; Marilyn S. Wingersky
British Journal of Mathematical and Statistical Psychology | 1991
Robert J. Mislevy; Marilyn S. Wingersky; Sidney H. Irvine; Peter L. Dann