Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Tim Moses is active.

Publication


Featured research published by Tim Moses.


British Journal of Mathematical and Statistical Psychology | 2010

A comparison of statistical selection strategies for univariate and bivariate log-linear models

Tim Moses; Paul W. Holland

In this study, eight statistical selection strategies were evaluated for selecting the parameterizations of log-linear models used to model the distributions of psychometric tests. The selection strategies included significance tests based on four chi-squared statistics (likelihood ratio, Pearson, Freeman-Tukey, and Cressie-Read) and four additional strategies (Akaike information criterion (AIC), Bayesian information criterion (BIC), consistent Akaike information criterion (CAIC), and a measure attributed to Goodman). The strategies were evaluated in simulations for different log-linear models of univariate and bivariate test-score distributions and two sample sizes. Results showed that all eight selection strategies were most accurate for the largest sample size considered. For univariate distributions, the AIC selection strategy was especially accurate for selecting the correct parameterization of a complex log-linear model and the likelihood ratio chi-squared selection strategy was the most accurate strategy for selecting the correct parameterization of a relatively simple log-linear model. For bivariate distributions, the likelihood ratio chi-squared, Freeman-Tukey chi-squared, BIC, and CAIC selection strategies had similarly high selection accuracies.
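As a rough illustration of the kind of comparison described above (not taken from the paper), the following Python sketch fits polynomial log-linear models of increasing degree to a hypothetical univariate score distribution via Poisson regression and reports the likelihood ratio chi-squared (G2), AIC, and BIC for each candidate. The frequencies, the candidate degrees, and the BIC sample-size convention are all illustrative assumptions.

```python
# Sketch: compare selection criteria across polynomial log-linear models of a
# score distribution (hypothetical frequencies; degrees 1-4 are arbitrary candidates).
import numpy as np
import statsmodels.api as sm

scores = np.arange(0, 21)
freqs = np.array([1, 2, 4, 7, 12, 18, 25, 33, 40, 45,
                  46, 44, 39, 32, 24, 17, 11, 6, 3, 2, 1])
z = (scores - scores.mean()) / scores.std()   # scale the score to stabilize the fit

for degree in range(1, 5):
    # Design matrix: powers of the scaled score up to `degree`, plus an intercept.
    X = sm.add_constant(np.column_stack([z**d for d in range(1, degree + 1)]))
    fit = sm.GLM(freqs, X, family=sm.families.Poisson()).fit()
    k = X.shape[1]
    aic = -2 * fit.llf + 2 * k
    bic = -2 * fit.llf + np.log(len(scores)) * k  # one convention: n = number of score points
    print(f"degree {degree}: G2={fit.deviance:.1f}, AIC={aic:.1f}, BIC={bic:.1f}")
```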


Archive | 2009

Equating Test Scores: Toward Best Practices

Neil J. Dorans; Tim Moses; Daniel R. Eignor

Score equating is essential for any testing program that continually produces new editions of a test and for which the expectation is that scores from these editions have the same meaning over time. Different editions may be built to a common blueprint and designed to measure the same constructs, but they almost invariably differ somewhat in their psychometric properties. If one edition is more difficult than another, examinees would be expected to receive lower scores on the harder form. Score equating seeks to eliminate the effects on scores of these unintended differences in test form difficulty. Score equating is necessary to be fair to examinees and to provide score users with scores that mean the same thing across different editions or forms of the test.


Applied Psychological Measurement | 2011

Two Approaches for Using Multiple Anchors in NEAT Equating: A Description and Demonstration.

Tim Moses; Weiling Deng; Yu-Li Zhang

Nonequivalent groups with anchor test (NEAT) equating functions that use a single anchor can have accuracy problems when the groups are extremely different and/or when the anchor weakly correlates with the tests being equated. Proposals have been made to address these issues by incorporating more than one anchor into NEAT equating functions. These proposals have not been extensively considered or comparatively evaluated. This study evaluates two proposed approaches for incorporating more than one anchor into NEAT equating functions, poststratification and missing data imputation. The approaches are studied and compared in an example of equating mixed-format tests where the use of multiple anchors is expected to improve equating. The results show that both approaches produced nearly equivalent equating results but that the poststratification approach has some flexibility and accuracy advantages over imputation in terms of standard errors.
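A minimal sketch of the poststratification idea with a joint (two-anchor) stratification, assuming hypothetical conditional distributions and weights; the article's own procedures and data are not reproduced here.

```python
# Sketch of poststratification over two anchors (hypothetical data).
# Rows: joint anchor strata (A1 x A2 combinations); columns: test-score points.
import numpy as np

# Conditional distribution of test score X given each joint anchor stratum,
# estimated from the group that took X (each row sums to 1).
p_x_given_anchor = np.array([
    [0.50, 0.30, 0.15, 0.05],
    [0.30, 0.40, 0.20, 0.10],
    [0.10, 0.30, 0.40, 0.20],
    [0.05, 0.15, 0.30, 0.50],
])

# Weight of each joint anchor stratum in the target (synthetic) population,
# estimated from both groups' anchor data (sums to 1).
w_anchor = np.array([0.20, 0.30, 0.30, 0.20])

# Poststratified (synthetic-population) distribution of X.
p_x_target = w_anchor @ p_x_given_anchor
print(p_x_target)   # [0.23, 0.30, 0.27, 0.20]
```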


Journal of Educational and Behavioral Statistics | 2010

A Comparison of Strategies for Estimating Conditional DIF

Tim Moses; Jing Miao; Neil J. Dorans

In this study, the accuracies of four strategies were compared for estimating conditional differential item functioning (DIF), including raw data, logistic regression, log-linear models, and kernel smoothing. Real data simulations were used to evaluate the estimation strategies across six items, DIF and No DIF situations, and four sample size combinations for the reference and focal group data. Results showed that logistic regression was the most recommended strategy in terms of the bias and variability of its estimates. The log-linear models strategy had flexibility advantages, but these advantages only offset the greater variability of its estimates when sample sizes were large. Kernel smoothing was the least accurate of the considered strategies due to estimation problems when the reference and focal groups differed in overall ability.
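As a hedged illustration of the logistic regression strategy compared in the study, the sketch below regresses a simulated item response on the total score, a group indicator, and their interaction; the group and interaction coefficients correspond to uniform and nonuniform DIF. All data, coefficients, and sample sizes are made up.

```python
# Sketch: logistic regression DIF for one item (simulated, hypothetical data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)                        # 0 = reference, 1 = focal
theta = rng.normal(-0.5 * group, 1.0)                # focal group slightly lower ability
total = rng.binomial(40, 1 / (1 + np.exp(-theta)))   # matching variable (total score)
# Item with a small uniform DIF effect against the focal group (assumed).
logit = -2.0 + 0.12 * total - 0.4 * group
item = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([total, group, total * group]))
fit = sm.Logit(item, X).fit(disp=0)
print(fit.params)   # [intercept, total, group (uniform DIF), total*group (nonuniform DIF)]
```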


Journal of Educational and Behavioral Statistics | 2011

Standard Errors of Equating Differences

Tim Moses; Wenmin Zhang

The purpose of this article was to extend the use of standard errors for equated score differences (SEEDs) to traditional equating functions. The SEEDs are described in terms of their original proposal for kernel equating functions and extended so that SEEDs for traditional linear and traditional equipercentile equating functions can be computed. These developments provide new understandings of the relationships between kernel and traditional equating functions that expand on prior developments of SEEDs and of standard errors of equating functions. The developments are demonstrated for an equivalent groups equating situation. The accuracies of the SEEDs are evaluated in simulations conducted using an equivalent groups equating example.
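The article derives analytic (delta-method) SEEDs; purely as an illustration of the quantity being estimated, the sketch below uses bootstrap resampling to approximate the standard error of the difference between a crude equipercentile link and a linear link at each score point. The equating functions, data, and replication count are simplified assumptions.

```python
# Sketch: bootstrap standard errors of equated-score differences (a SEED-like quantity).
# The article uses analytic derivations; this resampling version is only an illustration.
import numpy as np

def equipercentile(x_scores, y_scores, grid):
    """Crude equipercentile link: map each grid point on X to the matching Y quantile."""
    pct = np.searchsorted(np.sort(x_scores), grid, side="right") / len(x_scores)
    return np.quantile(y_scores, np.clip(pct, 0, 1))

def linear_link(x_scores, y_scores, grid):
    """Linear equating: match means and standard deviations."""
    return y_scores.mean() + y_scores.std() / x_scores.std() * (grid - x_scores.mean())

rng = np.random.default_rng(1)
x = rng.binomial(40, 0.60, 1500).astype(float)   # hypothetical form-X scores
y = rng.binomial(40, 0.55, 1500).astype(float)   # hypothetical form-Y scores
grid = np.arange(0, 41)

diffs = []
for _ in range(200):                             # bootstrap replications
    xb = rng.choice(x, x.size)
    yb = rng.choice(y, y.size)
    diffs.append(equipercentile(xb, yb, grid) - linear_link(xb, yb, grid))
seed = np.std(diffs, axis=0)                     # SE of the difference at each score point
print(seed.round(3))
```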


Applied Psychological Measurement | 2011

A SAS IML Macro for Loglinear Smoothing.

Tim Moses; Alina A. von Davier

Polynomial loglinear models for one-, two-, and higher-way contingency tables (Bock & Yates, 1973; Haberman, 1974, 1978-1979) have important applications to measurement and assessment (Hanson, 1991; Holland & Thayer, 2000; Rosenbaum & Thayer, 1987). Two such applications are test score distribution estimation (Kolen, 1991) and distribution comparison (Hanson, 1996). Another application is the estimation of stable equipercentile equating functions (Kolen & Brennan, 2004; von Davier, Holland, & Thayer, 2004; Livingston, 1993). In these applications, the polynomial loglinear models are essentially regarded as a smoothing technique, which is commonly referred to as loglinear smoothing. Although routines exist in standard statistical analysis software that can be used to implement loglinear smoothing (e.g., Hanson, 1996; Moses, von Davier, & Casabianca, 2004), these routines are limited with respect to their convergence rates for complex smoothing problems and the output they produce. A SAS IML (SAS Institute, 2002a) macro was therefore created to implement loglinear smoothing according to Holland and Thayer’s (2000) specifications. This macro is flexible enough to handle complicated smoothing problems, has a higher convergence rate than SAS PROC GENMOD (SAS Institute, 2002b), and produces a variety of model fit statistics. The macro also produces “C-matrices,” the low-rank matrix factors of the covariance matrix of estimated probabilities that can be used to compute standard errors and confidence intervals for the estimated probabilities.
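The macro itself is SAS IML; as an illustrative analogue of the underlying technique only, the sketch below fits a bivariate polynomial loglinear smoothing model to a hypothetical two-way score table with a Poisson GLM in Python. The table, polynomial degrees, and cross-product term are assumptions.

```python
# Illustrative Python analogue of bivariate loglinear smoothing (the macro above is SAS IML).
import numpy as np
import statsmodels.api as sm

# Hypothetical 6 x 6 bivariate score table (rows: test X scores, cols: anchor V scores).
table = np.array([
    [20,  9,  4,  1,  0,  0],
    [10, 22, 11,  5,  2,  1],
    [ 4, 12, 25, 13,  6,  2],
    [ 2,  6, 14, 24, 11,  4],
    [ 1,  2,  5, 12, 21,  9],
    [ 0,  1,  2,  4,  8, 18],
])
x, v = np.meshgrid(np.arange(6), np.arange(6), indexing="ij")
x, v, counts = x.ravel(), v.ravel(), table.ravel()

# Polynomial terms: degree 2 in each margin plus one cross-product (association) term.
design = np.column_stack([x, x**2, v, v**2, x * v])
fit = sm.GLM(counts, sm.add_constant(design), family=sm.families.Poisson()).fit()
smoothed = fit.fittedvalues.reshape(6, 6)   # smoothed joint frequencies
print(smoothed.round(1))
```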


Journal of Educational and Behavioral Statistics | 2008

Using the Kernel Method of Test Equating for Estimating the Standard Errors of Population Invariance Measures

Tim Moses

Equating functions are supposed to be population invariant, meaning that the choice of subpopulation used to compute the equating function should not matter. The extent to which equating functions are population invariant is typically assessed in terms of practical difference criteria that do not account for equating functions’ sampling variability. This article shows how to extend the framework of kernel equating so that the standard errors of the root mean square difference (RMSD) and of the difference between two subpopulations’ equated scores can be estimated. An investigation of population invariance for the equivalent groups design is discussed. The accuracies of the derived standard errors are evaluated with respect to empirical standard errors. This evaluation shows that the accuracy of the standard error estimates for the equated score differences is better than for the RMSD and that accuracy for both standard error estimates is best when sample sizes are large.
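A minimal sketch of an RMSD-type population invariance measure, assuming hypothetical subgroup equating functions and weights; the normalization by the reference-form standard deviation follows one common convention and, like everything else in the example, is an assumption.

```python
# Sketch: RMSD population-invariance measure at each score point (hypothetical inputs).
import numpy as np

grid = np.arange(0, 41)
e_total = 1.02 * grid - 0.5                           # equating function on the total population
e_sub = np.stack([1.00 * grid, 1.05 * grid - 1.2])    # subgroup equating functions
w = np.array([0.6, 0.4])                              # subgroup weights (sum to 1)
sd_y = 7.5                                            # SD of reference-form scores (assumed)

# Weighted root mean squared difference between subgroup and total equatings,
# expressed in reference-form SD units (one common convention).
rmsd = np.sqrt(w @ (e_sub - e_total) ** 2) / sd_y
print(rmsd.round(3))
```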


International Journal of Testing | 2013

Determining When Single Scoring for Constructed-Response Items Is as Effective as Double Scoring in Mixed-Format Licensure Tests

Sooyeon Kim; Tim Moses

The major purpose of this study is to assess the conditions under which single scoring for constructed-response (CR) items is as effective as double scoring in the licensure testing context. We used both empirical datasets of five mixed-format licensure tests collected in actual operational settings and simulated datasets that allowed for the manipulation of two psychometric conditions, namely the proportion of CR components in a test and the magnitudes of correlations between two raters. In general, examinees were classified into the same Pass/Fail category when the contribution of the CR component was low and the interrater correlation was substantial. Under these conditions, the use of single scoring would reduce scoring time and cost without increasing classification inconsistency.
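A minimal sketch of the kind of comparison described, assuming a hypothetical cut score and interrater correlation: simulate two raters' ratings, then check how often a decision based on one rating matches a decision based on the average of both.

```python
# Sketch: how often single and double scoring agree on Pass/Fail (hypothetical setup).
import numpy as np

rng = np.random.default_rng(2)
n, rho, cut = 5000, 0.85, 0.0                 # examinees, interrater correlation, cut score
cov = [[1.0, rho], [rho, 1.0]]
r1, r2 = rng.multivariate_normal([0, 0], cov, n).T   # two raters' CR scores (standardized)

single = r1 >= cut                            # decision from the first rating alone
double = (r1 + r2) / 2 >= cut                 # decision from the average of both ratings
print(f"classification agreement: {np.mean(single == double):.3f}")
```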


Educational and Psychological Measurement | 2014

Alternative Smoothing and Scaling Strategies for Weighted Composite Scores

Tim Moses

In this study, smoothing and scaling approaches are compared for estimating subscore-to-composite scaling results involving composites computed as rounded and weighted combinations of subscores. The considered smoothing and scaling approaches included those based on raw data, on smoothing the bivariate distribution of the subscores, on smoothing the bivariate distribution of the subscore and weighted composite, and on two weighted averages of the raw and smoothed marginal distributions. Results from simulations showed that the approaches differed in estimation accuracy across scaling situations with smaller and larger sample sizes and across weighted composite distributions of varied complexity.
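As a rough, hypothetical illustration of the quantities involved, the sketch below forms a rounded, weighted composite from two simulated subscores and computes a raw-data equipercentile scaling of one subscore onto the composite scale; none of the weights, score ranges, or data come from the study.

```python
# Sketch: subscore-to-composite scaling with a rounded, weighted composite (hypothetical data).
import numpy as np

rng = np.random.default_rng(4)
n = 3000
s1 = rng.binomial(30, 0.60, n)                                    # subscore 1
s2 = np.clip(rng.binomial(20, 0.55, n) + (s1 - 18) // 4, 0, 20)   # correlated subscore 2
composite = np.rint(0.6 * s1 + 0.4 * s2).astype(int)              # rounded, weighted composite

# Equipercentile scaling of subscore 1 onto the composite scale (raw-data approach).
grid = np.arange(0, 31)
pct = np.searchsorted(np.sort(s1), grid, side="right") / n
scaled = np.quantile(composite, np.clip(pct, 0, 1))
print(np.column_stack([grid, scaled.round(1)]))
```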


Educational and Psychological Measurement | 2012

Evaluating Ranking Strategies in Assessing Change When the Measures Differ Across Time

Tim Moses; Sooyeon Kim

In this study, a ranking strategy was evaluated for comparing subgroups’ change using identical, equated, and nonidentical measures. Four empirical data sets were evaluated, each of which contained examinees’ scores on two occasions, where the two occasions’ scores were obtained on a single identical measure, on two equated tests, and on two nonidentical measures. The two subgroups’ rates of change were compared based on ranked nonidentical measures, on raw and ranked equated measures, and on a raw and ranked identical measure. The results of comparing subgroups’ change were similar when based on the nonidentical measures and on the identical and equated measures. Additional evaluations using simulated data demonstrated that the ranking strategy proposed for nonidentical measures is accurate, especially when the subgroups are large, the difference between the subgroups’ change is large, and scores obtained on the measure(s) are highly correlated across occasions. The statistical power of the proposed ranking method is slightly reduced because of the tendency of the nonidentical measures to have relatively low correlations.
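A minimal sketch of the ranking idea, assuming hypothetical data: rank each occasion's scores so that nonidentical measures share a common metric, then compare the two subgroups' mean change in ranks.

```python
# Sketch: comparing subgroup change on nonidentical measures via ranks (hypothetical data).
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(3)
n = 400
group = rng.integers(0, 2, n)                 # two subgroups
time1 = rng.normal(50, 10, n)                 # scores on measure A at occasion 1
time2 = rng.normal(52 + 2 * group, 12, n)     # scores on measure B at occasion 2

# Rank each occasion separately to place the two measures on a common metric.
r1, r2 = rankdata(time1), rankdata(time2)
change = r2 - r1
for g in (0, 1):
    print(f"group {g}: mean rank change = {change[group == g].mean():.1f}")
```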

Collaboration


Dive into Tim Moses's collaborations.

Top Co-Authors

Lei Yu

Princeton University
