
Publication


Featured research published by James A. Wollack.


Journal of Educational and Behavioral Statistics | 2001

A Mixture Item Response Model for Multiple-Choice Data.

Daniel M. Bolt; Allan S. Cohen; James A. Wollack

A mixture item response model is proposed for investigating individual differences in the selection of response categories in multiple-choice items. The model accounts for local dependence among response categories by assuming that examinees belong to discrete latent classes that have different propensities towards those responses. Varying response category propensities are captured by allowing the category intercept parameters in a nominal response model (Bock, 1972) to assume different values across classes. A Markov Chain Monte Carlo algorithm for the estimation of model parameters and classification of examinees is described. A real-data example illustrates how the model can be used to distinguish examinees that are disproportionately attracted to different types of distractors in a test of English usage. A simulation study evaluates item parameter recovery and classification accuracy in a hypothetical multiple-choice test designed to be diagnostic. Implications for test construction and the use of multiple-choice tests to perform cognitive diagnoses of item response patterns are discussed.
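In the nominal response model (Bock, 1972) that this mixture model builds on, each response category's probability is a multinomial logit (softmax) of a linear function of ability, and the mixture lets the category intercepts differ by latent class. A minimal sketch with hypothetical parameter values (not the paper's estimates):

```python
import math

def nominal_response_probs(theta, slopes, intercepts):
    """Bock (1972) nominal response model: P(category k | theta)
    via a softmax over a_k * theta + c_k."""
    logits = [a * theta + c for a, c in zip(slopes, intercepts)]
    m = max(logits)                       # stabilize the exponentials
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Two latent classes share slopes but differ in category intercepts,
# capturing class-specific attraction to particular distractors.
slopes = [0.0, 0.5, 1.2, -0.4]              # hypothetical a_k
intercepts_class1 = [0.2, -0.1, 0.8, -0.9]  # hypothetical c_k, class 1
intercepts_class2 = [0.2, 1.0, 0.8, -0.9]   # class 2 favors distractor 2

p1 = nominal_response_probs(0.0, slopes, intercepts_class1)
p2 = nominal_response_probs(0.0, slopes, intercepts_class2)
```

Raising only the intercept for category 2 in class 2 raises that category's probability at every ability level, which is exactly the kind of class-specific distractor propensity the mixture model is designed to detect.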


Applied Psychological Measurement | 1996

An Investigation of the Likelihood Ratio Test For Detection of Differential Item Functioning

Allan S. Cohen; Seock-Ho Kim; James A. Wollack

Type I error rates for the likelihood ratio test for detecting differential item functioning (DIF) were investigated using Monte Carlo simulations. Two- and three-parameter item response theory (IRT) models were used to generate 100 datasets of a 50-item test for samples of 250 and 1,000 simulated examinees for each IRT model. Item parameters were estimated by marginal maximum likelihood for three IRT models: the three-parameter model, the three-parameter model with a fixed guessing parameter, and the two-parameter model. All DIF comparisons were simulated by randomly pairing two samples from each sample size and IRT model condition so that, for each sample size and IRT model condition, there were 50 pairs of reference and focal groups. Type I error rates for the two-parameter model were within theoretically expected values at each of the α levels considered. Type I error rates for the three-parameter model and the three-parameter model with a fixed guessing parameter, however, differed from the theoretically expected values at the α levels considered.
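The likelihood ratio test compares a compact model (item parameters constrained equal across groups) to an augmented model (parameters free to differ), referring G² = -2(logL_compact - logL_augmented) to a chi-square distribution. A minimal sketch with hypothetical fitted log-likelihoods (the critical value 5.991 is the chi-square .05 cutoff with df = 2, as for a 2PL item):

```python
def lr_dif_test(loglik_compact, loglik_augmented, crit):
    """Likelihood ratio test for DIF: G^2 = -2(logL_c - logL_a),
    compared against a chi-square critical value with df equal to
    the number of item parameters freed in the augmented model."""
    g2 = -2.0 * (loglik_compact - loglik_augmented)
    return g2, g2 > crit

# Hypothetical log-likelihoods for one studied item under a 2PL
# (df = 2, so the .05 chi-square critical value is 5.991).
g2, flagged = lr_dif_test(-1523.4, -1519.1, crit=5.991)
```

The Type I error question the study addresses is how often this flag fires when the two groups in fact share the same item parameters.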


Applied Psychological Measurement | 2002

Recovery of Item Parameters in the Nominal Response Model: A Comparison of Marginal Maximum Likelihood Estimation and Markov Chain Monte Carlo Estimation.

James A. Wollack; Daniel M. Bolt; Allan S. Cohen; Young-Sun Lee

Markov chain Monte Carlo (MCMC) methods, such as Gibbs sampling, present an alternative to marginal maximum likelihood (MML) estimation, which offers some promise for parameter estimation particularly with complex models, in small sample situations, and for other applications where MML algorithms have not been established. MCMC circumvents the problems associated with implementing an estimation algorithm for complex, multidimensional probability distributions by sampling the parameters from each of the one-dimensional conditional posterior distributions at each stage of the Markov chain. In this article, the authors compared the quality of item parameter estimates for MML and MCMC with one type of complex item response theory model, the nominal response model. The quality of item parameter recovery was nearly identical for both MML and MCMC. Both methods tended to produce good estimates, even for short tests and relatively small sample sizes. Parameter recovery was best for items of moderate difficulty (i.e., items matched to the latent trait distribution); recovery was worst for items that were extremely easy or difficult. The quality of item parameter recovery improved as test length increased from 10 to 20 items, but did not change as test length increased from 20 to 30 items. MCMC estimation takes substantially longer but appears to be a good surrogate for MML for those situations for which an MML algorithm has not been developed.
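The one-parameter-at-a-time conditional sampling the abstract describes can be illustrated with a random-walk Metropolis update, a standard MCMC building block (a stand-in here, not the authors' exact Gibbs implementation). The target below is a standard normal log-density acting as a placeholder for one item parameter's conditional posterior:

```python
import math
import random

def metropolis_chain(log_post, start, n_iter, prop_sd=1.0, seed=1):
    """Random-walk Metropolis: propose a perturbed value, accept it
    with probability min(1, posterior ratio), otherwise keep the
    current value. MCMC applies a step like this to each parameter's
    one-dimensional conditional posterior in turn."""
    rng = random.Random(seed)
    x, lp = start, log_post(start)
    draws = []
    for _ in range(n_iter):
        cand = x + rng.gauss(0.0, prop_sd)
        lp_cand = log_post(cand)
        if math.log(rng.random()) < lp_cand - lp:   # accept/reject
            x, lp = cand, lp_cand
        draws.append(x)
    return draws

# Placeholder target: a N(0, 1) "conditional posterior".
draws = metropolis_chain(lambda x: -0.5 * x * x, start=0.0, n_iter=5000)
mean = sum(draws) / len(draws)
```

The chain's running average converges toward the posterior mean, which is how point estimates of item parameters are read off the sampled chain.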


Applied Psychological Measurement | 1997

A Nominal Response Model Approach for Detecting Answer Copying

James A. Wollack

When examinees copy answers to test questions from other examinees, the validity of the test is compromised. Most available statistical procedures for detecting copying were developed out of classical test theory (CTT); hence, they suffer from sample-dependent score and item statistics and biased estimates of the expected number of answer matches between a pair of examinees. Item response theory (IRT) based procedures alleviate these problems; however, because they fail to compare the similarity of responses between neighboring examinees, they have relatively poor power for detecting copiers. A new IRT-based test statistic, ω, was compared with the best CTT-based index, g₂, under various copying conditions, amounts of copying, test lengths, and sample sizes. ω consistently held the Type I error rate at or below the nominal level; g₂ yielded substantially inflated Type I error rates. The power of ω varied as a function of both test length and the percentage of items copied. ω demonstrated good power to detect copiers, provided that at least 20% of the items were copied on an 80-item test and at least 30% were copied on a 40-item test. Based on these results, with regard to both Type I error rate and power, ω appears to be more useful than g₂ as a copying index.
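The logic of an ω-style statistic can be sketched as follows: the nominal response model gives, for each item, the copier's probability of choosing the source's response; the observed number of matches is then standardized against the sum of those probabilities. The per-item probabilities below are hypothetical placeholders, and the sketch omits the model estimation behind them:

```python
import math

def omega_index(match_probs, observed_matches):
    """Standardize the observed number of answer matches against its
    IRT-based expectation. match_probs[i] is the copier's model-implied
    probability of choosing the source's response to item i (from the
    nominal response model)."""
    expected = sum(match_probs)
    variance = sum(p * (1.0 - p) for p in match_probs)
    return (observed_matches - expected) / math.sqrt(variance)

# Hypothetical 10-item test: 9 observed matches against an expectation
# of 4 chance matches yields a large standardized value.
probs = [0.4] * 10
z = omega_index(probs, observed_matches=9)
```

Because the expectation is conditioned on the copier's own model-implied response probabilities, the index avoids the biased chance-match estimates that plague CTT-based indices such as g₂.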


Applied Psychological Measurement | 1998

Detection of Answer Copying with Unknown Item and Trait Parameters

James A. Wollack; Allan S. Cohen

Previous work on the use of ω for detection of answer copying was based on the assumption that item parameters for the nominal response model were known. Such an assumption limits the usefulness of ω, particularly in the classroom, because most teachers do not have a set of precalibrated items. This study investigated empirical Type I error rates and the power of ω when the item and trait (θ) parameters were unknown and estimated from datasets of 100 and 500 examinees. Type I error rate was unaffected by estimating item parameters from the data. Power was slightly lower in the 100-examinee datasets but was almost identical in the 500-examinee datasets.


Applied Psychological Measurement | 2001

Defining Error Rates and Power for Detecting Answer Copying

James A. Wollack; Allan S. Cohen; Ronald C. Serlin

A familywise approach is described for evaluating the significance of copying indices designed to hold the Type I error rate constant for each examinee. The empirical Type I error rate and power of two indices, ω (Wollack, 1997) and g₂ (Frary, Tideman, & Watts, 1977), are examined under a variety of copying situations. Results indicated that the traditional pairwise approach falsely detected examinees almost three times more often than the nominal α level. Familywise Type I error rates were substantially smaller, although they also tended to be somewhat inflated at small α levels as the percentage of items copied increased. Eliminating the indices detecting a source from the copier, in situations where the copier was also detected from the source, helped control the familywise Type I error rates for all α ≤ .001. Lack of Type I error control meant that power could not be evaluated for g₂ under any of the simulated familywise conditions. Familywise power for ω was reasonable when at least 30% of the items were copied.
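The inflation from pairwise testing, and the effect of a familywise correction, can be illustrated numerically. This sketch uses a simple Bonferroni division as a stand-in for the paper's specific procedure, and the count of 29 comparisons is a hypothetical example:

```python
def familywise_error(alpha, m):
    """Probability of at least one false detection across m independent
    pairwise comparisons, each conducted at per-test level alpha."""
    return 1.0 - (1.0 - alpha) ** m

# One examinee screened against 29 potential sources at alpha = .05:
fwe_uncorrected = familywise_error(0.05, 29)        # far above .05
fwe_bonferroni = familywise_error(0.05 / 29, 29)    # held below .05
```

Screening each examinee against every possible source multiplies the chances of a false accusation, which is why a per-examinee familywise criterion, rather than a per-pair one, is the natural unit of error control for copying detection.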


Aphasiology | 2005

Functional outcomes in patients with right hemisphere brain damage

Katharine H. Odell; James A. Wollack; Marge Flynn

Background: In this era of accountability in health care, the need to document treatment-related changes in health status is critical. However, few studies report outcomes in people with right cerebral hemisphere damage (RHD).

Aims: The objective of this study was to document, in a single population of patients with RHD, selected functional outcomes at the termination of inpatient treatment. Of particular interest were cognitive performance and its influence on motor and overall recovery.

Methods & Procedures: Functional outcomes were retrospectively examined in 101 RHD patients at discharge from an inpatient rehabilitation programme. The Functional Independence Measure (FIM; Center for Functional Assessment Research, 1993) was the measurement tool. The five outcomes examined were: final functional status, amount of gain, efficiency of gain, length of stay (LOS), and discharge placement. FIM scores, produced on an ordinal rating scale, were statistically transformed by the Rasch method (Rasch, 1960) to generate interval-level data for regression analyses.

Outcomes & Results: Gains were evident in cognitive and motor realms, with greater and more efficient improvement in the latter. Regression analysis indicated that final functional status was best predicted by age, initial motor severity (FIM motor score), and initial total cognitive severity (FIM cognitive scores); amount of gain was best predicted by age, evidence of a previous neurological incident, and gender; efficiency of gain by initial cognitive item scores, initial motor severity (FIM score), and age; LOS by initial motor severity (FIM score); and discharge placement by age, marital status, and initial severity (FIM status). Major predictors tended to be age and the family of cognitive FIM scores, especially Problem Solving (PS). Memory and PS were the most challenging cognitive items for these patients, as indicated by scores on admission and discharge reflecting less than functional ability. A sizeable number of patients began and ended rehabilitation with functional levels of ability in comprehension, expression, and social interaction. Significant differences existed between patients with neglect and those without, but neglect was not a significant predictor of any outcome measure. Low initial cognitive FIM scores, presence of neglect, and older age were associated with poorer performance in motor and cognitive realms. Previous neurological episodes were negatively associated with amount of gain. Number of comorbidities was not statistically associated with outcomes.

Conclusions: Initial severity levels and age were the most influential factors on these outcomes. The presence of neglect had a relatively minor impact on most outcomes. Performance on the cognitive items was less impaired than on motor items, and registered less gain and less efficient gain than motor items, but did predict various final status and gain-related measurements in overall and motor realms. Analyses in this study revealed that the FIM scale is less sensitive to cognitive change than to motor change.


Educational and Psychological Measurement | 2009

On the Use of Nonparametric Item Characteristic Curve Estimation Techniques for Checking Parametric Model Fit.

Young-Sun Lee; James A. Wollack; Jeff Douglas

The purpose of this study was to assess the model fit of the 2PL through comparison with nonparametric item characteristic curve (ICC) estimation procedures. Results indicate that the three nonparametric procedures implemented produced ICCs similar to that of the 2PL for items simulated to fit the 2PL. However, for misfitting items, especially nonmonotone items, the greatest difference is between the 2PL and the kernel smoothing procedure. In general, the differences between ICCs from the nonparametric procedures and the 2PL are reduced as both sample size and test length increase. The false positive rate of the test for model fit is promising for nonparametric ICC estimation methods. Power to detect misfitting items simulated with the 4PL is low; power to detect nonmonotone items is generally much higher. Power is best for kernel smoothing but also good for isotonic regression in the medium to large sample size and longer test length conditions. Power for the smoothed isotonic regression is uniformly low.
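Kernel smoothing estimates an ICC without assuming a parametric form: at each ability value, the estimated probability of a correct response is a kernel-weighted average of the 0/1 responses of examinees with nearby ability estimates. A minimal Nadaraya-Watson sketch on fabricated data (the abilities, responses, and bandwidth are illustrative, not from the study):

```python
import math

def kernel_icc(theta_grid, abilities, responses, bandwidth=0.5):
    """Nonparametric ICC via Nadaraya-Watson kernel smoothing with a
    Gaussian kernel: a locally weighted proportion correct at each
    evaluation point on the ability scale."""
    def kernel(u):
        return math.exp(-0.5 * u * u)
    icc = []
    for t in theta_grid:
        weights = [kernel((a - t) / bandwidth) for a in abilities]
        num = sum(w * r for w, r in zip(weights, responses))
        icc.append(num / sum(weights))
    return icc

# Fabricated data: higher-ability examinees answer correctly more often.
abilities = [-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2]
responses = [0, 0, 0, 1, 0, 1, 1, 1, 1]
curve = kernel_icc([-2, 0, 2], abilities, responses)
```

Because the estimate is purely local, it can trace a nonmonotone curve where a 2PL cannot, which is the basis for using the discrepancy between the two as a model-fit check.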


Journal of Psychoeducational Assessment | 1995

A Reply to Kishor: Choosing the Right Metric

Jeffery P. Braden; James A. Wollack; Thomas E. Allen

Kishor (this issue) claims that the Stanford Achievement Test normed on deaf children (SAT-d) scaled scores offer the best metric for estimating IQ-Achievement correlations in samples of deaf children. We argue that he is partly right and mainly wrong. We agree with Kishor that SAT-d grade equivalents are inappropriate, although we could not replicate his results. However, Kishor's simulations are logically and empirically flawed and, therefore, cannot address the relative value of SAT-d metrics. More appropriate simulations show (a) that in homogeneous age samples, all SAT-d metrics yield similar results, and (b) in heterogeneous age samples, age-referenced scores are superior to scaled scores and grade equivalents for estimating "true" IQ-Achievement correlations. Our results suggest two practical implications: (a) researchers should use age-based percentiles for achievement if they also use age-referenced IQs, and (b) available studies underestimate IQ-Achievement correlations in samples of deaf children.


Educational and Psychological Measurement | 2015

Detecting Test Tampering Using Item Response Theory

James A. Wollack; Allan S. Cohen; Carol A. Eckerly

Test tampering, especially on tests for educational accountability, is an unfortunate reality, necessitating that the state (or its testing vendor) perform data forensic analyses, such as erasure analyses, to look for signs of possible malfeasance. Few statistical approaches exist for detecting fraudulent erasures, and those that do largely do not lend themselves to making probabilistic statements about the likelihood of the observations. In this article, a new erasure detection index, EDI, is developed, which uses item response theory to compare the number of observed wrong-to-right erasures to the number expected due to chance, conditional on the examinee’s ability-level and number of erased items. A simulation study is presented to evaluate the Type I error rate and power of EDI under various types of fraudulent and benign erasures. Results show that EDI with a correction for continuity yields Type I error rates that are less than or equal to nominal levels for every condition studied, and has high power to detect even small amounts of tampering among the students for whom tampering is most likely.
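The core comparison the abstract describes, observed wrong-to-right (WTR) erasures versus the number expected by chance given the examinee's erasure count, can be sketched as a standardized binomial deviation with a continuity correction. The numbers and the simple binomial form are illustrative assumptions, not the paper's exact EDI formula:

```python
import math

def erasure_index(observed_wtr, p_wtr, n_erasures):
    """EDI-style statistic: standardized difference between observed
    wrong-to-right erasures and the binomial expectation, applying a
    0.5 continuity correction to the deviation. p_wtr is the
    model-implied chance rate of a WTR erasure, conditional on the
    examinee's ability level."""
    expected = n_erasures * p_wtr
    sd = math.sqrt(n_erasures * p_wtr * (1.0 - p_wtr))
    return (observed_wtr - expected - 0.5) / sd

# Hypothetical examinee: 12 of 15 erasures are wrong-to-right, while
# the IRT-implied chance rate of a WTR erasure is 0.3.
z = erasure_index(12, 0.3, 15)
```

The continuity correction shrinks the deviation slightly, which is consistent with the abstract's finding that the corrected index keeps Type I error at or below the nominal level.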

Collaboration


Dive into James A. Wollack's collaborations.

Top Co-Authors

Daniel M. Bolt (University of Wisconsin-Madison)
Carol A. Eckerly (University of Wisconsin-Madison)
Andrew A. Mroch (University of Wisconsin-Madison)
Craig S. Wells (University of Wisconsin-Madison)
Jeffery P. Braden (University of Wisconsin-Madison)
Katharine H. Odell (University of Wisconsin-Madison)