Linda L. Cook
University of Massachusetts Amherst
Publications
Featured research published by Linda L. Cook.
Applied Psychological Measurement | 1987
Linda L. Cook; Nancy S. Petersen
This paper focuses on a discussion of how various equating methods are affected by (1) sampling error, (2) sample characteristics, and (3) characteristics of anchor test items. Studies that examine the effect of analytic techniques for smoothing or modeling marginal and bivariate frequency distributions on the accuracy of equipercentile equating are reviewed. A need for simulation and empirical studies designed to evaluate the effectiveness of analytic smoothing techniques for recovering the underlying distribution when sample size, test length, and distributional shape are varied is identified. Studies that examine the question of whether an equating transformation remains the same regardless of the group used to define it are also reviewed. The results of some studies suggested that this may not be a problem for forms of a homogeneous test constructed to be similar in all respects. Results of other studies indicated that examinees who take a test on different administration dates may vary in systematic ways and thus affect equating results. Finally, studies which examine the characteristics of anchor test items are reviewed. It is concluded that whenever groups differ in level and dispersion of ability, special care must be taken to assure that the anchor test is a miniature of the total test.
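The equipercentile methods discussed in this review map each score on one form to the score on another form that has the same percentile rank. Below is a minimal sketch of that mapping in Python; the score distributions, sample sizes, and the simple linear-interpolation step are illustrative assumptions, not the designs or smoothing procedures reviewed in the paper.

```python
# A minimal sketch of equipercentile equating between two test forms.
# The simulated frequency distributions below are illustrative only.
import numpy as np

def percentile_ranks(freqs):
    """Percentile rank at each integer score: cumulative % below the score
    plus half the % at the score (the usual definition for discrete scales)."""
    props = freqs / freqs.sum()
    cum_below = np.cumsum(props) - props
    return 100.0 * (cum_below + 0.5 * props)

def equipercentile_equate(freq_x, freq_y):
    """Map each score on form X to the form Y score with the same
    percentile rank, interpolating linearly between Y score points."""
    pr_x = percentile_ranks(freq_x)
    pr_y = percentile_ranks(freq_y)
    scores_y = np.arange(len(freq_y))
    return np.interp(pr_x, pr_y, scores_y)

# Illustrative raw-score frequency distributions for two 10-item forms.
rng = np.random.default_rng(0)
freq_x = np.bincount(rng.binomial(10, 0.55, size=2000), minlength=11)
freq_y = np.bincount(rng.binomial(10, 0.60, size=2000), minlength=11)
print(np.round(equipercentile_equate(freq_x, freq_y), 2))
```

The smoothing techniques discussed in the review would replace the raw frequencies above with smoothed or modeled frequencies before the equating function is computed, which is where sample size and distributional shape enter.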
New Horizons in Testing: Latent Trait Test Theory and Computerized Adaptive Testing | 1983
Ronald K. Hambleton; Linda L. Cook
Publisher Summary This chapter presents two studies that examine the robustness of item response models and the effects of test length and sample size on the precision of ability estimates. The purpose of Study 1 was to systematically study the goodness of fit of the one-, two-, and three-parameter logistic models. Using computer-simulated test data, the effects of four variables were studied: (1) the variation in item discrimination parameters; (2) the average value of the pseudochance-level parameters; (3) the test length; and (4) the shape of the ability distribution. Artificial or simulated data representing departures of varying degrees from the assumptions of the three-parameter logistic test model were generated, and the goodness of fit of the three test models to the data was studied. The goodness-of-fit measures used were chosen for their practical significance. Study 2 was designed to investigate two practical questions that are important to test developers: (1) the effects of examinee sample size and test length on standard error of ability estimation (SEE) curves and (2) the effects that the statistical characteristics of an item pool have on the precision of SEE curves. The study of SEE curves, and of the factors that affect their stability, was motivated by item response model test development procedures. When item statistics are available, they are commonly used by test developers to select items from a pool so as to produce a test with a desired SEE curve.
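For reference, a standard parameterization of the three-parameter logistic model and of the standard error of ability estimation it implies is given below; the chapter's exact notation may differ.

\[
P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-D a_i(\theta - b_i)}}, \qquad D = 1.7,
\]
\[
I_i(\theta) = D^2 a_i^2 \, \frac{Q_i(\theta)}{P_i(\theta)} \left( \frac{P_i(\theta) - c_i}{1 - c_i} \right)^2, \qquad Q_i(\theta) = 1 - P_i(\theta), \qquad \mathrm{SEE}(\theta) \approx \frac{1}{\sqrt{\sum_i I_i(\theta)}}.
\]

Here \(a_i\), \(b_i\), and \(c_i\) are the item discrimination, difficulty, and pseudochance-level parameters. Because the SEE curve is driven by the summed item information, the statistical characteristics of the item pool and the test length directly shape the curves studied in Study 2.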
Journal of Educational and Behavioral Statistics | 1988
Linda L. Cook; Neil J. Dorans; Daniel R. Eignor
A strong assumption made by most commonly used item response theory (IRT) models is that the data are unidimensional, that is, statistical dependence among item scores can be explained by a single ability dimension. First-order and second-order factor analyses were conducted on correlation matrices among item parcels of SAT-Verbal items. The item parcels were constructed to yield correlation matrices that were amenable to linear factor analyses. The first-order analyses were employed to assess the effective dimensionality of the item parcel data. Second-order analyses were employed to test meaningful hypotheses about the structure of the data. Parcels were constructed for three SAT-Verbal editions. The dimensionality analyses revealed that one SAT-Verbal test edition was less parallel to the other two editions than these other editions were to each other. Refinements in the dimensionality methodology and a more systematic dimensionality assessment are logical extensions of the present research.
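A minimal sketch of one simple dimensionality check on a parcel correlation matrix is given below: inspecting the eigenvalues of the matrix for a dominant first component. The 6 x 6 matrix is invented for illustration, and this eigenvalue inspection is a simpler stand-in for the first- and second-order linear factor analyses used in the study.

```python
# A minimal sketch of an eigenvalue-based dimensionality check on a
# correlation matrix among item parcels. The matrix is hypothetical.
import numpy as np

R = np.array([  # hypothetical correlations among six parcels
    [1.00, 0.62, 0.58, 0.55, 0.50, 0.48],
    [0.62, 1.00, 0.60, 0.52, 0.49, 0.47],
    [0.58, 0.60, 1.00, 0.51, 0.48, 0.46],
    [0.55, 0.52, 0.51, 1.00, 0.59, 0.57],
    [0.50, 0.49, 0.48, 0.59, 1.00, 0.61],
    [0.48, 0.47, 0.46, 0.57, 0.61, 1.00],
])

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
prop_first = eigvals[0] / eigvals.sum()
print("eigenvalues:", np.round(eigvals, 3))
print("proportion of variance, first component:", round(prop_first, 3))
# A first eigenvalue that dominates the remaining ones is commonly read
# as evidence of essential unidimensionality at the parcel level.
```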
Applied Measurement in Education | 2010
Elizabeth Stone; Linda L. Cook; Cara Cahalan Laitusis; Frederick Cline
This validity study examined differential item functioning (DIF) results on large-scale state standards–based English-language arts assessments at grades 4 and 8 for students without disabilities taking the test under standard conditions and students who are blind or visually impaired taking the test with either a large print or braille form. Using the Mantel-Haenszel method, only one item at each grade was flagged as displaying large DIF, in each case favoring students without disabilities. Additional items were flagged as exhibiting intermediate DIF, with some items found to favor each group. A priori hypothesis coding and attempts to predict the effects of large print or braille accommodations on DIF were not found to have a relationship with the actual flagging of items, although some a posteriori explanations could be made. The results are seen as supporting the accessibility and validity of the current test for students who are blind or visually impaired while also identifying areas for improvement consisting mainly of attention to formatting and consistency.
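A minimal sketch of the Mantel-Haenszel statistic behind such flagging is shown below: a common odds ratio is pooled over the 2 x 2 (group by correct/incorrect) tables formed at each level of the matching score and then expressed on the ETS delta metric as MH D-DIF. The table counts are invented, and the A/B/C flagging rules are only loosely summarized in the comments.

```python
# A minimal sketch of the Mantel-Haenszel DIF statistic for one item,
# computed from 2x2 tables at each matching-score level (invented counts).
import math

# Each entry: (ref_correct, ref_incorrect, focal_correct, focal_incorrect)
# at one level of the matching (total-score) variable.
tables = [
    (90, 30, 20, 15),
    (120, 25, 35, 12),
    (150, 20, 50, 10),
]

num = den = 0.0
for a, b, c, d in tables:
    t = a + b + c + d
    num += a * d / t   # reference-correct * focal-incorrect
    den += b * c / t   # reference-incorrect * focal-correct

alpha_mh = num / den                   # common odds ratio across levels
mh_d_dif = -2.35 * math.log(alpha_mh)  # effect size on the ETS delta metric
print(f"alpha_MH = {alpha_mh:.3f}, MH D-DIF = {mh_d_dif:.3f}")
# Negative MH D-DIF values indicate the item favors the reference group;
# larger absolute values correspond to the "intermediate" and "large"
# DIF categories referred to in the abstract.
```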
Applied Measurement in Education | 2010
Linda L. Cook; Daniel R. Eignor; Yasuyo Sawaki; Jonathan Steinberg; Frederick Cline
This study compared the underlying factors measured by a state standards-based grade 4 English-Language Arts (ELA) assessment given to several groups of students. The focus of the research was to gather evidence regarding whether or not the tests measured the same construct or constructs for students without disabilities who took the test under standard conditions, students with learning disabilities who took the test under standard conditions, students with learning disabilities who took the test with accommodations as specified in their Individualized Educational Program (IEP) or 504 plan, and students with learning disabilities who took the test with a read-aloud accommodation/modification. The ELA assessment contained both reading and writing portions. A total of 75 multiple-choice items were analyzed. A series of nested hypotheses were tested to determine if the ELA measured the same factors for students with disabilities who took the assessment with and without accommodations and students without disabilities who took the test without accommodations. The results of these analyses, although not conclusive, indicated that the assessment had a similar factor structure for all groups included in the study.
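One common way to express such a series of nested hypotheses, shown here only as a sketch since the exact sequence of constraints tested in the study may differ, is as increasingly restrictive multi-group versions of a common factor model:

\[
\mathbf{x}_{g} = \boldsymbol{\tau}_{g} + \boldsymbol{\Lambda}_{g}\boldsymbol{\xi}_{g} + \boldsymbol{\delta}_{g}, \qquad g = 1, \dots, G,
\]
\[
H_1: \text{same loading pattern in every group}; \qquad
H_2: \boldsymbol{\Lambda}_1 = \cdots = \boldsymbol{\Lambda}_G; \qquad
H_3: H_2 \text{ and } \boldsymbol{\Theta}_{\delta,1} = \cdots = \boldsymbol{\Theta}_{\delta,G}.
\]

Each hypothesis is evaluated against the less restrictive one that precedes it, so similar fit across the sequence supports the conclusion that the assessment has a similar factor structure for all groups.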
Applied Psychological Measurement | 1981
Daniel R. Eignor; Linda L. Cook
For some time, measurement theorists have been concerned about models and methods that account for extraneous variables such as guessing, forgetting, and carelessness in the making of decisions using criterion-referenced test data. For theorists espousing a "state model" conceptualization of mastery (Meskauskas, 1976), Macready and Dayton (1977) have presented a useful model that accounts for guessing and forgetting. For those theorists who feel a "continuum model" conceptualization of mastery better describes performance on a criterion-referenced test, a model comparable to Macready and Dayton's does not exist, except perhaps in those instances when an item response theory (IRT) approach using the three-parameter logistic model is warranted (see Lord, in press). More often than not, an IRT approach is not used, either because of model assumptions or practical constraints. The concern about accounting for extraneous variables then takes the form, particularly when considering guessing, of adjusting the cutoff score after it has been set through use of one of a variety of procedures amenable to a continuum model; when guessing is considered, the standard correction for guessing formula is typically used to make this adjustment (Davis & Diamond, 1974; Educational Testing Service, 1976). What has been needed is a continuum model that (1) corrects for extraneous variables in the actual standard-setting process, (2) is easier to implement than IRT, and (3) considers factors in addition to guessing. In A Criterion-Referenced Model with Corrections for Guessing and Carelessness…
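The standard correction-for-guessing formula referred to above adjusts a number-right score \(R\) for the number of wrong answers \(W\) on \(k\)-choice items, with omitted items counted neither right nor wrong:

\[
X_c = R - \frac{W}{k - 1}.
\]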
Journal of Experimental Education | 1977
Hariharan Swaminathan; Linda L. Cook; Laurence Cadorette
Multiple measures taken on subjects are usually classified along two dimensions: (1) measures on the same dependent variable taken at different periods of time or occasions; and (2) measures on different dependent variables taken at one testing or observation period. In this paper, an appropriate procedure for the analysis of "multivariate repeated measures" designs, i.e., designs in which measures are taken along both dimensions simultaneously, is discussed. Examples are given of the application of the procedure to quasi-experimental time-series designs and to the problem of determining rater agreement when a group of individuals is rated on more than one variable.
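A minimal sketch of one special case of such an analysis is given below: two dependent variables measured on the same subjects at two occasions, reduced to occasion-difference scores and tested jointly with Hotelling's T². The data are simulated, and the procedure discussed in the paper is more general (arbitrary numbers of occasions, variables, and design factors).

```python
# A minimal sketch of a simple multivariate repeated-measures analysis:
# two variables, two occasions, one group, tested via Hotelling's T^2
# on the occasion-difference scores. The data are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 25
# scores: shape (subjects, occasions, variables)
scores = rng.normal(loc=[[50, 30], [53, 31]], scale=5, size=(n, 2, 2))

d = scores[:, 1, :] - scores[:, 0, :]      # occasion 2 minus occasion 1, per variable
dbar = d.mean(axis=0)
S = np.cov(d, rowvar=False)
t2 = n * dbar @ np.linalg.solve(S, dbar)   # Hotelling's T^2 for H0: mean change = 0

p = 2                                      # number of dependent variables
f_stat = (n - p) / (p * (n - 1)) * t2      # exact conversion of T^2 to an F statistic
p_value = stats.f.sf(f_stat, p, n - p)
print(f"T^2 = {t2:.2f}, F = {f_stat:.2f}, p = {p_value:.4f}")
```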
Journal of Educational Measurement | 1977
Ronald K. Hambleton; Linda L. Cook
Journal of Educational Measurement | 1988
Linda L. Cook; Daniel R. Eignor; Hessy L. Taft
Review of Educational Research | 1978
Ronald K. Hambleton; Hariharan Swaminathan; Linda L. Cook; Daniel R. Eignor; Janice A. Gifford