Ronald K. Hambleton
University of Massachusetts Amherst
Publications
Featured research published by Ronald K. Hambleton.
Biometrics | 1997
Wim J. van der Linden; Ronald K. Hambleton
Item response theory has become an essential component in the toolkit of every researcher in the behavioral sciences. It provides a powerful means to study individual responses to a variety of stimuli, and the methodology has been extended and developed to cover many different models of interaction. This volume presents a wide-ranging handbook to item response theory and its applications to educational and psychological testing. It will serve both as an introduction to the subject and as a comprehensive reference volume for practitioners and researchers. It is organized into six major sections: the nominal categories model, models for response time or multiple attempts on items, models for multiple abilities or cognitive components, nonparametric models, models for nonmonotone items, and models with special assumptions. Each chapter has been written by an expert on that particular topic, and the chapters have been carefully edited to ensure a uniform style of notation and presentation throughout. As a result, all researchers whose work uses item response theory will find this an indispensable companion, and it will be the subject's reference volume for many years to come.
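As a rough illustration of the kind of model the handbook treats, the short Python sketch below evaluates the three-parameter logistic (3PL) item response function; the item parameters and the helper name three_pl are illustrative and not taken from the volume.

import numpy as np

def three_pl(theta, a, b, c):
    # Probability of a correct response under the three-parameter logistic model:
    # a = discrimination, b = difficulty, c = pseudo-guessing,
    # with the conventional D = 1.7 scaling constant.
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

# Illustrative item: moderate discrimination, average difficulty, a 20% guessing
# floor, evaluated over a grid of ability (theta) values.
abilities = np.linspace(-3.0, 3.0, 7)
print(np.round(three_pl(abilities, a=1.2, b=0.0, c=0.2), 3))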
Medical Care | 2007
Bryce B. Reeve; Ron D. Hays; Jakob B. Bjorner; Karon F. Cook; Paul K. Crane; Jeanne A. Teresi; David Thissen; Dennis A. Revicki; David J. Weiss; Ronald K. Hambleton; Honghu Liu; Richard Gershon; Steven P. Reise; Jin Shei Lai; David Cella
Background: The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) project. Objectives: Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive testing. The PROMIS Steering Committee selected 5 HRQOL domains for initial focus: physical functioning, fatigue, pain, emotional distress, and social role participation. This report provides an overview of the methods used in the PROMIS item analyses and the proposed calibration of item banks. Analyses: Analyses include evaluation of data quality (eg, logic and range checking, spread of the response distribution within an item), descriptive statistics (eg, frequencies, means), item response theory model assumptions (unidimensionality, local independence, monotonicity), model fit, differential item functioning, and item calibration for banking. Recommendations: Key analytic issues are summarized, and recommendations are provided for future evaluations of item banks in HRQOL assessment.
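The analytic sequence described above starts with data-quality and descriptive checks before any IRT modeling. A minimal Python sketch of that first step is shown below; the item names and the simulated 1-5 responses are hypothetical, not PROMIS data.

import numpy as np
import pandas as pd

# Hypothetical item-response matrix: rows = respondents, columns = items on a 1-5 scale.
rng = np.random.default_rng(0)
responses = pd.DataFrame(rng.integers(1, 6, size=(200, 4)),
                         columns=["fatigue_1", "fatigue_2", "fatigue_3", "fatigue_4"])

# Range check: flag any value outside the intended 1-5 response scale.
out_of_range = ~responses.isin(range(1, 6))
print("Out-of-range responses per item:")
print(out_of_range.sum())

# Spread of the response distribution within each item, plus basic descriptives.
for item in responses.columns:
    proportions = responses[item].value_counts(normalize=True).sort_index()
    print(item, proportions.round(2).to_dict())
print(responses.agg(["mean", "std"]).round(2))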
European Journal of Psychological Assessment | 2001
Ronald K. Hambleton
Summary: The ITC Test Translation and Adaptation Guidelines have been available for nearly 7 years, and numerous researchers and practitioners have provided comments on their strengths and weaknesses. This paper addresses three goals: First, comments on the 22 guidelines that have resulted from the numerous field-tests and reviews are presented. Second, where possible, specific suggestions for revising the ITC Guidelines are described. Finally, three suggestions for essential research to improve the methodology associated with translating and adapting tests are presented.
Psicothema | 2013
José Muñiz; Paula Elosua; Ronald K. Hambleton
BACKGROUND Adapting tests across cultures is a common practice that has increased in all areas of evaluation in recent years. We live in an increasingly multicultural and multilingual world in which tests are used to support decision-making in educational, clinical, organizational, and other settings, so the adaptation of tests has become a necessity. The main goal of this paper is to present the second edition of the guidelines of the International Test Commission (ITC) for adapting tests across cultures. METHOD A task force of six international experts reviewed the original guidelines proposed by the International Test Commission, taking into account the advances and developments of the field. RESULTS As a result of the revision, this new edition consists of twenty guidelines grouped into six sections: precondition, test development, confirmation, administration, score scales and interpretation, and documentation. The different sections are reviewed, and the possible sources of error influencing test translation and adaptation are analyzed. CONCLUSIONS Twenty guidelines are proposed for translating and adapting tests across cultures. Finally, we discuss the future prospects of the guidelines in relation to new developments in the field of psychological and educational assessment.
Review of Educational Research | 1978
Ronald K. Hambleton; Hariharan Swaminathan; James Algina; Douglas Bill Coulson
Glaser (1963) and Popham and Husek (1969) were the first to introduce and to popularize the field of criterion-referenced testing. Their motive was to provide the kind of test score information needed to make a variety of individual and programmatic decisions arising in objectives-based instructional programs. Norm-referenced tests were seen as less than ideal for providing the desired kind of test score information. At present, students at all levels of education are taking criterion-referenced tests.
Social Indicators Research | 1998
Ronald K. Hambleton; Liane Patsula
There is a growing interest in using tests constructed and validated for use in one language and culture in other languages and cultures. Sometimes these tests, when adapted for use in a second language and culture, can further research and meet informational needs; at other times, cross-cultural comparative studies can be carried out. But whatever the purpose of the test adaptation, questions arise concerning the validity of inferences from these adapted tests. The purposes of this paper are (1) to consider several advantages and disadvantages of adapting tests from one language and culture to another, (2) to review several sources of error or invalidity associated with adapting tests and to suggest ways to reduce those errors, and (3) to consider test adaptation advances in one rapidly emerging area of social research: quality of life measures.
Medical Care | 2006
Ronald K. Hambleton
The articles addressing differential item functioning (DIF) and factorial invariance in this special issue of Medical Care (1–9) are uniformly excellent, and readers will find that each article makes an important contribution to the measurement literature. The suggestion to have researchers apply various …
Applied Psychological Measurement | 1986
Ronald K. Hambleton; Richard Rovinelli
This study compared four methods of determining the dimensionality of a set of test items: linear factor analysis, nonlinear factor analysis, residual analysis, and a method developed by Bejar (1980). Five artificial test datasets (for 40 items and 1,500 examinees) were generated to be consistent with the three-parameter logistic model and the assumption of either a one- or a two-dimensional latent space. Two variables were manipulated: (1) the correlation between the traits (r = .10 or r = .60) and (2) the percent of test items measuring each trait (50% measuring each trait, or 75% measuring the first trait and 25% measuring the second trait). While linear factor analysis in all instances overestimated the number of underlying dimensions in the data, nonlinear factor analysis with linear and quadratic terms led to correct determination of the item dimensionality in the three datasets where it was used. Both the residual analysis method and Bejar's method proved disappointing. These results suggest the need for extreme caution in using linear factor analysis, residual analysis, and Bejar's method until more investigations of these methods can confirm their adequacy. Nonlinear factor analysis appears to be the most promising of the four methods, but more experience in applying the method seems necessary before wide-scale use can be recommended.
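The simulation design can be sketched in a few lines of Python; the code below is an illustrative reconstruction under the stated conditions (1,500 examinees, 40 items split evenly across two correlated traits, responses generated from the three-parameter logistic model), not the authors' original programs, and the eigenvalue check stands in for a full linear factor analysis.

import numpy as np

rng = np.random.default_rng(1)
n_people, n_items, trait_corr = 1500, 40, 0.60

# Two correlated latent traits, one relevant to each half of the items.
cov = np.array([[1.0, trait_corr], [trait_corr, 1.0]])
thetas = rng.multivariate_normal(np.zeros(2), cov, size=n_people)
trait_of_item = np.repeat([0, 1], n_items // 2)

# Item parameters for the three-parameter logistic model (illustrative ranges).
a = rng.uniform(0.8, 1.8, n_items)   # discriminations
b = rng.normal(0.0, 1.0, n_items)    # difficulties
c = np.full(n_items, 0.2)            # pseudo-guessing

theta_used = thetas[:, trait_of_item]
p = c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta_used - b)))
responses = (rng.uniform(size=p.shape) < p).astype(int)

# Eigenvalues of the Pearson inter-item correlation matrix: the criterion a
# linear factor analysis of binary items would rely on, which tends to suggest
# more dimensions than were actually generated.
eigvals = np.linalg.eigvalsh(np.corrcoef(responses, rowvar=False))[::-1]
print("Largest eigenvalues:", np.round(eigvals[:5], 2))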
Applied Measurement in Education | 2004
Dean P. Goodman; Ronald K. Hambleton
A critical, but often neglected, component of any large-scale assessment program is the reporting of test results. In the past decade, a body of evidence has been compiled that raises concerns over the ways in which these results are reported to and understood by their intended audiences. In this study, current approaches for reporting student-level results on large-scale assessments were investigated. Recent student test score reports and interpretive guides from 11 states, three U.S. commercial testing companies, and two Canadian provinces were reviewed. On the basis of past score-reporting research, testing standards, and the requirements of the No Child Left Behind Act of 2001, a number of promising and potentially problematic features of these reports and guides are identified, and recommendations are offered to help enhance future score-reporting designs and to inform future research in this important area.
Educational and Psychological Measurement | 1992
Kathleen M. Mazor; Brian E. Clauser; Ronald K. Hambleton
The Mantel-Haenszel (MH) procedure has become one of the most popular procedures for detecting differential item functioning. The ability to produce valid results with relatively small numbers of examinees is one of the advantages typically attributed to this procedure. In this study, examinee item responses were simulated to contain differentially functioning items and then analyzed at five sample sizes to compare detection rates. Results showed the MH procedure missed 25% to 30% of the differentially functioning items when groups of 2,000 were used. When 500 or fewer examinees were retained in each group, more than 50% of the differentially functioning items were missed. The items most likely to go undetected were those that were most difficult, those with a small difference in item difficulty between the two groups, and poorly discriminating items.
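For readers unfamiliar with the procedure, the Python sketch below computes the Mantel-Haenszel chi-square (with continuity correction) and common odds ratio for a single studied item, stratifying examinees by total test score; the function name and the simulated data at the end are illustrative, and this is not the simulation code used in the study.

import numpy as np
from scipy.stats import chi2

def mantel_haenszel_dif(item, group, total_score):
    # item: 0/1 scores on the studied item; group: 0 = reference, 1 = focal;
    # total_score: matching variable used to form score strata.
    obs_sum = expect_sum = var_sum = 0.0
    or_num = or_den = 0.0
    for k in np.unique(total_score):
        stratum = total_score == k
        ref = item[stratum & (group == 0)]
        foc = item[stratum & (group == 1)]
        t = ref.size + foc.size
        if ref.size == 0 or foc.size == 0 or t < 2:
            continue  # a stratum contributes nothing without both groups
        a = ref.sum()            # reference group, item correct
        b = ref.size - a         # reference group, item incorrect
        c = foc.sum()            # focal group, item correct
        d = foc.size - c         # focal group, item incorrect
        obs_sum += a
        expect_sum += (a + b) * (a + c) / t
        var_sum += (a + b) * (c + d) * (a + c) * (b + d) / (t * t * (t - 1))
        or_num += a * d / t
        or_den += b * c / t
    mh_chi2 = (abs(obs_sum - expect_sum) - 0.5) ** 2 / var_sum
    alpha_mh = or_num / or_den   # MH common odds ratio across strata
    return mh_chi2, alpha_mh, chi2.sf(mh_chi2, df=1)

# Illustrative call with randomly simulated (DIF-free) data.
rng = np.random.default_rng(2)
group = rng.integers(0, 2, 1000)
total_score = rng.integers(0, 21, 1000)
item = rng.integers(0, 2, 1000)
print(mantel_haenszel_dif(item, group, total_score))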