Thomas M. Haladyna
Arizona State University
Publications
Featured research published by Thomas M. Haladyna.
Applied Measurement in Education | 2002
Thomas M. Haladyna; Steven M. Downing
A taxonomy of 31 multiple-choice item-writing guidelines was validated through a logical process that included two sources of evidence: the consensus achieved from reviewing what was found in 27 textbooks on educational testing and the results of 27 research studies and reviews published since 1990. This taxonomy is mainly intended for classroom assessment. Because textbooks have the potential to educate teachers and future teachers, textbook writers are encouraged to consider these findings in future editions of their textbooks. This taxonomy may also be useful for developing test items for large-scale assessments. Finally, research on multiple-choice item writing is discussed from both substantive and methodological viewpoints.
Educational Researcher | 1991
Thomas M. Haladyna; Susan Bobbit Nolen; Nancy Haas
In the current climate of dissatisfaction with public education, the standardized achievement test score has been the operational definition of educational achievement, and raising test scores has been equated with educational improvement. The pressure to raise test scores has resulted in practices that pollute the inferences we make from these scores. We examine two major sources of test score pollution: (a) how public school personnel prepare students to take standardized tests and (b) nonstandard practices and conditions under which tests are administered. We also examine the apparent causes of this pollution and its effects on testing practices in American education.
Medical Education | 2004
Steven M. Downing; Thomas M. Haladyna
Context: Factors that interfere with the ability to interpret assessment scores or ratings in the proposed manner threaten validity. To be interpreted in a meaningful manner, all assessments in medical education require sound, scientific evidence of validity.
Journal of Experimental Education | 1979
Thomas M. Haladyna; Greg Thomas
Research on students' attitudes toward school and subject matter has been limited by, among other problems, a lack of suitable instrumentation. The present study was designed to ascertain the attitudes of elementary school students toward school and seven subject-matter areas as a function of grade level and sex. Approximately 3,000 students were administered a nonverbal attitude inventory. Results indicated a sizable decline in attitudes toward school as a function of grade level, extremely low attitudes toward social studies, a sizable decline in attitudes between grade six and grades seven and eight in most areas measured, and predictable boy-girl differences, the most serious of which indicates that attitudes toward school decline more drastically for boys than for girls.
Educational and Psychological Measurement | 1993
Thomas M. Haladyna; Steven M. Downing
Textbook writers often recommend four or five options per multiple-choice item, and most, if not all, testing programs in the United States also employ four or five options. Recent reviews of research on the desirable number of options for a multiple-choice test item reveal that three options may be suitable for most ability and achievement tests. A study of the frequency of acceptably performing distractors is reported. Results from three different testing programs support the conclusion that test items seldom contain more than three useful options. Consequently, testing program personnel and classroom teachers may be better served by using 2- or 3-option items instead of the typically recommended 4- or 5-option items.
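The abstract does not state the criterion used to judge a distractor "acceptably performing." As an illustration only, the Python sketch below counts a distractor as functional when at least a hypothetical 5% of examinees choose it, then counts useful options as the key plus those distractors. The function name `functional_distractors` and the 5% cutoff are assumptions, not the study's method, which may also have weighed how well each distractor discriminates.

```python
from collections import Counter


def functional_distractors(responses, key, options="ABCD", threshold=0.05):
    """Return the distractors chosen by at least `threshold` of examinees.

    responses: sequence of chosen option labels, one per examinee.
    key: the correct option, which is never counted as a distractor.
    threshold: hypothetical minimum proportion (5% here, an assumption).
    """
    n = len(responses)
    if n == 0:
        return []
    counts = Counter(responses)
    return [opt for opt in options
            if opt != key and counts[opt] / n >= threshold]


# Hypothetical 4-option item: 20 examinees, key "A", option "D" never chosen.
item_responses = list("A" * 12 + "B" * 6 + "C" * 2)
useful = functional_distractors(item_responses, key="A")
print(useful)                               # ['B', 'C']
print("useful options:", 1 + len(useful))   # key plus functional distractors: 3
```

In this made-up item, one of the three distractors attracts no one, so the item effectively works as a 3-option item, which is the pattern the abstract describes.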
Educational and Psychological Measurement | 1993
Kevin D. Crehan; Thomas M. Haladyna; Britton W. Brewer
This study examined the validity of two item-writing rules in the design of test items: (a) the desirable number of options for a multiple-choice test item and (b) use of the inclusive "none of these" option. An experimental repeated-measures design found that items with three options were more difficult than those with four options and that items employing the "none of these" option were more difficult than those not using this inclusive option format. Neither format manipulation affected item discrimination. Therefore, the evidence does not support recommending the "none of these" option but suggests an advantage for multiple-choice items with fewer than the traditional four or five options.
Evaluation & the Health Professions | 2004
Thomas M. Haladyna; Gene A. Kramer
Subscores resulting from the administration of high-stakes tests to candidates for credentials in the health professions are desirable for two reasons. First, failing candidates want a profile of performance to plan future remedial studies. Second, training institutions want a profile of performance for their graduates to better evaluate their training. The validity of the interpretation or use of subscores depends on a summative judgment based on a combination of reasoning and empirical analyses, known as validation. We describe this reasoning process and show that with a large credentialing test the validity of any subscore interpretation or use can and should be studied systematically. Validity evidence should be established to support the interpretation and use of subscores that we intend to report. Some principles related to the validity of subscores arise in this study, and some procedures are proposed to help testing program personnel better validate the use of subscores.
Research in Higher Education | 1994
Thomas M. Haladyna; Robert K. Hess
When surveys of instructional effectiveness use Likert rating scales, bias is a potential threat to the validity of interpretations. Neither simple summation of ratings nor the use of larger samples removes bias. In this study, a new model for scaling ratings is examined. The method both identifies and corrects for bias. Working with a database of student ratings of college instruction, the model was tested against a variety of criteria. Results indicated that bias was detected and that it was large enough to warrant concern. The statistical corrections were significant in terms of both the order and the magnitude of class means. Implications for future studies include the specification of more potential sources of bias, the interaction of some of these factors, and the development of more systematic evidence supporting the need to be attentive to bias. The many-faceted Rasch model used in this study needs more evaluation before we are convinced of its utility to study and correct for bias, but preliminary evidence is encouraging. Recommendations were offered for a theoretical rationale for studying bias in student ratings of instructional effectiveness and for a program of research leading to the use of this model in reporting results used to improve instruction and to inform promotion, tenure, and merit decisions.
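For orientation, the general rating-scale form of the many-faceted Rasch model (Linacre's formulation) is sketched below. The abstract does not say which facets were modeled, so reading them as the class or instructor being rated, the survey items, and the student raters is an assumption, not the study's specification.

```latex
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
```

Here $P_{nijk}$ is the probability that object $n$ receives rating category $k$ on item $i$ from rater $j$; $B_n$ is the measure of the object being rated (assumed here to be the class or instructor), $D_i$ is the difficulty of item $i$, $C_j$ is the severity of rater $j$, and $F_k$ is the difficulty of the step from category $k-1$ to category $k$. Because a facet such as rater severity is estimated separately, class measures can be reported net of it; which bias sources the study actually modeled is not stated in the abstract.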
Evaluation & the Health Professions | 1989
Thomas M. Haladyna; Russelyn Roby Shindoll
Writing multiple-choice test items has typically been characterized as more of an art than a science. Textbooks commonly offer advice on how to write items, but most inexperienced item writers, despite having expertise in a content area, have difficulty phrasing the stem. A technique is described that has been used successfully in several testing programs in the health professions. This technique, the item shell, provides a basis for getting item writers started in the difficult process of writing effective multiple-choice items.
Review of Educational Research | 1980
Gale Roid; Thomas M. Haladyna
The emerging technology of item writing for achievement tests is reviewed. Several different approaches to item development are discussed. A continuum of item-writing methods is proposed ranging from informal-subjective methods to algorithmic-objective methods. Examples of techniques include objective-based item writing, amplified objectives, item forms, facet design, domain-referenced concept testing, and computerized techniques. Each item-writing technique is critically reviewed, and empirical studies of methods are described. Recommendations for further research and for applications to achievement testing are presented.