Richard J. Tannenbaum
Princeton University
Publications
Featured research published by Richard J. Tannenbaum.
Educational and Psychological Measurement | 1994
Richard J. Tannenbaum; Michael Rosenfeld
The purpose of this study was to conduct a job analysis of the basic skills judged to be important for all entry-level teachers regardless of subject area taught or grade level. Committees of content experts defined a domain consisting of 134 basic skill statements clustered within six basic skill dimensions: Reading, Writing, Mathematics, Listening, Speaking, and Interactive Communication Skills. A national survey of 6,120 practicing teachers was then conducted to verify the judgments of the committees and to identify a core of basic skills judged to be important by relevant subgroups of teachers (race/ethnicity, sex, teaching experience, subject area, school setting, school level, and geographic region). The results of the survey verified the importance of the basic skills domain. The subgroup analysis indicated that 113 statements (84% of the domain) were judged to be important by all subgroups of teachers.
Journal of Applied Psychology | 1993
Richard J. Tannenbaum; Scott Wesley
The authors examined agreement between importance ratings used in job analysis. The ratings were obtained from small committees of content experts and from field-survey respondents. Three measures of agreement were used: a relative index (product-moment correlation), an absolute index (intraclass correlation), and a dichotomous index (cutpoint). Data were obtained from 2 job analysis studies conducted for purposes of developing teacher-licensure tests in Spanish and in chemistry. In both studies there was a high level of agreement as measured by the relative and dichotomous indexes and a moderately high level of agreement as measured by the absolute index. The implications of these findings for job analysts and test developers are discussed.
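As a rough illustration of the three agreement indexes named in this abstract, the sketch below compares a committee's mean importance ratings with survey means for a small set of hypothetical task statements: a product-moment correlation (relative index), a two-way single-measure absolute-agreement ICC (absolute index), and the proportion of statements classified the same way at a cutpoint (dichotomous index). The ratings, the 5-point scale, and the 3.5 cutpoint are assumptions for illustration, not values from the study, and the ICC variant may differ from the exact form used there.

```python
import numpy as np

def pearson_r(x, y):
    """Relative index: product-moment correlation between the two rating profiles."""
    return np.corrcoef(x, y)[0, 1]

def icc_absolute(x, y):
    """Absolute index: two-way, single-measure, absolute-agreement ICC
    (ICC(A,1) in McGraw & Wong's notation), treating the two sources as raters."""
    data = np.column_stack([x, y])                 # rows = statements, cols = sources
    n, k = data.shape
    grand = data.mean()
    ss_total = ((data - grand) ** 2).sum()
    ss_rows = k * ((data.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((data.mean(axis=0) - grand) ** 2).sum()
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

def dichotomous_agreement(x, y, cutpoint=3.5):
    """Dichotomous index: proportion of statements both sources classify the same
    way (important vs. not important) at the chosen cutpoint."""
    return float(np.mean((x >= cutpoint) == (y >= cutpoint)))

# Hypothetical mean importance ratings (1-5 scale) for eight task statements.
committee = np.array([4.2, 3.1, 4.8, 2.9, 3.7, 4.5, 3.3, 4.0])
survey    = np.array([4.0, 3.4, 4.6, 3.0, 3.9, 4.4, 3.6, 4.1])

print(f"relative (Pearson r):       {pearson_r(committee, survey):.2f}")
print(f"absolute (ICC):             {icc_absolute(committee, survey):.2f}")
print(f"dichotomous (cutpoint 3.5): {dichotomous_agreement(committee, survey):.2f}")
```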
Language Assessment Quarterly | 2014
Richard J. Tannenbaum; Yeonsuk Cho
In this article, we consolidate and present in one place what is known about quality indicators for setting standards so that stakeholders may be able to recognize the signs of standard-setting quality. We use the context of setting standards to associate English language test scores with language proficiency descriptions such as those presented in the Common European Framework of Reference (Council of Europe, 2001). The criteria we discuss, however, apply to any language test and language framework. Standard-setting practices are often used to associate test scores with descriptions of language proficiency, as that is one way to provide meaning to the scores. However, there are a large number of standard-setting methods and procedures, and although there are suggested guidelines on how to implement such studies, not all studies necessarily follow the guidelines or follow them to the same extent. This makes it difficult to evaluate the quality of standard-setting results. Our intention is to offer guidance to policymakers, decision makers, and researchers about important factors they may consider to evaluate the rigor and credibility of standard-setting studies conducted to associate language test scores with language frameworks.
Educational Assessment | 2015
Richard J. Tannenbaum; Priya Kannan
Angoff-based standard setting is widely used, especially for high-stakes licensure assessments. Nonetheless, some critics have claimed that the judgment task is too cognitively complex for panelists, whereas others have explicitly challenged the consistency in (replicability of) standard-setting outcomes. Evidence of consistency in item judgments and passing scores is necessary to justify using the passing scores for consequential decisions. Few studies, however, have directly evaluated consistency across different standard-setting panels. The purpose of this study was to investigate consistency of Angoff-based standard-setting judgments and passing scores across 9 different educator licensure assessments. Two independent, multistate panels of educators were formed to recommend the passing score for each assessment, with each panel engaging in 2 rounds of judgments. Multiple measures of consistency were applied to each round of judgments. The results provide positive evidence of the consistency in judgments and passing scores.
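To make the Angoff computation concrete, here is a minimal sketch assuming invented ratings from two small panels: each panelist judges the probability that a just-qualified candidate answers each item correctly, a panel's recommended passing score is the sum over items of the mean judgment, and the difference between the two panels' passing scores serves as one simple indicator of replicability. The study itself applied several more formal measures of consistency.

```python
import numpy as np

# Hypothetical Angoff ratings: rows = panelists, columns = items. Each value is
# the judged probability that a just-qualified candidate answers the item correctly.
panel_a = np.array([
    [0.60, 0.75, 0.40, 0.85, 0.55],
    [0.65, 0.70, 0.45, 0.80, 0.50],
    [0.55, 0.80, 0.50, 0.90, 0.60],
])
panel_b = np.array([
    [0.62, 0.72, 0.42, 0.88, 0.58],
    [0.58, 0.78, 0.48, 0.82, 0.52],
    [0.60, 0.74, 0.46, 0.86, 0.56],
])

def angoff_passing_score(ratings):
    """Recommended passing score: sum over items of the mean panelist judgment."""
    return ratings.mean(axis=0).sum()

cut_a = angoff_passing_score(panel_a)
cut_b = angoff_passing_score(panel_b)
print(f"Panel A passing score: {cut_a:.2f} of {panel_a.shape[1]} points")
print(f"Panel B passing score: {cut_b:.2f} of {panel_b.shape[1]} points")
print(f"Between-panel difference: {abs(cut_a - cut_b):.2f} points")
```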
Language Assessment Quarterly | 2016
Spiros Papageorgiou; Richard J. Tannenbaum
Although there has been substantial work on argument-based approaches to validation as well as standard-setting methodologies, it might not always be clear how standard setting fits into argument-based validity. The purpose of this article is to address this gap in the literature, with a specific focus on topics related to argument-based approaches to validation in language assessment contexts. We first argue that standard setting is an essential part of test development and validation because of the important consequences cut scores might have for decision-making. We then present the Assessment Use Argument (AUA) framework and explain how evidence from standard setting can support claims about consequences, decisions, and interpretations. Finally, we identify several challenges in setting cut scores in relation to the levels of the Common European Framework of Reference (CEFR) and argue that, despite these challenges, standard setting is a critical component of any claim focusing on the interpretation and use of test scores in relation to the CEFR levels. We conclude that standard setting should be an integral part of the validity argument supporting score use and interpretation and should not be treated as an isolated event between the completion of test development and the reporting of scores.
Applied Measurement in Education | 2015
Priya Kannan; Adrienne Sgammato; Richard J. Tannenbaum; Irvin R. Katz
The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time-consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e., probability judgments) from previously conducted standard-setting studies were used first in a re-sampling study, followed by D-studies. For the re-sampling study, proportionally stratified subsets of items were extracted under various sampling and test-length conditions. The mean cut score, variance components, expected standard error (SE) around the mean cut score, and root-mean-squared deviation (RMSD) across 1,000 replications were estimated for each study condition. The SE and the RMSD decreased as the number of items increased, but this reduction tapered off after approximately 45 items. Subsequently, D-studies were performed on the same datasets. The expected SE was computed at various test lengths. Results from both studies are consistent with previous research indicating that 40 to 50 items are sufficient to make generalizable cut-score recommendations.
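The sketch below imitates the general shape of the re-sampling study under simplifying assumptions: item-level mean Angoff judgments are simulated rather than taken from the studies, simple random sampling stands in for proportional stratification, and the cut score is expressed as a proportion correct rather than a raw-score sum. For each subset size it reports the mean cut score, the empirical standard error, and the RMSD from the full-test cut score across 1,000 replications.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated mean Angoff judgment per item for a 100-item test (illustrative only).
item_means = rng.uniform(0.3, 0.9, size=100)
full_cut = item_means.mean()        # full-test cut score as a proportion correct

def resampled_cut_scores(item_means, subset_size, n_reps=1000, rng=rng):
    """Cut scores from repeatedly drawn item subsets (simple random sampling here;
    the study used proportionally stratified subsets)."""
    cuts = np.empty(n_reps)
    for r in range(n_reps):
        subset = rng.choice(item_means, size=subset_size, replace=False)
        cuts[r] = subset.mean()
    return cuts

for n_items in (15, 30, 45, 60):
    cuts = resampled_cut_scores(item_means, n_items)
    se = cuts.std(ddof=1)                               # empirical SE of the cut score
    rmsd = np.sqrt(np.mean((cuts - full_cut) ** 2))     # deviation from the full-test cut
    print(f"{n_items:>3} items: mean cut = {cuts.mean():.3f}, SE = {se:.3f}, RMSD = {rmsd:.3f}")
```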
Archive | 1993
Richard J. Tannenbaum
ETS Research Report Series | 1991
Michael Rosenfeld; Richard J. Tannenbaum
ETS Research Report Series | 2008
Richard J. Tannenbaum; E. Caroline Wylie
ETS Research Report Series | 2005
Richard J. Tannenbaum; E. Caroline Wylie