
Publications


Featured research published by Chad W. Buckendahl.


Applied Measurement in Education | 2002

A Review of Strategies for Validating Computer-Automated Scoring

Yongwei Yang; Chad W. Buckendahl; Piotr J. Juszkiewicz; Dennison S. Bhola

Computer-automated scoring (CAS) is becoming a popular tool in assessment. Various studies have been conducted to assess the quality of CAS-system-generated scores. There is therefore a need for a systematic examination of these studies in the context of contemporary validity concepts and current practice. This article starts with a brief introduction to current CAS systems. Next, we review the current practice of validating CAS-system-generated scores. Finally, we present a conceptual framework and general recommendations for designing validation studies for CAS procedures.
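A common source of evidence in such validation studies is the agreement between automated and human scores. The following is a minimal illustrative sketch, not drawn from the article's framework, that computes exact agreement and quadratically weighted kappa, two statistics often reported for CAS systems; all data are hypothetical.

```python
import numpy as np

def quadratic_weighted_kappa(human, machine, num_scores):
    """Quadratically weighted kappa between two integer score vectors.

    Scores are assumed to lie in 0..num_scores-1.
    """
    human = np.asarray(human)
    machine = np.asarray(machine)
    # Observed joint distribution of (human, machine) score pairs.
    observed = np.zeros((num_scores, num_scores))
    for h, m in zip(human, machine):
        observed[h, m] += 1
    observed /= observed.sum()
    # Expected distribution under independence of the two raters.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    # Quadratic disagreement weights: larger penalty for larger gaps.
    idx = np.arange(num_scores)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (num_scores - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# Hypothetical essay scores on a 0-5 scale.
human_scores = [3, 4, 2, 5, 3, 1, 4, 4]
machine_scores = [3, 4, 3, 5, 2, 1, 4, 5]
exact = np.mean(np.array(human_scores) == np.array(machine_scores))
print(f"exact agreement: {exact:.2f}")
print(f"weighted kappa:  {quadratic_weighted_kappa(human_scores, machine_scores, 6):.2f}")
```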


Child Maltreatment | 1999

Parent Attitudes and Discipline Practices: Profiles and Correlates in a Nationally Representative Sample

Ross A. Thompson; Elaine H. Christiansen; Shelly Jackson; Jennifer M. Wyatt; Rebecca A. Colman; Reece L. Peterson; Brian L. Wilcox; Chad W. Buckendahl

The responses of a nationally representative sample of 1,000 parents to a survey concerning parent attitudes, disciplinary practices, and other predictors of competent parenting were analyzed. Cluster analysis identified three subgroups based on their profiles of parenting attitudes and discipline. The first group was high on physical discipline, neglect, verbal abuse, and attitudes that devalue children; these parents reported childhood abuse and domestic violence, marital difficulty, and problems managing anger. The second group was high on nonphysical as well as physical discipline, and had a more positive attitude toward children but also had a profile of psychosocial risk. The third group had low scores on all disciplinary practices, low perceived disciplinary efficacy, and a healthy marital and personal history. These groups differ from traditional parenting typologies, and the findings confirm theoretical predictions concerning the correlates of parenting problems and raise new questions concerning the convergence of physically punitive with nonpunitive discipline practices.
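As an illustration of the kind of profile analysis described above, the sketch below runs a three-cluster k-means on simulated, standardized scale scores. The variable names, the simulated data, and the choice of k-means itself are assumptions for demonstration; the abstract does not specify which clustering algorithm the study used.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-parent scale scores; the four columns are illustrative
# stand-ins for the survey's attitude and discipline measures.
rng = np.random.default_rng(0)
scores = rng.normal(size=(1000, 4))  # physical, nonphysical, verbal_abuse, devaluing

# Standardize so no scale dominates the distance metric, then fit k-means
# with k=3 to mirror the three-profile solution reported in the article.
X = StandardScaler().fit_transform(scores)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Inspect each cluster's mean profile on the original scales.
for k in range(3):
    print(k, scores[labels == k].mean(axis=0).round(2))
```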


Language Assessment Quarterly | 2007

Recommending a Nursing-Specific Passing Standard for the IELTS Examination

Thomas R. O'Neill; Chad W. Buckendahl; Barbara S. Plake; Lynda Taylor

Licensure testing programs in the United States (e.g., nursing) face an increasing challenge of measuring the competency of internationally trained candidates, both in relation to their clinical competence and their English language competence. To assist with the latter, professional licensing bodies often adopt well-established and widely available international English language proficiency measures. In this context, the National Council of State Boards of Nursing (NCSBN) sought to develop a nursing-specific passing standard on the International English Language Testing System that U.S. jurisdictions could consider in their licensure decisions for internationally trained candidates. Findings from a standard setting exercise were considered by NCSBN's Examination Committee in conjunction with other relevant information to produce a legally defensible passing standard on the test. This article reports in detail on the standard setting exercise conducted as part of this policy-making process; it describes the techniques adopted, the procedures followed, and the outcomes obtained. The study is contextualized within the current literature on standard setting. The latter part of the article describes the nature of the policy-making process to which the study contributed and discusses some of the implications of including a language literacy test as part of a licensure testing program.


International Journal of Testing | 2013

Standard Setting to an International Reference Framework: Implications for Theory and Practice

Gad S. Lim; Ardeshir Geranpayeh; Hanan Khalifa; Chad W. Buckendahl

Standard setting theory has largely developed with reference to a typical situation, determining a level or levels of performance for one exam for one context. However, standard setting is now being used with international reference frameworks, where some parameters and assumptions of classical standard setting do not hold. We consider the challenges standard setting poses to reference frameworks and vice versa, focusing on the acceptance within standard setting theory of divergent outcomes. We argue that the justification for it does not hold in the context of reference frameworks; convergent outcomes should be expected and divergences investigated. The argument is illustrated using work relating the International English Language Testing System, an English language proficiency examination, to the Common European Framework of Reference for Languages (CEFR), a reference framework of language ability. We describe a standard setting study and a criterion validation study, show how their results agree, and reconcile findings with those from other studies. Implications for standard setting and for the CEFR are discussed.


International Journal of Testing | 2011

Evaluating the Bookmark Standard Setting Method: The Impact of Random Item Ordering

Susan L. Davis-Becker; Chad W. Buckendahl; Jack D. Gerrow

Throughout the world, cut scores are an important aspect of a high-stakes testing program because they are a key operational component of the interpretation of test scores. One standard setting method that is prevalent in educational testing programs, the Bookmark method, is intended to be a less cognitively complex alternative to methods such as the modified Angoff (1971) approach. In this study, we explored that assertion for a licensure examination program where two independent panels applied the Bookmark method to recommend a cut score on its Written Exam. One panel first made ratings using an ordered item booklet (OIB) in which items were randomly ordered with respect to empirically estimated difficulty, followed by judgments on a correctly ordered OIB. A second panel applied the Bookmark process with only the correctly ordered OIB. Results revealed striking similarities between the judgments made under the two orderings, calling into question panelists' ability to appropriately engage in the Bookmark method. In addition, under the random-ordering condition, approximately one-third of the panelists placed their bookmarks in a manner inconsistent with the items' empirical difficulties. Implications of these results for the Bookmark standard setting method are also discussed.
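For context, the Bookmark method has panelists work through an ordered item booklet and place a bookmark at the point separating the items a minimally qualified examinee would likely master from those they would not; the cut score is then the ability level at which the bookmarked item is answered correctly with a chosen response probability (often 0.67). The sketch below illustrates that mapping under a Rasch model. The difficulties, bookmark placements, and median aggregation rule are all assumptions for demonstration, not values from this study.

```python
import math
from statistics import median

def bookmark_cut_score(item_difficulties, bookmark_pages, rp=0.67):
    """Map panelists' bookmark placements to a cut score on the theta scale.

    item_difficulties: Rasch b-parameters for the booklet's items.
    bookmark_pages: 1-based page where each panelist placed a bookmark
        (page 1 = easiest item in the correctly ordered booklet).
    rp: response probability criterion; the cut is the theta at which
        P(correct | theta, b) = rp under the Rasch model.
    """
    ordered = sorted(item_difficulties)  # easiest to hardest
    # Under the Rasch model P(correct) = 1 / (1 + exp(-(theta - b))),
    # so the theta giving P = rp is b + ln(rp / (1 - rp)).
    offset = math.log(rp / (1 - rp))
    thetas = [ordered[page - 1] + offset for page in bookmark_pages]
    # One common aggregation rule is the median across panelists.
    return median(thetas)

# Hypothetical 10-item booklet and five panelists' bookmark placements.
difficulties = [-1.8, -1.2, -0.7, -0.3, 0.0, 0.4, 0.9, 1.3, 1.8, 2.4]
bookmarks = [6, 7, 6, 5, 7]
print(f"cut score (theta): {bookmark_cut_score(difficulties, bookmarks):.2f}")
```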


Applied Measurement in Education | 2005

Guest Editor's Introduction: Qualitative Inquiries of Participants' Experiences With Standard Setting

Chad W. Buckendahl

Shepard, Glaser, Linn, and Bohrnstedt (1993) suggested that participants may not understand their judgments when they engage in the Angoff method because of the complexity of the underlying tasks. The perceived cognitive complexity of the tasks involved in judgmental standard-setting methodologies also led to a reiterated characterization of the Angoff (1971) procedure as “fundamentally flawed” in an evaluation of the standard-setting methodology that was used for the National Assessment of Educational Progress (Pellegrino, Jones, & Mitchell, 1999). In response to these criticisms and their experience conducting standard setting, Mitzel, Lewis, Patz, and Green (2001) developed the bookmark method in an effort to reduce the cognitive complexity of the judgmental tasks. They suggest that by ordering the items or score points to provide information on relative difficulty and by reducing the number of judgments, the cognitive demand on the participants is minimized. However, an empirical study that compared the results of Angoff and bookmark methodologies using independent panels yielded similar cut score recommendations (Buckendahl, Smith, Impara, & Plake, 2002). Standard-setting researchers have generally spent more effort developing methods or variations of methods that seek to transform a policy-adopted conceptual definition of performance into a numerical value that relates to both the definition of performance and the content specifications of a given test. The challenge for all standard-setting methodologies is to effectively translate a participant’s mental model of the target examinee (e.g., barely proficient student) into judgments that communicate the participant’s recommendation of a value that characterizes the point of separation between one or more categories. Although researchers have speculated on the cognitive demand for participants, few studies have examined participants’ thought processes or experiences during these judgmental standard-setting tasks.
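For readers unfamiliar with the Angoff procedure discussed above: each panelist estimates, item by item, the probability that a minimally competent examinee would answer correctly, and a panelist's implied cut score is the sum of those estimates. The sketch below uses hypothetical ratings; the averaging rule shown is one common choice, not necessarily the one used in the studies cited.

```python
# Hypothetical Angoff ratings: rows are panelists, columns are items, each
# entry the estimated probability that a minimally competent examinee
# answers the item correctly.
ratings = [
    [0.9, 0.7, 0.6, 0.8, 0.5],  # panelist 1
    [0.8, 0.6, 0.7, 0.9, 0.4],  # panelist 2
    [0.9, 0.8, 0.5, 0.7, 0.6],  # panelist 3
]

# Each panelist's implied cut score is the sum of their item ratings,
# i.e., the expected raw score of the borderline examinee.
panelist_cuts = [sum(row) for row in ratings]

# The panel recommendation is commonly the mean of panelist cut scores.
recommended_cut = sum(panelist_cuts) / len(panelist_cuts)
print(f"panelist cut scores: {panelist_cuts}")
print(f"recommended cut score: {recommended_cut:.1f} of {len(ratings[0])} points")
```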


Applied Measurement in Education | 2005

A Case Study of Vertically Moderated Standard Setting for a State Science Assessment Program

Chad W. Buckendahl; Huynh Huynh; Theresa Siskind; Joseph C. Saunders

Under the adequate yearly progress requirements of the No Child Left Behind (NCLB) Act (2001), states are currently faced with the challenge of demonstrating continuous improvement in student performance in reading and mathematics. Beginning in 2007–2008, science will be required as a component of the NCLB Act. This article describes South Carolina's elementary science assessments and its approach to setting achievement levels on those tests. A description of how the state developed a system of vertically moderated standards across the range of grades covered by the tests is provided. Included in the process are standard-setting activities, Technical Advisory Committee deliberations, State Department of Education final decisions, and data provided to the state's Board of Education for information purposes. Recommendations for practice are also provided.


Educational Assessment | 2000

Making the Cut in School Districts: Alternative Methods for Setting Cutscores

Gerald Giraud; James C. Impara; Chad W. Buckendahl

School districts are under increasing pressure to demonstrate that students are competent in various skills, such as reading and mathematics. Often, demonstrating competence involves comparing performance on assessments to a standard of performance, as embodied in a test score. These scores, called cutscores, separate competent and noncompetent examinees. Because school districts have varied sources of data to inform cutscore decisions, various methods are available for suggesting cutscores. In 2 studies, we examine a selection of methods for arriving at rational and defensible cutscores in school districts. Methods examined are the Angoff (1971) method; the borderline and contrasting groups methods; and 2 new methods, 1 based on course enrollment and 1 based on expert expectations. In Study 1, the Angoff, borderline group, and course enrollment results were consistent, whereas in Study 2, the Angoff and professional judgment methods yielded suggested cutscores that were lower than the borderline group method. Suggestions for further study include the reaction of teachers to the cutscore-setting methods, the effect of different teacher attributes on the results of cutscore-setting methods, and the efficiency of and most effective order for employing the various methods.
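To make the examinee-centered methods concrete: in the borderline group method the cut score is commonly taken as the median test score of examinees judged borderline, while in the contrasting groups method it is the score that best separates examinees judged competent from those judged noncompetent. The sketch below implements both under those common formulations; the data and the error-minimizing rule for contrasting groups are illustrative assumptions, not the exact procedures used in these two studies.

```python
from statistics import median

def borderline_group_cut(borderline_scores):
    """Cut score as the median score of examinees judged 'borderline'."""
    return median(borderline_scores)

def contrasting_groups_cut(competent_scores, noncompetent_scores):
    """Cut score minimizing total classification errors between the two
    judged groups, searching over integer score thresholds."""
    candidates = range(min(noncompetent_scores), max(competent_scores) + 1)

    def errors(cut):
        false_fail = sum(s < cut for s in competent_scores)
        false_pass = sum(s >= cut for s in noncompetent_scores)
        return false_fail + false_pass

    return min(candidates, key=errors)

# Hypothetical test scores grouped by teacher judgment.
borderline = [18, 20, 19, 22, 21, 20]
competent = [24, 27, 22, 30, 26, 25, 28]
noncompetent = [12, 15, 17, 14, 19, 16]

print(f"borderline group cut:   {borderline_group_cut(borderline)}")
print(f"contrasting groups cut: {contrasting_groups_cut(competent, noncompetent)}")
```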


International Journal of Testing | 2013

Identifying and Evaluating External Validity Evidence for Passing Scores

Susan L. Davis-Becker; Chad W. Buckendahl

A critical component of the standard setting process is collecting evidence to evaluate the recommended cut scores and their use for making decisions and classifying students based on test performance. Kane (1994, 2001) proposed a framework by which practitioners can identify and evaluate evidence for the results of standard setting from (1) the procedural elements of the study, (2) the internal consistency of the recommendations, and (3) the external consistency of the impact or results with other measures of examinee performance. For many programs, the availability of external validity evidence is limited due to the nature of the testing program. This is particularly the case for national testing programs in developing nations or international programs that span diverse populations across the world. In this article, we review two plausible approaches for identifying and evaluating external validity evidence in settings where other national or international benchmarks may not be available to guide policymakers. Each approach is presented along with a demonstration of how it could be applied in a case study from a national testing program.


Applied Measurement in Education | 2009

Conducting a Lifecycle Audit of the National Assessment of Educational Progress

Chad W. Buckendahl; Barbara S. Plake; Susan L. Davis

The National Assessment of Educational Progress (NAEP) program is a series of periodic assessments administered nationally to samples of students and designed to measure different content areas. This article describes a multi-year study that focused on the breadth of the development, administration, maintenance, and renewal of the assessments in the program. The methodology targeted data collection through documentation and onsite interviews with key personnel relative to evaluation criteria developed from the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999). We present summary results of this study and discuss one of the overarching recommendations that can inform practice for all testing programs.

Collaboration


Chad W. Buckendahl's top co-authors:

James C. Impara (University of Nebraska–Lincoln)
Barbara S. Plake (University of Nebraska–Lincoln)
Abdullah A. Ferdous (American Institutes for Research)
Susan L. Davis (University of Nebraska–Lincoln)
Yongwei Yang (University of Nebraska–Lincoln)
Brian L. Wilcox (University of Nebraska–Lincoln)
Jennifer M. Wyatt (University of Nebraska–Lincoln)