Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Stephen P. Klein is active.

Publication


Featured research published by Stephen P. Klein.


American Journal of Public Health | 1985

The cost and effectiveness of school-based preventive dental care.

Stephen P. Klein; Harry M. Bohannan; Robert M. Bell; Judith A. Disney; Craig B. Foch; Richard C. Graves

The National Preventive Dentistry Demonstration Program assessed the cost and effectiveness of various types and combinations of school-based preventive dental care procedures. The program involved 20,052 first, second, and fifth graders from five fluoridated and five nonfluoridated communities. These children were examined at baseline and assigned to one of six treatment regimens. Four years later, 9,566 members of this group were examined again. Analyses of their dental examination data showed that dental health lessons, brushing and flossing, fluoride tablets and mouthrinsing, and professionally applied topical fluorides were not effective in reducing a substantial amount of dental decay, even when all of these procedures were used together. Occlusal sealants prevented one to two carious surfaces in four years. Children who were especially susceptible to decay did not benefit appreciably more from any of the preventive measures than did children in general. Annual direct per capita costs were $23 for sealant or fluoride prophy/gel applications and $3.29 for fluoride mouthrinsing. Communal water fluoridation was reaffirmed as the most cost-effective means of reducing tooth decay in children.


Educational Evaluation and Policy Analysis | 2003

Studying Large-Scale Reforms of Instructional Practice: An Example from Mathematics and Science

Laura S. Hamilton; Daniel F. McCaffrey; Brian M. Stecher; Stephen P. Klein; Abby Robyn; Delia Bugliari

A number of challenges are encountered when evaluating a large-scale, multisite educational reform aimed at changing classroom practice. The challenges include substantial variability in implementation with little information on actual practice, lack of common, appropriate outcome measures, and the need to synthesize evaluation results across multiple study sites. This article describes an approach to addressing these challenges in the context of a study of the relationships between student achievement and instructional practices in the National Science Foundation’s Systemic Initiatives (SI) program. We gathered data from eleven SI sites and investigated relationships at the site level and pooled across sites using a planned meta-analytic approach. We found small but consistent positive relationships between teachers’ reported use of standards-based instruction and student achievement. The article also describes the ways in which we addressed the challenges discussed, as well as a number of additional obstacles that need to be addressed to improve future evaluations of large-scale reforms.


Educational Evaluation and Policy Analysis | 1997

The Cost of Science Performance Assessments in Large-Scale Testing Programs

Brian M. Stecher; Stephen P. Klein

Estimates of the costs of including hands-on measures of science skills in large-scale assessment programs are drawn from a field trial involving more than 2,000 fifth- and sixth-grade students. These estimates include the resources needed to develop, administer, and score the tasks. They suggest that performance measures are far more expensive than typical multiple-choice tests for an equal amount of testing time, and the cost increases even further for an equally reliable score on an individual student. Because of the complexities of equipment and materials, hands-on measures in science are about three times more expensive than open-ended writing assessments. Alternative approaches to development and administration (such as using less expensive equipment and having the tasks administered by classroom teachers rather than trained Exercise Administrators) could reduce costs by up to 50%, but these practices may reduce the quality of the data obtained. However, including performance assessments in a state’s testing program may have many positive effects, including fostering standards-based educational reform and encouraging more effective teaching methods. The challenge is to determine whether these potential benefits actually exist and if they do, how they can be realized within the budget constraints of most testing programs.


Educational Evaluation and Policy Analysis | 1997

Gender and Racial/Ethnic Differences on Performance Assessments in Science

Stephen P. Klein; Jasna Jovanovic; Brian M. Stecher; Dan McCaffrey; Richard J. Shavelson; Edward H. Haertel; Guillermo Solano-Flores; Kathy Comfort

We examined whether the differences in mean scores among gender and racial/ethnic groups on science performance assessments are comparable to the differences that are typically found among these groups on traditional multiple-choice tests. To do this, several hands-on science performance assessments and other measures were administered to over 2,000 students in grades five, six, and nine as part of a field test of California’s statewide testing program. Girls tended to have higher overall mean scores than boys on the performance measures, but boys tended to score higher than girls on certain types of questions within a performance task. In contrast, differences in mean scores among racial/ethnic groups on one type of test (or question) were comparable to the differences among these groups on the other measures studied. Overall, the results suggest that the type of science test used is unlikely to have much effect on gender or racial/ethnic differences in scores.


Journal of Personnel Evaluation in Education | 1998

Standards for Teacher Tests

Stephen P. Klein

States are increasingly requiring that public school teachers pass one or more tests as a condition for permanent employment. As a result of a recent federal court decision, these tests must now satisfy the same legal standards as other employment tests. Moreover, some of the measures used to assess teacher competence no longer rely on multiple-choice items. They now utilize various types of open-ended performance assessments. This article discusses how these developments may affect the adverse impact, reliability, validity, and pass-fail standards of teacher certification tests. The article concludes by recommending that such tests combine multiple-choice questions with open-ended tasks that focus on the common or critical situations that are likely to arise across the full range of practice settings for which the teacher is being certified or licensed.


Evaluation Review | 1993

Adjusting the Census of 1990: The Smoothing Model

David A. Freedman; Kenneth W. Wachter; Daniel C. Coster; D. Richard Cutler; Stephen P. Klein

Considering the difficulties, the Census Bureau does a remarkably good job at counting people. This article discusses techniques for adjusting the census. If there is a large undercount, these techniques may be accurate enough for adjustment. With a small undercount, they are unlikely to improve on the census; instead, adjustment could easily degrade the accuracy of the data. The focus will be sampling error, that is, uncertainty in estimates due to the luck of the draw in choosing the sample. Sampling error is a major obstacle to adjusting the 1990 census, even at the state level. To control sampling error, the Census Bureau used a smoothing model. However, the model does not solve the problem, because its effects are strongly dependent on unverified and implausible assumptions. This story has a broader moral. Statistical models are often defended on grounds of robustness, that is, estimates do not depend strongly on assumptions. But the standard errors, which are internally generated measures of precision, may be critical. Then caution is in order. If the model is at all complicated, the standard errors may turn out to be driven by assumptions not data—the antithesis of robustness.


Research in Higher Education | 2006

Student Engagement and Student Learning: Testing the Linkages

Robert M. Carini; George D. Kuh; Stephen P. Klein



Education Policy Analysis Archives | 2000

What Do Test Scores in Texas Tell Us?

Stephen P. Klein; Laura S. Hamilton; Daniel F. McCaffrey; Brian M. Stecher



Educational Measurement: Issues and Practice | 2005

The Vermont Portfolio Assessment Program: Findings and Implications

Daniel Koretz; Brian M. Stecher; Stephen P. Klein; Daniel F. McCaffrey


Archive | 2002

Making sense of test-based accountability in education

Laura S. Hamilton; Brian M. Stecher; Stephen P. Klein

Collaboration


Dive into Stephen P. Klein's collaborations.

Top Co-Authors

Richard C. Graves

University of North Carolina at Chapel Hill
