Mary E. Lunz
American Society for Clinical Pathology
Publications
Featured research published by Mary E. Lunz.
Evaluation & the Health Professions | 1990
Mary E. Lunz; John A. Stahl
The purpose of this research project was to confirm that differences in the severity of judges and the stringency of grading periods occur regardless of the nature of the assessment or the examination materials used. Three rather different examinations that require judges were analyzed using an extended Rasch model to determine whether differences in judge severity and grading-period stringency were observable for all three examinations. Significant variation in judge severity and some variation across grading periods were found on all three examinations. This implies that, regardless of the nature of the examination, items, or judges, examinee measures cannot be considered independent of the particular judges involved unless a correction for severity is made systematically. Accounting for judge severity and grading-period stringency is extremely important when pass/fail decisions that are meant to generalize to competence are made, as in certification examinations.
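As an illustration (the notation below is a common convention for the extended, many-facet Rasch model referred to here, not a formula quoted from the article), judge severity and grading-period stringency enter as additive facets alongside examinee ability and item difficulty:

```latex
% Illustrative many-facet Rasch (Facets) model:
% B_n examinee ability, D_i item difficulty, C_j severity of judge j,
% S_t stringency of grading period t, F_k step difficulty of rating category k.
\log\frac{P_{nijtk}}{P_{nijt(k-1)}} = B_n - D_i - C_j - S_t - F_k
```

Here P_{nijtk} is the probability that examinee n receives rating k (rather than k-1) from judge j on item i during grading period t, so examinee measures can be estimated with the judge and period effects removed.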
Applied Psychological Measurement | 1992
Mary E. Lunz; Betty A. Bergstrom; Benjamin D. Wright
The effects of reviewing items and altering responses on the efficiency of computerized adaptive tests and on the resultant ability estimates of examinees were explored. A total of 220 students were randomly assigned to a review condition; their test instructions indicated that each item must be answered when presented, but that responses could be reviewed and altered at the end of the test. A sample of 492 students did not have the opportunity to review and alter responses. Within the review condition, examinee ability estimates before and after review correlated .98. The average efficiency of the test decreased by 1% after review. Approximately 32% of the examinees improved their ability estimates after review but did not change their pass/fail status. Disallowing review on adaptive tests administered under these rules is not supported by these data.
Educational and Psychological Measurement | 1994
Mary E. Lunz; John A. Stahl; Benjamin D. Wright
The purpose of this article is to discuss the importance of decision reproducibility for performance assessments. When decisions from two judges about a student's performance on comparable tasks correlate, the decisions have been considered reproducible. However, when judges differ in expectations and tasks differ in difficulty, decisions may not be independent of the particular judges or tasks encountered unless appropriate adjustments for the observable differences are made. In this study, data were analyzed with the Facets model and provided evidence that judges grade differently, whether or not the scores given correlate well. This outcome suggests that adjustments for differences among judge severities should be made before student measures are estimated, in order to produce reproducible decisions for certification, achievement, or promotion.
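A minimal numeric sketch (hypothetical ratings, not data from the study) of the point that correlation alone does not establish reproducibility: two judges whose scores correlate almost perfectly can still differ by a constant severity offset, which flips pass/fail decisions unless it is adjusted for.

```python
import numpy as np

# Hypothetical ratings (0-10 scale) from two judges on the same ten students.
# Judge B is systematically one point more severe than judge A.
judge_a = np.array([8, 6, 9, 5, 7, 8, 4, 6, 9, 7], dtype=float)
judge_b = judge_a - 1.0  # same rank order, harsher by a constant

r = np.corrcoef(judge_a, judge_b)[0, 1]
print(f"correlation: {r:.2f}")                       # 1.00 -- looks 'reproducible'
print(f"mean severity gap: {np.mean(judge_a - judge_b):.1f} points")

cut = 6.0  # hypothetical pass/fail cut score
pass_a = judge_a >= cut
pass_b = judge_b >= cut
print(f"decisions that flip with the harsher judge: {np.sum(pass_a != pass_b)}")
```

With these illustrative numbers the correlation is perfect, yet two of ten pass/fail decisions change depending on which judge a student happens to draw.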
Evaluation & the Health Professions | 1989
Mary E. Lunz; John A. Stahl; Karen James
This article describes a process for linking job analysis data to test specifications. Rasch rating scale analysis is used to order the ratings of importance assigned to task and content items by practicing laboratory managers. This ordering produces variables representing the range of task and content items from most to least important. The variables also provide an objective frame of reference for review of the data by the experts. The Rasch calibrations for each task and content item are transformed to percentages based on useful limits. From the transformed percentages, test specifications reflecting practice patterns in the field of laboratory management are developed.
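A rough sketch of the idea, with assumed calibrations, an assumed linear rescaling, and hypothetical content areas (the article does not spell out its exact scaling): importance calibrations from the Rasch rating scale analysis are clipped at chosen limits, reversed so that more important areas receive more weight, and normalized into test-specification percentages.

```python
# Hypothetical Rasch calibrations (in logits) for five content areas,
# where a lower calibration means practitioners rated the area more important.
calibrations = {
    "quality control":     -1.2,
    "personnel":           -0.6,
    "budgeting":            0.1,
    "information systems":  0.7,
    "facilities":           1.4,
}

# Assumed transformation: clip at illustrative "useful limits", reverse the
# scale so more-important areas get more weight, then normalize to percentages.
lower, upper = -1.5, 1.5
weights = {k: upper - max(lower, min(upper, v)) for k, v in calibrations.items()}
total = sum(weights.values())
spec = {k: round(100 * w / total, 1) for k, w in weights.items()}

for area, pct in spec.items():
    print(f"{area:>20}: {pct:5.1f}% of test items")
print(f"{'total':>20}: {sum(spec.values()):.1f}%")
```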
Teaching and Learning in Medicine | 1993
Mary E. Lunz; John A. Stahl
This article presents an introduction to Rasch model analysis and its application to examinations that require examiners. The introduction begins with a discussion of the Rasch model and its assumptions and then presents the multifacet Rasch model to include examiners. The article concludes with an example of an application to an oral examination administered by a medical specialty board.
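For readers unfamiliar with the notation, the basic dichotomous Rasch model and its multifacet extension to examiners can be written as follows (standard textbook forms, not formulas quoted from the article):

```latex
% Dichotomous Rasch model: probability that examinee n answers item i correctly,
% given ability B_n and item difficulty D_i.
P(X_{ni}=1) = \frac{\exp(B_n - D_i)}{1 + \exp(B_n - D_i)}

% Multifacet extension for examiner-scored tasks: examiner severity C_j enters
% as an additional additive term, with F_k the step difficulty of rating category k.
\log\frac{P_{nijk}}{P_{nij(k-1)}} = B_n - D_i - C_j - F_k
```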
International Journal of Educational Research | 1994
Mary E. Lunz; Betty A. Bergstrom; Richard Gershon
The development of item response theory (IRT) and Rasch models, inexpensive access to high-speed desktop computers, and the growth of the Internet have led to the creation and growth of computerized adaptive testing (CAT). This form of assessment is applicable both to high-stakes tests, such as certification or licensure examinations, and to health-related quality-of-life surveys. This article discusses the historical background of CAT, including its many advantages over conventional (typically paper-and-pencil) alternatives. The process of CAT is then described, including the specific differences between CAT based on 1-, 2-, and 3-parameter IRT models and on various Rasch models. Numerous topics concerning CAT in practice are covered, including initial item selection, content balancing, test difficulty, test length, and stopping rules. The article concludes with the authors' reflections regarding the future of CAT.
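To make the process concrete, here is a minimal sketch of a Rasch-based adaptive testing loop, assuming a dichotomous Rasch model, most-informative item selection, a Newton-Raphson ability update, and a standard-error or maximum-length stopping rule; the item bank, thresholds, and function names are illustrative, not the procedures described in the article (content balancing is omitted for brevity).

```python
import math
import random

def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def update_theta(theta, items, responses, iterations=10):
    """Newton-Raphson maximum-likelihood update of the ability estimate."""
    for _ in range(iterations):
        info = sum(rasch_prob(theta, b) * (1 - rasch_prob(theta, b)) for b in items)
        score = sum(x - rasch_prob(theta, b) for x, b in zip(responses, items))
        if info <= 0:
            break
        theta += score / info
    se = 1.0 / math.sqrt(max(info, 1e-9))
    return theta, se

def simulate_cat(bank, true_theta, max_items=30, target_se=0.30):
    """Administer items adaptively: pick the available item whose difficulty is
    closest to the current estimate, update, and stop on SE or test length."""
    theta, administered, responses = 0.0, [], []
    available = list(bank)
    while available and len(administered) < max_items:
        b = min(available, key=lambda d: abs(d - theta))  # most informative item
        available.remove(b)
        x = 1 if random.random() < rasch_prob(true_theta, b) else 0
        administered.append(b)
        responses.append(x)
        theta, se = update_theta(theta, administered, responses)
        if len(administered) >= 5 and se <= target_se:
            break
    return theta, se, len(administered)

random.seed(1)
bank = [random.uniform(-3, 3) for _ in range(200)]  # hypothetical item bank
theta_hat, se, n = simulate_cat(bank, true_theta=0.8)
print(f"estimate {theta_hat:.2f} (SE {se:.2f}) after {n} items")
```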
Evaluation & the Health Professions | 1992
Betty A. Bergstrom; Mary E. Lunz
The purpose of this study is to compare the level of confidence in pass/fail decisions obtained with computer adaptive tests and with paper-and-pencil tests. Examinees took a variable-length computer adaptive test and two fixed-length paper-and-pencil tests. The computer adaptive test was stopped when the examinee ability estimate was either 1.3 times the standard error of measurement above or below the pass/fail point (a one-tailed 90% confidence interval) or when a maximum test length was reached. Results show that greater confidence in the accuracy of the pass/fail decisions is obtained for more examinees when the computer adaptive test implements a 90% confidence stopping rule than with paper-and-pencil tests of comparable length.
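Stated as an inequality (a restatement of the rule described above, with the ability-estimate and standard-error notation added for clarity), the adaptive test stops and reports a confident pass/fail decision when

```latex
% Confidence stopping rule: stop once the ability estimate is far enough from
% the pass point relative to its standard error (one-tailed 90% level, z ~ 1.3),
% or when the maximum test length is reached.
|\hat{\theta} - \theta_{\mathrm{pass}}| \geq 1.3 \times SE(\hat{\theta})
```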
Teaching and Learning in Medicine | 1996
Mary E. Lunz; Craig W. Deville
Background: As computerized adaptive testing (CAT) becomes more prevalent, it is important to confirm that computerized adaptive and paper-and-pencil (P&P) examinations offer comparable validity and statistical performance. Purpose: The purpose of this study was to investigate the validity and statistical properties of automated item selection (CAT) compared to manual item selection (P&P). Methods: A committee of specialists rated computerized adaptive tests (CATs) and P&P examinations with regard to face validity, adherence to test specifications, ordering of items, and cognitive skill distribution. The psychometric properties were compared. Results: Results indicated that the CATs and P&P examinations were comparable for face, content, and construct validity, as well as psychometric characteristics. Conclusions: Tests constructed automatically by the computer or manually for P&P can meet the criteria for validity and statistical performance. These findings generalize to any carefully developed examination.
Journal of Educational Computing Research | 1995
Mary E. Lunz; Betty A. Bergstrom
Computerized adaptive testing (CAT) uses a computer algorithm to construct and score the best possible individualized, or tailored, test for each candidate. The computer also provides an absolute record of all responses and changes to responses, as well as their effects on candidate performance. The detail of the data from computerized adaptive tests makes it possible to track initial responses and response alterations, and their effects on candidates' estimated ability measures as well as on the statistical performance of the examination. The purpose of this study was to track the effect of candidate response patterns on a computerized adaptive test. A ninety-item certification examination was divided into nine units of ten items each to track the pattern of initial responses and response alterations on ability estimates and test precision across the nine test units. The precision of the test was affected most by response alterations during early segments of the test. While candidates generally benefit from altering responses, individual candidates showed different patterns of response alteration across test segments. Overall, test precision was only minimally affected, suggesting that the tailoring of the CAT is minimally affected by response alterations.
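A small sketch of the kind of bookkeeping the study describes, using fabricated-for-illustration responses and a simple maximum-likelihood Rasch estimate (the helper functions and data are hypothetical, not the study's analysis): alterations are counted within each consecutive ten-item unit, and the ability estimate is recomputed from the initial and the altered response strings.

```python
import math
import random

def rasch_ability(difficulties, responses, theta=0.0, iterations=20):
    """Maximum-likelihood Rasch ability estimate for a scored response string."""
    for _ in range(iterations):
        probs = [1 / (1 + math.exp(-(theta - b))) for b in difficulties]
        info = sum(p * (1 - p) for p in probs)
        theta += sum(x - p for x, p in zip(responses, probs)) / info
    return theta

def alterations_by_segment(initial, final, segment_size=10):
    """Count response alterations within each consecutive ten-item test unit."""
    changed = [int(a != b) for a, b in zip(initial, final)]
    return [sum(changed[i:i + segment_size]) for i in range(0, len(changed), segment_size)]

# Hypothetical 90-item adaptive test for one candidate.
random.seed(0)
difficulties = [random.uniform(-2, 2) for _ in range(90)]
initial = [1 if random.random() < 0.6 else 0 for _ in range(90)]
final = [1 - x if random.random() < 0.05 else x for x in initial]  # ~5% altered

print("alterations per ten-item unit:", alterations_by_segment(initial, final))
print("ability change after review: "
      f"{rasch_ability(difficulties, final) - rasch_ability(difficulties, initial):+.3f}")
```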
International Journal of Educational Research | 1994
Richard Smith; Ellen R. Julian; Mary E. Lunz; John A. Stahl; Matthew Schulz; Benjamin D. Wright
The establishment and management of standards is at the heart of professional education and certification. Usually, paper-and-pencil tests are not sufficient, and further evidence must be collected by observing and judging instances of professional performance. The normative methods that have served as defaults for this purpose are devoid of competence specifics. For some years now, a few professional associations have been using probabilistic conjoint measurement as the means for developing, applying, and maintaining their standards. This chapter describes how this has been done and how well it has worked in practice.