Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Catherine Trapani is active.

Publication


Featured research published by Catherine Trapani.


Applied Measurement in Education | 2012

Comparison of Human and Machine Scoring of Essays: Differences by Gender, Ethnicity, and Country

Brent Bridgeman; Catherine Trapani; Yigal Attali

Essay scores generated by machine and by human raters are generally comparable; that is, they can produce scores with similar means and standard deviations, and machine scores generally correlate as highly with human scores as scores from one human correlate with scores from another human. Although human and machine essay scores are highly related on average, this does not eliminate the possibility that machine and human scores may differ significantly for certain gender, ethnic, or country groups. Such differences were explored with essay data from two large-scale high-stakes testing programs: the Test of English as a Foreign Language and the Graduate Record Examination. Human and machine scores were very similar across most subgroups, but there were some notable exceptions. Policies were developed so that any differences between humans and machines would have a minimal impact on final reported scores.
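
As a rough illustration of the agreement statistics this abstract refers to, the sketch below computes rater means and standard deviations, the human-human and human-machine correlations, and per-subgroup human-machine differences. The data and column names are invented for illustration; this is not the study's actual analysis.

```python
# Minimal sketch of human/machine score agreement checks (hypothetical data).
import pandas as pd

# Hypothetical columns: two human ratings, one machine score, a subgroup label.
df = pd.DataFrame({
    "human1":  [4, 3, 5, 2, 4, 3, 5, 4],
    "human2":  [4, 3, 4, 2, 5, 3, 5, 3],
    "machine": [3.8, 3.1, 4.6, 2.2, 4.4, 2.9, 4.9, 3.7],
    "group":   ["A", "A", "B", "B", "A", "B", "A", "B"],
})

# Overall comparability: similar means/SDs, and the machine should correlate
# with a human about as strongly as the two humans correlate with each other.
print(df[["human1", "human2", "machine"]].agg(["mean", "std"]))
print("human-human r:  ", df["human1"].corr(df["human2"]))
print("human-machine r:", df["human1"].corr(df["machine"]))

# Subgroup check: the mean machine-minus-human difference within each group;
# a sizable gap for some group would flag the kind of divergence studied here.
df["diff"] = df["machine"] - df["human1"]
print(df.groupby("group")["diff"].agg(["mean", "std"]))
```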


Journal of Educational Measurement | 2004

Impact of Fewer Questions per Section on SAT I Scores

Brent Bridgeman; Catherine Trapani; Edward Curley

The impact of allowing more time per question on SAT I: Reasoning Test scores was estimated by embedding sections with a reduced number of questions into the standard 30-minute equating section of two national test administrations. Thus, for example, questions were deleted from a verbal section that contained 35 questions to produce forms that contained 27 or 23 questions. Scores on the 23-question section could then be compared to scores on the same 23 questions when they were embedded in a section that contained 27 or 35 questions. Similarly, questions were deleted from a 25-question math section to form sections of 20 and 17 questions. Allowing more time per question had a minimal impact on verbal scores, producing gains of less than 10 points on the 200–800 SAT scale. Gains for the math score were less than 30 points. High-scoring students tended to benefit more than lower-scoring students, with extra time creating no increase in scores for students with SAT scores of 400 or lower. Ethnic/racial and gender differences were neither increased nor reduced with extra time.


International Journal of Testing | 2012

Comparison of e-rater® Automated Essay Scoring Model Calibration Methods Based on Distributional Targets

Mo Zhang; David M. Williamson; F. Jay Breyer; Catherine Trapani

This article describes two separate, related studies that provide insight into the effectiveness of e-rater score calibration methods based on different distributional targets. In the first study, we developed and evaluated a new type of e-rater scoring model that was cost-effective and applicable when human ratings are unavailable and candidate volumes are small. This new model type, called the Scale Midpoint Model, outperformed an existing e-rater scoring model that is often adopted by certain e-rater system users without modification. In the second study, we examined the impact of three distributional score calibration approaches on existing models' performance. These approaches applied percentile calibrations to e-rater scores in accordance with a human rating distribution, a normal distribution, and a uniform distribution. Results indicated that these score calibration approaches did not have overall positive effects on the performance of existing e-rater scoring models.
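
The percentile calibration idea can be sketched as follows: each raw machine score is mapped through its empirical percentile onto the quantile function of a target distribution (the human rating distribution, a normal, or a uniform). This is an assumption-laden illustration, not e-rater's implementation; all data and parameters below are invented.

```python
# Minimal sketch of percentile calibration against three distributional targets.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
raw = rng.normal(3.2, 0.7, size=1000)            # hypothetical raw machine scores
human = rng.choice([1, 2, 3, 4, 5, 6], 1000,     # hypothetical human ratings
                   p=[.05, .15, .30, .30, .15, .05])

# Empirical percentile of each raw score, kept strictly inside (0, 1).
pct = (stats.rankdata(raw) - 0.5) / len(raw)

# Map percentiles onto each target distribution's quantiles:
to_human   = np.quantile(human, pct)                    # match the human distribution
to_normal  = stats.norm.ppf(pct, loc=3.5, scale=1.0)    # match a normal target
to_uniform = stats.uniform.ppf(pct, loc=1, scale=5)     # match uniform on [1, 6]
```

Each calibrated score set preserves the rank order of the raw scores; only the score distribution changes to match the chosen target.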


ETS Research Report Series | 2003

Effect of Fewer Questions per Section on SAT® I Scores

Brent Bridgeman; Catherine Trapani; Edward Curley


The Journal of Technology, Learning and Assessment | 2010

Performance of a Generic Approach in Automated Essay Scoring

Yigal Attali; Brent Bridgeman; Catherine Trapani


Journal of Research in Personality | 2006

Personality in relation to occupational outcomes among established teachers

Walter Emmerich; Donald A. Rock; Catherine Trapani


ETS Research Report Series | 2005

The Impact of Extended Time on SAT Test Performance

Ellen B. Mandinach; Brent Bridgeman; Cara Cahalan-Laitusis; Catherine Trapani


ETS Research Report Series | 2012

Evaluation of the e-rater® Scoring Engine for the TOEFL® Independent and Integrated Prompts

Chaitanya Ramineni; Catherine Trapani; David M. Williamson; Tim Davey; Brent Bridgeman


Educational Testing Service | 2009

Exploring What Works in Science Instruction: A Look at the Eighth-Grade Science Classroom. Policy Information Report.

Henry Braun; Richard J. Coley; Yue Jia; Catherine Trapani


ETS Research Report Series | 2003

Validating LanguEdge™ Courseware Scores Against Faculty Ratings and Student Self-Assessments

Donald E. Powers; Carsten Roever; Kristin L. Huff; Catherine Trapani

Collaboration


Dive into Catherine Trapani's collaborations.

Top Co-Authors

Ellen B. Mandinach

Educational Testing Service
