Catherine Trapani
Princeton University
Publications
Featured research published by Catherine Trapani.
Applied Measurement in Education | 2012
Brent Bridgeman; Catherine Trapani; Yigal Attali
Essay scores generated by machine and by human raters are generally comparable; that is, they can produce scores with similar means and standard deviations, and machine scores generally correlate as highly with human scores as scores from one human correlate with scores from another human. Although human and machine essay scores are highly related on average, this does not eliminate the possibility that machine and human scores may differ significantly for certain gender, ethnic, or country groups. Such differences were explored with essay data from two large-scale high-stakes testing programs: the Test of English as a Foreign Language and the Graduate Record Examination. Human and machine scores were very similar across most subgroups, but there were some notable exceptions. Policies were developed so that any differences between humans and machines would have a minimal impact on final reported scores.
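The comparability checks this abstract refers to (similar means and standard deviations, and human-machine correlations on par with human-human correlations) can be illustrated with a minimal sketch. The data below are simulated and the 1-6 score scale is an assumption for illustration; none of it comes from the study itself.

```python
import numpy as np

# Simulated essay scores on an assumed 1-6 scale; the study's actual data
# were TOEFL and GRE essays, not reproduced here.
rng = np.random.default_rng(0)
true_quality = rng.uniform(1, 6, size=500)
human1 = np.clip(np.round(true_quality + rng.normal(0, 0.7, 500)), 1, 6)
human2 = np.clip(np.round(true_quality + rng.normal(0, 0.7, 500)), 1, 6)
machine = np.clip(true_quality + rng.normal(0, 0.7, 500), 1, 6)

def summarize(a, b, label):
    """Report the agreement statistics the abstract mentions:
    means, standard deviations, and the Pearson correlation."""
    r = np.corrcoef(a, b)[0, 1]
    print(f"{label}: means {a.mean():.2f}/{b.mean():.2f}, "
          f"SDs {a.std():.2f}/{b.std():.2f}, r = {r:.2f}")

# Machine scores are considered comparable when the human-machine
# correlation is about as high as the human-human correlation.
summarize(human1, human2, "human-human")
summarize(human1, machine, "human-machine")
```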
Journal of Educational Measurement | 2004
Brent Bridgeman; Catherine Trapani; Edward Curley
The impact of allowing more time for each question on SAT I: Reasoning Test scores was estimated by embedding sections with a reduced number of questions into the standard 30-minute equating section of two national test administrations. Thus, for example, questions were deleted from a verbal section that contained 35 questions to produce forms that contained 27 or 23 questions. Scores on the 23-question section could then be compared to scores on the same 23 questions when they were embedded in a section that contained 27 or 35 questions. Similarly, questions were deleted from a 25-question math section to form sections of 20 and 17 questions. Allowing more time per question had a minimal impact on verbal scores, producing gains of less than 10 points on the 200–800 SAT scale. Gains for math scores were less than 30 points. High-scoring students tended to benefit more than lower-scoring students, with extra time creating no increase in scores for students with SAT scores of 400 or lower. Ethnic/racial and gender differences were neither increased nor reduced with extra time.
International Journal of Testing | 2012
Mo Zhang; David M. Williamson; F. Jay Breyer; Catherine Trapani
This article describes two separate but related studies that provide insight into the effectiveness of e-rater score calibration methods based on different distributional targets. In the first study, we developed and evaluated a new type of e-rater scoring model that was cost-effective and applicable when human ratings are unavailable and candidate volumes are small. This new model type, called the Scale Midpoint Model, outperformed an existing e-rater scoring model that is often adopted by certain e-rater system users without modification. In the second study, we examined the impact of three distributional score calibration approaches on the performance of existing models. These approaches applied percentile calibrations to e-rater scores according to a human rating distribution, a normal distribution, and a uniform distribution. Results indicated that these score calibration approaches did not have overall positive effects on the performance of existing e-rater scoring models.
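The percentile calibration examined in the second study amounts to quantile mapping: each e-rater score is replaced by the value at the same percentile of a target distribution (human ratings, normal, or uniform). Below is a minimal generic sketch of that idea under simulated data; it is not the operational e-rater calibration procedure.

```python
import numpy as np

def percentile_calibrate(machine_scores, target_scores):
    """Quantile mapping: send each machine score to the value at the
    same mid-rank percentile of the target distribution. A generic
    sketch, not ETS's operational procedure."""
    ranks = np.argsort(np.argsort(machine_scores))   # 0..n-1 rank of each score
    pct = (ranks + 0.5) / len(machine_scores)        # mid-rank percentiles
    return np.quantile(target_scores, pct)

# Simulated scores for illustration only.
rng = np.random.default_rng(1)
machine = rng.normal(3.2, 0.6, 1000)   # hypothetical e-rater scores
human = rng.normal(3.5, 1.0, 1000)     # hypothetical human ratings (target)

calibrated = percentile_calibrate(machine, human)
# After calibration, the machine-score distribution matches the target:
print(f"before: mean {machine.mean():.2f}, sd {machine.std():.2f}")
print(f"after:  mean {calibrated.mean():.2f}, sd {calibrated.std():.2f}")
```

Because the mapping is rank-preserving, it changes the shape of the score distribution without reordering examinees, which is why such calibration can match a target distribution yet still fail to improve model performance, as the study found.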
ETS Research Report Series | 2003
Brent Bridgeman; Catherine Trapani; Edward Curley
The Journal of Technology, Learning and Assessment | 2010
Yigal Attali; Brent Bridgeman; Catherine Trapani
Journal of Research in Personality | 2006
Walter Emmerich; Donald A. Rock; Catherine Trapani
ETS Research Report Series | 2005
Ellen B. Mandinach; Brent Bridgeman; Cara Cahalan-Laitusis; Catherine Trapani
ETS Research Report Series | 2012
Chaitanya Ramineni; Catherine Trapani; David M. Williamson; Tim Davey; Brent Bridgeman
Educational Testing Service | 2009
Henry Braun; Richard J. Coley; Yue Jia; Catherine Trapani
ETS Research Report Series | 2003
Donald E. Powers; Carsten Roever; Kristin L. Huff; Catherine Trapani