Mo Zhang
Princeton University
Publications
Featured research published by Mo Zhang.
International Journal of Testing | 2012
Mo Zhang; David M. Williamson; F. Jay Breyer; Catherine Trapani
This article describes two separate but related studies that provide insight into the effectiveness of e-rater score calibration methods based on different distributional targets. In the first study, we developed and evaluated a new type of e-rater scoring model that was cost-effective and applicable when human ratings are unavailable and candidate volumes are small. This new model type, called the Scale Midpoint Model, outperformed an existing e-rater scoring model that certain e-rater system users often adopt without modification. In the second study, we examined the impact of three distributional score calibration approaches on the performance of existing models. These approaches applied percentile calibration to e-rater scores against a human rating distribution, a normal distribution, and a uniform distribution. Results indicated that these score calibration approaches did not have overall positive effects on the performance of existing e-rater scoring models.
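For readers unfamiliar with percentile-based calibration, the following is a minimal sketch of the general idea: machine scores are mapped onto a target distribution (here, an empirical set of human ratings) by matching percentile ranks. The function name, variable names, and use of NumPy are illustrative assumptions, not the study's implementation.

```python
import numpy as np

def percentile_calibrate(machine_scores, target_scores):
    """Map each machine score to the target-distribution value
    at the same percentile rank (equipercentile-style linking)."""
    machine_scores = np.asarray(machine_scores, dtype=float)
    target_scores = np.sort(np.asarray(target_scores, dtype=float))

    # Percentile rank of each machine score within its own distribution.
    order = machine_scores.argsort().argsort()
    ranks = (order + 0.5) / len(machine_scores)

    # Look up the value of the target distribution at each percentile rank.
    probs = (np.arange(len(target_scores)) + 0.5) / len(target_scores)
    return np.interp(ranks, probs, target_scores)

# Example: calibrate machine scores to a small human rating distribution.
machine = [2.1, 3.4, 4.8, 2.9, 5.5]
human = [2, 3, 3, 4, 4, 5]
print(percentile_calibrate(machine, human))
```

The same routine covers all three targets described above: passing human ratings, draws from a normal distribution, or draws from a uniform distribution as `target_scores` yields the corresponding calibration.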
Archive | 2016
Mo Zhang; Jiangang Hao; Chen Li; Paul Deane
Keystroke logs are a valuable tool for writing research. Using large samples of student responses to two prompts targeting different writing purposes, we analyzed the longest 25 inter-word intervals in each keystroke log. The logs were extracted using the ETS keystroke logging engine. We found two distinct patterns of student writing processes associated with stronger and weaker writers, and an overall moderate association between the inter-word interval information and the quality of the final product. The results suggest promise for keystroke log analysis as a tool for describing patterns or styles of student writing processes.
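As a rough illustration of the kind of feature described above, the sketch below pulls the longest inter-word pauses out of a keystroke log. The log format (a list of timestamp/key events) and the helper name are assumptions for illustration only, not the ETS keystroke logging engine's API.

```python
def longest_interword_intervals(events, n=25):
    """Return the n longest pauses (in ms) at word boundaries, i.e.,
    between the last keystroke of one word and the first of the next."""
    intervals = []
    prev_time = None
    prev_was_space = False
    for time_ms, key in events:
        if prev_time is not None and prev_was_space and key != " ":
            # Pause between finishing one word and starting the next.
            intervals.append(time_ms - prev_time)
        prev_was_space = (key == " ")
        prev_time = time_ms
    return sorted(intervals, reverse=True)[:n]

# Example log: "to be" typed with a long pause before the second word.
log = [(0, "t"), (120, "o"), (300, " "), (2300, "b"), (2450, "e")]
print(longest_interword_intervals(log))  # [2000]
```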
Archive | 2015
Mo Zhang; Jing Chen; Chunyi Ruan
As automated essay scoring grows in popularity, the measurement issues associated with it take on greater importance. One such issue is the detection of aberrant responses. In this study, we considered aberrant responses to be those that are not suitable for machine scoring because they have characteristics the scoring system cannot process. Since no such system can yet understand language the way a human rater does, the detection of aberrant responses is important for all automated essay scoring systems. Identification of aberrant responses can happen before and after machine scoring is attempted (i.e., pre-screening and post-hoc screening), and it is essential if the technology is to be used as the primary scoring method. In this study, we investigated the functioning of a set of pre-screening advisory flags that have been used in different automated essay scoring systems. In addition, we evaluated whether the size of the human–machine discrepancy could be predicted, as a precursor to developing a general post-hoc screening method. These analyses were conducted using one scoring system as a case example. Empirical results suggested that some pre-screening advisories operated more effectively than others. With respect to post-hoc screening, relatively little scoring difficulty was found overall, which limited the ability to predict human–machine discrepancy for responses that passed pre-screening. Limitations of the study and suggestions for future research are also provided.
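To make the notion of a pre-screening advisory concrete, here is a minimal sketch of how such flags might be computed before a response is sent to a scoring engine. The specific flags, thresholds, and function name are invented for illustration and are not the advisories evaluated in the study.

```python
import re

def prescreen_advisories(response: str) -> list:
    """Return advisory flags for a response that may be unsuitable for
    machine scoring; an empty list means no advisory fired."""
    flags = []
    words = re.findall(r"[A-Za-z']+", response)

    if len(words) < 50:
        flags.append("too_short")             # insufficient text to score reliably
    if words and len(set(w.lower() for w in words)) / len(words) < 0.3:
        flags.append("excessive_repetition")  # very low vocabulary diversity
    if response and sum(c.isalpha() or c.isspace() for c in response) / len(response) < 0.8:
        flags.append("unusual_characters")    # likely keyboard banging or markup
    return flags

print(prescreen_advisories("blah blah blah blah blah"))
# ['too_short', 'excessive_repetition']
```

A flagged response would be routed to human raters rather than scored automatically, which is the pre-screening role described in the abstract above.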
Archive | 2017
Mo Zhang; Danjie Zou; Amery D. Wu; Paul Deane; Chen Li
We evaluated whether a scenario-based assessment structure with a theoretically determined task order affects the writing processes that students execute in responding to an essay task. Students' writing processes were recorded via keystroke logs. Two testing conditions were compared, a scaffolded form with the original task ordering and a non-scaffolded form with reversed task ordering, using more than two dozen variables extracted from the keystroke logs. The results showed clear effects of task ordering in the context of scenario-based assessment. The scaffolded form reduced the dependency of performance on general writing fluency. Compared to the non-scaffolded form, students taking the scaffolded form appeared to allocate more cognitive effort to editing and revising, as well as to the higher-level, cognitively demanding processes important to the quality of argumentation, whereas students taking the non-scaffolded form appeared to expend greater effort at the pre-writing stage. Limitations and future directions for research are also discussed.
ETS Research Report Series | 2015
Mo Zhang; Paul Deane
ETS Research Report Series | 2015
Paul Deane; Mo Zhang
Archive | 2016
Paul Deane; Gary Feng; Mo Zhang; Jiangang Hao; Yoav Bergner; Michael Flor; Michael E. Wagner; Nathan Lederer
Archive | 2016
Isaac I. Bejar; Robert J. Mislevy; Mo Zhang
Reading and Writing | 2018
Paul Deane; Yi Song; Peter W. van Rijn; Tenaha O’Reilly; Mary E. Fowles; Randy Elliot Bennett; John Sabatini; Mo Zhang
ETS Research Report Series | 2013
Mo Zhang; F. Jay Breyer; Florian Lorenz