Language Testing | 2019

A generalizability theory study of optimal measurement design for a summative assessment of English/Chinese consecutive interpreting

 

Abstract


Summative assessment of interpretation is widely conducted in interpreting courses/programs to inform high-stakes decisions, such as selection, certification, and the conferral of academic degrees. Yet very little empirical research has investigated the score dependability of summative interpretation assessment. The present study therefore sets out to explore the optimal measurement design(s) for a locally created summative assessment of English/Chinese consecutive interpretation, based on multiple fully crossed generalizability studies. Major findings include the following: (a) overall, the raters behaved more consistently when using the information completeness (InfoCom) scale than when using the fluency of delivery (FluDel) or target language quality (TLQual) scales; (b) the raters displayed greater variability in evaluating the Chinese-to-English interpretation than the English-to-Chinese interpretation; (c) although adding tasks raised score dependability more effectively than adding raters for the InfoCom ratings in the English-to-Chinese interpretation, the pattern was reversed for the other observations; and (d) two potentially optimal designs were identified for the English-to-Chinese direction, and one design for the other direction. These results are discussed with attention to the complex relationships among the assessment criterion, the interpreting directionality, the raters’ dominant language, and score dependability, together with the need to ensure score dependability for summative interpretation assessment.
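The trade-off between adding tasks and adding raters can be illustrated with a small decision-study calculation. The sketch below assumes a fully crossed persons x tasks x raters random-effects design and uses the standard index of dependability (Phi) for absolute decisions; the abstract does not state the exact facets or estimates, so the design, the function name `phi_coefficient`, and every variance component shown are hypothetical placeholders, not results from the study.

```python
from typing import Dict

# HYPOTHETICAL variance components for a persons (p) x tasks (t) x raters (r)
# design. These numbers are illustrative only and are NOT taken from the study.
HYPOTHETICAL_VC: Dict[str, float] = {
    "p": 0.50,      # persons (object of measurement)
    "t": 0.05,      # tasks
    "r": 0.02,      # raters
    "pt": 0.20,     # person-by-task interaction
    "pr": 0.05,     # person-by-rater interaction
    "tr": 0.01,     # task-by-rater interaction
    "ptr,e": 0.30,  # three-way interaction confounded with residual error
}


def phi_coefficient(vc: Dict[str, float], n_tasks: int, n_raters: int) -> float:
    """Index of dependability (Phi) for a crossed p x t x r D-study design.

    Absolute error variance includes every component except the person
    component, each divided by the number of conditions it is averaged over.
    """
    if n_tasks < 1 or n_raters < 1:
        raise ValueError("n_tasks and n_raters must be positive integers")
    absolute_error = (
        vc["t"] / n_tasks
        + vc["r"] / n_raters
        + vc["pt"] / n_tasks
        + vc["pr"] / n_raters
        + vc["tr"] / (n_tasks * n_raters)
        + vc["ptr,e"] / (n_tasks * n_raters)
    )
    return vc["p"] / (vc["p"] + absolute_error)


def dependability_grid(vc: Dict[str, float], max_tasks: int, max_raters: int):
    """Return {(n_tasks, n_raters): Phi} for all designs up to the given sizes."""
    return {
        (nt, nr): phi_coefficient(vc, nt, nr)
        for nt in range(1, max_tasks + 1)
        for nr in range(1, max_raters + 1)
    }


if __name__ == "__main__":
    for (nt, nr), phi in sorted(dependability_grid(HYPOTHETICAL_VC, 3, 3).items()):
        print(f"tasks={nt} raters={nr} Phi={phi:.3f}")
```

With these placeholder values the task-related components are larger than the rater-related ones, so a second task raises Phi more than a second rater does; different variance components would reverse that ordering, which is the kind of pattern the findings describe.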

Volume 36
Pages 419 - 438
DOI 10.1177/0265532218809396
Language English
Journal Language Testing
