Publication


Featured research published by Xun Yan.


Language Testing | 2014

An Examination of Rater Performance on a Local Oral English Proficiency Test: A Mixed-Methods Approach.

Xun Yan

This paper reports on a mixed-methods approach to evaluating rater performance on a local oral English proficiency test. Three types of reliability estimates were reported to examine rater performance from different perspectives. Quantitative results were also triangulated with qualitative rater comments to arrive at a more representative picture of rater performance and to inform rater training. Specifically, both quantitative (6338 valid rating scores) and qualitative data (506 sets of rater comments) were analyzed with respect to rater consistency, rater consensus, rater severity, rater interaction, and raters’ use of the rating scale. While raters achieved overall satisfactory inter-rater reliability (r = .73), they differed in severity and achieved relatively low exact score agreement. Disagreement in rating scores was largely explained by two significant main effects: (1) examinees’ oral English proficiency level, in that raters tended to agree more at higher score levels than at lower ones; and (2) raters’ differential severity arising from their varied perceptions of speech intelligibility toward Indian and low-proficiency Chinese examinees. However, the effect sizes of raters’ differential severity on overall rater agreement were rather small, suggesting that varied perceptions of second language (L2) intelligibility among trained raters, though possible, are not likely to have a large impact on the overall evaluation of oral English proficiency. In contrast, at the lower score levels, examinees’ varied language proficiency profiles made rater alignment difficult. Rater disagreement at these levels accounted for most of the overall rater disagreement and should therefore be a focus of rater training. An implication of this study is that the interpretation of rater performance should not focus only on identifying interactions between raters’ and examinees’ linguistic backgrounds but should also examine the impact of rater interactions across examinees’ language proficiency levels. Findings of this study also indicate the effectiveness of triangulating different sources of data on rater performance using a mixed-methods approach, especially in local testing contexts.
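For readers unfamiliar with the indices named above, here is a minimal sketch of how rater consistency (inter-rater correlation), rater consensus (exact score agreement), and a crude severity contrast could be computed. The ratings are made up for illustration; the study itself used a fuller set of reliability estimates.

```python
import numpy as np

# Hypothetical paired ratings from two raters on the same examinees
# (illustrative values only, not data from the study).
rater_a = np.array([3, 4, 5, 2, 4, 3, 5, 4, 2, 3])
rater_b = np.array([3, 4, 4, 2, 5, 3, 5, 3, 2, 4])

# Rater consistency: Pearson correlation between the two raters' scores.
consistency = np.corrcoef(rater_a, rater_b)[0, 1]

# Rater consensus: proportion of exact score agreement.
exact_agreement = np.mean(rater_a == rater_b)

# A crude severity indicator: difference in mean score awarded by each rater.
severity_gap = rater_a.mean() - rater_b.mean()

print(f"inter-rater r = {consistency:.2f}")
print(f"exact agreement = {exact_agreement:.2f}")
print(f"mean severity difference = {severity_gap:.2f}")
```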


Language Testing | 2016

Elicited Imitation as a Measure of Second Language Proficiency: A Narrative Review and Meta-Analysis.

Xun Yan; Yukiko Maeda; Jing Lv; April Ginther

Elicited imitation (EI) has been widely used to examine second language (L2) proficiency and development and was an especially popular method in the 1970s and early 1980s. However, as the field embraced more communicative approaches to both instruction and assessment, the use of EI diminished, and the construct-related validity of EI scores as a representation of language proficiency was called into question. Current uses of EI, while not discounting the importance of communicative activities and assessments, tend to focus on the importance of processing and automaticity. This study presents a systematic review of EI in an effort to clarify the construct and usefulness of EI tasks in L2 research. The review comprised two phases: a narrative review and a meta-analysis. We surveyed 76 theoretical and empirical studies from 1970 to 2014 to investigate the use of EI, particularly with respect to research/assessment context and task features. The results of the narrative review provided a theoretical basis for the meta-analysis. The meta-analysis utilized 24 independent effect sizes based on 1089 participants obtained from 21 studies. To investigate evidence of construct-related validity for EI, we examined the following: (1) the ability of EI scores to distinguish speakers across proficiency levels; (2) correlations between scores on EI and other measures of language proficiency; and (3) key task features that moderate the sensitivity of EI. Results of the review demonstrate that EI tasks vary greatly in terms of task features; however, EI tasks in general have a strong ability to discriminate between speakers across proficiency levels (Hedges’ g = 1.34). Additionally, construct, sentence length, and scoring method were identified as moderators of the sensitivity of EI. Findings of this study provide supportive construct-related validity evidence for EI as a measure of L2 proficiency and inform appropriate EI task development and administration in L2 research and assessment.
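The pooled effect size reported above (Hedges’ g = 1.34) is a bias-corrected standardized mean difference. A minimal sketch of the standard formula, with made-up group statistics rather than values from the meta-analysis:

```python
import math

def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Bias-corrected standardized mean difference (Hedges' g)."""
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (mean1 - mean2) / pooled_sd           # Cohen's d
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample bias correction
    return d * correction

# Hypothetical EI scores for higher- vs. lower-proficiency groups.
print(hedges_g(mean1=78.0, sd1=10.0, n1=30, mean2=62.0, sd2=12.0, n2=30))
```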


Language Testing | 2018

Interpreting the Relationships between TOEFL iBT Scores and GPA: Language Proficiency, Policy, and Profiles.

April Ginther; Xun Yan

This study examines the predictive validity of the TOEFL iBT with respect to academic achievement as measured by the first-year grade point average (GPA) of Chinese students at Purdue University, a large, public, Research I institution in Indiana, USA. Correlations between GPA and TOEFL iBT total and subsection scores were examined for 1990 mainland Chinese students enrolled across three academic years (2011: n = 740; 2012: n = 554; 2013: n = 696). Subsequently, cluster analyses of the three cohorts’ TOEFL subsection scores were conducted to determine whether different score profiles might help explain the correlational patterns found between TOEFL subscale scores and GPA across the three student cohorts. For the 2011 and 2012 cohorts, speaking and writing subscale scores were positively correlated with GPA; however, negative correlations were observed for listening and reading. In contrast, for the 2013 cohort, the writing, reading, and total scores were positively correlated with GPA, and the negative correlations disappeared. Results of the cluster analyses suggest that the negative correlations in the 2011 and 2012 cohorts were associated with a distinctive discrepant score profile (Reading/Listening versus Speaking/Writing) within a single Chinese subgroup. In 2013, this subgroup disappeared from the incoming class because of changes made to the University’s international undergraduate admissions policy. The uneven score profile has important implications for admissions policy and the provision of English language support, as well as broader effects on academic achievement.
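A hedged sketch of the two analyses described above (correlating GPA with total and subsection scores, then clustering subsection score profiles), run on simulated data: the column names, the use of k-means, and the number of clusters are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Simulated cohort: TOEFL iBT subsection scores (0-30) and first-year GPA (0-4).
n = 700
df = pd.DataFrame({
    "reading":   rng.integers(15, 31, n),
    "listening": rng.integers(15, 31, n),
    "speaking":  rng.integers(13, 29, n),
    "writing":   rng.integers(15, 31, n),
})
df["total"] = df[["reading", "listening", "speaking", "writing"]].sum(axis=1)
df["gpa"] = np.clip(2.0 + 0.03 * (df["speaking"] - 20) + rng.normal(0, 0.5, n), 0, 4)

# Correlations between GPA and total/subsection scores.
print(df.corr()["gpa"].round(2))

# Cluster the subsection score profiles (k-means as a stand-in for the
# clustering method used in the study; three clusters is an assumption).
profiles = df[["reading", "listening", "speaking", "writing"]]
df["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(profiles)
print(df.groupby("cluster")[["reading", "listening", "speaking", "writing", "gpa"]].mean().round(1))
```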


Chinese Journal of Applied Linguistics | 2018

Assessment literacy of secondary EFL teachers: Evidence from a regional EFL test

Cong Zhang; Xun Yan

This study investigated the assessment skills of secondary EFL teachers by analyzing the quality of test items on a municipal English examination for eighth graders in a city in northern China. Data included students’ answers to test items, a post-test questionnaire, and teachers’ responses to semi-structured interviews. Overall, the test was found to function satisfactorily, with most items moderately easy yet discriminating, high internal consistency, and strong correlations among subscale scores. However, item analysis and content review of the test identified some items with incorrect or multiple keys, which may result from a combination of EFL teachers’ inadequate language proficiency and their attempts to write attractive distractors.
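The item-quality indices mentioned in this abstract (item difficulty, item discrimination, and internal consistency) can be sketched as follows. The response matrix is simulated with a common ability dimension so the indices take plausible values; it is not data from the examination studied.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Simulate a hypothetical 0/1 response matrix: 200 students x 10 items.
ability = rng.normal(0, 1, 200)
item_locations = np.linspace(-1.5, 1.5, 10)
prob_correct = 1 / (1 + np.exp(-(ability[:, None] - item_locations)))
responses = pd.DataFrame(
    (rng.random((200, 10)) < prob_correct).astype(int),
    columns=[f"item_{i + 1}" for i in range(10)],
)
total = responses.sum(axis=1)

# Item difficulty (facility): proportion of students answering each item correctly.
difficulty = responses.mean()

# Item discrimination: correlation of each item with the rest-of-test score.
discrimination = pd.Series(
    {col: responses[col].corr(total - responses[col]) for col in responses.columns}
)

# Internal consistency: Cronbach's alpha.
k = responses.shape[1]
alpha = (k / (k - 1)) * (1 - responses.var(ddof=1).sum() / total.var(ddof=1))

print(difficulty.round(2))
print(discrimination.round(2))
print(f"Cronbach's alpha = {alpha:.2f}")
```

Items flagged by such an analysis (for example, items with negative discrimination, which can signal an incorrect key) would then be submitted to the kind of content review described above.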


Archive | 2016

What Do Test-Takers Say? Test-Taker Feedback as Input for Quality Management of a Local Oral English Proficiency Test

Xun Yan; Suthathip Thirakunkovit; Nancy Kauper; April Ginther

The Oral English Proficiency Test (OEPT) is a computer-administered, semi-direct test used to screen the oral English proficiency of prospective international teaching assistants (ITAs) at Purdue University. This paper reports on information gathered from the post-test questionnaire (PTQ), which is completed by all examinees who take the OEPT. PTQ data are used to monitor access to the OEPT orientation video and practice test, to evaluate examinee perceptions of OEPT characteristics and administration, and to identify any problems examinees may encounter during test administration. Responses to the PTQ are examined after each test administration (1) to ensure that no undue or unexpected difficulties are encountered by examinees and (2) to provide a basis for modifications to our administrative procedures when necessary. In this study, we analyzed 1440 responses to both closed-ended and open-ended questions of the PTQ from 1342 test-takers who took the OEPT between August 2009 and July 2012. Responses to the open-ended questions revealed an unexpectedly wide variety of response categories. The analysis of this three-year data set of open-ended items allowed us to better identify and evaluate the effectiveness of changes we had introduced to the test administration process during the same period. Carefully considering these responses has contributed substantially to our quality control processes.


Language Testing | 2018

Factor analysis for fairness: Examining the impact of task type and examinee L1 background on scores of an ITA speaking test

Xun Yan; Lixia Cheng; April Ginther

This study investigated the construct validity of a local speaking test for international teaching assistants (ITAs) from a fairness perspective, employing a multi-group confirmatory factor analysis (CFA) to examine the impact of task type and examinee first language (L1) background on the internal structure of the test. The test consists of three types of integrated speaking tasks (i.e., text-speaking, graph-speaking, and listening-speaking), and the three most represented examinee L1s are Mandarin, Hindi, and Korean. Using scores from 1804 examinees across three years, the CFA indicated a two-factor model with a general speaking factor and a listening task factor as the best-fitting internal structure for the test. The factor structure was invariant for examinees across academic disciplines and L1 backgrounds, although the three examinee L1 groups demonstrated different factor variances and factor means. Specifically, while Korean examinees showed a larger variance in oral English proficiency, Hindi examinees demonstrated a higher level of oral proficiency than did Mandarin and Korean examinees. Overall, the lack of significance for multiple task factors and the invariance of the factor structure suggest that the test measures the same set of oral English skills for all examinees. Although the factor variances and factor means for oral proficiency differed across examinee L1 subgroups, they reflect the general oral proficiency profiles of English speakers from these L1 backgrounds at the university and therefore do not pose serious threats to the fairness of the test. Findings of this study have useful implications for fairness investigations of ITA speaking tests.
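Fitting the multi-group CFA itself requires a structural equation modeling package, but the kind of group-level comparison it formalizes can be sketched descriptively: comparing means, variances, and inter-task correlations of the three task scores across L1 groups. All values below are simulated, and this sketch is not the factor-analytic model reported in the paper.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Simulated task scores for three L1 groups (group sizes and score levels are
# illustrative only).
frames = []
for l1, mean, sd, n in [("Mandarin", 45, 5, 900), ("Hindi", 50, 4, 500), ("Korean", 44, 7, 400)]:
    base = rng.normal(mean, sd, n)  # a shared "speaking ability" component
    frames.append(pd.DataFrame({
        "L1": l1,
        "text_speaking": base + rng.normal(0, 2, n),
        "graph_speaking": base + rng.normal(0, 2, n),
        "listening_speaking": base + rng.normal(0, 3, n),
    }))
scores = pd.concat(frames, ignore_index=True)

tasks = ["text_speaking", "graph_speaking", "listening_speaking"]

# Group means and variances: the descriptive analogue of factor means/variances.
print(scores.groupby("L1")[tasks].agg(["mean", "var"]).round(1))

# Inter-task correlations within each group; similar patterns across groups are
# consistent with (but do not by themselves establish) an invariant structure.
for l1, group in scores.groupby("L1"):
    print(l1)
    print(group[tasks].corr().round(2))
```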


Assessing Writing | 2015

Keeping up with the times: Revising and refreshing a rating scale

Jayanti Banerjee; Xun Yan; Mark Chapman; Heather Elliott


Journal of Second Language Writing | 2015

The development of EFL writing instruction and research in China: An update from the International Conference on English Language Teaching

Cong Zhang; Xun Yan; Xiangdong Liu


System | 2018

“Assessment knowledge is important, but…”: How contextual and experiential factors mediate assessment practice and training needs of language teachers

Xun Yan; Cong Zhang; Jason Jinsong Fan


Assessing Writing | 2018

Examining the comparability between paper- and computer-based versions of an integrated writing placement test

Ha Ram Kim; Melissa A. Bowles; Xun Yan; Sun Joo Chung

Collaboration


Dive into Xun Yan's collaborations.

Top Co-Authors

Mark Chapman

University of Wisconsin-Madison
