
Publication


Featured research published by Yong-Won Lee.


Language Testing | 2006

Dependability of scores for a new ESL speaking assessment consisting of integrated and independent tasks

Yong-Won Lee

A multitask speaking measure consisting of both integrated and independent tasks is expected to be an important component of a new version of the TOEFL test. This study considered two critical issues concerning score dependability of the new speaking measure: how much would score dependability be affected by (1) combining scores on different task types into a composite score and (2) rating each task only once? To answer these questions, generalizability theory (G-theory) procedures were used to examine the impact of the numbers of tasks and raters per speech sample, and of subsection lengths, on the dependability of speaking scores. Univariate and multivariate G-theory analyses were conducted on rating data collected for 261 examinees. The univariate analyses showed that increasing the number of tasks would be more efficient than increasing the number of ratings per speech sample for maximizing score dependability. The multivariate G-theory analyses also revealed that (1) the universe (or true) scores for the task-type subsections were very highly correlated and (2) slightly larger gains in composite score reliability would result from increasing the number of listening–speaking tasks for fixed section lengths.
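To make the D-study logic behind such projections concrete, here is a minimal sketch of how an index of dependability (Phi) is computed for a fully crossed person x task x rater design as the numbers of tasks and ratings vary. The variance components are invented placeholders, not the estimates reported in the study; only the Phi formula itself is standard G-theory.

# Minimal D-study sketch for a fully crossed person x task x rater design.
# The variance components below are made-up placeholders, not the estimates
# reported in the study; only the formula for Phi is standard G-theory.

variance = {
    "p": 0.60,                  # persons (universe-score variance)
    "t": 0.05, "r": 0.02,       # task and rater main effects
    "pt": 0.30, "pr": 0.04,     # person x task, person x rater interactions
    "tr": 0.01, "ptr_e": 0.25,  # task x rater, residual
}

def phi(n_tasks, n_ratings):
    """Index of dependability with n_tasks tasks, each rated n_ratings times."""
    absolute_error = (
        variance["t"] / n_tasks + variance["r"] / n_ratings
        + variance["pt"] / n_tasks + variance["pr"] / n_ratings
        + (variance["tr"] + variance["ptr_e"]) / (n_tasks * n_ratings)
    )
    return variance["p"] / (variance["p"] + absolute_error)

for n_tasks in (2, 4, 6):
    for n_ratings in (1, 2):
        print(f"tasks={n_tasks}, ratings={n_ratings}: Phi={phi(n_tasks, n_ratings):.3f}")

With these placeholder components, doubling the number of tasks raises Phi more than adding a second rating per speech sample, which mirrors the pattern the univariate analyses reported.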


Educational and Psychological Measurement | 2005

Comparability of TOEFL CBT Essay Prompts: Response-Mode Analyses

Hunter M. Breland; Yong-Won Lee; Eiji Muraki

Eighty-three Test of English as a Foreign Language (TOEFL) writing prompts administered via computer-based testing between July 1998 and August 2000 were examined for differences attributable to the response mode (handwriting or word processing) chosen by examinees. Differences were examined statistically using polytomous logistic regression. A variable measuring English-language ability (ELA) was developed from the multiple-choice components of the TOEFL and used as a matching variable. Although there was little observed difference in mean writing scores, when examinees were matched on ELA, small differences were observed in effect sizes consistently favoring the handwriting response mode. This difference favoring the handwriting response mode occurred for all of the writing prompts analyzed, suggesting a general effect for response mode. Differences for individual writing prompts were small, however.
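For readers unfamiliar with this type of matched analysis, the sketch below illustrates the general idea of a polytomous (proportional-odds) logistic regression comparison of response modes with an ability covariate, using statsmodels' OrderedModel on simulated data. The data, variable names, and model details are illustrative assumptions, not the study's actual procedure or dataset.

# Hedged sketch: proportional-odds comparison of two response modes with an
# English-language-ability (ELA) covariate.  Data, variable names, and model
# details are illustrative assumptions, not the study's procedure or data.
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 2000
ela = rng.normal(size=n)                      # matching variable (standardized)
mode = rng.integers(0, 2, size=n)             # 0 = word processing, 1 = handwriting
latent = 0.9 * ela + 0.1 * mode + rng.logistic(size=n)
score = np.digitize(latent, [-2.0, -1.0, 0.0, 1.0, 2.0]) + 1   # essay score 1-6

df = pd.DataFrame({"ela": ela, "mode": mode})
df["score"] = pd.Categorical(score, ordered=True)

base = OrderedModel(df["score"], df[["ela"]], distr="logit").fit(method="bfgs", disp=False)
full = OrderedModel(df["score"], df[["ela", "mode"]], distr="logit").fit(method="bfgs", disp=False)

lr = 2.0 * (full.llf - base.llf)              # likelihood-ratio test, 1 df
print(f"LR chi2 = {lr:.2f}, p = {stats.chi2.sf(lr, df=1):.4f}")
print(f"response-mode coefficient = {full.params['mode']:.3f}")

In this toy setup the sign of the response-mode coefficient indicates which mode is favored once examinees are matched on ELA; the study's conclusion rests on effect sizes as well as significance.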


International Journal of Testing | 2005

Comparability of TOEFL CBT Writing Prompts for Different Native Language Groups

Yong-Won Lee; Hunter M. Breland; Eiji Muraki

This study investigated the comparability of computer-based testing writing prompts in the Test of English as a Foreign Language™ (TOEFL) for examinees of different native language backgrounds. A total of 81 writing prompts introduced from July 1998 through August 2000 were examined using a 3-step logistic regression procedure for ordinal items. An English language ability (ELA) variable was created by summing the standardized TOEFL Reading, Listening, and Structure scale scores. This ELA variable was used to match examinees of East Asian (Chinese, Japanese, and Korean) and European (German, French, and Spanish) language groups. Although about one third of the 81 prompts were initially flagged because of statistically significant group effects, the effect sizes were too small for any of those flagged prompts to be classified as having an important group effect.


International Journal of Testing | 2007

Evaluating Prototype Tasks and Alternative Rating Schemes for a New ESL Writing Test through G-theory

Yong-Won Lee; Robert Kantor

Possible integrated and independent tasks were pilot tested for the writing section of a new generation of the TOEFL® (Test of English as a Foreign Language™). This study examines the impact of various rating designs and of the number of tasks and raters on the reliability of writing scores based on integrated and independent tasks from the perspective of generalizability theory (G-theory). Both univariate and multivariate G-theory analyses were conducted. It was found that (a) in terms of maximizing the score dependability, it would be more efficient to increase the number of tasks rather than the number of raters per essay; (b) two particular single-rating designs of “having different tasks for the same examinee rated by different raters” [p × (R:T), R:(p × T)] achieved relatively higher score dependability than other single-rating designs; and (c) a somewhat larger gain in composite score reliability was achieved when the number of listening–writing tasks was larger than that of reading–writing tasks.


Language Testing | 2015

Diagnosing diagnostic language assessment

Yong-Won Lee

Diagnostic language assessment (DLA) is gaining a lot of attention from language teachers, testers, and applied linguists. With a recent surge of interest in DLA, there seems to be an urgent need to assess where the field of DLA stands at the moment and develop a general sense of where it should be moving in the future. The current article, as the first article in this special issue, aims to provide a general theoretical background for discussion of DLA and address some fundamental issues surrounding DLA. More specifically, the article (a) examines some of the defining characteristics of DLA and its major components, (b) reviews the current state of DLA in conjunction with these components, and (c) identifies some promising areas of future research and development of DLA where important breakthroughs can be made in the future. Some of the major obstacles and challenges facing DLA are identified and discussed, along with some possible solutions to them.


Applied Measurement in Education | 2007

Investigating Uniform and Non-Uniform Gender DIF in Computer-Based ESL Writing Assessment

Hunter M. Breland; Yong-Won Lee

The objective of the present investigation was to examine the comparability of writing prompts for different gender groups in the context of the computer-based Test of English as a Foreign Language™ (TOEFL®-CBT). A total of 87 prompts administered from July 1998 through March 2000 were analyzed. An extended version of logistic regression for polytomous items was used to investigate both uniform and non-uniform gender effects. An English Language Ability variable was developed from the multiple-choice components of the TOEFL®-CBT examination and used as a matching variable. Initially, most of the prompts were flagged because of statistically significant uniform gender effects, with some prompts displaying non-uniform effects as well. Nevertheless, the effect sizes were too small for any of those flagged prompts to be classified as having an important group effect. These findings are discussed in relation to prompt content review, gender format differences, and second language learning theories.
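The uniform versus non-uniform distinction can be illustrated with the usual three nested ordinal logistic regression models: matching variable only, plus a group indicator (uniform DIF), plus a group-by-ability interaction (non-uniform DIF). The sketch below, again on simulated data with hypothetical variable names, shows only this model-comparison logic, not the study's actual extended procedure or its effect-size criteria.

# Hedged sketch of the uniform vs. non-uniform DIF logic: three nested
# proportional-odds models compared with likelihood-ratio tests.  The data
# and variable names are simulated stand-ins, not the study's dataset.
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(1)
n = 2000
ela = rng.normal(size=n)                       # matching variable
female = rng.integers(0, 2, size=n)            # gender indicator
latent = ela + 0.05 * female + 0.05 * female * ela + rng.logistic(size=n)

df = pd.DataFrame({"ela": ela, "female": female, "ela_x_female": ela * female})
df["score"] = pd.Categorical(np.digitize(latent, [-1.5, -0.5, 0.5, 1.5]) + 1, ordered=True)

def fit(predictors):
    return OrderedModel(df["score"], df[predictors], distr="logit").fit(method="bfgs", disp=False)

m1 = fit(["ela"])                              # step 1: matching variable only
m2 = fit(["ela", "female"])                    # step 2: + group (uniform DIF)
m3 = fit(["ela", "female", "ela_x_female"])    # step 3: + interaction (non-uniform DIF)

for label, small, big in (("uniform", m1, m2), ("non-uniform", m2, m3)):
    lr = 2.0 * (big.llf - small.llf)
    print(f"{label} effect: LR chi2 = {lr:.2f}, p = {stats.chi2.sf(lr, df=1):.4f}")

As the abstract notes, statistically significant effects would still be screened against an effect-size criterion before a prompt is classified as showing an important group effect.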


Journal of Psycholinguistic Research | 2014

Animacy Effect and Language Specificity: Judgment of Unaccusative Verbs by Korean Learners of English as a Foreign Language

Hye K. Pae; Brian Schanding; Yeon-Jin Kwon; Yong-Won Lee

This study investigated the tendency of Korean learners of English as a foreign language (FL) to overpassivize unaccusative verbs. Sixty Korean native college students participated in the study, along with 17 English-speaking counterparts serving as a comparison group. Consistent with the findings of previous research, this study found that Korean students tended to incorrectly accept the passive voice with inanimate subjects. The results highlighted the role of lexical animacy, the hierarchy of agentivity, and language-specific effects on FL judgment. The findings suggest a robust language-specific L1 effect on L2 acquisition and a greater involvement of cognition in FL use than language input.


Journal of Psycholinguistic Research | 2015

The Resolution of Visual Noise in Word Recognition.

Hye K. Pae; Yong-Won Lee

This study examined lexical processing in English by native speakers of Korean and Chinese, compared to that of native speakers of English, using normal, alternated, and inverse fonts. Sixty-four adult students participated in a lexical decision task. The findings demonstrated similarities and differences in accuracy and latency among the three L1 groups. The participants, regardless of L1, showed a greater advantage for nonwords than for words in the normal fonts because they were able to efficiently detect the illegal letter strings. However, word advantages were observed for the visually distorted stimuli (i.e., alternated and inverse fonts). These results were explained from the perspectives of the theory of psycholinguistic grain size, L1–L2 distance, and the mechanism of familiarity discrimination. The native speakers of Chinese were more sensitive to visual distortions than their Korean counterparts, suggesting that the linguistic template established in the L1 might play a role in word processing in English.


Language Testing | 2015

Future of diagnostic language assessment

Yong-Won Lee

Diagnostic language assessment (DLA) has emerged as a major topic of interest among language testers, as evidenced by recent publications of journal articles and special journal issues on this topic (Alderson, Brunfaut, & Harding, 2014; Lee & Sawaki, 2009). Two fundamental goals of DLA are to identify language learners’ weaknesses and deficiencies, as well as their strengths, in the targeted language domains and provide useful diagnostic feedback and guidance for remedial learning and instruction. In other words, DLA seeks to promote further learning designed to address the test-takers’ weaknesses and increase their overall growth potential. Thus, it is important to create meaningful linkages between outcomes of diagnosis and subsequent learning and instruction when designing the DLA system. At the moment, DLA, as a subfield of language assessment, is at an important juncture on its course of development and evolution. In order to advance the field beyond where it stands, breakthroughs can be made on multiple fronts, which include, but are not limited to, refining frameworks and methodology for diagnosis, feedback, and guidance for remedial instruction. One of the urgent issues is to come up with workable frameworks of DLA (whether they are theoretical or practical) that can guide the whole process of designing, developing, implementing, and validating DLA. These efforts require us not only to review previous work and ongoing developments in DLA but also to look into, and gain insights from, various related fields of inquiry, such as the following: (a) fields where diagnosis is frequently practiced; (b) dynamic language assessment; (c) cognitive diagnostic assessment models; (d) technological innovations in assessment and scoring; and (e) feedback research in second language acquisition and writing. The primary goal of this special issue is to bring together expertise and insights from various fields related to DLA, with a view to providing a focused forum through which current thinking and ideas about DLA are actively shared among researchers and practitioners, thereby facilitating the development of a shared understanding among language testers.


Foreign Languages Education | 2016

Investigating Patterns of Writing Errors for Different L1 Groups through Error-Coded ESL Learners’ Essays

Yong-Won Lee; Martin Chodorow; Claudia Gentile

Automated error detection and feedback systems are becoming an important component of online writing practice services for ESL/EFL (English as a second/foreign language) learners. The main purposes of the study are to: (a) collect samples of essays written by ESL learners with different native language (or L1) backgrounds that are error-coded by an early version of an automated error-detection system (Critique™) and by trained human coders; and (b) identify some unique patterns of writing errors for different first language (L1) groups. Data analyzed in this study included 18,439 TOEFL® CBT essays error-coded by Critique™ and a much smaller, combined sample of 480 TOEFL® CBT/TOEFL iBT® essays error-coded by trained human coders. A comparison of error rates across five language groups showed some unique patterns: (a) the Arabic and Spanish groups had the highest rates of both spelling and punctuation errors; (b) the Korean and Japanese groups had the highest article error frequency; and (c) the Chinese group had the highest number of errors related to verb conjugations or adjective and noun inflections. The implications of these findings are discussed in terms of understanding the nature of L1-related writing errors and enhancing automated error detection and feedback systems.
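As a simple illustration of the kind of tabulation behind such comparisons, the snippet below computes error rates per 100 words by L1 group and error category from an error-coded essay table. The column names, error categories, and counts are hypothetical and far smaller than the corpora analyzed in the study.

# Hedged sketch: error rates per 100 words by L1 group and error type from an
# error-coded essay table.  Column names, categories, and counts are hypothetical.
import pandas as pd

# One row per essay: L1 group, word count, and counts of coded errors.
essays = pd.DataFrame({
    "l1":        ["Arabic", "Spanish", "Korean", "Japanese", "Chinese", "Korean"],
    "words":     [310, 295, 280, 300, 270, 290],
    "spelling":  [9, 8, 3, 2, 4, 2],
    "article":   [2, 3, 7, 6, 4, 8],
    "verb_form": [3, 2, 4, 3, 7, 3],
})

error_cols = ["spelling", "article", "verb_form"]
totals = essays.groupby("l1")[error_cols + ["words"]].sum()
rates = totals[error_cols].div(totals["words"], axis=0) * 100  # errors per 100 words
print(rates.round(2).sort_index())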

Collaboration


Dive into Yong-Won Lee's collaborations.

Top Co-Authors

Hye K. Pae

University of Cincinnati

Claudia Gentile

Mathematica Policy Research

Yeon-Jin Kwon

Pusan National University

Don Powers

Educational Testing Service
