Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Rie Koizumi is active.

Publication


Featured research published by Rie Koizumi.


Language Testing | 2009

A Meta-Analysis of Test Format Effects on Reading and Listening Test Performance: Focus on Multiple-Choice and Open-Ended Formats.

Yo In'nami; Rie Koizumi

A meta-analysis was conducted on the effects of multiple-choice and open-ended formats on L1 reading, L2 reading, and L2 listening test performance. Fifty-six data sources located in an extensive search of the literature were the basis for the estimates of the mean effect sizes of test format effects. The results using the mixed effects model of meta-analysis indicate that multiple-choice formats are easier than open-ended formats in L1 reading and L2 listening, with the degree of format effect ranging from small to large in L1 reading and medium to large in L2 listening. Overall, format effects in L2 reading are not found, although multiple-choice formats are found to be easier than open-ended formats when any one of the following four conditions is met: the studies involve between-subjects designs, random assignment, stem-equivalent items, or learners with a high L2 proficiency level. Format effects favoring multiple-choice formats across the three domains are consistently observed when studies employ between-subjects designs, random assignment, or stem-equivalent items.
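
As a rough illustration of the pooling step such a meta-analysis involves (not the authors' actual analysis; the study summaries below are invented), the following Python sketch computes Hedges' g for each study and combines the estimates under a DerSimonian–Laird random-effects model:

```python
# Illustrative sketch of random-effects pooling of standardized mean differences
# (Hedges' g with a DerSimonian-Laird estimate of between-study variance).
# The numbers below are invented; they are not the 56 data sources from the paper.
import numpy as np

# Per-study summaries: mean, SD, n for multiple-choice (mc) and open-ended (oe) groups.
studies = [
    # (m_mc, sd_mc, n_mc, m_oe, sd_oe, n_oe)
    (72.0, 10.0, 40, 65.0, 12.0, 40),
    (58.0,  9.0, 55, 55.0, 10.0, 52),
    (81.0, 11.0, 30, 70.0, 13.0, 33),
]

g_vals, v_vals = [], []
for m1, s1, n1, m2, s2, n2 in studies:
    sp = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))  # pooled SD
    d = (m1 - m2) / sp                                                   # Cohen's d
    v_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))                 # variance of d
    j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)                                  # small-sample correction
    g_vals.append(j * d)
    v_vals.append(j**2 * v_d)

g, v = np.array(g_vals), np.array(v_vals)

# Fixed-effect pooling first, then DerSimonian-Laird tau^2 for the random-effects step.
w = 1 / v
g_fe = np.sum(w * g) / np.sum(w)
q = np.sum(w * (g - g_fe) ** 2)
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (q - (len(g) - 1)) / c)

# Random-effects weights and pooled mean effect size.
w_re = 1 / (v + tau2)
g_re = np.sum(w_re * g) / np.sum(w_re)
se_re = np.sqrt(1 / np.sum(w_re))
print(f"Pooled g = {g_re:.2f}, 95% CI = [{g_re - 1.96*se_re:.2f}, {g_re + 1.96*se_re:.2f}]")
```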


Language Testing | 2016

Task and Rater Effects in L2 Speaking and Writing: A Synthesis of Generalizability Studies.

Yo In’nami; Rie Koizumi

We addressed Deville and Chalhoub-Deville’s (2006), Schoonen’s (2012), and Xi and Mollaun’s (2006) call for research into the contextual features that are considered related to person-by-task interactions in the framework of generalizability theory in two ways. First, we quantitatively synthesized the generalizability studies to determine the percentage of variation in L2 speaking and L2 writing performance that was accounted for by tasks, raters, and their interaction. Second, we examined the relationships between person-by-task interactions and moderator variables. We used 28 datasets from 21 studies for L2 speaking, and 22 datasets from 17 studies for L2 writing. Across modalities, most of the score variation was explained by examinees’ performance; the interaction effects of tasks or raters were greater than the independent effects of tasks or raters. Task and task-related interaction effects explained a greater percentage of the score variances than did the rater and rater-related interaction effects. The variances associated with the person-by-task interactions were larger for assessments based on both general and academic contexts than for those based only on academic contexts. Further, large person-by-task interactions were related to analytic scoring and scoring criteria with task-specific language features. These findings derived from L2 speaking studies indicate that contexts, scoring methods, and scoring criteria might lead to varied performance over tasks; consequently, it is particularly important to define constructs carefully.
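
For reference, generalizability studies of this kind typically start from a person × task × rater (p × t × r) decomposition of observed score variance; the notation below is the standard one and is not taken from the paper itself:

```latex
% Decomposition of an observed score X_{ptr} for person p on task t scored by rater r.
\sigma^2(X_{ptr}) =
  \sigma^2_p + \sigma^2_t + \sigma^2_r
  + \sigma^2_{pt} + \sigma^2_{pr} + \sigma^2_{tr} + \sigma^2_{ptr,e}

% Generalizability coefficient for relative decisions with n_t tasks and n_r raters;
% a large person-by-task component \sigma^2_{pt} lowers this coefficient unless
% more tasks are sampled.
E\rho^2 =
  \frac{\sigma^2_p}
       {\sigma^2_p + \dfrac{\sigma^2_{pt}}{n_t} + \dfrac{\sigma^2_{pr}}{n_r}
        + \dfrac{\sigma^2_{ptr,e}}{n_t n_r}}
```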


Archive | 2013

Structural Equation Modeling in Educational Research

Yo In’nami; Rie Koizumi

Structural equation modeling (SEM) is a collection of statistical methods for modeling the multivariate relationship between variables. It is also called covariance structure analysis or simultaneous equation modeling and is often considered an integration of regression and factor analysis.
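
In the standard LISREL-style notation (generic notation, not quoted from the chapter), this integration of factor analysis and regression can be written as a measurement model plus a structural model:

```latex
% Measurement model: observed indicators x and y load on latent exogenous (\xi)
% and endogenous (\eta) variables -- the factor-analytic part of SEM.
x = \Lambda_x \xi + \delta, \qquad y = \Lambda_y \eta + \varepsilon

% Structural model: regressions among the latent variables themselves.
\eta = B\eta + \Gamma\xi + \zeta
```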


International Journal of Testing | 2010

Can Structural Equation Models in Second Language Testing and Learning Research Be Successfully Replicated?

Yo In'nami; Rie Koizumi

Because structural equation models are widely used in testing and assessment, investigation into the accuracy of such models may help raise awareness of the value of reanalysis or replication. We focused on second language testing and learning studies and examined: (a) To what extent is information necessary for replication provided by authors? (b) To what extent can the original models be successfully replicated? Regarding (a), we e-mailed authors of 31 articles that did not contain information needed to replicate the study and asked them for the missing information. We obtained data from only four authors. Regarding (b), we succeeded in replicating 89% of the models in the preliminary analysis, 87% to 100% of fit indices, and 94% of parameter estimates. The results suggest that for the most part, structural equation modeling research reported in second language testing and learning research is accurate.


Assessment in Education: Principles, Policy & Practice | 2016

Current issues in large-scale educational assessment in Japan: focus on national assessment of academic ability and university entrance examinations

Naoki Kuramoto; Rie Koizumi

Currently, large-scale testing in Japan faces conflicting requirements derived from principles of education on the one hand, and measurement, on the other. Issues of affective ambivalence towards tests (i.e. test aversion and dependence) are also observed. The seemingly conflicting government discussions regarding the national assessment of academic achievement at primary and middle schools, and reforms to university entrance examinations, are discussed here in terms of these issues.


Language Assessment Quarterly | 2009

Development of a Practical Speaking Test with a Positive Impact on Learning Using a Story Retelling Technique

Akiyo Hirai; Rie Koizumi

This article presents a test development project for classroom speaking assessment. With the aim of enhancing and specifically easing the process of test preparation and administration and generating positive washback effects on learning, we developed a semi-direct speaking test called the Story Retelling Speaking Test (SRST). Although a story retelling technique has already been widely recognized as a teaching activity, its use for speaking assessment has not been fully studied. Thus, the article discusses the potential of using this technique for the SRST and reports its pilot administration to 43 examinees. As a result, the high practicality of the test was confirmed at the test construction and implementation stages. In addition, the questionnaire distributed to the examinees yielded generally positive results regarding their perceptions of the test's usefulness and the appropriateness of the test procedures and task difficulty. With regard to the appropriateness of the texts, the examinees perceived that the retelling of stories was influenced most by text content and then by text length; however, these two factors appear to be interrelated. On the basis of these responses, we have suggested some revisions of the SRST and future validation and reliability studies.


International Journal of Testing | 2013

Review of Sample Size for Structural Equation Models in Second Language Testing and Learning Research: A Monte Carlo Approach

Yo In’nami; Rie Koizumi

The importance of sample size, although widely discussed in the literature on structural equation modeling (SEM), has not been widely recognized among applied SEM researchers. To narrow this gap, we focus on second language testing and learning studies and examine the following: (a) Is the sample size sufficient in terms of precision and power of parameters in a model using Monte Carlo analysis? (b) How are the results from Monte Carlo sample size analysis comparable with those from the N ≥ 100 rule and from the N: q ≥ 10 (sample size–free parameter ratio) rule? Regarding (a), parameter bias, standard error bias, coverage, and power were overall satisfactory, suggesting that sample size for SEM models in second language testing and learning studies is generally appropriate. Regarding (b), both rules were often inconsistent with the Monte Carlo analysis, suggesting that they do not serve as guidelines for sample size. We encourage applied SEM researchers to perform Monte Carlo analyses to estimate the requisite sample size of a model.
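
The Monte Carlo logic described here can be sketched with a single regression path in place of a full SEM (the paper itself examines SEM models; the target value, N, and thresholds below are invented for illustration): simulate data at a candidate sample size, re-estimate the parameter many times, and inspect bias, confidence-interval coverage, and power.

```python
# Minimal Monte Carlo sketch of the sample-size logic described above, using a single
# standardized regression path (beta = 0.3) instead of a full SEM.
import numpy as np

rng = np.random.default_rng(0)
N, BETA, REPS = 150, 0.3, 2000

est, covered, significant = [], 0, 0
for _ in range(REPS):
    x = rng.standard_normal(N)
    y = BETA * x + rng.standard_normal(N) * np.sqrt(1 - BETA**2)  # standardized path
    b = np.sum(x * y) / np.sum(x * x)                             # OLS slope (no intercept)
    resid = y - b * x
    s = np.sqrt(np.sum(resid**2) / (N - 1) / np.sum(x * x))       # standard error of b
    est.append(b)
    covered += (b - 1.96 * s <= BETA <= b + 1.96 * s)
    significant += (abs(b / s) > 1.96)

bias = (np.mean(est) - BETA) / BETA
print(f"relative parameter bias: {bias:+.3f}")
print(f"95% CI coverage:         {covered / REPS:.3f}")
print(f"empirical power:         {significant / REPS:.3f}")
```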


Language Assessment Quarterly | 2011

Development and Validation of a Diagnostic Grammar Test for Japanese Learners of English

Rie Koizumi; Hideki Sakai; Takahiro Ido; Hiroshi Ota; Megumi Hayama; Masatoshi Sato; Akiko Nemoto

This article reports on the development and validation of the English Diagnostic Test of Grammar (EDiT Grammar) for Japanese learners of English. From among the many aspects of grammar, this test focuses on the knowledge of basic English noun phrases (NPs), especially their internal structures, because previous research has indicated the difficulty faced by Japanese learners of English in acquiring these phrases. The results of the examination of the pilot and revised tests suggest that the revised test is generally appropriate in terms of item discrimination, distractor function, reliability, and the difficulty of NP groups. Overall, the EDiT Grammar provided favorable evidence for the validity in interpreting the test scores in the case of a fairly low-stakes diagnostic test.
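
Item discrimination and reliability of the kind examined here are commonly summarized with corrected item-total correlations and Cronbach's alpha; the short Python sketch below uses an invented response matrix, not the EDiT Grammar data.

```python
# Illustrative item-analysis sketch for a dichotomously scored test: corrected
# item-total correlation (discrimination) and Cronbach's alpha (reliability).
import numpy as np

# rows = examinees, columns = items (1 = correct, 0 = incorrect); invented data
responses = np.array([
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 1, 0, 0, 0],
])

n_items = responses.shape[1]
total = responses.sum(axis=1)

# Corrected item-total correlation: correlate each item with the total score
# excluding that item, so the item does not inflate its own discrimination.
for j in range(n_items):
    rest = total - responses[:, j]
    r = np.corrcoef(responses[:, j], rest)[0, 1]
    print(f"item {j + 1}: difficulty = {responses[:, j].mean():.2f}, discrimination = {r:.2f}")

# Cronbach's alpha from item and total-score variances.
item_var = responses.var(axis=0, ddof=1).sum()
total_var = total.var(ddof=1)
alpha = n_items / (n_items - 1) * (1 - item_var / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```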


Language Assessment Quarterly | 2013

Validation of Empirically Derived Rating Scales for a Story Retelling Speaking Test

Akiyo Hirai; Rie Koizumi

In recognition of the rating scale as a crucial tool of performance assessment, this study aims to establish a rating scale suitable for a Story Retelling Speaking Test (SRST), which is a semidirect test of speaking ability in English as a foreign language for classroom use. To identify an appropriate scale, three rating scales, all of which have been designed to have diagnostic functions, were developed for the SRST and compared in terms of their reliability, validity, and practicality. The three scales were (a) an empirically derived, binary-choice, boundary-definition (called EBB1) scale, which has four criteria (Communicative Efficiency, Content, Grammar & Vocabulary, and Pronunciation); (b) an EBB2 scale that was modified from the EBB1 scale and has three criteria (Communicative Efficiency, Grammar & Vocabulary, and Pronunciation); and (c) a multiple-trait (MT) scale that was modified from the EBB2 but has a conventional analytic scale format. The results of the comparison revealed that the EBB2 was the most reliable and valid measure for assessing speech performance in the context of story retelling. However, the MT scale was shown to be the most practical, even though the EBB2 permits more careful scoring, which suggests the influence of the rating scale format on test qualities.


英語教學期刊 (English Teaching & Learning) | 2014

Research Synthesis and Meta-Analysis in Second Language Learning and Testing

Rie Koizumi

Research synthesis and meta-analysis are useful techniques for summarizing studies to understand what is known/unknown and to accumulate knowledge in an academic domain. In this paper, we attempt to provide an overview of the status quo of research synthesis and meta-analysis in second language learning and testing. We compare research synthesis with traditional methods of literature review (i.e., narrative and vote-counting methods), describe qualitative and quantitative research syntheses, and report a quantitative synthetic approach that simply counts the number of studies for each category without using statistical significance testing. Also included are the strengths (e.g., transparency, effect sizes, moderator variable analysis) and limitations (e.g., apples-and-oranges, study quality, and file-drawer problems) of research synthesis. Finally, we review whether and how the four challenges Norris and Ortega (2006) raised for synthesists to overcome have since been addressed (i.e., reporting inconsistency, study quality, non-English-language studies, and integration of qualitative and quantitative syntheses). We find that while steady progress has been made, these challenges have not yet been fully handled.

Collaboration


Dive into Rie Koizumi's collaborations.

Top Co-Authors


Yo In'nami

Toyohashi University of Technology


Chikako Nakagawa

Japan Society for the Promotion of Science


Hiroshi Ota

Komazawa Women's University
