Guillermo Solano-Flores
University of Colorado Boulder
Publications
Featured research published by Guillermo Solano-Flores.
Educational Researcher | 2008
Guillermo Solano-Flores
The testing of English language learners (ELLs) is, to a large extent, a random process because of poor implementation and factors that are uncertain or beyond control. Yet current testing practices and policies appear to be based on deterministic views of language and linguistic groups and erroneous assumptions about the capacity of assessment systems to serve ELLs. The question “Who is given tests in what language by whom, when, and where?” provides a conceptual framework for examining testing as a communication process between assessment systems and ELLs. Probabilistic approaches based on generalizability theory—a psychometric theory of measurement error—allow examination of the extent to which assessment systems’ inability to effectively communicate with ELLs affects the dependability of academic achievement measures.
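As a rough, hedged illustration of how generalizability theory partitions measurement error in this setting (a sketch under an assumed student × item × language design, not notation taken from the article), an observed score can be decomposed into variance components and the dependability of scores summarized with a generalizability coefficient:

```latex
% Illustrative only: student (p) x item (i) x language (l) random design,
% assumed here to match the ELL testing context described above.
\[
X_{pil} = \mu + \nu_p + \nu_i + \nu_l + \nu_{pi} + \nu_{pl} + \nu_{il} + \nu_{pil,e}
\]
\[
\sigma^2(X_{pil}) = \sigma^2_p + \sigma^2_i + \sigma^2_l
  + \sigma^2_{pi} + \sigma^2_{pl} + \sigma^2_{il} + \sigma^2_{pil,e}
\]
% Relative generalizability (dependability) coefficient for n_i items and n_l languages:
\[
E\rho^2 = \frac{\sigma^2_p}
  {\sigma^2_p + \sigma^2_{pi}/n_i + \sigma^2_{pl}/n_l + \sigma^2_{pil,e}/(n_i n_l)}
\]
```

In this framing, language enters the error term rather than being treated deterministically, which is the shift the abstract argues for.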
International Journal of Testing | 2009
Guillermo Solano-Flores; Eduardo Backhoff; Luis A. Contreras-Niño
In this article, we present a theory of test translation whose intent is to provide the conceptual foundation for effective, systematic work in the process of test translation and test translation review. According to the theory, translation error is multidimensional; it is not simply the consequence of defective translation but an inevitable fact derived, among many other reasons, from the tension between translation error dimensions—broad categories of translation errors such as those related to semantics, register, or the construct being measured—and the fact that languages encode meaning in different ways. While it cannot be eliminated, translation error can be minimized. The extent to which the translation of a test item is acceptable or objectionable can be understood as a probabilistic space defined by the frequency and the severity of translation errors. Accordingly, the translation of an item can be objectionable because it has a few severe errors, many mild errors, or many severe errors. To illustrate the theory, we discuss the methods used in an investigation that examined the quality of items translated in México in the TIMSS-1995 international test comparison and present results from that investigation. The theory can contribute to ensuring proper implementation of test adaptation guidelines in international comparisons and to improving activities related to test translation and test translation review.
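One hedged way to picture the probabilistic space the theory describes (an illustrative formalization, not the authors’ own) is a severity-weighted error index for a translated item:

```latex
% Illustrative only: f_{jdk} = frequency of error type k of dimension d in item j,
% w_{dk} = judged severity weight for that error type.
\[
T_j \;=\; \sum_{d=1}^{D} \sum_{k=1}^{K_d} f_{jdk}\, w_{dk}
\]
```

Under such an index, an item with a few severe errors and an item with many mild errors can occupy the same region of the space, consistent with the claim that acceptability depends jointly on frequency and severity.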
Educational Evaluation and Policy Analysis | 1997
Stephen P. Klein; Jasna Jovanovic; Brian M. Stecher; Dan McCaffrey; Richard J. Shavelson; Edward H. Haertel; Guillermo Solano-Flores; Kathy Comfort
We examined whether the differences in mean scores among gender and racial/ethnic groups on science performance assessments are comparable to the differences that are typically found among these groups on traditional multiple-choice tests. To do this, several hands-on science performance assessments and other measures were administered to over 2,000 students in grades five, six, and nine as part of a field test of California’s statewide testing program. Girls tended to have higher overall mean scores than boys on the performance measures, but boys tended to score higher than girls on certain types of questions within a performance task. In contrast, differences in mean scores among racial/ethnic groups on one type of test (or question) were comparable to the differences among these groups on the other measures studied. Overall, the results suggest that the type of science test used is unlikely to have much effect on gender or racial/ethnic differences in scores.
International Journal of Science Education | 1999
Guillermo Solano-Flores; Jasna Jovanovic; Richard J. Shavelson; Marilyn Bachman
We constructed a shell (blueprint) for generating science performance assessments and evaluated the characteristics of the assessments produced with it. The shell addressed four tasks: Planning, Hands-On, Analysis, and Application. Two parallel assessments were developed, Inclines (IN) and Friction (FR). Two groups of fifth graders who differed in both science curriculum experience and socioeconomic status took the assessments consecutively in either of two sequences, IN-FR or FR-IN. We obtained high interrater reliabilities for both assessments, statistically significant score differences due to assessment administration sequence, and considerable task-sampling measurement error. For both assessments, the magnitude of score variation due to the hands-on task indicated that it tapped a kind of knowledge not addressed by the other three tasks. Although IN and FR were similar in difficulty, they correlated differently with an external measure of science achievement. Moreover, measurement error differed de...
International Journal of Testing | 2002
Guillermo Solano-Flores; Elise Trumbull; Sharon Nelson-Barber
This article describes a model for the concurrent development of assessments in two language versions, as an approach intended to promote more equitable testing. The model is offered as an alternative to the traditional approach of translating tests originally created for a mainstream population of native English speakers. We contend that serious theoretical, methodological, and practical limitations of test translation result from two facts. First, simple translation procedures are not entirely sensitive to the fact that culture is a phenomenon that cannot be dissociated from language. Second, student performance is extremely sensitive to wording, and the wording used in the translated version of an assessment does not undergo the same process of refinement as the wording used in the assessment written in the original language. We report how we used our model to develop mathematics exercises in a school district with a high enrollment of English language learners (ELLs). Seven bilingual teachers from that school district were trained to use the model and developed the English and Spanish versions of the same sets of items. We provide evidence that the model allows assessment developers to give deeper consideration to culture as part of their discussion throughout the entire process of assessment development.
Educational Assessment | 2009
Guillermo Solano-Flores; Min Li
We investigated language variation and score variation in the testing of English language learners who were native Spanish speakers. We gave students the same set of National Assessment of Educational Progress mathematics items in both their first language and their second language. We examined the amount of score variation due to the main and interaction effects of student (s), item (i), rater (r), and language (l). We observed considerable score variation due to the s × i × l interaction, which indicates that each item has a unique set of linguistic challenges in each language and each student has a unique set of strengths and weaknesses in each language. Parallel results were obtained with samples of students tested across dialects of their native language.
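To make the facet language concrete, below is a minimal, self-contained sketch of the kind of generalizability (G) study decomposition the abstract describes, estimated with the classical ANOVA method on synthetic data for a crossed student × item × language design; the facet sizes, the simulated components, and the omission of the rater facet are simplifying assumptions, not features of the study.

```python
# Illustrative G-study for a fully crossed student (s) x item (i) x language (l) design,
# estimated with the classical ANOVA (expected-mean-squares) method on synthetic data.
# Generic sketch only; the data and facet sizes below are invented, not the study's.
import numpy as np

rng = np.random.default_rng(0)
ns, ni, nl = 60, 10, 2          # students, items, languages (assumed sizes)

# Simulate scores with known components, including a sizable s x i x l interaction.
s  = rng.normal(0, 1.0, ns)[:, None, None]
i  = rng.normal(0, 0.5, ni)[None, :, None]
l  = rng.normal(0, 0.2, nl)[None, None, :]
si = rng.normal(0, 0.4, (ns, ni, 1))
sl = rng.normal(0, 0.3, (ns, 1, nl))
il = rng.normal(0, 0.2, (1, ni, nl))
sil = rng.normal(0, 0.8, (ns, ni, nl))   # interaction confounded with residual error
X = s + i + l + si + sl + il + sil       # score array, shape (ns, ni, nl)

g = X.mean()
m_s, m_i, m_l = X.mean((1, 2)), X.mean((0, 2)), X.mean((0, 1))
m_si, m_sl, m_il = X.mean(2), X.mean(1), X.mean(0)

# Mean squares for each effect in the three-facet crossed design.
MS_s  = ni * nl * np.sum((m_s - g) ** 2) / (ns - 1)
MS_i  = ns * nl * np.sum((m_i - g) ** 2) / (ni - 1)
MS_l  = ns * ni * np.sum((m_l - g) ** 2) / (nl - 1)
MS_si = nl * np.sum((m_si - m_s[:, None] - m_i[None, :] + g) ** 2) / ((ns - 1) * (ni - 1))
MS_sl = ni * np.sum((m_sl - m_s[:, None] - m_l[None, :] + g) ** 2) / ((ns - 1) * (nl - 1))
MS_il = ns * np.sum((m_il - m_i[:, None] - m_l[None, :] + g) ** 2) / ((ni - 1) * (nl - 1))
SS_tot = np.sum((X - g) ** 2)
SS_sil = SS_tot - (MS_s * (ns - 1) + MS_i * (ni - 1) + MS_l * (nl - 1)
                   + MS_si * (ns - 1) * (ni - 1) + MS_sl * (ns - 1) * (nl - 1)
                   + MS_il * (ni - 1) * (nl - 1))
MS_sil = SS_sil / ((ns - 1) * (ni - 1) * (nl - 1))

# ANOVA estimators of the variance components (negative estimates truncated at 0).
var = {
    "sil,e": MS_sil,
    "si": max((MS_si - MS_sil) / nl, 0),
    "sl": max((MS_sl - MS_sil) / ni, 0),
    "il": max((MS_il - MS_sil) / ns, 0),
    "s":  max((MS_s - MS_si - MS_sl + MS_sil) / (ni * nl), 0),
    "i":  max((MS_i - MS_si - MS_il + MS_sil) / (ns * nl), 0),
    "l":  max((MS_l - MS_sl - MS_il + MS_sil) / (ns * ni), 0),
}
total = sum(var.values())
for k, v in var.items():
    print(f"sigma^2({k}) = {v:.3f}  ({100 * v / total:.1f}% of total)")
```

With components like these, a large sigma^2(sil,e) relative to sigma^2(s) is the signature of item-by-language specificity that the abstract reports; the study itself also included raters in the design.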
Applied Measurement in Education | 2014
Guillermo Solano-Flores
This article addresses validity and fairness in the testing of English language learners (ELLs)—students in the United States who are developing English as a second language. It discusses limitations of current approaches to examining the linguistic features of items and their effect on the performance of ELL students. The article submits that these limitations stem from the fact that current ELL testing practices are not effective in addressing three basic notions on the nature of language and the linguistic features of test items: (a) language is a probabilistic phenomenon, (b) the linguistic features of test items are multidimensional and interconnected, and (c) each test item has a unique set of linguistic affordances and constraints. Along with the limitations of current testing practices, for each notion, the article discusses evidence of the effectiveness of several probabilistic approaches to examining the linguistic features of test items in ELL testing.
Educational Assessment | 2014
Guillermo Solano-Flores; Chao Wang; Rachel Kachchaf; Lucinda Soltero-González; Khanh Nguyen-Le
We address valid testing for English language learners (ELLs)—students in the United States who are schooled in English while they are still acquiring English as a second language. We also address the need for procedures for systematically developing ELL testing accommodations—changes in tests intended to help ELLs gain access to the content of items. We report our experience developing and investigating illustration-based accommodations for ELLs in terms of five main activities: specification, design, production, evaluation, and revision. Our findings indicate that, to benefit from a given accommodation, ELLs need to possess certain skills that enable them to use it but that are not related to the knowledge assessed. Critical to developing effective accommodations is examining how their properties interact with the characteristics of the students and the characteristics of test items.
Educational Assessment | 2013
Guillermo Solano-Flores; Carne Barnett-Clarke; Rachel Kachchaf
We examined the performance of English language learners (ELLs) and non-ELLs on Grade 4 and Grade 5 mathematics content knowledge (CK) and academic language (AL) tests. CK and AL items had different semiotic loads (numbers of different types of semiotic features) and different semiotic structures (relative frequencies of different semiotic modalities). For both linguistic groups, the percentage of items correct was higher for the AL than the CK tests. However, the score gains attributable to instruction were smaller for the AL than the CK tests. CK and AL test scores correlated more highly for non-ELLs than ELLs before instruction. This suggests that, before instruction, the meaning-making system of ELLs was less consolidated than that of non-ELLs, who dealt with the interpretive demands of CK and AL items with similar effectiveness. We discuss the importance of using a semiotic perspective in test design and in interpreting ELL student performance.
Applied Measurement in Education | 2012
Rachel Kachchaf; Guillermo Solano-Flores
We examined how rater language background affects the scoring of short-answer, open-ended test items in the assessment of English language learners (ELLs). Four native English-speaking and four native Spanish-speaking certified bilingual teachers scored 107 responses of fourth- and fifth-grade Spanish-speaking ELLs to mathematics items administered in English and Spanish. We examined the mean scores given by raters of different language backgrounds. Also, using generalizability theory, we examined the amount of score variation due to student (the object of measurement) and four sources of measurement error—item, language of testing, rater language background, and rater nested in rater language background. We observed a small, statistically significant difference between mean scores given by raters of different language backgrounds and negligible score variation due to the main and interaction effects of rater. Provided that they are certified bilingual teachers, and regardless of language of testing, raters of different language backgrounds can score ELL responses to short-answer, open-ended items with comparable reliability.
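As a hedged illustration of how such reliability comparisons can be quantified, here is a small decision-study helper for computing a relative generalizability coefficient from estimated variance components; the crossed student × item × rater structure and the example component values are assumptions for illustration, not estimates from this article.

```python
# Illustrative D-study helper: relative generalizability (reliability-like) coefficient
# for a crossed student x item x rater design, given estimated variance components.
# Generic sketch only; component values and facet structure are assumed, not the study's.
def relative_g_coefficient(var_s, var_si, var_sr, var_sir_e, n_items, n_raters):
    """E(rho^2) = sigma^2(s) / (sigma^2(s) + relative error variance)."""
    rel_error = (var_si / n_items
                 + var_sr / n_raters
                 + var_sir_e / (n_items * n_raters))
    return var_s / (var_s + rel_error)

# Example with made-up components: compare dependability with one versus two raters.
print(relative_g_coefficient(0.50, 0.20, 0.01, 0.30, n_items=8, n_raters=1))
print(relative_g_coefficient(0.50, 0.20, 0.01, 0.30, n_items=8, n_raters=2))
```

Calling the helper with alternative numbers of raters shows how a decision study projects dependability under different measurement designs, which is the kind of evidence behind claims that raters of different language backgrounds score with comparable reliability.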