Publications


Featured research published by Maria Elena Oliveri.


International Journal of Testing | 2014

Toward Increasing Fairness in Score Scale Calibrations Employed in International Large-Scale Assessments

Maria Elena Oliveri; Matthias von Davier

In this article, we investigate the creation of comparable score scales across countries in international assessments. We examine potential improvements to current score scale calibration procedures used in international large-scale assessments. Our approach seeks to improve fairness in scoring international large-scale assessments, whose score scale calibrations often ignore item misfit. We also seek to obtain improved model-data fit estimates when calibrating international score scales. To this end, we examine the use of two alternative score scale calibration procedures: (a) a language-based score scale and (b) a more parsimonious international scale wherein a large proportion of international parameters are used with a subset of country-based parameters for items that misfit in the international scale. In our analyses, we used data from all 40 countries participating in the Progress in International Reading Literacy Study. Our findings revealed that current score scale calibration procedures yield large numbers of misfitting items (higher than 25% for some countries). Our proposed approach diminished the effect of the proportion of misfitting items on score scale calibrations and also yielded enhanced model-data fit estimates. These results enhance confidence in measurements obtained from international large-scale assessments.
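
As a rough illustration of the item-fit screening this abstract describes, the sketch below compares, for a single country, observed proportions correct against the proportions expected under a common international Rasch calibration, and flags items whose deviation exceeds a threshold. The Rasch simplification, the function names, and the 0.05 cutoff are illustrative assumptions, not the calibration procedure used in the study.

```python
import numpy as np

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def flag_misfit(responses, theta, b_international, threshold=0.05):
    """Flag items in one country whose observed proportion correct deviates
    from the proportion expected under the international calibration.

    responses       : (n_persons, n_items) 0/1 matrix for the country
    theta           : (n_persons,) abilities on the international scale
    b_international : (n_items,) international item difficulties
    """
    expected = rasch_prob(theta, b_international).mean(axis=0)
    observed = responses.mean(axis=0)
    # Items flagged here would be candidates for country-specific parameters.
    return np.abs(observed - expected) > threshold
```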


International Journal of Testing | 2013

Analysis of Sources of Latent Class Differential Item Functioning in International Assessments

Maria Elena Oliveri; Kadriye Ercikan; Bruno D. Zumbo

In this study, we investigated differential item functioning (DIF) and its sources using a latent class (LC) modeling approach. Potential sources of LC DIF related to instruction and teacher-related variables were investigated using substantive analyses and three statistical approaches: descriptive discriminant function, multinomial logistic regression, and multilevel multinomial logistic regression analyses. Results revealed that differential response patterns, as indicated by identification of LCs, were most strongly associated with student achievement levels and teacher-related variables rather than manifest characteristics such as gender, test language, and country, which are the focus of typical measurement comparability research. Findings from this study have important implications for measurement comparability and validity research. Evidence of within-group heterogeneity in the test data structure suggests that the identification of DIF and its sources may not apply to all examinees in the group and that measurement incomparability may be greater among groups that are not defined by manifest variables such as gender and ethnicity. Results suggest that alternative variables that may be more closely related to the investigated construct should be examined when conducting measurement comparability research.
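
One of the statistical approaches named above, multinomial logistic regression of latent class membership on explanatory variables, can be sketched as follows. The data, covariates, and variable names are hypothetical placeholders; this is a generic illustration, not the authors' analysis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500

# Hypothetical covariates: achievement plus teacher/instruction variables.
X = np.column_stack([
    rng.normal(size=n),           # student achievement (standardized)
    rng.normal(size=n),           # teacher experience (standardized)
    rng.integers(0, 2, size=n),   # instructional-emphasis indicator
])
latent_class = rng.integers(0, 3, size=n)  # placeholder for estimated LC membership

# With more than two classes and the default lbfgs solver, scikit-learn fits a
# multinomial logistic regression; coefficients describe how each covariate
# relates to class membership.
model = LogisticRegression(max_iter=1000).fit(X, latent_class)
print(model.coef_)  # one row of coefficients per latent class
```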


Applied Measurement in Education | 2011

Do Different Approaches to Examining Construct Comparability in Multilanguage Assessments Lead to Similar Conclusions?

Maria Elena Oliveri; Kadriye Ercikan

In this study, we examine the degree of construct comparability and possible sources of incomparability of the English and French versions of the Programme for International Student Assessment (PISA) 2003 problem-solving measure administered in Canada. Several approaches were used to examine construct comparability at the test- (examination of test data structure, reliability comparisons and test characteristic curves) and item-levels (differential item functioning, item parameter correlations, and linguistic comparisons). Results from the test-level analyses indicate that the two language versions of PISA are highly similar as shown by similarity of internal consistency coefficients, test data structure (same number of factors and item factor loadings) and test characteristic curves for the two language versions of the tests. However, results of item-level analyses reveal several differences between the two language versions as shown by large proportions of items displaying differential item functioning, differences in item parameter correlations (discrimination parameters) and number of items found to contain linguistic differences.
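
The test-level comparisons described here (test characteristic curves, item parameter correlations) can be illustrated with a small sketch under a 2PL model. The item parameters below are fabricated for illustration only and are not PISA estimates.

```python
import numpy as np

def tcc(theta, a, b):
    """Test characteristic curve under a 2PL model: expected total score at each theta."""
    p = 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))
    return p.sum(axis=1)

rng = np.random.default_rng(1)
n_items = 20
theta_grid = np.linspace(-4.0, 4.0, 81)

# Fabricated item parameters for the two language versions (small shifts only).
a_en, b_en = rng.uniform(0.5, 2.0, n_items), rng.normal(0.0, 1.0, n_items)
a_fr, b_fr = a_en * 1.05, b_en + 0.10

max_tcc_gap = np.max(np.abs(tcc(theta_grid, a_en, b_en) - tcc(theta_grid, a_fr, b_fr)))
r_a = np.corrcoef(a_en, a_fr)[0, 1]  # discrimination-parameter correlation
print(f"max TCC difference: {max_tcc_gap:.2f} points; discrimination correlation: {r_a:.2f}")
```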


International Journal of Science Education | 2013

Investigating Linguistic Sources of Differential Item Functioning Using Expert Think-Aloud Protocols in Science Achievement Tests

Wolff-Michael Roth; Maria Elena Oliveri; Debra Sandilands; Juliette Lyons-Thomas; Kadriye Ercikan

Even if national and international assessments are designed to be comparable, subsequent psychometric analyses often reveal differential item functioning (DIF). Central to achieving comparability is to examine the presence of DIF, and if DIF is found, to investigate its sources to ensure that differentially functioning items do not lead to bias. In this study, sources of DIF were examined using think-aloud protocols. The think-aloud protocols of expert reviewers were conducted to compare the English and French versions of 40 items previously identified as DIF (N = 20) and non-DIF (N = 20). Three highly trained and experienced experts in verifying and accepting/rejecting multi-lingual versions of curriculum and testing materials for government purposes participated in this study. Although there is a considerable amount of agreement in the identification of differentially functioning items, experts do not consistently identify and distinguish DIF and non-DIF items. Our analyses of the think-aloud protocols identified particular linguistic, general pedagogical, content-related, and cognitive factors related to sources of DIF. Implications are provided for the process of arriving at the identification of DIF, prior to the actual administration of tests at national and international levels.
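
Agreement between expert judgments and statistical DIF classifications of this kind is often summarized with a chance-corrected index such as Cohen's kappa. The sketch below computes it on hypothetical labels for 40 items; it is a loose illustration of one way such agreement could be quantified, not the study's analysis.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels for 40 items: statistical DIF status (20 DIF, 20 non-DIF)
# and one expert's classification from a think-aloud review.
statistical_dif = np.array([1] * 20 + [0] * 20)
expert_judgement = statistical_dif.copy()
expert_judgement[::5] = 1 - expert_judgement[::5]  # expert disagrees on every fifth item

kappa = cohen_kappa_score(statistical_dif, expert_judgement)
print(f"chance-corrected agreement with statistical DIF status: kappa = {kappa:.2f}")
```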


International Journal of Testing | 2012

Methodologies for Investigating Item- and Test-Level Measurement Equivalence in International Large-Scale Assessments

Maria Elena Oliveri; Brent F. Olson; Kadriye Ercikan; Bruno D. Zumbo

In this study, the Canadian English and French versions of the Problem-Solving Measure of the Programme for International Student Assessment 2003 were examined to investigate their degree of measurement comparability at the item- and test-levels. Three methods of differential item functioning (DIF) were compared: parametric and nonparametric item response theory and ordinal logistic regression. Corresponding derivations of these three DIF methods were investigated at the test-level to examine both differential test functioning (DTF) and the correspondence between findings at the item-level with those at the test-level. Item-level findings suggested consistency in DIF detection across methods; however, differences in effect sizes of DIF were found by each method. Test-level results revealed a high degree of consistency across DTF methods. Discrepancies were found between item- and test-level comparability analyses. Item-level analyses suggested moderate to low degrees of comparability, whereas test-level findings suggested a higher degree of comparability. Findings also indicated the direction of DIF was mixed as some DIF items favored English-speaking students and others favored French-speaking students, suggesting that DIF cancellation may explain why item-level incomparability was not detected at the test-level.
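
Of the three DIF methods compared, the logistic regression approach is the easiest to sketch. The example below uses a binary logistic regression likelihood-ratio test for a dichotomous item (the study used ordinal logistic regression, which extends this idea to polytomous items); it is a generic illustration, not the authors' implementation.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def logistic_regression_dif(item, total, group):
    """Likelihood-ratio test for DIF on a dichotomous item.

    item  : 0/1 responses to the studied item
    total : matching variable (e.g., rest score)
    group : 0 = reference group, 1 = focal group
    Returns the LR statistic and its p-value (2 df: uniform + nonuniform DIF).
    """
    base = sm.Logit(item, sm.add_constant(total)).fit(disp=0)
    full = sm.Logit(
        item, sm.add_constant(np.column_stack([total, group, total * group]))
    ).fit(disp=0)
    lr = 2.0 * (full.llf - base.llf)
    return lr, chi2.sf(lr, df=2)
```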


Applied Measurement in Education | 2014

Effects of Population Heterogeneity on Accuracy of DIF Detection

Maria Elena Oliveri; Kadriye Ercikan; Bruno D. Zumbo

Heterogeneity within English language learner (ELL) groups has been documented. Previous research on differential item functioning (DIF) analyses suggests that accurate DIF detection rates are reduced greatly when groups are heterogeneous. In this simulation study, we investigated the effects of heterogeneity within linguistic (ELL) groups on the accuracy of DIF detection. Heterogeneity within such groups may occur for a myriad of reasons including differential lengths of time residing in English-speaking countries, degrees of exposure to English-speaking environments, and amounts of English instruction. Our findings revealed that at high levels of within-group heterogeneity, DIF detection is at the level of chance, implying that a large proportion of DIF items might remain undetected when assessing heterogeneous populations, potentially leading to the development of biased tests. Based on our findings, we urge test development organizations to consider heterogeneity within ELL and other heterogeneous focal groups in their routine DIF analyses.
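
A minimal simulation in the spirit of this study: only a fraction of the focal group experiences the DIF effect, and the flagging rate of a logistic-regression DIF test is estimated as that fraction shrinks. The design values (DIF size, sample sizes, use of true ability as the matching variable) are illustrative assumptions, not the study's simulation conditions.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(42)

def detection_rate(prop_affected, dif_size=0.6, n_per_group=1000, n_reps=100):
    """Share of replications in which a logistic-regression DIF test flags an item
    when only `prop_affected` of the focal group experiences the DIF effect."""
    hits = 0
    for _ in range(n_reps):
        theta = rng.normal(size=2 * n_per_group)
        group = np.repeat([0.0, 1.0], n_per_group)   # 0 = reference, 1 = focal
        affected = (group == 1) & (rng.random(2 * n_per_group) < prop_affected)
        b = dif_size * affected                      # item is harder only for affected focal examinees
        item = (rng.random(2 * n_per_group) < 1.0 / (1.0 + np.exp(-(theta - b)))).astype(int)
        # True ability is used as the matching variable here for simplicity;
        # operational analyses would match on an observed rest score.
        base = sm.Logit(item, sm.add_constant(theta)).fit(disp=0)
        full = sm.Logit(item, sm.add_constant(np.column_stack([theta, group]))).fit(disp=0)
        lr = 2.0 * (full.llf - base.llf)
        hits += chi2.sf(lr, df=1) < 0.05
    return hits / n_reps

for p in (1.0, 0.5, 0.2):
    print(f"proportion of focal group affected = {p:.1f}: detection rate = {detection_rate(p):.2f}")
```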


International Journal of Testing | 2013

Investigating Sources of Differential Item Functioning in International Large-Scale Assessments Using a Confirmatory Approach

Debra Sandilands; Maria Elena Oliveri; Bruno D. Zumbo; Kadriye Ercikan

International large-scale assessments of achievement often have a large degree of differential item functioning (DIF) between countries, which can threaten score equivalence and reduce the validity of inferences based on comparisons of group performances. It is important to understand potential sources of DIF to improve the validity of future assessments; however, previous attempts to identify sources of DIF have had variable results. This study had two purposes. The first was to apply a confirmatory approach (Poly-SIBTEST) to investigate sources of DIF typically found in international large-scale assessments: adaptation effects and cognitive loadings of items. We conducted three pairwise DIF analyses on Spanish and English versions of the Progress in International Reading Literacy Study 2001 Reader booklet. Results confirmed that item cognitive loadings were a source of differential functioning favoring both England and the United States when compared against Colombia; however, adaptation effects did not consistently favor one group or the other. The second purpose of this study was to highlight strengths and limitations of Poly-SIBTEST for conducting substantive analyses of differential functioning sources and also to offer suggestions for future directions on this type of methodological research.
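
Poly-SIBTEST itself is specialized software, but the matching-and-weighting idea behind SIBTEST-family statistics can be sketched in simplified form: examinees are stratified on a valid-subtest score, and group differences on the studied item or bundle are averaged with focal-group weights. The version below omits SIBTEST's regression correction and significance test, so it is only a conceptual illustration, not Poly-SIBTEST.

```python
import numpy as np

def simple_beta(studied, matching, group):
    """Simplified SIBTEST-style index: focal-group-weighted difference in mean
    studied-item (or item-bundle) scores between groups, within strata defined
    by the valid-subtest (matching) score. Omits SIBTEST's regression correction
    and its standard error / hypothesis test."""
    beta, weight = 0.0, 0
    for s in np.unique(matching):
        ref = studied[(matching == s) & (group == 0)]
        foc = studied[(matching == s) & (group == 1)]
        if len(ref) and len(foc):
            w = len(foc)                     # weight strata by focal-group frequency
            beta += w * (ref.mean() - foc.mean())
            weight += w
    return beta / weight  # > 0 means the item/bundle favors the reference group
```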


Applied Measurement in Education | 2016

In Search of Validity Evidence in Support of the Interpretation and Use of Assessments of Complex Constructs: Discussion of Research on Assessing 21st Century Skills

Kadriye Ercikan; Maria Elena Oliveri

Assessing complex constructs such as those discussed under the umbrella of 21st century constructs highlights the need for a principled assessment design and validation approach. In our discussion, we make a case for three considerations: (a) taking construct complexity into account across various stages of assessment development such as the design, scaling, and interpretation aspects; (b) cognitive validity evidence that goes beyond traditional psychometric analyses of response patterns; and (c) cross-cultural validity. We analyze the four articles in this special issue with respect to these three considerations and discuss the kinds of evidence needed to support interpretation of scores from 21st century constructs.


International Journal of Testing | 2014

Uncovering Substantive Patterns in Student Responses in International Large-Scale Assessments—Comparing a Latent Class to a Manifest DIF Approach

Maria Elena Oliveri; Kadriye Ercikan; Bruno D. Zumbo

In this study, we contrast results from two differential item functioning (DIF) approaches (manifest and latent class) by the number and sources of items identified as DIF, using data from an international reading assessment. The latter approach yielded three latent classes, presenting evidence of heterogeneity in examinee response patterns. It also yielded more DIF items with larger effect sizes and more consistent item response patterns by substantive aspects (e.g., reading comprehension processes and cognitive complexity of items). Based on our findings, we suggest empirically evaluating the homogeneity assumption in international assessments because international populations cannot be assumed to have homogeneous item response patterns. Otherwise, differences in response patterns within these populations may be under-detected when conducting manifest DIF analyses. Detecting differences in item responses across international examinee populations has implications for the generalizability and meaningfulness of DIF findings as they apply to heterogeneous examinee subgroups.


International Journal of Testing | 2015

A Framework for Developing Comparable Multilingual Assessments for Minority Populations: Why Context Matters

Maria Elena Oliveri; Kadriye Ercikan; Marielle Simon

The assessment of linguistic minorities often involves using multiple language versions of assessments. In these assessments, comparability of scores across language groups is central to valid comparative interpretations. Various frameworks and guidelines describe factors that need to be considered when developing comparable assessments. These frameworks provide limited information in relation to the development of multiple language versions of assessments for assessing linguistic minorities within countries. To this end, we make various suggestions for the types of factors that should be considered when assessing linguistic minorities. Our recommendations are tailored to the particular constraints potentially faced by various jurisdictions tasked with developing multiple language versions of assessments for linguistic minorities. These challenges include having limited financial and staffing resources to develop comparable assessments and having insufficient sample sizes to perform psychometric analyses (e.g., item response theory) to examine comparability. Although we contextualize our study by focusing on linguistic minorities within Canada due to its bilingual status, our findings may also apply to other bilingual and multilingual countries with similar minority/majority contexts.

Collaboration


Dive into Maria Elena Oliveri's collaborations.

Top Co-Authors

Kadriye Ercikan, University of British Columbia
Bruno D. Zumbo, University of British Columbia
Debra Sandilands, University of British Columbia
Juliette Lyons-Thomas, University of British Columbia
Paula Elosua, University of the Basque Country
Brent F. Olson, University of British Columbia