Publication


Featured research published by Matthias von Davier.


Applied Psychological Measurement | 2018

The Effects of Vignette Scoring on Reliability and Validity of Self-Reports:

Matthias von Davier; Hyo-Jeong Shin; Lale Khorramdel; Lazar Stankov

The research presented in this article combines mathematical derivations and empirical results to investigate effects of the nonparametric anchoring vignette approach proposed by King, Murray, Salomon, and Tandon on the reliability and validity of rating data. The anchoring vignette approach aims to correct rating data for response styles to improve comparability across individuals and groups. Vignettes are used to adjust self-assessment responses on the respondent level but entail significant assumptions: They are supposed to be invariant across respondents, and the responses to vignette prompts are supposed to be without error and strictly ordered. This article shows that these assumptions are not always met and that the anchoring vignette approach leads to higher Cronbach’s alpha values and increased correlations among adjusted variables regardless of whether the assumptions of the approach are met or violated. Results suggest that the underlying assumptions and effects of the anchoring vignette approach should be carefully examined as the increased correlations and reliability estimates can be observed even for response variables that are independent random draws and uncorrelated with any other variable.
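
For illustration, the core nonparametric recoding behind the anchoring vignette approach can be sketched in a few lines of code. The snippet below is a minimal sketch in Python, assuming a single self-rating and strictly ordered vignette responses on the same rating scale; it is not the analysis code used in the article, and ties or misordered vignette responses (the assumption violations discussed above) are simply flagged rather than handled.

```python
def anchor_self_report(y, vignette_ratings):
    """Recode a self-rating y relative to a respondent's own vignette ratings.

    Basic nonparametric recoding (scalar case): with J strictly ordered
    vignette ratings, the adjusted score C falls in 1..2J+1, where even
    values mean "equal to vignette j" and odd values mean "between
    vignettes". Ties or misordered vignette ratings violate the approach's
    assumptions and are flagged here rather than resolved.
    """
    v = list(vignette_ratings)
    if any(a >= b for a, b in zip(v, v[1:])):
        raise ValueError("vignette ratings are tied or misordered; "
                         "the simple recoding is not defined")
    c = 1
    for vj in v:
        if y < vj:
            return c          # strictly below vignette j
        if y == vj:
            return c + 1      # equal to vignette j
        c += 2
    return c                  # above all vignettes: C = 2J + 1


# Example: three vignettes rated 2, 3, 5 on a 1-5 scale; a self-rating of 4
# lies between the second and third vignette, so C = 5.
print(anchor_self_report(4, [2, 3, 5]))
```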


Journal of Psychoeducational Assessment | 2018

A Note on Construct Validity of the Anchoring Method in PISA 2012

Lazar Stankov; Jihyun Lee; Matthias von Davier

We examine construct validity of the anchoring method used with 12 noncognitive scales from the Programme for International Student Assessment (PISA) 2012 project. This method combines individuals’ responses to vignettes and self-rated scores based on Likert-type items. It has been reported that the use of anchoring vignettes can reverse country-level correlations between academic achievement scores and noncognitive measures from negative to positive, and therefore align them with the typically reported individual-level correlations. Using the PISA 2012 data, we show that construct validity of this approach may be open to question because the anchored scales produce a different set of latent dimensions than the nonanchored scales, even though both sets of scales were created from the same individual responses. We also demonstrate that only one of the three vignettes may be responsible for the resolution of the “paradox,” highlighting that the choice of vignettes may be more important than previously reported.


Archive | 2017

Collaborative Problem Solving Measures in the Programme for International Student Assessment (PISA)

Qiwei He; Matthias von Davier; Samuel Greiff; Eric W. Steinhauer; Paul B. Borysewicz

Collaborative problem solving (CPS) is a critical and necessary skill in educational settings and the workforce. The assessment of CPS in the Programme for International Student Assessment (PISA) 2015 focuses on the cognitive and social skills related to problem solving in collaborative scenarios: establishing and maintaining shared understanding, taking appropriate actions to solve problems, and establishing and maintaining group organization. This chapter draws on measures of the CPS domain in PISA 2015 to address the development and implications of CPS items, challenges and solutions related to item design, and computational models for CPS data analysis in large-scale assessments. Measuring CPS skills not only poses challenges beyond those of measuring individual skills but also offers an opportunity to make the cognitive processes in teamwork observable. A released CPS unit from PISA 2015 is used for illustration. The chapter also discusses future perspectives in CPS analysis using multidimensional scaling, in combination with process data from log files, to track the process of students’ learning and collaborative activities.


Archive | 2013

Differentiating Response Styles and Construct-Related Responses: A New IRT Approach Using Bifactor and Second-Order Models

Matthias von Davier; Lale Khorramdel

Response styles (RS) can interfere when rating or Likert scales are used to assess personality traits, biasing survey results and affecting the validity of the measurement. Based on a new approach using multidimensional item response theory (MIRT) models (Böckenholt, U., Psychological Methods, 2012, doi: 10.1037/a0028111), rating data can be analyzed and RS can be controlled for by decomposing the ordinal response into different sequential processes using a binary decision tree. Expanding on this approach (using multiscale questionnaires), Khorramdel and von Davier (in press) showed that the obtained RS measures are not always unidimensional but may be confounded with trait-related responses. The current paper addresses this problem by applying bifactor and second-order item response theory (IRT) models to pseudo items, examining extreme RS (ERS) and midpoint RS (MRS). In contrast to simple-structure MIRT models, these models allow item responses to depend on two factors: a general factor (RS) and a specific factor (personality trait). Findings from two empirical applications using questionnaires that measure the Big Five personality dimensions with a five-point Likert scale show a superior model fit of the bifactor model with regard to ERS and MRS. While the ERS measure clearly appears to be a distinct RS measure, the MRS measure does not; both response style measures nonetheless show high model-based reliabilities.
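
To make the pseudo-item idea concrete, the sketch below decomposes 5-point Likert responses into binary pseudo items following one common IRTree-style coding (midpoint, direction, extremity). This coding is an illustrative assumption, not necessarily the exact decomposition used in the chapter.

```python
import numpy as np

def decompose_likert5(r):
    """Split a 5-point Likert response r (1..5) into three binary pseudo items.

    One common coding (a sketch; the chapter's exact coding may differ):
      mrs  - midpoint response style: 1 if the midpoint (3) was chosen
      dirn - trait direction: 1 = agree side (4 or 5), 0 = disagree side (1 or 2)
      ers  - extreme response style: 1 = extreme category (1 or 5)
    dirn and ers are undefined (np.nan) when the midpoint was chosen.
    """
    if r == 3:
        return 1, np.nan, np.nan
    mrs = 0
    dirn = 1 if r > 3 else 0
    ers = 1 if r in (1, 5) else 0
    return mrs, dirn, ers


# Example: a short response vector on a 5-point scale
responses = [1, 4, 3, 5, 2]
pseudo = np.array([decompose_likert5(r) for r in responses])
print(pseudo)  # columns: MRS, direction, ERS
```

The resulting pseudo-item matrix is what bifactor or second-order IRT models would then be fitted to, with the response style as the general factor and the personality trait as a specific factor, as described above.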


The Annual Meeting of the Psychometric Society | 2016

New Results on an Improved Parallel EM Algorithm for Estimating Generalized Latent Variable Models

Matthias von Davier

The second generation of a parallel algorithm for generalized latent variable models, including MIRT models and extensions, based on the general diagnostic model (GDM) is presented. This new development further improves the performance of the parallel-E parallel-M algorithm presented in an earlier report by means of additional computational improvements that produce even larger gains in performance. The additional gain achieved by this second-generation parallel algorithm reaches a factor of 20 for several of the examples reported, compared with a sixfold gain for the first generation. The estimation of a multidimensional IRT model for large-scale data may show a smaller reduction in runtime than a multiple-group model, whose structure is more conducive to parallel processing of the E-step. Multiple-population models can be arranged such that the parallelism directly exploits the ability to estimate multiple latent variable distributions separately in independent threads of the algorithm.
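
As a rough illustration of why multiple-group structures parallelize well, the sketch below runs the E-step of a simple discrete latent variable model in one worker process per group and returns the groupwise sufficient statistics. It is a simplified, assumption-laden sketch (binary items, latent classes, no M-step), not the author's GDM implementation.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def e_step_group(args):
    """E-step for one group: posteriors over latent classes and expected counts."""
    X, pi_g, p_item = args                       # X: (n, items) 0/1 responses
    # log-likelihood of each response vector under each latent class
    loglik = X @ np.log(p_item.T) + (1 - X) @ np.log(1 - p_item.T)
    loglik += np.log(pi_g)                       # add group-specific class weights
    loglik -= loglik.max(axis=1, keepdims=True)  # stabilize before exponentiating
    post = np.exp(loglik)
    post /= post.sum(axis=1, keepdims=True)      # posterior class memberships
    return post.sum(axis=0), post.T @ X          # class counts, expected item counts

def parallel_e_step(group_data, pi, p_item, workers=4):
    """Run the E-step for all groups in independent worker processes."""
    args = [(X, pi[g], p_item) for g, X in enumerate(group_data)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(e_step_group, args))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p_item = rng.uniform(0.2, 0.8, size=(3, 10))   # 3 latent classes, 10 items
    pi = np.full((2, 3), 1 / 3)                    # 2 groups, uniform starting weights
    groups = [rng.integers(0, 2, size=(500, 10)) for _ in range(2)]
    for counts, _ in parallel_e_step(groups, pi, p_item):
        print(counts)
```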


Quality Assurance in Education | 2018

Detecting and treating errors in tests and surveys

Matthias von Davier

Surveys that include skill measures may suffer from additional sources of error compared to those containing questionnaires alone. Examples are distractions such as noise or interruptions of testing sessions, as well as fatigue or lack of motivation to succeed. This paper aims to provide a review of statistical tools based on latent variable modeling approaches, extended by explanatory variables, that allow detection of survey errors in skill surveys.

The paper reviews psychometric methods for detecting sources of error in cognitive assessments and questionnaires. Aside from traditional item responses, new sources of data in computer-based assessment are available to help detect survey errors: timing data from the Programme for the International Assessment of Adult Competencies (PIAAC) and data from questionnaires.

Some unexpected results are reported. Respondents who tend to use response sets have lower expected values on PIAAC literacy scales, even after controlling for scores on the skill-use scale that was used to derive the response tendency.

The use of new sources of data, such as timing and log-file or process data, provides new avenues to detect response errors. It demonstrates that large data collections need to better utilize available information and that integration of assessment, modeling, and substantive theory needs to be taken more seriously.
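
As one concrete example of how timing data can be screened, the sketch below flags potentially disengaged (rapid) responses using a simple threshold on response times. This heuristic is an illustrative assumption for exposition, not the latent variable modeling approach reviewed in the paper.

```python
import numpy as np

def flag_rapid_responses(times, threshold_frac=0.10, floor_seconds=3.0):
    """Flag potentially disengaged responses in a respondents-by-items
    matrix of response times (seconds).

    A response is flagged when its time falls below a fixed floor or below
    a fraction of that item's median response time.
    """
    times = np.asarray(times, dtype=float)
    item_medians = np.nanmedian(times, axis=0)        # per-item median times
    cutoffs = np.maximum(floor_seconds, threshold_frac * item_medians)
    return times < cutoffs                            # boolean flag matrix


# Example: 3 respondents x 4 items; the third respondent answers everything
# in a couple of seconds and gets flagged on all items.
rt = [[45.0, 60.0,  2.0, 30.0],
      [50.0, 55.0, 40.0, 28.0],
      [ 1.5,  2.0,  1.0,  2.5]]
flags = flag_rapid_responses(rt)
print(flags.sum(axis=1))   # flagged responses per respondent -> [1 0 4]
```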


Psychometrika | 2018

Automated Item Generation with Recurrent Neural Networks

Matthias von Davier

Utilizing technology for automated item generation is not a new idea. However, test items used in commercial testing programs or in research are still predominantly written by humans, in most cases by content experts or professional item writers. Human experts are a limited resource, and testing agencies incur high costs in the process of continuously renewing item banks to sustain testing programs. Using algorithms instead holds the promise of providing unlimited resources for this crucial part of assessment development. The approach presented here deviates in several ways from previous attempts to solve this problem. In the past, automatic item generation relied either on generating clones of narrowly defined item types such as those found in language-free intelligence tests (e.g., Raven’s progressive matrices) or on an extensive analysis of task components and derivation of schemata to produce items with pre-specified variability that are hoped to have predictable levels of difficulty. It is somewhat unlikely that researchers utilizing these previous approaches would look at the proposed approach with favor; however, recent applications of machine learning show success in solving tasks that seemed impossible for machines not too long ago. The proposed approach uses deep learning to implement probabilistic language models, not unlike what Google Brain and Amazon Alexa use for language processing and generation.
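
To indicate what such a probabilistic language model might look like in code, the sketch below defines a character-level recurrent (LSTM) language model and a sampling routine in PyTorch. The architecture, corpus, and hyperparameters are placeholder assumptions for illustration; they are not taken from the article.

```python
import torch
import torch.nn as nn

class CharLM(nn.Module):
    """Character-level LSTM language model: predicts the next character."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)
        return self.out(h), state            # logits over the next character

def sample(model, stoi, itos, prompt, length=200, temperature=0.8):
    """Generate item-like text by sampling one character at a time."""
    model.eval()
    idx = torch.tensor([[stoi[c] for c in prompt]])
    state, text = None, prompt
    with torch.no_grad():
        for _ in range(length):
            logits, state = model(idx, state)
            probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
            nxt = torch.multinomial(probs, 1)
            text += itos[nxt.item()]
            idx = nxt.view(1, 1)
    return text

# Toy usage: build a character vocabulary from a (placeholder) corpus of items.
# A real application would first train the model with cross-entropy on
# next-character prediction over a large bank of existing items.
corpus = "Which of the following statements is correct? Read the passage below."
chars = sorted(set(corpus))
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for i, c in enumerate(chars)}
model = CharLM(vocab_size=len(chars))
print(sample(model, stoi, itos, prompt="Which", length=40))
```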


Measurement: Interdisciplinary Research & Perspective | 2018

Diagnosing Diagnostic Models: From Von Neumann's Elephant to Model Equivalencies and Network Psychometrics.

Matthias von Davier

This article critically reviews how diagnostic models have been conceptualized and how they compare to other approaches used in educational measurement. In particular, certain assumptions that have been taken for granted and used as defining characteristics of diagnostic models are reviewed and it is questioned whether these assumptions are the reason why these models have not had the success in operational analyses and large-scale applications, contrary to what many have hoped. The article draws on recent results presented by diagnostic modeling scholars and related fields.


Large-scale Assessments in Education | 2017

The use of test scores from large-scale assessment surveys: psychometric and statistical considerations

Henry Braun; Matthias von Davier


Psychometrika | 2018

In memoriam Jürgen Rost (1952–2017)

Matthias von Davier; Claus H. Carstensen; Rolf Langeheine; Michael Eid

Collaboration


Dive into Matthias von Davier's collaboration.

Top Co-Authors

Qiwei He (Princeton University)

Jihyun Lee (University of New South Wales)

Samuel Greiff (University of Luxembourg)

Michael Eid (Free University of Berlin)