
Publication


Featured research published by Steffi Pohl.


Multivariate Behavioral Research | 2009

How Bias Reduction Is Affected by Covariate Choice, Unreliability, and Mode of Data Analysis: Results From Two Types of Within-Study Comparisons

Thomas D. Cook; Peter M. Steiner; Steffi Pohl

This study uses within-study comparisons to assess the relative importance of covariate choice, unreliability in the measurement of these covariates, and whether regression or various forms of propensity score analysis are used to analyze the outcome data. Two of the within-study comparisons are of the four-arm type, and many more are of the three-arm type. To examine unreliability, differences in reliability are deliberately introduced into the two four-arm studies via simulation. Results are similar across the studies reviewed, despite their wide range of non-experimental designs and topic areas. Covariate choice counts most, unreliability next most, and the mode of data analysis hardly matters at all. Unreliability has larger effects the more important a covariate is for bias reduction, but even so, the very best covariates measured with a reliability of only .60 still do better than substantively poor covariates that are measured perfectly. Why regression methods do as well as propensity score methods used in several different ways remains a mystery because, in theory, propensity scores would seem to have a distinct advantage in many practical applications, especially those where functional forms are in doubt.
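As a toy illustration of the reliability point (simulated data and an assumed data-generating model, not the authors' studies), the sketch below shows how measurement error in a single confounder limits the bias removed by simple regression adjustment:

```python
# Illustrative sketch (not the authors' data or code): how covariate
# unreliability attenuates bias reduction under simple regression adjustment.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.normal(size=n)                                # true confounder
treat = (x + rng.normal(size=n) > 0).astype(float)    # selection depends on x
y = 2.0 * treat + 1.5 * x + rng.normal(size=n)        # true treatment effect = 2.0

def adjusted_effect(cov):
    """OLS coefficient of treatment after adjusting for one covariate."""
    X = np.column_stack([np.ones(n), treat, cov])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

naive = y[treat == 1].mean() - y[treat == 0].mean()

# Error-laden covariate with reliability about .60: Var(x) = 1, so the added
# error variance is (1 - .60) / .60.
x_unreliable = x + rng.normal(scale=np.sqrt((1 - 0.6) / 0.6), size=n)

print(f"naive difference:             {naive:.2f}")
print(f"adjusted, reliable covariate: {adjusted_effect(x):.2f}")
print(f"adjusted, reliability .60:    {adjusted_effect(x_unreliable):.2f}")
```

With the perfectly measured covariate, the adjusted estimate recovers the simulated effect; with a reliability of .60, a substantial share of the selection bias remains.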


Multivariate Behavioral Research | 2010

Modeling Common Traits and Method Effects in Multitrait-Multimethod Analysis

Steffi Pohl; Rolf Steyer

Method effects often occur when constructs are measured by different methods. In traditional multitrait-multimethod (MTMM) models method effects are regarded as residuals, which implies a mean method effect of zero and no correlation between trait and method effects. Furthermore, in some recent MTMM models, traits are modeled to be specific to a certain method. However, often we are not interested in a method-specific trait but in a trait that is common to all methods. Here we present the Method Effect model with common trait factors, which allows modeling “common” trait factors and method factors that represent method “effects” rather than residuals. The common trait factors are defined as the mean of the true-score variables of all variables measuring the same trait and the method factors are defined as differences between true-score variables and means of true-score variables. Because the model allows estimating mean method effects, correlations between method factors, and correlations between trait and method factors, new research questions may be investigated. The application of the model is demonstrated by 2 examples studying the effect of negative, as compared with positive, item wording for the measurement of mood states.
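A minimal formal sketch of the definitions described above, with notation assumed here rather than taken from the paper: let τ_{jk} denote the true-score variable of the observed variable Y_{jk} (trait j measured by method k), and let m be the number of methods.

```latex
% Sketch of the factor definitions described above (notation assumed here).
\begin{align}
  T_j    &= \frac{1}{m}\sum_{k=1}^{m} \tau_{jk}
            && \text{common trait factor: mean of the true scores over all } m \text{ methods,} \\
  M_{jk} &= \tau_{jk} - T_j
            && \text{method factor: deviation of method } k \text{ from the common trait.}
\end{align}
```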


Educational Evaluation and Policy Analysis | 2009

Unbiased Causal Inference From an Observational Study: Results of a Within-Study Comparison

Steffi Pohl; Peter M. Steiner; Jens Eisermann; Renate Soellner; Thomas D. Cook

Adjustment methods such as propensity scores and analysis of covariance are often used for estimating treatment effects in nonexperimental data. Shadish, Clark, and Steiner used a within-study comparison to test how well these adjustments work in practice. They randomly assigned participating students to a randomized or nonrandomized experiment. Treatment effects were then estimated in the experiment and compared to the adjusted nonexperimental estimates. Most of the selection bias in the nonexperiment was reduced. The present study replicates the results of Shadish et al., despite some differences in design and in the size and direction of the initial bias. The results show that the selection of covariates matters considerably for bias reduction in nonexperiments but that the choice of analysis matters less.
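To make the two adjustment strategies concrete, here is a minimal simulated sketch (assumed data-generating model, not the authors' analysis) comparing ANCOVA-style regression adjustment with inverse-propensity weighting:

```python
# Illustrative sketch (simulated data, not the authors' study): covariance
# adjustment vs. inverse-propensity weighting under selection on an observed covariate.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)                           # observed covariate driving selection
treat = rng.binomial(1, 1 / (1 + np.exp(-0.8 * x)))
y = 1.0 * treat + 2.0 * x + rng.normal(size=n)   # true effect = 1.0

# ANCOVA-style adjustment: OLS of y on treatment and the covariate
X_ols = np.column_stack([np.ones(n), treat, x])
ancova_effect = np.linalg.lstsq(X_ols, y, rcond=None)[0][1]

# Propensity score adjustment: estimate e(x), then inverse-probability weighting
ps = LogisticRegression().fit(x.reshape(-1, 1), treat).predict_proba(x.reshape(-1, 1))[:, 1]
w = treat / ps + (1 - treat) / (1 - ps)
ipw_effect = (np.sum(w * treat * y) / np.sum(w * treat)
              - np.sum(w * (1 - treat) * y) / np.sum(w * (1 - treat)))

print(f"naive:  {y[treat == 1].mean() - y[treat == 0].mean():.2f}")
print(f"ANCOVA: {ancova_effect:.2f}")
print(f"IPW:    {ipw_effect:.2f}")
```

Both adjusted estimates recover the simulated effect here because the single covariate fully captures selection; the interesting empirical question in the within-study comparisons is how they behave when it does not.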


Psychological Assessment | 2009

Do Balanced Scales Assess Bipolar Constructs? The Case of the STAI Scales.

Stéphane Vautier; Steffi Pohl

Balanced scales, that is, scales based on items whose content is either negatively or positively polarized, are often used in the hope of measuring a bipolar construct. Research has shown that balanced scales usually do not yield 1-dimensional measurements. This threatens their construct validity. The authors show how to test bipolarity while accounting for method effects. This is demonstrated on a data set of state and trait anxiety measured with the State-Trait Anxiety Inventory (STAI; C. D. Spielberger, R. L. Gorsuch, R. Lushene, P. R. Vagg, & G. A. Jacobs, 1983) scales. Taking a test-retest perspective and assuming temporally stable method effects, the authors tested the bipolarity of the temporal change through suitable constraints specified in a structural equation model adapted from S. Vautier, R. Steyer, and A. Boomsma (2008). The model fit the data closely, χ²(13, N = 888) = 20.75, p = .07. Thus, the state and trait scales seem to measure bipolar constructs plus temporally stable method effects. Parameter estimates suggest reliable change scores for the state anxiety scale (ρ = .90) and specific method effects for the state and trait scales of the STAI.


Educational and Psychological Measurement | 2013

On Studying Common Factor Variance in Multiple-Component Measuring Instruments

Tenko Raykov; Steffi Pohl

A method for examining common factor variance in multiple-component measuring instruments is outlined. The procedure is based on an application of the latent variable modeling methodology and is concerned with evaluating observed variance explained by a global factor and by one or more additional component-specific factors. The approach furnishes point and interval estimates of the proportion test score variance accounted for by the common factor and by the remaining factors, as well as of the difference in these proportions. The method is readily used in empirical behavioral research for purposes of addressing the issue of whether a scale under consideration is predominantly measuring a construct common to all its components, via application of the widely available software Mplus and R. The described procedure is illustrated using a numerical example.
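A rough sketch of the kind of quantity such a procedure evaluates, assuming (for illustration only) a model with a global factor g and orthogonal component-specific factors; the notation is assumed here and is not taken from the paper:

```latex
% Sketch, assuming Y_i = \lambda_i g + \beta_i s_{c(i)} + \varepsilon_i with
% orthogonal factors, and the overall test score X = \sum_i Y_i.
\begin{align}
  \operatorname{Var}(X) &= \Big(\sum_i \lambda_i\Big)^{2}\operatorname{Var}(g)
      + \sum_c \Big(\sum_{i \in c} \beta_i\Big)^{2}\operatorname{Var}(s_c)
      + \sum_i \operatorname{Var}(\varepsilon_i), \\
  \pi_g &= \frac{\big(\sum_i \lambda_i\big)^{2}\operatorname{Var}(g)}{\operatorname{Var}(X)}
      && \text{proportion of test score variance due to the common factor.}
\end{align}
```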


Educational and Psychological Measurement | 2015

Taking the Missing Propensity Into Account When Estimating Competence Scores: Evaluation of Item Response Theory Models for Nonignorable Omissions

Carmen Köhler; Steffi Pohl; Claus H. Carstensen

When competence tests are administered, subjects frequently omit items. These missing responses pose a threat to correctly estimating the proficiency level. Newer model-based approaches aim to take nonignorable missing data processes into account by incorporating a latent missing propensity into the measurement model. Two assumptions are typically made when using these models: (1) The missing propensity is unidimensional and (2) the missing propensity and the ability are bivariate normally distributed. These assumptions may, however, be violated in real data sets and could, thus, pose a threat to the validity of this approach. The present study focuses on modeling competencies in various domains, using data from a school sample (N = 15,396) and an adult sample (N = 7,256) from the National Educational Panel Study. Our interest was to investigate whether violations of unidimensionality and the normal distribution assumption severely affect the performance of the model-based approach in terms of differences in ability estimates. We propose a model with a competence dimension, a unidimensional missing propensity and a distributional assumption more flexible than a multivariate normal. Using this model for ability estimation results in different ability estimates compared with a model ignoring missing responses. Implications for ability estimation in large-scale assessments are discussed.
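A hedged sketch of the general structure of such model-based approaches, using an assumed Rasch-type parameterization that is not necessarily the one estimated in the study; X_{ij} is the scored response of person i to item j, and d_{ij} = 1 indicates an omitted item:

```latex
% Sketch of a measurement model plus latent missing propensity (parameterization assumed).
\begin{align}
  P(X_{ij} = 1 \mid \theta_i) &= \frac{\exp(\theta_i - \beta_j)}{1 + \exp(\theta_i - \beta_j)}
      && \text{measurement model for ability } \theta_i, \\
  P(d_{ij} = 1 \mid \xi_i)    &= \frac{\exp(\xi_i - \gamma_j)}{1 + \exp(\xi_i - \gamma_j)}
      && \text{model for the latent missing propensity } \xi_i, \\
  (\theta_i, \xi_i) &\sim \text{joint distribution}
      && \text{typically bivariate normal; relaxed in the study above.}
\end{align}
```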


Archive | 2015

Competence Measurement in Reading and Mathematics for Students With Special Educational Needs

Anna Südkamp; Steffi Pohl; Katinka Hardt; Anne-Katrin Jordan; Christoph Duchhardt

Measuring domain-specific competencies of students with special educational needs (SEN) poses a challenge for the National Educational Panel Study. Through additional feasibility studies, the National Educational Panel Study seeks to enable competence assessments for students with SEN that are reliable and comparable to those of students without special educational needs. This study addresses the question of how the reading competence and mathematical competence of students with special educational needs in learning (SEN-L) can be assessed reliably. To this end, both the reading competence (N = 404) and the mathematical competence (N = 1,098) of ninth-grade students with SEN-L were tested. In addition to the standard test, which was originally developed for students without SEN-L, an adapted test version was administered to assess reading competence. To assess mathematical competence, two adapted test versions were used that had been tailored to the needs of students with SEN-L. The data were analyzed with item response theory models. For both competence domains, the results on the match between item difficulty and person ability show that the items from the standard tests tend to be too difficult for students with special educational needs. In addition, the discrimination of many items is low. The adapted test versions achieve a considerably better match between item difficulty and person ability and, accordingly, yield better indicators of item quality. We compare the results for the reading and mathematics domains and discuss cross-domain implications for testing students with special educational needs.


Frontiers in Psychology | 2016

Testing Students with Special Educational Needs in Large-Scale Assessments – Psychometric Properties of Test Scores and Associations with Test Taking Behavior

Steffi Pohl; Anna Südkamp; Katinka Hardt; Claus H. Carstensen; Sabine Weinert

Assessing competencies of students with special educational needs in learning (SEN-L) poses a challenge for large-scale assessments (LSAs). For students with SEN-L, the available competence tests may fail to yield test scores of high psychometric quality that are, at the same time, measurement invariant to test scores of general education students. We investigated whether we can identify a subgroup of students with SEN-L for which measurement invariant competence measures of adequate psychometric quality may be obtained with tests available in LSAs. We furthermore investigated whether differences in test-taking behavior may explain unsatisfactory psychometric properties and measurement non-invariance of test scores within LSAs. We relied on person-fit indices and mixture distribution models to identify students with SEN-L for whom test scores with satisfactory psychometric properties and measurement invariance may be obtained. We also captured differences in test-taking behavior related to guessing and missing responses. As a result, we identified a subgroup of students with SEN-L for whom competence scores of adequate psychometric quality that are measurement invariant to those of general education students were obtained. Concerning test-taking behavior, there was a small number of students who unsystematically picked response options. Removing these students from the sample slightly improved item fit. Furthermore, two different patterns of missing responses were identified that explain to some extent the problems in assessing students with SEN-L.
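As an illustration of the kind of person-fit index such analyses rely on, the sketch below computes the standardized log-likelihood statistic l_z under a Rasch model with known item difficulties; the setup is assumed for illustration and is not the authors' exact procedure:

```python
# Illustrative sketch (assumed setup, not the authors' code): the standardized
# log-likelihood person-fit statistic l_z under a Rasch model. Strongly negative
# values flag aberrant response patterns (e.g., unsystematic guessing).
import numpy as np

def lz_person_fit(responses, theta, item_difficulties):
    """responses: 0/1 array for one person; theta: ability estimate;
    item_difficulties: Rasch difficulty parameters of the answered items."""
    p = 1 / (1 + np.exp(-(theta - item_difficulties)))   # P(correct) per item
    logit = np.log(p / (1 - p))
    l0 = np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    expected = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    variance = np.sum(p * (1 - p) * logit ** 2)
    return (l0 - expected) / np.sqrt(variance)

# Example: a Guttman-like pattern vs. an erratic one on the same items
b = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(lz_person_fit(np.array([1, 1, 1, 0, 0]), theta=0.3, item_difficulties=b))
print(lz_person_fit(np.array([0, 0, 1, 1, 1]), theta=0.3, item_difficulties=b))
```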


Archive | 2015

Measuring Competencies across the Lifespan – Challenges of Linking Test Scores

Steffi Pohl; Kerstin Haberkorn; Claus H. Carstensen

The National Educational Panel Study (NEPS) aims at investigating the development of competencies across the whole lifespan. Competencies are assessed via tests, and competence scores are estimated based on models of Item Response Theory (IRT). IRT allows a comparison of test scores, and thus the investigation of change across time and differences between cohorts, even when the respective competence is measured with different items. Because retest effects are assumed for most of the competencies in NEPS, linking is done via additional link studies in which the tests for two age groups are administered to a separate sample of participants. However, in order to link the test results of two different measurement occasions, certain assumptions need to hold, such as that the measures are invariant across samples and that the tests measure the same construct. These are challenging assumptions when linking competencies across the whole lifespan. Before linking the NEPS reading tests for different age cohorts in secondary school as well as in adulthood, we therefore investigated unidimensionality of the items for different cohorts as well as measurement invariance across samples. Our results show that the tests for different age groups do measure a unidimensional construct within the same sample. However, measurement invariance of the same test across different samples does not hold for all age groups; thus, the same test exhibits a different measurement model in different samples. Based on our results, linking may well be justified within secondary school, while linking test scores in secondary school with those in adulthood is threatened by differences in the measurement model. Possible reasons for these results are discussed, and implications for the design of longitudinal studies as well as for possible analysis strategies are drawn.
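For illustration, one standard way to place item parameters from two test forms on a common scale is mean/sigma linking over common (anchor) items; the sketch below is an assumed textbook example, not necessarily the NEPS linking procedure. Here b_j^X and b_j^Y are the difficulties of anchor item j estimated on scale X and scale Y.

```latex
% Sketch of mean/sigma linking via anchor items (assumed example).
\begin{align}
  A &= \frac{\operatorname{SD}\big(b^{Y}\big)}{\operatorname{SD}\big(b^{X}\big)}, \qquad
  B = \overline{b^{Y}} - A\,\overline{b^{X}}, \\
  \theta^{Y} &= A\,\theta^{X} + B
  && \text{abilities estimated on scale } X \text{ expressed on scale } Y.
\end{align}
```

Such a linear transformation is only meaningful if the anchor items function equivalently in both samples, which is exactly the measurement invariance assumption investigated above.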


Archive | 2012

Modeling traits and method effects as latent variables

Steffi Pohl; Rolf Steyer

Campbell and Fiske (1959) proposed multitrait-multimethod (MTMM) designs for the validation of measurement instruments. In these designs, each of several constructs (traits) is measured with the same set of methods. According to Campbell and Fiske (1959), discriminant validity is supported if the trait under investigation can be distinguished from other traits, and convergent validity is achieved if different measurement methods yield similar results in measuring the same trait. Multiple methods are also often used in order to improve the precision of the measurement of constructs. Examples are using oral and written exams for assessing mathematical knowledge, self- and peer ratings for measuring personality constructs, or positively and negatively worded items for the measurement of well-being.

Collaboration


Dive into Steffi Pohl's collaboration.

Top Co-Authors

Anna Südkamp, Technical University of Dortmund
Peter M. Steiner, University of Wisconsin-Madison
Katinka Hardt, Humboldt University of Berlin
Mario Gollwitzer, University of Koblenz and Landau