Publication


Featured research published by Mark J. Gierl.


Applied Measurement in Education | 2001

Evaluating Type I Error and Power Rates Using an Effect Size Measure With the Logistic Regression Procedure for DIF Detection

Michael G. Jodoin; Mark J. Gierl

The logistic regression (LR) procedure for differential item functioning (DIF) detection is a model-based approach designed to identify both uniform and nonuniform DIF. However, this procedure tends to produce inflated Type I errors. This outcome is problematic because it can result in the inefficient use of testing resources, and it may interfere with the study of the underlying causes of DIF. Recently, an effect size measure was developed for the LR DIF procedure and a classification method was proposed. However, the effect size measure and classification method have not been systematically investigated. In this study, we developed a new classification method based on those established for the Simultaneous Item Bias Test. A simulation study also was conducted to determine if the effect size measure affects the Type I error and power rates for the LR DIF procedure across sample sizes, ability distributions, and percentage of DIF items included on a test. The results indicate that the inclusion of the effect size measure can substantially reduce Type I error rates when large sample sizes are used, although there is also a reduction in power.
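
For readers unfamiliar with the procedure, the following is a minimal sketch of the nested-model comparison underlying LR DIF detection and its ΔR² effect size, applied to simulated data. The simulated item parameters, the use of McFadden's pseudo-R², and the combined 2-degree-of-freedom test are illustrative assumptions, not the exact specification or R² variant evaluated in the study.

```python
# Sketch of the logistic regression (LR) DIF procedure: fit the item response
# on ability alone, then on ability plus group and group-by-ability terms, and
# report both the likelihood-ratio test and a delta-R^2 effect size.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
theta = rng.normal(size=n)                  # matching variable (e.g., total score)
group = rng.integers(0, 2, size=n)          # 0 = reference, 1 = focal
# Simulated item with uniform DIF against the focal group (assumed magnitude)
logit = 0.8 * theta - 0.4 * group
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

def fit(cols):
    X = sm.add_constant(np.column_stack(cols))
    return sm.Logit(y, X).fit(disp=0)

m1 = fit([theta])                            # ability only
m3 = fit([theta, group, theta * group])      # + uniform and nonuniform DIF terms

lr_chi2 = 2 * (m3.llf - m1.llf)              # 2-df likelihood-ratio test
delta_r2 = m3.prsquared - m1.prsquared       # McFadden's pseudo-R^2 difference
print(f"LR chi2 = {lr_chi2:.2f}, delta R^2 = {delta_r2:.4f}")
```

An item is typically flagged only when the chi-square test is significant and ΔR² exceeds a classification cut point, which is the combination of statistical and practical significance the abstract describes.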


Archive | 2007

Cognitive diagnostic assessment for education: theory and applications

Jacqueline P. Leighton; Mark J. Gierl

Preface
Part I. The Basis of Cognitive Diagnostic Assessment:
1. Defining cognitive diagnostic assessment in education (Jacqueline P. Leighton and Mark J. Gierl)
2. The demand for diagnostic testing (Kristen Huff)
3. Philosophical rationale for cognitive models (Stephen Norris)
4. Cognitive psychology as it applies to diagnostic assessment (Robert J. Mislevy)
5. Construct validity and diagnostic testing (Susan Embretson)
Part II. Methods and Application of Cognitive Diagnostic Assessment:
6. Cognitive models and diagnostic assessment (Jacqueline P. Leighton)
7. Test construction (Joanna Gorin)
8. The attribute hierarchy method (Mark J. Gierl, Jacqueline P. Leighton, and Steve Hunka)
9. The fusion model as implemented with ARPEGGIO (William Stout)
10. Score reporting
Part III. The Future of Cognitive Diagnostic Assessment:
11. Unresolved issues in cognitive diagnostic assessment
12. Summary and conclusion (Mark J. Gierl and Jacqueline P. Leighton)


Journal of Experimental Education | 1995

Anxieties and Attitudes Related to Mathematics in Grades 3 and 6

Mark J. Gierl; Jeffrey Bisanz

Abstract Little is known about the development of mathematics anxiety in elementary school students. To address this gap in knowledge, the authors evaluated students in Grades 3 and 6 on measures of mathematics anxiety, school test anxiety, and attitudes toward mathematics to determine (a) whether different forms of mathematics anxiety exist, (b) whether mathematics test anxiety differs from school test anxiety, and (c) whether mathematics anxiety is related to different attitudes toward mathematics. Evidence was found for two distinct forms of mathematics anxiety: test and problem-solving anxiety. Mathematics test anxiety increased with age relative to mathematics problem-solving anxiety; this result demonstrated that children become more anxious about mathematics testing situations as they progress through school. Mathematics test anxiety was related, but not identical, to school test anxiety, and students in both grades were less anxious about math tests than about academic testing generally. Finally, ...


Applied Measurement in Education | 2004

Comparability of Bilingual Versions of Assessments: Sources of Incomparability of English and French Versions of Canada's National Achievement Tests

Kadriye Ercikan; Mark J. Gierl; Tanya McCreith; Gautam Puhan; Kim Koh

This research examined the degree of comparability and sources of incomparability of English and French versions of reading, mathematics, and science tests that were administered as part of a survey of achievement in Canada. The results point to substantial psychometric differences between the 2 language versions. Approximately 18% to 36% of the items were identified as differentially functioning for the 2 language groups. Large proportions of these differential item functioning (DIF) items, 36% to 100% across age groups and content areas, were attributed to adaptation-related differences. A smaller proportion, 27% to 33% of the DIF items, was attributed to curricular differences. Twenty-four to 49% of DIF items could not be attributed to either of the 2 sources considered in the study.


BMC Health Services Research | 2011

Validation of the conceptual research utilization scale: an application of the standards for educational and psychological testing in healthcare.

Janet E. Squires; Carole A. Estabrooks; Christine V. Newburn-Cook; Mark J. Gierl

Background: There is a lack of acceptable, reliable, and valid survey instruments to measure conceptual research utilization (CRU). In this study, we investigated the psychometric properties of a newly developed scale (the CRU Scale).

Methods: We used the Standards for Educational and Psychological Testing as a validation framework to assess four sources of validity evidence: content, response processes, internal structure, and relations to other variables. A panel of nine international research utilization experts performed a formal content validity assessment. To determine response process validity, we conducted a series of one-on-one scale administration sessions with 10 healthcare aides. Internal structure and relations to other variables validity was examined using CRU Scale response data from a sample of 707 healthcare aides working in 30 urban Canadian nursing homes. Principal components analysis and confirmatory factor analyses were conducted to determine internal structure. Relations to other variables were examined using: (1) bivariate correlations; (2) change in mean values of CRU with increasing levels of other kinds of research utilization; and (3) multivariate linear regression.

Results: Content validity index scores for the five items ranged from 0.55 to 1.00. The principal components analysis predicted a 5-item, 1-factor model. This was inconsistent with the findings from the confirmatory factor analysis, which showed best fit for a 4-item, 1-factor model. Bivariate associations between CRU and other kinds of research utilization were statistically significant (p < 0.01) for the latent CRU scale score and all five CRU items. The CRU scale score was also shown to be a significant predictor of overall research utilization in multivariate linear regression.

Conclusions: The CRU scale showed acceptable initial psychometric properties with respect to responses from healthcare aides in nursing homes. Based on our validity, reliability, and acceptability analyses, we recommend using a reduced (four-item) version of the CRU scale to yield sound assessments of CRU by healthcare aides. Refinement to the wording of one item is also needed. Planned future research will include: latent scale scoring, identification of variables that predict and are outcomes of conceptual research use, and longitudinal work to determine CRU Scale sensitivity to change.
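
As a rough illustration of the internal-structure and relations-to-other-variables analyses listed in the Methods, the sketch below runs a principal components analysis on hypothetical five-item Likert data and relates a scale score to a hypothetical criterion. All data and variable names are invented, and the confirmatory factor analysis reported in the article is not reproduced here.

```python
# Illustrative internal-structure and relations-to-other-variables checks on
# simulated 5-item data; not the study's actual analyses or data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 707                                      # sample size reported in the abstract
latent = rng.normal(size=n)
# Hypothetical 5-item Likert responses driven by a single latent factor
items = np.clip(np.rint(3 + latent[:, None] + rng.normal(scale=0.8, size=(n, 5))), 1, 5)

# Internal structure: principal components analysis of the item set
pca = PCA().fit(items)
print("eigenvalues:", np.round(pca.explained_variance_, 2))

# Relations to other variables: correlation and regression against a
# hypothetical overall research utilization measure
cru_score = items.mean(axis=1)
overall_ru = 0.6 * latent + rng.normal(scale=1.0, size=n)
r = np.corrcoef(cru_score, overall_ru)[0, 1]
X = cru_score.reshape(-1, 1)
r2 = LinearRegression().fit(X, overall_ru).score(X, overall_ru)
print(f"r = {r:.2f}, regression R^2 = {r2:.2f}")
```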


Medical Education | 2012

Using automatic item generation to create multiple-choice test items.

Mark J. Gierl; Hollis Lai; Simon R. Turner

Medical Education 2012: 46: 757–765


Archive | 2007

Cognitive Diagnostic Assessment for Education: Why Cognitive Diagnostic Assessment?

Jacqueline P. Leighton; Mark J. Gierl

Cognitive diagnostic assessment (CDA) is designed to measure specific knowledge structures and processing skills in students so as to provide information about their cognitive strengths and weaknesses. CDA is still in its infancy, but its parentage is fairly well established. In 1989, two seminal chapters in Robert Linn's Educational Measurement signaled both the escalating interest in and the need for cognitive diagnostic assessment. Samuel Messick's chapter, "Validity", and the late Richard Snow and David Lohman's chapter, "Implications of Cognitive Psychology for Educational Measurement", helped solidify the courtship of cognitive psychology within educational measurement. The ideas expressed in these chapters attracted many young scholars to educational measurement and persuaded other, well-established scholars to consider the potential of a relatively innovative branch of psychology, namely, cognitive psychology, for informing test development. CDA can be traced to the ideas expressed in the previously mentioned chapters and, of course, to the many other authors whose ideas, in turn, inspired Messick, Snow, and Lohman (e.g., Cronbach, 1957; Cronbach & Meehl, 1955; Embretson, 1983; Loevinger, 1957; Pellegrino & Glaser, 1979). Since 1989, other influential articles, chapters, and books have been written specifically about CDA (see Frederiksen, Glaser, Lesgold, & Shafto, 1990). Most notable are the article by Paul Nichols (1994) titled "A Framework for Developing Cognitively Diagnostic Assessments" and the book coedited by Paul Nichols, Susan Chipman, and Robert Brennan (1995) appropriately titled Cognitively Diagnostic Assessment.


Applied Measurement in Education | 2004

Performance of SIBTEST When the Percentage of DIF Items is Large

Mark J. Gierl; Andrea Gotzmann; Keith A. Boughton

Differential item functioning (DIF) analyses are used to identify items that operate differently between two groups, after controlling for ability. The Simultaneous Item Bias Test (SIBTEST) is a popular DIF detection method that matches examinees on a true score estimate of ability. However, in some testing situations, such as test translation and adaptation, the percentage of DIF items can be large, and the effectiveness of SIBTEST in these situations has not been thoroughly evaluated. This problem is addressed in the present study. Four variables were manipulated in a simulation study: the amount of DIF on a 40-item test (20%, 40%, and 60% of the items on the test had moderate and large DIF), the direction of DIF (balanced and unbalanced DIF items), sample size (500, 1,000, 1,500, and 2,000 examinees in each group), and ability distribution differences between groups (equal and unequal). Each condition was replicated 100 times to facilitate the computation of the DIF detection rates. The results from the simulation study indicated that SIBTEST yielded adequate DIF detection rates, even when 60% of the items contained DIF, provided DIF was balanced between the reference and focal groups and sample sizes were at least 1,000 examinees per group. SIBTEST also had adequate detection rates in the 20% unbalanced DIF conditions with samples of 1,000 examinees per group. However, SIBTEST had poor detection rates across all 40% and 60% unbalanced DIF conditions. Implications for practice and future directions for research are discussed.
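
The sketch below mimics a single cell of this design: two-group responses to a 40-item test with 20% unidirectional DIF are generated, and items are flagged with a simplified SIBTEST-style statistic (a weighted group difference in proportion correct, matched on rest score, without the regression correction applied by the full procedure). The generating parameters, DIF magnitude, and flagging threshold are illustrative assumptions.

```python
# One illustrative cell of the simulation design: simulate responses, then
# flag items with a simplified SIBTEST-style matched-group statistic.
import numpy as np

rng = np.random.default_rng(7)
n_items, n_per_group = 40, 1000
a = rng.uniform(0.8, 1.6, n_items)               # discrimination parameters (assumed)
b = rng.normal(0.0, 1.0, n_items)                # difficulty parameters (assumed)
dif_items = set(range(8))                        # 20% of items with unbalanced DIF
shift = 0.5                                      # assumed DIF magnitude

def simulate(group):
    theta = rng.normal(size=n_per_group)
    diff = b + (shift if group == "focal" else 0.0) * np.isin(np.arange(n_items), list(dif_items))
    p = 1 / (1 + np.exp(-a * (theta[:, None] - diff)))
    return (rng.random((n_per_group, n_items)) < p).astype(int)

ref, foc = simulate("reference"), simulate("focal")

def beta_hat(item):
    """Weighted group difference in proportion correct, matched on rest score."""
    stat, weight = 0.0, 0
    rest_r = ref.sum(axis=1) - ref[:, item]
    rest_f = foc.sum(axis=1) - foc[:, item]
    for k in range(n_items):                     # possible rest scores: 0 to 39
        r, f = ref[rest_r == k, item], foc[rest_f == k, item]
        if len(r) and len(f):
            stat += (len(r) + len(f)) * (r.mean() - f.mean())
            weight += len(r) + len(f)
    return stat / weight

flagged = [i for i in range(n_items) if abs(beta_hat(i)) > 0.05]   # ad hoc threshold
print("flagged items:", flagged)
print("true DIF items:", sorted(dif_items))
```

Repeating this over 100 replications per condition and tallying the proportion of true DIF items flagged yields detection rates of the kind reported in the study.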


International Journal of Testing | 2012

The Role of Item Models in Automatic Item Generation

Mark J. Gierl; Hollis Lai

Automatic item generation represents a relatively new but rapidly evolving research area where cognitive and psychometric theories are used to produce tests that include items generated using computer technology. Automatic item generation requires two steps. First, test development specialists create item models, which are comparable to templates or prototypes, that highlight the features or elements in the assessment task that must be manipulated. Second, these item model elements are manipulated to generate new items with the aid of computer-based algorithms. With this two-step process, hundreds or even thousands of new items can be created from a single item model. The purpose of our article is to describe seven different but related topics that are central to the development and use of item models for automatic item generation. We start by defining item model and highlighting some related concepts; we describe how item models are developed; we present an item model taxonomy; we illustrate how item models can be used for automatic item generation; we outline some benefits of using item models; we introduce the idea of an item model bank; and finally, we demonstrate how statistical procedures can be used to estimate the parameters of the generated items without the need for extensive field or pilot testing.
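
As a toy illustration of this two-step process, the sketch below defines one item model (a stem with manipulable elements) and generates every item instance by crossing the element values. The stem, element values, and distractor rule are invented for illustration and are not drawn from the article.

```python
# A single item model: a stem with manipulable elements, plus a generator that
# instantiates every combination of element values into a multiple-choice item.
from itertools import product

stem = ("A patient receives {dose} mg of a drug every {interval} hours. "
        "How many mg are given in 24 hours?")
elements = {"dose": [250, 500, 750], "interval": [6, 8, 12]}

def generate_items(stem, elements):
    keys = list(elements)
    for values in product(*(elements[k] for k in keys)):
        bindings = dict(zip(keys, values))
        key_answer = bindings["dose"] * 24 // bindings["interval"]
        # Distractors built from plausible calculation errors (illustrative rule)
        options = sorted({key_answer, key_answer // 2, key_answer * 2,
                          key_answer + bindings["dose"]})
        yield {"stem": stem.format(**bindings), "options": options, "key": key_answer}

items = list(generate_items(stem, elements))
print(len(items), "items generated from one item model")
print(items[0])
```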


International Journal of Testing | 2001

Illustrating the Use of Nonparametric Regression to Assess Differential Item and Bundle Functioning Among Multiple Groups

Mark J. Gierl; Daniel M. Bolt

The purpose of this article is to illustrate the use of nonparametric regression with kernel smoothing (Ramsay, 1991), as implemented in the computer program TESTGRAF (Ramsay, 2000), to investigate differential item or bundle functioning among multiple groups. Nonparametric regression is a flexible procedure for estimating and displaying the relation between examinee proficiency and the probability of choosing each option on a multiple-choice item. The unit of analysis can be an item or a bundle of items. The procedure can also be used to detect differential performance across two or more groups of examinees matched on overall proficiency. We present three examples to illustrate how nonparametric regression can be applied to multilingual, multicultural data to study group differences.
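
The following is a minimal sketch of the kind of kernel-smoothed (Nadaraya-Watson) response curve the procedure estimates, comparing correct-response probabilities for two simulated groups across a grid of proficiency values. The data, bandwidth, and grid are illustrative assumptions, not the TESTGRAF implementation.

```python
# Nadaraya-Watson kernel smoothing of P(correct | proficiency) for two groups;
# the gap between the smoothed curves is the kind of display used to study DIF.
import numpy as np

rng = np.random.default_rng(3)

def simulate_group(n, b):
    theta = rng.normal(size=n)                   # proficiency (e.g., total score)
    y = (rng.random(n) < 1 / (1 + np.exp(-1.2 * (theta - b)))).astype(float)
    return theta, y

def kernel_smooth(theta, y, grid, h=0.3):
    """Nadaraya-Watson estimate of E[y | theta] with a Gaussian kernel."""
    w = np.exp(-0.5 * ((grid[:, None] - theta[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

grid = np.linspace(-2.5, 2.5, 11)
ref = kernel_smooth(*simulate_group(1500, b=0.0), grid)   # reference group
foc = kernel_smooth(*simulate_group(1500, b=0.4), grid)   # focal group (harder item)

for t, p_r, p_f in zip(grid, ref, foc):
    print(f"theta={t:+.1f}  P_ref={p_r:.2f}  P_foc={p_f:.2f}  diff={p_r - p_f:+.2f}")
```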

Collaboration


Dive into Mark J. Gierl's collaborations.

Top Co-Authors

Jacqueline P. Leighton

Social Sciences and Humanities Research Council

Ying Cui

University of Alberta

Qi Guo

University of Alberta
