
Publication


Featured research published by Craig S. Wells.


Journal of Contemporary Criminal Justice | 2010

Juvenile Court Referrals and the Public Schools: Nature and Extent of the Practice in Five States

Michael P. Krezmien; Peter E. Leone; Mark Zablocki; Craig S. Wells

Federal legislation and concern about high-profile school shootings have focused attention on safe schools and school discipline. Anecdotal evidence and several reports indicate that, in response to calls to promote safety, schools are increasingly referring students to the juvenile courts for acts of misbehavior. Using data from the National Juvenile Court Data Archive, the study reported here examined school referrals (SR) to the juvenile courts in five states from 1995 to 2004. We studied SR over time as well as the proportion of total referrals originating in schools. The number of referrals originating in schools, the trends in SR, and the odds that a referral originated in school all varied across states. We found evidence that in four of the five states, referrals from schools represented a greater proportion of total referrals to juvenile courts in 2004 than in 1995. We also found differences in the odds of SR relative to out-of-school referrals (OSR) by race and by gender in some states but not in others. The findings suggest that states may differ in how their schools respond to misbehavior and in how directly their schools refer students to the juvenile courts. We conclude with a discussion of the implications of the findings.
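
The group comparisons here come down to odds ratios of school referrals to out-of-school referrals. As a point of reference, a minimal sketch of that calculation in Python, using entirely hypothetical counts rather than the study's data:

```python
def odds_ratio(sr_a, osr_a, sr_b, osr_b):
    """Odds of a referral originating in school for group A relative to
    group B. Hypothetical counts only; the study's data come from the
    National Juvenile Court Data Archive and are not reproduced here."""
    return (sr_a / osr_a) / (sr_b / osr_b)

# If group A has 300 SR and 1,200 OSR while group B has 150 SR and 900 OSR:
print(odds_ratio(300, 1200, 150, 900))  # (300/1200) / (150/900) = 1.5
```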


Applied Measurement in Education | 2008

Investigation of a Nonparametric Procedure for Assessing Goodness-of-Fit in Item Response Theory

Craig S. Wells; Daniel M. Bolt

Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not been examined. In the present study, a Monte Carlo simulation study was used to examine the empirical Type I error rates and power of two nonparametric statistics based on the Douglas and Cohen (2001) approach. The procedures are compared to two commonly used goodness-of-fit statistics, S-X2 (Orlando & Thissen, 2000) and BILOG's G2 (Mislevy & Bock, 1990), across conditions varied by test length, sample size, and the percentage of misfitting items. Overall, the nonparametrically based statistics controlled the Type I error rate and exhibited the most power across all conditions. Because of its close association with a graphical representation of the item response function, it is argued that the Douglas and Cohen (2001) approach may also allow for a more informative inspection of the type of misfit present in test items, although at a slightly greater computational cost.
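
The idea underlying these statistics is to compare a kernel-smoothed (nonparametric) estimate of the item response function against the fitted parametric 2PL curve. Below is a minimal Python sketch of that comparison, assuming a Gaussian kernel, a fixed bandwidth, and known item parameters; the resampling step used to obtain a null distribution for the statistic is omitted:

```python
import numpy as np

def irf_2pl(theta, a, b):
    """Two-parameter logistic item response function."""
    return 1.0 / (1.0 + np.exp(-1.7 * a * (theta - b)))

def kernel_irf(grid, theta_hat, responses, bandwidth=0.3):
    """Kernel-smoothed (nonparametric) estimate of P(correct | theta)."""
    est = np.empty_like(grid)
    for i, t in enumerate(grid):
        w = np.exp(-0.5 * ((theta_hat - t) / bandwidth) ** 2)
        est[i] = np.sum(w * responses) / np.sum(w)
    return est

# Hypothetical data: 2,000 examinees answering one item that truly follows
# a 2PL, so the discrepancy between the two curves should be small.
rng = np.random.default_rng(0)
theta_hat = rng.normal(size=2000)                 # ability estimates
responses = rng.binomial(1, irf_2pl(theta_hat, a=1.2, b=0.1))

grid = np.linspace(-3, 3, 61)
nonpar = kernel_irf(grid, theta_hat, responses)
par = irf_2pl(grid, a=1.2, b=0.1)
density = np.exp(-0.5 * grid ** 2) / np.sqrt(2 * np.pi)

# Root integrated squared error between the curves, weighted by the
# ability density and approximated on the grid.
rise = np.sqrt(np.sum((nonpar - par) ** 2 * density) * (grid[1] - grid[0]))
print(rise)
```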


Journal of Special Education | 2011

Accommodations and Item-Level Analyses Using Mixture Differential Item Functioning Models

Stanley E. Scarpati; Craig S. Wells; Christine Lewis; Stephen Jirka

The purpose of this study was to use differential item functioning (DIF) and latent mixture model analyses to explore factors that explain performance differences on a large-scale mathematics assessment between examinees who were allowed to use a calculator or were afforded item presentation accommodations and those who did not receive the same accommodations. Data from a state accountability assessment of mathematics for students in Grade 8 were analyzed. More than 73,000 students participated, of whom 12,268 were students with disabilities (SWD) receiving test accommodations. DIF analyses detected performance differences between examinees without accommodations and those who used a calculator or for whom the item presentation was altered. Latent performance class analyses revealed that performance differences were associated with item difficulty and ability in addition to accommodation status. The results support validity studies that use mixture models, which can take into account context variables related to item type, academic skills, and accommodations.
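
The study itself relies on mixture DIF models; as a simpler point of reference, the sketch below shows the classic Mantel-Haenszel DIF screen often run alongside such analyses, with accommodated examinees as the focal group. This illustrates DIF flagging in general, not the authors' procedure:

```python
import numpy as np

def mantel_haenszel_or(total, group, item):
    """Mantel-Haenszel common odds ratio for one dichotomous item,
    stratifying on total score. group: 0 = reference (no accommodation),
    1 = focal (accommodated); item: 0/1 responses. Values far from 1.0
    suggest DIF at matched ability."""
    num = den = 0.0
    for s in np.unique(total):
        m = total == s
        ref, foc = m & (group == 0), m & (group == 1)
        a, b = item[ref].sum(), (1 - item[ref]).sum()  # reference right/wrong
        c, d = item[foc].sum(), (1 - item[foc]).sum()  # focal right/wrong
        n = m.sum()
        num += a * d / n
        den += b * c / n
    return num / den
```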


Applied Measurement in Education | 2009

Evaluation of the Standard Setting on the 2005 Grade 12 National Assessment of Educational Progress Mathematics Test

Stephen G. Sireci; Jeffrey B. Hauger; Craig S. Wells; Christine Shea; April L. Zenisky

The National Assessment Governing Board used a new method to set achievement level standards on the 2005 Grade 12 NAEP Math test. In this article, we summarize our independent evaluation of the process used to set these standards. The evaluation data included observations of the standard-setting meeting, observations of advisory committee meetings where the results were discussed, review of documentation associated with the standard-setting study, analysis of the standard-setting data, and analysis of other data related to the mathematics proficiency of 2005 Grade 12 students. Our evaluation framework used criteria from the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999) and other suggestions from the literature (e.g., Kane, 1994, 2001). The process was found to have adequate procedural and internal evidence of validity. Using external data to evaluate the standards provided more equivocal results. Considering all evidence and data reviewed, we concluded that the process used to set achievement level standards on the 2005 Grade 12 NAEP Math test was sound and that the standards set are valid for the purpose of reporting achievement level results on this test. Recommendations for future NAEP standard-setting studies are provided.


International Journal of Doctoral Studies | 2009

Differential Item Functioning Analysis by Gender and Race of the National Doctoral Program Survey

Benita J. Barnes; Craig S. Wells

One way that policies get enacted in higher education is through educational research. In 2000, the National Association of Graduate-Professional Students (NAGPS) conducted the National Doctoral Program Survey (NDPS) in an effort to learn more about doctoral students' experiences and to influence doctoral education policy at both the local and national levels. However, NDPS results have only been reported in the aggregate. Aggregate reporting is appropriate if the items on the survey measure the same construct with the same level of accuracy across all respondents; if this is not the case, the veracity of the study results can be severely compromised. The purpose of this study was to examine the NDPS instrument using differential item functioning (DIF) analysis to determine whether survey items functioned differently across gender and race/ethnicity. We identified 29 of the 48 items as displaying DIF, meaning that on certain items women and students of color were either more or less likely to agree than their Caucasian male peers. Therefore, some caution may need to be exercised when interpreting the NDPS data for diverse groups of students.
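
A common way to run such item-level screens, shown here as a generic sketch rather than the authors' exact procedure, is logistic-regression DIF: regress agreement with an item on the total score, group membership, and their interaction, then test the group terms. The function below assumes statsmodels is available and that the item has been dichotomized to agree/disagree:

```python
import numpy as np
import statsmodels.api as sm

def lr_dif_pvalues(agree, total, group):
    """Logistic-regression DIF screen for one dichotomized survey item.
    agree: 0/1 agreement; total: overall scale score (matching variable);
    group: 0/1 group indicator (e.g., gender). Returns p-values for the
    uniform (group) and nonuniform (score x group) DIF terms."""
    X = sm.add_constant(np.column_stack([total, group, total * group]))
    fit = sm.Logit(agree, X).fit(disp=0)
    return fit.pvalues[2], fit.pvalues[3]
```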


Educational and Psychological Measurement | 2009

A Model Fit Statistic for Generalized Partial Credit Model

Tie Liang; Craig S. Wells

Investigating the fit of a parametric model is an important part of the measurement process when implementing item response theory (IRT), but research examining it is limited. A general nonparametric approach for detecting model misfit, introduced by J. Douglas and A. S. Cohen (2001), has exhibited promising results for the two-parameter logistic model and Samejima's graded response model. This study extends the approach to test the fit of the generalized partial credit model (GPCM). The empirical Type I error rate and power of the proposed method are assessed for various test lengths, sample sizes, and types of assessment. Overall, the proposed fit statistic performed well under the studied conditions in that the Type I error rate was not inflated and the power was acceptable, especially for moderate to large sample sizes. A further advantage of the nonparametric approach is that it provides a convenient graphical display of possible misfit.
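
For reference, the GPCM whose fit is being tested can be written as follows, in the notation commonly used for the model:

```latex
% Probability that an examinee with ability \theta responds in category k
% (k = 0, ..., m_i) of polytomous item i, with discrimination a_i and
% step parameters b_{iv}; the v = 0 term is fixed at zero by convention.
P_{ik}(\theta) =
  \frac{\exp\!\left( \sum_{v=0}^{k} a_i (\theta - b_{iv}) \right)}
       {\sum_{c=0}^{m_i} \exp\!\left( \sum_{v=0}^{c} a_i (\theta - b_{iv}) \right)}
```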


Applied Measurement in Education | 2014

An Examination of Two Procedures for Identifying Consequential Item Parameter Drift

Craig S. Wells; Ronald K. Hambleton; Robert L. Kirkpatrick; Yu Meng

The purpose of the present study was to develop and evaluate two procedures for flagging consequential item parameter drift (IPD) in an operational testing program. The first procedure flags items that exhibit a meaningful magnitude of IPD, using a critical value defined to represent barely tolerable IPD. The second procedure flags items whose D2 statistic is more than two standard deviations from the mean. Both procedures were implemented using an iterative purification approach to detect IPD. A simulation study was conducted to evaluate the effectiveness of both procedures in flagging non-negligible IPD. Both procedures were able to identify IPD, and the iterative purification method provided useful information regarding the consequences of excluding or including a flagged item. The advantages and disadvantages of both procedures, as well as possible modifications intended to improve their effectiveness, are discussed in the article.
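
A minimal sketch of the second flagging rule with iterative purification, assuming the D2 values have already been computed. In the actual procedure the statistics would be recomputed after re-linking without the flagged items; that step is simplified here to re-estimating the mean and standard deviation over unflagged items:

```python
import numpy as np

def flag_ipd(d2_values, n_sd=2.0, max_iter=10):
    """Flag items whose D2 statistic lies more than n_sd standard
    deviations from the mean, recomputing the mean and SD over the
    remaining unflagged items each round until no new items are flagged."""
    d2 = np.asarray(d2_values, dtype=float)
    flagged = np.zeros(d2.size, dtype=bool)
    for _ in range(max_iter):
        kept = d2[~flagged]
        new = (np.abs(d2 - kept.mean()) > n_sd * kept.std()) & ~flagged
        if not new.any():
            break
        flagged |= new
    return flagged
```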


Applied Measurement in Education | 2009

Evaluating Score Equity Assessment for State NAEP

Craig S. Wells; Su G. Baldwin; Ronald K. Hambleton; Stephen G. Sireci; Ana Karatonis; Stephen Jirka

Score equity assessment is an important analysis for ensuring that inferences drawn from test scores are comparable across subgroups of examinees. The purpose of the present evaluation was to assess the extent to which the Grade 8 NAEP Math and Reading assessments for 2005 were equivalent across selected states. More specifically, the present study examined the consistency of the achievement level results (i.e., Basic, Proficient, and Advanced) across five selected states for each assessment when the states were treated separately versus as part of the national sample. The proportions of examinees within each achievement level were obtained using the item statistics for the respective state and for the national sample. The achievement level proportions were highly comparable (almost indistinguishable) for each state comparison on both the Math and Reading assessments, which suggests that any lack of score equity with respect to the five studied states was inconsequential. These findings lend credibility to the current practice of using national item statistics in the calculation of state results.
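
The comparison at the heart of this study reduces to recomputing achievement-level proportions under two sets of item statistics and checking how much they differ. A toy sketch with placeholder cut scores and simulated ability estimates, not NAEP values:

```python
import numpy as np

def level_proportions(theta, cuts=(-0.2, 0.8, 1.6)):
    """Proportions in four achievement levels (below Basic, Basic,
    Proficient, Advanced) given abilities and theta-scale cut scores.
    The cut values here are placeholders, not actual NAEP cuts."""
    cum = [0.0] + [float(np.mean(theta < c)) for c in cuts] + [1.0]
    return np.diff(cum)

# Hypothetical: the same examinees scored with state vs. national item
# statistics, differing only by small estimation noise.
rng = np.random.default_rng(1)
theta_state = rng.normal(0.0, 1.0, 5000)
theta_national = theta_state + rng.normal(0.0, 0.05, 5000)
print(level_proportions(theta_state))
print(level_proportions(theta_national))
```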


Measurement and Evaluation in Counseling and Development | 2015

Psychometric Properties and Confirmatory Factor Analysis of the Student Engagement in School Success Skills

Greg Brigman; Craig S. Wells; Linda Webb; Elizabeth Villares; John C. Carey; Karen Harrington

This article describes the confirmatory factor analysis of the Student Engagement in School Success Skills (SESSS) instrument. The results of this study confirm that the SESSS has potential to be a useful self-report measure of elementary students’ use of strategies and skills associated with enhanced academic learning and achievement.


Applied Measurement in Education | 2015

A Nonparametric Approach for Assessing Goodness-of-Fit of IRT Models in a Mixed Format Test

Tie Liang; Craig S. Wells

Investigating the fit of a parametric model plays a vital role in validating an item response theory (IRT) model. An area that has received little attention is the assessment of multiple IRT models used in a mixed-format test. The present study extends the nonparametric approach proposed by Douglas and Cohen (2001) to assess the model fit of three IRT models (the three- and two-parameter logistic models and the generalized partial credit model) used in a mixed-format test. The statistical properties of the proposed fit statistic were examined and compared to S-X2 and PARSCALE's G2. Overall, RISE (root integrated squared error) outperformed the other two fit statistics under the studied conditions in that the Type I error rate was not inflated and the power was acceptable. A further advantage of the nonparametric approach is that it provides a convenient graphical inspection of the misfit.
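
In the notation of the approach described above, RISE for item i can be sketched as the density-weighted discrepancy between the kernel-smoothed response function and the fitted parametric curve:

```latex
% Density-weighted root integrated squared error for item i, where
% \hat{P}_i is the nonparametric (kernel-smoothed) response function,
% P_i the fitted parametric IRT curve, and f(\theta) the ability density.
\mathrm{RISE}_i =
  \sqrt{ \int \left( \hat{P}_i(\theta) - P_i(\theta) \right)^2 f(\theta)\, d\theta }
```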

Collaboration


Dive into Craig S. Wells's collaborations.

Top Co-Authors

Ronald K. Hambleton | University of Massachusetts Amherst
Stephen G. Sireci | University of Massachusetts Amherst
John C. Carey | University of Massachusetts Amherst
Amanda M. Marcotte | University of Massachusetts Amherst
John M. Hintze | University of Massachusetts Amherst
Tie Liang | University of Massachusetts Amherst
April L. Zenisky | University of Massachusetts Amherst
Elizabeth Villares | Florida Atlantic University
Kyung T. Han | Graduate Management Admission Council
Linda Webb | Florida Atlantic University