Publication


Featured research published by Steven L. Wise.


Educational Assessment | 2005

Low Examinee Effort in Low-Stakes Assessment: Problems and Potential Solutions

Steven L. Wise; Christine E. DeMars

Student test-taking motivation in low-stakes assessment testing is examined in terms of both its relationship to test performance and the implications of low student effort for test validity. A theoretical model of test-taking motivation is presented, with a synthesis of previous research indicating that low student motivation is associated with a substantial decrease in test performance. A number of assessment practices and data analytic procedures for managing the problems posed by low student motivation are discussed.


Educational Assessment | 2010

Examinee Noneffort and the Validity of Program Assessment Results

Steven L. Wise; Christine E. DeMars

Educational program assessment studies often use data from low-stakes tests to provide evidence of program quality. The validity of scores from such tests, however, is potentially threatened by examinee noneffort. This study investigated the extent to which one type of noneffort—rapid-guessing behavior—distorted the results from three types of commonly used program assessment designs. It was found that, for each design, a modest amount of rapid guessing had a pronounced effect on the results. In addition, motivation filtering was found to be successful in mitigating the effects caused by rapid guessing. It is suggested that measurement practitioners routinely apply motivation filtering whenever the data from low-stakes tests are used to support program decisions.
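
To illustrate the motivation-filtering idea described above, here is a minimal sketch (not the authors' code): examinees whose effort index falls below a cutoff are dropped before the aggregate score is computed. The 0.90 cutoff and the data are illustrative assumptions only.

```python
# Illustrative sketch of motivation filtering (not the study's code).
# Each examinee has an effort index in [0, 1], here taken to be the
# proportion of item responses that were not rapid guesses; examinees
# below a cutoff are dropped before the aggregate score is computed.

def motivation_filter(scores, effort, cutoff=0.90):
    """Keep only the scores of examinees whose effort index meets the cutoff.

    scores -- total test scores, one per examinee
    effort -- effort indices in [0, 1], parallel to scores
    cutoff -- minimum effort required to retain a record (illustrative value)
    """
    return [s for s, e in zip(scores, effort) if e >= cutoff]

scores = [31, 12, 27, 18, 29]
effort = [0.98, 0.55, 1.00, 0.72, 0.95]

kept = motivation_filter(scores, effort)
print(sum(kept) / len(kept))  # filtered aggregate mean: (31 + 27 + 29) / 3
```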


Educational and Psychological Measurement | 2007

Setting the Response Time Threshold Parameter to Differentiate Solution Behavior From Rapid-Guessing Behavior

Xiaojing J. Kong; Steven L. Wise; Dennison S. Bhola

This study compared four methods for setting item response time thresholds to differentiate rapid-guessing behavior from solution behavior. Thresholds were either (a) common for all test items, (b) based on item surface features such as the amount of reading required, (c) based on visually inspecting response time frequency distributions, or (d) statistically estimated using a two-state mixture model. The thresholds were compared using the criteria proposed by Wise and Kong to establish the reliability and validity of response time effort scores, which were generated on the basis of the specified threshold values. The four methods yielded very similar results, indicating that response time effort is not very sensitive to the particular threshold identification method used. Recommendations are given regarding use of the various methods.
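
As a minimal sketch of how a response time effort (RTE) score can be computed once per-item thresholds have been chosen: the 3-second common threshold and the response times below are illustrative assumptions, not values from the study.

```python
# Sketch of response time effort (RTE) scoring given per-item thresholds.
# A response is treated as a rapid guess when its response time falls below
# the item's threshold; otherwise it counts as solution behavior.

def rte_score(response_times, thresholds):
    """Proportion of items answered with solution behavior."""
    solution = [rt >= th for rt, th in zip(response_times, thresholds)]
    return sum(solution) / len(solution)

# One examinee's response times in seconds, with a common threshold for
# every item (method (a) in the abstract); the 3-second value is illustrative.
times = [12.4, 2.1, 45.0, 1.8, 30.2, 9.7]
thresholds = [3.0] * len(times)
print(rte_score(times, thresholds))  # 4 of 6 items show solution behavior
```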


Educational Assessment | 2006

The Generalizability of Motivation Filtering in Improving Test Score Validity

Vicki L. Wise; Steven L. Wise; Dennison S. Bhola

Accountability for educational quality is a priority at all levels of education. Low-stakes testing is one way to measure the quality of education that students receive and make inferences about what students know and can do. Aggregate test scores from low-stakes testing programs are suspect, however, to the degree that these scores are influenced by low test-taker effort. This study examined the generalizability of a recently developed technique called motivation filtering, whereby scores for students of low motivation are systematically filtered from test data to produce aggregate test scores that more accurately reflect student performance and that can be used for reporting purposes. Across assessment tests in five different content areas, motivation filtering was found to consistently increase mean test performance and convergent validity.


Applied Measurement in Education | 2009

Correlates of Rapid-Guessing Behavior in Low-Stakes Testing: Implications for Test Development and Measurement Practice

Steven L. Wise; Dena A. Pastor; Xiaojing J. Kong

Previous research has shown that rapid-guessing behavior can degrade the validity of test scores from low-stakes proficiency tests. This study used hierarchical generalized linear modeling to examine examinee and item characteristics as predictors of rapid-guessing behavior. Several item characteristics were significant predictors: items with more text or those occurring later in the test were associated with increased rapid guessing, while the inclusion of a graphic in an item was associated with decreased rapid guessing. The sole significant examinee predictor was SAT total score. Implications of these results for measurement professionals developing low-stakes tests are discussed.
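
A hedged sketch of the kind of model the abstract describes, flattened here to an ordinary logistic regression rather than the article's hierarchical generalized linear model; the predictors mirror those named in the abstract, but the data are simulated and purely illustrative.

```python
# Simplified stand-in for the article's hierarchical generalized linear model:
# an ordinary logistic regression predicting whether a response is a rapid
# guess from item characteristics. All data below are simulated for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_items = 500
word_count  = rng.integers(10, 120, n_items)   # amount of reading the item requires
position    = rng.integers(1, 61, n_items)     # serial position of the item in the test
has_graphic = rng.integers(0, 2, n_items)      # 1 if the item includes a graphic

# Simulate rapid-guess indicators in the directions the abstract reports:
# more text and later positions increase rapid guessing, a graphic decreases it.
logit = -3.0 + 0.01 * word_count + 0.02 * position - 0.8 * has_graphic
rapid_guess = (rng.random(n_items) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([word_count, position, has_graphic])
model = LogisticRegression().fit(X, rapid_guess)
print(dict(zip(["word_count", "position", "has_graphic"], model.coef_[0])))
```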


The Journal of General Education | 2009

Strategies for Managing the Problem of Unmotivated Examinees in Low-Stakes Testing Programs

Steven L. Wise

Low examinee effort distorts low-stakes general education assessment results and thereby diminishes the validity of score-based inferences regarding what students know and can do. Several issues concerning examinee effort are noted, the research literature on examinee motivation is reviewed, and several approaches proposed for effectively managing this threat are discussed.


International Journal of Testing | 2010

Can Differential Rapid-Guessing Behavior Lead to Differential Item Functioning?

Christine E. DeMars; Steven L. Wise

This investigation examined whether different rates of rapid guessing between groups could lead to detectable levels of differential item functioning (DIF) in situations where the item parameters were the same for both groups. Two simulation studies were designed to explore this possibility. The groups in Study 1 were simulated to reflect differences between high-stakes and low-stakes conditions, with no rapid guessing in the high-stakes condition. Easy, discriminating items with high rates of rapid guessing by the low-stakes group were detected as showing DIF favoring the high-stakes group when using the Mantel-Haenszel index. The groups in Study 2 were simulated to reflect gender differences in rapid guessing on a low-stakes test. Both groups had some rapid guessing, but the focal group guessed more. Easy items with greater differences in rapid guessing were more likely to be detected as showing DIF. When the group with more rapid guessing had lower mean proficiency, the overall proportion of flagged items was lower but the effect of difference in rapid guessing remained. Our results suggest that there likely are instances in which statistically identified DIF is observed due to the behavioral characteristics of the studied subgroups rather than the content of the items.
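
For context, the Mantel-Haenszel index mentioned above pools 2x2 tables (group by correct/incorrect) across matched score strata. The sketch below shows the common odds ratio and its transformation to the ETS delta scale; the counts are made up for illustration and are not the simulation conditions used in the studies.

```python
# Hedged sketch of the Mantel-Haenszel DIF index; data are illustrative.
import math

def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across matched score strata.

    Each stratum is a tuple (A, B, C, D):
      A = reference-group examinees answering the item correctly
      B = reference-group examinees answering incorrectly
      C = focal-group examinees answering correctly
      D = focal-group examinees answering incorrectly
    """
    num = sum(A * D / (A + B + C + D) for A, B, C, D in strata)
    den = sum(B * C / (A + B + C + D) for A, B, C, D in strata)
    return num / den

strata = [(40, 10, 30, 20), (35, 15, 25, 25), (20, 30, 10, 40)]
alpha_mh = mantel_haenszel_odds_ratio(strata)
print(alpha_mh)                      # > 1 means the item favors the reference group
print(-2.35 * math.log(alpha_mh))    # MH D-DIF on the ETS delta scale
```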


Applied Psychological Measurement | 2009

A Clarification of the Effects of Rapid Guessing on Coefficient α: A Note on Attali's "Reliability of Speeded Number-Right Multiple-Choice Tests"

Steven L. Wise; Christine E. DeMars

Attali (2005) recently demonstrated that Cronbach's coefficient α estimate of reliability for number-right multiple-choice tests will tend to be deflated by speededness, rather than inflated as is commonly believed and taught. For tests that do not have a correction for guessing, if examinees tend to guess rapidly at remaining unanswered items as time runs out rather than omitting them, then these guesses introduce essentially random data into the data matrix. By showing mathematically that such rapid guessing should deflate coefficient α estimates below what they would be if the tests were unspeeded, Attali effectively refuted a popular misconception regarding the internal-consistency reliability of speeded tests. Although the methods, findings, and conclusions of Attali (2005) are correct, his article may inadvertently invite a different type of misconception. Specifically, readers might interpret Attali's conclusion that rapid guessing at the end of speeded tests deflates coefficient α as a more general, albeit incorrect, conclusion that the presence of rapid guessing in test data always deflates coefficient α estimates. The purpose of this Brief Report is to illustrate that rapid guessing can, and often does, inflate coefficient α estimates above what they would be if there were no rapid guessing. The explanation for this effect was in fact noted and discussed by Attali (2005), who showed that the presence of rapid guessing can sometimes inflate the correlation between partially speeded items (i.e., those that receive rapid guesses from some but not all examinees). Attali concluded, however, that "Although speeded items sometimes have inflated correlations with other speeded items, the number of these inflated correlations will generally be much smaller than the number of deflated correlations" (p. 361). Thus, Attali's conclusion regarding the net effect of rapid guessing on coefficient α rests on the observation that inflated correlations will occur only infrequently with partially speeded tests. It is the relative frequencies of inflated versus deflated correlations that determine whether the presence of rapid guessing will increase or decrease coefficient α.
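
The note's point can be reproduced with a small simulation. The sketch below computes coefficient α before and after the responses of a simulated low-effort subgroup on the later items are replaced with random guesses; the data-generating model, group size, and guessing pattern are illustrative assumptions, not the conditions examined in the article.

```python
# Minimal sketch of coefficient alpha and of injecting rapid guesses into a
# response matrix; all data below are simulated for illustration.
import numpy as np

def cronbach_alpha(X):
    """Coefficient alpha for an examinees-by-items score matrix X."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(1)
ability = rng.normal(size=300)
# Simple simulated 0/1 responses to 20 items driven by a common ability.
p_correct = 1 / (1 + np.exp(-(ability[:, None] - rng.normal(size=20))))
responses = (rng.random((300, 20)) < p_correct).astype(int)
print(cronbach_alpha(responses))

# Replace the responses of a low-effort subgroup on the last items with
# random guesses (25% chance correct on a 4-option item).
guessed = responses.copy()
guessed[:60, 12:] = (rng.random((60, 8)) < 0.25).astype(int)
print(cronbach_alpha(guessed))  # may be higher or lower than before; the net
                                # direction depends on how the inflated and
                                # deflated inter-item correlations balance out
```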


European Journal of Psychological Assessment | 2004

Assisted Self-Adapted Testing: A Comparative Study

Pedro M. Hontangas; Julio Olea; Vicente Ponsoda; Javier Revuelta; Steven L. Wise

A new type of self-adapted test (S-AT), called the Assisted Self-Adapted Test (AS-AT), is presented. It differs from an ordinary S-AT in that, prior to selecting the difficulty category, the computer advises examinees on their best difficulty category choice, based on their previous performance. Three tests (computerized adaptive test, AS-AT, and S-AT) were compared regarding both their psychometric (precision and efficiency) and psychological (anxiety) characteristics. The tests were administered in an actual assessment situation, in which test scores determined 20% of term grades. A sample of 173 high school students participated. No differences were found in either posttest anxiety or ability. Concerning precision, the AS-AT was as precise as the CAT, and both were more precise than the S-AT. It was concluded that the AS-AT behaved like a CAT with respect to precision. Some evidence, though not conclusive, of the psychological similarity between the AS-AT and the S-AT was also found.


The Journal of General Education | 2002

Standard Setting: A Systematic Approach to Interpreting Student Learning

Christine E. DeMars; Donna L. Sundre; Steven L. Wise

Standards were set on three technology tests at James Madison University. The standard setting process elicited active discussion among faculty as they debated the technology knowledge and skills students need to succeed in their subsequent college courses. The standards which emerged were challenging. While only about 30% of incoming freshmen could meet the standards set, an additional 50-60% learned the skills by the end of the year. Recommendations are provided for standard setting in a general education context.

Collaboration


Dive into Steven L. Wise's collaborations.

Top Co-Authors

Sheng-Ta Yang, James Madison University
Dena A. Pastor, James Madison University
Lynn Cameron, James Madison University
Javier Revuelta, Autonomous University of Madrid