April L. Zenisky
University of Massachusetts Amherst
Publications
Featured research published by April L. Zenisky.
Applied Measurement in Education | 2002
April L. Zenisky; Stephen G. Sireci
Computers have had a tremendous impact on assessment practices over the past half century. Advances in computer technology have substantially influenced the ways in which tests are made, administered, scored, and reported to examinees. These changes are particularly evident in computer-based testing, where the use of computers has allowed test developers to re-envision what test items look like and how they are scored. By integrating technology into assessments, it is increasingly possible to create test items that can sample as broad or as narrow a range of behaviors as needed while preserving a great deal of fidelity to the construct of interest. In this article we review and illustrate some of the current technological developments in computer-based testing, focusing on novel item formats and automated scoring methodologies. Our review indicates that a number of technological innovations in performance assessment are increasingly being researched and implemented by testing programs. In some cases, complex psychometric and operational issues have successfully been dealt with, but a variety of substantial measurement concerns associated with novel item types and other technological aspects impede more widespread use. Given emerging research, however, there appears to be vast potential for expanding the use of more computerized constructed-response type items in a variety of testing contexts.
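As one concrete illustration of the novel item formats and automated scoring methodologies the article surveys, the sketch below is an assumption-laden example, not material from the article: it shows how a hypothetical drag-and-drop ordering item might be machine-scored with partial credit based on pairwise agreement with the key.

```python
# Minimal sketch (hypothetical item and scoring rule): automated
# partial-credit scoring for a drag-and-drop ordering item.
from itertools import combinations


def score_ordering_item(response, key):
    """Return a 0-1 partial-credit score equal to the proportion of
    pairwise orderings in the response that agree with the key.
    Assumes every option in the key is placed exactly once."""
    key_rank = {option: i for i, option in enumerate(key)}
    pairs = list(combinations(response, 2))
    agreements = sum(1 for a, b in pairs if key_rank[a] < key_rank[b])
    return agreements / len(pairs)


# Example: one step placed out of order earns partial credit.
key = ["plan", "draft", "revise", "publish"]
response = ["plan", "revise", "draft", "publish"]
print(score_ordering_item(response, key))  # 0.833...
```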
Educational and Psychological Measurement | 2003
April L. Zenisky; Ronald K. Hambleton; Frederic Robin
In differential item functioning (DIF) studies, examinees from different groups are typically ability matched, and then one or more statistical indices are used to compare performance on a set of test items. Typically, matching is on total test score (a criterion both observable and easily accessible), but this criterion may be limited in value because, if DIF is present, it is likely to distort test scores and potentially confound any item performance differences. Thus, some researchers have advocated iterative approaches for DIF detection. In this article, a two-stage methodology for evaluating DIF in large-scale state assessment data was explored. The findings illustrated the merit of iterative approaches for DIF detection. Items flagged as DIF in the second stage were not necessarily the same items identified as DIF in the first stage, and vice versa, and this finding was directly related to the amount of DIF found in the Stage 1 analyses.
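To make the two-stage logic concrete, here is a minimal Python sketch of the general approach, not the authors' code: stage 1 matches on total score and flags items with the Mantel-Haenszel delta-DIF statistic, and stage 2 purifies the matching criterion by removing the flagged items before re-running the same analysis. The 1.5 flagging threshold (roughly the ETS "C" effect-size rule) and the handling of the studied item are simplifying assumptions.

```python
import numpy as np


def mh_ddif(item, responses, groups, matching_score):
    """Mantel-Haenszel delta-DIF for one item, stratified by matching score.
    responses: 0/1 array (examinees x items); groups: 0 = reference, 1 = focal."""
    num, den = 0.0, 0.0
    for s in np.unique(matching_score):
        idx = matching_score == s
        ref, foc = idx & (groups == 0), idx & (groups == 1)
        a = responses[ref, item].sum()      # reference correct
        b = ref.sum() - a                   # reference incorrect
        c = responses[foc, item].sum()      # focal correct
        d = foc.sum() - c                   # focal incorrect
        t = ref.sum() + foc.sum()
        if t > 0:
            num += a * d / t
            den += b * c / t
    if num == 0 or den == 0:
        return 0.0
    return -2.35 * np.log(num / den)        # MH delta-DIF metric


def two_stage_dif(responses, groups, threshold=1.5):
    n_items = responses.shape[1]
    # Stage 1: match on the total test score.
    total = responses.sum(axis=1)
    stage1 = [i for i in range(n_items)
              if abs(mh_ddif(i, responses, groups, total)) >= threshold]
    # Stage 2: purify the matching criterion by dropping stage-1 items.
    keep = [i for i in range(n_items) if i not in stage1]
    purified = responses[:, keep].sum(axis=1)
    stage2 = [i for i in range(n_items)
              if abs(mh_ddif(i, responses, groups, purified)) >= threshold]
    return stage1, stage2
```

In operational practice, the studied item is typically kept in its own matching score, and flagging usually combines the effect-size rule with a significance test; this sketch omits both refinements for brevity.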
Applied Measurement in Education | 2006
Michael G. Jodoin; April L. Zenisky; Ronald K. Hambleton
Many credentialing agencies today are either administering their examinations by computer or are likely to be doing so in the coming years. Unfortunately, although several promising computer-based test designs are available, little is known about how well they function in examination settings. The goal of this study was to compare fixed-length examinations (both operational forms and newly constructed forms) with several variations of multistage test designs for making pass-fail decisions. Results were produced for 3 passing scores. Four operational 60-item examinations were compared to (a) 3 new 60-item forms, (b) 60-item 3-stage tests, and (c) 40-item 2-stage tests; all were constructed using automated test assembly software. The study was carried out using computer simulation techniques that were set to mimic common examination practices. All 60-item tests, regardless of design or passing score, produced accurate ability estimates and acceptable and similar levels of decision consistency and decision accuracy. One interesting finding was that the 40-item test results were poorer than the 60-item test results, as expected, but were in the range of acceptability. This raises the practical policy question of whether content-valid 40-item tests with lower item exposure levels and/or savings in item development costs are an acceptable trade-off for a small loss in decision accuracy and consistency.
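The simulation logic behind such comparisons can be sketched briefly. The following example is an illustrative assumption, not the study's code or item parameters: it generates 2PL responses to a 60-item fixed-length form, classifies examinees as pass or fail at a cut score, and estimates decision accuracy (agreement with the true classification) and decision consistency (agreement across two parallel administrations).

```python
import numpy as np

rng = np.random.default_rng(1)
n_examinees, n_items = 5000, 60
theta = rng.normal(0, 1, n_examinees)        # true abilities
a = rng.lognormal(0, 0.25, n_items)          # hypothetical discriminations
b = rng.normal(0, 1, n_items)                # hypothetical difficulties
theta_cut = 0.0                              # passing score on the theta scale


def administer(theta, a, b, rng):
    """Simulate one administration; return number-correct scores."""
    p = 1 / (1 + np.exp(-a * (theta[:, None] - b)))   # 2PL probabilities
    return (rng.random(p.shape) < p).sum(axis=1)


# The expected number-correct at the theta cut defines the observed-score cut.
score_cut = (1 / (1 + np.exp(-a * (theta_cut - b)))).sum()

form1 = administer(theta, a, b, rng) >= score_cut
form2 = administer(theta, a, b, rng) >= score_cut
truth = theta >= theta_cut

print("decision accuracy:   ", (form1 == truth).mean())
print("decision consistency:", (form1 == form2).mean())
```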
Archive | 2009
April L. Zenisky; Ronald K. Hambleton; Richard M. Luecht
Just as traditional computerized adaptive testing (CAT) involves adaptive selection of individual items for sequential administration to examinees as a test is in progress, multistage testing (MST) is an analogous approach that uses sets of items as the building blocks for a test. In MST terminology, these sets of items have come to be termed modules (Luecht & Nungester, 1998) or testlets (Wainer & Kiely, 1987) and can be characterized as short versions of linear test forms where some specified number of individual items are administered together to meet particular test specifications and provide a certain proportion of the total test information. The individual items in a module may all be related to one or more common stems (such as passages or graphics) or be more generally discrete from one another, per the content specifications of the testing program for the test in question. These self-contained, carefully constructed, fixed sets of items are the same for every examinee to whom a given set is administered, but any two examinees may not be presented with the same modules, or with the same sequence of modules.
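A small sketch may help fix the terminology. The hypothetical panel below (illustrative module contents and routing cut scores, not from the chapter) shows fixed modules assembled into a two-stage structure, with examinees routed between modules on the basis of a cumulative provisional score.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    items: list   # fixed, self-contained set of item IDs
    cuts: dict    # cumulative-score cut -> name of next module ({} = final stage)


# A hypothetical 1-3 two-stage panel: one routing module, then three
# second-stage modules targeted at low, medium, and high ability.
panel = {
    "R":  Module("R",  ["i01", "i02", "i03", "i04", "i05"], {4: "2H", 2: "2M", 0: "2L"}),
    "2L": Module("2L", ["i06", "i07", "i08", "i09", "i10"], {}),
    "2M": Module("2M", ["i11", "i12", "i13", "i14", "i15"], {}),
    "2H": Module("2H", ["i16", "i17", "i18", "i19", "i20"], {}),
}


def route(panel, score_module):
    """Walk the panel, routing on cumulative number-correct per module.
    `score_module` stands in for administering a module and scoring it."""
    path, total = [], 0
    current = panel["R"]
    while True:
        path.append(current.name)
        total += score_module(current)
        if not current.cuts:
            return path
        # The highest cut the cumulative score reaches decides the next module.
        next_name = next(v for k, v in sorted(current.cuts.items(), reverse=True)
                         if total >= k)
        current = panel[next_name]


# Example: 3 of 5 routing items correct sends the examinee to the medium module.
print(route(panel, lambda m: 3 if m.name == "R" else 4))  # ['R', '2M']
```

In an operational MST, modules would be assembled with automated test assembly to meet content and information targets, and routing would typically be based on IRT provisional scores rather than raw number-correct.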
Applied Measurement in Education | 2009
Stephen G. Sireci; Jeffrey B. Hauger; Craig S. Wells; Christine Shea; April L. Zenisky
The National Assessment Governing Board used a new method to set achievement level standards on the 2005 Grade 12 NAEP Math test. In this article, we summarize our independent evaluation of the process used to set these standards. The evaluation data included observations of the standard-setting meeting, observations of advisory committee meetings where the results were discussed, review of documentation associated with the standard-setting study, analysis of the standard-setting data, and analysis of other data related to the mathematics proficiency of 2005 Grade 12 students. Our evaluation framework used criteria from the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999) and other suggestions from the literature (e.g., Kane, 1994, 2001). The process was found to have adequate procedural and internal evidence of validity. Using external data to evaluate the standards provided more equivocal results. In considering all evidence and data reviewed, we concluded the process used to set achievement level standards on the 2005 Grade 12 NAEP Math test was sound and the standards set are valid for the purpose of reporting achievement level results on this test. Recommendations for future NAEP standard-setting studies are provided.
Wiley StatsRef: Statistics Reference Online | 2014
April L. Zenisky
Among the decisions to be made in implementing a computer-based test, the choice of test design is one of the most important. A test design is the manner by which individual items are selected for presentation to examinees during administration. Evaluating test design options is critical because test design concerns not only test administration but also basic activities in the test development process and the quality and precision of the scores to be obtained for a specified testing purpose. Essentially, conceptualizing the test design options for computerized testing hinges first on whether a design is adaptive or not, and then on the varied approaches within each of those two groupings. The purpose of this entry is to provide an overview of selected test designs, adaptive and otherwise, while highlighting certain measurement advantages and concerns associated with each.
Keywords: computerized-adaptive testing; computer-based testing; computerized fixed tests; multistage testing; test designs
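The adaptive branch of these designs can be illustrated with the step that distinguishes it from fixed forms: choosing the next item to maximize information at the current provisional ability estimate. The sketch below uses hypothetical 2PL item parameters and is an illustration of the general technique, not material from the entry.

```python
import numpy as np


def item_information(theta, a, b):
    """Fisher information of 2PL items at ability theta."""
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1 - p)


def select_next_item(theta_hat, a, b, administered):
    """Return the index of the most informative item not yet administered."""
    info = item_information(theta_hat, a, b)
    info[list(administered)] = -np.inf
    return int(np.argmax(info))


# Example with a small hypothetical pool: for a provisional estimate of 0.5,
# the most informative unused item is a well-discriminating item of nearby difficulty.
a = np.array([1.0, 1.2, 0.8, 1.5])
b = np.array([-1.0, 0.4, 1.5, 0.6])
print(select_next_item(0.5, a, b, administered={0}))  # 3
```

Operational adaptive designs add constraints this sketch omits, such as content balancing and item-exposure control, which is part of why the non-adaptive and multistage designs discussed in the entry remain attractive alternatives.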
International Journal of Testing | 2010
April L. Zenisky; Katrina M. Crotts
The International Journal of Testing (IJT) is the journal of the International Test Commission. It is intended to support the dissemination of scholarly research on tests and test use worldwide. The purpose of this article is to reflect on what has been published in IJT over its nine volumes to date, with a focus on the extent to which the journal involves contributions from around the globe as well as the topics and issues being studied. Across the 160 scholarly articles published in IJT, there are 303 unique authors, and manuscripts range in subject matter from instrument-specific validation studies to analyses of DIF and other important measurement issues. Significantly, IJT does reflect a high degree of international diversity among the authors who have been published in its pages, as it has generally exhibited a higher proportion of non-North American authors than other prominent measurement journals since 2001. This diversity in authorship and content speaks to the extent to which the journal is a key resource for testing and assessment professionals internationally, and to the importance of supporting this mission in the future.
International Journal of Testing | 2015
April L. Zenisky
The language status of examinees in a given testing population is a critical consideration for testing agencies. In many testing contexts today, it cannot be assumed that examinees share a common linguistic background, and indeed, this is an issue that has enormous implications for fairness in testing practices from a validity perspective. Where examinees differ in their language status (minority or majority), or where wide variation in language proficiency is present among examinees relative to the language used for assessment, testing agencies must take care to ensure that this status does not contribute to construct-irrelevant variance and thereby negatively impact scores (and therefore, the validity of test results) for some proportion of examinees. How testing agencies deal with testing populations that include examinees with differing language status is an issue that has implications both within and across borders. Many testing programs now operate internationally, especially in the areas of certification, licensure, and psychology, and evolving demographic shifts in many countries have effected changes in the number and proportion of languages spoken by various segments of the population. For test development, the presence of linguistic minorities in the target testing population matters in almost all aspects of test planning, preparation, administration, and use, including adaptation processes, validity and reliability, scoring, accommodations, score reporting and interpretation, quality control, and test preparation. The significance of linguistic minorities as a special concern for testing practices is such that the International Test Commission is in the process of developing guidelines to support future psychometric work in this area, drawing on practitioners and researchers with substantial relevant experience.
Journal of Computerized Adaptive Testing | 2013
Katrina M. Crotts; April L. Zenisky; Stephen G. Sireci; Xueming Li
Encyclopedia of Statistics in Behavioral Science | 2005
April L. Zenisky
Among the decisions to be made in implementing a computer-based test, the choice of test design is one of the most important. A test design is the manner by which individual items are selected for presentation to examinees during administration. Evaluating test design options is critical because test design concerns not only test administration but also basic activities in the test development process and the quality and precision of the scores to be obtained for a specified testing purpose. Essentially, conceptualizing the test design options for computerized testing hinges first on whether a design is adaptive or not, and then on the varied approaches within each of those two groupings. The purpose of this entry is to provide an overview of selected test designs, adaptive and otherwise, while highlighting certain measurement advantages and concerns associated with each.
Keywords: computerized-adaptive testing; computer-based testing; computerized fixed tests; multistage testing; test designs