Publication


Featured research published by Robert J. Mislevy.


Measurement: Interdisciplinary Research & Perspective | 2003

Focus Article: On the Structure of Educational Assessments

Robert J. Mislevy; Linda S. Steinberg; Russell G. Almond

In educational assessment, we observe what students say, do, or make in a few particular circumstances and attempt to infer what they know, can do, or have accomplished more generally. A web of inference connects the two. Some connections depend on theories and experience concerning the targeted knowledge in the domain, how it is acquired, and the circumstances under which people bring their knowledge to bear. Other connections may depend on statistical models and probability-based reasoning. Still others concern the elements and processes involved in test construction, administration, scoring, and reporting. This article describes a framework for assessment that makes explicit the interrelations among substantive arguments, assessment designs, and operational processes. The work was motivated by the need to develop assessments that incorporate purposes, technologies, and psychological perspectives that are not well served by familiar forms of assessments. However, the framework is equally applicable to analyzing existing assessments or designing new assessments within familiar forms.
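
The "web of inference" the authors describe includes links carried by probability-based reasoning. As a minimal illustration of one such link (my notation, not the article's), a statistical model ties observed performances $x = (x_1, \dots, x_n)$ to a claim about proficiency $\theta$ via Bayes' theorem:

$$
p(\theta \mid x) \;\propto\; p(\theta)\,\prod_{j=1}^{n} p(x_j \mid \theta),
$$

where the prior $p(\theta)$ and the conditional probabilities $p(x_j \mid \theta)$ encode the substantive theory about how knowledge shows up in performance.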


Journal of Educational and Behavioral Statistics | 1986

Recent Developments in the Factor Analysis of Categorical Variables

Robert J. Mislevy

Despite known shortcomings of the procedure, exploratory factor analysis of dichotomous test items has been limited, until recently, to unweighted analyses of matrices of tetrachoric correlations. Superior methods have begun to appear in the literature, in professional symposia, and in computer programs. This paper places these developments in a unified framework, from a review of the classical common factor model for measured variables through generalized least squares and marginal maximum likelihood solutions for dichotomous data. Further extensions of the model are also reported as work in progress.
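
As a sketch of the kind of model placed in this unified framework, the common factor model carries over to dichotomous items by positing an underlying continuous response that is dichotomized at a threshold; this is the standard normal-ogive formulation, stated here in my own notation rather than quoted from the paper:

$$
P(x_j = 1 \mid \boldsymbol{\theta}) \;=\; \Phi\!\left(\frac{\sum_k \lambda_{jk}\,\theta_k - \tau_j}{\sqrt{1 - \sum_k \lambda_{jk}^2}}\right),
$$

where $\lambda_{jk}$ are factor loadings, $\tau_j$ is the item threshold, and $\Phi$ is the standard normal distribution function. Generalized least squares and marginal maximum likelihood are alternative ways of estimating the $\lambda_{jk}$ and $\tau_j$ under such a model.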


The Statistician | 1994

Test theory for a new generation of tests

Norman Frederiksen; Robert J. Mislevy; Isaac I. Bejar

Contents: R.J. Mislevy, Introduction. R.E. Snow, D.F. Lohman, Cognitive Psychology, New Test Design, and New Test Theory: An Introduction. R.J. Mislevy, Foundations of a New Test Theory. D.F. Lohman, M.J. Ippel, Cognitive Diagnosis: From Statistically Based Assessment Toward Theory-Based Assessment. N.S. Cole, Comments on Chapters 1-3. D. Thissen, Repealing Rules That No Longer Apply to Psychological Measurement. R.E. Bennett, Toward Intelligent Assessment: An Integration of Constructed-Response Testing, Artificial Intelligence, and Model-Based Measurement. S. Embretson, Psychometric Models for Learning and Cognitive Processes. B.F. Green, Comments on Chapters 4-6. S.P. Marshall, Assessing Schema Knowledge. P.J. Feltovich, R.J. Spiro, R.L. Coulson, Learning, Teaching, and Testing for Complex Conceptual Understanding. G.N. Masters, R.J. Mislevy, New Views of Student Learning: Implications for Educational Measurement. D.H. Gitomer, D. Rock, Addressing Process Variables in Test Analysis. J.B. Carroll, Comments on Chapters 7-10. K. Yamamoto, D.H. Gitomer, Application of a HYBRID Model to a Test of Cognitive Skill Representation. J.B. Carroll, Test Theory and the Behavioral Scaling of Test Performance. I.I. Bejar, A Generative Approach to Psychological and Educational Measurement. E.H. Haertel, D.E. Wiley, Representations of Ability Structures: Implications for Testing. H.I. Braun, Comments on Chapters 11-14.


Journal of Educational Statistics | 1992

Scaling Procedures in NAEP

Robert J. Mislevy; Eugene G. Johnson; Eiji Muraki

Scale-score reporting is a recent innovation in the National Assessment of Educational Progress (NAEP). With scaling methods, the performance of a sample of students in a subject area or subarea can be summarized on a single scale even when different students have been administered different exercises. This article presents an overview of the scaling methodologies employed in the analyses of NAEP surveys beginning with 1984. The first section discusses the perspective on scaling from which the procedures were conceived and applied. The plausible values methodology developed for use in NAEP scale-score analyses is then described, in the contexts of item response theory and average response method scaling. The concluding section lists milestones in the evolution of the plausible values approach in NAEP and directions for further improvement.
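
The plausible values idea can be illustrated with a toy example: for each respondent, combine an IRT likelihood for the items actually administered with a population (prior) distribution, then draw several random values from the resulting posterior over proficiency. The sketch below is a deliberate simplification, assuming a Rasch model and a common normal prior; operational NAEP conditions the population model on background variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration of the plausible-values idea (not NAEP's operational code):
# combine an IRT likelihood for a student's item responses with a normal prior,
# then draw several random values from the resulting posterior over proficiency.

def rasch_likelihood(theta, responses, difficulties):
    """Likelihood of observed 0/1 responses under a Rasch model, for each theta in a grid."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - difficulties[None, :])))
    return np.prod(np.where(responses[None, :] == 1, p, 1.0 - p), axis=1)

def plausible_values(responses, difficulties, mu=0.0, sigma=1.0, n_pv=5):
    """Draw n_pv plausible values from the posterior p(theta | responses)."""
    grid = np.linspace(-4, 4, 401)                       # grid over proficiency
    prior = np.exp(-0.5 * ((grid - mu) / sigma) ** 2)    # normal population model
    posterior = rasch_likelihood(grid, responses, difficulties) * prior
    posterior /= posterior.sum()
    return rng.choice(grid, size=n_pv, p=posterior)

responses = np.array([1, 0, 1, 1, 0])                    # illustrative response pattern
difficulties = np.array([-1.0, 0.0, 0.5, 1.0, 1.5])      # illustrative item difficulties
print(plausible_values(responses, difficulties))
```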


User Modeling and User-adapted Interaction | 1996

The role of probability-based inference in an intelligent tutoring system

Robert J. Mislevy; Drew H. Gitomer

Probability-based inference in complex networks of interdependent variables is an active topic in statistical research, spurred by such diverse applications as forecasting, pedigree analysis, troubleshooting, and medical diagnosis. This paper concerns the role of Bayesian inference networks for updating student models in intelligent tutoring systems (ITSs). Basic concepts of the approach are briefly reviewed, but the emphasis is on the considerations that arise when one attempts to operationalize the abstract framework of probability-based reasoning in a practical ITS context. The discussion revolves around HYDRIVE, an ITS for learning to troubleshoot an aircraft hydraulics system. HYDRIVE supports generalized claims about aspects of student proficiency through probability-based combination of rule-based evaluations of specific actions. The paper highlights the interplay among inferential issues, the psychology of learning in the domain, and the instructional approach upon which the ITS is based.
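
A toy version of this probability-based combination of evidence is sketched below, assuming a single proficiency variable with three levels and rule-based evaluations that classify each observed action as expert-like or not. The structure and numbers are invented for illustration; HYDRIVE's actual student model is far richer.

```python
import numpy as np

# A toy illustration (not HYDRIVE's actual network): one proficiency variable with three
# levels, updated by Bayes' rule each time a rule-based evaluator scores an action as
# "expert-like" or "not expert-like".

levels = ["low", "medium", "high"]
prior = np.array([1/3, 1/3, 1/3])            # belief about proficiency before any evidence

# Assumed conditional probabilities of an expert-like action at each proficiency level.
p_expert_action = np.array([0.2, 0.5, 0.8])

def update(belief, action_is_expert_like):
    """Posterior over proficiency after observing one evaluated action."""
    likelihood = p_expert_action if action_is_expert_like else 1.0 - p_expert_action
    posterior = belief * likelihood
    return posterior / posterior.sum()

belief = prior
for evaluation in [True, True, False, True]:  # sequence of rule-based evaluations
    belief = update(belief, evaluation)
    print(dict(zip(levels, belief.round(3))))
```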


Applied Psychological Measurement | 1999

Graphical Models and Computerized Adaptive Testing

Russell G. Almond; Robert J. Mislevy

Computerized adaptive testing (CAT) based on item response theory (IRT) is viewed from the perspective of graphical modeling (GM). GM provides methods for making inferences about multifaceted skills and knowledge, and for extracting data from complex performances. However, simply incorporating variables for all sources of variation is rarely successful. Thus, researchers must closely analyze the substance and structure of the problem to create more effective models. Researchers regularly employ sophisticated strategies to handle many sources of variability outside the IRT model. Relevant variables can play many roles without appearing in the operational IRT model per se, e.g., in validity studies, assembling tests, and constructing and modeling tasks. Some of these techniques are described from a GM perspective, as well as how to extend them to more complex assessment situations. Issues are illustrated in the context of language testing.
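
The IRT-based adaptive testing that the article takes as its starting point can be sketched as follows, assuming a two-parameter logistic model and maximum-information item selection; this is an illustrative simplification, not the paper's graphical-model machinery.

```python
import numpy as np

# A minimal sketch of IRT-based adaptive item selection (illustrative, not from the paper):
# after each response, re-estimate proficiency and administer the unused item with maximum
# Fisher information at the current estimate.

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of each 2PL item at proficiency theta."""
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def next_item(theta_hat, a, b, administered):
    """Index of the most informative item not yet administered."""
    info = item_information(theta_hat, a, b)
    info[list(administered)] = -np.inf
    return int(np.argmax(info))

a = np.array([1.0, 1.5, 0.8, 2.0, 1.2])       # illustrative discriminations
b = np.array([-1.0, 0.0, 0.5, 1.0, 1.5])      # illustrative difficulties
print(next_item(theta_hat=0.3, a=a, b=b, administered={1}))
```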


Journal of the American Statistical Association | 1985

Estimation of Latent Group Effects

Robert J. Mislevy

Conventional methods of multivariate normal analysis do not apply when the variables of interest are not observed directly but must be inferred from fallible or incomplete data. A method of estimating such effects by marginal maximum likelihood, implemented by means of an EM algorithm, is proposed. Asymptotic standard errors and likelihood ratio tests of fit are provided. The procedures are illustrated with data from the administration of the Armed Services Vocational Aptitude Battery to a probability sample of American youth.
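
In outline (my notation, consistent with the abstract but not copied from the paper), the marginal likelihood for group means $\mu_1,\dots,\mu_G$ and within-group variance $\sigma^2$ integrates the latent variable out of each examinee's likelihood:

$$
L(\mu_1,\dots,\mu_G,\sigma^2) \;=\; \prod_{i=1}^{N} \int p(x_i \mid \theta)\,\phi\!\left(\theta;\ \mu_{g(i)},\ \sigma^2\right) d\theta,
$$

where $x_i$ are examinee $i$'s fallible observations, $g(i)$ is the group to which examinee $i$ belongs, and $\phi$ is the normal density. An EM algorithm alternates between computing each examinee's posterior distribution for $\theta$ under the current parameters (E-step) and re-estimating $\mu_g$ and $\sigma^2$ from the resulting posterior means and variances (M-step).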


Language Testing | 2002

Design and Analysis in Task-Based Language Assessment.

Robert J. Mislevy; Linda S. Steinberg; Russell G. Almond

In task-based language assessment (TBLA) language use is observed in settings that are more realistic and complex than in discrete skills assessments, and which typically require the integration of topical, social and/or pragmatic knowledge along with knowledge of the formal elements of language. But designing an assessment is not accomplished simply by determining the settings in which performance will be observed. TBLA raises questions of just how to design complex tasks, evaluate students’ performances and draw valid conclusions therefrom. This article examines these challenges from the perspective of ‘evidence-centred assessment design’. The main building blocks are student, evidence and task models, with tasks to be administered in accordance with an assembly model. We describe these models, show how they are linked and assembled to frame an assessment argument and illustrate points with examples from task-based language assessment.
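
To make the linkage among the building blocks concrete, here is a schematic rendering in code; the class and field names are my own illustrative simplifications of the student, evidence, task, and assembly models, not the authors' specification.

```python
from dataclasses import dataclass, field
from typing import List

# Schematic rendering of the evidence-centred design building blocks named in the abstract.
# Fields are illustrative simplifications, not the authors' specification.

@dataclass
class StudentModel:
    proficiencies: List[str]          # e.g. ["pragmatic competence", "grammatical knowledge"]

@dataclass
class TaskModel:
    setting: str                      # features of the situation presented to the examinee
    work_products: List[str]          # what the examinee says, does, or makes

@dataclass
class EvidenceModel:
    task: TaskModel
    observables: List[str]            # features scored from the work products
    targets: List[str]                # student-model variables the observables bear on

@dataclass
class AssemblyModel:
    evidence_models: List[EvidenceModel] = field(default_factory=list)

    def select_tasks(self, needed_evidence: List[str]) -> List[TaskModel]:
        """Pick tasks whose evidence models address proficiencies still needing evidence."""
        return [em.task for em in self.evidence_models
                if set(em.targets) & set(needed_evidence)]

# Purely illustrative usage:
task = TaskModel(setting="service encounter role play", work_products=["spoken response"])
em = EvidenceModel(task=task, observables=["register appropriateness"],
                   targets=["pragmatic competence"])
assembly = AssemblyModel(evidence_models=[em])
print(assembly.select_tasks(["pragmatic competence"]))
```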


Applied Measurement in Education | 2002

Making Sense of Data From Complex Assessments

Robert J. Mislevy; Linda S. Steinberg; F. Jay Breyer; Russell G. Almond; Lynn Johnson

Advances in cognitive psychology both deepen our understanding of how students gain and use knowledge and broaden the range of performances and situations we want to see to acquire evidence about their developing knowledge. At the same time, advances in technology make it possible to capture more complex performances in assessment settings by including, as examples, simulation, interactivity, and extended responses. The challenge is making sense of the complex data that result. This article concerns an evidence-centered approach to the design and analysis of complex assessments. We present a design framework that incorporates integrated structures for modeling knowledge and skills, designing tasks, and extracting and synthesizing evidence. The ideas are illustrated in the context of a project with the Dental Interactive Simulation Corporation (DISC), assessing problem solving in dental hygiene with computer-based simulations. After reviewing the substantive grounding of this effort, we describe the design rationale, statistical and scoring models, and operational structures for the DISC assessment prototype.


Applied Psychological Measurement | 1989

A consumer's guide to Logist and Bilog

Robert J. Mislevy; Martha L. Stocking

Since its release in 1976, Wingersky, Barton, and Lord's (1982) LOGIST has been the most widely used computer program for estimating the parameters of the three-parameter logistic item response model. An alternative program, Mislevy and Bock's (1983) BILOG, has recently become available. This paper compares the approaches taken by the two programs and offers some guidelines for choosing between the two programs for particular applications. Index terms: Bayesian estimation, BILOG, IRT estimation procedures, LOGIST, marginal maximum likelihood, maximum likelihood, three-parameter logistic model estimation procedures.
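
The central technical difference between the two programs can be stated compactly (standard IRT notation, not quoted from the paper): LOGIST maximizes the joint likelihood over item parameters and examinee abilities, whereas BILOG maximizes the marginal likelihood in which ability has been integrated out against a population distribution $g(\theta)$:

$$
L_{\text{joint}} \;=\; \prod_{i}\prod_{j} P_j(\theta_i)^{x_{ij}}\,\bigl[1 - P_j(\theta_i)\bigr]^{1-x_{ij}},
\qquad
L_{\text{marginal}} \;=\; \prod_{i}\int \prod_{j} P_j(\theta)^{x_{ij}}\,\bigl[1 - P_j(\theta)\bigr]^{1-x_{ij}}\, g(\theta)\, d\theta,
$$

with $P_j(\theta)$ the three-parameter logistic probability of a correct response to item $j$.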

Collaboration


Dive into Robert J. Mislevy's collaboration.

Top Co-Authors

Roy Levy

Arizona State University
