Susan E. Embretson
Georgia Institute of Technology
Publications
Featured research published by Susan E. Embretson.
Psychological Assessment | 1996
Susan E. Embretson
In an ever-changing world, psychological testing remains the flagship of applied psychology. Although both the context of application and the legal guidelines for using tests have changed, psychological tests themselves have been relatively stable. Many historically valued tests, in somewhat revised forms, remain in active current use. Further, although several new tests have been developed in response to contemporary needs in applied psychology, the principles underlying test development have remained constant. Or have they? Classical test theory has served test development well over several decades. Gulliksen's (1950) classic book, reprinted even in the 1990s, is often cited as the defining volume. However, classical test theory is much older. Many procedures were pioneered by Spearman (1907, 1913). Most psychologists should, and in fact do, know its principles. In some graduate programs, classical test theory is presented in a separate course that is required for applied psychologists and elective for other areas. In other graduate programs, classical test theory is part of the basic curriculum in testing methods courses for clinical, counseling, industrial-organizational, and school psychologists. However, since Lord and Novick's (1968) classic book introduced model-based measurement, a quiet revolution has occurred in test theory. Model-based measurement, known as item response theory (IRT) or latent trait theory, has rapidly become mainstream as a theoretical basis for psychological measurement. Increasingly, tests are developed from model-based measurement not only because the theory is more plausible but also because the potential to solve practical testing problems is greater. A large family of diverse IRT models is now available to apply to an assortment of measurement tasks. IRT applications to available tests are likely to increase. Although the early IRT models emphasized dichotomous item formats (e.g., the Rasch model), extensions to other item formats, such as rating scales (Andrich, 1982) and partial credit scoring (Masters, 1982), are now available.
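For reference, the dichotomous Rasch model named in this abstract gives the probability of a correct response as a logistic function of the difference between person ability and item difficulty (the notation below is illustrative rather than taken from the article):

```latex
% Dichotomous Rasch model: person j, item i
P(X_{ij} = 1 \mid \theta_j, b_i) = \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)}
```

Rating scale and partial credit models extend this form to polytomous item responses by adding category threshold parameters.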
Psychometrika | 1991
Susan E. Embretson
A latent trait model is presented for the repeated measurement of ability based on a multidimensional conceptualization of the change process. A simplex structure is postulated to link item performance under a given measurement condition or occasion to initial ability and to one or more modifiabilities that represent individual differences in change. Since item discriminations are constrained to be equal within a measurement condition, the model belongs to the family of multidimensional Rasch models. Maximum likelihood estimators of the item parameters and abilities are derived, and an example is provided that shows good recovery of both item and ability parameters. Properties of the model are explored, particularly for several classical issues in measuring change.
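A hedged sketch of the structure described here, assuming the usual presentation in which performance at occasion k depends on initial ability plus the modifiabilities accumulated through that occasion (symbols are illustrative, not taken from the paper):

```latex
% Multidimensional Rasch model for change: person j, item i, occasion k
P(X_{ijk} = 1) = \frac{\exp\!\left(\sum_{m=1}^{k} \theta_{jm} - b_i\right)}
                      {1 + \exp\!\left(\sum_{m=1}^{k} \theta_{jm} - b_i\right)},
\qquad \theta_{j1} = \text{initial ability}, \quad \theta_{j2}, \ldots, \theta_{jk} = \text{modifiabilities}
```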
Psychometrika | 1984
Susan E. Embretson
The purpose of the current paper is to propose a general multicomponent latent trait model (GLTM) for response processes. The proposed model combines the linear logistic latent trait model (LLTM) with the multicomponent latent trait model (MLTM). As with both LLTM and MLTM, the general multicomponent latent trait model can be used to (1) test hypotheses about the theoretical variables that underlie response difficulty and (2) estimate parameters that describe test items by basic substantive properties. However, GLTM contains both component outcomes and complexity factors in a single model and may be applied to data that neither LLTM nor MLTM can handle. Joint maximum likelihood estimators are presented for the parameters of GLTM, and an application to cognitive test items is described.
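As a rough sketch of how the two parent models combine (notation is illustrative, not taken from the paper): MLTM treats item success as the product of independent component successes, LLTM decomposes difficulty into weighted complexity factors, and GLTM applies an LLTM-style decomposition to each component difficulty:

```latex
% Sketch: person j, item i, components k, complexity factors m
P(X_{ij} = 1) = \prod_{k} \frac{\exp(\theta_{jk} - b_{ik})}{1 + \exp(\theta_{jk} - b_{ik})},
\qquad b_{ik} = \sum_{m} q_{ikm}\, \eta_{km}
```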
Psychometrika | 1999
Susan E. Embretson
On-line item generation is becoming increasingly feasible for many cognitive tests. Item generation seemingly conflicts with the well-established principle of measuring persons from items with known psychometric properties. This paper examines the psychometric principles and models required for measurement from on-line item generation. Three psychometric issues are elaborated for item generation. First, design principles to generate items are considered. A cognitive design system approach is elaborated and then illustrated with an application to a test of abstract reasoning. Second, psychometric models for calibrating generating principles, rather than specific items, are required. Existing item response theory (IRT) models are reviewed, and a new IRT model that includes the impact on item discrimination, as well as difficulty, is developed. Third, the impact of item parameter uncertainty on person estimates is considered. Results from both fixed content and adaptive testing are presented.
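One plausible form for an IRT model in which both difficulty and discrimination are driven by generative design features, consistent with the description above but with illustrative notation rather than the paper's own:

```latex
% Item features q_{im} drive difficulty (weights eta) and discrimination (weights tau)
P(X_{ij} = 1) = \frac{\exp\!\big(a_i (\theta_j - b_i)\big)}{1 + \exp\!\big(a_i (\theta_j - b_i)\big)},
\qquad b_i = \sum_m q_{im}\, \eta_m, \quad a_i = \sum_m q_{im}\, \tau_m
```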
Applied Psychological Measurement | 1987
Susan E. Embretson; C. Douglas Wetzel
The cognitive characteristics of paragraph comprehension items were studied by comparing models that deal with two general processing stages: text representation and response decision. The models that were compared included the propositional structure of the text (Kintsch & van Dijk, 1978), various counts of surface structure variables and word frequency (Drum et al., 1981), a taxonomy of levels of text questions (Anderson, 1972), and some new models that combine features of these models. Calibrations from the linear logistic latent trait model allowed evaluation of the impact of the cognitive variables on item responses. The results indicate that successful prediction of item difficulty is obtained from models with wide representation of both text and decision processing. This suggests that items can be screened for processing difficulty prior to being administered to examinees. However, the results also have important implications for test validity in that the two processing stages involve two different ability dimensions.
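As a minimal illustration of the general approach, and not the paper's actual LLTM calibration, the sketch below regresses item difficulties on hypothetical text and decision features; all feature names and values are invented for the example.

```python
# Minimal sketch (assumption: illustrative data, not from the study).
# Approximates the idea of predicting item difficulty from cognitive
# features with ordinary least squares; LLTM itself calibrates the
# feature weights directly inside the item response model.
import numpy as np

rng = np.random.default_rng(0)
n_items = 40

# Hypothetical feature values per item: propositional density,
# a (reversed) word-frequency index, and question taxonomy level.
X = np.column_stack([
    rng.poisson(8, n_items),          # propositions in the text
    rng.normal(0, 1, n_items),        # low-frequency vocabulary index
    rng.integers(1, 4, n_items),      # question taxonomy level (decision stage)
])
true_weights = np.array([0.10, 0.40, 0.55])
b = X @ true_weights + rng.normal(0, 0.3, n_items)   # simulated item difficulties

# Ordinary least squares with an intercept.
X1 = np.column_stack([np.ones(n_items), X])
weights, *_ = np.linalg.lstsq(X1, b, rcond=None)
pred = X1 @ weights
r2 = 1 - np.sum((b - pred) ** 2) / np.sum((b - b.mean()) ** 2)
print("estimated weights:", np.round(weights, 2))
print("R^2 for difficulty prediction:", round(r2, 2))
```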
Intelligence | 1995
Susan E. Embretson
Although both general control processing (i.e., global metacomponents) and working memory capacity have emerged as primary explanations of abstract intelligence, their relative impact on individual differences rarely has been compared. This study examines the impact of general control processing and working memory capacity on an important measurement task for abstract intelligence. A new multicomponent latent trait model (MLTM) for covert responses was applied to item response data for matrix problems. With this model, working memory capacity could be separated from general control processing because item processing requirements for the former vary systematically across items (Carpenter, Just, & Shell, 1990), whereas the latter remains constant. Structural equation modeling indicated that both processing abilities were significant sources of individual differences. However, general control processing had a stronger impact. The results, and their limitations, are discussed in the context of prior theory and research.
Applied Psychological Measurement | 1996
Susan E. Embretson
In many psychological experiments, interaction effects in factorial analysis of variance (ANOVA) designs are often estimated using total scores derived from classical test theory. However, interaction effects can be reduced or eliminated by nonlinear monotonic transformations of a dependent variable. Although cross-over interactions cannot be eliminated by transformations, the meaningfulness of other interactions hinges on achieving a measurement scale level for which nonlinear transformations are inappropriate (i.e., at least interval scale level). Classical total test scores do not provide interval level measurement according to contemporary item response theory (IRT). Nevertheless, rarely are IRT models applied to achieve more optimal measurement properties and hence more meaningful interaction effects. This paper provides several conditions under which interaction effects that are estimated from classical total scores, rather than IRT trait scores, can be misleading. Using derived asymptotic expectations from an IRT model, interaction effects of zero on the IRT trait scale were often not estimated as zero from the total score scale. Further, when nonzero interactions were specified on the IRT trait scale, the estimated interaction effects were biased inward when estimated from the total score scale. Test difficulty level determined both the direction and the magnitude of the biased interaction effects. Index terms: factorial designs, interaction effects, interval measurement, item response theory, level of measurement, measurement scales, statistical inference.
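The core point can be reproduced with a small sketch (an illustration under assumed values, not the paper's derivation): cell means that are perfectly additive on the latent trait scale yield a nonzero interaction contrast once they are mapped through the expected total score of a difficult test.

```python
# Hedged illustration: an additive 2x2 design on the latent trait scale
# shows a nonzero interaction after scores are passed through a test
# characteristic curve (expected number-correct score) of a hard test.
import numpy as np

# Difficulties of a hypothetical 20-item Rasch-scaled test.
item_difficulties = np.linspace(0.5, 2.5, 20)

def expected_total(theta):
    """Expected number-correct score for ability theta."""
    return np.sum(1.0 / (1.0 + np.exp(-(theta - item_difficulties))))

# Cell means on the trait scale: purely additive (interaction = 0).
theta_means = {
    ("A1", "B1"): 0.0,
    ("A1", "B2"): 0.5,
    ("A2", "B1"): 1.0,
    ("A2", "B2"): 1.5,   # 0.0 + 0.5 + 1.0, no interaction term
}
totals = {cell: expected_total(t) for cell, t in theta_means.items()}

# 2x2 interaction contrast: (A2B2 - A2B1) - (A1B2 - A1B1).
interaction_theta = (1.5 - 1.0) - (0.5 - 0.0)
interaction_total = (totals[("A2", "B2")] - totals[("A2", "B1")]) - \
                    (totals[("A1", "B2")] - totals[("A1", "B1")])
print("interaction on trait scale:", interaction_theta)            # 0.0
print("interaction on total-score scale:", round(interaction_total, 3))
```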
American Psychologist | 2006
Susan E. Embretson
H. Blanton and J. Jaccard examined the arbitrariness of metrics in the context of 2 current issues: (a) the measurement of racial prejudice and (b) the establishment of clinically significant change. According to Blanton and Jaccard, although research findings are not undermined by arbitrary metrics, individual scores and score changes may not be meaningfully interpreted. The author believes that their points are mostly valid and that their examples were appropriate. However, Blanton and Jaccard's article does not lead directly to solutions, nor does it adequately describe the scope of the metric problem. This article has 2 major goals. First, some prerequisites for nonarbitrary metrics are presented and related to Blanton and Jaccard's issues. Second, the impact of arbitrary metrics on psychological research findings is described. In contrast to Blanton and Jaccard (2006), research findings suggest that metrics have a direct impact on statistics for group comparisons and trend analysis.
Archive | 1991
Isaac I. Bejar; Roger Chaffin; Susan E. Embretson
The major objective of the investigation presented in this book is to assess the validity of analogies, a component of the GRE General Test, from a perspective other than the prediction of grade-point averages. The text examines a very practical problem in test construction: the apparent inability of item writers to regularly and predictably construct verbal items in general, and analogy items in particular, that are both difficult and sufficiently discriminating. The authors demonstrate that the incorporation of results from the cognitive laboratory into the test development process is a natural step and should be attempted.
Applied Psychological Measurement | 2006
Joanna S. Gorin; Susan E. Embretson
Recent assessment research joining cognitive psychology and psychometric theory has introduced a new technology, item generation. In algorithmic item generation, items are systematically created based on specific combinations of features that underlie the processing required to correctly solve a problem. Reading comprehension items have been more difficult to model than other item types due to the complexities of quantifying text. However, recent developments in artificial intelligence for text analysis permit quantitative indices to represent cognitive sources of difficulty. The current study attempts to identify generative components for the Graduate Record Examination paragraph comprehension items through the cognitive decomposition of item difficulty. Text comprehension and decision processes accounted for a significant amount of the variance in item difficulties. The decision model variables contributed significantly to variance in item difficulties, whereas the text representation variables did not. Implications for score interpretation and future possibilities for item generation are discussed. Index terms: difficulty modeling, construct validity, comprehension tests, item generation