Publications


Featured research published by David J. Weiss.


Medical Care | 2007

Psychometric evaluation and calibration of health-related quality of life item banks: Plans for the Patient-Reported Outcomes Measurement Information System (PROMIS)

Bryce B. Reeve; Ron D. Hays; Jakob B. Bjorner; Karon F. Cook; Paul K. Crane; Jeanne A. Teresi; David Thissen; Dennis A. Revicki; David J. Weiss; Ronald K. Hambleton; Honghu Liu; Richard Gershon; Steven P. Reise; Jin Shei Lai; David Cella

Background: The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) project. Objectives: Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive testing. The PROMIS Steering Committee selected 5 HRQOL domains for initial focus: physical functioning, fatigue, pain, emotional distress, and social role participation. This report provides an overview of the methods used in the PROMIS item analyses and proposed calibration of item banks. Analyses: Analyses include evaluation of data quality (eg, logic and range checking, spread of response distribution within an item), descriptive statistics (eg, frequencies, means), item response theory model assumptions (unidimensionality, local independence, monotonicity), model fit, differential item functioning, and item calibration for banking. Recommendations: Key analytic issues are summarized, and recommendations are provided for future evaluations of item banks in HRQOL assessment.
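The abstract lists the IRT assumptions to be checked but does not name the calibration model itself; PROMIS item banks are typically calibrated with Samejima's graded response model, sketched below as a reference point (attributing this particular model to the paper is our assumption).

```latex
% Samejima's graded response model (GRM) for a polytomous item i
% with ordered categories k = 1, ..., m_i. Probability of responding
% in category k or higher at trait level \theta:
P^{*}_{ik}(\theta) = \frac{1}{1 + \exp[-a_i(\theta - b_{ik})]}
% Probability of responding exactly in category k, with the
% conventions P^{*}_{i1} = 1 and P^{*}_{i,m_i+1} = 0:
P_{ik}(\theta) = P^{*}_{ik}(\theta) - P^{*}_{i,k+1}(\theta)
% a_i: item discrimination; b_{ik}: ordered category thresholds.
```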


Handbook of Applied Multivariate Statistics and Mathematical Modeling | 2000

Interrater Reliability and Agreement

Howard E. A. Tinsley; David J. Weiss

Publisher Summary: This chapter examines the critical distinction between interrater reliability and interrater agreement. It reviews design issues, such as the type of replication and the level of measurement, that researchers must consider in determining how to collect and analyze ratings data. The primary purpose of the chapter, however, is to provide guidelines for selecting the appropriate statistical procedure for analyzing ratings data. Determining the proper procedure for calculating interrater reliability and interrater agreement requires consideration of the level of measurement achieved by the rating scale and the intended use of the ratings. The chapter briefly considers the desirability of using weighting schemes for transforming nominal-scale ratings to ordinal-scale ratings. Furthermore, the four variations of the intraclass correlation that have the greatest applicability in applied research are discussed. Generalizability theory provides a structured framework within which investigators can compare the alternative forms of the intraclass correlation in terms of the type of error assessed, the conceptual meaning of the reliability index, the assumptions underlying the index, the relevance of the index to their research objectives, and the costs of the index.
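As a concrete reference point for the "alternative forms of the intraclass correlation" mentioned above, one widely used form is shown below in Shrout and Fleiss's (1979) notation; whether this is among the four variations the chapter singles out is an assumption on our part.

```latex
% ICC(2,1): two-way random-effects model, single rater, absolute
% agreement. n targets are each rated by the same k raters; MS_R,
% MS_C, and MS_E are the ANOVA mean squares for targets (rows),
% raters (columns), and residual error.
\mathrm{ICC}(2,1) = \frac{MS_R - MS_E}
    {MS_R + (k - 1)\,MS_E + \tfrac{k}{n}\,(MS_C - MS_E)}
```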


Applied Psychological Measurement | 2007

Full-Information Item Bifactor Analysis of Graded Response Data

Robert D. Gibbons; R. Darrell Bock; Donald Hedeker; David J. Weiss; Eisuke Segawa; Dulal K. Bhaumik; David J. Kupfer; Ellen Frank; Victoria J. Grochocinski; Angela Stover

A plausible factorial structure for many types of psychological and educational tests exhibits a general factor and one or more group or method factors. This structure can be represented by a bifactor model. The bifactor structure results from the constraint that each item has a nonzero loading on the primary dimension and, at most, one of the group factors. The authors develop estimation procedures for fitting the graded response model when the data follow the bifactor structure. Using maximum marginal likelihood estimation of item parameters, the bifactor restriction leads to a major simplification of the likelihood equations and (a) permits analysis of models with large numbers of group factors, (b) permits conditional dependence within identified subsets of items, and (c) provides more parsimonious factor solutions than an unrestricted full-information item factor analysis in some cases. Analysis of data obtained from 586 chronically mentally ill patients revealed a clear bifactor structure.
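The restriction the abstract describes can be written compactly (notation ours, not the authors'):

```latex
% Bifactor loading pattern: item j loads on the general factor
% \theta_0 and on at most one group factor \theta_{g(j)}; every
% other loading is constrained to zero.
\mathbf{a}_j = \bigl(a_{j0},\, 0,\, \dots,\, a_{j,g(j)},\, \dots,\, 0\bigr)
% The payoff noted in the abstract: the marginal likelihood then
% factors so that numerical integration is at most two-dimensional
% (general factor plus one group factor), however many group
% factors the model contains.
```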


New Horizons in Testing: Latent Trait Test Theory and Computerized Adaptive Testing | 1983

A Comparison of IRT-Based Adaptive Mastery Testing and a Sequential Mastery Testing Procedure

G. Gage Kingsbury; David J. Weiss

Publisher Summary: This chapter presents a study comparing item response theory-based adaptive mastery testing (AMT) with a sequential mastery testing procedure. Monte Carlo simulation was used to delineate circumstances in which one of the mastery testing procedures might have an advantage over the other. The method used to compare the two variable-length mastery testing procedures, AMT and the sequential probability ratio test (SPRT), to one another and to a conventional testing procedure consisted of the following steps. (1) Four item pools were generated in which the items differed from one another to different degrees. (2) The desired mastery level on the proportion-correct metric was converted to the θ metric by means of the test response function from each item pool, as required by the AMT procedure. (3) Item responses were generated for 500 simulated subjects for each of the items in the four item pools. (4) Conventional tests of three different lengths were drawn from the larger item pools; these conventional tests served as item pools from which the SPRT and AMT procedures drew items. (5) The AMT and SPRT procedures were simulated for each of the four item pool types and the three conventional test lengths. (6) Comparisons were made among the three test types (AMT, SPRT, and conventional) concerning the degree of correspondence between the decisions made by each and true mastery status. Further comparisons were based on the average test length each test type required to reach its decisions.
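For readers who want the SPRT side of the comparison made concrete, below is a minimal sketch of Wald's sequential probability ratio test applied to a mastery decision; the response-probability and error-rate values are hypothetical, not the chapter's.

```python
import math

def sprt_mastery(responses, p0, p1, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test for a mastery decision.

    Illustrative sketch of the SPRT procedure compared in the chapter;
    all parameter values here are hypothetical. p0 and p1 are the
    probabilities of a correct response for a non-master and a master.
    Returns ('master' | 'non-master' | 'continue', items used).
    """
    upper = math.log((1.0 - beta) / alpha)   # decide "master"
    lower = math.log(beta / (1.0 - alpha))   # decide "non-master"
    log_lr = 0.0
    for i, correct in enumerate(responses, start=1):
        if correct:
            log_lr += math.log(p1 / p0)
        else:
            log_lr += math.log((1.0 - p1) / (1.0 - p0))
        if log_lr >= upper:
            return "master", i
        if log_lr <= lower:
            return "non-master", i
    return "continue", len(responses)

# Example: 7 item responses, with 60%/80% correct as the indifference zone.
print(sprt_mastery([1, 1, 0, 1, 1, 1, 1], p0=0.6, p1=0.8))
```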


Archives of General Psychiatry | 2012

Development of a Computerized Adaptive Test for Depression

Robert D. Gibbons; David J. Weiss; Paul A. Pilkonis; Ellen Frank; Tara Moore; Jong Bae Kim; David J. Kupfer

CONTEXT: Unlike other areas of medicine, psychiatry is almost entirely dependent on patient report to assess the presence and severity of disease; therefore, it is particularly crucial that we find both more accurate and efficient means of obtaining that report. OBJECTIVE: To develop a computerized adaptive test (CAT) for depression, called the Computerized Adaptive Test-Depression Inventory (CAT-DI), that decreases patient and clinician burden and increases measurement precision. DESIGN: Case-control study. SETTING: A psychiatric clinic and community mental health center. PARTICIPANTS: A total of 1614 individuals with and without minor and major depression were recruited for the study. MAIN OUTCOME MEASURES: The focus of this study was the development of the CAT-DI. The 24-item Hamilton Rating Scale for Depression, Patient Health Questionnaire 9, and the Center for Epidemiologic Studies Depression Scale were used to study the convergent validity of the new measure, and the Structured Clinical Interview for DSM-IV was used to obtain diagnostic classifications of minor and major depressive disorder. RESULTS: A mean of 12 items per study participant was required to achieve a 0.3 SE in the depression severity estimate and maintain a correlation of r = 0.95 with the total 389-item test score. Using empirically derived thresholds based on a mixture of normal distributions, we found a sensitivity of 0.92 and a specificity of 0.88 for the classification of major depressive disorder in a sample consisting of depressed patients and healthy controls. Correlations on the order of r = 0.8 were found with the other clinician and self-rating scale scores. The CAT-DI provided excellent discrimination throughout the entire depressive severity continuum (minor and major depression), whereas the traditional scales did so primarily at the extremes (eg, major depression). CONCLUSIONS: Traditional measurement fixes the number of items administered and allows measurement uncertainty to vary. In contrast, a CAT fixes measurement uncertainty and allows the number of items to vary. The result is a significant reduction in the number of items needed to measure depression and increased precision of measurement.
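The conclusion's contrast, fixing the number of items versus fixing measurement uncertainty, is the defining loop of a CAT. Below is a minimal unidimensional sketch assuming a 2PL item pool and maximum-information item selection; the actual CAT-DI is built on a multidimensional bifactor model, so this illustrates only the selection and stopping logic, not the study's algorithm.

```python
import numpy as np

def item_info(theta, a, b):
    """Fisher information of a 2PL item at ability level theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

def cat_step(theta_hat, pool, administered, se_target=0.3):
    """One step of the adaptive loop: stop once the standard error of
    the severity estimate reaches the target (0.3 in the study above),
    otherwise administer the most informative remaining item.

    `pool` is a list of (a, b) tuples; `administered` holds the
    indices of items already given.
    """
    info = sum(item_info(theta_hat, *pool[j]) for j in administered)
    se = 1.0 / np.sqrt(info) if info > 0 else np.inf
    if se <= se_target:
        return None, se                      # precision reached: stop
    remaining = [j for j in range(len(pool)) if j not in administered]
    if not remaining:                        # pool exhausted
        return None, se
    best = max(remaining, key=lambda j: item_info(theta_hat, *pool[j]))
    return best, se
```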


Archive | 1991

Item Response Theory

David J. Weiss; Michael E. Yoes

During the past 30 years or so, a new theoretical basis for educational and psychological testing and measurement has emerged. It has been variously referred to as latent trait theory, item characteristic curve theory, and, more recently, item response theory (IRT). Although this new test theory holds considerable promise as a successor to classical test theory, it has been underutilized by test practitioners. One important reason for this underutilization is that many test developers have not had sufficient time to devote to the study of the technical and mathematical intricacies involved in this new test theory and its mathematical models. This chapter is intended as an overview of IRT for individuals with some background in the basic methods of classical test theory. Readers are referred to Hambleton (1989) and Hambleton and Swaminathan (1985) for other overviews of IRT.
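As one example of the models such an overview covers, the three-parameter logistic (3PL) model gives the probability of a correct response to item i as a function of ability θ:

```latex
% Three-parameter logistic (3PL) IRT model.
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + \exp[-a_i(\theta - b_i)]}
% a_i: discrimination; b_i: difficulty; c_i: lower asymptote
% (pseudo-guessing). Setting c_i = 0 gives the 2PL; additionally
% fixing a_i = 1 gives the one-parameter (Rasch) model.
```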


Applied Psychological Measurement | 1984

Relationship between Corresponding Armed Services Vocational Aptitude Battery (ASVAB) and Computerized Adaptive Testing (CAT) Subtests

Kathleen E. Moreno; C. Douglas Wetzel; James R. McBride; David J. Weiss

The relationships between selected subtests from the Armed Services Vocational Aptitude Battery (ASVAB) and corresponding subtests administered as computerized adaptive tests (CAT) were investigated using Marine recruits as subjects. Three adaptive subtests were shown to correlate as well with ASVAB as did a second administration of ASVAB, even though the CAT subtests contained only half the number of items. Factor analysis showed the CAT subtests to load on the same factors as the corresponding ASVAB subtests, indicating that the same abilities were being measured. The preenlistment Armed Forces Qualification Test (AFQT) composite scores were predicted as well from the CAT subtest scores as from the retest ASVAB subtest scores, even though the CAT contained only three of the four AFQT subtests. It is concluded that CAT can achieve the same measurement precision as a conventional test, with half the number of items.


New Horizons in Testing: Latent Trait Test Theory and Computerized Adaptive Testing | 1983

The Person Response Curve: Fit of Individuals to Item Response Theory Models

Tom E. Trabin; David J. Weiss

Publisher Summary: This chapter discusses the feasibility of the person response curve (PRC) approach for investigating the fit of persons to the three-parameter item response theory (IRT) model. To operationalize the PRC, it subdivides ability test items into separate strata of varying difficulty levels. The limited literature on person variability within a test thus seems to have three major trends: (1) the direct analysis of person variability as originally suggested by Mosier, later called the testee's trace line by Weiss, the subject characteristic curve by Vale and Weiss, and the person characteristic curve by Lumsden; (2) the designation of highly variable persons as aberrant by Levine and Rubin; and (3) the elimination of aberrant person–item interactions by Wright. A careful analysis of these three approaches indicates that the first is the most general, subsuming the other two as special cases: if the entire pattern of a testee's responses is studied as a function of the difficulty levels of the items, the identification of aberrant response patterns or person–item interactions follows directly. In addition, postulating a person characteristic curve in conjunction with IRT provides a means of testing whether the response patterns of single individuals fit the theory, regardless of the number of parameters assumed.
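A minimal sketch of the stratification step follows (our construction from the chapter's description; the stratum count and inputs are hypothetical).

```python
import numpy as np

def person_response_curve(responses, difficulties, n_strata=5):
    """Observed person response curve for a single examinee.

    Sketch of the approach described above: sort items by difficulty,
    split them into strata, and compute the proportion correct within
    each stratum. Under a fitting IRT model the curve should decline
    as difficulty increases; markedly non-monotone curves flag
    aberrant (misfitting) response patterns.
    `responses` is a 0/1 vector; `difficulties` holds the items'
    difficulty (b) parameters.
    """
    responses = np.asarray(responses)
    order = np.argsort(difficulties)            # easiest -> hardest
    strata = np.array_split(order, n_strata)
    return [responses[s].mean() for s in strata]
```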


Educational and Psychological Measurement | 1973

A Study of the Stability of Canonical Correlations and Canonical Components

Robert M. Thorndike; David J. Weiss

Over the last 10 years considerable progress has been made in the theory and availability of canonical correlation analysis. Cooley and Lohnes (1962) drew attention to the method and provided one of the early programs for performing canonical analyses. Horst (1961) generalized Hotelling's (1936) solution for the two-set case to encompass m sets of variables. This development opened the way for Cooley's solution for "multiple partial canonical correlation" (Cooley, 1967), in which the effect of one (or more) set of variables is partialed out of the relationship between two other sets. In addition, Meredith (1964) has shown how canonical analysis may be performed on the true-score components of data that are less than perfectly reliable. The number of empirical examples of canonical correlation in the literature has not kept pace with the theoretical advances. Cooley (1965) suggested three possible reasons for this: (1) the difficulty of computation; (2) the availability of other, more familiar, methods for studying the relationships between two sets of variables; …
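For reference, the quantity under study, in standard notation (not taken from the article itself):

```latex
% First canonical correlation between variable sets X and Y: the
% maximal correlation achievable by linear composites of each set.
\rho_1 = \max_{\mathbf{a},\,\mathbf{b}}
    \operatorname{corr}\!\bigl(\mathbf{a}^{\top}X,\; \mathbf{b}^{\top}Y\bigr)
% Successive canonical correlations solve the same problem with the
% new composites uncorrelated with all earlier pairs; the squared
% canonical correlations are the eigenvalues of
\Sigma_{XX}^{-1}\Sigma_{XY}\,\Sigma_{YY}^{-1}\Sigma_{YX}
```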


Journal of Vocational Behavior | 1972

Prediction of individual job termination from measured job satisfaction and biographical data.

Kenneth Taylor; David J. Weiss

Abstract: The Minnesota Satisfaction Questionnaire (MSQ) was administered to a group of 475 employees of a discount store chain at the same time that biographical data were collected. After a lapse of one year, personnel records indicated that about 20% of the employees had terminated. “Leavers” were significantly less satisfied on 10 of the 27 MSQ scales and differed from “stayers” on 3 of the 11 biographical items. Several discriminant functions were developed, using sets of biographical data alone, the MSQ scales alone, and both sets of predictors in combination, to predict termination. The MSQ scales alone resulted in the greatest improvement in the hit rate for predicting “leave” in the cross-validation group.
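A minimal sketch of the analysis pattern described above, fitting a discriminant function and checking its hit rate in a cross-validation group, using synthetic stand-in data (the sample size and base rate mirror the abstract; everything else is hypothetical):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(475, 27))        # stand-in for the 27 MSQ scale scores
y = rng.random(475) < 0.20            # ~20% "leavers", as in the study

# Fit on one half, evaluate the hit rate on the held-out half
# (the cross-validation group).
X_fit, X_val, y_fit, y_val = train_test_split(
    X, y, test_size=0.5, random_state=0
)
lda = LinearDiscriminantAnalysis().fit(X_fit, y_fit)
hit_rate = (lda.predict(X_val) == y_val).mean()
print(f"cross-validation hit rate: {hit_rate:.2f}")
```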

Collaboration


Dive into David J. Weiss's collaboration.

Top Co-Authors

Ellen Frank, University of Pittsburgh
Tara Moore, University of Pittsburgh
Robert M. Thorndike, Western Washington University