Publication


Featured research published by Gregory Camilli.


Journal of Educational and Behavioral Statistics | 1999

Application of a Method of Estimating DIF for Polytomous Test Items

Gregory Camilli; Peter Congdon

In this paper, a method for studying DIF is demonstrated that can be used with either dichotomous or polytomous items. The method is shown to be valid for data that follow a partial credit IRT model. It is also shown that logistic regression gives results equivalent to those of the proposed method. In a simulation study, positively biased Type I error rates of the method are shown to be in accord with results from previous studies; however, the size of the bias in the log odds is moderate. Finally, it is demonstrated how these statistics can be used to study DIF variability with the method of Longford, Holland, and Thayer (1993).
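The equivalence with logistic regression noted above suggests a simple way such a DIF screen can be run. The sketch below is a hedged illustration, not the article's procedure or data: it simulates one dichotomous item, uses an observed proxy score as the matching criterion, and compares nested logistic models with and without a group term via a likelihood-ratio test. All variable names and parameter values are invented.

```python
# Illustrative logistic-regression DIF screen for one dichotomous item
# (simulated data; not the article's method or results).
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)             # 0 = reference group, 1 = focal group
theta = rng.normal(0, 1, n)               # latent ability
total = theta + rng.normal(0, 0.5, n)     # observed matching score (proxy for ability)

# Item with a small uniform DIF effect against the focal group
logit = 1.2 * theta - 0.3 * group
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Compact model: item response explained by the matching score only
X0 = sm.add_constant(np.column_stack([total]))
m0 = sm.Logit(y, X0).fit(disp=0)

# Augmented model: add the group indicator (uniform DIF term)
X1 = sm.add_constant(np.column_stack([total, group]))
m1 = sm.Logit(y, X1).fit(disp=0)

lr = 2 * (m1.llf - m0.llf)                # likelihood-ratio statistic, 1 df
p = stats.chi2.sf(lr, df=1)
print(f"LR = {lr:.2f}, p = {p:.4f}, DIF effect (log-odds) = {m1.params[2]:+.3f}")
```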


Reading Psychology | 2010

ADDRESSING SUMMER READING SETBACK AMONG ECONOMICALLY DISADVANTAGED ELEMENTARY STUDENTS

Richard L. Allington; Anne McGill-Franzen; Gregory Camilli; Lunetta M. Williams; Jennifer M. Graff; Jacqueline Love Zeig; Courtney Zmach; Rhonda Nowak

Much research has established the contribution of summer reading setback to the reading achievement gap between children from more and less economically advantaged families. Likewise, summer reading activity, or the lack of it, has been linked to summer setback. Finally, family socioeconomic status has been linked to the access children have to books in their homes and neighborhoods. Thus, in this longitudinal experimental study we tested the hypothesis that providing elementary school students from low-income families with a supply of self-selected trade books would ameliorate summer reading setback. Accordingly, 852 students from 17 high-poverty schools were randomly selected to receive a supply of self-selected trade books on the final day of school over a 3-year period, and 478 randomly selected students from these same schools received no books and served as the control group. No intervention effort beyond the book distribution was provided in this study. Outcomes on the state reading assessment indicated a statistically significant effect (p = .015) of providing access to books for summer reading, with an effect size of d = .14. Slightly larger effects (d = .21) were found when comparing the achievement of the most economically disadvantaged students in the treatment and control groups.
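For readers unfamiliar with the effect sizes reported above, the snippet below shows how a standardized mean difference (Cohen's d) of that kind is computed from treatment and control scores. It is purely illustrative: the scores and the pooled-SD formula here are assumptions, not the study's data or analysis.

```python
# Minimal illustration of Cohen's d for a treatment/control contrast
# (invented scores; not the study's analysis).
import numpy as np

def cohens_d(treatment, control):
    nt, nc = len(treatment), len(control)
    pooled_var = ((nt - 1) * np.var(treatment, ddof=1) +
                  (nc - 1) * np.var(control, ddof=1)) / (nt + nc - 2)
    return (np.mean(treatment) - np.mean(control)) / np.sqrt(pooled_var)

rng = np.random.default_rng(1)
scores_t = rng.normal(0.14, 1.0, 852)   # hypothetical standardized reading scores
scores_c = rng.normal(0.00, 1.0, 478)
print(f"d = {cohens_d(scores_t, scores_c):.2f}")
```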


Journal of Educational and Behavioral Statistics | 1984

Accounting for Statistical Artifacts in Item Bias Research.

Lorrie A. Shepard; Gregory Camilli; David M. Williams

Theoretically preferred IRT bias detection procedures were applied to both a mathematics achievement test and a vocabulary test. The data were from black and white seniors on the High School and Beyond data files. To account for statistical artifacts, each analysis was repeated on randomly equivalent samples of blacks and whites (n’s = 1,500). Furthermore, to establish a baseline for judging bias indices that might be attributable only to sampling fluctuations, bias analyses were conducted comparing randomly selected groups of whites. To assess the effect of mean group differences on the appearance of bias, pseudo-ethnic groups were created, that is, samples of whites were selected to simulate the average black-white difference. The validity and sensitivity of the IRT bias indices were supported by several findings. A relatively large number of items (10 of 29) on the math test were found to be consistently biased; they were replicated in parallel analyses. The bias indices were substantially smaller in white-white analyses. Furthermore, the indices (with the possible exception of χ²) did not find bias in the pseudo-ethnic comparison. The pattern of between-study correlations showed high consistency for parallel ethnic analyses where bias was plausibly present. Also, the indices met the discriminant validity test: the correlations were low between conditions where bias should not be present. For the math test, where a substantial number of items appeared biased, the results were interpretable. Verbal math problems were systematically more difficult for blacks. Overall, the sums-of-squares statistics (weighted by the inverse of the variance errors) were judged to be the best indices for quantifying ICC differences between groups. Not only were these statistics the most consistent in detecting bias in the ethnic comparisons, but they also intercorrelated the least in situations of no bias.
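To make the notion of quantifying ICC differences between groups concrete, here is a minimal sketch of a sum-of-squares index between two estimated 3PL item characteristic curves. The item parameters, theta grid, and density weighting are invented for illustration; the article's indices use weights based on inverse error variances, which are not reproduced here.

```python
# Illustrative weighted sum-of-squares difference between two groups'
# estimated 3PL ICCs for one item (invented parameters and weighting).
import numpy as np
from scipy.stats import norm

def icc_3pl(theta, a, b, c):
    return c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))

theta = np.linspace(-4, 4, 161)
weights = norm.pdf(theta)                      # stand-in weighting function
p_ref = icc_3pl(theta, a=1.1, b=0.0, c=0.2)    # hypothetical reference-group estimates
p_foc = icc_3pl(theta, a=1.1, b=0.4, c=0.2)    # same item, focal-group estimates

sos_index = np.sum(weights * (p_ref - p_foc) ** 2) / np.sum(weights)
print(f"weighted sum-of-squares ICC difference: {sos_index:.4f}")
```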


Journal of Educational and Behavioral Statistics | 1981

Comparison of Procedures for Detecting Test-Item Bias with Both Internal and External Ability Criteria

Lorrie A. Shepard; Gregory Camilli; Marilyn Averill

Test bias is conceptualized as differential validity. Statistical techniques for detecting biased items work by identifying items that may be measuring different things for different groups; they identify deviant or anomalous items in the context of other items. The conceptual basis and technical soundness were reviewed for the following item bias methods: transformed item difficulties, item discriminations, one- and three-parameter item characteristic curve methods, and chi-square methods. Sixteen bias indices representing these approaches were computed for black-white and Chicano-white comparisons on both the verbal and nonverbal Lorge-Thorndike Intelligence Tests. In addition, bias indices were recomputed for the Lorge-Thorndike tests using an external criterion. Convergent validity among bias methods was examined in correlation matrices, by factor analysis of the method correlations, and by ratios of agreements in the items found to be “most biased” by each method. Although evidence of convergent validity was found, there were still important practical differences in the items identified as biased by different methods. The signed full chi-square procedure may be an acceptable substitute for the theoretically preferred but more costly three-parameter signed indices. The external criterion results also reflect on the validity of the methods; arguments were advanced, however, as to why internal bias methods should not be thought of as proxies for a predictive validity model of unbiasedness.
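As an illustration of one of the reviewed approaches, the sketch below implements the transformed item difficulties (delta-plot) idea: per-group proportions correct are converted to ETS deltas, and items far from the major axis of the reference/focal scatter stand out as potentially biased. The p-values and parameter choices are assumptions for demonstration, not the study's data or its exact index.

```python
# Hedged sketch of the transformed item difficulties (delta-plot) approach
# with invented per-group p-values.
import numpy as np
from scipy.stats import norm

def delta(p):
    # ETS delta scale: harder items (lower p) get larger deltas
    return 13.0 + 4.0 * norm.ppf(1.0 - np.asarray(p))

p_ref = np.array([0.85, 0.70, 0.62, 0.55, 0.40])   # hypothetical p-values, reference group
p_foc = np.array([0.80, 0.61, 0.57, 0.38, 0.33])   # hypothetical p-values, focal group
x, y = delta(p_ref), delta(p_foc)

# Major (principal) axis of the delta scatter
sx2, sy2, sxy = np.var(x), np.var(y), np.cov(x, y, bias=True)[0, 1]
b = (sy2 - sx2 + np.sqrt((sy2 - sx2) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
a = y.mean() - b * x.mean()

# Signed perpendicular distance of each item from the axis; large values flag items
d = (b * x - y + a) / np.sqrt(1 + b ** 2)
for i, di in enumerate(d):
    print(f"item {i + 1}: distance from major axis = {di:+.2f}")
```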


Applied Psychological Measurement | 1992

A conceptual analysis of differential item functioning in terms of a multidimensional item response model

Gregory Camilli

Differential item functioning (DIF) has been informally conceptualized as multidimensionality. Recently, more formal descriptions of DIF as multidimensionality have become available in the item response theory literature. This approach assumes that DIF is not a difference in the item parameters of two groups; rather, it is a shift in the distribution of ability along a secondary trait that influences the probability of a correct item response. That is, one group is relatively more able on an ability such as test-wiseness. The parameters of the secondary distribution are confounded with item parameters by unidimensional DIF detection models, and this manifests as differences between estimated item parameters. However, DIF is confounded with impact in multidimensional tests, which may be a serious limitation of unidimensional detection methods in some situations. In the multidimensional approach, DIF is considered to be a function of the educational histories of the examinees. Thus, a better tool for understanding DIF may be provided through structural modeling with external variables that describe background and schooling experience.
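A small simulation can make this conceptualization concrete. In the hedged sketch below (all parameter values invented), the item parameters are identical for the two groups, but the focal group has a lower mean on a secondary trait the item also taps; conditioning on the primary ability alone then shows the focal group answering less often correctly, i.e., apparent DIF without any group difference in item parameters.

```python
# Sketch of the multidimensional view of DIF: same item parameters for both
# groups, but a group difference on a secondary trait (invented values).
import numpy as np

rng = np.random.default_rng(2)
n = 5000
group = rng.integers(0, 2, n)               # 0 = reference, 1 = focal
theta1 = rng.normal(0, 1, n)                # primary (target) ability: same distribution
theta2 = rng.normal(-0.5 * group, 1)        # secondary trait: focal group shifted down

# One compensatory two-dimensional item, identical parameters for everyone
a1, a2, b = 1.0, 0.8, 0.0
p = 1 / (1 + np.exp(-(a1 * theta1 + a2 * theta2 - b)))
y = rng.binomial(1, p)

# Condition on the primary ability only (a crude unidimensional matching):
# within narrow theta1 strata, the focal group still answers correctly less often.
for lo, hi in [(-0.5, 0.0), (0.0, 0.5)]:
    m = (theta1 >= lo) & (theta1 < hi)
    p_ref = y[m & (group == 0)].mean()
    p_foc = y[m & (group == 1)].mean()
    print(f"theta1 in [{lo}, {hi}): P(correct) ref = {p_ref:.2f}, focal = {p_foc:.2f}")
```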


Applied Psychological Measurement | 1993

Scale shrinkage in vertical equating

Gregory Camilli; Kentaro Yamamoto; Ming-mei Wang

As an alternative to equipercentile equating in the area of multilevel achievement test batteries, item response theory (IRT) vertical equating has produced unexpected results. When expanded standard scores were obtained to link the Comprehensive Test of Basic Skills and the California Achievement Test, the variance of test scores diminished both within particular grade levels from fall to spring, and also from lower to upper grade levels. Equipercentile equating, on the other hand, has resulted in increasing variance both within and across grade levels, although the increases are not linear across grade levels. Three potential causes of scale shrinkage are discussed, and a more comprehensive, model-based approach to establishing vertical scales is described. Test data from the National Assessment of Educational Progress were used to estimate the distribution of ability at grades 4, 8, and 12 for several math achievement subtests. For each subtest, the variance of scores increased from grade 4 to grade 8; however, beyond grade 8 the results were not uniform.


Journal of Educational and Behavioral Statistics | 1994

Origin of the Scaling Constant d = 1.7 in Item Response Theory

Gregory Camilli

Cox (1970) observed that the most apparent method of scaling the logistic function to coincide with the normal distribution function is to standardize the logistic variable, which is done by multiplying x by π/√3 = 1.81380. Johnson and Kotz (1970) graphically showed that the scaling could be improved by the factor (π/√3)(15/16) = 1.70044. However, Haley (1952) outlined the theoretical derivation of d = 1.702. Because the use of d is widespread and Haley's (1952) unpublished work is not easily accessible, the derivation of d is re-presented in this brief note to provide access for curious researchers, instructors, and students.


Journal of Educational and Behavioral Statistics | 1988

Scale Shrinkage and the Estimation of Latent Distribution Parameters.

Gregory Camilli

In this paper the phenomenon of scale shrinkage is examined. Specifically, emphasis is placed on the pattern of decreasing variances in IRT scale scores from fall to spring within a grade. It is concluded that certain situations exist in which scale shrinkage is predictable with unidimensional tests. It depends, to a large degree, on the match between item difficulties and the level of examinee ability. As the mismatch increases, so do distortions of scale because of systematic estimation errors (bias), measurement errors, and unobtainable ability estimates. These problems exist for all observed or estimated scores; however, it is shown in this paper that questions concerning the population distributions of true ability can potentially be addressed with empirical Bayes techniques.


Journal of Educational and Behavioral Statistics | 1994

Teacher’s Corner: Origin of the Scaling Constant d = 1.7 in Item Response Theory

Gregory Camilli

The scaling constant d = 1.702 used in item response theory minimizes the maximum difference between the normal and logistic distribution functions. The theoretical and numerical derivation of d given by Haley (1952) is briefly recapitulated to provide access to curious researchers, instructors, and students.
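The minimax property stated in the abstract can be verified numerically. The short check below is an illustration, not Haley's derivation: it searches for the d that minimizes the maximum absolute difference between the standard normal CDF and the logistic function, and recovers a value close to 1.702.

```python
# Numerical check: d = 1.702 approximately minimizes
# max_x | Phi(x) - 1 / (1 + exp(-d * x)) |.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def max_abs_diff(d, x=np.linspace(-10, 10, 20001)):
    return np.max(np.abs(norm.cdf(x) - 1 / (1 + np.exp(-d * x))))

res = minimize_scalar(max_abs_diff, bounds=(1.5, 1.9), method="bounded")
print(f"optimal d ≈ {res.x:.4f}, max difference ≈ {res.fun:.5f}")
# Prints a value of d close to 1.702, with a maximum difference below 0.01.
```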


Large-scale Assessments in Education | 2013

An item response theory approach to longitudinal analysis with application to summer setback in preschool language/literacy

Sunhee Kim; Gregory Camilli

Background: As the popularity of classroom observations has increased, they have been implemented in many longitudinal studies with large probability samples. Given the complexity of longitudinal measurements, there is a need for tools to investigate both growth and the properties of the measurement scale. Methods: A practical IRT model with an embedded growth model is illustrated to examine the psychometric characteristics of classroom assessments for preschool children, and also to show how nonlinear learning over time can be investigated. This approach is applied to data collected for the Academic Rating Scale (ARS) in the literacy domain, which was administered on four occasions over two years. Results: The model enabled an effective illustration of overall and individual gains over two academic years. In particular, a significant deceleration in latent literacy skills during summer was observed. The results also provided psychometric support for the argument that ARS literacy can be used to assess developmental skill levels consistent with theories of early literacy acquisition. Conclusions: The proposed IRT approach provides growth parameters that are estimated directly, rather than obtained from estimated growth scores, which may result in biased and inconsistent estimates of growth parameters. The model is also capable of simultaneously representing parameters of items and persons.
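To show the structure of such a model (not the authors' estimation code), the sketch below generates item responses from a Rasch-type model whose person parameters follow an embedded growth curve in which growth is credited only to time in school, so latent literacy stays flat across the summer. All sample sizes, occasions, and parameter values are invented for illustration.

```python
# Structural sketch of a Rasch model with an embedded growth curve
# (invented values; summer contributes no growth in this toy version).
import numpy as np

rng = np.random.default_rng(3)
n_children, n_items = 200, 10
occasions = np.array([0.0, 1.0, 1.2, 2.2])     # fall, spring, next fall, next spring (years)
school_time = np.array([0.0, 1.0, 1.0, 2.0])   # growth credited only to time in school

intercept = rng.normal(0.0, 1.0, n_children)   # child-specific starting literacy
slope = rng.normal(0.8, 0.3, n_children)       # child-specific growth per school year
difficulty = np.linspace(-1.5, 1.5, n_items)   # item difficulties

# theta[i, t]: latent literacy of child i at occasion t (flat over summer)
theta = intercept[:, None] + slope[:, None] * school_time[None, :]

# Rasch probability of a correct response for every child x occasion x item
logits = theta[:, :, None] - difficulty[None, None, :]
responses = rng.binomial(1, 1 / (1 + np.exp(-logits)))

means = responses.mean(axis=(0, 2)).round(2)
for t, m in zip(occasions, means):
    print(f"occasion at year {t}: mean proportion correct = {m}")
```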

Collaboration


Dive into Gregory Camilli's collaborations.

Top Co-Authors

Lorrie A. Shepard
University of Colorado Boulder

David M. Williams
University of Colorado Boulder