Gautam Puhan
Princeton University
Publications
Featured research published by Gautam Puhan.
Applied Measurement in Education | 2004
Kadriye Ercikan; Mark J. Gierl; Tanya McCreith; Gautam Puhan; Kim Koh
This research examined the degree of comparability and sources of incomparability of English and French versions of reading, mathematics, and science tests that were administered as part of a survey of achievement in Canada. The results point to substantial psychometric differences between the 2 language versions. Approximately 18% to 36% of the items were identified as differentially functioning for the 2 language groups. Large proportions of these differential item functioning (DIF) items, 36% to 100% across age groups and content areas, were attributed to adaptation-related differences. A smaller proportion, 27% to 33% of the DIF items, was attributed to curricular differences. Between 24% and 49% of the DIF items could not be attributed to either of the 2 sources considered in the study.
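The abstract does not say which DIF procedure was applied; as a generic illustration of how such an item-level screen can be computed, the following is a minimal Mantel-Haenszel sketch in Python. The function and variable names are hypothetical, not the authors' code.

```python
import numpy as np

def mantel_haenszel_dif(item_correct, group, total_score):
    """Mantel-Haenszel DIF statistic for one dichotomous item.

    item_correct : 0/1 responses to the studied item
    group        : 0 = reference group (e.g., English), 1 = focal group (e.g., French)
    total_score  : matching variable used to form score-level strata
    """
    num = den = 0.0
    for s in np.unique(total_score):
        stratum = total_score == s
        ref = stratum & (group == 0)
        foc = stratum & (group == 1)
        a = np.sum(item_correct[ref] == 1)   # reference group, correct
        b = np.sum(item_correct[ref] == 0)   # reference group, incorrect
        c = np.sum(item_correct[foc] == 1)   # focal group, correct
        d = np.sum(item_correct[foc] == 0)   # focal group, incorrect
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    alpha_mh = num / den                 # common odds ratio across strata
    mh_d_dif = -2.35 * np.log(alpha_mh)  # ETS delta metric; negative values favour the reference group
    return alpha_mh, mh_d_dif
```

In ETS practice, roughly speaking, items with an |MH D-DIF| of at least 1.5 that is also statistically significant are flagged as showing large DIF; flagged items would then go to the kind of substantive (e.g., translation) review described in the abstract.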
British Journal of Mathematical and Statistical Psychology | 2009
Shelby J. Haberman; Sandip Sinharay; Gautam Puhan
Recently, there has been an increasing level of interest in reporting subscores for components of larger assessments. This paper examines the issue of reporting subscores at an aggregate level, especially at the level of institutions to which the examinees belong. A new statistical approach based on classical test theory is proposed to assess when subscores at the institutional level have any added value over the total scores. The methods are applied to two operational data sets. For the data under study, the observed results provide little support in favour of reporting subscores for either examinees or institutions.
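Haberman's classical test theory criterion asks whether the true subscore is predicted better by the observed subscore than by the observed total score, comparing the proportional reduction in mean squared error (PRMSE) of each predictor. The sketch below shows the examinee-level version of that comparison under the standard assumption of uncorrelated measurement errors; reliabilities are taken as given, and the code is illustrative rather than the authors' implementation (the paper's institutional-level analysis, which aggregates over examinees within institutions, is not shown).

```python
import numpy as np

def prmse_value_added(subscore, total, rel_sub):
    """Examinee-level check of whether a subscore adds value over the total score.

    subscore : observed subscores, one per examinee
    total    : observed total scores (the subscore is part of the total)
    rel_sub  : reliability of the subscore (e.g., coefficient alpha)

    Returns the PRMSE of predicting the true subscore from the observed
    subscore and from the observed total score.  The subscore is said to
    add value only if the first clearly exceeds the second.
    """
    var_s, var_x = np.var(subscore, ddof=1), np.var(total, ddof=1)
    cov_sx = np.cov(subscore, total, ddof=1)[0, 1]

    var_true_s = rel_sub * var_s                   # true-subscore variance
    # cov(true subscore, observed total) under uncorrelated errors:
    cov_ts_x = cov_sx - (1.0 - rel_sub) * var_s

    prmse_from_subscore = rel_sub                  # = var_true_s / var_s
    prmse_from_total = cov_ts_x**2 / (var_true_s * var_x)
    return prmse_from_subscore, prmse_from_total
```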
Applied Measurement in Education | 2010
Gautam Puhan; Sandip Sinharay; Shelby J. Haberman; Kevin C. Larkin
Do subscores provide additional information beyond what is provided by the total score? Is there a method that can estimate more trustworthy subscores than the observed subscores? To answer the first question, this study evaluated whether the true subscore was more accurately predicted by the observed subscore or by the total score. To answer the second question, three subscore estimation methods (i.e., a subscore estimated from the observed subscore, from the total score, or from a combination of the subscore and total score) were compared. Analyses were conducted using data from six licensure tests. Results indicated that reporting subscores at the examinee level may not be necessary, as they did not provide much additional information over what is provided by the total score. However, at the institutional level (for institution sizes ≥ 30), reporting subscores may not be harmful, although they may be redundant because the subscores were predicted equally well by the observed subscores or by the total scores. Finally, results indicated that estimating the subscore using a combination of the observed subscore and total score resulted in the highest reliability.
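The third estimator in the comparison, a combination of the observed subscore and the total score, can be sketched as the best linear predictor of the true subscore given both observed scores, under the same classical test theory assumptions as in the previous sketch. Again, this is illustrative, with hypothetical names, not the operational procedure.

```python
import numpy as np

def augmented_subscore(subscore, total, rel_sub):
    """Estimate true subscores from a weighted combination of the observed
    subscore and the observed total score (best linear predictor under
    classical test theory with uncorrelated errors)."""
    s, x = np.asarray(subscore, float), np.asarray(total, float)
    var_s, var_x = s.var(ddof=1), x.var(ddof=1)
    cov_sx = np.cov(s, x, ddof=1)[0, 1]

    var_ts = rel_sub * var_s                      # var(true subscore)
    cov_ts_x = cov_sx - (1.0 - rel_sub) * var_s   # cov(true subscore, total)

    # Normal equations for regressing the true subscore on (s, x).
    cov_matrix = np.array([[var_s, cov_sx], [cov_sx, var_x]])
    cov_vector = np.array([var_ts, cov_ts_x])
    w_s, w_x = np.linalg.solve(cov_matrix, cov_vector)

    return s.mean() + w_s * (s - s.mean()) + w_x * (x - x.mean())
```

That the combination attained the highest reliability in the study is what one would expect of an optimal linear combination of the two observed scores under these assumptions.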
Multivariate Behavioral Research | 2010
Sandip Sinharay; Gautam Puhan; Shelby J. Haberman
Diagnostic scores are of increasing interest in educational testing due to their potential remedial and instructional benefit. Naturally, the number of educational tests that report diagnostic scores is on the rise, as is the number of research publications on such scores. This article provides a critical evaluation of diagnostic score reporting in educational testing. The existing methods for diagnostic score reporting are discussed. A recent method (Haberman, 2008a) that examines whether diagnostic scores are worth reporting is reviewed. It is demonstrated, using results from operational and simulated data, that diagnostic scores have to be based on a sufficient number of items and have to be sufficiently distinct from each other to be worth reporting, and that several operationally reported subscores are actually not worth reporting. Several recommendations are made for those interested in reporting diagnostic scores for educational tests.
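A common way to make "sufficiently distinct" concrete is the disattenuated (true-score) correlation between a pair of subscores; values close to 1 indicate that the two scores measure essentially the same construct and are unlikely to be worth reporting separately. The brief sketch below is offered as an illustration rather than the article's exact procedure.

```python
import numpy as np

def disattenuated_correlation(sub1, sub2, rel1, rel2):
    """Correlation between two true subscores, correcting the observed
    correlation for unreliability in each subscore (classical test theory).
    Values near 1.0 suggest the diagnostic scores are largely redundant."""
    observed_r = np.corrcoef(sub1, sub2)[0, 1]
    return observed_r / np.sqrt(rel1 * rel2)
```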
Applied Measurement in Education | 2008
Gautam Puhan
The purpose of this study is to determine the extent of scale drift on a test that employs cut scores. Examining scale drift was essential for this testing program because new forms are often put on scale through a series of intermediate equatings (known as equating chains). This process may cause equating error to accumulate to a point where scale scores become incomparable across two parallel chains or across time periods. The study examined two conditions (i.e., parallel equating chains or a single long chain) to evaluate whether scale drift occurred under each. Data from three tests that employed cut scores were used. Results indicated that although there were some differences between the equating conversions derived via different equating chains, the effect of these differences on the actual pass/fail status of test takers was not large. Recommendations are made on alternatives to follow when scale drift is observed, and implications for future research are discussed.
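The comparison can be pictured as composing a sequence of form-to-form equating functions and asking whether two routes to the same scale disagree near the cut score. The toy sketch below uses linear equating links; the slopes, intercepts, and cut score are invented purely for illustration.

```python
from functools import reduce

def linear_link(slope, intercept):
    """One form-to-form linear equating function: y = slope * x + intercept."""
    return lambda x: slope * x + intercept

def chain(links):
    """Compose a sequence of equating links into a single conversion."""
    return lambda x: reduce(lambda score, link: link(score), links, x)

# Two parallel routes from a new form back to the base scale
# (coefficients are made up; operationally they come from the equatings).
route_a = chain([linear_link(1.02, -0.5), linear_link(0.99, 0.8)])
route_b = chain([linear_link(1.01, 0.2), linear_link(1.00, 0.1)])

cut_score = 60
drift_at_cut = route_a(cut_score) - route_b(cut_score)
print(f"Difference between chains at the cut score: {drift_at_cut:.2f}")
```

Whether a difference of this size matters is then judged, as in the study, by how many test takers' pass or fail decisions would change.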
Journal of Cross-Cultural Psychology | 2006
Gautam Puhan; Mark J. Gierl
The current study evaluated the effectiveness of two-stage testing on English and French versions of a science achievement test administered to a national sample in Canada in 1996 and 1999. The tests were administered and scored with the implicit assumption that the two language forms were equivalent. Analysis of the first-stage test revealed that 3 out of 12 items displayed differential item functioning (DIF) in both administrations. However, substantive reviews suggested that translation errors were not the cause of DIF. Analysis of the second-stage test revealed that the test was not comparable between ability groups but was comparable for English and French examinees within each ability group in both administrations. This study illustrates how test developers can monitor their adaptation and administration process when alternative testing procedures are used with multiple language groups. The results are also relevant to cross-cultural researchers who compare examinees from different ethnic and cultural backgrounds.
Educational and Psychological Measurement | 2010
Gautam Puhan; Alina A. von Davier; Shaloo Gupta
Equating under the external anchor design is frequently conducted using scaled scores on the anchor test. However, scaled scores often lead to the unique problem of creating zero frequencies in the score distribution because there may not always be a one-to-one correspondence between raw and scaled scores. For example, raw scores of 17 and 18 may correspond to scaled scores of 150 and 153, thereby creating zero frequencies for scaled scores of 151 and 152. These gaps in the frequency distribution may adversely affect smoothing and equating. This study examines the effect of these zero frequencies on log-linear smoothing of score distributions and final equating results. Results suggest that although smoothing is significantly affected by the presence of these zero frequencies, the impact on the actual equating results is minimal.
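To see why the gaps matter, consider presmoothing the anchor score distribution with a polynomial log-linear model: structural zeros at unattainable scaled scores sit between genuine frequencies and can distort the fit unless they are treated explicitly. The following is a minimal presmoothing sketch using a Poisson GLM from statsmodels; it is illustrative only, not the operational smoothing software, and the toy frequencies are invented.

```python
import numpy as np
import statsmodels.api as sm

def loglinear_presmooth(scores, freqs, degree=4):
    """Fit a polynomial log-linear model to a score frequency distribution.

    scores : possible score points (e.g., scaled anchor scores)
    freqs  : observed frequency at each score point; unattainable scaled
             scores appear here as structural zeros ("gaps")
    degree : number of moments of the observed distribution preserved
             by the smoothed distribution
    """
    z = (scores - scores.mean()) / scores.std()            # stabilize the design
    design = sm.add_constant(np.column_stack([z**k for k in range(1, degree + 1)]))
    fit = sm.GLM(freqs, design, family=sm.families.Poisson()).fit()
    return fit.fittedvalues                                 # smoothed frequencies

# Toy example: scaled scores with gaps at 151 and 152 (structural zeros).
scores = np.array([148, 149, 150, 151, 152, 153, 154, 155], dtype=float)
freqs = np.array([12, 30, 55, 0, 0, 60, 25, 10], dtype=float)
print(np.round(loglinear_presmooth(scores, freqs), 1))
```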
Archive | 2017
Neil J. Dorans; Gautam Puhan
This chapter documents ETS advances in score linking theory and practice. As a prelude, we provide a motivation for the considerable extent of research on score linking. We then summarize published efforts that provide conceptual frameworks for score linking or examples of scale aligning. Next, we deal with data collection designs and data preparation issues, and then turn our focus to the various procedures that have been developed to link or equate scores. This treatment is followed by a review of research describing processes for evaluating the quality of equating results, and by a review of studies that compare different linking methods. A brief chronological summary of the material covered in the preceding parts of the chapter is then provided. The penultimate section summarizes the various books and chapters that ETS authors have contributed on the topic. An extensive list of citations spanning from 1950 to 2015 can be found just after our closing comments.
Educational Measurement: Issues and Practice | 2007
Sandip Sinharay; Shelby J. Haberman; Gautam Puhan
Educational Measurement: Issues and Practice | 2011
Sandip Sinharay; Gautam Puhan; Shelby J. Haberman