
Publication


Featured research published by Insu Paek.


Archive | 2004

Person regression models

Wim Van den Noortgate; Insu Paek

In this chapter, we focus on the person side of the logistic mixed model. As described in Chapter 2, the simple Rasch model can be extended by including person characteristics as predictors. The resulting models can be called latent regression models, since the latent person abilities (the θs) are regressed on person characteristics. A special kind of person characteristic is a person group: for instance, pupils can be grouped in schools. This opens two possibilities for modeling: we can either define random school effects or use school indicators with fixed effects.
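
A worked equation may make the structure concrete. The following is a minimal sketch of a latent regression Rasch model of the kind described above; the notation (θ_p, β_i, Z_pk, the school effect u_s) is generic and not taken verbatim from the chapter:

```latex
% Rasch model for person p and item i
\operatorname{logit}\,\Pr(Y_{pi}=1 \mid \theta_p) = \theta_p - \beta_i

% Latent regression: abilities regressed on person characteristics Z_{p1},\dots,Z_{pK}
\theta_p = \gamma_0 + \sum_{k=1}^{K} \gamma_k Z_{pk} + \varepsilon_p,
\qquad \varepsilon_p \sim N(0, \sigma^2)

% Grouping option 1: random effects for school s(p)
\theta_p = \gamma_0 + u_{s(p)} + \varepsilon_p, \qquad u_s \sim N(0, \tau^2)

% Grouping option 2: fixed effects via school indicator variables
\theta_p = \sum_{s} \gamma_s \, \mathbb{1}[s(p)=s] + \varepsilon_p
```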


European Journal of Psychological Assessment | 2009

Realism of Confidence Judgments

Lazar Stankov; Jihyun Lee; Insu Paek

This paper addresses measurement and conceptual issues related to the realism of people’s confidence judgments about their own cognitive abilities. We employed three cognitive tests: the listening and reading subtests from the Test of English as a Foreign Language (TOEFL iBT) and a synonyms vocabulary test. The sample consisted of community college students. Our results show that participants tended to be overconfident about their cognitive abilities on most tasks, reflecting poor realism. Significant group differences were noted with respect to gender and race/ethnicity: female participants showed less overconfidence than males, and European American participants showed less overconfidence than African American or Hispanic participants. We point out that there appear to be significant individual differences in the understanding of subjective probabilities, and these differences can influence the realism of confidence judgments.
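
Realism (calibration) is commonly quantified in this literature as a bias score: mean confidence minus proportion correct. The sketch below illustrates that computation; the function name and toy numbers are illustrative, not taken from the paper:

```python
import numpy as np

def overconfidence_bias(confidence, correct):
    """Bias score: mean confidence minus proportion correct.

    confidence: per-item confidence judgments on a 0-1 scale
    correct:    per-item accuracy (1 = correct, 0 = incorrect)
    Positive values indicate overconfidence, negative values
    underconfidence, and values near zero good realism.
    """
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    return confidence.mean() - correct.mean()

# A respondent who is 80% confident on average but answers only
# 50% of items correctly is overconfident by 0.30.
conf = [0.9, 0.8, 0.7, 0.8]
acc = [1, 0, 1, 0]
print(overconfidence_bias(conf, acc))  # ~0.30
```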


Educational and Psychological Measurement | 2011

Formulating the Rasch Differential Item Functioning Model Under the Marginal Maximum Likelihood Estimation Context and Its Comparison With Mantel–Haenszel Procedure in Short Test and Small Sample Conditions

Insu Paek; Mark Wilson

This study elaborates the formulation of the Rasch differential item functioning (DIF) model under marginal maximum likelihood estimation. The performance of the Rasch DIF model was also examined through simulation and compared with the Mantel–Haenszel (MH) procedure under small sample and short test length conditions. The theoretically known relationship between the DIF estimators of the Rasch DIF model and the MH procedure was confirmed. In general, the MH method showed conservative DIF detection rates compared with the Rasch DIF model approach. When DIF was present, the z test (when the standard error of the DIF estimator is estimated properly) and the likelihood ratio test under the Rasch DIF model approach showed higher DIF detection rates than the MH chi-square test for sample sizes of 100 to 300 per group and test lengths ranging from 4 to 39. In addition, this study discusses proposed Rasch DIF classification rules that accommodate statistical inference on the direction of DIF.
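
A common way to write a Rasch DIF model of this kind is sketched below; the parameterization is a plausible reconstruction under MML, not necessarily the exact formulation used in the article:

```latex
% Rasch DIF model: person p, item i, group G_p (0 = reference, 1 = focal)
\operatorname{logit}\,\Pr(Y_{pi}=1 \mid \theta_p)
  = \theta_p - \beta_i - \delta_i G_p,
\qquad \theta_p \mid G_p \sim N(\mu_{G_p}, \sigma^2)

% \delta_i is the DIF parameter for item i. Under MML it is estimated by
% integrating over the latent ability distribution, and H_0: \delta_i = 0
% can be tested with a z test or a likelihood ratio test.
```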


Applied Psychological Measurement | 2013

IRTPRO 2.1 for Windows (Item Response Theory for Patient-Reported Outcomes)

Insu Paek; Kyung T. Han

This article reviews IRTPRO 2.1 for Windows, a new item response theory (IRT) model estimation program capable of unidimensional and multidimensional estimation of existing and user-specified constrained IRT models for dichotomously and polytomously scored item response data.


Journal of Psychoeducational Assessment | 2014

In Search of the Optimal Number of Response Categories in a Rating Scale

Jihyun Lee; Insu Paek

Likert-type rating scales remain the most widely used method for measuring psychoeducational constructs. The present study investigates the long-standing issue of identifying the optimal number of response categories. Special emphasis is given to categorical data generated under the item response theory (IRT) graded response model (GRM). Along with the number of categories (from 2 to 6), two scale characteristics were examined: scale length (n = 5, 10, and 20 items) and item discrimination (high/medium/low). Results show that there was virtually no difference in the psychometric properties of scales using 4, 5, or 6 categories. The largest deterioration, across all six psychometric measures, was observed when the number of response categories was reduced from 3 to 2. Small moderating effects of scale length and item discrimination appear to be present; that is, changing the number of response categories had a slightly larger impact on the psychometric properties of a shorter and/or highly discriminating scale. The study concludes that caution is warranted when a scale has only 2 response categories, although that limitation may be overcome by manipulating other scale features, namely scale length or item discrimination.
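
As a concrete illustration of the data-generating setup described above, here is a minimal Python sketch that simulates GRM data for a chosen number of categories, scale length, and discrimination level; all parameter values are arbitrary placeholders, not the study's conditions:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_grm(theta, a, b):
    """Simulate graded response model (GRM) data.

    theta : (P,) person abilities
    a     : (I,) item discriminations
    b     : (I, K-1) ordered thresholds for K response categories
    Returns a (P, I) integer array of responses coded 0..K-1.
    """
    # Cumulative curves P(Y >= k | theta) for k = 1..K-1
    z = a[None, :, None] * (theta[:, None, None] - b[None, :, :])
    p_ge = 1.0 / (1.0 + np.exp(-z))                      # (P, I, K-1)
    P, I = len(theta), len(a)
    cum = np.concatenate([np.ones((P, I, 1)), p_ge, np.zeros((P, I, 1))], axis=2)
    probs = cum[:, :, :-1] - cum[:, :, 1:]               # category probabilities
    # Inverse-CDF draw: category = number of cumulative thresholds below u
    u = rng.random((P, I, 1))
    return (u > np.cumsum(probs, axis=2)[:, :, :-1]).sum(axis=2)

# Example: 10-item scale, 5 categories, medium discrimination
theta = rng.standard_normal(1000)
a = np.full(10, 1.5)
b = np.sort(rng.uniform(-2.0, 2.0, size=(10, 4)), axis=1)
data = simulate_grm(theta, a, b)
print(data.shape, data.min(), data.max())   # (1000, 10) 0 4
```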


Applied Psychological Measurement | 2011

Accuracy of DIF Estimates and Power in Unbalanced Designs Using the Mantel–Haenszel DIF Detection Procedure

Insu Paek; Hongwen Guo

This study examined how much improvement in the accuracy of differential item functioning (DIF) measures and in DIF detection rates is attainable with the Mantel–Haenszel procedure when the focal and reference groups have notably unbalanced sample sizes: the focal group has a fixed small sample that does not satisfy the minimum DIF sample size requirement specified by the testing programs, while the reference group far exceeds that minimum. Results with such unbalanced but large samples were equivalent to or better than those under some currently used minimum DIF sample size conditions. DIF investigation therefore does not necessarily need to cease when the focal group fails to meet the minimum sample size requirement. Analytic explanations and guidelines for DIF investigations with unbalanced sample sizes are also provided.
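
For reference, the Mantel–Haenszel DIF measure discussed above is the common odds ratio across matching-score strata, usually reported on the ETS delta scale. A minimal sketch follows; the 2x2 table layout and the toy numbers are illustrative:

```python
import numpy as np

def mh_delta(tables):
    """Mantel-Haenszel common odds ratio and ETS delta for one item.

    tables: iterable of 2x2 tables, one per matching-score stratum,
            laid out as [[A, B], [C, D]] with A/B the reference group's
            right/wrong counts and C/D the focal group's.
    Returns (alpha_MH, MH D-DIF), where MH D-DIF = -2.35 * ln(alpha_MH).
    """
    num = den = 0.0
    for (A, B), (C, D) in tables:
        n = A + B + C + D
        if n == 0:
            continue
        num += A * D / n
        den += B * C / n
    alpha = num / den
    return alpha, -2.35 * np.log(alpha)

# Toy example with two score strata; a small focal group simply yields
# small C/D counts, which widens the sampling variability of the estimate.
tables = [
    [[40, 10], [30, 20]],
    [[25, 25], [15, 35]],
]
print(mh_delta(tables))
```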


Applied Psychological Measurement | 2010

Conservativeness in Rejection of the Null Hypothesis When Using the Continuity Correction in the MH Chi-Square Test in DIF Applications

Insu Paek

Conservative bias in rejecting the null hypothesis that results from using the continuity correction in the Mantel–Haenszel (MH) procedure was examined through simulation in a differential item functioning (DIF) investigation context, where statistical testing uses a prespecified level α to decide whether an item exhibits DIF. The standard, continuity-corrected MH chi-square test was consistently conservative across sample sizes and significance levels, under both the null (no DIF) and non-null (DIF) conditions, especially in small samples. Two alternative testing approaches, the uncorrected MH chi-square test and the MH delta-based z test, both showed proper control of Type I error probabilities and better power than the continuity-corrected MH chi-square test. For hypothesis testing in DIF investigations with small sample sizes, either the uncorrected MH chi-square test or the MH delta-based z test is recommended over the continuity-corrected MH chi-square test.
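
To make the two chi-square variants being compared concrete, here is a minimal sketch of the MH chi-square statistic with and without the continuity correction; the table layout and toy numbers are illustrative:

```python
import numpy as np

def mh_chisq(tables, continuity_correction=True):
    """Mantel-Haenszel chi-square statistic (1 df) across 2x2 strata.

    Each table is [[A, B], [C, D]] (rows: reference/focal groups,
    columns: right/wrong). With continuity_correction=True this is the
    continuity-corrected statistic; with False, the uncorrected one.
    """
    a_sum = e_sum = v_sum = 0.0
    for (A, B), (C, D) in tables:
        n_ref, n_foc = A + B, C + D        # group sizes in this stratum
        m1, m0 = A + C, B + D              # right/wrong margins
        N = n_ref + n_foc
        if N <= 1:
            continue
        a_sum += A
        e_sum += n_ref * m1 / N            # E(A) under no association
        v_sum += n_ref * n_foc * m1 * m0 / (N * N * (N - 1))
    dev = abs(a_sum - e_sum)
    if continuity_correction:
        dev = max(dev - 0.5, 0.0)          # the correction that induces conservatism
    return dev ** 2 / v_sum

tables = [[[12, 8], [7, 13]], [[9, 11], [5, 15]]]
print(mh_chisq(tables, True))    # corrected: smaller statistic
print(mh_chisq(tables, False))   # uncorrected: larger statistic
```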


Educational and Psychological Measurement | 2014

A Comparison of Item Parameter Standard Error Estimation Procedures for Unidimensional and Multidimensional Item Response Theory Modeling

Insu Paek; Li Cai

The present study was motivated by the recognition that standard errors (SEs) of item response theory (IRT) model parameters are often of immediate interest to practitioners and that there is currently a lack of comparative research on different SE (or error variance–covariance matrix) estimation procedures. The present study investigated item parameter SEs based on three error variance–covariance matrix estimation procedures for unidimensional and multidimensional IRT models: Fisher information, empirical cross-product, and supplemented expectation maximization. This study centers on direct comparisons of SEs from the different procedures and complements a recent study by Tian, Cai, Thissen, and Xin by providing insights and suggestions on the nature of the differences and similarities, as well as on practical matters such as computational cost. The simulation results show that all three procedures produced similar bias in the SE estimates under most conditions. When the number of items was large and the sample size was small, the empirical cross-product procedure, which was the most computationally efficient, appeared to be affected most, producing slight upward bias.
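
As a pointer to what the empirical cross-product procedure computes, the following is the standard outer-product-of-gradients formulation; this is the generic form, not necessarily the exact implementation compared in the study:

```latex
% Per-person score vector at the parameter estimates \hat{\lambda}
s_p(\hat{\lambda}) = \frac{\partial \log L_p(\hat{\lambda})}{\partial \lambda}

% Empirical cross-product (outer product of gradients) information matrix
\hat{I}_{\mathrm{XPD}}(\hat{\lambda}) = \sum_{p=1}^{P} s_p(\hat{\lambda})\, s_p(\hat{\lambda})^{\top}

% Item parameter SEs from the diagonal of the inverse information matrix
\mathrm{SE}(\hat{\lambda}_j) = \sqrt{\bigl[\hat{I}_{\mathrm{XPD}}(\hat{\lambda})^{-1}\bigr]_{jj}}
```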


Applied Psychological Measurement | 2015

A Note on Parameter Estimate Comparability Across Latent Classes in Mixture IRT Modeling

Insu Paek; Sun-Joo Cho

The use of mixture item response theory modeling is typically exemplified by comparing item profiles across different latent groups. Such comparisons presuppose that all model parameter estimates across latent classes are on a common scale. This note discusses the conditions and model constraint issues involved in establishing a common scale across latent classes.
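
One common identification scheme that places mixture Rasch parameters on a common scale across classes is sketched below; the specific constraints are illustrative, not necessarily the ones recommended in the note:

```latex
% Mixture Rasch model: latent class g = 1,...,G with mixing proportions \pi_g
\Pr(Y_{pi}=1 \mid \theta_p, g)
  = \frac{\exp(\theta_p - \beta_{ig})}{1 + \exp(\theta_p - \beta_{ig})},
\qquad \theta_p \mid g \sim N(\mu_g, \sigma_g^2)

% One common identification scheme: center item difficulties within each
% class and fix the ability mean of one class, so the remaining \mu_g are
% interpretable relative to class 1 on a shared metric.
\sum_{i=1}^{I} \beta_{ig} = 0 \quad \text{for all } g, \qquad \mu_1 = 0
```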


Applied Psychological Measurement | 2014

A Review of Commercial Software Packages for Multidimensional IRT Modeling

Kyung T. Han; Insu Paek

In this study, the authors evaluate several commercially available multidimensional item response theory (MIRT) software packages, including IRTPRO 2.1, Mplus 7.1, FlexMIRT, and EQSIRT, together with their built-in estimation algorithms, and compare their performance in MIRT model estimation. The study examines model parameter recovery via a series of simulations based on four approaches to latent structuring: within-item MIRT, between-item MIRT, a mixture of within- and between-item MIRT, and a bifactor model. The simulations focused on realistic conditions and models that researchers and practitioners are likely to encounter in practice. The results showed that the studied software packages recovered the item parameters reasonably well but differed greatly in the types of data and models they could handle and in the run time required to complete estimation.
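
The four latent structures examined can be illustrated by their slope (loading) patterns; the Python sketch below uses six hypothetical items and arbitrary placeholder values, not the study's simulation conditions:

```python
import numpy as np

# Nonzero entries mark which dimension(s) each item measures.

# Between-item MIRT: every item loads on exactly one dimension
between = np.array([
    [1.2, 0.0], [0.9, 0.0], [1.1, 0.0],   # items 1-3 -> dimension 1
    [0.0, 1.0], [0.0, 1.3], [0.0, 0.8],   # items 4-6 -> dimension 2
])

# Within-item MIRT: items load on more than one dimension
within = np.array([
    [1.2, 0.5], [0.9, 0.7], [1.1, 0.4],
    [0.6, 1.0], [0.3, 1.3], [0.5, 0.8],
])

# Mixture of the two: only some items are multidimensional
mixed = between.copy()
mixed[2, 1] = 0.6                         # item 3 measures both dimensions

# Bifactor: a general factor plus orthogonal group-specific factors
bifactor = np.array([
    [1.0, 0.7, 0.0], [1.1, 0.5, 0.0], [0.9, 0.6, 0.0],  # group factor 1
    [1.2, 0.0, 0.4], [0.8, 0.0, 0.7], [1.0, 0.0, 0.5],  # group factor 2
])
```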

Collaboration


Dive into Insu Paek’s collaborations.

Top Co-Authors

Mark Wilson, University of California
Jihyun Lee, University of New South Wales
Li Cai, University of California
Hyun-Jeong Park, Seoul National University