
Publication


Featured research published by Irene Kostin.


Language Testing | 1999

Does the text matter in a multiple-choice test of comprehension? The case for the construct validity of TOEFL's minitalks

Roy Freedle; Irene Kostin

The current study addresses a specific construct validity issue regarding multiple-choice language-comprehension tests by focusing on TOEFL’s minitalk passages: Is there evidence that examinees attend to the text passages in answering the test items? To address this problem, we analysed a large sample (n = 337) of minitalk items. The content and structure of the items and their associated text passages were represented by a set of predictor variables that included a wide variety of text and item characteristics identified from the experimental language-comprehension literature. Stepwise and hierarchical regression techniques showed that at least 33% of the item difficulty variance could be accounted for primarily by variables that reflected the content and structure of the whole passage and/or selected portions of the passage; item characteristics, however, accounted for very little of the variance. The pattern of these results was interpreted, with qualifications, as favouring the construct validity of TOEFL’s minitalks. Our methodology also allowed a detailed comparison between TOEFL reading and listening (minitalk) items. Several criticisms concerning multiple-choice language-comprehension tests were addressed. Future work is suggested.


Language Testing | 1993

The prediction of TOEFL reading item difficulty: implications for construct validity

Roy Freedle; Irene Kostin

The purpose of this study was to predict the difficulty of a large sample (n = 213) of TOEFL reading comprehension items. A related purpose was to examine whether text and text-by-item interaction variables play a significant role in predicting item difficulty. It was argued that evidence favouring the construct validity of multiple-choice reading test formats requires significant contributions from these particular predictor variables. Details of item predictability and construct validity were explored by evaluating two hypotheses: 1) that multiple-choice reading comprehension tests are sensitive to 12 categories of sentential and/or discourse variables found to influence comprehension processes in the experimental literature; and 2) that many of these categories of variables identified in the first hypothesis contribute significant independent variance in predicting item difficulty. For the first hypothesis, correlational analyses confirmed the importance of 11 out of the 12 categories, while stepwise regression analyses, accounting for up to 58% of the variance, provided some support for the second hypothesis. The pattern of predictors showed that text and text-by-item variables accounted for most of the variance, thereby providing evidence favouring the construct validity of the TOEFL reading items.
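The forward-stepwise strategy described in this abstract can be illustrated with a minimal sketch. Everything below is hypothetical: the feature names (passage_length, vocab_rarity, stem_length) and the simulated data merely mimic the reported pattern in which text variables dominate item variables; this is not the authors' actual model, features, or data.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def forward_stepwise(X, y, names, min_gain=0.01):
    """Greedily add the predictor that most improves R^2, stopping
    when no remaining predictor adds at least min_gain."""
    chosen, remaining, best_r2 = [], list(range(X.shape[1])), 0.0
    while remaining:
        r2, j = max((r_squared(X[:, chosen + [j]], y), j) for j in remaining)
        if r2 - best_r2 < min_gain:
            break
        best_r2, chosen = r2, chosen + [j]
        remaining.remove(j)
    return [names[j] for j in chosen], best_r2

# Toy data: item difficulty driven mainly by passage-level (text) variables
rng = np.random.default_rng(0)
n = 200
passage_length = rng.normal(size=n)   # hypothetical text variable
vocab_rarity = rng.normal(size=n)     # hypothetical text variable
stem_length = rng.normal(size=n)      # hypothetical item variable
difficulty = (0.8 * passage_length + 0.5 * vocab_rarity
              + 0.05 * stem_length + rng.normal(scale=0.3, size=n))
X = np.column_stack([passage_length, vocab_rarity, stem_length])
selected, r2 = forward_stepwise(
    X, difficulty, ["passage_length", "vocab_rarity", "stem_length"])
print(selected, round(r2, 2))
```

On this toy data the text variables enter the model first and carry nearly all of the explained variance, which is the shape of result the abstract reports.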


Journal of Personality and Social Psychology | 1976

Kinesthetic aftereffect and personality: a case study of issues involved in construct validation.

A. Harvey Baker; Irene Kostin; Lawrence Parker

Kinesthetic Aftereffect (KAE), once a promising personality index, has been abandoned by many investigators because of poor retest reliability and intermittent validity. In challenging this current consensus, we argue that (a) first-session KAE is valid; (b) poor retest reliability simply reflects later-session bias; (c) hence, multisession studies should not be used to assess validity without taking this bias into account. Those recent studies which failed to support KAE validity were each multisession in design. If our bias contention is correct, these studies should be ignored, and the claim of intermittent validity is thus rebutted. Reanalysis of the most recent major multisession, nonsupportive validity study indicates (a) Session 1 validity, (b) later-session bias, and (c) later-session validity when multisession scores are combined to avoid bias. Thus, KAE validly measures personality.


Intelligence | 1997

Predicting black and white differential item functioning in verbal analogy performance

Roy Freedle; Irene Kostin

Differential item functioning (DIF) is one technique for comparing ethnic populations that test makers employ to help ensure the fairness of their tests. The purpose of this ethnic comparison study is to investigate factors that may have a significant influence on DIF values associated with 217 SAT and 234 GRE analogy items obtained by comparing large samples of Black and White examinees matched for total verbal score. In one study, five significant regression predictors of ethnic differences were found to account for approximately 30% of the DIF variance. A second study replicated these findings. These significant ethnic comparisons are interpreted as consistent with a cultural/contextualist framework, although competing explanations involving socioeconomic status and biological contributions could not be ruled out. Practical implications are discussed.
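The abstract does not specify which DIF statistic produced the values being predicted. As background, one widely used DIF index is the Mantel-Haenszel common odds ratio, computed across strata of examinees matched on total score; the sketch below, on invented data, is offered only under that assumption. Values near 1 indicate no DIF; values above 1 indicate the item favours the reference group at a given matched score.

```python
from collections import defaultdict

def mantel_haenszel_dif(responses):
    """Mantel-Haenszel common odds ratio across matched-score strata.

    responses: iterable of (group, matched_score, correct) tuples, where
    group is 'ref' or 'focal' and correct is 0 or 1.
    """
    strata = defaultdict(lambda: {('ref', 1): 0, ('ref', 0): 0,
                                  ('focal', 1): 0, ('focal', 0): 0})
    for group, score, correct in responses:
        strata[score][(group, correct)] += 1
    num = den = 0.0
    for cells in strata.values():
        total = sum(cells.values())
        if total == 0:
            continue
        # a*d/N summed over strata, divided by b*c/N summed over strata
        num += cells[('ref', 1)] * cells[('focal', 0)] / total
        den += cells[('ref', 0)] * cells[('focal', 1)] / total
    return num / den if den else float('inf')

# Invented item: at every matched score, the reference group is more
# often correct (8/10) than the focal group (5/10).
data = ([('ref', s, 1) for s in (1, 2, 3) for _ in range(8)]
        + [('ref', s, 0) for s in (1, 2, 3) for _ in range(2)]
        + [('focal', s, 1) for s in (1, 2, 3) for _ in range(5)]
        + [('focal', s, 0) for s in (1, 2, 3) for _ in range(5)])
odds_ratio = mantel_haenszel_dif(data)
print(odds_ratio)
```

Because the groups are matched on total score within each stratum, a ratio far from 1 points at the item itself rather than at an overall ability difference, which is what makes DIF values a sensible regression target in a study like this one.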


Journal of Research in Personality | 1978

When “reliability” fails, must a measure be discarded?—The case of Kinesthetic Aftereffect

A. Harvey Baker; Brian L. Mishara; Laurence Parker; Irene Kostin

Critics of Kinesthetic Aftereffect (KAE) recommend abandoning it as a personality measure largely because of poor test-retest reliability. Although no test can be valid if lacking true reliability, to discard a measure because of poor retest reliability is an oversimplification of validation procedures. This pitfall is exemplified here by a reexamination of KAE. KAE scores involve measures before (pretest) and after (test) aftereffect induction. Internal analysis of a KAE study showed: Differential bias is present; its locus is the second-session pretest; its form makes second-session pretest scores functionally more similar to first- and second-session test scores and functionally more dissimilar to first-session pretest scores. Given this second-session bias, the retest correlation tells us nothing about the true reliability of a one-session KAE score. However, if a measure possesses external validity, it must to some degree show true reliability. Based upon a literature review of one-session KAE validity studies, we conclude that one-session KAE scores are valid and hence show true reliability. KAE remains a promising personality measure.


Elementary School Journal | 2014

The TextEvaluator Tool

Kathleen M. Sheehan; Irene Kostin; Diane Napolitano; Michael Flor

This article describes TextEvaluator, a comprehensive text-analysis system designed to help teachers, textbook publishers, test developers, and literacy researchers select reading materials that are consistent with the text-complexity goals outlined in the Common Core State Standards. Three particular aspects of the TextEvaluator measurement approach are highlighted: (1) attending to relevant reader and task considerations, (2) expanding construct coverage beyond the two dimensions of text variation traditionally assessed by readability metrics, and (3) addressing two potential threats to tool validity: genre bias and blueprint bias. We argue that systems that are attentive to these particular measurement issues may be more effective at helping users achieve a key goal of the new Standards: ensuring that students are challenged to read texts at steadily increasing complexity levels as they progress through school, so that all students acquire the advanced reading skills needed for success in college and careers.
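TextEvaluator's own feature set is not given in this abstract. As context for "the two dimensions of text variation traditionally assessed by readability metrics" (average sentence length and word difficulty), here is a sketch of one classic baseline, the Flesch-Kincaid grade level, using a crude vowel-group syllable heuristic. This illustrates only the traditional approach the article says TextEvaluator extends, not TextEvaluator itself.

```python
import re

def count_syllables(word):
    """Crude syllable estimate: count vowel groups, minimum 1 per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level: a weighted sum of words-per-sentence
    (sentence length) and syllables-per-word (word difficulty)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)

sample = "The cat sat on the mat. It was warm."
print(round(flesch_kincaid_grade(sample), 1))
```

A one-dimensional formula like this assigns the same score to very different kinds of text with equal surface statistics, which is precisely the limitation (along with genre and blueprint bias) that a multi-dimensional system is meant to address.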


Journal of Personality and Social Psychology | 1979

Menstrual cycle affects kinesthetic aftereffect, an index of personality and perceptual style.

A. Harvey Baker; Irene Kostin; Laurence Parker

Research suggests that kinesthetic aftereffect (KAE) scores reflect status on a postulated stimulus intensity modulation (SIM) mechanism that damps down subjective stimulus intensity for some (reducing) and increases it for others (augmenting). Such a mechanism would help account for empirically observed individual differences in such behaviors as pain tolerance, sensory deprivation reactivity, and stimulation seeking. It was hypothesized and confirmed in three adult female samples that KAE varies curvilinearly over the menstrual cycle: Greater KAE reduction occurs at the cycle's beginning and end. Neither tiredness, oral contraception, medication, attention, nor social expectations can explain this finding. Of the behaviors studied in the KAE literature, only five are also encompassed by the menstrual cycle literature. Four of these (antisocial behavior, acute schizophrenic episodes, accidents, and activity level) show similar curvilinearity over the cycle. We hypothesize that cyclical variation in the SIM mechanism mediates the curvilinear pattern observed for both these four behaviors and KAE.


Perceptual and Motor Skills | 1974

Delinquency and stimulation seeking: re-analysis of Petrie's study of kinesthetic aftereffects

A. Harvey Baker; Brian L. Mishara; Irene Kostin; Laurence Parker

Individual differences in kinesthetic aftereffects presumably reflect differential modulation of stimulus intensity. Supporting this view, Petrie (1962) found that delinquents showed greater reduction, i.e., smaller size judgments after inducing stimulation, than nondelinquents. Recent evidence has contraindicated use of Petrie's two-session procedure. Here, using only Petrie's data from Session 1 and a more appropriately defined control group, the original findings were reconfirmed.


Psychological Science | 1994

Can Multiple-Choice Reading Tests Be Construct-Valid? A Reply to Katz, Lautenschlager, Blackburn, and Harris

Roy Freedle; Irene Kostin

There is a long and continuing history of criticisms of multiple-choice tests of reading (see Farr, Pritchard, & Smitten, 1990). Farr et al. indicated that of all the criticisms, the most serious maintains that examinees do not or need not read and comprehend the reading selections accompanying the test items at all. A similar criticism appeared in this journal (Katz, Lautenschlager, Blackburn, & Harris, 1990). Katz et al. presented evidence that examinees were able to perform at better than chance levels of correctness for 100 items taken from the Scholastic Aptitude Test (SAT) reading sections even when the examinees had not read the passages accompanying those items. In one study, up to 72% of the items were answered correctly at greater than chance levels (20%) in the absence of the passage. This work suggests that either information present in the reading items themselves is strongly related to information in the missing passage (and hence the passage need not be present to get the item correct) or the students' reasoning ability and possibly background knowledge suffice to guide responses. Because of their findings, Katz et al. called into question the construct validity of multiple-choice reading tests, and in particular the validity of the SAT's reading section; that is, they suggested the test does not measure what it is intended to measure, passage comprehension. Other critics (Royer, 1990) have pointed out that various studies of factors that influence item difficulty in multiple-choice reading tests appear strongly to implicate item features as opposed to text passage features. In this regard, an especially influential study by Drum, Calfee, and Cook (1981) is often cited (see Davey, 1988; Embretson & Wetzel, 1987; Just & Carpenter, 1987; Royer, 1990). Drum et al. divided several predictor variables into two broad categories: item variables and text variables. The best predictor turned out to be what can be called item plausibility.

Without further critical analysis, this finding has been widely cited as evidence against the construct validity of multiple-choice reading tests. That is, the fact that an item variable was by far the strongest predictor of difficulty came to be viewed as evidence against construct validity: reading tests were apparently not measuring passage comprehension. The results of Katz et al. (1990) underscore this prevailing viewpoint.


Perceptual and Motor Skills | 1973

Kinesthetic figural aftereffects: norms from four samples, and a comparison of methods for classifying augmenters, moderates, and reducers.

A. Harvey Baker; Laurence Parker; Irene Kostin

Prior studies into the validity of Petrie's hypothesized augmentation-reduction dimension have apparently assumed that there is a linear relationship between scores from the Petrie variant of the kinesthetic figural aftereffects task and external validity variables. Empirical evidence for this assumption is either lacking or contradictory, and it is here argued that the possibility of non-linear relationships should be explored systematically. Studies of this task typically involve small samples, however, making it difficult to ascertain whether both extremes of the augmentation-reduction dimension are adequately represented. Normative data are therefore presented from four samples, and the comparability of several methods which can be used to classify individuals as Augmenters, Moderates, and Reducers is explored.
