Publication


Featured research published by Yigal Attali.


Educational and Psychological Measurement | 2010

Immediate Feedback and Opportunity to Revise Answers to Open-Ended Questions

Yigal Attali; Don Powers

Two experiments examine the psychometric effects of providing immediate feedback on the correctness of answers to open-ended questions and of allowing participants to revise their answers following feedback. Participants answering verbal and math questions are able to correct many of their initial incorrect answers, resulting in higher revised scores. In addition, the reliability of these scores is significantly higher than the reliability of scores based on no feedback. Finally, anxiety is significantly lower following a test section with feedback and revision.


Applied Psychological Measurement | 2005

Reliability of Speeded Number-Right Multiple-Choice Tests

Yigal Attali

Contrary to common belief, reliability estimates of number-right multiple-choice tests are not inflated by speededness. Because examinees guess on questions when they run out of time, responses to these questions generally show less consistency with responses to the other questions, and the reliability of the test is decreased. The surprising implication is that adding questions to a multiple-choice test may lower its reliability when the test is speeded. This article develops the relevant mathematical derivations and illustrates the effects of speededness on reliability through simulations.
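The core argument can be illustrated with a small simulation (an illustrative sketch, not the article's derivation; all settings below are assumptions): items reached only after time runs out are answered by random guessing, so they add variance but essentially no inter-item covariance, and coefficient alpha for the lengthened test can fall below alpha for the unspeeded portion.

```python
# Illustrative simulation: random guessing on items reached after time runs out
# can lower coefficient alpha relative to the unspeeded core test.
import numpy as np

rng = np.random.default_rng(0)
n_examinees, n_core, n_speeded, n_options = 2000, 40, 20, 5

theta = rng.normal(size=n_examinees)                       # examinee ability
b = rng.normal(size=n_core)                                # item difficulties
p_core = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))  # 1PL-style probabilities
core = (rng.random((n_examinees, n_core)) < p_core).astype(int)

# Items not reached in time are answered by pure guessing (p = 1/options).
guessed = (rng.random((n_examinees, n_speeded)) < 1 / n_options).astype(int)

def cronbach_alpha(items):
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

print("alpha, core items only:", round(cronbach_alpha(core), 3))
print("alpha, core + guessed items:", round(cronbach_alpha(np.hstack([core, guessed])), 3))
```

With settings like these, alpha for the longer, speeded test typically comes out below alpha for the core items alone, which is the counterintuitive effect the article derives.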


The American Statistician | 2002

Seek Whence: Answer Sequences and Their Consequences in Key-Balanced Multiple-Choice Tests

Maya Bar-Hillel; Yigal Attali

The producers of the SAT balance answer keys rather than randomizing them. Whereas randomization yields keys that are balanced only on average, balancing assures this in every subtest. Balancing is a well-kept trade secret, and there is no evidence of awareness that it is exploitable. However, balancing leaves identifiable traces on answer keys. We present the evidence for key balancing, its signatures, and the ways in which testwise examinees can exploit it. Exploitation can add as much as 16 points to one's SAT score.
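A minimal sketch of the kind of testwise strategy the article alludes to (the five-option format, the counts, and the function below are illustrative assumptions, not details from the paper): if each answer position appears roughly equally often within a subtest, then a guess on a remaining item is better placed on the position used least often among the answers already chosen with confidence.

```python
# Illustrative sketch of an underused-key guessing strategy on a key-balanced
# subtest; the option set and counts here are hypothetical.
from collections import Counter

OPTIONS = ["A", "B", "C", "D", "E"]

def balanced_guess(confident_answers, options=OPTIONS):
    """Pick the option that has appeared least often so far.

    confident_answers: options the examinee has already chosen with
    confidence in this subtest. On a key-balanced subtest, this guess
    is right more often than a uniform random guess.
    """
    counts = Counter({opt: 0 for opt in options})
    counts.update(confident_answers)
    return min(options, key=lambda opt: counts[opt])

# Example: 20 confident answers so far; "E" is underused, so guess "E".
print(balanced_guess(["A"] * 4 + ["B"] * 5 + ["C"] * 4 + ["D"] * 5 + ["E"] * 2))
```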


Educational and Psychological Measurement | 2011

Sequential Effects in Essay Ratings

Yigal Attali

Contrary to previous research on sequential ratings of student performance, this study found that professional essay raters in a large-scale standardized testing program produced ratings that were drawn toward previous ratings, creating an assimilation effect. Longer intervals between two adjacent ratings and a higher degree of agreement with other raters were associated with smaller assimilation effects. The effect was also found to create a small bias in favor of low-achieving groups of examinees.


Applied Psychological Measurement | 2011

Immediate Feedback and Opportunity to Revise Answers: Application of a Graded Response IRT Model

Yigal Attali

Recently, Attali and Powers investigated the usefulness of providing immediate feedback on the correctness of answers to constructed-response questions, along with the opportunity to revise incorrect answers. This article introduces an item response theory (IRT) model for scoring revised responses to questions when several attempts are allowed. The model is based on Samejima's graded response model, where the graded responses are defined by the number of attempts needed until a correct answer is given, and the lowest-credit response is assigned to an answer that is still incorrect after the maximum number of attempts has been reached. Advantages of this conceptualization are discussed and empirical results are presented.
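For reference, Samejima's graded response model (the basis of the proposed scoring) models the cumulative probability of reaching at least score category k with a two-parameter logistic curve, and obtains category probabilities as differences of adjacent cumulative probabilities. In the article's application, categories are defined by the number of attempts needed, with the lowest category for responses still incorrect after the final attempt. A standard statement of the model follows; the notation is generic, not copied from the article.

```latex
% Samejima's graded response model (generic notation).
% X = graded score for one question with ordered categories k = 0, 1, ..., m
% (here, more credit for a correct answer in fewer attempts; k = 0 for an
% answer still incorrect after the maximum number of attempts).
P^{*}_{k}(\theta) = \Pr(X \ge k \mid \theta)
  = \frac{1}{1 + \exp\{-a(\theta - b_{k})\}}, \qquad k = 1, \dots, m,
\qquad P^{*}_{0}(\theta) = 1, \quad P^{*}_{m+1}(\theta) = 0,
\\[4pt]
\Pr(X = k \mid \theta) = P^{*}_{k}(\theta) - P^{*}_{k+1}(\theta),
\qquad b_{1} < b_{2} < \dots < b_{m}.
```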


Educational and Psychological Measurement | 2009

Validity of Scores for a Developmental Writing Scale Based on Automated Scoring

Yigal Attali; Donald E. Powers

A developmental writing scale for timed essay-writing performance was created on the basis of automatically computed indicators of writing fluency, word choice, and conventions of standard written English. In a large-scale data collection effort that involved a national sample of more than 12,000 students from 4th, 6th, 8th, 10th, and 12th grade, students wrote (in 30-min sessions) up to four essays in two modes of writing on topics selected from a pool of 20 topics. Scale scores were created by combining the essay indicators in a standard way so that scores shared the same scoring standards across essay prompts and student grade levels. A series of ancillary analyses and studies were conducted to examine the validity of the scale scores. Cross-classified random effects modeling of scores confirmed that the particular prompts on which essays were written had little effect on scores. The reliability of scores was found to be higher than previous reliability estimates for human essay scores. A human scoring experiment confirmed that the developmental sensitivity of scale scores and human scores was similar. A longitudinal study confirmed the expected gains in scores over a 1-year period.


Educational and Psychological Measurement | 2014

A Ranking Method for Evaluating Constructed Responses

Yigal Attali

This article presents a comparative judgment approach for holistically scored constructed response tasks. In this approach, the grader rank orders (rather than rates) the quality of a small set of responses. A prior automated evaluation of responses guides both set formation and the scaling of rankings. Sets are formed to have similar prior scores, and subsequent rankings by graders serve to update the prior scores of responses. Final response scores are determined by weighting the prior and ranking information. This approach allows for scaling comparative judgments on the basis of a single ranking, eliminates rater effects in scoring, and offers a conceptual framework for combining human and automated evaluation of constructed response tasks. To evaluate this approach, groups of graders evaluated responses to two tasks using either the ranking (with sets of 5 responses) or the traditional rating approach. Results varied by task and by the relative weighting of prior versus ranking information, but in general the ranking scores showed comparable generalizability (reliability) and validity coefficients.
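A minimal sketch of the general workflow the abstract describes (the set size, weighting, and rank-scaling rule below are illustrative assumptions, not the article's exact procedure): responses are grouped into sets of five with similar prior automated scores, a grader rank orders each set, and each response's final score is a weighted combination of its prior score and a score implied by its within-set rank.

```python
# Illustrative sketch of scoring by ranking within sets of similar responses.
# Set size, weighting, and rank scaling are assumptions for illustration only.
import numpy as np

def form_sets(prior_scores, set_size=5):
    """Group response indices into sets with similar prior (automated) scores."""
    order = np.argsort(prior_scores)
    return [order[i:i + set_size] for i in range(0, len(order), set_size)]

def ranking_scores(prior_scores, rankings, weight_prior=0.5):
    """Combine prior scores with grader rankings.

    rankings: dict mapping a tuple of response indices (one set) to those
    indices re-ordered from worst to best by the grader. Within each set,
    ranks are rescaled to the set's range of prior scores.
    """
    prior_scores = np.asarray(prior_scores, dtype=float)
    final = prior_scores.copy()
    for members, ranked in rankings.items():
        lo, hi = prior_scores[list(members)].min(), prior_scores[list(members)].max()
        rank_based = np.linspace(lo, hi, num=len(ranked))  # worst -> best
        for rank_value, idx in zip(rank_based, ranked):
            final[idx] = weight_prior * prior_scores[idx] + (1 - weight_prior) * rank_value
    return final

# Usage: five responses with similar automated scores, ranked by one grader.
prior = [3.1, 3.3, 3.0, 3.4, 3.2]
sets = form_sets(prior)                 # a single set of all five here
grader_order = (2, 0, 1, 4, 3)          # worst to best, by response index
print(ranking_scores(prior, {tuple(sets[0]): grader_order}))
```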


Assessment in Education: Principles, Policy & Practice | 2008

How Important Is Content in the Ratings of Essay Assessments?

Mark D. Shermis; Aleksandr Shneyderman; Yigal Attali

This study was designed to examine the extent to which ‘content’ accounts for variance in scores assigned in automated essay scoring protocols. Specifically, it was hypothesised that certain writing genres would emphasise content more than others. Data were drawn from 1668 essays calibrated at two grade levels (6 and 8) using e-rater™, an automated essay scoring engine with established validity and reliability. E-rater v2.0's scoring algorithm divides 12 variables into ‘content’ (scores assigned to essays with similar vocabulary; similarity of vocabulary to essays with the highest scores) and ‘non-content’ (grammar, usage, mechanics, style, and discourse structure) related components. The essays were classified by genre: persuasive, expository, and descriptive. The analysis showed significant main effects of grade, F(1, 1653) = 58.71, p < .001, and genre, F(2, 1653) = 20.57, p < .001. The interaction of grade and genre was not significant. Eighth-grade students had significantly higher mean scores than sixth-grade students, and descriptive essays were rated significantly higher than those classified as persuasive or expository. Prompts elicited ‘content’ according to expectations, with the lowest proportion of content variance in persuasive essays, followed by expository and then descriptive. Content accounted for approximately 0–6% of the overall variance when all predictor variables were used. It accounted for approximately 35–58% of the overall variance when ‘content’ variables alone were used in the prediction equation.


Computers in Education | 2015

Effects of multiple-try feedback and question type during mathematics problem solving on performance in similar problems

Yigal Attali

In a study of mathematics problem solving, the effect of providing multiple-try feedback on later success in solving similar problems was examined. Participants solved mathematics problems that were presented as either multiple-choice or open-ended questions, and were provided with one of four types of feedback: no feedback (NF), immediate knowledge of the correct response (KCR), multiple-try feedback with knowledge of the correct response (MTC), or multiple-try feedback with hints after an initial incorrect response (MTH). Results showed that gains in performance were larger in the open-ended than in the multiple-choice condition. Furthermore, gains under NF and KCR were similar, gains were larger under MTC than under KCR, and gains were larger under MTH than under MTC. The implications of these results for the design of assessments for learning are discussed.

Highlights: The transfer effect across math problem types and feedback conditions was examined. Solving open-ended problems resulted in larger transfer than multiple-choice. No feedback showed a similar transfer effect to single-try feedback. Multiple-try feedback showed a larger transfer effect than single-try feedback. Hint feedback showed a larger transfer effect than multiple-try feedback.


Computers in Education | 2017

Effects of feedback elaboration and feedback timing during computer-based practice in mathematics problem solving

Yigal Attali; Fabienne Michelle Van der Kleij

This study investigated the effects of feedback on performance with pairs of isomorphic items that were embedded within consecutive web-based mathematics practice tests. Participants were randomly assigned to different experimental testing conditions: (1) feedback type: knowledge of correct response (KCR) or KCR with elaborated feedback (EF) in the form of additional explanations of the correct answer; (2) feedback timing: immediately after answering each item or delayed until after completing the practice test; and (3) item format: multiple-choice or constructed response. The study specifically investigated the likelihood that participants would correctly answer the second version of an item, conditioned on their answer to the first version, across feedback type and timing conditions and taking into account item format and participants' initial ability. Results from 2445 participants showed a different pattern depending on the correctness of the initial item response. With respect to feedback type, EF resulted in higher performance than KCR following incorrect first responses (suggesting an initial lack of knowledge and understanding), but not following correct first responses. With respect to feedback timing, immediate feedback with additional delayed review resulted in higher performance than delayed feedback following incorrect first responses, but resulted in lower performance following correct first responses (immediate feedback without delayed review resulted in lower performance in both cases).

Highlights: We compared effects of feedback elaboration and timing in mathematics practice tests. We focused on learner performance on pairs of isomorphic items embedded in the tests. Only for incorrect first responses was elaborated feedback more effective. For incorrect first responses, immediate feedback was more effective than delayed feedback. For correct first responses, delayed feedback was more effective than immediate feedback.

Collaboration


Dive into Yigal Attali's collaborations.

Top Co-Authors


Don Powers

Educational Testing Service


Maya Bar-Hillel

Hebrew University of Jerusalem
