Publications


Featured research published by Donald E. Powers.


The Modern Language Journal | 2002

Decision Making While Rating ESL/EFL Writing Tasks: A Descriptive Framework.

Alister Cumming; Robert Kantor; Donald E. Powers

This article documents 3 coordinated, exploratory studies that empirically developed a framework to describe the decisions that experienced writing assessors make when evaluating ESL/EFL written compositions. The studies are part of ongoing research to prepare a new scoring scheme and tasks for the writing component of the Test of English as a Foreign Language (TOEFL). In Study 1, a research team of 10 experienced ESL/EFL raters developed a preliminary descriptive framework from their own think-aloud protocols while each rated (without any predefined scoring criteria) 60 TOEFL essays at 6 different score points on 4 different essay topics. Study 2 applied the framework to verbal report data from 7 highly experienced English-mother-tongue (EMT) composition raters while each rated 40 TOEFL essays. In Study 3 we refined the framework by analyzing think-aloud protocols from 7 of the same ESL/EFL raters, who rated compositions from 6 ESL students on 5 different writing tasks involving writing in response to reading or listening material. In each study, participants completed a questionnaire to profile their individual characteristics and relevant background variables. In addition to documenting and analyzing in detail the thinking processes of these raters, we found that both groups of raters used similar decision-making behaviors, at similar frequencies, while assessing both the TOEFL essays and the new writing tasks, thus verifying the appropriateness of our descriptive framework. Raters attended more extensively to rhetoric and ideas (compared to language) in compositions they scored high than in compositions they scored low. Overall, though, the ESL/EFL raters attended more extensively to language than to rhetoric and ideas, whereas the EMT raters balanced their attention to these main features of the written compositions more evenly. Most participants perceived that their previous experiences rating compositions and teaching English had influenced their criteria and their processes for rating the compositions.


Computers in Human Behavior | 2002

Stumping e-rater: Challenging the validity of automated essay scoring

Donald E. Powers; Jill Burstein; Martin Chodorow; Mary E. Fowles; Karen Kukich

For this study, various parties were invited to “challenge” e-rater, an automated essay scorer that relies on natural language processing techniques, by composing essays in response to Graduate Record Examinations (GRE®) Writing Assessment prompts with the intention of undermining its scoring capability. Specifically, using detailed information about e-rater's approach to essay scoring, writers tried to “trick” the computer-based system into assigning scores that were higher or lower than deserved. E-rater's automated scores on these “problem essays” were compared with scores given by two trained human readers, and the difference between the scores constituted the standard for judging the extent to which e-rater was fooled. Challengers were differentially successful in writing problematic essays. As a whole, they were more successful in tricking e-rater into assigning scores that were too high than in duping e-rater into awarding scores that were too low. The study provides information on ways in which e-rater, and perhaps other automated essay scoring systems, may fail to provide accurate evaluations if used as the sole method of scoring in high-stakes assessments. The results suggest possible avenues for improving automated scoring methods.
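
The validation logic the abstract describes is straightforward to express in code. Below is a minimal sketch of that comparison, assuming hypothetical scores and a hypothetical one-point discrepancy threshold; the paper's actual data and criterion are not reproduced here.

```python
# Sketch: flag essays where an automated score departs from the mean
# of two human ratings. All scores and the one-point threshold are
# hypothetical illustrations, not the study's actual data or criterion.

def flag_problem_essays(machine, human1, human2, threshold=1.0):
    """Return (essay index, discrepancy) pairs where the automated
    score differs from the mean human score by at least `threshold`."""
    flagged = []
    for i, (m, h1, h2) in enumerate(zip(machine, human1, human2)):
        diff = m - (h1 + h2) / 2  # positive: machine scored too high
        if abs(diff) >= threshold:
            flagged.append((i, diff))
    return flagged

# Hypothetical 6-point-scale scores for five challenge essays.
machine = [6, 2, 4, 5, 1]
reader1 = [3, 4, 4, 5, 3]
reader2 = [4, 4, 3, 5, 2]
print(flag_problem_essays(machine, reader1, reader2))
# [(0, 2.5), (1, -2.0), (4, -1.5)] -- essay 0 tricked the scorer upward.
```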


American Educational Research Journal | 1980

The Effects of Special Preparation on SAT-Verbal Scores

Donald L. Alderman; Donald E. Powers

Students at eight secondary schools were randomly assigned either to a treatment group that was given special preparation for the SAT-V or to a control group whose access to the same preparation was delayed for the purposes of the study. The programs of special preparation were those already in place at the schools. A special administration of the SAT served as the posttest. Special preparation resulted in an overall difference of eight points, on the SAT-V scale of 200–800 points, between the treatment and control groups, corresponding to one additional correct item and stemming primarily from performance on analogy and antonym items.


Language Testing | 2004

A Teacher-Verification Study of Speaking and Writing Prototype Tasks for a New TOEFL.

Alister Cumming; Leslie Grant; Patricia Mulcahy-Ernt; Donald E. Powers

This study was undertaken, in conjunction with other studies field-testing prototype tasks for a new TOEFL, to evaluate the content validity, perceived authenticity and educational appropriateness of these prototype tasks. We interviewed seven highly experienced instructors of English as a Second Language (ESL) at three universities, asking them to rate their students’ abilities in English and to review samples of their students’ performance to determine whether they thought seven prototype speaking and writing tasks being field-tested for a new version of the TOEFL® test:

• represented the domain of academic English required for studies at English-medium universities or colleges in North America;
• elicited performance from their adult ESL students that corresponded to their usual performance in ESL classes and course assignments; and
• realized the evidence claims on which the tasks had been designed.

The instructors thought that most of their students’ performances on the prototype test tasks were equivalent to or better than their usual performance in classes. The instructors viewed the new prototype tasks that required students to write or to speak in reference to reading or listening source texts positively, but they observed certain problems with these novel tasks and suggested ways that their content and presentation might be improved for the formative development of these tasks.


Language Testing | 1986

Academic demands related to listening skills

Donald E. Powers

A literature review was conducted in order to identify various parameters underlying listening comprehension. The results of this review were used as a basis for a survey of faculty in six graduate fields as well as undergraduate English faculty. The purpose of the survey was to a) obtain faculty perceptions of the importance to academic success of various listening skills and activities, b) assess the degree to which both native and non-native speakers experience difficulties with these skills or activities, and c) determine faculty views of alternative means of evaluating these skills. Faculty perceived some listening skills as more important than others for academic success. These included nine skills in particular that were related primarily to various aspects of lecture content (e.g. identifying major ideas and relationships among them). As might be expected, faculty perceived that non-native students experience more difficulty than native students with all listening activities, and that non-native students have disproportionately greater difficulty with some activities, such as following lectures given at different speeds and comprehending or deducing the meaning of important vocabulary. With respect to measuring listening comprehension, some general approaches and specific item types were judged to be more appropriate than others. These included tasks that entail answering questions involving the recall of details as well as those involving inference or deductions. The results of the survey are used to suggest further research on the construct validity of the Listening Comprehension section of the TOEFL.


Journal of Educational Computing Research | 2001

Test Anxiety and Test Performance: Comparing Paper-Based and Computer-Adaptive Versions of the Graduate Record Examinations (GRE®) General Test

Donald E. Powers

Despite some assumptions to the contrary, there is reason to believe that the introduction of computer-adaptive testing may actually help to alleviate test anxiety and diminish the relationship between test anxiety and test performance. This study provided a test of this hypothesis. Results are based on a sample of Graduate Record Examinations (GRE®) General Test takers who took the computer-adaptive version of the test, and another sample of GRE examinees who took the paper-based version. After taking the test, all examinees completed both a test anxiety inventory and an inventory concerning attitudes toward computers. Relationships were examined between performance on each of the three GRE General Test measures and reports of test anxiety (both worry and emotionality) and computer attitudes (both anxiety and confidence). For both the test anxiety and the computer attitudes scales, the relationship to GRE scores was similar for the computer-adaptive and paper-based GRE General Test. Thus, there was no support for the study's major hypothesis. Several ancillary findings, however, do have implications for large-scale testing programs, especially those moving to computer-based testing.
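
The abstract reports that the anxiety-score relationship was similar across the two test modes. One conventional way to test whether two correlations from independent samples differ is Fisher's z-transformation; the sketch below illustrates that test with hypothetical correlations and sample sizes, since the study's actual statistics and analysis method are not given in the abstract.

```python
# Fisher z test for the difference between two independent correlations,
# e.g. the anxiety-score r in a computer-adaptive sample vs. a
# paper-based sample. All r's and n's below are hypothetical.
import math

def fisher_z_diff(r1, n1, r2, n2):
    """z statistic for H0: the two population correlations are equal."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se

z = fisher_z_diff(r1=-0.25, n1=400, r2=-0.22, n2=400)
print(f"z = {z:.2f}")  # |z| < 1.96: no evidence the modes differ at alpha = .05
```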


Journal of Applied Psychology | 2004

Validity of Graduate Record Examinations (GRE) General Test Scores for Admissions to Colleges of Veterinary Medicine.

Donald E. Powers

From the standpoint of test validation, veterinary medicine provides both a unique context in which to study the validity of Graduate Record Examinations (GRE) test scores and a singular opportunity to address the shortcomings typical of many GRE validity studies. This article documents a study of the validity of the GRE General Test for predicting 1st-year grade averages in a comprehensive sample of veterinary medical colleges. For each of 16 veterinary medical colleges, statistical corrections were applied for the effects of range restriction in the predictors and unreliability of the criterion. When fully corrected for both range restriction and unreliability, the resulting validity coefficients were, on average, .53 for the combination of all 3 GRE General Test scores, .59 for undergraduate grade point average, and .71 for GRE scores and undergraduate grade point average together.
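
The abstract does not name the correction formulas, but the standard tools for this design are Thorndike's Case II adjustment for range restriction in the predictor and disattenuation for criterion unreliability. The sketch below applies both; all numeric inputs are hypothetical.

```python
# Conventional psychometric corrections of the kind the abstract
# describes: Thorndike Case II for range restriction, then
# disattenuation for criterion unreliability. Inputs are hypothetical.
import math

def correct_range_restriction(r, u):
    """Thorndike Case II; u = unrestricted SD / restricted SD (> 1)."""
    return r * u / math.sqrt(1 - r**2 + (r * u) ** 2)

def correct_criterion_unreliability(r, ryy):
    """Disattenuate by the square root of the criterion reliability."""
    return r / math.sqrt(ryy)

r_observed = 0.30  # observed validity in the admitted (restricted) sample
u = 1.5            # applicant-pool SD divided by admitted-student SD
ryy = 0.80         # reliability of first-year grade average

r_full = correct_criterion_unreliability(
    correct_range_restriction(r_observed, u), ryy)
print(f"fully corrected r = {r_full:.2f}")  # 0.48 for these inputs
```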


Applied Psychological Measurement | 1981

Extending the Measurement of Graduate Admission Abilities Beyond the Verbal and Quantitative Domains

Donald E. Powers; Spencer S. Swinton

Traditionally, major national admissions tests, such as the Graduate Record Examinations (GRE) Aptitude Test, have focused primarily on the measurement of broadly applicable verbal and quantitative abilities. The GRE Board recently sponsored an investigation of the possibility of extending the measurement of abilities beyond the verbal and quantitative domains in order to facilitate a broadened definition of talent. That effort resulted in a restructured GRE Aptitude Test, which includes a measure of analytical ability for which a separate score is reported. The present study provides a factor analytic description of the new restructured test. Results suggest that the restructured test continues to tap the verbal and quantitative skills measured by the original GRE Aptitude Test but that it also contains a distinct, identifiable analytical dimension that is highly correlated with the dimensions underlying performance on the verbal and quantitative sections of the test.
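
As a rough illustration of the kind of factor analytic description the study reports, the sketch below fits a rotated three-factor model to simulated section subscores and inspects the loadings for distinct verbal, quantitative, and analytical dimensions. The simulated data, varimax rotation, and scikit-learn estimator are all assumptions for illustration, not the study's actual method.

```python
# Fit a three-factor model to simulated subscores and print loadings.
# Everything here (data, rotation, estimator) is illustrative only.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 1000
# Three correlated latent abilities (verbal, quantitative, analytical).
latent = rng.multivariate_normal(
    mean=[0.0, 0.0, 0.0],
    cov=[[1.0, 0.5, 0.6],
         [0.5, 1.0, 0.6],
         [0.6, 0.6, 1.0]],
    size=n)
# Each latent ability drives two observed subscores, plus noise.
loadings = np.zeros((6, 3))
loadings[0:2, 0] = 0.8  # two verbal subscores
loadings[2:4, 1] = 0.8  # two quantitative subscores
loadings[4:6, 2] = 0.8  # two analytical subscores
scores = latent @ loadings.T + 0.4 * rng.standard_normal((n, 6))

fa = FactorAnalysis(n_components=3, rotation="varimax").fit(scores)
print(np.round(fa.components_, 2))  # rows = factors, columns = subscores
```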


Research in Higher Education | 1982

Feedback as an incentive for responding to a mail questionnaire

Donald E. Powers; Donald L. Alderman

The effect of offering respondents feedback on questionnaire results was investigated in a national mail survey of college-bound secondary school students. It was found that offering feedback had a significant positive effect on response rate, but the magnitude of that effect was slightly smaller than the increase in response rate produced by a shorter questionnaire and considerably smaller than that produced by a follow-up contact with nonrespondents.


Language Testing | 2012

TOEFL iBT Speaking Test Scores as Indicators of Oral Communicative Language Proficiency

Brent Bridgeman; Donald E. Powers; Elizabeth Stone; Pamela Mollaun

Scores assigned by trained raters and by an automated scoring system (SpeechRater™) on the speaking section of the TOEFL iBT™ were validated against a communicative competence criterion. Specifically, a sample of 555 undergraduate students listened to speech samples from 184 examinees who took the Test of English as a Foreign Language Internet-based test (TOEFL iBT). Oral communicative effectiveness was evaluated both by rating scales and by the ability of the undergraduate raters to answer multiple-choice questions that could be answered only if the spoken response was understood. Correlations of these communicative competence indicators with speaking scores were substantially higher for the scores assigned by the professional TOEFL iBT raters than for those assigned by SpeechRater. Results suggested that both expert raters and SpeechRater evaluate aspects of communicative competence, but that SpeechRater fails to measure aspects of the construct that human raters can evaluate.
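
A minimal sketch of the criterion-correlation comparison described above, correlating a communicative competence criterion with human and automated speaking scores. All score vectors are hypothetical, and statistics.correlation requires Python 3.10 or later.

```python
# Correlate a communicative-competence criterion with human rater
# scores and with automated scores. All values are hypothetical.
from statistics import correlation  # Pearson r; Python 3.10+

criterion = [0.55, 0.70, 0.40, 0.85, 0.60, 0.30, 0.75, 0.50]
human =     [3.0, 3.5, 2.5, 4.0, 3.0, 2.0, 3.5, 2.5]
machine =   [3.0, 3.0, 3.0, 3.5, 2.5, 2.5, 3.0, 3.0]

print(f"human vs. criterion:   r = {correlation(criterion, human):.2f}")
print(f"machine vs. criterion: r = {correlation(criterion, machine):.2f}")
```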
