Michelle Meadows | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Michelle Meadows is active.

Explore More

Publication

Featured researches published by Michelle Meadows.

Curriculum Journal | 2012

Assessment Reform: Students' and Teachers' Responses to the Introduction of Stretch and Challenge at A-Level.

Anthony L. Daly; Jo-Anne Baird; Suzanne Chamberlain; Michelle Meadows

This paper describes an exploration into a reform of the A-level qualification in England in 2008; namely, the introduction of the ‘stretch and challenge’ policy. This policy was initiated by the exams regulator and determined that exam papers should be redesigned to encourage the application of higher order thinking skills, both in the classroom and in examinations. The present study comprised two strands that explored the perceptions of students (n = 39) and teachers (n = 27) regarding the degree to which the incorporation of opportunities for stretch and challenge in the new examination papers had been achieved, and the likely effects on teaching, learning and exam preparation. On the whole, students and teachers welcomed the stretch and challenge policy and there were some indications that changes to the design of question papers could have some positive backwash effects.

Assessment in Education: Principles, Policy & Practice | 2009

Marking quality within test and examination systems

Paul E. Newton; Michelle Meadows

Welcome to this special issue on marking quality within test and examination systems. Recent advances in the field make this a good time to draw together a range of papers focusing on this important issue. For instance, while research into the quality of marking in large-scale assessments has been underway for over a century, it is only recently that a significant body of research has begun to apply theories of cognition and qualitative methods to throw light on what is going on in the minds of raters as they evaluate performances. This is advancing our understanding of the factors that influence marking quality and brings a theoretical approach to what had been a relatively a-theoretical domain. Another major international advance is the increase in use of new technologies with the potential to enhance marking quality. In some countries this has meant that examination scripts are now scanned electronically to be marked on-screen. In others, the aspiration is for machines with the capability to mark even complex responses automatically. Yet, with the advance of technology comes the suspicion that quality may somehow be sacrificed at the altar of cost-effectiveness. Although these technologies have the potential to enhance confidence they also have the potential to alienate members of the public and to reduce trust in our assessment systems. There is an important role for the assessment profession in reassuring ourselves and the public, through validation research, that quality is not being sacrificed as we increasingly replace aspects of marking by humans with electronic solutions. Human judgement is still central to the assessment process, of course, and work continues apace to develop the theory and practice of marking, whether aided by new technology or not. It is with this important theoretical and empirical work that our special issue is concerned. We bring together a selection of papers which aim to improve our understanding of marking quality, and of the factors that influence it, to help make a real and positive difference to the practice of large-scale educational assessment internationally. While the topics covered by these papers suggest that issues of marking quality are international, the vocabulary is less so. Papers from the US and Canada refer to scores, whereas those from the UK refer to marks. A British mark scheme equates to an American scoring rubric. In the UK there are markers or examiners (marking), in the US and in UK ESOL testing there are raters (rating). Beyond the distraction of diverse terminology, the papers have a striking similarity. All tend to be framed squarely in terms of validity (identifying and reducing threats to validity), rather than marking reliability. Further, there is recognition that maximisation of marking reliability could undermine assessment validity, through the underrepresentation or exclusion of important elements or constructs. This commitment to assessment Assessment in Education: Principles, Policy & Practice Vol. 18, No. 3, August 2011, 213–216

Assessment in Education: Principles, Policy & Practice | 2017

Rater accuracy and training group effects in Expert- and Supervisor-based monitoring systems

Jo-Anne Baird; Michelle Meadows; George Leckie; Daniel H. Caro

This study evaluated rater accuracy with rater-monitoring data from high stakes examinations in England. Rater accuracy was estimated with cross-classified multilevel modelling. The data included face-to-face training and monitoring of 567 raters in 110 teams, across 22 examinations, giving a total of 5500 data points. Two rater-monitoring systems (Expert consensus scores and Supervisor judgement of correct scores) were utilised for all raters. Results showed significant group training (table leader) effects upon rater accuracy and these were greater in the expert consensus score monitoring system. When supervisor judgement methods of monitoring were used, differences between training teams (table leader effects) were underestimated. Supervisor-based judgements of raters’ accuracies were more widely dispersed than in the Expert consensus monitoring system. Supervisors not only influenced their teams’ scoring accuracies, they overestimated differences between raters’ accuracies, compared with the Expert consensus system. Systems using supervisor judgements of correct scores and face-to-face rater training are, therefore, likely to underestimate table leader effects and overestimate rater effects.

Oxford Review of Education | 2018

Inter-subject comparability of examination standards in GCSE and GCE in England

Qingping He; Ian Stockford; Michelle Meadows

Abstract Results from Rasch analysis of GCSE and GCE A level data over a period of four years suggest that the standards of examinations in different subjects are not consistent in terms of the levels of the latent trait specified in the Rasch model required to achieve the same grades. Variability in statistical standards between subjects exists at both individual grade level and the overall subject level. Findings from this study are generally consistent with those from previous studies using similar statistical models. It has been demonstrated that the alignment of statistical standards between subjects based on the Rasch model would likely result in substantial change in performance standards of the examinations for some subjects evidenced here by significant changes in grade boundary scores and grade outcomes. It is argued that the defined purposes of GCSE and A level qualifications determine how their results should be interpreted and reported and that the existing grading and results reporting procedures are appropriate for supporting these purposes.

Oxford Review of Education | 2018

Teachers’ experience of and attitudes toward activities to maximise qualification results in England

Michelle Meadows; Beth Black

ABSTRACT Teachers in England are under pressure to maximise their pupils’ examination results, both to improve pupils’ life chances and to ensure their school performs well on government accountability measures. This article reports the findings of an anonymous, online, voluntary survey of 548 teachers from secondary schools and colleges. The survey asked teachers whether they had direct experience of 23 activities aimed at improving results. These activities ranged widely, for example from becoming markers to gain insight into the examination system, removing pupils from the school roll, to providing wording of sections of summative assessment to pupils. Respondents were also given the opportunity to describe other unlisted activities of which they had experience. They rated the acceptability of all the activities. Agreement about the acceptability of the activities varied. Some activities were almost universally condemned, while others were considered more appropriate. Care must be taken in generalising from the experiences and views of this relatively small, volunteer sample of teachers. The survey is, however, unique in providing evidence of the types of activities that some teachers employ and the kinds of ethical dilemmas they face.

International Journal of Testing | 2018

Investigating the Comparability of Examination Difficulty Using Comparative Judgement and Rasch Modelling

Stephen D. Holmes; Michelle Meadows; Ian Stockford; Qingping He

The relationship of expected and actual difficulty of items on six mathematics question papers designed for 16-year olds in England was investigated through paired comparison using experts and testing with students. A variant of the Rasch model was applied to the comparison data to establish a scale of expected difficulty. In testing, the papers were taken by 2933 students using an equivalent-groups design, allowing the actual difficulty of the items to be placed on the same measurement scale. It was found that the expected difficulty derived using the comparative judgement approach and the actual difficulty derived from the test data was reasonably strongly correlated. This suggests that comparative judgement may be an effective way to investigate the comparability of difficulty of examinations. The approach could potentially be used as a proxy for pretesting high-stakes tests in situations where pretesting is not feasible due to reasons of security or other risks.

Research in Mathematics Education | 2017

An investigation of construct relevant and irrelevant features of mathematics problem-solving questions using comparative judgement and Kelly’s Repertory Grid

Stephen D. Holmes; Qingping He; Michelle Meadows

ABSTRACT The relationship between the characteristics of 33 mathematical problem-solving questions answered by 16-year-old students in England and the quality of problem-solving elicited was investigated in two studies. The first study used comparative judgement (CJ) to estimate the quality of the problem-solving elicited by each question, involving 33 mathematics teachers judging pairs of journal-style responses to the questions and the application of the Bradley–Terry model. In the second study a variant of Kelly’s Repertory Grid was used with five mathematics teachers to identify 23 dimensions along which the problem-solving questions varied. Significant relationships between ratings on some dimensions and the problem-solving quality estimated in the first study were found. This suggests that the Kelly’s Repertory Grid approach could be an effective way to identify features of questions that are relevant to the construct being assessed and features that could be potential sources of construct-irrelevant variance in test scores.

Archive | 2005