Publication


Featured research published by Matt Homer.


Assessment in Education: Principles, Policy & Practice | 2008

A comparison of performance and attitudes in mathematics amongst the ‘gifted’. Are boys better at mathematics or do they just think they are?

Melanie Hargreaves; Matt Homer; Bronwen Swinnerton

This paper explores gender differential performance in ‘gifted and talented’ 9‐ and 13‐year‐olds in a mathematics assessment in England. Boys’ and girls’ attitudes to mathematics and their views about which gender is better at mathematics are also considered. The study uses a matched sample of boys and girls so that school, age and previous achievement in mathematics can be controlled whilst exploring performance on World Class Test items. The main result of this research was that there was no significant gender difference in performance for the 9‐ or the 13‐year‐olds. However, attitudinal differences were found, including a seemingly commonly held stereotypical view of mathematics as a boys’ subject. These results are important since the uptake of higher-level mathematically‐based courses by girls is poor. Further findings reveal that where ‘gifted’ girls perform as well as ‘gifted’ boys, their confidence in the subject is lower than their performance might suggest. This work is also discussed in the light of related research findings and in relation to stereotype threat theory.


British Educational Research Journal | 2011

Sources of differential participation rates in school science: the impact of curriculum reform

Matt Homer; Jim Ryder; Jim Donnelly

School science courses have widely varying participation rates across a range of student characteristics. One of the stated aims of the 2006 Key Stage 4 science curriculum reforms in England was to improve social mobility and inclusion. To encourage students to study more science, this reform was followed by the introduction in 2008 of an entitlement to study the three separate sciences at Key Stage 4 for the more highly attaining students. This paper uses longitudinal national data over a five-year period to investigate the extent of, and change in, participation across science courses at KS4, focussing on student gender and socio-economic status. It finds that whilst there is some evidence of a move towards a more equitable gender balance for some courses, there is as yet little evidence of substantial change in differential participation rates by socio-economic status.


Medical Teacher | 2015

Investigating disparity between global grades and checklist scores in OSCEs

Godfrey Pell; Matt Homer; Richard Fuller

Background: When measuring assessment quality, increasing focus is placed on the value of station-level metrics in the detection and remediation of problems in the assessment. Aims: This article investigates how disparity between checklist scores and global grades in an Objective Structured Clinical Examination (OSCE) can provide powerful new insights at the station level whenever such disparities occur and develops metrics to indicate when this is a problem. Method: This retrospective study uses OSCE data from multiple examinations to investigate the extent to which these new measurements of disparity complement existing station-level metrics. Results: In stations where existing metrics are poor, the new metrics provide greater understanding of the underlying sources of error. Equally importantly, stations of apparently satisfactory “quality” based on traditional metrics are shown to sometimes have problems of their own – with a tendency for checklist score “performance” to be judged stronger than would be expected from the global grades awarded. Conclusions: There is an ongoing tension in OSCE assessment between global holistic judgements and the necessarily more reductionist, but arguably more objective, checklist scores. This article develops methods to quantify the disparity between these judgements and illustrates how such analyses can inform ongoing improvement in station quality.


Medical Teacher | 2017

Managing extremes of assessor judgment within the OSCE.

Richard Fuller; Matt Homer; Godfrey Pell; Jennifer Hallam

Context: There is a growing body of research investigating assessor judgments in complex performance environments such as OSCE examinations. Post hoc analysis can be employed to identify some elements of “unwanted” assessor variance. However, the impact of individual, apparently “extreme” assessors on OSCE quality, assessment outcomes and pass/fail decisions has not been previously explored. This paper uses a range of “case studies” as examples to illustrate the impact that “extreme” examiners can have in OSCEs, and offers pragmatic suggestions for alleviating such problems. Method and results: We used real OSCE assessment data from a number of examinations where, at station level, a single examiner assesses student performance using a global grade and a key features checklist. Three exemplar case studies where initial post hoc analysis has indicated problematic individual assessor behavior are considered and discussed in detail, highlighting both the impact of individual examiner behavior and station design on subsequent judgments. Conclusions: In complex assessment environments, institutions have a duty to maximize the defensibility, quality and validity of the assessment process. A key element of this involves critical analysis, through a range of approaches, of assessor judgments. However, care must be taken when assuming that apparent aberrant examiner behavior is automatically just that.


Medical Teacher | 2013

Estimating and comparing the reliability of a suite of workplace-based assessments: An obstetrics and gynaecology setting

Matt Homer; Zeryab Setna; Vikram Jha; Jenny Higham; Trudie Roberts; Katherine Boursicot

This paper reports on a study that compares estimates of the reliability of a suite of workplace-based assessment forms as employed to formatively assess the progress of trainee obstetricians and gynaecologists. The use of such forms of assessment is growing nationally and internationally in many specialties, but there is little research evidence on comparisons by procedure/competency and form-type across an entire specialty. Generalisability theory combined with a multilevel modelling approach is used to estimate variance components, G-coefficients and standard errors of measurement across 13 procedures and three form-types (mini-CEX, OSATS and CbD). The main finding is that there are wide variations in the estimates of reliability across the forms, and that therefore the guidance on assessment within the specialty does not always allow for enough forms per trainee to ensure that the reliability of the process is adequate. There is, however, little evidence that reliability varies systematically by form-type. Methodologically, the problems of accurately estimating reliability in these contexts through the calculation of variance components and, crucially, their associated standard errors are considered. The importance of the use of appropriate methods in such calculations is emphasised, and the unavoidable limitations of research in naturalistic settings are discussed.
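The G-coefficients mentioned in the abstract have a simple closed form once the variance components are estimated: reliability is the ratio of trainee (true-score) variance to total observed variance, with error variance shrinking as more forms are averaged. A purely illustrative sketch follows; the variance components and form counts below are invented, not the paper's estimates.

```python
# Illustrative G-coefficient calculation from generalisability theory.
# var_trainee and var_error are HYPOTHETICAL variance components, not
# values estimated in the study.

def g_coefficient(var_trainee: float, var_error: float, n_forms: int) -> float:
    """Relative G-coefficient when error variance is averaged over n_forms
    assessment forms per trainee."""
    return var_trainee / (var_trainee + var_error / n_forms)

# Reliability improves as the number of forms per trainee grows:
for n in (1, 5, 10, 20):
    print(n, round(g_coefficient(0.4, 1.2, n), 3))
```

This makes concrete why guidance on the number of forms per trainee matters: with a large error component, a single form can yield very low reliability, and only averaging over many forms pushes the coefficient towards conventionally acceptable levels.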


International Journal of Research & Method in Education | 2011

The use of national data sets to baseline science education reform: exploring value-added approaches

Matt Homer; Jim Ryder; Jim Donnelly

This paper uses data from the National Pupil Database to investigate the differences in ‘performance’ across the range of science courses available following the 2006 Key Stage 4 (KS4) science reforms in England. This is a value-added exploration (from Key Stage 3 [KS3] to KS4) aimed not at the student or the school level, but rather at that of the course. Different methodological approaches to carrying out such an analysis, ranging from simple non-contextualized techniques, to more complex fully contextualized multilevel models, are investigated and their limitations and benefits are evaluated. Important differences between courses are found in terms of the typical ‘value’ they add to the students studying them with particular applied science courses producing higher mean KS4 outcomes for the same KS3 level compared with other courses. The implications of the emergence of such differences, in a context where schools are judged to a great extent on their value-added performance, are discussed. The relative importance of a variety of student characteristics in determining KS4 outcomes is also investigated. Substantive findings are that across all types of course, science prior attainment at KS3, rather than that of mathematics or English, is the most important predictor of KS4 performance in science, and that students of lower socio-economic status consistently make less progress over KS4 than might be expected, despite prior attainment being accounted for in the modelling.


Medical Teacher | 2016

Setting standards in knowledge assessments: Comparing Ebel and Cohen via Rasch

Matt Homer; Jonathan C. Darling

Introduction: It is known that test-centered methods for setting standards in knowledge tests (e.g. Angoff or Ebel) are problematic, with expert judges not able to consistently predict the difficulty of individual items. A different approach is the Cohen method, which benchmarks the difficulty of the test based on the performance of the top candidates. Methods: This paper investigates the extent to which Ebel (and also Cohen) produces a consistent standard in a knowledge test when comparing between adjacent cohorts. The two tests are linked using common anchor items and Rasch analysis to put all items and all candidates on the same scale. Results: The two tests are of a similar standard, but the two cohorts are different in their average abilities. The Ebel method is entirely consistent across the two years, but the Cohen method looks less so, whilst the Rasch equating itself has complications – for example, with evidence of overall misfit to the Rasch model and change in difficulty for some anchor items. Conclusion: Based on our findings, we advocate a pluralistic and pragmatic approach to standard setting in such contexts, and recommend the use of multiple sources of information to inform the decision about the correct standard.
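The Cohen method mentioned above benchmarks the cut score against the top candidates. A minimal sketch of its commonly described form follows: the pass mark is a fixed proportion (often 60%) of the score achieved at the 95th percentile of the cohort. The parameter values are illustrative defaults, and the paper's exact variant may differ.

```python
def cohen_pass_mark(scores, proportion=0.6, percentile=95):
    """Pass mark as a fixed proportion of the score achieved at the given
    percentile of the cohort (a common form of the Cohen method).
    proportion=0.6 and percentile=95 are illustrative defaults."""
    ranked = sorted(scores)
    # Index of the candidate at the chosen percentile (nearest rank).
    idx = min(len(ranked) - 1, round(percentile / 100 * (len(ranked) - 1)))
    return proportion * ranked[idx]

# Illustrative cohort of percentage scores 1..100:
print(cohen_pass_mark(list(range(1, 101))))  # 57.0
```

Because the benchmark is relative to the cohort's top performers, the standard it produces can drift when cohort ability shifts between years, which is consistent with the abstract's observation that Cohen "looks less" consistent than Ebel across adjacent cohorts.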


International Journal of Science Education | 2015

The Impact of a Science Qualification Emphasising Scientific Literacy on Post-compulsory Science Participation: An analysis using national data

Matt Homer; Jim Ryder

In 2006 in England an innovative suite of science qualifications for 14–16-year-olds called Twenty-First Century Science (21CS) was introduced. These qualifications have a strong focus on developing scientific literacy in all students whilst simultaneously providing preparation for the study of post-compulsory science for a smaller proportion of students. Claims have been made that such an innovative qualification would impact significantly on post-compulsory science participation—either positively or negatively. Using national data in England to track one cohort of students over 2007–2011, this study compares progression rates to post-compulsory science qualifications in England between 21CS qualifications and more traditional non-21CS qualifications. Methods employed include simple comparisons of proportions progressing from each qualification, and more complex multi-level modelling approaches that take account of both students clustered in schools, and potentially differing demographic and achievement profiles of students in the two groups of qualifications. A simple descriptive analysis shows that there is very little difference in overall progression rates between the two types of 14–16 science qualification. More fine-grained descriptive analyses show that there are some important differences, based in particular on the interaction between the amount of science studied at ages 14–16, and on the post-16 science qualification chosen (biology, chemistry or physics). Furthermore, sophisticated modelling analyses indicate a consistently negative small to moderate impact on progression from the 21CS qualification. Overall, our findings suggest that the emphasis on scientific literacy within the 21CS qualification suite has not had a major impact on the uptake of post-compulsory science qualifications.


Assessment & Evaluation in Higher Education | 2012

Psychometric characteristics of integrated multi-specialty examinations: Ebel ratings and unidimensionality

Matt Homer; Jonathan C. Darling; Godfrey Pell

Over recent years, UK medical schools have moved to more integrated summative examinations. This paper analyses data from the written assessment of undergraduate medical students to investigate two key psychometric aspects of this type of high-stakes assessment. Firstly, the strength of the relationship between examiner predictions of item performance (as required under the Ebel standard setting method employed) and actual item performance (‘facility’) in the examination is explored. It is found that there is a systematic pattern of difference between these two measures, with examiners tending to underestimate the difficulty of items classified as relatively easy, and overestimating that of items classified harder. The implications of these differences for standard setting are considered. Secondly, the integration of the assessment raises the question as to whether the student total score in the exam can provide a single meaningful measure of student performance across a broad range of medical specialties. Therefore, Rasch measurement theory is employed to evaluate psychometric characteristics of the examination, including its dimensionality. Once adjustment is made for item interdependency, the examination is shown to be unidimensional with fit to the Rasch model implying that a single underlying trait, clinical knowledge, is being measured.
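The Rasch model underlying the dimensionality analysis has a simple form: the probability of a correct response depends only on the difference between a candidate's ability and the item's difficulty. A minimal sketch follows; theta and b are the standard symbols for ability and difficulty, and the values used are illustrative, not estimates from the examination data.

```python
import math

def rasch_prob(theta: float, b: float) -> float:
    """Rasch (one-parameter logistic) probability that a candidate of
    ability theta answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# A candidate whose ability equals the item's difficulty has a 50% chance:
print(rasch_prob(0.5, 0.5))  # 0.5
```

Fit to this model is what licenses summing across specialties: if a single trait drives every response probability (after handling item interdependency), the total score is a meaningful one-dimensional measure of clinical knowledge.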


Medical Teacher | 2016

Quantifying error in OSCE standard setting for varying cohort sizes: A resampling approach to measuring assessment quality

Matt Homer; Godfrey Pell; Richard Fuller; John Patterson

Background: The use of the borderline regression method (BRM) is a widely accepted standard setting method for OSCEs. However, it is unclear whether this method is appropriate for use with small cohorts (e.g. specialist post-graduate examinations). Aims and methods: This work uses an innovative application of resampling methods applied to four pre-existing OSCE data sets (number of stations between 17 and 21) from two institutions to investigate how the robustness of the BRM changes as the cohort size varies. Using a variety of metrics, the ‘quality’ of an OSCE is evaluated for cohorts of approximately n = 300 down to n = 15. Estimates of the standard error in station-level and overall pass marks, R2 coefficient, and Cronbach’s alpha are all calculated as cohort size varies. Results and conclusion: For larger cohorts (n > 200), the standard error in the overall pass mark is small (less than 0.5%), and for individual stations is of the order of 1–2%. These errors grow as the sample size reduces, with cohorts of less than 50 candidates showing unacceptably large standard error. Alpha and R2 also become unstable for small cohorts. The resampling methodology is shown to be robust and has the potential to be more widely applied in standard setting and medical assessment quality assurance and research.
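The resampling idea can be sketched as follows: fit the borderline regression (checklist score regressed on global grade, with the pass mark read off at the borderline grade) on bootstrap subsamples of varying size, and track the spread of the resulting pass marks. The code below is an illustration of the approach on synthetic data, not the authors' implementation, and all numbers in it are invented.

```python
import random
import statistics

def brm_pass_mark(scores, grades, borderline_grade=2.0):
    """Borderline regression: OLS fit of checklist score on global grade;
    the pass mark is the predicted score at the borderline grade."""
    n = len(scores)
    mg, ms = sum(grades) / n, sum(scores) / n
    sxx = sum((g - mg) ** 2 for g in grades)
    sxy = sum((g - mg) * (s - ms) for g, s in zip(grades, scores))
    slope = sxy / sxx
    intercept = ms - slope * mg
    return intercept + slope * borderline_grade

def pass_mark_se(scores, grades, cohort_size, n_boot=500, seed=0):
    """Bootstrap standard error of the BRM pass mark at a given cohort size."""
    rng = random.Random(seed)
    marks = []
    for _ in range(n_boot):
        idx = rng.choices(range(len(scores)), k=cohort_size)
        marks.append(brm_pass_mark([scores[i] for i in idx],
                                   [grades[i] for i in idx]))
    return statistics.stdev(marks)

# Synthetic station data: global grades 0-4, checklist score roughly
# linear in grade plus noise (true pass mark at grade 2 is 16).
rng = random.Random(1)
grades = [rng.randint(0, 4) for _ in range(300)]
scores = [10 + 3 * g + rng.gauss(0, 2) for g in grades]

print(pass_mark_se(scores, grades, 300))  # full cohort: small SE
print(pass_mark_se(scores, grades, 30))   # small cohort: noticeably larger SE
```

The same loop, run over a grid of cohort sizes, reproduces the qualitative pattern the abstract reports: the standard error of the pass mark grows as the cohort shrinks, making the BRM cut score unstable for small examinations.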

Collaboration


Dive into Matt Homer's collaborations.

Top Co-Authors

Bryony Black (University of East Anglia)
D Hewitt (Loughborough University)
Fiona Curtis (University College London)