Publications


Featured research published by Carol M. Myford.


Language Testing | 2013

Raters’ L2 background as a potential source of bias in rating oral performance

Paula Winke; Susan M. Gass; Carol M. Myford

Based on evidence that listeners may favor certain foreign accents over others (Gass & Varonis, 1984; Major, Fitzmaurice, Bunta, & Balasubramanian, 2002; Tauroza & Luk, 1997) and that language-test raters may better comprehend and/or rate the speech of test takers whose native languages (L1s) are more familiar on some level (Carey, Mannell, & Dunn, 2011; Fayer & Krasinski, 1987; Scales, Wennerstrom, Richard, & Wu, 2006), we investigated whether accent familiarity (defined as having learned the test takers’ L1) leads to rater bias. We examined 107 raters’ ratings on 432 TOEFL iBT™ speech samples from 72 test takers. The raters of interest were L2 speakers of Spanish, Chinese, or Korean, while the test takers comprised three native-speaker groups (24 each) of Spanish, Chinese, and Korean. We analyzed the ratings using a multifaceted Rasch measurement approach. Results indicated that L2 Spanish raters were significantly more lenient with L1 Spanish test takers, as were L2 Chinese raters with L1 Chinese test takers. We conclude by concurring with Xi and Mollaun (2009, 2011) and Carey et al. that rater training should address raters’ linguistic background as a potential rater effect. Furthermore, we discuss the importance of recognizing rater L2 as a possible source of bias.
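
For readers unfamiliar with the multifaceted Rasch approach used here, a minimal sketch of a many-facet Rasch model with a rater-group-by-test-taker-group bias term is given below. The notation and sign conventions are generic illustrations, not the paper's own specification.

\[
\log\frac{P_{nijk}}{P_{nij(k-1)}} \;=\; \theta_n \;-\; \beta_i \;-\; \alpha_j \;-\; \tau_k \;-\; \phi_{g(j),\,l(n)}
\]

where \theta_n is the speaking proficiency of test taker n, \beta_i the difficulty of task i, \alpha_j the severity of rater j, \tau_k the threshold for moving from rating category k-1 to k, and \phi_{g(j),l(n)} a bias term for the pairing of rater j's L2 group with test taker n's L1 group. Under this sign convention, a significantly negative \phi raises the expected rating for that pairing, which is the kind of leniency the study reports for L2 Spanish raters with L1 Spanish test takers and for L2 Chinese raters with L1 Chinese test takers.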


Advances in Health Sciences Education | 2009

Evaluating the effectiveness of rating instruments for a communication skills assessment of medical residents

Cherdsak Iramaneerat; Carol M. Myford; Rachel Yudkowsky; Tali Lowenstein

The investigators used evidence based on response processes to evaluate and improve the validity of scores on the Patient-Centered Communication and Interpersonal Skills (CIS) Scale for the assessment of residents’ communication competence. The investigators retrospectively analyzed the communication skills ratings of 68 residents at the University of Illinois at Chicago (UIC). Each resident encountered six standardized patients (SPs) portraying six cases. SPs rated the performance of each resident using the CIS Scale—an 18-item rating instrument asking for level of agreement on a 5-category scale. A many-faceted Rasch measurement model was used to determine how effectively each item and scale on the rating instrument performed. The analyses revealed that items were too easy for the residents. The SPs underutilized the lowest rating category, making the scale function as a 4-category rating scale. Some SPs were inconsistent when assigning ratings in the middle categories. The investigators modified the rating instrument based on the findings, creating the Revised UIC Communication and Interpersonal Skills (RUCIS) Scale—a 13-item rating instrument that employs a 4-category behaviorally anchored rating scale for each item. The investigators implemented the RUCIS Scale in a subsequent communication skills OSCE for 85 residents. The analyses revealed that the RUCIS Scale functioned more effectively than the CIS Scale in several respects (e.g., a more uniform distribution of ratings across categories, and better fit of the items to the measurement model). However, SPs still rarely assigned ratings in the lowest rating category of each scale.
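
As context for the category diagnostics reported above, the category probabilities in a rating-scale formulation of the many-faceted Rasch model can be sketched as follows; the notation is generic and the details are an illustration, not the paper's own specification.

\[
P_{nijk} \;=\; \frac{\exp\sum_{h=1}^{k}\bigl(\theta_n - \alpha_j - \delta_i - \tau_h\bigr)}{\sum_{m=0}^{K}\exp\sum_{h=1}^{m}\bigl(\theta_n - \alpha_j - \delta_i - \tau_h\bigr)}
\]

where \theta_n is resident n's communication ability, \alpha_j the severity of SP j, \delta_i the difficulty of item i, \tau_h the threshold between categories h-1 and h, K the highest category, and the empty sum (m = 0) is taken as zero. Because each threshold is estimated from observations in the two adjacent categories, a rarely used lowest category leaves its threshold poorly determined, which is consistent with the finding that the 5-category scale functioned as a 4-category scale.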


Academic Medicine | 2013

Improving Student Selection Using Multiple Mini-Interviews With Multifaceted Rasch Modeling

Hettie Till; Carol M. Myford; Jonathan Dowell

Purpose The authors report multiple mini-interview (MMI) selection process data from the University of Dundee Medical School, where staff, students, and simulated patients served as examiners. They investigated how effective this process was in separating candidates for entry into medical school according to the attributes measured, whether the different groups of examiners exhibited systematic differences in their rating patterns, and what effect such differences might have on candidates’ scores. Method The 452 candidates assessed in 2009 rotated through the same 10-station MMI, which measured six noncognitive attributes. Each candidate was rated by one examiner in each station. Scores were analyzed using Facets software, with candidates, examiners, and stations as facets. The program calculated fair average scores that adjusted for examiner severity/leniency and station difficulty. Results The MMI reliably (0.89) separated the candidates into four statistically distinct levels of noncognitive ability. The Rasch measures accounted for 31.69% of the total variance in the ratings (candidates 16.01%, examiners 11.32%, and stations 4.36%). Students rated more severely than staff and also had more unexpected ratings. Adjusting scores for examiner severity/leniency and station difficulty would have changed the selection outcomes for 9.6% of the candidates. Conclusions The analyses highlighted that quality control monitoring is essential to ensure fairness when ranking candidates according to scores obtained in the MMI. The results can be used to identify examiners who need further training or who should not be included again, as well as stations that need review. “Fair average” scores should be used for ranking the candidates.
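
Because the selection outcomes depend on the “fair average” scores mentioned above, a brief sketch of that idea may help; this assumes the usual Facets-style definition and uses generic notation not drawn from the abstract.

\[
\mathrm{FairAvg}_n \;=\; \sum_{k=0}^{K} k \, P\bigl(X = k \mid \theta_n,\ \alpha = \bar{\alpha},\ \delta = \bar{\delta}\bigr)
\]

that is, candidate n's expected rating under the many-facet Rasch model when the candidate's estimated measure \theta_n is combined with the average examiner severity \bar{\alpha} and the average station difficulty \bar{\delta}. Ranking on this expected score, rather than on raw averages, is what keeps candidates who happened to meet severe examiners or difficult stations from being penalized.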


Advances in Health Sciences Education | 2015

Constructing and Evaluating a Validity Argument for the Final-Year Ward Simulation Exercise.

Hettie Till; Jean Ker; Carol M. Myford; Kevin Stirling; Gary Mires

The authors report final-year ward simulation data from the University of Dundee Medical School. Faculty who designed this assessment intend for the final score to represent an individual senior medical student’s level of clinical performance. The results are included in each student’s portfolio as one source of evidence of the student’s capability as a practitioner, professional, and scholar. Our purpose in conducting this study was to illustrate how assessment designers who are creating assessments to evaluate clinical performance might develop propositions and then collect and examine various sources of evidence to construct and evaluate a validity argument. The data were from all 154 medical students who were in their final year of study at the University of Dundee Medical School in the 2010–2011 academic year. To the best of our knowledge, this is the first report on an analysis of senior medical students’ clinical performance while they were taking responsibility for the management of a simulated ward. Using multi-facet Rasch measurement and a generalizability theory approach, we examined various sources of validity evidence that the medical school faculty have gathered for a set of six propositions needed to support their use of scores as measures of students’ clinical ability. Based on our analysis of the evidence, we would conclude that, by and large, the propositions appear to be sound, and the evidence seems to support their proposed score interpretation. Given the body of evidence collected thus far, their intended interpretation seems defensible.
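
The abstract refers to a generalizability theory approach alongside multi-facet Rasch measurement; as a reminder of the quantities involved, the coefficients for a simple students-crossed-with-raters (p × r) design are shown below. This is a simplified illustration with generic notation, not the ward exercise's actual measurement design.

\[
E\rho^2 \;=\; \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pr,e}/n_r},
\qquad
\Phi \;=\; \frac{\sigma^2_p}{\sigma^2_p + \bigl(\sigma^2_r + \sigma^2_{pr,e}\bigr)/n_r}
\]

where \sigma^2_p is the student (true-score) variance, \sigma^2_r the rater variance, \sigma^2_{pr,e} the residual variance, and n_r the number of raters averaged over. E\rho^2 supports relative (ranking) decisions, while \Phi supports absolute decisions about whether a student's clinical performance meets a standard.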


Journal of Broadcasting & Electronic Media | 2014

Measuring Social and Emotional Content in Children's Television: An Instrument Development Study

Claire G. Christensen; Carol M. Myford

Few researchers have studied the television program characteristics that effectively facilitate social and emotional learning (SEL) in children. To further this line of investigation, we created the SEL in Educational Children's Television (SELECT) rating instrument. SELECT ratings indicate whether an educational television episode presents any of 6 SEL skills using any of 5 pedagogical techniques. In this study, 3 raters used the SELECT to rate 80 episodes. Results from multi-facet Rasch analyses illustrated the SELECT's strong content validity, intra- and inter-rater reliability, and sensitivity. Episodes typically presented SEL content implicitly, emphasizing social and decision-making skills most strongly.


Journal of Applied Measurement | 2003

Detecting and measuring rater effects using many-facet Rasch measurement: Part I.

Carol M. Myford; Edward W. Wolfe


Advances in Health Sciences Education | 2008

Quality control of an OSCE using generalizability theory and many-faceted Rasch measurement

Cherdsak Iramaneerat; Rachel Yudkowsky; Carol M. Myford; Steven M. Downing


Archive | 1995

Monitoring and Improving a Portfolio Assessment System.

Carol M. Myford; Robert J. Mislevy


Journal of Educational Measurement | 2009

Monitoring Rater Performance Over Time: A Framework for Detecting Differential Accuracy and Differential Scale Category Use

Carol M. Myford; Edward W. Wolfe


ETS Research Report Series | 2003

Monitoring Faculty Consultant Performance in the Advanced Placement English Literature and Composition Program with a Many-Faceted Rasch Model

George Engelhard; Carol M. Myford

Collaboration


Dive into Carol M. Myford's collaborations.

Top Co-Authors

Paula Winke, Michigan State University
Rachel Yudkowsky, University of Illinois at Chicago
Susan M. Gass, Michigan State University
Allison B. Dymnicki, American Institutes for Research