Publication


Featured research published by André F. De Champlain.


Medical Education | 2014

Automated essay scoring and the future of educational assessment in medical education

Mark J. Gierl; Syed Latifi; Hollis Lai; André-Philippe Boulais; André F. De Champlain

Constructed‐response tasks, which range from short‐answer tests to essay questions, are included in assessments of medical knowledge because they allow educators to measure students’ ability to think, reason, solve complex problems, communicate and collaborate through their use of writing. However, constructed‐response tasks are also costly to administer and challenging to score because they rely on human raters. One alternative to the manual scoring process is to integrate computer technology with writing assessment. The process of scoring written responses using computer programs is known as ‘automated essay scoring’ (AES).
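
The abstract stops short of showing what an AES system looks like in practice. Below is a minimal sketch, assuming a generic feature-plus-regression design rather than any particular engine the authors evaluated; the essays, scores, and features are invented for illustration.

```python
# Minimal sketch of an automated essay scoring (AES) pipeline, not the
# authors' system: surface linguistic features plus a supervised model
# trained on human-assigned scores. All data here is hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Hypothetical training data: essays with scores assigned by human raters.
essays = ["The patient presents with jaundice because ...",
          "Bilirubin metabolism involves ..."]
human_scores = [3.0, 4.5]

# Word- and bigram-level features stand in for richer linguistic features.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    Ridge(alpha=1.0),  # regression onto the human score scale
)
model.fit(essays, human_scores)
print(model.predict(["A new, unscored essay about jaundice ..."]))
```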


Advances in Health Sciences Education | 2015

Examiners and content and site: Oh My! A national organization's investigation of score variation in large-scale performance assessments.

Stefanie S. Sebok; Marguerite Roy; Don A. Klinger; André F. De Champlain

Examiner effects and content specificity are two well-known sources of construct-irrelevant variance that present great challenges in performance-based assessments. National medical organizations that are responsible for large-scale performance-based assessments face an additional challenge, as they must administer qualification examinations to physician candidates at several locations and institutions. This study explores the impact of site location as a source of score variation in a large-scale national assessment used to measure the readiness of internationally educated physician candidates for residency programs. Data from the Medical Council of Canada’s National Assessment Collaboration were analyzed using hierarchical linear modeling and Rasch analyses. Consistent with previous research, problematic variance due to examiner effects and content specificity was found. Additionally, site location was identified as a potential source of construct-irrelevant variance in examination scores.
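
For readers unfamiliar with the modeling, here is a random-intercept sketch of the site-variance question, using Python's statsmodels rather than the authors' exact HLM/Rasch toolchain; the data frame and its values are hypothetical.

```python
# Sketch of the variance-components question the study asks: does site
# location account for score variation? A random intercept for site in
# a mixed model is one minimal way to pose it. Data are invented.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one candidate score per row, with site.
df = pd.DataFrame({
    "score": [62, 71, 58, 80, 66, 74, 90, 55, 68, 77],
    "site":  ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"],
})

# If the between-site variance component is large relative to residual
# variance, site is a plausible source of construct-irrelevant variance.
model = smf.mixedlm("score ~ 1", df, groups=df["site"]).fit()
print(model.summary())   # fixed intercept plus variance components
print(model.cov_re)      # estimated between-site variance
```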


Medical Teacher | 2016

Using cognitive models to develop quality multiple-choice questions

Debra Pugh; André F. De Champlain; Mark J. Gierl; Hollis Lai; Claire Touchie

With the recent interest in competency-based education, educators are being challenged to develop more assessment opportunities. As such, there is increased demand for exam content development, which can be a very labor-intensive process. An innovative solution to this challenge has been the use of automatic item generation (AIG) to develop multiple-choice questions (MCQs). In AIG, computer technology is used to generate test items from cognitive models (i.e. representations of the knowledge and skills that are required to solve a problem). The main advantage yielded by AIG is the efficiency in generating items. Although the technology for AIG relies on a linear programming approach, the same principles can also be used to improve traditional committee-based processes used in the development of MCQs. Using this approach, content experts deconstruct their clinical reasoning process to develop a cognitive model which, in turn, is used to create MCQs. This approach is appealing because it: (1) is efficient; (2) has been shown to produce items with psychometric properties comparable to those generated using a traditional approach; and (3) can be used to assess higher-order skills (i.e. application of knowledge). The purpose of this article is to provide a novel framework for the development of high-quality MCQs using cognitive models.
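
A toy illustration of the template idea behind AIG, assuming a simple fill-in-the-variables design; the template, variables, and clinical content below are invented and not drawn from the paper.

```python
# Illustrative sketch of template-based automatic item generation (AIG):
# a cognitive model is reduced to an item template whose variables are
# filled systematically. The medical content is invented for the example.
from itertools import product

TEMPLATE = ("A {age}-year-old patient presents with {finding}. "
            "What is the most likely diagnosis?")

# A toy "cognitive model": variables whose combinations yield item stems.
variables = {
    "age": ["25", "70"],
    "finding": ["painless jaundice", "fever and right upper quadrant pain"],
}

def generate_items():
    keys = list(variables)
    for values in product(*(variables[k] for k in keys)):
        yield TEMPLATE.format(**dict(zip(keys, values)))

for item in generate_items():
    print(item)  # four stems from one template in this toy model
```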


Teaching and Learning in Medicine | 2016

Using Automatic Item Generation to Improve the Quality of MCQ Distractors

Hollis Lai; Mark J. Gierl; Claire Touchie; Debra Pugh; André-Philippe Boulais; André F. De Champlain

Construct: Automatic item generation (AIG) is an alternative method for producing large numbers of test items that integrates cognitive modeling with computer technology to systematically generate multiple-choice questions (MCQs). The purpose of our study is to describe and validate a method of generating plausible but incorrect distractors. Initial applications of AIG demonstrated its effectiveness in producing test items. However, expert review of the initial items identified a key limitation: the generation of implausible incorrect options, or distractors, might limit the applicability of items in real testing situations. Background: Medical educators require the development of test items in large quantities to facilitate the continual assessment of student knowledge. Traditional item development processes are time-consuming and resource intensive. Studies have validated the quality of generated items through content expert review. However, no study has yet documented how generated items perform in a test administration, nor validated AIG through student responses to generated test items. Approach: To validate our refined AIG method for generating plausible distractors, we collected psychometric evidence from a field test of the generated test items. A three-step process was used to generate test items in the area of jaundice. At least 455 Canadian and international medical graduates responded to each of the 13 generated items embedded in a high-stakes exam administration. Item difficulty, discrimination, and index of discrimination estimates were calculated for the correct option as well as each distractor. Results: Item analysis results for the correct options suggest that the generated items measured candidate performance across a range of ability levels while providing a consistent level of discrimination for each item. Results for the distractors reveal that the generated items differentiated the low- from the high-performing candidates. Conclusions: Previous research on AIG highlighted how this item development method can be used to produce high-quality stems and correct options for MCQ exams. The purpose of the current study was to describe, illustrate, and evaluate a method for modeling plausible but incorrect options. Evidence provided in this study demonstrates that AIG can produce psychometrically sound test items. More importantly, by adapting the distractors to match the unique features presented in the stem and correct option, the generation of MCQs using an automated procedure has the potential to produce plausible distractors and yield large numbers of high-quality items for medical education.
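
The difficulty and discrimination statistics named in the Approach can be computed directly from a 0/1 response matrix. A minimal sketch with invented responses follows (point-biserial discrimination shown here; the study also used an upper-lower index).

```python
# Sketch of classical item statistics: item difficulty (proportion
# correct) and discrimination (point-biserial correlation between item
# score and rest-of-test score). The response matrix is invented.
import numpy as np

# 0/1 matrix: rows = candidates, columns = items (hypothetical data).
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
])

total = responses.sum(axis=1)
for j in range(responses.shape[1]):
    item = responses[:, j]
    difficulty = item.mean()                        # p-value of the item
    rest = total - item                             # corrected total score
    discrimination = np.corrcoef(item, rest)[0, 1]  # point-biserial
    print(f"item {j}: difficulty={difficulty:.2f}, "
          f"discrimination={discrimination:.2f}")
```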


Journal of Educational Evaluation for Health Professions | 2016

Calibrating the Medical Council of Canada’s Qualifying Examination Part I using an integrated item response theory framework: a comparison of models and designs

André F. De Champlain; André-Philippe Boulais; Andrew Dallas

Purpose: The aim of this research was to compare different methods of calibrating the multiple-choice question (MCQ) and clinical decision-making (CDM) components of the Medical Council of Canada’s Qualifying Examination Part I (MCCQEI) based on item response theory. Methods: Our data consisted of test results from 8,213 first-time applicants to the MCCQEI in the spring and fall 2010 and 2011 test administrations. The data set contained several thousand multiple-choice items and several hundred CDM cases. Four dichotomous calibrations were run using BILOG-MG 3.0. All three mixed-item-format (dichotomous MCQ responses and polytomous CDM case scores) calibrations were conducted using PARSCALE 4. Results: The 2-PL model had identical numbers of items with chi-square values at or below a Type I error rate of 0.01 (83/3,499, or 0.02). In all three polytomous models, whether the MCQs were anchored or concurrently run with the CDM cases, results suggest very poor fit. All IRT abilities estimated from the dichotomous calibration designs correlated very highly with each other. IRT-based pass-fail rates were extremely similar, not only across calibration designs and methods but also with regard to the actual decision reported to candidates. The largest difference in pass rates was 4.78%, which occurred between the mixed-format concurrent 2-PL graded response model (pass rate = 80.43%) and the dichotomous anchored 1-PL calibration (pass rate = 85.21%). Conclusion: Simpler calibration designs with dichotomized items should be implemented, as the dichotomous calibrations provided better fit to the item response matrix than the more complex, polytomous calibrations.
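
For context, the 2-PL model referenced throughout gives the probability of a correct response as a logistic function of ability; here is a short sketch with invented item parameters.

```python
# The two-parameter logistic (2-PL) item response function: probability
# of a correct response given ability theta, with item discrimination a
# and difficulty b. Parameter values below are invented.
import numpy as np

def p_correct_2pl(theta, a, b):
    """P(correct | theta) under the 2-PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

thetas = np.linspace(-3, 3, 7)
print(p_correct_2pl(thetas, a=1.2, b=0.5))  # rises with ability
# The 1-PL (Rasch) case fixes a = 1 for every item; the graded response
# model generalizes this to ordered polytomous scores such as CDM cases.
```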


Applied Measurement in Education | 2016

Evaluating the Psychometric Characteristics of Generated Multiple-Choice Test Items

Mark J. Gierl; Hollis Lai; Debra Pugh; Claire Touchie; André-Philippe Boulais; André F. De Champlain

Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric characteristics of generated multiple-choice test items are largely unknown and undocumented. We present item analysis results from one of the first empirical studies designed to evaluate the psychometric properties of generated multiple-choice items, using the results from a high-stakes national medical licensure examination. The item analysis results for the correct option revealed that the generated items measured examinees’ performance across a broad range of ability levels while, at the same time, providing a consistently strong level of discrimination for each item. Results for the incorrect options revealed that the generated items consistently differentiated the low- from the high-performing examinees.
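
A sketch of the upper-lower index of discrimination, which extends naturally to incorrect options; the group fraction, responses, and scores below are invented.

```python
# Sketch of an upper-lower discrimination index: the difference in
# option-selection rates between high- and low-scoring groups, computed
# per option so distractors can be evaluated too. Data are invented.
import numpy as np

def discrimination_index(option_chosen, total_scores, frac=0.27):
    """D = p(option | top group) - p(option | bottom group)."""
    n = len(total_scores)
    k = max(1, int(n * frac))
    order = np.argsort(total_scores)
    low, high = order[:k], order[-k:]
    return option_chosen[high].mean() - option_chosen[low].mean()

# Hypothetical data: whether each of 10 examinees picked one distractor.
picked = np.array([1, 1, 0, 1, 0, 0, 0, 1, 0, 0])
totals = np.array([12, 15, 30, 18, 28, 33, 27, 10, 31, 25])
# A functioning distractor yields a negative D: low scorers pick it more.
print(discrimination_index(picked, totals))
```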


Evaluation & the Health Professions | 2016

Using Automated Scoring to Evaluate Written Responses in English and French on a High-Stakes Clinical Competency Examination.

Syed Latifi; Mark J. Gierl; André-Philippe Boulais; André F. De Champlain

We present a framework for technology-enhanced scoring of bilingual clinical decision-making (CDM) questions using an open-source scoring technology and evaluate the strength of the proposed framework using operational data from the Medical Council of Canada Qualifying Examination. Candidates’ responses to six write-in CDM questions were used to develop a three-stage automated scoring framework. In Stage 1, linguistic features were extracted from the CDM responses. In Stage 2, supervised machine learning techniques were employed to develop the scoring models. In Stage 3, responses to six English and French CDM questions were scored using the scoring models from Stage 2. Of the 8,007 English and French CDM responses, 7,643 were accurately scored, an agreement rate of 95.4% between human and computer scoring. This represents an improvement of 5.4% over the human inter-rater agreement rate. Our framework yielded scores similar to those of expert physician markers and could be used for clinical competency assessment.
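
A compact sketch of the three-stage framework, using generic scikit-learn components in place of the authors' open-source toolchain; the responses, keys, and features are invented.

```python
# Sketch of the three-stage automated scoring framework described above,
# with generic open-source components standing in for the authors'
# toolchain. Training responses and human scores are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Stage 1: extract linguistic features from human-scored CDM responses.
train_responses = ["order serum bilirubin", "reassure and discharge",
                   "order liver enzymes", "prescribe antibiotics"]
train_scores = [1, 0, 1, 0]                 # human-assigned credit
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_responses)

# Stage 2: fit a supervised scoring model to the human scores.
scorer = LogisticRegression().fit(X_train, train_scores)

# Stage 3: score new responses and check agreement with human markers.
new_responses = ["order bilirubin and liver enzymes", "discharge patient"]
human = [1, 0]
machine = scorer.predict(vectorizer.transform(new_responses))
print("agreement:", accuracy_score(human, machine))  # cf. 95.4% in the study
```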


Journal of Educational Evaluation for Health Professions | 2015

Best-fit model of exploratory and confirmatory factor analysis of the 2010 Medical Council of Canada Qualifying Examination Part I clinical decision-making cases

André F. De Champlain

Purpose: This study aims to assess the fit of a number of exploratory and confirmatory factor analysis models to the 2010 Medical Council of Canada Qualifying Examination Part I (MCCQE1) clinical decision-making (CDM) cases. The outcomes of this study have important implications for a range of domains, including scoring and test development. Methods: The examinees included all first-time Canadian medical graduates and international medical graduates who took the MCCQE1 in spring or fall 2010. The fit of one- to five-factor exploratory models was assessed for the item response matrix of the 2010 CDM cases. Five confirmatory factor analytic models were also examined with the same CDM response matrix. The structural equation modeling software program Mplus was used for all analyses. Results: Of the five exploratory factor analytic models evaluated, a three-factor model provided the best fit. Factor 1 loaded on three medicine cases, two obstetrics and gynecology cases, and two orthopedic surgery cases. Factor 2 corresponded to pediatrics, and the third factor loaded on psychiatry cases. Among the five confirmatory factor analysis models examined in this study, the three- and four-factor lifespan period models and the five-factor discipline model provided the best fit. Conclusion: The results suggest that knowledge of broad disciplinary domains best accounts for performance on CDM cases. In test development, particular effort should be placed on developing CDM cases according to broad discipline and patient age domains; CDM testlets should be assembled largely using the criteria of discipline and age.
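
The exploratory side of the analysis can be approximated outside Mplus; below is a sketch using sklearn's FactorAnalysis on placeholder data, comparing one- to five-factor solutions by average log-likelihood.

```python
# Sketch of the exploratory comparison: fit one- to five-factor models
# and compare fit. sklearn's FactorAnalysis stands in for Mplus, and the
# CDM score matrix here is random placeholder data, not the study's.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
cdm_scores = rng.normal(size=(500, 12))   # 500 examinees x 12 CDM cases

for k in range(1, 6):
    fa = FactorAnalysis(n_components=k).fit(cdm_scores)
    # Higher average log-likelihood indicates better fit; in the study,
    # a three-factor solution fit the real CDM data best.
    print(f"{k} factors: mean log-likelihood = {fa.score(cdm_scores):.3f}")
```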


BMC Medical Education | 2014

Multiple tutorial-based assessments: a generalizability study

Christina St-Onge; Éric Frenette; Daniel Côté; André F. De Champlain

Background: Tutorial-based assessment, commonly used in problem-based learning (PBL), is thought to provide information about students that is different from that gathered with traditional assessment strategies such as multiple-choice questions or short-answer questions. Although multiple observations within units in an undergraduate medical education curriculum foster more reliable scores, that evaluation design is not always practically feasible. Thus, this study investigated the overall reliability of a tutorial-based program of assessment, namely the Tutotest-Lite. Methods: Scores from multiple units were used to profile clinical domains for the first two years of a system-based PBL curriculum. Results: G-study analysis revealed an acceptable level of generalizability, with g-coefficients of 0.84 and 0.83 for Years 1 and 2, respectively. Interestingly, D-studies suggested that as few as five observations over one year would yield sufficiently reliable scores. Conclusions: Overall, the results from this study support the use of the Tutotest-Lite to judge clinical domains over different PBL units.
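
The g-coefficients and D-study projections follow from estimated variance components; here is a sketch of that arithmetic with invented components (not the study's estimates).

```python
# Sketch of the G- and D-study logic: given variance components for
# persons and for error (e.g., person-by-occasion), the generalizability
# coefficient for n observations is var_p / (var_p + var_e / n).
# The variance components below are invented, not the study's estimates.

def g_coefficient(var_person: float, var_error: float, n_obs: int) -> float:
    return var_person / (var_person + var_error / n_obs)

var_person, var_error = 0.50, 0.45        # hypothetical components

# D-study: how many tutorial observations yield a reliable enough score?
for n in (1, 3, 5, 8, 10):
    print(f"n={n:2d}  g={g_coefficient(var_person, var_error, n):.2f}")
# The study reported g ~= 0.84, with D-studies suggesting ~5 observations.
```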


Teaching and Learning in Medicine | 2017

Cheating in OSCEs: The Impact of Simulated Security Breaches on OSCE Performance

Andrea Gotzmann; André F. De Champlain; Fahmida Homayra; Alexa Fotheringham; Ingrid de Vries; Melissa A. Forgie; Debra Pugh

Collaboration


Dive into André F. De Champlain's collaboration.

Top Co-Authors

Andrea Gotzmann (Medical Council of Canada)
Fang Tian (Medical Council of Canada)
Marguerite Roy (Medical Council of Canada)