Hollis Lai
University of Alberta
Publications
Featured research published by Hollis Lai.
Medical Education | 2012
Mark J. Gierl; Hollis Lai; Simon R. Turner
Medical Education 2012: 46: 757–765
International Journal of Testing | 2012
Mark J. Gierl; Hollis Lai
Automatic item generation represents a relatively new but rapidly evolving research area where cognitive and psychometric theories are used to produce tests that include items generated using computer technology. Automatic item generation requires two steps. First, test development specialists create item models, which are comparable to templates or prototypes, that highlight the features or elements in the assessment task that must be manipulated. Second, these item model elements are manipulated to generate new items with the aid of computer-based algorithms. With this two-step process, hundreds or even thousands of new items can be created from a single item model. The purpose of our article is to describe seven different but related topics that are central to the development and use of item models for automatic item generation. We start by defining item model and highlighting some related concepts; we describe how item models are developed; we present an item model taxonomy; we illustrate how item models can be used for automatic item generation; we outline some benefits of using item models; we introduce the idea of an item model bank; and finally, we demonstrate how statistical procedures can be used to estimate the parameters of the generated items without the need for extensive field or pilot testing.
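The two-step process described above can be sketched in a few lines of code. The item model, element names, and element values below are hypothetical illustrations rather than examples from the article; the sketch only shows how one template, crossed with manipulated elements, yields many items.

```python
# Minimal sketch of template-based automatic item generation (hypothetical
# item model and element values, for illustration only).
from itertools import product

# Step 1: an item model -- a stem template plus the elements to manipulate.
stem_template = ("A {age}-year-old patient presents with {symptom}. "
                 "What is the most likely diagnosis?")
elements = {
    "age": ["25", "45", "70"],
    "symptom": ["acute chest pain", "painless jaundice", "sudden vision loss"],
}

# Step 2: a computer-based algorithm manipulates the elements to generate new items.
def generate_items(template, elements):
    names = list(elements)
    for values in product(*(elements[n] for n in names)):
        yield template.format(**dict(zip(names, values)))

items = list(generate_items(stem_template, elements))
print(len(items))   # 3 x 3 = 9 generated stems from a single item model
print(items[0])
```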
Medical Education | 2014
Mark J. Gierl; Syed Latifi; Hollis Lai; André-Philippe Boulais; André F. De Champlain
Constructed‐response tasks, which range from short‐answer tests to essay questions, are included in assessments of medical knowledge because they allow educators to measure students’ ability to think, reason, solve complex problems, communicate and collaborate through their use of writing. However, constructed‐response tasks are also costly to administer and challenging to score because they rely on human raters. One alternative to the manual scoring process is to integrate computer technology with writing assessment. The process of scoring written responses using computer programs is known as ‘automated essay scoring’ (AES).
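As a rough illustration of the general feature-based idea behind AES (not the specific engines discussed in the article), a minimal scorer can be sketched as follows: extract simple writing features from each response and fit a model to human ratings. The essays, features, and scores below are invented.

```python
# Toy sketch of feature-based automated essay scoring: surface features are
# extracted from each essay and regressed on human ratings. Essays, features,
# and scores are invented for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression

def features(essay: str) -> list:
    words = essay.split()
    sentences = [s for s in essay.split(".") if s.strip()]
    return [
        len(words),                                                # length
        len(set(w.lower() for w in words)) / max(len(words), 1),   # lexical diversity
        len(words) / max(len(sentences), 1),                       # mean sentence length
    ]

essays = [
    "Jaundice results from elevated bilirubin. It has hepatic and post-hepatic causes.",
    "Jaundice is yellow skin. It is caused by liver problems.",
    "Bilirubin accumulates when hepatic conjugation or biliary excretion fails, producing jaundice.",
]
human_scores = np.array([4.0, 2.0, 5.0])   # ratings assigned by human markers

model = LinearRegression().fit([features(e) for e in essays], human_scores)
print(model.predict([features("Jaundice is caused by high bilirubin levels in blood.")]))
```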
Medical Education | 2013
Mark J. Gierl; Hollis Lai
Computerised assessment raises formidable challenges because it requires large numbers of test items. Automatic item generation (AIG) can help address this test development problem because it yields large numbers of new items both quickly and efficiently. To date, however, the quality of the items produced using a generative approach has not been evaluated. The purpose of this study was to determine whether automatic processes yield items that meet standards of quality that are appropriate for medical testing. Quality was evaluated firstly by subjecting items created using both AIG and traditional processes to rating by a four‐member expert medical panel using indicators of multiple‐choice item quality, and secondly by asking the panellists to identify which items were developed using AIG in a blind review.
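A minimal sketch of how the blind-review step might be analysed, assuming hypothetical counts (the article does not report these figures): a binomial test of whether panellists identified AIG items better than chance.

```python
# Hypothetical check of the blind-review step: could panellists identify the
# AIG-produced items better than chance? Counts are invented for illustration.
from scipy.stats import binomtest

n_items_reviewed = 40        # hypothetical number of items in the blind review
n_correctly_identified = 23  # hypothetical number correctly labelled as AIG-produced
chance_rate = 0.5            # two labels (AIG vs. traditional), so 50% by guessing

result = binomtest(n_correctly_identified, n_items_reviewed, chance_rate)
print(result.pvalue)  # a large p-value -> identification no better than chance
```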
Dentomaxillofacial Radiology | 2015
Mohammed A.Q. Al-Saleh; Jacob L. Jaremko; Noura A. Alsufyani; Z Jibri; Hollis Lai; Paul W. Major
OBJECTIVES To evaluate the image quality of two methods of registering MRI and CBCT images of the temporomandibular joint (TMJ), particularly regarding the TMJ articular disc-condyle relationship and osseous abnormality. METHODS MR and CBCT images for 10 patients (20 TMJs) were obtained and co-registered using two methods (non-guided and marker-guided) in Mirada XD software (Mirada Medical Ltd, Oxford, UK). Three radiologists independently and blindly evaluated three types of images (MRI, CBCT and registered MRI-CBCT) at two times (T1 and T2) on two criteria: (1) quality of the MRI-CBCT registrations (excellent, fair or poor) and (2) TMJ disc-condylar position and articular osseous abnormalities (osteophytes, erosions, subcortical cysts, surface flattening and sclerosis). RESULTS 75% of the non-guided registered images showed excellent quality, whereas 95% of the marker-guided registered images showed poor quality. A significant difference was found between the non-guided and marker-guided registrations (χ² = 108.5; p < 0.01). The interexaminer reliability for disc position in MRI [intraclass correlation coefficient (ICC) = 0.50 at T1, 0.56 at T2] was lower than that in the MRI-CBCT registered images [ICC = 0.80 (0.52-0.92) at T1, 0.84 (0.62-0.93) at T2]. Erosions and subcortical cysts were noticed less frequently in the MRI-CBCT images than in the CBCT images. CONCLUSIONS Non-guided registration proved superior to marker-guided registration. Although the MRI-CBCT fused images were slightly more limited than CBCT alone for detecting osseous abnormalities, use of the fused images improved the consistency among examiners in detecting disc position relative to the condyle.
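For illustration, a χ² comparison of registration quality between the two methods could be run along the following lines; the rating counts below are hypothetical stand-ins, since the abstract reports only percentages and the resulting statistic.

```python
# Illustrative chi-square test comparing quality ratings (excellent/fair/poor)
# between the two registration methods. The counts are hypothetical.
import numpy as np
from scipy.stats import chi2_contingency

# rows: non-guided, marker-guided; columns: excellent, fair, poor
ratings = np.array([
    [45, 10, 5],   # hypothetical non-guided counts (mostly excellent)
    [1, 2, 57],    # hypothetical marker-guided counts (mostly poor)
])

chi2, p, dof, expected = chi2_contingency(ratings)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.4f}")
```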
Medical Teacher | 2016
Debra Pugh; André F. De Champlain; Mark J. Gierl; Hollis Lai; Claire Touchie
Abstract With the recent interest in competency-based education, educators are being challenged to develop more assessment opportunities. As such, there is increased demand for exam content development, which can be a very labor-intensive process. An innovative solution to this challenge has been the use of automatic item generation (AIG) to develop multiple-choice questions (MCQs). In AIG, computer technology is used to generate test items from cognitive models (i.e. representations of the knowledge and skills that are required to solve a problem). The main advantage yielded by AIG is the efficiency in generating items. Although the technology for AIG relies on a linear programming approach, the same principles can also be used to improve traditional committee-based processes for developing MCQs. Using this approach, content experts deconstruct their clinical reasoning process to develop a cognitive model which, in turn, is used to create MCQs. This approach is appealing because it: (1) is efficient; (2) has been shown to produce items with psychometric properties comparable to those generated using a traditional approach; and (3) can be used to assess higher-order skills (i.e. application of knowledge). The purpose of this article is to provide a novel framework for the development of high-quality MCQs using cognitive models.
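A cognitive model of the kind described above can be captured as structured data before any items are written. The field names and clinical content in this sketch are hypothetical, intended only to show how a deconstructed reasoning process might be represented.

```python
# Hypothetical sketch of a cognitive model deconstructed by content experts:
# a problem, the sources of information used to reason about it, and the
# features (with values) that drive different answers. All content is invented.
from dataclasses import dataclass, field

@dataclass
class CognitiveModel:
    problem: str
    sources_of_information: list
    features: dict = field(default_factory=dict)

jaundice_model = CognitiveModel(
    problem="Determine the most likely cause of jaundice",
    sources_of_information=["patient history", "physical exam", "lab results"],
    features={
        "bilirubin pattern": ["unconjugated", "conjugated"],
        "pain": ["painless", "right upper quadrant pain"],
    },
)

# A committee (or a generator) can then draft one MCQ per feature combination.
print(jaundice_model.features["bilirubin pattern"])
```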
Teaching and Learning in Medicine | 2016
Hollis Lai; Mark J. Gierl; Claire Touchie; Debra Pugh; André-Philippe Boulais; André F. De Champlain
ABSTRACT Construct: Automatic item generation (AIG) is an alternative method for producing large numbers of test items that integrates cognitive modeling with computer technology to systematically generate multiple-choice questions (MCQs). The purpose of our study is to describe and validate a method of generating plausible but incorrect options, or distractors. Initial applications of AIG demonstrated its effectiveness in producing test items. However, expert review of those initial items identified a key limitation: the generation of implausible distractors might limit the applicability of the items in real testing situations. Background: Medical educators require test items in large quantities to facilitate the continual assessment of student knowledge. Traditional item development processes are time-consuming and resource intensive. Studies have validated the quality of generated items through content expert review. However, no study has yet documented how generated items perform in a test administration, nor validated AIG through student responses to generated test items. Approach: To validate our refined AIG method for generating plausible distractors, we collected psychometric evidence from a field test of the generated items. A three-step process was used to generate test items in the area of jaundice. At least 455 Canadian and international medical graduates responded to each of the 13 generated items embedded in a high-stakes exam administration. Item difficulty, discrimination, and index of discrimination estimates were calculated for the correct option as well as for each distractor. Results: Item analysis results for the correct options suggest that the generated items measured candidate performance across a range of ability levels while providing a consistent level of discrimination for each item. Results for the distractors reveal that the generated items differentiated the low-performing from the high-performing candidates. Conclusions: Previous research on AIG highlighted how this item development method can be used to produce high-quality stems and correct options for MCQ exams. The purpose of the current study was to describe, illustrate, and evaluate a method for modeling plausible but incorrect options. The evidence provided in this study demonstrates that AIG can produce psychometrically sound test items. More importantly, by adapting the distractors to match the unique features presented in the stem and correct option, the automated generation of MCQs has the potential to produce plausible distractors and yield large numbers of high-quality items for medical education.
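The item-analysis quantities mentioned above (difficulty, discrimination, and distractor behaviour) can be illustrated with a small classical item analysis on an invented response matrix; none of the values correspond to the examination data reported in the study.

```python
# Classical item analysis on an invented response matrix: difficulty
# (proportion choosing the key) and discrimination (point-biserial correlation
# between choosing an option and the total score on the remaining items).
import numpy as np

rng = np.random.default_rng(0)
n_examinees, n_items, n_options = 500, 13, 4
# Invented data: option selected by each examinee on each item (option 0 = key).
responses = rng.integers(0, n_options, size=(n_examinees, n_items))
correct = (responses == 0).astype(float)
total = correct.sum(axis=1)

def point_biserial(indicator, criterion):
    return np.corrcoef(indicator, criterion)[0, 1]

item = 0
rest_score = total - correct[:, item]          # exclude the studied item
difficulty = correct[:, item].mean()           # p-value of the correct option
key_discrimination = point_biserial(correct[:, item], rest_score)
print(f"difficulty = {difficulty:.2f}, discrimination = {key_discrimination:.2f}")

# Distractor analysis: a plausible distractor should attract some examinees
# and discriminate negatively (chosen more often by low scorers).
for option in range(1, n_options):
    chosen = (responses[:, item] == option).astype(float)
    print(f"distractor {option}: chosen by {chosen.mean():.2%}, "
          f"r_pb = {point_biserial(chosen, rest_score):.2f}")
```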
Applied Measurement in Education | 2016
Mark J. Gierl; Hollis Lai; Debra Pugh; Claire Touchie; André-Philippe Boulais; André F. De Champlain
ABSTRACT Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric characteristics of generated multiple-choice test items are largely unknown and undocumented. We present item analysis results from one of the first empirical studies designed to evaluate the psychometric properties of generated multiple-choice items, using data from a high-stakes national medical licensure examination. The item analysis results for the correct option revealed that the generated items measured examinees’ performance across a broad range of ability levels while, at the same time, providing a consistently strong level of discrimination for each item. Results for the incorrect options revealed that the generated items consistently differentiated the low-performing from the high-performing examinees.
Applied Psychological Measurement | 2018
Mark J. Gierl; Hollis Lai
Computerized testing provides many benefits to support formative assessment. However, the advent of computerized formative testing has also raised formidable new challenges, particularly in the area of item development. Large numbers of diverse, high-quality test items are required because items are continuously administered to students. Hence, hundreds of items are needed to develop the banks necessary for computerized formative testing. One promising approach that may be used to address this test development challenge is automatic item generation, a relatively new but rapidly evolving research area in which cognitive and psychometric modeling practices are used to produce items with the aid of computer technology. The purpose of this study is to describe a new method for generating both the items and the rationales required to solve them, thereby producing the feedback needed for computerized formative testing. The method for rationale generation is demonstrated and evaluated in the medical education domain.
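A minimal sketch of the idea of generating rationales alongside items, assuming a hypothetical item model in which each option carries a feedback template; the wording and clinical content are invented for illustration.

```python
# Hypothetical sketch of generating an item together with option-level
# rationales from the same item model, so that formative feedback can
# accompany each generated item. All content is invented for illustration.
item_model = {
    "stem": "A patient presents with {finding}. What is the most appropriate next step?",
    "options": {
        "ultrasound": "Correct: {finding} is first investigated with imaging.",
        "reassurance": "Incorrect: {finding} should not be dismissed without investigation.",
        "immediate surgery": "Incorrect: surgery is not indicated before a diagnosis is established.",
    },
}

def generate_with_rationales(model, finding):
    stem = model["stem"].format(finding=finding)
    rationales = {opt: text.format(finding=finding)
                  for opt, text in model["options"].items()}
    return stem, rationales

stem, rationales = generate_with_rationales(item_model, "painless jaundice")
print(stem)
for option, rationale in rationales.items():
    print(f"- {option}: {rationale}")
```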
Educational Research and Evaluation | 2013
Mark J. Gierl; Hollis Lai; Johnson Ching-Hong Li
The purpose of this study is to evaluate the performance of CATSIB (Computer Adaptive Testing-Simultaneous Item Bias Test) for detecting differential item functioning (DIF) when items in the matching and studied subtests are administered adaptively in the context of a realistic multi-stage adaptive test (MST). The MST was simulated using a 4-item module in a 7-panel administration. Three independent variables, expected to affect DIF detection rates, were manipulated: item difficulty, sample size, and balanced/unbalanced design. CATSIB met acceptable criteria, with Type I error rates at or below 5% and power rates at or above 80%, for the large reference/moderate focal sample and the large reference/large focal sample conditions. These results indicate that CATSIB can be used to consistently and accurately detect DIF on an MST, but only with moderate to large samples.
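To illustrate how Type I error and power rates are tallied in a DIF simulation of this kind, the toy sketch below uses a Mantel-Haenszel flag as a simple stand-in for CATSIB (which is a regression-corrected SIBTEST procedure) on invented Rasch-type data; the sample sizes, test length, and DIF magnitude are assumptions, not the study's conditions.

```python
# Toy DIF simulation: estimate Type I error (flag rate for a DIF-free studied
# item) and power (flag rate for a studied item with DIF) across replications.
# A Mantel-Haenszel chi-square, stratified by matching score, stands in for
# CATSIB; all design values are invented for illustration.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)

def rasch_responses(theta, b):
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.random(p.shape) < p).astype(int)

def mh_chi2(ref_item, ref_match, foc_item, foc_match):
    """Mantel-Haenszel chi-square for one studied item, stratified by matching score."""
    num, var = 0.0, 0.0
    for s in np.unique(np.concatenate([ref_match, foc_match])):
        r, f = ref_item[ref_match == s], foc_item[foc_match == s]
        nR, nF = len(r), len(f)
        T = nR + nF
        if nR == 0 or nF == 0 or T < 2:
            continue
        m1 = r.sum() + f.sum()                      # total correct in the stratum
        m0 = T - m1
        num += r.sum() - nR * m1 / T                # observed minus expected (reference)
        var += nR * nF * m1 * m0 / (T * T * (T - 1))
    return (abs(num) - 0.5) ** 2 / var if var > 0 else 0.0

flags_null, flags_dif = 0, 0
n_reps, alpha = 200, 0.05
b = np.zeros(21)                                    # 20 matching items + 1 studied item
for rep in range(n_reps):
    for dif in (0.0, 0.6):                          # studied-item difficulty shift for focal group
        theta_R, theta_F = rng.normal(0, 1, 500), rng.normal(0, 1, 500)
        ref = rasch_responses(theta_R, b)
        foc = rasch_responses(theta_F, b + np.append(np.zeros(20), dif))
        stat = mh_chi2(ref[:, 20], ref[:, :20].sum(1), foc[:, 20], foc[:, :20].sum(1))
        flagged = stat > chi2.ppf(1 - alpha, df=1)
        if dif == 0.0:
            flags_null += flagged
        else:
            flags_dif += flagged

print(f"Type I error ~ {flags_null / n_reps:.3f}, power ~ {flags_dif / n_reps:.3f}")
```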