Margaret J. Safrit
University of Wisconsin-Madison
Publication
Featured research published by Margaret J. Safrit.
Research Quarterly for Exercise and Sport | 1989
Margaret J. Safrit; Allan S. Cohen; M. Glaucia Costa
Item response theory (IRT) has been the focus of intense research and development activity in educational and psychological measurement during the past decade. Because this theory can provide more precise information about test items than other theories usually used in measuring motor behavior, the application of IRT in physical education and exercise science merits investigation. In IRT, the difficulty level of each item (e.g., trial or task) can be estimated and placed on the same scale as the ability of the examinee. Using this information, the test developer can determine the ability levels at which the test functions best. Equating the scores of individuals on two or more items or tests can be handled efficiently by applying IRT. The precision of the identification of performance standards in a mastery test context can be enhanced, as can adaptive testing procedures. In this tutorial, several potential benefits of applying IRT to the measurement of motor behavior were described. An example is provided using bowling data and applying the graded-response form of the Rasch IRT model. The data were calibrated and the goodness of fit was examined. This analysis is described in a step-by-step approach. Limitations to using an IRT model with a test consisting of repeated measures were noted.
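The idea that item difficulty and examinee ability sit on the same scale can be illustrated with a minimal sketch. It assumes the simplest dichotomous Rasch model rather than the graded-response form applied to the bowling data, and the function and parameter names are hypothetical, not taken from the paper.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of success on one item under the dichotomous Rasch model.

    Both arguments are in logits on a shared scale (an assumption of this
    sketch; the paper uses a graded-response form of the model).
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# An examinee whose ability equals the item difficulty succeeds half the time.
p_matched = rasch_probability(ability=0.0, difficulty=0.0)
# A more able examinee has a higher chance of success on the same item.
p_higher = rasch_probability(ability=1.0, difficulty=0.0)
```

Because only the difference between ability and difficulty enters the model, an item is most informative for examinees whose ability is near its difficulty.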
Research Quarterly for Exercise and Sport | 1987
Margaret J. Safrit; Terry M. Wood
Abstract The Health-Related Physical Fitness Test (HRPFT) includes four subtests which measure components of physical fitness affecting a positive health state. The validity and reliability of each subtest have been demonstrated to be adequate, as has the overall validity of the battery. However, test battery reliability has not been established. The purpose of this study was to estimate the multivariate reliability of the HRPFT as a battery, using a data set obtained from middle-school children. Test battery reliability was estimated using a canonical correlation analysis. Estimates were calculated for boys and girls 11-14 years of age. The HRPFT was highly reliable for all age groups and both sexes. Univariate reliabilities were also calculated and, with the exception of the distance run test, these estimates were high. In conclusion, the multivariate reliability of the HRPFT as a test battery is satisfactory under all conditions for these middle-school children.
Research Quarterly for Exercise and Sport | 1986
Margaret J. Safrit; Terry M. Wood
Abstract The Health-Related Physical Fitness Test (HRPFT) Opinionnaire (Safrit & Wood, 1983) was administered to a stratified random sample of physical education teachers in Illinois, Oregon, and Arizona. For the total sample across states, the return rate was 31%. The responses were analyzed by total sample, state, and school level. Where appropriate, an item analysis was conducted to examine the internal consistency of items within clusters. In the total sample, 19% of the teachers had used the HRPFT and 81% had not. The major reasons for using the HRPFT were motivation, evaluation, and diagnosis of students. Eleven percent of the teachers in the users group did not feel health-related physical fitness was an important part of the physical education curriculum. Only half of the users agreed that the HRPFT measured overall physical fitness. Approximately 25% of non-users had read about the test, and very few had heard presentations about it. Generally, the results of this survey pointed to limited use of...
Quest | 1980
Margaret J. Safrit; Ted A. Baumgartner; Andrew S. Jackson; Carol Lee Stamm
Setting motor performance standards has long been a process of interest to physical educators. Theoretical advances in the measurement technology appropriate for standard-setting, however, have occurred only in the last decade. The first portion of this paper is devoted to a discussion of issues in setting standards and a brief review of procedures for standard-setting. In the latter section, gender differences in motor performance are examined and the impact of these differences on standard-setting is considered.
Research Quarterly for Exercise and Sport | 1987
Terry M. Wood; Margaret J. Safrit
Abstract This investigation compared, using computer sampling procedures, three multivariate models for estimating test battery reliability: the canonical reliability model (Conger & Lipshitz, 1973), the maximum generalizability model (Joe & Woodward, 1976), and the canonical correlation model (Wood & Safrit, 1984). The models were compared on the basis of theoretical underpinnings; sampling distribution characteristics; and the properties of bias, consistency, and relative efficiency. While estimators for all models evidenced little bias and were consistent, the coefficient of maximum generalizability showed the least degree of bias, the smallest errors in estimation, and the greatest relative efficiency across all experimental conditions.
Research Quarterly for Exercise and Sport | 1984
Terry M. Wood; Margaret J. Safrit
Abstract Some types of motor behavior cannot be measured by one test alone. For example, physical fitness is typically viewed as a multifaceted construct which should be measured by a battery comprising a group of tests. Test developers typically report a reliability estimate for each individual test in the battery. If the reliability of each test is acceptable, the reliability of the test battery is assumed to be satisfactory. However, measurement error can affect the total battery as well as each individual test. A proposed model for estimating test battery reliability based upon canonical correlation analysis is described in this paper. Descriptive statistics from previous investigations and nonempirical data are used to exemplify the procedure. Finally, modifications of the model for psychomotor test batteries are delineated.
Research Quarterly. American Alliance for Health, Physical Education and Recreation | 1975
Carol Lee Stamm; Margaret J. Safrit
Abstract The purposes of this study were (1) to compare the use of the usual F test, Box's approximate test, and the Geisser-Greenhouse conservative test for significance testing of the repeated measures ANOVA design, and (2) to describe the appropriate application of these techniques using motor performance data. The usual F test was conducted and the F value was significant. Then the conservative test was used to compute a new critical value and the F value was nonsignificant. Thus it was necessary to estimate theta (θ) using the Box approximation and compute another critical value, and this F value was significant. Therefore, the significance of the between trials effect could be interpreted as an indication of true differences between trials rather than a result of violating the assumption of covariance, thereby increasing the risk of obtaining significant results by chance.
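The three-step procedure in this abstract can be sketched as a degrees-of-freedom adjustment. The sketch below is an illustration under assumptions, not the authors' computation: it uses the standard formula for Box's correction factor computed from the covariance matrix of the trials, and the function names are hypothetical. Critical F values would still need to be looked up for the resulting degrees of freedom.

```python
def box_correction_factor(cov):
    """Box's correction factor (theta in the abstract, often written epsilon)
    for a k x k covariance matrix of the repeated trials."""
    k = len(cov)
    mean_diag = sum(cov[i][i] for i in range(k)) / k
    grand_mean = sum(sum(row) for row in cov) / (k * k)
    row_means = [sum(row) / k for row in cov]
    sum_sq = sum(v * v for row in cov for v in row)
    numerator = (k * (mean_diag - grand_mean)) ** 2
    denominator = (k - 1) * (
        sum_sq - 2 * k * sum(m * m for m in row_means) + (k * grand_mean) ** 2
    )
    return numerator / denominator


def adjusted_df(k, n, theta):
    """Degrees of freedom for the trials effect with k trials and n subjects."""
    return theta * (k - 1), theta * (k - 1) * (n - 1)


# Equal variances and equal covariances: no correction is needed (theta = 1).
theta_uniform = box_correction_factor([[2, 1, 1], [1, 2, 1], [1, 1, 2]])
# Unequal variances pull theta below 1, which shrinks both degrees of freedom.
theta_unequal = box_correction_factor([[1, 0, 0], [0, 1, 0], [0, 0, 10]])
```

With theta = 1 the function returns the usual F test degrees of freedom, (k - 1) and (k - 1)(n - 1). With theta at its lower bound of 1/(k - 1) it returns 1 and n - 1, which is the Geisser-Greenhouse conservative test.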
Research Quarterly for Exercise and Sport | 1992
John C. Kalohn; Kent Wagoner; Long-Guang Gao; Margaret J. Safrit; Nancy Getchell
The application of criterion-referenced (CR) standard setting procedures in physical education has been limited to the examinee-centered model known as criterion groups. Alternative examinee-centered approaches are available but have not been applied in sport skills testing. The purpose of this study was to compare two examinee-centered models for setting performance standards for a sport skills test battery. CR performance standards were determined for the tennis skills test battery published in Tennis skills test manual (Hensley, 1989) using the borderline group (BG) (Livingston & Zieky, 1982) and criterion groups (CG) (Berk, 1976) models. The comparison of these two methods demonstrated that the CG method consistently produced performance standards that were lower than the BG method. In one instance the BG method produced a standard that was clearly unreasonable. Estimates of CR reliability for the CG standards (.76 ≤ P ≤ .93; .52 ≤ Kq ≤ .86) were higher than BG estimates (.55 ≤ P ≤ .84; .11 ≤ Kq ≤ .68). Although each method has strengths, neither is without problems. Results from this study suggest these two methods might be combined to minimize the problems associated with each. This combined method should produce standards with improved accuracy, validity, and reliability.
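The two reliability indices reported above can be related with a short sketch. It assumes that P is the proportion of agreement between two classifications and that Kq is the modified kappa coefficient with q = 2 categories (master, nonmaster); the abstract does not define either symbol, and the function names are hypothetical.

```python
def proportion_of_agreement(n_both_master, n_both_nonmaster, n_total):
    """Proportion of examinees classified the same way on two occasions."""
    return (n_both_master + n_both_nonmaster) / n_total


def modified_kappa(p, q=2):
    """Modified kappa: agreement corrected for chance, where chance is
    taken as 1/q for q classification categories (assumed definition)."""
    return (p - 1 / q) / (1 - 1 / q)


# Endpoints of the reported P ranges mapped to modified kappa.
kq_cg_low = modified_kappa(0.76)   # about .52
kq_cg_high = modified_kappa(0.93)  # about .86
kq_bg_high = modified_kappa(0.84)  # about .68
```

Under this assumed definition the reported P endpoints reproduce the reported Kq endpoints to within rounding (P = .55 gives about .10 against the reported .11).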
Research Quarterly for Exercise and Sport | 1985
Margaret J. Safrit; Terry M. Wood; Sara A. Ehlert; Linda M. Hooper; Patricia Patterson
Abstract When a criterion-referenced test is used to make mastery/nonmastery classifications, the probability of false positive and false negative classifications must be considered as well as the minimum skill level for mastery classification and maximum skill level for nonmastery classification. These constraints can lead to an excessively long fixed-length criterion-referenced test. In this investigation, an alternative strategy for testing—the sequential probability ratio test—was applied to a test of motor skill. The applicability of this procedure was examined using a golf chip test with parameters of α = .05, β = .05, θ0 = .7, and θ1 = .5. The test classifications had acceptable reliability and moderate validity. The sequential test classified almost half of the students in 15 or fewer trials. However, four students could be classified only after more than 65 trials, thus raising a question about the feasibility of using this testing procedure in a physical education class. On the other hand, feasi...
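The sequential procedure can be sketched with Wald's sequential probability ratio test for success/failure trials, using the parameter values quoted in the abstract. This is an illustration under assumptions, not the authors' implementation: it treats θ0 = .7 as the minimum success proportion for mastery and θ1 = .5 as the maximum for nonmastery, and the function name and the 0/1 trial coding are hypothetical.

```python
import math


def sprt_classify(trials, theta0=0.7, theta1=0.5, alpha=0.05, beta=0.05):
    """Wald SPRT on a sequence of 0/1 trial outcomes.

    Returns (decision, number_of_trials_used). H0 is theta = theta0
    (assumed mastery level); H1 is theta = theta1 (assumed nonmastery level).
    """
    upper = math.log((1 - beta) / alpha)  # crossing it favours H1
    lower = math.log(beta / (1 - alpha))  # crossing it favours H0
    log_ratio = 0.0
    for n, success in enumerate(trials, start=1):
        if success:
            log_ratio += math.log(theta1 / theta0)
        else:
            log_ratio += math.log((1 - theta1) / (1 - theta0))
        if log_ratio >= upper:
            return "nonmaster", n
        if log_ratio <= lower:
            return "master", n
    return "undecided", len(trials)


# A student who succeeds on every trial is classified after 9 trials,
# and one who fails every trial after 6.
all_hits = sprt_classify([1] * 20)
all_misses = sprt_classify([0] * 20)
```

A student whose successes and failures roughly balance keeps the log ratio between the two boundaries for a long time, which is consistent with the abstract's observation that a few students needed more than 65 trials.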
Research Quarterly for Exercise and Sport | 1986
Margaret J. Safrit; Patricia Patterson
Many changes have taken place in the format and operation of the Research Quarterly for Exercise and Sport (RQES) in recent years. The purpose of this study was to survey the section editors, reviewers, authors, and subscribers of the RQES to determine their perceptions of its quality. An inventory was developed for each of the four groups. The content validity of the instrument was established in three stages, the last stage occurring during a pilot study. The return rate for the four groups ranged from 61% to 80%. In many respects, the RQES was rated positively. However, several problem areas were identified. One was the difficulty in adequately representing all areas of specialization in physical education. Another was the perception of the quality of research published in the journal. In some areas, the best research was viewed as being published in other journals. The new section editor format has been well-received, as have many other changes made over the past decade. The issue of primary concern i...