Publication


Featured research published by Stephen G. Schilling.


Elementary School Journal | 2004

Developing Measures of Teachers’ Mathematics Knowledge for Teaching

Heather C. Hill; Stephen G. Schilling; Deborah Loewenberg Ball

In this article we discuss efforts to design and empirically test measures of teachers’ content knowledge for teaching elementary mathematics. We begin by reviewing the literature on teacher knowledge, noting how scholars have organized such knowledge. Next we describe survey items we wrote to represent knowledge for teaching mathematics and results from factor analysis and scaling work with these items. We found that teachers’ knowledge for teaching elementary mathematics was multidimensional and included knowledge of various mathematical topics (e.g., number and operations, algebra) and domains (e.g., knowledge of content, knowledge of students and content). The constructs indicated by factor analysis formed psychometrically acceptable scales.


Elementary School Journal | 2004

Developing Measures of Content Knowledge for Teaching Reading

Geoffrey Phelps; Stephen G. Schilling

In this article we present results from a project to develop survey measures of the content knowledge teachers need to teach elementary reading. In areas such as mathematics and science, there has been great interest in the specialized ways teachers need to know a subject to teach it to others—often referred to as pedagogical content knowledge. However, little is known about what teachers need to know about reading to teach it effectively. We begin the article by discussing what might constitute content knowledge for teaching reading and by describing the survey items we wrote. Next, factor and scaling results are presented from a pilot study of 261 multiple‐choice items with 1,542 elementary teachers. We found that content knowledge for teaching reading included multiple dimensions, defined both by topic and by how teachers use knowledge in teaching practice. Items within these constructs formed reliable scales.


Elementary School Journal | 2007

Are Fluency Measures Accurate Predictors of Reading Achievement?

Stephen G. Schilling; Joanne F. Carlisle; Sarah E. Scott; Ji Zeng

This study focused on the predictive validity of fluency measures that comprise Dynamic Indicators of Basic Early Literacy Skills (DIBELS). Data were gathered from first through third graders attending 44 schools in 9 districts or local educational agencies that made up the first Reading First cohort in Michigan. Students were administered DIBELS subtests in the fall, winter, and spring, and they took the reading subtests of the Iowa Tests of Basic Skills (ITBS) in the spring. Results showed that DIBELS subtests significantly predicted year‐end reading achievement on the ITBS, Reading Total subtest. They also showed that DIBELS at‐risk benchmarks for oral reading fluency (ORF) were reasonably accurate at identifying second and third graders who were reading below the twenty‐fifth percentile at the end of the year (80% and 76% for second and third graders, respectively). However, 32% of second graders and 37% of third graders who were identified as at low risk by the ORF benchmarks turned out not to be reading at grade level on ITBS in April. We discuss 2 possibilities for improving the assessment of students’ progress in reading: (a) supplementing DIBELS with measures of reading comprehension and vocabulary, and (b) using frequent progress‐monitoring assessments for students at risk for reading problems to identify students who are not responding to classroom instruction.


Journal of Computational and Graphical Statistics | 2002

Warp Bridge Sampling

Xiao-Li Meng; Stephen G. Schilling

Bridge sampling, a general formulation of the acceptance ratio method in physics for computing free-energy difference, is an effective Monte Carlo method for computing normalizing constants of probability models. The method was originally proposed for cases where the probability models have overlapping support. Voter proposed the idea of shifting physical systems before applying the acceptance ratio method to calculate free-energy differences between systems that are highly separated in a configuration space. The purpose of this article is to push Voter's idea further by applying more general transformations, including stochastic transformations resulting from mixing over transformation groups, to the underlying variables before performing bridge sampling. We term such methods warp bridge sampling to highlight the fact that in addition to location shifting (i.e., centering) one can further reduce the difference/distance between two densities by warping their shapes without changing the normalizing constants. Real data-based empirical studies using the full information item factor model and a nonlinear mixed model are provided to demonstrate the potentially substantial gains in Monte Carlo efficiency by going beyond centering and by using efficient bridge sampling estimators. Our general method is also applicable to a couple of recent proposals for computing marginal likelihoods and Bayes factors because these methods turn out to be covered by the general bridge sampling framework.
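
As a rough illustration of the centering step described in this abstract, the sketch below estimates a ratio of normalizing constants for two toy Gaussian kernels, once directly and once after shifting the first density onto the second. It is not the authors' implementation: the densities, the names `MU`, `SIGMA`, `geometric_bridge_ratio`, and `demo`, and the choice of the simple geometric bridge (rather than the efficient estimators the abstract mentions) are all assumptions made for illustration.

```python
import math
import random

# Hypothetical toy setup (not from the paper): two unnormalized Gaussian kernels.
MU = 5.0      # mean of density 1 (sd 1), so c1 = sqrt(2*pi)
SIGMA = 2.0   # sd of density 2 (mean 0), so c2 = SIGMA * sqrt(2*pi)


def log_q1(x):
    """Unnormalized log-density 1: Gaussian kernel centered at MU, sd 1."""
    return -0.5 * (x - MU) ** 2


def log_q2(x):
    """Unnormalized log-density 2: Gaussian kernel centered at 0, sd SIGMA."""
    return -0.5 * (x / SIGMA) ** 2


def geometric_bridge_ratio(log_qa, log_qb, draws_a, draws_b):
    """Estimate c_a / c_b using the geometric bridge alpha = 1/sqrt(q_a*q_b).

    Relies on the identity c_a/c_b = E_b[sqrt(q_a/q_b)] / E_a[sqrt(q_b/q_a)],
    where E_a and E_b are expectations under the normalized densities.
    """
    num = sum(math.exp(0.5 * (log_qa(x) - log_qb(x))) for x in draws_b) / len(draws_b)
    den = sum(math.exp(0.5 * (log_qb(x) - log_qa(x))) for x in draws_a) / len(draws_a)
    return num / den


def demo(n=20000, seed=0):
    """Return (plain, centered) estimates of c1/c2; the true value is 1/SIGMA."""
    rng = random.Random(seed)
    draws1 = [rng.gauss(MU, 1.0) for _ in range(n)]
    draws2 = [rng.gauss(0.0, SIGMA) for _ in range(n)]

    # Plain bridge sampling: the two densities overlap poorly.
    plain = geometric_bridge_ratio(log_q1, log_q2, draws1, draws2)

    # Centering: shift density 1 to mean 0. A location shift leaves c1
    # unchanged, but the shifted density overlaps density 2 much better.
    def centered_log_q1(x):
        return log_q1(x + MU)

    centered_draws1 = [x - MU for x in draws1]
    centered = geometric_bridge_ratio(centered_log_q1, log_q2, centered_draws1, draws2)
    return plain, centered


if __name__ == "__main__":
    plain, centered = demo()
    print("true ratio:", 1.0 / SIGMA)
    print("plain estimate:", plain)
    print("centered estimate:", centered)
```

In this toy case the true ratio is known in closed form, so the two estimates can be compared against it. The shape-warping and stochastic transformations described in the abstract go beyond this location shift and are not shown here.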


Measurement | 2007

Assessing Measures of Mathematical Knowledge for Teaching: A Validity Argument Approach

Stephen G. Schilling; Heather C. Hill

In assessing the utility of a test, two issues stand out: whether it provides information of interest to test consumers, and whether scores generated by the test assist in making good decisions. Validity addresses these two issues, making an assessment of test validity the single most important product provided by test developers. Unfortunately, despite its importance, test validation is almost universally viewed as the most unsatisfactory aspect of test development. As Messick (1988) noted, there has been a consistent disjunction between validity conceptualization and validation practice. To start, the proliferation of many different kinds of validity evidence without clear prioritization presents test consumers with an enormous task, that of sifting through various methods, approaches, and empirical work to determine the usability of a test. At the same time, some test developers use evidence (and methods) selectively, choosing convenient means for test validation, and convenient results for reporting. Kane (2001, 2004a) developed an argument-based approach to validity as a means of addressing these difficulties. His approach consists of two stages:


Measurement: Interdisciplinary Research & Perspective | 2007

Test Validation and the MKT Measures: Generalizations and Conclusions

Stephen G. Schilling; Merrie L. Blunk; Heather C. Hill

Department of Education to the Consortium for Policy Research in Education (CPRE) at the University of Pennsylvania (Grant # OERI-R308A60003) and the Center for the Study of Teaching and Policy at the University of Washington (Grant # OERI-R308B70003); and to grants to the University of Michigan by the National Science Foundation (IERI/ REC-9979863 & REC-0129421, REC0207649, EHR-0233456, and EHR-0335411), the William and Flora Hewlett Foundation and the Atlantic Philanthropies. Opinions expressed in this paper are those of the authors, and do not reflect the views of the U.S. Department of Education, the National Science Foundation, the William and Flora Hewlett Foundation or the Atlantic Philanthropies.


Measurement: Interdisciplinary Research & Perspective | 2007

The Role of Psychometric Modeling in Test Validation: An Application of Multidimensional Item Response Theory

Stephen G. Schilling

However, this separation can be viewed as artificial (Marcoulides, 2004) and we believe that test validation must necessarily employ psychometric modeling to investigate key assumptions and inferences. Such an investigation intimately connects psychometric modeling to substantive concerns and provides a gateway for the relevance of psychometrics in educational and psychological research. In this paper we examine the role of item response theory (IRT), particularly multidimensional item response theory (MIRT) in test validation from a validity argument perspective. Our conceptualization of the interpretive argument differs from Kane in that it combines his second and third inference—generalization from the test score to the expected score over the test domain and extrapolating from test domain to the knowledge/skill/judgment domain—into a general


Quality of Life Research | 2016

New measures to capture end of life concerns in Huntington disease: Meaning and Purpose and Concern with Death and Dying from HDQLIFE (a patient-reported outcomes measurement system).

Noelle E. Carlozzi; Nancy Downing; Michael K. McCormack; Stephen G. Schilling; Joel S. Perlmutter; Elizabeth A. Hahn; Jin Shei Lai; Samuel Frank; Kimberly A. Quaid; Jane S. Paulsen; D Cella; Siera Goodnight; Jennifer A. Miner; Martha Nance

Purpose: Huntington disease (HD) is an incurable terminal disease. Thus, end of life (EOL) concerns are common in these individuals. A quantitative measure of EOL concerns in HD would enable a better understanding of how these concerns impact health-related quality of life. Therefore, we developed new measures of EOL for use in HD. Methods: An EOL item pool of 45 items was field tested in 507 individuals with prodromal or manifest HD. Exploratory and confirmatory factor analyses (EFA and CFA, respectively) were conducted to establish unidimensional item pools. Item response theory (IRT) and differential item functioning (DIF) analyses were applied to the identified unidimensional item pools to select the final items. Results: EFA and CFA supported two separate unidimensional sets of items: Concern with Death and Dying (16 items), and Meaning and Purpose (14 items). IRT and DIF supported the retention of 12 Concern with Death and Dying items and 4 Meaning and Purpose items. IRT data supported the development of both a computer adaptive test (CAT) and a 6-item, static short form for Concern with Death and Dying. Conclusion: The HDQLIFE Concern with Death and Dying CAT and corresponding 6-item short form, and the 4-item calibrated HDQLIFE Meaning and Purpose scale demonstrate excellent psychometric properties. These new measures have the potential to provide clinically meaningful information about end-of-life preferences and concerns to clinicians and researchers working with individuals with HD. In addition, these measures may also be relevant and useful for other terminal conditions.


Quality of Life Research | 2016

The development of a new computer adaptive test to evaluate chorea in Huntington disease: HDQLIFE Chorea

Noelle E. Carlozzi; Nancy Downing; Stephen G. Schilling; Jin Shei Lai; Siera Goodnight; Jennifer A. Miner; Samuel Frank

Purpose: Huntington’s disease (HD) is an autosomal dominant neurodegenerative disease associated with motor, behavioral, and cognitive deficits. The hallmark symptom of HD, chorea, is often the focus of HD clinical trials. Unfortunately, there are no self-reported measures of chorea. To address this shortcoming, we developed a new measure of chorea for use in HD, HDQLIFE Chorea. Methods: Qualitative data and literature reviews were conducted to develop an initial item pool of 141 chorea items. An iterative process, including cognitive interviews, expert review, translatability review, and literacy review, was used to refine this item pool to 64 items. These 64 items were field tested in 507 individuals with prodromal and/or manifest HD. Exploratory and confirmatory factor analyses (EFA and CFA, respectively) were conducted to identify a unidimensional set of items. Then, an item response theory graded response model (GRM) and differential item functioning (DIF) analyses were conducted to select the final items for inclusion in this measure. Results: EFA and CFA supported the retention of 34 chorea items. GRM and DIF supported the retention of all of these items in the final measure. GRM calibration data were used to inform the selection of a 6-item, static short form and to program the HDQLIFE Chorea computer adaptive test (CAT). CAT simulation analyses indicated a 0.99 correlation between the CAT scores and the full item bank. Conclusions: The new HDQLIFE Chorea CAT and corresponding 6-item short form were developed using established rigorous measurement development standards; this is the first self-reported measure developed to evaluate the impact of chorea on health-related quality of life (HRQOL) in HD. This development work indicates that these measures have strong psychometric properties; future work is needed to establish test–retest reliability and responsiveness to change.


Quality of Life Research | 2016

HDQLIFE: the development of two new computer adaptive tests for use in Huntington disease, Speech Difficulties, and Swallowing Difficulties

Noelle E. Carlozzi; Stephen G. Schilling; Jin Shei Lai; Joel S. Perlmutter; M. A. Nance; J. F. Waljee; Jennifer A. Miner; Stacey Barton; Siera Goodnight; Praveen Dayalu

Purpose: Huntington disease (HD) is an autosomal dominant neurodegenerative disease which results in several progressive symptoms, including bulbar dysfunction (i.e., speech and swallowing difficulties). Although difficulties in speech and swallowing in HD have a negative impact on health-related quality of life (HRQOL), no patient-reported outcome measure exists to capture these difficulties that are specific to HD. Thus, we developed a new patient-reported outcome measure for use in the Huntington Disease Health-Related Quality of Life (HDQLIFE) Measurement System that focused on the impact that difficulties with speech and swallowing have on HRQOL in HD. Methods: Five hundred and seven individuals with prodromal and/or manifest HD completed 47 newly developed items examining speech and swallowing difficulties. Unidimensional item pools were identified using exploratory factor analysis and confirmatory factor analysis (EFA and CFA, respectively). Item response theory (IRT) was used to calibrate the final measures. Results: EFA and CFA identified two separate unidimensional sets of items: Speech Difficulties (27 items) and Swallowing Difficulties (16 items). Items were calibrated separately for these two measures and resulted in item banks that can be administered as computer adaptive tests (CATs) and/or 6-item, static short forms. Reliability of both of these measures was supported through high correlations between the simulated CAT scores and the full item bank. Conclusions: CATs and 6-item calibrated short forms were developed for HDQLIFE Speech Difficulties and HDQLIFE Swallowing Difficulties. These measures both demonstrate excellent psychometric properties and may have clinical utility in other populations where speech and swallowing difficulties are prevalent.

Collaboration



Top Co-Authors


Jin Shei Lai

Northwestern University
