Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Paul R. Yarnold is active.

Publication


Featured research published by Paul R. Yarnold.


Journal of Evaluation in Clinical Practice | 2016

Using data mining techniques to characterize participation in observational studies

Ariel Linden; Paul R. Yarnold

Data mining techniques are gaining in popularity among health researchers for an array of purposes, such as improving diagnostic accuracy, identifying high-risk patients and extracting concepts from unstructured data. In this paper, we describe how these techniques can be applied to another area in the health research domain: identifying characteristics of individuals who do and do not choose to participate in observational studies. In contrast to randomized studies where individuals have no control over their treatment assignment, participants in observational studies self-select into the treatment arm and therefore have the potential to differ in their characteristics from those who elect not to participate. These differences may explain part, or all, of the difference in the observed outcome, making it crucial to assess whether there is differential participation based on observed characteristics. As compared to traditional approaches to this assessment, data mining offers a more precise understanding of these differences. To describe and illustrate the application of data mining in this domain, we use data from a primary care-based medical home pilot programme and compare the performance of commonly used classification approaches – logistic regression, support vector machines, random forests and classification tree analysis (CTA) – in correctly classifying participants and non-participants. We find that CTA is substantially more accurate than the other models. Moreover, unlike the other models, CTA offers transparency in its computational approach, ease of interpretation via the decision rules produced and provides statistical results familiar to health researchers. Beyond their application to research, data mining techniques could help administrators to identify new candidates for participation who may most benefit from the intervention.
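The classification tree analysis (CTA) approach highlighted in the abstract above builds transparent decision rules from maximum-accuracy cut-points. A minimal, purely illustrative sketch of a single-split version of that idea (not the authors' CTA software; the data below are invented) might look like this:

```python
# Hypothetical sketch: a one-split classification tree that finds the
# cut-point on a single covariate maximizing classification accuracy.
# Invented toy data; not the authors' CTA software or study data.

def best_cutpoint(values, labels):
    """Return (cutpoint, accuracy) for the rule:
    value <= cutpoint -> class 0, value > cutpoint -> class 1."""
    best = (None, 0.0)
    for cut in sorted(set(values)):
        correct = sum(
            1 for v, y in zip(values, labels)
            if (v <= cut and y == 0) or (v > cut and y == 1)
        )
        acc = correct / len(labels)
        if acc > best[1]:
            best = (cut, acc)
    return best

# Toy example: a covariate (age) vs participation (1 = participated)
age = [34, 41, 46, 52, 58, 63, 67, 72]
joined = [0, 0, 0, 1, 0, 1, 1, 1]
print(best_cutpoint(age, joined))  # -> (46, 0.875)
```

The resulting rule ("predict participation when age > 46") is directly interpretable, which is the transparency advantage the abstract attributes to tree-based methods.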


Journal of Evaluation in Clinical Practice | 2016

Using machine learning to assess covariate balance in matching studies

Ariel Linden; Paul R. Yarnold

In order to assess the effectiveness of matching approaches in observational studies, investigators typically present summary statistics for each observed pre-intervention covariate, with the objective of showing that matching reduces the difference in means (or proportions) between groups to as close to zero as possible. In this paper, we introduce a new approach to distinguish between study groups based on their distributions of the covariates, using a machine-learning algorithm called optimal discriminant analysis (ODA). Assessing covariate balance using ODA as compared with the conventional method has several key advantages: the ability to ascertain how individuals self-select based on optimal (maximum-accuracy) cut-points on the covariates; the application to any variable metric and number of groups; its insensitivity to skewed data or outliers; and the use of accuracy measures that can be widely applied to all analyses. Moreover, ODA accepts analytic weights, thereby extending the assessment of covariate balance to any study design where weights are used for covariate adjustment. By comparing the two approaches using empirical data, we demonstrate that using measures of classification accuracy as balance diagnostics produces results highly consistent with those obtained via the conventional approach (in our matched-pairs example, ODA revealed a weak statistically significant relationship not detected by the conventional approach). Thus, investigators should consider ODA as a robust complement, or perhaps alternative, to the conventional approach for assessing covariate balance in matching studies.
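For context, the conventional balance diagnostic that this paper compares ODA against is the standardized difference in means for each covariate. A stdlib-only sketch with invented numbers (a threshold of roughly 0.1 is a common rule of thumb, not a figure from the paper):

```python
# Illustrative sketch of the conventional covariate-balance diagnostic:
# the standardized difference in means for one covariate between groups.
# All numbers are invented for demonstration.
import math

def standardized_diff(treated, control):
    mt = sum(treated) / len(treated)
    mc = sum(control) / len(control)
    vt = sum((x - mt) ** 2 for x in treated) / (len(treated) - 1)
    vc = sum((x - mc) ** 2 for x in control) / (len(control) - 1)
    pooled_sd = math.sqrt((vt + vc) / 2)  # pooled standard deviation
    return (mt - mc) / pooled_sd

treated = [5.1, 6.0, 5.5, 6.2, 5.8]
control = [5.0, 5.9, 5.6, 6.1, 5.7]
print(round(standardized_diff(treated, control), 3))  # -> 0.141
```

ODA's complementary diagnostic, as the abstract describes, instead asks how accurately group membership can be classified from the covariate's distribution.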


Journal of Evaluation in Clinical Practice | 2016

Using machine learning to identify structural breaks in single-group interrupted time series designs

Ariel Linden; Paul R. Yarnold

Rationale, aims and objectives Single-group interrupted time series analysis (ITSA) is a popular evaluation methodology in which a single unit of observation is being studied, the outcome variable is serially ordered as a time series and the intervention is expected to ‘interrupt’ the level and/or trend of the time series, subsequent to its introduction. Given that the internal validity of the design rests on the premise that the interruption in the time series is associated with the introduction of the treatment, treatment effects may seem less plausible if a parallel trend already exists in the time series prior to the actual intervention. Thus, sensitivity analyses should focus on detecting structural breaks in the time series before the intervention. Method In this paper, we introduce a machine-learning algorithm called optimal discriminant analysis (ODA) as an approach to determine if structural breaks can be identified in years prior to the initiation of the intervention, using data from California's 1988 voter-initiated Proposition 99 to reduce smoking rates. Results The ODA analysis indicates that numerous structural breaks occurred prior to the actual initiation of Proposition 99 in 1989, including perfect structural breaks in 1983 and 1985, thereby casting doubt on the validity of treatment effects estimated for the actual intervention when using a single-group ITSA design. Conclusions Given the widespread use of ITSA for evaluating observational data and the increasing use of machine-learning techniques in traditional research, we recommend that structural break sensitivity analysis be routinely incorporated in all research using the single-group ITSA design.
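A "perfect structural break" in the ODA sense occurs when a single cut-point on the outcome perfectly separates observations before versus after a candidate year. A toy illustration of that check, with an invented series rather than the Proposition 99 smoking data:

```python
# Hedged sketch of the structural-break check: for each candidate break
# year, test whether the outcome values before vs after the candidate are
# perfectly separable by one cut-point. Series values are invented.

def perfectly_separable(before, after):
    """True if every value in one segment lies strictly on one side of
    every value in the other segment (a 'perfect' break)."""
    return max(before) < min(after) or max(after) < min(before)

series = {1980: 12.1, 1981: 11.6, 1982: 11.9, 1983: 9.0,
          1984: 9.4, 1985: 8.7, 1986: 8.9}

for year in (1982, 1983, 1984):
    before = [v for y, v in series.items() if y < year]
    after = [v for y, v in series.items() if y >= year]
    print(year, perfectly_separable(before, after))
# -> 1982 False / 1983 True / 1984 False: only 1983 is a perfect break
```

In the paper's terms, finding such breaks in pre-intervention years is what undermines attributing the post-intervention break to the treatment.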


Journal of Evaluation in Clinical Practice | 2016

Combining machine learning and propensity score weighting to estimate causal effects in multivalued treatments.

Ariel Linden; Paul R. Yarnold

RATIONALE, AIMS AND OBJECTIVES Interventions with multivalued treatments are common in medical and health research; examples include comparing the efficacy of competing interventions and contrasting various doses of a drug. In recent years, there has been growing interest in the development of methods that estimate multivalued treatment effects using observational data. This paper extends a previously described analytic framework for evaluating binary treatments to studies involving multivalued treatments utilizing a machine learning algorithm called optimal discriminant analysis (ODA). METHOD We describe the differences between regression-based treatment effect estimators and effects estimated using the ODA framework. We then present an empirical example using data from an intervention including three study groups to compare corresponding effects. RESULTS The regression-based estimators produced statistically significant mean differences between the two intervention groups, and between one of the treatment groups and controls. In contrast, ODA was unable to discriminate between distributions of any of the three study groups. CONCLUSIONS Optimal discriminant analysis offers an appealing alternative to conventional regression-based models for estimating effects in multivalued treatment studies because of its insensitivity to skewed data and use of accuracy measures applicable to all prognostic analyses. If these analytic approaches produce consistent treatment effect P values, this bolsters confidence in the validity of the results. If the approaches produce conflicting treatment effect P values, as they do in our empirical example, the investigator should consider the ODA-derived estimates to be most robust, given that ODA uses permutation P values that require no distributional assumptions and are thus always valid.
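The propensity score weighting side of this paper reweights each subject by the inverse probability of receiving the arm they actually received. A minimal sketch of that step for a hypothetical three-arm study, assuming the propensity scores have already been estimated elsewhere (all numbers fabricated):

```python
# Minimal sketch of inverse-probability weighting (IPW) for a three-arm
# study. Each row: (arm, outcome, estimated P(assigned arm | covariates)).
# Data and arm names are invented for illustration.

rows = [
    ("control", 4.0, 0.50), ("control", 5.0, 0.40),
    ("dose_low", 6.0, 0.30), ("dose_low", 7.0, 0.25),
    ("dose_high", 8.0, 0.20), ("dose_high", 9.0, 0.25),
]

def ipw_means(rows):
    """Weighted mean outcome per arm, weighting by inverse propensity."""
    totals = {}
    for arm, y, p in rows:
        w = 1.0 / p  # under-represented subjects get larger weights
        num, den = totals.get(arm, (0.0, 0.0))
        totals[arm] = (num + w * y, den + w)
    return {arm: num / den for arm, (num, den) in totals.items()}

print(ipw_means(rows))
```

The weighted arm means can then be contrasted as causal effect estimates; the paper's contribution is pairing this weighting with ODA's distribution-free permutation P values rather than regression-based inference.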


Journal of Evaluation in Clinical Practice | 2016

Using machine learning to model dose-response relationships

Ariel Linden; Paul R. Yarnold; Brahmajee K. Nallamothu

RATIONALE, AIMS AND OBJECTIVES Establishing the relationship between various doses of an exposure and a response variable is integral to many studies in health care. Linear parametric models, widely used for estimating dose-response relationships, have several limitations. This paper employs the optimal discriminant analysis (ODA) machine-learning algorithm to determine the degree to which exposure dose can be distinguished based on the distribution of the response variable. By framing the dose-response relationship as a classification problem, machine learning can provide the same functionality as conventional models, but can additionally make individual-level predictions, which may be helpful in practical applications like establishing responsiveness to prescribed drug regimens. METHOD Using data from a study measuring the responses of blood flow in the forearm to the intra-arterial administration of isoproterenol (separately for 9 black and 13 white men, and pooled), we compare the results estimated from a generalized estimating equations (GEE) model with those estimated using ODA. RESULTS Generalized estimating equations and ODA both identified many statistically significant dose-response relationships, separately by race and for pooled data. Post hoc comparisons between doses indicated ODA (based on exact P values) was consistently more conservative than GEE (based on estimated P values). Compared with ODA, GEE produced twice as many instances of paradoxical confounding (findings from analysis of pooled data that are inconsistent with findings from analyses stratified by race). CONCLUSIONS Given its unique advantages and greater analytic flexibility, maximum-accuracy machine-learning methods like ODA should be considered as the primary analytic approach in dose-response applications.
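The "exact P values" attributed to ODA above come from permutation tests: the observed group difference is compared against its distribution over all relabelings of group membership. A stdlib sketch of that idea for a toy two-dose comparison (response values are invented, in arbitrary units):

```python
# Sketch of an exact permutation P value, the inference style the abstract
# credits to ODA. Toy two-dose example with invented response values.
from itertools import combinations

def perm_pvalue(a, b):
    """Exact two-sided permutation P value for a difference in means."""
    pooled = a + b
    n, k = len(pooled), len(a)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    count = total = 0
    for idx in combinations(range(n), k):  # every relabeling of groups
        ga = [pooled[i] for i in idx]
        gb = [pooled[i] for i in range(n) if i not in idx]
        diff = abs(sum(ga) / k - sum(gb) / (n - k))
        count += diff >= observed
        total += 1
    return count / total

low = [21, 24, 22]   # responses under the low dose (hypothetical)
high = [30, 33, 29]  # responses under the high dose (hypothetical)
print(perm_pvalue(low, high))  # -> 0.1 (2 of the 20 relabelings are as extreme)
```

Because the P value is a count over all relabelings, it requires no distributional assumptions, which is why the papers describe it as always valid.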


Journal of Evaluation in Clinical Practice | 2016

Combining machine learning and matching techniques to improve causal inference in program evaluation

Ariel Linden; Paul R. Yarnold

RATIONALE, AIMS AND OBJECTIVES Program evaluations often utilize various matching approaches to emulate the randomization process for group assignment in experimental studies. Typically, the matching strategy is implemented, and then covariate balance is assessed before estimating treatment effects. This paper introduces a novel analytic framework utilizing a machine learning algorithm called optimal discriminant analysis (ODA) for assessing covariate balance and estimating treatment effects, once the matching strategy has been implemented. This framework holds several key advantages over the conventional approach: application to any variable metric and number of groups; insensitivity to skewed data or outliers; and use of accuracy measures applicable to all prognostic analyses. Moreover, ODA accepts analytic weights, thereby extending the methodology to any study design where weights are used for covariate adjustment or more precise (differential) outcome measurement. METHOD One-to-one matching on the propensity score was used as the matching strategy. Covariate balance was assessed using standardized difference in means (conventional approach) and measures of classification accuracy (ODA). Treatment effects were estimated using ordinary least squares regression and ODA. RESULTS Using empirical data, ODA produced results highly consistent with those obtained via the conventional methodology for assessing covariate balance and estimating treatment effects. CONCLUSIONS When ODA is combined with matching techniques within a treatment effects framework, the results are consistent with conventional approaches. However, given that it provides additional dimensions and robustness to the analysis versus what can currently be achieved using conventional approaches, ODA offers an appealing alternative.
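The matching strategy named in the abstract, one-to-one matching on the propensity score, is commonly implemented greedily: each treated unit is paired with the nearest still-unmatched control by propensity score distance. A hedged sketch (the scores below are assumed pre-estimated and the units invented):

```python
# Hedged sketch of greedy one-to-one nearest-neighbour matching on a
# propensity score. Scores are assumed to be estimated elsewhere;
# unit ids and values are invented for illustration.

def greedy_match(treated, controls):
    """Match each treated unit to the nearest unmatched control
    by absolute propensity-score difference."""
    available = dict(controls)  # id -> score; copy so we can remove matches
    pairs = {}
    for tid, t_score in treated.items():
        cid = min(available, key=lambda c: abs(available[c] - t_score))
        pairs[tid] = cid
        del available[cid]  # each control can be used only once
    return pairs

treated = {"t1": 0.62, "t2": 0.35}
controls = {"c1": 0.30, "c2": 0.60, "c3": 0.70}
print(greedy_match(treated, controls))  # -> {'t1': 'c2', 't2': 'c1'}
```

After matching, covariate balance is checked (by standardized differences conventionally, or by classification accuracy in the ODA framework described above) before treatment effects are estimated on the matched sample.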


Collaboration


Dive into Paul R. Yarnold's collaborations.

Top Co-Authors

Akida Lebby
University of South Carolina

Charles L. Bennett
United States Department of Veterans Affairs

Joseph R. Berger
University of Pennsylvania

Laura Rose Bobolts
Nova Southeastern University

LeAnn B. Norris
University of South Carolina

Paul Ray
University of South Carolina