Psychotherapy and Psychosomatics | 2021
More (Adjustment) Is Not Always Better: How Directed Acyclic Graphs Can Help Researchers Decide Which Covariates to Include in Models for the Causal Relationship between an Exposure and an Outcome in Observational Research
Abstract
When constructing a model for an outcome of interest (e.g., a linear regression model), the choice of covariates to include depends in part on the researcher’s aims. (Although the term covariate has been variously defined, here we use it simply to refer to an explanatory variable included in a regression model other than the explanatory variable of interest, i.e., the exposure.) If the aim is to build a predictive model [1], covariate selection focuses on improving model predictions while limiting overfitting, which occurs when an overly complex model yields predictions that are too specific to a particular dataset, reducing the model’s generalizability [2, 3].
Alternatively, the aim may be to build a causal model [1]. (Note that Table 1 contains a glossary with definitions for all key terms, which are italicized.) For instance, one might build a causal model to quantify the average causal effect of an exposure on an outcome. The average causal effect can be defined as the difference between the average outcome had every individual in the population experienced the exposure and the average outcome had no one experienced the exposure [4]. As an illustration for this commentary, we will consider a causal model of the effect of the exposure preterm birth on the outcome attention-deficit/hyperactivity disorder (ADHD) during childhood. Because this example is offered purely for illustrative purposes, we have deliberately simplified it and ignored methodological difficulties that would complicate a real-world investigation of this issue (e.g., mismodeling the functional form of the variables, measurement error, and the presence of unmeasured confounders).
With causal models, the aim is to obtain the least biased possible estimate of the average causal effect, where the term “bias” is used to mean any deviation from an accurate measurement [5]. Thus, covariate selection should focus on eliminating or reducing bias.
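To make the definition of the average causal effect concrete, the following minimal Python sketch simulates both potential outcomes for every child in a hypothetical population and takes the difference between the two averages. The function names, the population size, and the ADHD risks of 5% and 8% are illustrative assumptions, not values from this commentary or from any study. In real data only one of the two potential outcomes is ever observed for each child, which is why the quantity has to be estimated rather than computed directly.

```python
import random


def simulate_potential_outcomes(n, risk_unexposed, risk_exposed, seed=0):
    """Return one (y0, y1) pair per child (hypothetical helper).

    y0: ADHD status (1 = yes, 0 = no) had the child NOT been born preterm
    y1: ADHD status had the child been born preterm
    Both risks are made-up illustrative inputs.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pairs = []
    for _ in range(n):
        y0 = 1 if rng.random() < risk_unexposed else 0
        y1 = 1 if rng.random() < risk_exposed else 0
        pairs.append((y0, y1))
    return pairs


def average_causal_effect(pairs):
    """Average outcome if everyone were exposed minus average if no one were.

    On a binary outcome this is a risk difference: E[Y^1] - E[Y^0].
    """
    n = len(pairs)
    mean_if_all_exposed = sum(y1 for _, y1 in pairs) / n
    mean_if_none_exposed = sum(y0 for y0, _ in pairs) / n
    return mean_if_all_exposed - mean_if_none_exposed


# Assumed risks: 5% without preterm birth, 8% with preterm birth.
pairs = simulate_potential_outcomes(
    n=100_000, risk_unexposed=0.05, risk_exposed=0.08
)
ace = average_causal_effect(pairs)
print(f"Average causal effect (risk difference): {ace:.3f}")
```

With these assumed inputs the printed value should land near the assumed risk difference of 0.03, up to simulation noise.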
One type of bias occurs when the estimated causal effect reflects not only the causal relationship between the exposure and the out-