Genome-wide Causation Studies of Complex Diseases
Rong Jiao, Xiangning Chen, Eric Boerwinkle & Momiao Xiong
Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, USA; Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, Nevada, USA; Epidemiology, Human Genetics & Environmental Sciences, School of Public Health, University of Texas Health Science Center at Houston, Houston, Texas, USA
Key words: Causal inference, GWAS, GWCS, additive noise models, linkage disequilibrium, prediction

* Address for correspondence and reprints: Dr. Momiao Xiong, Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, P.O. Box 20186, Houston, Texas 77225. Phone: 713-500-9894; Fax: 713-500-0900; E-mail: [email protected].
ABSTRACT
Despite significant progress in dissecting the genetic architecture of complex diseases by genome-wide association studies (GWAS), the signals identified by association analysis may not have specific pathological relevance to diseases, so a large fraction of disease-causing genetic variants remains hidden. Association is used to measure dependence between two variables or two sets of variables. Genome-wide association studies test association between a disease and SNPs (or other genetic variants) across the genome. Association analysis may detect superficial patterns between disease and genetic variants, and association signals provide limited information on the causal mechanism of diseases. The use of association analysis as the major analytical platform for genetic studies of complex diseases is a key issue that hampers discovery of disease mechanisms and calls into question the ability of GWAS to identify loci underlying diseases. It is time to move beyond association analysis toward techniques that enable discovery of the underlying causal genetic structures of complex diseases. To achieve this, we propose the concept of genome-wide causation studies (GWCS) as an alternative to GWAS and develop additive noise models (ANMs) for genetic causation analysis. Type I error rates and power of the ANMs to test for causation are presented, and we conduct a GWCS of schizophrenia. Both simulation and real data analysis show that the proportion of overlapping association and causation signals is small. Thus, we hope that our analysis will stimulate discussion of GWAS and GWCS.
INTRODUCTION
Although significant progress has been made in dissecting the genetic architecture of complex diseases by GWAS, the overall contribution of the newly identified genetic variants is small and a large fraction of causal genetic variants remains hidden.
Association measures dependence between two variables or two sets of variables in the data, and these dependencies are used for prediction; it does not deal with causal problems (Altman and Krzywinski 2015; Lopez-Paz 2016). Association analysis may detect superficial patterns between disease and genetic variants, and its signals provide limited information on the causal mechanism of diseases (Kahrilas and Kahrilas 2019). Association analysis has been the major paradigm of genetic analysis of complex diseases for almost a century, yet understanding the etiology and mechanism of complex diseases through association analysis remains elusive. Most genetic questions about the mechanism of disease are causal in nature. Causation analysis is essential to the genetic analysis of complex diseases, yet it has been ignored for a long time (Lopez-Paz 2016; Kreif and DiazOrdaz 2019). It is well recognized that association analysis is not a direct method to discover the causal mechanism of complex diseases. Many investigators think that "association is essential to causation" and hope that we can successfully shift from association to causation (Jones et al. 2017). A current paradigm for making the transition from association to causation is through omics analysis (Clyde 2017; Ongen et al. 2017). However, such approaches have two limitations. First, most current omics analyses still detect association signals. For example, eQTL analysis, which tests for association of a discrete variable (genetic variant) with a continuous variable (gene expression), is still association analysis. An observed association may not support inferring a causal relationship (Orho-Melander 2015; Lee et al. 2018). A recent study found that association signals tend to be spread across most of the genome (Boyle et al. 2017). The paradigm of GWAS with eQTL may therefore still fail to identify the causal paths from genetic variants to disease. Second, the lack of association does not necessarily imply the absence of a causal relationship (Callaway et al. 2017).
The set of causal loci, including causal QTLs, causal eQTLs and causal mQTLs, is not a subset of the association loci identified in QTL, eQTL and mQTL analyses, simply because those analyses are based on regression. A large proportion of causal loci may not be discovered by association analysis, and finding causal SNPs only by searching the set of associated SNPs may miss many causal SNPs. In summary, the use of association analysis as the major analytical platform for genetic studies of complex diseases is a key issue that hampers identification of causal SNPs and discovery of the causal mechanisms of the diseases. Distinguishing causation from association is an age-old problem. Methods for causation analysis, one of the most challenging problems in science and technology, need to be developed as an alternative to association analysis (Zenil et al. 2019). Without a proper causal analysis, fully detecting causal SNPs is in general not possible. Intuitively, causation implies that changes in one variable will directly make changes in the other (Jaffe 2010). The essential distinction between association and causation relies on what the response will be if we intervene in the system (Lattimore and Ong 2018). There are two types of causal inference: interventional causal inference and observational causal inference (Kaplan 2018). Interventional causal inference learns the effect of taking an action directly via experiments, for example, randomized controlled trials. Interventional experiments are the gold standard for causal inference. However, since in human genetics we cannot change the genetic material of human subjects, experimental interventions are unethical and infeasible. Therefore, it is essential to develop statistical methods and algorithms to predict the outcomes of an intervention from passive observation (Spirtes et al. 2000; Lattimore and Ong 2018).
In this paper, we take an observational causal inference approach to identifying causal SNPs. Although we infer causation from observational data, our concept of causation is derived from intervention (Pearl 2019). In principle, causal inference is based on the interventional distribution. The do-calculus is an essential tool for causal inference that can simplify the expression for an interventional distribution: repeated applications of the do-calculus lead to an expression containing only observational quantities, which can then be used to estimate the interventional distribution from observational data (Lattimore and Ong 2018). The do-operation is therefore a key concept that makes observational causal inference feasible. Three essential frameworks have been developed for observational causal inference: causal Bayesian networks, structural equation models and counterfactuals (Rosenbaum and Rubin 1983; Pearl 2000; Peters et al. 2014; Peters et al. 2017; Xiong 2018; Lattimore and Ong 2018). Historically, causal Bayesian networks, structural equation models and counterfactuals developed relatively independently in different fields, but they can be unified using interventional queries with the do-calculus (Lattimore and Ong 2018). This allows methods and algorithms developed within one framework to be easily applied to another, allows predictions about the consequences of intervening upon (rather than merely observing) variables, and provides a method of evaluating counterfactual claims. Therefore, we will use the do-calculus as a unified framework for causal inference. Similar to GWAS, which investigates the dependence relationship between one SNP and disease at a time, GWCS investigates the causal relationship between one SNP and disease at a time, referred to as bivariate causal discovery. Traditional causal inference theory infers causal relationships among three or more variables and cannot be applied to bivariate causal discovery.
Only recently have the independence of cause and mechanism (ICM) principle and functional causal models, specifically additive noise models (ANMs) (Peters et al. 2017; Xiong 2018), been proposed. ICM and discrete ANMs can be applied to GWCS. For a long time, many genetic epidemiologists have held the view that causal inference from observational data is impossible, and views and concepts that misunderstand causation are widespread in genetic epidemiology. There is also a lack of algorithms for causal inference in genetics. The purpose of this paper is to rigorously define causation, clarify the concepts of causation and association, and develop effective causal models and algorithms that can be easily used to discover causal structure in genetic analysis. While there is increasing evidence that association signals provide limited information on the causes of disease, and some investigators call the future of GWAS into question (Callaway 2017), modern causal inference theory provides powerful tools for bivariate causal discovery. In the past two decades, causal theory has been well developed and is becoming an important component of artificial intelligence (AI). "Reasoning in causal terms is omnipresent, from fundamental physics to medicine, social sciences and economics, and in everyday life" (Barrett et al. 2019). It is urgent to develop concepts and theory showing that, under the right conditions and assumptions, the cause-effect relationship between two variables can be inferred from purely observational data. It is time to develop a new generation of genetic analysis that shifts the current paradigm from association analysis to causal inference. To make this shift feasible, we rigorously use the do-calculus to model interventions, define the concept of causation, unify counterfactuals, functional causal models and ICM, and investigate the connections and differences between association and causation. ANMs are easy-to-use causal models.
We use the ANM Y = f_Y(X) + N_Y, where Y represents the disease status, X represents the indicator variable for the genotype of a SNP, and N_Y represents a residual (noise) term, as a general framework to distinguish causal directions, and we develop a new ANM-based statistic to test causation of a SNP locus with the disease, which will be used for GWCS of complex diseases. Under the assumption of no confounders (causal models with confounders will be discussed elsewhere), we investigate the identifiability of the ANM-based statistics for bivariate (SNP and disease) causal discovery. Since an analytical form for the distribution of the causal test statistic is difficult to derive, permutation methods are used to compute the distribution of the causal test. To evaluate its performance for genetic causal analysis, we use large-scale simulations to calculate the type I error rates of the ANM-based causal test and to compute its power under various conditions. The proposed method is applied to the CATIE-MGS-SWD schizophrenia (SCZ) study dataset, with 8,421,111 common SNPs typed in 13,557 individuals, to perform a GWCS of SCZ. To further investigate the properties of the ANM-based causal test, we examine the prediction ability of causal SNPs and the impact of linkage disequilibrium (LD) on the causation analysis. Our purpose is to provide a detailed analysis of GWAS and GWCS as a response to comments about the ability of GWAS to identify disease-causing loci (Orho-Melander 2015).

Basic concepts of association and causation

In this section, we briefly introduce causal inference theory to make this section as self-contained as possible. We consider two variables x and y, with joint distribution denoted by P(x, y). Association between the two variables x and y is defined as dependence between them.
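As a small numerical illustration of association as dependence (the joint distribution below is a hypothetical assumption, not data from this study), dependence appears as a gap between conditional and marginal probabilities, symmetrically in both directions:

```python
# Hypothetical joint distribution P(x, y) over x in {0, 1} and y in {0, 1}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

p_y1 = sum(p for (x, y), p in joint.items() if y == 1)  # marginal P(y = 1) = 0.4
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)  # marginal P(x = 0) = 0.5

p_y1_given_x0 = joint[(0, 1)] / p_x0  # P(y = 1 | x = 0) = 0.2, differs from P(y = 1)
p_x0_given_y1 = joint[(0, 1)] / p_y1  # P(x = 0 | y = 1) = 0.25, differs from P(x = 0)
```

Because P(y | x) differs from P(y), and equally P(x | y) differs from P(x), the two variables are dependent; but nothing in this computation distinguishes which variable, if either, causes the other.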
Statistically, association between x and y is defined as

P(y | x) ≠ P(y). (1)

Statistical dependence is a symmetric concept: if the variable x depends on the variable y, then the variable y also depends on the variable x. Classical machine learning and statistical methods, built on pattern recognition and association analysis, are insufficient for causal reasoning. The science of causal reasoning is developing in various disciplines, and different disciplines may use different definitions of causation. Four key approaches have emerged: structural equation models, causal Bayesian networks, counterfactuals and independence of cause and mechanism (ICM) (Lattimore and Ong 2018; Marsala 2015; Peters et al. 2017; Xiong 2018). The four schools of causality have recently been unified: intervention calculus (do-calculus) can be taken as a unifying language for causal inference.

Intervention calculus
The purpose of intervention calculus is to describe the mathematical conditions under which we can make causal inferences from observational data. Intuitively, causation is defined as the encoding of potential outcomes under intervention, and an intervention is a surgery on a mechanism. In other words, changes in one variable under intervention will affect the outcome of another variable and hence can be used to measure the effect of the intervention (action). We consider two variables X and Y. A causal model can be defined by intervention (action) as follows: if we do X (forcing the random variable X to take a specified value), then Y will be affected. Causation analysis investigates prediction of the effects of actions that perturb the observed system (Mooij et al. 2016). We use P(Y | do(X = x)) to denote the distribution of Y conditional on an intervention that sets X = x. Now "X causes Y" can be mathematically defined as

P(Y | do(X = x)) ≠ P(Y | do(X = x')) for some x, x' ∈ X. (2)

If X causes Y (X → Y), then in general we have

P(Y | X) = P(Y | do(X)) ≠ P(Y). (3)

However, P(X | do(Y)) = P(X) ≠ P(X | Y); conversely, if Y causes X (Y → X), then P(Y | do(X)) = P(Y) ≠ P(Y | X). Although the joint probability can always be factorized in terms of marginal and conditional distributions as P(X, Y) = P(X) P(Y | X) = P(Y) P(X | Y), if X causes Y (X → Y) we have the interventional factorization P(X, Y) = P(X) P(Y | do(X)); in this case (X → Y), we do not have P(X, Y) = P(Y) P(X | do(Y)), i.e., P(X, Y) ≠ P(Y) P(X | do(Y)): the joint probability of X and Y cannot be factorized in terms of the marginal distribution P(Y) and the interventional distribution P(X | do(Y)) unless X and Y are independent.
For the genetic problem, Y represents a disease status and X represents a genotype. The action do(X) means that the genotype X is changed (for human subjects this is impossible, but for animals it can be done by genome editing). Intervention calculus implies that if X causes disease, then P(Y | do(X)) = P(Y | X); otherwise, if X is not a disease locus, then P(Y | do(X)) = P(Y) ≠ P(Y | X). The do-calculus can also be expressed via E[Y | do(X)]; if the effect variable Y is binary, then E[Y | do(X)] = P(Y = 1 | do(X)). The various relationships between marginal, conditional and interventional distributions of X and Y under causation and association are summarized in Figure 1. Figure 1(d) clearly demonstrates the difference between association and causation: although the temperature in the room and the thermometer reading are associated, the temperature causes changes in the thermometer, while a change in the thermometer cannot change the temperature in the room, i.e., P(temperature | do(thermometer)) = P(temperature). In summary, association is studied through the observed conditional distribution, while causation is investigated through the interventional distribution, where the causal effect is determined by the effect of a hypothetical manipulation of an input on an output. In other words, association is investigated by seeing and causation by doing. To illustrate the difference between seeing and doing, we present the following example.

Example 1
Consider the structural assignments N_X ~ N(0, 1), X ← N_X, Y ← X + N_X². From the observational data generated by this model we can estimate E[Y | X = x] = x + x², since observing X = x reveals N_X = x. Next we perform the intervention do(X = x). The intervened generative model is N_X ~ N(0, 1), X ← x, Y ← X + N_X², from which we obtain E[Y | do(X = x)] = x + E[N_X²] = x + 1. The two expectations differ whenever x ≠ 1. This clearly demonstrates that seeing and doing are quite different.
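A minimal Monte Carlo sketch of Example 1, assuming the generative model N_X ~ N(0, 1), X ← N_X, Y ← X + N_X², evaluated at x = 2 where the observational and interventional expectations separate clearly:

```python
import random

random.seed(0)

# "Seeing": conditioning on X = 2 fixes N_X = 2 (since X <- N_X is deterministic),
# so E[Y | X = 2] = 2 + 2**2 = 6 exactly.
e_seeing = 2 + 2 ** 2

# "Doing": do(X = 2) replaces the assignment X <- N_X by X <- 2, while N_X keeps
# its N(0, 1) distribution, so E[Y | do(X = 2)] = 2 + E[N_X**2] = 3.
samples = [2 + random.gauss(0.0, 1.0) ** 2 for _ in range(200_000)]
e_doing = sum(samples) / len(samples)
```

With the seed above, e_doing lands near 3 while e_seeing is exactly 6, showing that the observational and interventional expectations disagree.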
Counterfactual
Causality can also be defined in terms of potential outcomes, or counterfactuals, in the Neyman-Rubin causal model (Rosenbaum and Rubin 1983). We can read the interventional distribution P(Y | do(X = x)) as a counterfactual question: "what would have been the distribution of Y had X = x?" (Lopez-Paz 2016). Intuitively, counterfactuals assume the presence of an alternative world where everything is the same as in the factual world, except for the alternative (hypothetical) intervention and its effects. For simplicity, consider a binary treatment T ∈ {0, 1} or potential intervention, where T = 1 indicates the treatment (intervention) and T = 0 indicates the control (no intervention). Each individual i has two potential outcomes, {Y_i1, Y_i0}, one for each value of the treatment: Y_i1, the potential outcome if the individual received the treatment (T_i = 1), and Y_i0, the potential outcome if the individual received no treatment (T_i = 0). In other words, a potential outcome is the outcome that would be realized if the individual received a specific value of the treatment (intervention). A SNP has two alleles; for example, we can let T indicate carrying one of the two alleles. The potential outcome is 1 if the individual is affected and 0 if the individual is normal. Let W denote the set of contexts (covariates). The "fundamental problem of causal inference" (Holland 1986) is that we can observe only one of the potential outcomes, never both. The unobserved (missing) potential outcome is called the "counterfactual" outcome. Similar to the do-calculus, a counterfactual can be stated as: Y would change to y if T were t; that is, we imagine the value that would be taken if we performed the hypothetical intervention on T. Causal effects are defined as differences in counterfactual variables; they measure the difference between what would have happened if we did one thing versus what would have happened if, alternatively, we did something else (Lattimore and Ong 2018). A brief overview of counterfactual theory is given in Supplementary A.

Structural equation model and independence of cause and mechanism (ICM)
The third language of causation which we introduce is structural equation models (SEMs). SEMs can be used to model causal relationships among given variables, where each variable is expressed as a function of some other variables (its causes or treatments) as well as some noise (Nowzohour and Bühlmann 2016; Xiong 2018). The model consists of three essential components: (1) the causal structure, (2) the functional dependence among the causal and effect variables, and (3) the joint distribution of the noises. We assume (1) that there are no unobserved variables, and hence that the noise terms are independent, and (2) that the difference between the effect variable and its noise term is a deterministic function of the causal variables. In this paper, we focus on bivariate causal discovery. The SEM for two variables is defined as (Lattimore and Ong 2018)

X = f_x(N_x), Y = f_y(X, N_y), N_x ⫫ N_y, (4)

where N_x and N_y are noises, or exogenous random variables. If the functions f_x and f_y are of free form, the SEMs are called nonparametric structural equation models. The structural equation model (4) encodes the assumption that the outcome Y_i for an individual i is caused by the cause (treatment) X_i that the individual receives and by other factors N_y that are independent of the cause X. The SEMs describe the causal effects of performing real-world interventions or experiments on their variables. Although conditional independences can be used to make causal inferences from the data under study, conditional independence cannot be applied to causal analysis with only two variables (Lopez-Paz 2016). Consider observational causal inference for two random variables X and Y: we want to infer whether X → Y or Y → X. Unfortunately, the absence of a third random variable prevents us from measuring conditional independences.
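A quick simulation sketch of a bivariate SEM of the form (4); the particular functions f_x, f_y and coefficients below are illustrative assumptions. Intervening on the effect Y leaves the cause X untouched, while conditioning on Y does not:

```python
import random

random.seed(2)

def sample(do_y=None):
    # SEM: X = f_x(N_x), Y = f_y(X, N_y) with independent noises.
    # Hypothetical choices: f_x(n) = 2n and f_y(x, n) = x + 0.5n.
    n_x, n_y = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    x = 2.0 * n_x
    y = x + 0.5 * n_y if do_y is None else do_y  # do(Y = y) overrides f_y
    return x, y

obs = [sample() for _ in range(100_000)]
intv = [sample(do_y=3.0) for _ in range(100_000)]

# Doing: E[X | do(Y = 3)] stays at 0, since intervening on the effect
# does not propagate back to the cause.
mean_x_do = sum(x for x, _ in intv) / len(intv)

# Seeing: conditioning on Y near 3 shifts X upward, because a large Y is
# observational evidence of a large X.
near3 = [x for x, y in obs if abs(y - 3.0) < 0.25]
mean_x_seen = sum(near3) / len(near3)
```

mean_x_do stays near 0 while mean_x_seen is far from 0, matching P(X | do(Y)) = P(X) ≠ P(X | Y).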
To overcome this limitation, in the past decade, observational causal inference methods that are not based on conditional independence have been developed.
One of them is the widely used independence of cause and mechanism (ICM) principle. ICM, which assumes that causes and mechanisms are chosen independently by nature, is a recently proposed principle for causal reasoning and causal learning (Janzing and Schölkopf 2010; Shajarisales et al. 2015; Peters et al. 2017). ICM assumes that the mechanism that generates the effect from its cause contains no information about the cause. Assume that X is a cause and Y is an effect. The joint distribution P(X, Y) can be decomposed as

P(X, Y) = P(X) P(Y | do(X)). (5)

The conditional distribution P(Y | do(X)) is the mechanism that generates the effect Y from the cause X, and it is independent of the distribution P(X) of the cause. If X causes Y, then P(Y | do(X)) = P(Y | X), and the conditional distribution P(Y | X) contains no information about the marginal distribution P(X) of the cause. Therefore, the ICM postulates that the conditional distribution of each variable given its causes contains no information about its cause. SEMs, ICM and counterfactuals developed relatively independently in different fields. However, it can be shown that they are unified under some assumptions using interventional queries with the do-calculus (Supplementary B). This allows methods and algorithms developed within one framework to be easily applied to another, and provides the foundation for the interpretation of ANMs and the justification of GWCS.

Additive noise models
In the previous section, we showed that the SEM, ICM and counterfactual approaches to causal inference are equivalent in general. To facilitate the application of causal inference to the real world, we need simpler methods that implement these general approaches. In this paper, we propose to use discrete additive noise models (ANMs), which are based on the ICM principle, as a tool for GWCS. We assume that there is no confounding, no selection bias and no feedback between cause and effect (Mooij et al. 2016); methods for causality analysis with confounding will be presented elsewhere. Let Y be a binary variable indicating disease status (Y = 1, presence of disease; Y = 0, normal), and let X be a genotype indicator variable taking the values 0, 1 and 2 for the genotypes dd, Dd and DD, where D is the disease allele. Let m be an integer and write X = km + r, r = 0, 1, …, m − 1; X is called an m-cyclic random variable if X takes the remainder r as its value. Now we define discrete ANMs for genetic causation analysis. Let X and Y be 3-cyclic and 2-cyclic random variables, respectively. An ANM from X to Y is defined as (Peters et al. 2011)

Y = f_Y(X) + N_Y, X ⫫ N_Y, (6)

where f_Y is an integer function, N_Y is a 2-cyclic noise variable, and the addition is modulo 2. An ANM is called reversible if there is also an ANM

X = f_X(Y) + N_X, Y ⫫ N_X, (7)

where N_X is a 3-cyclic noise variable and the addition is modulo 3. In practice, there may be multiple potential causes X, Z_1, …, Z_K; however, only the one cause X is explicitly considered in model equation (6). The other causes Z_1, …, Z_K are unobserved, and their causal effects on Y are accounted for by the residual. We can then show that the model

Y = f_Y(X) + Ñ_Y, X ⫫ Ñ_Y, (8)

where the effects of Z_1, …, Z_K on Y are included in Ñ_Y, still holds if we assume that X ⫫ Z_1, …, X ⫫ Z_K.
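A small simulation sketch of the forward model (6); the genotype frequencies, the integer function f_Y and the noise rate below are illustrative assumptions. Under the true direction X → Y, the residual (Y − f_Y(X)) mod 2 recovers the noise and is therefore independent of the genotype:

```python
import random

random.seed(7)

f_y = {0: 0, 1: 1, 2: 1}  # hypothetical integer function f_Y: {0, 1, 2} -> {0, 1}
n = 30_000
xs = random.choices([0, 1, 2], weights=[0.49, 0.42, 0.09], k=n)  # genotypes dd/Dd/DD
noise = [1 if random.random() < 0.1 else 0 for _ in range(n)]    # 2-cyclic noise ~ Bern(0.1)
ys = [(f_y[x] + e) % 2 for x, e in zip(xs, noise)]               # ANM: Y = f_Y(X) + N_Y mod 2

# Residuals of the forward direction: P(resid = 1 | X = x) should be close to
# the noise rate 0.1 for every genotype, i.e. the residual carries no
# information about X.
resid = [(y - f_y[x]) % 2 for x, y in zip(xs, ys)]
rates = []
for g in (0, 1, 2):
    sub = [r for x, r in zip(xs, resid) if x == g]
    rates.append(sum(sub) / len(sub))
```

The same check applied in the backward direction (regressing X on Y) would generally leave residuals that remain dependent on Y, which is what makes the direction identifiable.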
Its extension to multiple dependent causes is more complicated and will be presented elsewhere. It is well known that the set of joint distributions P(X, Y) that admit an ANM in both the forward and backward directions is very small; in other words, the direction of an ANM is, in general, identifiable (Peters et al. 2011). Assumptions for identifiability of the direction of the ANMs are summarized in Supplementary C. In our case, X is an indicator variable for genotypes and Y is a binary variable for disease status. In Supplementary C, we show that, in general, reversibility is impossible and hence the direction of the ANM is identifiable.

Numerical algorithms to implement ANMs for genetic causal analysis
To implement the ANMs to identify causal SNPs, we use the numerical algorithm presented in Peters et al. (2011) to test the causal relationship between a SNP and disease. The algorithm is summarized as follows (Hu et al. 2018).
Algorithm to implement the discrete ANMs for genetic causal analysis:
Assume that qualitative trait data Y and the indicator variable X for the genotypes of a SNP are available.
1. To infer the direction X → Y, regress the trait Y on the genotype indicator variable X: Y = f(X) + N_Y. Calculate the residuals Ñ_Y = Y − f̂(X).
2. To infer the potential causal direction Y → X, fit the nonlinear integer regression X = g(Y) + N_X to the data. Calculate the residuals Ñ_X = X − ĝ(Y).
3. Test for independence between the residuals and the potential cause. If Ñ_Y and X are independent (Ñ_Y ⫫ X) and Ñ_X and Y are not independent, then X causes Y (X → Y). If both Ñ_Y and X, and Ñ_X and Y, are dependent, or if both pairs are independent, then no causal conclusion can be made.

Nonlinear integer regressions implementing the ANMs have two important features. First, in general we do not have closed functional forms for nonlinear integer functions; instead, we investigate all possible mappings (functions) from X to Y and evaluate the loss of each. Second, ordinary regression minimizes the sum of squared errors, but in the above algorithm, in addition to evaluating the loss function, we must also test independence between the regressor and the residuals. Therefore, Peters et al. (2011) suggested using a dependence measure (DM) between regressor and residuals as the loss function. We adopt their discrete regression with dependence-measure minimization for genetic causal analysis (Peters et al. 2011).

Discrete nonlinear regression with dependence measure minimization for genetic causal analysis
Step 1:
Calculate the sampling distribution P̂(X, Y).
Step 2: Initialization: f^(0)(x_i) = argmax_y P̂(X = x_i, Y = y), t = 0.
Step 3: Repeat: t ← t + 1.
Step 4: For i = 1, …, n do
Step 5: f^(t)(x_i) = argmin_y DM(X, Y − f^(t−1)_{x_i→y}(X)), where f^(t−1)_{x_i→y} denotes the function f^(t−1) with its value at x_i replaced by y; end for.
Step 6: Until ||f^(t) − f^(t−1)|| < ε, or (Ñ_Y = Y − f^(t)(X)) ⫫ X, or t = T, where ε and T are pre-specified.

A χ² test statistic will be used as the dependence measure (DM). Specifically, we formulate a contingency table (Table 1). In the ANM equation (6), N_Y is assumed to be a 2-cyclic noise variable. Let n_1 and n_2 be the numbers of individuals with N_Y = 0 and N_Y = 1, respectively, and let n = n_1 + n_2. Consider the three genotypes dd, Dd and DD. For N_Y = 0, let m_11, m_12 and m_13 be the numbers of individuals with genotypes dd, Dd and DD, respectively; for N_Y = 1, let m_21, m_22 and m_23 be the corresponding numbers. Define the marginal frequencies as shown in Table 1. Then we obtain E[m_1j] = n_1(m_1j + m_2j)/n and E[m_2j] = n_2(m_1j + m_2j)/n, j = 1, 2, 3, and the test statistic for testing independence is defined as

DM = Σ_{j=1}^{3} [ (m_1j − E[m_1j])²/E[m_1j] + (m_2j − E[m_2j])²/E[m_2j] ]. (9)

Under the null hypothesis of independence, the test statistic DM is asymptotically distributed as a central χ² distribution with 2 degrees of freedom. If the SNPs involve rare variants, the expected counts of many cells will be small, and Fisher's exact test should be used to test for independence.

The statement that there is no causal relationship between the SNP and the disease implies that neither causation X → Y nor causation Y → X holds. Let DM_{X→Y} and DM_{Y→X} be the χ² statistics for testing causations X → Y and Y → X, respectively. The null hypothesis for testing the causal relationship between two random variables X and Y is H_0: no causation between X and Y. The statistic for testing the causal relationship between X and Y is defined as

T_C = |DM_{X→Y} − DM_{Y→X}|. (10)

When T_C is large, either DM_{Y→X} > DM_{X→Y}, which implies that X causes Y, or DM_{X→Y} > DM_{Y→X}, which implies that Y causes X. When T_C is small, no causal decision can be made. Since DM_{X→Y} and DM_{Y→X} may be dependent, a closed, analytic expression for the distribution of T_C is not yet known (Bausch 2012). Although a computational algorithm to numerically calculate the distribution of T_C is available, in this paper we use a permutation test to calculate the P-value of the test T_C.

Distance Correlation as a Causation Measure
In previous sections, we introduced the basic principle for assessing causation X → Y: the distribution P(X) of the cause X is independent of the causal mechanism, that is, of the conditional distribution P(Y|X) of the effect Y given the cause X. The question now is how to assess their independence. The Pearson correlation coefficient ρ(X, Y), the widely used classical measure of dependence, measures linear dependence between two random variables X and Y; in the bivariate normal case, ρ(X, Y) = 0 is equivalent to independence between X and Y. If the distributions of X and Y are not normal, then ρ(X, Y) = 0 may not imply independence between X and Y. Recently, the distance correlation, which applies to all distributions with finite first moments, was proposed to measure dependence between random vectors, allowing for both linear and nonlinear dependence (Székely et al. 2007, 2009). Distance correlation extends the traditional Pearson correlation in two remarkable directions: (1) it extends the correlation between two random variables to the correlation between two sets of variables of arbitrary dimensions; (2) zero distance correlation indicates independence of two random vectors.

Consider two vectors of random variables: a p-dimensional vector X and a q-dimensional vector Y. Let P(x) and P(y) be the density functions of X and Y, respectively, and let P(x, y) be their joint density function. There are two ways to define independence between two vectors of variables: (i) by density functions and (ii) by characteristic functions. In other words, if X and Y are independent, then either (i) P(x, y) = P(x)P(y) or (ii) f_{X,Y}(t, s) = f_X(t) f_Y(s), where f_{X,Y}(t, s) = E[e^{i(t^T x + s^T y)}], f_X(t) = E[e^{i t^T x}] and f_Y(s) = E[e^{i s^T y}] are the characteristic functions of (X, Y), X and Y, respectively. Therefore, we can use either the distance ||P(x, y) − P(x)P(y)|| or the distance ||f_{X,Y}(t, s) − f_X(t) f_Y(s)|| to measure the dependence between the two vectors X and Y. Since a characteristic function f is complex-valued, its norm is defined by |f|² = f f̄. The distance covariance dCov(X, Y) between two random vectors X and Y, the distance variance dVar(X), and algorithms for their calculation are briefly introduced in Supplementary D. The squared distance correlation R²(X, Y) is defined as

R²(X, Y) = dCov²(X, Y) / √(dVar(X) dVar(Y)) if dVar(X) dVar(Y) > 0, and R²(X, Y) = 0 if dVar(X) dVar(Y) = 0. (11)

Now we propose to use the distance correlation to measure the dependence between the distributions P(X) and P(Y|X). Assume that X takes m different values and Y takes m̃ different values. Define the two vectors P(X) = [P(x_1), …, P(x_m)]^T and P(Y|X) = [P(y_1|x_1), …, P(y_m̃|x_1), …, P(y_1|x_m), …, P(y_m̃|x_m)]^T. Measures for the causal directions X → Y and Y → X are defined as

C_{X→Y} = 1 − R(P(X), P(Y|X)) (12)

and

C_{Y→X} = 1 − R(P(Y), P(X|Y)), (13)

respectively. A measure quantifying the strength of the causal relationship between X (genetic variant) and Y (disease phenotype) can then be defined by

C_T = |C_{X→Y} − C_{Y→X}|. (14)

Using Theorem 3 of Székely et al. (2009), we can show that 0 ≤ C_{X→Y} ≤ 1 and that C_{X→Y} = 1 if and only if X → Y. Similarly, we can show that 0 ≤ C_{Y→X} ≤ 1 and that C_{Y→X} = 1 if and only if Y → X. Consider the linear transformations X₁ = a₁ + b C₁ X and Y₁ = a₂ + b C₂ Y, where C₁ and C₂ are orthonormal matrices; then C_{X₁→Y₁} = C_{X→Y}. In other words, linear transformation of the random variables does not change the strength of causality between the two variables X and Y.

RESULTS

Type 1 Error of Statistics for Testing Causation
To examine the validity of the statistic T_C for testing the causal relationship between a common SNP and disease, we performed a series of simulation studies to compare its empirical levels with the nominal ones. We considered two scenarios: (1) no causation in the absence of association and (2) no causation in the presence of association. We selected the top 100 common SNPs (MAF between 0.19 and 0.49) from the gene TEKT4P2 on chromosome 21 in the 1000 Genomes Project. In scenario (1), a binary trait Y was randomly generated, independent of the indicator variable X for SNP genotypes. In scenario (2), we first randomly generated X and Y, and then selected the associated pairs of data as our dataset (X, Y). We generated data with 100,000 subjects by resampling from the 99-individual CEU population in the 1000 Genomes Project. The number of permutations was 1,000, and the number of test replications was 1,000. The numbers of subjects sampled from the generated population for the type 1 error rate calculations were 500, 1,000, 2,000 and 5,000, respectively. We first consider scenario 1. Table 2 summarizes the average type I error rates of the test statistics for testing the causal relationship between SNP and disease, in the absence of association between SNP and disease, over all 100 SNPs at the nominal levels α = 0.05 and α = 0.01. To verify that there was no association in the data, Table 3 summarizes the average type 1 error rates of the association test over the 100 SNPs. These tables show that, in the absence of association, the type I error rates of the test statistics for testing the causal relationships between SNPs and disease were not appreciably different from the nominal levels. Next we consider scenario 2.
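Both scenarios rely on the same permutation test for T_C. The following Python sketch is our own illustrative implementation, not the paper's software. It assumes one plausible reading of equations (12)-(14): the distance correlation is computed over the m genotype values, pairing each P(x_i) with the conditional row [P(y_1|x_i), …, P(y_m̃|x_i)].

```python
import numpy as np

def dcor(u, v):
    """Sample distance correlation (Székely et al. 2007) between
    u (n samples x p dims) and v (n samples x q dims)."""
    u = np.asarray(u, float).reshape(len(u), -1)
    v = np.asarray(v, float).reshape(len(v), -1)
    def centered(m):
        d = np.linalg.norm(m[:, None, :] - m[None, :, :], axis=2)
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    a, b = centered(u), centered(v)
    dcov2 = (a * b).mean()                      # squared distance covariance
    dvar2 = (a * a).mean() * (b * b).mean()     # product of distance variances
    return 0.0 if dvar2 <= 0 else np.sqrt(max(dcov2, 0.0)) / dvar2 ** 0.25

def causation_measures(x, y):
    """C_{X->Y} = 1 - R(P(X), P(Y|X)) and C_{Y->X}, equations (12)-(13),
    estimated from samples of two discrete variables."""
    def c(a, b):
        va, vb = np.unique(a), np.unique(b)
        pa = np.array([(a == s).mean() for s in va])             # P(A = s)
        pba = np.array([[(b[a == s] == t).mean() for t in vb]
                        for s in va])                            # rows: P(B | A = s)
        return 1.0 - dcor(pa, pba)
    return c(x, y), c(y, x)

def tc_pvalue(x, y, n_perm=1000, seed=0):
    """Permutation P-value for T_C = |C_{X->Y} - C_{Y->X}|, equation (10)."""
    rng = np.random.default_rng(seed)
    cxy, cyx = causation_measures(x, y)
    t_obs = abs(cxy - cyx)
    exceed = sum(abs(np.subtract(*causation_measures(x, rng.permutation(y))))
                 >= t_obs for _ in range(n_perm))
    return t_obs, (exceed + 1) / (n_perm + 1)
```

Under the null of no causation the permuted traits are exchangeable with the observed one, so the fraction of permuted T_C values at least as large as the observed T_C estimates the P-value.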
Table 4 presents the average type I error rates of the test statistics for testing the causal relationship between SNP and disease, in the presence of association between SNP and disease, over all 100 SNPs at the nominal levels α = 0.05 and α = 0.01. Again, these results demonstrate that, even in the presence of association, the type I error rates of the test statistics for testing the causal relationships between SNPs and disease were not appreciably different from the nominal levels.

Power Evaluation
To evaluate the performance of the ANMs for assessing the causal relationship between SNP and disease, simulated data were used to estimate their power to detect a true causation. First, we investigated power as a function of sample size with a fixed causal measure parameter. The data were generated by the following cyclic model:

Y = f(X) + ε_Y, ε_Y ⫫ X, (15)

where Y ∈ {0, 1} is a binary trait generated by model (15) with addition taken modulo 2, X ∈ {0, 1, 2} is the indicator variable for genotypes of a SNP selected from the 1000 Genomes Project (minor allele frequency 0.1), f is the integer function f(0) = 0, f(1) = 0, f(2) = 1, and ε_Y ∈ {0, 1} is a noise variable distributed as a binomial with probability parameter p. We used model (15) to generate a population of 100,000 individuals with X and Y. Sets of 500, 1,000, 2,000, 5,000, 10,000 and 20,000 individuals were sampled from this population. A total of 1,000 simulations were repeated for the power calculation. Three factors affect the power of the ANMs for testing causation: the probability parameter p of the binomial distribution, the significance level α and the sample size. We first fixed the parameter p and the significance level α. Figure 2 plots the power curves as a function of sample size under four scenarios: (1) p = 0.2, α = 0.05; (2) p = 0.2, α = 0.01; (3) p = 0.4, α = 0.05 and (4) p = 0.4, α = 0.01. We observed from Figure 2 that for p = 0.2, α = 0.01 we could reach 81% power even when the sample size was only 500, and for p = 0.4, α = 0.01 we could still reach 80% power when the sample size was 5,000. We then fixed the sample size n and the significance level α. Figures 3 and 4 show the power curves of the causation test as a function of the parameter p at significance levels α = 0.05 and α = 0.01, respectively.
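The data-generating process of model (15) can be sketched as follows. The Hardy-Weinberg genotype sampling and the modulo-2 ("cyclic") addition are our assumptions about details the text leaves implicit.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_cyclic_anm(n, maf=0.1, p=0.2):
    """Generate (x, y) under one reading of the discrete cyclic ANM of
    equation (15): y = (f(x) + eps) mod 2, with f(0) = f(1) = 0, f(2) = 1
    and eps ~ Bernoulli(p).  Genotypes are drawn as Binomial(2, maf),
    i.e. assuming Hardy-Weinberg proportions (our assumption)."""
    x = rng.binomial(2, maf, size=n)   # SNP genotype indicator 0/1/2
    f = np.array([0, 0, 1])            # the integer function f
    eps = rng.binomial(1, p, size=n)   # binary noise, independent of x
    y = (f[x] + eps) % 2               # cyclic (modulo-2) addition
    return x, y

x, y = simulate_cyclic_anm(100_000)
# Empirical P(Y = 1 | X = g): about p for g = 0, 1 and about 1 - p for g = 2.
for g in (0, 1, 2):
    print(g, round((y[x == g] == 1).mean(), 3))
```

As p approaches 0.5 the conditional distributions P(Y|X = g) become nearly uniform for every genotype, which is exactly the regime in which the power results below degrade.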
We observed that as the parameter p increased, the power of the causal tests decreased. Indeed, the parameter p determines the value of the residual ε_Y, which in turn influences the causality measure. When p was small, the values of the response variable Y were mainly determined by the cause X. As p increased, the impact of the noise ε_Y on Y increased, hence the causality measure decreased and, in turn, the power of the causal tests decreased. Finally, when p = 0.5, the noise ε_Y produced the values 1 and 0 with equal probability, Y was mainly determined by the noise ε_Y, and the ANMs had almost no power to detect causation.

Application to Real Data Example

GWCS of Schizophrenia
To further evaluate its performance, the ANMs for testing causation were applied to the CATIE-MGS-SWD schizophrenia (SCZ) study dataset with 8,421,111 common SNPs typed in 13,557 individuals. The same association test was used in both GWAS and GWCS. A Manhattan plot of GWAS and GWCS is shown in Figure 5. For viewing clarity, the Manhattan plot shows only the P-values of causal analysis (in green) and association analysis (in black and grey) for SNPs with P-values < 10⁻⁵. We observed that the associated SNPs were quite uniformly distributed across the genome, whereas the causal SNPs concentrated in only some genome regions. This may indicate that the causal SNPs contain more information than the associated SNPs. Owing to the computational time required by the permutations, the P-value threshold for declaring significant causation was 10⁻⁶. In total, 245 SNPs in 29 genes showed significant causation with SCZ. The results are summarized in Supplemental Table 1, where the P-values of both the causation and association tests are listed. The top 15 causal SNPs are listed in Table 5. Among the 245 causal SNPs, 62 can be confirmed from the literature, and four of them lie within the typical 108 schizophrenia-associated genetic loci (Schizophrenia Working Group 2014; Sullivan et al. 2007; Fatemi et al. 2011; Lei et al. 2013; Costas et al. 2013; Athanasiu et al. 2013; Misztak et al. 2018; Ren et al. 2011; Suzuki et al. 2003; Cho et al. 2015; Ide and Lewis 2010). We also conducted GWAS for this dataset. A total of 5,917 SNPs were associated with SCZ at the significance level of 10⁻⁶, and only 58 of them showed causation. These results exhibit several remarkable features. First, some SNPs showed both significant causation and association. For example, four SNPs, rs1324544, rs2829725, rs9931378 and rs12057989, showed both strong causation and association (Table 5).
Second, the number of causal SNPs was much smaller than the number of associated SNPs. Third, highly significantly associated SNPs may show no significant causation. Fourth, SNPs with strong causation signals may not demonstrate association. For example, SNP rs12739344 in the gene AKT3 showed strong causation (P-value < 10⁻⁶) but did not reach the P-value threshold for association. It is well known that genetic variation in AKT3 is a top risk signal in schizophrenia, and network analysis identified that AKT3 contributes to four of the pathways involved in SCZ (Howell et al. 2017). SNP rs10986439 in the gene GABBR2 showed significant causation (P-value < 10⁻⁶) but no association with SCZ (P-value 0.000458). Genetic-imaging analysis showed that GABBR2 participates in neuron development, synapse organization and axon pathways, which could affect cognition in schizophrenia (Luo et al. 2018). Fifth, the proportion of SNPs showing both causation and association was small (36.3% of causal SNPs showed association, and only 0.98% of associated SNPs showed causation).

Disease Prediction
Genomic predictors and risk estimates for a large number of diseases can be constructed from SNPs. Traditional methods for developing genomic risk scores (GRS) utilize small numbers of SNPs, typically those reaching genome-wide significant association (Abraham and Inouye 2015). To evaluate the predictive ability of causal versus associated SNPs, we selected the top 245 causal SNPs (all P-values < 10⁻⁶) and the top 245 associated SNPs for SCZ risk prediction. Logistic regression and 10-fold cross-validation were used to calculate prediction accuracy. Table 6 lists the ten-fold cross-validated accuracy for prediction of SCZ. Table 6 shows that, using the same number of SNPs, the sets of SNPs selected by causal analysis had higher prediction accuracy than the sets selected by association analysis. Specifically, the prediction accuracy of the top 245 causal SNPs was about 3% higher than that of the top 245 SNPs selected by association analysis. This may imply that the causal SNPs contain more biological information than the associated SNPs.

Impact of Linkage Disequilibrium
In this section, we investigate the impact of linkage disequilibrium (LD) on the causal analysis. It is well known that LD has a large impact on association analysis. A theoretical analysis of the impact of LD on the causal effect is given in Supplementary E. Next we use simulations to investigate the impact of LD on the causation analysis. Data for two markers, rs150012736 and rs376953511, were taken from the 1000 Genomes Project, in which the LD coefficient r between rs150012736 and rs376953511 was calculated as 0.5. We assumed that SNP1 was a causal SNP and made no assumption about whether or not SNP2 was causal. The trait values were generated by the discrete cyclic ANM

Y = f_d(X₁) + ε_Y, (16)

where f_d is a specified nonlinear integer function and ε_Y is a binomial variable. We fitted the ANMs to the data (Y, X₂), where X₂ is the indicator variable for genotypes of SNP2. The results of the causation and association tests are summarized in Tables S3 and S4, and in Tables 7 and 8. Tables S3 and S4 show that we could detect both association and causation between SNP1 and disease with high power when sample sizes were larger than 2,000. Table 7 shows that the type 1 error rate of the test for causation between SNP2 and disease was not very high and decreased as the sample size increased; in other words, we did not detect causation at SNP2. However, Table 8 shows that the association test detected association of SNP2 with disease with high power. The simulation results show that the impact of LD on the causal tests was much smaller than on the association tests. To further evaluate the impact of LD on the causation test, a real data analysis was conducted. From the results of the GWCS of SCZ, we selected SNP rs6578689, which had P-values < 10⁻⁶ and < 10⁻⁷ for the causation and association tests, respectively. Then, we selected 20 neighboring SNPs of the causal SNP rs6578689.
We tested their causation and association with SCZ. Table 9 summarizes the results of the causation and association tests. These results show that even neighboring SNPs with r > 0.44 demonstrated no causation with SCZ, yet showed strong associations with SCZ, with small P-values < 10⁻⁹. These real data results demonstrate that LD has a small impact on causation analysis but a large impact on association tests.

DISCUSSION
As an alternative to GWAS, the major goal of this paper is to propose the notion of GWCS and to address several important issues for GWCS. The standard approach to causal discovery is to use interventions or randomized experiments, and many genetic epidemiologists have long thought it impossible to detect causal SNPs from observational data. However, interventions or randomized experiments are unethical, time-consuming, expensive and in many cases infeasible. To address this critical barrier to GWCS, we focus on causal discovery methods developed for causal inference from observational data rather than from interventional or randomized experiments, and propose to use discrete ANMs as a major tool for GWCS. By large simulations and real data analysis, we demonstrate the feasibility and limitations of the proposed GWCS as a new paradigm of genetic analysis. Association measures dependence relationships, and association analysis can be done from observational data. Causal inference is inductive reasoning (Causal inference in AI, 2019); in other words, causal inference reasons from the observed part to the unobserved general. The goal of causal inference is to learn the response to taking an action, and it is usually carried out through interventions. However, as we pointed out before, it is infeasible to conduct intervention experiments in humans. Modern causal theory attempts to learn the outcome of an intervention from observed data. Whether causation can be inferred from observational data has been debated for more than a century. In this paper, we review the great progress that has been made in causal inference over the past several decades, and define causation as the effect of taking an action in some system, inferred from observational data in terms of interventions or counterfactuals (Lattimore and Ong 2018). We also review three emerging major approaches to bivariate causal discovery, "do" actions, counterfactuals and ICM, and show that these three approaches can be unified.
The ANMs, widely used algorithms that implement ICM, are explored for GWCS. In GWCS, we assume that there is no confounding and no selection bias; methods for causation analysis with confounders will be presented elsewhere. We thereby lay theoretical foundations for GWCS. The original ANMs are used to distinguish the cause-effect direction and do not provide a P-value for testing the causation of a SNP with disease. To overcome this limitation, we developed a test statistic and used permutations to calculate its P-value for testing the causation of a SNP with disease. This provides a practical approach to GWCS. Essential issues for performing GWCS in practice are the type 1 error rates and power of the test statistics and the feasibility of the computations. We showed that the type 1 error rates of the ANMs for testing causation, in both the presence and absence of association, did not deviate significantly from the nominal levels. In other words, large-scale simulation results demonstrated that the ANMs for causation analysis of genetic variants are valid. The power of the ANMs depends on the probability parameter p of the binomial distribution generating the noise ε_Y, the sample sizes and the significance levels. As we discussed in the text, the probability parameter p determines the strength of causation. We showed that even at significance level α = 0.01 and p = 0.4, when sample sizes were 5,000, the power of the ANMs was close to 80%. For sufficiently small p, with 500 samples the ANMs reached power greater than 90% at both α = 0.05 and α = 0.01. These results imply that the ANMs have high power to detect causation in many cases. Distinguishing causation from association is an age-old problem. Most classical causal inference theory focuses on inferring causal relationships among three or more variables.
Due to the lack of methods for bivariate causal discovery, very few GWCS, and very few significant causal genetic variants from GWCS, have been reported. In the past decade, rapid development in modern causal analysis theory has provided several efficient methods for bivariate causal discovery, including the ANMs. To promote the application of causal inference to genetic analysis, we applied the ANMs to a GWCS of SCZ, from which we made several important observations. First, the number of causal SNPs (245) was much smaller than the number of associated SNPs (5,917). The causal SNPs were mainly located on chromosomes 1, 4, 5, 6, 7, 8, 20, 11 and 12, and very few causal SNPs were located on other chromosomes; the associated SNPs, however, were located across the genome. The results of the GWCS of SCZ also challenge the "omnigenic" model, which assumes that "all genes affect every complex trait" (Greenwood 2018) and that most association signals, which tend to be spread across most of the genome, influence phenotype variation (Boyle et al. 2017). Most identified association signals may have nothing to do with causing phenotype variation. Second, the proportion of SNPs showing both causation and association was small (36.3% of causal SNPs showed association, and only 0.98% of associated SNPs showed causation). This implies that the majority of causal SNPs cannot be discovered by association analysis and that most associated SNPs are not involved in the mechanisms of disease. The results of the GWCS of SCZ strongly suggest that association analysis will miss the majority of causal SNPs, and that identifying and validating causal SNPs from the set of associated SNPs will be time-consuming and inefficient. Third, full genomic information and genomic risk prediction have enabled new insights into the etiology and genetic architecture of complex disease.
Although we cannot directly validate the causality of the SNPs identified by GWCS, evaluating the difference in disease risk prediction accuracy between the set of causal SNPs and the set of associated SNPs allows us to assess their relative biological relevance. The prediction accuracy of the top 245 causal SNPs was about 3% higher than that of the top 245 associated SNPs, which may suggest that the causal SNPs contain more biological information than the associated SNPs. Fourth, both simulation and real data analysis showed that LD has a strong impact on association analysis but, surprisingly, much less impact on causal analysis. It is well known that LD is a confounding factor for association analysis and often creates spurious associations. The presence of LD across the genome limits our ability to use association analysis to discover the mechanisms of disease. Given the limited impact of LD on causal analysis, we may expect GWCS to provide an alternative to association analysis for discovering the causal genetic structure of complex diseases. Although 62 of the 245 discovered causal SNPs can be confirmed from the literature, and four of them lie within the typical 108 schizophrenia-associated genetic loci (Schizophrenia Working Group 2014), the results are preliminary; functional studies of the causal SNPs should be pursued in the future. Causality is not only critical for understanding disease mechanisms but also particularly important for the development of efficient treatments. Much of the failure of previous drug development efforts is attributable to insufficient understanding of disease mechanisms. Whether we can infer causal relationships between genetic variants and disease from observational data has been debated for more than a century. Association and correlation analysis are the current paradigm of most genetic studies and have been used for more than a century.
Our study demonstrates that a large proportion of causal loci cannot be discovered by association analysis. Finding causal SNPs only by searching the set of associated SNPs may not be sufficient for unravelling the mechanisms of complex diseases. Causal analysis as an alternative to association analysis for genetic studies has never been systematically investigated. The main purpose of this paper is to stimulate discussion about causal versus association analysis, and both theoretical and practical research in genomic causal analysis. We hope that our results will increase confidence in applying causal inference to genetic analysis, that more intelligent methods for causal inference will be developed, and that more valid GWCS of complex diseases will be conducted.

DATA ACCESS
Software for implementing the proposed methods for GWCS can be downloaded from https://sph.uth.edu/research/centers/hgc/xiong/software.htm and Github ( https://github.com/jiaorong007?tab=repositories ) .
REFERENCES

Abraham G, Inouye M. 2015. Genomic risk prediction of complex human disease and its clinical application. Curr Opin Genet Dev.
Barrett J, Lorenz R, Oreshkov O. 2019. Quantum causal models. arXiv:1906.10726.
Bausch J. 2012. On the efficient calculation of a linear combination of chi-square random variables with an application in counting string vacua. arXiv:1208.2691.
Boyle EA, Li YI, Pritchard JK. 2017. An expanded view of complex traits: from polygenic to omnigenic. Cell.
Costas J, Suárez-Rama JJ, Carrera N, Paz E, Páramo M, Agra S, Brenlla J, Ramos-Ríos R, Arrojo M. 2013. Role of DISC1 interacting proteins in schizophrenia risk from genome-wide analysis of missense SNPs. Ann Hum Genet.
Holland PW. 1986. Statistics and causal inference. Journal of the American Statistical Association 81:945–960.
Howell KR, Floyd K, Law AJ. 2017. PKBγ/AKT3 loss-of-function causes learning and memory deficits and deregulation of AKT/mTORC2 signaling: relevance for schizophrenia. PLoS One.
Misztak P, Pałczyszyn-Trzewik P, Sowa-Kućma M. 2018. Histone deacetylases (HDACs) as therapeutic target for depressive disorders. Pharmacol Rep 70(2):398–408.
Mooij J, Peters J, Janzing D, Zscheischler J, Schölkopf B. 2016. Distinguishing cause from effect using observational data: methods and benchmarks. Journal of Machine Learning Research 17(32):1–102.
Nowzohour C, Bühlmann P. 2016. Score-based causal learning in additive noise models. Statistics 50(3):471–485.
Ongen H, Brown AA, Delaneau O, Panousis NI, Nica AC, GTEx Consortium, Dermitzakis ET. 2017. Estimating the causal tissues for complex traits and diseases. Nat Genet 49:1676–1683.
Orho-Melander M. 2015. Genetics of coronary heart disease: towards causal mechanisms, novel drug targets and more personalized prevention. J Intern Med.
Peters J, Janzing D, Schölkopf B. 2011. Causal inference on discrete data using additive noise models. IEEE Trans Pattern Anal Mach Intell 33:2436–2450.
Peters J, Janzing D, Schölkopf B. 2017. Elements of causal inference: foundations and learning algorithms. The MIT Press, Boston.
Peters J, Mooij J, Janzing D, Schölkopf B. 2014. Causal discovery with continuous additive noise models. Journal of Machine Learning Research 15:2009–2053.
Ren RJ, Wang LL, Fang R, Liu LH, Wang Y, Tang HD, Deng YL, Xu W, Wang G, Chen SD. 2011. The MTHFD1L gene rs11754661 marker is associated with susceptibility to Alzheimer's disease in the Chinese Han population. J Neurol Sci.
Rosenbaum PR, Rubin DB. 1983. The central role of the propensity score in observational studies for causal effects. Biometrika 70(1):41–55.
Ross SM. 1985. Introduction to probability models. Third edition. Academic Press, London.
Schizophrenia Working Group of the Psychiatric Genomics Consortium. 2014. Biological insights from 108 schizophrenia-associated genetic loci. Nature.
Spirtes P, Glymour C, Scheines R. 2000. Constructing Bayesian networks models of gene expression networks from microarray data. In Proceedings of the Atlantic Symposium on Computational Biology.
Sullivan PF, Keefe RS, Lange LA, Lange EM, Stroup TS, Lieberman J, Maness PF. 2007. NCAM1 and neurocognition in schizophrenia. Biol Psychiatry 61(7):902–910.
Suzuki T, Iwata N, Kitamura Y, Kitajima T, Yamanouchi Y, Ikeda M, Nishiyama T, Kamatani N, Ozaki N. 2003. Association of a haplotype in the serotonin 5-HT4 receptor gene (HTR4) with Japanese schizophrenia. Am J Med Genet B Neuropsychiatr Genet.
Székely GJ, Rizzo ML, Bakirov NK. 2007. Measuring and testing dependence by correlation of distances. Ann Stat 35(6):2769–2794.
Székely GJ, Rizzo ML. 2009. Brownian distance covariance. Ann Appl Stat 3(4):1236–1265.
Zenil H, Kiani NA, Zea AA, Tegnér J. 2019. Causal deconvolution by algorithmic generative models. Nature Machine Intelligence.
Table 1. Genotype-by-phenotype table: counts of each genotype for Y = 0 and Y = 1, with row and column totals summing to n.

Table 2. Average type 1 error rates of the statistics for testing causal relationships between SNP and disease.

Nominal Level   n = 500   n = 1,000   n = 2,000   n = 5,000
0.05            0.044     0.046       0.048       0.051
0.01            0.005     0.006       0.007       0.009

Table 3. Type 1 error rates for the association test.

Nominal Level   n = 500   n = 1,000   n = 2,000   n = 5,000
0.05            0.05      0.05        0.049       0.049
0.01            0.01      0.01        0.01        0.01
Table 4. Average type 1 error rates of the statistics for testing causal relationships between SNP and disease in the presence of association.

Nominal Level   n = 500   n = 1,000   n = 2,000   n = 5,000
0.05            0.042     0.046       0.047       0.046
0.01            0.005     0.007       0.007       0.008

Table 5. P-values of the top 15 SNPs that had significant causal relationships with schizophrenia. Columns: RS number, chromosome, position, gene, related disease, causation P-value and association P-value (e.g., rs1324544 on chromosome 6, position 9181479).

Table 6. Ten-fold cross-validated accuracy and AUC for SCZ risk prediction using top causal SNPs and top associated SNPs.

Number of SNPs               7       8       9       10      11      12      13      14      15      245
Accuracy, causal SNPs        0.5511  0.5542  0.5542  0.5542  0.5542  0.5540  0.5534  0.5531  0.5521  0.5737
Accuracy, associated SNPs    0.5470  0.5457  0.5434  0.5423  0.5415  0.5410  0.5404  0.5401  0.5395  0.5430
AUC, associated SNPs         0.5204  0.5200  0.5191  0.5189  0.5178  0.5173  0.5168  0.5163  0.5158  0.5249

Table 7. Type I error rates of the causal test between SNP2 and disease.

Significance Level   n = 500   n = 1,000   n = 2,000   n = 5,000
0.05                 0.183     0.159       0.142       0.104
0.01                 0.105     0.118       0.105       0.093

Table 8. Power of the test for association between SNP2 and disease.

Significance Level   n = 500   n = 1,000   n = 2,000   n = 5,000
0.05                 0.918     0.979       0.992       0.994
0.01                 0.860     0.957       0.990       0.992

Table 9. P-values for causation and association tests of 20 neighboring SNPs of the causal SNP rs6578689 (chromosome 11). Columns: SNP, chromosome, causation and association P-values; neighbor SNP, position, r, causation and association P-values.

Figure 1. Several possible causal relationships between two observed variables X and Y: (a) association; (b) X causes Y; (c) Y causes X; (d) temperature change causes thermometer change.

Figure 2. Power curves of the ANMs for testing causation as a function of sample size, calculated under four scenarios: (1) p = 0.2, α = 0.05; (2) p = 0.2, α = 0.01; (3) p = 0.4, α = 0.05 and (4) p = 0.4, α = 0.01.

Figure 3.
Power curves of the ANMs for testing causation as a function of the parameter p of the binomial distribution, for sample sizes 500, 1,000, 5,000 and 10,000, assuming α = 0.05.

Figure 4. Power curves of the ANMs for testing causation as a function of the parameter p of the binomial distribution, for sample sizes 500, 1,000, 5,000 and 10,000, assuming α = 0.01.

Figure 5. A Manhattan plot of GWAS and GWCS.

Supplementary A. Counterfactuals for causal inference

This introduction focuses on how to use counterfactuals to investigate the causal effect. The average causal effect (ACE), or average treatment effect, is defined as

ACE = E[Y_{i1}] − E[Y_{i0}], (A1)

where E[·] is taken over the entire population (Elwert 2013). Since for each individual we can observe only one of Y_{i1} and Y_{i0}, we cannot estimate the ACE directly. The standard statistic to estimate the ACE is

τ = E[Y_{i1} | T_i = 1] − E[Y_{i0} | T_i = 0], (A2)

where the expectations are taken over the treatment group and the control group, respectively, not over the entire population. If the potential outcome is binary, then equations (A1) and (A2) can be rewritten as

ACE = P(Y_{i1} = 1) − P(Y_{i0} = 1) (A3)

and

τ = P(Y_{i1} = 1 | T_i = 1) − P(Y_{i0} = 1 | T_i = 0). (A4)

Since the quantity τ depends on the treatment assignment, it measures the association between the potential outcomes and the treatment assignment. Therefore, in general, the ACE is not equal to τ.
The sufficient conditions to make them equal are the following.

Condition I:

E[Y_{i1}] = E[Y_{i1} | T_i = 1] = E[Y_{i1} | T_i = 0], (A5)

or

P(Y_{i1} = 1) = P(Y_{i1} = 1 | T_i = 1) = P(Y_{i1} = 1 | T_i = 0), (A6)

which implies that the mean potential outcome (or the probability distribution) under treatment for those in the treatment group equals the mean potential outcome (or the probability distribution) under treatment for those in the control group.

Condition II:

E[Y_{i0}] = E[Y_{i0} | T_i = 1] = E[Y_{i0} | T_i = 0], (A7)

or

P(Y_{i0} = 1) = P(Y_{i0} = 1 | T_i = 1) = P(Y_{i0} = 1 | T_i = 0), (A8)

which implies that the mean potential outcome (or the probability distribution) under control for those in the treatment group equals that for those in the control group. Under Conditions I and II we obtain

τ = E[Y_{i1} | T_i = 1] − E[Y_{i0} | T_i = 0] = E[Y_{i1}] − E[Y_{i0}] = ACE, (A9)

or

τ = P(Y_{i1} = 1 | T_i = 1) − P(Y_{i0} = 1 | T_i = 0) = P(Y_{i1} = 1) − P(Y_{i0} = 1) = ACE. (A10)

In other words, Conditions I and II ensure that the association measure τ equals the average causal effect ACE. Conditions I and II assume that the average potential outcomes of people in the treatment group equal those of people in the control group; such equality of the association measure and the causal measure can be achieved by randomized treatment assignment. Randomized experiments, however, are often expensive, unethical or infeasible, so observational data must be used for causal inference. The assumption that ensures τ is an unbiased and consistent estimator of the ACE is the following ignorability condition:

(Y_{i1}, Y_{i0}) ⫫ T_i, (A11)

i.e., the potential outcomes must be jointly independent of treatment assignment.
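The gap between τ and the ACE when ignorability fails is easy to reproduce numerically. In the following sketch the covariate z, the outcome probabilities and the assignment rates are our own illustrative assumptions; because both potential outcomes are simulated, the ACE of equation (A3) and the naive contrast τ of equation (A4) can be computed directly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical confounder: a binary covariate z drives both treatment
# uptake and the potential outcomes, violating ignorability (A11).
z = rng.binomial(1, 0.5, n)
y1 = rng.binomial(1, np.where(z == 1, 0.9, 0.5))  # potential outcome under treatment
y0 = rng.binomial(1, np.where(z == 1, 0.6, 0.2))  # potential outcome under control
t = rng.binomial(1, np.where(z == 1, 0.8, 0.2))   # confounded treatment assignment

ace = y1.mean() - y0.mean()                    # ACE, equation (A3): about 0.30
tau = y1[t == 1].mean() - y0[t == 0].mean()    # naive contrast, equation (A4): about 0.54

# Randomized assignment restores ignorability, and tau then matches the ACE.
t_rand = rng.binomial(1, 0.5, n)
tau_rand = y1[t_rand == 1].mean() - y0[t_rand == 0].mean()
print(round(ace, 2), round(tau, 2), round(tau_rand, 2))
```

Here τ overstates the causal effect because treated individuals disproportionately have z = 1, which raises both potential outcomes; randomizing the assignment removes that dependence.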
In observational studies, the ignorability assumption is, in general, difficult to satisfy. Therefore, we make further assumptions and extend ignorability to conditional ignorability:

(Y_1, Y_0) ⫫ T | Z, (A12)

where Z is a set of variables. Conditional ignorability in equation (A12) assumes that the potential outcomes Y_1 and Y_0 are jointly independent of the treatment assignment conditional on the groups defined by the values of Z.

Supplementary B Unification of the SEMs, ICM, counterfactuals and do-calculus methods for causal inference

In this supplementary, we briefly show that the SEM, ICM, counterfactual and do-calculus methods for causal inference with two random variables can be unified. Suppose that both X and Y are binary variables and consider the structural equation model (4): X = f_x(ε_x), Y = f_y(X, ε_y). Taking the action X = 1 implies that the equation X = f_x(ε_x) is replaced by X = 1. Setting X = 1 affects neither the function f_y nor the distribution of the noise ε_y. The interventional distribution is given by

P(Y = 1 | do(X = 1)) = Σ_{ε_y} P(ε_y) I{f_y(1, ε_y) = 1}. (B1)

Note that the conditional observational distribution of Y given X is

P(Y = 1 | X = 1) = Σ_{ε_y} Σ_{ε_x} P(ε_x | X = 1) P(ε_y | ε_x) I{f_y(1, ε_y) = 1}. (B2)

By the assumption ε_x ⫫ ε_y, we have

P(ε_y | ε_x) = P(ε_y). (B3)

Substituting equation (B3) into equation (B2) yields

P(Y = 1 | X = 1) = Σ_{ε_y} P(ε_y) I{f_y(1, ε_y) = 1}. (B4)

Combining equations (B1) and (B4), we obtain

P(Y = 1 | do(X = 1)) = P(Y = 1 | X = 1), (B5)

which shows that the interventional distribution P(Y = 1 | do(X = 1)) is equal to the observational distribution P(Y = 1 | X = 1). Similarly, we can prove

P(Y = 1 | do(X = 0)) = P(Y = 1 | X = 0).
(B6)

The causal effect of X on Y under the structural equation model is defined as

ACE_S = P(Y = 1 | do(X = 1)) − P(Y = 1 | do(X = 0)). (B7)

Equations (B5) and (B6) show that under the structural equation model (4) the interventional distribution is equal to the observational distribution. Under the ignorability assumption (Y_1, Y_0) ⫫ X, we obtain

P(Y_1 = 1) = P(Y_1 = 1 | X = 1) = P(Y = 1 | X = 1), (B8)

P(Y_0 = 1) = P(Y_0 = 1 | X = 0) = P(Y = 1 | X = 0). (B9)

Combining equations (B5), (B6), (B8) and (B9), we obtain

P(Y = 1 | do(X = 1)) = P(Y_1 = 1), (B10)

P(Y = 1 | do(X = 0)) = P(Y_0 = 1). (B11)

It follows from equations (A3), (B7), (B10) and (B11) that

ACE_S = P(Y = 1 | do(X = 1)) − P(Y = 1 | do(X = 0)) = P(Y_1 = 1) − P(Y_0 = 1) = ACE. (B12)

Equation (B12) shows that the causal effect under the SEMs is equal to the average causal effect under the counterfactual model with the ignorability assumption.

Next we discuss the equivalence between the ICM and the SEMs. Take the SEMs (4) as the transformation

x = f_x(ε_x), y = f_y(x, ε_y).

We need to show that the ICM for x → y implies ε_x ⫫ ε_y. The Jacobian matrix of this transformation is lower triangular,

J = [∂x/∂ε_x, 0; ∂y/∂ε_x, ∂y/∂ε_y].

Then, by the transformation theorem (Ross 1985), we obtain

P(x, y) = P(ε_x, ε_y) |∂ε_x/∂x| |∂ε_y/∂y|, (B13)

P(x) = P(ε_x) |∂ε_x/∂x|, (B14)

which implies that

P(y | x) = P(ε_x, ε_y) |∂ε_x/∂x| |∂ε_y/∂y| / (P(ε_x) |∂ε_x/∂x|) = (P(ε_x, ε_y) / P(ε_x)) |∂ε_y/∂y|. (B15)

The ICM states that the distributions P(x) and P(y | x) are independent, i.e., that P(y | x) contains no information about P(x).
Therefore, P(ε_x, ε_y) must equal P(ε_x)P(ε_y), i.e., ε_x ⫫ ε_y. Otherwise, from equation (B15), we know that P(y | x) involves both ε_x and ε_y, which implies that the distribution P(y | x) contains information about P(x), and the ICM will not hold. This shows that the ICM for x → y implies the structural equation model (4).

Supplementary C Identifiability of the direction of discrete ANMs

Conditions for identifiability were summarized in Theorem 4 of Peters et al. (2011). They assumed that

Y = f_y(X) + N_y, X ⫫ N_y, P(X = x) ≠ 0 and P(N_y = n) ≠ 0 for all x, n,

and considered three cases:

(1) Both the forward function f_y and the backward function are bijective. If the ANM X → Y is reversible, then X and N_y are uniformly distributed.

(2) f_y is bijective. Suppose that f_y(x) = f_y(x̃). If the ANM X → Y is reversible, then

P(N_y = y − f_y(x)) / P(N_y = y − f_y(x̃)) = P(X = x̃) / P(X = x), for all y;

in many cases, P(X = x) = P(X = x̃).

(3) The backward function is bijective. Suppose that f_y(x) = f_y(x̃). If the ANM X → Y is reversible, then

P(X = x) / P(X = x̃) = P(N_y = y − f_y(x̃)) / P(N_y = y − f_y(x)), for all y;

in many cases, P(X = x) = P(X = x̃).

In our case, X is an indicator variable for genotypes and Y is a binary variable for disease status. Therefore, in general, none of the three cases can occur, reversibility is impossible, and hence the direction of the ANMs is identifiable.

Supplementary D Distance covariance and correlation between two random vectors

The distance covariance dCov(X, Y) between two random vectors X ∈ R^p and Y ∈ R^q with finite first moments is defined through the weighted L2 distance between the joint characteristic function and the product of the marginal characteristic functions (Székely et al. 2007):

V²(X, Y) = ||f_{X,Y}(t, s) − f_X(t) f_Y(s)||² = (1/(c_p c_q)) ∫_{R^{p+q}} |f_{X,Y}(t, s) − f_X(t) f_Y(s)|² / (|t|_p^{1+p} |s|_q^{1+q}) dt ds, (D1)

where c_p = π^{(1+p)/2} / Γ((1+p)/2) and c_q = π^{(1+q)/2} / Γ((1+q)/2). Similarly, the distance variance dVar(X) is defined as

V²(X) = V²(X, X) = (1/c_p²) ∫_{R^{2p}} |f_{X,X}(t, s) − f_X(t) f_X(s)|² / (|t|_p^{1+p} |s|_p^{1+p}) dt ds. (D2)

The squared distance correlation R²(X, Y) is defined as

R²(X, Y) = V²(X, Y) / √(V²(X) V²(Y)) if V²(X) V²(Y) > 0, and R²(X, Y) = 0 if V²(X) V²(Y) = 0. (D3)

The distance covariance and correlation can easily be estimated as follows (Székely et al. 2007). Assume that n pairs (X_k, Y_k), k = 1, ..., n, are sampled. Calculate the Euclidean distances

a_{kl} = |X_k − X_l|_p, b_{kl} = |Y_k − Y_l|_q, k, l = 1, ..., n.

Define the row, column and grand means

ā_{k·} = (1/n) Σ_{l=1}^n a_{kl}, ā_{·l} = (1/n) Σ_{k=1}^n a_{kl}, ā_{··} = (1/n²) Σ_{k=1}^n Σ_{l=1}^n a_{kl},

and similarly b̄_{k·}, b̄_{·l} and b̄_{··}. Define the two matrices A = (A_{kl})_{n×n} and B = (B_{kl})_{n×n}, where

A_{kl} = a_{kl} − ā_{k·} − ā_{·l} + ā_{··}, B_{kl} = b_{kl} − b̄_{k·} − b̄_{·l} + b̄_{··}, k, l = 1, ..., n.

Finally, the sample distance covariance V_n(X, Y), variance V_n(X) and correlation R_n(X, Y) are defined by

V_n²(X, Y) = (1/n²) Σ_{k=1}^n Σ_{l=1}^n A_{kl} B_{kl}, (D4)

V_n²(X) = V_n²(X, X) = (1/n²) Σ_{k=1}^n Σ_{l=1}^n A_{kl}², V_n²(Y) = (1/n²) Σ_{k=1}^n Σ_{l=1}^n B_{kl}²,

R_n²(X, Y) = V_n²(X, Y) / √(V_n²(X) V_n²(Y)) if V_n²(X) V_n²(Y) > 0, and R_n²(X, Y) = 0 otherwise, (D5)

respectively.

Supplementary E Theoretical Analysis of the Impact of LD on the Causal Effects

For convenience of presentation, we first consider the true linear model for a quantitative trait (Xiong 2018):

Y = μ + Xα + ε_y, X ⫫ ε_y, (E1)

where X is an indicator variable for the genotype at the true causal locus and the distribution of ε_y is not normal. Suppose that X_M is an indicator variable for the genotype at a marker locus with marker allele frequencies P_M and Q_M, and let D_M be the LD measure between the marker and the true causal locus. Then we have the following linear regression model for the marker locus:

Y = μ + X_M α_M + ε_{My}. (E2)

It can then be shown (Xiong 2018) that

α_M ≈ (D_M / (P_M Q_M)) α. (E3)

Equation (E3) implies that, in the presence of LD, the marker locus still shows some association, with approximate genetic additive effect (D_M / (P_M Q_M)) α. Now we investigate the impact of LD on causal inference.
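The attenuation in equation (E3) can be checked numerically. The sketch below is our own illustration, not the paper's data: haplotypes for a biallelic causal locus and a marker are simulated from hypothetical allele frequencies P_A, P_M and LD coefficient D_M (so P(AM) = P_A P_M + D_M), genotypes use additive 0/1/2 coding, and the least-squares slope of the trait on the marker genotype is compared with D_M α / (P_M Q_M).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical haplotype model: causal allele A (frequency PA), marker
# allele M (frequency PM), LD coefficient D, so P(AM) = PA*PM + D, etc.
PA, PM, D = 0.3, 0.4, 0.08
hap_probs = np.array([PA * PM + D,               # haplotype A M
                      PA * (1 - PM) - D,         # haplotype A m
                      (1 - PA) * PM - D,         # haplotype a M
                      (1 - PA) * (1 - PM) + D])  # haplotype a m
assert (hap_probs >= 0).all() and abs(hap_probs.sum() - 1) < 1e-12

# Each subject carries two independent haplotypes; additive genotype coding.
h1 = rng.choice(4, size=n, p=hap_probs)
h2 = rng.choice(4, size=n, p=hap_probs)
X  = (h1 < 2).astype(int) + (h2 < 2).astype(int)            # copies of A
Xm = (h1 % 2 == 0).astype(int) + (h2 % 2 == 0).astype(int)  # copies of M

alpha = 0.5
Y = 1.0 + alpha * X + rng.normal(size=n)   # true model (E1)

# Least-squares slope of Y on the marker genotype, i.e. model (E2).
alpha_m = np.cov(Xm, Y)[0, 1] / np.var(Xm)
print(alpha_m, D / (PM * (1 - PM)) * alpha)  # (E3): both approx 0.167
```

With additive coding, cov(X_M, X) = 2D_M and var(X_M) = 2P_M Q_M, so the fitted marker slope converges to D_M α / (P_M Q_M), matching equation (E3).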
Substituting equation (E1) into equation (E2), we obtain

ε_{My} = ε_y + Xα − X_M α_M. (E4)

Define

Δ = Xα − X_M α_M ≈ (X − (D_M / (P_M Q_M)) X_M) α. (E5)

When Δ ≠ 0, the distance covariance dCov(X_M, ε_{My}) satisfies

dCov(X_M, ε_{My}) = dCov(X + (X_M − X), ε_y + Δ) ≤ dCov(X, ε_y) + dCov(X_M − X, Δ) = dCov(X_M − X, Δ), (E6)

where X and ε_y are independent by the ICM. Independence of X_M and ε_{My} would imply Δ = 0 (Székely and Rizzo 2009), i.e.,

X = (D_M / (P_M Q_M)) X_M. (E7)

Equation (E7) would indicate a deterministic causal relation between X_M and X. However, in general, SNPs do not have such causal relationships. Therefore, dCov(X_M, ε_{My}) ≠ 0 and X_M and ε_{My} are not independent, which implies that X_M does not cause Y.

Now we calculate the causal measure. Let C_{X→Y} = 1 − R(X, ε_y) be the causal measure of the causal SNP X. Then the causal measure of the marker X_M is given by

C_{X_M→Y} = C_{X→Y} − R(X_M − X, X − (D_M / (P_M Q_M)) X_M). (E8)

Since R(X_M − X, X − (D_M / (P_M Q_M)) X_M) ≥ 0, equation (E8) implies

C_{X→Y} ≥ C_{X_M→Y}. (E9)

The causation measure C_{X_M→Y} depends on the distance correlation between X_M − X and X − (D_M / (P_M Q_M)) X_M. For a qualitative trait, we can use a logistic function as the nonlinear link. After some algebraic operations, we have the model

Y = e^{Xα} / (1 + e^{Xα}) + ε_y (E10)

or

Y = h(Xα) + ε_y, (E11)

where h(·) is a nonlinear function. Equation (E11) can be approximated by

Y = h(0) + h′(0) Xα + ε_y. (E12)

Thus, the model (E11) is reduced to the model (E1). Using the same arguments as for the model (E1), we can define the causality measure for the marker X_M:

C_{X_M→Y} = C_{X→Y} − R(X_M − X, X − h′(0) (D_M / (P_M Q_M)) X_M). (E13)

For the discrete ANMs, we cannot find h′(0), and the causal measure for the marker may simply be written as

C_{X_M→Y} = C_{X→Y} − R(X_M − X, X − γ (D_M / (P_M Q_M)) X_M), (E14)

where γ is an appropriate constant.

Table S1. P-values of causation and association of 214 SNPs showing significant causal relationships with schizophrenia.

SNP         Chr   Gene   Related Disease   P-value (Causation)   P-value (Association)
rs1324544   6

Table S3. Power to detect association between SNP1 and disease.

Significance level   n = 500   n = 1000   n = 2000   n = 5000
0.05                 0.999     1          1          0.999
0.01                 0.992     0.992      0.993      0.992
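The sample quantities in equations (D4) and (D5) are straightforward to compute directly from the double-centered distance matrices. The following minimal sketch is our own helper (the function name dcov_dcor and the toy data are ours, not from the paper), using only NumPy.

```python
import numpy as np

def dcov_dcor(X, Y):
    """Sample distance covariance V_n^2 and squared distance correlation
    R_n^2, equations (D4)-(D5), following Szekely et al. (2007).
    X, Y: arrays of shape (n,), (n, p) or (n, q)."""
    X = np.atleast_2d(X.T).T  # promote 1-D samples to (n, 1)
    Y = np.atleast_2d(Y.T).T

    def centered(M):
        # pairwise Euclidean distances a_kl, then double centering:
        # A_kl = a_kl - row mean - column mean + grand mean
        a = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=2)
        return a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()

    A, B = centered(X), centered(Y)
    V2_xy = (A * B).mean()                         # V_n^2(X, Y), (D4)
    V2_x, V2_y = (A * A).mean(), (B * B).mean()    # V_n^2(X), V_n^2(Y)
    R2 = V2_xy / np.sqrt(V2_x * V2_y) if V2_x * V2_y > 0 else 0.0  # (D5)
    return V2_xy, R2

rng = np.random.default_rng(3)
x = rng.normal(size=500)
_, r2_dep = dcov_dcor(x, x**2)                   # nonlinear dependence:
                                                 # R_n^2 well above 0
_, r2_ind = dcov_dcor(x, rng.normal(size=500))   # independence: R_n^2 near 0
print(r2_dep, r2_ind)
```

Unlike the Pearson correlation, which is near zero for the quadratic pair (x, x²), the distance correlation detects this nonlinear dependence, which is why the ANM tests in the paper rely on it.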
π
( ππ , ππ ππ ) be the causal measure of the causal SNP ππ . Then, the causal measure of the marker ππ ππ is given by πΆπΆ ππ ππ βππ = πΆπΆ ππβππ β π
π
( ππ ππ β ππ , ππ β π·π· ππ ππ ππ ππ ππ ππ ππ ) . (E8) β₯ π
π
( ππ ππ β ππ , ππ β π·π· ππ ππ ππ ππ ππ ππ ππ ) β₯ implies πΆπΆ ππβππ β₯ πΆπΆ ππ ππ βππ β₯ . (E9) Causation measure πΆπΆ ππ ππ βππ depends on the distance correlation between ππ ππ β ππ and ππ β π·π· ππ ππ ππ ππ ππ ππ ππ . For qualitative trait, we can use a logistic integer function as a nonlinear function. After some algebraic operations, we have the model: ππ = ππ ππππ ππππ + ππ ππ (E10) or ππ = ππ ( πππΌπΌ ) + ππ ππ , (E11) where ππ ( πππΌπΌ ) is a nonlinear function. Equation (E11) can be approximated by ππ = ππ (0) + ππ β² (0) πππΌπΌ + ππ ππ . (E12) Thus, the model (E11) is reduced to model equation (E1). Using the same arguments from the model equation (E1), we can define the causality measure for marker ππ ππ : πΆπΆ ππ ππ βππ = πΆπΆ ππβππ β π
π
( ππ ππ β ππ , ππ β ππ β² ( ) π·π· ππ ππ ππ ππ ππ ππ ππ ) . (E13) For the discrete ANMs, we cannot find ππ β² (0) , the causal measure for the marker may simply be written as πΆπΆ ππ ππ βππ = πΆπΆ ππβππ β π
π
( ππ ππ β ππ , ππ β πΎπΎπ·π· ππ ππ ππ ππ ππ ππ ππ ) , (E14) where πΎπΎ is a appropriate constant. able S1. P-values of causation and assication of 214 SNPs showing significant causal relationships with schizophrenia. SNPs Chr Gene Related Disease P-values Causation Association rs1324544 6 Table S3. Power to detect association between SNP1 and Disease. Sample Sizes 500 1000 2000 5000 0.05 0.999 1 1 0.999 0.01 0.992 0.992 0.993 0.992