Analysis of genetic differences between psychiatric disorders: Exploring pathways and cell-types/tissues involved and ability to differentiate the disorders by polygenic scores

Abstract

Although displaying genetic correlations, psychiatric disorders are clinically defined as categorical entities, as each has distinguishing clinical features and may require different treatments. Identifying differential genetic variations between these disorders may reveal how they differ biologically and help to guide more personalized treatment. Here we present a comprehensive analysis to identify genetic markers differentially associated with various psychiatric disorders/traits based on GWAS summary statistics, covering 18 psychiatric traits/disorders and 26 comparisons. We also conducted comprehensive analyses to unravel the genes, pathways and SNP functional categories involved, and the cell types and tissues implicated. We also assessed how well psychiatric disorders can be distinguished by polygenic risk scores (PRS). SNP-based heritabilities (h²SNP) were significantly larger than zero for most comparisons. Based on current GWAS data, PRS mostly have modest power to distinguish between psychiatric disorders. For example, we estimated that the AUCs for distinguishing schizophrenia from major depressive disorder (MDD), bipolar disorder (BPD) from MDD, and schizophrenia from BPD were 0.694, 0.602 and 0.618 respectively, while the maximum achievable AUCs (based on h²SNP) were 0.763, 0.749 and 0.726 respectively. We also uncovered differences within each pair of studied traits in their genetic correlations with comorbid traits. For example, clinically defined MDD appeared to be more strongly genetically correlated with other psychiatric disorders and heart disease than non-clinically-defined depression in the UK Biobank. Our findings highlight genetic differences between psychiatric disorders and the mechanisms involved. With larger GWAS samples, PRS may aid the differential diagnosis of selected psychiatric disorders in the future.


Shitao RAO*, Liangying YIN*, Yong XIANG, Hon-Cheong SO

School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong
KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research of Common Diseases, Kunming Institute of Zoology and The Chinese University of Hong Kong, China
CUHK Shenzhen Research Institute, Shenzhen, China
Department of Psychiatry, The Chinese University of Hong Kong, Hong Kong
Margaret K.L. Cheung Research Centre for Management of Parkinsonism, The Chinese University of Hong Kong, Shatin, Hong Kong
Brain and Mind Institute, The Chinese University of Hong Kong, Shatin, Hong Kong
Hong Kong Branch of the Chinese Academy of Sciences (CAS) Center for Excellence in Animal Evolution and Genetics, The Chinese University of Hong Kong, Shatin, Hong Kong

*These authors contributed equally to this work.
^Correspondence to: Hon-Cheong So, Lo Kwee-Seong Integrated Biomedical Sciences Building, The Chinese University of Hong Kong, Shatin, Hong Kong. Tel: +852 3943 9255; E-mail: [email protected]

Introduction

Psychiatric disorders are common; more than one-third of the population suffers from at least one such disorder during their lifetime (1). Psychiatric disorders also rank among the leading causes of disability-adjusted life years (DALYs) lost (2). Studies have revealed that psychiatric disorders generally have moderate to high heritability, likely contributed by a large number of genes, with common variants playing a substantial role (3, 4). Recent analyses based on genome-wide association studies (GWAS) have suggested moderate to high genetic correlations between many psychiatric disorders (5, 6). On the other hand, despite these strong genetic correlations, the disorders are clinically defined as distinct categorical entities, as each has distinguishing clinical symptoms and often requires different treatments (7). Identifying differential genetic variations between these disorders may shed light on how they differ biologically and help to guide more personalized treatment in the future. Another potential clinical application is that genetic markers may aid the differential diagnosis (DDx) of related disorders. For example, a patient presenting with a first episode of depression may in fact have bipolar disorder (BPD). The two diagnoses are often difficult to distinguish by clinical features alone at first presentation, yet their treatments are very different. If genetic information could help differentiate BPD from unipolar depression, more appropriate treatment could be given at an earlier stage of illness. Most genetic studies to date have focused on identifying shared loci or genetic overlap between psychiatric disorders (8). An effort to explore differences in genetic architecture between BPD and schizophrenia (SCZ) was reported in two recent studies (9, 10).
They first compared 9,252 BPD cases with 7,129 SCZ cases but did not find any SNP reaching genome-wide significance (9); however, polygenic risk score (PRS) analysis showed that the score differed significantly between SCZ and BPD patients, indicating that the differences between the two disorders have a genetic basis. More recently, they conducted an association analysis with a larger sample (23,585 SCZ cases and 15,270 BPD cases) and identified two genome-wide significant SNPs (10). However, these analyses require individual-level genotype data, which may be difficult to access due to privacy concerns. In addition, many of the largest GWAS were conducted as meta-analyses, for which typically only summary statistics are available. Here we present a comprehensive analysis to identify differential genetic markers, covering 18 psychiatric disorders/traits and 26 comparisons, based on GWAS summary statistics. The analytic framework was validated by simulation studies before application. Our results based on GWAS summary data showed an almost perfect genetic correlation with those obtained by directly comparing BPD and SCZ individual-level genotype data (10) ($r_g$ = 1.054, se = 0.025), suggesting that our approach is reliable and closely resembles individual-level analysis. Importantly, we also conducted in-depth analyses to reveal the genes and pathways involved, and which cell types and tissues are most relevant in differentiating the disorders. We also uncovered differences within each pair of disorders in how they are genetically related to different sets of comorbidities. Another novel contribution is that, by applying a recently developed methodology (11), we assessed how well two psychiatric disorders (e.g., major depressive disorder [MDD] vs BPD) could be distinguished using PRS from existing GWAS data, as well as the maximum discriminating ability achievable from all GWAS-panel variants.
This may become clinically relevant in the future, given the lack of biomarkers to aid the DDx of psychiatric disorders.

Methods

GWAS summary statistics (12-24)

We included a total of 10 psychiatric disorders in our analysis: MDD, post-traumatic stress disorder (PTSD), eating disorder (ED), schizophrenia (SCZ), bipolar disorder (BPD), autism spectrum disorder (ASD), attention deficit/hyperactivity disorder (ADHD), anxiety disorder, obsessive-compulsive disorder (OCD) and alcohol dependence. In addition, we included three depression-related phenotypes from the UK Biobank (UKBB) sample for comparison against MDD from the Psychiatric Genomics Consortium (PGC): longest period of feeling low/depressed; seen doctor (GP) for nerves, anxiety, tension or depression (to represent self-reported non-specific depression/low mood); and probable recurrent major depression (severe). The last of these was derived from several questions following Smith et al. (25). We also included ever used cannabis, insomnia, suicide attempts (SA), neuroticism and psychotic experience, as they are closely related to many psychiatric disorders. For MDD, which formed a major part of our comparisons, a recent GWAS meta-analysis was carried out on 135,458 cases and 344,901 controls (26). Excluding 23andMe data, the released GWAS summary statistics were generated from 59,851 cases and 113,154 controls, with a higher SNP-based heritability (7.8%, se = 0.5%). The majority of MDD cases in this sample (45,591 of 59,851, ~76.2%) were defined by clinical assessment or clinical records according to ICD/DSM criteria, although the UKBB sub-sample (14,260 of 59,851 cases) included some self-reported cases.
Although an updated study (27) included a larger sample, the majority of its cases (excluding 23andMe) had the broad depression phenotype from the UK Biobank (127,552 of 170,756 cases), and the sample showed a lower SNP-based heritability (6.0%, se = 0.3%) (27). A recent study showed that genetic studies of depression defined by minimal or 'broad' phenotyping may not be specific to MDD itself (28). Such studies might identify non-specific genetic factors linked to other psychiatric conditions, which would run counter to our aim of finding differential genetic markers between related disorders/traits.

Identification of differential genetic markers

For further quality control, SNPs with a low imputation quality score (INFO R < 0.6) were excluded from further analysis. In addition, indels and duplicated SNPs were removed. We then performed a harmonization step to keep the reference allele for signed test statistics consistent between each pair of GWAS datasets. The post-quality-control, harmonized summary statistics were then used to investigate differential genetic variants for 26 comparisons of psychiatric disorders/traits (Table 3) using the statistical method presented below.

We present an analytic approach capable of unravelling the genetic differences between a pair of disorders/traits that relies only on GWAS summary statistics and allows for overlapping study samples. In essence, we 'mimic' a case–control GWAS in which the cases are subjects affected with one disorder and the controls are subjects affected with the other disorder.

Suppose $T_1$ and $T_2$ are two binary traits under study, and let $S$ be a biallelic SNP coded as 0, 1 or 2. For simplicity, we first assume a prospective study of a population-based sample. Based on the principles of logistic regression, we have

$$
\begin{aligned}
\log\frac{P(T_1=1)}{P(T_1=0)} &= \log\frac{p_1}{1-p_1} = \beta_{01} + \beta_{11}S + \varepsilon_1\\
\log\frac{P(T_2=1)}{P(T_2=0)} &= \log\frac{p_2}{1-p_2} = \beta_{02} + \beta_{12}S + \varepsilon_2\\
\log\frac{P(T_1=1)}{P(T_2=1)} &= \log\frac{p_3}{1-p_3} = \beta_{03} + \beta_{13}S + \varepsilon_3
\end{aligned}
$$

where $p_1 = P(T_1=1)$ and $p_2 = P(T_2=1)$ denote the probabilities of the corresponding traits in the collected dataset, and $\varepsilon_i$ ($i = 1, 2, 3$) is the error term of the corresponding regression model. By the definition of the odds ratio (OR), for traits $T_1$ and $T_2$ we have

$$
\begin{aligned}
\mathrm{OR}(T_1\ \text{vs ctrl}) = e^{\beta_{11}} &= \frac{\Pr(T_1=1\mid S=s+1,\text{cov})\,/\,\Pr(T_1=0\mid S=s+1,\text{cov})}{\Pr(T_1=1\mid S=s,\text{cov})\,/\,\Pr(T_1=0\mid S=s,\text{cov})}\\
\mathrm{OR}(T_2\ \text{vs ctrl}) = e^{\beta_{12}} &= \frac{\Pr(T_2=1\mid S=s+1,\text{cov})\,/\,\Pr(T_2=0\mid S=s+1,\text{cov})}{\Pr(T_2=1\mid S=s,\text{cov})\,/\,\Pr(T_2=0\mid S=s,\text{cov})}
\end{aligned}
$$

where cov denotes the covariates. Suppose the controls of the two studies come from the same population. Then $\Pr(T_1=0\mid S=s+1,\text{cov})$ and $\Pr(T_1=0\mid S=s,\text{cov})$ are approximately equal to $\Pr(T_2=0\mid S=s+1,\text{cov})$ and $\Pr(T_2=0\mid S=s,\text{cov})$, respectively. Thus, the OR for the differential association between the two diseases is

$$
\begin{aligned}
\mathrm{OR}(T_1\ \text{vs}\ T_2) = e^{\beta_{13}} &= \frac{\Pr(T_1=1\mid S=s+1,\text{cov})\,/\,\Pr(T_2=1\mid S=s+1,\text{cov})}{\Pr(T_1=1\mid S=s,\text{cov})\,/\,\Pr(T_2=1\mid S=s,\text{cov})}\\
&\approx \left(\frac{\Pr(T_1=1\mid S=s+1,\text{cov})\,/\,\Pr(T_1=0\mid S=s+1,\text{cov})}{\Pr(T_1=1\mid S=s,\text{cov})\,/\,\Pr(T_1=0\mid S=s,\text{cov})}\right) \div \left(\frac{\Pr(T_2=1\mid S=s+1,\text{cov})\,/\,\Pr(T_2=0\mid S=s+1,\text{cov})}{\Pr(T_2=1\mid S=s,\text{cov})\,/\,\Pr(T_2=0\mid S=s,\text{cov})}\right)\\
&= e^{\beta_{11}-\beta_{12}}
\end{aligned}
$$

In other words, the effect size of the differential association (i.e., trait 1 as cases and trait 2 as controls) is the difference between the effect sizes for the respective traits, $\beta_{13} = \beta_{11} - \beta_{12}$. The variance of $\beta_{13}$ is

$$
\mathrm{Var}(\beta_{13}) = \mathrm{Var}(\beta_{11}-\beta_{12}) = \mathrm{Var}(\beta_{11}) + \mathrm{Var}(\beta_{12}) - 2\,\mathrm{Cov}(\beta_{11},\beta_{12})
$$

$\mathrm{Cov}(\beta_{11},\beta_{12})$ depends on the actual overlap between the samples and the correlation between the two phenotypes. It can be obtained by multiplying the SEs of the two coefficients by the intercept from cross-trait LD score regression (see equation 6 in (29)). Note that the above derivations only require the SNP regression coefficients (β), which are the same under a prospective (population-based) design and a retrospective (case–control) design in which cases may be over- or under-sampled (30). Two of the studied traits (neuroticism and longest period of feeling depressed/low) were continuous. For consistency with the other comparisons, which all involve binary traits/disorders, we considered the summary statistics of a corresponding case–control study in which subjects in the top 20% of the trait distribution were regarded as 'cases'.
The method for deriving binary-trait summary statistics was described in previous work (31). After computing the differential genetic associations, we performed genomic control to further protect against population stratification, with the genomic inflation factor based on the LD score regression result (32). To further check the validity of our approach, we also computed the genetic correlation between the BPD vs SCZ results from our analytic method and those obtained by directly comparing the two disorders using individual-level genotype data, as reported in ref (10).
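For a single SNP, the computation described above can be sketched in Python: the differential effect size, its SE with the covariance term taken as the cross-trait LDSC intercept times the two SEs, and genomic control. This is a minimal illustration under stated assumptions, not the authors' code. The function name and arguments are hypothetical, the effect sizes are assumed to be log-ORs already harmonized to the same effect allele, and the inflation factor `gc_lambda` is treated as a given input (for example, one estimated from LD score regression).

```python
import math


def differential_association(beta1, se1, beta2, se2, ldsc_intercept=0.0, gc_lambda=1.0):
    """Hypothetical helper: trait 1 ('cases') vs trait 2 ('controls') for one SNP.

    beta1, beta2   : per-allele log-odds ratios from the two GWAS, harmonized
                     to the same effect allele (assumption).
    se1, se2       : their standard errors.
    ldsc_intercept : cross-trait LD score regression intercept; 0 means the
                     two estimates are treated as uncorrelated.
    gc_lambda      : genomic inflation factor for genomic control (assumed input).
    """
    # beta_13 = beta_11 - beta_12
    beta_diff = beta1 - beta2
    # Cov(beta_11, beta_12) approximated as intercept * SE1 * SE2
    cov = ldsc_intercept * se1 * se2
    var_diff = se1 ** 2 + se2 ** 2 - 2.0 * cov
    if var_diff <= 0:
        raise ValueError("non-positive variance: check the SEs and the intercept")
    se_diff = math.sqrt(var_diff)  # returned as the SE before genomic control

    # Genomic control: deflate the 1-df chi-square by lambda (never inflate it).
    chi2 = (beta_diff / se_diff) ** 2 / max(gc_lambda, 1.0)
    z = math.copysign(math.sqrt(chi2), beta_diff)
    # Two-sided normal p-value; erfc keeps precision for very small p-values.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta_diff, se_diff, z, p


# Illustrative numbers only (not taken from any real GWAS).
b, se, z, p = differential_association(0.05, 0.01, 0.01, 0.012, ldsc_intercept=0.1)
print(f"beta_diff={b:.3f}  se={se:.4f}  z={z:.2f}  p={p:.2e}")
```

A positive intercept (shared controls or correlated phenotypes) shrinks the SE of the difference, which is why ignoring sample overlap would make the test conservative in this setting.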

Functional annotations of identified differential genetic markers

The differential genetic variants identified were further explored for their biological functions using FUMA (https://fuma.ctglab.nl/) (33). Following the FUMA definitions, independent significant SNPs were defined as those with p < 5e-8 that are independent of each other at the default r² threshold (r² = 0.6). To define genomic risk loci, independent significant SNPs correlated with each other at r² ≥ 0.1 were assigned to the same risk locus, and independent significant SNPs lying within 250 kb of each other were also merged into one genomic risk locus. All candidate SNPs in the defined risk loci that are in LD (r² ≥ […] depletion (CADD) scores (34), chromatin states (35, 36), ANNOVAR categories (37) and RegulomeDB scores (38).

Gene mapping

SNPs were mapped to genes in FUMA using three strategies: positional mapping, expression quantitative trait loci (eQTL) mapping, and chromatin interaction (CI) mapping. In brief, the positional method maps variants to genes based on their physical position, while the eQTL strategy maps SNPs to genes with which they have a significant (FDR < 0.05) eQTL association. The third strategy (CI) maps SNPs to genes based on three-dimensional (3D) DNA–DNA interactions between the SNP and gene regions.
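The FDR < 0.05 filter used for eQTL mapping, and the FDR < 0.05 significance rule used in the gene-based analyses, are commonly implemented with the Benjamini–Hochberg (BH) procedure. The sketch below assumes BH, a hypothetical list of `(snp, gene, p_value)` eQTL records and a single multiple-testing family. It illustrates only the filtering logic and does not reproduce how FUMA handles each eQTL resource internally.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def map_genes_by_eqtl(eqtl_records, fdr=0.05):
    """Keep SNP-gene pairs whose eQTL association passes the FDR threshold.

    eqtl_records: list of (snp_id, gene, p_value) tuples (hypothetical format).
    """
    qvalues = benjamini_hochberg([p for _, _, p in eqtl_records])
    return sorted({(snp, gene)
                   for (snp, gene, _), q in zip(eqtl_records, qvalues)
                   if q < fdr})


records = [("rs0001", "GENE_A", 1e-6), ("rs0002", "GENE_B", 0.5), ("rs0003", "GENE_A", 0.2)]
print(map_genes_by_eqtl(records))  # [('rs0001', 'GENE_A')]
```

The SNP identifiers and gene names above are placeholders; in practice the records would come from an eQTL resource restricted to SNPs within the defined risk loci.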

Genome-wide gene-based association study (GWGAS) and tissue/cell-type enrichment analysis

P-values from the SNP-based analysis were used for GWGAS in MAGMA (39), which aggregates the statistical significance of SNPs within a gene into a gene-based statistic. Multiple testing was corrected using the false discovery rate (FDR) approach; in the gene-based and subsequent analyses, results with FDR < 0.05 were considered significant. The biological functions of GWGAS-significant genes were further investigated via tissue and cell-type expression enrichment analyses using MAGMA (39) and LD Score regression (LDSC) (32). In the tissue enrichment analysis, MAGMA was used to test for over-representation of differentially expressed genes (DEGs) in each of 53 GTEx tissues. As brain regions were predominantly enriched in this analysis, we focused subsequent analyses on the brain. We next conducted an enrichment analysis within 13 brain regions using LDSC, based on GTEx data. This is a 'competitive' analysis restricted to the brain, aiming to reveal enrichment within specific brain regions relative to other regions. We then included all single-cell expression datasets from human brain regions available in FUMA [lateral geniculate nucleus (LGN), middle temporal gyrus (MTG), hippocampus, cortex, prefrontal cortex, midbrain and temporal cortex] in an enrichment analysis to explore the specific types of contributing cells/neurons. A two-step workflow was implemented: the first step identified significantly enriched cell types, which were carried forward to the second step to determine independent signals within a dataset by stepwise conditional analysis (see also https://fuma.ctglab.nl/tutorial). We also conducted pathway and gene-set enrichment analyses to explore whether the significantly associated genes were enriched in predefined biological pathways or gene ontology (GO) sets from the ConsensusPathDB database (CPDB, human) (http://consensuspathdb.org/) (40).

SNP-based heritability and genetic correlation with related traits

SNP-based heritability ($h^2_{snp}$) of the differential genetic associations was estimated by LDSC and SumHer (41). LDSC, the most widely used approach for estimating $h^2_{snp}$, was employed as the primary method; SumHer, which allows more realistic heritability models, was used in additional analyses. We also conducted a partitioned heritability analysis to identify which functional categories of genetic variants (e.g., coding, promoter, histone marks, enhancers) contribute most to the differentiation of the psychiatric disorders/traits (42). In addition to shedding light on the genetic architecture and the relative importance of different functional categories, the heritability explained is connected to the predictive power of genetic variants (43). We therefore also estimated the maximum 'predictive ability' (here, the ability to differentiate the disorders) that can be achieved if all variants on the GWAS panel are accounted for. Liability-scale $h^2_{snp}$ was calculated using LDSC, with sample and population prevalences as input. We estimated the 'sample prevalence' from the effective numbers of cases (44) in the two datasets. We then followed the methodology described in (43) to compute various predictive indices and graphs. Briefly, we computed the area under the ROC curve (AUC), the proportion of cases explained by those in the top k% of predicted risk, the variance of predicted risk and the absolute risk at different percentiles. The graphs included the ROC curve, the predictiveness curve, and the probability density and cumulative distribution functions of predicted risk. The analysis of differentiating ability was performed for comparisons of selected psychiatric disorders (SCZ, BPD, ED, ASD, ADHD, anxiety disorders, PTSD, OCD) and clinical symptoms (psychotic experience) for which DDx is considered more clinically relevant.
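Under the liability threshold model, the conversion to the liability scale and the link between heritability and the maximum achievable AUC can be sketched as follows. This is an illustration under assumptions, not necessarily the exact procedure of (43). The observed-to-liability transformation shown is the widely used one of Lee et al. (2011), and the AUC formula is that of Wray et al. (2010). The function names are hypothetical, and how the 'population prevalence' `K` is defined for a disorder-vs-disorder comparison is left as an input choice.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: cdf (Phi), inv_cdf (Phi^-1), pdf (phi)


def observed_to_liability(h2_obs, K, P):
    """Observed-scale h2 -> liability scale (Lee et al. 2011 transformation).

    K: population prevalence; P: proportion of cases in the sample.
    """
    t = _STD_NORMAL.inv_cdf(1.0 - K)  # liability threshold
    z = _STD_NORMAL.pdf(t)            # normal density at the threshold
    return h2_obs * (K * (1.0 - K)) ** 2 / (z ** 2 * P * (1.0 - P))


def max_auc(h2_liab, K):
    """AUC attainable by a predictor explaining h2_liab of liability variance.

    Wray et al. (2010) formula, algebraically simplified so that h2 = 0
    gives AUC = 0.5 without a 0/0 division.
    """
    t = _STD_NORMAL.inv_cdf(1.0 - K)
    i = _STD_NORMAL.pdf(t) / K   # mean liability of cases
    v = -i * K / (1.0 - K)       # mean liability of controls
    num = (i - v) * math.sqrt(h2_liab)
    den = math.sqrt(2.0 - h2_liab * (i * (i - t) + v * (v - t)))
    return _STD_NORMAL.cdf(num / den)


# Illustrative inputs only.
h2_l = observed_to_liability(h2_obs=0.10, K=0.3, P=0.5)
print(f"liability h2 = {h2_l:.3f}, maximum AUC = {max_auc(h2_l, K=0.3):.3f}")
```

Only the standard library is used (`statistics.NormalDist`), so the sketch runs without SciPy.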
Genetic correlations ($r_g$) between the differential genetic variations and 42 potentially related phenotypes were calculated using LDSC (http://ldsc.broadinstitute.org/centers/) (45). Broadly, $r_g$ reflects how strongly the non-shared or unique genetic component of the first disorder, relative to the second disorder in the pair, is genetically correlated with a given trait. The rationale of this analysis is that when we regard two disorders as different, it is important to examine whether they are associated with different comorbid disorders/traits. This distinction may help to explain differences in the prognosis or aetiology of different disorders; for example, SCZ is generally associated with more prominent cognitive deficits than BPD, and this analysis may help to highlight such differences. The 42 sets of GWAS summary statistics were obtained from LD Hub (45) and grouped into nine trait categories: neurological diseases, personality traits, sleep, cognition, education, brain volume, psychiatric disorders, cardiometabolic traits and aging.

Ability of polygenic risk scores (PRS) from existing GWAS data to differentiate disorders

For selected traits for which DDx is more clinically relevant, we performed another analysis to evaluate the ability of PRS from existing GWAS data to distinguish psychiatric disorders. The PRS was based on a case–control study of the corresponding disorders (disorder A as 'cases' and disorder B as 'controls'). Note that, unlike the analysis above, here we do not focus on the maximum predictive power achievable from all common variants. An empirical Bayes approach has been proposed to recover the underlying effect sizes and forecast the predictive ability of PRS based on summary statistics alone (11); the method has been verified in simulations and real-data applications (11). Eighteen subsets of genetic variants based on a series of P-value thresholds (1 × 10⁻⁵, […] 10⁻⁴, […] 10⁻⁴, […] 10⁻³, […] 10⁻³, […]

Simulation

To verify the validity of our proposed method in uncovering genetic differences between related disorders, we simulated different sets of genotype–phenotype data assuming 300 SNPs ($N_{snp} = 300$; coded as 0, 1 and 2) and two disorders. Since the proposed framework is essentially a SNP-based analysis, the number of simulated SNPs does not affect the validity of the simulation. The allele frequency of each simulated SNP was drawn from a uniform distribution on [0.05, 0.95]. The expected number of subjects with each disorder ($n_{cases}$) was set to 10,000, 20,000, 50,000 or 100,000, with a disease prevalence ($K$) of 10%. Here, $n_{cases}$ denotes the expected number of cases in the whole simulated population cohort; given the disease prevalence, the size of the whole cohort is $n_{total} = n_{cases}/K$. The total SNP-based heritability ($h^2_{snp}$) of each trait was set at 0.2 to 0.4, distributed among all SNPs. Specifically, we simulated standard normal variables $z_i \sim N(0,1)$ and set the mean effect size $\mu = \sqrt{h^2_{snp}/N_{snp}}$; the actual effect size of SNP $i$ was $\beta_i = \mu z_i$. The total liability $y$ equals the sum of the effects of all SNPs plus a residual $e$, i.e. $y = \sum_i \beta_i x_i + e$, with the total variance of $y$ set to one. Following the liability threshold model, subjects whose total liability exceeds the threshold $\Phi^{-1}(1-K)$, where $K$ is the disease prevalence, are regarded as having the trait/disease. The non-shared genetic covariance between the two traits was set to 0.1. From the simulated population cohort, we then drew two case–control studies with traits A and B as the respective outcomes.

Suppose the numbers of cases for traits A and B in the simulated population cohort are $N_A$ and $N_B$ respectively, and let $N = \max(N_A, N_B)$. For trait A, we picked $N_A$ cases and $2N - N_A$ controls from the population; for trait B, we picked $N_B$ cases and $2N - N_B$ controls. For comparison, we also simulated a "real" GWAS comparing the two disorders directly: all subjects who had trait A but not trait B were selected as cases ($N_{A\_only}$), and all subjects who had trait B but not trait A were selected as controls ($N_{B\_only}$). To demonstrate the validity of our method, we also simulated case–control samples with different overlap rates ($P$) for both traits. Here, $P$ is the ratio of overlapping samples to all samples picked for each case–control study, i.e. $P = N_{ctrl.overlap}/(2N)$. The overlap rate was adjusted by varying the number of controls shared between the two traits, since in practice overlap more often occurs among controls.
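The liability-threshold simulation and the case–control sampling described above can be sketched with NumPy. This is a simplified illustration, not the authors' simulation code, and it rests on several assumptions. Genotypes are standardized so that the SNP effects explain roughly $h^2$ of liability variance, because raw 0/1/2 coding would scale each SNP's contribution by $2p(1-p)$. The two traits' effects are drawn with an assumed correlation `rg` rather than the exact non-shared covariance setting above. SNPs are independent, and the overlap-rate adjustment is omitted. All function and parameter names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def simulate_cohort(n_cases=2000, K=0.1, n_snp=300, h2=(0.3, 0.3), rg=0.5, seed=1):
    """Simulate a population cohort with two liability-threshold traits."""
    rng = np.random.default_rng(seed)
    n_total = int(round(n_cases / K))                      # n_total = n_cases / K
    maf = rng.uniform(0.05, 0.95, size=n_snp)              # allele frequencies
    geno = rng.binomial(2, maf, size=(n_total, n_snp))     # coded 0, 1, 2
    x = (geno - 2 * maf) / np.sqrt(2 * maf * (1 - maf))    # standardized (assumption)

    h2 = np.asarray(h2, dtype=float)
    # Correlated standard normal z_i for the two traits (rg is an assumed parameter).
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, rg], [rg, 1.0]], size=n_snp)
    beta = z * np.sqrt(h2 / n_snp)                         # beta_i = mu * z_i
    e = rng.normal(0.0, np.sqrt(1.0 - h2), size=(n_total, 2))  # residual liability
    y = x @ beta + e                                       # total liability, Var(y) ~ 1
    threshold = NormalDist().inv_cdf(1.0 - K)              # a fraction K exceeds Phi^-1(1 - K)
    return beta, y > threshold                             # effects, case status for (A, B)


def pick_case_control(status, trait, n_max, rng):
    """All cases of one trait plus 2N - N_trait controls drawn from its non-cases."""
    cases = np.flatnonzero(status[:, trait])
    non_cases = np.flatnonzero(~status[:, trait])
    controls = rng.choice(non_cases, size=2 * n_max - cases.size, replace=False)
    return cases, controls


beta, status = simulate_cohort()
n_max = int(status.sum(axis=0).max())                      # N = max(N_A, N_B)
rng = np.random.default_rng(2)
study_a = pick_case_control(status, 0, n_max, rng)
study_b = pick_case_control(status, 1, n_max, rng)
# "Real" comparison GWAS: A-only subjects as cases, B-only subjects as controls.
a_only = np.flatnonzero(status[:, 0] & ~status[:, 1])
b_only = np.flatnonzero(status[:, 1] & ~status[:, 0])
print(status.mean(axis=0), len(a_only), len(b_only))
```

With $n_{cases}$ = 2,000 and $K$ = 0.1 the cohort has 20,000 subjects, which keeps the example fast. The study sizes in the text (10,000 to 100,000 cases) would need correspondingly more memory.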

Results

All supplementary tables are available at https://drive.google.com/open?id=1qrpDV6GhobffSwOtRsmAkPY_CihIpHuA

Simulation results

Table 1 shows our simulation results (see also Table S0). The correlations between the estimated and actual coefficients for the GWAS analysis were very high across different case sample sizes. As shown in Table 1, the correlation and RMSE improved with increasing sample size and overlap rate. Since the sample sizes of current GWAS summary data are usually larger than 10,000, our proposed method should approximate the coefficients of the corresponding comparisons sufficiently well from GWAS summary data. As expected, power increased with larger numbers of cases and higher heritability explained by SNPs. In addition, no inflation of type I error was observed at a p-value threshold of 0.05.

Identification of genetic variants differentiating the psychiatric disorders/traits

For the 18 sets of included GWAS summary statistics (Table 2), we applied the proposed method to identify differential genetic variants for a total of 26 pairwise comparisons (Table 3). In general, we selected for comparison traits that are similar in nature or commonly comorbid. The SNP-based heritabilities are presented in Table 3.

These comparisons may be divided into five groups: MDD vs other psychiatric disorders/traits, MDD vs depression-related traits, neuroticism vs psychiatric disorders, psychotic experiences vs three psychiatric disorders, and others (Table 3). Altogether, we identified 11,410 significant differential genetic variants (P < 5e-08), and these variants in each comparison could form up to 1,398 genomic risk loci based on linkage disequilibrium blocks (Table 3).

MDD against psychiatric disorders/outcomes

In this part, we compared MDD with 12 different psychiatric disorders/outcomes: SCZ, BPD, ED, ASD, ADHD, anxiety disorder, insomnia, alcohol dependence, ever used cannabis, SA, PTSD and OCD (Table 3). In total, 69 genomic risk loci were identified from these 12 comparisons (Table 3). Please refer to Tables S1 to S12 for detailed results.
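Loci counts like these come from collapsing nearby genome-wide significant variants into a single locus. The sketch below illustrates that idea with a simplified stand-in, not the procedure used in the study: the study defined loci from LD blocks, whereas this sketch merges significant SNPs on the same chromosome that lie within a hypothetical 250 kb window of each other. The `Snp` class and `group_into_loci` function are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    chrom: int
    pos: int   # base-pair position
    p: float   # p-value from the differential association test


def group_into_loci(snps, p_threshold=5e-8, max_gap=250_000):
    """Hypothetical distance-based proxy for genomic risk loci.

    Significant SNPs on the same chromosome within max_gap bp of the previous
    significant SNP are merged into one locus. Real pipelines use LD instead."""
    significant = sorted(
        (s for s in snps if s.p < p_threshold), key=lambda s: (s.chrom, s.pos))
    loci = []
    for s in significant:
        last = loci[-1] if loci else None
        if last is not None and last["chrom"] == s.chrom and s.pos - last["end"] <= max_gap:
            last["end"] = s.pos          # extend the current locus
            last["n_snps"] += 1
        else:                            # start a new locus
            loci.append({"chrom": s.chrom, "start": s.pos, "end": s.pos, "n_snps": 1})
    return loci


if __name__ == "__main__":
    toy = [Snp(1, 100_000, 1e-9), Snp(1, 200_000, 1e-10),
           Snp(1, 900_000, 1e-12), Snp(2, 100_000, 3e-8), Snp(2, 150_000, 0.01)]
    print(len(group_into_loci(toy)))  # 3
```

A distance rule is used here only because it needs no reference LD panel; with LD-based blocks the number of loci can differ.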

MDD against SCZ

Among the 12 comparisons, MDD vs SCZ generated the largest number of genome-wide significant SNPs (2,312 SNPs; Table 3; sub-table 1 of Supplementary Table 1 (Table S1.1)), which belong to 37 genomic risk loci (Table S1.2). Although most of the candidate variants were located in intergenic and intronic regions (81% of variants; Table S1.3), 65 SNPs were located in exons, including 32 nonsynonymous variants. Heritability enrichment analyses across 53 functional annotation categories indicated that SNP heritability was enriched not only in intronic and conserved regions [Table S1.4; P < 1.28E-04], but also in coding regions and around the transcription start site (TSS) (P = 1.78E-04). In addition, regulatory categories such as methylation and acetylation marks were enriched [P < 1.57E-04]. The three gene-mapping strategies (positional, eQTL and chromatin interaction (CI) mapping) yielded 524 unique genes, 94 of which were implicated by all three methods (Table S1.5). In addition, GWGAS identified 953 significant genes (Table 3; Table S1.6). Taken together, 64 genes were implicated by all four strategies. Among them, CACNA1C was predicted to be highly intolerant to loss-of-function mutations (pLI = 1; Table S1.5). Genes differentiating MDD and SCZ were mainly enriched in the cortex, the anterior cingulate cortex (BA24) and the frontal cortex (BA9) (Table S1.8; FDR < 6.0E-04). Cell-type enrichment analysis suggested strong associations with several kinds of neurons in the cortex and prefrontal cortex, as well as with neurons in the midbrain, hippocampus and lateral geniculate nucleus (LGN) (Table S1.9). Conditional analyses suggested that cortical neurons, GABAergic neurons in the midbrain and pyramidal neurons in the hippocampus contribute independently (after controlling for other cell types) (Table S1.10).
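Counting genes "implicated by all three" or "all four" strategies amounts to set unions and intersections. A minimal sketch, assuming each strategy's output is available as a set of gene symbols; the gene sets below (GENE_A to GENE_D) are hypothetical placeholders, not results from Tables S1.5 or S1.6, and `combine_gene_sets` is an illustrative name.

```python
def combine_gene_sets(sets_by_method):
    """Return (union, genes implicated by every method) for a dict that maps
    a method name to a set of gene symbols."""
    sets = [set(genes) for genes in sets_by_method.values()]
    union = set().union(*sets)
    shared = set.intersection(*sets)
    return union, shared


if __name__ == "__main__":
    # Hypothetical toy gene sets; GENE_A..GENE_D are placeholders
    mapped = {
        "positional": {"CACNA1C", "GENE_A", "GENE_B"},
        "eqtl": {"CACNA1C", "GENE_A"},
        "ci": {"CACNA1C", "GENE_B", "GENE_C"},
    }
    union, by_all_three = combine_gene_sets(mapped)
    by_all_four = combine_gene_sets({**mapped, "gwgas": {"CACNA1C", "GENE_D"}})[1]
    print(len(union), sorted(by_all_three), sorted(by_all_four))
    # 4 ['CACNA1C'] ['CACNA1C']
```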
In gene-set enrichment analysis (GEA), the 953 GWGAS-significant genes were enriched in a number of biological GO sets, including generation of neurons, regulation of nervous system development and central nervous system neuron differentiation [Table S1.11; FDR < 5.88E-03]. Other enriched pathways included neuronal system, alcoholism and the brain-derived neurotrophic factor (BDNF) signalling pathway [Table S1.12; FDR < 1.86E-02]. In the genetic correlation analysis, SCZ was defined as the ‘case’ and MDD as the (pseudo-)‘control’. A positive genetic correlation therefore indicates that the ‘case’ disorder is more positively genetically associated with the studied trait than the (pseudo-)‘control’ disorder, and vice versa. For example, we observed inverse genetic correlations (rg) with insomnia, neuroticism, coronary artery disease (CAD) and mean hippocampal volume, among others, suggesting that MDD has stronger positive genetic correlations with these traits/disorders than SCZ. Findings of this type may not only shed light on different patterns of comorbidity but also be clinically informative. For instance, the significant inverse rg with CAD suggests that, compared with SCZ patients, MDD patients may be more genetically predisposed to CAD.

MDD against BPD, ED, ASD, ADHD, anxiety disorder, insomnia, alcohol dependence and cannabis use

In these 8 comparisons, we identified 32 differential genomic loci (Table 3; detailed in Tables S2-S9). The comparison between MDD and BPD revealed the largest number of significant genes in GWGAS (174 genes; Table 3; Table S2.6). Here we briefly highlight the comparisons of MDD with BPD, ADHD and anxiety disorders, which yielded the largest numbers of significant genes in GWGAS. In the comparison between MDD and BPD, we found 4 significant risk loci; the strongest signal, rs17751061, had a very high predicted deleteriousness (CADD = 35; Table S2.2) and likely influences the function of SUGP1. Another nonsynonymous variant, rs17420378, was located in exon 8 of STK4 (risk locus no. 4) and also had high predicted pathogenicity (CADD = 22.7; Table S2.2). Moreover, STK4 was implicated by all four gene-mapping strategies (Tables S2.5 and S2.6). GEA showed that the 174 GWGAS-significant genes were enriched in 17 gene ontology (GO) sets [Table S2.10; P < 0.01], such as transition metal ion binding and pre-mRNA binding. Pathway enrichment analysis indicated that differential genetic associations between MDD and BPD involved neural cell adhesion molecule (NCAM) signalling for neurite outgrowth, amphetamine addiction, and serotonergic and glutamatergic synapses [Table S2.11; FDR = 4.88E-02 for all four pathways], among other pathways. The differential variants were enriched in brain regions including the cerebellum, cortex and frontal cortex (BA9) [Table S2.7; FDR < 4.03E-02]. Enrichment analysis within brain regions suggested that the cortex and frontal cortex were the most enriched [Table S2.8; FDR < 5.31E-04]. The most significantly enriched cell type was GABAergic neurons from the LGN. Interestingly, the differential genetic component (BPD vs MDD) showed positive genetic correlations with childhood IQ and level of education [Table S2.12; FDR < 3.26E-02], but negative correlations with insomnia and CAD [FDR < 4.92E-03].

In the comparison between MDD and anxiety disorders, five genomic loci were found (Table 3), one of which lay in the extended MHC (xMHC) region (46). The top mapped genes comprised a set of genes in the xMHC region and 3 genes on other chromosomes (LRFN5, PTPN1, FAM65C) (Table 4). Owing to the complex LD structure and high gene density of the xMHC region, it may be relatively difficult to identify the true causal gene/variant. GWGAS revealed 106 significant genes (Table S6.6). The most enriched tissues contributing to the differential associations included the nucleus accumbens, frontal cortex and cerebellar hemisphere (Table S6.7); the most enriched cell types included GABAergic neurons from the hippocampus, midbrain and temporal cortex, among others (Table S6.9).
In the comparison of MDD with ADHD, 167 significant differential genetic variants were identified, forming 5 genomic loci (Tables S5.1 and S5.2). Four genes were mapped by all three gene-mapping strategies, namely KDM4A, SLC6A9, TMEM161B and CDH8 (Tables S5.5 and S5.6). In total, 82 genes were significant in the gene-based test, and pathway and gene-set enrichment analyses highlighted pathways such as those related to DNA methylation (Tables S5.10 and S5.11). The most enriched cell types included GABAergic neurons in the midbrain and prefrontal cortex, as well as dopaminergic neurons in the midbrain (Table S5.9).

MDD against depression-related traits

In this section, we sought to identify differential genetic variants in three comparisons between MDD and depression-related phenotypes (probable recurrent severe depression, seen GP for anxiety/depression, and longest period of feeling low/depressed). Ten risk loci were identified (Table 3; Tables S13 to S15).

MDD against depression defined in UKBB

First, we compared MDD (from the PGC; mostly clinically defined) against probable recurrent severe depression (ProbDep). We identified 4 risk loci (Table 3; Table S13.2), including one in the xMHC region. The gene-based test revealed 110 significant genes. Tissue enrichment analysis highlighted the cerebellar hemisphere, nucleus accumbens and frontal cortex as the most enriched regions. Cell-type enrichment analysis suggested that the significant genes were associated with GABAergic, dopaminergic and other types of neurons in the LGN, middle temporal gyrus (MTG), hippocampus, midbrain and cortex (Table S13.9). Pathway analysis mainly highlighted pathways related to DNA methylation and histone modification, driven by histone genes in the xMHC region; other top pathways included Rett syndrome-causing genes and the axon guidance pathway. Enriched GO sets included regulation of long-term neuronal synaptic plasticity, and central nervous system neuron development and differentiation (Tables S13.11 and S13.12). Genetic correlation analysis showed that MDD-PGC was more positively genetically correlated than ProbDep with most other psychiatric disorders (e.g. SCZ, BPD, ASD, ADHD) as well as CAD (Table S13.13).

We then compared MDD against ‘seen GP for nerves/anxiety/depression’ (GPDep). The significant variants mapped to 3 loci, 2 of which were also observed in the above analysis (including one in the xMHC region). Gene-based analysis revealed 72 significant genes, 69 of which overlapped with the 110 genes identified in the comparison with ProbDep. Other results of the comparison between MDD and GPDep are detailed in Table S14.

MDD against duration of longest period of feeling low/depressed (top quintile as case)

For this comparison, the 133 candidate SNPs identified by functional annotation formed 3 genomic risk loci. Within these loci, GRIK2 was mapped by all three gene-mapping methods (positional, eQTL and CI mapping; Table S15.5). GRIK2 encodes glutamate ionotropic receptor kainate type subunit 2, suggesting that glutamatergic transmission may contribute to differential associations between susceptibility to depression and severity (as reflected by duration) of illness. Possibly owing to the limited sample size, tissue and cell-type enrichment analyses did not reveal significant results.

Neuroticism against SCZ/MDD/Anxiety disorder/alcohol dependence

In this part, five sets of GWAS summary statistics were used to form four comparisons (neuroticism against anxiety disorder, SCZ, MDD and alcohol dependence; Table 2). These disorders were chosen because of their relatively strong associations with neuroticism (47-49). We identified 1,294 genomic risk loci from the four comparisons (Table 3). Owing to space limits, we highlight only the results of neuroticism vs MDD (Table S16); please refer to Tables S17-S19 for detailed findings of the other comparisons. In the comparison of neuroticism against MDD, 20 risk loci were identified (Tables S16.1 and S16.2). Functional annotation of the 5,573 candidate SNPs in these loci highlighted a number of genes, among which CRHR1, MAPT, WNT3 and KANSL1 were mapped by all three gene-mapping methods and by MAGMA (Tables S16.5 and S16.6). All four genes lie in one risk locus on chromosome 17, and the causal gene(s) remain to be clarified in further studies. Tissue enrichment analysis showed enriched signals in most brain regions (Table S16.7); within-brain comparison showed that the cortex, frontal cortex, anterior cingulate cortex and nucleus accumbens were the most enriched [Table S16.8; FDR < 1.40E-02]. Cell-type enrichment analysis suggested that the GWGAS-significant genes were mainly enriched in the LGN region (FDR within dataset < 4.40E-02; Table S16.9). GO set enrichment analysis suggested that axon extension, CNS neuron differentiation and regulation of neuron death may be involved (Table S16.10).

Psychotic experiences against SCZ/BPD/MDD

We identified 10 and 2 genomic risk loci from the comparisons of psychotic experiences against SCZ and BPD, respectively, but none from the comparison of psychotic experiences against MDD (Table 3; Tables S20-S22).

Psychotic experiences against SCZ

In this comparison, functional annotation of 1,749 candidate SNPs revealed 10 genomic risk loci covering 82 genes (Table S20.2). In total, 68 genes were mapped by all three gene-mapping strategies, over half of which (35/68) were also implicated by GWGAS (Tables S20.5 and S20.6). The most implicated brain regions were the cortex, frontal cortex (BA9) and anterior cingulate cortex (Table S20.8). Cell-type enrichment analysis likewise identified the cortex and prefrontal cortex as the top two signals (Table S20.9). Enriched pathways or gene sets included anterograde trans-synaptic signalling, synaptic vesicle exocytosis/localization and synaptic adhesion-like molecules (Tables S20.11 and S20.12).

Psychotic experiences and BPD

Functional annotation highlighted several genes (SUGP1, GATAD2A and CILP2) harbouring SNPs with high CADD scores (CADD > 13.31; Table S21.2). The three gene-mapping strategies mapped variants to 10 genes (Table S21.5), all of which were also identified by GWGAS (Table S21.6). Tissue enrichment analysis suggested enrichment in the cortex, cerebellar hemisphere, frontal cortex, anterior cingulate cortex and cerebellum (FDR < 2.99E-02; Tables S21.7 and S21.8). Pathway analysis highlighted cocaine and amphetamine addiction, as well as PTEN and EGF signalling, as top pathways (Table S21.11).

Psychotic experiences and MDD

Although this comparison did not yield any significant differential variants, a few genes were highlighted by functional annotation of the candidate SNPs (Table S22.2). All three gene-mapping strategies linked variants to 9 protein-coding genes (TMEFF2, SLC30A9, BEND4, PRLR, EPM2A, FBXO30, SHPRH, GRM1, ANK3; Table S22.5). Genetic correlation analysis suggested that MDD showed stronger rg with SCZ and BPD than psychotic experiences did (Table S22.10).

Other pairs of comparisons

We also applied the proposed methods to four other clinically relevant comparisons: SCZ against BPD, ADHD against ASD, alcohol dependence against ever used cannabis, and anxiety disorder against SA (Table 3). We identified 3, 7, 2 and 1 genomic risk loci from these comparisons, respectively (Table 3). Here we highlight the results of SCZ vs BPD and ADHD vs ASD as examples (please refer to Tables S23-S26 for details).

SCZ vs BPD

Our results based on GWAS summary data showed an almost perfect genetic correlation with those obtained by comparing BPD and SCZ using individual genotype data (10) (rg = 1.054, SE = 0.025). In the comparison of SCZ and BPD, we observed three significant loci (Table 3). GWGAS highlighted 144 significant genes, which were enriched in brain tissues [Table S23.7; FDR < 1.87E-02]. The frontal cortex and anterior cingulate cortex were the most enriched compared with other brain regions [Table S23.8; FDR < 2.55E-02]. Furthermore, cell-type enrichment analysis identified an enrichment signal in three different types of neurons in the midbrain [Table S23.9; P < 4.01E-03], which remained significant after multiple-testing correction within the corresponding dataset (FDR = 3.34E-02). The enriched pathways included the inositol metabolism pathway and pathways related to cellular senescence [Table S23.11; FDR = 2.51E-02].

ADHD vs ASD

In the comparison between ADHD and ASD, 7 risk loci were found (Table S24.2). Seven genes (KDM4A, ERI1, SOX7, PINX1, XKR6, MTMR9 and SEMA6D) were highlighted by all three gene-mapping methods and GWGAS (Tables S24.5 and S24.6). ADHD appeared to have stronger positive rg with CAD and insomnia than ASD, whereas the reverse was observed for years of education, parental age at death and intracranial volume (Table S24.12).

Ability to distinguish between disorders based on PRS

Ability to distinguish between disorders based on PRS derived from current GWAS data

The results are presented in Table 5. In this analysis, we assumed that each subject has one of the two disorders under comparison. Taking SCZ vs MDD as an example, we assumed that the differential diagnosis (DDx) has been narrowed down to either SCZ or MDD. The prior probabilities (without genetic information) of each disorder are based on the lifetime prevalences of the two disorders (14, 50). For example, we assumed that, in the absence of additional information, a person is ~13/0.5 = 26 times more likely to be affected by MDD than by SCZ (i.e. RR ~26). Our analytic framework allows these prior probabilities to be set more flexibly, although we adopted simple assumptions here. We expect that adding polygenic scores would allow the disorders to be differentiated more accurately. A good prediction model leads to more spread-out predicted risks and larger relative risks when subjects at the top and bottom percentiles are compared. Subjects at the lowest 5th percentile of the PRS distribution (SCZ as ‘case’ and MDD as ‘control’) had a markedly lower risk of SCZ relative to MDD than the population average: the RR of MDD vs SCZ was 125.3 for a person with PRS at the 5th percentile (vs an average RR of 26) (Table S1.15). As PRS increased, the risk of SCZ rose while the risk of MDD fell. Subjects at the highest 5% of the risk score (SCZ vs MDD) had a substantially lower RR (MDD vs SCZ) of 10.16. Since we initially assumed a ~26-fold higher risk of MDD than SCZ based on overall lifetime risks, a reduction of this ratio to 10.16 is a relatively large change. We also present the RR of the ‘case’ disorder comparing individuals at the highest and lowest xth percentiles. For example, the estimated RR of SCZ was 11.31 when comparing those at the highest 5th percentile against those at the lowest 5th percentile. For SCZ vs BPD and BPD vs MDD, the corresponding RRs (for the first disorder) were 3.29 and 2.82, respectively.
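The reasoning above (prior ratio combined with genetic evidence to give a posterior ratio) can be illustrated with a simple model. This is a sketch under an assumption the text does not state, not the analytic framework of (11): PRS is taken to be normally distributed with unit variance in both disorders, with mean shift d between disorder A (the ‘case’, e.g. SCZ) and disorder B (the ‘control’, e.g. MDD), so that AUC = Φ(d/√2). Under this binormal model the posterior ratio of B to A at score s equals the prior ratio multiplied by the likelihood ratio exp(−sd + d²/2). Because P(A | s) = 1/(1 + RR_BA(s)), the RR of A between two percentiles follows from the two point-wise ratios; the reported figures obey this identity, since (1 + 125.3)/(1 + 10.16) ≈ 11.3, in line with the reported 11.31. All function names are hypothetical, and the numbers printed by the usage example are not expected to reproduce Table S1.15.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def shift_from_auc(auc):
    """Mean PRS difference d (in SD units) giving this AUC under an
    equal-variance binormal model, where AUC = Phi(d / sqrt(2))."""
    return math.sqrt(2) * STD_NORMAL.inv_cdf(auc)


def auc_from_shift(d):
    return STD_NORMAL.cdf(d / math.sqrt(2))


def prs_at_percentile(q, d, prior_ratio):
    """PRS value at population quantile q. The population is a mixture of
    disorder A (PRS ~ N(d, 1)) and disorder B (PRS ~ N(0, 1)), with
    prior_ratio = P(B) / P(A). Solved by bisection on the mixture CDF."""
    pi_a = 1.0 / (1.0 + prior_ratio)
    lo, hi = -20.0, 20.0
    for _ in range(200):
        mid = (lo + hi) / 2
        cdf = pi_a * STD_NORMAL.cdf(mid - d) + (1 - pi_a) * STD_NORMAL.cdf(mid)
        lo, hi = (mid, hi) if cdf < q else (lo, mid)
    return (lo + hi) / 2


def rr_b_vs_a(s, d, prior_ratio):
    """Posterior ratio P(B | s) / P(A | s): prior ratio times the
    likelihood ratio N(s; 0, 1) / N(s; d, 1) = exp(-s*d + d*d/2)."""
    return prior_ratio * math.exp(-s * d + d * d / 2)


def rr_a_top_vs_bottom(q, d, prior_ratio):
    """Risk of A at quantile 1 - q divided by risk of A at quantile q,
    using P(A | s) = 1 / (1 + rr_b_vs_a(s))."""
    r_low = rr_b_vs_a(prs_at_percentile(q, d, prior_ratio), d, prior_ratio)
    r_high = rr_b_vs_a(prs_at_percentile(1 - q, d, prior_ratio), d, prior_ratio)
    return (1 + r_low) / (1 + r_high)


if __name__ == "__main__":
    d = shift_from_auc(0.694)  # AUC for SCZ vs MDD reported in the text
    prior = 26                 # assumed MDD:SCZ ratio from lifetime prevalences
    for q in (0.05, 0.95):
        s = prs_at_percentile(q, d, prior)
        print(f"quantile {q}: RR(MDD vs SCZ) = {rr_b_vs_a(s, d, prior):.1f}")
    print(f"RR(SCZ), top vs bottom 5%: {rr_a_top_vs_bottom(0.05, d, prior):.2f}")
```

Percentiles are taken from the mixture of both disorders rather than from either disorder alone, because the subjects being ranked are drawn from the combined patient population.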
For most other comparisons, the AUCs based on PRS from existing GWAS data were modest, as were the corresponding RRs comparing those at high and low percentiles. However, two other pairs of traits/disorders (SCZ vs psychotic experiences and ADHD vs ASD) were estimated to attain an AUC > 0.6.

Maximum achievable AUC based on SNP-based heritability

The maximum attainable AUC (assuming the PRS captures the full SNP-based heritability) is presented in Table 5. These values were much higher than the current AUCs, indicating room for improving discriminatory ability with larger sample sizes. We also computed other predictive indices and graphs, which are shown in the supplementary tables.

Discussion

The present study applied a simple yet useful analytic framework to identify differential genetic markers for a broad range of psychiatric disorders/traits. We conducted detailed secondary analyses to identify the genes, pathways and cell types/tissues implicated. From the 26 comparisons, we identified a total of 11,410 significant differential variants, 1,398 genomic risk loci and 3,362 significant genes from GWGAS (FDR < 0.05).

SNP-based heritability (h2SNP) of differential genetic associations

Interestingly, we found that h2SNP was significantly different from zero for almost all comparisons, with some comparisons showing moderately high heritabilities. This suggests that genetic differences (due to common variants) may at least partially underlie the differences between psychiatric disorders, even closely related ones such as MDD and anxiety disorders, or SCZ and BPD. Among comparisons of MDD with other disorders, h2SNP was highest for SCZ, BPD, ED and anxiety disorders. For instance, despite substantial symptom overlap between MDD and anxiety disorders (51), h2SNP for this comparison is among the highest at ~36% (by SumHer; liability scale). In contrast, h2SNP was estimated at only ~1% for MDD vs PTSD. A possible explanation is that environmental factors (e.g. traumatic stressors, which must be present for PTSD but not for MDD) may play an important role in explaining the differences between these two disorders. For neuroticism against other psychiatric disorders, h2SNP estimates were generally low; however, h2SNP for neuroticism itself is only ~10% based on one of the largest meta-analyses to date (52). We wish to highlight the distinction between the genetic correlation (rg) of two traits and h2SNP from the differential association test. Two variables can be highly correlated if there is a strong linear relationship between them, yet their actual values can differ. Thus, two traits may have a relatively high rg, but because SNP effect sizes can differ between them, h2SNP can still be substantial. Nevertheless, there are several caveats when interpreting h2SNP. First, SNP-based heritability analysis often requires large sample sizes, but for several disorders (e.g. OCD and ED) the sample sizes were only moderate, so the estimates could be imprecise.
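A toy numerical illustration of the distinction between rg and h2SNP (not a genetic model): if every SNP's effect on trait B is exactly half its effect on trait A, the effects are perfectly correlated, yet the per-SNP differences, which are what the differential association test targets, remain substantial. The simulated effect sizes and the factor 0.5 are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
beta_a = rng.normal(0.0, 1.0, 5_000)  # hypothetical per-SNP effects on trait A
beta_b = 0.5 * beta_a                 # trait B: same direction, half the size

# Correlation of effects (an analogue of rg) is 1 ...
corr_ab = np.corrcoef(beta_a, beta_b)[0, 1]
# ... yet the effects tested in a differential analysis are far from zero
diff = beta_a - beta_b
print(round(float(corr_ab), 6), round(float(np.var(diff)), 3))
```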
Also, contributions of rare variants and other ‘omic’ changes such as epigenetic changes were not captured by h2SNP. Moreover, estimation of h2SNP is subject to assumptions about genetic architecture (41), which can vary among diseases.

Ability to distinguish between psychiatric disorders based on polygenic risk scores (PRS) from common variants

A potential translational application is to use PRS from SNPs to distinguish between psychiatric disorders. This is particularly relevant in psychiatry given the lack of objective biomarkers. In a recent study, Liebers et al. examined whether PRS can discriminate BPD from MDD. They found that subjects in the top decile of BPD PRS were significantly more likely to have BPD than MDD compared with those in the lowest decile. The estimated odds ratio was 3.39 (95% CI 2.19–5.25), which is comparable to our relative risk estimate (2.24) obtained with our analytic approach using GWAS summary statistics (see Table 5) (relative risks are usually smaller than ORs). Among the comparisons, the differential diagnosis between MDD and BPD is perhaps one of the most clinically relevant, as it is sometimes difficult to distinguish the two from clinical symptoms alone and their treatments differ. Based on present GWAS data, the AUC for discriminating BPD vs MDD is 0.602 (at the best p-value threshold), which is modest. However, PRS may be more informative for individuals at the extreme ends of the score distribution. The discriminating power between BPD and SCZ was similarly modest (best AUC = 0.618), but the AUC for SCZ vs MDD was higher (0.694). Clinically, major depression (mainly psychotic depression) may be part of the DDx of first-episode psychosis (53); it may be interesting to study whether PRS can help distinguish SCZ from MDD in such patients. We also estimated the maximum discriminatory ability of PRS based on h2SNP (i.e. assuming all common variants are found); the maximum estimated AUCs for MDD vs BPD, SCZ vs BPD and SCZ vs MDD were 0.749, 0.726 and 0.763, respectively. These findings suggest that with larger GWAS sample sizes, PRS may become more informative and may help the DDx of related disorders. Another interesting question is how well PRS can differentiate psychotic experiences (PE) from psychiatric disorders such as SCZ, BPD and MDD, all of which may present with psychotic symptoms.
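Why an odds ratio tends to exceed the corresponding relative risk: with p1 and p0 the risks in the two groups, RR = p1/p0 and OR = RR × (1 − p0)/(1 − p1), so OR > RR whenever RR > 1, and the gap widens when the outcome is common, as it is when two disorders are compared among patients. The counts below are hypothetical and only illustrate the arithmetic; `rr_and_or` is an illustrative name.

```python
def rr_and_or(a, b, c, d):
    """2x2 table with rows = higher- vs lower-PRS group and columns =
    disorder 1 present / disorder 2 present. Returns (risk ratio, odds ratio)."""
    risk_high = a / (a + b)   # risk of disorder 1 in the higher-PRS group
    risk_low = c / (c + d)    # risk of disorder 1 in the lower-PRS group
    return risk_high / risk_low, (a * d) / (b * c)


# Hypothetical counts: 40 of 100 vs 20 of 100 with disorder 1
rr, odds_ratio = rr_and_or(40, 60, 20, 80)
print(rr, round(odds_ratio, 2))  # 2.0 2.67
```

When the outcome is rare (for example 2 of 1,000 vs 1 of 1,000), the two measures nearly coincide.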
Interestingly, h2SNP for PE vs SCZ and PE vs BPD was high in both cases (0.356 and 0.398, respectively), and the corresponding maximum AUCs exceeded 0.8. However, based on present GWAS data, the discriminatory power was weaker (all AUC < 0.7), suggesting that PRS have the potential to help distinguish PE from psychiatric disorders but that larger studies are required. Nevertheless, individuals with more extreme PRS show more substantial differences in disease risk; for example, in the comparison of SCZ vs PE, the risk of SCZ for subjects at the 95th percentile of PRS was 9.19 times that of those at the 5th percentile. The corresponding RR was 5.62 when comparing the 90th and 10th percentiles. Several limitations are worth noting. For some comparisons (e.g. MDD vs other disorders), comorbidities are possible; whether it is clinically helpful to use PRS to guide DDx will depend on the clinical context. In practice, we expect clinical symptoms and features to remain very important in making a DDx. Genomic data, for example in the form of PRS, may provide additional discriminatory power when integrated with clinical features. Also, since we relied on summary statistics, we applied an analytic approach (11) to estimate the AUC from current GWAS samples. The limitations of this methodology were detailed in (11). Chiefly, we assumed that the predictive model will be applied to the same population as the training data. However, as patients with the same psychiatric disorder can be heterogeneous, and PRS may need to be applied across different ethnic groups, the estimated AUCs may be optimistic in this regard. Ideally, predictive power should also be evaluated in an independent sample with individual genotype data. In addition, our analytic approach for forecasting AUC assumed that a standard p-value thresholding and LD-clumping (P+T) approach is used. While this approach is widely adopted, newer PRS modelling methods (e.g. LDpred; see (54) for a review) may further improve predictive power.

Comparison of MDD-PGC with depression-related traits in UKBB

We performed another comparison of interest, between MDD-PGC and depression-related traits from UKBB. The former was mainly composed of clinically diagnosed MDD, while the latter traits were largely defined by self-report. For example, recurrent severe probable depression (ProbDep) included subjects who reported ever feeling depressed or disinterested for one week, with ≥2 episodes lasting ≥2 weeks, and who had visited a psychiatrist for mood problems. The other phenotype studied was having seen a GP for depression/nerves/anxiety. Neither trait involved assessment of clinical symptoms as described in DSM/ICD. Based on our analysis, (the non-overlapping genetic component of) MDD-PGC appeared to be more strongly genetically correlated with other psychiatric disorders (e.g. SCZ, BPD, anorexia, ASD, ADHD) and other outcomes such as CAD than non-clinically-defined depression in UKBB. It is worth noting that while the rg between MDD-PGC and UKBB depression traits was very high, h2SNP from the differential genetic analysis was significantly greater than zero. One possible explanation is that while many susceptibility genes may be shared between the traits, their effect sizes may differ. Another point to note is that rg based on LDSC may be over-estimated in case-control studies due to difficulties in handling covariates (55), as was also reported in (56) when LDSC was compared with a more sophisticated method, PCGC (55). A recent study by Cai et al. (56) suggested that depression traits defined by ‘minimal phenotyping’ (a category to which ProbDep and GPDep, studied here, also belong) are genetically different from strictly defined MDD; for example, they have lower h2SNP and worse predictive power in MDD cohorts. Cai et al. focused on comparisons of different definitions of ‘depression’ within UKBB, whereas here we mainly compared the genetic architecture of MDD-PGC against traits in UKBB, using a different statistical approach. Our results support differences in genetic basis between different definitions of depression and call for more in-depth phenotyping to study depression and related traits.

Tissue/cell-type enrichment analysis

In view of the large number of comparisons performed, we highlight only a few results for discussion. The tissue/cell-type enrichment analysis implied that the frontal cortex (BA9) and anterior cingulate cortex (ACC; BA24) may be implicated in the differences between several disorders, such as MDD against SCZ/BPD, neuroticism against MDD/alcohol dependence, and psychotic experiences against SCZ/BPD. BA9 forms part of the dorsolateral and medial prefrontal cortex, dysfunction of which underlies many cognitive and behavioural disturbances associated with psychiatric disorders such as SCZ, MDD, ADHD and ASD (57, 58). The ACC has many functional roles in the brain, spanning affective, cognitive and motor domains (59). A number of studies have suggested that functional alterations in the ACC may be implicated in psychiatric disorders such as MDD (60), BPD (61) and SCZ (62). It is possible that different patterns of dysfunction in these brain regions underlie the differences between disorders. Cell-type enrichment analysis may also help to pinpoint the cell types (and brain regions) involved in differentiating the disorders. For example, when comparing MDD vs anxiety disorders, the most enriched cell types were GABAergic neurons from the hippocampus, midbrain and temporal cortex. Interestingly, benzodiazepines, one of the most widely prescribed drug classes for anxiety, act on the GABAergic system, while antidepressants primarily target the monoamine system. With increasing amounts of single-cell RNA-seq data in the future, cell-type enrichment analysis may delineate the specific types of neurons involved more precisely.

Genetic correlation analysis

We briefly highlight a few examples of our findings here. In the comparison of BPD vs MDD, we observed positive genetic correlations (rg) with childhood intelligence and level of education (Table S2), suggesting that BPD is more strongly genetically linked to these traits. Concordant with this finding, low IQ has been reported to be associated with severe depression and SCZ but not with BPD (63). Somewhat unexpectedly, we also observed a positive rg with anorexia nervosa (AN), although AN is clinically more commonly comorbid with depression. A positive rg with AN was likewise observed when comparing SCZ vs MDD. One hypothesis is that the high comorbidity of AN with MDD may be partially attributable to environmental factors (64).

Genetic correlation analysis may also shed light on the brain regions implicated in different disorders. For the comparison of SCZ vs MDD, we observed a significant negative genetic correlation with hippocampal volume, which was corroborated by associations with the hippocampus in the cell-type enrichment analysis. Both SCZ and MDD patients have been reported to have smaller hippocampal volumes than healthy controls (65, 66). However, a comparative study showed a larger reduction in hippocampal volume in SCZ patients than in those with MDD (67). Our genetic correlation analysis appears to support this finding.

Many limitations of this study have been detailed above. In addition, the methodology assumes that the controls of both GWAS datasets originate from a similar population. If heterogeneity is high (e.g. controls from different ethnic groups), the estimates may be biased. Relatedly, most of the studies were based on European samples; the effects of the identified genetic loci may differ across populations, and PRS derived from European samples may have poorer predictive ability in other ethnic groups.

Conclusions

In summary, our analytic framework identified a number of differential genomic risk loci from 26 comparisons of psychiatric traits/disorders. Further analyses revealed many novel genes, pathways, brain regions and specific cell types implicated in the differences between disorders. We also showed that PRS may aid the differentiation of some psychiatric disorders (e.g. MDD vs BPD) to a certain extent, but further clinical studies are required to confirm these findings. With increasing GWAS sample sizes, genetic data may provide useful information for more accurate diagnosis and personalized treatment of psychiatric disorders.

Supplementary materials

Please note that all supplementary tables are available at https://drive.google.com/open?id=1qrpDV6GhobffSwOtRsmAkPY_CihIpHuA

Author contributions

Conceived and designed the study: HCS. Supervised the study: HCS. Methodology: HCS (lead), LY. Data analysis: SR (lead), LY, with input from YX. Simulation experiments: LY. Data interpretation: SR, LY, HCS. Writing of manuscript: HCS, SR, with input from LY.

Acknowledgements

We would like to thank Prof. Stephen Tsui and the Hong Kong Bioinformatics Center for computing support. This study was partially supported by the Lo Kwee Seong Biomedical Research Fund and a Chinese University of Hong Kong Direct Grant. We thank Mr. Carlos Chau for assistance in part of the analyses.

Conflicts of interest

The authors declare no conflict of interest.

Figure 1 legend

Summary of our analysis framework. GWAS summary statistics of the two traits under study were harmonized, and differential genetic associations were identified by the method described in the main text. The power of polygenic scores (derived from the above association test treating the first disorder as 'case' and the second as 'control') to differentiate the two disorders was computed. We computed two sets of discriminatory power estimates, one based on existing GWAS data and the other based on SNP-based heritability, reflecting the maximum achievable discriminatory power. We also investigated the genetic correlation (mainly using LD score regression [LDSC]) with other possible comorbid traits/disorders. We also performed a genome-wide gene-based association study (GWGAS) to identify associated genes and the most relevant tissues, cell types and pathways implicated. As a parallel analysis, we performed functional annotation and mapped the SNPs to relevant genes based on gene positions, expression quantitative trait loci (eQTL) and chromatin interaction (CI) data.

References

1. WHO International Consortium (2000): Cross-national comparisons of the prevalences and correlates of mental disorders. Bulletin of the World Health Organization. 78:413-426.
2. Ormel J, Petukhova M, Chatterji S, Aguilar-Gaxiola S, Alonso J, Angermeyer MC, et al. (2008): Disability and treatment of specific mental and physical disorders across the world. The British journal of psychiatry: the journal of mental science. 192:368-375.
3. Insel TR (2009): Disruptive insights in psychiatry: transforming a clinical discipline. The Journal of clinical investigation. 119:700-705.
4. Smoller JW, Andreassen OA, Edenberg HJ, Faraone SV, Glatt SJ, Kendler KS (2019): Psychiatric genetics and the structure of psychopathology. Molecular psychiatry. 24:409-420.
5. Cross-Disorder Group of the PGC, Lee SH, Ripke S, Neale BM, Faraone SV, Purcell SM, et al. (2013): Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nature Genetics. 45:984.
6. Lee PH, Anttila V, Won H, Feng Y-CA, Rosenthal J, Zhu Z, et al. (2019): Genome wide meta-analysis identifies genomic relationships, novel loci, and pleiotropic mechanisms across eight psychiatric disorders. bioRxiv. 528117.
7. Forbes MK, Tackett JL, Markon KE, Krueger RF (2016): Beyond comorbidity: Toward a dimensional and hierarchical approach to understanding psychopathology across the life span. Dev Psychopathol. 28:971-986.
8. Cross-Disorder Group of the PGC (2013): Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. The Lancet. 381:1371-1379.
9. Ruderfer DM, Fanous AH, Ripke S, McQuillin A, Amdur RL, Gejman PV, et al. (2014): Polygenic dissection of diagnosis and clinical dimensions of bipolar disorder and schizophrenia. Molecular psychiatry. 19:1017-1024.
10. BPD&SCZ Working Group of the PGC (2018): Genomic Dissection of Bipolar Disorder and Schizophrenia, Including 28 Subphenotypes. Cell. 173:1705-1715.e16.
11. So HC, Sham PC (2017): Improving polygenic risk prediction from summary statistics by an empirical Bayes approach. Scientific reports. 7:41262.
12. Lim GY, Tam WW, Lu Y, Ho CS, Zhang MW, Ho RC (2018): Prevalence of Depression in the Community from 30 Countries between 1994 and 2014. Sci Rep. 8:2861.
13. Rowland TA, Marwaha S (2018): Epidemiology and risk factors for bipolar disorder. Ther Adv Psychopharmacol. 8:251-269.
14. Messias EL, Chen C-Y, Eaton WW (2007): Epidemiology of schizophrenia: review of findings and myths. Psychiatr Clin North Am. 30:323-338.
15. Newschaffer CJ, Croen LA, Daniels J, Giarelli E, Grether JK, Levy SE, et al. (2007): The epidemiology of autism spectrum disorders. Annual review of public health. 28:235-258.
16. Willcutt EG (2012): The prevalence of DSM-IV attention-deficit/hyperactivity disorder: a meta-analytic review. Neurotherapeutics: the journal of the American Society for Experimental NeuroTherapeutics. 9:490-499.
17. Smink FRE, van Hoeken D, Hoek HW (2012): Epidemiology of eating disorders: incidence, prevalence and mortality rates. Curr Psychiatry Rep. 14:406-414.
18. Goodman WK, Grice DE, Lapidus KA, Coffey BJ (2014): Obsessive-compulsive disorder. The Psychiatric clinics of North America. 37:257-267.
19. Roth T (2007): Insomnia: definition, prevalence, etiology, and consequences. J Clin Sleep Med. 3:S7-S10.
20. Nock MK, Borges G, Bromet EJ, Alonso J, Angermeyer M, Beautrais A, et al. (2008): Cross-national prevalence and risk factors for suicidal ideation, plans and attempts. The British journal of psychiatry: the journal of mental science. 192:98-105.
21. Hasin DS, Stinson FS, Ogburn E, Grant BF (2007): Prevalence, correlates, disability, and comorbidity of DSM-IV alcohol abuse and dependence in the United States: results from the National Epidemiologic Survey on Alcohol and Related Conditions. Archives of general psychiatry. 64:830-842.
22. Anthony JC, Lopez-Quintero C, Alshaarawy O (2017): Cannabis Epidemiology: A Selective Review. Curr Pharm Des. 22:6340-6352.
23. McGrath JJ, Saha S, Al-Hamzawi A, Alonso J, Bromet EJ, Bruffaerts R, et al. (2015): Psychotic Experiences in the General Population: A Cross-National Analysis Based on 31,261 Respondents From 18 Countries. JAMA Psychiatry. 72:697-705.
24. Avenevoli S, Swendsen J, He J-P, Burstein M, Merikangas KR (2015): Major depression in the national comorbidity survey-adolescent supplement: prevalence, correlates, and treatment. J Am Acad Child Adolesc Psychiatry. 54:37-44.e32.
25. Smith DJ, Nicholl BI, Cullen B, Martin D, Ul-Haq Z, Evans J, et al. (2013): Prevalence and Characteristics of Probable Major Depression and Bipolar Disorder within UK Biobank: Cross-Sectional Study of 172,751 Participants. PLOS ONE. 8:e75362.
26. Wray NR, Ripke S, Mattheisen M, Trzaskowski M, Byrne EM, Abdellaoui A, et al. (2018): Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nature genetics. 50:668-681.
27. Howard DM, Adams MJ, Clarke T-K, Hafferty JD, Gibson J, Shirali M, et al. (2019): Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nature Neuroscience. 22:343-352.
28. Cai N, Revez JAA, Adams MJ, Andlauer TF, Breen G, Byrne EM, et al. (2019): Minimal phenotyping yields GWAS hits of reduced specificity for major depression. bioRxiv. 440735.
29. Nieuwboer HA, Pool R, Dolan CV, Boomsma DI, Nivard MG (2016): GWIS: Genome-Wide Inferred Statistics for Functions of Multiple Phenotypes. American journal of human genetics. 99:917-927.
30. Prentice RL, Pyke R (1979): Logistic Disease Incidence Models and Case-Control Studies. Biometrika. 66:403-411.
31. Yin L, Chau CK-l, Lin Y-P, So H-C (2020): A framework to decipher the genetic architecture of combinations of complex diseases: applications in cardiovascular medicine. arXiv preprint arXiv:2003.08518.
32. Bulik-Sullivan BK, Loh PR, Finucane HK, Ripke S, Yang J (2015): LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature Genetics. 47:291-295.
33. Watanabe K, Taskesen E (2017): Functional mapping and annotation of genetic associations with FUMA. Nature Communications. 8:1826.
34. Kircher M, Witten DM, Jain P, O'Roak BJ, Cooper GM (2014): A general framework for estimating the relative pathogenicity of human genetic variants. Nature Genetics. 46:310-315.
35. Ernst J, Kellis M (2012): ChromHMM: automating chromatin-state discovery and characterization. Nature methods. 9:215-216.
36. Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, et al. (2015): Integrative analysis of 111 reference human epigenomes. Nature. 518:317-330.
37. Wang K, Li M, Hakonarson H (2010): ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic acids research. 38:e164.
38. Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, et al. (2012): Annotation of functional variation in personal genomes using RegulomeDB. Genome research. 22:1790-1797.
39. de Leeuw CA, Mooij JM, Heskes T, Posthuma D (2015): MAGMA: Generalized Gene-Set Analysis of GWAS Data. PLOS Computational Biology. 11:e1004219.
40. Kamburov A, Stelzl U, Lehrach H, Herwig R (2012): The ConsensusPathDB interaction database: 2013 update. Nucleic Acids Research. 41:D793-D800.
41. Speed D, Balding DJ (2019): SumHer better estimates the SNP heritability of complex traits from summary statistics. Nature Genetics. 51:277-284.
42. Nature genetics. 47:1228-1235.
43. So H-C, Sham PC (2010): A unifying framework for evaluating the predictive power of genetic variants based on the level of heritability explained. PLoS Genet. 6:e1001230.
44. Boraska V, Jerončić A, Colonna V, Southam L, Nyholt DR, William Rayner N, et al. (2012): Genome-wide meta-analysis of common variant differences between men and women. Human molecular genetics. 21:4805-4815.
45. Zheng J, Erzurumluoglu AM, Elsworth BL, Kemp JP, Howe L, Haycock PC, et al. (2017): LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics (Oxford, England). 33:272-279.
46. Horton R, Wilming L, Rand V, Lovering RC, Bruford EA, Khodiyar VK, et al. (2004): Gene map of the extended human MHC. Nature Reviews Genetics. 5:889-899.
47. Xia J, He Q, Li Y, Xie D, Zhu S, Chen J, et al. (2011): The relationship between neuroticism, major depressive disorder and comorbid disorders in Chinese women. Journal of affective disorders. 135:100-105.
48. van Os J, Jones P (2001): Neuroticism as risk factor for schizophrenia. Psychological medicine. 31:1129-1134.
49. Mosher Ruiz S, Oscar-Berman M, Kemppainen MI, Valmas MM, Sawyer KS (2017): Associations Between Personality and Drinking Motives Among Abstinent Adult Alcoholic Men and Women. Alcohol and Alcoholism. 52:496-505.
50. Kessler RC, Bromet EJ (2013): The epidemiology of depression across cultures. Annu Rev Public Health. 34:119-138.
51. Zbozinek TD, Rose RD, Wolitzky-Taylor KB, Sherbourne C, Sullivan G, Stein MB, et al. (2012): Diagnostic overlap of generalized anxiety disorder and major depressive disorder in a primary care sample. Depress Anxiety. 29:1065-1071.
52. Nagel M, Jansen PR, Stringer S, Watanabe K, de Leeuw CA, Bryois J, et al. (2018): Meta-analysis of genome-wide association studies for neuroticism in 449,484 individuals identifies novel genetic loci and pathways. Nature Genetics. 50:920-927.
53. Owoeye O, Kingston T, Scully PJ, Baldwin P, Browne D, Kinsella A, et al. (2013): Epidemiological and Clinical Characterization Following a First Psychotic Episode in Major Depressive Disorder: Comparisons With Schizophrenia and Bipolar I Disorder in the Cavan-Monaghan First Episode Psychosis Study (CAMFEPS). Schizophrenia bulletin. 39:756-765.
54. Kulm S, Mezey J, Elemento O (2020): Benchmarking the Accuracy of Polygenic Risk Scores and their Generative Methods. medRxiv. 2020.04.06.20055574.
55. Weissbrod O, Flint J, Rosset S (2018): Estimating SNP-Based Heritability and Genetic Correlation in Case-Control Studies Directly and with Summary Statistics. The American Journal of Human Genetics. 103:89-99.
56. Cai N, Revez JA, Adams MJ, Andlauer TFM, Breen G, Byrne EM, et al. (2019): Minimal phenotyping yields GWAS hits of low specificity for major depression. bioRxiv. 440735.
57. Siddiqui SV, Chatterjee U, Kumar D, Siddiqui A, Goyal N (2008): Neuropsychology of prefrontal cortex. Indian J Psychiatry. 50:202-208.
58. Ray RD, Zald DH (2012): Anatomical insights into the interaction of emotion and cognition in the prefrontal cortex. Neurosci Biobehav R. 36:479-501.
59. Yücel M, Wood SJ, Fornito A, Riffkin J, Velakoulis D, Pantelis C (2003): Anterior cingulate dysfunction: implications for psychiatric disorders? J Psychiatry Neurosci. 28:350-354.
60. Davey CG, Harrison BJ, Yücel M, Allen NB (2012): Regionally specific alterations in functional connectivity of the anterior cingulate cortex in major depressive disorder. Psychological medicine. 42:2071-2081.
61. Fountoulakis KN, Giannakopoulos P, Kövari E, Bouras C (2008): Assessing the role of cingulate cortex in bipolar disorder: neuropathological, structural and functional imaging data. Brain research reviews. 59:9-21.
62. Adams R, David AS (2007): Patterns of anterior cingulate activation in schizophrenia: a selective review. Neuropsychiatr Dis Treat. 3:87-101.
63. Archives of general psychiatry. 61:354-360.
64. O'Brien KM, Vincent NK (2003): Psychiatric comorbidity in anorexia and bulimia nervosa: nature, prevalence, and causal relationships. Clinical psychology review. 23:57-74.
65. Arnone D, McIntosh AM, Ebmeier KP, Munafo MR, Anderson IM (2012): Magnetic resonance imaging studies in unipolar depression: systematic review and meta-regression analyses. European neuropsychopharmacology: the journal of the European College of Neuropsychopharmacology. 22:1-16.
66. Arnold SJ, Ivleva EI, Gopal TA, Reddy AP, Jeon-Slaughter H, Sacco CB, et al. (2015): Hippocampal volume is reduced in schizophrenia and schizoaffective disorder but not in psychotic bipolar I disorder demonstrated by both manual tracing and automated parcellation (FreeSurfer). Schizophrenia bulletin. 41:233-249.
67. Meisenzahl EM, Seifert D, Bottlender R, Teipel S, Zetzsche T, Jager M, et al. (2010): Differences in hippocampal volume between major depression and schizophrenia: a comparative neuroimaging study. European archives of psychiatry and clinical neuroscience. 260:127-137.
68. Koenen KC, Ratanatharathorn A, Ng L, McLaughlin KA, Bromet EJ, Stein DJ, et al. (2017): Posttraumatic stress disorder in the World Mental Health Surveys. Psychological medicine. 47:2260-2274.

Tables

Table 1. Simulation results comparing analyses of individual-level genotype data and our presented analytic approach

| Overlap rate | No. cases | h² (trait 1) | h² (trait 2) | Correlation (beta) | Correlation (SE) | RMSE (beta) | RMSE (SE) | Power (inferred GWAS) | Type I error (inferred GWAS) | Power (real GWAS) | Type I error (real GWAS) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.15 | 10000 | 0.2 | 0.3 | 0.98769 | 0.99789 | 0.02194 | 0.00803 | 0.633 | 0.040 | 0.723 | 0.043 |
| 0.15 | 20000 | 0.2 | 0.3 | 0.99335 | 0.99807 | 0.01634 | 0.00564 | 0.740 | 0.040 | 0.770 | 0.037 |
| 0.15 | 50000 | 0.2 | 0.3 | 0.99766 | 0.99811 | 0.00939 | 0.00359 | 0.823 | 0.023 | 0.873 | 0.047 |
| 0.15 | 100000 | 0.2 | 0.3 | 0.99861 | 0.99811 | 0.00724 | 0.00253 | 0.877 | 0.047 | 0.903 | 0.047 |
| 0.15 | 10000 | 0.22 | 0.32 | 0.98766 | 0.99776 | 0.02253 | 0.00800 | 0.653 | - | 0.723 | - |
| 0.15 | 20000 | 0.22 | 0.32 | 0.99376 | 0.99794 | 0.01618 | 0.00565 | 0.723 | - | 0.787 | - |
| 0.15 | 50000 | 0.22 | 0.32 | 0.99784 | 0.99792 | 0.00941 | 0.00360 | 0.833 | - | 0.880 | - |
| 0.15 | 100000 | 0.22 | 0.32 | 0.99873 | 0.99796 | 0.00724 | 0.00254 | 0.877 | - | 0.910 | - |
| 0.25 | 10000 | 0.2 | 0.3 | 0.99022 | 0.99768 | 0.02077 | 0.00606 | 0.660 | 0.040 | 0.717 | 0.043 |
| 0.25 | 20000 | 0.2 | 0.3 | 0.99645 | 0.99764 | 0.01243 | 0.00437 | 0.737 | 0.037 | 0.770 | 0.030 |
| 0.25 | 50000 | 0.2 | 0.3 | 0.99816 | 0.99785 | 0.00898 | 0.00275 | 0.817 | 0.043 | 0.857 | 0.050 |
| 0.25 | 100000 | 0.2 | 0.3 | 0.99913 | 0.99777 | 0.00608 | 0.00194 | 0.870 | 0.040 | 0.900 | 0.027 |
| 0.25 | 10000 | 0.22 | 0.32 | 0.99144 | 0.99730 | 0.02031 | 0.00605 | 0.683 | - | 0.727 | - |
| 0.25 | 20000 | 0.22 | 0.32 | 0.99637 | 0.99754 | 0.01315 | 0.00436 | 0.757 | - | 0.780 | - |
| 0.25 | 50000 | 0.22 | 0.32 | 0.99831 | 0.99771 | 0.00888 | 0.00275 | 0.833 | - | 0.860 | - |
| 0.25 | 100000 | 0.22 | 0.32 | 0.99923 | 0.99760 | 0.00597 | 0.00194 | 0.887 | - | 0.903 | - |

No. cases indicates the number of cases we defined for our simulation scenarios; h², heritability explained by SNPs; RMSE, root mean square error.

Table 2. Summary of GWAS data of 18 psychiatric traits/disorders included in this study

| Trait/Disorder | Abbreviation | Sourceᵇ | Data type | Cases | Controls | Total N | Prevalence (%)ᵈ |
|---|---|---|---|---|---|---|---|
| Major Depressive Disorder (2018) | MDD | PGC | binary | 59,851 | 113,154 | 173,005 | 13.0 (12) |
| Bipolar Disorder (2018) | BPD | PGC | binary | 20,129 | 21,524 | 41,653 | 2.4 (13) |
| Schizophrenia (2018) | SCZ | GERAD&CRESTAR | binary | 40,675 | 64,643 | 105,318 | 0.5 (14) |
| Autism Spectrum Disorder (2019) | ASD | iPSYCH&PGC | binary | 18,381 | 27,969 | 46,350 | 2.5 (15) |
| Attention deficit hyperactivity disorder (2019) | ADHD | iPSYCH&PGC | binary | 19,099 | 34,194 | 53,293 | 6.5 (16) |
| Post-traumatic Stress Disorder (2017) | PTSD | PGC | binary | 5,183 | 15,547 | 20,730 | 3.9 (68) |
| Anxiety Disorder (2018)ᵃ | anxiety | UKBB | binary | 16,730 | 101,021 | 117,751 | 14.2ᵉ |
| Eating Disorder (2017) | ED | PGC | binary | 3,495 | 10,892 | 14,477 | 1.2 (17) |
| Obsessive-compulsive Disorder (2018) | OCD | PGC | binary | 2,688 | 7,037 | 9,725 | 2.3 (18) |
| Insomnia (2019) | Insomnia | UKBB&CTG lab | binary | 109,389 | 277,144 | 386,533 | 10.0 (19) |
| Suicide Attempts in mental disorder (2018) | SA | iPSYCH-PGC | binary | 6,024 | 44,240 | 50,264 | 2.7 (20) |
| Alcohol dependence (2018) | alcohol | PGC | binary | 11,476 | 23,080 | 34,556 | 12.0 (21) |
| Ever used cannabis (2018) | cannabis | ICC | binary | 43,380 | 118,702 | 162,082 | 4.0 (22) |
| Psychotic Experiences (2019) | PE | CNGG-Walters group | binary | 6,123 | 121,843 | 127,966 | 5.8 (23) |
| Neuroticism_High_20P (2019) | neuroticism | CTG Lab | binaryᶜ | | | | ᶜ |
| Longest period of depression_High_20P (2018) | longest depression | UKBB | binaryᶜ | | | | ᶜ |
| Bipolar and major depression status: Probable Recurrent major depression | ProbDep | UKBB | binary | 6,304 | 80,591 | 86,895 | 3.55 (24) |
| Seen doctor (GP) for nerves, anxiety, tension or depression | GPDep | UKBB | binary | 123,528 | 235,165 | 358,693 | 17.3 (12) |

ᵃ Anxiety Disorder: mental health problems ever diagnosed by a professional: anxiety, nerves, or generalized anxiety disorder.
ᵇ PGC: Psychiatric Genomics Consortium; UKBB: UK Biobank; CTG lab: Complex Trait Genetics lab; ICC: International Cannabis Consortium; CNGG: Centre for Neuropsychiatric Genetics and Genomics.
ᶜ Continuous neuroticism scores were transformed to binary traits, in which 20% of subjects were assigned as cases and the others as controls; the population prevalence of longest depression is the population prevalence of MDD multiplied by 20%, i.e. 13% × 20% = 2.6%.
ᵈ Prevalence of traits/disorders refers to estimates of lifetime prevalence based on the cited references.
ᵉ Prevalence of anxiety disorder is estimated from UKBB directly.
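Population prevalences such as those in Table 2 are needed because SNP heritability from a case-control GWAS is usually reported on the liability scale (as in Table 3), and the conversion from the observed scale depends on the population prevalence K and the sample case fraction P. A minimal sketch of the standard transformation of Lee et al. (2011, American Journal of Human Genetics), the same conversion LDSC applies when sample and population prevalences are supplied. How K and P were set for each pairwise comparison in this study is not stated here, so the inputs are assumptions; the function name is hypothetical and the observed-scale value in the usage line is made up.

```python
from statistics import NormalDist


def observed_to_liability_h2(h2_obs: float, pop_prev: float, sample_prev: float) -> float:
    """Convert observed-scale SNP heritability to the liability scale.

    pop_prev    : population lifetime prevalence K (e.g. a Table 2 value / 100)
    sample_prev : proportion of cases in the GWAS sample, P = cases / total N
    """
    for name, value in (("pop_prev", pop_prev), ("sample_prev", sample_prev)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie in (0, 1)")
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - pop_prev)  # liability threshold t
    z = std_normal.pdf(threshold)                   # normal density at t
    k, p = pop_prev, sample_prev
    # Lee et al. (2011): h2_L = h2_O * K^2 (1 - K)^2 / (z^2 * P (1 - P))
    factor = (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))
    return h2_obs * factor


if __name__ == "__main__":
    # Hypothetical usage with the MDD row of Table 2 (K = 13%,
    # P = 59,851 / 173,005); the observed-scale h2 of 0.05 is invented.
    mdd_case_fraction = 59_851 / 173_005
    print(round(observed_to_liability_h2(0.05, 0.13, mdd_case_fraction), 3))
```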

Table 3. Identification of differentially associated genetic variants/genes from correlated psychiatric traits/disorders.

Genetic correlation: rg, p-value, intercept. Differential association GWASᵇ: Sig. SNPs, genomic risk loci, Sig. genes, LDSC-h² and SumHer-h².

| Comparisonᵃ | rg | p-value | Intercept | Sig. SNPs | Genomic risk loci | Sig. genes | LDSC-h² (se) | SumHer-h² (se) |
|---|---|---|---|---|---|---|---|---|
| 1. MDD vs. psychiatric disorders/traits | | | | | | | | |
| SCZ | 0.3857 | 3.50e-46 | 0.0548 | 2,312 | 37 | 953 | 0.183 (0.008) | |
| BPD | 0.3387 | 1.82e-23 | 0.0679 | 42 | 4 | 174 | 0.239 (0.013) | |
| ED | 0.1652 | 1.29e-02 | 0.0433 | 94 | 2 | 5 | 0.258 (0.035) | |
| ASD | 0.4466 | 6.97e-25 | 0.1441 | 76 | 5 | 17 | 0.147 (0.015) | -ᵈ |
| ADHD | 0.5573 | 1.33e-50 | 0.1703 | 167 | 5 | 82 | 0.198 (0.018) | |
| anxiety | 0.7851 | 1.87e-32 | 0.0341 | 120 | 5 | 106 | 0.284 (0.019) | |
| insomnia | 0.4706 | 1.75e-44 | 0.004 | 13 | 3 | 40 | 0.083 (0.006) | |
| alcohol | 0.5893 | 4.23e-09 | 0.0389 | 3 | 1 | 0 | 0.064 (0.016) | |
| cannabis | 0.2433 | 8.61e-09 | -0.0009 | 608 | 7 | 24 | 0.122 (0.009) | |
| SA | 0.5639 | 8.00e-04 | 0.0069 | 0 | 14ᶜ | 0 | 0.056 (0.023) | |
| PTSD | 0.6095 | 7.70e-03 | 0.0064 | 0 | 9ᶜ | 0 | 0.034 (0.024) | |
| OCD | 0.2272 | 5.00e-04 | 0.0103 | 0 | 12ᶜ | 1 | 0.344 (0.050) | |
| Recurrent probable depression | 1.1036 | 7.99e-12 | 0.0617 | 177 | 4 | 110 | 0.505 (0.033) | |
| seen GP for depression | 0.9441 | 6.84e-287 | 0.0978 | 22 | 3 | 72 | 0.087 (0.007) | |
| longest depression | 1.0821 | 1.21e-06 | 0.0286 | 3 | 3 | 0 | 0.020 (0.003) | |
| 3. Neuroticism vs. psychiatric disorders | | | | | | | | |
| anxiety | 0.7401 | 7.24e-49 | 0.1617 | 4,671 | 57 | 1,232 | 0.161 (0.006) | |
| SCZ | 0.2293 | 1.10e-20 | 0.0102 | 1,733 | 1,214 | 1 | 0.013 (0.001) | |
| MDD | 0.7507 | 6.54e-182 | 0.0644 | 65 | 20 | 4 | 0.016 (0.001) | |
| alcohol | 0.3754 | 7.66e-08 | 0.008 | 37 | 3 | 40 | 0.022 (0.004) | |
| BPD | 0.1748 | 2.36e-02 | 0.0056 | 25 | 2 | 30 | 0.398 (0.033) | |
| MDD | 0.4957 | 1.33e-08 | 0.0072 | 0 | 11ᶜ | 0 | 0.131 (0.022) | -ᵈ |
| 5. Four other comparisons | | | | | | | | |
| SCZ vs. BPD | 0.6903 | 2.13e-181 | 0.1403 | 45 | 3 | 144 | 0.202 (0.015) | |
| ASD vs. ADHD | 0.3879 | 4.89e-17 | 0.3444 | 335 | 7 | 88 | 0.400 (0.033) | |
| alcohol vs. cannabis | 0.1482 | 5.20e-02 | 0.022 | 130 | 2 | 0 | 0.132 (0.019) | |
| anxiety vs. SA | 0.0705 | 7.31e-01 | 0.0068 | 1 | 1 | 0 | 0.072 (0.036) | |

ᵃ MDD: major depressive disorder; SCZ: schizophrenia; BPD: bipolar disorder; ED: eating disorder (anorexia nervosa); PTSD: posttraumatic stress disorder; OCD: obsessive-compulsive disorder; ASD: autism spectrum disorder; ADHD: attention deficit hyperactivity disorder; Used cannabis: ever used cannabis; Longest depression: longest period of depression; Seen GP for depression: seen general practitioner (GP) for nerves, anxiety, tension or depression; Severe depression: Bipolar and major depression status: Probable Recurrent major depression; SA: suicide attempts.
ᵇ Sig. SNPs: SNPs with nominal p-values below 5e-08; Sig. genes: genes with FDR-adjusted p-value below 0.05 in GWGAS; LDSC/SumHer-h²: liability-scale SNP-based heritability calculated by the LDSC and SumHer programs, respectively.
ᶜ The corresponding genomic risk loci are constructed from lead SNPs with p-value below 5e-06. Values of AUC above 0.70 are in bold.
ᵈ SumHer returns negative estimates, hence we present the results from LDSC only.
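Table 3 counts a gene as significant when its FDR-adjusted p-value is below 0.05, and Table 4 reports the same adjusted values. The notes do not name the FDR procedure; assuming the common Benjamini-Hochberg step-up method, a minimal sketch of the adjustment follows. The function name and the gene p-values in the usage lines are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg FDR-adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values never decrease
    # with rank (running minimum of p * m / rank).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical gene-based p-values, not taken from this study.
    gene_p = {"GENE_A": 1e-6, "GENE_B": 0.002, "GENE_C": 0.04, "GENE_D": 0.3}
    fdr = benjamini_hochberg(list(gene_p.values()))
    significant = [g for g, q in zip(gene_p, fdr) if q < 0.05]
    print(significant)
```

In a genome-wide gene-based analysis the same call would be applied to all tested genes at once, since the adjustment depends on the total number of tests m.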

Table 4. Top 5 genes from each comparison based on gene-based analysis using MAGMA.

| Comparison | Top 5 genes | P | FDR-adjusted P |
|---|---|---|---|
| SCZ vs. MDD | PPP1R16B, HIST1H4L, DPYD, PITPNM2, NGEF | <5.44E-12 | <1.99E-08 |
| BPD vs. MDD | HAPLN4, TRANK1, VPS9D1, MAD1L1, NDUFA13 | <4.24E-07 | <1.33E-03 |
| MDD vs. ASD | MACROD2, XRN2, WDPCP, EGR2, FZD5 | <4.00E-06 | <1.27E-02 |
| MDD vs. ADHD | CDH8, MEF2C, KDM4A, PTPRF, KCNH3 | <2.02E-07 | <7.48E-04 |
| MDD vs. ED | ERBB3, SUOX, FAM19A2, CRTC3, RAB5B | <5.58E-06 | <2.09E-02 |
| MDD vs. anxiety | BTN3A2, HIST1H2BN, PTPN1, ZKSCAN4, PGBD1 | <3.54E-08 | <1.33E-04 |
| MDD vs. insomnia | BTN3A2, HIST1H2BN, ZSCAN9, SYNGAP1, RAB1B | <4.98E-07 | <3.81E-03 |
| MDD vs. alcohol | MTFR1, ATF6B, KREMEN2, SLC25A52, ALPK1 | <1.35E-04 | <4.47E-01 |
| MDD vs. cannabis | CADM2, C10orf32-ASMT, AS3MT, ACTL8, ARID1B | <3.80E-07 | <1.43E-03 |
| MDD vs. SA | NCL, ST8SIA5, COA4, PDE4B, SLBP | <1.46E-04 | <3.99E-01 |
| MDD vs. PTSD | ATP6V1E1, MYO5B, ZYG11A, GNA15, UBA3 | <3.04E-04 | <8.89E-01 |
| MDD vs. OCD | KIT, PLAG1, FGF19, PPIG, TXNL1 | <3.31E-05 | <8.88E-02 |
| MDD vs. severe depression | HIST1H2BN, BTN3A2, ZKSCAN4, PGBD1, PTPN1 | <3.89E-08 | <1.47E-04 |
| MDD vs. seen GP for depression | PTPN1, BTN3A2, ZKSCAN4, HIST1H2BN, PGBD1 | <1.54E-07 | <5.81E-04 |
| Longest depression vs. MDD | FBXW4, C11orf42, AC079602.1, ANAPC11, HIST1H2BM | <1.56E-03 | <8.33E-01 |
| Neuroticism vs. anxiety | STH, WNT3, SPPL2C, CRHR1, MAPT | <4.16E-22 | <1.57E-18 |
| Neuroticism vs. SCZ | DNAJC19, CCL20, OR7D4, PBX2, REG3A | <5.65E-05 | <2.01E-01 |
| Neuroticism vs. MDD | MAPT, WNT3, CRHR1, KANSL1, NSF | <1.73E-05 | <6.50E-02 |
| Neuroticism vs. alcohol | BTN3A2, HIST1H2BN, ZSCAN9, SYNGAP1, RAB1B | <4.98E-07 | <3.81E-03 |
| Psychotic experiences vs. SCZ | SPATS2L, HIST1H4L, UBD, OR2B2, HIST1H2BN | <6.47E-10 | <2.02E-06 |
| Psychotic experiences vs. BPD | SPATS2L, HAPLN4, TM6SF2, NDUFA13, CTC-260F20.3 | <3.73E-07 | <1.24E-03 |
| Psychotic experiences vs. MDD | FAM168A, SHPRH, SPAM1, ADRB2, POC1B | <1.48E-04 | <3.66E-01 |
| SCZ vs. BPD | ZKSCAN3, ZSCAN31, PGBD1, HYDIN, ZSCAN12 | <6.52E-08 | <2.38E-04 |
| ADHD vs. ASD | XKR6, KDM4A, C8orf12, RP1L1, MSRA | <1.52E-08 | <5.65E-05 |
| Cannabis vs. alcohol | HS6ST1, ABHD14A-ACY1, ACY1, ENTPD4, AKR1C2 | <6.81E-05 | <2.10E-01 |
| Anxiety vs. SA | PDE4B, DNAJC6, LAMA2, NCL, RAVER1 | <3.96E-05 | <1.45E-01 |

Genes with FDR<0.05 are in bold.

Table 5. Ability to discriminate psychiatric disorders by polygenic risk scores (PRS)

Comparison Discriminating Ability

Relative risk of the 1st disorder comparing the top against the bottom percentiles (based on existing GWAS) AUC Polygenic risk scores (based on existing GWAS) Top 5th vs. lowest 5th Top 10th vs. lowest 10th Top 20th vs. lowest 20th Top 30th vs. lowest 30th (Max)

Best P-thres

AUC percentile percentile percentile percentile SCZ vs. MDD