[PDF] Statistical Methods for cis-Mendelian Randomization

Abstract

Mendelian randomization is the use of genetic variants to assess the existence of a causal relationship between a risk factor and an outcome of interest. In this paper we focus on Mendelian randomization analyses with many correlated variants from a single gene region, and particularly on cis-Mendelian randomization studies which uses protein expression as a risk factor. Such studies must rely on a small, curated set of variants from the studied region; using all variants in the region requires inverting an ill-conditioned genetic correlation matrix and results in numerically unstable causal effect estimates. We review methods for variable selection and causal effect estimation in cis-Mendelian randomization, ranging from stepwise pruning and conditional analysis to principal components analysis, factor analysis and Bayesian variable selection. In a simulation study, we show that the various methods have a comparable performance in analyses with large sample sizes and strong genetic instruments. However, when weak instrument bias is suspected, factor analysis and Bayesian variable selection produce more reliable inference than simple pruning approaches, which are often used in practice. We conclude by examining two case studies, assessing the effects of LDL-cholesterol and serum testosterone on coronary heart disease risk using variants in the HMGCR and SHBG gene regions respectively.

Full PDF

SStatistical Methods for cis-Mendelian Randomization

Apostolos Gkatzionis , ∗ , Stephen Burgess , , Paul J Newcombe MRC Biostatistics Unit, University of Cambridge, UK. MRC Integrative Epidemiology Unit, University of Bristol, UK. Department of Public Health and Primary Care, School of Clinical Medicine,University of Cambridge, UK.

Abstract

Mendelian randomization is the use of genetic variants to assess the existence of a causal re-lationship between a risk factor and an outcome of interest. In this paper we focus on Mendelianrandomization analyses with many correlated variants from a single gene region, and particu-larly on cis-Mendelian randomization studies which uses protein expression as a risk factor. Suchstudies must rely on a small, curated set of variants from the studied region; using all variantsin the region requires inverting an ill-conditioned genetic correlation matrix and results in nu-merically unstable causal eﬀect estimates. We review methods for variable selection and causaleﬀect estimation in cis-Mendelian randomization, ranging from stepwise pruning and conditionalanalysis to principal components analysis, factor analysis and Bayesian variable selection. In asimulation study, we show that the various methods have a comparable performance in analyseswith large sample sizes and strong genetic instruments. However, when weak instrument bias issuspected, factor analysis and Bayesian variable selection produce more reliable inference thansimple pruning approaches, which are often used in practice. We conclude by examining twocase studies, assessing the eﬀects of LDL-cholesterol and serum testosterone on coronary heartdisease risk using variants in the

HMGCR and

SHBG gene regions respectively.

Introduction

Mendelian randomization (MR) is the use of genetic data in order to assess the existence of acausal relationship between a modiﬁable risk factor and an outcome of interest (Davey Smith andEbrahim, 2003; Burgess and Thompson, 2015). It is an application of instrumental variablesanalysis in the ﬁeld of genetic epidemiology, where genetic variants are used as instruments.The approach has received much attention in recent years and has been used to identify a largenumber of causal associations in the literature (Boef et al., 2015). For example, MR studies havestrengthened the evidence for a causal link between lipoprotein(a) and coronary heart disease(CHD) (Kamstrup et al., 2009), but have weakened the evidence for an eﬀect of C-reactiveprotein on CHD risk (CCGC (CRP CHD Genetics Collaboration), 2011).A Mendelian randomization analysis requires a set of genetic variants that satisfy the in-strumental variable assumptions: genetic variants should be (i) associated with the risk factor ∗ Corresponding author. Email: [email protected] a r X i v : . [ q - b i o . Q M ] J a n f interest, (ii) independent of confounders of the risk factor-outcome association, and (iii) theyshould only inﬂuence the outcome through their eﬀect on the risk factor (the no-pleiotropyassumption). Under these three assumptions, MR oﬀers a framework for assessing whether therisk factor is causally related to the outcome; under additional modelling assumptions, one canalso estimate the magnitude of the risk factor-outcome causal eﬀect. XG YU (cid:88) x x (1)(2) (3)

Figure 1: A causal diagram representation of the three assumptions of Mendelian randomization.Here, X represents the risk factor, Y the outcome, G the genetic instrument and U denotes con-founders of the X − Y relationship. Traditionally, Mendelian randomization studies are implemented using independent geneticvariants from across the whole genome. This is especially the case when studying complexpolygenic traits such as body mass index, cholesterol or blood pressure. Such genome-wideanalyses rely on published results from large-scale GWAS studies to identify large numbersof genetic regions associated with the risk factor studied. Genetic variants in each region arepruned for independence and only the variant with the smallest p-value, or a small number ofweakly correlated variants with small p-values are selected. Variants from diﬀerent regions arethen combined to create a genome-wide set of instruments for MR, thus increasing the powerof the analysis to detect a causal relationship.Conversely to the more traditional type of genome-wide MR described above, recent yearshave seen an increased number of MR studies based on a single gene region, and particularlycis-Mendelian randomization studies (Schmidt et al., 2019; Gill et al., 2020). Cis-MR uses aprotein as a risk factor, and investigates the gene region encoding the studied protein in order toidentify variants that regulate protein expression. Mendelian randomization is then conductedusing these variants as instruments, in order to assess the eﬀects of protein expression on oneor more disease outcomes. Cis-MR studies have grown in popularity because they can be usedfor drug target discovery and validation (Gill et al., 2020). Since proteins are the drug targetsof most medicines, an MR investigation of the eﬀects of protein expression on a disease canprovide insights on whether a drug targeting the protein may reduce the risk of suﬀering fromthe disease.Cis-MR studies typically rely on a single gene region, and therefore have to select geneticinstruments among a potentially large number of correlated variants. Through the existing cis-MR literature, a variety of techniques have been employed to accomplish this task. However, todate no review or comparison of the diﬀerent methods has been published, and applied cis-MRanalyses typically use some form of LD-pruning for variable selection. The aim of this paperis to provide an overview of statistical methods for cis-MR studies with summary-level data.After reviewing commonly used approaches such as LD-pruning and conditional analysis, wediscuss the use of principal components analysis (PCA), factor analysis and stochastic-searchvariable selection. The use of PCA was proposed by Burgess et al. (2017) for MR with correlated nstruments and can be directly extended to cis-MR. The use of factor analysis was recentlyproposed by Patel et al. (2020), building on similar methods for instrumental variables analysiswith individual-level data (Bai and Ng, 2002). Stochastic search can be implemented usingthe JAM algorithm, originally proposed by Newcombe et al. (2016) for ﬁne-mapping denselygenotyped regions and recently adapted for MR by Gkatzionis et al. (2020). We discuss how toimplement each method in the context of cis-MR and compare their performance in simulationstudies.We also provide two real-data applications, investigating associations of variants in the SHBG and

HMGCR regions with coronary heart disease risk. The

HMGCR gene encodesthe protein HMG-coenzyme A reductase, which is known to play an important role in thecholesterol-biosynthesis pathway. Its inhibition with statins is known to reduce LDL-cholesterollevels, therefore statin treatment is often prescribed to individuals in high CHD risk. The

SHBG region encodes the protein sex-hormone binding globulin, and is known to contain geneticassociations with testosterone and other sex hormones; there is some uncertainty on whethertestosterone levels are associated with CHD risk, although a recent MR study has suggested nocausal relationship (Schooling et al., 2018).

An Overview of Cis-Mendelian Randomization

Drug development is a challenging and costly process with uncertain results, and the failure ratefor drugs in clinical development can be as high as 90% (Hay et al., 2014). Since proteins aretypically the targets of common drugs, cis-Mendelian randomization uses expression data fordruggable proteins as risk factors, in order to aid in drug target discovery and validation. Thewealth of available genetic data, combined with the ability to routinely implement MR analysesusing existing software (Walker et al., 2019; Yavorska and Burgess, 2017) makes cis-MR a usefulapproach to complement and inform drug development. This can be evidenced by the increasedsuccess rate of drug development programs with genomic support (Nelson et al., 2015).Cis-MR studies can be used in order to assess the suitability of proteins as potential drugtargets, predict and inform the results of clinical trials, or investigate repurposing opportunitiesand potential side-eﬀects of existing drugs. For example, a Mendelian randomization analysisof LDL cholesterol and cardiovascular disease risk based on the

ACLY region (Ference et al.,2019) was recently used alongside a clinical trial (Ray et al., 2019) to assess the suitability ofthe

ACLY region as a potential drug target for lowering LDL-cholesterol. Similar analyses havebeen conducted using the

CETP region to instrument lipid fractions and blood pressure (Sofatet al., 2010), using the

IL6R region to assess the eﬀect of interleukin 6 reduction on coronaryheart disease risk (The Interleukin-6 Receptor Mendelian Randomisation Analysis (IL6R MR)Consortium, 2012), and using the

HMGCR (Swerdlow et al., 2015) and

PCSK9 regions (Schmidtet al., 2017) to explore potential eﬀects of LDL-cholesterol lowering treatments on body massindex and type 2 diabetes risk.Ideally, cis-MR should be conducted using protein expression data for the target proteinrisk factor (Gill et al., 2020). However, data on protein expression are not always available,and researchers often use either gene expression data for the associated gene region or geneticassociations with a downstream biomarker instead. For example, Swerdlow et al. (2015) usedLDL-cholesterol as a downstream risk factor in order to study the eﬀects of HMGCR inhibitionon the risk of type 2 diabetes.Genetic variants for cis-MR are selected from a region containing the protein-encoding gene.The gene region is deﬁned to contain the target gene, as well as variants within a few hundred housand base pairs on either side of the gene, in order to include possible cis-eQTL variants.Within the limits of a gene region, it is often possible to identify hundreds of correlated variantsmarginally associated with protein expression. Researchers must then select which of thesevariants to incorporate in a subsequent MR analysis; including all available variants is usuallyavoided as it can introduce numerical approximation errors in estimating the MR causal eﬀect(Burgess et al., 2017). Some authors have suggested prioritizing variants based on functionalannotations; however, Schmidt et al. (2019) investigated this strategy in a simulation study andfound little to no diﬀerence in causal inferences conducted using functional variants comparedto non-coding variants. Therefore, the selection of genetic variants is usually performed basedon the strength of their associations with the risk factor, while accounting for LD correlationto reduce numerical approximation errors. In the next section, we review statistical approachesfor implementing this variable selection procedure.Finally, similar to standard Mendelian randomization analyses, genetic variants used forcis-MR must satisfy the instrumental variables assumptions. Violations of the instrumentalvariable assumptions are often a concern because they have the potential to devalidate tests ofcausal association between the risk factor and the outcome and to bias causal eﬀect estimates.However, violations of the no-pleiotropy assumption are less likely to occur when studyingprotein expression, because proteins are causally upstream of metabolites and biomarkers, whichconstitute the most common risk factors used in traditional MR analyses (Swerdlow et al., 2016;Schmidt et al., 2019). Statistical Methods for cis-Mendelian Randomization

Notation

We now present various approaches for selecting genetic variants and computing causal eﬀectestimates in cis-MR studies. Let X denote the risk factor of interest, Y denote the outcome, G , . . . , G P denote a set of P putative genetic instruments from a single gene region, and let U represent confounders of the risk factor-outcome association. Also, let θ denote the causaleﬀect of the risk factor on the outcome. Our task is to obtain a subset of the genetic variantsto use as instruments, an estimate ˆ θ of the causal eﬀect parameter and an associated standarderror se(ˆ θ ).We work under the framework of two-sample summary-data MR and assume the availabilityof two sets of summary statistics, one for SNP-risk factor and one for SNP-outcome associations,obtained from diﬀerent datasets. We use ˆ β Xj , ˆ β Y j to denote the marginal associations betweengenetic variant G j and the two traits, and ˆ σ Xj , ˆ σ Y j for the corresponding standard errors. Inaddition, p Xj denotes the p-value of association between variant G j and the risk factor. Finally,we assume that we have access to a reference dataset, such as 1000 genomes or the UK Biobank,from which to estimate correlations between genetic variants. Single-Variant MR

Some studies in the cis-MR literature have used only the variant with the smallest univariatep-value in a region as an instrument (e.g. Sofat et al., 2010; Swerdlow et al., 2015). With asingle genetic variant, the MR causal eﬀect can be estimated using a ratio formula. This avoidsthe need to account for genetic correlations and to perform complex numerical tasks such asinverting the genetic covariance matrix. On the other hand, if the region contains multiple ariants with independent associations with the risk factor, a single-variant MR analysis willnot be able to model all the genetic eﬀects in the region. This typically translates into a lossof power for the cis-MR analysis. Therefore, most recent cis-MR papers have utilised variableselection or dimension reduction approaches in order to include several variants per region. LD-Pruning

Perhaps the most common approach in the literature for selecting genetic variants for inclusioninto a cis-MR study is LD-pruning (see e.g. Dudbridge and Newcombe, 2015; Schmidt et al.,2019). Given a set of candidate genetic variants G , . . . , G P , the stepwise pruning procedureiterates between: • Selecting the variant with the smallest p-value and including it in the analysis. • Identifying all variants whose correlation with the previously selected variant exceeds acorrelation threshold ρ and removing them from further consideration.This is repeated until all the candidate variants have either been selected for inclusion or dis-carded. Alternatively, the process may be stopped when there are no remaining variants withp-values lower than a signiﬁcance threshold τ , in order to guard against the inclusion of weakinstruments into the analysis. The threshold τ is often taken to be the GWAS signiﬁcancethreshold τ = 5 × − , although there is some evidence that using more relaxed thresholdsmay be beneﬁcial (Dudbridge, 2013).Once LD-pruning is implemented and a suitable set of genetic instruments is selected, theMR causal eﬀect can be estimated using an inverse variance weighted (IVW) estimate. If thecorrelation threshold ρ is small enough, the selected variants can be considered approximatelyindependent and the standard IVW estimate,ˆ θ ivw = (cid:80) Pj =1 ˆ β Xj ˆ β Y j ˆ σ − Y j (cid:80) Pj =1 ˆ β Xj ˆ σ − Y j se(ˆ θ ivw ) = 1 (cid:80) Pj =1 ˆ β Xj ˆ σ − Y j can be used. This approach is typically used in genome-wide MR analyses. However, forsingle-region MR, the use of a small correlation threshold may result in the exclusion of causalvariants, especially when studying gene regions with multiple independent associations with therisk factor. On the other hand, if a larger threshold is used, the standard IVW estimate willunderestimate the causal standard error, since it does not adjust for correlations. In this case itis better to use a modiﬁed version of of the IVW estimator that takes genetic correlations intoaccount (Burgess et al., 2016): ˆ θ cor = (cid:16) ˆ β TX Ω − ˆ β X (cid:17) − (cid:16) ˆ β TX Ω − ˆ β Y (cid:17) (1)se(ˆ θ cor ) = (cid:16) ˆ β TX Ω − ˆ β X (cid:17) − (2)where Ω is (an estimate of) the genetic covariance matrix: Ω i,j = Cov( G i , G j ). The estimator(1) can be motivated from the literature on meta-analysis. It is sometimes referred to asthe generalized least squares method (Schmidt et al., 2019) because it can be derived usinggeneralized linear regression to model the genetic associations with the outcome in terms of thegenetic associations with the risk factor (Burgess et al., 2016, 2017). he correlated-instruments IVW estimator requires the genetic covariance matrix Ω to becomputed and inverted, a process which can become numerically unstable if highly correlatedvariants are used (Burgess et al., 2017). This is the main reason why cis-MR analyses cannotuse all available variants within a region. Numerical instabilities are less likely to occur whenthe IVW estimator is used in combination with pruning, since the pairwise correlations ofvariants selected by LD-pruning are all lower than the speciﬁed correlation threshold ρ , althoughproblems can still occur if the genetic covariance matrix is near-singular.A potential criticism for the LD-pruning approach is that there is no consensus in theliterature on how to select the correlation threshold. Implementations of the method usingdiﬀerent correlation thresholds can give rise to diﬀerent results, especially when large thresholdsare used (Burgess et al., 2017). To address this issue, it has been proposed that stepwise pruningis implemented with a variety of pruning thresholds, as a form of sensitivity analysis (Schmidtet al., 2019). In addition, stepwise methods for variable selection have been criticised for beingunstable and getting stuck in local maxima of the model space (Hocking, 1976; Tibshirani,1996; Miller, 2002) and more ﬂexible variable selection approaches are often preferred in someapplications of biostatistics (Newcombe et al., 2016; Benner et al., 2016; Asimit et al., 2019),although it is not clear whether the added ﬂexibility provides a substantial beneﬁt in Mendelianrandomization studies. Conditional Analysis

Conditional analysis is a process similar to LD-pruning, except variant selection is based ongenetic associations with the risk factor conditional on any previously selected variants insteadof univariate marginal p-values. Perhaps the most established algorithm for conditional analysisis the Conditional and Joint (CoJo) algorithm proposed in Yang et al. (2012). The algorithmstarts by selecting the genetic variant with the smallest univariate p-value, and then in eachiteration it iterates between the following steps: • Calculate the p-values of all candidate genetic variants conditional on any variants thathave already been selected in previous steps. As with stepwise pruning, the p-values areonly computed for genetic variants whose correlations with previously selected variants donot exceed a pre-speciﬁed correlation threshold ρ . • Select the variant with the smallest conditional p-value, provided that its p-value is smallerthan a signiﬁcance threshold τ . • Regress the risk factor on all previously selected variants and compute the multivariatep-values for each selected variant. • If any of the p-values in the multivariate model is larger than the signiﬁcance threshold τ ,drop the variant with the largest p-value from the model.This variable selection algorithm is run until it can no longer add or remove any variants.Upon ﬁnishing, it provides a set of genetic variants which can subsequently be used to con-duct Mendelian randomization. A causal eﬀect estimate based on the selected variants is thenobtained using correlated-instruments IVW (1)-(2).Yang et al. (2012) provide details on how to compute the conditional and joint-model p-values using marginal summary statistics. The fact that the algorithm only requires summary-level data, along with its implementation as part of the GCTA software, have contributed toconditional analysis becoming a popular option for selecting variants from single genetic regions. n practice, conditional analysis often has a similar performance to LD-pruning (Dudbridgeand Newcombe, 2015). Both approaches rely on a correlation threshold parameter ρ and canproduce inconsistent results for diﬀerent values of the correlation parameter, especially for largevalues (e.g. ρ ≥ . Principal Component Analysis

Principal components analysis (PCA) is widely used in GWAS studies to adjust for populationstratiﬁcation. In the context of correlated-instruments summary-data MR, the use of princi-pal components was proposed by Burgess et al. (2017). The method aims to identify linearcombinations of genetic variants that are orthogonal to each other and explain as much of thevariance in the genetic dataset as possible. These linear combinations are then used as geneticinstruments to assess the causal relationship between the risk factor and outcome, using anIVW estimate.In Burgess et al. (2017), principal components analysis is applied to a transformed versionΨ of the covariance matrix, with elementsΨ ij = ˆ β Xi ˆ β Xj ˆ σ − Y i ˆ σ − Y j ρ ij = ˆ β Xi ˆ β Xj ˆ σ − Y i ˆ σ − Y j Ω ij where ρ ij = Cor( G i , G j ). The matrix Ψ is preferred over the untransformed matrix of geneticcorrelations because its entries depend on the precisions ˆ σ − Y j of univariate estimates and thus, iftwo genetic variants are almost perfectly correlated, the variant with the highest precision willbe prioritized.Principal components analysis uses an eigendecomposition for the matrix Ψ and identiﬁesthe diagonal matrix of eigenvalues Λ = ( λ ii ) and the corresponding matrix of eigenvectors W so that Ψ = W T Λ W . By transforming the summary statistics¯ β X = W T ˆ β X , ¯ β Y = W T ˆ β Y and the genetic covariance matrix ¯Ω = W T Ω W one can rewrite the IVW estimate (1) in terms of these transformed vectors:ˆ θ cor = (cid:0) ¯ β TX ¯Ω − ¯ β X (cid:1) − (cid:0) ¯ β TX ¯Ω − ¯ β Y (cid:1) (3)Evaluating (3) is subject to the same numerical diﬃculties as (1). However, one can constructa numerically stable approximation for ˆ θ cor by using only the ﬁrst k principal components. If˜ W is the sub-matrix of W containing only the ﬁrst k columns and ˜ β X = ˜ W T ˆ β X , ˜ β Y = ˜ W T ˆ β Y ,˜Ω = ˜ W T Ω ˜ W , then the causal eﬀect estimate based on the ﬁrst k principal components isˆ θ cor ≈ (cid:16) ˜ β TX ˜Ω − ˜ β X (cid:17) − (cid:16) ˜ β TX ˜Ω − ˜ β Y (cid:17) The number k of principal components to use is a tuning parameter for the method. In practice,it is often speciﬁed so that the selected principal components explain a speciﬁc proportion ofvariation in the genetic dataset. The criterion used in Burgess et al. (2017) is to select principalcomponents that explain either 99% or 99 .

9% of the variation in the data. he JAM Algorithm Joint analysis of marginal summary statistics (JAM) is a Bayesian stochastic-search variableselection algorithm that was introduced by Newcombe et al. (2016) for the purpose of statisticalﬁne-mapping from summary GWAS results. The algorithm aims to identify genetic variantsrobustly and independently associated with a trait of interest, among a large set of correlatedcandidate variants. JAM only requires the availability of GWAS summary-level data, andgenetic correlations which can be estimated from an external reference dataset. Using thesesummary data, JAM identiﬁes sets of genetic variants that are most plausibly associated withthe trait of interest.The algorithm uses linear regression to model trait X in terms of genetic variants G =( G , . . . , G P ): X ∼ N (cid:0) Gb, σ X (cid:1) (4)where b denotes the vector of multivariate associations between each genetic variant and thetrait (in contrast to β X , β Y , which represent marginal associations). In order to implement thisregression using summary statistics, JAM employs a transformation z = G T X , for which (4)implies that z ∼ N (cid:0) G T Gb, G T Gσ X (cid:1) (5)Assuming Hardy-Weinberg equilibrium and that the trait measurements and genetic data havebeen centered, each element z j of the vector z can be approximated using the marginal summarystatistics ˆ β Xj and eﬀect allele frequencies f j , z j = 2 f j (1 − f j ) N ˆ β Xj where N is the sample size from which the G-X associations are inferred. In practice, anadditional Cholesky decomposition is implemented to avoid modelling under the correlated errorstructure of (5); we refer to Newcombe et al. (2016) for the technical details. In addition, notethat G T G is ( N − (cid:96) ( z, G ref | b, σ X ).A similar likelihood is used to conduct the joint analysis step in Yang et al. (2012). JAMemploys Bayesian analysis instead, and uses conjugate normal-inverse-gamma g-priors p ( b, σ X )to facilitate Bayesian inference for the genetic associations with the trait.If the trait studied is binary, z can be derived by mapping univariate log-odds ratios tothe univariate eﬀects that would have been estimated if the binary trait was modelled by linearregression, as has been done in other linear-based summary data algorithms (Vilhjalmsson et al.,2015).The likelihood (5) relies on a ﬁxed set of genetic variants. For variable selection, we assumethat the likelihood has been conditioned on a speciﬁc set of variants γ and use a Beta-Binomialprior p ( γ ) for the model coeﬃcient in order to obtain the posterior p ( γ, b, σ X | z, G ref ) ∝ (cid:96) ( z, G ref | b, σ X , γ ) p ( b, σ X | γ ) p ( γ ) (6)This posterior is diﬃcult to evaluate analytically, but JAM implements stochastic-search modelselection via a reversible-jump Markov Chain Monte Carlo (RJMCMC) algorithm to samplefrom (6) and identify the most suitable sets of genetic variants. The stochastic search procedurestarts from a set γ (0) containing only the variant with the smallest p-value. Then in eachiteration, given a current set of genetic variants γ ( k ) , JAM randomly proposes a new set γ ( k +1) by either Adding a new variant to the current set γ ( k ) . • Removing a variant from γ ( k ) . • Swapping a variant in γ ( k ) for a variant not in γ ( k ) .The algorithm then decides whether to accept the new set by evaluating how well it ﬁts thetrait, according to the posterior (6). The process is repeated for a large number of iterations,and the various sets of variants are assigned posterior probabilities according to how often theywere visited. The stochastic search procedure allows JAM to evaluate a wide range of diﬀerentmodels and eﬃciently explore the parameter space of genetic conﬁgurations.For Mendelian randomization, the above procedure can be used to identify variants stronglyand independently associated with the risk factor of interest (Gkatzionis et al., 2020). An IVWcausal eﬀect estimate can then be obtained for each set of variants visited by JAM, and thesemodel-speciﬁc estimates can be combined into an aggregate causal eﬀect estimate by modelaveraging.Since it jointly models all available genetic variants and can account for genetic correlations,JAM is naturally suited for cis-MR. The algorithm discards variants which are associated withthe risk factor only through their correlation with other present variants, and its posteriormodels include variants independently associated with the risk factor. JAM is not completelyfree of numerical issues when implemented on near-collinear variants, but these issues can beovercome by employing pruning as a ﬁrst step before calling the algorithm. An advantage overthe LD-pruning approach is that JAM is more robust to the choice of pruning thresholds, sinceit implements a second layer of variable selection after pruning is ﬁnished. The pruning step canbe implemented using a high correlation threshold (e.g. ρ = 0 .

9) and no signiﬁcance threshold.As an alternative, JAM can avoid the need for pruning by using a ridge term for the geneticcovariance matrix to make its inversion more stable. The choice of a pruning threshold is thenreplaced by the tuning of the ridge term parameter. Here we focus on implementations usingpruning as a ﬁrst step and do not explore the use of a ridge term.Outside of the context of cis-MR, Gkatzionis et al. (2020) extended JAM to handle violationsof the no-pleiotropy assumption when working with multiple independent genetic variants, byaugmenting its variable selection with a heterogeneity loss function to penalize and downweightpleiotropic variants. Furthermore, Jiang et al. (2020) proposed a hierarchical version of thealgorithm that is useful for multivariable MR, as well as transcriptome-wide association studies.

Factor-based Methods

Recently, Patel et al. (2020) proposed the use of factor analysis for MR with correlated weakinstruments. The authors model the genetic variants in terms of a factor model, G = Λ f + (cid:15) f (7)where f is a vector of k < P latent factors, Λ is a matrix of factor loadings and (cid:15) f is a vectorof idiosyncratic errors. The matrix Λ can be estimated using the ﬁrst k eigenvectors in theeigendecomposition of the (rescaled) genetic covariance matrix G T G . This covariance matrixand its eigendecomposition can be approximated if one has access to a reference dataset. Thenumber k of latent factors can be selected either empirically, as is common in factor analysis,or in a more disciplined way, for example using the factor penalization method of Bai and Ng(2002). he factors are then used as genetic instruments in a standard MR model: X = b TX f + (cid:15) X Y = b TY f + (cid:15) Y β Y = θβ X where (cid:15) X , (cid:15) Y are mean-zero error terms. An estimate of the causal eﬀect parameter θ can beobtained by limited-information maximum likelihood (LIML), minimizing the criterionˆ g ( θ ) = 1 P ˆΛ T ˆ D Y ˆ β Y − θ P ˆΛ T ˆ D X ˆ β X where ˆΛ is the estimated matrix of factor loadings and ˆ D X , ˆ D Y are the diagonal matrices whosediagonal terms are the sample variances of each genetic instrument; these can be approximatedfrom the reference dataset. Minimization of (a weighted version of) ˆ g ( θ ) yields the factor LIML(F-LIML) estimate.Patel et al. (2020) describe conditions that guarantee the consistency and asymptotic nor-mality of the F-LIML estimator. Unfortunately, these conditions are often violated under a weakinstruments framework and F-LIML may suﬀer from weak instrument bias. In this scenario,Patel et al. (2020) propose a range of hypothesis tests, using the factors as genetic instruments,in order to assess the existence of a causal eﬀect. These tests are derived from the instrumentalvariables literature and adapted to work with summary-level data. They include the Anderson-Rubin (Anderson and Rubin, 1949), Lagrange multiplier (Kleibergen, 2002) and conditionallikelihood ratio (Moreira, 2003) tests. The various tests are more robust to weak instrumentbias than the F-LIML estimator, and have higher power than the principal components methodunder a weak instruments setting. Simulation Study

Simulation Design

To compare the various cis-MR approaches, we implemented a simulation study. We generateddata under the following simulation model: X = P (cid:88) j =1 b Xj G j + α X U + (cid:15) X (8) Y = θX + α Y U + (cid:15) Y (9) U, (cid:15) X , (cid:15) Y ∼ N (0 , σ ) independently of each other. (10)Our simulation design was informed by the real-data applications in the next section. Weobtained genetic variants from two gene regions: the SHBG region, which encodes the proteinsex-hormone binding globulin and is known to be associated with sex hormone traits, and the

HMGCR region, which encodes the protein HMG coenzyme-A reductase, plays an importantrole in the metabolic pathway that produces cholesterol and is the drug target of statins. Werefer to the real-data applications for more information about these regions.

SHBG and

HMGCR represent two distinctly diﬀerent genetic correlation structures; correlation heat-maps for thetwo regions are provided in Figures 2 and 3 below. By basing simulations on two diﬀerent egions we guarded against spurious observations on comparative performance of the methodsthat may arise due to subtle properties of the correlation structure in any one region.We deﬁned the two gene regions as spanning 100Kb pairs on either side of the correspondinggene and used all variants in these regions for which individual-level data were available in theUK Biobank and the minor allele frequencies were higher than 1%. This resulted in P = 717genetic variants for the SHBG region and P = 590 variants for the HMGCR region. Weextracted individual-level genetic observations for the selected variants from the UK Biobank,using a sample of N ref = 367643 non-related individuals of European origin. Aiming to replicatea two-sample Mendelian randomization analysis, we then bootstrapped the UK Biobank dataand obtained two genetic matrices G , G of sizes N , N respectively. Using the genetic matrix G , we simulated risk factor measurements according to (8) and used them to obtain G − X summary statistics. Using the matrix G and (8)-(9), we simulated outcome measurements andused them to generate G − Y summary statistics. Finally, the reference correlation matrix wascomputed using the entire UK Biobank dataset.To generate risk factor measurements, we used information from our real-data applicationsand the literature. For the SHBG region, we assumed that six independent genetic eﬀectswere present in the region. This was inspired by Coviello et al. (2012) who investigated theassociations of

SHBG variants with concentration of the SHBG protein, and suggested thatas many as nine independent signals may be present in the region. Three of the nine causalvariants identiﬁed by Coviello et al. (2012) were not included in our dataset, as they were locatedmore than 100Kb pairs away from the

SHBG gene, therefore we used the remaining six variantsin our simulations. For these six variants, the eﬀects of the risk-increasing allele on the riskfactor were generated according to b Xj ∼ | N (0 , . | + 0 .

1. This created a pattern of univariateSNP-exposure associations similar to that we observed in our real-data application.For the

HMGCR region, we also assumed the existence of six variants independently as-sociated with the risk factor. The six causal signals were placed at genetic variants used byFerence et al. (2019) to construct a genetic score for LDL-cholesterol based on the

HMGCR region. Eﬀects were generated according to b Xj ∼ | N (0 , . | + 0 .

03 for the risk-increasingallele; again, this closely resembled the univariate summary statistics obtained in our real-dataapplication.We considered three diﬀerent values for the causal eﬀect parameter: θ = 0 , . , .

1. Weset the G − X sample size to be N = 10000, which is fairly small for modern MR standardsbut may be reasonable for a cis-MR study using protein expression data for the risk factor. Insupplementary material, we report simulation results using a larger sample size of N = 100000.The sample size for SNP-outcome associations was set equal to N = 180000, similar to that ofthe CARDIoGRAMplusC4D dataset (CARDIoGRAMplusC4D Consortium, 2015) used in ourreal-data applications. Finally, we assumed that all the genetic variants are valid instrumentsand did not generate pleiotropic eﬀects.One of our goals in the simulations was to assess the eﬀect of weak instruments bias onthe performance of the various methods. This form of bias can be of particular concern incis-MR studies, as focusing on a single region means that usually there will be far fewer geneticinstruments to use. A common diagnostic for weak instrument bias is to compute the F statisticfor the regression of the risk factor on all the genetic variants; a rule-of-thumb is that F ≤ SHBG region (Jin et al., 2012) and 2% for the

HMGCR region (Krauss et al., SHBG regionand 35 for the

HMGCR region. In this scenario, the eﬀect of weak instrument bias on theperformance of cis-MR methods should be small.In our second scenario, we reduced the proportion of variation in the risk factor that isexplained by the genetic variants to 10% of its value in the previous scenario. For the

SHBG region, we assumed that the genetic variants only explain v G = 0 .

3% of variation in the riskfactor, while for the

HMGCR region, the genetic variants explained 0 .

2% of variation. Thisresulted in a “weak instruments” scenario in which the “oracle” MR analysis had an average Fstatistic of 6 . SHBG region and 4 . HMGCR region.For each scenario, the simulation was replicated 1000 times per region and per value of thecausal eﬀect.

Methods

In our simulations we implemented the following cis-MR methods: • Single-instrument MR using only the minimum p-value variant in the region. • LD-pruning. • Principal components analysis. • The JAM algorithm. • The F-LIML algorithm. • The factor-based conditional likelihood ratio (CLR) test.The performance of the conditional method is often quite similar to that of stepwise pruning(Dudbridge and Newcombe, 2015), hence the method was not implemented. A brief comparisonbetween conditional analysis, stepwise pruning and PCA can be found in Burgess et al. (2017).Many of the above methods depend on additional tuning parameters. To assess the sensitiv-ity of each method to the speciﬁcation of its tuning parameter(s), we used a range of diﬀerentvalues. For stepwise pruning, we set a correlation threshold of ρ = 0 . , . , . , . , .

9. For thesigniﬁcance threshold, we use the standard GWAS threshold ( τ = 5 × − ) in simulations with“strong instruments”; with “weak instruments” there were simulations in which no variantspassed the GWAS signiﬁcance threshold, so we used a lower value ( τ = 10 − ) instead. Forthe PCA method, we used principal components that explained k = 0 .

99 or k = 0 .

999 of thevariation in the genetic data. For JAM, we implemented a pre-pruning step with a correlationthreshold of ρ = 0 . , . , . , .

95 and no signiﬁcance threshold - note that JAM does not needa signiﬁcance threshold for the pre-pruning step because its variable selection discards variantsweakly associated with the risk factor anyway. The algorithm was run for 1 million iterationsin each instance. Finally for the F-LIML method, we allowed the algorithm to determine thenumber of latent factors to use.The simulations were implemented in the statistical software R . Stepwise pruning and correlated-instruments IVW were implemented manually. To implement the PCA approach, we used codeprovided in the appendix of Burgess et al. (2017). The JAM algorithm was implemented usingthe R package R2BGLiMS . The F-LIML algorithm and the summary-data CLR test were im-plemented using code provided to us by the authors of the relevant publication (Patel et al.,2020). Stepwise pruning was the fastest algorithm to implement and the JAM algorithm wasthe slowest, although the diﬀerences were small on an absolute scale and none of the methodsrequired more than a few seconds to run for each dataset. he various MR methods in our simulations were subject to two sources of bias. The ﬁrstsource is weak instrument bias, which is known to act towards the direction of the observationalassociation in one-sample MR analyses and towards the null in two-sample MR (Burgess et al.,2011, 2016). As discussed earlier, weak instrument bias should be a concern in our secondsimulation scenario, where the F statistic is below 10. Note however that even in the ﬁrstscenario, weak instrument bias may still aﬀect methods using many nuisance variants to obtaina causal eﬀect estimate, as using such variants reduces the value of the F statistic.Second, the performance of various methods in our simulations can be aﬀected by numericalissues, in particular related to inverting the genetic covariance matrix G T G . Inaccurate estima-tion of ( G T G ) − may lead to spurious increases or decreases in genetic correlations. This formof bias can be detected by computing the condition number of the genetic covariance matrix,but its direction is not straightforward to assess. It is more likely to aﬀect methods using highlycorrelated variants, such as stepwise pruning with high correlation thresholds. Results

Simulation results are reported in Tables 1-2. For each method and each value of the causaleﬀect parameter, we report median causal eﬀect estimates and estimated standard errors. Forsimulations with θ = 0 we also report the empirical Type I error rates, deﬁned as the proportionof simulations in which the method rejected the null causal hypothesis. For simulations with θ (cid:54) = 0 we also report the empirical coverage of conﬁdence intervals and the empirical power toreject the null causal hypothesis. All methods had very high power in the “strong instruments”scenario with θ = 0 .

1, and very low power in the “weak instruments” scenario with θ = 0 . ρ values, exhibiting bias towards the null. Thebias was more pronounced for θ = 0 . θ = 0 . θ = 0 suggests that any issues in their performanceare due to weak instrument bias, which acts towards the null and hence would only aﬀectsimulations with θ (cid:54) = 0. Accordingly, any biases observed for θ (cid:54) = 0 were towards the null.For LD-pruning, large values of ρ make the inclusion of weak instruments more likely andthe numerical computation and inversion of the correlation matrix more challenging. For smallvalues of ρ the method is more robust to the inclusion of weak instruments. On the otherhand, when the genetic region studied contains multiple causal signals, using a small correlationthreshold may discard some of the causal variants from the analysis. In our simulations, thistranslated into a fairly small increase in causal standard errors and a decrease in the method’spower to detect a causal eﬀect. We also note that in the simulations of Table 1, LD-pruningwas implemented after discarding genetic variants whose p-values did not reach genome-widesigniﬁcance. This is a “best-case scenario” for the method in terms of avoiding weak instrumentbias; in practice, pruning is often implemented using less stringent thresholds (Dudbridge, 2013),and the eﬀects of weak instrument bias can be more severe.As expected, an MR analysis using only the variant with the smallest p-value in the regionwas unbiased, but had larger standard errors and lower power than other methods. This was = 0 θ = 0 . θ = 0 . θ se (ˆ θ ) Type I ˆ θ se (ˆ θ ) Cov Power ˆ θ se (ˆ θ ) Cov SHBG

RegionTop SNP —– 0.000 0.019 0.051 0.050 0.019 0.956 0.731 0.096 0.020 0.927Pruning ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . k = 0 .

99 -0.001 0.015 0.063 0.049 0.015 0.954 0.914 0.099 0.016 0.938 k = 0 .

999 0.000 0.014 0.050 0.048 0.014 0.941 0.933 0.096 0.015 0.935JAM ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 .

95 0.000 0.014 0.045 0.047 0.014 0.956 0.933 0.094 0.015 0.933F-LIML —– 0.000 0.014 0.066 0.051 0.015 0.942 0.936 0.100 0.016 0.953CLR —– —– —– 0.054 —– —– 0.947 0.926 —– —– 0.958

HMGCR

RegionTop SNP —– 0.001 0.018 0.048 0.049 0.019 0.953 0.776 0.099 0.019 0.924Pruning ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . k = 0 .

99 0.000 0.017 0.049 0.049 0.017 0.949 0.830 0.099 0.018 0.930 k = 0 .

999 0.000 0.016 0.052 0.048 0.017 0.953 0.826 0.098 0.017 0.934JAM ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 .

95 0.001 0.017 0.035 0.047 0.018 0.957 0.777 0.095 0.018 0.939F-LIML —– 0.000 0.017 0.064 0.050 0.017 0.945 0.850 0.101 0.019 0.946CLR —– —– —– 0.055 —– —– 0.947 0.834 —– —– 0.945

Table 1: Simulation results for the “strong instruments” scenario.14 ore pronounced for the

SHBG region and less so for the

HMGCR region, since genetic corre-lation were stronger in the

HMGCR region and using only a single variant could partly accountfor the eﬀects of other variants through correlation. We also note that in our simulations, allcausal variants had G − X eﬀects in the same direction and of similar magnitude, and there wereno heterogeneous eﬀects towards the outcome. This is a best-case scenario for single-variantanalysis; diﬀerences between it and other methods are likely to be larger in practice.The performance of the PCA method was similar for k = 99% and k = 99 .

9% and was quiteaccurate. With a causal eﬀect of θ = 0 .

1, the algorithm exhibited a small reduction in coveragedue to weak instrument bias but still performed better than LD-pruning. In general, the eﬀectof weak instruments bias on the algorithm is more pronounced for larger values of the tuningparameter k . Here, the standard values of k = 99% and k = 99 .

9% worked reasonably well in allsimulation scenarios. The power to reject the null causal hypothesis was greater for k = 99 . SHBG region.The JAM algorithm exhibited similar performance to PCA, with a small attenuation ofcausal eﬀect estimates for θ = 0 .

1. JAM requires a correlation threshold to be speciﬁed forthe pruning step before running Bayesian variable selection, but the algorithm’s performancewas quite robust to the value of that tuning parameter, certainly more so than that of LD-pruning. Its empirical power was slightly higher than PCA for the

SHBG region, but lower forhe

HMGCR region.The F-LIML method does not depend on a tuning parameter, as it can automatically deter-mine the number of latent factors to use. Compared to JAM and PCA, the algorithm yieldedslightly more accurate causal eﬀect estimates and slightly better calibrated conﬁdence intervalsfor θ = 0 . θ = 0. The latter issue was addressedby the conditional likelihood ratio test, at the expense of no causal eﬀect estimates and a slightlylower power than F-LIML.Table 2 reports simulation results from the “weak instruments” scenario. Weak instrumentsbias had a much higher impact in these simulations, with several methods facing attenuationof their causal eﬀect estimates. As expected, any bias occurred only for θ (cid:54) = 0 and was towardsthe null, while all methods also had increased standard errors and low power.Top-SNP analysis, LD-pruning, PCA and JAM all suﬀered from weak instrument bias inthis scenario. The magnitude of bias was similar for these methods. For pruning and PCA,the bias caused poor coverage properties for conﬁdence intervals and Type I error rates abovenominal levels. JAM only selected a small number of genetic variants (it selected an averageof 1 . = 0 θ = 0 . θ = 0 . θ se (ˆ θ ) Type I ˆ θ se (ˆ θ ) Cov ˆ θ se (ˆ θ ) Cov Power SHBG

RegionTop SNP —– -0.002 0.051 0.045 0.033 0.052 0.926 0.072 0.054 0.891 0.259Pruning ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . k = 0 .

99 0.000 0.040 0.050 0.035 0.041 0.924 0.068 0.042 0.876 0.392 k = 0 .

999 -0.002 0.033 0.035 0.027 0.034 0.902 0.054 0.036 0.736 0.363JAM ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 .

95 -0.002 0.070 0.005 0.035 0.071 0.984 0.071 0.076 0.968 0.096F-LIML —– -0.003 0.037 0.183 0.051 0.039 0.782 0.101 0.041 0.802 0.667CLR —– —– —– 0.041 —– —– 0.939 —– —– 0.948 0.404

HMGCR

RegionTop SNP —– -0.001 0.051 0.055 0.037 0.054 0.939 0.077 0.054 0.916 0.309Pruning ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . k = 0 .

99 0.003 0.047 0.054 0.040 0.049 0.946 0.080 0.051 0.919 0.359 k = 0 .

999 0.001 0.043 0.054 0.033 0.045 0.936 0.067 0.046 0.854 0.303JAM ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 .

95 0.000 0.064 0.011 0.040 0.070 0.985 0.079 0.071 0.975 0.131F-LIML —– 0.002 0.047 0.165 0.051 0.050 0.859 0.105 0.054 0.862 0.524CLR —– —– —– 0.049 —– —– 0.955 —– —– 0.952 0.331

Table 2: Simulation results for the “weak instruments” scenario.16 ariant MR analysis can be reliable in simulations where weak instrument bias is of lesserconcern, but should be used with caution in cis-MR analyses where weak instrument bias issuspected. Even when LD-pruning is used, MR causal eﬀect estimates should be computed andreported for a range of diﬀerent correlation thresholds as a form of sensitivity analysis. BetweenJAM, PCA and F-LIML, the diﬀerences were small in the “strong instruments” simulation.With weak instruments, F-LIML provided accurate causal eﬀect estimates but poor uncertaintyquantiﬁcation, JAM yielded biased causal eﬀect estimates, reasonable conﬁdence intervals butlow power, the CLR test oﬀered nominal conﬁdence intervals and decent power at the expenseof no point estimates and PCA was the method most aﬀected by weak instruments bias. Atheoretical advantage of JAM compared to the other approaches is that JAM’s variable selectiongives an indication of which genetic variants are more likely to be causally associated with therisk factor, although it is not clear whether this additional information would be useful in anMR study where the objective is to perform inference about the X − Y causal eﬀect. Finally,it may be possible to improve the performance of some of the methods by removing very weakvariants (e.g. with p > .

05) from the analysis prior to implementing the methods, but we havenot considered that here.

Additional Simulations

In supplementary material, we report results from a range of additional simulation scenarios.Speciﬁcally, we conducted three additional simulations, on which we comment here brieﬂy.First, we used a larger sample size of N = 100000 from which to obtain risk factor-outcome summary statistics. This could be reminiscent of a cis-MR analysis using a down-stream biomarker as a proxy for protein expression, and obtaining G − X summary data froma large-scale GWAS. Our results (Supplementary Table 1) suggested that the various methodsperform similarly in this case and issues such as weak instrument bias and robustness to thechoice of the pruning threshold are less concerning with large sample sizes.In the second simulation, we experimented with diﬀerent numbers of causal variants. Weused the SHBG region, simulated under the “strong instruments” scenario, and assumed thatthe region contains either one or three genetic variants causally associated with the risk factor.The performance of the various methods (Supplementary Table 2) was similar to that observedin our baseline simulations with six causal variants, suggesting that the number of causal signalsin the region has little impact on their performance relative to each other, except for the single-variant approach which did expectedly better in simulations with only one causal variant.In our third supplementary simulation, we assessed the impact of only having access to asmall reference dataset on the various methods. In each iteration, we bootstrapped the UKBiobank data in order to obtain a small reference sample of size N ref = 1000 and implementedcis-MR using that reference dataset (Supplementary Table 3). The JAM algorithm was the mostsensitive method to this change. The algorithm selected more variants on average, exhibitedslight diﬀerences in its performance for diﬀerent values of the correlation threshold and per-formed poorly for ρ ≥ .

9. In such datasets, we recommend using a more stringent correlationthreshold for the algorithm. The other methods were less aﬀected in comparison.We refer to the supplement for more details on these simulations. pplications Causal Eﬀect of LDL-cholesterol on CHD Risk Using Variants inthe

HMGCR

Region

Introduction

We now compare the various cis-MR methods in two real-data applications. In the ﬁrst ap-plication, we use Mendelian randomization to investigate the relationship between low densitylipoprotein (LDL) cholesterol concentration and the risk of coronary heart disease (CHD) usinggenetic variants from the

HMGCR region. The

HMGCR region is located in chromosome 5and encodes the protein HMG coenzyme-A reductase. The protein plays an important rolein the metabolic pathway that produces cholesterol, and its inhibition by statins is a commontreatment to reduce LDL-cholesterol levels.This analysis is performed mainly for illustrative purposes: the causal connection betweenLDL-cholesterol and CHD risk has already been explored in several papers in the literature.Genome-wide MR studies of the eﬀect of LDL-cholesterol on CHD risk have been performedby Linsel-Nitschke et al. (2008); Waterworth et al. (2010); Burgess et al. (2014); Holmes et al.(2014); Allara et al. (2019) among others. In the ﬁeld of cis-MR, Swerdlow et al. (2015) consid-ered associations between

HMGCR variants and a range of biomarkers in order to investigatewhether statin treatment increases the risk of type 2 diabetes. Ference et al. (2016) comparedthe eﬀects on type 2 diabetes for variants in the

HMGCR and

PCSK9 regions, while Ferenceet al. (2019) compared the LDL-CHD causal eﬀects obtained using variants in the

HMGCR and

ACLY regions in order to explore the suitability of

ACLY as a drug target. The

HMGCR region has also been used as an applied example in methodological papers (Schmidt et al., 2019).

Datasets and Methods

Since a clear link between HMGCR protein expression and lowering LDL-cholesterol has been es-tablished, we used LDL-cholesterol as a risk factor instead of protein expression. We estimatedgenetic associations between variants in the

HMGCR region and LDL-cholesterol using datafrom the UK Biobank. Genetic associations were computed based on a sample of N = 349795unrelated individuals of European ancestry. Associations with coronary heart disease risk wereobtained from the CARDIoGRAMplusC4D consortium (CARDIoGRAMplusC4D Consortium,2015), based on a sample of N = 184305 individuals. Finally, genetic correlations were com-puted using individual-level data for 367643 unrelated Europeans from the UK Biobank (thesample size for genetic associations with LDL-cholesterol was slightly smaller than the referencesample due to missing values).We used a wider region than in our simulations and extracted information on variants within500Kb pairs from the HMGCR gene. In total, 2915 variants were present in both the UKBiobank and the CARDIoGRAMplusC4D dataset. Of those 2915 variants, 20 were discardedbecause they either had missing associations with LDL-cholesterol in the UK Biobank datasetor concerned multiple alleles on the same locus that were not detected in both datasets. Wedid not discard variants with low eﬀect allele frequencies. Our analysis was therefore based on P = 2895 genetic variants. In Figure 2 we visualize the genetic correlations in the HMGCR region, and present a Manhattan plot of genetic association with LDL-cholesterol levels.We then conducted cis-MR using the minimum p-value variant, LD-pruning, principal com-ponents analysis, the JAM algorithm and the factor-based methods. The various methods wereimplemented using a range of diﬀerent parameter speciﬁcations, similar to those used for our

HMGCR region. Right: Manhattan plot of p-values for G − X associations. simulation study. For stepwise pruning, we used the values ρ = 0 . , . , . , . , . τ = 5 × − . Of the 2895 variantsin the region, 1424 had p-values below the GWAS threshold. For the PCA method, we usedprincipal components that explained k = 99% or k = 99 .

9% of variation in the genetic data. ForJAM, we implemented a pre-pruning step with a correlation threshold of ρ = 0 . , . , . , . Results

Results of our analysis are presented in Table 3. We report causal eﬀect estimates, standarderrors and 95% conﬁdence intervals obtained by each method. The reported estimates representlog-odds ratios of increase in CHD risk per standard deviation increase in LDL-cholesterol levels.Genetically elevated LDL-cholesterol concentration based on variants in the

HMGCR regionis known to be associated with increased risk of coronary heart disease Cholesterol TreatmentTrialists (CTT) Collaboration (2010); Ference et al. (2019). This was conﬁrmed by the prun-ing, JAM, F-LIML and CLR methods. For example, the JAM algorithm using a pre-pruningthreshold of ρ = 0 . .

355 (odds ratio 1 . . , . .

423 for k = 99% and 0 .

448 for k = 99 .

9% and the null causal hypothesis was rejected on bothoccasions. ethod ˆ θ se (ˆ θ ) 95% CITop-SNP —– 0.555 0.167 (0.228 , 0.882)Pruning ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . k = 0 .

99 -0.030 0.026 (-0.081 , 0.020) k = 0 .

999 0.000 0.022 (-0.043 , 0.042)JAM ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 .

95 0.358 0.117 (0.129 , 0.586)F-LIML —– 0.501 0.165 (0.178 , 0.824)CLR —– —– —– (0.169 , 0.839)

Table 3: Cis-MR analysis of the eﬀect of LDL-cholesterol on CHD risk, using genetic variants in the

HMGCR region.

The remaining methods were more consistent in their results. LD-pruning exhibited theusual attenuation towards the null for ρ = 0 . ρ = 0 .

9, as a result of numerical errors. Bias due to numerical errors is more likely to be aconcern here than in our simulation study, due to the larger number of variants used. For theJAM algorithm, there were small diﬀerences between implementations for ρ = 0 . − Causal Eﬀect of Testosterone on CHD Risk Using Variants inthe

SHBG

Region

Introduction

In our second application, we apply Mendelian randomization using variants in the

SHBG generegion in order to assess the causal relationship between testosterone levels and coronary heartdisease risk. The

SHBG region is located in chromosome 17 and encodes sex-hormone bindingglobulin, a protein that inhibits the function of sex hormones such as testosterone and estradiol.The region has been shown to contain strong genetic associations with testosterone levels, aswell as a number of sex hormone traits (Jin et al., 2012; Coviello et al., 2012; Schooling et al.,2018; Ruth et al., 2020). In addition, previous research has suggested that the region is likelyto contain several genetic variants independently associated with testosterone; Coviello et al.(2012) claimed that as many as nine independent signals may be present. This implies that anaive approach using only the variant with the smallest p-value would underestimate the geneticcontributions to testosterone levels. The causal relationship between sex hormone traits andCHD risk using variants in the

SHBG region has been studied by Burgess et al. (2017) and chooling et al. (2018) with evidence mostly suggesting no causal relationship. Here, we aimedto replicate the analysis of Burgess et al. (2017) using larger datasets. Datasets and Methods

We used serum testosterone levels as the exposure for our MR analysis. We obtained geneticassociations with testosterone using summary-level data from the Neale Lab website . Thesesummary data were derived from the UK Biobank, using a sample of N = 312102 unrelatedindividuals of European descent. We deﬁned the genetic region to include genetic variantswithin 500Kb pairs on either side of the SHBG gene.As in the previous application, we obtained genetic associations with CHD risk from theCARDIoGRAMplusC4D dataset (CARDIoGRAMplusC4D Consortium, 2015), using a sampleof N = 184305 individuals. A total of 3053 genetic variants were present in both datasets andwere therefore included in our analysis. We did not conduct separate analyses on males andfemales, since we did not have access to sex-speciﬁc associations with CHD risk. Finally, weused the UK Biobank as a reference dataset from which to extract LD correlations betweengenetic variants. Figure 3 presents a Manhattan plot of associations with testosterone, as wellas the genetic correlations in the region. Figure 3: Left: genetic correlations in the

SHBG region. Right: Manhattan plot of p-values for G − X associations. We then implemented the various cis-MR methods in order to select genetic instruments andassess whether

SHBG variants are causally associated with CHD risk. We used the same settingas in the

HMGCR application for the tuning parameters of each method. The stepwise pruningmethod was implemented using only variants with GWAS-signiﬁcant associations with serumtestosterone levels. A total of 1156 variants had p-values lower than the 5 × − threshold.The remaining methods were implemented using all 3053 variants in the region. https://docs.google.com/spreadsheets/d/1kvPoupSzsSFBNSztMzl04xMoSC3Kcx3CrjVf4yBmESU/edit?ts=5b5f17db esults Method ˆ θ se (ˆ θ ) 95% CITop-SNP —– -0.071 0.036 (-0.141 , -0.001)Pruning ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . k = 0 .

99 -0.052 0.029 (-0.109 , 0.004) k = 0 .

999 -0.018 0.024 (-0.065 , 0.028)JAM ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 .

95 -0.035 0.024 (-0.081 , 0.012)F-LIML —– -0.037 0.025 (-0.087 , 0.013)CLR —– —– —– (-0.087 , 0.014)

Table 4: Mendelian randomization analysis of the eﬀect of serum testosterone levels on CHD risk,using genetic variants in the

SHBG region.

Table 4 reports causal eﬀect estimates, standard errors and 95% conﬁdence intervals foreach method. Once again, estimates are reported in the log-odds ratio scale and representincreases in CHD risk per standard deviation increase in serum testosterone levels. All methodswith the exception of top-SNP and LD-pruning at 0 . . SHBG region, although point estimates wereconsistently in the risk-decreasing direction.In this example, the performance of the various methods was less seriously aﬀected by weakinstruments bias because there was apparently no causal relationship between serum testosteronelevels and CHD risk. This is consistent with our simulation design, where no bias was observedunder the “null causal eﬀect” scenario.A notable inconsistency was that an MR analysis using only the genetic variant with thesmallest p-value in the region suggested a risk-decreasing, statistically signiﬁcant causal eﬀect.This may be suggestive of heterogeneity in evidence from diﬀerent variants within the

SHBG region. Therefore, this example empirically demonstrates the pitfalls of using simplistic singlevariant analyses when in fact multiple signals exist within a region. The top variant in ouranalysis was rs1799941, which is known to be associated with testosterone levels (Ruth et al.,2020).The LD-pruning method was rather inconclusive in this application. Implementations witha low correlation threshold suggested no causal eﬀect. However, the method suggested an eﬀectin the risk-decreasing direction for ρ = 0 . ρ = 0 . − ignals in the region, conﬁrming claims by Coviello et al. (2012). The F-LIML estimate and theCLR conﬁdence interval were in line with the results obtained by JAM.Our results obtained in this application are similar to those reported in Burgess et al. (2017).Similar to the results reported in Table 4, pruning estimates in Burgess et al. (2017) suggestedno causal eﬀect for small correlation thresholds bt were unstable for large ρ (in fact, they weremore unstable than in our results, possibly due to the smaller sample sizes used in that paper).Estimates from the PCA method were in the risk-decreasing direction but did not achievestatistical signiﬁcance. Overall, both Burgess et al. (2017) and our analysis suggested no causalassociation between testosterone and CHD risk based on variants in the SHBG region; furtherevidence of no causality was provided by Schooling et al. (2018), which also used data from theUK Biobank.

Discussion

Cis-Mendelian randomization is emerging as a widely applicable approach to support drug de-velopment and inform clinical trials. In this manuscript we have outlined and compared a rangeof methods from the cis-MR literature to select variants from a genetic region of interest andestimate the MR causal eﬀect. In particular, we have considered MR using only the minimump-value variant, and also LD-pruning, which is one of the most commonly used approaches inpractice, principal components analysis, factor-based methods and stochastic-search variable se-lection via the JAM algorithm. We compared the various methods across a range of simulationscenarios, which were based on real genetic correlation structures of two diﬀerent gene regionsdrawn from the UK Biobank. We also assessed the performance of the diﬀerent methods intwo cis-MR case studies, investigating the eﬀect of LDL-cholesterol on CHD risk using variantsin the

HMGCR region and the eﬀect of testosterone on CHD risk using variants in the

SHBG region.We have found that LD-pruning can be reliable when using large sample sizes and stronggenetic instruments, but can be susceptible to weak instrument bias. Moreover, results obtainedusing LD-pruning can be sensitive to the correlation threshold used, and using a high correlationthreshold can result in numerical instabilities. Based on these results, we would thereforerecommend that when the pruning method is used, it is implemented using a range of correlationthresholds as a form of sensitivity analysis.Although not completely free from weak instrument bias, methods such as PCA, F-LIML,the JAM algorithm and the conditional likelihood ratio test were less aﬀected by such bias thanLD-pruning. JAM was also more robust to the choice of its tuning parameter than pruning,while PCA and F-LIML achieved good performance in a range of diﬀerent scenarios with thedefault values for their tuning parameters. Therefore, we recommend that some of these methodsare implemented as sensitivity tools in applied cis-MR analyses.Compared to each other, JAM, PCA, F-LIML and CLR exhibited similar performance insimulations with strong genetic instruments. With weak instruments, the CLR test was leastaﬀected by weak instruments bias and provided a valid test for the causal null hypothesis, albeitno point estimates. F-LIML obtained accurate causal eﬀect estimates but yielded conﬁdence in-tervals with poor coverage, while JAM had biased causal eﬀect estimates but better uncertaintyquantiﬁcation. The principal components IVW approach was aﬀected by weak instruments biasto a greater extent than the other methods, although our real-data applications suggest thatPCA can still perform well if very weak variants are removed from the analysis prior to itsimplementation. he use of multiple methods as a form of sensitivity analysis can increase the reliabilityof results of a cis-MR study. This can be further reinforced by triangulating evidence fromdiﬀerent study designs (Gill et al., 2020). This includes colocalization, an approach which hasbeen shown to yield beneﬁts when used in conjunction with MR to prioritize proteins for drugdevelopment (Zheng et al., 2019). Here, we have only focused on Mendelian randomization andnot considered alternative approaches.Our analysis has a number of limitations. Our simulations were by no means exhaustive;for example, they were based on only two genetic regions. In addition, our simulations havenot considered other forms of bias that may arise in cis-MR applications, such as selectionbias, population stratiﬁcation or pleiotropy. Bias due to pleiotropy is worth discussing in moredetail, as it is one of the main concerns in many MR studies. Pleiotropy is less of a concernin cis-MR compared to traditional MR studies (Schmidt et al., 2019), but would arise if dif-ferent variants from the same region were involved in diﬀerent biological pathways relevant tothe outcome, resulting in heterogeneous causal eﬀect estimates. When pleiotropy does occur,standard algorithms for pleiotropy-robust MR (including the JAM algorithm) can be used toaddress it. However, care must be taken when extrapolating the assumptions made by suchalgorithms to the framework of many correlated genetic variants. For example, suppose that aregion contains a single pleiotropic variant, but that variant is strongly correlated with half ofthe other variants in the region. This will induce bias in the marginal causal eﬀect estimates ofthe correlated variants, and hence pleiotropy-robust methods requiring more than 50% of thegenetic variants in the region to be valid instruments will be biased.Cis-Mendelian randomization is an active area of research, both methodological and applied.As the ﬁeld develops further, it is likely that new statistical approaches will be created toaid applied investigators in their work. By providing an overview and evaluation of existingmethods, we hope that our current paper will contribute to that direction. Acknowledgements

We would like to thank Ashish Patel for many helpful discussions and comments. This workwas supported by the UK Medical Research Council (Core Medical Research Council Biostatis-tics Unit Funding Code: MC UU 00002/7). Apostolos Gkatzionis and Paul Newcombe weresupported by a Medical Research Council Methodology Research Panel grant (Grant NumberRG88311). Stephen Burgess was supported by a Sir Henry Dale Fellowship jointly funded bythe Wellcome Trust and the Royal Society (Grant Number 204623/Z/16/Z). This research hasbeen conducted using the UK Biobank Resource under Application Number 30931.

References

Allara, E., G. Morani, P. Carter, A. Gkatzionis, V. Zuber, C. N. Foley, J. M. Rees, A. M. Mason,S. Bell, D. Gill, S. Lindstrom, A. S. Butterworth, E. Di Angelantonio, J. Peters, S. Burgess,and INVENT consortium (2019). Genetic determinants of lipids and cardiovascular diseaseoutcomes: A wide-angled Mendelian randomization investigation.

Circulation. Genomic andprecision medicine 12 (12), e002711.Anderson, T. W. and H. Rubin (”1949, 3). Estimation of the parameters of a single equation ina complete system of stochastic equations.

Annals of Mathematical Statistics 20 (1), 46–63. simit, J. L., D. B. Rainbow, M. D. Fortune, N. F. Grinberg, L. S. Wicker, and C. Wallace(2019). Stochastic search and joint ﬁne-mapping increases accuracy and identiﬁes previouslyunreported associations in immune-mediated diseases. Nature Communications 10 .Bai, J. and S. Ng (2002). Determining the number of factors in approximate factor models.

Econometrica 70 (1), 191–221.Benner, C., C. C. A. Spencer, A. S. Havulinna, V. Salomaa, S. Ripatti, and M. Pirinen (2016).FINEMAP: eﬃcient variable selection using summary data from genome-wide associationstudies.

Bioinformatics 32 (10), 1493–1501.Boef, A. G. C., O. M. Dekkers, and S. le Cessie (2015). Mendelian randomization studies: areview of the approaches used and the quality of reporting.

International Journal of Epi-demiology 44 (2), 496–511.Burgess, S., N. M. Davies, and S. G. Thompson (2016). Bias due to participant overlap intwo-sample Mendelian randomization.

Genetic Epidemiology 40 (7), 597–608.Burgess, S., F. Dudbridge, and S. G. Thompson (2016). Combining information on multiple in-strumental variables in Mendelian randomization: comparison of allele score and summarizeddata methods.

Statistics in Medicine 35 (11), 1880–1906.Burgess, S., D. F. Freitag, H. Khan, D. N. Gorman, and S. G. Thompson (2014, 10). Usingmultivariable Mendelian randomization to disentangle the causal eﬀects of lipid fractions.

PLOS ONE 9 (10), 1–8.Burgess, S. and S. G. Thompson (2015).

Mendelian randomization: methods for using geneticvariants in causal estimation . Chapman & Hall, Boca Raton, FL.Burgess, S., S. G. Thompson, and CRP CHD Genetics Collaboration (2011, 03). Avoidingbias from weak instruments in Mendelian randomization studies.

International Journal ofEpidemiology 40 (3), 755–764.Burgess, S., V. Zuber, E. Valdes-Marquez, B. B. Sun, and J. C. Hopewell (2017). Mendelianrandomization with ﬁne-mapped genetic data: Choosing from large numbers of correlatedinstrumental variables.

Genetic Epidemiology 41 (4), 714–725.CARDIoGRAMplusC4D Consortium (2015). A comprehensive 1000 Genomes-based genome-wide association meta-analysis of coronary artery disease.

Nature Genetics 47 (10), 1121–1130.CCGC (CRP CHD Genetics Collaboration) (2011). Association between C reactive protein andcoronary heart disease: Mendelian randomisation analysis based on individual participantdata.

British Medical Journal 342 .Cholesterol Treatment Trialists (CTT) Collaboration (2010). Eﬃcacy and safety of more in-tensive lowering of LDL cholesterol: a meta-analysis of data from 170 000 participants in 26randomised trials.

The Lancet 376 (9753), 1670–1681.Coviello, A. D., R. Haring, M. Wellons, D. Vaidya, T. Lehtim¨aki, S. Keildson, and Other Authors(2012). A genome-wide association meta-analysis of circulating sex hormone–binding globulinreveals multiple loci implicated in sex steroid hormone regulation.

PLOS Genetics 8 (7), 1–12. avey Smith, G. and S. Ebrahim (2003). ‘Mendelian randomization’: can genetic epidemiologycontribute to understanding environmental determinants of disease? International Journalof Epidemiology 32 (1), 1 – 22.Dudbridge, F. (2013, 03). Power and predictive accuracy of polygenic risk scores.

PLOSGenetics 9 (3), 1–17.Dudbridge, F. and P. J. Newcombe (2015). Accuracy of gene scores when pruning markers bylinkage disequilibrium.

Human Heredity 80 (4), 178 – 186.Ference, B. A., K. K. Ray, A. L. Catapano, T. B. Ference, S. Burgess, D. R. Neﬀ, C. Oliver-Williams, A. M. Wood, A. S. Butterworth, E. Di Angelantonio, J. Danesh, J. J. P. Kastelein,and S. J. Nicholls (2019). Mendelian Randomization Study of ACLY and CardiovascularDisease.

New England Journal of Medicine 380 (11), 1033–1042.Ference, B. A., J. G. Robinson, R. D. Brook, A. L. Catapano, M. J. Chapman, D. R. Neﬀ,S. Voros, R. P. Giugliano, G. Davey Smith, S. Fazio, and M. S. Sabatine (2016). Variation inPCSK9 and HMGCR and risk of cardiovascular disease and diabetes.

New England Journalof Medicine 375 (22), 2144–2153.Gill, D., M. K. Georgakis, V. M. Walker, A. F. Schmidt, A. Gkatzionis, D. F. Freitag, C. Finan,A. D. Hingorani, J. C. M. Howson, S. Burgess, D. I. Swerdlos, M. Dichgans, B. M. Psaty,G. Davey Smith, M. V. Holmes, R. A. Scott, J. Zheng, and N. M. Davies (2020). Mendelianrandomization for studying the eﬀects of perturbing drug targets: a practical guide. Inpreparation.Gkatzionis, A., S. Burgess, D. V. Conti, and P. J. Newcombe (2020). Bayesian variable selectionwith a pleiotropic loss function in Mendelian randomization. bioRxiv .Hay, M., D. W. Thomas, J. L. Craighead, C. Economides, and J. Rosenthal (2014). Clinicaldevelopment success rates for investigational drugs.

Nature Biotechnology 32 , 40–51.Hocking, R. R. (1976). A biometrics invited paper. the analysis and selection of variables inlinear regression.

Biometrics 32 (1), 1–49.Holmes, M. V., F. W. Asselbergs, T. M. Palmer, F. Drenos, M. B. Lanktree, C. P. Nelson,and Other Authors (2014, 01). Mendelian randomization of blood lipids for coronary heartdisease.

European Heart Journal 36 (9), 539–550.Jiang, L., S. Xu, N. Mancuso, P. J. Newcombe, and D. V. Conti (2020). A hierarchical approachusing marginal summary statistics for multiple intermediates in a Mendelian randomizationor transcriptome analysis. bioRxiv .Jin, G., J. Sun, S.-T. Kim, J. Feng, Z. Wang, S. Tao, Z. Chen, L. Purcell, S. Smith, W. B. Isaacs,R. S. Rittmaster, S. L. Zheng, L. D. Condreay, and J. Xu (2012). Genome-wide associationstudy identiﬁes a new locus JMJD1C at 10q21 that may inﬂuence serum androgen levels inmen.

Human Molecular Genetics 21 (23), 5222–5228.Kamstrup, P. R., A. Tybjærg-Hansen, R. Steﬀensen, and B. G. Nordestgaard (2009). Geneticallyelevated lipoprotein(a) and increased risk of myocardial infarction.

JAMA 301 (22), 2331–2339. leibergen, F. (2002). Pivotal statistics for testing structural parameters in instrumental vari-ables regression. Econometrica 70 (5), 1781–1803.Krauss, R. M., L. M. Mangravite, J. D. Smith, M. W. Medina, D. Wang, X. Guo, M. J. Rieder,J. A. Simon, S. B. Hulley, D. Waters, M. Saad, P. T. Williams, K. D. Taylor, H. Yang, D. A.Nickerson, and J. I. Rotter (2008). Variation in the 3-hydroxyl-3-methylglutaryl coenzymeA reductase gene is associated with racial diﬀerences in low-density lipoprotein cholesterolresponse to simvastatin treatment.

Circulation 117 (12), 1537–1544.Linsel-Nitschke, P., A. G¨otz, J. Erdmann, I. Braenne, P. Braund, C. Hengstenberg, and OtherAuthors (2008, 08). Lifelong reduction of LDL-cholesterol related to a common variant in theldl-receptor gene decreases the risk of coronary artery disease—a Mendelian randomisationstudy.

PLOS ONE 3 (8), 1–9.Miller, A. J. (2002).

Subset Selection in Regression . Monographs on Statistics and AppliedProbability. London: Chapman and Hall.Moreira, M. J. (2003). A conditional likelihood ratio test for structural models.

Economet-rica 71 (4), 1027–1048.Nelson, M. R., H. Tipney, J. L. Painter, J. Shen, P. Nicoletti, Y. Shen, A. Floratos, P. C. Sham,M. J. Li, J. Wang, L. R. Cardon, J. C. Whittaker, and P. Sanseau (2015). The support ofhuman genetic evidence for approved drug indications.

Nature Genetics 47 (8), 856–860.Newcombe, P. J., D. V. Conti, and S. Richardson (2016). JAM: A scalable Bayesian frameworkfor joint analysis of marginal SNP eﬀects.

Genetic Epidemiology 40 (3), 188 – 201.Patel, A., S. Burgess, D. Gill, and P. J. Newcombe (2020). Inference with many correlated weakinstruments and summary statistics. arXiv:2005.01765.Ray, K. K., H. E. Bays, A. L. Catapano, N. D. Lalwani, L. T. Bloedon, L. R. Sterling, P. L.Robinson, and C. M. Ballantyne (2019). Safety and eﬃcacy of bempedoic acid to reduce LDLcholesterol.

New England Journal of Medicine 380 (11), 1022–1032.Ruth, K. S., F. R. Day, D. J. T. Jessica Tyrrel and, A. R. Wood, A. Mahajan, R. N. Beaumont,L. Wittemans, S. Martin, A. S. Busch, A. M. Erzurumluoglu, B. Hollis, T. A. O’Mara, TheEndometrial Cancer Association Consortium, M. I. McCarthy, C. Langenberg, D. F. Easton,N. J. Wareham, S. Burgess, A. Murray, K. K. Ong, T. M. Frayling, and J. R. B. Perry (2020).Using human genetics to understand the disease impacts of testosterone in men and women.

Nature Medicine 26 (2), 252–258.Schmidt, A. F., C. Finan, M. Gordillo-Mara˜n´on, F. W. Asselbergs, D. F. Freitag, R. S. Patel,B. Tyl, S. Chopade, R. Faraway, M. Zwierzyna, and A. D. Hingorani (2019). Genetic drugtarget validation using Mendelian randomization. bioRxiv .Schmidt, A. F., D. I. Swerdlow, M. V. Holmes, R. S. Patel, Z. Fairhurst-Hunter, D. M. Lyall,and Other Authors (2017). PCSK9 genetic variants and risk of type 2 diabetes: a Mendelianrandomisation study.

The Lancet Diabetes and Endocrinology 5 (2), 97–105. chooling, C. M., S. Luo, S. L. A. Yeung, D. J. Thompson, S. Karthikeyan, T. R. Bolton,A. M. Mason, E. Ingelsson, and S. Burgess (2018). Genetic predictors of testosterone andtheir associations with cardiovascular disease and risk factors: A Mendelian randomizationinvestigation. International Journal of Cardiology 267 , 171 – 176.Sofat, R., A. D. Hingorani, L. Smeeth, S. E. Humphries, P. J. Talmud, J. Cooper, and OtherAuthors (2010). Separating the mechanism-based and oﬀ-target actions of cholesteryl estertransfer protein inhibitors with cetp gene polymorphisms.

Circulation 121 (1), 52–62.Swerdlow, D. I., K. B. Kuchenbaecker, S. Shah, R. Sofat, M. V. Holmes, J. White, J. S. Mindell,M. Kivimaki, E. J. Brunner, J. C. Whittaker, J. P. Casas, and A. D. Hingorani (2016, 06).Selecting instruments for Mendelian randomization in the wake of genome-wide associationstudies.

International Journal of Epidemiology 45 (5), 1600–1616.Swerdlow, D. I., D. Preiss, K. B. Kuchenbaecker, M. V. Holmes, J. E. L. Engmann, and OtherAuthors (2015). HMG-coenzyme A reductase inhibition, type 2 diabetes, and bodyweight:evidence from genetic analysis and randomised trials.

The Lancet 385 (9965), 351–361.The Interleukin-6 Receptor Mendelian Randomisation Analysis (IL6R MR) Consortium (2012).The interleukin-6 receptor as a target for prevention of coronary heart disease: a Mendelianrandomisation analysis.

The Lancet 379 (9822), 1214–1224.Tibshirani, R. (1996). Regression shrinkage and selection via the lasso.

Journal of the RoyalStatistical Society. Series B (Methodological) 58 (1), 267–288.Vilhjalmsson, B. J., J. Yang, H. K. Finucane, A. Gusev, S. Lindstr¨om, S. Ripke, G. Genovese,P.-R. Loh, G. Bhatia, R. Do, T. Hayeck, H.-H. Won, Schizophrenia Working Group of the Psy-chiatric Genomics Consortium, DRIVE study, S. Kathiresan, M. Pato, C. Pato, R. Tamimi,E. Stahl, N. Zaitlen, B. Pasaniuc, G. Belbin, E. E. Kenny, M. H. Schierup, P. De Jager, N. A.Patsopoulos, S. McCarroll, M. Daly, S. Purcell, D. Chasman, B. Neale, M. Goddard, P. M.Visscher, P. Kraft, N. Patterson, and A. L. Price (2015). Modeling linkage disequilibriumincreases accuracy of polygenic risk scores.

The American Journal of Human Genetics 97 (4),576 – 592.Walker, V. M., N. M. Davies, G. Hemani, J. Zheng, P. C. Haycock, T. R. Gaunt, G. Davey Smith,and R. M. Martin (2019). Using the MR-Base platform to investigate risk factors and drugtargets for thousands of phenotypes.

Wellcome Open Research .Waterworth, D. M., S. L. Ricketts, K. Song, L. Chen, J. H. Zhao, S. Ripatti, and Other Authors(2010). Genetic variants inﬂuencing circulating lipid levels and risk of coronary artery disease.

Arteriosclerosis, Thrombosis, and Vascular Biology 30 (11), 2264–2276.Yang, J., T. Ferreira, A. P. Morris, S. E. Medland, G. I. of ANthropometric Traits (GI-ANT) Consortium, D. G. Replication, M. analysis (DIAGRAM) Consortium, P. A. F. Mad-den, A. C. Heath, N. G. Martin, G. W. Montgomery, M. N. Weedon, R. J. Loos, T. M.Frayling, M. I. McCarthy, J. N. Hirschhorn, M. E. Goddard, and P. M. Visscher (2012). Con-ditional and joint multiple-SNP analysis of GWAS summary statistics identiﬁes additionalvariants inﬂuencing complex traits.

Nature Genetics 44 (4), 369–375.Yavorska, O. O. and S. Burgess (2017). MendelianRandomization: an R package for performingMendelian randomization analyses using summarized data.

International Journal of Epi-demiology 46 (6), 1734–1739. heng, J., V. Haberland, D. Baird, V. Walker, P. Haycock, A. Gutteridge, T. G. Richardson,J. Staley, B. Elsworth, S. Burgess, B. B. Sun, J. Danesh, H. Runz, J. C. Maranville, H. M.Martin, J. Yarmolinsky, C. Laurin, M. V. Holmes, J. Liu, K. Estrada, L. McCarthy, M. Hurle,D. Waterworth, M. R. Nelson, A. S. Butterworth, G. D. Smith, G. Hemani, R. A. Scott, andT. R. Gaunt (2019). Phenome-wide Mendelian randomization mapping the inﬂuence of theplasma proteome on complex diseases. bioRxiv . upplementary Material This ﬁle contains Supplementary material for the manuscript “Statistical Methods for cis-Mendelian randomizaton”.

Proportion of Genetic Variation Explained

To ﬁx the proportion v G of variation in the risk factor explained by the genetic variants, weadjusted the residual variance σ in our simulations accordingly. We ﬁrst speciﬁed a value for v G and generated SNP-risk factor associations β Xj as described in the simulation design sectionof the manuscript. The value of σ was then set equal to σ = 1 − v G v G N ref β TX G Tref G ref β X where G ref is the reference genetic matrix. This can be justiﬁed by recalling that in oursimulation design, X = P (cid:88) j =1 β Xj G j + α X U + (cid:15) X with U, (cid:15) X ∼ N (0 , σ ). Taking variances, we have that Var( X ) = Var( Gβ X )+2 σ and therefore, v G = Var( Gβ X )Var( X ) = Var( Gβ X )Var( Gβ X ) + 2 σ = N ref β TX G T Gβ X N ref β TX G T Gβ X + 2 σ which yields the previous expression for σ . A similar formula was derived in the appendix ofYang et al. (2012). Additional Simulations - Large G-X Sample Size

To augment the analysis presented in the paper, we compared the various cis-MR methods in arange of additional simulations. This section contains simulation results for a simulation using alarger sample size of N = 100000 to compute summary-level risk factor-outcome associations.This scenario may represent an applied analysis using a downstream biomarker as a proxy forprotein expression, as this would allow researchers to obtain summary-level data from existinglarge-scale GWAS studies whose sample size often ranges in the hundreds of thousands. Thiswas the case for the two real-data applications presented in the main part of our manuscript.We conducted simulations both for the SHBG and for the

HMGCR region, with six causalvariants per region, as in our main simulations. For brevity, we only considered the “stronginstruments” scenario, where v G = 3% for the SHBG region and v G = 2% for the HMGCR region. Otherwise, the simulations reported here were set up in the same way as those reportedin the main part of our paper. Results are reported in Table 5.A notable diﬀerence in the results compared to our main simulations was that a larger samplesize resulted in stronger instruments. A tenfold increase in the sample size resulted in a similar,tenfold increase in the values of the F statistics. For a regression using only the six causalvariants, the average F statistic was 514 for the

SHBG and 340 for the

HMGCR region. As aresult, the various cis-MR methods were less aﬀected by weak instruments bias and performedquite well. The performance of the stepwise pruning method was was much more consistent fordiﬀerent values of ρ than in the corresponding simulation with N = 10000. PCA and JAM = 0 θ = 0 . θ = 0 . θ se (ˆ θ ) Type I ˆ θ se (ˆ θ ) Cov Power ˆ θ se (ˆ θ ) Cov SHBG

RegionTop SNP —– -0.001 0.019 0.063 0.050 0.020 0.951 0.710 0.101 0.020 0.952Pruning ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . k = 0 .

99 0.000 0.015 0.038 0.051 0.015 0.950 0.909 0.100 0.016 0.957 k = 0 .

999 0.000 0.014 0.040 0.051 0.015 0.944 0.939 0.100 0.015 0.959JAM ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 .

95 0.000 0.013 0.038 0.051 0.014 0.952 0.951 0.100 0.014 0.960F-LIML —– -0.001 0.014 0.046 0.051 0.015 0.952 0.919 0.100 0.015 0.961CLR —– —– —– 0.044 —– —– 0.949 0.919 —– —– 0.958

HMGCR

RegionTop SNP —– 0.000 0.019 0.055 0.050 0.019 0.950 0.738 0.100 0.020 0.951Pruning ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . k = 0 .

99 0.000 0.017 0.047 0.050 0.017 0.956 0.815 0.100 0.018 0.947 k = 0 .

999 0.000 0.017 0.049 0.050 0.017 0.956 0.819 0.100 0.018 0.948JAM ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 .

95 0.000 0.017 0.049 0.050 0.017 0.951 0.825 0.099 0.018 0.942F-LIML —– 0.000 0.017 0.049 0.051 0.017 0.952 0.825 0.100 0.018 0.949CLR —– —– —– 0.049 —– —– 0.947 0.823 —– —– 0.947

Table 5: Simulation results for the “strong instruments” scenario and a large G − X sample size( N = 100000). 31 ere both unaﬀected by weak instrument bias for θ = 0 .

1. Conﬁdence intervals constructedusing these methods practically attained nominal coverage, and so did the conﬁdence intervalsbased on F-LIML and CLR. A small reduction in power for single-SNP analysis and LD-pruningwith low correlation thresholds were the only real diﬀerence between the various methods. Thiswas the case both for the

SHBG and for the

HMGCR region.These results imply that weak instrument bias is less likely to aﬀect analyses based on large G − X sample sizes, and stepwise pruning does not underperform other methods in such analyses. Additional Simulations - Fewer Causal Variants

In our second set of additional simulations, we modiﬁed the number of causal variants that werepresent in each region. Focusing on the

SHBG region, we considered two additional scenarios.First, we assumed the existence of only a single causal variant in the region, placing thecausal signal at the variant that had the smallest univariate p-value in our real-data application(rs1799941). That variant was assumed to have an eﬀect of β Xj = 0 .

38 on the risk factor, thesame as that observed in our real SHBG-testosterone dataset.In the second scenario, we generated three independent genetic eﬀects on the risk factor. Theeﬀects were placed at genetic variants suggested as independently causal by Jin et al. (2012),who analysed genetic associations of variants in the

SHBG region with serum testosterone levels.Genetic eﬀects for the causal variants were drawn randomly according to β Xj ∼ | N (0 , . | + 0 . v G = 3% and a G − X sample size of N = 10000. Simulation results arereported in Table 6.The results closely resembled the ones for our original simulation scenario with “stronginstruments” and six causal variants. All methods were quite accurate for θ = 0. When θ (cid:54) = 0, stepwise pruning was subject to weak instrument bias to some extent, especially for largecorrelation thresholds. The performance of the method depended on the correlation thresholdused. PCA and JAM were more consistent in terms of their tuning parameters and weregenerally quite accurate, while factor-based methods were even more accurate, with a smallinﬂation of Type I error rates under the null for F-LIML.Most methods performed slightly better with one than with three causal variants, but thediﬀerences in their performance were small and it seems that the number of causal signals inthe region should have little impact on the choice of which cis-MR method to use. An exceptionrelates to the use of the top-SNP approach, which was expectedly quite accurate when only asingle causal variant existed in the region. With three causal variants, top-SNP analysis wassubject to the same issues as in our original simulations (namely larger standard errors andlower power than other methods), but to a lesser degree. Additional Simulations - Small Reference Dataset

In our third set of additional simulations, we wanted to assess the impact of the referencedataset on the various MR methods. In particular, we assessed whether the use of a smallreference dataset, which would imply inaccurate genetic correlation estimates, aﬀects some cis-MR methods worse than others. We therefore repeated the simulations from the main part ofour paper, but bootstrapped the rows of the UK Biobank matrix and only selected N ref = 1000individuals from which to compute genetic correlations, instead of using the entire UK Biobank = 0 θ = 0 . θ = 0 . θ se (ˆ θ ) Type I ˆ θ se (ˆ θ ) Cov Power ˆ θ se (ˆ θ ) CovOne Causal VariantTop SNP —– 0.001 0.013 0.046 0.051 0.014 0.952 0.966 0.100 0.014 0.937Pruning ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . k = 0 .

99 0.000 0.015 0.053 0.050 0.016 0.946 0.882 0.099 0.016 0.924 k = 0 .

999 0.001 0.014 0.051 0.048 0.014 0.952 0.922 0.096 0.015 0.919JAM ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 .

95 0.001 0.013 0.046 0.051 0.014 0.953 0.966 0.100 0.014 0.938F-LIML —– 0.001 0.014 0.061 0.051 0.015 0.944 0.936 0.100 0.016 0.942CLR —– —– —– 0.047 —– —– 0.951 0.928 —– —– 0.955Three Causal VariantsTop SNP —– 0.000 0.016 0.051 0.050 0.016 0.956 0.852 0.098 0.017 0.934Pruning ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . k = 0 .

99 0.000 0.015 0.051 0.049 0.015 0.941 0.884 0.096 0.016 0.932 k = 0 .

999 0.000 0.014 0.047 0.047 0.014 0.944 0.905 0.095 0.015 0.925JAM ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 .

95 0.000 0.013 0.044 0.049 0.014 0.946 0.948 0.097 0.014 0.929F-LIML —– 0.000 0.014 0.060 0.050 0.014 0.932 0.933 0.100 0.016 0.949CLR —– —– —– 0.049 —– —– 0.940 0.919 —– —– 0.959

Table 6: Simulation results for the

SHBG region with either 1 or 3 causal variants.33 = 0 θ = 0 . θ = 0 . θ se (ˆ θ ) Type I ˆ θ se (ˆ θ ) Cov Power ˆ θ se (ˆ θ ) Cov SHBG

RegionTop SNP —– -0.001 0.019 0.039 0.048 0.019 0.941 0.676 0.095 0.020 0.917Pruning ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . k = 0 .

99 0.000 0.015 0.041 0.049 0.015 0.939 0.883 0.097 0.015 0.922 k = 0 .

999 0.000 0.014 0.052 0.048 0.014 0.925 0.904 0.094 0.015 0.919JAM ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 .

95 0.000 0.014 0.093 0.047 0.014 0.888 0.885 0.091 0.014 0.843F-LIML —– 0.000 0.014 0.052 0.050 0.014 0.921 0.907 0.099 0.016 0.929CLR —– —– —– 0.040 —– —– 0.929 0.895 —– —– 0.933

HMGCR

RegionTop SNP —– -0.001 0.018 0.047 0.048 0.018 0.951 0.753 0.099 0.019 0.921Pruning ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 . k = 0 .

99 -0.001 0.017 0.047 0.048 0.017 0.950 0.816 0.098 0.018 0.921 k = 0 .

999 0.000 0.016 0.051 0.047 0.017 0.949 0.814 0.097 0.017 0.922JAM ρ = 0 . ρ = 0 . ρ = 0 . ρ = 0 .

95 -0.001 0.017 0.150 0.044 0.017 0.824 0.759 0.092 0.018 0.791F-LIML —– -0.001 0.016 0.052 0.049 0.017 0.941 0.825 0.101 0.019 0.945CLR —– —– —– 0.049 —– —– 0.945 0.810 —– —– 0.942

Table 7: Simulation results for the “strong instruments” scenario, with a small reference sample( N ref = 1000). dataset of N bb = 367643 individuals. In practice, using small datasets as reference data is notuncommon: for example, the 1000 genomes dataset is commonly used for this purpose. Theuse of a small reference dataset does not systematically bias the estimated genetic covariancematrix, as would be the case with population stratiﬁcation.We used both genetic regions and the “strong instruments” scenario. Again, the simulationdesign was identical to those reported in the main part of the paper, except using a smallerreference dataset. Results are reported in Table 7.The JAM algorithm proved to be more sensitive to the reference dataset. The algorithmadjusts marginal SNP-trait associations using the reference genetic correlations. By using aninaccurate correlation pattern, the algorithm’s adjustment was adversely aﬀected and this re-sulted in selecting more genetic variants than in runs with a larger reference dataset. This wasespecially the case for large values of the correlation threshold, in which case the pre-pruningstep only discards a small number of variants before running JAM. For example, for the SHBG region with θ = 0 and ρ = 0 .

95, the algorithm had a posterior model size of 9 .

27 comparedto 3 .

75 when the entire UK Biobank was used as a reference dataset. This meant that the lgorithm was more susceptible to weak instrument bias, made JAM causal eﬀect estimatesslightly more variable for larger ρ values and increased Type I error rates. For smaller valuesof the correlation threshold the algorithm’s performance was aﬀected less.The performance of LD-pruning also attenuated for large values of its correlation threshold,but this attenuation was in line with what we observed in our original simulations, using a largereference dataset. Likewise, principal components analysis and factor-based methods seemedfairly robust to using a smaller reference dataset.values and increased Type I error rates. For smaller valuesof the correlation threshold the algorithm’s performance was aﬀected less.The performance of LD-pruning also attenuated for large values of its correlation threshold,but this attenuation was in line with what we observed in our original simulations, using a largereference dataset. Likewise, principal components analysis and factor-based methods seemedfairly robust to using a smaller reference dataset.