HLA predictions from the bronchoalveolar lavage fluid samples of five patients at the early stage of the Wuhan seafood market COVID-19 outbreak
aa r X i v : . [ q - b i o . GN ] A p r H L A
PR EDICTIONS FROM THE B RONC HOALVEOLAR LAVAGEFLUID SAMPLES OF FIVE PATIENTS AT THE EAR LY STAGE OF THE W UHAN SEAFOOD MAR KET
COV ID -19
OUTB R EAK
A P
REPRINT
René L. Warren
Genome Sciences Centre, BC CancerVancouver, BC, V5Z 4S6, Canada [email protected]
Inanç Birol
Genome Sciences Centre, BC CancerVancouver, BC, V5Z 4S6, Canada [email protected]
April 28, 2020 A BSTRACT
We are in the midst of a global viral pandemic, one with no cure and a high mortality rate. TheHuman Leukocyte Antigen (HLA) gene complex plays a critical role in host immunity. We predictedHLA class I and II alleles from the transcriptome sequencing data prepared from the bronchoalveolarlavage fluid samples of five patients at the early stage of the COVID-19 outbreak. We identified theHLA-I allele A*24:02 in four out of five patients, which is higher than the expected frequency(17.2%) in the South Han Chinese population. The difference is statistically significant with a p-value less than − . Our analysis results may help provide future insights on disease susceptibility. Availability:
The tool used to perform the reported analysis results, HLAminer, is availablefrom https://github.com/bcgsc/hlaminer . Predictions are available for download from K eywords COVID-19 · SARS-Cov-2 · Human Leukocyte Antigen (HLA) · RNA-Seq · bronchoalveolar lavage fluid · HLA-A*24:02 · HLA-DPA1*02:02 · HLA-DPB1*05:01
SARS-CoV-2 infections have reached global pandemic proportions in early 2020, affecting over 2M people worldwide(as of this writing) and showing no signs of easing, except in a few jurisdictions where strict quarantine measures wereimplemented early on. The resulting coronavirus disease (COVID-19) has a relatively high ( ∼ PREPRINT - A
PRIL
28, 2020For over a decade, high-throughput transcriptome sequencing (RNA-Seq) has proven a worthy instrument for measur-ing changes of gene expression in human diseases and beyond [5]. Transcriptome analysis has the potential to revealkey genes that are modulated in response to infections, but also to reveal the HLA composition of affected individuals.A few years ago, our group developed an approach for mining high-throughput next-generation shotgun sequencingdata for the purpose of HLA determination [6], which has since been applied in a broader clinical context [7].Here, we report our initial observations based on transcriptome sequencing libraries prepared from the bronchoalveolarlavage fluid samples of five patients at the early stage of the Wuhan seafood market pneumonia coronavirus outbreak(see Methods). We identified the HLA class I allele A*24:02 and class II haplotype DPA1*02:02-DPB1*05:01 in fourout of five individuals. Although HLA-A*24:02 is common in some populations, the prevalence observed (80%) ishigher than the allele frequency in the Chinese population (17.2%) — the presumed ethnicity of the patients in thereported study.
We downloaded MGISEQ-2000RS paired-end (150 bp) RNA-Seq reads from libraries prepared from the bronchoalve-olar lavage fluid samples of five patients (
Accessions:SRX7730880-SRX7730884 denoted in the tables below as Patient 1-5, respectively). We note that these are metage-nomics RNA samples, and, although not explicitly noted, we think that they were prepared for the primary purposeof identifying and characterizing the novel coronavirus at the outbreak epicentre. On each dataset, we ran HLAminer[6] in targeted assembly mode with default values (v1.4; contig length ≥ ≥ ≥ HLAminer predictions are shown for HLAclass I genes A, B and C. Common HLA alleles between two or more patients are highlighted in bold face.
Patient 1 Patient 2 Patient 3 Patient 4 Patient 5
A*01:01P A*30:01P
A*24:02P A*24:02P
A*29
A*24:02P
A*02:06P A*26:01P -
A*24
B*35/B*57
B*51:01P
B*15:01P B*40:01P B*54:01PB*48 B*13:02P
B*51:01P
B*13:01P B*07:05PC*08:72P C*14:02P C*15:02P C*04:03P C*15
C*06:02P C*06:02P
C*03:03P C*03:04P -
We predicted and compiled the likely HLA class I (Table 1) and II (Table 2) alleles for five patients at the early stageof the COVID-19 outbreak in Wuhan, China. Although the bronchoalveolar lavage (BAL) fluid samples were initiallyutilized to identify and characterize the novel coronavirus (similar justification in [8]), BAL metagenomics samplesare expected to contain host cells / nucleic acids (DNA/RNA). Because HLA genes are expressed at the surface of allhuman nucleated cells, RNA-Seq data can be employed to determine HLA profiles from BAL samples.We observe the HLA-A*24:02 allele in four out of five (80%) patients (Table 1). HLA-A*24 is a common group ofalleles in Southeastern Asian populations, and the frequency of HLA-A*24:02 allele is high, especially in indigenousTaiwanese populations, reaching as high as 86.3%. However, our understanding is that the five patients are from theWuhan market area, and the associated A*24:02 frequency in Chinese population is typically 17.2%, a value that isstatistically significantly less than our observed frequency of 80% (p < − , 1-sided z-test). Also of note, the HLAclass II DPA1*02:02 and DPB1*05:01 haplotype predicted in patients 1 to 4 (80%), genes DQB1*03:01 in patients 1and 4, and DRB4*01:01 in patients 2 and 3 (Table 2). HLA-A*24 has not been previously reported as a risk factor for SARS infection [9], but there are reports of diseaseassociation with HLA-A*24:02, notably with diabetes [10] [11] [12] [13], which is a reported potential risk factorin COVID-19 patients [14]. Both DPA1*02:02 and DPB1*05:01 occur at relative high frequency (44.8% and 31.3%,2
PREPRINT - A
PRIL
28, 2020Table 2: HLA-II predictions from the bronchoalveolar lavage fluid samples of five patients at the early stage of theWuhan seafood market pneumonia coronavirus outbreak. Highest-scoring
HLAminer predictions are shown for HLAclass II genes DPA1, DPB1, DQA1, DQB1, DRA, DRB1, DRB3, DRB4 and DRB5. Missing class II genes or (-)denote the absence of predictions. Common HLA alleles between two or more patients are highlighted in bold face.
Patient 1 Patient 2 Patient 3 Patient 4 Patient 5DPA1*02:02P
DPA1*02:01P
DPA1*02:02P DPA1*02:02P
DPA1*04:01P-
DPA1*02:02P - - -
DPB1*05:01P
DPB1*13:01P
DPB1*05:01P DPB1*05:01P -DPB1*02:02P
DPB1*05:01P - - -DQA1*01:03P DQA1*01:02P DQA1*03:01P DQA1*01:01P -DQA1*05 DQA1*02:01P - DQA1*05:01P -
DQB1*03:01P
DQB1*05:02P DQB1*04:01P
DQB1*03:01P -DQB1*06:01P DQB1*02:01P DQB1*03:02P - -
DRA*01:01P DRA*01:01P DRA*01:01P - -- - - - -DRB1*08:03P DRB1*07:01P DRB1*04:03P DRB1*15:02P DRB1*04- DRB1*16:02P DRB1*04:05P DRB1*11:01P -- - - DRB3*02:02P -- - - - --
DRB4*01:01P DRB4*01:01P - -- - - - -- DRB5*02:02P - DRB5*01:02 -- - - - -n=1490) in Han Chinese [15], and associations of those HLA type II alleles with narcolepsy [16] and Graves’ disease[15], both autoimmune disorders, have been reported in that population. Further, a GWAS study found a link betweenHLA-DPB1*05:01 and chronic hepatitis B in Asians, and it has been suggested that this risk allele may impact one’sability to clear viral infections [16] [17].HLA also informs vaccine development. This knowledge would help prioritize SARS-CoV-2 derived epitopes pre-dicted to be stable HLA binders [18] [19]. Future studies into host susceptibility and resistance to SARS-CoV-2infections are sorely needed as they may help us better manage and mitigate risks of infections. We chose to commu-nicate our early findings in this domain to facilitate rapid development of response strategies.
Grant information
This work was supported by Genome BC and Genome Canada [281ANV]; and the National Institutes of Health[2R01HG007182-04A1]. The content of this paper is solely the responsibility of the authors, and does not necessarilyrepresent the official views of the National Institutes of Health or other funding organizations.
References [1] Rajgor, D.D. et al. (2020) The many estimates of the COVID-19 case fatality rate. The Lancet.doi:10.1016/S1473-3099(20)30244-9[2] Mizumoto, K. et al. (2020) Estimating the asymptomatic proportion of corona-virus disease 2019 (COVID-19)cases on board the Diamond Princess cruise ship, Yokohama, Japan, 2020. Euro Surveill. 25, pii=2000180.doi:10.2807/1560-7917.ES.2020.25.10.2000180[3] Nishiura, H. et al. (2020) Estimation of the asymptomatic ratio of novel coronavirus infections (COVID-19). Int.Journal of Infectious Diseases, doi:10.1016/j.ijid.2020.03.020[4] Lin, M. et al. (2003) Association of HLA class I with severe acute respiratory syndrome coronavirus infection.BMC Med Genet. 4, 9[5] Wang, Z. et al. (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nature reviews Genetics. 10, 57-63[6] Warren, R.L. et al. (2012) Derivation of HLA types from shotgun sequence datasets. Genome Med. 4, 953
PREPRINT - A
PRIL
28, 2020[7] Brown, S.D. et al. (2014) Neo-antigens predicted by tumor genome meta-analysis correlate with increased patientsurvival. Genome Res. 24, 743-750[8] Wu F. et al. (2020) Complete genome characterisation of a novel coronavirus associated with severe humanrespiratory disease in Wuhan, China. bioRxiv. doi:10.1101/2020.01.24.919183[9] Sun, Y. and Xi, Y. (2014) Association between HLA gene polymorphism and the genetic susceptibility of SARSinfection. Book: HLA and associated important diseases. IntechOpen. 12, doi:10.5772/57561[10] Adamashvili, I. et al. (1997) Soluble HLA class I antigens in patients with type I diabetes and their familymembers. Human Immunol. 55, 176–83[11] Noble, J.A. et al. (2002) The HLA class I A locus affects susceptibility to type 1 diabetes. Human Immunol. 63,657–664[12] Nakanishi, K. and Inoko, H. (2006) Combination of HLA-A24, -DQA1*03, and -DR9 contributes to acute-onsetand early complete beta-cell destruction in type 1 diabetes: longitudinal study of residual beta-cell function.Diabetes. 55, 1862–1868[13] Kronenberg, D. et al. (2012) Circulating preproinsulin signal peptide-specific CD8 T cells restricted by the sus-ceptibility molecule HLA-A24 are expanded at onset of type 1 diabetes and kill ββ