SOS: Online probability estimation and generation of T and B cell receptors
Giulio Isacchini (1, 2), Carlos Olivares, Armita Nourmohammad (2, 3, 4), Aleksandra M. Walczak (∗), and Thierry Mora (∗)
(1) Laboratoire de physique de l'École normale supérieure (PSL University), CNRS, Sorbonne Université, and Université de Paris, 75005 Paris, France
(2) Max Planck Institute for Dynamics and Self-organization, Am Faßberg 17, 37077 Göttingen, Germany
(3) Department of Physics, University of Washington, 3910 15th Avenue Northeast, Seattle, WA 98195, USA
(4) Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA 98109, USA
Recent advances in modelling VDJ recombination and subsequent selection of T and B cell receptors provide useful tools to analyze and compare immune repertoires across time, individuals, and tissues. A suite of tools (IGoR [1], OLGA [2] and SONIA [3]) has been publicly released to the community, allowing the inference of generative and selection models from high-throughput sequencing data. However, using these tools requires some scripting or command-line skills and familiarity with complex datasets. As a result, the application of the above models has not been available to a broad audience. In this application note we fill this gap by presenting Simple OLGA & SONIA (SOS), a web-based interface where users with no coding skills can compute the generation and post-selection probabilities of their sequences, as well as generate batches of synthetic sequences. The application also functions on mobile phones.
Availability and implementation:
SOS is freely available to use at sites.google.com/view/statbiophysens/sos with source code at github.com/statbiophys/sos.
INTRODUCTION
The adaptive immune system recognises pathogens through the generation of a highly diverse repertoire of T and B cell receptors (TCR and BCR), which have the potential to recognise even unknown pathogens and initiate an immune response. To produce this diversity, it exploits a highly stochastic process named V(D)J recombination. In addition, to block possible auto-reactive receptors, a selection process is mounted in the thymus for T cells, and a similar process of central tolerance is implemented for B cells. Probabilistic models of TCR and BCR have been proposed [4–6] based on immune repertoire sequencing data [7–10]. Software has been developed to infer the probability of generation of any B- or T-cell receptor (IGoR [1]), and to evaluate this probability for both nucleotide and amino-acid sequences (OLGA [2]). Another tool (SONIA [3]) was released to infer the selective pressures acting on the receptors and used to predict the probability of naive sequences in the periphery [11]. In order to make these tools available to a broader audience, we provide a new web tool which allows for the analysis of single TCR and BCR sequences.
FEATURES
As explained in the introductory “About” tab, the web tool evaluates the generation and post-selection probability of single naive T and B cell receptors in different species based on the specific sequence the user inputs manually. The engine is based on two pieces of Python software, OLGA and SONIA, and is shipped with pre-trained models of recombination and selection for the following loci: human alpha and beta chains of TCR (TRA and TRB), human heavy and light chains of unmutated BCR (IGH, IGK, and IGL), and mouse TRB.
After choosing the species and receptor chain in the “Evaluate” tab, the user inputs a Complementarity Determining Region 3 (CDR3), either as a nucleotide or an amino acid sequence, and optionally V and J germline genes from dropdown lists. The server outputs the generation probability ($P_{\rm gen}$, conditioned on sequence productivity) and the post-selection probability ($P_{\rm post}$), as shown in Fig. 1 (left). When V and J are not specified, the program sums over all possibilities for these segments to calculate the total probability of the CDR3.
To help interpret the result and assess how the sequence of interest compares to others, $P_{\rm gen}$, $P_{\rm post}$, and the selection factor $Q = P_{\rm post}/P_{\rm gen}$ are plotted as green vertical lines on histograms of random sequences taken pre- (blue line) and post- (orange line) selection (Fig. 1, right). This feature only works when V and J are specified. The tool also provides an estimate of the probability of observing the sequence in a generic repertoire. The user inputs the size $N$ of the sequenced repertoire (unique productive nucleotide sequences), and the tool outputs the probability of observing the sequence within a repertoire of that size, given by $1 - (1 - P_{\rm post})^N$.
Using the “Generate” tab, the user can synthesize a specified number of receptor sequences from $P_{\rm gen}$ or $P_{\rm post}$, after choosing the species and chain type from dropdown lists. The generated sequences, each composed of the CDR3 sequence (nucleotide and amino-acid translation) and the V and J segments, are available for download as a CSV file. The user may fix the seed of the random number generator for reproducibility.
FIG. 1: SOS web interface. The user inputs a CDR3 sequence (amino acids or nucleotides) and V and J segments. The program outputs the generation probability $P_{\rm gen}$, the probability in the periphery $P_{\rm post}$, and evaluates a p-value corresponding to the probability of finding that sequence by chance in a repertoire of size $N$ (input by the user). An additional tab allows for the generation of synthetic repertoires.
DISCUSSION
The interface can be used by investigators to evaluate how surprised one should be to find a given sequence in one or multiple repertoires. It could help distinguish receptors with a specific function from chance detections. The tool can also be used to evaluate the potential of certain receptors (in particular antibodies, albeit in their unmutated version) for vaccination or therapeutic purposes. The web interface is also available on mobile phones without the plotting options.
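As a rough illustration of the quantities behind this assessment, the following Python sketch computes the selection factor Q = P_post / P_gen and the probability 1 - (1 - P_post)^N of observing a sequence in a repertoire of size N, as defined in the Features section. It is not the SOS implementation: the function names are hypothetical, the probabilities are assumed to have been obtained elsewhere (for instance from the web tool), and the multi-repertoire function additionally assumes that the repertoires are independent samples, which the text does not state.

```python
import math
from typing import Iterable


def selection_factor(p_post: float, p_gen: float) -> float:
    """Selection factor Q = P_post / P_gen (hypothetical helper name)."""
    if p_gen <= 0.0:
        raise ValueError("p_gen must be positive")
    return p_post / p_gen


def prob_observed(p_post: float, n: int) -> float:
    """Probability of seeing a sequence at least once among n sequences,
    computed as 1 - (1 - p_post)**n."""
    if not 0.0 <= p_post <= 1.0:
        raise ValueError("p_post must lie in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0 or p_post == 0.0:
        return 0.0
    if p_post == 1.0:
        return 1.0
    # log1p/expm1 keep precision when p_post is tiny; the naive form
    # 1 - (1 - p_post)**n rounds 1 - p_post to exactly 1 for p_post < ~1e-16.
    return -math.expm1(n * math.log1p(-p_post))


def prob_observed_in_all(p_post: float, sizes: Iterable[int]) -> float:
    """Probability of seeing the sequence in every repertoire.
    Assumption (not from the source): repertoires are independent samples."""
    return math.prod(prob_observed(p_post, n) for n in sizes)
```

Under this formula, a sequence with P_post = 1e-9 would be expected in roughly 0.1% of repertoires containing 10^6 unique sequences (`prob_observed(1e-9, 10**6)`).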
Acknowledgments.
This work was partially supported by the European Research Council Proof of Concept Grant n. 824735.
∗ These authors contributed equally.
[1] Marcou Q, Mora T, Walczak AM (2018) High-throughput immune repertoire analysis with IGoR. Nature Communications
Bioinformatics arXiv:2001.02843 pp 1–17.
[4] Murugan A, Mora T, Walczak AM, Callan CG (2012) Statistical inference of the generation probability of T-cell receptors from sequence repertoires. Proceedings of the National Academy of Sciences
PLoS Computational Biology
Proceedings of the National Academy of Sciences
Nature biotechnology
Briefings in Bioinformatics
Transplant International pp 0–2.
[10] Bradley P, Thomas PG (2019) Using T Cell Receptor Repertoires to Understand the Principles of Adaptive Immune Recognition. Annual Review of Immunology
arXiv:1911.12279