bioRxiv | 2021

A multi-objective based clustering for inferring BCR clones from high-throughput B cell repertoire data


Abstract


The adaptive B cell response is driven by the expansion, somatic hypermutation, and selection of B cell clones. A large number of clones in a B cell population indicates a highly diverse repertoire, while the clonal size distribution and the sequence diversity within clones can be related to antigenic selective pressure. Identifying clones is fundamental to many repertoire studies, including repertoire comparisons, clonal tracking, and statistical analysis. Several methods have been developed to group sequences from high-throughput B cell repertoire data. Current methods use clustering algorithms to group clonally related sequences based on their similarities or distances. Such approaches create groups by optimizing a single objective, typically the minimization of intra-clonal distances. However, optimizing several objective functions simultaneously can be advantageous and can improve the algorithm's convergence rate. Here we propose a new method based on multi-objective clustering. Our approach uses V(D)J annotations to obtain the initial clones and then iteratively applies two objective functions that simultaneously optimize cohesion within clones and separation between clones. We show that, in simulations with varied mutation rates, our method greatly improves clonal grouping compared with other tools. When applied to experimental repertoires generated by high-throughput sequencing, its clustering results are comparable to those of the best-performing tools. The multi-objective clustering method accurately identifies clone members, requires fewer parameter settings, and has the lowest running time among existing tools. Together, these features make it an attractive option for repertoire analysis, particularly in the clinical context, to unravel the mechanisms involved in the development and evolution of B cell malignancies.
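The abstract describes the approach only at a high level. As a rough, hypothetical illustration of the general idea (not the authors' implementation), the Python sketch below groups sequences that share V gene, J gene and junction length into initial clones, and then refines a partition by moving single sequences between clones only when the move Pareto-improves two objectives: cohesion (mean within-clone normalized Hamming distance, minimized) and separation (smallest average-linkage distance between two clones, maximized). The grouping key, the distance, both objective definitions, the fixed number of clones and all function names (`initial_clones`, `cohesion`, `separation`, `refine`) are assumptions made for illustration; junctions are assumed to have equal length so that Hamming distance applies.

```python
"""Toy two-objective clone refinement (hypothetical; not the paper's algorithm)."""
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Normalized Hamming distance between two equal-length junction sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def initial_clones(records):
    """Group record indices sharing V gene, J gene and junction length (assumed key)."""
    groups = defaultdict(list)
    for idx, (v_gene, j_gene, junction) in enumerate(records):
        groups[(v_gene, j_gene, len(junction))].append(idx)
    return list(groups.values())


def cohesion(seqs, clusters):
    """Mean pairwise distance inside clones; lower means tighter clones."""
    dists = [hamming(seqs[a], seqs[b])
             for c in clusters for a, b in combinations(c, 2)]
    return sum(dists) / len(dists) if dists else 0.0


def separation(seqs, clusters):
    """Smallest average-linkage distance between two clones; higher is better."""
    if len(clusters) < 2:
        return 0.0
    return min(
        sum(hamming(seqs[a], seqs[b]) for a in c1 for b in c2) / (len(c1) * len(c2))
        for c1, c2 in combinations(clusters, 2)
    )


def dominates(new, old):
    """Pareto dominance on (cohesion, separation): no worse on both, better on one."""
    return new[0] <= old[0] and new[1] >= old[1] and new != old


def score(seqs, clusters):
    """Evaluate both objectives for a candidate partition."""
    return cohesion(seqs, clusters), separation(seqs, clusters)


def refine(seqs, clusters, max_iter=100):
    """Move single sequences between clones while the move Pareto-improves both objectives.

    Assumes `clusters` covers every index of `seqs`. Clones are never emptied,
    so the number of clones stays fixed in this toy version.
    """
    clusters = [list(c) for c in clusters]
    current = score(seqs, clusters)
    for _ in range(max_iter):
        moved = False
        for i in range(len(seqs)):
            src = next(k for k, c in enumerate(clusters) if i in c)
            if len(clusters[src]) == 1:
                continue  # do not empty a clone
            for dst in range(len(clusters)):
                if dst == src:
                    continue
                cand = [list(c) for c in clusters]
                cand[src].remove(i)
                cand[dst].append(i)
                cand_score = score(seqs, cand)
                if dominates(cand_score, current):
                    clusters, current, moved = cand, cand_score, True
                    break  # accept the first dominating move for this sequence
        if not moved:
            break
    return clusters


if __name__ == "__main__":
    seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATA", "CCCCCCCCCC", "CCCCCCCCCA"]
    start = [[0, 1, 3], [2, 4]]  # deliberately mixed starting clones
    print(sorted(sorted(c) for c in refine(seqs, start)))  # [[0, 1, 2], [3, 4]]
```

Accepting only dominating moves means neither objective ever gets worse. Because each accepted partition strictly dominates the previous one, the loop cannot revisit a partition and therefore terminates. A practical tool would also need rules for splitting or merging clones, which this toy version omits.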

DOI 10.1101/2021.10.01.462736
Language English
Journal bioRxiv

Full Text