Artificial neural networks for 3D cell shape recognition from confocal images
G. Simionato, K. Hinkelmann, R. Chachanidze, P. Bianchi, E. Fermo, R. van Wijk, M. Leonetti, C. Wagner, L. Kaestner, S. Quint
Saarland University, Department of Experimental Physics, Campus E2.6, 66123 Saarbrücken, Germany
Saarland University, Department of Experimental Surgery, Campus University Hospital, Building 61.4, 66421 Homburg, Germany
University Grenoble Alpes, CNRS, Grenoble INP, LRP, 38000 Grenoble, France
Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, Milano, Italy
University Medical Center Utrecht, Department of Clinical Chemistry & Haematology, Utrecht, The Netherlands
University of Luxembourg, Physics and Materials Science Research Unit, Luxembourg City, Luxembourg
Saarland University, Theoretical Medicine and Biosciences, Campus University Hospital, Building 61.4, 66421 Homburg, Germany
Cysmic GmbH, Geretsrieder Str. 10, 81379 München, Germany
(Dated: May 29, 2020)

We present a dual-stage neural network architecture for analyzing fine shape details from microscopy recordings in 3D. The system, tested on red blood cells, uses training data from both healthy donors and patients with a congenital blood disease. Characteristic shape features are revealed from the spherical harmonics spectrum of each cell and are automatically processed to create a reproducible and unbiased shape recognition and classification for diagnostic and theragnostic use.
I. MAIN
Cell morphology is a phenotypic characteristic reflecting the cell cycle, metabolic state or cellular activity [1–3]. While brightfield imaging is affected by the orientation of cells on the microscopy slide, which determines a certain projection, 3D confocal microscopy allows the whole cell surface to be investigated without loss of information.

The analysis of shapes is related to feature detection in processed images. Deep-learning-based approaches can potentially be employed for such tasks, avoiding manual procedures that are time consuming, subjective and prone to human error. For this reason, we developed a method that provides recognition at high shape-detail resolution of 3D objects that are similar in shape and nature, thus having the potential to be universally used due to its precision. This method is implemented with a low computational cost.

To evaluate our system, we employed red blood cells (RBCs), representing one of the most deformable cell types. In healthy subjects, RBCs in stasis are typically biconcave disks, but external factors such as the pH or osmolarity of the suspension medium or interaction with surfaces can stimulate a shape transition. Such transformations appear in a distinct order and are described as the stomatocyte-discocyte-echinocyte (SDE) sequence [4]. In case of blood diseases, additional morphological abnormalities appear, defining certain blood disorders (e.g., hereditary spherocytosis, sickle cell disease, acanthocytosis or stomatocytosis) [5, 6]. The investigation of RBC morphology for the diagnosis of hematological diseases relies on the visual examination of blood smears. Advances in automation of the analysis have especially involved convolutional neural networks (CNNs) for white blood cell recognition [7] and, in some cases, for RBC detection and shape classification in 2D, both in stasis and flow [8–10].

∗ G. Simionato and K. Hinkelmann contributed equally to this work.
However, in blood smear preparation the smearing and drying procedures affect cells, leading to unwanted morphological deformation [11] and loss of the 3D information of the original cell shape.

Instead of this technique, we performed fixation of RBCs, followed by fluorescent staining. Confocal microscopy was used to capture the 3D representations of cells by means of z-stacks (Fig. 1a). After offset elimination, intensity normalization and adaptation of the resolution in the x/y and z direction by interpolation (Fig. 1b), we discriminated the cell membrane as an isosurface defined by a constant intensity threshold (Fig. 1c). In contrast to pre-existing classification approaches for such kinds of data, e.g., 3D-CNNs [12] or voxelwise processing techniques [13], we transformed and subsequently collapsed the volumetric data to exclusively access the features of interest. This data reduction was achieved by decomposing the spatial information of the cell surface into the respective spherical harmonics (SH) spectrum (Fig. 1d) [14]. Thus, a one-dimensional data vector was obtained, encoding the prevalent features of the 3D shape and characterized by rotation and translation invariance (see Methods). A subsequent normalization mapped the SH spectrum into the range from 0 to 1, rendering the data suitable to train artificial neural networks (ANNs).

For cell shape recognition, we used a dual-stage ANN architecture (Fig. 1e). The first stage was designed to sort out distinct RBC shapes that do not fit the SDE spectrum. Such shapes particularly occur in samples from patients with blood diseases or other pathologies [5, 6]. An additional class of “unknown” cells was added to reflect human uncertainty regarding unclear or rare shapes not yet defined in the literature. This class included all cells classified by the first-stage ANN with an identification confidence below a threshold of 75%.

The second stage served to discriminate all SDE shapes. Previously, the SDE sequence was described by assigning different shapes to pseudodiscrete classes, such as spherocytes, stomatocytes type I/II, discocytes and echinocytes type I/II/III, to serve as reasonable support for manual classification [4]. By employing supervised training for both ANNs, we used the state-of-the-art classification scheme to create training data for carefully selected sets of non-SDE (Fig. 1g) and ideal SDE (Fig. 1h) shapes. In between the pseudodiscrete SDE classes, extra transition shapes were observed. For this reason, we assumed that the shape transformation occurs in a continuous manner and introduced a linear scale to automatically assign any identified SDE shape to an interval ranging from −1 to +1 […] (SLC4A1, ANK1; Fig. 2, Supplementary Note and Supplementary Table 1). As highlighted in a confusion matrix created for healthy controls and patients (Supplementary Fig. 4), the comparison of manual classification and automatic classification (first-stage ANN) resulted in a variable mismatch due to the rare occurrence of related shapes in blood samples and limited available training data. On the other hand, we observed a high agreement between the manual classification and the automatic allocation of cells on the SDE scale (second stage), ranging from 78% to 100%.

From a clinical perspective, patients’ data resulted in a different statistical output compared to those of healthy donors (Fig. 1f and Supplementary Fig. 3), with wider shape distributions within the SDE scale and an expected value toward stomatocytes for P5 (Fig. 2). The tendency of RBCs to form a round shape is a hallmark of hereditary spherocytosis, and spherocytes are particularly expected in blood smears of affected subjects.
A previous study reported that 2.6% of blood smears in a set of 300 patients did not exhibit detectable spherocytes, leading to a possible misdiagnosis [15]. However, the evaluation in 3D indicated that spherocytes in the tested set of patients were very rare or completely absent and comparable in number to those found in healthy subjects. These results confirmed the dependence of blood smear shape analysis on cell rotation (Fig. 2, Supplementary Fig. 5), proving that spherocytes are not the main shape in hereditary spherocytosis.

Some indications of the presence of other shape deformations in blood smears were reported [16, 17] and may have an association with the different genetic mutations causing the disease. The fine recognition of shape details by the automated dual-stage ANN resulted in a differential shape profile for various mutations. This represents additional information compared to that obtained from blood smears, where solely the type of blood disease can be discriminated. Finally, the prevalence of shapes occurring in non-SDE classes, especially in the unknown class, underlined the high morphological variability in patients, highlighting the demand for further RBC shape definitions.

In conclusion, the proposed approach describes an automated evaluation system for cell morphology in addition to or instead of manual methods. Its application in hematology revealed that conventional microscopy has limitations with regard to cell morphology that may lead to erroneous interpretations and shows the superiority of 3D visualization and characterization of cell shapes. In addition to the unbiased outcome, automation by ANN allowed both the recognition of small shape details and the possibility of using a regression-based approach for cells undergoing continuous shape transitions. Owing to the details revealed using 3D imaging combined with ANNs as a universal tool for shape recognition, thorough tests on anemic subjects may render our method suitable for diagnostic purposes.
In addition, from the results obtained with the tested pool of patients, we observed potential applicability with larger datasets to relate the ANN output to a particular mutation. While genetic analysis is the gold standard for detection, cell imaging can be of additional interest in the investigation of the severity and state of a disease [18]. Moreover, it can be applied for personalized theragnostics when the effectiveness of a specific treatment is tracked [19]. Our method may be adapted for other cell biological applications or even industrial purposes.

[Figure 1 panels: a, single cell cropping; b, z-interpolation; c, isosurface vectorization; d, spherical harmonics analysis; e, dual-stage neural network structure (first stage: input (SH components), dense + ReLU, dense + softmax, output SDE shapes + 6 shape classes; second stage: input, dense + ReLU, dense + linear, output in ]−1, 1[); f, output statistics; g, examples of shape classes: cell clusters (CC), keratocytes (KE), knizocytes (KN), multilobate (ML), acanthocytes (AC), unknown (U); h, examples of SDE shapes: spherocytes (SP), stomatocytes II (ST II), stomatocytes I (ST I), discocytes (D), echinocytes I (E I), echinocytes II (E II), echinocytes III (E III).]

Figure 1. Workflow for automatic classification by the dual-stage ANN. a After sample staining and imaging by confocal microscopy, each cell is cropped individually and the full stack is interpolated in the z direction to achieve isotropic resolution (b). c The isosurface is retrieved by applying a constant threshold to each cell. d The vectorized data are transformed with respect to their spherical harmonics (L2-norm of 32 radii and corresponding 16 frequencies, see Methods), representing a rotation-invariant form of the cell shape. e Data are fed to the dual-stage ANN, with the first stage resulting in a classification output (bottom). This stage consists of a three-layer architecture providing an input layer (544 neurons), a fully connected hidden layer with a ReLU activation function (54 neurons) and a fully connected output layer (softmax activation). The output is represented by a vector of size 7, which is subjected to discrimination by a given threshold. If the confidence is higher than 75 % (threshold), the output is assigned to a certain existing class. Otherwise, the output is assigned to unknown (U) shapes. All detected SDE shapes are forwarded to the second-stage ANN, with an anatomy similar to that of the first ANN, except for the hidden layer that exhibits 544 neurons. The regression-type output layer assigns each cell a score between −1 and +1. f Example of a typical resulting shape distribution in a healthy subject. Almost all RBCs are discocytes. g Representative images for 6 mutually exclusive shape classes (see Methods for description): two out of many different examples of unknown shapes are shown. h Representative cells of the SDE scale. The training data included SDE shapes artificially induced by changing the osmolarity of the suspension medium.

[Figure 2 x-axis, SDE scale: −1.00 SP, −0.67 ST II, −0.33 ST I, 0.00 D, 0.33 E I, 0.67 E II, 1.00 E III.]
[Figure 2 panels, one row per patient: SDE distribution (probability density over the SDE scale) and classification (probability per shape class: SDE, CC, KE, KN, ML, AC, U). Values printed in the figure:]

Patient  Mutation  SDE distribution                 Classification
                   total  µ      95% CI             total  SDE    CC     KE     KN     ML     AC
P1       SLC4A1    512    0.03   [-0.37, 0.26]      1047   0.489  0.021  0.155  0.002  0.015  0.023
P2       SLC4A1    615    0.01   [-0.46, 0.25]      789    0.779  0.009  0.084  0.001  0.003  0.011
P3       n/a       842    0.02   [-0.42, 0.30]      2108   0.399  0.023  0.060  0.021  0.001  0.174
P4       n/a       469    0.02   [-0.42, 0.28]      1062   0.442  0.029  0.098  0.034  0.016  0.131
P5       ANK1      769    -0.23  [-0.84, 0.17]      1126   0.683  0.004  0.039  0.000  0.078  0.021
P6       SLC4A1    1179   -0.00  [-0.56, 0.29]      2223   0.530  0.005  0.025  0.000  0.018  0.218
P7       SLC4A1    493    0.05   [-0.39, 0.28]      828    0.595  0.017  0.063  0.011  0.001  0.111
P8       SLC4A1    922    -0.08  [-0.56, 0.22]      1109   0.831  0.006  0.024  0.002  0.002  0.010
P9       SLC4A1    517    0.02   [-0.43, 0.26]      866    0.597  0.067  0.042  0.003  0.010  0.057
P10      SLC4A1    441    0.10   [-0.18, 0.31]      948    0.465  0.005  0.090  0.003  0.023  0.141

Figure 2.
Automatic 3D RBC shape recognition for patients affected by hereditary spherocytosis.
The total number of cells analyzed in both the classification and the SDE distribution is indicated for each patient. The probability density distributions within the SDE range show the expected value µ (dashed red line) and highlight 95% confidence intervals (dark gray area). The results demonstrate that most of the patients have expected values related to discocytes, with a tendency toward stomatocytes. A population of spherocytes (score −1) is lacking in all 10 samples, proving that such a shape is not a hallmark of the disease. Additionally, shape profiles among patients are different, suggesting a relation to the various genetic mutations (SLC4A1, and in one case ANK1). In particular, P1 and P2 as well as P3 and P4 (nonidentified mutation) are relatives and show similar profiles, especially within the SDE range. P5 harbors a mutation that affects the cytoskeletal protein ankyrin, resulting in the highest number of stomatocytes, including some spherocytes. P6, P7 and P9 are affected by mutations in the band 3 protein, as is P8, who also has a double mutation in spectrin alpha (see Supplementary Table 1), although this latter mutation is not known to be pathogenic. The differences among these patients may depend on the particular mutation altering the same gene: P6 showed 22% acanthocytes, which occur in variable numbers in P1 and P2, P7 and P9. The associated phenotypic defect, in fact, causes band 3 deficiency in some cases, while in others it leads to spectrin deficiency (Supplementary Table 1). Other classes showed rare occurrences, and shape deviations were classified as unknown cells in all the tested samples, suggesting that a larger number of shape deformations were detected by the ANNs.
II. ACKNOWLEDGMENTS
This work was supported by the Volkswagen Experiment! grant, the Deutsche Forschungsgemeinschaft (DFG) in the framework of the research unit FOR 2688 and the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie grant agreement no 860436 – EVIDENCE.
III. AUTHOR CONTRIBUTIONS
G.S. performed the experimental procedures, microscopy, training data generation by manual classification and manuscript writing. K.H. performed 3D rendering of cells, video editing, programming, process automation and system validation. R.C. performed preprocessing of microscopy images, additional data acquisition and reviewing of the manuscript. P.B., E.F. and R.W. performed diagnostic analysis, data collection and patient blood sampling. M.L. and C.W. discussed the results; C.W. provided laboratory infrastructure and consumables for experiments. L.K. and S.Q. designed the project. S.Q. conceived the method; performed programming and system optimization; supervised data evaluation, validation, manuscript writing; and acquired funds. All authors contributed to the editing and proofreading of the manuscript.
IV. COMPETING INTERESTS
The authors declare no competing interests.
V. ADDITIONAL INFORMATION
Supplementary information available.
VI. DATA AVAILABILITY
The raw data supporting findings of this study are available at: https://gir1.de/cytoShapeNet/data.html
VII. CODE AVAILABILITY
The code is open source under GNU General Public License (GPL), available on GitHub at: https://github.com/kgh-85/cytoShapeNet
[1] P.-H. Wu, D. M. Gilkes, J. M. Phillip, A. Narkar, T. W.-T. Cheng, J. Marchand, M.-H. Lee, R. Li, and D. Wirtz, Science Advances, eaaw6938 (2020).
[2] M. F. Cutiongco, B. S. Jensen, P. M. Reynolds, and N. Gadegaard, Nature Communications, 1 (2020).
[3] A. Prasad and E. Alizadeh, Trends in Biotechnology, 347 (2019).
[4] G. Lim H. W., M. Wortis, and R. Mukhopadhyay, Soft Matter (Wiley-VCH Verlag GmbH & Co. KGaA, 2009) pp. 139–204.
[5] M. Diez-Silva, M. Dao, J. Han, C.-T. Lim, and S. Suresh, MRS Bulletin, 382–388 (2010).
[6] R. J. Hardie, QJM: An International Journal of Medicine, 291 (1989).
[7] A. T. Sahlol, P. Kollmannsberger, and A. A. Ewees, Scientific Reports, 1 (2020).
[8] K. Yao, N. D. Rochman, and S. X. Sun, Scientific Reports, 1 (2019).
[9] A. Kihm, L. Kaestner, C. Wagner, and S. Quint, PLoS Computational Biology, e1006278 (2018).
[10] M. Xu, D. P. Papageorgiou, S. Z. Abidi, M. Dao, H. Zhao, and G. E. Karniadakis, PLoS Computational Biology, e1005746 (2017).
[11] R. Wenk, The American Journal of Medical Technology, 71 (1976).
[12] K. Kamnitsas, C. Ledig, V. F. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, and B. Glocker, Medical Image Analysis, 61 (2017).
[13] C. R. Qi, H. Su, M. Nießner, A. Dai, M. Yan, and L. J. Guibas, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 5648 (2016).
[14] M. Kazhdan, T. Funkhouser, and S. Rusinkiewicz, in Symposium on Geometry Processing, Vol. 6 (2003) pp. 156–164.
[15] M. Mariani, W. Barcellini, C. Vercellati, A. P. Marcello, E. Fermo, P. Pedotti, C. Boschetti, and A. Zanella, Haematologica, 1310 (2008).
[16] S. Perrotta, P. G. Gallagher, and N. Mohandas, The Lancet, 1411 (2008).
[17] S. Eber and S. E. Lux, in Seminars in Hematology, Vol. 41 (Elsevier, 2004) pp. 118–141.
[18] L. Kaestner and P. Bianchi, Frontiers in Physiology, 387 (2020).
[19] O. Alvarez, N. S. Montague, M. Marin, R. O’Brien, and M. M. Rodriguez, Fetal and Pediatric Pathology, 149 (2015).
Methods
I. SAMPLE PREPARATION FOR TRAINING DATA GENERATION
Blood was drawn with informed consent from 5 healthy donors via finger prick blood sampling in tubes containing 5 µl of 1. […] .4% NaCl was used to induce spherostomatocyte formation. Intermediate shapes were obtained by suspending cells in 0.5% NaCl (stomatocyte types I and II) and 2.5% NaCl (echinocyte types I and II) solutions (for echinocyte type III, see below).

A total of 400 µl of each cell suspension was fixed in 1 ml of 1% glutaraldehyde (Sigma-Aldrich, Saint Louis, USA) solution in NaCl. To fix cells with the desired shape, each glutaraldehyde solution was prepared to have a total osmolarity equal to the osmolarity of the NaCl solution used to induce each shape [1]. Fixed cells were placed in a rotator mixer (Grant-bio PTR-35, Grant Instruments, Cambridge, England) at room temperature overnight. Samples were later centrifuged at 4000 g for 5 minutes (Eppendorf Micro Centrifuge 5415 C, Brinkmann Instruments, NY, USA), washed 3 times with 1 ml of each respective NaCl solution used to induce the different shapes and eventually resuspended in 1 ml of the same solution. Five microliters of CellMask™ Deep Red plasma membrane stain 0. […] µl of blood was suspended in 1 ml of PBS and labeled with CellMask™ Deep Red. After washing, the cells were resuspended in PBS and finally placed on a glass slide for confocal microscopy. For later testing, 5 µl of fresh blood drawn by finger pricks from 10 volunteers and 10 patients with hereditary spherocytosis was fixed in 1 ml of either 1% or 0.1% glutaraldehyde in PBS, stained as described above and imaged. Samples from patients were fixed and shipped from Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico of Milan (Italy) and University Medical Center Utrecht (The Netherlands).
II. IMAGING BY CONFOCAL MICROSCOPY
Each labeled sample was placed between two glass slides for imaging (VWR rectangular coverglass, 24 × 60 mm) on top of a 60X objective (CFI Plan Apochromat Lambda 60X Oil, NA = 1.4, Nikon, Tokyo, Japan) of an inverted microscope (Nikon Eclipse Ti). A solid-state laser (λ = 647 nm, Nikon LU-NV Laser Unit) was used as a light source for imaging. Z-stack scanning was realized by employing a 300 nm piezo stepper for a 20 µm z-range. Confocal image generation was performed with a spinning-disk based confocal head (CSU-W1, Yokogawa Electric Corporation, Tokyo, Japan). Image sequences were acquired with a digital camera (Orca-Flash 4.0, Hamamatsu Photonics, Hamamatsu City, Japan).
III. IMAGE PREPROCESSING
A custom-written MATLAB™ (MathWorks, MA, USA) routine was used to crop single cells from each image and perform their 3D reconstruction to enable visualization of the 3D shape. Each single-cell 3D image contained 68 individual planes with an extent of 100 px × 100 px and a lateral (x/y) resolution of 0. […] µm/px. The piezo stepper had a minimal step width of 0. […] µm, defining the z-resolution accordingly. To compensate for the difference in resolution in the x/y and z directions, we adapted the scale in z by means of linear interpolation. Thus, the obtained z-stack had dimensions of 100 px × 100 px × 185 px. The image stacks were then passed to a custom-written ImageJ script. By applying a fixed threshold for every image, the script binarized the confocal z-stack to retrieve the cell membrane as an isosurface. After vectorization, the origin of the cell always corresponded to the center of its bounding box. Therefore, this step had the benefit of introducing an inherent translation invariance, i.e., for a given rotation, the form and size of the bounding box is invariant regarding translations in space. The obj-files (Wavefront Technologies, CA, USA) generated in such a manner were then automatically transformed into the polygon file format (ply) and passed to the shape descriptor for the spherical harmonics analysis.
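The z-rescaling step described above can be illustrated with a minimal NumPy sketch. It is not the authors' MATLAB routine: the function name `resample_z_linear`, the (z, y, x) axis order, the endpoint-aligned sampling and the random stand-in data are all assumptions made for illustration. Only the plane counts (68 in, 185 out) and the 100 px × 100 px extent come from the text.

```python
import numpy as np

def resample_z_linear(stack, n_out):
    """Linearly interpolate a (z, y, x) image stack along z to n_out planes.

    Hypothetical helper: the first and last output planes coincide with the
    first and last input planes (an assumption, not stated in the text).
    """
    stack = np.asarray(stack, dtype=float)
    n_in = stack.shape[0]
    # Fractional position of every output plane in input-plane coordinates.
    pos = np.linspace(0.0, n_in - 1, n_out)
    lo = np.floor(pos).astype(int)           # plane below each position
    hi = np.minimum(lo + 1, n_in - 1)        # plane above, clamped at the end
    w = (pos - lo)[:, None, None]            # weight of the upper plane
    return (1.0 - w) * stack[lo] + w * stack[hi]

# Synthetic stand-in for one cropped cell: 68 planes of 100 px x 100 px.
rng = np.random.default_rng(0)
stack = rng.random((68, 100, 100))
iso = resample_z_linear(stack, 185)
print(iso.shape)
```

Linear interpolation keeps intensities within the range of the neighboring planes, so a fixed binarization threshold applied afterwards behaves consistently across the resampled stack.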
IV. SPHERICAL HARMONICS ANALYSIS
The spherical harmonics analysis was performed by using the high-performance software implementation described by Kazhdan et al. [2]. The algorithm first maps a given 3D object (ply format) onto a 3D voxel grid of defined size. For our purpose, we kept the standard parameters of the algorithm, using a voxel grid of 64 px × 64 px × 64 px. Within this cube, at least 32 spherical functions of different radii can be arranged around the center point. Each spherical function can then be decomposed as the sum of its harmonics:

$f(\Theta, \phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} a_{lm} Y_l^m(\Theta, \phi)$ (1)

Keeping the standard settings, for every radius, 16 frequencies (harmonics) were calculated. In addition, the first- and second-order components of each decomposition were expressed in a Euclidean manner using 3 scalars $a_1, a_2, a_3$:

$f_1 + f_2 = a_1 x + a_2 y + a_3 z$. (2)

In the case of principal component analysis (PCA), these values could potentially be used for alignment purposes. However, in our case, these factors were not employed for further PCA investigations but were included in the training process. After transformation, each frequency component was accumulated by calculating the corresponding L2-norm. This resulted in a vector of size 32 × (3 + 14) = 544, where 32 corresponded to the number of radii, 3 to the scalars describing $f_1$ and $f_2$, and 14 to the number of remaining frequencies for each radius. Finally, each cell was expressed by means of a one-dimensional vector. For further signal processing, this vector was normalized and mapped onto a numeric range from 0 to 1. The intrinsic rotation invariance with respect to the original 3D data was the key factor in choosing this kind of transformation for shape description. This is because cells can have any orientation after sedimentation on the microscopy slide, and a general expression of their shape is required for further analysis.
V. MANUAL CLASSIFICATION
Each vectorized cell was rendered using Blender for cell shape visualization to perform a manual shape classification. The samples fixed in solutions at different osmolarities were used to classify the various shapes of the SDE scale based on Bessis’ criteria [3, 4]: (1) discocytes, meaning biconcave and symmetric disks; (2) stomatocytes type I and II, characterized by a lighter (I) and deeper (II) monoconcavity; (3) spherocytes, i.e., spherical cells; (4) echinocyte type I, crenated cells preserving biconcavity; (5) echinocyte type II, cells with forming spicules; and (6) echinocyte type III, cells with more than 25 spikes. Following the observation of other cell morphologies, the chosen shapes for additional categories were knizocytes, i.e., trilobal cells resulting from high shear stress or observed in some blood diseases; keratocytes, a category including damaged RBCs with variable shapes; acanthocytes, including echinocytes with irregular spicules and/or a spherical body; and multilobate cells, i.e., young reticulocytes. A class for the exclusion of an artifact occurring upon cell fixation, named cell clusters, was added. Each class contained a minimum of 10 cells to a maximum of 200 cells. Any other shape beyond the chosen classes was not considered and therefore not introduced in the training process.
VI. TRAINING AND VALIDATION OF THE DUAL-STAGE ANN
Keras with TensorFlow as the backend was used to build and train the dual-stage ANN. A set of representative RBC shapes was chosen for the supervised training of both ANN stages. Each of the selected cells was then transformed according to the previously discussed steps. In addition, data augmentation of the spectra was performed by creating 1000 linear interpolations between randomly picked spectra belonging to the same class, in the case of the first-stage ANN (classification), and between spectra of neighboring pseudodiscrete classes, in the case of the second-stage ANN (regression), to cover the whole SDE shape spectrum. The augmentation also served to compensate for the different number of ideal cell shapes that were found for certain classes, ensuring a balanced training dataset (1000 total data per class). Similar to the k-fold cross validation approach, we trained both ANN stages with 100 different random starting conditions, finally selecting the best-performing ANN for each stage. The training data constituted 80 % of the whole set of data, while the remaining 20 % was used for ANN validation. The training was performed in batches of 100 spectra for both ANNs, finalizing the process at 100 and 40 epochs for the first-stage and second-stage ANN, respectively. The related loss functions were the crossentropy for the first and the mean squared error (MSE) for the second ANN, while the chosen optimizer for both was Adam [5].
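The augmentation and 80/20 split described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the published code: the function names `augment_class` and `augment_between` are hypothetical, the uniform interpolation weight is an assumption, and interpolating the regression target together with the spectra is an assumption, since the text does not say how scores were assigned to interpolated spectra. The stand-in spectra are random numbers.

```python
import numpy as np

def augment_class(spectra, n_target, rng):
    """Stage 1: interpolate linearly between random spectra of one class."""
    spectra = np.asarray(spectra, dtype=float)
    i = rng.integers(0, len(spectra), size=n_target)
    j = rng.integers(0, len(spectra), size=n_target)
    t = rng.random((n_target, 1))             # weight in [0, 1), assumed uniform
    return (1.0 - t) * spectra[i] + t * spectra[j]

def augment_between(a, b, score_a, score_b, n_target, rng):
    """Stage 2: interpolate between two neighboring SDE classes.

    Assumption: the regression target is interpolated with the same weight.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    i = rng.integers(0, len(a), size=n_target)
    j = rng.integers(0, len(b), size=n_target)
    t = rng.random((n_target, 1))
    x = (1.0 - t) * a[i] + t * b[j]
    y = (1.0 - t[:, 0]) * score_a + t[:, 0] * score_b
    return x, y

rng = np.random.default_rng(42)
# Hypothetical class with only 10 manually selected cells, 544 features each.
cls = rng.random((10, 544))
aug = augment_class(cls, 1000, rng)

# Random 80 % / 20 % split into training and validation data.
perm = rng.permutation(len(aug))
n_train = int(0.8 * len(aug))
train, val = aug[perm[:n_train]], aug[perm[n_train:]]

# Spherocytes (-1.00) and stomatocytes II (-0.67) as neighboring classes.
sp, st2 = rng.random((10, 544)), rng.random((12, 544))
x_reg, y_reg = augment_between(sp, st2, -1.00, -0.67, 1000, rng)
print(aug.shape, train.shape, val.shape, x_reg.shape)
```

Because every synthetic spectrum is a convex combination of two real ones, each feature stays within the range already observed for the contributing cells, and small classes are brought up to the same 1000 samples as large ones.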
VII. AUTOMATIC CLASSIFICATION OF 3D SHAPES IN HEALTHY INDIVIDUALS AND PATIENTS
Samples from 10 healthy subjects and 10 patients affected by hereditary spherocytosis were automatically classified by the dual-stage ANN. The number of tested cells per sample ranged from about 1000 to over 2000 cells. The results were manually verified in Blender. Samples from 5 healthy donors and 5 patients were chosen for the manual selection of 200 cells each for comparison with the automatic shape recognition and were plotted in a related confusion matrix.
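To make the decision logic of the dual-stage classification concrete, the following NumPy sketch runs a forward pass through two small networks with the layer sizes given in the Figure 1 caption (544-54-7 with softmax, and 544-544-1 with a linear output) and applies the 75 % confidence threshold. It is not the trained Keras model: the weights are random and untrained, the class order in `CLASSES` is an assumption, and all function names are hypothetical.

```python
import numpy as np

CLASSES = ["SDE", "CC", "KE", "KN", "ML", "AC", "U"]  # assumed output order
THRESHOLD = 0.75                                      # confidence threshold

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def init_dense(n_in, n_out, rng):
    """Random (untrained) weights and zero biases for one dense layer."""
    return rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out)), np.zeros(n_out)

def stage1(x, params):
    (w1, b1), (w2, b2) = params
    return softmax(relu(x @ w1 + b1) @ w2 + b2)   # class probabilities

def stage2(x, params):
    (w1, b1), (w2, b2) = params
    return (relu(x @ w1 + b1) @ w2 + b2)[:, 0]    # one SDE score per cell

def classify(x, p1, p2):
    probs = stage1(x, p1)
    conf = probs.max(axis=1)
    labels = np.array(CLASSES)[probs.argmax(axis=1)]
    labels[conf <= THRESHOLD] = "U"               # not confident: unknown
    scores = np.full(len(x), np.nan)
    sde = labels == "SDE"
    if sde.any():                                 # only SDE cells reach stage 2
        scores[sde] = stage2(x[sde], p2)
    return labels, scores, probs

rng = np.random.default_rng(1)
x = rng.random((5, 544))                          # stand-in normalized spectra
p1 = (init_dense(544, 54, rng), init_dense(54, 7, rng))
p2 = (init_dense(544, 544, rng), init_dense(544, 1, rng))
labels, scores, probs = classify(x, p1, p2)
print(labels)
```

Cells that are not forwarded to the second stage keep a score of NaN here, which keeps non-SDE and unknown cells out of any SDE distribution computed afterwards.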
VIII. BLOOD SMEAR
Five milliliters of whole blood was drawn in EDTA anticoagulant. Thin peripheral blood smears were set up within 4 hours from blood sampling, dried and stained according to the May Grünwald-Giemsa method [6]. RBC morphology and the number of spherocytes were independently assessed by two laboratory experts. The number of spherocytes was expressed as a percentage of the total number of RBCs observed.
[1] Asena Abay, Greta Simionato, Revaz Chachanidze, Anna Bogdanova, Laura Hertz, Paola Bianchi, Emile van den Akker, Marieke von Lindern, Marc Leonetti, Giampaolo Minetti, et al., “Glutaraldehyde–a subtle tool in the investigation of healthy and pathologic red blood cells,” Frontiers in Physiology (2019), 10.3389/fphys.2019.00514.
[2] M. Kazhdan, T. Funkhouser, and S. Rusinkiewicz, “Rotation invariant spherical harmonic representation of 3D shape descriptors,” in Symposium on Geometry Processing, Vol. 6 (2003) pp. 156–164.
[3] Gerald Lim H. W., Michael Wortis, and Ranjan Mukhopadhyay, “Stomatocyte–discocyte–echinocyte sequence of the human red blood cell: Evidence for the bilayer–couple hypothesis from membrane mechanics,” 16766–16769 (2002).
[4] Timothy J Larkin, Guilhem Pages, Bogdan E Chapman, John EJ Rasko, and Philip W Kuchel, “NMR q-space analysis of canonical shapes of human erythrocytes: stomatocytes, discocytes, spherocytes and echinocytes,” European Biophysics Journal, 3–16 (2013).
[5] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980 (2014).
[6] B.J. Bain, I. Bates, M. Laffan, and Lewis S.M., Practical Haematology (Churchill Livingstone, 2012).
Supplementary information
[Figure 1 panels: a, loss (crossentropy) for training and validation; b, categorical accuracy for training and validation; c, confusion matrix of actual class (SDE shapes, cell clusters, keratocytes, knizocytes, multilobate cells, acanthocytes) versus predicted class (SDE shapes, cell clusters, keratocytes, knizocytes, multilobate cells, acanthocytes, unknown).]

Figure 1.
Training and validation loss and accuracy for the first-stage ANN. a
During training, the employed loss function (crossentropy) was minimized throughout 100 epochs. b The categorical accuracy of cell classification for the training and validation sets converged close to 100 %. The validation split was 20 %. c The accuracy was further evaluated by means of a confusion matrix. “Predicted” versus “actual” (manually classified) cells demonstrate very good concordance, ranging from 91 % to 100 %.

[Figure 2 panels: a, training and validation loss (mean squared error, MSE); b, benchmark on discocytes, shown as KDE and histogram of the probability density over the SDE scale (−1.00 SP, −0.67 ST II, −0.33 ST I, 0.00 D, 0.33 E I, 0.67 E II, 1.00 E III).]

Figure 2.
Training and validation loss for the second-stage ANN. a
Within 40 epochs, the mean squared error (MSE) of the system was minimized. b Test on a representative set of 308 independent discocytes. The head of the distribution is located at 0.05, and the vast majority of cells is located within the range of −0.15 to +0. […]

[Figure 3 panels, one row per healthy control: SDE distribution (probability density over the SDE scale: −1.00 SP, −0.67 ST II, −0.33 ST I, 0.00 D, 0.33 E I, 0.67 E II, 1.00 E III) and classification (probability per shape class: SDE, CC, KE, KN, ML, AC, U). Values printed in the figure:]

Control  SDE distribution                 Classification
         total  µ      95% CI             total  SDE    CC     KE     KN     ML     AC
C1       1091   -0.01  [-0.39, 0.14]      1114   0.979  0.000  0.005  0.003  0.000  0.000
C2       1102   -0.00  [-0.43, 0.17]      1138   0.968  0.002  0.004  0.005  0.001  0.000
C3       1225   -0.01  [-0.47, 0.25]      1264   0.969  0.002  0.008  0.004  0.000  0.004
C4       889    0.05   [-0.20, 0.24]      936    0.950  0.007  0.012  0.007  0.000  0.000
C5       1002   -0.01  [-0.35, 0.16]      1027   0.976  0.000  0.004  0.001  0.001  0.000
C6       962    -0.02  [-0.24, 0.11]      984    0.978  0.001  0.008  0.004  0.000  0.000
C7       1146   0.04   [-0.10, 0.23]      1170   0.979  0.000  0.008  0.002  0.000  0.001
C8       952    0.06   [-0.32, 0.28]      1092   0.872  0.002  0.040  0.023  0.001  0.001
C9       926    -0.02  [-0.42, 0.16]      1009   0.918  0.009  0.019  0.013  0.000  0.003
C10      1132   -0.04  [-0.55, 0.14]      1141   0.992  0.003  0.003  0.000  0.001  0.000

Figure 3.
Automatic 3D shape recognition of RBCs from 10 healthy subjects. As expected, in healthy individuals, shape distributions are centered around discocytes (0.00), with mean values µ ranging from −0.04 to +0.06. The 95 % confidence interval (CI) demonstrates that shape distributions do not involve echinocytes, except for C8, which extends to the range of echinocytes I (+0.33).

[Figure 4 plots. Two confusion matrices, titled "Confusion matrix controls" and "Confusion matrix patients"; axes: actual class versus predicted class.]
Figure 4.
Confusion matrices for healthy controls and patients with hereditary spherocytosis comparing “predicted” and “actual” (manually classified) shape classes. SDE shapes resulted in excellent recognition accuracy, ranging from 79 % up to 100 %, based on 2000 randomly picked cells from 5 healthy subjects and 5 patients (10 donors × 200 cells) that were not included in the training process. Cells belonging to specific classes showed a higher recognition error, partly due to the larger number of unknown shapes, especially those occurring in patients, and partly because the shape features that distinguish these classes are less well defined. The observed error in such classes is related to the intrinsic inaccuracy of the training data.
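
As an illustration of how per-class percentages in such a confusion matrix can be obtained, the following minimal sketch counts (actual, predicted) label pairs and divides the diagonal by the row sums. It assumes integer class labels and row-wise normalization, which the text does not state explicitly. The labels are made-up toy values, not data from this study, and the names `confusion_matrix` and `per_class_recall` are hypothetical helpers, not the authors' code.

```python
import numpy as np

# Class abbreviations as used in Figure 3 (indices 0..6); toy example only.
CLASSES = ["SDE", "CC", "KE", "KN", "ML", "AC", "U"]


def confusion_matrix(actual, predicted, n_classes):
    """Count cells per (actual, predicted) pair; rows are actual classes."""
    actual = np.asarray(actual)
    predicted = np.asarray(predicted)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly when index pairs repeat.
    np.add.at(cm, (actual, predicted), 1)
    return cm


def per_class_recall(cm):
    """Diagonal divided by row sum: share of each actual class recognized."""
    row_sums = cm.sum(axis=1)
    # Classes without any cells give 0/0; suppress the warning and keep NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.diag(cm) / row_sums


# Toy labels (assumption): 10 cells, manually classified vs. network output.
actual = [0, 0, 0, 0, 1, 1, 2, 2, 2, 6]
predicted = [0, 0, 0, 2, 1, 1, 2, 2, 0, 6]

cm = confusion_matrix(actual, predicted, len(CLASSES))
recall = per_class_recall(cm)
for name, value in zip(CLASSES, recall):
    print(f"{name}: {value:.2f}")
```

Read this way, each diagonal entry is the fraction of manually classified cells of one class that the network assigned to the same class; classes with no cells in the sample are reported as NaN rather than as 0 %.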
NOTE ON HEREDITARY SPHEROCYTOSIS AND TESTED PATIENTS
Hereditary spherocytosis is the most common hemolytic anemia in subjects of northern European ethnicity. This disease is mostly inherited in an autosomal dominant manner, although in approximately 20 % of cases, inheritance is autosomal recessive or due to de novo mutations [1]. The mutated genes code for RBC cytoskeletal proteins, most commonly the anion exchanger band 3 and the cytoskeletal protein ankyrin, as well as spectrin and proteins 4.1 and 4.2. Occasionally, patients are affected by a combination of multiple mutations in these genes. The defect mostly translates into a deficiency of the mutated protein. However, the genetic mutation may result in the deficiency of a different protein. For example, defects in band 3 and ankyrin occasionally prevent the integration of spectrin into the membrane and lead to degradation of free spectrin molecules, resulting in spectrin deficiency ([2, 3], Supplementary Table I). Different mutations may occur in the same gene, leading to variations in disease severity [4].

Mutations in the abovementioned genes eventually lead to RBC membrane vesiculation that reduces the cell surface-to-volume ratio, transforming the RBC from a discocyte to a sphere-like shape [5]. These RBCs are named spherocytes and typically appear in blood smears as cells with a smaller and circular projected area, devoid of the characteristic central pallor observed in discocytes (4). Such RBCs are less deformable, resulting in a reduced lifespan, which may eventually lead to anemia. The anemia may be severe, moderate, mild or even absent when RBC loss is balanced by enhanced erythropoiesis.

Typical complications involve splenomegaly, reticulocytosis and hemolytic anemia, which can require exchange transfusions [6]. The only existing treatment is splenectomy, which improves cell survival and reduces anemia, reticulocyte count and hyperbilirubinemia, but not the presence of spherocytes.
Additional prophylaxis against infections is recommended [7].

The variable symptomatology and the numerous mutations make hereditary spherocytosis a highly heterogeneous disease. Moreover, spherocytes can be observed in other diseases [8] and can also be present as an artifact of the blood smear technique. Therefore, establishment of the correct diagnosis is dependent on several different tests.

For the 10 patients presented in this work, the diagnostic criteria were based on the following evaluations: presence of chronic hemolytic anemia, RBC morphology examination on blood smears, eosin-5’-maleimide (EMA) binding test, osmotic fragility test or altered osmotic gradient ektacytometry curve. Confirmation tests included sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) analysis of RBC membrane proteins and next-generation sequencing (NGS) for mutation identification. The mutations and related detected protein defects are summarized in Table I. Patients P1 and P2 are relatives, as are P3 and P4. P3 and P4 were diagnosed with hereditary spherocytosis, but their mutation could not be detected. All patients were heterozygous for their main mutation. P8 was additionally found to be heterozygous for a point missense mutation (Supplementary Table I) and homozygous for the αLELY low-expression alpha-spectrin variant. The missense mutation is considered a variant of uncertain significance (VUS), as is homozygosity for αLELY, although the latter may contribute to the eventual defective protein expression. The band 3 mutation is likely pathogenic and the main cause of the disease.

[Figure 5 panels a, b, c, d. Labels printed in the panels: P5 (mutation in ANK-1); P8 (mutations in SPTA1 and SLC4A1); P6 (mutation in SLC4A1); "no 'true spherocytes'".]
Figure 5.
Typical RBCs from patients affected by hereditary spherocytosis. Each box shows three different rotations of the same cell; scale bar = 4 µm. a RBCs from a patient with a mutation affecting ankyrin-1 and the respective blood smear in b; scale bar = 10 µm. While several spherocytes appear on the smear (arrows), 3D reconstructions show different kinds of shapes. Top boxes: mutated proteins are indicated. The top right box in each panel shows a “true” spherocyte from 3 different viewing angles. c and d are patients affected by other mutations. No “true” spherocytes were observed in 3D in c. d Patient with a mutation in band 3, mostly showing stomatocytes rather than spherocytes.
Table I. Information on the tested hereditary spherocytosis patients.
Patient  Mutated gene    Corresponding protein  Mutation                                           Phenotypical defect
P1       SLC4A1          band 3                 c.2423G>A (p.R808H)                                spectrin deficiency
P2       SLC4A1          band 3                 c.2423G>A (p.R808H)                                spectrin deficiency
P3       not identified  n.a.                   n.a.                                               spectrin deficiency
P4       not identified  n.a.                   n.a.                                               spectrin deficiency
P5       ANK-1           ankyrin-1              c.2559-2A>G (splicing)                             predicted skipping of exon 26 and frameshift, loss or truncated ankyrin
P6       SLC4A1          band 3                 c.620delG (p.G207fs)                               loss or truncated band 3
P7       SLC4A1          band 3                 c.2279G>A (p.R760Q)                                band 3 deficiency
P8       SLC4A1, SPTA1   band 3, spectrin       c.2279G>A (p.R760Q), c.3841C>T (p.R1281C), αLELY   band 3 deficiency
P9       SLC4A1          band 3                 c.163delC (p.H55TfsX11)                            band 3 deficiency
P10      SLC4A1          band 3                 c.2510C>A (p.T837K)                                band 3 deficiency

[Figure 6 panels a, b, c, arranged along an osmotic pressure axis.]
Figure 6.
SDE training data were obtained by exposing RBCs from healthy donors to different osmotic pressures. In isotonic solution, most of the RBCs are discocytes, b. Upon decreasing the osmotic pressure (hypotonic solution), the RBCs exhibit swelling and transform into stomatocytes and further into spherocytes, a. On the other hand, echinocytes develop in hypertonic solution, c. Scale bar = 2 µm.

[1] K. R. Bridges, H. A. Pearson, et al., Anemias and other red cell disorders, edited by Kenneth R. Bridges and Howard A. Pearson (New York: McGraw-Hill Medical, 2008).
[2] S. Perrotta, P. G. Gallagher, and N. Mohandas, The Lancet, 1411 (2008).
[3] A. Iolascon and R. A. Avvisati, Haematologica, 1283 (2008).
[4] D. Dhermy, C. Galand, O. Bournier, L. Boulanger, T. Cynober, P. O. Schismanoff, E. Bursaux, P. Tchernia Boivin, Gil, and M. Garbarz, British Journal of Haematology, 32 (1997).
[5] I. Bernhardt and J. C. Ellory, Red cell membrane transport in health and disease (Springer Science & Business Media, 2013).
[6] H. Hassoun and J. Palek, Blood Reviews, 129 (1996).
[7] S. Eber and S. E. Lux, in Seminars in Hematology, Vol. 41 (Elsevier, 2004) pp. 118–141.
[8] Z. Deng, L. Liao, W. Yang, and F. Lin, Clinica Chimica Acta 441