A Multi-Scale Conditional Deep Model for Tumor Cell Ratio Counting
Eric Cosatto, Kyle Gerard, Hans-Peter Graf, Maki Ogura, Tomoharu Kiyuna, Kanako C. Hatanaka, Yoshihiro Matsuno, Yutaka Hatanaka
a Dept. of Machine Learning, NEC Laboratories America, Princeton, NJ, USA
b Digital Healthcare Business Development Office, NEC Corporation, Tokyo, Japan
c Clinical Research and Medical Innovation Center, Hokkaido University Hospital, Hokkaido, Japan
d Research Division of Genome Companion Diagnostics, Hokkaido University Hospital, Hokkaido, Japan
e Center for Development of Advanced Diagnostics, Hokkaido University Hospital, Hokkaido, Japan
f Department of Surgical Pathology, Hokkaido University Hospital, Hokkaido, Japan
* corresponding author
ABSTRACT
We propose a method to accurately obtain the ratio of tumor cells over an entire histological slide. We use deep fully convolutional neural network models trained to detect and classify cells on images of H&E-stained tissue sections. Pathologists' labels consisting of exhaustive nuclei locations and tumor regions were used to train the models in a supervised fashion. We show that combining two models, each working at a different magnification, allows the system to capture both cell-level details and surrounding context, enabling successful detection and classification of cells as either tumor-cell or normal-cell. Indeed, by conditioning the classification of a single cell on multi-scale context information, our models mimic the process used by pathologists, who assess cell neoplasticity and tumor extent at different microscope magnifications. The ratio of tumor cells can then be readily obtained by counting the number of cells in each class. To analyze an entire slide, we split it into multiple tiles that can be processed in parallel; the overall tumor cell ratio is then aggregated. We perform experiments on a dataset of 100 slides with lung tumor specimens from both resection and tissue micro-array (TMA). We train fully-convolutional models using heavy data augmentation and batch normalization. On an unseen test set, we obtain an average mean absolute error on predicting the tumor cell ratio of less than 6%, which is significantly better than the human average of 20% and is key in properly selecting tissue samples for recent genetic panel tests geared at prescribing targeted cancer drugs. We perform ablation studies to show the importance of training two models at different magnifications and to justify the choice of some parameters, such as the size of the receptive field.
1. INTRODUCTION AND PURPOSE
Targeted treatment therapies for various types of cancer rely on DNA analysis of patients' cancer cells to identify the drugs that would benefit them the most. The mutations and fusions of genes that cause cancer are detected with genomics panel tests run on next-generation sequencing devices. Input material for these tests comes from several types of biopsies, surgical resections and cell-blocks. The tissue samples are fixed in formalin and embedded in paraffin blocks (FFPE). Genomics panel tests based on gene sequencing operations require a minimum ratio of tumor cells to be present in the analyzed tissue sections to provide accurate results. Typically, a set of slide-mounted FFPE, unstained, about 5-micron thick sections is required for the test. For example, in the FDA-approved FoundationOne and Oncomine Dx Target Test, the overall tumor content (ratio) should be more than 20%. For tissues where the overall tumor content is between 10% and 20%, micro-dissection and enrichment should be performed to bring up the overall ratio.

Currently, pathologists manually estimate the overall tumor cell ratio from hematoxylin-eosin (H&E) stained tissue sections, identify areas of high tumor cell ratios and, if needed, micro-dissect tissue fragments to enrich the tumor content to over 20%. This process is highly manual and subjective and could benefit from automation. Furthermore, pathologists' estimation of the tumor cell ratio has been shown to be inaccurate.[4] In particular, that study showed that 38% of the estimations would have led to insufficient ratios (less than 20%), possibly causing false negative results on the genomics panel tests. In addition, these genomics tests are very costly and time-consuming, generally taking a few weeks to complete, and are destructive. Therefore, a method for accurate ratio counting over the entire tumor area is needed. Manual counting of individual cells by pathologists would involve counting tens of thousands of cells, making it highly impractical.

Our goal is to provide an easy-to-use whole-slide interactive system that helps a pathologist select portions of tissue where the ratio of tumor cells is above a safe threshold for use in genomics panel tests. We developed a client-server approach where a browser-based client displays the slide and allows free pan/zoom navigation within the slide. The user submits an analysis request for the entire slide or for a marked portion thereof. The AI analysis server processes the requested area and returns the location of all cells and their classification as tumor or non-tumor. The client browser then displays the tumor cell ratio information to the user such that she can decide which tissue area(s) to select for use in genomics panel tests. An example screenshot of the browser-based client tool is shown in Fig. 1.
Figure 1: Screenshot of our browser-based viewer showing the result of analyzing an entire slide of the test set. The overall ratio of tumor cells is shown on top and individual regions are color-coded to visualize the ratios over areas of the slide. Tumor cells are also overlaid in cyan. The black dots are sometimes added by pathologists directly on the glass slide to indicate the tumor area. Our system is able to automatically detect them and analyze the corresponding area.

2. PREVIOUS WORK
Automated cell counting techniques have been developed in cytology, immuno-histochemistry (IHC) and immuno-fluorescence (IF). In cytology, the most common test is the pap-smear, which is automated and was approved by the United States Food and Drug Administration (FDA) in 1998.[5, 6]
Cytology images exhibit cells floating in liquid, making the detection and analysis by computer vision systems easier than in tissue sections, where the structure of the tissue interferes with the detection and analysis of the cells. In histology, advanced staining techniques such as IHC and IF can make certain mutated cells easy to segment from the background using simple color-based analysis techniques. However, such staining is not appropriate for the task of obtaining the ratio of tumor cells because only certain types of mutated cells are highlighted by the stain, while the ratio calculation requires all cells to be counted.

While no FDA-approved cell counting systems exist for H&E stains, several published research studies have addressed the issues of cell detection, classification and segmentation. Two general approaches have been used to detect cells. The first approach starts by detecting candidate objects on the image using image analysis techniques and then uses machine-learning techniques, such as support vector machines (SVM) or K-nearest-neighbors (KNN), to classify these objects as cells or non-cells using hand-crafted features. More recently, "deep-learning" approaches have taken over and have been shown to be superior to image analysis by learning features of cancer from the data instead of relying on "hand-crafted" heuristic features. Such models have been demonstrated to estimate where cells are located on an image directly, without the need for explicit object detection. Pushing further, some methods directly regress the number of cells present in an image patch. Although direct regression has the advantage of bypassing the explicit detection of cells, it is completely black-box and does not provide any way to explain the prediction to the user. We feel that, at a minimum, the user should be able to see which cells are detected and which are classified as tumor cells, so as to gain confidence in the system's predictions. We follow the general object detection approach proposed in Lempitsky et al.,[11] teaching a deep convolutional neural network model to learn a mapping between an input image and a density map. The density map is generated from the ground-truth-labeled centers of cells' nuclei.

Most recent cell classification methods employ a deep convolutional neural network (CNN) trained in a supervised fashion with backpropagation to classify a small input image patch into distinct types of cells such as epithelial, fibroblast, mitotic figures,[17] etc.
Variants of CNN architectures[13, 18] have been proposed for this kind of approach. Most methods use image data at a resolution of 0.55 microns per pixel (equivalent to a 20X optical magnification) and perform the analysis on a relatively small receptive field. For example, Basha et al.[18] and Sirinukunwattana et al.[13] use a 32x32 and a 27x27 receptive field respectively, which corresponds to about 15 microns, three times the size of a nucleus. Such a small receptive field makes the learning model focus solely on one cell, ignoring the surrounding context.

Segmentation of tumor areas is another related active research topic. Deep learning methods have been shown to produce excellent results on general image segmentation tasks and have recently been applied to histology images for nuclei segmentation and tumor area segmentation.[23, 24] The general principle is the same as for nuclei detection: a model is trained to map an input image to a density map representing the tumor area. For tumor segmentation, the working magnification should be lower than for nuclei detection. While separating individual nuclei requires a high magnification (20X or 40X) and a smaller field of view (50 to 100 microns), segmenting a tumor area necessitates a wider field of view (500 to 1000 microns) with a lower resolution (5X or 10X). Similarly, pathologists observe specimens at low magnification to understand the extent of the tumor area and then zoom in to certain areas to observe finer characteristics such as the morphology of nuclei. We follow this principle in our approach for tumor cell ratio counting. We use a high magnification for the detection of nuclei to ensure that all nuclei can be counted, even in dense areas. This is followed by an analysis at low magnification for tumor area segmentation to ensure that large features of the tumor, such as deformed glands, can be seen by the model. In addition, we combine high and low resolution features to classify individual cells as tumor or normal. This allows us to properly count normal cells appearing in a tumor area, as well as isolated tumor cells in a normal area.

Methods using a larger receptive field and a fully-convolutional approach based on the U-net CNN architecture[25] have been demonstrated for cell segmentation in DAPI and FISH staining. Such methods are ideal for image segmentation as they produce a binary segmentation map directly from the image input. We adapt this segmentation method to cell counting by generating target cell location maps where cells are represented by Gaussian peaks rather than by their actual shape. From the predicted map, the (x, y) location of cells is obtained using only local peak detection. This approach avoids the issue of adjacent cells forming clumps that cannot be easily separated into individual cells for counting. We further adapt the method by multi-tasking detection and classification on the same model, learning to predict two target maps: one for detection, where all cells are drawn, and one for classification, where only the tumor cells are drawn.
3. MATERIAL AND METHODS

3.1 Data
One hundred WSI slides were obtained from a cohort of 100 lung cancer cases at Hokkaido University Hospital (Hokkaido, Japan). Fifty-five cases were categorized as adenocarcinoma (AC) and 45 as squamous-cell carcinoma (SC). The specimens were acquired between 2005 and 2010. Thirty slides contained a tissue micro-array (TMA) specimen (15 AC and 15 SC), while seventy slides contained tissue from surgical resections (40 AC and 30 SC). The tissue specimens were prepared with the Formalin-Fixed Paraffin-Embedded (FFPE) method and stained with Hematoxylin-Eosin (H&E). The slides were scanned using a Hamamatsu scanner into whole slide image (WSI) files. 131 regions of interest (ROI) were selected by a pathologist (KCH) for annotations. These ROIs were 1980x1980 pixels at level-0 (40X) magnification, corresponding to a standard microscope high-power field. Two types of annotations were obtained from a pathologist: the point location of the center of each cell's nucleus and freehand traces of the tumor areas.

We combine two deep neural network (DNN) models and train them in a supervised fashion, one to detect and classify cells as normal or tumor, the other to segment tumor areas. Following the work of Lempitsky et al.,[11] we train the DNN models to learn a mapping between an input RGB image patch and a target map. For the first model, two target maps are concatenated to form the target output for training. One target map represents the position of the center of all cells' nuclei present in the input image; this map is used for nuclei detection. The second target map represents the positions of the tumor cells' nuclei and is used for nuclei classification. The maps are built by drawing disks at the (x, y) positions of the nuclei centers and then transforming the disks into Gaussian peaks by convolving a Gaussian kernel with the map (see the left side of Fig. 2 for an illustration of the two nuclei maps and their corresponding input image and annotations). The reason for this step is to train the model to produce a smooth peak at a nucleus position, making it easier for post-inference processing to detect the nuclei individually, especially where groups of nuclei are bunched up together. This way, a simple and fast local peak detector checking a 3x3 neighborhood can detect the peaks directly from the DNN output map. For the second model, the target map is generated in a similar way, at lower magnification, by drawing a filled freehand tumor area against the background. See the right side of Fig. 2 for an illustration of the tumor area map and its corresponding input image and annotations (dashed contour lines).
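As an illustration, the target-map construction can be sketched as follows; the disk radius and Gaussian width shown here are placeholder values, not the exact ones used in our experiments.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_nuclei_target_map(centers, shape, disk_radius=5, sigma=3):
    """Build a target map with a smooth Gaussian peak at each nucleus center.

    centers: iterable of (x, y) pixel coordinates of nucleus centers
    shape:   (height, width) of the output map
    disk_radius, sigma: illustrative values, not taken from the paper
    """
    target = np.zeros(shape, dtype=np.float32)
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    for x, y in centers:
        # draw a filled disk at the nucleus center
        target[(yy - y) ** 2 + (xx - x) ** 2 <= disk_radius ** 2] = 1.0
    # convolve with a Gaussian kernel to turn the disks into smooth peaks
    target = gaussian_filter(target, sigma=sigma)
    if target.max() > 0:
        target /= target.max()  # normalize peak heights to 1
    return target

def make_detection_classification_target(all_centers, tumor_centers, shape):
    """Two concatenated maps for the first model: one from all nuclei (detection),
    one from tumor nuclei only (classification)."""
    det = make_nuclei_target_map(all_centers, shape)
    cla = make_nuclei_target_map(tumor_centers, shape)
    return np.stack([det, cla], axis=0)  # shape (2, H, W)
```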
Figure 2: System overview. The left part shows the high-magnification process of detecting cells and classifying them as tumor, while the right part shows the low-magnification process of segmenting the tumor area. Scores from individual cell classification and tumor area segmentation are combined to obtain a tumor score for each cell. The ratio of tumor cells can then be readily computed. Ground-truth annotations used to generate the targets are shown as overlays on the input images (green dots mark normal cells, red dots mark tumor cells, dotted teal lines mark the tumor areas).

Our choice for model architecture is fully-convolutional. These models are the natural choice for the detection of multiple objects as they conserve the image's 2D relationships throughout the layers. We experimented with both U-net[25] and Resnet with a fully-convolutional head. These models have been applied successfully in a wide variety of image problems. We found that, for our application, U-net has a small performance edge, a simpler architecture and a smaller footprint. U-net models have an encoder-decoder architecture with a bottleneck in the middle. The number of feature planes increases as their size decreases in the encoder, with the reverse happening in the decoder. The convolutional units in the decoder are implemented with transposed convolutions, which can be seen as the gradient of the convolution with respect to its input. A graphical representation of the model is shown in figure 3, left. We choose the number of convolutional blocks such that the model's receptive field is 188x188 pixels, which, at 40X magnification (4.4 pixels per micron), corresponds to a patch of 43x43 microns (172x172 microns at 10X). Being a fully convolutional model, we can size it such as to optimize the number of models that can occupy a GPU memory. For example, for a 2-model configuration, an input image of 800x800 pixels allows both models to be loaded in a 1080 GPU (8GB). Individual layers of the U-net model are listed in figure 3.

Figure 3: Architecture of the U-net fully-convolutional model. The left image shows the graphical representation of the model (for a 572x572 input image). The middle and right tables show the actual size of the layers for the encoder and decoder blocks respectively, for an 800x800 input image and 2 output maps. The total number of parameters is 28,942,850 and the total size of the model for inference (not counting training gradients) is about 3GB.
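For illustration, a compact PyTorch sketch of such a U-net is given below. The depth and channel counts are representative rather than the exact configuration of figure 3; the model outputs raw logits, with the sigmoid applied inside the loss.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Two 3x3 convolutions, each followed by batch normalization and ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.block(x)

class UNet(nn.Module):
    """Encoder-decoder with skip connections and transposed-convolution upsampling."""
    def __init__(self, in_ch=3, out_maps=2, base=64, depth=4):
        super().__init__()
        self.downs, self.ups, self.tconvs = nn.ModuleList(), nn.ModuleList(), nn.ModuleList()
        feats = [base * 2 ** i for i in range(depth)]  # e.g. 64, 128, 256, 512
        ch = in_ch
        for f in feats:                                # encoder
            self.downs.append(ConvBlock(ch, f)); ch = f
        self.bottleneck = ConvBlock(ch, ch * 2); ch *= 2
        for f in reversed(feats):                      # decoder
            self.tconvs.append(nn.ConvTranspose2d(ch, f, 2, stride=2))
            self.ups.append(ConvBlock(f * 2, f)); ch = f
        self.head = nn.Conv2d(ch, out_maps, 1)         # 2 maps: detection and classification
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        skips = []
        for down in self.downs:
            x = down(x); skips.append(x); x = self.pool(x)
        x = self.bottleneck(x)
        for tconv, up, skip in zip(self.tconvs, self.ups, reversed(skips)):
            x = tconv(x)
            x = up(torch.cat([skip, x], dim=1))        # skip connection from the encoder
        return self.head(x)                            # raw logits; sigmoid applied in the loss

model = UNet()                                         # detection/classification model, 2 output maps
out = model(torch.randn(1, 3, 800, 800))               # -> (1, 2, 800, 800)
```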
Both the detection/classification model and the segmentation model are trained independently using the same procedure. Their outputs are combined to detect and classify the cells. Instead of jointly optimizing the models, we train them separately and only subsequently optimize the detection and classification thresholds to achieve the lowest possible error on the validation set. This approach is preferable as the inputs to the two models are different and the loss function may be dominated by one model.

We partition the annotated data into three sets. Seventy percent of the data is used to train the model, ten percent is used for validation of the model and the remaining twenty percent is used only for the final evaluation. These subsets are built such that all annotation ROIs from a given slide go into the same subset.

H&E stained specimens exhibit shades of two staining agents (blue-purple for hematoxylin, which colors the nuclei, and reddish-pink for eosin, which colors the cytoplasm and extracellular matrix). The amount and proportion of staining agents, the age of the sample and the type of scanner significantly affect the final color of the pixels. Hence, in order to create a robust model, it is necessary to make sure there is as much staining and scanning variation as possible in the training set. Unfortunately, it is difficult to procure samples with such variations, as cohorts tend to come from the same institution and therefore have been stained and scanned using the same protocol. To compensate for this relative uniformity in our training data and to make sure our models will perform adequately on specimens from other institutions, we apply data augmentation to simulate variations in staining, color shifts and image sharpness. To simulate staining variations, we use an optical-density based stain projection method[29] and shift the pixel intensities in the projection spaces. Figure 4 shows an example of stain augmentation using our method. To simulate color shifts due to the scanner light, we apply intensity shifts in the Hue-Saturation-Luminance color space. To simulate variations in scanner optics and focusing mechanisms, we apply small amounts of pixel blur/sharpen. Finally, since image orientation does not affect the labels, we also increase the number of examples by random rotation and mirroring.

Figure 4: Left side: receptive field of the model at 10X and 40X magnification. Right side: example of data augmentation by H&E stain shifts.
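A simplified sketch of optical-density based stain augmentation is shown below, using standard H&E stain vectors in the spirit of color deconvolution;[29] the stain matrix and perturbation ranges are illustrative, not the exact values used in our pipeline.

```python
import numpy as np

# Commonly used H&E optical-density stain vectors (hematoxylin, eosin, residual);
# illustrative values, not the exact basis used in our experiments.
HE_STAINS = np.array([[0.650, 0.704, 0.286],
                      [0.072, 0.990, 0.105],
                      [0.268, 0.570, 0.776]])

def augment_stain(rgb, alpha_range=0.05, beta_range=0.01, rng=None):
    """Randomly scale and shift each stain channel in optical-density space.

    rgb: uint8 image of shape (H, W, 3); perturbation ranges are placeholders.
    """
    rng = np.random.default_rng() if rng is None else rng
    od = -np.log((rgb.astype(np.float32) + 1.0) / 256.0)   # RGB -> optical density
    conc = od.reshape(-1, 3) @ np.linalg.inv(HE_STAINS)    # project onto the stain basis
    alpha = 1.0 + rng.uniform(-alpha_range, alpha_range, size=3)
    beta = rng.uniform(-beta_range, beta_range, size=3)
    conc = conc * alpha + beta                              # shift intensities per stain
    od_aug = conc @ HE_STAINS                               # back to optical density
    rgb_aug = 256.0 * np.exp(-od_aug) - 1.0                 # back to RGB
    return np.clip(rgb_aug, 0, 255).reshape(rgb.shape).astype(np.uint8)
```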
The model is trained with the binary cross-entropy loss combined with a sigmoid layer σ(x):

L_BCE = -\frac{1}{N} \sum_{n}^{N} \left[ y_n \ln \sigma(x_n) + (1 - y_n) \ln(1 - \sigma(x_n)) \right]    (1)

where N is the number of pixels in the output map. We experimented with batch normalization and found it to be useful in providing a smooth learning curve. We use the PyTorch toolchain[30] to train our models using the Adam optimizer and a learning rate of 1e-3. Training a model takes about 3 hours on a GPU. To avoid overfitting the model on the training set, training is stopped when the loss on the validation set stops decreasing for 4 epochs. At each epoch the model is trained with 4000 examples and achieves convergence in about 50 epochs. Figure 5 shows the training curve of our model.

Figure 5: Training curve showing the loss at each epoch. In red is the loss over the training set, while in blue is the loss over the validation set. The training is stopped when the loss on the validation set ceases to decrease.

So far the models have been trained individually to minimize the cross-entropy loss, which is a reconstruction loss on the detection, classification and segmentation maps. The real goal, however, is to obtain a list of cells, each with an (x, y) location and a tumor score. The detection/classification model outputs two floating-point maps, map_d and map_c, while the segmentation model outputs map_s. From map_d, the detection of the cells is performed using local peak detection over a 3x3 pixel neighborhood on the detection map (see figure 6, center). Since the model is trained to predict smooth peaks, non-maxima suppression is not necessary and every detected peak is considered a candidate cell. The intensity values at the (x, y) location of cell i are recorded for all three maps, resulting in a feature vector f_i = [I_d, I_c, I_s]. A classifier is then constructed with two thresholds, the detection threshold t_d and the classification threshold t_c. Considering the feature vector f_i = [I_d, I_c, I_s] of all candidate cells, cells where I_d < t_d are discarded. Then, cells where αI_c + (1 − α)I_s > t_c are classified as tumor cells. α is a hyper-parameter that weighs the classification and the segmentation model outputs to generate a score. We use a value of 0.
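A minimal sketch of this post-inference step is given below, assuming the three output maps have been resampled to a common coordinate system; the helper names are illustrative.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def detect_peaks(map_d):
    """Return (x, y) locations where map_d is a local maximum in its 3x3 neighborhood.
    Weak background maxima are later removed by the detection threshold t_d."""
    local_max = (map_d == maximum_filter(map_d, size=3))
    ys, xs = np.nonzero(local_max)
    return list(zip(xs, ys))

def classify_cells(map_d, map_c, map_s, t_d, t_c, alpha):
    """Score each candidate cell and classify it as tumor / normal.

    map_d, map_c, map_s: detection, classification and segmentation output maps
    (assumed resampled to the same size); t_d, t_c and alpha as defined in the text.
    """
    cells = []
    for x, y in detect_peaks(map_d):
        i_d, i_c, i_s = map_d[y, x], map_c[y, x], map_s[y, x]
        if i_d < t_d:                        # discard weak detections
            continue
        score = alpha * i_c + (1 - alpha) * i_s
        cells.append((x, y, score > t_c))    # True = classified as tumor cell
    return cells

def tumor_cell_ratio(cells):
    """Predicted tumor cell ratio: tumor cells over all detected cells."""
    n_tumor = sum(is_tumor for _, _, is_tumor in cells)
    return n_tumor / max(len(cells), 1)
```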
The first step is to evaluate the detection accuracy of the detection model. We declare a detected cell a match when the detected peak (x, y) location is within a distance of 3.2 microns (14 pixels at 40X magnification, 4.4 pixels per micron resolution) of a labeled cell. Each detected cell is matched to its closest unmatched label, in a greedy way. Matched cells are true positives (TP), leftover unmatched cells are false positives (FP) and unmatched labels are false negatives (FN). The detection accuracy is thus computed as ACC(det) = TP / (TP + FP + FN), the precision as PRE(det) = TP / (TP + FP) and the recall as REC(det) = TP / (TP + FN). The detection F1 score, also known as the Dice coefficient, is defined as F1(det) = 2 · PRE(det) · REC(det) / (PRE(det) + REC(det)).

Figure 6: Detection output map (center) and its 3D view (right) for the input image (left). Locations of peaks are overlaid on the images.
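A minimal sketch of the greedy matching and of the detection metrics, assuming detections and labels are given as pixel coordinates at 40X:

```python
import numpy as np

def match_detections(detections, labels, max_dist=14.0):
    """Greedily match each detection to its closest unmatched label within max_dist pixels
    (14 pixels corresponds to 3.2 microns at 40X). Simplified: detections are visited in order."""
    unmatched = list(labels)
    tp = 0
    for dx, dy in detections:
        if not unmatched:
            break
        d2 = [(dx - lx) ** 2 + (dy - ly) ** 2 for lx, ly in unmatched]
        j = int(np.argmin(d2))
        if d2[j] <= max_dist ** 2:
            unmatched.pop(j)   # a matched label can only be used once
            tp += 1
    fp = len(detections) - tp  # detections without a label
    fn = len(unmatched)        # labels without a detection
    return tp, fp, fn

def detection_metrics(tp, fp, fn):
    """Accuracy, precision, recall and F1 score as defined in the text."""
    acc = tp / (tp + fp + fn)
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * pre * rec / (pre + rec)
    return acc, pre, rec, f1
```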
For classification, we calculate the accuracy using the matched detected cells only. A matched cell that is classified as tumor and whose label is also tumor is declared a true positive (TP). A matched cell that is classified as tumor and whose label is normal is declared a false positive (FP). A matched cell that is classified as normal and whose label is also normal is declared a true negative (TN). A matched cell that is classified as normal and whose label is tumor is declared a false negative (FN). The classification accuracy is thus computed as ACC(cla) = (TP + TN) / (TP + TN + FP + FN). For tumor cells (positives), the precision is calculated as PRE_pos(cla) = TP / (TP + FP) and the recall as REC_pos(cla) = TP / (TP + FN). For normal cells (negatives), the precision is PRE_neg(cla) = TN / (TN + FN) and the recall REC_neg(cla) = TN / (TN + FP). The classification F1 score is defined as F1(cla) = 2 · P · R / (P + R), where P = (PRE_pos(cla) + PRE_neg(cla)) / 2 and R = (REC_pos(cla) + REC_neg(cla)) / 2 are the average precision and recall over the detected tumor cells and non-tumor cells.

The predicted ratio of tumor cells is defined as \widehat{TCR} = N_T / N, where N_T is the number of cells classified as tumor and N is the total number of detected cells.

We create an evaluation set by combining the training and the validation sets. Using this evaluation set, we perform a grid search for the best threshold pair (t_d, t_c). For the detection threshold t_d, the criterion we use is to maximize the detection F1 score. For classification, the criterion is to minimize the mean absolute error on the predicted ratio:

E_TCR = \frac{1}{N} \sum_{n}^{N} \left| \widehat{TCR} - TCR \right|    (2)

Since we use two distinct criteria, each threshold is searched independently: first the optimal detection threshold, followed by the classification threshold. We found that jointly searching for both thresholds using a single classification criterion yielded worse results and took longer to perform. Once the optimal thresholds are obtained on the evaluation set, the final performance on the test set can be obtained.
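The two-stage threshold search can be sketched as follows, reusing the helper functions from the previous sketches; the structure of the evaluation ROIs and the grid values are illustrative placeholders.

```python
import numpy as np

def search_thresholds(eval_rois, alpha=0.5, grid=np.linspace(0.05, 0.95, 19)):
    """Two-stage grid search: t_d maximizes the detection F1 score, then t_c minimizes
    the mean absolute error of the predicted tumor cell ratio (E_TCR).

    eval_rois: list of dicts holding the model output maps and ground-truth labels for each
    annotated ROI of the evaluation set (hypothetical structure). alpha and grid are placeholders.
    """
    # Stage 1: detection threshold (detection does not depend on t_c, so 0.5 is arbitrary here).
    def mean_f1(t_d):
        scores = []
        for roi in eval_rois:
            cells = classify_cells(roi["map_d"], roi["map_c"], roi["map_s"], t_d, 0.5, alpha)
            tp, fp, fn = match_detections([(x, y) for x, y, _ in cells], roi["all_nuclei"])
            scores.append(detection_metrics(tp, fp, fn)[3])
        return np.mean(scores)
    t_d = max(grid, key=mean_f1)

    # Stage 2: classification threshold, minimizing the ratio MAE with t_d fixed.
    def ratio_mae(t_c):
        errors = []
        for roi in eval_rois:
            cells = classify_cells(roi["map_d"], roi["map_c"], roi["map_s"], t_d, t_c, alpha)
            errors.append(abs(tumor_cell_ratio(cells) - roi["true_ratio"]))
        return np.mean(errors)
    t_c = min(grid, key=ratio_mae)
    return t_d, t_c
```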
4. RESULTS
In a first experiment, we investigate the importance of the magnification level. For the detection of cells, plot 7.a) shows that higher magnifications are more accurate, with a dramatic drop in accuracy between magnifications 16X and 12X. This is easy to understand intuitively, as the cells become closer and increasingly harder to tell apart at lower magnifications. To show the impact of magnification on the classification of cells, we use ground-truth cell positions and use the tumor cell density output map to classify them as normal or tumor. The trend on plot 7.b) shows a clear decrease for higher magnifications. The intuition here is that a larger field of view is useful to understand the exact extent of the tumor. We then apply detection and classification, both at the same magnification. The trend on plot 7.c) shows a maximum at around 24X. This can be understood as a combination of the two previous experiments, exhibiting a sweet spot where each cell's nucleus can still be accurately separated from the others, while providing enough context to classify them as tumor or normal.
Figure 7: Importance of the magnification level (x-axis: resize factor; a factor of 1.0 corresponds to 40X magnification).

In light of the above result, and to take full advantage of both the higher detection accuracy at higher magnifications and the higher classification accuracy at lower magnifications, we train two separate models at different magnifications. We then combine these models using additive ensembling and report the results in table 1. The best result of 5.3% mean absolute error (MAE) on the predicted ratio is obtained by detection at 20X, followed by adding the scores from the 20X and 10X tumor cell density maps. This approach can also be seen as conditioning the classification of detected cells on a larger context of tissue. We also show that the U-net architecture returns slightly better results.
Table 1: Results for 2 model architectures and 2 configurations of the detection/classification (DT+CL) model and the segmentation model (SEG). We report E_TCR, the MAE of the predicted tumor cell ratio, and the processing speed in mm²/sec. We trained these models on 3 different random partitions of the data and report the mean values and standard deviations. (Values not shown were lost in extraction.)

  Architecture          Configuration           number of model(s)   E_TCR        mm²/sec
  Resnet50 + FC head    DT+CL@20X               1                    6.6% ± -     -
  Resnet50 + FC head    DT+CL@20X + SEG@10X     2                    -            -
  U-net                 DT+CL@20X               1                    -            -
  U-net                 DT+CL@20X + SEG@10X     2                    -            -

Table 2 reports the cell-level accuracy and F1 score, and table 3 summarizes the processing time on various hardware configurations.

Table 2: Cell-level accuracy and F1 score for the U-net DT+CL@10X configuration. We report the detection accuracy and F1 score as well as the classification accuracy and F1 score. The classification accuracy and F1 score are obtained on the correctly detected cells. We trained this model on 3 different random partitions of the data and report the mean values and standard deviations.

  Detection accuracy (%)         92.9 ± -
  Detection F1 score (%)         -
  Classification accuracy (%)    -
  Classification F1 score (%)    -

Table 3: Left: speed and acceleration factors for various CPU/GPU hardware setups, for a 1-model DT+CL@10X configuration (see figure 7). The speed is given in square millimeters of tissue processed per second. A large resection tissue sample may be as large as 600 mm², while core needle samples are usually about 10 mm². Right: timing profile for the fastest configuration.

  Hardware configuration        mm²/sec   speedup
  3 GPUs and 9 CPU cores        3.7       46X
  3 GPUs and 6 CPU cores        3.3       40X
  2 GPUs and 4 CPU cores        2.5       30X
  1 GPU and 2 CPU cores         1.7       20X
  0 GPU and 4 CPU cores         0.18      2X
  0 GPU and 1 CPU core          0.08      1X

  Operation          % of total time
  Read pixels        42
  Model inference    36
  Save result        11
  Peak detect        7
  Model setup        4
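As an illustration of the whole-slide processing, the sketch below splits the analysis into tiles processed in parallel and aggregates the per-tile counts into the overall ratio; the analyze_tile interface is a hypothetical placeholder for the per-tile model inference.

```python
from concurrent.futures import ProcessPoolExecutor

def analyze_slide(tile_coords, analyze_tile, workers=4):
    """Process whole-slide tiles in parallel and aggregate the tumor cell ratio.

    tile_coords:  list of (x, y, w, h) tile rectangles covering the tissue
    analyze_tile: callable returning (n_tumor, n_total) for one tile
                  (hypothetical interface; in practice one worker per GPU)
    """
    n_tumor, n_total = 0, 0
    per_tile = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for (x, y, w, h), (t, n) in zip(tile_coords, pool.map(analyze_tile, tile_coords)):
            per_tile.append((x, y, w, h, t / max(n, 1)))  # per-tile ratio for the heatmap
            n_tumor += t
            n_total += n
    overall_ratio = n_tumor / max(n_total, 1)             # aggregated tumor cell ratio
    return overall_ratio, per_tile
```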
5. CONCLUSIONS
We show that conditioning the classification of detected cells on tissue context information provides increased accuracy. Hence, we propose a model that combines high-magnification cell detection with low-magnification tumor area segmentation, taking advantage of the high resolution to accurately separate cells, while using a larger field of view to better classify them as normal or tumor cells. We use fully convolutional DNN architectures and predict density maps from which cell counts can be readily extracted. Finally, we propose a whole-slide approach to calculate the tumor cell ratio by splitting the tissue into square tiles, processing them in parallel and displaying a heatmap of the tumor cell ratio in a browser-based client. The achieved MAE of 5.3% on the predicted tumor cell ratio is significantly better than the reported human estimation error. We are currently collecting an extended dataset from different hospitals to test our system for variations in staining protocols. We expect it to be robust against such variations thanks to the data augmentation applied during training of the models, which simulates them.

Our main contributions are: 1) a novel multi-scale DNN tumor cell detection and classification model which takes advantage of both high magnification, to accurately distinguish individual cells, and low magnification, to classify them as tumor or normal based on a large tissue context; 2) a novel whole-slide tumor cell ratio counter that is highly accurate, at 6% mean absolute error.
REFERENCES
[3] ... Journal of Clinical Pathology (11), 923–931 (2014).
[4] Smits, A. J., Kummer, J. A., De Bruin, P. C., Bol, M., Van Den Tweel, J. G., Seldenrijk, K. A., Willems, S. M., Offerhaus, G. J. A., De Weger, R. A., Van Diest, P. J., et al., "The estimation of tumor cell percentage for molecular testing by pathologists is not accurate," Modern Pathology (2), 168–174 (2014).
[5] Bergeron, C., Masseroli, M., Ghezi, A., Lemarie, A., Mango, L., and Koss, L. G., "Quality control of cervical cytology in high-risk women. PAPNET system compared with manual rescreening," Acta Cytologica (2), 151 (2000).
[6] Tench, W. D., "Validation of AutoPap primary screening system sensitivity and high-risk performance," Acta Cytologica (2), 296–302 (2002).
[7] Cortes, C. and Vapnik, V., "Support-vector networks," Machine Learning (3), 273–297 (1995).
[8] Cosatto, E., Miller, M., Graf, H. P., and Meyer, J. S., "Grading nuclear pleomorphism on histological micrographs," in [ ], 1–4, IEEE (2008).
[9] Arteta, C., Lempitsky, V., Noble, J. A., and Zisserman, A., "Learning to detect cells using non-overlapping extremal regions," in [International Conference on Medical Image Computing and Computer-Assisted Intervention], 348–356, Springer (2012).
[10] Sertel, O., Kong, J., Shimada, H., Catalyurek, U. V., Saltz, J. H., and Gurcan, M. N., "Computer-aided prognosis of neuroblastoma on whole-slide images: Classification of stromal development," Pattern Recognition (6), 1093–1103 (2009).
[11] Lempitsky, V. and Zisserman, A., "Learning to count objects in images," in [Advances in Neural Information Processing Systems], 1324–1332 (2010).
[12] Xie, W., Noble, J. A., and Zisserman, A., "Microscopy cell counting and detection with fully convolutional regression networks," Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization (3), 283–292 (2018).
[13] Sirinukunwattana, K., Raza, S. E. A., Tsang, Y.-W., Snead, D. R., Cree, I. A., and Rajpoot, N. M., "Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images," IEEE Transactions on Medical Imaging (5), 1196–1206 (2016).
[14] Xue, Y., Ray, N., Hugh, J., and Bigras, G., "Cell counting by regression using convolutional neural network," in [European Conference on Computer Vision], 274–290, Springer (2016).
[15] LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P., "Gradient-based learning applied to document recognition," Proceedings of the IEEE (11), 2278–2324 (1998).
[16] Rumelhart, D. E., Hinton, G. E., and Williams, R. J., "Learning representations by back-propagating errors," Nature (6088), 533–536 (1986).
[17] Malon, C. D. and Cosatto, E., "Classification of mitotic figures with convolutional neural networks and seeded blob features," Journal of Pathology Informatics (2013).
[18] Basha, S. S., Ghosh, S., Babu, K. K., Dubey, S. R., Pulabaigari, V., and Mukherjee, S., "RCCNet: An efficient convolutional neural network for histological routine colon cancer nuclei classification," in [ ], 1222–1227, IEEE (2018).
[19] He, K., Gkioxari, G., Dollár, P., and Girshick, R., "Mask R-CNN," in [Proceedings of the IEEE International Conference on Computer Vision], 2961–2969 (2017).
[20] Naylor, P., Laé, M., Reyal, F., and Walter, T., "Segmentation of nuclei in histopathology images by deep regression of the distance map," IEEE Transactions on Medical Imaging (2), 448–459 (2018).
[21] Graham, S., Vu, Q. D., Raza, S. E. A., Azam, A., Tsang, Y. W., Kwak, J. T., and Rajpoot, N., "Hover-Net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images," Medical Image Analysis, 101563 (2019).
[22] Raza, S. E. A., Cheung, L., Shaban, M., Graham, S., Epstein, D., Pelengaris, S., Khan, M., and Rajpoot, N. M., "Micro-Net: A unified model for segmentation of various objects in microscopy images," Medical Image Analysis, 160–173 (2019).
[23] Lahiani, A., Gildenblat, J., Klaman, I., Navab, N., and Klaiman, E., "Generalising multistain immunohistochemistry tissue segmentation using end-to-end colour deconvolution deep neural networks," IET Image Processing (7), 1066–1073 (2019).
[24] Xu, J., Luo, X., Wang, G., Gilmore, H., and Madabhushi, A., "A deep convolutional neural network for segmenting and classifying epithelial and stromal regions in histopathological images," Neurocomputing, 214–223 (2016).
[25] Ronneberger, O., Fischer, P., and Brox, T., "U-Net: Convolutional networks for biomedical image segmentation," in [International Conference on Medical Image Computing and Computer-Assisted Intervention], 234–241, Springer (2015).
[26] Rajkumar, U., Turner, K., Luebeck, J., Deshpande, V., Chandraker, M., Mischel, P., and Bafna, V., "ecSeg: Semantic segmentation of metaphase images containing extrachromosomal DNA," iScience.
[28] ... in [Thirty-first AAAI Conference on Artificial Intelligence] (2017).
[29] Ruifrok, A. C., Johnston, D. A., et al., "Quantification of histochemical staining by color deconvolution," Analytical and Quantitative Cytology and Histology (4), 291–299 (2001).
[30] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al., "PyTorch: An imperative style, high-performance deep learning library," in [Advances in Neural Information Processing Systems], 8026–8037 (2019).
[31] Zhang, Z., "Improved Adam optimizer for deep neural networks," in [2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS)] (2018).