Hierarchical Graph Representations in Digital Pathology
Pushpak Pati, Guillaume Jaume, Antonio Foncubierta, Florinda Feroce, Anna Maria Anniciello, Giosuè Scognamiglio, Nadia Brancati, Maryse Fiche, Estelle Dubruc, Daniel Riccio, Maurizio Di Bonito, Giuseppe De Pietro, Gerardo Botti, Jean-Philippe Thiran, Maria Frucci, Orcun Goksel, Maria Gabrani
IBM Research Zurich, Switzerland; ETH Zurich, Switzerland; EPFL, Switzerland; IRCCS-Fondazione Pascale, Italy; ICAR-CNR, Italy; Aurigen, Switzerland; CHUV, Switzerland
Abstract—Cancer diagnosis and prognosis for a tissue specimen are heavily influenced by the phenotype and topological distribution of its constituting histological entities. Thus, adequate tissue representation by encoding the histological entities, and quantifying the relationship between the tissue representation and tissue functionality, is imperative for computer-aided cancer patient care. To this end, several approaches have leveraged cell-graphs, which encode cell morphology and cell organization, to denote the tissue information, and utilize graph theory and machine learning to map the representation to tissue functionality. Though cellular information is crucial, it is incomplete for comprehensively characterizing the tissue. Therefore, we consider a tissue as a hierarchical composition of multiple types of histological entities from fine to coarse level, which depicts multivariate tissue information at multiple levels. We propose a novel multi-level hierarchical entity-graph representation of a tissue specimen, which encodes multiple pathologically relevant entity types as well as intra- and inter-level entity-to-entity interactions. Subsequently, a hierarchical graph neural network is proposed to operate on the hierarchical entity-graph representation and map the tissue structure to tissue functionality. Specifically, we utilize the cells and tissue regions in a tissue to build a HierArchical Cell-to-Tissue (HACT) graph representation, and HACT-Net, an instance of a message-passing graph neural network, to classify histology images. As part of this work, we propose the BReAst Carcinoma Subtyping (BRACS) dataset, a large cohort of Haematoxylin & Eosin stained breast tumor regions-of-interest, to evaluate and benchmark our proposed methodology against pathologists and state-of-the-art computer-aided diagnostic approaches. Thorough comparative assessment and ablation studies demonstrate the superior classification efficacy of the proposed methodology.
Keywords—Digital pathology, Breast cancer classification, Hierarchical tissue representation, Hierarchical graph neural network, Breast cancer dataset
I. INTRODUCTION
Breast cancer is the second most commonly diagnosed cancer and registers the second-highest number of deaths for women with cancer [1]. A study by [2] exhibits that intensive early diagnostic activities improved 5-year survival to 85% during 2005-09 for breast cancer patients. Early diagnosis of cancer, primarily through manual inspection of pathology slides by pathologists, enables the acute assessment of risk and facilitates an optimal treatment plan. Though the diagnostic criteria for breast cancer are established, the continuum of histologic features phenotyped across the diagnostic spectrum prevents distinct demarcation. Thus, manual inspection is tedious and time-consuming, with significant intra- and inter-observer variability [3], [4]. The increasing incidence rate of breast cancer cases per year [5] and the complications in manual diagnosis demand automated computer-aided diagnostics.

Whole-slide scanning systems have empowered rapid digitization of pathology slides into high-resolution whole-slide images (WSIs), thereby profoundly transforming pathologists' daily practice [6]. Further, they enabled computer-aided diagnostics to leverage artificial intelligence [7], [8], especially deep learning, to address various digital pathology tasks in breast diagnosis [9], such as nuclei segmentation [10], [11], nuclei classification [12], [13], gland segmentation [14], [15], tumor detection [16]-[18], tumor staging [16], [19], etc. Deep learning techniques primarily employ Convolutional Neural Networks (CNNs) [20], [21] to process pathology images in a patch-wise manner. CNNs extract representative patterns from patches and aggregate the patch representations to perform image-level tasks. However, patch-wise processing suffers from a trade-off between the resolution of operation and the acquisition of adequate context [22], [23]. Operating at a higher resolution captures local cellular information but limits the field-of-view due to computational burden, thereby limiting access to the global information capturing the tissue microenvironment. In contrast, operating at a lower resolution hinders access to the cellular microenvironment. [22]-[24] have proposed CNN-based methods to address the issue with visual context, but the CNNs, operating on fixed-size input patches, are confined to a fixed-size field-of-view. The fixed-size inputs also restrict the diagnosis from adapting to regions-of-interest with varying dimensions. Further, the pixel-based processing in CNNs disregards the notion of pathologically comprehensible entities [25], such as cells, tissue types, or glands. The absence of the notion of relevant entities limits the interpretability of the CNN's input space and the utilization of well-established entity-level prior pathological knowledge in CNNs' operation. Additionally, CNNs disregard the structural composition of tissue, where fine entities hierarchically constitute coarse entities; for example, epithelial cells organize to form epithelium, which further constitutes glands.

In this paper, we address the aforementioned limitations by shifting the analytical paradigm from pixel- to entity-based processing. In an entity paradigm, a histology image is described as an entity-graph, where the nodes and edges of the graph denote biological entities and inter-entity interactions. An entity-graph can be configured in terms of the type of entity set, entity attributes, graph topology, etc., by leveraging task-specific prior pathological knowledge. Thus, the graph representation enables pathology-specific interpretability and human-machine co-learning. Additionally, the graph representation is memory-efficient compared to pixelated pathology images and can seamlessly describe a large tissue region. [26] first introduced cell-graphs using cells as the entity type. Though a cell-graph efficiently encodes the cell microenvironment, it cannot extensively capture the tissue microenvironment, i.e., the distribution of tissue regions such as necrosis, stroma, epithelium, etc. Similarly, a tissue-graph comprising a set of tissue regions cannot depict the cell microenvironment. Therefore, an entity-graph representation using a single type of entity set is insufficient to comprehensively describe the tissue structure in a histology image. To address this limitation, we propose a multi-level entity-graph representation, i.e., HierArchical Cell-to-Tissue (HACT), consisting of multiple types of entity sets, i.e., cells and tissue regions, to encode both the cell and tissue microenvironment. The multiset of entities is inherently coupled, depicting tissue composition and tissue information at multiple scales.
The HACT graph encodes the individual entity attributes and the intra- and inter-entity relationships among the multiset of entities to hierarchically describe a pathology image from fine cells to coarse tissue regions. Upon graph construction, a graph neural network (GNN), a deep learning technique operating on graph-structured data, processes the entity-graph to perform image analysis. Specifically, we introduce a hierarchical GNN, the HierArchical Cell-to-Tissue Network (HACT-Net), to sequentially operate on the HACT graph, from fine level to coarse level, to produce a fixed-dimensional embedding for the image. The embedding encodes the morphological and topological distribution of the multiset of entities in the tissue. Interestingly, the proposed methodology resembles the tissue diagnostic procedure in clinical practice, where a pathologist hierarchically analyzes a tissue.

We propose a methodology that consists of HACT graph construction and HACT-Net based histology image analysis. We characterize breast tumor regions-of-interest (TRoIs) to evaluate our methodology. The preliminary version of this work was presented by [27]. The work has been substantially extended by, 1) improving the HACT representation and HACT-Net architecture, 2) extending the dataset (2×), 3) detailed ablation studies and evaluation on public data, and 4) benchmarking against independent pathologists. More specifically, the major contributions of this paper are:

• A novel hierarchical entity-graph representation (HACT) and hierarchical learning (HACT-Net) methodology for analyzing histology images.

• The BReAst Carcinoma Subtyping (BRACS) dataset, a large cohort of breast TRoIs annotated with seven breast cancer subtypes. BRACS includes challenging atypical cases and a wide variety of TRoIs, representing a more realistic breast cancer analysis.

• The proposed methodology is benchmarked on the BRACS dataset by comparing with three independent pathologists for breast cancer subtyping. Upon extensive assessment, we demonstrate classification performance comparable to the pathologists on per-class and aggregated classification. Also, our methodology outperforms several recent CNN and GNN approaches for cancer subtyping.

II. RELATED WORK
Tumor subtyping in digital pathology:
Several deep learning algorithms have been proposed to categorize pathology images or whole-slide images into cancer subtypes [8], [16], [28]-[31]. Most of the algorithms employ CNNs to address the task in a patch-wise manner. [19], [31]-[33] utilize CNNs to categorize breast cancer histology images into various subtypes. These methodologies employ single-stream patch-wise approaches to capture local patch-level context, unify patch-level information via several aggregation strategies, and classify the image using the aggregated information. However, single-stream approaches do not capture adequate context from the neighboring tissue microenvironment to aptly classify a patch. In [23] this issue is addressed by including multi-scale information from concentric patches across different magnifications. [24] propose a neural image compression methodology, where WSIs are compressed using a neural network trained in an unsupervised fashion, followed by CNN training on the compressed image representations to predict image labels. [34] include an attention module and an auxiliary task to improve neural image compression for classifying histology images. [35] propose a hybrid convolutional and recurrent deep neural network to utilize the spatial correlations among patches for pathological image classification. [22] propose a stacked CNN architecture to capture large context and perform end-to-end processing of large-sized histology images. [36] propose a streaming CNN to accommodate multi-megapixel images. [37] utilize a multiple instance learning approach to process whole-slide images in an end-to-end manner. [38] extend multiple instance learning to automatically identify sub-regions of high diagnostic value, through an attention mechanism, and improve the classification. Though the aforementioned methodologies employ different strategies to capture useful context information, they still operate on square, fixed-size input images. However, in reality, TRoIs can be of highly varying dimensions and shapes depending on the cancer subtype and the site of tissue extraction. Our proposed entity-graph methodology can acquire both local and global context from arbitrary-sized TRoIs to address the aforementioned limitations.

(The BRACS dataset for breast cancer subtyping will be shortly released to the community.)
Graphs in digital pathology:
Entity-graph-based tissue representation can effectively describe the phenotypical and structural properties of tissue by incorporating morphology, topology, and interactions among biologically defined tissue entities. The graph topology is heuristically defined using k-Nearest Neighbors, probabilistic modeling, the Waxman model, etc. [39]. Subsequently, a cell-graph (CG) is processed by classical machine learning techniques in [39] or by GNNs in [27], [40]-[42] to map to tissue function. Recently, improved graph representations using patches [43] and tissue regions [27] have been proposed to enhance the tissue structure-function mapping. Other graph-based applications in computational pathology include cellular community detection [44], WSI classification [45], [46], etc. Notably, an entity-graph representation consists of biological entities, which pathologists can readily relate to. Thus, graph-based analysis allows incorporating pathologically defined, task-specific, entity-level prior knowledge to construct a meaningful tissue representation. Further interesting research has been conducted to enable the interpretability and explainability of graph-based networks to pathologists. To this end, [40] analyze the cluster assignment of the nodes in a CG to group cells according to their appearance and tissue types. [47] introduce a post-hoc graph-pruning explainer to identify decisive cells and interactions. [48] employ robust spatial filtering that utilizes an attention-based GNN and node occlusion to highlight cell contributions. [49] propose a set of quantitative metrics using pathologically measurable cellular properties to characterize several graph explainability algorithms in CG analysis.

III. PRELIMINARIES
A. Notation
We define an attributed undirected entity graph G := (V, E, H) as a set of nodes V, edges E, and node features H. Each node v ∈ V is represented by a feature vector h(v) ∈ R^d; thus, H ∈ R^{|V|×d}, where d denotes the number of features per node and |·| denotes set cardinality. An edge between two nodes u, v ∈ V is denoted as e_uv. The graph topology is described by a symmetric adjacency matrix A ∈ R^{|V|×|V|}, where A_{u,v} = 1 if e_uv ∈ E. The neighborhood of a node v ∈ V is denoted as N(v) := {u ∈ V | e_uv ∈ E}.

B. Graph Neural Networks
GNNs [50]-[54] define a class of neural networks that extend the concept of convolution to operate on graph-structured data. Specifically, we employ message-passing based GNNs in this work. In a message-passing GNN [55], the node features h(v), ∀v ∈ V, are iteratively updated in a two-step procedure: i) AGGREGATE and ii) UPDATE. In the AGGREGATE step for node v, the features of the neighboring nodes N(v) are aggregated into a single feature representation. In the UPDATE step, the features of node v are updated by using the current node features and the aggregated representation from the AGGREGATE step. A series of T such iterations, i.e., the number of GNN layers, is employed to obtain updated node features ∀v ∈ V up to T hops. Finally, the node features h^T(v) are pooled in the READOUT step to build a fixed-size graph-level embedding h_G. The AGGREGATE, UPDATE and READOUT functions must be differentiable to allow back-propagation. Additionally, the AGGREGATE and READOUT functions must be permutation-invariant such that the aggregated representation is invariant to the node ordering. Formally, the three steps are presented as:

a^{t+1}(v) = AGGREGATE({h^t(u) : u ∈ N(v)})
h^{t+1}(v) = UPDATE(h^t(v), a^{t+1}(v))
h_G = READOUT({h^T(v) : v ∈ V})    (1)

An important aspect of designing a GNN is the characterization of its expressive power.
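The three message-passing steps above can be sketched with a sum aggregator, a one-layer ReLU update, and a sum readout; the weights below are random placeholders, not a trained model:

```python
import numpy as np

def message_passing_layer(H, A, W_self, W_agg):
    """One AGGREGATE + UPDATE step: h'(v) = relu(W_self h(v) + W_agg sum_{u in N(v)} h(u))."""
    agg = A @ H                                            # AGGREGATE: sum over neighbors (A is binary adjacency)
    return np.maximum(0.0, H @ W_self + agg @ W_agg)       # UPDATE: single-layer ReLU "MLP"

def readout(H):
    """Permutation-invariant READOUT: sum-pool node features into a graph embedding."""
    return H.sum(axis=0)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # 3-node path graph
H = rng.normal(size=(3, 4))              # 4-dimensional node features
W_self, W_agg = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))

for _ in range(2):                       # T = 2 message-passing iterations
    H = message_passing_layer(H, A, W_self, W_agg)
h_G = readout(H)
print(h_G.shape)   # (4,)
```

Because the aggregation is a sum over neighbors and the readout is a sum over nodes, relabeling the nodes leaves h_G unchanged, which is exactly the permutation-invariance requirement stated above.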
A GNN has strong expressive power if it can map two non-isomorphic graphs to two unique graph embeddings, thus imparting an injective mapping between the graph space and the graph-embedding space. A line of research exploring the expressive power of GNNs [52], [56], [57] highlights the connection between the iterative message-passing procedure of GNNs and the popular Weisfeiler-Lehman (WL) [58] test for graph isomorphism. It is established that architectures such as the Graph Isomorphism Network (GIN) [52] can perform as well as the 1-dimensional WL test for countable node feature spaces, i.e., when the node features are discrete. An example of a graph with discrete node features arises in molecule design, where the nodes represent atoms, which are discrete in nature. Recent studies show that in the case of continuous node features, e.g., CNN-based node embeddings, the use of multiple permutation-invariant aggregators, such as sum, max, mean, etc., can help to build expressive GNNs [59], [60]. To this end, [60] proposed the Principal Neighbourhood Aggregation (PNA) network, which uses a combination of aggregators followed by degree-scalers. The series of aggregators replaces the sum operation in GIN, and the degree-scalers amplify or dampen the aggregated neighborhood message according to the node's degree. Overviews of the GIN and PNA architectures are presented in Figure 1.

Figure 1: Overview of (a) GIN and (b) PNA layers. h_v^t and {h_u^t} denote the representation of node v and its neighbors at layer t. GIN uses sum as the AGGREGATE function, followed by a sum and a multi-layer perceptron (MLP) for the UPDATE function. PNA uses a set of aggregators (element-wise mean, standard deviation, maximum and minimum) followed by degree-scalers (identity, amplifier and dampener) as the AGGREGATE function. The UPDATE function consists of a concatenation followed by an MLP.

IV. METHODOLOGY
In this section, we specify the details of the proposed methodology, as shown in Figure 2, for hierarchical tissue analysis. For an input Hematoxylin and Eosin (H&E) stained pathology TRoI, we first apply pre-processing measures to standardize the input. Subsequently, we identify pathologically relevant entities and construct a HACT graph representation for the TRoI by encoding the morphological and topological information of the identified entities. Finally, HACT-Net, a hierarchical GNN, is employed to map the HACT graph representation to the corresponding cancer subtype.
A. Pre-processing
H&E stained tissue specimens exhibit appearance variability due to various reasons, such as the specimen preparation technique, staining protocols (e.g., the temperature of the adopted solutions), fixation characteristics, imaging device characteristics, etc. Such variability in appearance adversely impacts the performance of downstream diagnosis models, as demonstrated by [61], [62]. To alleviate the appearance variability, we employ the unsupervised, reference-free stain normalization algorithm proposed by [63]. The algorithm is based on the principle that the RGB color of each pixel is a linear combination of two unknown stain vectors, Hematoxylin and Eosin, which need to be estimated. First, the algorithm estimates the stain vectors of the TRoI by using a Singular Value Decomposition of the non-background pixels. Second, the algorithm applies a correction to account for intensity variations due to noise. The algorithm does not involve any intermediate step that requires training of model parameters; thus, it is computationally inexpensive. Specifically, we employ the scalable and fast pipeline proposed by [64] to conduct the stain normalization.
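The SVD step of the stain-vector estimation can be sketched roughly as follows; the optical-density threshold, the percentile cut-offs, and the synthetic tile are illustrative assumptions, not the exact pipeline of [63], [64]:

```python
import numpy as np

def estimate_stain_vectors(rgb, od_threshold=0.15):
    """Estimate two stain directions from the optical-density (OD) space of non-background pixels."""
    od = -np.log((rgb.reshape(-1, 3).astype(float) + 1.0) / 256.0)  # RGB -> optical density
    od = od[(od > od_threshold).any(axis=1)]                        # drop near-white background pixels
    # Plane spanned by the two leading OD directions (SVD of the centered OD cloud).
    _, _, vt = np.linalg.svd(od - od.mean(axis=0), full_matrices=False)
    proj = od @ vt[:2].T                                            # project OD onto that plane
    phi = np.arctan2(proj[:, 1], proj[:, 0])                        # angle of each pixel in the plane
    # Robust angular extremes give the two stain directions (Hematoxylin / Eosin).
    lo, hi = np.percentile(phi, 1), np.percentile(phi, 99)
    stains = np.stack([np.cos([lo, hi]), np.sin([lo, hi])], axis=1) @ vt[:2]
    return stains / np.linalg.norm(stains, axis=1, keepdims=True)   # unit-norm stain vectors

rng = np.random.default_rng(0)
img = rng.integers(30, 220, size=(64, 64, 3), dtype=np.uint8)  # stand-in for an H&E tile
S = estimate_stain_vectors(img)
print(S.shape)  # (2, 3)
```

Once the two stain vectors are known, pixel OD values can be expressed in stain coordinates and rescaled against a target, which is the correction step mentioned above.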
B. Graph representation
Stain-normalized TRoIs are processed to identify relevant entities and construct the hierarchical entity-graph representations. In this work, we consider nuclei and tissue components as the entities to build HACT graph representations. Specifically, a HACT graph consists of three components: 1) a low-level cell-graph, capturing cell morphology and interactions; 2) a high-level tissue-graph, capturing the morphology and spatial distribution of tissue regions; and 3) cell-to-tissue hierarchies, encoding the relative spatial distribution of the cells with respect to the tissue distribution. The details of the components are presented in the following subsections.
1) Cell-graph representation:
A cell-graph (CG) characterizes low-level cell information, where the nodes represent cells encoding cell morphology and the edges encode cellular interactions depicting cell topology. A CG is constructed in three steps: i) nuclei detection, ii) nuclei feature extraction, and iii) configuration of the CG topology. The steps are demonstrated in Figure 3.

Precise nuclei detection leads to a reliable CG representation. To this end, we use HoVer-Net, a nuclei segmentation network proposed by [11], pre-trained on the MoNuSeg dataset by [10]. HoVer-Net leverages the instance-rich information encoded within the vertical and horizontal distances of nuclear pixels to their centers of mass. These distances are used to separate clustered nuclei, resulting in an accurate segmentation, particularly in areas with overlapping instances. Notably, HoVer-Net identifies several nuclei subtypes despite large inter- and intra-subtype variability. The centroids of the segmented instances are used as the spatial coordinates of the nodes in the CG.

Following nuclei detection, morphological features are extracted by processing patches of size h × w centered around the nuclei centroids via a ResNet [65] architecture pre-trained on the ImageNet dataset [66]. Spatial features of the nuclei are extracted as the spatial coordinates of the nuclei normalized by the TRoI dimensions. The morphological and spatial features constitute the nuclei features, grouped in a node feature matrix H_CG ∈ R^{|V_CG|×d_CG}.

To generate the CG topology E_CG, we build upon the fact that spatially close cells have stronger interactions [67] and should therefore be connected, whereas distant cells have weak cellular interactions and should remain disconnected. To this end, we use the k-Nearest Neighbors (kNN) algorithm to build an initial topology, which we subsequently prune by removing edges longer than a threshold distance d_min. We use the L2 norm between nuclei centroids in the image space to quantify the cellular distance. Formally, for each node v, an edge e_vu is built if

u ∈ {w ∈ V_CG | dist(v, w) ≤ d_k ∧ dist(v, w) < d_min, where d_k is the k-th smallest distance in dist(v, ·)}    (2)

The CG topology is represented by a binary adjacency matrix E_CG ∈ R^{|V_CG|×|V_CG|}. Figure 3 presents the CG representation for a sample TRoI. Formally, a CG representation is formulated as G_CG := {V_CG, E_CG, H_CG}.

Figure 2: Overview of the proposed hierarchical entity-graph based tissue analysis methodology. After pre-processing, the methodology constructs a hierarchical entity-graph representation of a tissue and processes it via a hierarchical graph neural network to map the tissue composition to the respective tissue category.
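The topology rule of Equation (2) can be sketched as follows; the centroids are synthetic, and k and d_min are illustrative values:

```python
import numpy as np

def cell_graph_edges(centroids, k=5, d_min=50.0):
    """Binary adjacency: connect each node to its k nearest neighbors, pruning edges >= d_min."""
    n = len(centroids)
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)            # pairwise L2 distances between centroids
    np.fill_diagonal(dist, np.inf)                  # exclude self-loops
    A = np.zeros((n, n), dtype=bool)
    for v in range(n):
        nn = np.argsort(dist[v])[:k]                # k nearest neighbors of v
        keep = nn[dist[v, nn] < d_min]              # prune edges at or beyond the threshold
        A[v, keep] = A[keep, v] = True              # keep the graph undirected
    return A

rng = np.random.default_rng(0)
pts = rng.uniform(0, 500, size=(40, 2))             # 40 synthetic nuclei centroids
A = cell_graph_edges(pts, k=5, d_min=80.0)
print(A.sum() // 2, "edges")
```

The symmetrization in the last assignment means an edge survives if either endpoint selects the other, which is one common convention for turning a directed kNN relation into an undirected graph.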
2) Tissue-graph representation:
A tissue-graph (TG) depicts the high-level tissue microenvironment, where the nodes and edges of the TG denote tissue regions and their interactions, respectively. Similar to a CG, a TG is constructed by first identifying tissue regions, i.e., epithelium, stroma, lumen, necrosis, etc., followed by feature representation of the tissue regions and TG topology construction. The steps are demonstrated in Figure 3.

Tissue regions are identified in a two-step process. First, we oversegment the tissue at a low magnification to detect non-overlapping homogeneous superpixels. Operating at low magnification avoids noisy pixels and renders computational efficiency. To this end, we employ the Simple Linear Iterative Clustering (SLIC) superpixel algorithm [68]. Formally, SLIC follows an unsupervised approach to associate each pixel with a feature vector and merges them using a spatially localized version of standard k-means clustering. Second, we iteratively merge neighboring superpixels that have similar color attributes, i.e., channel-wise mean and standard deviation, to create superpixels capturing meaningful tissue information. A sample tissue region instance map is presented in Figure 3.

We follow a two-step procedure to extract feature representations of tissue regions. First, we extract CNN-based features for the oversegmented superpixels. Patches of size h × w centered around the oversegmented superpixel centroids are processed by a ResNet. Second, the morphological features of a tissue region are obtained by averaging the deep features of its constituting superpixels. Similar to the CG, we include spatial features as the normalized centroids of the tissue regions. For a TRoI with a set of V_TG tissue regions, we denote the node feature matrix as H_TG ∈ R^{|V_TG|×d_TG}.

To generate the TG topology, we assume that adjacent tissue regions biologically interact and should be connected. To this end, we construct a Region Adjacency Graph [69], where an edge is built between each pair of adjacent regions. The TG topology is denoted by a binary adjacency matrix E_TG ∈ R^{|V_TG|×|V_TG|}. Formally, a TG representation is formulated as G_TG := {V_TG, E_TG, H_TG}.
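Building a region adjacency graph from a tissue-region label map can be sketched as follows; the 4-connectivity toy label map below is an illustrative stand-in for a merged-superpixel instance map:

```python
import numpy as np

def region_adjacency(labels):
    """Binary adjacency over region labels: regions sharing a horizontal or vertical pixel boundary are connected."""
    n = labels.max() + 1
    A = np.zeros((n, n), dtype=bool)
    for a, b in [(labels[:, :-1], labels[:, 1:]),   # horizontally adjacent pixel pairs
                 (labels[:-1, :], labels[1:, :])]:  # vertically adjacent pixel pairs
        m = a != b                                  # pairs straddling a region boundary
        A[a[m], b[m]] = A[b[m], a[m]] = True        # connect the two regions (undirected)
    return A

labels = np.array([[0, 0, 1],
                   [0, 2, 1],
                   [2, 2, 1]])                      # toy label map with three regions
A = region_adjacency(labels)
print(A.astype(int))
```

Comparing the label map against its one-pixel shifts finds every boundary-crossing pixel pair in two vectorized passes, avoiding an explicit loop over pixels.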
3) Hierarchical Cell-to-Tissue graph representation:
The tissue in histopathology can be considered as a hierarchical organization of biological entities ranging from fine level, i.e., cells, to coarse level, i.e., tissue regions. There exist intra- and inter-level couplings depicting the topological distribution of and interactions among the entities. Following this motivation, we propose HACT, a HierArchical Cell-to-Tissue graph representation, to jointly represent the low-level CG and the high-level TG. The intra-level topology is captured by the standalone CG and TG. The inter-level topology is presented by a binary assignment matrix, i.e., the cell-to-tissue hierarchy matrix A_{CG→TG} ∈ R^{|V_CG|×|V_TG|}, which utilizes the relative spatial distribution of the nuclei with respect to the tissue regions. For the i-th nucleus and j-th tissue region, the corresponding assignment is given as:

A_{CG→TG}[i, j] = 1, if the i-th nucleus centroid ∈ the j-th tissue region
A_{CG→TG}[i, j] = 0, otherwise    (3)

Cell-to-tissue hierarchies for a tissue region are presented in Figure 3. Note that each nucleus is assigned to only one tissue region. A segmented nucleus overlapping with multiple tissue regions is assigned to the tissue region that maximally overlaps with the nucleus. Formally, for a given TRoI, a HACT representation is formulated as G_HACT := {G_CG, G_TG, A_{CG→TG}}.

Figure 3: Overview of the hierarchical cell-to-tissue (HACT) graph construction for a TRoI. The HACT graph representation consists of a cell-graph, a tissue-graph and cell-to-tissue hierarchies, and encodes the phenotypical and topological distribution of tissue entities to describe the cell and tissue microenvironment.

C. Graph learning
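As a bridge from the representation to the learning stage, the centroid-based assignment of Equation (3), and the cell-to-tissue pooling it enables, can be sketched as follows; all arrays are synthetic placeholders, and the sum-pool-and-concatenate step mirrors the tissue-node initialization used later:

```python
import numpy as np

def assignment_matrix(nuclei_xy, region_labels):
    """Binary |V_CG| x |V_TG| matrix: entry (i, j) = 1 iff nucleus i's centroid falls in region j."""
    n_regions = region_labels.max() + 1
    A = np.zeros((len(nuclei_xy), n_regions), dtype=float)
    for i, (x, y) in enumerate(nuclei_xy):
        A[i, region_labels[int(y), int(x)]] = 1.0   # look up the region under the centroid
    return A

# Toy 100x100 tissue-region label map split into two regions.
region_labels = np.repeat(np.arange(2), 50)[None, :].repeat(100, axis=0)
nuclei_xy = np.random.default_rng(0).uniform(0, 100, size=(30, 2))
A_cg_tg = assignment_matrix(nuclei_xy, region_labels)

h_cells = np.random.default_rng(1).normal(size=(30, 8))    # toy cell-node embeddings
h_tissue = np.random.default_rng(2).normal(size=(2, 16))   # toy tissue-node features
# Sum-pool cell embeddings per tissue node, then concatenate with the tissue features.
h_tissue0 = np.concatenate([h_tissue, A_cg_tg.T @ h_cells], axis=1)
print(h_tissue0.shape)  # (2, 24)
```

Because each row of the assignment matrix is one-hot, the product A_cg_tg.T @ h_cells is exactly a per-region sum of the cell embeddings.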
The HACT graph representation of a TRoI is processed by a hierarchical GNN to map the TRoI composition to the respective TRoI subtype. To this end, we propose the HierArchical Cell-to-Tissue Network (HACT-Net), a hierarchical GNN architecture, as presented in Figure 4.
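The PNA layers used inside HACT-Net (detailed below) combine several neighbor aggregators with degree-scalers; a minimal sketch of that aggregation, with toy values and with δ computed from the toy graph itself rather than from a training set:

```python
import numpy as np

def pna_aggregate(H, A, delta):
    """PNA-style AGGREGATE: (mean, std, max, min) over neighbors, each combined with
    identity / amplifying / attenuating degree-scalers."""
    out = []
    deg = A.sum(axis=1)
    for v in range(len(H)):
        nbrs = H[A[v] > 0]                                   # features of v's neighbors
        aggs = [nbrs.mean(0), nbrs.std(0), nbrs.max(0), nbrs.min(0)]
        s = np.log(deg[v] + 1.0) / delta                     # log-degree scale of node v
        scalers = [1.0, s, 1.0 / s]                          # identity, amplifier, dampener
        out.append(np.concatenate([sc * a for sc in scalers for a in aggs]))
    return np.stack(out)                                     # shape (|V|, 3 * 4 * d)

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)                       # toy triangle graph
H = np.random.default_rng(0).normal(size=(3, 4))
delta = np.log(A.sum(axis=1) + 1.0).mean()                   # average log-degree normalizer
msg = pna_aggregate(H, A, delta)
print(msg.shape)  # (3, 48)
```

Each node's aggregated message concatenates twelve views of its neighborhood (four aggregators × three scalers); in the actual layer this message is then concatenated with the node's own features and fed to an MLP for the UPDATE step.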
1) HACT-Net architecture & learning:
HACT-Net intakes a HACT representation G_HACT and outputs a graph-level representation h_HACT ∈ R^{d_HACT}. Subsequently, a multi-layer perceptron (MLP) classifies h_HACT into the respective subtype. Formally, HACT-Net consists of two GNNs, namely a Cell-GNN (CG-GNN) and a Tissue-GNN (TG-GNN), to hierarchically process the HACT graph from fine to coarse level. HACT-Net can operate with any GNN architecture pertaining to the CG-GNN and TG-GNN. In this work, we leverage recent advances in GNNs and model HACT-Net using the Principal Neighbourhood Aggregation (PNA) layer [60].

First, the CG-GNN intakes G_CG := {V_CG, E_CG, H_CG} and applies T_CG PNA layers to build contextualized cell-node embeddings. In line with Equation (1), we iteratively update each node embedding h^{(t)}(v), ∀v ∈ V_CG, as:

h^{(0)}_CG(v) = H_CG(v)
a^{(t+1)}_CG(v) = ⊕_{u ∈ N_CG(v)} M^{(t)}_CG(h^{(t)}_CG(v), h^{(t)}_CG(u))
h^{(t+1)}_CG(v) = U^{(t)}_CG(h^{(t)}_CG(v), a^{(t+1)}_CG(v))    (4)

where t = 0, ..., T_CG, and N_CG(v) is the set of neighborhood cell-nodes of v. The functions U^{(t)}_CG and M^{(t)}_CG are MLPs. ⊕ denotes the combination of multiple degree-scalers and aggregators, i.e.:

⊕ = [I, S(D, α = 1), S(D, α = −1)] ⊗ [μ, σ, max, min]
S(D, α) = (log(D + 1) / δ)^α
δ = (1 / |train|) Σ_{i ∈ train} log(d_i + 1)    (5)

where I is the identity, S is the degree-scaler matrix, D is the degree matrix of the cell-nodes, [μ, σ, max, min] is the list of aggregators, and ⊗ is the tensor product. δ is the normalization factor computed as the average log-scale cell-node degree over the training dataset. α is a variable parameter that is negative for attenuation, positive for amplification, or zero for no scaling. The aggregators compute statistics on the neighboring nodes. The schematic diagram of a PNA layer is presented in Figure 1. Following the T_CG PNA layers, an LSTM-based jumping knowledge technique [70] is employed to adapt to different sub-graph structures in the CG, given as:

h^{(T_CG+1)}_CG(v) = LSTM({h^{(t)}_CG(v) | t = 1, ..., T_CG})    (6)

The cell-node embeddings h^{(T_CG+1)}_CG(v), v ∈ V_CG, and the assignment matrix A_{CG→TG} are used as additional hierarchical information to initialize the tissue-node features in the TG, i.e.:

h^{(0)}_TG(w) = CONCAT(H_TG(w), Σ_{v ∈ M(w)} h^{(T_CG+1)}_CG(v))    (7)

where CONCAT denotes the concatenation operation, and M(w) := {v ∈ V_CG | A_{CG→TG}(v, w) = 1} is the set of nodes in G_CG mapping to a node w ∈ V_TG. Analogous to Equation (4), we process G_TG using the TG-GNN, based on PNA layers, to compute the tissue-region node embeddings h^{(t)}_TG(w), ∀w ∈ V_TG. At t = T_TG, the embedding of each tissue-component node w encodes the cell and tissue information up to T_TG hops from w.

The tissue-node embeddings are further processed with an LSTM-based jumping knowledge technique to aggregate the intermediate tissue-node representations. Finally, the graph-level embedding h_HACT is extracted by summing all the tissue-node representations. An MLP layer followed by a softmax operation maps h_HACT to the respective TRoI label. The model is trained end-to-end to minimize the cross-entropy loss between the ground-truth TRoI label and the softmax output.

Figure 4: Overview of HACT-Net.

Following [71], we include a graph normalization (GraphNorm) layer followed by a batch normalization (BatchNorm) layer after each PNA layer. Graph normalization scales the node feature representation by the number of nodes in the graph. Intuitively, it prevents the node representations from graphs of different sizes from being at different scales. This normalization helps the network learn discriminative topological patterns when the number of nodes varies significantly within a class.
V. DATASETS
BRACS dataset:
As part of this work, we introduce a new dataset, termed the BReAst Carcinoma Subtyping (BRACS) dataset. BRACS contains 4391 TRoIs acquired from 325 H&E stained breast carcinoma WSIs. The WSIs are selected from the archives of the Department of Pathology at the National Cancer Institute IRCCS-Fondazione Pascale, Naples, Italy, and are scanned with an Aperio AT2 scanner at 0.25 µm/pixel resolution. The TRoIs are selected and annotated as Normal, Benign, Usual ductal hyperplasia (UDH), Atypical ductal hyperplasia (ADH), Flat epithelial atypia (FEA), Ductal carcinoma in situ (DCIS) and Invasive, using the QuPath software [72]. The TRoIs are annotated independently by three pathologists. TRoIs with conflicting annotations are further discussed and annotated by the consensus of the three pathologists. Note that the pathologists utilize the entire WSI-level context during the TRoI annotation. Figure 5 presents sample TRoIs from all the cancer subtypes in BRACS.

Figure 5: Overview of class-wise samples in BRACS.

Figure 6 demonstrates a few DCIS samples in BRACS to depict the appearance variability within a category. Figure 6(a,b,c) display the variability in the TRoI sizes. Figure 6(d,e) display the variability in staining appearance. Figure 6(f,g,h,i) display different patterns of low-grade, moderate-grade and high-grade DCIS in the dataset, i.e., Papillary DCIS, Cribriform DCIS, Solid DCIS and Comedo DCIS, respectively. Figure 6(j,k) present DCIS TRoIs with single and multiple glandular regions. Figure 6(l,m,n) present some notable artifacts in tissue and slide preparation, such as tissue folds, tears, ink stains and blur. Similar variability in TRoIs is maintained for the other cancer subtypes in the dataset, to represent wholesome realistic scenarios for each category.

Figure 6: Overview of the variability in the DCIS category in BRACS.

Table I presents category-wise statistics of the TRoIs in BRACS. The image statistics demonstrate a high variation in the
TRoI dimensions. Additionally, we present statistics forthe CG and TG representations that are constructed as partof the proposed framework. These statistics indicate a highvariation in the size of constructed graph representations. Toable I: Key statistics of the BRACS dataset. Metric Normal Benign UDH ADH FEA DCIS Invasive Total I m a g e Number of images 512 758 471 568 783 749 550 4391Number of pixels (in million) 2.8 ± ± ± ± ± ± ± ± C G Number of nodes 994 ±
732 1826 ± ±
910 863 ±
730 470 ±
352 1723 ± ± ± ± ± ± ± ± ± ± ± T G Number of nodes 107 ±
106 217 ±
233 88 ±
93 100 ±
91 45 ±
32 225 ±
217 423 ±
317 172 ± ±
545 1012 ± ±
450 480 ±
474 194 ±
159 1111 ± ± ± I m a g e s p lit Train 342 586 303 405 599 562 366 3163Validation 86 87 88 77 85 97 82 602Test 84 85 80 86 99 90 102 626 W S I s p lit Train 67 86 59 38 37 33 41 198Validation 28 24 24 28 17 21 19 68Test 15 16 20 17 12 16 16 59 evaluate our framework on BRACS, we partition the
TRoIs into train, validation, and test sets at the WSI level, such that two TRoIs from the same WSI do not belong to different sets. The WSI-level splitting is performed randomly while maintaining a comparable number of TRoIs per cancer subtype. The partitioning ensures a fair development and evaluation of computer-aided diagnostic methods on the dataset.
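A minimal sketch of this WSI-level partitioning is given below; the function name `wsi_level_split`, the TRoI-to-WSI mapping, and the split fractions are hypothetical, chosen only for illustration, and are not part of the released code.

```python
import random
from collections import defaultdict

def wsi_level_split(troi_to_wsi, train_frac=0.7, val_frac=0.15, seed=0):
    """Partition TRoIs into train/val/test at the WSI level, so that all
    TRoIs originating from one WSI always land in the same subset."""
    wsi_to_trois = defaultdict(list)
    for troi, wsi in troi_to_wsi.items():
        wsi_to_trois[wsi].append(troi)
    wsis = sorted(wsi_to_trois)
    random.Random(seed).shuffle(wsis)        # random WSI-level assignment
    n_train = int(train_frac * len(wsis))
    n_val = int(val_frac * len(wsis))
    split_wsis = {"train": wsis[:n_train],
                  "val": wsis[n_train:n_train + n_val],
                  "test": wsis[n_train + n_val:]}
    return {name: sorted(t for w in ws for t in wsi_to_trois[w])
            for name, ws in split_wsis.items()}
```

In practice, such a random WSI-level assignment would be repeated or stratified until each subset contains a comparable number of TRoIs per cancer subtype, as described above.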
BACH dataset:
We also evaluate the proposed methodology on the publicly available microscopy image dataset from the Grand Challenge on BreAst Cancer Histology images, BACH [16]. The dataset consists of 400 training and 100 test images from four breast cancer subtypes, i.e., Normal, Benign, DCIS, and Invasive. All images are acquired using a Leica DM 2000 LED microscope and a Leica ICC50 HD camera. The provided images are in RGB .tiff format and have a size of 2048 × 1536 pixels, with a pixel scale of 0.42 µm × 0.42 µm. Notably, the introduced BRACS dataset has three major advantages over the BACH dataset.

• The train and test sets of BRACS are nearly 10 times and 6 times the size of the train and test sets of BACH, respectively. The large test set ensures a reliable evaluation of computer-aided diagnostic methods.

• BRACS includes the diagnostically complex pre-cancerous atypical (ADH, FEA) categories, which possess great diagnostic relevance due to their high risk of becoming cancerous. The seven cancer subtypes in BRACS represent a broad spectrum of breast pathology.

• The aforementioned high variability of the BRACS dataset in terms of TRoI appearances and dimensions presents a realistic scenario for breast cancer subtyping.

VI. RESULTS
In this section, we comprehensively evaluate the proposed methodology for breast cancer subtyping. First, we introduce several state-of-the-art CNN and GNN baselines and their implementation strategies. Second, we perform ablation studies on the BRACS dataset to examine the impact of various components of the proposed methodology. Third, we evaluate the classification performance of our methodology and compare it with the stated baselines on the BRACS and BACH datasets in different classification settings. Finally, we present a comparison with human experts by analyzing the performance of HACT-Net against three independent expert pathologists.
A. Baseline methodologies
The evaluated CNN and GNN baselines are:

• Single-scale CNN: This baseline processes TRoIs at a single magnification. A CNN is trained to predict patch-wise cancer subtypes, and the patch-wise predictions are aggregated to produce a TRoI-level prediction. We experiment with three scales, i.e., 10×, 20×, and 40×, denoted as CNN(10×), CNN(20×), and CNN(40×). The same network architecture and training strategy are employed for all scales. For each scale, we extract patches of size 128 × 128 pixels with a stride of 64 pixels. The CNN follows the single-scale training procedure from [23], and the patch-wise predictions are aggregated using the Agg-Penultimate strategy proposed by [19]. We use transfer learning with a ResNet-50 architecture, pre-trained on the ImageNet dataset, as our CNN backbone. Following the feature extraction from ResNet-50, we employ a two-layer MLP with 128 channels to classify the patches. The ResNet-50 parameters are fine-tuned to improve classification. The Adam optimizer [73] with − learning rate and batch size 16 is used to optimize the categorical cross-entropy objective.

• Multi-scale CNN: This baseline processes the TRoIs at multiple scales. We extract concentric patches of size 128 × 128 pixels at multiple magnifications and follow the “Late fusion with single-stream + LSTM” training procedure from [23]. The multi-scale approach uses the concentric patches to acquire context information from multiple magnifications to improve the patch classification. We operate in two settings, i.e., (10× + 20×) and (10× + 20× + 40×), denoted as Multi-scale CNN(10×+20×) and Multi-scale CNN(10×+20×+40×). The patch-wise predictions are aggregated using the Agg-Penultimate strategy [19]. Following the concatenated feature representation from the LSTM, we employ a two-layer MLP with 128 channels to classify the patches. The training strategy and hyperparameters are the same as for the Single-scale CNN.

• CGC-Net: We implement the Cell Graph Convolutional Network (CGC-Net) proposed by [40], the state-of-the-art method for classifying CG representations of TRoIs. We construct the CG topology for a TRoI using the thresholded kNN strategy presented in Section IV-B1. We initialize the CG nodes with hand-crafted features, employ the Adaptive GraphSage-based CGC-Net architecture, and follow the training strategy proposed by [40].

• Patch-GNN: This baseline implements the methodology proposed by [43], the state-of-the-art GNN method for classifying patch-graph representations of TRoIs. The methodology incorporates local inter-patch context through a GNN to construct a graph-level feature representation, which is further processed by an MLP to classify the TRoI. We experiment with Patch-GNN at three scales, i.e., 10×, 20×, and 40×, and denote the baselines as Patch-GNN(10×), Patch-GNN(20×), and Patch-GNN(40×). At each magnification, we extract patches of size 128 × 128 pixels to construct a TRoI-specific patch-graph. We follow the same network architecture and training strategy as proposed by [43].

• CG-GNN: This baseline is implemented to compare the proposed hierarchical learning with standalone cell-graph-based learning. The CG-GNN architecture utilizes PNA layers, an LSTM-based jumping knowledge, a sum readout, and a two-layer MLP classifier. We follow the same CG representation strategy as proposed in Section IV-B1.

• TG-GNN: We implement this baseline to compare the proposed hierarchical learning with standalone tissue-graph-based learning. TG-GNN employs the same architecture as the CG-GNN. The TG node features are directly initialized by H_TG instead of Equation (7).

• CONCAT-GNN: This baseline is implemented to evaluate the impact of the hierarchical graph representation and learning. CONCAT-GNN feeds the standalone CG and TG representations to a standalone CG-GNN and TG-GNN, respectively, to produce the graph-level embeddings h_CG and h_TG. The TRoI-level embedding is constructed by concatenating the graph-level embeddings, i.e., h_CONCAT = CONCAT(h_CG, h_TG). Finally, a two-layer MLP classifies h_CONCAT into a cancer subtype.
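The CONCAT-GNN fusion step can be sketched as follows, with numpy arrays standing in for the two graph-level embeddings. The dimensions (128-d embeddings, 7 subtypes) follow the text, while the random weights and the helper `mlp_classify` are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_classify(h, w1, b1, w2, b2):
    """Two-layer MLP head: hidden ReLU layer followed by class logits."""
    z = np.maximum(h @ w1 + b1, 0.0)
    return z @ w2 + b2

# Stand-ins for the graph-level embeddings produced by the two
# independent branches (CG-GNN and TG-GNN outputs, 128-d each).
h_cg = rng.standard_normal(128)
h_tg = rng.standard_normal(128)

# CONCAT-GNN fusion: h_CONCAT = CONCAT(h_CG, h_TG), then a 2-layer MLP.
h_concat = np.concatenate([h_cg, h_tg])               # 256-d TRoI embedding
w1, b1 = rng.standard_normal((256, 128)), np.zeros(128)
w2, b2 = rng.standard_normal((128, 7)), np.zeros(7)   # 7 cancer subtypes
logits = mlp_classify(h_concat, w1, b1, w2, b2)
pred = int(np.argmax(logits))
```

Note that this fuses the two branches only at the very end; HACT-Net instead lets the cell and tissue levels interact during message passing.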
B. Implementation
Resources:
All the experiments are conducted using PyTorch [74] and the Deep Graph Library (DGL) [75]. We use NVIDIA Tesla P100 GPUs with POWER8 processors to run the experiments.
Graph representation hyperparameters: The CG representations (Section IV-B1) use i) patches of size 72 × 72 pixels, and ii) a CNN ∈ {ResNet-34, ResNet-50} to initialize the node features. The TG representations (Section IV-B2) use i) patches of size 144 × 144 pixels, and ii) a CNN ∈ {ResNet-34, ResNet-50} to initialize the node features.

Graph architecture and learning hyperparameters: CG-GNN, TG-GNN, CONCAT-GNN, and HACT-Net share the same hyperparameters and respective options, i.e.,

• Number of PNA layers in the GNN: [3, 4, 5]
• Number of MLP layers in a PNA layer: 2
• Hidden dimension of the MLP in a PNA layer: 64
• Graph-level embedding dimension: 128
• Number of MLP layers in the output classifier: 2
• Hidden dimension of the MLP classifier: 128
• Training parameters: Adam optimizer [73] with − learning rate, batch size 16, and a categorical cross-entropy objective.

Evaluation metrics:
Considering the imbalanced number of TRoIs per class in the train, validation, and test sets (see Table I), we use the weighted F1-score to evaluate the classification performance. The class-wise weights are computed using the class-wise number of TRoIs in each set. During training, the networks with the best validation weighted F1-scores are selected as the final trained models. To ensure the statistical significance of our results, all the models are trained three times using random weight initializations. The reported scores are calculated as the mean and standard deviation over the three runs. Further, we present precision, recall, and confusion matrices to indicate the distribution of the predictions across the different classes.
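For completeness, the support-weighted F1-score can be computed from scratch as below; this mirrors scikit-learn's `f1_score(..., average='weighted')` and is shown only to make the metric explicit.

```python
import numpy as np

def weighted_f1(y_true, y_pred, n_classes):
    """Support-weighted F1: per-class F1 scores weighted by each class's
    number of samples in y_true."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    total, score = len(y_true), 0.0
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        support = tp + fn                     # class-wise number of TRoIs
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score += (support / total) * f1
    return score
```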
C. Ablation studies
We conduct thorough ablation studies to evaluate the impact of three components of the proposed methodology on the TRoI classification performance. The components are i) the node feature initialization, ii) the GNN layer type, and iii) the jumping knowledge technique. Each component is analyzed individually while fixing the other components. The ablation studies are performed on the BRACS dataset for classifying the TRoIs into the seven classes.
1) Impact of node feature initialization:
The performance of GNNs eminently relies on the initial node features [51]. In the context of entity-based tissue analysis, we focus on the impact of the initial morphological features embedded in the nodes. To this end, we experiment with three morphological feature initialization settings:

• No morphological features: In this setting, the nodes of an entity-graph representation are initialized with only spatial features. Experiments with this setting demonstrate the impact of the standalone graph topology on GNN performance.

• Hand-crafted morphological features: The entity-graph nodes are initialized with hand-crafted morphological features, i.e., i) texture features: the average foreground and background difference; the standard deviation, skewness, and mean entropy of the intensity; and the dissimilarity, homogeneity, energy, and angular second moment from the Gray-Level Co-occurrence Matrix; and ii) shape features: eccentricity, area, maximum and minimum axis length, perimeter, solidity, and orientation. Note that, for the CG and TG, the hand-crafted features are computed from the segmented instances of the nuclei and tissue regions, respectively.

• CNN morphological features: The morphological features of the graph nodes are initialized with CNN features (ResNet-34 pre-trained on ImageNet) extracted from patches around the centroids of the nodes.

Table II: Ablation: impact of node features. Mean and standard deviation of 7-class weighted F1-scores. Results expressed in %.

Model: node features                             Weighted F1
CG-GNN: no morphological features                45.24 ±
CG-GNN: hand-crafted morphological features      48.34 ±
CG-GNN: CNN morphological features               ±
TG-GNN: no morphological features                36.81 ±
TG-GNN: hand-crafted morphological features      51.62 ±
TG-GNN: CNN morphological features               ±
CONCAT-GNN: no morphological features            47.62 ±
CONCAT-GNN: hand-crafted morphological features  51.55 ±
CONCAT-GNN: CNN morphological features           ±
HACT-Net: no morphological features              48.70 ±
HACT-Net: hand-crafted morphological features    52.46 ±
HACT-Net: CNN morphological features             ±

The results in Table II indicate that the CG topology alone is more discriminative for cancer subtyping than the TG topology. The combination of the CG and TG topologies further improves the discriminability. The best performance with the HACT topology confirms the positive impact of the hierarchical representation. Further, including morphological features significantly improves the discriminability of the graph representations. The superiority of graphs with CNN-based morphological features indicates the richness of the morphological information acquired from a CNN compared to hand-crafted measures.
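To make the hand-crafted setting concrete, the snippet below is a numpy-only sketch computing a few of the listed descriptors (intensity standard deviation, skewness, and entropy; mask area and eccentricity). The GLCM statistics and the remaining shape features are omitted, and the function is illustrative rather than the exact feature extractor used in the experiments.

```python
import numpy as np

def handcrafted_features(patch, mask):
    """Compute a small subset of hand-crafted morphological features for
    one segmented instance (nucleus or tissue region) in a gray patch."""
    vals = patch[mask].astype(float)
    mu, sd = vals.mean(), vals.std()
    skew = np.mean(((vals - mu) / (sd + 1e-8)) ** 3)      # intensity skewness
    hist, _ = np.histogram(vals, bins=32)
    p = hist / hist.sum()
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))       # intensity entropy
    area = float(mask.sum())                               # shape: area
    ys, xs = np.nonzero(mask)
    cov = np.cov(np.stack([ys, xs]).astype(float))         # second moments
    lam = np.sort(np.linalg.eigvalsh(cov))                 # lam[0] <= lam[1]
    ecc = np.sqrt(max(0.0, 1.0 - lam[0] / (lam[1] + 1e-8)))  # eccentricity
    return np.array([sd, skew, entropy, area, ecc])
```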
2) Impact of GNN layer type: We investigate the impact of two state-of-the-art GNN layers, i.e., GIN and PNA (Figure 1), on the classification performance. The experiments use CNN-based node feature initialization and LSTM-based jumping knowledge to compute the graph-level embedding. The ablation results in Table III demonstrate that GNNs with PNA layers outperform GNNs with GIN layers for all four GNNs in this work. This can be attributed to the higher expressive power of the PNA layer, which operates on graphs with continuous node features. The series of aggregators with degree-scalers in a PNA layer includes additional network parameters compared to the single-aggregator GIN layer, which allows more flexibility and learning ability in the network.

Table III: Ablation: impact of GNN layer. Mean and standard deviation of 7-class weighted F1-scores. Results expressed in %.

Model: layer type    Weighted F1
CG-GNN: GIN          55.70 ±
CG-GNN: PNA          ±
TG-GNN: GIN          55.33 ±
TG-GNN: PNA          ±
CONCAT-GNN: GIN      56.20 ±
CONCAT-GNN: PNA      ±
HACT-Net: GIN        59.73 ±
HACT-Net: PNA        ±

Table IV: Ablation: impact of GNN jumping knowledge strategy. Mean and standard deviation of 7-class weighted F1-scores. Results expressed in %.

Model: jumping knowledge    Weighted F1
CG-GNN: no aggregator       55.53 ±
CG-GNN: concatenation       55.82 ±
CG-GNN: LSTM                ±
TG-GNN: no aggregator       55.30 ±
TG-GNN: concatenation       56.07 ±
TG-GNN: LSTM                ±
CONCAT-GNN: no aggregator   57.67 ±
CONCAT-GNN: concatenation   56.28 ±
CONCAT-GNN: LSTM            ±
HACT-Net: no aggregator     49.16 ±
HACT-Net: concatenation     59.78 ±
HACT-Net: LSTM              ±
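The PNA-style aggregation just described (several aggregators combined with degree-scalers) can be sketched in a few lines of numpy. Here `delta` stands for the average log-degree normalizer from the PNA formulation, and the surrounding towers and MLPs are omitted, so this is an illustration rather than the exact layer.

```python
import numpy as np

def pna_aggregate(x, adj, delta=1.0):
    """One PNA-style neighbourhood aggregation (sketch): combine four
    aggregators (mean, max, min, std) with three degree scalers
    (identity, amplification, attenuation). The result would normally
    be fed to an MLP, which is omitted for brevity."""
    n, d = x.shape
    out = []
    for i in range(n):
        nbrs = x[adj[i]] if adj[i] else np.zeros((1, d))
        deg = max(len(adj[i]), 1)
        aggs = np.concatenate([nbrs.mean(0), nbrs.max(0),
                               nbrs.min(0), nbrs.std(0)])
        log_deg = np.log(deg + 1)
        scalers = [1.0, log_deg / delta, delta / log_deg]
        out.append(np.concatenate([s * aggs for s in scalers]))
    return np.stack(out)   # shape: (n, 3 scalers * 4 aggregators * d)
```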
3) Impact of jumping knowledge technique:
To investigate the impact of the jumping knowledge technique, we experiment with three settings, i.e., no jumping knowledge, CONCAT-based, and LSTM-based. The LSTM-based technique follows Equation (6). The CONCAT-based technique replaces the LSTM with a concatenation operation in Equation (6). The experiments use CNN-based node feature initialization and PNA layers in the GNNs. The ablation results in Table IV indicate the positive impact of the jumping knowledge technique. Compared to the CONCAT-based technique, the LSTM-based technique learns better dependencies between the GNN layers, thus generating more discriminative graph-level feature representations.

Table V: Mean and standard deviation of class-wise F1-scores and 7-class weighted F1-scores. Results expressed in %. The best result is highlighted in bold and the second best is underlined.

Method               Normal   Benign  UDH  ADH  FEA  DCIS  Invasive  Weighted F1
CNN(10×)
CNN(20×)
CNN(40×)
CNN(10×+20×)
CNN(10×+20×+40×)
CGC-Net              30.83 ±
Patch-GNN(10×)
Patch-GNN(20×)
Patch-GNN(40×)
CG-GNN
TG-GNN
CONCAT-GNN
HACT-Net (Proposed)  61.56 ±

Figure 7: 7-class precision and recall (mean, standard deviation) for CG-GNN, TG-GNN, CONCAT-GNN, and HACT-Net.
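As an illustration of the jumping-knowledge idea, the concatenation variant and the sum readout can be sketched as below with numpy stand-ins; the LSTM-based variant of Equation (6) would replace the concatenation with a learned sequence aggregator.

```python
import numpy as np

def jumping_knowledge_concat(layer_embeddings):
    """Concatenation-based jumping knowledge (sketch): node embeddings
    from every GNN layer are concatenated, so the readout can draw on
    all neighbourhood radii at once."""
    # layer_embeddings: list of T arrays, each of shape (n_nodes, d)
    return np.concatenate(layer_embeddings, axis=1)   # (n_nodes, T*d)

def graph_readout(node_embeddings):
    """Sum readout producing the graph-level embedding."""
    return node_embeddings.sum(axis=0)
```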
4) Ablation summary:
The ablation experiments conclude the following choice of components for designing our methodology: i) CNN-based initialization of the node-level morphological features, ii) PNA layers, and iii) an LSTM-based jumping knowledge technique.
D. Classification results on BRACS dataset
In this section, we evaluate the proposed methodology on the BRACS dataset for the TRoI classification task and compare its classification performance with the CNN and GNN baselines. We analyze the performance in three classification settings.

• Setting 1: 7-class classification:
This setting classifies TRoIs into 7 classes, i.e., Normal, Benign, UDH, ADH, FEA, DCIS, and Invasive, to differentiate the spectrum of breast cancer subtypes. Table V highlights the classification performance of HACT-Net and the CNN and GNN baselines. CNN(10×) is the best-performing single-scale CNN, indicating the importance of global context information for TRoI classification. Multi-scale CNNs, using both global and local context information, outperform the single-scale CNNs. The gain is significant for the ADH, FEA, and DCIS categories, which require both local and global context information for diagnosis. The multi-scale CNNs also outperform the CGC-Net and Patch-GNN baselines. Interestingly, at each magnification, Patch-GNN outperforms the single-scale CNN. This conveys the importance of the relational and topological distribution information of the patches for TRoI classification.

Analyzing the proposed GNNs, we observe that CG-GNN significantly beats CGC-Net, confirming the superiority of CNN-based node feature initialization over hand-crafted features, and the significance of a GNN with expressive PNA layers over the Adaptive GraphSage in CGC-Net. We notice that CG-GNN and TG-GNN provide overall comparable performance. However, TG-GNN performs better for the Normal, Benign, and FEA subtypes, indicating the utility of tissue-microenvironment information for these categories. In contrast, CG-GNN surpasses TG-GNN for UDH and ADH, which rely on local nuclei-level information.

Table VI: Mean and standard deviation of class-wise F1-scores and 4-class weighted F1-scores. Results expressed in %. The best result is highlighted in bold and the second best is underlined.

Method               Normal   Non-cancerous  Precancerous  Cancerous  Weighted F1
CNN(10×)
CNN(20×)
CNN(40×)
CNN(10×+20×)
CNN(10×+20×+40×)
CGC-Net              34.53 ±
Patch-GNN(10×)
Patch-GNN(20×)
Patch-GNN(40×)
CG-GNN
TG-GNN
CONCAT-GNN
HACT-Net (Proposed)

For DCIS and Invasive, both CG-GNN and TG-GNN perform similarly, showing that both low-level and high-level information is useful for these categories. Further, both
HACT-Net and CONCAT-GNN provide overall superior performance compared to all CNN and GNN baselines. HACT-Net significantly outperforms CONCAT-GNN, indicating the significance of the proposed hierarchical modeling and learning. CONCAT-GNN produces overall comparable performance to CG-GNN and TG-GNN; however, in terms of class-wise performance, CONCAT-GNN seems to utilize complementary information from the CG and TG representations. This complementary information is effectively utilized by HACT-Net to improve both the per-class and the overall classification performance. In general, all the proposed GNNs comprehensively outperform the CNN baselines, establishing the potential of entity-based analysis in digital pathology.

Figure 7 presents the per-class precision and recall for CG-GNN, TG-GNN, CONCAT-GNN, and HACT-Net. HACT-Net is observed to produce the best precision values for most of the classes. The ranking of the per-class recall values for CG-GNN and TG-GNN is inconsistent, whereas HACT-Net consistently yields better recall values. Further, the standard deviations of the class-wise precision and recall values are persistently the lowest for HACT-Net for most of the classes. Figure 8 presents the 7-class confusion matrix of precision and recall values for HACT-Net. The network shows high ambiguity in precision between i) Normal and Benign, ii) UDH and ADH, and iii) ADH and DCIS. Additionally, intermediate ambiguity in recall appears for Benign, UDH, and ADH with FEA. Notably, these class pairs bear high pathological ambiguity and are diagnostically very challenging.

• Setting 2: 4-class classification:
This setting categorizes TRoIs into 4 classes based on the risk of attaining a cancerous state. We group the TRoIs in BRACS into four pathologically relevant categories, i.e., Normal, Non-cancerous (Benign + UDH), Precancerous (ADH + FEA), and Cancerous (DCIS + Invasive). The classification performance of HACT-Net and the CNN and GNN baselines is presented in Table VI. The single-scale CNNs exhibit the same behavior as in the 7-class setting. However, combining multiple magnifications in the multi-scale CNNs does not improve the classification performance over the single-scale CNNs. CGC-Net performs inferior to the CNN baselines, and the best Patch-GNN yields performance comparable to the CNNs. Similar to the 7-class classification setting, CG-GNN, TG-GNN, and CONCAT-GNN provide similar classification performance and outperform the CNN and GNN baselines. HACT-Net significantly outperforms the CNN and GNN baselines. Analysis of the per-class performance indicates that HACT-Net provides the best classification performance for the Normal, Precancerous, and Cancerous categories. To highlight, HACT-Net achieves the highest F1-score for the diagnostically challenging Precancerous category.

Figure 8: 7-class precision and recall (mean ± std) confusion matrix for HACT-Net.

• Setting 3: Binary classifications:
In this setting, we follow a diagnostic decision tree, presented in Figure 9, to make one diagnosis at a time. It is inspired by the classification scheme presented by [76], which follows the pathologist's approach to breast cancer subtyping. The individual binary tasks are less constrained than the multi-class classification tasks, thereby providing better discrimination between a selected pair of classes. The binary classifiers can assist pathologists in categorizing ambiguous cases while traversing the decision tree. Table VII highlights the classification performance of HACT-Net and the CNN and GNN baselines for the six binary classification tasks in the decision tree. The performance of the networks on the individual binary tasks is consistent with the 7-class and 4-class classification settings. HACT-Net consistently outperforms the CNN and GNN baselines. These analyses establish the superiority of the proposed HACT representation and the HACT-Net workflow for subtyping breast cancer TRoIs.

Figure 9: Decision tree used by pathologists to make their diagnosis. The 7-class classification is simplified to a series of binary decision tasks. While going through the tree, the diagnosis becomes more and more specific until the leaves, i.e., the 7 classes, are reached.

Table VII: Mean and standard deviation of binary-class weighted F1-scores. Results expressed in %. The best result is highlighted in bold and the second best is underlined. (N: Normal, B: Benign, U: UDH, A: ADH, F: FEA, D: DCIS, I: Invasive.)

Method               I vs N+B+A+U+F+D  N+B+U vs A+F+D  N vs B+U  B vs U  A+F vs D  A vs F
CNN(10×)
CNN(20×)
CNN(40×)
CNN(10×+20×)
CNN(10×+20×+40×)
CGC-Net              91.60 ±
Patch-GNN(10×)
Patch-GNN(20×)
Patch-GNN(40×)
CG-GNN               94.52 ±
TG-GNN
CONCAT-GNN
HACT-Net (Proposed)
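The traversal of the decision tree can be sketched as a cascade of binary classifiers. The classifier names and the callable interface below are hypothetical, chosen only to mirror the six binary tasks of Table VII.

```python
def decision_tree_diagnosis(binary_classifiers, troi):
    """Traverse the diagnostic decision tree (Figure 9) one binary
    decision at a time; each classifier is a callable returning True
    when its first branch is taken."""
    if binary_classifiers["invasive_vs_rest"](troi):           # I vs rest
        return "Invasive"
    if binary_classifiers["non_cancerous_vs_rest"](troi):      # N+B+U vs A+F+D
        if binary_classifiers["normal_vs_benign_udh"](troi):   # N vs B+U
            return "Normal"
        return "Benign" if binary_classifiers["benign_vs_udh"](troi) else "UDH"
    if binary_classifiers["atypical_vs_dcis"](troi):           # A+F vs D
        return "ADH" if binary_classifiers["adh_vs_fea"](troi) else "FEA"
    return "DCIS"
```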
1) Domain expert comparison on BRACS dataset:
To assess the quality of the BRACS dataset and to benchmark the proposed HACT-Net, we acquire independent pathologists' annotations on the BRACS test set. We include three board-certified pathologists (excluding the pathologists who provided the initial annotations, i.e., the ground-truth labels) in the domain-expert comparison experiment, following the evaluation protocols of [4]. The participating pathologists are from three medical centers: 1) National Cancer Institute IRCCS-Fondazione Pascale, Naples, Italy; 2) Lausanne University Hospital, CHUV, Lausanne, Switzerland; and 3) Aurigen Centre de Pathologie, Lausanne, Switzerland. They are specialized in breast pathology and have been in practice for over twenty years. The pathologists independently and remotely annotated the TRoIs in the BRACS test set without having access to the respective WSIs. This protocol ensures an equal field-of-view for all the pathologists and our proposed methodology.

The pathologists' annotations and the HACT-Net predictions are compared to the ground-truth labels of the BRACS test set to evaluate their classification performance, presented in Table VIII. We include the per-class F1-scores, the overall weighted F1-score, and the accuracy of each pathologist. Further, we include the aggregated statistics of the pathologists' performance to benchmark HACT-Net against the domain-expert annotators. Table VIII indicates that HACT-Net provides per-class classification performance comparable to the pathologists. HACT-Net actually outperforms the domain experts on the cases that are typically more difficult to distinguish, namely atypia and hyperplasias, which was also one of the main aims of the developed technology. Also, the per-class standard deviations convey that HACT-Net reduces uncertainty and provides more reproducible and objective results than the independent domain-expert annotators. The independent pathologists have 57% concordance with the ground-truth diagnoses for the 7-class classification task, and HACT-Net results in a better concordance rate.

Table VIII: Comparison with pathologists. F1-scores for 7-class breast cancer subtyping. Results expressed in %.

                        Normal  Benign  UDH    ADH    FEA    DCIS   Invasive  Weighted F1  Accuracy
Pathologist 1           67.53   53.92   41.90  36.00  19.13  71.59  94.00     55.30        56.71
Pathologist 2           47.83   52.94   25.00  35.37  65.22  68.00  94.00     57.07        57.99
Pathologist 3           39.66   49.59   49.43  42.29  54.12  65.19  89.47     56.71        56.55
Pathologist statistics  51.57 ±
HACT-Net statistics     ±

Table IX: Concordance among independent pathologists' annotations. Results expressed in %.

               Pathologist 1  Pathologist 2  Pathologist 3  Ground truth
Pathologist 1  -              47.60          50.96          56.71
Pathologist 2  -              -              64.38          57.99
Pathologist 3  -              -              -              56.55

To benchmark the BRACS dataset with respect to the dataset of [4], we compare the aggregated pathologists' statistics on both datasets for the same set of classes, i.e., Benign without atypia (Normal + Benign + UDH), Atypia (ADH + FEA), DCIS, and Invasive. Note that [4]'s dataset consists of 240 breast biopsy slides, whereas the BRACS test set consists of 626 TRoI images. For the dataset of [4], the pathologists' class-wise concordance rates are 87%, 48%, 84%, and 96% for the aforementioned classes. For the BRACS dataset, the pathologists' class-wise concordance rates are 87%, 50%, 72%, and 90%, respectively. The class-wise concordance rates exhibit a similar trend on both datasets. The difference in the pathologists' performance between the datasets can be accounted for by the difference in the field-of-view, i.e., TRoI versus WSI, accessible to the pathologist during diagnosis. Table IX presents the inter-observer concordance rates on the BRACS test set. We notice significant differences in the concordance rates between pathologists (1, 2) and pathologists (2, 3), and comparable concordance between pathologists (1, 2) and pathologists (1, 3). This observation can be attributed to diagnostic differences and practices across regions: Pathologist 1 is from Naples, Italy, while Pathologists 2 and 3 are from Lausanne, Switzerland.
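The concordance rates reported above are plain percent agreement between two sets of labels, which can be computed as:

```python
def concordance(labels_a, labels_b):
    """Percent agreement between two annotators: the fraction of TRoIs
    assigned the same class label, expressed in %."""
    assert len(labels_a) == len(labels_b)
    agree = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * agree / len(labels_a)
```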
2) Classification results on BACH dataset:
We evaluate the proposed methodology on the public BACH dataset. Considering the small training set, i.e., 400 images, we employ several image augmentation techniques for training HACT-Net. To this end, we apply rotation, mirroring, and color augmentations to the training images before extracting the HACT graph representations. We do not use other graph augmentation techniques, such as random node dropping or random edge dropping, as these augmentations can hamper the meaningful topological distribution of the biological entities. The implementation strategies and hyperparameters from Section VI-B are employed for training HACT-Net. The classification performance of HACT-Net and the state-of-the-art results on the BACH dataset are summarized in Table X. Our model predictions are independently evaluated by the organizers of the BACH challenge, which ensures a fair and independent comparison with the state-of-the-art methods. The proposed methodology outperforms the state-of-the-art performance on the BACH dataset. Notably, our methodology employs a single network, compared to the ensemble strategy of employing multiple networks during inference.

Table X: BACH dataset results expressed in %.

Type      Method                    Accuracy
Ensemble  Chennamsetty et al. [77]  87.00
Ensemble  Kwok [78]                 87.00
Ensemble  Brancati et al. [79]      86.00
Single    HACT-Net (Proposed)

VII. CONCLUSION

Pixel-based processing of pathology images suffers from the context-resolution trade-off and misses the notion of biological entities and tissue composition. In this work, we proposed an entity-based tissue representation and learning framework to address the issues of pixel-based processing. To achieve this, we proposed two novel contributions: (i) a hierarchical entity-graph representation of a tissue image that incorporates multisets of pathologically intuitive biological entities, and (ii) a hierarchical graph neural network that sequentially processes the entity-graph representation and maps the tissue composition to the respective tissue functionality. Further, we introduced BReAst Cancer Subtyping (BRACS), a large cohort of breast tumor regions-of-interest annotated with breast cancer subtypes. BRACS encompasses seven breast cancer subtypes to represent a realistic breast cancer diagnosis scenario. Using BRACS and BACH, a public breast cancer subtyping dataset, we demonstrated the performance of the proposed methodology for classifying breast tumor regions-of-interest into cancer subtypes. Assessed in various classification settings, our methodology outperformed several state-of-the-art pixel-based and entity-graph-based classification methodologies in terms of classification performance. Further, we benchmarked the performance of our methodology on the BRACS dataset by comparing it to three independent pathologists. Notably, the proposed methodology achieved better per-cancer-subtype and overall aggregated classification performance. Although we have evaluated our method on breast cancer classification, the technology is easily extendable to other tissue types and diseases. Notably, the proposed hierarchical graph methodology can also be adapted to other imaging types, such as natural images, multiplexed images, hyperspectral images, satellite images, and other medical imaging domains, by incorporating adequate domain- and task-specific entities.
REFERENCES

[1] F. Bray et al., “Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,” in Cancer J. Clinicians, vol. 68, 2018, pp. 394–424.
[2] C. Allemani et al., “Global surveillance of cancer survival 1995–2009: analysis of individual data for 25 676 887 patients from 279 population-based registries in 67 countries (CONCORD-2),” in The Lancet, vol. 385, no. 9972, 2015, pp. 977–1010.
[3] D. Gomes et al., “Inter-observer variability between general pathologists and a specialist in breast pathology in the diagnosis of lobular neoplasia, columnar cell lesions, atypical ductal hyperplasia and ductal carcinoma in situ of the breast,” in Diagnostic Pathology, vol. 9, no. 121, 2014.
[4] J. Elmore et al., “Diagnostic concordance among pathologists interpreting breast biopsy specimens,” in JAMA, vol. 313, no. 11, 2015, pp. 1122–1132.
[5] R. Siegel, K. Miller, and A. Jemal, “Cancer statistics, 2020,” in CA: A Cancer Journal for Clinicians, vol. 70, 2020, pp. 7–30.
[6] S. Mukhopadhyay et al., “Whole slide imaging versus microscopy for primary diagnosis in surgical pathology: a multicenter blinded randomized noninferiority study of 1992 cases (pivotal study),” in Am J Surg Pathol, vol. 42, no. 1, 2017, pp. 39–52.
[7] G. Litjens et al., “A survey on deep learning in medical image analysis,” in Medical Image Analysis, vol. 42, 2017, pp. 60–88.
[8] S. Deng et al., “Deep learning in digital pathology image analysis: a survey,” in Frontiers of Medicine, 2020.
[9] A. Ibrahim et al., “Artificial intelligence in digital breast pathology: techniques and applications,” in The Breast, vol. 49, 2020, pp. 267–273.
[10] N. Kumar et al., “A dataset and a technique for generalized nuclear segmentation for computational pathology,” in IEEE Transactions on Medical Imaging, vol. 36, no. 7, 2017, pp. 1550–1560.
[11] S. Graham et al., “HoVer-Net: simultaneous segmentation and classification of nuclei in multi-tissue histology images,” in Medical Image Analysis, vol. 58, 2019.
[12] P. Pati et al., “Reducing annotation effort in digital pathology: a co-representation learning framework for classification tasks,” in Medical Image Analysis, vol. 67, 2021.
[13] R. Verma et al., “Multi-organ nuclei segmentation and classification challenge,” 2020.
[14] S. Graham et al., “MILD-Net: minimal information loss dilated network for gland instance segmentation in colon histology images,” in Medical Image Analysis, vol. 52, 2019, pp. 199–211.
[15] T. Binder et al., “Multi-organ gland segmentation using deep learning,” in Frontiers in Medicine, 2019.
[16] G. Aresta et al., “BACH: Grand challenge on breast cancer histology images,” in Medical Image Analysis, vol. 56, 2019, pp. 122–139.
[17] B. Bejnordi et al., “Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer,” in
JAMA , vol. 318, no. 22, 2019, pp.2199–2210.[18] P. Pati et al. , “Deep positive-unlabeled learning for regionof interest localization in breast tissue images,” in
Proc.SPIE 10581, Medical Imaging 2018: Digital Pathology , vol.1058107, 2018.[19] C. Mercan et al. , “From patch-level to roi-level deep featurerepresentations for breast histopathology classification,” in
Proc. SPIE 10956, Medical Imaging 2019: Digital Pathology ,vol. 109560H, 2019.[20] A. Madabhushi and G. Lee, “Image analysis and machinelearning in digital pathology: Challenges and opportunities,”in
Medical Image Analysis , vol. 33, 2016, pp. 170–175.[21] A. Parwani, “Next generation diagnostic pathology: use ofdigital pathology and artificial intelligence tools to augmenta pathological diagnosis,” in
Diagnostic Pathology , vol. 14,no. 138, 2019.[22] B. Bejnordi et al. , “Context-aware stacked convolutionalneural networks for classification of breast carcinomas inwhole-slide histopathology images,” in
Journal of MedicalImaging , vol. 4, no. 4, 2017.[23] K. Sirinukunwattana et al. , “Improving whole slide segmen-tation through visual context - a systematic study,” in
MedicalImage Computing and Computer Assisted Intervention , vol.11071, 2018.[24] D. Tellez et al. , “Neural image compression for gigapixelhistopathology image analysis,” in
IEEE Transactions onPattern Analysis and Machine Intelligence , vol. 58, 2019.[25] M. Hagele et al. , “Resolving challenges in deep learning-based analyses of histopathological images using explanationmethods,” in
Scientific Reports 10 , vol. 6423, 2020.26] C. Demir et al. , “The cell graphs of cancer,” in
Bioinformatics ,vol. 20, 2004, pp. 145–151.[27] P. Pati et al. , “Hact-net: A hierarchical cell-to-tissue graphneural network for histopathological image classification,” in
MICCAI Graphs in Biomedical Image Analysis Workshop ,2020.[28] D. Komura and S. Ishikawa, “Machine learning methodsfor histopathological image analysis,” in
Computational andStructural Biotechnology Journal , 2018, pp. 34–42.[29] C. Srinidhi et al. , “Deep neural network models for computa-tional histopathology: A survey,” in arXiv:1912.12378 , 2019.[30] F. Spanhol et al. , “A dataset for breast cancer histopathologi-cal image classification,” in
IEEE transactions on biomedicalengineering , vol. 63, no. 7, 2016, pp. 1455–1462.[31] T. Araujo et al. , “Classification of breast cancer histologyimages using convolutional neural networks,” in
PloS one ,vol. 12, no. 6, 2005.[32] D. Bardou et al. , “Classification of breast cancer based onhistology images using convolutional neural networks,” in
IEEE Access , vol. 6, 2018, pp. 24 680–24 693.[33] K. Roy et al. , “Patch-based system for classification of breasthistology images using deep learning,” in
Computerized Med-ical Imaging and Graphics , vol. 71, 2019, pp. 90–103.[34] M. Shaban et al. , “Context-aware convolutional neural net-work for grading of colorectal cancer histology images,” in
IEEE Transactions on Medical Imaging , vol. 39, no. 7, 2020,pp. 2395 – 2405.[35] R. Yan et al. , “Breast cancer histopathological image classifi-cation using a hybrid deep neural network,” in
Methods , vol.173, 2020, pp. 52–60.[36] H. Pinckaers, B. van Ginneken, and G. Litjens, “Streamingconvolutional neural networks for end-to-end learning withmulti-megapixel images,” in
IEEE Transactions on MedicalImaging , vol. 39, no. 5, 2020, pp. 1306–1315.[37] G. Campanella et al. , “Clinical-grade computational pathol-ogy using weakly supervised deep learning on whole slideimages,” in
Nature Medicine , vol. 25, no. 8, 2019, pp. 1301–1309.[38] M. Lu et al. , “Data efficient and weakly supervisedcomputational pathology on whole slide images,” in arXiv:2004.09666v2 , 2019.[39] H. Sharma et al. , “A review of graph-based methods for imageanalysis in digital histopathology,” in
Diagnostic Pathology ,2015.[40] Y. Zhou et al. , “CGC-net: Cell graph convolutional networkfor grading of colorectal cancer histology images,” in
Pro-ceedings of the IEEE International Conference on ComputerVision Workshops , 2019.[41] R. Chen et al. , “Pathomic fusion: An integrated frameworkfor fusing histopathology and genomic features for cancerdiagnosis and prognosis,” in
IEEE Transactions on MedicalImaging , 2020. [42] D. Anand et al. , “Histographs: graphs in histopathology,” in
Proc. SPIE 11320, Medical Imaging 2020: Digital Pathology ,vol. 113200O, 2020.[43] B. Aygunes et al. , “Graph convolutional networks for regionof interest classification in breast histopathology,” in
Proc.SPIE 11320, Medical Imaging 2020: Digital Pathology , vol.113200K, 2020.[44] S. Javed et al. , “Cellular community detection for tissue phe-notyping in colorectal cancer histology images,” in
MedicalImage Analysis , vol. 63, 2020.[45] Y. Zhao et al. , “Predicting lymph node metastasis usinghistopathological images based on multiple instance learningwith deep graph convolution,” in
IEEE CVPR , 2020, pp.4837–4846.[46] M. Adnan et al. , “Representation learning of histopathologyimages using graph neural networks,” in
IEEE CVPR Work-shops , 2020, pp. 4254–4261.[47] G. Jaume et al. , “Towards explainable graph representationsin digital pathology,” in
ICML Workshop on ComputationalBiology , 2020.[48] M. Sureka et al. , “Visualization for histopathology im-ages using graph convolutional neural networks,” in arXiv:2006.09464 , 2020.[49] G. Jaume et al. , “Quantifying explainers of graph neuralnetworks in computational pathology,” in arXiv:2011.12646 ,2020.[50] M. Defferrard et al. , “Convolutional neural networks ongraphs with fast localized spectral filtering,” in
NeurIPS ,2016, pp. 3844–3852.[51] T. Kipf and M. Welling, “Semi-supervised classification withgraph convolutional networks,” in
ICLR , 2017.[52] K. Xu et al. , “How powerful are graph neural networks?” in
ICLR , 2019.[53] W. Hamilton et al. , “Inductive representation learning on largegraphs,” in
NeurIPS , 2017, pp. 1024–1034.[54] P. Velickovic et al. , “Graph attention networks,” in
Inter-national inproceedings on Learning Representations, ICLR ,2018.[55] J. Gilmer et al. , “Neural message passing for quantum chem-istry,” in
ICML , vol. 70, 2017, pp. 1263–1272.[56] C. Morris et al. , “Weisfeiler and leman go neural: Higher-order graph neural networks,” in
AAAI , 2018.[57] G. Jaume et al. , “edgnn: a simple and powerful gnn for di-rected labeled graphs,” in
ICLR Workshop on RepresentationLearning on Graphs and Manifolds , 2019.[58] B. Weisfeiler and A. A. Lehman, “A reduction of a graph to acanonical form and an algebra arising during this reduction,”in
Nauchno-Technicheskaya Informatsia , vol. 2, no. 9, 1968,pp. 12–16.59] N. Dehmamy et al. , “Understanding the representation powerof graph neural networks in learning graph topology,” in
NeurIPS , 2019, pp. 15 413–15 423.[60] G. Corso et al. , “Principal neighbourhood aggregation forgraph nets,” in
NeurIPS , 2020.[61] M. Veta et al. , “Breast cancer histopathology image analysis:A review,” in
IEEE Transactions on Biomedical Engineering ,no. 5, 2014, pp. 1400–1411.[62] D. Tellez et al. , “Quantifying the effects of data augmen-tation and stain color normalization in convolutional neuralnetworks for computational pathology,” in
Medical ImageAnalysis , vol. 58, 2019.[63] M. Macenko et al. , “A method for normalizing histologyslides for quantitative analysis,” in
IEEE International Sym-posium on Biomedical Imaging: From Nano to Macro , 2009,pp. 1107–1110.[64] M. Stanisavljevic et al. , “A fast and scalable pipeline for stainnormalization of whole-slide images in histopathology,” in
ECCV Workshop , 2018.[65] K. He et al. , “Deep residual learning for image recognition,”in
IEEE CVPR , 2016.[66] J. Deng et al. , “Imagenet: A large-scale hierarchical imagedatabase,” in
IEEE CVPR , 2009.[67] K. Francis and B. Palsson, “Effective intercellular communi-cation distances are determined by the relative time constantsfor cyto/chemokine secretion and diffusion,” in
Proceedingsof the National Academy of Sciences , vol. 94, no. 23, 1997,pp. 12 258–12 262.[68] R. Achanta et al. , “Slic superpixels compared to state-of-the-art superpixel methods,” in
IEEE Transactions on PatternAnalysis and Machine Intelligence , vol. 34, no. 11, 2012, pp.2274–2282.[69] F. Potjer, “Region adjacency graphs and connected mor-phological operators,” in
Mathematical Morphology and itsApplications to Image and Signal Processing. ComputationalImaging and Vision , vol. 5, 1996, p. 111–118.[70] K. Xu et al. , “Representation learning on graphs with jumpingknowledge networks,” in
ICML , 2018.[71] V. P. Dwivedi et al. , “Benchmarking graph neural networks,”in
CoRR,abs/2003.00982 , 2020.[72] P. Bankhead et al. , “Qupath: Open source software for digitalpathology image analysis,” in
Scientific reports , vol. 7, no. 1,2017, pp. 1–7.[73] D. Kingma and J. Ba, “Adam: A method for stochasticoptimization,” in
ICLR , 2015.[74] A. Paszke et al. , “Pytorch: An imperative style, high-performance deep learning library,” in
NeurIPS , 2019, pp.8024–8035. [75] M. Wang et al. , “Deep graph library: Towards efficientand scalable deep learning on graphs,” in
CoRR , vol.abs/1909.01315, 2019.[76] E. Mercan et al. , “Automated diagnosis of breast cancerand pre-invasive lesions on digital whole slide images,” in
ICPRAM , 2018.[77] S. Chennamsetty, M. Safwan, and V. Alex, “Classification ofbreast cancer histology image using ensemble of pre-trainedneural networks,” in
International Conference Image Analysisand Recognition , 2018, pp. 804–811.[78] S. Kwok, “Multiclass classification of breast cancer in whole-slide images,” 2018.[79] N. Brancati, M. Frucci, and D. Riccio, “Multi-classificationof breast cancer histology images by using a fine-tuningstrategy,” in