Divide-and-Rule: Self-Supervised Learning for Survival Analysis in Colorectal Cancer
Christian Abbet, Inti Zlobec, Behzad Bozorgtabar, Jean-Philippe Thiran
Signal Processing Laboratory 5, EPFL, Lausanne, Switzerland ({firstname.lastname}@epfl.ch); Department of Radiology, Lausanne University Hospital, Lausanne, Switzerland; Center of Biomedical Imaging, Lausanne, Switzerland; Translational Research Unit (TRU), Bern, Switzerland
Abstract.
With the long-term rapid increase in incidences of colorectal cancer (CRC), there is an urgent clinical need to improve risk stratification. The conventional pathology report is usually limited to only a few histopathological features. However, most of the tumor microenvironments used to describe patterns of aggressive tumor behavior are ignored. In this work, we aim to learn histopathological patterns within cancerous tissue regions that can be used to improve prognostic stratification for colorectal cancer. To do so, we propose a self-supervised learning method that jointly learns a representation of tissue regions as well as a metric of the clustering to obtain their underlying patterns. These histopathological patterns are then used to represent the interaction between complex tissues and to predict clinical outcomes directly. We furthermore show that the proposed approach can benefit from linear predictors to avoid overfitting in patient outcome predictions. To this end, we introduce a new well-characterized clinicopathological dataset, including a retrospective collective of 374 patients, with their survival time and treatment information. Histomorphological clusters obtained by our method are evaluated by training survival models. The experimental results demonstrate statistically significant patient stratification, and our approach outperforms the state-of-the-art deep clustering methods.
Keywords:
Self-supervised learning · Histology · Survival analysis · Colorectal cancer.
1 Introduction

Colorectal cancer is the third leading cause of cancer-related mortality worldwide. Five-year survival rates are low, at 60%. Although standard histopathological cancer reporting based on features such as staging and grading identifies patients with a potentially worse outcome to therapy, there is still an urgent need to improve risk stratification. Pathologists typically limit their reporting of colorectal cancers to approximately ten features, which they describe as single elements in their report (e.g., depth of invasion, pT; lymph node metastasis, etc.). However, the histopathological (H&E) slide is a snapshot of all occurring tumor-related processes, and their interactions may hold a wealth of information that can be extracted to help refine prognostication. These slides can be digitized and used as input for computational algorithms to help support pathologists in their decision-making. The distribution of tissue types within the slide, the proximity of cell types or tissue components, and their spatial arrangement throughout the tissue can identify new patterns not previously detectable by the human eye alone.

Few studies have performed unsupervised clustering of whole slide images (WSIs) based on patch descriptors. They have been used to address the problem of image segmentation [16] or latent space clustering [4,6]. Among DL-based survival models, a recent study [13] used a supervised CNN for end-to-end classification of tissues to predict the survival of patients with colorectal cancer. Similar to our approach, several recent works have proposed unsupervised methods [17,22,14] for slide-level survival analysis. In [22], one of the first unsupervised approaches, DeepConvSurv, was proposed for survival prediction based on WSIs. More recently, DeepGraphSurv [14] was presented to learn global topological representations of WSIs via graphs.
However, these methods heavily relied on noisy compressed features from a pre-trained VGG network. Recently, self-supervised representation learning methods [8,23,2] have been proposed that utilize a pretext task to extract generalizable features from the unlabeled data itself. Therefore, the dataset does not need to be manually labeled by qualified experts to solve the pretext task.
Contributions.
In this work, we propose a new approach to learn histopathological patterns through self-supervised learning within each WSI. Besides, we present a novel way to model the interaction between tumor-related image regions for survival analysis and to tackle the inherent overfitting problem on tiny patient sets. To this end, we take advantage of a well-characterized, retrospective collective of 374 patients with clinicopathological data, including survival time and treatment information. H&E slides were reviewed, and at least one tumor slide per patient was digitized. To accelerate research, we have made our code and trained models publicly available at https://github.com/christianabbet/DnR.

2 Method

We first introduce our self-supervised image representation (Sec. 2.2) for the cancerous tissue area identified by our region of interest (RoI) detection scheme (Sec. 2.1). Then, we propose our deep clustering scheme and baseline algorithms in Sec. 2.3 and Sec. 2.4, respectively. The clustering approach's usefulness is assessed by conducting survival analysis (Sec. 2.5) to measure if the learned clusters can contribute to disease prognostication. Finally, we discuss our implementation setup and experimental results in Sec. 3.

2.1 Region of Interest Detection

Our objective is to learn discriminative patterns of unhealthy tissues of patients. However, a WSI does not include information about the cancerous regions or the location of the tumor itself. Therefore, we seek a transfer learning approach for the classification of histologic components of WSIs. To do so, we use the dataset presented in [12] to train a classifier to discriminate relevant areas. The dataset is composed of 100K examples of tissue from CRC separated into nine different classes.
For our task, we choose to retain three classes: lymphocytes (LYM), cancer-associated stroma (STR), and colorectal adenocarcinoma epithelium (TUM), which show discriminative evidence for the class of interest and have been approved by the pathologist. Note that the presence of a large number of lymphocytes around the tumor is an indication of an immune reaction and is therefore possibly linked to a higher survival score. We first train our classifier with a ResNet-18 backbone [9]. Then we use the stain normalization approach proposed in [15] to match the color space of the target domain and prevent degradation of the classifier on transferred images. An example of RoI estimation is presented in Fig. 1. Such a technique allows us to discard a large part of the healthy tissue regions.
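The tile-retention step above reduces to a simple mask over per-tile class predictions. The following is a minimal NumPy sketch; the class ordering in `CLASSES` and the helper `roi_mask` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical ordering of the nine CRC tissue classes from [12].
CLASSES = ["ADI", "BACK", "DEB", "LYM", "MUC", "MUS", "NORM", "STR", "TUM"]
KEEP = {"LYM", "STR", "TUM"}  # tumor-related classes retained as RoI

def roi_mask(probs: np.ndarray) -> np.ndarray:
    """Given per-tile class probabilities (n_tiles x 9), keep tiles whose
    most likely class is one of the retained tumor-related classes."""
    pred = probs.argmax(axis=1)
    keep_ids = {i for i, c in enumerate(CLASSES) if c in KEEP}
    return np.array([p in keep_ids for p in pred])

# Toy example: one tile confidently TUM, one confidently BACK(ground).
probs = np.zeros((2, 9))
probs[0, CLASSES.index("TUM")] = 1.0
probs[1, CLASSES.index("BACK")] = 1.0
print(roi_mask(probs))  # [ True False]
```

In practice the probabilities would come from the trained ResNet-18 classifier applied to every tile of the slide.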
2.2 Self-Supervised Image Representation

We propose a self-supervised transfer colorization scheme to learn a more meaningful feature representation of the tissues and reduce the requirement for intensive tissue labeling. Unsupervised learning methods such as autoencoders trained by minimizing reconstruction error tend to ignore the underlying structure of the image, as the model usually learns the distribution of the color space. To avoid this issue, we use colorization learning as a proxy task. As the input, we convert the original unlabeled image through a mapping function ζ(x) to a two-channel image (hematoxylin and eosin) that describes the nuclei and the amount of extracellular material, respectively. To sidestep the memory bottleneck, we represent the WSI as a set of adjacent/overlapping tiles (image patches) {x_i ∈ X}_{i=1}^N. We define a function ζ: X → X_HE that converts the input images to their HE equivalent [15,18]. Then, we train a convolutional autoencoder (CAE) to measure the per-pixel difference between transformed and input image(s) using the MSE loss:

    min_{φ,ψ} L_MSE = min_{φ,ψ} ‖x − ψ ∘ φ ∘ ζ(x)‖².    (1)

The encoder φ: X_HE → Z is a convolutional neural network that maps an input image to its latent representation Z. The decoder ψ: Z → X is an up-sampling convolutional neural network that reconstructs the input image given a latent space representation. As a result, we use a single input branch to take into account the tissue's structural aspect.
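A mapping ζ of this kind can be sketched with standard Beer-Lambert stain deconvolution. The fixed hematoxylin/eosin stain vectors below are illustrative (the Macenko method [15] estimates them per slide), and the function name `zeta` is our own:

```python
import numpy as np

# Illustrative H&E stain color vectors (unit optical-density directions).
H = np.array([0.650, 0.704, 0.286])
E = np.array([0.072, 0.990, 0.105])

def zeta(rgb: np.ndarray) -> np.ndarray:
    """Map an RGB tile (H x W x 3, values in [0, 255]) to a two-channel
    hematoxylin/eosin concentration image via Beer-Lambert deconvolution."""
    od = -np.log(np.clip(rgb, 1, 255) / 255.0)           # optical density
    stains = np.stack([H / np.linalg.norm(H), E / np.linalg.norm(E)])
    # Solve stains^T · conc = od for the 2 stain concentrations per pixel.
    conc, *_ = np.linalg.lstsq(stains.T, od.reshape(-1, 3).T, rcond=None)
    return conc.T.reshape(rgb.shape[0], rgb.shape[1], 2)

tile = np.full((4, 4, 3), 128.0)   # toy uniform gray tile
print(zeta(tile).shape)            # (4, 4, 2)
```

The resulting two-channel image is what the CAE receives as input, while the reconstruction target remains the original RGB tile.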
Fig. 1:
The pipeline of the proposed approach.
Estimation of the region of interest (a), learning of the embedding space (b-c), fitting of the clusters, assignment of all patient patches, and survival analysis (d-f).
2.3 Divide-and-Rule

The principle behind our self-supervised learning approach is to represent image patches based on their spatial proximity in the feature space, meaning any two adjacent image patches (positive pairs) are more likely to be close to each other in the feature space Z than two distant patches (negative pairs). Such characteristics are met for overlapping patches, as they share similar histomorphological patterns. We let S_i denote the set of patches that overlap with patch i spatially. Besides, we can assume that image patches whose relative distances are smaller than a proximity threshold in the feature space should share common patterns. We define N_i as the set of top-k patches that achieve the lowest cosine distance to the embedding z_i of image patch i.

Firstly, we initialize the network parameters using the self-supervised reconstruction loss in Eq. 1. Then, for each patch embedding i, we label its overlapping set of patches S_i as similar patches (positive pairs). Otherwise, we consider any distant patches as negative pairs, whose embeddings should be scattered. Motivated by [19], we use a variant of the cross-entropy to compute the instance loss (Eq. 2):

    L_Divide = − Σ_{i ∈ B_inst} log( Σ_{j ∈ S_i} p(j|i) ),   p(j|i) = exp(z_j^T z_i / τ) / Σ_{k=1}^{N} exp(z_k^T z_i / τ),    (2)

where τ ∈ ]0, 1] is the temperature parameter and B_inst denotes the set of samples in the mini-batch.

Secondly, we jointly optimize the training of the network with the reconstruction loss and a Rule loss L_Rule that takes into account the similarity of different images in the feature space (Eq. 3). We gradually expand the vicinity of each sample to select its neighbor samples. If samples have high relative entropy, they are dissimilar and should be considered as individual classes, z ∈ B_inst. On the contrary, if samples have low relative entropy with their neighbors, they should be tied together, z ∈ Z \ B_inst. In practice, the entropy acts as a threshold to decide a boundary between close and distant samples and is gradually increased during training such that we go from easy samples (low entropy) to hard ones (high entropy). Finally, the proposed training loss, L_DnR, joins the above losses with a weighting term λ (see Eq. 4):

    L_Rule = − Σ_{i ∈ Z \ B_inst} log( Σ_{j ∈ S_i ∪ N_i} p(j|i) ).    (3)

    min_{φ,ψ} L_DnR = min_{φ,ψ} L_MSE + λ min_φ [ L_Divide + L_Rule ].    (4)

Dictionary Learning.
Measuring similarities between samples requires the computation of features over the entire dataset at each iteration. The complexity grows as a function of the number of samples in the dataset. To avoid this, we use a memory bank, where we keep track of and update the dictionary elements as in [23,19].
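The instance loss of Eq. 2 against such a memory bank can be sketched in a few lines of NumPy. This is a toy illustration only: the names `bank` and `S`, the sizes, and the random embeddings are our assumptions, not the authors' code.

```python
import numpy as np

def divide_loss(z, bank, S, tau=0.5):
    """L_Divide for a mini-batch: z (B x d) are L2-normalized embeddings,
    bank (N x d) is the memory bank of all patch embeddings, and S[i]
    lists the indices of patches overlapping patch i (its positive set)."""
    logits = z @ bank.T / tau                      # B x N cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)              # softmax: p(j | i)
    # Negative log-probability mass assigned to each sample's positive set.
    return -np.mean([np.log(p[i, S[i]].sum()) for i in range(len(z))])

# Toy setup: a 4-entry memory bank in 2-D, batch of 2 patches,
# each positive only with its own bank entry.
rng = np.random.default_rng(0)
bank = rng.normal(size=(4, 2))
bank /= np.linalg.norm(bank, axis=1, keepdims=True)
z = bank[:2]
S = [[0], [1]]
print(float(divide_loss(z, bank, S)))
```

L_Rule (Eq. 3) differs only in that the positive set is extended to S_i ∪ N_i, the top-k nearest neighbors in the bank, for low-entropy samples.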
2.4 Baseline Algorithms

As our first baseline, we leverage the inherent spatial continuity of WSIs. Spatially adjacent image patches (tiles) are typically more similar to each other than distant image patches in the slide and therefore should have similar feature representations Z. Hence, we force the model to adopt such behavior by minimizing the distance between the feature representation of a specific tile z_i and those of its overlapping tiles S_i.

Deep Cluster Assignment (DCA).
The downside of the first baseline is that in some cases, two distant image patches may be visually similar, or there may exist some spatially close patches that are visually different. This introduces noise into the optimization process. To tackle this issue, we can impose cluster membership as in [17].
Deep Embedded Clustering (DEC).
Unlike the second baseline, the objective of our last baseline is not only to determine the clusters but also to learn a meaningful representation of the tiles. Therefore, we consider jointly learning the deep feature representation (φ, ψ) and the image clusters U. The optimization is performed through the joint minimization of the reconstruction loss and the KL divergence to gradually anneal cluster centers by fitting the model to an auxiliary distribution (see [20] for details).

2.5 Survival Analysis

The learned embedding space is assumed to be composed of a limited number of homogeneous clusters. We fit spherical KMeans clustering (SPKM) [21] to the learned latent space with K clusters. As a result, every patch within a patient slide is assigned to a cluster, c_k = argmin_{k ∈ {0...K−1}} SPKM(x_i, μ_k).

Our objective is to model the interaction between tumor-related image regions (neighbor patches and clusters). To do so, we define a patient descriptor h = [h^C, h^T] ∈ R^{N×(K+K²)} as:

    h^C_k = p(s = k)   and   h^T_{j→k} = p(s = k | N(s) = j),    (5)

where s is a patch, h^C_k denotes the probability that a patch belongs to cluster k, and h^T_{j→k} is the transition probability between a patch and its neighbors N(s) (i.e., local interactions between clusters within the slide).
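Given per-patch cluster assignments and spatial neighbor lists, the descriptor of Eq. 5 can be sketched as below. The function `patient_descriptor` and its toy inputs are our illustrative assumptions; only the definitions of h^C and h^T follow the text.

```python
import numpy as np

def patient_descriptor(assign: np.ndarray, neighbors: list, K: int):
    """Build h = [h^C, h^T] for one slide. `assign[s]` is the cluster of
    patch s; `neighbors[s]` lists the spatial neighbors of patch s.
    h^C_k = p(s = k);  h^T_{j->k} = p(s = k | N(s) = j)."""
    hC = np.bincount(assign, minlength=K) / len(assign)
    counts = np.zeros((K, K))
    for s, nbrs in enumerate(neighbors):
        for n in nbrs:
            counts[assign[n], assign[s]] += 1   # neighbor cluster j -> patch cluster k
    row = counts.sum(axis=1, keepdims=True)
    hT = np.divide(counts, row, out=np.zeros_like(counts), where=row > 0)
    return np.concatenate([hC, hT.ravel()])     # length K + K^2

# Toy slide: 3 patches in a row with clusters [0, 1, 1] and K = 2.
h = patient_descriptor(np.array([0, 1, 1]), [[1], [0, 2], [1]], K=2)
print(h[:2])   # h^C = [1/3, 2/3]
```

Concatenating the K cluster frequencies with the K×K transition matrix yields the K + K² covariates fed to the survival model.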
Survival analysis is prone to overfitting, as we usually rely on a small patient set and a large number of features. To counter this issue, we first apply forward variable selection [10] using the log partial likelihood function with tied times [5], L_ll, and the likelihood-ratio (LR) test to identify the subset of relevant covariates:

    LR = −2 [ L_ll(β_new | h_new) − L_ll(β_prev | h_prev) ].    (6)

Here (h, β)_prev and (h, β)_new are the previous and new estimated sets of covariates, respectively. To validate that the selected covariates do not overfit the patient data, we use leave-one-out cross-validation (LOOCV) on the dataset and predict linear estimators [3] as η̂_i = h_i · β_{−i} and η̂ = (η̂_1, η̂_2, ..., η̂_N) to compute the C-Index [7]. Here, β_{−i} is estimated on the whole patient set minus patient i.
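The C-Index [7] used to score the linear estimators can be sketched as a plain O(n²) concordance computation over comparable pairs. This is an illustrative implementation of Harrell's definition, not the authors' evaluation code.

```python
import numpy as np

def c_index(risk, time, event):
    """Harrell's concordance index: among comparable pairs (the earlier
    time is an observed event, event = 1), count pairs where the shorter
    survival time has the higher predicted risk; risk ties count 1/2."""
    num = den = 0.0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if time[i] < time[j] and event[i]:   # comparable pair
                den += 1
                if risk[i] > risk[j]:
                    num += 1
                elif risk[i] == risk[j]:
                    num += 0.5
    return num / den

time = np.array([2.0, 4.0, 6.0, 8.0])
event = np.array([1, 1, 0, 1])                  # third patient censored
risk = np.array([4.0, 3.0, 2.0, 1.0])           # risk anti-ordered with time
print(c_index(risk, time, event))               # 1.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect concordance between predicted risk and observed survival.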
Fig. 2:
Comparison of estimated cluster representations. (a) Survival results and estimated hazard ratios over LOOCV (b-c). For the Kaplan-Meier estimators, we choose a subset of curves that do not overlap too much for better visualization.
3 Experiments and Results

Dataset.
We use a set of 660 in-house unlabeled WSIs of CRC stained with hematoxylin and eosin (H&E). The slides are linked to a total of 374 unique patients diagnosed with adenocarcinoma. The dataset was filtered such that we exclude cases of mucinous adenocarcinoma, whose features are considered independent with respect to standard adenocarcinoma. A set of histopathological features (HFs) is associated with each patient entry (i.e., depth of invasion, pT, etc.). The survival time is defined as the period between resection of the tissue (operation) and the event occurrence (death of the patient). We denote D_S as the dataset that contains slide images and D_{S∩HF} as the dataset that contains both the HFs and slides for each patient. Note that |D_{S∩HF}| < |D_S|, as some patients have missing HFs and were excluded.
We use ResNet-18 for the encoder, where the input layer is updated to support 2 input channels. The latent space has dimension d = 512. The decoder is a succession of convolutional layers, ReLUs, and (bicubic) up-samplings. The model was trained with the reconstruction loss L_MSE for 20 epochs with early stopping, using the Adam optimizer with learning rate 1e−3. Then, we add L_Divide for an additional 20 epochs with weighting term λ and temperature τ = 0.5. Finally, we go through 3 additional rounds using L_Rule while raising the entropy threshold between each round.
Table 1: Multivariate survival analysis for the proposed approach and baselines. K and N_feat denote the number of clusters and the number of features that achieve statistical relevance under forward selection (p < 0.05). n denotes the number of patients in each set. The Brier score [1] and Concordance Index (C-Index) [7] are indicators of performance.

                                       D_{S∩HF} (n = 253)    D_S (n = 374)
Method                     K   N_feat  Brier    C-Index      Brier   C-Index
Histo. features (HFs)      -   8       0.2896   0.…***       -       -
DCS                        8   3       0.2840   0.…+         …       0.…**
DCA† [17]                  8   2       0.2887   0.…**        …       0.…***
DEC† [20]                  8   4       0.2884   0.…**        …       0.…**
DnR w/o L_Divide, L_Rule   8   …       …        0.…*         …       0.…***
DnR w/o L_Rule             8   …       …        0.…**        …       0.…***
DnR (ours)                 8   4       0.2854   0.…*         …       0.…***
DCS                        16  9       0.2934   0.…          …       0.…***
DCA† [17]                  16  7       0.2827   0.…+         …       0.…**
DEC† [20]                  16  7       …        0.…**        …       0.…***
DnR w/o L_Divide, L_Rule   16  5       0.2819   0.…*         …       0.…***
DnR w/o L_Rule             16  10      0.3006   0.…+         …       0.…***
DnR (ours)                 16  13      0.2849   0.…**        …       0.…***

† Autoencoder is replaced with the self-supervised objective function.
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p <
Clustered Embedding Space.
We fit SPKM with K = 8 and K = 16. The sampled tiles for each cluster are presented in Fig. 2. Clusters demonstrate different tumor and stroma interactions, inflammatory tissues, muscles and large vessels, collagen and small vessels, blood and veins, or connective tissues. Some clusters do not directly represent the type of tissue but rather positional information, such as the cluster that describes the edge of the WSI.
We build our survival features (Eq. 5) on top of the predicted clusters, and their contribution is evaluated using Eq. 6. In Tab. 1, we observe that our model outperforms previous approaches by a 5% margin on the C-Index [7]. The second step of the learning (DnR w/o L_Rule) tends to decrease the prediction score. Such behavior is to be expected, as the additional term (L_Divide) scatters the data and focuses on self-instance representation. When L_Rule is then introduced, the model can restructure the embedding by linking similar instances. Also, we observe an increase in the number of features, N_feat, that achieve statistical relevance for prognosis as we go through our learning procedure (for K = 16), which shows that our proposed framework can model more subtle patch interactions. We show in Fig. 2 the distribution of hazard ratios for all models (from LOOCV) and the Kaplan-Meier estimator [11] for a subset of the selected covariates. In the best case, we identify 13 features that contribute to the survival outcome of the patients. For example, the interaction between blood vessels and tumor stroma is linked to a lower survival outcome. A similar trend is observed in the relation between tumor stroma and connective tissues.

4 Conclusion

We have proposed a self-supervised learning method that offers a new approach to learning histopathological patterns within cancerous tissue regions. Our model presents a novel way to model the interactions between tumor-related image regions and tackles the inherent overfitting problem in predicting patient outcome. Our method surpasses all previous baseline methods and histopathological features and achieves state-of-the-art results, i.e., in C-Index, without any data-specific annotation. Ablation studies also show the importance of the different components of our method and the relevance of combining them. We envision the broad application of our approach for clinical prognostic stratification improvement.
References
1. Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly Weather Review (1), 1-3 (1950)
2. Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. arXiv preprint arXiv:2002.05709 (2020)
3. Dai, B., Breheny, P.: Cross validation approaches for penalized Cox regression. arXiv preprint arXiv:1905.10432 (2019)
4. Dercksen, K., Bulten, W., Litjens, G.: Dealing with label scarcity in computational pathology: A use case in prostate cancer classification. arXiv preprint arXiv:1905.06820 (2019)
5. Efron, B.: The efficiency of Cox's likelihood function for censored data. Journal of the American Statistical Association (359), 557-565 (1977)
6. Fouad, S., Randell, D., Galton, A., Mehanna, H., Landini, G.: Unsupervised morphological segmentation of tissue compartments in histopathological images. PLoS ONE (11), e0188717 (2017)
7. Harrell Jr, F.E., Lee, K.L., Mark, D.B.: Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine (4), 361-387 (1996)
8. He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9729-9738 (2020)
9. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770-778 (2016)
10. Hosmer Jr, D.W., Lemeshow, S., May, S.: Applied Survival Analysis: Regression Modeling of Time-to-Event Data, vol. 618. John Wiley & Sons (2011)
11. Kaplan, E.L., Meier, P.: Nonparametric estimation from incomplete observations. Journal of the American Statistical Association (282), 457-481 (1958)
12. Kather, J.N., Halama, N., Marx, A.: 100,000 histological images of human colorectal cancer and healthy tissue (Apr 2018). https://doi.org/10.5281/zenodo.1214456
13. Kather, J.N., Krisam, J., Charoentong, P., Luedde, T., Herpel, E., Weis, C.A., Gaiser, T., Marx, A., Valous, N.A., Ferber, D., et al.: Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study. PLoS Medicine (1), e1002730 (2019)
14. Li, R., Yao, J., Zhu, X., Li, Y., Huang, J.: Graph CNN for survival analysis on whole slide pathological images. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 174-182. Springer (2018)
15. Macenko, M., Niethammer, M., Marron, J.S., Borland, D., Woosley, J.T., Guan, X., Schmitt, C., Thomas, N.E.: A method for normalizing histology slides for quantitative analysis. In: 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro. pp. 1107-1110 (June 2009)
16. Moriya, T., Roth, H.R., Nakamura, S., Oda, H., Nagara, K., Oda, M., Mori, K.: Unsupervised pathology image segmentation using representation learning with spherical k-means. In: Medical Imaging 2018: Digital Pathology. vol. 10581, p. 1058111. International Society for Optics and Photonics (2018)
17. Muhammad, H., Sigel, C.S., Campanella, G., Boerner, T., Pak, L.M., Büttner, S., IJzermans, J.N., Koerkamp, B.G., Doukas, M., Jarnagin, W.R., et al.: Unsupervised subtyping of cholangiocarcinoma using a deep clustering convolutional autoencoder. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 604-612. Springer (2019)
18. Vahadane, A., Peng, T., Sethi, A., Albarqouni, S., Wang, L., Baust, M., Steiger, K., Schlitter, A.M., Esposito, I., Navab, N.: Structure-preserving color normalization and sparse stain separation for histological images. IEEE Transactions on Medical Imaging (8), 1962-1971 (Aug 2016)
19. Wu, Z., Xiong, Y., Yu, S.X., Lin, D.: Unsupervised feature learning via non-parametric instance discrimination. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3733-3742 (2018)
20. Xie, J., Girshick, R., Farhadi, A.: Unsupervised deep embedding for clustering analysis. In: International Conference on Machine Learning. pp. 478-487 (2016)
21. Zhong, S.: Efficient online spherical k-means clustering. In: Proceedings of the 2005 IEEE International Joint Conference on Neural Networks. vol. 5, pp. 3180-3185. IEEE (2005)
22. Zhu, X., Yao, J., Zhu, F., Huang, J.: WSISA: Making survival prediction from whole slide histopathological images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 7234-7242 (2017)
23. Zhuang, C., Zhai, A.L., Yamins, D.: Local aggregation for unsupervised learning of visual embeddings. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 6002-6012 (2019)

A Additional Figures and Results
Fig. 3: Schematic of proposed lower-dimensional embedding representations. From left to right: DCS, DCA, DEC.

Table 2: Multivariate survival analysis comparison between self-supervised training with RGB → RGB and HE → RGB reconstruction. The model performs better when we impose the color conversion from the HE space.

                         D_{S∩HF} (n = 253)    D_S (n = 374)
Method         K   N_feat  Brier    C-Index    Brier   C-Index
MSE RGB → RGB  …   …       …        0.…**      …       0.…*
MSE HE → RGB   …   …       …        0.…*       …       0.…***
MSE RGB → RGB  16  0       0.2893   0.…        …       0.…
MSE HE → RGB   16  5       …        0.…*       …       0.…***

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001 (log-rank test).
Table 3: Hazard ratios (HRs) with confidence intervals (CIs) based on histopathological features for n = 374 patients with adenocarcinoma (241 right-censored samples). Results are given for the univariate Cox model. If an entry is non-binary, we apply a one-vs.-all test.

Characteristics               Subcategories                        HR (95% CI)        p-value
Gender (n = 374)              Male (n = 220) vs. Female (n = 154)  0.90 (0.64-1.28)   0.5139
T category (n = 373)          pT1-2 (n = 78) vs. pT3-4 (n = 295)   1.67 (1.00-2.79)   0.0479 *
N category (n = 366)          pN0 (n = 186) vs. pN1-2 (n = 180)    2.65 (1.83-3.82)   < … *
M category (n = 374)          pM0 (n = 326) vs. pM1 (n = 48)       1.76 (1.11-2.78)   0.0162 *
Tumor grade (n = 369)         G1-2 (n = 324) vs. G3 (n = 45)       1.54 (0.96-2.46)   0.0716
Lymphatic invasion (n = 351)  L0 (n = 141) vs. L1 (n = 210)        3.35 (2.14-5.23)   < … *
Vascular invasion (n = 352)   V0 (n = 193) vs. V1-2 (n = 159)      1.52 (1.07-2.16)   0.0198 *
Tumor pushing (n = 276)       <25% (n = 109) vs. ≥25% (n = 167)    0.59 (0.40-0.87)   0.0077 *
Tumor location (n = 352)      Left (n = 164)                       1.28 (0.90-1.83)   0.1702
                              Rectum (n = 58)                      0.83 (0.50-1.36)   0.4535
                              Right (n = 130)                      0.85 (0.58-1.24)   0.3967
TNM stage (n = 372)           I (n = 64)                           0.45 (0.24-0.83)   0.0109 *
                              II (n = 114)                         0.50 (0.33-0.75)   0.0010 *
                              III (n = 118)                        1.57 (1.11-2.22)   0.0102 *
                              IV (n = 76)                          2.03 (1.39-2.96)   0.0003 *

* Indicates statistical relevance (α = 0.05).