Deep Learning based NAS Score and Fibrosis Stage Prediction from CT and Pathology Data
Ananya Jana, Hui Qu, Puru Rattan, Carlos D. Minacapelli, Vinod Rustgi, Dimitris Metaxas
Ananya Jana*, Computer Science Dept., Rutgers University, New Brunswick, NJ ([email protected])
Hui Qu*, Computer Science Dept., Rutgers University, New Brunswick, NJ ([email protected])
Puru Rattan, Division of Gastroenterology and Hepatology, Rutgers Robert Wood Johnson Medical School, New Brunswick, NJ ([email protected])
Carlos D. Minacapelli, Division of Gastroenterology and Hepatology, Rutgers Robert Wood Johnson Medical School, New Brunswick, NJ ([email protected])
Vinod Rustgi, Division of Gastroenterology and Hepatology, Rutgers Robert Wood Johnson Medical School, New Brunswick, NJ ([email protected])
Dimitris Metaxas, Computer Science Dept., Rutgers University, New Brunswick, NJ ([email protected])
Abstract—Non-Alcoholic Fatty Liver Disease (NAFLD) is becoming increasingly prevalent in the world population. Without timely diagnosis, NAFLD can lead to non-alcoholic steatohepatitis (NASH) and subsequent liver damage. The diagnosis and treatment of NAFLD depend on the NAFLD activity score (NAS) and the liver fibrosis stage, which are usually evaluated from liver biopsies by pathologists. In this work, we propose a novel method to automatically predict NAS score and fibrosis stage from CT data, which is non-invasive and inexpensive to obtain compared with liver biopsy. We also present a method to combine the information from CT and H&E stained pathology data to improve the performance of NAS score and fibrosis stage prediction when both types of data are available. This is of great value for assisting pathologists in the computer-aided diagnosis process. Experiments on a 30-patient dataset illustrate the effectiveness of our method.
Index Terms—NAFLD activity score, liver fibrosis, deep learning.
I. INTRODUCTION
Nonalcoholic fatty liver disease (NAFLD) is now the most common form of chronic liver disease in the world. The prevalence of this disease has been estimated to be over 25% in the general worldwide population [1]. This disease encompasses a spectrum of changes in the liver related to fat deposition. The changes range from non-alcoholic fatty liver (NAFL), simple steatosis (increased liver fat content) with minimal or no inflammation, to a progressive form of the disease called non-alcoholic steatohepatitis (NASH). NASH is characterized by steatosis, inflammation and hepatocellular injury, with eventual progression to various stages of fibrosis [2]. As the progressive form of the disease, NASH is associated with increased morbidity and mortality; therefore, detection of this disease is important during diagnosis. Liver biopsy is the gold standard for determining the fibrosis stage and for diagnosing NASH via the NAFLD activity score (NAS) to differentiate it from simple steatosis. However, the cost of an invasive procedure such as liver biopsy, combined with the possibility of complications such as bleeding, infection, and rarely death, rules out its routine use in clinical practice [3]. In addition, intra-observer variance along with sampling variance adds significant error to the manual interpretation of liver biopsy histopathology [4], [5]. Therefore, it is of great value to develop computational methods for NAS score prediction and fibrosis staging from non-invasive imaging data. Besides, when biopsy data is available, it is meaningful to build computational models to predict scores from the biopsy data, which can assist pathologists in the diagnosis process by saving time and reducing the unreliability of less-experienced doctors.

* Equal contribution.

Recently, deep learning has been immensely successful in image analysis tasks such as image classification [6] and semantic segmentation [7].
The ability to learn meaningful features automatically makes deep learning superior to traditional machine learning approaches. There have been many deep learning based works on fibrosis staging and NAS score prediction. These methods cover various types of data, such as CT scans [8]–[10], Magnetic Resonance Imaging (MRI) [11]–[13], pathology images from liver biopsy [14]–[16] and others [17], [18]. CT and MRI scans are non-invasive; thus it is beneficial for patients if one can obtain accurate fibrosis stage and NAS scores from these scans. Compared with CT and MRI data, pathology images are more informative about the disease and are used as the gold standard for diagnosis. For fibrosis staging, Yasaka et al. [8] explored the use of a deep convolutional neural network (CNN) for staging liver fibrosis on magnified CT images of the liver surface. They concatenated the age and sex information of the subject at one of the fully connected layers. The scores produced by the model were moderately correlated with the liver fibrosis stage found from histopathology. Jin et al. [9] proposed a fibrosis prediction mechanism where the liver region in the image was first segmented using a segmentation network and then classified using a CNN. Yu et al. [15] investigated and compared the performance of liver fibrosis stage classification using different deep learning algorithms and other machine learning algorithms. Fu et al. [14] explored fibrosis identification with the help of image segmentation, but their method did not predict the exact fibrosis stage. Heinemann et al. [16] trained a CNN to predict fibrosis by using histology images at different scales. For NAS score prediction, pathology data is more often used. Heinemann et al. [16] trained a CNN to predict the three individual NAS scores: NAS steatosis, NAS lobular inflammation, and NAS ballooning. NAS score prediction from CT images has rarely been explored. Hence, an important question is: can we predict NAS scores directly from CT images? Besides, when paired pathology data is also available, can we utilize both CT and pathology data to improve the prediction performance? In practice, the diagnoses using CT scans and pathology slides are performed by radiologists and pathologists, respectively. If we can combine the information from both types of doctors (i.e., train a model that collects information from both types of data), it could be beneficial to the final results.

In this work, we propose to predict the individual NAS scores (NAS steatosis, NAS lobular inflammation, NAS ballooning) directly from CT images using deep learning. The 3D CT volumes are divided into 2D slices, in which the liver portion is segmented. Features are extracted from these slices by a pretrained ResNet [19], and then aggregated by a classifier for prediction. Besides, we also combine the information from CT scans and pathology images to further improve the performance, considering that they may contain different levels of information about the disease. The CT and pathology images are fed into two separate networks to produce features that are related to fibrosis or NAS. Then the two types of features are fused to get the final classification result.

Fig. 1. CT data preprocessing and baseline network structure: (a) feature extractor, (b) classifier.
We explore different feature fusion strategies and loss functions. The proposed method achieves good performance on a 30-patient dataset that contains paired CT and pathology data. In summary, our main contributions include:
• We propose a novel method to predict NAS scores directly from CT images.
• We design a network to predict fibrosis stage and NAS score from CT and pathology images, achieving better performance than with any single type of data.
• As far as we know, we are the first to perform deep learning based NAS score prediction using CT images, and the first to use multiple-modality data for fibrosis staging and NAS score prediction.

II. METHOD
The overview of our proposed method is shown in Fig. 1 and Fig. 2. It consists of two main steps: (1) data preprocessing on CT scans and pathology slides, and (2) network training using the preprocessed data.
A. Data preprocessing
The raw CT and pathology data cannot be directly fed into CNNs for training, so we perform preprocessing for each type of data.
1) 2D liver segmentation (CT data):
The raw CT scans are 3D volumetric data. To be compatible with the 2D pathology images during training, we extract 2D slices from each 3D scan. Hounsfield windowing is performed on the 2D CT slices to increase the image contrast. As we only focus on the liver in the CT images, we perform 2D liver segmentation to extract the liver regions before the fibrosis stage and NAS score prediction tasks. All other pixels are set to zero to avoid the negative effect of other organs (shown in Fig. 1). The segmentation model is first pretrained on the ISBI2012 dataset [20]. Then we annotate a small part of the 2D slices in our dataset and fine-tune the segmentation model with the annotated images, utilizing transfer learning. We made a train/test split on the annotated liver dataset; the Dice score of the segmentation network was 0.9421 on the test set. After liver segmentation, slices with an average pixel value below a threshold (5 in our experiments) are discarded. This ensures that slices containing very small liver portions, or no liver at all, are removed. The number of CT slices after preprocessing is 2595.

Fig. 2. Baseline network and four different joint networks: (a) baseline network, (b) mid-fusion single-loss, (c) late-fusion single-loss, (d) mid-fusion multi-loss, (e) late-fusion multi-loss.
2) Patch generation (pathology data):
The original pathology whole slide images (WSIs) have very high resolution. We extract patches of size 224 × 224 from each WSI. Patches that have a mean pixel value greater than 220 are considered background and are removed. The number of pathology patches after preprocessing is 7775.

B. Baseline network for CT images
This network is designed to predict the NAS scores from CT data. The architecture, shown in Fig. 1 and Fig. 2(a), consists of a feature extractor and a classifier. The feature extractor aims to obtain a feature vector that represents the input 2D slice. We use ResNet-18 [19] (without the final fully-connected layer) as our feature extractor for two reasons: (1) more complex models would overfit our training data, which is limited to 30 patients; (2) training is done at the patient level, i.e., all pathology patches and 2D CT slices of a patient are fed to the network together, making it hard to use larger models due to GPU memory limitations. The classifier consists of fully-connected layers and an average pooling layer. The first two fc layers have 512 and 128 neurons, respectively, and further process the extracted features of all input CT slices of a patient. The subsequent average pooling layer obtains the global feature vector from the local feature vectors of the 2D slices. The final fc layer predicts the classification result from the global feature. Each individual score (fibrosis, NAS steatosis, NAS lobular, NAS ballooning) is trained with its own network.

The baseline network can also be used to predict NAS scores and fibrosis stage from histopathology images by simply replacing the input 2D slices with the extracted patches from a WSI.
C. Joint network for both data
The structure of the joint network for both CT and pathology data is shown in Fig. 2. There are two separate baseline networks and a joint classifier. The two baseline networks take as input the CT images and the pathology images, respectively. The joint classifier takes as input the fused features from the two baseline networks and outputs the prediction. We explore the effects of two different feature fusion strategies and two types of loss functions, resulting in four different architectures.

TABLE I
ORIGINAL AND COMBINED FIBROSIS STAGE AND NAS SCORE DISTRIBUTIONS IN THE 30-SUBJECT DATASET

             Fibrosis                NAS steatosis    NAS lobular       NAS ballooning
Score        0   1   2   3  3.5  4   0   1   2   3    0   1   2   3     0   1   2
Original     7   6   4   3   2   8   2   9  11   8    9  10   8   3     8  11  11
Combined     7  10  13              11  19            9  10  11         8  11  11
1) Mid-fusion vs. late-fusion:
In the mid-fusion architectures (Fig. 2(b) and Fig. 2(d)), the outputs of the two baseline networks' feature extractors are concatenated and fed to the joint classifier, which has the same structure as the classifier in the baseline network. In this case, local features of all 2D CT slices and pathology patches are stacked together to produce a global feature representation of both types of data.

In the late-fusion architectures (Fig. 2(c) and Fig. 2(e)), the fusion is done after the average pooling layers of the two baseline networks. That is, we concatenate the global features (1 × 128 each) obtained from both types of data to form a single feature (1 × 256), which is fed to the end classifier.
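The late-fusion path and the multi-loss objective (L = L_joint + L_CT + L_patho) can be sketched as follows; the function names are ours, and end_fc stands in for the joint end classifier, a fully-connected layer on the 256-dimensional fused feature.

```python
import torch
import torch.nn.functional as F

def late_fusion(ct_global, patho_global, end_fc):
    """Late fusion: concatenate the two 1 x 128 global features into a
    single 1 x 256 vector and classify it with the end fc layer."""
    return end_fc(torch.cat([ct_global, patho_global], dim=1))

def multi_loss(joint_logits, ct_logits, patho_logits, target):
    """Multi-loss objective: L = L_joint + L_CT + L_patho, each term a
    cross entropy, so all three classifiers must predict correctly."""
    return (F.cross_entropy(joint_logits, target)
            + F.cross_entropy(ct_logits, target)
            + F.cross_entropy(patho_logits, target))
```

Mid-fusion differs only in where the concatenation happens: the per-slice/per-patch local features are stacked before the joint classifier rather than after average pooling.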
2) Single-loss vs. multi-loss:
For the single-loss architectures (Fig. 2(b) and Fig. 2(c)), we only compute the loss with regard to the output of the joint classifier, i.e., L = L_joint. It only cares about whether the prediction from both types of data is correct or not.

For the multi-loss architectures (Fig. 2(d) and Fig. 2(e)), the final loss L is the summation of the joint loss and the losses from the individual baseline networks, i.e., L = L_joint + L_CT + L_patho. It requires all three classifiers (CT, pathology, joint) to make correct predictions. During testing, the output from the joint classifier is treated as the final prediction.

III. EXPERIMENTS
A. Dataset and evaluation metrics
1) Dataset:
The dataset used in our experiments consists of CT volumes and H&E stained histopathology whole slide images of 30 subjects, in particular one CT volume and one slide image per subject. All data are private data from our collaborative partner and are de-identified. The ground-truth fibrosis stage and NAS scores were provided by a pathologist through manual examination of the WSIs. We randomly split the 30 patients into three groups and perform 3-fold cross validation in all experiments.

The fibrosis stage ranges from 0 to 4. The total NAS score is made up of three individual scores, NAS steatosis, NAS lobular inflammation and NAS ballooning, which have 4, 4 and 3 different values, respectively. The original score distribution of the 30 subjects is shown in Table I.
2) Label generation:
The original stages/scores cannot be used directly for training because the number of patients in some stages/scores is too small (e.g., 3 patients in fibrosis stage 3, 2 patients in NAS steatosis score 0). Based on our collaborating clinical doctor's input, we divide the fibrosis stages into three classes, the NAS steatosis scores into two classes, NAS lobular into three classes, and NAS ballooning into three classes. The distribution of the new classes is shown in Table I ('Combined' row). Each combined class contains relatively enough data for training.
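The exact score-to-class mapping is not spelled out in the text, but one grouping consistent with the 'Combined' counts in Table I is the following sketch; treat these mappings as inferred from the table, not authoritative.

```python
# Inferred score-to-class mappings: each dict maps an original
# stage/score to a combined training class, chosen so that the class
# counts reproduce the 'Combined' row of Table I.
FIBROSIS_CLASS   = {0: 0, 1: 1, 2: 1, 3: 2, 3.5: 2, 4: 2}  # 7 / 10 / 13 patients
STEATOSIS_CLASS  = {0: 0, 1: 0, 2: 1, 3: 1}                # 11 / 19 patients
LOBULAR_CLASS    = {0: 0, 1: 1, 2: 2, 3: 2}                # 9 / 10 / 11 patients
BALLOONING_CLASS = {0: 0, 1: 1, 2: 2}                      # 8 / 11 / 11 patients

def combine(score, mapping):
    """Map an original stage/score to its combined training class."""
    return mapping[score]
```

Summing the Table I 'Original' counts through these mappings reproduces the 'Combined' row exactly, which is why this grouping is a plausible reconstruction.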
3) Evaluation metrics:
We use the Area Under the ROC Curve (AUC) to evaluate classification performance in our experiments. The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings; AUC measures how well a model can distinguish between two classes. We also compute the 95% confidence interval for AUC using the bootstrapping method with 1000 iterations. For the two-class problem (NAS steatosis), AUC values are averaged over the three folds to give the mean AUC. For the three-class problems (NAS lobular, NAS ballooning and fibrosis), the AUC value of each individual class is computed by treating the other two classes as one class. The AUC of each fold is the average of the AUC values of the three classes. The final mean AUC of an experiment is the average over the three folds.
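The per-fold AUC computation for the three-class problems can be sketched with scikit-learn; mean_ovr_auc is our helper name, not from the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def mean_ovr_auc(y_true, y_score):
    """Per-class one-vs-rest AUC: for each class c, treat the other
    classes as a single negative class, compute AUC against the score
    for class c, and average over classes (the per-fold AUC above)."""
    n_classes = y_score.shape[1]
    aucs = [roc_auc_score((y_true == c).astype(int), y_score[:, c])
            for c in range(n_classes)]
    return float(np.mean(aucs))
```

The final reported value would then be this quantity averaged over the three cross-validation folds.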
B. Implementation Details
We implement our method using the PyTorch [21] library. During training, the ResNet-18 without the fc layer (the feature extractor) is initialized with pretrained weights from ImageNet [22]. To avoid overfitting, only the last residual block is updated. For all experiments, the models are trained with the Adam optimizer for 30 epochs. The learning rate, batch size and weight decay are 0.0001, 4 and 0.01, respectively. The best model of the 30 epochs is selected for testing. We use the cross entropy loss in all experiments.
C. Results and discussion
For each of the four prediction experiments (NAS steatosis, NAS lobular, NAS ballooning, fibrosis), we report the results using a single type of data (CT or pathology), and using both types of data with the four different architectures: mid-fusion with single loss, late-fusion with single loss, mid-fusion with multi-loss, and late-fusion with multi-loss. The results are shown in Table II and Fig. 3. We also compare our method with a recent related work [16], in which an Inception-V3 backbone network (pretrained on ImageNet) is used to predict fibrosis and NASH scores from rat/mouse liver pathology images. The results of their method on our data are shown in Table II.
1) Comparison of results using single modality data:
When using only CT data, our baseline method can predict the fibrosis stage with a 76.35 AUC score, but the results of the NAS score predictions are not good enough (AUC 61.82–67.88). Lobular inflammation and ballooning are both microscopic structures, which are often analyzed in pathology images; therefore, using H&E slides yields higher performance scores than using CT. In fact, H&E outperforms CT on all four tasks, because H&E images contain more details about cell and tissue structure.

TABLE II
MEAN AUC VALUES OF FIBROSIS STAGE AND NAS SCORE PREDICTION USING DIFFERENT METHODS (THREE-FOLD CROSS VALIDATION)

Method                   Fibrosis     NAS steatosis   NAS lobular   NAS ballooning
CT                       76.35 ± …    …               …             …
Mid-fusion Multi-loss    85.74 ± …    …               …             …

Fig. 3. AUC values and 95% confidence intervals for different folds: (a) fibrosis, (b) NAS steatosis, (c) NAS lobular, (d) NAS ballooning.
2) Comparison of results using multi-modality data:
The multi-modality based architecture (mid-fusion with single-loss) achieves better results than either single-modality (CT or H&E) architecture. The mean AUC values on fibrosis, NAS lobular and NAS ballooning are better than those of the single-modality models. The NAS steatosis result is similar to that of H&E because the H&E scores are already high, and the room for improvement is limited. As for NAS lobular and NAS ballooning, the fused model can learn some useful features from CT data, although it is hard for diagnostic radiologists to find them in CT data. These results show that the combination of CT and pathology data is indeed beneficial for training a more robust model.

a) Mid-fusion vs. late-fusion:
The results using the mid-fusion strategy are better than those of late fusion for both types of loss functions. Mid-fusion gathers the local features from each type of data, while late fusion combines the global features. The local features carry more information about the images than the global features, which could be the reason that mid-fusion works better.

b) Single-loss vs. multi-loss:
Whatever the feature fusion strategy is, single-loss achieves a higher AUC value than multi-loss, indicating that the learning of the single-modality branches may have negative effects on the joint classifier. The overall performance might be improved with a more sophisticated method for adjusting the weights of the different branches; this is a topic of future work.
3) Comparison to the latest related work:
As we can see from the table, Heinemann et al.'s [16] method performs worse on our dataset for both fibrosis and NASH scoring. One reason for this could be the slightly different scoring procedure used in their work. Their method gives an individual class label to each patch from a single WSI, which means that two different patches from the same WSI can potentially receive two completely different class labels. Second, the rule used to determine the fibrosis and NASH score of a patch also differs slightly from macroscopic scoring. An example is the scoring of NASH ballooning, where scores are assigned by the rule: if the patch does not contain a ballooning cell, its class label is 0; otherwise, the class label is 1. This differs from macroscopic scoring, where classes are assigned based on the presence of (1) no, (2) few, or (3) many ballooning cells. In our work, we have patient-level fibrosis and NASH scores; all patches from a patient share the same fibrosis and NASH score as the patient's WSI.

IV. CONCLUSION AND FUTURE WORK
This work explores the use of deep learning techniques to predict NAS scores directly from CT images, and to combine data from two different modalities to improve the estimation of fibrosis stage and the prediction of NAS score. Based on our positive results, in the future we plan to explore how this method scales when using more than two categories of data, and how it performs with other types of data, e.g., ultrasonography, MRI, and MRE. It would be better for clinical applications if we could increase performance using several types of non-invasive imaging data, such as CT and MRI. We also plan future research on how the weights of the different losses affect the final prediction result.

REFERENCES
[1] Sumeet K Asrani, Harshad Devarbhavi, John Eaton, and Patrick S Kamath. Burden of liver diseases in the world.
Journal of Hepatology, 70(1):151–171, 2019.
[2] Pierre Bedossa. Pathology of non-alcoholic fatty liver disease. Liver International, 37:85–89, 2017.
[3] Naga Chalasani, Zobair Younossi, Joel E Lavine, Michael Charlton, Kenneth Cusi, Mary Rinella, Stephen A Harrison, Elizabeth M Brunt, and Arun J Sanyal. The diagnosis and management of nonalcoholic fatty liver disease: practice guidance from the American Association for the Study of Liver Diseases. Hepatology, 67(1):328–357, 2018.
[4] Vlad Ratziu, Frédéric Charlotte, Agnès Heurtier, Sophie Gombert, Philippe Giral, Eric Bruckert, André Grimaldi, Frédérique Capron, Thierry Poynard, LIDO Study Group, et al. Sampling variability of liver biopsy in nonalcoholic fatty liver disease. Gastroenterology, 128(7):1898–1906, 2005.
[5] Omid Pournik, Seyed Moayed Alavian, Leila Ghalichi, Bahram Seifizarei, Leila Mehrnoush, Azam Aslani, Soghra Anjarani, and Saeid Eslami. Inter-observer and intra-observer agreement in pathological evaluation of non-alcoholic fatty liver disease suspected liver biopsies. Hepatitis Monthly, 14(1), 2014.
[6] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
[7] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3213–3223, 2016.
[8] Koichiro Yasaka, Hiroyuki Akai, Akira Kunimatsu, Osamu Abe, and Shigeru Kiryu. Deep learning for staging liver fibrosis on CT: a pilot study. European Radiology, 28(11):4578–4585, 2018.
[9] Kyu Jin Choi, Jong Keon Jang, Seung Soo Lee, Yu Sub Sung, Woo Hyun Shim, Ho Sung Kim, Jessica Yun, Jin-Young Choi, Yedaun Lee, Bo-Kyeong Kang, et al. Development and validation of a deep learning system for staging liver fibrosis by using contrast agent-enhanced CT images in the liver. Radiology, 289(3):688–697, 2018.
[10] Verena C Obmann, Nando Mertineit, Annalisa Berzigotti, Christina Marx, Lukas Ebner, Roland Kreis, Peter Vermathen, Johannes T Heverhagen, Andreas Christe, and Adrian T Huber. CT predicts liver fibrosis: prospective evaluation of morphology- and attenuation-based quantitative scores in routine portal venous abdominal scans. PLoS One, 13(7), 2018.
[11] Koichiro Yasaka, Hiroyuki Akai, Akira Kunimatsu, Osamu Abe, and Shigeru Kiryu. Liver fibrosis: deep convolutional neural network for staging by using gadoxetic acid-enhanced hepatobiliary phase MR images. Radiology, 287(1):146–155, 2017.
[12] Sudhakar K Venkatesh, Meng Yin, Naoki Takahashi, James F Glockner, Jayant A Talwalkar, and Richard L Ehman. Non-invasive detection of liver fibrosis: MR imaging features vs. MR elastography. Abdominal Imaging, 40(4):766–775, 2015.
[13] Rahul Rustogi, Jeanne Horowitz, Carla Harmath, Yi Wang, Hamid Chalian, Daniel R Ganger, Zongming E Chen, Bradley D Bolster Jr, Saurabh Shah, and Frank H Miller. Accuracy of MR elastography and anatomic MR imaging features in the diagnosis of severe hepatic fibrosis and cirrhosis. Journal of Magnetic Resonance Imaging, 35(6):1356–1364, 2012.
[14] Xiaohang Fu, Tong Liu, Zhaohan Xiong, Bruce H Smaill, Martin K Stiles, and Jichao Zhao. Segmentation of histological images and fibrosis identification with a convolutional neural network. Computers in Biology and Medicine, 98:147–158, 2018.
[15] Yang Yu, Jiahao Wang, Chan Way Ng, Yukun Ma, Shupei Mo, Eliza Li Shan Fong, Jiangwa Xing, Ziwei Song, Yufei Xie, Ke Si, et al. Deep learning enables automated scoring of liver fibrosis stages. Scientific Reports, 8(1):16016, 2018.
[16] Fabian Heinemann, Gerald Birk, and Birgit Stierstorfer. Deep learning enables pathologist-like scoring of NASH models. Scientific Reports, 9(1):1–10, 2019.
[17] Alex Treacher, Daniel Beauchamp, Bilal Quadri, David Fetzer, Abhinav Vij, Takeshi Yokoo, and Albert Montillo. Deep learning convolutional neural networks for the estimation of liver fibrosis severity from ultrasound texture. In Medical Imaging 2019: Computer-Aided Diagnosis, volume 10950, page 109503E. International Society for Optics and Photonics, 2019.
[18] Ilias Gatos, Stavros Tsantis, Stavros Spiliopoulos, Dimitris Karnabatidis, Ioannis Theotokas, Pavlos Zoumpoulis, Thanasis Loupas, John D Hazle, and George C Kagadis. Temporal stability assessment in shear wave elasticity images validated by deep learning neural network for chronic liver disease fibrosis stage assessment. Medical Physics, 46(5):2298–2309, 2019.
[19] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[20] Albert Cardona, Stephan Saalfeld, Stephan Preibisch, Benjamin Schmid, Anchi Cheng, Jim Pulokas, Pavel Tomancak, and Volker Hartenstein. An integrated micro- and macroarchitectural analysis of the Drosophila brain by computer-assisted serial section electron microscopy. PLoS Biology, 8(10), 2010.
[21] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8024–8035, 2019.
[22] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition.